The Metabolic Pathway Engineering Handbook: Tools and Applications (v. 2)

Author: Christina Smolke

The Metabolic Pathway Engineering Handbook, 1st Edition
The Metabolic Pathway Engineering Handbook: Fundamentals
The Metabolic Pathway Engineering Handbook: Tools and Applications

THE METABOLIC PATHWAY ENGINEERING HANDBOOK Tools and Applications

Edited by

Christina D. Smolke

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2010 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number-13: 978-1-4200-7765-0 (Hardcover)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

The metabolic pathway engineering handbook : tools and applications / editor, Christina D. Smolke.
p. ; cm.
Includes bibliographical references and index.
ISBN 978-1-4200-7765-0 (alk. paper)
1. Genetic engineering--Handbooks, manuals, etc. 2. Biosynthesis--Handbooks, manuals, etc. I. Smolke, Christina D. II. Title.
[DNLM: 1. Genetic Engineering--methods. 2. DNA--metabolism. 3. Genetic Techniques. 4. RNA--metabolism. QU 450 M587 2010]
TP248.6.M48 2010
660.6’5--dc22 2008045948

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contents

Introduction
Editor
Contributors

Section I Evolutionary Tools in Metabolic Engineering
Claudia Schmidt-Dannert

1 Evolutionary Engineering of Industrially Important Microbial Phenotypes
Stephen S. Fong

2 Improving Protein Functions by Directed Evolution
Nikhil U. Nair and Huimin Zhao

3 Engineering DNA and RNA Regulatory Regions through Random Mutagenesis and Screening
Ichiro Matsumura, Sean A. Lynch, and Justin P. Gallivan

4 Evolving Pathways and Genomes for the Production of Natural and Novel Compounds
Ethan T. Johnson and Claudia Schmidt-Dannert

5 Models Predicting Optimized Strategies for Protein Evolution
Jonathan J. Silberg and Peter Q. Nguyen

Section II Gene Expression Tools for Metabolic Pathway Engineering
Christina D. Smolke

6 Low-Copy Number Plasmids as Artificial Chromosomes
Kristala L. Jones Prather

7 Chromosomal Engineering Strategies
Kenan C. Murphy

8 Regulating Gene Expression through Engineered RNA Technologies
Maung Nyan Win and Christina D. Smolke

9 Tools Designed to Regulate Translational Efficiency
Claes Gustafsson

10 Metabolic Engineering of the Secretory Processing Pathway in Eukaryotes
Mohak Mhatre, Maira P. Pellegrini, and Michael J. Betenbaugh

11 Engineering Multifunctional Enzyme Systems for Optimized Metabolite Transfer between Sequential Conversion Steps
Robert J. Conrado, Thomas J. Mansell, and Matthew P. DeLisa

12 Practical Pathway Engineering—Demonstration in Integrating Tools
Sung Kuk Lee and Jay D. Keasling

Section III Application of Emerging Technologies to Metabolic Engineering
Jay D. Keasling

13 Genome-Wide Technologies: DNA Microarrays, Phenotypic Microarrays, and Proteomics
Seh Hee Jang, Mee-Jung Han, Sang Yup Lee, Jong Hyun Choi, and Xiao Xia Xia

14 Monitoring and Measuring the Metabolome
Maria Rowena N. Monton and Tomoyoshi Soga

Section IV Future Prospects in Metabolic Engineering
Bernhard Ø. Palsson and Sang Yup Lee

15 Systems Biology, Genome-Scale Models, and Metabolic Engineering
Sang Yup Lee, Hyun Uk Kim, Hongseok Yun, Seung Bum Sohn, Jin Sik Kim, Bernhard Ø. Palsson, Markus J. Herrgård, and Vasiliy A. Portnoy

16 Cell-Free Systems for Metabolic Engineering
Kara A. Calhoun and James R. Swartz

17 In Silico Models for Metabolic Systems Engineering
Kumar Selvarajoo, Satya Nanda Vel Arjunan, and Masaru Tomita

Section V Tools for Experimentally Determining Flux through Pathways
Ralf Takors

18 GC–MS for Metabolic Flux Analysis
Christoph Wittmann

19 Tools for Measuring Intermediate and Product Formation
Marco Oldiges

20 Use of AEX-HPLC-ESI-MS for 13C-Labeling Based Metabolic Flux Analysis in Saccharomyces cerevisiae and Penicillium chrysogenum
Wouter A. van Winden, Roelco J. Kleijn, Walter M. van Gulik, and Joseph J. Heijnen

Section VI Future Applications of Metabolic Engineering
Brian F. Pfleger

21 Energy and Cofactor Issues in Fermentation and Oxyfunctionalization Processes
Bruno Bühler, Lars M. Blank, Birgitta E. Ebert, Katja Bühler, and Andreas Schmid

22 Microbial Biosynthesis of Fine Chemicals: An Emerging Technology
Zachary L. Fowler and Mattheos Koffas

23 Applications of Metabolic Engineering for Natural Drug Discovery
Yi Tang, Suzanne Ma, and Wladyslaw A. Wojcicki

24 Metabolic Engineering for Alternative Fuels
Yandi Dharmadi and Ramon Gonzalez

Index

Introduction

Progression of Biological Synthesis Methods toward Commercial Relevance

The advent of recombinant DNA in the 1970s brought transformative technologies for the synthesis and manipulation of artificial genetic material. The ability to amplify, cut, and piece together fragments of DNA outside of a cell and to get (or transform) that DNA into a cell of interest resulted in a set of molecular cloning tools that enabled the field of genetic engineering. In genetic engineering, foreign DNA that encodes new or altered functions or traits is inserted into an organism of interest. Many early applications of recombinant DNA technology focused on heterologous protein production in microbial hosts. The first medicine made through recombinant DNA technology to be approved by the U.S. Food and Drug Administration was synthetic “human” insulin produced in Escherichia coli. This was an important early application of recombinant DNA technology, as the success of producing a safe and effective synthetic hormone in a bacterium led to widespread acceptance of the technology and to significant resources and funding being directed to its support and advancement.

As the technologies in support of synthesizing and manipulating artificial DNA matured and advanced, so did their applications. The early successful applications of recombinant DNA technology resulted in alternative routes to the synthesis of medicines (such as insulin, human growth factor, and erythropoietin) and vaccines, and even in genetically modified organisms, including crops that exhibit more desirable traits. Technologies were developed for the manipulation of artificial DNA in both prokaryotic and eukaryotic host organisms, including mammalian and plant cells.
In addition, inspired by the diversity of natural products, chemicals, and materials synthesized by biological systems in the natural world, researchers began to look beyond applications limited to the synthesis of a single heterologous protein product in a cellular host to more complicated engineering feats. In particular, these new applications focused on the manipulation of sets or combinations of proteins, or enzymes, that act in conjunction in a cell, within metabolic pathways, to convert energy and precursor chemicals into desired natural and non-natural products.

The production of chemicals, materials, and energy through biology presents an alternative to traditional chemical synthesis routes. While the development of chemical synthesis methods for the production of valuable chemicals and small molecule pharmaceuticals is a more mature field and has demonstrated significant successes, many chemicals remain difficult to synthesize through such strategies, particularly those with many chiral centers. Biological catalysts, or enzymes, have demonstrated remarkable adeptness at the synthesis of very complex molecules. Cellular biosynthesis strategies also offer several advantages over traditional chemical synthesis strategies in that the former are often conducted under less harsh conditions, thereby enabling “green” synthesis strategies that are associated with the production of fewer toxic by-products. In addition, cellular biosynthesis takes advantage of the cell’s natural ability to replenish enzymes and cofactors and to provide precursors from often inexpensive and renewable starting materials. Such advantages are particularly compelling in light of the global challenges we face today in energy, the environment, and sustainability.

However, new challenges are presented when manipulating the metabolic pathways in cellular hosts that link energy sources and starting materials to products of commercial interest. The unique challenges faced in engineering metabolic pathways, when compared to the early genetic engineering applications of heterologous protein production, require the development of new enabling technologies, spanning experimental and analytical techniques and computational tools.

The Field of Metabolic Engineering

Metabolic engineering is a field that includes the construction, redirection, and manipulation of cellular metabolism through the alteration of endogenous and/or heterologous enzyme activities and levels to achieve the biosynthesis or biocatalysis of desired compounds. Researchers in metabolic engineering often view the biological system as a chemical factory that converts starting materials to different value-added products. Because the yield or productivity of the process is linked to its commercial viability, the ability to precisely regulate the flow of energy and materials through different cellular pathways becomes critical to the optimization of the overall process, drawing parallels to the more traditional engineering discipline of chemical process design.

The basic tenet of metabolic engineering, the use of biology as a technology for the conversion of energy, chemicals, and materials to value-added products, has a long history. Early applications can be cited, even prior to the development of recombinant DNA technology, in the food and beverage industry, where more traditional methods of strain development based on evolution, mating, and selection strategies were used to develop more desirable production hosts for particular applications. However, recombinant DNA technology made it possible to introduce new enzymatic activities and pathways into production hosts, allowing access to different energy resources and starting materials and to the production of different chemicals and materials. Such technologies support the forward design of more complex synthetic pathways in host organisms or the targeted manipulation of endogenous pathways, enabling more directed manipulation of the cellular host. Current metabolic engineering efforts are focused on the synthesis of products such as chemical commodities, small molecule drugs, and alternative energy sources including biofuels.
In addition, significant effort is directed to the engineering of host metabolisms to utilize renewable, low-cost energy resources.

Many of the challenges faced in metabolic engineering are related to the engineering of energy and material flow within complex systems. More specifically, metabolic pathways make up complex interconnected networks in cells, which can rarely be manipulated in isolation from the rest of the network. Highlighting the interconnections between cellular metabolites is the fact that all metabolites are made from a set of 12 common precursors. In addition, the flow of metabolites through a network of enzymes, and in the background of other cellular enzymes that may exhibit activity on these metabolites, is often controlled through layered processes that act at different time scales, implement dynamic feedback control, and utilize localization and transport.

Metabolic engineering requires a breadth of skill sets to tackle different points of system design and as a result has developed into a very interdisciplinary field. Researchers with expertise spanning a variety of disciplines, including chemical engineering, biological engineering, environmental engineering, biochemistry, molecular biology, cell biology, bioinformatics, and control theory, are working in different areas of metabolic engineering. However, as an academic endeavor, metabolic engineering has remained an interdisciplinary research discipline, with courses covering whichever aspects of the field match the expertise of the department in which they are taught.

As it has matured, metabolic engineering has gained greater industrial significance. Initial industrial interest, largely from groups within larger chemical companies, was directed to the synthesis of chemical commodities in microorganisms. However, many smaller startup companies have developed in recent years that are focused on the synthesis of specialty chemicals such as pharmaceuticals and biofuels, on the development of computational and modeling programs to direct metabolic engineering efforts, and on the discovery and development of new enzyme activities in support of engineering new synthetic pathways into host organisms. The intersection of metabolic engineering with other emerging areas of systems and synthetic biology presents exciting opportunities to develop solutions to many of the global challenges we face in energy, the environment, health and medicine, resources, and sustainability, and will likely continue to fuel a significant sector of the biotechnology industry in future years.

An Overview of The Metabolic Pathway Engineering Handbook

The purpose of The Metabolic Pathway Engineering Handbook is to provide a thorough overview of the field of metabolic engineering. Each section, written by experts in the area, provides an overview of different aspects of a topic that is a central component of the field. Section editors introduce each section to provide a perspective on the topic and a description of how the chapters in that section link together to form an integrated overview of it. The sections are split into two books: the first focuses on the “fundamentals,” or basic principles, of metabolic engineering, and the second focuses on “tools and applications” in metabolic engineering. Due to its organization, the handbook can be used as a reference book and read for individual sections or chapters, or it can be used as a textbook for advanced courses in metabolic engineering.

Section I in The Metabolic Pathway Engineering Handbook: Fundamentals provides an overview of the basic processes that support cellular metabolism. The boundary of a cell is defined by its cellular membrane, which acts to separate cellular constituents from the environment. Metabolism begins with systems that allow the import of nutrients and starting materials across the cellular membrane, and efforts to engineer transport systems for particular chemicals have been important strategies in enabling cells to convert those chemicals to desired products. Once inside the cell, nutrients are broken down into common precursors for metabolic syntheses, a process that also provides the energy and reducing power necessary for cell survival. In addition, precursors are channeled into the synthesis of important building blocks that the cell then utilizes to build larger macromolecules, including lipids, nucleic acids, and proteins.
An understanding of the central metabolic pathways and the general flow of metabolism through a small number of common precursors and carriers is critical to being able to effectively link new synthetic nutrient or product pathways to endogenous metabolisms. Finally, the wealth of untapped diversity in nature, particularly in the microbial biosphere, provides significant opportunities in harvesting new enzymatic activities from nature that can be applied to the production of new chemicals and materials in engineered hosts.

Section II provides an overview of mass balances and reaction models applied to predicting product formation and microbial growth in fermentation processes. Various models have been proposed and utilized in the field that exhibit varying levels of detail to provide predictions of product yield and cell growth. Conversion rates are calculated from mass balances and rate equations that take into account the basic nutrients and constituents of cellular systems. Different models, such as those based on thermodynamic or metabolic network constraints, can be utilized to predict product yield and cell growth in fermentation processes. Different models may be more or less appropriate based on the specifics of the fermentation. The application of such models to experimental systems can allow minimization of error in detection strategies, resulting in optimized control schemes for fermentations based on such experimental measurements.

Section III provides an overview of transcriptional regulation of metabolic pathways in bacterial systems. Bacterial cells use a variety of mechanisms to regulate the transcription of enzymes involved in primary and secondary metabolisms. Transcriptional regulatory strategies exist that regulate a small set of genes in response to specific environmental chemicals, such as operon-specific regulation and two-component systems.
However, other strategies exist that regulate larger sets of genes in response to significant environmental changes such as heat shock or nitrogen starvation, through sigma factors and global transcriptional factors. An understanding of the strategies used to regulate the expression of enzymes in a cellular host is critical in metabolic engineering, both for developing effective strategies to alter the expression of endogenous enzymes and for designing synthetic systems that exhibit more sophisticated regulatory schemes to balance and coordinate the expression of multiple enzymes and ultimately optimize flux through desired pathways.

Section IV is an overview of modeling tools that have been developed for metabolic engineering applications. Earlier modeling and computation efforts that resulted in tools for metabolic flux analysis (MFA) and metabolic control analysis (MCA) have been very powerful for the elucidation of fluxes and control strategies in metabolic networks given partial sets of data. Computation tools based on network and graph concepts have enabled structure and flux analyses that provide optimization tools for metabolic engineering. In addition, metabolic network reconstruction and modeling efforts have resulted in genome-scale models of cellular metabolism for specific organisms based on sets of constraints that enable prediction of flux distributions under different conditions. Multi-scale modeling tools are extending current predictive capabilities by integrating stoichiometry, kinetics, and regulatory and control responses in metabolic networks, and metabolic engineers can utilize such tools to predict the dynamic metabolic response.

Section V provides an overview of common cellular hosts that are used in metabolic engineering applications. In particular, the bacterial hosts Escherichia coli, Bacillus subtilis, and Streptomyces have been utilized in various metabolic engineering applications, with E. coli being the best-developed and most widely utilized host, largely due to the genetic tools available for manipulating pathways in this host organism.
In addition, two lower eukaryotic hosts, yeast and filamentous fungi, have been utilized in various metabolic engineering applications for the production of natural products or for pathway enzymes that are more readily expressed in functional forms in eukaryotic organisms. Finally, much effort has also been put toward the development of mammalian cell culture hosts for the production of metabolites and products that are more readily produced in mammalian cells. Each host may present advantages and disadvantages in the synthesis of a desired chemical based on the genetic tools available for manipulating pathways and the endogenous metabolism and processing pathways present in that organism, such that the selection of a suitable host is driven largely by the properties of the pathway of interest.

Section I in The Metabolic Pathway Engineering Handbook: Tools and Applications provides an overview of the evolutionary tools widely in use in the engineering of metabolic enzymes and networks. Evolutionary strategies have been traditionally used in metabolic engineering to select for desired phenotypes in host organisms. As biological organisms naturally undergo processes of evolution and selection, design strategies that integrate evolutionary engineering objectives with metabolic engineering objectives may result in a more robustly performing engineered cellular system. Directed evolution is a laboratory tool that is used to mimic the evolutionary process in a test tube, by generating diversity in cellular components and then screening or selecting through this diversity for optimized component properties. Various experimental strategies have been utilized for generating and screening through component diversity. In addition, computational tools have been developed that optimize the design of laboratory evolution strategies.
These experimental and computational tools have been applied to the directed evolution of enzymes, regulatory systems, pathways, and whole genomes for the optimization of flux through targeted metabolic pathways.

Section II provides an overview of gene expression tools that have been utilized in metabolic engineering applications. Various tools have been developed that regulate DNA copy number and enable chromosomal engineering in host organisms. In addition, a variety of other genetic tools have been developed that precisely regulate gene expression levels through post-transcriptional and translational mechanisms. Still other tools have been developed that regulate the activity of enzymes through posttranslational engineering strategies. The application of the tools described in this section is critical to balancing the expression of multiple enzymes, such that individual conversion steps do not limit product yield, toxic intermediates do not accumulate, and cellular resources and energy are efficiently utilized by the host cell. Several examples exist of engineered systems that have utilized such genetic tools for the optimization of flux through metabolic pathways.

Section III provides an overview of emerging technologies and their application to metabolic engineering. Genome-wide technologies that allow global profiling of cellular transcripts, proteins, metabolites, and phenotypes are critical for efficient troubleshooting and debugging of engineered systems. Bioinformatics tools that allow for management and analysis of the vast amounts of data collected from these techniques are also critical. As these technologies mature and become more available, their implementation as standard techniques in metabolic engineering will improve our understanding of the engineered system response and result in efficient troubleshooting and optimization strategies.

Section IV provides an overview of key future prospects in metabolic engineering. The integration of new computational tools, such as genome-scale models, and new technologies for analyzing and understanding complex systems, such as systems biology, with metabolic engineering is rapidly advancing the success with which metabolic networks can be forward engineered. In addition, alternative strategies to cellular biosynthesis that remove complications associated with engineering living, evolving systems, such as cell-free synthesis systems, have demonstrated impressive successes. Finally, the modeling and optimization of engineered metabolic pathways in silico, prior to construction and characterization, will significantly transform the field of metabolic engineering and integrate advances in computational modeling, systems biology, and engineering design.

Section V provides an overview of common tools that are utilized to determine flux through metabolic pathways. Various types of isotope flux labeling strategies have been widely used to monitor flux through metabolic pathways, where the data from such experiments are typically integrated into the modeling tools described in Section IV.
In addition, various analytical strategies are utilized to profile cellular metabolites, where current and future efforts have been focused on developing strategies to profile and quantify global metabolite levels.

Section VI provides an overview of various metabolic engineering application areas. One broad application area is focused on the engineering and regulation of the energy state, cofactor supply, and redox balance of cellular hosts. This is a challenge that affects most if not all metabolic engineering applications, where the introduction of new pathways or the manipulation of endogenous pathways can result in imbalances in cellular pathways and stress responses. Metabolic engineering applications are generally directed toward the synthesis of commercially relevant molecules including specialty or commodity chemicals, small molecule drugs, or alternative energy sources. Each of these application areas of metabolic engineering presents distinct challenges that must be addressed in the process design based on chemical and pathway complexity, market cost of the product, volume demand of the product, end use of the product, and purity requirements.

Metabolic Engineering: Looking toward the Future

Metabolic engineering as a field has evolved significantly over the past 10 to 15 years, in large part due to the scientific and technological advances made during this time frame in support of this application area. The future prospects of metabolic engineering are extremely exciting, and as other supporting scientific and engineering fields mature, the field is likely to see transformative advances that direct it further toward an engineering discipline. There are several key supporting fields that will aid in directing this transformation.

First, enzyme engineering and enzyme discovery will be critical to expanding the diversity of natural and non-natural products that can be produced in engineered organisms. Much of the living world has not been cultured and characterized. Even for those organisms that have been cultured, we often lack genome sequence information, have not mapped functions to many of the sequenced genes, or have not characterized many of the enzyme activities in these organisms. For example, many pathways in plants responsible for the synthesis of diverse pharmacologically relevant molecules have not been elucidated, although many of these activities and their corresponding genes are currently present in large expressed sequence tag (EST) libraries. Because we cannot forward design enzymes to exhibit specific catalytic activities, the existing limitations in characterized enzyme activities severely limit the pathways that we can reconstruct in organisms. In addition, programs that will allow us to predict and design enzyme function from sequence will be critically enabling for the design of new activities that have not been recovered from natural systems.

Second, because metabolic engineering is largely a systems engineering challenge, continued advances in systems biology will provide important insights into the function of biological systems that will inform engineering design and strategies directed at manipulating metabolic pathways. Many analytic techniques in support of systems biology, including strategies that allow global profiling of transcript, protein, and metabolite levels, are providing vast amounts of information regarding levels of cellular constituents under different conditions. In addition, computational tools are being developed to process the vast amounts of data coming from these techniques. Newer and future efforts in systems biology must focus on taking the information coming from these techniques and abstracting from it the organizing principles governing cellular metabolism and regulation. An understanding of how cells generally layer metabolic pathways with different regulatory strategies will allow engineers to design more robustly performing synthetic pathways that are better integrated with the endogenous metabolic pathways. In addition, such understanding will allow better identification of manipulation points in endogenous networks to alter flux through pathways.

Third, the integration of information theory and control theory with systems biology and metabolic engineering will likely have a significant impact on our understanding of biological systems.
Such tools will enable a deeper understanding of architectures and properties of complex networks that support robustness, evolvability, and fragility of the system, providing a conceptual framework to systems biology. In addition, such tools will allow researchers to more quantitatively examine models of control schemes around metabolic pathways to better elucidate the design principles around regulating flux through metabolic pathways. Such tools can also be used to examine synthetic network and control scheme designs and guide the more effective design of engineered systems. Finally, metabolic engineering is seeing a transformation with the emerging field of synthetic biology. Synthetic biology is the design, construction, and characterization of biological systems using engineering design principles. To support a framework for engineering biology, synthetic biology is rooted in foundational technologies that enable the construction of more complex, heterologous networks in living systems. With advances in DNA sequencing and synthesis it is becoming common practice to synthesize entire genes and pathways from scratch, no longer limiting researchers to the physical DNA that they obtain from natural organisms. In addition, abstraction frameworks have been proposed to enable rapid assembling and reassembling of basic biological components (or parts) into larger networks (or devices) and systems, supporting the rapid prototyping and troubleshooting and reliable construction of complex metabolic pathways in cellular hosts (or chassis). An example of a synthetic biology approach to the rapid prototyping of a metabolic pathway in Escherichia coli was recently described (http://parts.mit.edu/wiki/index.php/MIT_2006). There are also efforts directed to the engineering of specific chassis, or cellular hosts, optimized for metabolic engineering applications.
Finally, enabling genetically encoded technologies are being developed for use in precise and quantitative manipulation of pathway components such as enzymes.

Christina D. Smolke

Editor-in-Chief

Editor

Christina Smolke is an assistant professor in the Department of Bioengineering at Stanford University. She graduated with a BS in chemical engineering with a minor in biology from the University of Southern California in 1997. She conducted her graduate training as a National Science Foundation Fellow in the Chemical Engineering Department at the University of California at Berkeley and earned her PhD in 2001. Christina conducted her postdoctoral training as a National Institutes of Health Fellow in cell biology at UC Berkeley. She started her independent research program as an assistant professor in the Division of Chemistry and Chemical Engineering at the California Institute of Technology, where she worked from 2003 to 2008. She has pioneered a research program in developing foundational technologies for the design and construction of engineered ligand-responsive RNA-based regulatory molecules, their integration into molecular computation and signal integration strategies, and their reliable implementation into diverse cellular engineering applications. These technologies are resulting in scalable platforms for the construction of molecular tools that work across many cellular systems and allow regulation of targeted gene expression levels in response to diverse endogenous or exogenous molecular ligands. Her research is rapidly advancing current capabilities of noninvasive detection of cellular state and programming cellular function. In particular, her laboratory is examining the application of these tools to the optimization of metabolic pathway engineering strategies in organisms such as yeast. Dr. Smolke’s innovative research program has recently been recognized with the receipt of a National Science Foundation CAREER Award, a Beckman Young Investigator Award, an Alfred P. Sloan Research Fellowship, and the listing of Dr. Smolke as one of Technology Review’s Top 100 Young Innovators in the World.
She is also a member and adjunct faculty of the Comprehensive Cancer Center’s Cancer Immunotherapeutics Program at the City of Hope, where she has several translationally oriented collaborative projects exploring the clinical applications of these technologies. She is the inventor of over nine patents and serves on the Scientific Advisory Board of Codon Devices. Dr. Smolke is currently serving as the President of the Institute of Biological Engineering. She is a member of AIChE, ACS, the RNA Society, and IBE.

Contributors

Satya Nanda Vel Arjunan Institute for Advanced Biosciences Keio University Tsuruoka, Japan

Michael J. Betenbaugh Department of Chemical and Biomolecular Engineering Johns Hopkins University Baltimore, Maryland

Lars M. Blank

Chemical Biotechnology TU Dortmund and ISAS-Institute for Analytical Sciences Dortmund, Germany

Bruno Bühler

Chemical Biotechnology TU Dortmund Dortmund, Germany

Katja Bühler

Chemical Biotechnology TU Dortmund Dortmund, Germany

Kara A. Calhoun

Department of Chemical Engineering Stanford University Stanford, California

Jong Hyun Choi

Department of Chemical and Biomolecular Engineering BioProcess Engineering Research Center Bioinformatics Research Center Institute for the BioCentury Korea Advanced Institute of Science and Technology Daejeon, Korea

Robert J. Conrado

School of Chemical and Biomolecular Engineering Cornell University Ithaca, New York

Matthew P. DeLisa

School of Chemical and Biomolecular Engineering Cornell University Ithaca, New York

Yandi Dharmadi

Department of Chemical and Biomolecular Engineering Rice University Houston, Texas

Birgitta E. Ebert

Chemical Biotechnology TU Dortmund Dortmund, Germany

Stephen S. Fong

Department of Chemical and Life Science Engineering Virginia Commonwealth University Richmond, Virginia

Zachary L. Fowler

Department of Chemical and Biological Engineering State University of New York at Buffalo Buffalo, New York

Justin P. Gallivan

Department of Chemistry and Center for Fundamental and Applied Molecular Evolution Emory University Atlanta, Georgia

Ramon Gonzalez

Department of Chemical and Biomolecular Engineering Rice University Houston, Texas

Claes Gustafsson DNA2.0, Inc. Menlo Park, California

Mee-Jung Han

Department of Chemical and Biomolecular Engineering BioProcess Engineering Research Center Bioinformatics Research Center Institute for the BioCentury Korea Advanced Institute of Science and Technology Daejeon, Korea

Joseph J. Heijnen

Bioprocess Technology Group Department of Biotechnology Delft University of Technology Delft, the Netherlands

Markus J. Herrgård Department of Bioengineering University of California San Diego, California

Seh Hee Jang

Department of Chemical and Biomolecular Engineering BioProcess Engineering Research Center Bioinformatics Research Center Institute for the BioCentury Korea Advanced Institute of Science and Technology Daejeon, Korea

Ethan T. Johnson

Department of Biochemistry, Molecular Biology and Biophysics University of Minnesota St. Paul, Minnesota

Jay D. Keasling

Berkeley Center for Synthetic Biology University of California Berkeley, California

Hyun Uk Kim

Department of Chemical and Biomolecular Engineering Center for Systems and Synthetic Biotechnology Institute for the BioCentury Korea Advanced Institute of Science and Technology Daejeon, Korea

Jin Sik Kim

Department of Chemical and Biomolecular Engineering Center for Systems and Synthetic Biotechnology Institute for the BioCentury Korea Advanced Institute of Science and Technology Daejeon, Korea

Roelco J. Kleijn

Bioprocess Technology Group Department of Biotechnology Delft University of Technology Delft, the Netherlands

Mattheos Koffas

Department of Chemical and Biological Engineering State University of New York at Buffalo Buffalo, New York

Sang Yup Lee

Department of Chemical and Biomolecular Engineering Center for Systems and Synthetic Biotechnology Institute for the BioCentury Korea Advanced Institute of Science and Technology Daejeon, Korea

Sung Kuk Lee

School of Nano Bio Chemical Engineering Ulsan National Institute of Science and Technology (UNIST) Ulsan, Korea

Sean A. Lynch

Department of Chemistry and Center for Fundamental and Applied Molecular Evolution Emory University Atlanta, Georgia

Suzanne Ma

Department of Chemical and Biomolecular Engineering University of California Los Angeles, California

Thomas J. Mansell

School of Chemical and Biomolecular Engineering Cornell University Ithaca, New York

Ichiro Matsumura

Department of Biochemistry and Center for Fundamental and Applied Molecular Evolution Rollins Research Center Emory University School of Medicine Atlanta, Georgia

Mohak Mhatre

Department of Chemical and Biomolecular Engineering Johns Hopkins University Baltimore, Maryland

Maria Rowena N. Monton Institute for Advanced Biosciences Keio University Tsuruoka, Japan

Kenan C. Murphy

Molecular Genetics and Microbiology University of Massachusetts Medical School Worcester, Massachusetts

Nikhil U. Nair

Department of Chemical and Biomolecular Engineering University of Illinois at Urbana-Champaign Urbana, Illinois

Peter Q. Nguyen

Department of Biochemistry and Cell Biology Rice University Houston, Texas

Marco Oldiges

Institute of Biotechnology Forschungszentrum Jülich GmbH Jülich, Germany

Bernhard Ø. Palsson Department of Bioengineering University of California San Diego, California

Maira P. Pellegrini

Department of Chemical Engineering/COPPE Federal University of Rio de Janeiro Rio de Janeiro, Brazil

Brian F. Pfleger

Department of Chemical and Biological Engineering University of Wisconsin Madison, Wisconsin

Vasiliy A. Portnoy

Department of Bioengineering University of California San Diego, California

Kristala L. Jones Prather Department of Chemical Engineering Massachusetts Institute of Technology Cambridge, Massachusetts

Andreas Schmid

Chemical Biotechnology TU Dortmund and ISAS-Institute for Analytical Sciences Dortmund, Germany

Claudia Schmidt-Dannert Department of Biochemistry, Molecular Biology, and Biophysics University of Minnesota St. Paul, Minnesota

Kumar Selvarajoo

Institute for Advanced Biosciences Keio University Tsuruoka, Japan

Jonathan J. Silberg

Department of Biochemistry and Cell Biology Rice University Houston, Texas

Christina D. Smolke

Division of Chemistry and Chemical Engineering California Institute of Technology Pasadena, California

Tomoyoshi Soga

Institute for Advanced Biosciences Keio University Tsuruoka, Japan

Seung Bum Sohn

Department of Chemical and Biomolecular Engineering Center for Systems and Synthetic Biotechnology Institute for the BioCentury Korea Advanced Institute of Science and Technology Daejeon, Korea

James R. Swartz

Department of Chemical Engineering Stanford University Stanford, California

Ralf Takors

Evonik Degussa GmbH, Health & Nutrition Künsebeck, Germany

Yi Tang

Department of Chemical and Biomolecular Engineering University of California Los Angeles, California

Masaru Tomita

Institute for Advanced Biosciences Keio University Tsuruoka, Japan

Walter M. van Gulik

Bioprocess Technology Group Department of Biotechnology Delft University of Technology Delft, the Netherlands

Wouter A. van Winden

Bioprocess Technology Group Department of Biotechnology Delft University of Technology Delft, the Netherlands

Maung Nyan Win

Division of Chemistry and Chemical Engineering California Institute of Technology Pasadena, California

Christoph Wittmann Biochemical Engineering Institute Technische Universität Braunschweig Braunschweig, Germany

Wladyslaw A. Wojcicki Department of Chemical and Biomolecular Engineering University of California Los Angeles, California

Xiao Xia Xia

Department of Chemical and Biomolecular Engineering BioProcess Engineering Research Center Bioinformatics Research Center Institute for the BioCentury Korea Advanced Institute of Science and Technology Daejeon, Korea

Hongseok Yun

Department of Chemical and Biomolecular Engineering Center for Systems and Synthetic Biotechnology Institute for the BioCentury Korea Advanced Institute of Science and Technology Daejeon, Korea

Huimin Zhao

Department of Chemical and Biomolecular Engineering Institute for Genomic Biology University of Illinois at Urbana-Champaign Urbana, Illinois

I Evolutionary Tools in Metabolic Engineering

Claudia Schmidt-Dannert University of Minnesota

1 Evolutionary Engineering of Industrially Important Microbial Phenotypes
Stephen S. Fong
Introduction to Evolutionary Engineering • Microbial Evolution • In Vitro Evolution Systems • Evolutionary Engineering Results

2 Improving Protein Functions by Directed Evolution
Nikhil U. Nair and Huimin Zhao
Introduction • Tools of Directed Evolution: Techniques for Generating Mutant Libraries • Tools of Directed Evolution: Screens, Enrichments, and Selections • Successes and Applications of Directed Evolution • Patent and Licensing Issues • Outlook

3 Engineering DNA and RNA Regulatory Regions through Random Mutagenesis and Screening
Ichiro Matsumura, Sean A. Lynch, and Justin P. Gallivan
Introduction • Promoters, Operators, and Enhancers • Practical Approaches to Promoter Mutagenesis and Cloning • Review of Engineered RNA-Based Regulatory Systems • Principles and Protocols for Creating Synthetic Riboswitches • Conclusions

4 Evolving Pathways and Genomes for the Production of Natural and Novel Compounds
Ethan T. Johnson and Claudia Schmidt-Dannert
Introduction • Initial Pathway Design and Integration into Host Metabolic Network • Optimization of Metabolic Enzyme Levels Using Evolutionary Design • Evolving Metabolic Pathways for the Production of New Compounds • Addressing Complex Traits Using Evolutionary Methods • Screening Technologies • Conclusions and Future Directions

5 Models Predicting Optimized Strategies for Protein Evolution
Jonathan J. Silberg and Peter Q. Nguyen
Introduction • Random Mutagenesis • Directed Mutagenesis • Recombination • Optimizing Chimeric Libraries • Conclusions

The development of recombinant DNA technologies in the 1980s enabled the first rational approaches to manipulate the metabolic capabilities of microbial cells. Initial strategies were directed at the manipulation of native pathways in microbial production hosts by overexpression or deletion of individual enzymes. Further developments in the field of genetic engineering, combined with an increasing understanding of metabolic processes on the molecular level, have enabled more sophisticated metabolic engineering strategies including, for example, the assembly of multienzyme pathways composed of genes selected from different organisms. The analysis of metabolic fluxes and their control has provided means to rationally direct and optimize a desired metabolic output by an engineered cell. Presently, the increasing availability of genomic information provides a seemingly limitless resource for new metabolic functions to be explored and implemented into engineered metabolic pathways for increased production levels and/or synthesis of new compounds. Despite the tremendous advances made in metabolic engineering during the last two decades, rational design efforts are still challenged by the complexity and redundancy of the cellular metabolic network as well as the lack of structural information and biochemical understanding of the metabolic enzyme(s). Recognizing these challenges, engineers have turned to nature as a guide for the development of new design strategies that mimic evolutionary mechanisms.

Section I will introduce strategies and applications of evolutionary engineering as they apply to engineering of metabolic enzymes and networks. Evolutionary engineering strategies were first used in metabolic engineering to evolve and select for desired traits or phenotypes in industrially relevant microorganisms.
Chapter 1 describes evolutionary engineering strategies applied to whole cell design and contrasts the objectives of such strategies with the objectives of standard metabolic engineering strategies. Whole cell evolutionary engineering strategies attempt to incorporate the natural processes of evolution and selection present in biological organisms into the engineered system design so that a more reliable and stable engineered cellular host will be generated through the combined strategies. The concept of evolutionary engineering has its roots in the development of laboratory evolution strategies, termed directed evolution or in vitro evolution, that aimed to improve the function of individual enzymes and proteins. This approach relies on iterative rounds of mutation and selection for protein variants with desired functional properties. A variety of techniques and strategies have been developed for directed protein evolution, which are now widely applied for the optimization of industrially relevant biocatalysts and proteins. Chapter 2 provides an introduction to the various laboratory evolution methods and critically discusses their strengths and weaknesses. Most evolutionary engineering is focused on optimizing catalytic activities and expression levels of individual enzymes in an engineered pathway. However, the optimization of a metabolic pathway not only depends on the manipulation of individual metabolic activities, but also requires the control of metabolic activities along an engineered pathway as well as the integration of such a pathway into the metabolic network of the engineered host cell. Increasingly, methods and approaches are now being developed that allow a recombinant pathway to become part of a heterologous metabolic network. Chapter 3 describes evolutionary engineering strategies that target the regulation of gene expression.
The above-discussed strategies of directed evolution initially were applied to the optimization of individual enzyme and protein functions but quickly have been extended to the evolution of metabolic enzyme functions and pathways, which are discussed in Chapter 2. The complexity of the cellular metabolic network soon prompted the development of more sophisticated evolutionary strategies that would introduce multiple mutations throughout the genome of a host to elicit global metabolic changes. Laboratory evolution applied to individual metabolic enzymes, metabolic pathways, and eventually whole genomes is discussed in Chapter 4 to show how evolutionary strategies have been successfully utilized to optimize complex cellular machineries for metabolite production. Evolutionary engineering strategies require the generation of mutational diversity and the screening and/or selection of variant libraries, which therefore represent the most laborious and time-consuming steps of this approach. The practical feasibility and success of many directed evolution applications are critically dependent on the quality of the library to be screened, meaning the fraction of functional variants present in the library. As a consequence, a number of quantitative and predictive models have been developed in recent years for the optimization of directed evolution strategies. Chapter 5 therefore surveys and compares computational models and approaches developed for optimizing protein evolution strategies.

1 Evolutionary Engineering of Industrially Important Microbial Phenotypes

Stephen S. Fong Virginia Commonwealth University

1.1 Introduction to Evolutionary Engineering
1.2 Microbial Evolution
The Evolutionary Process • Evolutionary Genotype Changes • Evolutionary Phenotype Changes
1.3 In Vitro Evolution Systems
1.4 Evolutionary Engineering Results
Stress Tolerance • Substrate Utilization • Concurrent Selection of Multiple Objectives • Developing New Selection Criteria
References

1.1 Introduction to Evolutionary Engineering

Evolutionary engineering is a scientific endeavor that blends aspects of biological evolution and directed manipulation. In its application to metabolic engineering, this combination seeks to make use of what a cell inherently knows how to do (grow and evolve) and directs human intervention to achieve some metabolic engineering goal. Throughout this chapter the term “cellular objective” will be used to generically describe the functional purposes inherent in a cell. This is contrasted with a “metabolic engineering objective” that describes the function that we as engineers would like to introduce or improve in a cell. The term evolutionary engineering originated from a study that used a continuous culture experiment to select for streptomycin-resistant mutants of Streptomyces griseus.¹ This study demonstrates some of the key components of evolutionary engineering (Figure 1.1). For evolutionary engineering to work successfully, the natural cellular objective (usually growth for microbes) and the metabolic engineering objective should overlap. This area of overlap provides the basis for implementing an experimental system that provides an appropriate selection pressure during evolution. In the case of the Streptomyces griseus study,¹ the cellular objective for growth/survival overlapped with the metabolic engineering objective of antibiotic resistance such that evolution selected for mutants that were able to grow in the presence of streptomycin. Successful implementation of evolutionary engineering for a metabolic engineering application is often difficult, as the cellular objective and metabolic engineering objective may not only fail to overlap but may be in direct conflict. For most microbes, their most fundamental cellular objectives are to grow and reproduce. Cellular functions associated with growth and reproduction utilize cellular resources to produce biomass components.
Thus, if a microbe is trying to grow and/or reproduce, it will consume nutrients with the goal of optimizing production of biomass components. This is directly at odds with many metabolic engineering applications where the objective may not be to produce cells. For example, a typical metabolic engineering goal may be to overproduce a chemical compound. Furthermore, the target chemical compound is often not a critical or essential biomass component. In this situation, as the cell attempts to satisfy its cellular objective by focusing on producing biomass components, the amount of resources used for the metabolic engineering objective decreases.

Figure 1.1 Schematic depicting steps involved in the evolutionary engineering process. Identification of overlap between cellular and metabolic engineering objectives leads to selection of an experimental evolution system to produce the desired results. [Diagram labels: Problem: cellular objective, metabolic engineering objective. Experimental design: overlap between cellular and metabolic engineering objectives, selection of experimental evolution system. Result: evolutionary selection of desired phenotype.]

Consider a typical metabolic engineering production strain design to illustrate this point of potentially conflicting objectives. The most common production strains are generated using recombinant DNA technology to delete or add genes. If a total of ten genetic manipulations (gene deletions and/or additions) were introduced to achieve the metabolic engineering objective, this leaves thousands of native genes intact and functional within the cell. While the introduced genetic changes may initially lead to a desired phenotype, the remainder of the cellular network is still focused on achieving its cellular objective (growth/reproduction). Given enough time (evolution), a metabolic network may adjust and compensate for the ten introduced genetic manipulations, resulting in a loss of the phenotype desired for the metabolic engineering objective.

If cellular objectives and metabolic engineering objectives can conflict in this way, why couple evolution to a metabolic engineering objective? The main advantage of evolutionary engineering is that, when properly employed, it results in a stable, whole-cell metabolic engineering design. The difference between this and other metabolic engineering approaches is really a matter of scale. As in the example given above, metabolic engineering designs often involve a small number of directed genetic alterations that largely leave the remainder of the metabolic network unchanged.
This type of design may not be stable, as the cell may adjust and compensate for the introduced changes, and the scope of the cellular changes is limited to the small number of changes that were directly introduced. In the case of evolutionary engineering, all cellular components are potentially subjected to modification, and beneficial modifications will be selected during evolution. In an ideal sense, this means that evolution leads to a phenotype where every cellular function has been optimized. Furthermore, since evolutionary changes are implemented by the cell itself and have been selected during evolution, they are more likely to be retained and stable. Thus, the result of successful evolutionary engineering is an evolutionarily stable strain where potentially all cellular components have been optimized for a certain function.

Another advantage of evolutionary engineering is that evolutionary selection will help to screen large numbers of mutants and primarily retain those with beneficial characteristics. This aspect of the evolutionary process is remarkably similar in principle to the iterative process of strain improvement often undertaken in metabolic engineering (Figure 1.2). Metabolic engineering strain improvement is normally a cyclic process² involving an introduced genetic manipulation, characterization of the effects, and reevaluation of the design before implementing another genetic change into the strain. Evolution uses similar mechanisms in that genetic changes are introduced via mutations, effects of the mutations are manifested in phenotype, and selection of the fittest mutations occurs as new genetic variants arise, starting another cycle. Thus, the principal steps employed in metabolic engineering strain design occur naturally during evolution. The benefit of evolution is that combinatorial genetic changes are naturally introduced, resulting in a larger array of genetic variants being examined. In addition, the cyclic selection process occurs during each generation of cells produced, so all of the genetic variants produced are also efficiently screened.

Figure 1.2 Comparison of basic steps involved in metabolic engineering and evolution. [Diagram labels: Metabolic engineering: genetic manipulation, phenotype testing. Evolution: spontaneous mutation, evolutionary selection.]

Given the potential advantages of whole-cell designs resulting from evolutionary engineering and the similarities in approach between metabolic engineering and the process of evolution, there is a great deal of promise in evolutionary engineering. One of the keys to successful evolutionary engineering is to learn and to understand basic evolutionary principles. This is a critical component in the design phase of any evolutionary engineering project. The remaining sections of this chapter will discuss different aspects central to evolutionary engineering. First, concepts of microbial evolution will be presented to provide background into cellular objectives (and associated mechanisms). Experimental systems commonly used to conduct evolution experiments will be discussed to demonstrate different types of selection pressure that can be imposed experimentally. Finally, several recent examples of evolutionary engineering studies will be presented with an emphasis placed on design aspects.

Evolutionary engineering is an approach to strain improvement for metabolic engineering applications that utilizes an organism’s natural ability to grow and evolve.
Design of an evolutionary engineering experiment involves determining areas of overlap between a cell’s normal functional objective (often growth) and the metabolic engineering objective. An appropriate experimental evolution system is then chosen to direct evolution toward the target phenotype through evolutionary selection. In this scenario, cellular modifications and evaluation of phenotypes are part of the normal evolutionary process. The engineering input is to design and implement the appropriate evolutionary environment.
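The cycle summarized above (spontaneous mutation, expression of the phenotype, evolutionary selection) can be illustrated with a toy simulation. The following Python sketch is illustrative only and is not taken from the chapter: the single numeric trait, the fitness function `toy_fitness`, the population size, and the mutation step size are all hypothetical choices made for the example.

```python
import random


def evolve(fitness, start, generations=200, population=50,
           mutation_sd=0.1, seed=0):
    """Toy mutate-and-select loop on one numeric trait (hypothetical model)."""
    rng = random.Random(seed)      # fixed seed so that runs are repeatable
    pop = [start] * population     # clonal starting population
    for _ in range(generations):
        # Spontaneous mutation: each cell yields one slightly altered variant.
        mutants = [x + rng.gauss(0.0, mutation_sd) for x in pop]
        # Selection: parents and mutants compete; the fitter half survives.
        pop = sorted(pop + mutants, key=fitness, reverse=True)[:population]
    return pop


def toy_fitness(trait):
    """Hypothetical fitness with a single optimum at trait = 3.0."""
    return -(trait - 3.0) ** 2


final_population = evolve(toy_fitness, start=0.0)
best = max(final_population, key=toy_fitness)
```

In this sketch the only engineering input is the choice of `toy_fitness`, which stands in for the selective environment; the loop itself never inspects which variants arose, mirroring the idea that modification and evaluation are left to the evolutionary process.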

1.2 Microbial Evolution

The first stage of an evolutionary engineering project is to carefully consider the problem and goal of the project. Metabolic engineering projects are normally started with a specific production problem or goal in mind; however, evolutionary engineering projects add another dimension. The native cellular objectives must also be taken into consideration at the beginning of the project, as this will affect both the implementation and results of the project. In evolutionary engineering, the cellular objective is as important to generating an improved strain as the metabolic engineering objective. Both objectives must be accounted for when designing an evolution experiment, so in this section aspects of microbial evolution and associated cellular objectives will be discussed.

1.2.1 The Evolutionary Process

Generally speaking, evolution is the process by which something changes or adapts over time. Usually implied in this term is a sense that the change is beneficial and represents a new stable functional state. In biology, beneficial changes are those that confer some phenotypic advantage, and they are stably maintained through genetic changes. The process of evolution has often been studied using microbes as an in vitro model due to the numerous advantages associated with microbial growth.3 Microbial systems allow the process of evolution to be studied in real time such that the dynamics, mechanisms, and outcomes of evolution can be monitored.

Central to studying microbial evolution is the interplay between the two key components of our definition, the changes in phenotype and genotype. Natural selection (or “survival of the fittest”) largely describes evolution at the level of phenotype change. As a change in behavior or function arises, it will be positively selected and maintained if it is beneficial to the organism. The principal means by which beneficial traits are maintained is at the level of the genotype. However, the phenotype does not dictate genotype changes; rather, it is the genotype that specifies the phenotype. Thus, the process originates from genetic changes that lead to altered phenotypes, which are then subject to natural selection.

By this process, changes in an organism during evolution are thought to occur in a discrete, stepwise manner over time. Genetic variation will naturally occur within a population in the form of point mutations or genomic rearrangements. Some of the genetic changes may result in a phenotypic change that is incrementally beneficial within the growth environment. This process of incremental phenotypic improvement is often conceptualized as a fitness landscape. A fitness landscape4 is used to describe all of the possible phenotypic variation in an organism. This is typically visualized in three dimensions with valleys representing areas of low fitness and peaks signifying areas of high fitness (Figure 1.3).

This means of conceptualizing evolution also helps clarify the possibility that evolution may not always lead to a single global maximum of fitness. This may be illustrated by thinking of the fitness landscape as a mountain range. Your intention is to climb to the highest point possible (maximum fitness), but you cannot see. Regardless of your starting point, the way to get to the highest point is to make every step an upward one. This will eventually take you to the top of some peak, but you will never know if another, taller mountain exists. The mechanism by which an organism traverses the fitness landscape is determined by genetic changes and the associated phenotypic changes. Both of these are equally important aspects of the evolutionary process. The following sections will discuss each separately.
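The mountain-range analogy can be illustrated with a short numerical sketch. The one-dimensional landscape below is entirely hypothetical (two Gaussian peaks chosen for illustration, not taken from the text or from any measured data), and the `fitness` and `hill_climb` names are assumptions made for this example. It shows only that a walker who accepts nothing but uphill steps ends on whichever peak is nearest to the starting point.

```python
import math


def fitness(x: float) -> float:
    """Hypothetical landscape: a low peak near x = 2 and a taller peak near x = 7."""
    low_peak = 1.0 * math.exp(-((x - 2.0) ** 2))
    high_peak = 2.0 * math.exp(-((x - 7.0) ** 2) / 2.0)
    return low_peak + high_peak


def hill_climb(start: float, step: float = 0.05, max_steps: int = 10_000) -> float:
    """Accept only uphill steps; stop when neither neighbor is fitter."""
    x = start
    for _ in range(max_steps):
        best_neighbor = max((x - step, x + step), key=fitness)
        if fitness(best_neighbor) <= fitness(x):
            break  # a peak on this grid has been reached (possibly a local one)
        x = best_neighbor
    return x


end_from_left = hill_climb(0.5)   # starts on the slope of the lower peak
end_from_right = hill_climb(5.0)  # starts on the slope of the taller peak
print(f"start 0.5 -> x = {end_from_left:.2f}, fitness = {fitness(end_from_left):.2f}")
print(f"start 5.0 -> x = {end_from_right:.2f}, fitness = {fitness(end_from_right):.2f}")
```

The walker starting at 0.5 never discovers the taller peak, which is the point made above about evolution not necessarily reaching a global maximum.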


Figure 1.3 Schematic depiction of a phenotypic fitness landscape. The X–Y plane illustrates phenotypic variation and the Z-axis is phenotypic fitness. Peaks indicate areas of highest fitness and valleys indicate areas of low fitness. A sample evolutionary path has been shown using arrows from the starting point (X) to the ending point (O).

Evolutionary Engineering of Industrially Important Microbial Phenotypes


1.2.2 Evolutionary Genotype Changes

Genetic changes are the foundation and driving force for changes that occur during evolution. There are many different types of genetic changes that can occur during evolution. Each genetic change also can vary greatly in its effect on an organism’s phenotype. This section will discuss some types of genetic changes that can occur during evolution. Genetic changes that occur during evolution can be broadly categorized as either single nucleotide polymorphisms (SNPs) or genomic rearrangements. SNPs involve changes in a single base in the DNA sequence. Genomic rearrangements are characterized by genetic changes involving a group of nucleotides that change location on the chromosome, are duplicated, or are deleted.

1.2.2.1 Single Nucleotide Polymorphisms

SNPs occur at various rates in vivo due to mismatches that occur during DNA replication. Each organism has a basal mutation rate that is due to the fidelity of an organism’s DNA polymerase and also the presence or absence of DNA proof-reading mechanisms. It has been estimated that the typical error rate (mutations per base pair per genome replication) varies from 0.5 × 10⁻¹⁰ in Homo sapiens to 7.2 × 10⁻¹¹ in Neurospora crassa.5 These estimates of DNA replication error rates allow for the quick estimation of the number of expected point mutations that would occur per cell per generation and thus can give an indication of what changes would be expected during evolution. For example, if the bacterium Escherichia coli has an estimated error rate of 5.4 × 10⁻¹⁰ mutations per base pair per genome replication5 and a genome size of 4.6 × 10⁶ bases, it can be expected that there would be approximately 0.0025 mutations in each genome per genome replication.5 If we have a culture that reaches a density of 5 × 10¹¹ cells each day and allow evolution to occur for 100 days, we would expect that at least 1.25 × 10¹¹ mutations would have occurred in the evolving population. In this case, over the course of evolution it would be expected that each site on the genome would have experienced, on average, 27,000 mutations (1.25 × 10¹¹/4.6 × 10⁶).

A number of different types of SNPs can occur. One of the broad classifications for SNPs is whether the base change is a transition (purine to purine or pyrimidine to pyrimidine) or a transversion (purine to pyrimidine or pyrimidine to purine). Another means of classifying SNPs as either synonymous or nonsynonymous is more directly related to functional implications of the SNP. Since specific codons (groups of three nucleotides) designate specific amino acids in the translation process, a change in a codon that changes the coded amino acid would result in a potentially large change in the resulting protein. A SNP that does not change the amino acid coded for by a specific codon is termed synonymous and will preserve the original protein sequence. A SNP that causes an amino acid change and thus a change in the protein sequence is called a nonsynonymous mutation.

1.2.2.2 Genome Rearrangements

In addition to SNPs, large-scale changes in a genome can be found during evolution where discrete segments of DNA are modified. These large-scale changes are normally mediated by transposons, insertion sequence (IS) elements, or native recombinases. Both transposons and IS elements are segments of DNA that are capable of replication and relocation within a chromosome. The main difference is that IS elements are thought to only contain genes associated with transposition and regulation6 whereas transposons can contain genes with additional functions. These mechanisms typically mediate genomic changes such as deletions, duplications, transpositions, and inversions. The size of genomic segments affected by these mechanisms is normally on the order of hundreds to thousands of DNA bases.
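The mutation-supply estimate for E. coli in Section 1.2.2.1 can be reproduced in a few lines. This is a minimal sketch using only the numbers quoted there; the function name `mutation_supply` is an assumption made for illustration, and the calculation follows the text in counting one genome replication for each cell present at the daily density, which is why the text describes the total as a lower bound.

```python
def mutation_supply(error_rate: float, genome_size: float,
                    cells_per_day: float, days: float):
    """Return (mutations per genome replication, total mutations, mutations per site)."""
    per_genome = error_rate * genome_size        # expected mutations per replication
    total = per_genome * cells_per_day * days    # summed over the whole experiment
    per_site = total / genome_size               # average hits per genome position
    return per_genome, total, per_site


# Values quoted in the text for E. coli
per_genome, total, per_site = mutation_supply(
    error_rate=5.4e-10,   # mutations per base pair per genome replication
    genome_size=4.6e6,    # base pairs
    cells_per_day=5e11,   # cells reached by the culture each day
    days=100,
)
print(f"{per_genome:.4f} mutations per genome replication")
print(f"{total:.3g} mutations in the population")
print(f"{per_site:,.0f} mutations per genome site on average")
```

Without intermediate rounding the total comes out slightly below the 1.25 × 10¹¹ quoted in the text, because the text rounds 0.00248 up to 0.0025 before multiplying; the per-site figure of about 27,000 is unaffected.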
While SNPs constitute the means of changing the specific base content of DNA, genomic rearrangements can be thought to alter the order of the existing DNA content. Given these two potential mechanisms of genomic change, the relative contribution of each to generating genetic variability is an important and unanswered question. While there is a growing body of evidence that genomic rearrangements do contribute to genetic diversity during evolution and can confer an adaptive advantage, the specific means by which a genomic rearrangement conveys this advantage may not be entirely clear. However, it is possible that SNPs and rearrangements contribute to genetic variation at different stages of cellular growth, as the occurrence of SNPs is directly related to the fidelity of DNA polymerases during DNA replication whereas mechanisms associated with rearrangements are not constrained by any cell replication machinery.

1.2.2.3 Mutator Cells

Outside of SNPs and genomic rearrangements, another factor that can affect the generation of genetic variation is a change in a cell’s mutation rate. It has been observed that cells can adjust the rate at which mutations occur as an adaptive response. Changes in mutation rate are normally associated with modifications to different aspects of DNA replication, DNA repair, or elevated transposon activity.3,7 DNA replication can be affected by a change in the concentration of the different DNA polymerases being utilized, as some polymerases are more prone to replication errors than others. For example, E. coli is known to have at least five different active DNA polymerases involved in DNA replication that greatly vary in their replication fidelity. Under normal growth conditions, Pol III, which is relatively accurate, is the dominant DNA polymerase, but replication errors can start to be introduced if alternative polymerases such as Pol IV or Pol V are induced.7 In addition to the fidelity of the DNA polymerase being used, mutations or changes in the methyl-directed mismatch repair system can also result in higher frequencies of mutation (as much as a 100-fold increase in mutation rate). The activity of transposons or recombination-related mutations can also play a significant role, especially in non-growing populations of cells where DNA replication is not active.
The frequency of these recombination-related events is dictated by the activity of recombinases inside the cell, and it has been found that recombinases can be induced under a number of different conditions.7
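To close the discussion of genetic changes, the two SNP classifications from Section 1.2.2.1 (transition versus transversion, synonymous versus nonsynonymous) can be made concrete with a small sketch. The function names are hypothetical and chosen for this example; the codon table is the standard genetic code written in the conventional TCAG order.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

PURINES = {"A", "G"}  # C and T are the pyrimidines


def substitution_type(ref: str, alt: str) -> str:
    """Transition if both bases are in the same chemical class, else transversion."""
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def coding_effect(codon: str, position: int, alt: str) -> str:
    """Synonymous if the mutated codon still encodes the same amino acid."""
    mutated = codon[:position] + alt + codon[position + 1:]
    same_aa = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same_aa else "nonsynonymous"


# CTT -> CTC keeps leucine; GAA -> GTA replaces glutamate with valine
print(substitution_type("T", "C"), coding_effect("CTT", 2, "C"))
print(substitution_type("A", "T"), coding_effect("GAA", 1, "T"))
```

The two classifications are independent of each other: a transition can be nonsynonymous and a transversion can be synonymous, depending on the codon and position affected.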

1.2.3 Evolutionary Phenotype Changes

Genetic changes are the starting point for changes that occur during evolution, but evolutionary selection is conducted based upon manifested phenotypes. Regardless of the genetic mechanism behind a phenotypic change, an organism with a large phenotypic advantage is likely to prosper. In evolutionary engineering, the way in which a cellular objective is manifested as a phenotype is of fundamental importance to the design of any project. Thus, it is important to understand what phenotypic traits are beneficial and how these characteristics are brought about during evolution.

The most obvious and commonly observed phenotype change is an improvement in cellular growth rate. It is intuitive that evolutionary changes that bring about an increased growth rate would confer an advantage to any cell within a population of growing cells. If a cell grows faster than its neighbors, its progeny will eventually outnumber the slow-growing cells until the slow-growing cells become extinct. This is the most ubiquitous type of phenotype improvement and represents a dominant selection pressure in microorganisms.

A second, closely related type of phenotype improvement is an improvement in biomass yield (or metabolic efficiency). The contrast between growth rate and biomass efficiency was studied explicitly by Helling8 using competing parallel pathways for glutamate synthesis in E. coli. In situations where nutrients are finite and limited, having improved biomass efficiency can be advantageous. As a finite nutrient becomes depleted, a cell that can satisfy all of its growth requirements using the smallest amount of input would survive longer than one that requires larger amounts of nutrient. The vast majority of phenotype improvements that are experimentally observed during evolution can be categorized as either growth rate or biomass efficiency improvements.
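The takeover of a population by a faster-growing variant can be sketched numerically. The example below assumes unrestricted exponential growth of both types in a shared environment, and the growth rates and starting frequency are hypothetical values chosen for illustration, not measurements from any study cited here.

```python
import math


def mutant_fraction(initial_fraction: float, mu_mutant: float,
                    mu_parent: float, hours: float) -> float:
    """Fraction of the population that is mutant after `hours` of exponential growth."""
    ratio0 = initial_fraction / (1.0 - initial_fraction)      # mutant : parent at t = 0
    ratio = ratio0 * math.exp((mu_mutant - mu_parent) * hours)
    return ratio / (1.0 + ratio)


def hours_to_majority(initial_fraction: float, mu_mutant: float, mu_parent: float) -> float:
    """Time at which the mutant reaches 50% of the population (requires mu_mutant > mu_parent)."""
    ratio0 = initial_fraction / (1.0 - initial_fraction)
    return math.log(1.0 / ratio0) / (mu_mutant - mu_parent)


# Hypothetical case: one mutant per million cells with a 10% growth-rate advantage
t_half = hours_to_majority(1e-6, mu_mutant=0.77, mu_parent=0.70)
print(f"mutant becomes the majority after about {t_half:.0f} h of exponential growth")
print(f"fraction at twice that time: {mutant_fraction(1e-6, 0.77, 0.70, 2 * t_half):.6f}")
```

Only the difference between the two growth rates enters the calculation, which is why even a modest growth advantage eventually dominates when selection is sustained.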
Another broad way of categorizing evolutionary changes is as either generalist or specialist changes. When applied to phenotype changes, a generalist phenotype improvement is one where the phenotype change is beneficial to the organism in more than one environment.


A specialist phenotype improvement is one where the phenotype change confers a benefit in only one specific environment and may even be detrimental in other environments. The concept of a specialist phenotype is very closely linked to the genetic concept of Muller’s ratchet.9 The concept of Muller’s ratchet is that deleterious mutations arise as often as (if not far more often than) beneficial mutations. Deleterious mutations accumulate within a population over time due to random drift that causes the irreversible loss of clones that do not carry deleterious mutations. Thus, as a population continues to grow in an environment, it remains viable in that environment, but the accumulation of background deleterious mutations becomes sufficient that the strain is effectively crippled for growth in any other environment and is therefore highly specialized. Experimental determination of these types of phenotype improvements is highly dependent upon the diversity of conditions used, as phenotype changes may exist but not be reflected in the conditions tested and thus remain “silent phenotypes.”10–13 Thus, a range of experimental results have been generated in this area14 with supporting studies for both generalist improvements15,16 and specialist improvements.17–19

An additional category of adaptations found in evolution experiments is cross-feeding adaptations. Cross-feeding is the term given to the situation where a population is composed of at least two different subpopulations that are metabolically interconnected. If subpopulation A and subpopulation B existed concurrently in the same environment, cross-feeding would be demonstrated by subpopulation A consuming the primary carbon source in the environment and secreting excess carbon as a metabolic by-product. Subpopulation B would then manifest an adaptation where it would fulfill at least a portion of its metabolic needs by consumption of the by-product secreted by subpopulation A. This type of adaptation has been demonstrated for evolution experiments of E. coli grown on glucose where subpopulations were found that were adapted to better utilize acetate that was being secreted by other cells.20,21

Microbial evolution is a natural process that occurs over time to adjust and refine an organism’s functionality in response to its environment. Genomic changes in the form of SNPs and genomic rearrangements generate genetic variation. Genetic changes can lead to changes in manifested cellular phenotypes, and it is the phenotype that is selected during evolution. Selection and improvement occur in a stepwise fashion that preserves beneficial traits. The most common selection pressures are associated with basic growth behaviors. Besides growth rate, other evolutionarily beneficial traits can include improved biomass efficiency and cross-feeding adaptations.

1.3 In Vitro Evolution Systems

In previous sections material has been presented on mechanisms and consequences of evolutionary change at both the genotypic and phenotypic levels. All of these changes occur over time in response to environmental stimulus and selection. Thus, the environment and its influence on cellular properties is a critical component in understanding how and why evolution occurs and is an important factor in evolutionary engineering. There is an inherent level of interconnectedness between the environment, a cell’s genotype, and a cell’s phenotype. To study and understand evolutionary changes experimentally, one of the challenges is to establish controlled environments to stimulate consistent cellular changes. In this section, different experimental systems will be presented as means of conducting evolution experiments and imposing different evolutionary selection pressures.

The simplest form of cell culture system and environmental selection pressure occurs during batch culture growth. Batch cultures are instituted by aliquoting a finite amount of medium into a culture vessel, inoculating the vessel, and allowing growth to proceed until all of the nutrients are depleted. Evolution experiments can be conducted in this manner by allowing growth to proceed in the batch culture then transferring some of the culture into a new culture vessel containing fresh medium.22 This process of serial transfer continues for the duration of the evolution experiment and the number of generations occurring during growth in each culture vessel is dictated by the size of the inoculum. During each phase of growth, cellular growth rate is not externally constrained until the nutrients are depleted.
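The dependence of generations per vessel on inoculum size follows from simple doubling arithmetic. The sketch below assumes that each culture regrows to the same final density before the next transfer, so that the number of doublings per cycle is the base-2 logarithm of the dilution factor; the function names and the 1:100 dilution are hypothetical choices for illustration.

```python
import math


def generations_per_transfer(dilution_factor: float) -> float:
    """Doublings needed to regrow from the inoculum to the pre-transfer density."""
    return math.log2(dilution_factor)


def transfers_needed(target_generations: float, dilution_factor: float) -> int:
    """Number of serial transfers required to accumulate the target number of generations."""
    return math.ceil(target_generations / generations_per_transfer(dilution_factor))


per_cycle = generations_per_transfer(100)   # 1:100 dilution at each transfer
n_transfers = transfers_needed(1000, 100)   # cycles needed for 1,000 generations
print(f"{per_cycle:.2f} generations per transfer; {n_transfers} transfers for 1,000 generations")
```

A smaller inoculum (larger dilution factor) gives more generations per vessel but also passes fewer cells, and therefore less of the population's genetic variation, through each transfer.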


Serial transfer of batch cultures tends to apply a growth rate selection pressure on the population. As there is a finite amount of nutrients in each vessel, cells can potentially experience different phases of cellular growth, beginning with lag phase and proceeding through exponential growth phase to stationary phase as the nutrients become depleted. It has been shown that cellular responses differ greatly between exponential growth and stationary phase, so evolution experiments utilizing serially transferred batch cultures can be conducted in two different manners. Evolving cultures can be permitted to grow through all phases of growth (including stationary phase) before being passed into fresh medium,23 or prolonged exponential growth can be achieved by passing growing cultures into fresh medium before stationary phase is reached.24

A second, and perhaps more commonly used, system for conducting evolution experiments is a chemostat. Developed in 1950,25 the chemostat is a culture system that is designed to maintain a chemically constant growth environment. Fresh medium is continuously added to the culture vessel and cells adjust their growth rate until they reach a state of equilibrium where their growth occurs at a constant rate that is dictated by the rate at which the limiting nutrient is supplied. To maintain stability in the vessel, a constant efflux of medium and cells is also employed. This experimental system has been used in a number of different studies20,26–29 and has the advantage of producing a constant, stable environment for long cultivation periods with a high degree of control over the growth rate of the cells being evolved. In this situation, since the evolving cells are kept at a constant growth rate (dictated by the rate of medium influx), the evolutionary selection pressure is not growth rate. In this environment, cells that are efficient in biomass generation would have a competitive advantage, as would any cell that develops a means of avoiding being washed out in the effluent stream (thus increasing its residence time in the culture vessel). One other point to consider in chemostat cultures is that the growth rate set by the medium influx is always below the maximum growth rate of the cells (to avoid washout). Since the cells are operating at a submaximal condition, there is an increased possibility that they would operate using alternate functional states, as they are not as highly constrained by the growth requirements.

A third apparatus that has been used to study evolution experimentally is called an auxostat, the most common of which is a turbidostat. This system is similar to a chemostat in that it is designed to be a continuous cell culture system; however, the mechanism by which the culture is maintained is quite different. Turbidostats are designed to monitor cell density by measuring the turbidity of the culture, and the culture is kept in growth phase by adding fresh medium to the vessel after the turbidity of the culture has reached a threshold value.30 In this sense, a turbidostat can be likened to an automated batch culture evolution, as the periods of growth are more like those in a batch culture than those in a chemostat. While this type of culture system should impose a similarly stringent selection pressure for cellular growth with the advantage of being automated, this system has not been used as extensively as the two previously mentioned culture methods. This is largely due to technical complications involved in maintaining the turbidostat in a good functional condition, as turbidostats are highly prone to fouling, which makes the turbidity measurements inaccurate.31

After gaining a sense of a cellular objective and determining how to overlap a metabolic engineering objective with the cellular objective, it is necessary to determine how the experiment will actually be carried out. Different systems for conducting evolution experiments all have subtle differences that may influence experimental results. Since evolutionary changes are a biological response to the surrounding environment, the environmental conditions and the means of controlling the environmental conditions are important. Batch cultures, chemostats, and turbidostats all have their advantages and disadvantages, which must be carefully weighed before choosing and implementing a system.
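The chemostat behavior described in this section (growth rate fixed by the medium influx, washout when that rate is set too high) can be sketched with a simple steady-state calculation. The text does not specify a growth model, so the example assumes Monod kinetics, a common textbook choice; the function name, parameter names, and numerical values are hypothetical and are not taken from any of the cited studies.

```python
from typing import Optional, Tuple


def chemostat_steady_state(dilution_rate: float, mu_max: float, k_s: float,
                           s_in: float, yield_coeff: float
                           ) -> Optional[Tuple[float, float, float]]:
    """Steady state under Monod growth, mu = mu_max * S / (k_s + S).

    Returns (growth rate, residual substrate, biomass), or None if the cells wash out.
    """
    if dilution_rate <= 0:
        raise ValueError("dilution rate must be positive")
    critical_rate = mu_max * s_in / (k_s + s_in)   # fastest growth the feed can support
    if dilution_rate >= critical_rate:
        return None                                # cells are removed faster than they divide
    residual = k_s * dilution_rate / (mu_max - dilution_rate)  # solves mu(S) = D
    biomass = yield_coeff * (s_in - residual)
    return dilution_rate, residual, biomass        # at steady state, growth rate equals D


# Hypothetical parameters: rates in 1/h, concentrations in g/L
stable = chemostat_steady_state(0.3, mu_max=0.9, k_s=0.1, s_in=10.0, yield_coeff=0.5)
washed_out = chemostat_steady_state(0.95, mu_max=0.9, k_s=0.1, s_in=10.0, yield_coeff=0.5)
print("D = 0.30:", stable)
print("D = 0.95:", washed_out)
```

Under this model a variant with higher substrate affinity (lower k_s) sustains the same growth rate at a lower residual substrate concentration, which is one way to picture why the selection pressure in a chemostat differs from the growth rate selection imposed by serial batch transfer.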

1.4 Evolutionary Engineering Results

In the previous sections, basic concepts of evolution and experimental evolution systems were discussed. Knowledge in these two areas provides the basis for evolutionary engineering. Understanding of the natural evolutionary process gives insight into mechanistic changes and cellular objectives that will result from evolution. Different types of experimental evolution systems provide different types of selection pressure. The effective combination of evolutionary principles with appropriate experimental evolution and selection is a key component of evolutionary engineering. In this section examples of evolutionary engineering will be presented. A number of examples of evolutionary engineering studies have been previously reviewed.32 The studies discussed here will focus on recent progress and developments in several key areas of evolutionary engineering. Overviews of four different areas of evolutionary engineering will be presented along with examples for each case.

1.4.1 Stress Tolerance

Evolutionary engineering has its roots in the development of strains with better tolerance to stress conditions. Environmental stress and metabolic stress constitute the two major categories32 of conditions that can be addressed by evolutionary engineering. Environmental stresses include variables external to a cell that affect growth behavior. Examples of environmental stress include antibiotics, toxic chemicals, extreme temperatures, and nutrient limitation. Metabolic stresses occur inside a cell and can include intracellular accumulation of cytotoxic chemicals or increased metabolic burdens often associated with heterologous expression or overexpression of proteins. Either type of stress can be addressed using evolutionary engineering, with the majority of the work to date focused on environmental stresses.

One recent example of using evolutionary engineering to improve cellular function in the presence of an environmental stress used continuous cultures to study acetate tolerance in Acetobacter aceti and Escherichia coli.33 Acetate is a common metabolic by-product, and extracellular acetate accumulation is inhibitory to cellular growth and productivity. Biotechnology processes often involve interconversion of chemicals at high concentrations, and build-up of acetate can impede the process.

The design process for this study can be viewed within the framework presented earlier. The metabolic engineering objective in this case is to produce a strain that has increased tolerance to extracellular acetate such that high concentrations of acetate will not inhibit growth or productivity. The cellular objective in this case is growth, and the overlap between the two objectives is that cells that acquire increased tolerance to acetate will grow better in environments containing high concentrations of acetate. This area of overlap dictates that an appropriate evolution experiment should provide prolonged exposure to high concentrations of acetate. In this case, cultures of A. aceti and E. coli were grown in continuous culture for more than 100 generations in media containing high acetate concentrations. This prolonged exposure to acetate allowed cells with improved acetate tolerance to out-compete cells with poor acetate tolerance. Thus, by the end of the experiment the process of evolution had helped to select cells with improved acetate tolerance.

1.4.2 Substrate Utilization

A second area of research where evolutionary engineering is commonly employed is improvement of strain substrate utilization. The approach used to improve substrate utilization is very similar to that taken to improve stress tolerance, as the substrate utilization problem can be viewed as a variant of nutrient limitation stress. Strains at the beginning of an evolutionary engineering project to improve substrate utilization typically either natively possess the capability to process the target substrate at a slow rate or have been engineered to consume the substrate. Evolutionary engineering is used to increase the rate of substrate consumption and also to help ensure that the substrate consumption is a stable component of the phenotype.

The most prevalent reason for targeting substrate utilization in evolutionary and metabolic engineering is to reduce the cost of the starting material for a process. One substrate that has received particular attention is the five-carbon sugar xylose. Xylose is an inexpensive carbon substrate as it is found in abundance in plants; however, not all organisms process xylose as efficiently as other sugars. Thus, xylose utilization is a common target for evolutionary engineering. Many efforts have been made to improve the ability of the yeast Saccharomyces cerevisiae to utilize xylose,34 primarily to reduce the cost of producing ethanol. Evolutionary engineering is well-suited to this application, and recent work on S. cerevisiae has focused on improving the ability to utilize xylose under different growth conditions.35,36

The metabolic engineering objective in this situation is to improve the growth characteristics in environments containing xylose. As with stress tolerance, the cellular objective that will be satisfied is cellular growth. The overlap between these two objectives gives a situation where cells will grow better if they are able to utilize xylose, so they will out-compete cells that utilize xylose poorly. Continuous cultures are most often used for this application. Most experiments involving substrate utilization begin with strains that can utilize the substrate to some extent, so the aim is to improve the rate of utilization. Thus, a series of continuous cultures is often employed with incremental changes in the growth condition to progressively improve the phenotype.

1.4.3 Concurrent Selection of Multiple Objectives

To this point, examples of evolutionary engineering have focused on metabolic engineering objectives that are directly related to the cellular objective of growth. If evolutionary engineering can only be utilized in these circumstances, application of evolutionary engineering is very limited in scope. The major challenge in expanding evolutionary engineering to additional applications is the problem of identifying and designing overlap between cellular objectives and the target metabolic engineering objective. Often these objectives can have little in common and, in the worst case, may be diametrically opposed to each other.

In the examples of stress tolerance and substrate utilization, evolutionary engineering used the cellular growth objective to produce an improved strain as the end product. In both cases the cellular objective of growth was strongly selected during evolution and led to the desired strain improvements. Given the success of experimentally implementing evolution and growth selection for strain improvement, the question is whether the cellular growth objective can be made to overlap with a broader spectrum of metabolic engineering objectives.

One method that has been developed that broadens the application of evolutionary engineering while still focusing on the cellular objective of growth is a computational design approach called OptKnock.37 This approach (and other subsequent variants of it38,39) is built upon a genome-scale metabolic reconstruction.40 These metabolic reconstructions can be used to simulate and predict growth behaviors.16,24,41,42 The OptKnock design algorithm is a bi-level optimization scheme that concurrently optimizes two different objectives (cellular and metabolic engineering) by implementing gene deletions. The OptKnock algorithm was originally designed to optimize both a cellular objective of growth and a metabolic engineering objective of chemical overproduction such that the chemical production rate would increase proportionally with increases in growth rate (Figure 1.4).

The OptKnock algorithm was initially tested for overproduction of lactic acid in E. coli.43 Three different gene deletion designs predicted by OptKnock were constructed and evolved. In these experiments, evolution was conducted by maintaining E. coli in prolonged exponential growth by serial passage of batch cultures. All three strain designs secreted lactic acid, showed stable production of lactic acid throughout the duration of evolution (~1,000 generations), and exhibited increased lactic acid secretion rates as growth rates increased. Thus, the OptKnock design algorithm represents one means of intentionally designing overlap between cellular growth and chemical overproduction such that evolutionary improvements in growth rate will also result in improvement in chemical production rates.
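The idea of growth-coupled production can be illustrated with a deliberately tiny, hand-made network. This is not the OptKnock algorithm, which solves a bi-level optimization over a genome-scale reconstruction; it is only a toy sketch in which the two routes, their yields, and all names are hypothetical. Route A converts substrate to biomass with the target chemical as an obligate by-product, route B converts substrate to biomass more efficiently with no by-product, and the cell is assumed to choose the flux split that maximizes growth.

```python
def growth_optimal_phenotype(uptake_limit: float, route_b_active: bool = True,
                             grid: int = 100):
    """Return (growth, secretion) for the flux split that maximizes growth.

    Toy yields (assumptions): route A gives 0.5 biomass + 0.5 product per substrate,
    route B gives 1.0 biomass and no product. Deleting route B forces all flux via A.
    """
    fractions_via_b = [i / grid for i in range(grid + 1)] if route_b_active else [0.0]
    candidates = []
    for frac_b in fractions_via_b:
        v_a = uptake_limit * (1.0 - frac_b)   # substrate flux through route A
        v_b = uptake_limit * frac_b           # substrate flux through route B
        growth = 0.5 * v_a + 1.0 * v_b
        secretion = 0.5 * v_a
        candidates.append((growth, secretion))
    # Highest growth wins; among ties, assume the lowest secretion (worst case for production)
    return max(candidates, key=lambda p: (p[0], -p[1]))


wild_type = growth_optimal_phenotype(10.0, route_b_active=True)
knockout = growth_optimal_phenotype(10.0, route_b_active=False)
evolved_knockout = growth_optimal_phenotype(20.0, route_b_active=False)  # higher uptake after evolution
print("wild type (growth, secretion):       ", wild_type)
print("knockout (growth, secretion):        ", knockout)
print("evolved knockout (growth, secretion):", evolved_knockout)
```

In the toy wild type, maximizing growth drives secretion to zero, whereas in the knockout every increase in growth carries a proportional increase in secretion, which is the qualitative behavior that panels (a) and (b) of Figure 1.4 contrast.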

[Figure 1.4 panels: (a) uncoupled production and (b) coupled production, each plotting chemical secretion rate against growth rate.]

Figure 1.4 Schematic depiction of relationship between metabolite secretion and growth rate as growth rates increase during evolution. Shaded areas represent feasible metabolic phenotypes. (a) Uncoupled phenotypes utilize metabolic resources to increase growth rate causing a decrease in target metabolite secretion. (b) OptKnock designs stoichiometrically couple metabolite secretion to growth causing increased secretion as growth rate increases.

1.4.4 Developing New Selection Criteria

All of the examples described above build upon the cellular objective of growth. This is largely because all microbes naturally have a growth objective and also because it is relatively easy to experimentally impose an evolutionary selection pressure for growth. To further expand the application of evolutionary engineering, methods for applying different evolutionary selection pressures need to be developed.

In cases where the metabolic engineering objective is overproduction of a chemical compound, there is often a trade-off between the metabolic engineering objective and the cellular objective of growth. There is a metabolic tug-of-war over cellular resources that is normally biased toward growth. To partially overcome this problem, a new evolutionary engineering strategy has been developed to select for metabolically active quiescent cells. Nondividing (quiescent) cells do not have the same metabolic demands as actively growing cells but still have the capacity to perform regular metabolic functions. Thus, a quiescent cell may be a more efficient biocatalyst than a growing cell. The problem with quiescent cells is how to select for them and also how to improve their functionality. Both of these problems were addressed through the use of short-term ammonium-limited chemostat cultures.44 By setting the dilution rate very low and limiting the amount of ammonium available, the chemostat created a growth condition near the transition from exponential growth to stationary phase. Coupling this system with a means of testing for metabolic activity allowed cells to be isolated that had high metabolic activity but were quiescent. These results demonstrate a new experimental selection pressure and also provide a useful tool in metabolic engineering.

Evolutionary engineering can be successfully employed for a metabolic engineering purpose through carefully constructed experiments that account for both cellular and metabolic engineering objectives. The most common evolutionary engineering experiments build upon the cellular objective of growth, as it is ubiquitous and easily controlled experimentally. Recent developments have expanded the scope of evolutionary engineering applications and provided new avenues for metabolic engineering.

References 1. Butler, P. R., Brown, M., and Oliver, S. G. Improvement of antibiotic titers from Steptomyces bacteria by interactive continuous selection. Biotechnol. Bioeng., 49, 185–96, 1996. 2. Petri, R. and Schmidt-Dannert, C. Dealing with complexity: evolutionary engineering and genome shuffling. Curr. Opin. Biotechnol., 15, 298–304, 2004.

1-12

Evolutionary Tools in Metabolic Engineering

3. Elena, S. F. and Lenski, R. E. Evolution experiments with microorganisms: the dynamics and genetic bases of adaptation. Nat. Rev. Genet., 4, 457–69, 2003. 4. Wright, S. Character change, speciation, and the higher taxa. Evolution, 36, 427–43, 1982. 5. Drake, J. W. The distribution of rates of spontaneous mutation over viruses, prokaryotes, and eukaryotes. Ann. N. Y. Acad. Sci., 870, 100–7, 1999. 6. Schneider, D. and Lenski, R. E. Dynamics of insertion sequence elements during experimental evolution of bacteria. Res. Microbiol., 155, 319–27, 2004. 7. Foster, P. L. Adaptive mutation: implications for evolution. Bioessays, 22, 1067–74, 2000. 8. Helling, R. B. Speed versus efficiency in microbial growth and the role of parallel pathways. J. Bacteriol., 184, 1041–45, 2002. 9. Muller, H. J. The relation of recombination to mutational advance. Mutat. Res., 106, 2–9, 1964. 10. Bouche, N. and Bouchez, D. Arabidopsis gene knockout: phenotypes wanted. Curr. Opin. Plant Biol., 4, 111–17, 2001. 11. Raamsdonk, L. M. et al. A functional genomics strategy that uses metabolome data to reveal the phenotype of silent mutations. Nature Biotechnol., 19, 45–50, 2001. 12. Thorneycroft, D., Sherson, S. M., and Smith, S. M. Using gene knockouts to investigate plant metabolism. J. Exp. Botany, 52, 1593–1601, 2001. 13. Weckwerth, W., Loureiro, M. E., Wenzel, K., and Fiehn, O. Differential metabolic networks unravel the effects of silent plant phenotypes. Proc. Natl. Acad. Sci. USA, 101, 7809–14, 2004. 14. Kassen, R. The experimental evolution of specialists, generalists, and the maintenance of diversity. J. Evol. Biol., 15, 173–90, 2002. 15. Fong, S. S., Marciniak, J. Y., and Palsson, B. O. Description and interpretation of adaptive evolution of Escherichia coli K-12 MG1655 by using a genome-scale in silico metabolic model. J. Bacteriol., 185, 6400–8, 2003. 16. Fong, S. S. and Palsson, B. O. 
Metabolic gene-deletion strains of Escherichia coli evolve to computationally predicted growth phenotypes. Nat. Genet., 36, 1056–58, 2004. 17. Cooper, V. S. Long-term experimental evolution in Escherichia coli. X. Quantifying the fundamental and realized niche. BMC Evol. Biol., 2, 12, 2002. 18. Cooper, V. S., Schneider, D., Blot, M., and Lenski, R. E. Mechanisms causing rapid and parallel losses of ribose catabolism in evolving populations of Escherichia coli B. J. Bacteriol., 183, 2834–41, 2001. 19. Lenski, R. E. et al. Evolution of competitive fitness in experimental populations of E. coli: What makes one genotype a better competitor than another? Antonie van Leeuwenhoek, 73, 35–47, 1998. 20. Helling, R. B., Vargas, C. N., and Adams, J. Evolution of Escherichia coli during growth in a constant environment. Genetics, 116, 349–58, 1987. 21. Treves, D. S., Manning, S., and Adams, J. Repeated evolution of an acetate-crossfeeding polymorphism in long-term populations of Escherichia coli. Mol. Biol. Evol., 15, 789–97, 1998. 22. Atwood, K. C., Schneider, L. K., and Ryan, F. J. Periodic selection in Escherichia coli. Proc. Natl. Acad. Sci. USA, 37, 146–55, 1951. 23. Lenski, R. E. and Travisano, M. Dynamics of adaptation and diversification: a 10,000-generation experiment with bacterial populations. Proc. Natl. Acad. Sci. USA, 91, 6808–14, 1994. 24. Ibarra, R. U., Edwards, J. S., and Palsson, B. O. Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature, 420, 186–89, 2002. 25. Novick, A. and Szilard, L. Description of the chemostat. Science, 112, 715–16, 1950. 26. de Crecy-Lagard, V. A., Bellalou, J., Mutzel, R., and Marliere, P. Long term adaptation of a microbial population to a permanent metabolic constraint: overcoming thymineless death by experimental evolution of Escherichia coli. BMC Biotechnol., 1, 10, 2001. 27. Dykhuizen, D. E., Dean, A. M., and Hartl, D. L. Metabolic flux and fitness. Genetics, 115, 25–31, 1987.
28. Notley-McRobb, L. and Ferenci, T. Adaptive mgl-regulatory mutations and genetic diversity evolving in glucose-limited Escherichia coli populations. Environ. Microbiol., 1, 33–43, 1999.


29. Weikert, C., Sauer, U., and Bailey, J. E. Use of a glycerol-limited, long-term chemostat for isolation of Escherichia coli mutants with improved physiological properties. Microbiology, 143, 1567–74, 1997. 30. Northrop, J. H. Apparatus for maintaining bacterial cultures in the steady state. J. Gen. Physiol., 38, 105–15, 1954. 31. Sorgeloos, P., Vanoutryve, E., Persoone, G., and Cattoirreynaerts, A. New type of turbidostat with intermittent determination of cell density outside culture vessel. Appl. Environ. Microbiol., 31, 327–31, 1976. 32. Sauer, U. Evolutionary engineering of industrially important microbial phenotypes. Adv. Biochem. Eng. Biotechnol., 73, 129–69, 2001. 33. Steiner, P. and Sauer, U. Long-term continuous evolution of acetate resistant Acetobacter aceti. Biotechnol. Bioeng., 84, 40–44, 2003. 34. Hahn-Hagerdal, B. et al. Metabolic engineering of Saccharomyces cerevisiae for xylose utilization. Adv. Biochem. Eng. Biotechnol., 73, 53–84, 2001. 35. Kuyper, M. et al. Evolutionary engineering of mixed-sugar utilization by a xylose-fermenting Saccharomyces cerevisiae strain. FEMS Yeast Res., 5, 925–34, 2005. 36. Sonderegger, M. and Sauer, U. Evolutionary engineering of Saccharomyces cerevisiae for anaerobic growth on xylose. Appl. Environ. Microbiol., 69, 1990–98, 2003. 37. Burgard, A. P., Pharkya, P., and Maranas, C. D. Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol. Bioeng., 84, 647–57, 2003. 38. Patil, K. R., Rocha, I., Forster, J., and Nielsen, J. Evolutionary programming as a platform for in silico metabolic engineering. BMC Bioinformatics, 6, 308, 2005. 39. Pharkya, P., Burgard, A. P., and Maranas, C. D. OptStrain: a computational framework for redesign of microbial production systems. Genome Res., 14, 2367–76, 2004. 40. Price, N. D., Papin, J. A., Schilling, C. H., and Palsson, B. O. Genome-scale microbial in silico models: the constraints-based approach. 
Trends Biotechnol., 21, 162–69, 2003. 41. Covert, M. W., Knight, E. M., Reed, J. L., Herrgard, M. J., and Palsson, B. O. Integrating high-throughput and computational data elucidates bacterial networks. Nature, 429, 92–96, 2004. 42. Edwards, J. S., Ibarra, R. U., and Palsson, B. O. In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nature Biotechnol., 19, 125–30, 2001. 43. Fong, S. S. et al. In silico design and adaptive evolution of Escherichia coli for production of lactic acid. Biotechnol. Bioeng., 91, 643–48, 2005. 44. Sonderegger, M., Schumperli, M., and Sauer, U. Selection of quiescent Escherichia coli with high metabolic activity. Metab. Eng., 7, 4–9, 2005.

2 Improving Protein Functions by Directed Evolution

Nikhil U. Nair, University of Illinois at Urbana-Champaign
Huimin Zhao, University of Illinois at Urbana-Champaign

2.1 Introduction
2.2 Tools of Directed Evolution: Techniques for Generating Mutant Libraries
Random Mutagenesis • Gene Recombination • Semirational Approaches
2.3 Tools of Directed Evolution: Screens, Enrichments, and Selections
Cartesian Co-Localization • Compartmentalization • Physical Linkage
2.4 Successes and Applications of Directed Evolution
Altering Catalytic Activity • Improving Enzyme Stability • Evolving Proteins with Therapeutic Value • Understanding Natural Evolution
2.5 Patent and Licensing Issues
2.6 Outlook
Acknowledgments
References

2.1 Introduction

While Darwin's concept of evolution remains a theory, its impact has given rise to a field we now call directed evolution. Directed evolution, or perhaps more aptly premeditated synthetic evolution, is undertaken in the laboratory with a desired end. This, perhaps more than anything else, differentiates contemporary directed evolution from natural evolution. Synthetic evolution is usually undertaken at the level of a single gene, primarily because of current technological limitations rather than constraints of its definition, although several successes have already emerged in the synthetic evolution of complete pathways and organisms [1,2]. Application of directed evolution promises to answer various industrial, medicinal, and perhaps even anthropological evolutionary questions. Interest in evolving biological systems stems from the fact that they can be highly efficient microfactories able to metabolize and produce a variety of natural and unnatural compounds, and their use in everyday life predates even the notion of quantized, cellular life. While uses of biological systems in fermentation, baking, and composting are commonplace, their value in the chemical and therapeutic industries has spurred extensive research into new ways to harness their power.

Conceptually, directed evolution is as simple as it is ingenious. Its advantage is that very little knowledge of the system to be improved is required a priori. Crystal structures and mechanistic data, vital for the traditional, rational design of proteins, are unnecessary. In spite of recent advances enabling rational design through relatively rapid elucidation of protein structures by NMR, x-ray crystallography, or homology modeling, the initial investment of time is significant. Even with such data in hand, our limited understanding of the factors determining tertiary protein structure and our inability to accurately predict critical molecular interactions do not allow us to find distant, allosterically linked residues for rational design. However, random mutagenesis has been shown to identify and target such sites with remarkable success [3–8].

One of the first uses of in vitro evolution technology can be attributed to Mills et al. [9] as early as 1967, but it did not gain widespread acceptance as a bona fide field until the 1990s, when molecular biology techniques had matured enough to allow for commercial applications. Most successes have focused on evolving single enzymes for higher stability or activity, but recent research has addressed more complex systems such as therapeutic proteins [10–13], viruses [14–17], pathways [18–22], and even complete genomes [1,2], targeting properties such as decreased immunogenicity, increased therapeutic potency, the propensity to utilize and produce novel compounds, or bioremediation. Among the less product-oriented applications is the use of directed evolution to understand and trace the natural evolution of proteins by mimicking natural selective pressure [23,24].

Directed evolution starts by prudently picking a target (a single gene, multiple genes, pathways, or organisms) and a desired goal. An iterative process of creating mutant libraries and choosing desired phenotypes over a synthetic fitness landscape is then carried out until the goal is achieved or the desired property cannot be improved further (Figure 2.1). Various tools are currently available for each step in this process, and they can be categorized under relatively few general themes.
For mutant library creation, random mutagenesis, gene recombination, and semirational mutagenesis techniques exist. These methods are near-blind processes that provide an overwhelming number of possibilities and require an additional influence to guide them to the desired goal. This influence is crucial, and while in nature there is really only one primary pressure (survival), the goals of directed evolution experiments are often not easily aligned with survival. Thus, two additional strategies are available, enrichments and high-throughput screens, the latter of which is akin to the oft-stated "looking for a needle in a haystack." In the absence of a selection scheme or an enrichment strategy, library screening remains the primary bottleneck in most directed evolution experiments. Several improvements have been made to make this step quicker and less labor-intensive, yet it remains the weakest link in directed evolution. The rapid growth of genome and protein sequence data, together with bioinformatic tools such as BLAST searches and multiple sequence alignments (MSA) and with protein modeling tools, can assist screening by reducing the library size. Such a strategy is often dubbed semirational or computationally aided directed evolution.

Figure 2.1 General scheme of directed evolution. Target gene(s), pathway(s), or organism(s) are diversified by random mutagenesis, gene recombination, or semirational mutagenesis into a library of mutants; selection or screening yields functionally improved mutants, which enter further iterative rounds until the goal is achieved.

The aim of this chapter is to describe the current status of directed evolution of single proteins. This includes techniques and strategies for library creation, screening, selection, and enrichment, as well as successes, issues, and future prospects. Improvement of nucleic acids, pathways, and organisms via directed evolution will not be discussed here.
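The iterative cycle summarized in Figure 2.1 (diversify, screen, carry the best variant forward) can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the "fitness" function, which counts matches to a hidden target sequence, stands in for a real assay, and the names `fitness`, `mutate`, and `directed_evolution`, the per-residue mutation rate, and the library size are all hypothetical choices rather than anything prescribed by the chapter.

```python
import random

# The 20 standard amino acids (one-letter codes).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def fitness(seq, target):
    """Toy assay (an assumption): number of positions matching a hidden target."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq, rng, rate=0.05):
    """Replace each residue with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else aa for aa in seq)


def directed_evolution(parent, target, rounds=20, library_size=200, seed=0):
    """Iterate library creation and screening; keep the best variant seen so far."""
    rng = random.Random(seed)
    best = parent
    history = [fitness(best, target)]
    for _ in range(rounds):
        # Step 1: create a mutant library from the current template.
        library = [mutate(best, rng) for _ in range(library_size)]
        # Step 2: screen the library and pick the top performer.
        candidate = max(library, key=lambda s: fitness(s, target))
        # Step 3: advance only if the candidate improves on the template.
        if fitness(candidate, target) > fitness(best, target):
            best = candidate
        history.append(fitness(best, target))
        if best == target:  # goal achieved
            break
    return best, history


if __name__ == "__main__":
    best, history = directed_evolution("A" * 10, "ACDEFGHIKL")
    print(best, history)
```

Because a candidate replaces the template only when it scores higher, the recorded fitness never decreases; in a real experiment, assay noise and library size limit how reliably each round improves.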

2.2 Tools of Directed Evolution: Techniques for Generating Mutant Libraries

The natural wilderness is a harsh place, in a state of constant flux. Organisms are exposed to stresses and strains that force them to adapt, which in turn alters their environment and imposes different stresses on their progeny, continuing a never-ending spiral of change and adaptation. The nature of a stress determines the extent of mutation. Minimal and gradual stresses lead to incidental mutations during vegetative genome replication, whereas abrupt changes such as viral infections or conjugations induce more drastic genome rearrangements. The primary forms of mutation are point mutations, insertions, deletions, recombinations, substitutions, duplications, inversions, and translocations. In any case, detrimental or beneficial mutations determine the competence of the organism to survive in its environment. Neutral mutations are allowed to accumulate and may later alter the organism's propensity for engendering certain mutations, thus conditioning divergent evolution from common ancestry. However, natural evolution is a slow and complex process, and the rate of incidental mutations decreases drastically in higher organisms with robust DNA replication mechanisms. It should therefore be no surprise that the natural evolution of innumerable distinct species has taken billions of years.

In every case, selection is preceded by mutation for diversity generation. The larger the number of distinct mutants, the higher the number of beneficial mutations represented [25]. Random, higher-complexity mutations such as duplications, inversions, and translocations usually have exponentially diminishing probabilities of providing beneficial phenotypes. Such dwindling returns on investment are acceptable if the timescale of an experiment is millions of years, as in nature.
However, the success of a directed evolution experiment is only meaningful if results can be obtained on a reasonable timescale. Thus, investigators currently prefer simpler forms of mutagenesis such as point mutation and recombination, although insertions and deletions have also been applied with some success [26–30]. The development of new techniques for efficiently generating meaningful mutant libraries of manageable size via more complex mutations is an area of active research.

Popular mutagenesis techniques generally fall under three primary categories: random mutagenesis, gene recombination, and semirational mutagenesis. Random mutagenesis produces mutations throughout the target gene and includes (i) point mutations (transitions and transversions); (ii) insertions; (iii) deletions; (iv) inversions; and (v) frameshift mutations. Gene recombination is a "sexual" mutagenesis technique involving the exchange of gene fragments between two or more genes. It can itself be subdivided into four major categories: (i) homologous recombination, where recombination occurs between genes with high sequence identity; (ii) nonhomologous recombination, where recombination occurs between genes with low sequence similarity; (iii) reciprocal recombination, in which a symmetrical exchange of genetic material occurs between two DNA strands; and (iv) site-specific recombination, where recombination occurs between two strands sharing little homology at a target site. The third technique, semirational mutagenesis, is a cross between site-directed mutagenesis and random mutagenesis. Specific sites for mutation are rationally determined from crystal structures, mechanistic data, or insights from bioinformatic analysis, and are then individually or combinatorially randomized to all 20 amino acids. Figure 2.2 illustrates the general principle of each of the three techniques.

While the list of techniques discussed below is extensive, it is not exhaustive. There are several proprietary techniques that engineer proteins via directed evolution. Many of these will not be discussed here; interested readers can obtain more information directly from the companies or from the patents themselves.

Figure 2.2 Mutagenesis techniques: (a) random, (b) recombination, and (c) semirational approaches.
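The remark that chosen sites can be "individually or combinatorially randomized to all 20 amino acids" has a direct consequence for library size, which a short calculation makes concrete. The sketch below is illustrative only: the function names are hypothetical, and the coverage estimate assumes every variant is equally likely to be sampled, which real libraries rarely satisfy.

```python
import math


def protein_library_size(n_sites):
    """Number of distinct protein variants when n_sites are fully randomized."""
    return 20 ** n_sites


def clones_for_coverage(n_variants, probability=0.95):
    """Clones to screen so that a given variant is sampled with `probability`.

    Assumes (hypothetically) equiprobable variants: the chance of missing a
    specific variant in N draws is about exp(-N / V), so N = -V * ln(1 - p).
    """
    return math.ceil(-n_variants * math.log(1.0 - probability))


if __name__ == "__main__":
    for sites in (1, 2, 3, 4):
        variants = protein_library_size(sites)
        print(sites, variants, clones_for_coverage(variants))
```

Under these assumptions the screening effort is roughly three times the number of variants for 95% coverage, and it grows twentyfold with each additional randomized site, which is why semirational approaches restrict randomization to a few positions.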

2.2.1 Random Mutagenesis

Although geneticists have used several techniques to induce random mutagenesis in vivo, directed evolution generally prefers the simplicity of in vitro mutagenesis. In vivo techniques generally rely on template strand damage, base analogs, mutator strains, or transposition events to induce mutations during replication. Template strand damage can be induced by chemical mutagens, free radicals, or UV irradiation, whereas mutator strains deficient in high-fidelity replication systems (such as E. coli XL1-Red®, Stratagene, or E. coli mutD5 strains) can generate random mutations at an average rate of about one per 1000 bp [8,31–34]. More recently, hypermutating B cells have also been identified as a tool for generating mutations in vivo [5]. Transposon kits are commercially available (Finnzymes, Epicentre, Invitrogen) to introduce mutations by insertion and inductive excision. However, the major drawback of transposition is that the location of insertion cannot be easily targeted within the desired gene. Transposition events are particularly suited to genetic studies such as gene knockout and mapping studies, but do not lend themselves well to the random mutagenesis required for directed evolution. Mutator strains have the added complexity of background mutations, so the desired gene must be recloned into a mutation-free strain to ascertain the nature of the mutation(s) in the target gene. Yet another, less intrusive genetic technique for generating mutants is adaptive mutagenesis, which closely resembles natural evolution and as a result is considerably more time-consuming. This technique is more useful for directed evolution of complete organisms than of single genes, and will thus not be discussed here.

In vitro techniques are much easier to control since they avoid the complexity of a cellular environment. UV irradiation [35,36] and popular chemical mutagens such as base analogs (5-bromo-deoxyuridine [36], 2-aminopurine [37], hydroxylamine [35]), alkylators (ethyl methane sulfonate [38], nitrosoguanidine [39], diethylsulfate), oxidizing agents (nitrous acid [35], sodium bisulfite [40]), and intercalating agents (ethidium bromide [41] and acridines) can also be used in vitro with low-fidelity polymerases; however, these methods are generally biased toward certain types of mutations. For example, UV irradiation tends to favor mutations across pyrimidine repeats, whereas intercalating agents primarily cause frameshift mutations. With such biases, it is impossible to obtain an unbiased, randomly mutated library of the template gene. Several other techniques have thus been developed, the most popular of which are discussed in the following sections.
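A quoted average rate such as "about one mutation per 1000 bp" says little about how mutations are spread across individual clones. A minimal sketch, assuming (as a simplification not made in the text) that mutations arise independently at every base so that the count per gene follows a Poisson distribution, shows what fraction of a library carries zero, one, two, or more mutations. The function name and parameters are hypothetical.

```python
import math


def mutation_count_distribution(gene_length_bp, rate_per_bp, max_k=5):
    """Poisson probabilities of observing 0..max_k mutations in one gene copy.

    Assumption: mutations are independent and uniform along the gene, which
    ignores the biases and hot spots discussed in this section.
    """
    mean = gene_length_bp * rate_per_bp
    return [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(max_k + 1)]


if __name__ == "__main__":
    # A 1000-bp gene at one mutation per 1000 bp: mean of one mutation per copy.
    for k, p in enumerate(mutation_count_distribution(1000, 1 / 1000)):
        print(f"{k} mutations: {p:.3f}")
```

Under this assumption, a mean of one mutation per gene still leaves more than a third of the clones unmutated, so a meaningful share of any screening effort is spent on copies of the parent.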


2.2.1.1 Error-Prone Polymerase Chain Reaction (epPCR)

Truly the poster child of random mutagenesis techniques, epPCR is so effective and simple that it has deterred the development of other random point mutagenesis techniques. EpPCR [42–45] relies on the fallibility of DNA replication and promotes point mutations by providing DNA polymerase with nonideal reaction conditions in vitro (Figure 2.3). Base analogs and alkylating agents can be used in conjunction with low-fidelity or mutagenic polymerases, but such techniques suffer from severe biases and are generally not used. The most favored protocols utilize an Mg2+-dependent thermostable polymerase that lacks exonuclease activity (such as Taq or Mutazyme®, Stratagene), substitute Mn2+ for Mg2+, and supply unequal amounts of the four precursor deoxynucleotide triphosphates [44,46]. These nonstandard conditions further increase the rate of inaccurate base-pairing during primer elongation. The primary advantage of this technique is that the rate of mutagenesis can be conveniently controlled by varying the concentration of Mn2+ in the reaction mix or the number of amplification cycles. In general, a rate of one to five base mutations per kbp of target DNA, resulting in one to three amino acid substitutions, is not exceeded because high mutagenic rates are often disruptive to enzyme function [47–51]. Additionally, as already mentioned, high mutation rates generate very large libraries, and as a result the probability of finding a functional mutant without extensive screening is vanishingly small. However, in cases where a good selection scheme or a robust high-throughput screen is available, high mutation rates have been applied [49,52] and should be applied [25].

There are several drawbacks to epPCR, the dominant one being that mutagenesis is not completely random. In an ideal situation, every position in the protein would have an equal probability of being randomized to any of the 20 amino acids.
But since mutagenesis is performed on the gene and not on the protein itself, such an ideal distribution is improbable because of the redundancy of the genetic code. Since 64 codons encode only 20 amino acids and three stop signals, a mutation at the wobble position is frequently neutral. Moreover, mutations at two adjacent positions are highly infrequent in epPCR, so on average only about six amino acids are accessible by substitution at each position [53]. Even then, many mutations are semiconservative, exchanging amino acids with similar physicochemical properties [53]. A second limitation is that Taq and Mutazyme®, the most common polymerases used in epPCR, tend to favor transition mutations [53]. This issue has been addressed to a certain degree by Stratagene's Mutazyme II®, which is claimed to induce mutations across all four nucleotides evenly. However, polymerases still tend to favor mutating certain regions of a gene, designated "hot spots." The last major issue with epPCR is the fact that it is a PCR reaction and thus relies on replicating replicas. Therefore, any mutations introduced early on will be propagated with higher frequency, skewing the distribution of mutations. This can be addressed to some degree by pooling several small reaction products to create the library, or by first using only linear amplification with a mutagenic polymerase to produce single-stranded mutant DNA and then generating the second strand with a high-fidelity polymerase. The first alternative can be easily incorporated into experiments, while the second requires additional work and is thus not often used. Of course, several of these issues would be resolved if mutations could be introduced at the amino acid level rather than the nucleotide level; however, no PCR equivalent is known for amino acids.

Figure 2.3 Scheme of error-prone PCR (epPCR). Point mutations are introduced into replicas during primer extension in a PCR amplification of the parental template. (a) Primer extension under "error-prone" conditions, (b) introduction of point mutations, (c) completion of first-strand synthesis, (d) second-strand synthesis, (e) repetition for 10–20 cycles, (f) library of mutants.

Error-prone Rolling Circle Amplification (epRCA [54]) is claimed by its inventors to be an alternative to epPCR; however, it more closely resembles an in vitro version of a mutator strain. Whole-plasmid amplicons are generated with random primers and a low-fidelity polymerase under isothermal conditions and are transformed into cells without any ligation step. While it dispenses with specific primers, thermocyclers, and general cloning procedures, its limitations are similar to those of mutator strains. Because incidental mutations on the plasmid cannot be distinguished from phenotypic changes due to mutations in the target gene, the gene must be recloned into a mutation-free vector and reassessed, which essentially negates the advantage of skipping the general cloning steps.

2.2.1.2 Sequence Saturation Mutagenesis (SeSaM)

SeSaM [53] was also developed as an alternative to epPCR that overcomes several barriers faced by its predecessor. This technique is a true random point mutagenesis technique, able to substitute any amino acid with another without any mutational bias.
This is achieved by incorporating an alkali-labile analog, an α-phosphorothioate nucleotide, into the gene and then hydrolyzing it to produce fragments of random length. After treatment with terminal deoxynucleotidyl transferase (TdT) to incorporate a random number of universal bases at the 3ʹ-terminus, these fragments are used as megaprimers for unidirectional extension into full-length genes. Random point mutations are thus introduced opposite the universal bases on further PCR amplification of the single-stranded DNA. The authors have shown that SeSaM produces unbiased point mutations and codon mutations throughout a gene, which translates into the exchange of any amino acid at any residue with any of the 19 others. However, the number of supplementary steps required, the complexity of introducing two different types of base analogs, and the use of biotinylated primers for purification are nontrivial. In general, epPCR coupled with subsequent saturation mutagenesis, in which residues identified as functionally important are randomized to all 20 amino acids in search of the most beneficial mutation, has been able to produce similar results [55].

2.2.1.3 Indel Mutagenesis

Insertion and deletion (indel) mutagenesis opens an area for random mutagenesis that is inaccessible to substitution mutagenesis alone. By randomly inserting and deleting sections of a gene, thereby altering its size, one can generate a diverse mutant library. Transposable elements are naturally suited for such mutagenesis since their insertion rates are controllable and they usually carry selectable phenotypic markers. Use of transposable elements has been quite common at the organism level, but at the single-gene level its application has been less successful. Pentapeptide scanning mutagenesis [56] uses the transposable element Tn4430, which inserts randomly in the target and is then excised, leaving behind a 15 bp in-frame fragment. This effectively adds a five-amino-acid fragment at a random location in the protein. However, there are several drawbacks to this technique. It can only insert five amino acids, four of which are predetermined by the cognate target site duplication of Tn3-based transposons and the preset restriction site sequence. The fifth codon is variable only at the wobble position and thus does not contribute significantly to diversity. Therefore, this technique cannot easily generate the diverse and unbiased library desired for directed evolution.


Another technique, random insertion/deletion (RID), developed by Murakami et al. [29], addresses several issues of transposon-based mutagenesis at the single-gene level. Using this technique, the authors were able to delete up to 16 consecutive bases from random locations in a gene and insert a specific or random sequence of arbitrary length at the same location. Not only can this technique produce a large diversity unattainable by epPCR, it can also randomize any residue in a protein to all 20 amino acids by deleting a codon and inserting another in its place. As appealing as this sounds, the method is relatively complex, involving eight major steps. A technique for randomly introducing indel mutations called Random Insertional-deletional Strand Exchange (RAISE) [27] has recently been introduced. It is much simpler, combining the use of terminal deoxynucleotidyl transferase (TdT) for random nontemplated insertions with DNA shuffling technology (see Section 2.2.2.1). DNaseI-fragmented genes with random nucleotides incorporated at their 3ʹ-ends are joined in a primerless PCR, during which imperfect annealing of the TdT-extended bases introduces random insertions and deletions into the final products.

A technology that seemingly swims against the current of the generally accepted dogma of diversity generation has been developed by Kashiwagi et al. [57]. Frameshifts are usually disfavored because they are liable to introduce premature nonsense mutations more often than beneficial mutations, thus encoding truncated, inactive proteins. Frame shuffling [57], however, utilizes three to six of the otherwise unused reading frames by means of a microgene technology [58] that minimizes the introduction of nonsense mutations. The introduction of nonsense codons cannot, however, be completely eliminated, because they can still arise from random point mutations. This technology could open a new domain of sequence space hitherto purposely avoided if the authors can demonstrate its applicability to naturally occurring genes, which do have stop codons in their various reading frames.

All random mutagenesis techniques have one common drawback: the generated mutants are functionally similar to the parent, i.e., the evolutionary progression is small. Multiple iterations are therefore required to make significant changes to the protein. While the best mutant may seem like a prudent choice of template for the next iteration, its potential for improvement cannot be guaranteed. On the other hand, if high mutagenic rates are used, disruptive mutations tend to accumulate more frequently than beneficial ones. As a result, mutations with synergistic effects are rarely discovered [59]. Still, drastic changes such as those required for novel functions are rarely discovered even after several rounds of mutagenesis. Readers interested in further limitations of random mutagenesis are referred to some excellent published reviews [25,60,61]. By combining the diversity present on multiple templates, recombination techniques can circumvent this small-step limitation of random mutagenesis. Table 2.1 provides a comparative view of the random mutagenesis techniques discussed above.
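The epPCR discussion in Section 2.2.1.1 states that, because of the redundancy of the genetic code, only about six amino acids are accessible at each position when a codon changes by a single nucleotide. That figure can be checked by enumerating the standard genetic code. The sketch below is an illustration rather than the analysis used in the cited work; the helper names are hypothetical, and it assumes all nine single-nucleotide substitutions of a codon are equally likely, ignoring polymerase bias.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def accessible_amino_acids(codon):
    """Amino acids reachable from `codon` by exactly one nucleotide substitution.

    Synonymous changes and stop codons are excluded.
    """
    original = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa != "*" and aa != original:
                reachable.add(aa)
    return reachable


def mean_accessible():
    """Average number of accessible amino acids over the 61 sense codons."""
    sense = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]
    return sum(len(accessible_amino_acids(c)) for c in sense) / len(sense)


if __name__ == "__main__":
    print(sorted(accessible_amino_acids("ATG")))
    print(round(mean_accessible(), 2))
```

Each sense codon reaches only a handful of the 19 alternative amino acids, which is the limitation that saturation approaches such as SeSaM and RID are designed to overcome.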

Table 2.1 Summary of Random Mutagenesis Techniques

Technique: Chemical mutagens
Advantages: • Simplicity
Disadvantages: • Biased mutagenesis • Required handling of toxic chemicals
Key Factor(s): • Identity and concentration of mutagen

Technique: Mutator strains
Advantages: • Simplicity
Disadvantages: • Background mutations • Need to retransform mutants into mutation-free strain • Mutagenesis rate not tunable • Low mutation rate
Key Factor(s): • Generations grown • Mutation rate of host

Technique: epPCR
Advantages: • Simplicity • Tunable mutation rate
Disadvantages: • Limited amino acid substitution • Biased mutagenesis
Key Factor(s): • Mutation rate

Technique: epRCA
Advantages: • Simplicity • Specific primers not needed • Tunable mutation rate • Cloning steps skipped • Isothermal
Disadvantages: • Background mutations • Need to retransform mutants into mutation-free strain • Biased mutagenesis
Key Factor(s): • Mutation rate

Technique: SeSaM
Advantages: • Unbiased mutagenesis • Codon randomization possible
Disadvantages: • 2–3 days to perform • Several steps, reagents and enzymes required • Special primers required
Key Factor(s): • Duration of incubation with TdT • Choice of analogs and universal base

Technique: RID
Advantages: • Random insertions and deletions • Large diversity possible • Codon randomization possible
Disadvantages: • Several purification steps involved • Several steps, reagents and enzymes required • Frameshift mutations possible
Key Factor(s): • Duration of random ssDNA cleavage • Efficiencies of intra- or interstrand ligations

Technique: RAISE
Advantages: • Random insertions and deletions • Codon randomization possible
Disadvantages: • Frameshift mutations possible • DNaseI digestion bias
Key Factor(s): • Duration of incubation with TdT

Technique: Frame shuffling
Advantages: • Can utilize three to six unused frames
Disadvantages: • Applicable only to synthetic genes • Synthetic genes are tandem repeats with no stop codons in any reading frame • Applicability to functional genes not demonstrated
Key Factor(s): • Frameshift introduction frequency by polymerase

2.2.2 Gene Recombination

Comparable to sexual recombination, these techniques rely on the exchange of genetic material between two or more genes to generate sequence diversity. In nature, gene recombination has been one of the most efficient mechanisms for generating competent progeny. Certain flowering plants such as petunias have developed a highly polymorphic gene, S-RNase, whose product effectively acts as a self-incompatibility agent, thereby minimizing self-pollination and promoting out-crosses [62,63]. Similarly, prokaryotes tend to acquire antibiotic resistance more easily by horizontal gene transfer than by adaptive random mutagenesis. Directed evolution experiments that use recombination techniques in vitro can effectively produce drastic changes in phenotype, such as those required for novel functions, by taking giant steps across the evolutionary sequence space. Parental templates chosen for recombination are usually fully functional proteins, and as a result, a significant fraction of the progeny produced with such techniques tends to be functional as well. Simultaneous backcrossing with excess of parental templates
also tends to minimize the accumulation of neutral and deleterious mutations. While four types of gene recombination are possible (homologous recombination, nonhomologous recombination, reciprocal recombination, and site-specific recombination), only the first two are widely used in directed evolution.

2.2.2.1 Homology-Dependent Gene Recombination

As the name suggests, homologous recombination requires a certain degree of sequence similarity, or homology, between DNA strands. In general, as with random mutagenesis, in vitro techniques are preferred for their simplicity, although in vivo techniques have recently garnered some interest, primarily because of the limitations of in vitro DNA ligation and subsequent transformation. In such cases, high-efficiency in vivo homologous recombination is exploited to generate molecular diversity.


Figure 2.4 Scheme of DNA shuffling. Chimeric progeny of two or more parental templates are created by random fragmentation and reconstruction of full-length genes using primerless PCR. (a) DNaseI fragmentation, (b) denaturation and annealing, several cycles of (c) extension and (d) denaturation and annealing, to yield (e) full-length hybrids.
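The scheme in Figure 2.4 can be mimicked in silico. The following is a minimal sketch rather than a model of the actual chemistry: it assumes two pre-aligned parents of equal length that differ only by point substitutions, keeps each fragment's start coordinate, and treats "annealing" as an exact match over a minimum overlap. All function names and parameters (`fragment`, `reassemble`, `dna_shuffle`, `make_parents`, `min_overlap`, `copies`) are hypothetical.

```python
import random


def fragment(seq, rng, min_len=10, max_len=25):
    """Cut one template copy into random contiguous pieces.

    Returns (start, piece) pairs. Keeping the start coordinate is a
    simplification that assumes pre-aligned parents of equal length.
    """
    pieces, pos = [], 0
    while pos < len(seq):
        size = rng.randint(min_len, max_len)
        pieces.append((pos, seq[pos:pos + size]))
        pos += size
    return pieces


def reassemble(pool, length, rng, min_overlap=6):
    """Grow one strand by repeated annealing and extension (primerless PCR)."""
    starts = [piece for start, piece in pool if start == 0]
    growing = rng.choice(starts)
    while len(growing) < length:
        end = len(growing)
        extensions = []
        for start, piece in pool:
            overlap = end - start
            # A fragment anneals only if it matches the 3' end of the growing
            # strand over at least min_overlap bases and extends beyond it.
            if min_overlap <= overlap < len(piece) and growing[start:end] == piece[:overlap]:
                extensions.append(piece[overlap:])
        if not extensions:
            return None  # nothing can anneal, so this assembly stalls
        growing += rng.choice(extensions)
    return growing


def dna_shuffle(parents, n_products, rng, copies=50):
    """Fragment several copies of each parent and reassemble n_products genes."""
    pool = [f for p in parents for _ in range(copies) for f in fragment(p, rng)]
    products = [reassemble(pool, len(parents[0]), rng) for _ in range(n_products)]
    return [p for p in products if p is not None]


def make_parents(rng, length=120, spacing=15):
    """Two hypothetical parents that differ at every `spacing`-th position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    a = "".join(rng.choice("ACGT") for _ in range(length))
    b = "".join(swap[c] if i % spacing == 7 else c for i, c in enumerate(a))
    return a, b


if __name__ == "__main__":
    rng = random.Random(1)
    parent_a, parent_b = make_parents(rng)
    library = dna_shuffle([parent_a, parent_b], 20, rng)
    hybrids = [g for g in library if g not in (parent_a, parent_b)]
    print(f"{len(library)} full-length genes, {len(hybrids)} of them hybrids")
```

Because a fragment can only anneal where the overlap is identical, template switches in this toy model occur exclusively in stretches shared by both parents, which is one way to see why crossovers concentrate in regions of high homology.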

2.2.2.2 Primerless Fragment-Reassembly

DNA shuffling, developed by Stemmer et al. [64], was the pioneering technique for in vitro recombinant mutagenesis and diversity generation, and subsequently developed techniques are invariably compared against it. As shown in Figure 2.4, DNaseI first digests the parental templates randomly into small fragments, which are then reassembled in a primerless PCR reaction. Denatured fragments from different genes anneal to each other based on sequence homology and are extended by a DNA polymerase until full-length genes are obtained. The method can be used to recombine the most promising mutations from random mutagenesis, thus combining beneficial mutations and displacing deleterious ones, or to shuffle distinct but related genes. DNA shuffling with a low-fidelity DNA polymerase such as Taq can introduce additional mutations if desired, whereas a high-fidelity DNA polymerase can be used to minimize incidental mutations. A conceptually similar technique called family shuffling [65] was later introduced; it uses naturally occurring related genes as templates rather than mutants of a single parent.
Family shuffling has been shown to expedite access to larger diversity compared to DNA shuffling [65]. While DNA shuffling and family shuffling are powerful tools, they are not without limitations. Because these techniques rely on homology between genes for recombination, crossovers tend to accumulate at certain “hotspots” of high homology and occur more frequently between templates with higher sequence similarity [66]. In addition, most chimeras tend to contain only one crossover region, leaving a significant part of the theoretically accessible sequence space unexplored [66]. When sequence homology among parental templates is low, most reassembled genes tend to be parental genes instead of hybrids [67]. Bias during DNaseI digestion also tends to carry over into chimera reassembly, thus affecting diversity [68]. In an effort to mitigate some of these biases, Kikuchi et al. [69] proposed a modified family shuffling technique that replaces DNaseI with restriction endonucleases. The mixed fragments are amplified by PCR, again relying on homology between genes for reassembly. While the authors reported an increased crossover frequency, the technique still relies heavily on sequence homology during gene reassembly and can succumb to similar pitfalls of overrepresented recombination regions. In addition, fragmentation is not random, which excludes certain blocks from recombining.

To address the low crossover rate associated with DNA shuffling, Coco et al. developed a method called random chimeragenesis on transient templates (RACHITT, [70]). A parental template containing uracil is made single-stranded to serve as a scaffold for hybridization of second-strand fragments from a collection of homologous genes. 5ʹ- and 3ʹ-overhangs are degraded using Taq and Pfu DNA polymerases, respectively, the latter of which also fills in gaps on the transient template. Ligation of nicks is followed by treatment with uracil-DNA-glycosylase, which degrades the scaffold.
Finally, PCR is used to amplify the chimeric product for cloning. While this procedure drastically improves the crossover rate, averaging 14 crossovers per progeny, it is also more labor-intensive and time-consuming than DNaseI-based DNA shuffling. Optimization of reaction temperatures and times and strict control of substrate quality are crucial to a successful application of RACHITT. Another alternative to traditional DNA shuffling is NExT DNA shuffling (nucleotide exchange and excision technology, [71]), which uses “exchange nucleotides” (dUTP or 8-oxo-guanine) to dictate the fragmentation pattern of amplified templates, thus departing from the accepted doctrine favoring randomly generated fragments. First, the exchange nucleotides are incorporated into the genes and then excised by a nucleotide-specific glycosylase. Piperidine is then used to fragment these genes, and finally, internal self-priming is used to reassemble full-length hybrid genes. A computer program also developed by the authors predicts fragment size distribution, crossover locations, and crossover frequencies based on the ratio of standard nucleotide to exchange nucleotide.

2.2.2.3 Oligonucleotide- and Primer-Dependent Reassembly

The techniques described above rely on self-priming of gene fragments based on homology. Other techniques have been developed that require end-primers or the addition of several oligonucleotides to encourage frequent crossover between templates. Random priming in vitro recombination (RPR, [72]), staggered extension process (StEP, [73]), synthetic shuffling [74], and degenerate oligonucleotide gene shuffling (DOGS, [67]) are some of the best-established and most popular techniques. RPR is similar to DNA shuffling but uses random primers in place of DNaseI-fragmented parental templates. Primers anneal to random locations along the length of the genes and, when extended, produce fragments similar to those produced by DNaseI digestion in DNA shuffling.
However, the omission of DNaseI reduces the bias of nonrandom fragmentation that frequently plagues standard DNA shuffling. In StEP, full-length chimeras are produced during PCR with extremely short annealing and extension steps. The short steps extend end-primers in small increments, and the frequent denaturation steps allow the growing strands to anneal to different templates in subsequent cycles (template switching), thereby producing crossovers. Synthetic shuffling, a technique related to DNA shuffling, dispenses entirely with full-length parental genes. Diversity data between designated “templates” is obtained from bioinformatic analysis and encoded combinatorially into a large number of synthetic degenerate oligonucleotides. These oligonucleotides, which contain overlapping ends, are then extended and amplified
into full-length composite genes. Since the full-length genes are entirely oligonucleotide-based, codon optimizations or site-directed mutations can be incorporated directly into the oligonucleotides. Assembly of designed oligonucleotides (ADO, [75]) is a very similar strategy but can additionally use ligation to join nonhomologous regions rather than relying completely on the overlap of oligonucleotide ends. DOGS uses degenerate internal primers for conserved motifs in the genes to be shuffled to create a library of gene fragments, which are subsequently assembled by overlap-extension PCR. The use of degenerate oligonucleotides allows this technique to bias products toward a specific parent or parents and to shuffle genes with differing G + C contents much more efficiently than other techniques.

2.2.2.4 In Vivo Techniques

In the rush to improve on the shortcomings of in vitro mutagenesis techniques, the prospect of developing in vivo protocols for diversity generation has been largely overlooked. One of the early uses of combined in vitro and in vivo recombination events to generate libraries was in the yeast Saccharomyces cerevisiae. In combinatorial libraries enhanced by recombination in yeast (CLERY, [76]), parental genes cloned into plasmids are fragmented randomly using DNaseI and then reassembled using a primerless “progressive hybridization” PCR program. Instead of using only one annealing step, this program uses nine steps with progressively decreasing temperature to facilitate annealing of low-homology genes. The reassembled products are then reamplified using flanking primers, and the full-length genes are transformed into competent cells along with linearized plasmids. Plasmid circularization and increased genetic diversity are realized by in vivo homologous recombination in the transformants. This diversity was found to be higher than the number of colonies selected, as retransformation of rescued plasmids from a single clone into E. coli indicated that the yeast clone contained multiple plasmid variants.

In E. coli, a method called heteroduplex recombination [77] uses in vivo mismatch repair on heteroduplex DNA created from strands of two different but homologous genes to create a library of chimeric genes. Single-stranded DNA is prepared from each gene either by unidirectional amplification or exonuclease degradation, mixed, and quenched on ice to enhance heteroduplex formation. After cloning into a plasmid vector and transformation, the system relies on the cells to resolve the heteroduplexes into hybrid homoduplexes by mismatch repair. While this technique does away with PCR- and DNaseI-based biases, it suffers from a major limitation in being able to use only two parental templates, thus restricting molecular diversity. Another technique, which uses recA-mediated homologous recombination in a recBC sbcA E. coli mutant, was recently published [78]. While this technique shares the limitation of heteroduplex recombination in that it can exploit only two parental templates, and it produces chimeric genes with only two crossovers, it nevertheless provides initial steps toward in vivo diversity generation. In this technique, a linearized plasmid is ligated at each end to parental templates that remain disjointed in between by virtue of their dephosphorylated ends. The plasmid cannot replicate until homologous recombination circularizes it in vivo, creating single-crossover progeny. Using different progeny as parental strands, the process is repeated to produce a second generation with two crossovers.

2.2.2.5 Homology-Independent Gene Recombination

Homology-dependent techniques provide a significant advantage over random mutagenesis techniques by making larger leaps across sequence space [64], in some cases without much additional procedural complexity.
However, in situations where recombination among genes with low or even no homology is desired, the previously described techniques prove extremely inefficient [79]. With the increasing number of available protein crystal structures, it is easy to see why such a need could arise. A noticeable number of enzymes that share little sequence homology nevertheless fold into similar three-dimensional structures. The (βα)8-barrel is a prime example, with about 10% of all enzymes of known structure folding into this configuration [80]. This structure has also been shown to tolerate large insertion mutations [81], justifying the significant interest it has garnered among protein engineers. Hybridization, swapping, or grafting among modular protein domains could also benefit significantly from homology-independent recombination. Several in vitro techniques have been developed in answer to these prospects.


2.2.2.5.1 Two-Parent Nonhomologous Gene Recombination

Some of the earliest attempts at homology-independent recombination could use only two parental templates. These are the fancifully named incremental truncation for the creation of hybrid enzymes (ITCHY), SCRATCHY, and sequence homology-independent protein recombination (SHIPREC). ITCHY, introduced by Ostermeier et al. [79], randomly fuses truncated fragments from two different genes to create hybrids. Each gene is truncated from the opposite end in a controlled manner using exonuclease III under nonideal conditions, and aliquots are removed and quenched at various time points. This generates collections of truncated genes that are mixed and then blunt-end ligated, forming genes of various lengths. Optimization and control of exonuclease digestion is key to the success of this technique, which makes ITCHY difficult and time-consuming to perform. The hybrids in the library vary in length, and the two genes are not necessarily fused at structurally related sites [82], potentially leading to a large number of inactive clones. Finally, only a fraction of the fusions occur where the gene sequences align [82], and the number of crossovers is limited to one. A modified version of ITCHY introduced later incorporates α-phosphothioate nucleotide analogs into the genes at low frequency. Since the nuclease activity of exonuclease III is inhibited at sites of analog incorporation, removal of aliquots is not required, making the process significantly less labor-intensive. As an added bonus, if the α-phosphothioate analogs are incorporated during PCR amplification, error-prone conditions can simultaneously introduce point mutations. This modified technique is called THIO-ITCHY [83]. Another variation, which combines ITCHY with DNA shuffling, is called SCRATCHY (Figure 2.5) and was also described by Ostermeier et al. [84]. Yet another improvement, called enhanced-crossover SCRATCHY [85], was developed to further increase the number of crossovers in hybrid genes.

Figure 2.5 Scheme of SCRATCHY. Creation of hybrids by combining incremental truncation (ITCHY) and DNA shuffling. (a) Cloning both genes into vectors, (b) restriction digestion at linker, (c) incremental truncation from ends, (d) recircularization, (e) isolation of chimeric genes, (f) DNaseI fragmentation, (g) reassembly by PCR.
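Steps (c) and (d) of Figure 2.5, the incremental truncation and fusion inherited from ITCHY, explain why so many ITCHY hybrids are inactive. As a rough illustration only, the sketch below enumerates every possible fusion of an N-terminal piece of one gene with a C-terminal piece of another and counts how many keep the reading frame and how many join the genes at the same aligned position. It assumes two gap-free, pre-aligned genes and uniformly distributed truncation endpoints; `itchy_library_stats` and its arguments are hypothetical names.

```python
def itchy_library_stats(len_a, len_b):
    """Enumerate single-crossover fusions A[:i] + B[j:] of two genes.

    i = nucleotides retained from the 5' end of gene A
    j = nucleotides removed from the 5' end of gene B
    Returns (total fusions, in-frame fusions, fusions at aligned positions).
    """
    total = in_frame = aligned = 0
    for i in range(len_a + 1):
        for j in range(len_b + 1):
            total += 1
            if i % 3 == j % 3:   # reading frame of gene B is preserved
                in_frame += 1
            if i == j:           # crossover at the same aligned position
                aligned += 1
    return total, in_frame, aligned


if __name__ == "__main__":
    total, in_frame, aligned = itchy_library_stats(300, 300)
    print(f"possible fusions:           {total}")
    print(f"in-frame fusions:           {in_frame} ({in_frame / total:.1%})")
    print(f"fused at aligned positions: {aligned} ({aligned / total:.2%})")
```

Under these assumptions roughly one third of the fusions are in-frame and well under 1% of them join the two genes at corresponding positions, consistent with the observation above that only a fraction of ITCHY fusions occur where the sequences align.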


To do so, ITCHY hybrids created from parental genes are subsequently amplified in blocks using skewed primers, i.e., forward primers from the first gene and reverse primers from the second. This step selectively amplifies genes that contain a crossover point, thus enriching them in the pool. Shuffling of the pooled hybrid blocks creates the library of multicrossover mutants. Soon after the introduction of ITCHY, Sieber et al. introduced an independent method called SHIPREC [82] to address some of the issues associated with ITCHY. In SHIPREC, the parental genes are fused together with a linker containing a unique restriction site and then fragmented by controlled DNaseI digestion. Fragments corresponding to the length of either parent are isolated, polished at the termini to create blunt ends, and ligated to create covalently closed circular gene hybrids. The circular hybrids are then linearized at the unique restriction site in the linker. Although SHIPREC produces more functional hybrids owing to fragment purification based on gene length, it can still produce only one crossover event. However, if combined with the improvements offered by enhanced-crossover SCRATCHY, hybrids could have a larger number of crossover points.

2.2.2.5.2 Multiple-Parent Nonhomologous Gene Recombination

In spite of the advantages provided by ITCHY, SCRATCHY, and SHIPREC, the inability to recombine more than two genes is a serious hindrance in the search for larger and more random functional libraries. Multiple-parent gene recombination is therefore the next logical step. While some of the multiparent nonhomologous gene recombination techniques have analogs among the homologous recombination techniques, extending these techniques to low-homology systems is far from trivial.
Exon shuffling [86] was introduced by Kolkman and Stemmer in the same issue of Nature Biotechnology as SHIPREC, signifying the interest in, and need for, homology-independent recombination techniques. Exon shuffling takes advantage of the intron–exon organization of eukaryotic genes to first break genes down into libraries of domain-encoding fragments and then reassemble them combinatorially. To join the various domain-encoding fragments, a large number of oligonucleotides that contain crossover points are used to amplify the exons, and the amplified exons are then assembled into full-length genes by self-primed overlap-extension PCR. The gene length, functionality, and order can be easily controlled by permuting the domain crossovers via the oligonucleotide sequences. Thus, unlike the previously described homology-independent techniques, which ligate genes at random, exon shuffling grafts entire functional domains together to form a hybrid that is hopefully greater than the sum of its parts. Random multirecombinant polymerase chain reaction (RM-PCR, [87]) is similar to exon shuffling but does not require preorganization of genes into domain-encoding exons, and is thus applicable to prokaryotic and eukaryotic genes alike. The reassembly of blocks into full-length genes is enabled by overlap-extension PCR through oligonucleotides that contain crossovers. Degenerate homoduplex recombination (DHR, [88]) uses a set of degenerate top-strand oligonucleotides that combinatorially incorporate the diversity information of the genes to be shuffled. Gaps between top-strand oligonucleotides are filled in by templating with dephosphorylated bottom-strand oligonucleotides, and then ligated together. This provides a highly polymorphic set of genes with a predefined cap on diversity.
Since this technique is similar to synthetic shuffling in several respects, a certain degree of homology is required; otherwise, the number of degenerate oligonucleotides needed becomes formidable. Nonhomologous random recombination (NRR, [89]) employs ligation of DNaseI-fragmented genes that have been polished to remove 3ʹ-overhangs and fill in 5ʹ-overhangs. Added hairpin oligonucleotides cap the ends of the genes, and preferential intermolecular ligation is performed in the presence of PEG. The stoichiometric ratio of added hairpins controls the overall length of the hybrid genes, with higher concentrations favoring shorter chimeras. The presence of hairpins with known sequence allows amplification of the library for cloning. This process can therefore produce multiple-crossover hybrids without the use of large numbers of oligonucleotides. An issue with this method is the production of a significant number of frameshift mutants with premature stop codons, which limits the functional diversity. This was
addressed with the addition of a CAT fusion [90], which allows protein-NRR to simultaneously select against truncated and insoluble proteins. A rather different, albeit more involved, approach to shuffling is Y-ligation-based block shuffling (YLBS, [91]). A Y-structure is created by hybridizing complementary sequences of oligonucleotides that carry blocks of parental genes at their 5ʹ- or 3ʹ-ends. The hybridized section of the oligonucleotides forms the stem, and the noncomplementary gene blocks, along with different linkers containing different restriction sites, constitute the branches. The single-stranded branch ends are ligated together to create a stem-loop structure that is then denatured and amplified with stem-specific biotinylated primers to produce double-stranded DNA with fused blocks. These are restriction digested in separate reactions with different enzymes to reintroduce distinct phosphorylated 5ʹ-ends. Biotinylated single-stranded DNA purified from one reaction hybridizes with nonbiotinylated, 5ʹ-phosphorylated single strands purified from the other reaction to recreate the Y-structures. Several iterations of hybridization, ligation, amplification, digestion, and purification can exponentially increase the number of crossovers. However, the elaborate scheme deters multiple iterations and makes the process generally impractical for producing a large number of crossovers.

Gene recombination, whether homology-dependent or homology-independent, has the major advantage that it can take leaps across sequence space far more quickly than simple random mutagenesis. In addition, gene recombination holds the possibility of uncovering novel but foreseeable functions that are rarely discovered by random mutations alone. Predictable or guided outcomes are possible only if there exists similarity amid diversity.
By combining natural and unnatural evolutionary diversity, based on sequence identity, functional similarity, or structural compatibility, to generate further diversity, recombination techniques are truly evolution building on evolution. Readers interested in more critical assessments of these techniques are referred to [66,92–94]. Table 2.2 and Table 2.3 summarize the homology-dependent and homology-independent techniques, respectively.

Table 2.2 Homology-Dependent Mutagenesis Techniques

Technique: DNA shuffling
Advantages: • Robust, flexible • Back-crossing to parent removes nonessential mutations • Synergistic/additive mutations can be found
Disadvantages: • DNaseI digestion bias • Biased to crossovers in high homology regions • Low crossover rate • High percentage of parent
Key Factor(s): • Homology of parents • Fragmentation pattern

Technique: Family shuffling
Advantages: • Exploits natural diversity • Accelerated phenotype improvement
Disadvantages: • DNaseI digestion bias • Biased to crossover in high homology regions • Need high sequence homology in family • Low crossover rate • High percentage of parent
Key Factor(s): • Homology of parents • Fragmentation pattern

Technique: Family shuffling with restriction endonucleases
Advantages: • Exploits natural diversity • Accelerated phenotype improvement • No DNaseI bias
Disadvantages: • Biased to crossover in high homology regions • Need high sequence homology in family • Nonrandom, predefined fragmentation pattern
Key Factor(s): • Homology of parents • Fragmentation pattern

Technique: RACHITT
Advantages: • No parent genes in shuffled library • Higher rate of recombination • Recombine genes of low sequence homology
Disadvantages: • Several steps, reagents and enzymes required • Requires synthesis and fragmentation of single-stranded complement DNA
Key Factor(s): • Reaction time and temperature • Purity of ssDNA scaffold

Technique: NExT DNA shuffling
Advantages: • Predictable fragmentation pattern
Disadvantages: • Nonrandom fragmentation • Several steps, reagents and enzymes required • Toxic piperidine used
Key Factor(s): • dUTP:dTTP concentration

Technique: RPR
Advantages: • No DNaseI bias • Independent of template length • Specific primers not required • Isothermal reaction
Disadvantages: • Biased point mutations also occur • Low crossover rate
Key Factor(s): • Homology of parents • Duration of primer elongation • Primers:templates concentration

Technique: StEP
Advantages: • Simplicity
Disadvantages: • Need high homology • Low crossover rate • Need tight control of PCR
Key Factor(s): • Annealing and extension times

Technique: Synthetic shuffling
Advantages: • Greater flexibility • Increased diversity • Parental genes not required
Disadvantages: • Required synthesis of many degenerate oligonucleotides • Gene sequences required
Key Factor(s): • Quality and number of oligonucleotides

Technique: DOGS
Advantages: • Greater flexibility • Parental genes required • Low percent of parents in products
Disadvantages: • Required synthesis of many degenerate oligonucleotides • Bias representation possible
Key Factor(s): • Designed primers encoding crossover regions

Technique: ADO
Advantages: • Greater flexibility • Increased diversity • Parental genes not required
Disadvantages: • Required synthesis of many degenerate oligonucleotides • Gene sequences required
Key Factor(s): • Quality and number of oligonucleotides

Technique: CLERY
Advantages: • Not limited by ligation efficiency of gene into vector
Disadvantages: • Transformants contain more than one mutant, so rescue and retransformation required • Long PCR program for reassembly • DNaseI digestion bias • Background mutation in plasmid possible • Limited diversity
Key Factor(s): • Progressive hybridization temperatures • Frequency of homologous recombination in vivo

Technique: Heteroduplex recombination
Advantages: • No fragmentation bias
Disadvantages: • Limited to only two genes • Still limited by ligation efficiency • Limited diversity
Key Factor(s): • Heteroduplex formation frequency/stability • Frequency of homologous recombination

Technique: In vivo homologous recombination in E. coli
Advantages: • No fragmentation bias
Disadvantages: • Limited to only two genes • Limited to one crossover per iteration • Complex, labor-intensive • Limited diversity
Key Factor(s): • Frequency of homologous recombination
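Crossover rate recurs throughout Table 2.2 as the property that separates these techniques. One simple way to estimate it from sequenced clones is sketched below for the two-parent case. It assumes the chimera and both parents are already aligned and of equal length, and it can only report observable crossovers: switches that fall between two identical stretches, or pairs of switches between neighboring informative sites, are invisible. The function name `count_crossovers` is hypothetical.

```python
def count_crossovers(chimera, parent_a, parent_b):
    """Count observable parent switches along an aligned two-parent chimera."""
    origins = []
    for c, a, b in zip(chimera, parent_a, parent_b):
        if a == b:
            continue            # uninformative position: parents are identical
        if c == a:
            origins.append("A")
        elif c == b:
            origins.append("B")
        # any other base is treated as a point mutation and ignored
    return sum(1 for x, y in zip(origins, origins[1:]) if x != y)


if __name__ == "__main__":
    parent_a = "AAAAAAAAAA"
    parent_b = "CCCCCCCCCC"
    print(count_crossovers("AAACCCAAAA", parent_a, parent_b))
```

Averaging this count over a sample of clones gives a per-progeny figure of the kind quoted for RACHITT in the text.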

Table 2.3 Homology-Independent Mutagenesis Techniques

Technique: ITCHY
Advantages: • Eliminates recombination bias • Structural knowledge not needed • Completely homology-independent
Disadvantages: • Limited to two parents • One crossover per iteration • Significant fraction of progeny out-of-frame • Complex, labor-intensive • Single crossovers
Key Factor(s): • Exonuclease digestion and quenching times

Technique: THIO-ITCHY
Advantages: • Same advantages as ITCHY • Combines recombination and random mutagenesis • Simplified ITCHY method
Disadvantages: • Same disadvantages as ITCHY • Incorporated dNTP analogs may complicate further experimentation
Key Factor(s): • Frequency of analog incorporation

Technique: SCRATCHY
Advantages: • Eliminates recombination bias • Structural knowledge not needed • Multiple crossovers possible
Disadvantages: • Limited to two parents • Significant fraction of progeny out-of-frame • Complex, labor-intensive • DNaseI digestion bias
Key Factor(s): • Fragmentation pattern

Technique: Enhanced-crossover SCRATCHY
Advantages: • Same advantages as SCRATCHY • Enriched in recombinants
Disadvantages: • Same disadvantages as SCRATCHY
Key Factor(s): • Fragmentation pattern

Technique: SHIPREC
Advantages: • Crossovers occur at structurally related sites • Larger pool of active recombinants
Disadvantages: • Limited to two parents • One crossover per iteration • Complex, labor-intensive • DNaseI digestion bias
Key Factor(s): • DNaseI digestion time

Technique: Exon shuffling
Advantages: • Preserves exon function • Larger pool of active recombinants
Disadvantages: • Limited to eukaryotic genes • Requires known intron–exon organization • Limited diversity • Required synthesis of many oligonucleotides
Key Factor(s): • Quality and number of oligonucleotides

Technique: RM-PCR
Advantages: • Unbiased incorporation of variable size DNA fragments • Useful for prokaryotic and eukaryotic genes
Disadvantages: • Frameshifts may occur • Mutants may be longer or shorter than expected • Requires insights into locations of beneficial crossover points • Complex, labor-intensive • Limited diversity
Key Factor(s): • Quality and number of oligonucleotides • Location of defined crossovers

Technique: DHR
Advantages: • High recombination rate • Eliminates recombination bias • Parental genes not required
Disadvantages: • Required synthesis of many oligonucleotides • Gene sequences needed • Limited diversity
Key Factor(s): • Quality of dephosphorylated bottom strand • Quality and number of oligonucleotides

Technique: NRR
Advantages: • Large number of crossovers
Disadvantages: • DNaseI digestion bias • Complex, labor-intensive
Key Factor(s): • Concentration of hairpin oligonucleotides • Efficiency of intramolecular ligation

Technique: Protein-NRR
Advantages: • Same as NRR • Larger pool of soluble and in-frame recombinants
Disadvantages: • Same as NRR
Key Factor(s): • Same as NRR

Technique: YLBS
Advantages: • Recombines variable size DNA fragments • Shuffles large fragments such as exons or domains • Parental genes not required
Disadvantages: • One crossover per iteration • Frameshifts may occur • Gene sequences needed • Low product recovery • Complex, labor-intensive • Requirement for biotinylated primers
Key Factor(s): • Efficiency of ssDNA ligation • Quality of oligonucleotides

2.2.3 Semirational Approaches

The generally accepted idea and basis for rational enzyme design is that certain types of mutations at specific regions of enzymes have a higher probability of yielding beneficial effects than those at other sites [95,96]. For example, to increase activity or to alter substrate specificity or product selectivity, mutations near the active site may be more beneficial than those on the surface of an enzyme [96]. These sites can be identified readily from crystal structures, homology models, or other experimental data such
as alanine-scanning mutagenesis or mechanistic studies. The vast number of crystal structures and sequences that are now readily accessible has made such data mining possible. At the advent of directed evolution, in the pre-“omics” era, extensive sequence and crystallographic data were largely unavailable, so the “blind is better” credo [97] of mutagenesis and extensive screening was justifiable. The bottleneck in directed evolution experiments was, and still is, library screening, and any effort to relieve it would be beneficial. Therefore, to discount physicochemical or simulation data today and adhere only to brute-force random mutagenesis and screening would be imprudent. By first identifying protein regions with higher functional significance, one can avoid the need to screen prohibitively large libraries for elusive beneficial mutations. The combination of these two orthogonal techniques, rational design and directed evolution, in a seemingly paradoxical semirational approach can be formidable.

2.2.3.1 Targeted Randomization

Since the effect of mutating a particular residue to another cannot be accurately assessed a priori, saturation mutagenesis, which uses degenerate codons (NNN, NNS, NNK, or NNB) to mutate a particular residue in a protein to all 20 amino acids, can be used in a semirational approach to search for the most beneficial mutation. Another technique, maximum efficiency (MAX, [98]) randomization, removes the genetic redundancy by using only one codon per amino acid and therefore requires a smaller library for equivalent completeness [99]; nevertheless, saturation mutagenesis is currently the preferred technique because of its simplicity. Coupled with information on functionally important residues, saturation mutagenesis has been used at individual sites [100] or combinatorially [101,102] to create mutant libraries in a semirational approach.
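The trade-off between the degenerate codons named above and a one-codon-per-amino-acid scheme can be made concrete with a short calculation. The sketch below expands each degenerate codon using the standard genetic code and the IUPAC ambiguity symbols, then estimates how many clones must be sampled for a given completeness at one randomized site. The estimate T = -V ln(1 - P) assumes all V codon variants are equally likely and independently sampled, which is an idealization; `summarize` and `clones_for_coverage` are hypothetical helper names.

```python
import math
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity symbols used by the degenerate codons in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "S": "CG", "K": "GT", "B": "CGT"}


def summarize(degenerate):
    """Return (codons, amino acids encoded, stop codons) for one degenerate codon."""
    codons = ["".join(c) for c in product(*(IUPAC[s] for s in degenerate))]
    residues = [CODON_TABLE[c] for c in codons]
    return len(codons), len(set(residues) - {"*"}), residues.count("*")


def clones_for_coverage(n_variants, coverage=0.95):
    """Clones to sample so that any given variant is present with probability `coverage`."""
    return math.ceil(-n_variants * math.log(1.0 - coverage))


if __name__ == "__main__":
    for scheme in ("NNN", "NNB", "NNS", "NNK"):
        n_codons, n_amino, n_stop = summarize(scheme)
        print(f"{scheme}: {n_codons} codons, {n_amino} amino acids, "
              f"{n_stop} stop(s), ~{clones_for_coverage(n_codons)} clones for 95%")
    # A MAX-style library with exactly one codon per amino acid (20 variants).
    print(f"one codon per amino acid: ~{clones_for_coverage(20)} clones for 95%")
```

All four degenerate codons reach the 20 amino acids, but they differ in redundancy and in the number of stop codons they admit, and the required sampling scales with the number of codon variants. For several sites randomized together the variant count multiplies, so the advantage of a nonredundant scheme grows quickly.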
Even in the absence of information on functionally important residues, saturation mutagenesis can be used to generate mutant libraries. Gene site saturation mutagenesis (GSSM, [103]) is one such technique; it mutates every amino acid in a sequence to all others at the genetic level using degenerate codons. Combinatorial consensus mutagenesis (CCM, [104]) uses multiple sequence alignments of the family of the gene of interest to locate residues in the target that differ from the consensus. The targeted residues are combinatorially mutagenized to the consensus amino acid to yield mutants with beneficial properties.

2.2.3.2 Guided Recombination

Libraries created by techniques that recombine sequences with low homology, such as ITCHY, SCRATCHY, and SHIPREC, usually contain a large number of inactive variants, primarily because the locations of beneficial recombination are hard to predict. While exon shuffling has the locations of crossovers predetermined, RM-PCR, a close relative that also uses user-defined oligonucleotides to specify crossover locations, could benefit from additional insight into effective recombination sites. Computational methods such as SCHEMA [105], FamClash [106], and SCOPE [107] can supplement recombination techniques, whether homology-dependent or homology-independent, to reduce the sequence space to be searched, thereby increasing the chances of finding positive mutants. Sequence-independent site-directed chimeragenesis (SISDC, [108]) is another block shuffling technique. As in RM-PCR, the sites of recombination in SISDC are user-defined; they are marked by inserted sequence tags whose positions are determined with the help of SCHEMA. The inclusion of a type IIb restriction enzyme recognition site within the tag and engineered consensus sequences at the cleavage sites allow for serial extension of shuffled blocks after digestion and purification from tags.
To ensure the success of reassembly and to correct reading frame orientation during cloning careful design of cleavage site sequences is essential.
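To illustrate how a computational method can rank crossover locations, the sketch below counts residue-residue contacts that a chimera would break, in the spirit of the disruption score commonly associated with SCHEMA. It is a simplified illustration under stated assumptions, not the published implementation: the toy parent sequences, the contact list (which in practice would be derived from a crystal structure), and the function names are all hypothetical.

```python
def chimera_sequence(parents, origin):
    """Build a chimera; origin[i] is the index of the parent donating position i."""
    return "".join(parents[p][i] for i, p in enumerate(origin))


def disruption(parents, origin, contacts):
    """Count contacts (i, j) whose residue pair in the chimera does not occur
    at the same two positions in any parent (a SCHEMA-style disruption count)."""
    chimera = chimera_sequence(parents, origin)
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if not any((p[i], p[j]) == pair for p in parents):
            broken += 1
    return broken


# Toy aligned parents and an assumed contact map (0-based residue indices).
parents = ["ACDEFG", "AKDQFW"]
contacts = [(1, 3), (0, 5), (2, 4)]

# A single crossover after position 2: first half from parent 0, rest from parent 1.
origin = [0, 0, 0, 1, 1, 1]
print(chimera_sequence(parents, origin), disruption(parents, origin, contacts))
```

Crossover patterns with a lower count preserve more of the parental contacts and are therefore, under this model, more likely to fold, which is how such scores can shrink the sequence space that needs to be screened.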

2.3 Tools of Directed Evolution: Screens, Enrichments, and Selections

The generation of a large number of mutants necessitates high-throughput assays to find the variant(s) with the desired properties. The challenge in designing such an assay is twofold. The first challenge is the design of the actual assay, which must be implementable in a high-throughput fashion while remaining sensitive, economical, and compatible with the protein-DNA tagging. An assay here refers to a screen, an enrichment, or a selection, although enrichments are currently grouped with screens. A screen involves scrutinizing individual mutants for the desired property, whereas in a selection mutants deficient in the property of interest are eliminated and only mutants with the desired property are allowed to propagate. The second challenge is posed by the need to tag proteins with the DNA encoding them in order to facilitate further rounds of mutagenesis. Since manipulating DNA is much simpler than manipulating proteins, owing in part to its inherent stability and in part to the ease of modification by in vitro techniques such as PCR, all mutagenesis in directed evolution studies is performed on DNA alone. While mRNA may also serve as a tag for protein, DNA is still widely preferred for the reasons mentioned above.

The mode of tagging DNA to its protein is perhaps the most intuitive way to categorize assay techniques. While direct physical linkage may seem to be the most obvious form of tagging, its general use in directed evolution is still relatively novel. Spatial co-localization is the most commonly used approach, and within it, cellular compartmentalization is the most popular choice because the cell serves simultaneously as tag and expression host. However, the transformation efficiency of cells limits how many mutants can be assayed by this technique. To use space more efficiently and achieve even higher throughput, several in vitro techniques have shown great promise and have the potential to screen libraries large enough that the amount of DNA becomes limiting (~10^14 molecules, corresponding to about 1 mg of plasmid DNA). Advances in mass spectrometry (MS) have also shown excellent potential, with applicability to the detection of very low protein quantities via matrix-assisted laser desorption-ionization (MALDI) or electrospray ionization (ESI).
While MS can theoretically overcome the limit imposed by DNA, the logistics of adapting it to a high-throughput screen for different types of assays still need significant work. An added advantage of MS is that direct protein sequencing by tandem MS obviates the requirement to tag a protein molecule to its DNA counterpart.

It may be enticing to use the highest throughput assay for all studies, but this is usually not required. The choice of assay for any particular case depends on two major factors. The first is the throughput required. To improve physical properties such as stability at extreme temperatures, at extreme pH, or in organic solvents, screening about 10^4 variants is generally sufficient to find a clone with the desired properties. Thus, installing infrastructure for extremely high-throughput techniques is unnecessary in such cases. The second is the property of the protein to be examined. Display technologies lend themselves naturally to affinity or binding assays, whereas enzymatic reactions are generally more adaptable to microtiter plate assays. Because large mutant libraries are easy to create, screening and selection methods of matching capacity must be available. This section describes the technologies currently available for directed evolution, divided according to the method used to tag a gene to its product. Table 2.4 summarizes the techniques, their basic characteristics, and the types of assays for which they are primarily used.
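The DNA-limited library size quoted in this section (on the order of 10^14 molecules per 1 mg of plasmid DNA) can be checked with a short calculation. The sketch below assumes an average mass of 650 g/mol per base pair of double-stranded DNA, a common rule of thumb, and a hypothetical 5 kb plasmid; neither figure comes from the text.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average mass of one dsDNA base pair


def plasmid_copies(mass_g, plasmid_bp):
    """Approximate number of plasmid molecules in a given mass of DNA."""
    moles = mass_g / (plasmid_bp * BP_MASS_G_PER_MOL)
    return moles * AVOGADRO


# 1 mg of a hypothetical 5,000 bp plasmid.
copies = plasmid_copies(1e-3, 5000)
print(f"{copies:.2e} molecules")
```

For plasmids of a few kilobases the result stays within the same order of magnitude, consistent with the ceiling given above for in vitro techniques.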

2.3.1 Cartesian Co-Localization

In this technique, the protein and its associated DNA are given a common spatial “address” and are confined to a restricted area, unable to diffuse and mix with other DNA-protein pairs. Microtiter plates are the most common implementation, although protein-derivatized solid supports in a chip format, analogous to DNA microchips, have recently made some headway.

2.3.1.1 Microtiter Plates

Practically just scaled-down versions of test tubes, microtiter plates have long been a favorite of immunologists for enzyme-linked immunosorbent assay (ELISA) [109] studies, as well as of drug discoverers. They are the most versatile assay technology and are thus easily adaptable to most types of assays, from enzymatic activity to binding-based assays. However, versatility comes at the cost of relatively low throughput. Each well has to be attended to individually with components and either purified proteins or crude cell extracts. Although plates with up to 9600 wells are commercially available, the 96-well plate is most synonymous with microtiter plates. UV/Vis absorbance [110], chemiluminescence [111], scintillation proximity [112], and fluorescence [113,114], with variations such as Förster resonance energy transfer (FRET) [115] and fluorescence polarization assays, are most commonly associated with microtiter plates. Another drawback of these assays is that, most of the time, they are applicable only to reactions that can be linked to spectrophotometric properties such as absorbance and fluorescence.

Table 2.4 Screening and Selection Techniques

Technique | Linkage Type | Characteristics | Assays Used
1. Microtiter plates | Cartesian colocalization | Labor intensive; low throughput (<10^4); adaptable to many assays | UV/Vis/IR absorbance/transmittance; fluorescence; binding
2. Protein chips | Cartesian colocalization | Purified proteins required; difficult to create chips; limited application as screening technique | Binding
3. In vivo compartmentalization | Compartmentalization | Limited by transformation efficiency (<10^12); adaptable to many assays; selection possible | Colorimetric; auxotrophy complementation; fluorescence
4. In vitro compartmentalization | Compartmentalization | Not limited by transformation efficiency; selection possible; post-translational modification of proteins not possible | PCR based
5. Phage display | Physical linkage | Limited by transformation efficiency (<10^12); tunable valency of proteins displayed; protein expressed as fusions | Binding
6. Cell-surface display | Physical linkage | Limited by transformation efficiency (<10^12); protein expressed as fusions | Binding; fluorescence
7. Ribosome and mRNA display | Physical linkage | Not limited by transformation efficiency; decreased efficiency for large proteins and polypeptides; protein expressed as fusions | Binding
8. Plasmid display | Physical linkage | Limited by transformation efficiency (<10^12); protein expressed as fusions | Binding

2.3.1.2 Protein Chips

The success of DNA microchips has inspired efforts to develop protein chips. Here, purified protein products are immobilized in a high-density format on a solid support such as polyvinylidene difluoride membranes [116], nitrocellulose membranes [117], glass slides [118,119], or polyacrylamide gels [120]. Although protein chips are currently used primarily for proteomic studies, once a larger variety of immobilized proteins on chips becomes available, their application in binding-based screens [117,120,121] for directed evolution should become more widespread. A clever strategy to adapt protein chips to a wider variety of enzymatic assays is to immobilize proteins in nanowells [122], thus melding microtiter plate technology with protein chips. However, protein chips still face several challenges. First, they are applicable only to proteins that are stable outside the cell and amenable to rough manipulation such as printing or covalent modification. Second, the high level of accuracy required to use protein chips necessitates robotic precision. The widespread use of protein chips in directed evolution may not become reality in the near future, but their advantages and applicability should not be discounted.


2.3.2 Compartmentalization

Closely related to Cartesian colocalization, compartmentalization also restricts the movement of protein-DNA pairs, preventing them from commingling with other pairs. However, instead of providing spatially addressable Cartesian coordinates, here the compartments are distributed randomly in a three-dimensional volume. The best known of these compartments is the cell itself, with DNA libraries introduced as a ligation mixture or as plasmids, thereby providing an expression host as well as a segregated compartment. However, the number of diverse sequences easily generated by mutagenesis dwarfs the number that can be introduced into cells, given a maximal transformation efficiency of 10^10 colony forming units (cfu) per µg DNA by electroporation. In vitro water-in-oil emulsion techniques can overcome this cap by providing minimalist “cells” in which compartments are created around the DNA, rather than DNA being introduced into compartments.

2.3.2.1 In Vivo Compartmentalization

Geneticists have long used cell-based screens and selections, and much can be carried over directly to directed evolution. Cells are natural compartments that serve as DNA-protein tags, and with techniques such as electroporation and transformation of plasmids or ligation mixtures, the transfer of single DNA variants to each cell and their subsequent isolation using a variety of commercially available kits have become routine. Cellular assays are highly suited to selections in which the property of interest is phenotypically linked by complementation to an essential metabolic function in an auxotrophic strain. The ease of implementing cellular assays makes them the first choice in any directed evolution study where <10^10 clones need to be examined. Cellular assays have been applied to improving protein properties such as resistance to antibiotics [123], increased soluble expression levels [124–126], or the ability to metabolize certain carbon sources [127,128]. The development of two-hybrid systems has expanded the repertoire of selectable properties to encompass protein-DNA, protein-RNA, and protein-small molecule interactions, and even catalytic activity [129–135]. However, other properties, such as activity at extreme temperatures or pH and altered substrate specificity, are not directly amenable to selection. In such cases, clever techniques can be devised for selection, but usually, when implementation of a negative selection is not possible, screens on solid media using visually identifiable markers such as fluorescence and colorimetry, colony-lift methods, or supplementation with microtiter plates may be required.

2.3.2.2 In Vitro Compartmentalization (IVC)

By encapsulating only DNA and the transcriptional-translational machinery needed to express the encoded protein in a water-in-oil emulsion, Tawfik and Griffiths [136] provided a minimalist cell within which to conduct screens or selections (Figure 2.6). Since the compartments are created entirely in vitro, several steps involved in general cloning, such as restriction digests, vector ligations, and transformations, are avoided altogether. Not only does this expedite experiments, it can also raise the maximum number of screenable mutants to the point where the amount of DNA is limiting. Selecting for properties that directly affect the translation/transcription process [137–141] by IVC seems feasible; adapting IVC to other types of assays is its next major challenge.
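The gap between the diversity that mutagenesis can generate and the number of variants an assay can examine can be illustrated by asking how many residues could be fully randomized under each throughput ceiling quoted in this chapter. The sketch below assumes NNK codons (32 codons per site) and a threefold oversampling of the library, which corresponds to roughly 95% coverage; both are illustrative assumptions, and the function name is hypothetical.

```python
def max_randomized_sites(throughput, codons_per_site=32, oversampling=3.0):
    """Largest number of simultaneously randomized sites whose full codon
    library, oversampled by the given factor, fits within the throughput."""
    sites = 0
    while oversampling * codons_per_site ** (sites + 1) <= throughput:
        sites += 1
    return sites


# Throughput ceilings as quoted in the text (orders of magnitude only).
ceilings = {
    "microtiter plate screen": 1e4,
    "cell-based assay": 1e10,
    "in vitro (DNA-limited)": 1e14,
}
for name, limit in ceilings.items():
    print(name, max_randomized_sites(limit))
```

Because library size grows exponentially with the number of randomized sites, even a gain of several orders of magnitude in throughput adds only a few positions, which is one reason targeting randomization to functionally important residues remains valuable.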

2.3.3 Physical Linkage

Perhaps the simplest way to tag a DNA molecule to the protein it encodes is to link the two physically, either directly, as in ribosome or mRNA display, or indirectly, as in phage or cell-surface display. While it may be tempting to compare phage display and cell-surface display to compartmentalization, the major difference between the two technologies is that the proteins here are free to interact with each other. Ribosome and mRNA display are in vitro techniques and are thus potentially limited only by the amount of DNA produced in the laboratory. Phage display and cell-surface display are still limited by the transformation efficiency of host cells. The applicability of these techniques is primarily limited to binding assays, although their usefulness for enzymatic assays has recently been shown.

Figure 2.6 Scheme of in vitro compartmentalization (IVC). Compartments for individual genes and the translation/transcription system are created in a water-in-oil emulsion for screening or selection. (a) Emulsification of library tagged with substrate, (b) expression of gene, (c) conversion of substrate into product by protein, (d) capture of gene using product tag, (e) repeat until (f) desired goal is achieved.

2.3.3.1 Phage Display

Phage display was the first display technology introduced and is therefore the best studied [142–147]. Phages provide the simplest scaffolds for protein display via fusions to their coat proteins. The filamentous phage M13 has been the most popular host for this technology, with coat proteins pIII [146,148,149], pVII [150,151], pVIII [150,152], or pIX [150,153] serving as the mounts for the encoded proteins. Phage display has been especially prominent in antibody engineering, particularly as a simpler alternative to the time-consuming hybridoma technology [154,155]. Additionally, the engineering of catalytic antibodies [156–158], protein-DNA interactions [159–161], and protein-small molecule interactions [162,163] has already been demonstrated. Enzyme catalysis is more challenging for phage display, but several successes and strategies have been reported [164–168]. Phage display is a powerful system, capable of multivalent display of multiple proteins, but it is ultimately limited by the transformation efficiency of the host cells required to express proteins and assemble phage particles. In any case, the ability of phage display to screen a wide variety of enzymatic reactions will eventually determine its usefulness in directed evolution, since enzymes are the major targets for engineering.

2.3.3.2 Cell-Surface Display

As in phage display, proteins are expressed and displayed on an outer surface, here that of the cell, so cell disruption is not required to detect activity. Bacteria are used to display proteins using an Lpp-OmpA (lipoprotein-outer membrane protein A) fusion [169], although yeast and mammalian cell display technologies have also been developed [170,171].
Applications of cell-surface display have, unsurprisingly, been similar to those of phage display: it has been used on peptides [172], antibodies [173,174], and T-cell receptors [175], and more recently to engineer nuclease activity [176]. Affinity-based assays are easily applicable to this technology, although FACS is also generally quite popular. Like phage display, the application of cell-surface technology to a wide repertoire of enzymatic activities would require some clever manipulations.

2.3.3.3 Ribosome and mRNA Display

Chloramphenicol is known to inhibit cell growth by causing cessation of ribosomal protein translation. Mattheakis et al. [177] observed that in vitro, the ribosome, mRNA, and polypeptide remain associated after chloramphenicol addition. Taking advantage of this, they developed a ribosome display-based enrichment in which the incompletely translated polypeptide is bound to an affinity matrix, after which the complex is dissociated and the mRNA is reverse transcribed into the encoding DNA. While this technology was found to be effective for short polypeptides, efficiency drops considerably for larger proteins [178]. In addition, the ternary complex was found to be unstable. These shortcomings eventually inspired the development of mRNA display, in which the mRNA is covalently linked to the growing peptide [179,180]. The antibiotic puromycin, an analog of the aminoacyl end of tRNA, is covalently bound to a linker DNA at the 3ʹ-end of the mRNA. Upon reaching the ribosome, puromycin is added to the growing C-terminus of the encoded polypeptide, thus physically linking the mRNA to the translated protein. Unlike phage and cell-surface display, ribosome and mRNA display are not limited by the transformation efficiency of cells. Once extended to different enzyme chemistries, they should prove to be very powerful screening tools.

2.3.3.4 Plasmid Display

Taking DNA-protein tagging literally, plasmid display uses a DNA-binding protein to noncovalently link the encoded protein to its plasmid. Cull et al. [181] used a Lac repressor-peptide fusion to bind lacO DNA sequences on the plasmid and then screened peptides based on binding to an immobilized antibody. While this is conceptually novel, its application has thus far been limited to affinity-based screens of short peptides, although it should lend itself readily to the same assays as the previously mentioned physical linkage techniques. A critical shortcoming that this technique shares with phage and cell-surface display is that the library size it can screen is limited by the transformation efficiency of the cell. However, it is conceivable that, in the future, plasmid display in conjunction with in vitro compartmentalization could be used to overcome barriers faced by both techniques.

Available techniques to screen or select mutant libraries vary over a wide range of applicability and maximal throughput, with one usually coming at the cost of the other. More seasoned techniques like cellular compartmentalization and microtiter plates have already found use in almost every facet of protein phenotype improvement. Screens in such cases have usually been limited to <10^5 clones, whereas selections have been able to examine larger libraries, although still capped by the electroporation efficiency of cells at <10^12. The more novel techniques, such as in vitro compartmentalization and display techniques, are very promising for protein engineering, albeit still in their nascent stage with limited applications. Inventive modifications and use in improving different types of enzymatic reactions will eventually determine their usefulness and longevity as tools in directed evolution and protein engineering. While these techniques are able to screen larger libraries more efficiently, the maximum amount of DNA producible in a laboratory limits the maximum screenable library to 10^14 distinct sequences (~1 mg of plasmid DNA). The major drawback is thus the need to tag DNA to a protein. Mass spectrometry could overcome this limitation by directly sequencing proteins. Potentially, armed with techniques that mutagenize at the protein or mRNA level instead of the DNA level, it could effectively screen a much larger sequence space.

2.4 Successes and Applications of Directed Evolution

Synthetic chemistry is a much better-established field than biocatalysis and has been very successful in catalyzing difficult reactions. Additionally, synthetic catalysts are a more cost-effective means of increasing reaction rates than enzymes because they do not require time-consuming cell culturing and low-throughput, low-yield batch chromatographic purification steps. Why, then, is there any need for or interest in biocatalysts? In general, for any chemical process, the most expensive step is purification, and the most efficient means of reducing purification costs is to minimize the formation of side products during the reaction. This is the primary reason for employing enzymes, as conduits to desired products; an enhancement in reaction rate of orders of magnitude is an added bonus. The challenges faced in designing synthetic catalysts to produce stereochemically pure reaction products are the same as those faced in the rational design of enzyme active sites. Catalytic sites, whether synthetic or enzymatic, are difficult to mold rationally for a desired mode of action. The advantage of using proteins is that they are innate catalysts and can be adapted to serve even unnatural purposes by genetic manipulation using tools like directed evolution. Several classes of enzymes are already being used at the industrial scale to produce chemicals, and readers interested in learning about those enzymes are referred to several excellent reviews already published on the subject [182–189].

This section uses examples to illustrate the wide variety of protein properties that directed evolution has been used to improve, which goes beyond modifying enzymes for industrial catalysis. Proteins are also potent therapeutic agents, and directed evolution has been used to design novel vaccines, gene therapeutics, and potent antibodies. In a more novel application, directed evolution, being an imitation of natural evolution, has been used to shed some light on the latter process.

2.4.1 Altering Catalytic Activity

Enzymes are excellent at directing reactions toward certain products; however, they might not do so at high rates. Thus, many of the earliest examples of directed evolution were aimed at increasing the catalytic efficiency of enzymes. Qian and Lutz [190] increased the catalytic efficiency of Candida antarctica Lipase B toward p-nitrophenol butyrate and 6,8-difluoro-4-methylumbelliferyl octanoate over 12- and 28-fold, respectively, using circular permutations of the wild-type protein with a library of about 5 × 10^5 mutants. In another impressive example, Castle et al. [191] improved the catalytic efficiency of a glyphosate N-acetyltransferase (GAT) toward the herbicide glyphosate by four orders of magnitude. To do so, they screened 5,000 colonies in each of 11 rounds of iterative DNA shuffling with genes from different variants of Bacillus licheniformis.

Several classes of enzymes are known to accept a large number of substrates, particularly those involved in primary metabolism. In such cases, the catalytic efficiency toward certain substrates may be less than optimal for a desired purpose, so there may be a need to broaden enzyme specificity to catalyze reactions with additional substrates at higher efficiency. As an example, Fong et al. [192] used epPCR and DNA shuffling to broaden the substrate specificity of D-2-keto-3-deoxy-6-phosphogluconate (KDPG) aldolase to accept both d- and l-sugars in nonphosphorylated forms for reversible aldol reactions. Cho et al. [193] used two rounds of DNA shuffling and screened about 12,000 mutants to improve the hydrolytic activity of an organophosphorus hydrolase (OPH) 725-fold toward the toxin chlorpyrifos. Raillard et al. [194] used DNA shuffling on two genes encoding s-triazine hydrolyzing enzymes and screened about 1,600 variants for activity against 15 different triazines. Their best mutant was able to hydrolyze five triazines that were not substrates for either of the wild-type enzymes.

In other cases, with promiscuous enzymes, there may be an interest in narrowing substrate specificity, particularly when the formation of undesired side products is an issue. In general, a larger library needs to be screened to find enzymes with increased specificity, although there have been cases where beneficial mutants were found in smaller libraries. Antikainen et al. [195] did just this by altering the substrate specificity of phospholipase C from Bacillus cereus (PLCBc) by saturation mutagenesis of three residues in the substrate-binding pocket. The wild-type enzyme had a similar preference for the water-soluble substrates C6PC (1,2-dihexanoyl-sn-glycero-3-phosphocholine) and C6PE (1,2-dihexanoyl-sn-glycero-3-phosphoethanolamine), with a ratio of kcat/KM values of 0.7; a double mutant had a narrowed substrate preference, with a value of 29.0 for the same ratio, indicating a large preference for C6PE over C6PC, albeit at the cost of decreased activity. Mutants were found by screening a library of only 6,000 members. Rothman et al. [196] performed a single round of DNA shuffling on an E. coli aspartate aminotransferase (AATase) mutant, HEX [197], with tyrosine aminotransferase (TATase) activity and increased its activity as a TATase at the cost of AATase activity.

Regulatory proteins, unlike those involved in primary metabolism, tend to be very specific in their choice of substrates. Like narrowing substrate specificity, altering substrate specificity requires more drastic changes and thus screening of a larger library. Focusing initially on the substrate-binding pocket in such cases tends to yield quicker results, as demonstrated by the following examples. Chockalingam et al. [100] used the ligand-binding domain (LBD) of one such protein, the human estrogen receptor α (hERα), which has a very specific and high affinity for 17β-estradiol (E2), and drastically shifted its affinity to accept the synthetic ligand 4,4ʹ-dihydroxybenzil (DHB) instead, with a >10^7-fold shift in preference. To do so, they used several individual rounds of saturation mutagenesis on 14 ligand-contacting residues followed by epPCR on the entire LBD. Santoro et al. [198] created an orthogonal aminoacyl-tRNA synthetase and tRNA pair that cannot interact with their endogenous counterparts in E. coli. To do so, they performed combinatorial saturation mutagenesis on five residues thought to constitute the binding pocket and selected mutants from a library of almost 10^9 mutants.

Altering the substrate-binding pocket is one way to decrease the formation of side products. Since enantiomeric products arise from the same substrate, enantiopure yields cannot be obtained by altering substrate affinities. To approach this problem, one needs to widen the enzyme “conduit” toward the desired product, thus preferentially producing only that product. The role of substrate-contacting residues in the enantioselectivity of the enzyme may not be direct, and therefore random mutagenesis is generally preferable for altering this property. In one such study, van Loo et al. [199] increased the enantioselectivity of Agrobacterium radiobacter epoxide hydrolase by up to 20-fold using epPCR followed by DNA shuffling. A total of 40,000 mutants produced by epPCR and another 20,000 created by DNA shuffling were screened. In another example, Liebeton et al. [55] increased the enantioselectivity of Pseudomonas aeruginosa lipase by almost 24-fold by epPCR followed by saturation mutagenesis. Each generation of epPCR required screening of only 1,000–7,000 mutants to identify beneficial mutations.

2.4.2 Improving Enzyme Stability

Most enzymes are suited to react under physiological conditions, which are quite mild except in extremophiles. Physiological conditions are not optimal for most large-scale reactions, and in such cases it is simpler to alter the enzyme to meet the needs of the process than to alter the process itself. The major targets for enzyme stability improvement are thermostability, pH insensitivity, and resistance to denaturation in organic solvents. Toward one such goal, Johannes et al. [200] improved the half-life of NAD(P)H oxidizing phosphite dehydrogenase from Pseudomonas stutzeri by over 7,000-fold at 45°C by introducing 12 mutations over three rounds of epPCR. Each round required the screening of <10,000 clones to find mutants with significant improvement in stability. Giver et al. [201] increased the melting temperature (Tm) of p-nitrobenzyl esterase by over 14°C with six rounds of epPCR and a single round of DNA shuffling, screening <2,000 mutants per library. Sriprapundh et al. [202] increased the stability of Thermotoga neapolitana xylose isomerase at acidic pH, without sacrificing any activity, using two rounds of epPCR and screening fewer than 1,500 clones per round. Hao and Berry [203] evolved fructose bisphosphate aldolases using four rounds of epPCR followed by DNA shuffling to yield mutants with increased thermostability as well as stability in organic solvents. In a particularly impressive example, Ness et al. [204] improved all the aforementioned properties of the protease subtilisin using family shuffling on 26 homologous subtilisin genes.

Stable, soluble expression in heterologous hosts is important for a number of reasons, the most prominent being ease of study and economical large-scale production and purification. Phosphotriesterase from Pseudomonas diminuta is one such protein: it is extremely efficient at hydrolyzing organophosphorus compounds [205] but is difficult to express heterologously. Roodveldt and Tawfik [206] identified a variant with 20-fold increased functional expression in E. coli from a library of mutants created by epPCR. Sieber et al. [82] produced a soluble mutant of a membrane-associated human cytochrome P450 by creating a hybrid with a soluble bacterial P450 using SHIPREC.

2.4.3 Evolving Proteins with Therapeutic Value

As mentioned before, directed evolution has applications beyond improving enzymes for industrial catalysis. Several properties of proteins with therapeutic value can be improved using directed evolution, and several reviews provide more details [207–211]. Increased potency and affinity for targets, decreased immunogenicity, enhanced stability, and improved pharmacokinetics are some of the properties on which researchers have focused. Shuffling techniques have shown particular promise [210,212], and several display technologies have been applied successfully [213–216].

Antibodies are a class of proteins with major significance as protein therapeutics, and several are commercially available to treat a variety of diseases. Boder et al. [174] used directed evolution to increase the affinity of an antibody, applying several rounds of DNA shuffling and epPCR coupled with yeast display to screen 10^5–10^7 mutants. A mutant showing >1,000-fold increased affinity was obtained after several rounds of mutagenesis and screening. Hanes et al. [217] used mRNA and ribosome display to screen a synthetic library for single-chain antibody fragment (scFv) mutants with increased affinity. Selected mutants showed >40-fold increased affinity for their antigen compared to any of the progenitors.

Another class of protein therapeutics, engineered cytokines, is beneficial for the treatment of hepatitis C viral infections, hairy cell leukemia, and genital warts [218]. However, toxicity limits the dosage and thus the efficacy of treatments. In an attempt to mitigate dosage limitations, Chang et al. [219] shuffled 20 human interferon-α (IFN-α) cytokine genes to achieve enhanced activity in a murine cell-based assay for protection against a viral challenge. Screening fewer than 2,000 clones resulted in the identification of chimeras with these beneficial properties.

Enzymes have also been used as therapeutic agents in conjunction with reductive prodrugs for cancer chemotherapy. One such enzyme, YieF from E. coli, was identified, and Barak et al. [220] improved its reductive activity on the prodrugs mitomycin C and CB 1954 using epPCR, thereby enhancing its capacity to kill HeLa cells by over fivefold.

Directed evolution has also been used in developing gene delivery vectors. The most successful applications have been on the coat proteins of viral vectors, improving their target specificity as well as their pharmacokinetic properties. Soong et al. [221] shuffled the envelope protein-encoding genes of six murine leukemia virus (MLV) strains to confer CHOK1 (Chinese hamster ovary) cell tropism, which the parent viruses lacked. Powell et al. [222] also used DNA shuffling on six MLV strains, creating a library of 5 × 10^6 mutants, and isolated mutants with 30- to 100-fold increased stability to ultracentrifugation. Maheshri et al. [223] used epPCR followed by StEP to mutate the coat proteins of adeno-associated virus 2 (AAV2) to alter its receptor-binding properties. This allowed the mutants to evade neutralizing host antibodies in vivo and in vitro, increasing the potential of recombinant AAV2s as gene delivery vehicles.

2.4.4 Understanding Natural Evolution

Since directed evolution is an imitation of natural evolution, albeit at an extremely accelerated rate, should it not be able to enlighten us on the nature of natural evolution? So far, directed evolution has produced progeny that are only slightly different from their parents, which is especially evident in comparison to the myriad of divergent enzymatic functions present in Nature. However, the large number of protein crystal structures has shown that these differing functions are built on the scaffold of only a few thousand platforms. This reaffirms that most novel functions are probably derived by evolution of existing proteins, rather than de novo in “junk DNA”. The generally accepted idea now is that a protein acquires a novel function by first evolving a promiscuous activity, then undergoing gene duplication, and finally specializing toward the new function [24,224–228]. Aharoni et al. [24] demonstrated this pathway using directed evolution with three different types of enzymes: a serum paraoxonase, a phosphotriesterase, and a carbonic anhydrase II. By selecting for mutants with cross-reactivity, they found that loss of natural function is not required. This in itself was not particularly surprising, since there had already been several reports of successful directed evolution of enzymes for a broader range of activities without a selective pressure to retain the original function. However, further selection for improvement of the promiscuous activity led indirectly to specialization through decreased efficiency of the natural function. Following the same reasoning, Chen and Zhao designed a technique called in vitro coevolution, a stepwise and iterative process that selectively induces alternating promiscuity and specificity in a parental enzyme to develop novel functions [229]. Yoshikuni et al. [230] reported the divergent evolution of several novel, highly active, and specific enzymatic functions from a promiscuous parental enzyme by recombining mutations at key sites that conferred functional plasticity. More recently, Peisajovich et al. [23] demonstrated a pathway to the evolution of novel protein topologies from existing structures using directed evolution techniques. The authors showed that mutagenesis and gene rearrangements could systematically lead to the evolution of functional enzyme homologs, especially under relaxed selective pressure, without substantial loss of the host organism’s fitness. Their study also suggested that point mutations that have little effect on the functionality of the parental enzyme could, if allowed to accumulate, yield beneficial effects in evolved progeny.

2.5 Patent and Licensing Issues

Most uses of directed evolution are commercially focused; that is, improvements are generally made to proteins that have commercial value, and to those properties that limit their industrial usage. It should therefore not be surprising that several of the protocols discussed here are protected under patents. Up until March 2006, Hoffmann-La Roche held key patents on PCR (US4683195, US4683202, and US4965188), which covered its use as the foundation for most of the techniques described here. Patents on certain instruments, reagents, etc. remain valid for several more years, however. Nevertheless, PCR is an indispensable tool in molecular biology and biotechnology and has been used extensively, especially since the use of Mn²⁺ as a low-fidelity inducing agent was not covered under patents; neither was the use of several commercial polymerases such as Vent (New England BioLabs) and the Mutazyme® I and II systems (Stratagene). Affymax Technologies holds the patents US5605793 and US5830721 for DNA shuffling. Maxygen, Inc. holds US6506603, also for DNA shuffling, as well as US6521453 for synthetic shuffling. Finally, California Institute of Technology holds the patents for StEP (US6153410 and US6177263). MAX, an alternative to saturation mutagenesis, is protected by patents WO00/15777, held by Aston University and Amersham Pharmacia Biotech, and WO03/106679, held by Aston University. Diversa Corporation claims patents (US6562594, US6171820) on the use of a complete set of primers to mutagenize every position in a gene, as required by GSSM. The ITCHY family of methods, the most popular techniques for homology-independent recombination, owes part of that popularity to the fact that it is not protected by any patents. RID and ADO are also not protected by patents.

2.6 Outlook

Directed evolution has proven to be a powerful tool with applications for more than just engineering preexisting enzymes to industrial requirements. Although its primary focus is to tailor enzymes and proteins for commercial purposes, directed evolution has also been used to understand natural evolution and develop novel functions, which makes it an invaluable tool in molecular, genetic, and evolutionary biology. The successes have also relied primarily on only a few techniques in the available toolbox (epPCR and/or DNA shuffling for diversity generation, and microtiter plates and/or cellular compartmentalization for assay systems), but the use of novel tools should be accompanied by novel applications. In vitro display technologies such as ribosome/mRNA display have shown promise in overcoming the limitations imposed by the maximum transformation efficiency of cells. However, this is still a brute-force technique that circumvents, rather than overcomes, the problem of screening large libraries. To achieve quicker results, one needs to start with a mutagenic technique that creates a larger fraction of improved mutants. With recent advances in computational approaches, semi-rational design may help achieve this goal. Of course, the ultimate goal would be to know beforehand exactly which mutations would provide the desired outcome, but that would require absolute foreknowledge, something that would require outsmarting Mother Nature herself. The chances of this are slim, even in the distant future, so until then directed evolution and computational design will have to work synergistically to develop innovative new technologies and products.
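The link between the fraction of improved mutants in a library and the screening effort can be made concrete with a back-of-the-envelope calculation. The sketch below is an illustration only and does not come from any of the cited studies. It assumes that clones are picked independently from a library much larger than the number screened, and that a fixed fraction f of library members is improved, so that the probability of seeing at least one improved clone in n picks is 1 - (1 - f)^n. The function name `clones_to_screen` and the example fractions are hypothetical.

```python
import math


def clones_to_screen(fraction_improved: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - f)**n >= confidence.

    Assumption (not from the text): clones are sampled independently and
    each one is improved with the same probability f.
    """
    if not 0.0 < fraction_improved < 1.0:
        raise ValueError("fraction_improved must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Solve (1 - f)**n <= 1 - confidence for n; log1p keeps precision for tiny f.
    n = math.log1p(-confidence) / math.log1p(-fraction_improved)
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical fractions of improved variants in a library.
    for f in (1e-5, 1e-4, 1e-3, 1e-2):
        print(f"f = {f:g}: screen about {clones_to_screen(f)} clones")
```

Under these assumptions, the required number of clones scales roughly as 1/f, so a mutagenesis strategy that raises the fraction of improved variants tenfold reduces the screening burden by about the same factor.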

Acknowledgments

We gratefully acknowledge financial support from the Biotechnology Research and Development Consortium (BRDC) (Project 2-4-121), the Office of Naval Research (N00014-02-1-0725), the National Science Foundation (BES-0348107), the Department of Energy, and the DuPont Company.

References

1. Patnaik, R., et al. Genome shuffling of Lactobacillus for improved acid tolerance. Nat. Biotechnol., 20(7), 707–12, 2002. 2. Zhang, Y.X., et al. Genome shuffling leads to rapid phenotypic improvement in bacteria. Nature, 415(6872), 644–46, 2002. 3. Kim, D. and F.P. Guengerich. Selection of human cytochrome P450 1A2 mutants with enhanced catalytic activity for heterocyclic amine N-hydroxylation. Biochemistry, 43(4), 981–88, 2004. 4. Kumar, S., et al. Directed evolution of mammalian cytochrome P450 2B1: mutations outside of the active site enhance the metabolism of several substrates, including the anticancer prodrugs cyclophosphamide and ifosfamide. J. Biol. Chem., 280(20), 19569–75, 2005. 5. Wang, C.L., D.C. Yang, and M. Wabl. Directed molecular evolution by somatic hypermutation. Protein Eng. Des. Sel., 17(9), 659–64, 2004. 6. Rajagopalan, P.T., S. Lutz, and S.J. Benkovic. Coupling interactions of distal residues enhance dihydrofolate reductase catalysis: mutational effects on hydride transfer rates. Biochemistry, 41(42), 12618–28, 2002. 7. Iffland, A., et al. Directed molecular evolution of cytochrome c peroxidase. Biochemistry, 39(35), 10790–98, 2000. 8. Horsman, G.P., et al. Mutations in distant residues moderately increase the enantioselectivity of Pseudomonas fluorescens esterase towards methyl 3-bromo-2-methylpropanoate and ethyl 3-phenylbutyrate. Chemistry, 9(9), 1933–39, 2003. 9. Mills, D.R., R.L. Peterson, and S. Spiegelman. An extracellular Darwinian experiment with a self-duplicating nucleic acid molecule. Proc. Natl. Acad. Sci. USA, 58(1), 217–24, 1967. 10. Chang, C.C., et al. Evolution of a cytokine using DNA family shuffling. Nat. Biotechnol., 17(8), 793–97, 1999. 11. Xu, L., et al. Directed evolution of high-affinity antibody mimics using mRNA display. Chem. Biol., 9(8), 933–42, 2002.

12. Leong, S.R., et al. Optimized expression and specific activity of IL-12 by directed molecular evolution. Proc. Natl. Acad. Sci. USA, 100(3), 1163–68, 2003. 13. Midelfort, K.S. and K.D. Wittrup. Context-dependent mutations predominate in an engineered high-affinity single chain antibody fragment. Protein Sci., 15(2), 324–34, 2006. 14. Maheshri, N., et al. Directed evolution of adeno-associated virus yields enhanced gene delivery vectors. Nat. Biotechnol., 24(2), 198–204, 2006. 15. O’Loughlin, T.L., D.N. Greene, and I. Matsumura. Diversification and specialization of HIV protease function during in vitro evolution. Mol. Biol. Evol., 23(4), 764–72, 2006. 16. Perabo, L., et al. Combinatorial engineering of a gene therapy vector: directed evolution of adeno-associated virus. J. Gene Med., 8(2), 155–62, 2006. 17. Kather, I., C.A. Bippes, and F.X. Schmid. A stable disulfide-free gene-3-protein of phage fd generated by in vitro evolution. J. Mol. Biol., 354(3), 666–78, 2005. 18. Mijts, B.N., P.C. Lee, and C. Schmidt-Dannert. Identification of a carotenoid oxygenase synthesizing acyclic xanthophylls: combinatorial biosynthesis and directed evolution. Chem. Biol., 12(4), 453–60, 2005. 19. Lee, P.C., et al. Directed evolution of Escherichia coli farnesyl diphosphate synthase (IspA) reveals novel structural determinants of chain length specificity. Metab. Eng., 7(1), 18–26, 2005. 20. Lee, P.C., et al. Alteration of product specificity of Aeropyrum pernix farnesylgeranyl diphosphate synthase (Fgs) by directed evolution. Protein Eng. Des. Sel., 17(11), 771–77, 2004. 21. Umeno, D. and F.H. Arnold. Evolution of a pathway to novel long-chain carotenoids. J. Bacteriol., 186(5), 1531–36, 2004. 22. Umeno, D. and F.H. Arnold. A C35 carotenoid biosynthetic pathway. Appl. Environ. Microbiol., 69(6), 3573–79, 2003. 23. Peisajovich, S.G., L. Rockah, and D.S. Tawfik. Evolution of new protein topologies through multistep gene rearrangements. Nat. Genet., 38(2), 168–74, 2006. 24.
Aharoni, A., et al. The ‘evolvability’ of promiscuous protein functions. Nat. Genet., 37(1), 73–76, 2005. 25. Drummond, D.A., et al. Why high-error-rate random mutagenesis libraries are enriched in functional and improved proteins. J. Mol. Biol., 350(4), 806–16, 2005. 26. Park, H.S., et al. Design and evolution of new catalytic activity with an existing protein scaffold. Science, 311(5760), 535–38, 2006. 27. Fujii, R., M. Kitaoka, and K. Hayashi. RAISE: a simple and novel method of generating random insertion and deletion mutations. Nucleic Acids Res., 34(4), e30, 2006. 28. Osuna, J., et al. Protein evolution by codon-based random deletions. Nucleic Acids Res., 32(17), e136, 2004. 29. Murakami, H., T. Hohsaka, and M. Sisido. Random insertion and deletion of arbitrary number of bases for codon-based random mutation of DNAs. Nat. Biotechnol., 20(1), 76–81, 2002. 30. Hayes, F., B. Hallet, and Y. Cao. Insertion mutagenesis as a tool in the modification of protein function. Extended substrate specificity conferred by pentapeptide insertions in the omega-loop of TEM-1 beta-lactamase. J. Biol. Chem., 272(46), 28833–36, 1997. 31. Bornscheuer, U.T., J. Altenbuchner, and H.H. Meyer. Directed evolution of an esterase for the stereoselective resolution of a key intermediate in the synthesis of epothilones. Biotechnol. Bioeng., 58(5), 554–59, 1998. 32. Selifonova, O., F. Valle, and V. Schellenberger. Rapid evolution of novel traits in microorganisms. Appl. Environ. Microbiol., 67(8), 3645–49, 2001. 33. Coia, G., et al. Use of mutator cells as a means for increasing production levels of a recombinant antibody directed against Hepatitis B. Gene, 201(1–2), 203–9, 1997. 34. Henke, E. and U.T. Bornscheuer. Directed evolution of an esterase from Pseudomonas fluorescens. Random mutagenesis by error-prone PCR or a mutator strain and identification of mutants showing enhanced enantioselectivity by a resorufin-based fluorescence assay. Biol. Chem., 380(7–8), 1029–33, 1999.

35. Botstein, D. and D. Shortle. Strategies and applications of in vitro mutagenesis. Science, 229(4719), 1193–201, 1985. 36. Djordjevic, B. and O. Djordjevic. Chromosomal aberrations in synchronized mammalian cells treated with 5-bromo-deoxyuridine and irradiated by ultra-violet light. Nature, 206(989), 1165–66, 1965. 37. Pitsikas, P., J.M. Patapas, and C.G. Cupples. Mechanism of 2-aminopurine-stimulated mutagenesis in Escherichia coli. Mutat. Res., 550(1–2), 25–32, 2004. 38. Lai, Y.P., et al. A new approach to random mutagenesis in vitro. Biotechnol. Bioeng., 86(6), 622–27, 2004. 39. Reznikoff, C.A. and R. DeMars. In vitro chemical mutagenesis and viral transformation of a human endothelial cell strain. Cancer Res., 41(3), 1114–26, 1981. 40. Shortle, D. and D. Botstein. Directed mutagenesis with sodium bisulfite. Methods Enzymol., 100, 457–68, 1983. 41. Tripathi, A.K. and H.D. Kumar. Mutagenesis by ethidium bromide, proflavine and mitomycin C in the cyanobacterium Nostoc sp. Mutat. Res., 174(3), 175–78, 1986. 42. Beckman, R.A., A.S. Mildvan, and L.A. Loeb. On the fidelity of DNA replication: manganese mutagenesis in vitro. Biochemistry, 24(21), 5810–17, 1985. 43. Cadwell, R.C. and G.F. Joyce. Mutagenic PCR. PCR Methods Appl., 3(6), S136–40, 1994. 44. Cadwell, R.C. and G.F. Joyce. Randomization of genes by PCR mutagenesis. PCR Methods Appl., 2(1), 28–33, 1992. 45. Fromant, M., S. Blanquet, and P. Plateau. Direct random mutagenesis of gene-sized DNA fragments using polymerase chain reaction. Anal. Biochem., 224(1), 347–53, 1995. 46. Leung, D.W., E. Chen, and D.V. Goeddel. A method for random mutagenesis of a defined DNA segment using a modified polymerase chain reaction. Technique, 1, 11–15, 1989. 47. Arnold, F.H. Enzyme engineering reaches the boiling point. Proc. Natl. Acad. Sci. USA, 95(5), 2035–36, 1998. 48. Suzuki, M., et al. Tolerance of different proteins for amino acid diversity. Mol. Divers, 2(1–2), 111–18, 1996. 49. Daugherty, P.S., et al. 
Quantitative analysis of the effect of the mutation frequency on the affinity maturation of single chain Fv antibodies. Proc. Natl. Acad. Sci. USA, 97(5), 2029–34, 2000. 50. Shafikhani, S., et al. Generation of large libraries of random mutants in Bacillus subtilis by PCRbased plasmid multimerization. Biotechniques, 23(2), 304–10, 1997. 51. Bloom, J.D., et al. Thermodynamic prediction of protein neutrality. Proc. Natl. Acad. Sci. USA, 102(3), 606–11, 2005. 52. Kunichika, K., Y. Hashimoto, and T. Imoto. Robustness of hen lysozyme monitored by random mutations. Protein Eng., 15(10), 805–9, 2002. 53. Wong, T.S., et al. Sequence saturation mutagenesis (SeSaM): a novel method for directed evolution. Nucleic Acids Res., 32(3), e26, 2004. 54. Fujii, R., M. Kitaoka, and K. Hayashi. One-step random mutagenesis by error-prone rolling circle amplification. Nucleic Acids Res., 32(19), e145, 2004. 55. Liebeton, K., et al. Directed evolution of an enantioselective lipase. Chem. Biol., 7(9), 709–18, 2000. 56. Hallet, B., D.J. Sherratt, and F. Hayes. Pentapeptide scanning mutagenesis: random insertion of a variable five amino acid cassette in a target protein. Nucleic Acids Res., 25(9), 1866–67, 1997. 57. Kashiwagi, K., et al. Frame shuffling: a novel method for in vitro protein evolution. Protein Eng. Des. Sel., 19(3), 135–40, 2006. 58. Shiba, K., Y. Takahashi, and T. Noda. Creation of libraries with long ORFs by polymerization of a microgene. Proc. Natl. Acad. Sci. USA, 94(8), 3805–10, 1997. 59. Zaccolo, M. and E. Gherardi. The effect of high-frequency random mutagenesis on in vitro protein evolution: a study on TEM-1 beta-lactamase. J. Mol. Biol., 285(2), 775–83, 1999. 60. Wong, T.S., et al. A statistical analysis of random mutagenesis methods used for directed protein evolution. J. Mol. Biol., 355(4), 858–71, 2006.

61. Bosley, A.D. and M. Ostermeier. Mathematical expressions useful in the construction, description and evaluation of protein libraries. Biomol. Eng., 22(1–3), 57–61, 2005. 62. Steinbachs, J.E. and K.E. Holsinger. S-RNase-mediated gametophytic self-incompatibility is ancestral in eudicots. Mol. Biol. Evol., 19(6), 825–29, 2002. 63. Wang, Y., et al. S-RNase-mediated self-incompatibility. J. Exp. Bot., 54(380), 115–22, 2003. 64. Stemmer, W.P. Rapid evolution of a protein in vitro by DNA shuffling. Nature, 370(6488), 389–91, 1994. 65. Crameri, A., et al. DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature, 391(6664), 288–91, 1998. 66. Joern, J.M., P. Meinhold, and F.H. Arnold. Analysis of shuffled gene libraries. J. Mol. Biol., 316(3), 643–56, 2002. 67. Gibbs, M.D., K.M. Nevalainen, and P.L. Bergquist. Degenerate oligonucleotide gene shuffling (DOGS): a method for enhancing the frequency of recombination with family shuffling. Gene, 271(1), 13–20, 2001. 68. Sambrook, J., E.F. Fritsch, and T. Maniatis. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, 1989. 69. Kikuchi, M., K. Ohnishi, and S. Harayama. An effective family shuffling method using singlestranded DNA. Gene, 243(1–2), 133–37, 2000. 70. Coco, W.M. RACHITT: gene family shuffling by random chimeragenesis on transient templates. Methods Mol. Biol., 231, 111–27, 2003. 71. Muller, K.M., et al. Nucleotide exchange and excision technology (NExT) DNA shuffling: a robust method for DNA fragmentation and directed evolution. Nucleic Acids Res., 33(13), e117, 2005. 72. Shao, Z., et al. Random-priming in vitro recombination: an effective tool for directed evolution. Nucleic Acids Res., 26(2), 681–83, 1998. 73. Zhao, H., et al. Molecular evolution by staggered extension process (StEP) in vitro recombination. Nat. Biotechnol., 16(3), 258–61, 1998. 74. Ness, J.E., et al. 
Synthetic shuffling expands functional protein diversity by allowing amino acids to recombine independently. Nat. Biotechnol., 20(12), 1251–55, 2002. 75. Zha, D., A. Eipper, and M.T. Reetz. Assembly of designed oligonucleotides as an efficient method for gene recombination: a new tool in directed evolution. Chembiochem. 4(1), 34–39, 2003. 76. Abecassis, V., D. Pompon, and G. Truan. High efficiency family shuffling based on multi-step PCR and in vivo DNA recombination in yeast: statistical and functional analysis of a combinatorial library between human cytochrome P450 1A1 and 1A2. Nucleic Acids Res., 28(20), e88, 2000. 77. Volkov, A.A., Z. Shao, and F.H. Arnold. Recombination and chimeragenesis by in vitro heteroduplex formation and in vivo repair. Nucleic Acids Res., 27(18), e18, 1999. 78. Xu, S., et al. Directed evolution of extradiol dioxygenase by a novel in vivo DNA shuffling. Gene, 368, 126–37, 2006. 79. Ostermeier, M., J.H. Shim, and S.J. Benkovic. A combinatorial approach to hybrid enzymes independent of DNA homology. Nat. Biotechnol., 17(12), 1205–9, 1999. 80. Hocker, B. Directed evolution of (betaalpha)(8)-barrel enzymes. Biomol. Eng., 22(1–3), 31–38, 2005. 81. Farber, G.K. and G.A. Petsko. The evolution of alpha/beta barrel enzymes. Trends Biochem Sci, 15(6), 228–34, 1990. 82. Sieber, V., C.A. Martinez, and F.H. Arnold. Libraries of hybrid proteins from distantly related sequences. Nat. Biotechnol., 19(5), 456–60, 2001. 83. Lutz, S., M. Ostermeier, and S.J. Benkovic. Rapid generation of incremental truncation libraries for protein engineering using alpha-phosphothioate nucleotides. Nucleic Acids Res., 29(4), e16, 2001. 84. Lutz, S., et al. Creating multiple-crossover DNA libraries independent of sequence identity. Proc. Natl. Acad. Sci. USA, 98(20), 11248–53, 2001.

85. Kawarasaki, Y., et al. Enhanced crossover SCRATCHY: construction and high-throughput screening of a combinatorial library containing multiple non-homologous crossovers. Nucleic Acids Res., 31(21), e126, 2003. 86. Kolkman, J.A. and W.P. Stemmer. Directed evolution of proteins by exon shuffling. Nat. Biotechnol., 19(5), 423–28, 2001. 87. Tsuji, T., M. Onimaru, and H. Yanagawa. Random multi-recombinant PCR for the construction of combinatorial protein libraries. Nucleic Acids Res., 29(20), e97, 2001. 88. Coco, W.M., et al. Growth factor engineering by degenerate homoduplex gene family recombination. Nat. Biotechnol., 20(12), 1246–50, 2002. 89. Bittker, J.A., B.V. Le, and D.R. Liu. Nucleic acid evolution and minimization by nonhomologous random recombination. Nat. Biotechnol., 20(10), 1024–29, 2002. 90. Bittker, J.A., et al. Directed evolution of protein enzymes using nonhomologous random recombination. Proc. Natl. Acad. Sci. USA, 101(18), 7011–16, 2004. 91. Kitamura, K., et al. Construction of block-shuffled libraries of DNA for evolutionary protein engineering: Y-ligation-based block shuffling. Protein Eng., 15(10), 843–53, 2002. 92. Ostermeier, M., A.E. Nixon, and S.J. Benkovic. Incremental truncation as a strategy in the engineering of novel biocatalysts. Bioorg. Med. Chem., 7(10), 2139–44, 1999. 93. Ostermeier, M. Theoretical distribution of truncation lengths in incremental truncation libraries. Biotechnol. Bioeng., 82(5), 564–77, 2003. 94. Firth, A.E. and W.M. Patrick. Statistics of protein library construction. Bioinformatics, 21(15), 3314–15, 2005. 95. Dalby, P.A. Optimising enzyme function by directed evolution. Curr. Opin. Struct. Biol., 13(4), 500– 5, 2003. 96. Park, S., et al. Focusing mutations into the P. fluorescens esterase binding site increases enantioselectivity more effectively than distant mutations. Chem. Biol., 12(1), 45–54, 2005. 97. Arnold, F.H. When blind is better: protein design by evolution. Nat. Biotechnol., 16(7), 617–18, 1998. 98. 
Hughes, M.D., et al. Removing the redundancy from randomised gene libraries. J. Mol. Biol., 331(5), 973–79, 2003. 99. Patrick, W.M., A.E. Firth, and J.M. Blackburn. User-friendly algorithms for estimating completeness and diversity in randomized protein-encoding libraries. Protein Eng., 16(6), 451–57, 2003. 100. Chockalingam, K., et al. Directed evolution of specific receptor-ligand pairs for use in the creation of gene switches. Proc. Natl. Acad. Sci. USA, 102(16), 5691–96, 2005. 101. Schmitzer, A.R., F. Lepine, and J.N. Pelletier. Combinatorial exploration of the catalytic site of a drug-resistant dihydrofolate reductase: creating alternative functional configurations. Protein Eng. Des. Sel., 17(11), 809–19, 2004. 102. Reetz, M.T., et al. Expanding the range of substrate acceptance of enzymes: combinatorial active-site saturation test. Angew Chem. Int. Ed. Engl., 44(27), 4192–96, 2005. 103. Kretz, K.A., et al. Gene site saturation mutagenesis: a comprehensive mutagenesis approach. Methods Enzymol., 388, 3–11, 2004. 104. Amin, N., et al. Construction of stabilized proteins by combinatorial consensus mutagenesis. Protein Eng. Des. Sel., 17(11), 787–93, 2004. 105. Voigt, C.A., et al. Protein building blocks preserved by recombination. Nat. Struct. Biol., 9(7), 553–58, 2002. 106. Saraf, M.C., et al. FamClash: a method for ranking the activity of engineered enzymes. Proc. Natl. Acad. Sci. USA, 101(12), 4142–47, 2004. 107. O’Maille, P.E., M. Bakhtina, and M.D. Tsai. Structure-based combinatorial protein engineering (SCOPE). J. Mol. Biol., 321(4), 677–91, 2002. 108. Hiraga, K. and F.H. Arnold. General method for sequence-independent site-directed chimeragenesis. J. Mol. Biol., 330(2), 287–96, 2003.

109. Voller, A., et al. A microplate method of enzyme-linked immunosorbent assay and its application to malaria. Bull World Health Organ., 51(2), 209–11, 1974. 110. Moore, J.C. and F.H. Arnold. Directed evolution of a para-nitrobenzyl esterase for aqueous-organic solvents. Nat. Biotechnol., 14(4), 458–67, 1996. 111. Bruno, J.G. and J.L. Kiel. In vitro selection of DNA aptamers to anthrax spores with electrochemiluminescence detection. Biosens. Bioelectron., 14(5), 457–64, 1999. 112. Bosworth, N. and P. Towers. Scintillation proximity assay. Nature, 341(6238), 167–68, 1989. 113. Pope, A.J., U.M. Haupts, and K.J. Moore. Homogeneous fluorescence readouts for miniaturized high-throughput screening: theory and practice. Drug Discovery Today, 4(8), 350–62, 1999. 114. Crameri, A., et al. Improved green fluorescent protein by molecular evolution using DNA shuffling. Nature Biotechnol., 14(3), 315–19, 1996. 115. Grunewald, J., et al. Fluorescence resonance energy transfer as a probe of peptide cyclization catalyzed by nonribosomal thioesterase domains. Chem. Biol., 12(8), 873–81, 2005. 116. Lueking, A., et al. Protein microarrays for gene expression and antibody screening. Anal. Biochem., 270(1), 103–11, 1999. 117. Ge, H. UPA, a universal protein array system for quantitative detection of protein-protein, proteinDNA, protein-RNA and protein-ligand interactions. Nucleic Acids Res., 28(2), e3, 2000. 118. MacBeath, G. and S.L. Schreiber. Printing proteins as microarrays for high-throughput function determination. Science, 289(5485), 1760–63, 2000. 119. Rowe, C.A., et al. Array biosensor for simultaneous identification of bacterial, viral, and protein analytes. Anal. Chem., 71(17), 3846–52, 1999. 120. Arenkov, P., et al. Protein microchips: use for immunoassay and enzymatic reactions. Anal. Biochem., 278(2), 123–31, 2000. 121. Schweitzer, B., et al. Inaugural article: immunoassays with rolling circle DNA amplification: a versatile platform for ultrasensitive antigen detection. 
Proc. Natl. Acad. Sci. USA, 97(18), 10113–19, 2000. 122. Zhu, H., et al. Analysis of yeast protein kinases using protein chips. Nat. Genet., 26(3), 283–89, 2000. 123. Tomatis, P.E., et al. Mimicking natural evolution in metallo-beta-lactamases through second-shell ligand mutations. Proc. Natl. Acad. Sci. USA, 102(39), 13761–66, 2005. 124. Waldo, G.S., et al. Rapid protein-folding assay using green fluorescent protein. Nature Biotechnol., 17(7), 691–95, 1999. 125. Aharoni, A., et al. Directed evolution of mammalian paraoxonases PON1 and PON3 for bacterial expression and catalytic specialization. Proc. Natl. Acad. Sci. USA, 101(2), 482–87, 2004. 126. van den Berg, S., et al. Improved solubility of TEV protease by directed evolution. J. Biotechnol., 121(3), 291–98, 2006. 127. McLoughlin, S.Y., et al. Growth of Escherichia coli coexpressing phosphotriesterase and glycerophosphodiester phosphodiesterase, using paraoxon as the sole phosphorus source. Appl. Environ. Microbiol., 70(1), 404–12, 2004. 128. Sakamoto, T., et al. Laboratory evolution of toluene dioxygenase to accept 4-picoline as a substrate. Appl. Environ. Microbiol., 67(9), 3882–87, 2001. 129. Drees, B.L. Progress and variations in two-hybrid and three-hybrid technologies. Curr. Opin. Chem. Biol., 3(1), 64–70, 1999. 130. McNabb, D.S. and L. Guarente. Genetic and biochemical probes for protein-protein interactions. Curr. Opin. Biotechnol., 7(5), 554–59, 1996. 131. Warbrick, E. Two’s company, three’s a crowd: the yeast two hybrid system for mapping molecular interactions. Structure, 5(1), 13–17, 1997. 132. Brachmann, R.K. and J.D. Boeke. Tag games in yeast: the two-hybrid system and beyond. Curr. Opin. Biotechnol., 8(5), 561–68, 1997. 133. Vidal, M. and P. Legrain. Yeast forward and reverse ‘n’-hybrid systems. Nucleic Acids Res., 27(4), 919–29, 1999.

134. Colas, P. and R. Brent. The impact of two-hybrid and related methods on biotechnology. Trends Biotechnol., 16(8), 355–63, 1998. 135. Lin, H. and V.W. Cornish. In vivo protein-protein interaction assays: beyond proteins. Angew Chem. Int. Ed. Engl., 40(5), 871–75, 2001. 136. Tawfik, D.S. and A.D. Griffiths. Man-made cell-like compartments for molecular evolution. Nat. Biotechnol., 16(7), 652–56, 1998. 137. Ghadessy, F.J., J.L. Ong, and P. Holliger. Directed evolution of polymerase function by compartmentalized self-replication. Proc. Natl. Acad. Sci. USA, 98(8), 4552–57, 2001. 138. Cohen, H.M., D.S. Tawfik, and A.D. Griffiths. Altering the sequence specificity of HaeIII methyltransferase by directed evolution using in vitro compartmentalization. Protein Eng. Des. Sel., 17(1), 3–11, 2004. 139. Bernath, K., S. Magdassi, and D.S. Tawfik. Directed evolution of protein inhibitors of DNA-nucleases by in vitro compartmentalization (IVC) and nano-droplet delivery. J. Mol. Biol., 345(5), 1015–26, 2005. 140. Levy, M., K.E. Griswold, and A.D. Ellington. Direct selection of trans-acting ligase ribozymes by in vitro compartmentalization. RNA, 11(10), 1555–62, 2005. 141. Sepp, A. and Y. Choo. Cell-free selection of zinc finger DNA-binding proteins using in vitro compartmentalization. J. Mol. Biol., 354(2), 212–19, 2005. 142. Smith, G.P. and V.A. Petrenko. Phage Display. Chem. Rev., 97(2), 391–410, 1997. 143. Sidhu, S.S. Phage display in pharmaceutical biotechnology. Curr. Opin. Biotechnol., 11(6), 610–16, 2000. 144. Rodi, D.J. and L. Makowski. Phage-display technology—finding a needle in a vast molecular haystack. Curr. Opin. Biotechnol., 10(1), 87–93, 1999. 145. Forrer, P., S. Jung, and A. Pluckthun. Beyond binding: using phage display to select for structure, folding and enzymatic activity in proteins. Curr. Opin. Struct. Biol., 9(4), 514–20, 1999. 146. Smith, G.P. Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface. 
Science, 228(4705), 1315–17, 1985. 147. Sblattero, D. and A. Bradbury. Exploiting recombination in single bacteria to make large phage antibody libraries. Nat. Biotechnol., 18(1), 75–80, 2000. 148. Parmley, S.F. and G.P. Smith. Filamentous fusion phage cloning vectors for the study of epitopes and design of vaccines. Adv. Exp. Med. Biol., 251, 215–18, 1989. 149. Crameri, R. and M. Suter. Display of biologically active proteins on the surface of filamentous phages: a cDNA cloning system for selection of functional gene products linked to the genetic information responsible for their production. Gene, 137(1), 69–75, 1993. 150. Gao, C., et al. A cell-penetrating peptide from a novel pVII-pIX phage-displayed random peptide library. Bioorg. Med. Chem., 10(12), 4057–65, 2002. 151. Kwasnikowski, P., P. Kristensen, and W.T. Markiewicz. Multivalent display system on filamentous bacteriophage pVII minor coat protein. J. Immunol. Methods, 307(1–2), 135–43, 2005. 152. Deng, Q., et al. Screening for PreS specific binding ligands with a phage displayed peptides library. World J. Gastroenterol., 11(26), 4018–23, 2005. 153. Gao, C., et al. A method for the generation of combinatorial antibody libraries using pIX phage display. Proc. Natl. Acad. Sci. USA, 99(20), 12612–16, 2002. 154. Hoogenboom, H.R. and P. Chames. Natural and designer binding sites made by phage display technology. Immunol. Today, 21(8), 371–8, 2000. 155. Dall’Acqua, W. and P. Carter. Antibody engineering. Curr. Opin. Struct. Biol., 8(4), 443–50, 1998. 156. Yu, J., et al. A glycosidase antibody elicited against a chair-like transition state analog by in vitro immunization. Proc. Natl. Acad. Sci. USA, 95(6), 2880–84, 1998. 157. Baca, M., et al. Phage display of a catalytic antibody to optimize affinity for transition-state analog binding. Proc. Natl. Acad. Sci. USA, 94(19), 10063–68, 1997.

158. Janda, K.D., et al. Chemical selection for catalysis in combinatorial antibody libraries. Science, 275(5302), 945–48, 1997. 159. Choo, Y. and A. Klug. Toward a code for the interactions of zinc fingers with DNA: selection of randomized fingers displayed on phage. Proc. Natl. Acad. Sci. USA, 91(23), 11163–67, 1994. 160. Jamieson, A.C., S.H. Kim, and J.A. Wells. In vitro selection of zinc fingers with altered DNA-binding specificity. Biochemistry, 33(19), 5689–95, 1994. 161. Rebar, E.J. and C.O. Pabo. Zinc finger phage: affinity selection of fingers with new DNA-binding specificities. Science, 263(5147), 671–73, 1994. 162. Beste, G., et al. Small antibody-like proteins with prescribed ligand specificities derived from the lipocalin fold. Proc. Natl. Acad. Sci. USA, 96(5), 1898–903, 1999. 163. Schlehuber, S., G. Beste, and A. Skerra. A novel type of receptor protein, based on the lipocalin scaffold, with specificity for digoxigenin. J. Mol. Biol., 297(5), 1105–20, 2000. 164. Soumillion, P., et al. Selection of beta-lactamase on filamentous bacteriophage by catalytic activity. J. Mol. Biol., 237(4), 415–22, 1994. 165. Demartis, S., et al. A strategy for the isolation of catalytic activities from repertoires of enzymes displayed on phage. J. Mol. Biol., 286(2), 617–33, 1999. 166. Pedersen, H., et al. A method for directed evolution and functional cloning of enzymes. Proc. Natl. Acad. Sci. USA, 95(18), 10523–28, 1998. 167. Atwell, S. and J.A. Wells. Selection for improved subtiligases by phage display. Proc. Natl. Acad. Sci. USA, 96(17), 9497–502, 1999. 168. Ponsard, I., et al. Selection of metalloenzymes by catalytic activity using phage display and catalytic elution. Chembiochem, 2(4), 253–59, 2001. 169. Francisco, J.A., C.F. Earhart, and G. Georgiou. Transport and anchoring of beta-lactamase to the external surface of Escherichia coli. Proc. Natl. Acad. Sci. USA, 89(7), 2713–17, 1992. 170. Boder, E.T. and K.D. Wittrup. 
Yeast surface display for screening combinatorial polypeptide libraries. Nat. Biotechnol., 15(6), 553–57, 1997. 171. Ernst, W., et al. Baculovirus surface display: construction and screening of a eukaryotic epitope library. Nucleic Acids Res., 26(7), 1718–23, 1998. 172. Christmann, A., et al. The cystine knot of a squash-type protease inhibitor as a structural scaffold for Escherichia coli cell surface display of conformationally constrained peptides. Protein Eng., 12(9), 797–806, 1999. 173. Daugherty, P.S., et al. Antibody affinity maturation using bacterial surface display. Protein Eng., 11(9), 825–32, 1998. 174. Boder, E.T., K.S. Midelfort, and K.D. Wittrup. Directed evolution of antibody fragments with monovalent femtomolar antigen-binding affinity. Proc. Natl. Acad. Sci. USA, 97(20), 10701–5, 2000. 175. Kieke, M.C., et al. Selection of functional T cell receptor mutants from a yeast surface-display library. Proc. Natl. Acad. Sci. USA, 96(10), 5651–56, 1999. 176. Olsen, M.J., et al. Function-based isolation of novel enzymes from a large library. Nat. Biotechnol., 18(10), 1071–74, 2000. 177. Mattheakis, L.C., R.R. Bhatt, and W.J. Dower. An in vitro polysome display system for identifying ligands from very large peptide libraries. Proc. Natl. Acad. Sci. USA, 91(19), 9022–26, 1994. 178. Hanes, J. and A. Pluckthun. In vitro selection and evolution of functional proteins by using ribosome display. Proc. Natl. Acad. Sci. USA, 94(10), 4937–42, 1997. 179. Roberts, R.W. and J.W. Szostak. RNA–peptide fusions for the in vitro selection of peptides and proteins. Proc. Natl. Acad. Sci. USA, 94(23), 12297–302, 1997. 180. Nemoto, N., et al. In vitro virus: bonding of mRNA bearing puromycin at the 3’-terminal end to the C-terminal end of its encoded protein on the ribosome in vitro. FEBS Lett., 414(2), 405–8, 1997.

Improving Protein Functions by Directed Evolution

2-35

181. Cull, M.G., J.F. Miller, and P.J. Schatz. Screening for receptor ligands using large libraries of peptides linked to the C terminus of the lac repressor. Proc. Natl. Acad. Sci. USA, 89(5), 1865–69, 1992. 182. Bloom, J.D., et al. Evolving strategies for enzyme engineering. Curr. Opin. Struct. Biol., 15(4), 447–52, 2005. 183. Parales, R.E. and J.L. Ditty. Laboratory evolution of catabolic enzymes and pathways. Curr. Opin. Biotechnol., 16(3), 315–25, 2005. 184. Hibbert, E.G., et al. Directed evolution of biocatalytic processes. Biomol. Eng., 22(1–3), 11–19, 2005. 185. Chica, R.A., N. Doucet, and J.N. Pelletier. Semi-rational approaches to engineering enzyme activity: combining the benefits of directed evolution and rational design. Curr. Opin. Biotechnol., 16(4), 378–84, 2005. 186. Powell, K.A., et al. Directed evolution and biocatalysis. Angew Chem. Int. Ed. Engl., 40(21), 3948–59, 2001. 187. Otten, L.G. and W.J. Quax. Directed evolution: selecting today’s biocatalysts. Biomol. Eng., 22(1–3), 1–9, 2005. 188. Hibbert, E.G. and P.A. Dalby. Directed evolution strategies for improved enzymatic performance. Microb. Cell. Fact, 4, 1–6, 2005. 189. Williams, G.J., A.S. Nelson, and A. Berry. Directed evolution of enzymes for biocatalysis and the life sciences. Cell. Mol. Life Sci., 61(24), 3034–46, 2004. 190. Qian, Z. and S. Lutz. Improving the catalytic activity of Candida antarctica lipase B by circular permutation. J. Am. Chem. Soc., 127(39), 13466–67, 2005. 191. Castle, L.A., et al. Discovery and directed evolution of a glyphosate tolerance gene. Science, 304(5674), 1151–54, 2004. 192. Fong, S., et al. Directed evolution of D-2-keto-3-deoxy-6-phosphogluconate aldolase to new variants for the efficient synthesis of D- and L-sugars. Chem. Biol., 7(11), 873–83, 2000. 193. Cho, C.M., A. Mulchandani, and W. Chen. Altering the substrate specificity of organophosphorus hydrolase for enhanced hydrolysis of chlorpyrifos. Appl. Environ. Microbiol., 70(8), 4681–85, 2004. 194. 
Raillard, S., et al. Novel enzyme activities and functional plasticity revealed by recombining highly homologous enzymes. Chem. Biol., 8(9), 891–98, 2001. 195. Antikainen, N.M., et al. Altering substrate specificity of phosphatidylcholine-preferring phospholipase C of Bacillus cereus by random mutagenesis of the headgroup binding site. Biochemistry, 42(6), 1603–10, 2003. 196. Rothman, S.C., M. Voorhies, and J.F. Kirsch. Directed evolution relieves product inhibition and confers in vivo function to a rationally designed tyrosine aminotransferase. Protein Sci., 13(3), 763–72, 2004. 197. Onuffer, J.J. and J.F. Kirsch. Redesign of the substrate specificity of Escherichia coli aspartate aminotransferase to that of Escherichia coli tyrosine aminotransferase by homology modeling and sitedirected mutagenesis. Protein Sci., 4(9), 1750–57, 1995. 198. Santoro, S.W., et al. An efficient system for the evolution of aminoacyl-tRNA synthetase specificity. Nat. Biotechnol., 20(10), 1044–48, 2002. 199. van Loo, B., et al. Directed evolution of epoxide hydrolase from A. radiobacter toward higher enantioselectivity by error-prone PCR and DNA shuffling. Chem. Biol., 11(7), 981–90, 2004. 200. Johannes, T.W., R.D. Woodyer, and H. Zhao. Directed evolution of a thermostable phosphite dehydrogenase for NAD(P)H regeneration. Appl. Environ. Microbiol., 71(10), 5728–34, 2005. 201. Giver, L., et al. Directed evolution of a thermostable esterase. Proc. Natl. Acad. Sci. USA, 95(22), 12809–13, 1998.

2-36

Evolutionary Tools in Metabolic Engineering

202. Sriprapundh, D., C. Vieille, and J.G. Zeikus. Directed evolution of Thermotoga neapolitana xylose isomerase: high activity on glucose at low temperature and low pH. Protein Eng., 16(9), 683–90, 2003. 203. Hao, J. and A. Berry. A thermostable variant of fructose bisphosphate aldolase constructed by directed evolution also shows increased stability in organic solvents. Protein Eng. Des. Sel., 17(9), 689–97, 2004. 204. Ness, J.E., et al. DNA shuffling of subgenomic sequences of subtilisin. Nat. Biotechnol., 17(9), 893–96, 1999. 205. Dumas, D.P., et al. Purification and properties of the phosphotriesterase from Pseudomonas diminuta. J. Biol. Chem., 264(33), 19659–65, 1989. 206. Roodveldt, C. and D.S. Tawfik. Directed evolution of phosphotriesterase from Pseudomonas diminuta for heterologous expression in Escherichia coli results in stabilization of the metal-free state. Protein Eng. Des. Sel., 18(1), 51–58, 2005. 207. Vasserot, A.P., et al. Optimization of protein therapeutics by directed evolution. Drug Discov. Today, 8(3), 118–26, 2003. 208. Kurtzman, A.L., et al. Advances in directed protein evolution by recursive genetic recombination: applications to therapeutic proteins. Curr Opin Biotechnol, 12(4), 361–70, 2001. 209. Delagrave, S. and D.J. Murphy. In vitro evolution of proteins for drug development. Assay Drug Dev. Technol., 1(1 Pt 2), 187–98, 2003. 210. Marshall, S.H. DNA shuffling: induced molecular breeding to produce new generation long-lasting vaccines. Biotechnol. Adv., 20(3–4), 229–38, 2002. 211. Vellard, M. The enzyme as drug: application of enzymes as pharmaceuticals. Curr. Opin. Biotechnol., 14(4), 444–450, 2003. 212. Patten, P.A., R.J. Howard, and W.P. Stemmer. Applications of DNA shuffling to pharmaceuticals and vaccines. Curr. Opin. Biotechnol., 8(6), 724–33, 1997. 213. Ballinger, M.D., et al. Selection of heregulin variants having higher affinity for the ErbB3 receptor by monovalent phage display. J. Biol. Chem., 273(19), 11675–84, 1998. 214. 
Yang, J.H., et al. Enhancing the anticoagulant potency of soluble tissue factor mutants by increasing their affinity to factor VIIa. Thrombosis and Haemostasis, 87(3), 450–58, 2002. 215. Wu, H.R., et al. Stepwise in vitro affinity maturation of Vitaxin, an alpha(v)beta(3)-specific humanized mAb. Proc. Natl. Acad. Sci. USA, 95(11), 6037–42, 1998. 216. Pearce, K.H., Jr., et al. Growth hormone binding affinity for its receptor surpasses the requirements for cellular activity. Biochemistry, 38(1), 81–89, 1999. 217. Hanes, J., et al. Picomolar affinity antibodies from a fully synthetic naive library selected and evolved by ribosome display. Nat. Biotechnol., 18(12), 1287–92, 2000. 218. Blatt, L.M., et al. The biologic activity and molecular characterization of a novel synthetic interferon-alpha species, consensus interferon. J. Interferon Cytokine Res., 16(7), 489–99, 1996. 219. Chang, C.C.J., et al. Evolution of a cytokine using DNA family shuffling. Nat. Biotechnol., 17(8), 793–97, 1999. 220. Barak, Y., et al. New enzyme for reductive cancer chemotherapy, YieF, and its improvement by directed evolution. Mol. Cancer Ther., 5(1), 97–103, 2006. 221. Soong, N.W., et al. Molecular breeding of viruses. Nat. Genet., 25(4), 436–39, 2000. 222. Powell, S.K., et al. Breeding of retroviruses by DNA shuffling for improved stability and processing yields. Nat. Biotechnol., 18(12), 1279–82, 2000. 223. Maheshri, N., et al. Directed evolution of adeno-associated virus yields enhanced gene delivery vectors. Nat. Biotechnol., 24(2), 198–204, 2006. 224. Jensen, R.A. Enzyme recruitment in evolution of new function. Ann. Rev. Microbiol., 30, 409–25, 1976. 225. Copley, S.D. Enzymes with extra talents: moonlighting functions and catalytic promiscuity. Curr. Opin. Chem. Biol., 7(2), 265–72, 2003.

Improving Protein Functions by Directed Evolution

2-37

226. James, L.C. and D.S. Tawfik. Conformational diversity and protein evolution—a 60-year-old hypothesis revisited. Trends Biochem. Sci., 28(7), 361–68, 2003. 227. Gerlt, J.A., P.C. Babbitt, and I. Rayment. Divergent evolution in the enolase superfamily: the interplay of mechanism and specificity. Arch. Biochem. Biophys., 433(1), 59–70, 2005. 228. O’Brien, P.J. and D. Herschlag. Catalytic promiscuity and the evolution of new enzymatic activities. Chem. Biol., 6(4), R91–R105, 1999. 229. Chen, Z. and H. Zhao. Rapid creation of a novel protein function by in vitro coevolution. J. Mol. Biol., 348(5), 1273–82, 2005. 230. Yoshikuni, Y., T.E. Ferrin, and J.D. Keasling. Designed divergent evolution of enzyme function. Nature, 440(7087), 1078–1082, 2006.

3 Engineering DNA and RNA Regulatory Regions through Random Mutagenesis and Screening

Ichiro Matsumura
Emory University School of Medicine

Sean A. Lynch
Emory University

Justin P. Gallivan
Emory University

3.1 Introduction
Why Control Gene Expression? • Challenges • Outline of Chapter

3.2 Promoters, Operators, and Enhancers

3.3 Practical Approaches to Promoter Mutagenesis and Cloning
Randomization of Promoter by Whole Plasmid PCR • Chromosomal Integration

3.4 Review of Engineered RNA-Based Regulatory Systems
Ligand-Independent RNA Regulatory Systems • Ligand-Dependent RNA Regulatory Systems

3.5 Principles and Protocols for Creating Synthetic Riboswitches
Aptamer Selection • Design Considerations for Creating Synthetic Riboswitches • Developing a High Throughput Screening Method—General Considerations • Creating Synthetic Riboswitches Using High Throughput Screening

3.6 Conclusions

References

3.1 Introduction

3.1.1 Why Control Gene Expression?

The general goal of metabolic engineering is to use living cells to synthesize one or more product molecules. Often, novel secondary metabolic pathways are expressed within recombinant cells. The yield of the desired product(s) is optimized by up-regulating the expression of foreign and native genes, and/or down-regulating the expression of others. The secondary metabolic pathways of wild-type cells are usually regulated, thus enabling homeostasis and responses to internal and external stimuli. In contrast, human-made gene expression systems, if they are regulated at all, are generally designed to respond to an external stimulus (e.g., isopropyl-β-d-thiogalactopyranoside, galactose, tetracycline). Until recently, few artificial expression systems enabled the “fine tuning” of transcription rates, and virtually none were designed for self-regulation. Here we review techniques that enable fine-tuning and the control of gene expression with novel effector molecules.

3.1.2 Challenges

Metabolic engineers must precisely regulate the expression of multiple genes to optimize the productivity of the cell. Excessive expression of any gene can be toxic to the engineered cell, and could therefore reduce yield. The expression of multiple genes encoding the enzymes of a metabolic pathway must be coordinated in order to avoid bottlenecks.1,2 This “fine tuning” problem is complicated by the context-dependence of complex molecular systems. All biological systems are complex, which means that interactions between their components create emergent properties. A promoter that produces moderate steady-state concentrations of one transcript may produce too much or too little of another. A chromosomal promoter that is tightly repressed can be leaky when cloned into a high copy number plasmid. A protein that is essential to one cell type can be toxic to another.

Another general challenge is that of effector recognition. The rational recombination of effector-dependent repressors, activators, and operators can enable the regulation of any gene by a variety of naturally occurring inducers. One goal of metabolic engineering, however, is the biosynthesis of unnatural compounds. It would therefore be advantageous to fabricate gene regulation systems that are activated or repressed by any arbitrarily chosen compound. This will require the engineering of existing regulatory molecules3 or the selection of entirely new ones. These techniques can be applied to direct the evolution of novel metabolic pathways, or to assemble novel regulatory pathways (synthetic biology).

3.1.3 Outline of Chapter

We begin with a review of promoters and promoter mutagenesis studies. High throughput assays to assess the strengths of wild-type and mutant promoters are also discussed. Many expression cassettes (promoter-gene) are genetically unstable when cloned into multicopy plasmids, so approaches to chromosome engineering are also considered. Next we review reports of riboswitches, which are ligand-dependent RNA sequences that control the expression of metabolic genes. The selection of aptamers, the ligand-binding components of riboswitches, is briefly reviewed, and practical approaches to riboswitch design and construction are discussed.

3.2 Promoters, Operators, and Enhancers

Genes are regulated at several levels: transcription, posttranscriptional mRNA modification, translation, and posttranslational protein modification. In eukaryotes, the localization of mRNAs and proteins also plays an important role in gene regulation. Here we focus upon the regulation of transcription and translation in Escherichia coli and Acinetobacter baylyi sp. ADP1. Transcription is catalyzed by RNA polymerases, which bind upstream of the gene at sites called promoters. Differences in promoter sequence lead to vast differences in transcription initiation efficiencies (vide infra). The control of transcription is mediated by DNA-binding proteins called repressors, which prevent RNA polymerase from binding the promoter, and activators, which potentiate the interaction. Repressors bind elements outside the promoter called operators, while activators bind other (nonpromoter) sites called enhancers. The rational recombination of promoters, operators, and enhancers can switch the control of transcription to different inducers. For example, the chimeric tac promoter contains the strong trp promoter and the operator region of the lac promoter. The tac promoter is 5- to 10-fold stronger than the lac promoter, but is derepressed by isopropyl-β-d-thiogalactopyranoside (IPTG) in lacI+ E. coli cells.4

Sequence comparisons showed that two regions −35 and −10 base pairs upstream of the RNA start site are conserved;5 stronger promoters tended to show greater similarity to a consensus sequence. The functions of sequences within the promoter were studied through random mutagenesis and high throughput clone analysis; 75% of all mutations that affect promoter function fall within these −35 or −10 regions.6
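The comparison of a promoter with the consensus can be made concrete with a short Python sketch. It simply counts how many positions of the −35 and −10 hexamers agree with a consensus. The hexamers TTGACA (−35) and TATAAT (−10) are the commonly cited E. coli σ70 consensus and are an assumption here, because the text does not state them; the function names are hypothetical, and the lac hexamers are read from the sequence in Figure 3.1.

```python
# Commonly cited E. coli sigma-70 consensus hexamers (assumed; not given in the text).
CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"


def matches(hexamer: str, consensus: str) -> int:
    """Count the positions at which a hexamer agrees with the consensus."""
    if len(hexamer) != len(consensus):
        raise ValueError("hexamer and consensus must have the same length")
    return sum(a == b for a, b in zip(hexamer.upper(), consensus))


def consensus_score(minus35: str, minus10: str) -> int:
    """Total number of consensus matches across both hexamers (0 to 12)."""
    return matches(minus35, CONSENSUS_35) + matches(minus10, CONSENSUS_10)


# lac promoter hexamers as read from the sequence in Figure 3.1.
lac_score = consensus_score("TTTACA", "TATGTT")
print(lac_score)  # 9 of a possible 12
```

Such a count is only a rough proxy for promoter strength: it ignores spacer length, flanking sequence, and plasmid context, all of which matter in practice.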


Mutations in the −35 region are generally very deleterious with respect to transcription initiation, while those in the −10 region can have mild to strong effects.7 Site saturation mutagenesis of the −35 and −10 regions and genetic selection led to the identification of promoter sequences that are stronger than the consensus, at least within the context of particular promoters on particular multicopy plasmids.8,9

Many workers have applied random promoter mutagenesis and high throughput screening to optimize gene expression. The mutagenesis has been applied in three different ways with broadly similar results. We replace the six-nucleotide −10 region with random sequence to achieve the widest range of promoter strengths within the smallest populations.10,11 Miksch and coworkers similarly “randomize” the nucleotides upstream of the −10 region.12,13 We presume that many of the mutant promoters will be weaker than the wild-type, and that this result is desirable for otherwise strong promoters on high copy number plasmids. Conversely, Jensen and his colleagues randomize the sequences outside the conserved −35 and −10 regions, presumably to achieve a more gradual variation of promoter strengths.14–17 Stephanopoulos and his colleagues use error-prone PCR to introduce random mutations throughout the promoter.18,19

The promoter libraries are usually cloned into plasmids upstream of reporter genes.11,12,16,19 The reporter genes associated with selected promoter variants can later be replaced with metabolic genes.19 In some cases, the gene is cloned next to a reporter gene so that both are expressed as a polycistronic message (promoter library-ribosome binding site-metabolic gene-ribosome binding site-reporter gene-terminator).16,20 E. coli are transformed with these constructs, and those that exhibit the desired phenotypes (reporter gene expression levels) are identified in high throughput screens.
The best screens are sensitive, high in throughput, broad in dynamic range, and precise (such that isogenic clones exhibit little phenotypic variance). Candidate expression plasmids should always be re-evaluated (using the high throughput screen) several times to demonstrate that they consistently produce the desired phenotype.

Multi-copy plasmids are most often employed because they enable the highest cloning efficiencies, and therefore the most diverse libraries. Unfortunately, they are also most prone to genetic instability. When plasmids do not segregate evenly between daughter cells, and when the constitutive or leaky expression of a gene imparts a growth disadvantage, individual cells within a genetically homogeneous population will express different amounts of protein.21 This problem can be overcome in several ways, each with its own advantages and disadvantages. Lower copy vectors, such as those based upon pACYC184, pCDF, and pCOLA (Novagen), are usually more stable than ultra-high copy number pUC-type plasmids.22 Single-copy BACs should be the most stable, but are infrequently used as expression vectors because their low purification yields make them inconvenient. Expression vectors with tightly regulated promoters are usually more stable than those that are leaky or constitutive.

Another way to stabilize an expression construct (promoter-gene) is to integrate it into the bacterial chromosome. Alper et al. have randomly mutated promoters, cloned them into plasmids upstream of a reporter gene (GFP), and identified constitutive promoter variants that produce different amounts of GFP. The GFP genes were replaced with those encoding selected metabolic enzymes, and the new expression constructs were integrated into the E. coli chromosome using the Red/ET recombination system19 (Section 3.3.2.1).
It might also be possible to integrate libraries of expression constructs (mutant promoter-reporter gene-selectable marker) into the chromosome using “gene gorging,” which utilizes self-linearizing plasmids as templates for recombination.23 Alternatively, expression construct libraries can be integrated into the chromosome of Acinetobacter baylyi sp. ADP1 (Section 3.3.2.2). This organism is naturally competent and integrates foreign DNA into its chromosome much more efficiently than E. coli.24

3.3 Practical Approaches to Promoter Mutagenesis and Cloning

3.3.1 Randomization of Promoter by Whole Plasmid PCR

We can randomize the six nucleotides in the −10 region by: (a) synthesis of oligonucleotides encoding degenerate sequences at the 5ʹ ends, (b) long PCR amplification of the entire expression vector, (c) purification of the PCR product, (d) self-ligation to recircularize the plasmid, and (e) transformation of E. coli.
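The size of the library produced by randomizing six positions, and the number of transformants needed to sample most of it, can be estimated with a short calculation. The Python sketch below assumes that all sequences are equally represented, which real degenerate oligonucleotide syntheses only approximate; the function names and the 95% target are illustrative choices, not values from the text.

```python
import math


def library_size(n_positions: int, alphabet: int = 4) -> int:
    """Number of distinct sequences when n positions are fully randomized."""
    return alphabet ** n_positions


def clones_for_coverage(n_variants: int, coverage: float) -> int:
    """Clones needed so that the expected fraction of variants sampled
    at least once reaches `coverage`, assuming equiprobable variants.

    A given variant is missed by n independent clones with probability
    (1 - 1/V)**n, so we solve 1 - (1 - 1/V)**n >= coverage for n.
    """
    if not 0 < coverage < 1:
        raise ValueError("coverage must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - coverage) / math.log(1 - 1 / n_variants))


variants = library_size(6)  # six randomized nucleotides in the -10 region
print(variants)
print(clones_for_coverage(variants, 0.95))
```

Under these assumptions the library contains 4,096 sequences, and roughly three times that many transformants are needed to sample 95% of them, which is well within the capacity of plasmid cloning in E. coli.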


PCRs often fail when the reaction conditions are too stringent (such that the primers never anneal) or too relaxed (primers or partially extended products anneal to the wrong sequences). The latter problems become pronounced when the sequence becomes very long, when the base composition is imbalanced (GC-rich or AT-rich), or when repeating sequences exist within the region to be amplified. In our hands, the well characterized long PCR protocol developed by Suzanne Cheng25–27 has proven most reliable for plasmid constructs up to 10 kb (amplification of products up to 20 kb is possible). The Tth/Vent polymerase mixture used in this protocol is modestly mutagenic28 and recombinagenic.29 These side-effects are acceptable, even desirable, for libraries that are later evaluated in high throughput screens. The newer high processivity DNA polymerase variants30 should reduce the mutation and recombination rates when the throughput of the screen is low.

3.3.1.1 Primer Design

The primers are designed to amplify the entire expression vector except the −10 region. One primer should also have six degenerate nucleotides (NNNNNN, where N is an equimolar mixture of G, A, T, and C) at its 5ʹ end to replace those in the −10 region (Figure 3.1). For long PCR, Cheng and her colleagues recommend high Tm primers (62–70°C) and higher annealing temperatures to maximize specificity.25 In addition, we recommend the following design rules:

(1) Avoid ending your oligonucleotide with a run of three or more identical bases.
(2) Primers containing restriction sites at the 5ʹ end should also contain three to six additional nucleotides at the extreme 5ʹ end.
(3) Always test your primers “in silico” using Amplify 1.2 (Apple Macintosh) or FastPCR (Windows XP) to eliminate primer dimers and secondary binding sites.
(4) Try to keep primers shorter than 45 nucleotides to reduce the proportion of those containing deletions, and to eliminate the need for purification.
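Rules (1), (2), and (4) are simple enough to check automatically before ordering oligonucleotides. The following Python sketch is one possible way to do so, not part of the published protocol. It assumes that "ending" in rule (1) refers to the 3ʹ end, that primers are written 5ʹ to 3ʹ, and that a restriction site needs at least three nucleotides 5ʹ of it; the function name and the example sequences are hypothetical. Rule (3) still requires dedicated software.

```python
def check_primer(primer: str, restriction_sites=()) -> list:
    """Return a list of warnings for a primer written 5' to 3'."""
    primer = primer.upper()
    warnings = []

    # Rule (1): no run of three or more identical bases at the 3' end (assumed).
    if len(primer) >= 3 and len(set(primer[-3:])) == 1:
        warnings.append("3' end is a run of three or more identical bases")

    # Rule (2): a restriction site near the 5' end needs 3-6 extra 5' nucleotides.
    for site in restriction_sites:
        position = primer.find(site.upper())
        if 0 <= position < 3:
            warnings.append(
                f"fewer than three nucleotides 5' of restriction site {site}"
            )

    # Rule (4): keep primers shorter than 45 nucleotides.
    if len(primer) >= 45:
        warnings.append(f"primer is {len(primer)} nt long (45 nt or more)")

    return warnings


# Hypothetical primers, for illustration only.
ECORI = "GAATTC"
print(check_primer("ATAGAATTCACGTACGTACGTACG", [ECORI]))  # no warnings expected
print(check_primer("GAATTCACGTACGTACGTACGTAAA", [ECORI]))  # violates rules (1) and (2)
```

A primer that returns an empty list has passed only these three checks; it can still form dimers or bind secondary sites.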
3.3.1.2 Polymerase Mixtures

Applied Biosystems sells a mixture of Tth and Vent polymerases in its GeneAmp XL PCR kit, which comes with 3.3x PCR buffer. However, we choose to mix separately purchased enzyme preparations (13.33 units of Epicentre Tth DNA polymerase: 1 unit NEB Vent polymerase in 50% glycerol, 20 mM Tris pH 8.0, 100 mM KCl, 0.1 mM EDTA, 0.5% Tween 20). We also prepare our own 5x long PCR buffer (125 mM Tricine, pH 8.7, 425 mM K-acetate, 40% glycerol, 5% DMSO). Both the enzyme and buffer are stored at −20°C.

3.3.1.3 Hot Start

Thermostable polymerases are often active at mesophilic temperatures, so the mixing of the PCR components at room temperature can lead to the extension of misannealed primers.31 We recommend heating the incomplete mixture to at least 80°C before the addition of 0.5 μL Tth/Vent DNA polymerase mixture (1 unit). The completed reaction should be mixed in the thermocycler by pipetting 10 μL up and down to ensure that glycerol-containing DNA polymerases do not settle on the bottom of the PCR tube. The PCR cycling program should then be initiated before the tube has a chance to cool.

Forward primer: 5ʹ-GTGTGGAATTGTGAGCGGATAACAATTT-3ʹ
Top strand: CAGGCTTTACATTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATG
Bottom strand: GTCCGAAATGTAAATACGAAGGCCGAGCATACAACACACCTTAACACTCGCCTATTGTTAAAGTGTGTCCTTTGTCGATAC
Reverse primer: 3ʹ-CGAAATGTAAATACGAAGGCCGAGCNNNNNN-5ʹ
(Labels in the figure mark the −35 region, the −10 region, and the f-Met start codon.)

Figure 3.1 “Randomization” of the −10 region of the lac promoter. Both strands of the lac promoter sequence are shown. The primers above and below the double-stranded DNA can be used to PCR amplify the entire plasmid. Self-ligation of the PCR product produces a plasmid in which each nucleotide in the −10 region has been replaced by an equimolar mixture of G, A, T, and C.
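The primer pair in Figure 3.1 can be derived mechanically from the top strand, which may help when adapting the approach to another promoter. The Python sketch below is an illustration under stated assumptions: the target hexamer occurs once in the supplied sequence, the primer lengths (28 and 25 annealing nucleotides) are simply those shown in the figure, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def whole_plasmid_primers(top: str, target: str, fwd_len: int = 28, rev_len: int = 25):
    """Design outward-facing primers that amplify everything except `target`.

    Both primers are returned 5' to 3'. The reverse primer carries one
    degenerate N per target nucleotide at its 5' end.
    """
    start = top.index(target)  # first occurrence; assumed to be unique
    end = start + len(target)
    if start < rev_len or end + fwd_len > len(top):
        raise ValueError("not enough flanking sequence for the requested lengths")
    forward = top[end:end + fwd_len]
    reverse = "N" * len(target) + reverse_complement(top[start - rev_len:start])
    return forward, reverse


# Top strand of the lac promoter as printed in Figure 3.1.
TOP = ("CAGGCTTTACATTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTT"
       "CACACAGGAAACAGCTATG")
forward, reverse = whole_plasmid_primers(TOP, "TATGTT")
print(forward)        # GTGTGGAATTGTGAGCGGATAACAATTT
print(reverse)        # NNNNNNCGAGCCGGAAGCATAAATGTAAAGC
print(reverse[::-1])  # the same primer written 3' to 5', as drawn in the figure
```

Written 3ʹ to 5ʹ, the reverse primer reads CGAAATGTAAATACGAAGGCCGAGCNNNNNN, matching the lower primer in the figure.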


3.3.1.4 Purification of Long PCR Product

The purification of the product away from the polymerase is a cumbersome but necessary step for efficient ligation. The failure to completely eliminate the polymerase can lead to undesirable modification of the ends of the DNA, so we use a combination of proteinase K, silica-based DNA affinity chromatography, restriction digestions, and extractions with phenol and chloroform. Neither proteinase K nor restriction enzymes work well in XL PCR buffer, presumably due to the DMSO, so we first use the Promega Wizard PCR prep procedure to eliminate primers and exchange buffers. Any other silica-based DNA purification protocol should work as well, but such methods apparently fail to completely eliminate thermostable polymerases bound to DNA ends.32 The partially purified PCR products are partially digested with DpnI. This restriction enzyme specifically cleaves methylated DNA from dam+ E. coli strains, and should thus eliminate wild-type plasmid template. The thermostable polymerases are then eliminated by reaction with 0.5% SDS, 20 mM EDTA, and 50 μg/mL proteinase K. These reactants are subsequently eliminated by phenol/chloroform extraction, ethanol precipitation, and a Qiagen PCR purification (again, any silica-based DNA affinity purification protocol will do).

3.3.1.5 Self-ligation of PCR Product

The 5ʹ ends of the PCR product are phosphorylated with T4 Polynucleotide Kinase, “polished” (made blunt) with the thermolabile T4 DNA polymerase, and self-ligated using T4 DNA ligase. The ligase is heat-killed, and the reaction is desalted.33 E. coli cells are transformed by chemical transformation34 or electroporation35 of DNA. The resulting colonies should be evaluated in high throughput screens (see Section 3.4).

3.3.2 Chromosomal Integration

3.3.2.1 Red/ET Recombination (E. coli)

Homologous recombination rates are normally low in wild-type E. coli, but can be increased through the bacteriophage λ Red/ET system.36 The recBC nuclease normally catalyzes the degradation of double-stranded DNA, so this activity must be eliminated through mutation or inhibition. The expression of the Rac prophage recE and recT genes promotes recombination. These genes encode a nuclease and a single-stranded DNA binding protein, respectively, that, in combination with endogenous polymerases and ligases, mediate homologous recombination. Mutations in sbcA lead to the over-expression of the recE and recT genes in a prophage, so recBC− sbcA− strains such as JC8679 are constitutively recombinogenic. Alternatively, inducible vectors for the over-production of the λ Red operon can be used to increase recombination rates in any strain. This operon encodes the Gam protein, which inhibits the recBC nuclease, and the Redα and Redβ proteins (functionally equivalent to the recE and recT proteins, respectively).

High Red/ET recombination frequencies (>1,000 recombinants) have been reported,37 but we have been unable to produce libraries using this approach. It seems likely that we were expressing the Gam, Redα, and Redβ proteins at suboptimal levels,38 and that recombination frequency is sequence-dependent. One possible solution would be to apply the aforementioned “gene gorging” technique.23 Stephanopoulos and his colleagues use plasmids to identify candidate promoter variants and Red/ET recombination to integrate individual expression cassettes into the E. coli chromosome.19

3.3.2.1.1 Natural Transformation (Acinetobacter baylyi sp. ADP1)

Acinetobacter baylyi sp. ADP1 is naturally recombinogenic, so we are currently integrating libraries of uncloned promoters into its chromosome. It is a Gram-negative γ-proteobacterium that possesses two advantages over E. coli: natural competence and efficient homologous recombination. Up to ~25% of the A. baylyi cells in a mid-logarithmic culture can be transformed simply by coincubation with DNA complementary to the chromosomal target site.39 The recombination efficiency is sufficient to generate libraries.40

A. baylyi cells grow in Luria Broth (LB), but three functional differences should be kept in mind. E. coli-specific promoters are generally less efficient in Acinetobacter, so broad-spectrum promoters, such as those derived from bacteriophage T5 and antibiotic resistance genes, should be employed. A. baylyi is naturally resistant to ampicillin and chloramphenicol, so markers that confer resistance to spectinomycin, kanamycin, and tetracycline should be employed instead.24 Finally, the effective transformation efficiency is directly correlated with the lengths of the homologous flanking sequences up to at least 5 kb.39

The integration of foreign DNA into A. baylyi ADP1 is straightforward, but the creation of recombinant integration constructs (A. baylyi chromosome sequence 1-promoter-ribosome binding site-gene-selectable marker-A. baylyi chromosome sequence 2) requires some effort. The design and fabrication of multi-component DNA sequences can be streamlined by the adoption of standard restriction enzyme sites. For example, the “BioBricks” standard41 is based upon modules flanked by standard restriction sites (EcoRI-NotI-XbaI-“content”-SpeI-NotI-PstI, http://openwetware.org/wiki/Synthetic_Biology:BioBricks). The content DNA must not contain EcoRI, NotI, XbaI, SpeI, or PstI sites; these must be removed during the initial cloning of the BioBrick. Two BioBricks can be combined by digesting the donor with EcoRI and SpeI, digesting the recipient with EcoRI and XbaI, and ligating them together (XbaI and SpeI cut to form compatible sticky ends). The construction of BioBrick versions of A. baylyi chromosome sequences and selectable markers (reported elsewhere) facilitates the combinatorial assembly of integration cassettes.

3.3.2.1.2 Primer Design for Overlap PCR

Alternatively, integration constructs can be assembled in a series of long PCRs.
Two nearby 5 kb fragments of the A. baylyi chromosome, hereafter called the left and right flanks, are amplified in the first two PCRs. The expression/selection cassette (T5 promoter-gene-selectable marker) is prepared by traditional cloning techniques and amplified in a third PCR. Finally, the left flank, expression/selection cassette and right flank are recombined in an overlap PCR (Figures 3.2 and 3.3). In this reaction, the strands of the original PCR products are denatured and reannealed. The “top” strand of one PCR product will 5´ 3´

3´ 5´

5´ 3´

3´ 5´

Denaturation 5´

3´

5´

3´

3´

5´

3´

5´

Annealing 5´

3´

3´

5´

Extension 5´ 3´

3´ 5´

Figure 3.2 Overlap PCR. Two PCR products are represented as gray and black strands, respectively. The sequence at the extreme right side of the gray PCR product is identical to that of the left site of the black PCR product. When the two are combined in a single PCR, the top strand of the gray product can anneal to the bottom strand of the black product. The polymerase-catalyzed extension of this mega-primer dimer produces a single recombined product.

3-7

Engineering DNA and RNA Regulatory Regions

5´ 3´

1

3´ 5´ Left flank 2 5’ 3´

3

3´ 5´ Promoter-gene-marker 4

5´ 3´

5

3´ 5´ Right flank 6

Left flank

Primer 3

Promoter-gene-marker

cgc gt gg ca -3’ 5’-g gc ca tg ac tc cc ca at at t gattagtagatcacaatggcggagctgaattacattcccaacc tta at gt aa gg gt tg gc gc ac cg t -5’ cg gt a ctgaggggttataactaatcatctagtgttaccgcctcgact 3’-c Primer 2

Figure 3.3 Assembly of integration constructs by overlap PCR. (Top) The Acinetobacter chromosome sequences (left and right “flanks”) can be fused to the expression construct (promoter-gene-marker) by overlap PCR. (Bottom) Composite primers are designed to amplify each segment, and to create complementary ends for overlap PCR.

anneal to the “bottom” strand of a different PCR product if their ends are complementary. Extension of the giant “primer dimer” creates a double-stranded fusion of the two products, which in turn can be amplified in a regular PCR with external primers.42 We presently use inexpensive and/or free software (Clone Manager, Sci Ed Central or Vector NTI, Invitrogen; Molecular BioComputing Suite;43 FastPCR, http://www.biocenter.helsinki.fi/bi/Programs/fastpcr.htm) to design the primers. First, we identify a chromosomal target in the A. baylyi genome sequence (http://www.cns.fr/externe/English/Projets/Projet_DY/DY.html); we generally target apparently nonessential sites such as putative antibiotic resistance markers or prophage regions.44 Second, we copy and paste the desired left flank, expression/selection cassette and right flank sequences together in a word processing program. Third, we use sequence analysis software to visualize the top and bottom strands of the virtual construct. Fourth, the primers are copied and pasted from the sequence analysis software back into the word processing file. In general, the primers for the amplification of the flanks should be long because the Acinetobacter chromosome is AT-rich.44 Primer 1 is identical to the top strand (Tm = 80°C according to the formula of Suggs et al.45) of the left flank (Figure 3.3). Primer 2 is identical to the bottom strand at the intersection of the left flank (Tm = 80°C) and the expression/selection cassette (Tm = 40°C). Primer 3 is the top strand at the intersection of the left flank (Tm = 40°C) and the expression/selection cassette (Tm = 80°C). The top strand of the first PCR product (left flank) is thus complementary (Tm = 80°C) to the bottom strand of the second PCR product (expression/selection cassette).

3.3.2.1.3 Long PCR and Overlap PCR

The three long PCRs are similar in procedure to the whole plasmid PCR, using the Tth/Vent DNA polymerase mixture (Section 3.3.2.1.4). Whole A. baylyi ADP1 cells can be used as the template for amplification of the left flank (Figure 3.3, primers 1 and 2) and the right flank (primers 5 and 6). We are currently using expression vectors with a T5 promoter, the gene of interest and a spectinomycin resistance marker (PT5-pCDF) as the template for amplification of the expression/selection cassette (primers 3 and 4). We usually gel purify the PCR products to eliminate the templates and PCR side products that could interfere with the overlap PCR. The overlap PCR is simply another long PCR using the three products (left flank, expression/selection cassette, right flank) as templates and the outside


oligonucleotides (1 and 6) as primers. The extension of annealed templates is likely catalyzed by the Vent polymerase, which has strong strand displacement activity (www.neb.com).

3.3.2.1.4 Transformation of Acinetobacter baylyi sp. Strain ADP1

A. baylyi ADP1 is generally propagated in LB medium at 30°C.24 It is an obligate aerobe,44 and we find that significant agitation (small liquid culture volumes, large vessels, >250 rpm shakers) is necessary to achieve maximum growth rates. Saturated cultures are diluted 1:15 (20 μL of culture into 300 μL of medium) and shaken vigorously for 2 hours at 30°C. The overlap PCR product (~1 μg in 1–50 μL) is added directly from the thermal cycler to the mid-logarithmic culture. The culture should be shaken for an additional 3 hours at 30°C and spread on LB agar plates supplemented with spectinomycin and a histochemical marker (such as 5-bromo-4-chloro-3-indolyl-β-D-glucuronide, X-gluc, for the detection of β-glucuronidase activity). Colonies form during overnight incubations at 37°C.
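As a rough illustration of the composite primer design described in Section 3.3.2.1.2, the following Python sketch builds primers 2 and 3 for the junction between the left flank and the expression/selection cassette. It assumes that the Tm formula in question is the widely used rule of 2°C per A or T and 4°C per G or C; the function names and the two short sequences are hypothetical placeholders, not real A. baylyi or vector sequences.

```python
# Sketch of composite primer design for overlap PCR.
# Assumption: Tm is estimated as 2 C per A/T plus 4 C per G/C.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(seq: str) -> int:
    """Estimate Tm (deg C) as 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def head_with_tm(seq: str, target_tm: int) -> str:
    """Shortest prefix of seq whose estimated Tm reaches target_tm."""
    for end in range(1, len(seq) + 1):
        if wallace_tm(seq[:end]) >= target_tm:
            return seq[:end]
    raise ValueError("sequence too short to reach the target Tm")


def tail_with_tm(seq: str, target_tm: int) -> str:
    """Shortest suffix of seq whose estimated Tm reaches target_tm."""
    for start in range(len(seq) - 1, -1, -1):
        if wallace_tm(seq[start:]) >= target_tm:
            return seq[start:]
    raise ValueError("sequence too short to reach the target Tm")


def junction_primers(left: str, cassette: str, high: int = 80, low: int = 40):
    """Return (primer 2, primer 3) for the left flank/cassette junction.

    Primer 2 is the bottom strand: left-flank end (high Tm) + cassette start (low Tm).
    Primer 3 is the top strand: left-flank end (low Tm) + cassette start (high Tm).
    """
    primer2_top = tail_with_tm(left, high) + head_with_tm(cassette, low)
    primer3 = tail_with_tm(left, low) + head_with_tm(cassette, high)
    return reverse_complement(primer2_top), primer3


# Made-up sequences, used only to exercise the functions.
left_flank = "ATTAGCTTAAATCG" * 6
cassette = "GGCATGCAAGCTTGGCACTGGCC" * 4

primer2, primer3 = junction_primers(left_flank, cassette)
print("Primer 2 (5'->3'):", primer2)
print("Primer 3 (5'->3'):", primer3)
```

Because each composite primer carries a low-Tm tail from the neighboring segment, the two PCR products end up sharing an overlap whose combined estimated Tm is about 80°C, which is what allows them to prime one another in the overlap PCR.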

3.4 Review of Engineered RNA-Based Regulatory Systems

While RNA occupies the center position of the central dogma of molecular biology, it was long thought to play only a passive role in the transmission of information from DNA to proteins. Over the past few decades, RNA has been shown to play myriad functional roles in the cell, ranging from catalysis to gene regulation. The versatility of RNA derives from its ability to fold upon itself to form complex structures, as well as its ability to interact with DNA, proteins, small molecules, and other RNAs. Because many RNA-based regulatory systems including microRNAs46 and small interfering RNAs47 have been reviewed extensively elsewhere, we will limit the discussion primarily to prokaryotic systems with a special emphasis on systems that can be engineered to regulate gene expression in a metabolite-dependent fashion.

3.4.1 Ligand-Independent RNA Regulatory Systems

One of the most straightforward methods to fine-tune the expression of prokaryotic genes in a ligand-independent fashion is through the introduction of engineered RNA secondary structure. RNA secondary structure can influence gene expression through a variety of mechanisms, including regulating mRNA stability and modulating the rate of translation initiation. de Smit and van Duin have extensively studied the effects of engineered secondary structure near the ribosome binding site (RBS) on the rates of translation initiation of bacterial mRNA transcripts.48–52 Specifically, they demonstrated that increasing the stability of a hairpin structure near the RBS reduced the translation rate of an mRNA in a manner proportional to the strength of the hairpin. Originally this behavior was explained using a thermodynamic model in which mRNA structures with paired and unpaired RBSs were in rapid equilibrium with one another, but only the fraction of the mRNA pool with unpaired RBSs could initiate translation. In the thermodynamic model, increasing the strength of the engineered secondary structure reduced the population of free RBSs and reduced translation proportionally.48,49 Although this model satisfactorily explained a variety of data, a further analysis based on more recent observations of the kinetics of RNA folding led to a refinement of the model, but did not alter the fundamental relationship between mRNA secondary structure near the RBS and translational efficiency.52

Living systems also take advantage of secondary structure near the RBS to regulate translation in a temperature-sensitive fashion, whereby increases in temperature can “melt” the secondary structure and increase translation.53–59 A beautiful example of translational thermoregulation occurs in the expression of the rpoH gene in E.
coli.55 The rpoH gene encodes the σ32-transcription factor protein, which activates the expression of several genes involved in the heat-shock response. An increase in temperature from 30°C to 42°C reduces the amount of base-pairing near the RBS of the rpoH mRNA and increases the expression of the σ32-transcription factor 3.5-fold.55 While a variety of other genes are regulated in a similar fashion, the significance of thermosensing is often less clear.
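The thermodynamic model described above can be illustrated with a minimal two-state calculation. The Python sketch below assumes a simple equilibrium between a hairpin-paired and an unpaired RBS and takes translation to be proportional to the unpaired fraction; the function name and the free energy values are illustrative assumptions, not measurements, and the sketch ignores both the temperature dependence of the folding free energy and the kinetic refinements mentioned above.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_unpaired(dg_fold_kcal: float, temp_c: float = 37.0) -> float:
    """Fraction of mRNA with an unpaired RBS in a two-state equilibrium.

    dg_fold_kcal is the free energy of forming the hairpin that occludes
    the RBS (more negative = more stable hairpin).
    """
    temp_k = temp_c + 273.15
    # Equilibrium constant [paired]/[unpaired] for hairpin formation.
    k_fold = math.exp(-dg_fold_kcal / (R_KCAL * temp_k))
    return 1.0 / (1.0 + k_fold)


# Illustrative hairpin stabilities (kcal/mol); not taken from any data set.
for dg in (0.0, -2.0, -4.0, -6.0):
    print(f"dG = {dg:5.1f} kcal/mol -> unpaired fraction {fraction_unpaired(dg):.4f}")
```

In this picture, each additional increment of hairpin stability lowers the unpaired fraction by a constant factor once the hairpin is reasonably stable, which is consistent with the qualitative trend described for engineered hairpins near the RBS.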


In addition to affecting the rate of translation initiation, mRNA secondary structure can also affect transcript stability. RNA hairpins located either 5ʹ- or 3ʹ- to a coding region can often produce increased levels of protein expression by protecting a transcript from degradation by ribonucleases. Smolke and Keasling have demonstrated how introduction of mRNA secondary structure can control the levels of expression from polycistronic mRNAs for metabolic engineering applications.60–64 In a very nice example,62 the introduction of a ribonuclease cleavage site between two coding regions of a transcript resulted in two secondary transcripts upon RNA cleavage. The expression levels of each of the secondary transcripts could be tuned independently by engineering the stability of RNA hairpins that flanked the coding region. Additional modifications in secondary structure near the RBS could further regulate the protein expression levels.

More recently, a number of groups have developed trans-acting riboregulators that conditionally regulate gene expression.65–67 The groups of Ptashne68 and Liu69 both adapted a yeast three-hybrid approach to transcriptional activation by incorporating a protein-binding domain into a library of trans-acting RNA sequences and selecting for variants that could recruit the yeast transcriptional machinery and activate the transcription of an essential gene. Buskirk and Liu extended this approach by including an RNA aptamer sequence that recognized the small molecule tetramethylrosamine in the library and screening for ligand-dependent activation of the reporter gene.69 Collins and coworkers65 have engineered cis-acting RNA sequences that repress translation of a reporter gene in bacteria by pairing to regions near the RBS; addition of a complementary RNA sequence in trans disrupts the base pairing near the RBS and activates translation of the reporter gene.
Bayer and Smolke66 extended this approach by including a theophylline-binding RNA aptamer in the trans-acting sequence, which made gene expression theophylline-dependent in yeast. These and other systems that involve trans-acting RNAs have been reviewed extensively elsewhere,67 so we will focus primarily on ligand-dependent RNA-based regulatory systems that act in cis.

3.4.2 Ligand-Dependent RNA Regulatory Systems

A primary goal of metabolic engineering is to enable the efficient multi-step synthesis of compounds from inexpensive feedstocks. The efficiency of any synthetic process depends on appropriately balancing the concentrations of the reactants, intermediates, products, and catalysts. This is particularly true in metabolic processes where intermediates may be toxic, act as inhibitors, or be diverted to other pathways if not acted on readily. Furthermore, protein synthesis places an additional burden on the cell and ill-timed production of an enzyme can lead to an inefficient process. Ideally, all components in a multi-step synthesis would be present at the right concentration and at the right time. Such control thus requires the ability to dynamically tune the level of enzyme activity, for example by increasing activity when a substrate is plentiful, and decreasing activity when a substrate is scarce, or when the immediate downstream product is abundant.

Living systems have evolved a variety of mechanisms to regulate metabolism based on the demands of the cell. Many of these mechanisms rely on ligand-induced allostery, where binding of a metabolite regulates function. Allosteric systems operate in a variety of contexts ranging from the regulation of the activity of a single metabolic enzyme, such as the regulation of aspartate transcarbamoylase by both ATP and CTP, to the transcriptional control of entire metabolic pathways, exemplified by the (allo)lactose-induced transcription of genes within the lac operon.
In addition to these “textbook examples” in which small molecules interact with proteins to regulate metabolism, it is now recognized that small molecules can interact directly with mRNA sequences to regulate metabolism through a mechanism now known as riboswitch control.70–79

3.4.2.1 Natural Riboswitches

Riboswitches are cis-acting RNA elements that regulate gene expression in a ligand-dependent fashion.70–81 Riboswitches are composed of an aptamer domain, which recognizes the ligand, and an


“expression platform” that regulates gene expression in one of several ways. As of this writing (mid-2006), riboswitches that couple ligand binding to changes in transcription,72,82–84 translation,70,80 or mRNA structure85 have been discovered in a variety of organisms. As will be discussed in Section 3.4.2.2, engineered riboswitches that operate via mechanisms such as alternative splicing are known,86 and it is not unreasonable to suspect that living systems may also employ such mechanisms to regulate gene expression. Furthermore, while most of the known natural riboswitches have been discovered in prokaryotes, a eukaryotic riboswitch from A. thaliana has been identified and its structure determined by crystallography,87 and it is extremely likely that other riboswitches will be discovered in a variety of organisms and pathways.

3.4.2.1.1 Mechanisms of Natural Riboswitches: Regulation of Translation Initiation

As discussed in Section 3.4.1, one of the simplest mechanisms to regulate the initiation of translation is to introduce secondary structure near the RBS of an mRNA. Many riboswitches, particularly those found in Gram-negative bacteria, couple metabolite binding to changes in RNA secondary structure to regulate translation initiation. Most often, these riboswitches act to decrease the rate of translation initiation upon ligand binding by occluding access to the RBS. This mechanism of action is observed for riboswitches that regulate the expression of thiamine pyrophosphate (TPP) synthesis80 and coenzyme-B12 transport in E. coli,81 as well as the expression of a putative riboflavin transporter in B.
subtilis.70 Because engineered riboswitches can activate translation in a ligand-dependent fashion (Section 3.4.2.2),88,89 the tendency of natural riboswitches to repress translation in a metabolite-dependent fashion likely results from evolutionary pressure to reduce metabolite synthesis when these compounds are plentiful, and not from any fundamental limitation of the mechanism.

3.4.2.1.2 Mechanisms of Natural Riboswitches: Regulation of Transcription

In addition to regulating translation, riboswitches also employ small molecule-RNA interactions to control the fate of transcription. Like riboswitches that control translation, those that regulate transcription most often decrease gene expression in a metabolite-dependent fashion. Typically, the mechanism for this decrease involves a ligand-mediated early termination of transcription. In the absence of a ligand, transcription occurs normally; however, the presence of a ligand during transcription can favor the formation of an intrinsic terminator structure that leads to the premature dissociation of the RNA polymerase and a nonfunctional RNA transcript. Ligand-dependent transcriptional attenuation has been implicated as a mechanism of action for riboswitches that respond to S-adenosylmethionine,71,76,90–93 flavin mononucleotide, lysine,73,94 TPP,80 and guanine.72,83 While most riboswitches down-regulate gene expression, some, such as the adenine riboswitch,84,95 up-regulate gene expression in a metabolite-dependent fashion. The adenine riboswitch is particularly interesting because the aptamer core that recognizes the ligand is nearly identical to that found in the guanine riboswitch.* However, unlike the guanine riboswitch, ligand binding to the adenine riboswitch increases the level of transcription.
This activation is believed to occur by the disruption of a transcriptional terminator upon adenine binding, which leads to the production of a full-length mRNA transcript.84 Mechanistic studies performed by Wickiser et al.95 suggest that this process may be kinetically controlled, and that the “decision” of whether to terminate transcription is likely made before equilibrium binding is achieved. Thus, although full-length riboswitch transcripts produced in vitro may be able to equilibrate between two structures upon ligand binding, it is unclear whether this is relevant in vivo.95 Although many of the finer mechanistic details of riboswitch function must still be established, it is already clear that riboswitches employ a variety of control mechanisms to balance the thermodynamic and kinetic considerations of metabolite binding with other concurrent cell processes such as the rates of transcription, translation, and mRNA decay.

* A single mutation in the guanine riboswitch (C to U) changes the ligand specificity from guanine to adenine.84


3.4.2.1.3 Mechanisms of Natural Riboswitches: Regulation of RNA Cleavage and Splicing

Recent studies have shown that riboswitches can also act as metabolite-sensitive ribozymes that cleave an mRNA upon ligand binding.85,96 The first example of this mechanism was observed for the glmS riboswitch, which controls the expression of an enzyme that synthesizes glucosamine-6-phosphate (GlcN6P) in B. subtilis and other Gram-positive bacteria.85,96 Binding of GlcN6P to a region in the 5ʹ-UTR of the glmS mRNA reduces the half-life of a self-cleavage reaction from 4 hours to 15 seconds in vitro.85 In vivo, the addition of ligand reduces the expression of a reporter construct downstream of the riboswitch possibly by creating a nonfunctional transcript or by influencing the rate of RNA degradation, though the exact mechanism remains to be established. Finally, although there are examples of riboswitches that appear to effect ligand-dependent RNA splicing in eukaryotes,97 engineered RNA-based regulation systems based on this mechanism are not generally applicable in prokaryotes, which do not perform RNA splicing.

From this short discussion, it should be clear that nature uses a variety of RNA-based mechanisms to regulate metabolic pathways in a ligand-dependent fashion. However, the explosion of new gene sequences and rapid advances in protein engineering now enable the heterologous expression of entirely novel metabolic pathways. As such, there is an increasing demand for new ligand-dependent gene control systems. In the following sections, we will describe how powerful in vitro selection techniques coupled with high throughput screening methods enable the creation of riboswitches that respond to entirely new compounds. Such riboswitches may become useful tools for controlling engineered metabolic pathways.
3.4.2.2 Synthetic Riboswitches

The development of engineered ligand-regulated RNA expression systems actually presaged the confirmation of the natural riboswitch control mechanism by several years. In 1998, Werstuck and Green demonstrated that an RNA aptamer sequence that had been selected to bind to a small molecule in vitro could be cloned upstream of a reporter gene and repress protein translation in a small molecule-dependent fashion in eukaryotic cells.98 In this original study, aptamers that recognized the antibiotics kanamycin and tobramycin, and the Hoechst dyes 33258 and 33342 were all able to repress the translation of a reporter gene in a dose-dependent fashion when cloned upstream of the initiation codon.98 Subsequent studies have demonstrated that aptamers that bind to malachite green,99 biotin,100 theophylline,100 and tetracycline101–104 can also be cloned into the 5ʹ-untranslated region of a gene to repress translation in a small molecule-dependent fashion in eukaryotic translation systems. However, because translation initiation in eukaryotes differs substantially from that in prokaryotes, many of the strategies used to create these ligand-regulated systems, such as the use of multiple aptamer sequences,98,100 are not generally applicable to producing riboswitches that function in prokaryotic systems.

Though most of the known riboswitches have been discovered in prokaryotes, there are relatively few examples of engineered riboswitches that control gene expression in these organisms. In 2004, Suess et al.89 and our group88 both reported riboswitches that controlled bacterial gene expression in a theophylline-dependent fashion. The Suess group created their riboswitch using a rational approach, in which the theophylline aptamer was cloned upstream of the RBS of a xylose repressor gene in B. subtilis.89 In the absence of theophylline, the RNA was proposed to adopt a structure that prevented access to the RBS.
Addition of theophylline was proposed to cause a slipping of the helix near the RBS that granted access to the ribosome and induced the translation of the xylose repressor. Expression levels of the xylose repressor protein were determined by monitoring the theophylline-dependent repression of a separate reporter gene under the control of two xyl operator sequences.89 In our earlier study,88 we attempted to create a theophylline-responsive riboswitch by cloning a theophylline-binding aptamer into several locations of the 5ʹ-UTR of a lacZ reporter gene. Based on the earlier work of de Smit and van Duin (Section 3.4.1 and Refs 48–51), we hypothesized that simply having a theophylline-binding aptamer near the RBS might be sufficient to reduce the rate of translational


initiation upon theophylline binding, as ligand binding would strongly stabilize the aptamer structure. To our surprise, we discovered that theophylline activated gene expression in a dose-dependent fashion. Although we were able to establish that the riboswitch acted post-transcriptionally and appeared to be under thermodynamic control, we were unable to determine the precise mechanism of activation.* Nevertheless, we were able to show that this riboswitch could be used to distinguish between two closely related compounds, which is important in metabolic engineering applications that require discriminating between substrates and products. In addition, the riboswitch could effect the ligand-dependent expression of different genes, including an antibiotic resistance gene that coupled cell growth to the presence of a nonmetabolite, which suggested that cell growth could eventually be tied to the product of a biotransformation. While early studies of engineered and synthetic riboswitches have been promising, for riboswitches to become a standard feature of the metabolic engineer’s toolbox, it will be critical to develop methods to rapidly produce riboswitches that respond to new compounds in a selective, tunable, and predictable fashion. In the following sections, we will describe techniques that our lab has developed to create riboswitches with improved performance characteristics, which have also resulted in a better understanding of how riboswitches function.

3.5 Principles and Protocols for Creating Synthetic Riboswitches

Fundamentally, a riboswitch couples the binding of a ligand by an aptamer to a change in gene expression. To create a riboswitch that can effectively regulate an engineered metabolic pathway, it is important to discover aptamers that selectively recognize a desired compound while discriminating against closely related compounds (e.g., precursors or downstream products).

3.5.1 Aptamer Selection

The fundamental principles of aptamer selection were reduced to practice in the early 1990s by the groups of Jack Szostak105 and Larry Gold,106 who demonstrated that RNA sequences that bind tightly and specifically to small-molecule targets could be discovered by subjecting libraries of random RNA sequences to iterative rounds of affinity chromatography and amplification. Over the past several years, these “in vitro selection” or “SELEX” methods have been refined in a variety of ways,107 including the introduction of counter-selection techniques that enable the isolation of aptamers that discriminate between related structures,108 and techniques based on allosteric ribozymes, which enable the selection of aptamers without affinity chromatography.109,110 While there are a variety of techniques to select aptamers, all benefit from the remarkable ability of RNA to recognize small molecules, the ability of RNA libraries to cover complete areas of sequence space,† and the ability of PCR to amplify rare sequences from large populations. This has allowed the discovery of aptamers that recognize a wide variety of compounds, including amino acids, alkaloids, and polyketides, and discriminate on the basis of subtle modifications in structure or stereochemistry.107

A variety of methods for aptamer selection are now available.105–107,109,110 Some of the best-established selection methods use affinity chromatography to partition binding sequences from nonbinders.106,108 In these selections, the target is typically immobilized to a solid support through either covalent or noncovalent interactions. Libraries of random RNA sequences are passed over the support to discover sequences that bind to the target. Extensive washing can eliminate nonspecific binding sequences, and washing with structurally related ligands can be used to eliminate nonselective binders.
Repeated

* We thus considered this riboswitch to be “synthetic” rather than “engineered”, as engineering typically implies the use of design principles that result in a desired outcome, even if the specific details remain murky.
† A fully randomized library of 25 nucleotides (N25 = 4^25 ≈ 10^15 possibilities) can be sampled in a single experiment.


rounds of amplification and selection can yield tight binding aptamers that often display impressive selectivities.108 One of the challenges of using affinity chromatography-based selection methods to isolate aptamers that recognize small molecules is that the target molecule must be attached to the support. Thus, targets must often be decorated with functionality that may not only require additional synthesis, but may also eliminate some potential binding modes to the RNA. An alternative method for selecting aptamers is through allosteric selection,109,110 where a randomized RNA library is inserted into a region of a self-cleaving RNA enzyme, such as the hammerhead ribozyme. Using iterative rounds of gel-based selection, where sequences that cleave in the presence of a desired ligand are amplified, and those that cleave in its absence are discarded, it is possible to discover new ligand-dependent aptamers. An advantage of allosteric selection is that the ligands are used in solution in an unmodified form, which eliminates the need for attaching them to a solid support.* Detailed protocols for allosteric selection have been published recently.111
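The reason that repeated rounds of selection and amplification can recover very rare binders can be seen in a toy calculation. The Python sketch below assumes that each round retains binders and nonbinders with fixed, different probabilities and that amplification restores the pool size without changing its composition; the retention probabilities and the starting fraction are hypothetical numbers chosen for illustration, not values from any published selection.

```python
def enrich(fraction: float, keep_binder: float, keep_nonbinder: float) -> float:
    """Binder fraction of the pool after one round of partitioning."""
    kept_binders = fraction * keep_binder
    kept_nonbinders = (1.0 - fraction) * keep_nonbinder
    return kept_binders / (kept_binders + kept_nonbinders)


def rounds_to_majority(start_fraction: float, keep_binder: float,
                       keep_nonbinder: float, max_rounds: int = 50):
    """Number of rounds until binders exceed half of the pool (None if never)."""
    fraction = start_fraction
    for round_number in range(1, max_rounds + 1):
        fraction = enrich(fraction, keep_binder, keep_nonbinder)
        if fraction > 0.5:
            return round_number
    return None


# Hypothetical values: binders start at 1 in 10^13 sequences; each round
# retains 50% of binders and 0.5% of nonbinders (100-fold enrichment).
print(rounds_to_majority(1e-13, keep_binder=0.5, keep_nonbinder=0.005))
```

Under these assumptions the odds of finding a binder improve by a constant factor each round, so even sequences that are vanishingly rare in the starting library can dominate the pool after a modest number of rounds.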

3.5.2 Design Considerations for Creating Synthetic Riboswitches

Though riboswitches likely represent a very early form of genetic control, they are a relatively recent addition to the genetic engineer’s toolbox. Thus, it is too early to evaluate the promise of any single strategy for creating new riboswitches, and certainly new strategies (both rational and evolution-based) will emerge in the coming years as our understanding of riboswitch mechanisms improves. That said, it is worth briefly discussing the history of engineered riboswitches to see where we are and where metabolic engineers may want to go.

As described in Section 3.4.2.2, most synthetic riboswitches have been created in eukaryotes by cloning one or more copies of an aptamer in the 5ʹ-UTR of a gene. Most of these switches are thought to repress protein translation by introducing an additional barrier to translation upon ligand binding, which promotes the dissociation of the ribosome and the termination of protein synthesis. In prokaryotes, there are fewer examples of synthetic riboswitches, and the two best known examples88,89 respond to the same molecule (theophylline), but were created using different strategies and act via different mechanisms. What is clear, however, is that all of these riboswitches were created from pre-existing aptamers that were selected for another purpose. Thus, on the one hand, all of these examples have benefited from the availability of these aptamer structures and the extensive body of knowledge about their structures and binding mechanisms. On the other hand, all of these studies have been constrained to the same set of building blocks that were, in most cases, selected for their ability to bind ligands tightly and specifically, and not to control gene expression. To control new metabolic pathways using synthetic riboswitches, it is clear that aptamers that recognize the various metabolites will have to be selected.
As discussed in Section 3.5.1, there are several ways of selecting aptamers, and all begin with extremely large combinatorial libraries that are typically winnowed down to a manageable number of clones that can be studied individually. Historically, aptamers have been selected primarily for their ability to bind tightly; however, riboswitches must not only bind ligands, but also transmit signals, and it is not at all clear whether the tightest binding aptamers will make the best riboswitches. Given these uncertainties and the lack of general riboswitch design principles, we currently favor high-throughput combinatorial screening approaches to riboswitch discovery. We have found that combinatorial approaches not only readily identify robust-performing synthetic riboswitches, but also that the sequences of these new constructs provide significant insights into riboswitch function that we hope will provide general principles for riboswitch design.

* Another potential advantage of allosteric selection is that aptamers produced in this manner have already been selected for their ability to participate in a context where ligand binding (thermodynamics) and switching (kinetics) are both important. Thus, it is tempting to speculate that these aptamers may be particularly suitable for incorporation into riboswitches, where ligand-mediated allostery is critical for function, though this notion remains to be tested.


3.5.3 Developing a High Throughput Screening Method—General Considerations

Discovering new riboswitches or promoters from large libraries requires an efficient high throughput screening or selection method. Since engineered promoters and riboswitches ultimately control gene expression, a wide variety of well-characterized reporter gene systems are available. In vivo selections, in which cell survival depends on the controlled expression of an essential gene (e.g. antibiotic resistance, production of an essential metabolite), are straightforward to perform and extremely high-throughput.* However, selections can be relatively insensitive to small changes in activity and operate in a narrow dynamic range. Fluorescence activated cell sorting (FACS), in which the activity of a fluorescent reporter gene (e.g. GFP) is determined on a cell-by-cell basis, is also extremely high throughput, and offers improvements in sensitivity and dynamic range over selections. However, FACS systems require significant investments in equipment and maintenance. Spectrophotometric assays performed in microtiter plates offer high sensitivity and broad dynamic range, but are more modest in throughput than genetic selection or FACS. To screen for the activity of new promoters (Section 3.3), we generally rely upon histochemical assays performed on agar plates, such as those based on the hydrolysis of 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal) by the enzyme β-galactosidase. This approach is precise, sensitive, relatively high in throughput (~10,000 colonies/plate) and very affordable. We generally employ lower throughput microplate screens to quantitatively evaluate candidate clones.

3.5.4 Creating Synthetic Riboswitches Using High Throughput Screening

Screening for new riboswitches adds additional complexity because it is important to assay potential switches in both the presence and absence of a desired ligand. Ideally, a riboswitch will display an extremely low level of gene expression in the absence of the ligand, and display a large, reproducible increase in expression upon addition of the ligand.† To discover new synthetic riboswitches with large dynamic ranges, we developed a two-stage semi-automated screen in which diverse libraries of sequences are first assayed for their abilities to repress protein expression in the absence of the ligand.112 Candidate sequences from the first screen are then assayed for their ability to activate protein expression in the presence of the desired ligand.

3.5.4.1 Creating Libraries of Candidate Riboswitches

Controlling the initiation of translation is likely the most straightforward mechanism of engineering riboswitch function because it only requires the formation of structures that affect ribosome binding, and it does not require the engineering of a terminator sequence. As such, we typically adopt a strategy in which an aptamer produced by in vitro selection (Section 3.5.1) is cloned at a location four to ten bases 5ʹ- to the RBS of a reporter gene.‡112 We then create libraries in which the region between the aptamer and RBS is randomized using PCR. An advantage of working with RNA libraries is that the number of potential variants is relatively small (N4 = 4^4 = 256; N8 = 4^8 = 65,536) and nearly all sequence space can be sampled using a small number of screens. Furthermore, candidate riboswitches identified by screening can be sequenced and their folds predicted using established

* Often, selections are limited only by the transformation efficiency of the organism, which for E. coli approaches 10^9 transformants per experiment.
† Although for clarity we will refer only to screens for ligand-dependent activation of expression, one can just as easily screen for riboswitches that decrease gene expression in a ligand-dependent fashion.
‡ We typically use β-galactosidase, though we have successfully used GFP, dsRED, β-glucuronidase, and chloramphenicol acetyl transferase as reporters.

Engineering DNA and RNA Regulatory Regions

3-15

methods113 to generate mechanistic hypotheses112 that may also suggest rational strategies for further improvements. 3.5.4.2 Sample Screening Protocol Here we describe a sample screening protocol that was used to identify synthetic riboswitches that activate protein translation in the presence of theophylline.112 However, we have used this protocol with appropriate modifications to discover riboswitches that respond to a variety of ligands beginning with other aptamers and using other reporter genes (Gallivan et al., unpublished). The number of transformants needed in the initial screening will depend on the size of the library; using a β-galactosidase reporter gene, we typically screen from 1,000 colonies (N4 library) to 100,000 colonies (N8 library) on agar plates to identify candidates.
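The relationship between library size and the number of colonies screened can be sketched as follows. This is a minimal illustration rather than part of the published protocol: it assumes every variant is equally likely to appear in a colony, which real PCR-randomized libraries only approximate, and the function names are hypothetical.

```python
def library_size(n_random: int) -> int:
    """Distinct sequences for n_random fully randomized positions (A, C, G, U)."""
    return 4 ** n_random


def expected_coverage(n_random: int, colonies: int) -> float:
    """Expected fraction of variants sampled at least once.

    Assumption: each colony carries a variant drawn uniformly at random,
    which is an idealization of a PCR-randomized library.
    """
    size = library_size(n_random)
    return 1.0 - (1.0 - 1.0 / size) ** colonies


if __name__ == "__main__":
    # Colony counts taken from the text: 1,000 for N4 and 100,000 for N8.
    for n, colonies in [(4, 1_000), (8, 100_000)]:
        print(f"N{n}: {library_size(n):,} variants; "
              f"expected coverage at {colonies:,} colonies: "
              f"{expected_coverage(n, colonies):.1%}")
```

Under the uniform-sampling assumption, 1,000 colonies would sample roughly 98% of an N4 library and 100,000 colonies roughly 78% of an N8 library, which is consistent with the statement that most of sequence space can be covered in a small number of screens.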

1. Transform E. coli with the library of constructs by electroporation and grow overnight on LB-agar plates supplemented with antibiotic and X-gal.
2. Pick the 96 whitest colonies from the plate and inoculate them in a 96-well plate with each well containing 200 μL of LB supplemented with antibiotic. (This assay can be performed repeatedly using multiple 96-well plates if there are a large number of white colonies.)
3. Incubate the 96-well plate overnight with shaking (37°C, 180 rpm).
4. The following day, inoculate four 96-well plates (two sets of two) with 5 μL of the overnight culture. Each well of the first set of plates should contain 200 μL of LB supplemented with an appropriate antibiotic. Wells in the second set of plates should contain 200 μL of LB supplemented with both antibiotic and ligand (500 μM).
5. Incubate the four 96-well plates for approximately 2.5 hours at 37°C with shaking (210 rpm) until an appropriate OD600 is reached (0.3–0.5 when corrected to a 1-cm path length cuvette). Record the OD600 for each culture.
6. Lyse cultures with 21 μL of Pop Culture® solution from Novagen (10:1, Pop Culture : lysozyme (4 U/mL)) and mix by pipetting up and down. Allow cultures to rest for 5 min at room temperature to ensure complete lysis.
7. In a new 96-well plate, combine 15 μL of lysed culture with 132.25 μL of Z-buffer (60 mM Na2HPO4, 40 mM NaH2PO4, 10 mM KCl, 1 mM MgSO4, 50 mM β-mercaptoethanol, pH = 7.0).
8. Add 29 μL of o-nitrophenylgalactoside (ONPG, 4 mg/mL in 100 mM NaH2PO4) to each well containing lysed culture and buffer. Note the time of substrate addition. Allow ONPG to hydrolyze for approximately 20 min or until a faint yellow color is observed.
9. Quench the reaction by adding 75 μL of Na2CO3 (1 M) to each reaction. Record the hydrolysis time and the OD420 for each well measured.
10. Calculate Miller Units for each well using the following formula: Miller Units = OD420/(OD600 × hydrolysis time × relative volume of cell lysate).
11. Compare the ratios of the Miller Units for cultures grown in the presence of ligand to those grown in the absence of ligand (the “activation ratio”) to identify functional switches.
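Steps 10 and 11 can be sketched in a few lines of code. This is an illustrative sketch only: it implements the formula exactly as written above (without the scaling factor used in some other Miller assay formulations), the function and parameter names are hypothetical, and the well readings in the example are invented placeholder values, not data from the original work.

```python
def miller_units(od420: float, od600: float,
                 hydrolysis_min: float, rel_lysate_volume: float) -> float:
    """Miller Units = OD420 / (OD600 x hydrolysis time x relative lysate volume)."""
    denominator = od600 * hydrolysis_min * rel_lysate_volume
    if denominator <= 0:
        raise ValueError("OD600, hydrolysis time, and lysate volume must be positive")
    return od420 / denominator


def activation_ratio(units_with_ligand: float, units_without_ligand: float) -> float:
    """Ratio of Miller Units with ligand to Miller Units without ligand."""
    if units_without_ligand <= 0:
        raise ValueError("Miller Units without ligand must be positive")
    return units_with_ligand / units_without_ligand


if __name__ == "__main__":
    # Placeholder readings for one clone (assumed values, for illustration only).
    with_ligand = miller_units(od420=0.30, od600=0.4,
                               hydrolysis_min=20, rel_lysate_volume=0.1)
    without_ligand = miller_units(od420=0.10, od600=0.4,
                                  hydrolysis_min=20, rel_lysate_volume=0.1)
    print(f"Activation ratio: {activation_ratio(with_ligand, without_ligand):.2f}")
```

Keeping the two calculations as separate functions makes it straightforward to apply them well by well across paired plates.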

3.5.4.2.1 Notes on Screening Protocol

We identify candidate switches as clones that show an activation ratio >2.0 in two separate determinations.112 Because ratios of signal to background can be misleading when both the signal and the background are low, we eliminate candidates that do not display a minimum activity in the presence of ligand (an OD420 ≥ 0.04). As a final check, we visually inspect the data for aberrations, such as cultures that grew especially slowly or quickly (as reflected in OD600) or cultures with dramatically different results between the two plates. Candidate switches can be subcultured from the original overnight culture and assayed individually following the protocol of Jain and Belasco.114
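The selection criteria in these notes can be expressed as a simple filter. The sketch below is a hedged illustration: the class and function names are hypothetical, and it assumes that the minimum-activity threshold (OD420 ≥ 0.04 with ligand) is applied to each determination, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Determination:
    """One measurement of a clone (field names are illustrative)."""
    activation_ratio: float   # Miller Units with ligand / without ligand
    od420_with_ligand: float  # raw OD420 of the well grown with ligand


def is_candidate(determinations: Sequence[Determination],
                 min_ratio: float = 2.0,
                 min_od420: float = 0.04) -> bool:
    """Return True if a clone passes the screening thresholds.

    Requires at least two determinations, each with an activation ratio
    above min_ratio and an OD420 with ligand of at least min_od420
    (assumption: the OD420 threshold applies to every determination).
    """
    if len(determinations) < 2:
        return False
    return all(d.activation_ratio > min_ratio and d.od420_with_ligand >= min_od420
               for d in determinations)


if __name__ == "__main__":
    # Placeholder values for a single clone, for illustration only.
    clone = [Determination(3.1, 0.12), Determination(2.4, 0.09)]
    print("candidate" if is_candidate(clone) else "rejected")
```

A filter like this does not replace the visual inspection for aberrant growth described above; it only automates the two numerical thresholds.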


3.5.4.3 Individual Assay for Riboswitch Activity

1. Using 5 μL of selected culture, inoculate 5 mL of LB supplemented with the appropriate antibiotic. Incubate overnight at 37°C with shaking.
2. The following day, using 5 μL of fresh overnight culture, inoculate two new tubes. One tube should contain 1.5 mL of LB supplemented with the appropriate antibiotic. The second tube should contain LB supplemented with the appropriate antibiotic plus ligand.
3. Incubate cultures at 37°C with shaking until an OD600 of 0.3–0.5 is reached. Place tubes on ice for 20 min. Measure OD600.
4. To a separate glass culture tube, add 200 μL of culture to 800 μL of Z-buffer (60 mM Na2HPO4, 40 mM NaH2PO4, 10 mM KCl, 1 mM MgSO4, 50 mM β-mercaptoethanol, pH = 7.0), 20 μL of chloroform, and 10 μL of 0.1% sodium dodecyl sulfate. Vortex for 15 seconds. Incubate tubes in a 28°C water bath for 5 min.
5. Add 200 μL of ONPG and allow the substrate to hydrolyze until a faint yellow color is observed.
6. Quench the reaction with 500 μL of Na2CO3 (1 M). Measure the OD420 of the reaction mixture and calculate Miller Units using the above equation.

3.6 Conclusions

The general goal of metabolic engineering is to modify cells to optimize the biosynthesis of desired molecules. The heterologous expression of metabolic enzymes must be coordinated in order to avoid potentially toxic bottlenecks. Gene expression must be mediated by precursor and product molecules, or by other regulators. Nature manages biological complexity by generating sequence diversity and selecting haplotypes associated with the highest organismal fitness. We emulate natural selection by (1) randomizing promoters and riboswitch linkers, (2) cloning the resulting libraries into plasmids or integrating them into bacterial chromosomes, and (3) identifying clones that produce appropriate levels of a coupled reporter gene. These techniques can easily be generalized, and should allow the metabolic engineer to focus on the problem of designing novel pathways.

References

1. Koffas, M. and Stephanopoulos, G. Strain improvement by metabolic engineering: lysine production as a case study for systems biology. Curr. Opin. Biotechnol., 2005, 16, 361–66.
2. Parikh, M. R., Greene, D. N., Woods, K. K., and Matsumura, I. Directed evolution of RuBisCO hypermorphs through genetic selection in engineered E. coli. Protein Eng. Des. Sel., 2006, 19, 113–19.
3. Schwimmer, L. J., Rohatgi, P., Azizi, B., Seley, K. L., and Doyle, D. F. Creation and discovery of ligand-receptor pairs for transcriptional control with small molecules. Proc. Natl. Acad. Sci. USA, 2004, 101, 14707–12.
4. de Boer, H. A., Comstock, L. J., and Vasser, M. The tac promoter: a functional hybrid derived from the trp and lac promoters. Proc. Natl. Acad. Sci. USA, 1983, 80, 21–25.
5. Harley, C. B. and Reynolds, R. P. Analysis of E. coli promoter sequences. Nucleic Acids Res., 1987, 15, 2343–61.
6. Rosenberg, M. and Court, D. Regulatory sequences involved in the promotion and termination of RNA transcription. Ann. Rev. Genet., 1979, 13, 319–53.
7. Youderian, P., Bouvier, S., and Susskind, M. M. Sequence determinants of promoter activity. Cell, 1982, 30, 843–53.
8. Horwitz, M. S. and Loeb, L. A. Promoters selected from random DNA sequences. Proc. Natl. Acad. Sci. USA, 1986, 83, 7405–09.
9. Oliphant, A. R. and Struhl, K. Defining the consensus sequences of E. coli promoter elements by random selection. Nucleic Acids Res., 1988, 16, 7673–83.


10. Matsumura, I. and Rowe, L. A. Whole plasmid mutagenic PCR for directed protein evolution. Biomol. Eng., 2005, 22, 73–79.
11. Matsumura, I., Olsen, M. J., and Ellington, A. D. Optimization of heterologous gene expression for in vitro evolution. Biotechniques, 2001, 30, 474–76.
12. Miksch, G., Bettenworth, F., Friehs, K., and Flaschel, E. The sequence upstream of the -10 consensus sequence modulates the strength and induction time of stationary-phase promoters in Escherichia coli. Appl. Microbiol. Biotechnol., 2005, 69, 312–20.
13. Miksch, G., Bettenworth, F., Friehs, K., Flaschel, E., Saalbach, A., Twellmann, T., and Nattkemper, T. W. Libraries of synthetic stationary-phase and stress promoters as a tool for fine-tuning of expression of recombinant proteins in Escherichia coli. J. Biotechnol., 2005, 120, 25–37.
14. Hammer, K., Mijakovic, I., and Jensen, P. R. Synthetic promoter libraries – tuning of gene expression. Trends Biotechnol., 2006, 24, 53–55.
15. Jensen, P. R. and Hammer, K. Artificial promoters for metabolic optimization. Biotechnol. Bioeng., 1998, 58, 191–95.
16. Solem, C. and Jensen, P. R. Modulation of gene expression made easy. Appl. Environ. Microbiol., 2002, 68, 2397–403.
17. Jensen, P. R. and Hammer, K. The sequence of spacers between the consensus sequences modulates the strength of prokaryotic promoters. Appl. Environ. Microbiol., 1998, 64, 82–87.
18. Fischer, C. R., Alper, H., Nevoigt, E., Jensen, K. L., and Stephanopoulos, G. Response to Hammer et al.: Tuning genetic control – importance of thorough promoter characterization versus generating promoter diversity. Trends Biotechnol., 2006, 24, 55–56.
19. Alper, H., Fischer, C., Nevoigt, E., and Stephanopoulos, G. Tuning genetic control through promoter engineering. Proc. Natl. Acad. Sci. USA, 2005, 102, 12678–83.
20. Jorgensen, C. M., Hammer, K., Jensen, P. R., and Martinussen, J. Expression of the pyrG gene determines the pool sizes of CTP and dCTP in Lactococcus lactis. Eur. J. Biochem., 2004, 271, 2438–45.
21. Keasling, J. D. Gene-expression tools for the metabolic engineering of bacteria. Trends Biotechnol., 1999, 17, 452–60.
22. Vieira, J. and Messing, J. The pUC plasmids, an M13mp7-derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene, 1982, 19, 259–68.
23. Herring, C. D., Glasner, J. D., and Blattner, F. R. Gene replacement without selection: regulated suppression of amber mutations in Escherichia coli. Gene, 2003, 311, 153–63.
24. Metzgar, D., Bacher, J. M., Pezo, V., Reader, J., Doring, V., Schimmel, P., Marliere, P., and de Crecy-Lagard, V. Acinetobacter sp. ADP1: an ideal model organism for genetic analysis and genome engineering. Nucleic Acids Res., 2004, 32, 5780–90.
25. Cheng, S. Longer PCR amplifications. In PCR Strategies, Innis, M. A., Gelfand, D. H., and Sninsky, J. J., Eds. Academic Press: San Diego, CA, 1995, 313–24.
26. Cheng, S., Chang, S. Y., Gravitt, P., and Respess, R. Long PCR. Nature, 1994, 369, 684–85.
27. Cheng, S., Fockler, C., Barnes, W. M., and Higuchi, R. Effective amplification of long targets from cloned inserts and human genomic DNA. Proc. Natl. Acad. Sci. USA, 1994, 91, 5695–99.
28. Stewart, A. C., Gravitt, P. E., Cheng, S., and Wheeler, C. M. Generation of entire human papillomavirus genomes by long PCR: frequency of errors produced during amplification. Genome Res., 1995, 5, 79–88.
29. Shafikhani, S. Factors affecting PCR-mediated recombination. Environ. Microbiol., 2002, 4, 482–86.
30. Wang, Y., Prosen, D. E., Mei, L., Sullivan, J. C., Finney, M., and Vander Horn, P. B. A novel strategy to engineer DNA polymerases for enhanced processivity and improved performance in vitro. Nucleic Acids Res., 2004, 32, 1197–207.
31. D’Aquila, R. T., Bechtel, L. J., Videler, J. A., Eron, J. J., Gorczyca, P., and Kaplan, J. C. Maximizing sensitivity and specificity of PCR by pre-amplification heating. Nucleic Acids Res., 1991, 19, 3749.
32. Wybranietz, W. A. and Lauer, U. Distinct combination of purification methods dramatically improves cohesive-end subcloning of PCR products. Biotechniques, 1998, 24, 578–80.


33. Thomas, M. R. Simple, effective cleanup of DNA ligation reactions prior to electro-transformation of E. coli. Biotechniques, 1994, 16, 988–90.
34. Inoue, H., Nojima, H., and Okayama, H. High efficiency transformation of Escherichia coli with plasmids. Gene, 1990, 96, 23–28.
35. Dower, W. J., Miller, J. F., and Ragsdale, C. W. High efficiency transformation of E. coli by high voltage electroporation. Nucleic Acids Res., 1988, 16, 6127–45.
36. Muyrers, J. P., Zhang, Y., and Stewart, A. F. Techniques: Recombinogenic engineering – new options for cloning and manipulating DNA. Trends Biochem. Sci., 2001, 26, 325–31.
37. Zhang, Y., Muyrers, J. P., Testa, G., and Stewart, A. F. DNA cloning by homologous recombination in Escherichia coli. Nat. Biotechnol., 2000, 18, 1314–17.
38. Nakayama, M. and Ohara, O. Improvement of recombination efficiency by mutation of red proteins. Biotechniques, 2005, 38, 917–24.
39. Palmen, R. and Hellingwerf, K. J. Uptake and processing of DNA by Acinetobacter calcoaceticus – a review. Gene, 1997, 192, 179–90.
40. Buchan, A. and Ornston, L. N. When coupled to natural transformation in Acinetobacter sp. strain ADP1, PCR mutagenesis is made less random by mismatch repair. Appl. Environ. Microbiol., 2005, 71, 7610–12.
41. Endy, D. Foundations for engineering biology. Nature, 2005, 438, 449–53.
42. Temesgen, B. and Eschrich, K. Simplified method for ligase-free cloning of PCR products. Biotechniques, 1996, 21, 828–832.
43. Muller, P. Y., Studer, E., and Miserez, A. R. Molecular Biocomputing Suite: a word processor add-in for the analysis and manipulation of nucleic acid and protein sequence data. Biotechniques, 2001, 31, 1306, 1308, 1310–13.
44. Barbe, V., Vallenet, D., Fonknechten, N., Kreimeyer, A., Oztas, S., Labarre, L., Cruveiller, S., Robert, C., Duprat, S., Wincker, P., Ornston, L. N., Weissenbach, J., Marliere, P., Cohen, G. N., and Medigue, C. Unique features revealed by the genome sequence of Acinetobacter sp. ADP1, a versatile and naturally transformation competent bacterium. Nucleic Acids Res., 2004, 32, 5766–79.
45. Suggs, S. V., Hirose, T., Miyake, T., Kawashima, E. H., Johnson, M. J., Itakura, K., and Wallace, R. B. In Developmental Biology Using Purified Genes (ICN-UCLA Symposia on Molecular and Cellular Biology), Brown, D. D. and Fox, C. F., Eds. Academic Press: New York, NY, 1981, Vol. 23, 683–93.
46. He, L. and Hannon, G. J. MicroRNAs: Small RNAs with a big role in gene regulation. Nat. Rev. Genet., 2004, 5, 522–31.
47. Matzke, M. A. and Birchler, J. A. RNAi-mediated pathways in the nucleus. Nat. Rev. Genet., 2005, 6, 24–35.
48. de Smit, M. H. and van Duin, J. Secondary structure of the ribosome binding site determines translational efficiency: A quantitative analysis. Proc. Natl. Acad. Sci. USA, 1990, 87, 7668–72.
49. de Smit, M. H. and van Duin, J. Control of translation by mRNA secondary structure in Escherichia coli: A quantitative analysis of literature data. J. Mol. Biol., 1994, 244, 144–50.
50. Klovins, J., Tsareva, N. A., de Smit, M. H., Berzins, V., and van Duin, J. Rapid evolution of translational control mechanisms in RNA genomes. J. Mol. Biol., 1997, 265, 372–84.
51. de Smit, M. H. and van Duin, J. Translational initiation on structured messengers: Another role for the Shine-Dalgarno interaction. J. Mol. Biol., 1994, 235, 173–84.
52. de Smit, M. H. and van Duin, J. Translational standby sites: How ribosomes may deal with the rapid folding kinetics of mRNA. J. Mol. Biol., 2003, 331, 737–43.
53. Narberhaus, F., Kaser, R., Nocker, A., and Hennecke, H. A novel DNA element that controls bacterial heat shock gene expression. Mol. Microbiol., 1998, 28, 315–23.
54. Mathews, D. H., Sabina, J., Zuker, M., and Turner, D. H. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol., 1999, 288, 911–940.
55. Morita, M. T., Tanaka, Y., Kodama, T. S., Kyogoku, Y., Yanagi, H., and Yura, T. Translational induction of heat shock transcription factor sigma32: evidence for a built-in RNA thermosensor. Genes Dev., 1999, 13, 655–65.


56. Johansson, J., Mandin, P., Renzoni, A., Chiaruttini, C., Springer, M., and Cossart, P. An RNA thermosensor controls expression of virulence genes in Listeria monocytogenes. Cell, 2002, 110, 551–61.
57. Waldminghaus, T., Fippinger, A., Alfsmann, J., and Narberhaus, F. RNA thermometers are common in alpha- and gamma-proteobacteria. Biol. Chem., 2005, 386, 1279–86.
58. Chowdhury, S., Maris, C., Allain, F. H., and Narberhaus, F. Molecular basis for temperature sensing by an RNA thermometer. EMBO J., 2006, 25, 2487–97.
59. Narberhaus, F., Waldminghaus, T., and Chowdhury, S. RNA thermometers. FEMS Microbiol. Rev., 2006, 30, 3–16.
60. Smolke, C. D., Carrier, T. A., and Keasling, J. D. Coordinated, differential expression of two genes through directed mRNA cleavage and stabilization by secondary structures. Appl. Environ. Microbiol., 2000, 66, 5399–5405.
61. Smolke, C. D., Khlebnikov, A., and Keasling, J. D. Effects of transcription induction homogeneity and transcript stability on expression of two genes in a constructed operon. Appl. Environ. Microbiol., 2001, 57, 689–96.
62. Smolke, C. D., Martin, V. J. J., and Keasling, J. D. Controlling the metabolic flux through the carotenoid pathway using directed mRNA processing and stabilization. Metab. Eng., 2001, 3, 313–321.
63. Smolke, C. D. and Keasling, J. D. Effect of gene location, mRNA secondary structures, and RNase sites on expression of two genes in an engineered operon. Biotechnol. Bioeng., 2002, 80, 762–76.
64. Smolke, C. D. and Keasling, J. D. Effect of copy number and mRNA processing and stabilization on transcript and protein levels from an engineered dual-gene operon. Biotechnol. Bioeng., 2002, 78, 412–24.
65. Isaacs, F. J., Dwyer, D. J., Ding, C., Pervouchine, D. D., Cantor, C. R., and Collins, J. J. Engineered riboregulators enable post-transcriptional control of gene expression. Nat. Biotechnol., 2004, 22, 841–847.
66. Bayer, T. S. and Smolke, C. D. Programmable ligand-controlled riboregulators of eukaryotic gene expression. Nat. Biotechnol., 2005, 23, 337–343.
67. Isaacs, F. J., Dwyer, D. J., and Collins, J. J. RNA synthetic biology. Nat. Biotechnol., 2006, 24, 545–54.
68. Saha, S., Ansari, A. Z., Jarrell, K. A., and Ptashne, M. RNA sequences that work as transcriptional activating regions. Nucleic Acids Res., 2003, 31, 1565–70.
69. Buskirk, A. R., Landrigan, A., and Liu, D. R. Engineering a ligand-dependent RNA transcriptional activator. Chem. Biol., 2004, 11, 1157–1163.
70. Winkler, W. C., Cohen-Chalamish, S., and Breaker, R. R. An mRNA structure that controls gene expression by binding FMN. Proc. Natl. Acad. Sci. USA, 2002, 99, 15908–13.
71. Epshtein, V., Mironov, A. S., and Nudler, E. The riboswitch-mediated control of sulfur metabolism in bacteria. Proc. Natl. Acad. Sci. USA, 2003, 100, 5052–56.
72. Mandal, M., Boese, B., Barrick, J. E., Winkler, W. C., and Breaker, R. R. Riboswitches control fundamental biochemical pathways in Bacillus subtilis and other bacteria. Cell, 2003, 113, 577–86.
73. Rodionov, D. A., Vitreschak, A. G., Mironov, A. A., and Gelfand, M. S. Regulation of lysine biosynthesis and transport genes in bacteria: yet another RNA riboswitch? Nucleic Acids Res., 2003, 31, 6748–57.
74. Sudarsan, N., Barrick, J. E., and Breaker, R. R. Metabolite-binding RNA domains are present in the genes of eukaryotes. RNA, 2003, 9, 644–47.
75. Winkler, W. C. and Breaker, R. R. Genetic control by metabolite-binding riboswitches. ChemBioChem, 2003, 4, 1024–32.
76. Winkler, W. C., Nahvi, A., Sudarsan, N., Barrick, J. E., and Breaker, R. R. An mRNA structure that controls gene expression by binding S-adenosylmethionine. Nat. Struct. Biol., 2003, 10, 701–7.
77. Breaker, R. R. Complex riboswitches. Mol. Biol. Cell, 2004, 15, 357a–58a.
78. Mandal, M. and Breaker, R. R. Gene regulation in riboswitches. Nat. Rev. Mol. Cell. Biol., 2004, 5, 451–463.
79. Nudler, E. and Mironov, A. S. The riboswitch control of bacterial metabolism. Trends Biochem. Sci., 2004, 29, 11–17.
80. Winkler, W., Nahvi, A., and Breaker, R. R. Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression. Nature, 2002, 419, 952–956.


81. Nahvi, A., Sudarsan, N., Ebert, M. S., Zou, X., Brown, K. L., and Breaker, R. R. Genetic control by a metabolite binding mRNA. Chem. Biol., 2002, 9, 1043–1049.
82. Serganov, A., Yuan, Y. R., Pikovskaya, O., Polonskaia, A., Malinina, L., Phan, A. T., Hobartner, C., Micura, R., Breaker, R. R., and Patel, D. J. Structural basis for discriminative regulation of gene expression by adenine- and guanine-sensing mRNAs. Chem. Biol., 2004, 11, 1729–41.
83. Batey, R. T., Gilbert, S. D., and Montange, R. K. Structure of a natural guanine-responsive riboswitch complexed with the metabolite hypoxanthine. Nature, 2004, 432, 411–15.
84. Mandal, M. and Breaker, R. R. Adenine riboswitches and gene activation by disruption of a transcription terminator. Nat. Struct. Mol. Biol., 2004, 11, 29–35.
85. Winkler, W. C., Nahvi, A., Roth, A., Collins, J. A., and Breaker, R. R. Control of gene expression by a natural metabolite-responsive ribozyme. Nature, 2004, 428, 281–86.
86. Kim, D.-S., Gusti, V., Pillai, S. G., and Gaur, R. K. An artificial riboswitch for controlling pre-mRNA splicing. RNA, 2005, 11, 1667–77.
87. Thore, S., Leibundgut, M., and Ban, N. N. Structure of the eukaryotic thiamine pyrophosphate riboswitch with its regulatory ligand. Science, 2006, 312, 1208–11.
88. Desai, S. K. and Gallivan, J. P. Genetic screens and selections for small molecules based on a synthetic riboswitch that activates protein translation. J. Am. Chem. Soc., 2004, 126, 13247–54.
89. Suess, B., Fink, B., Berens, C., Stentz, R., and Hillen, W. A theophylline responsive riboswitch based on helix slipping controls gene expression in vivo. Nucleic Acids Res., 2004, 32, 1610–14.
90. Corbino, K. A., Barrick, J. E., Lim, J., Welz, R., Tucker, B. J., Puskarz, I., Mandal, M., Rudnick, N. D., and Breaker, R. R. Evidence for a second class of S-adenosylmethionine riboswitches and other regulatory RNA motifs in alpha-proteobacteria. Genome Biol., 2005, 6, R70.
91. Fuchs, R. T., Grundy, F. J., and Henkin, T. M. The S-MK box is a new SAM-binding RNA for translational regulation of SAM synthetase. Nat. Struct. Mol. Biol., 2006, 13, 226–33.
92. McDaniel, B. A., Grundy, F. J., Kurlekar, V. P., Tomsic, J., and Henkin, T. M. Identification of a mutation in the Bacillus subtilis S-adenosylmethionine synthetase gene that results in derepression of S-box gene expression. J. Bacteriol., 2006, 188, 3674–81.
93. Montange, R. K. and Batey, R. T. Structure of the S-adenosylmethionine riboswitch regulatory mRNA element. Nature, 2006, 441, 1172–75.
94. Sudarsan, N., Wickiser, J. K., Nakamura, S., Ebert, M. S., and Breaker, R. R. An mRNA structure in bacteria that controls gene expression by binding lysine. Genes Dev., 2003, 17, 2688–97.
95. Wickiser, J. K., Cheah, M. T., Breaker, R. R., and Crothers, D. M. The kinetics of ligand binding by an adenine-sensing riboswitch. Biochemistry, 2005, 44, 13404–14.
96. McCarthy, T. J., Plog, M. A., Floy, S. A., Jansen, J. A., Soukup, J. K., and Soukup, G. A. Ligand requirements for glmS ribozyme self-cleavage. Chem. Biol., 2005, 12, 1221–26.
97. Kubodera, T., Watanabe, M., Yoshiuchi, K., Yamashita, N., Nishimura, A., Nakai, S., Gomi, K., and Hanamoto, H. Thiamine-regulated gene expression of Aspergillus oryzae thiA requires splicing of the intron containing a riboswitch-like domain in the 5’-UTR. FEBS Lett., 2003, 555, 516–20.
98. Werstuck, G. and Green, M. R. Controlling gene expression in living cells through small molecule-RNA interactions. Science, 1998, 282, 296–98.
99. Grate, D. and Wilson, C. Inducible regulation of the S. cerevisiae cell cycle mediated by an RNA aptamer-ligand complex. Bioorg. Med. Chem., 2001, 9, 2565–2570.
100. Harvey, I., Garneau, P., and Pelletier, J. Inhibition of translation by RNA-small molecule interactions. RNA, 2002, 8, 452–63.
101. Hanson, S., Berthelot, K., Fink, B., McCarthy, J. E., and Suess, B. Tetracycline-aptamer-mediated translational regulation in yeast. Mol. Microbiol., 2003, 49, 1627–37.
102. Suess, B., Hanson, S., Berens, C., Fink, B., Schroeder, R., and Hillen, W. Conditional gene expression by controlling translation with tetracycline-binding aptamers. Nucleic Acids Res., 2003, 31, 1853–58.
103. Hanson, S., Bauer, G., Fink, B., and Suess, B. Molecular analysis of a synthetic tetracycline-binding riboswitch. RNA, 2005, 11, 503–11.


104. Muller, M., Weigand, J. E., Weichenrieder, O., and Suess, B. Thermodynamic characterization of an engineered tetracycline-binding riboswitch. Nucleic Acids Res., 2006, 34, 2607–17.
105. Ellington, A. D. and Szostak, J. W. In vitro selection of RNA molecules that bind specific ligands. Nature, 1990, 346, 818–22.
106. Tuerk, C. and Gold, L. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science, 1990, 249, 505–10.
107. Wilson, D. S. and Szostak, J. W. In vitro selection of functional nucleic acids. Annu. Rev. Biochem., 1999, 68, 611–47.
108. Jenison, R. D., Gill, S. C., Pardi, A., and Polisky, B. High-resolution molecular discrimination by RNA. Science, 1994, 263, 1425–29.
109. Koizumi, M., Soukup, G. A., Kerr, J. N., and Breaker, R. R. Allosteric selection of ribozymes that respond to the second messengers cGMP and cAMP. Nat. Struct. Biol., 1999, 6, 1062–71.
110. Robertson, M. P. and Ellington, A. D. In vitro selection of an allosteric ribozyme that transduces analytes to amplicons. Nat. Biotechnol., 1999, 17, 62–66.
111. Roth, A. and Breaker, R. R. Selection in vitro of allosteric ribozymes. In Methods in Molecular Biology: Ribozymes and siRNA Protocols, 2nd ed. Sioud, M., Ed. Humana Press: Totowa, NJ, 2004, Vol. 252, 145–64.
112. Lynch, S. A., Desai, S. K., Sajja, H. K., and Gallivan, J. P. A high throughput screen for synthetic riboswitches reveals mechanistic insights into their function. Chem. Biol., 2007, 14, 173–184.
113. Zuker, M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res., 2003, 31, 3406–15.
114. Jain, C. and Belasco, J. G. Rapid genetic analysis of RNA-protein interactions by translational repression in Escherichia coli. Methods Enzymol., 2000, 318, 309–32.

4 Evolving Pathways and Genomes for the Production of Natural and Novel Compounds

Ethan T. Johnson University of Minnesota

Claudia Schmidt-Dannert University of Minnesota

4.1 Introduction
4.2 Initial Pathway Design and Integration into Host Metabolic Network
4.3 Optimization of Metabolic Enzyme Levels Using Evolutionary Design
4.4 Evolving Metabolic Pathways for the Production of New Compounds
4.5 Addressing Complex Traits Using Evolutionary Methods
4.6 Screening Technologies
4.7 Conclusions and Future Directions
References

4.1 Introduction

Microorganisms synthesize many useful compounds, and humans have exploited their metabolic versatility for thousands of years. The discovery of the first microbial fermentation processes in the nineteenth century was the starting point for the development of industrial biotechnology. Biotechnologies introduced during the first half of the twentieth century allowed the mass production of citric acid and penicillin through the fermentation of sugars by specific strains of molds. At the beginning, microbial production of molecules depended entirely upon the exploitation of native biosynthetic pathways in microbial organisms. Microbial strains producing a desired compound were first identified by screening, and production levels were then optimized using classical strain improvement strategies involving chemical mutagenesis and selection. While strain selection and screening sometimes did yield impressive increases in production levels, the improvements were not the result of directed engineering and the mechanism of the improvements remained obscure. As recombinant DNA technologies developed, engineering of specific metabolic traits became possible. By the 1980s it was possible to insert or delete enzyme-encoding genes in a microbial genome, and in one of the first examples of this technology the production of cellular carbon from methanol for animal feed was enhanced by replacing the glutamate synthase gene from the bacterium Methylophilus methylotrophus with a glutamate dehydrogenase from E. coli.1 With increasingly sophisticated genetic methods for the manipulation of microbial cells and with increasing knowledge about metabolic processes on a molecular level, it became feasible not only to manipulate metabolic pathways for increased production levels in a native host, but also to combine genes from different organisms into new multistep pathways for the production of natural and novel compounds. The genomics era provides an ever increasing resource of DNA sequence to be used for “bioprospecting”, to produce new classes of compounds, industrial fine chemicals and antibiotics and pharmaceuticals.2–5 Manipulation of biosynthetic activities through rational protein design methods facilitated further increases in production levels and enabled synthesis of desired pathway intermediates and new molecules. However, this approach relies on the availability of structural information and an understanding of the function of the biosynthetic enzyme(s) to be manipulated. Protein structures are available only for a small number of biosynthetic enzymes and our ability to correctly predict mutations that would result in a variant enzyme with the desired properties is rather limited. Recognizing these difficulties, scientists began to take nature as a guide for developing new design strategies that mimic evolutionary mechanisms, and this strategy has been termed evolutionary engineering. The mechanisms of evolution have been used for thousands of years for plant domestication and breeding, but only recently have been adapted to the molecular scale. Strategies and applications of evolutionary engineering of individual proteins through directed or in vitro evolution are described in more detail elsewhere in this section. Metabolic pathways composed of multiple enzymatic steps that require coordination of their catalytic activities for optimal function represent a higher level of complexity. Individual biosynthetic enzymes within a pathway have been subjected to directed evolution for pathway optimization and diversification of pathway products.
Manipulation of individual pathway enzymes via in vitro evolution or rational protein design may not always be sufficient to obtain superior production strains, and it may be necessary to introduce multiple changes in the genome that influence the flux through a pathway at different levels. Different evolutionary engineering methods are currently being developed that aim to elicit more global metabolic changes. Natural mechanisms of pathway evolution again serve as a guide for these approaches. Comparison of enzymes and metabolism from many organisms has highlighted how genes are transferred in the community and how new functions develop from existing biochemical pathways. Transferable genetic elements are a main mechanism for driving evolution by sharing large sequences of DNA through a process known as horizontal gene transfer.6–8 Genomic analysis has revealed that individual genes such as ribosomal proteins and entire operons such as the leucine/isoleucine and tryptophan biosynthetic operons are transferred between organisms.6,8 Recombination processes incorporate the acquired DNA into the organism’s genome and new activities and functions develop. Gene duplication, deletion, inversion, and displacement facilitate the evolution of new metabolic pathways, and after assembly of the pathway it can be optimized by point mutations in the regulatory or coding regions of the enzyme to fine tune enzyme expression and activity. Changes throughout the pathway and the genome integrate the new pathway into the complex metabolism that occurs in the organism such that the metabolic systems are efficient with a minimum of overlap between different pathways. The evolution of the pathway optimizes the flux through the system so that intermediates do not build up, and regulatory elements develop to control the function of many pathways that all use the same substrate molecules.
In this chapter, we first describe current design principles of complex metabolic pathways and briefly illustrate frequent problems encountered when engineering heterologous pathways. Following this we discuss how evolutionary design strategies have been applied to find solutions to those problems, optimize production levels, and synthesize new compounds.

4.2 Initial Pathway Design and Integration into Host Metabolic Network

Recombinant metabolic pathways for the biosynthesis or biodegradation of specific compounds can be constructed in a specific host strain from a native metabolic pathway transferred from a heterologous source, or they can be assembled from a heterologous set of enzymes from different organisms or different pathways. Mixing and matching enzyme functions from different sources and pathways
may increase production levels and diversity of products. Combinations of enzyme functions into new biosynthetic pathways rely on the compatibility of product and substrate molecules of the combined enzymes. This approach, also termed combinatorial biosynthesis, is described elsewhere in this handbook in more detail. An excellent illustration of the construction of complex metabolic pathways in microbial hosts is the manipulation of the ergosterol biosynthetic pathway in S. cerevisiae.9 In this study, 13 genes assembled from yeast, mammalian, mitochondrial, and plant pathways were inserted into both chromosomal and plasmid DNA and, working together, produced the steroid hydrocortisone from a simple carbon source. This study also is significant because it recreates a mostly membrane-bound mammalian pathway in a microbial host. In other work, plant pathways have been incorporated into microbial systems to produce carotenoids, terpenes, and flavonoids. Several studies describe the production of naringenin, a flavanone that can be derivatized further by a host of enzymes, using a combination of yeast, bacterial, and plant genes10 and also by a set of genes entirely from Arabidopsis thaliana.11 High-level production of porphyrin molecules has been obtained through overexpression of seven genes from E. coli, Synechocystis, and B. subtilis.12

Despite many successes, engineered metabolic pathways are subject to many potential problems. Intermediate compounds may accumulate because individual enzymes have low activity caused by poor gene expression or protein folding, low catalytic activity, or low affinity for nonideal substrates. The flux through an assembled recombinant pathway may further be limited by the availability of precursor compounds or enzyme cofactors or, as more recently realized for some biosynthetic pathways (especially from plants), by a lack of the interactions between enzymes that allow substrate channeling between subsequent pathway enzymes.
For example, Yan et al.13 describe the expression of a plant-specific anthocyanin pathway in E. coli; despite strong expression levels for each enzyme, significant levels of precursor metabolites accumulated and the product yield remained low. It was suggested that enzyme complexes may not form properly in E. coli, limiting the channeling of substrate along the pathway. To aid substrate channeling, consecutive enzymes in a pathway have been fused to improve their interactions; fusing two proteins of the terpene biosynthetic pathway in this way increased the rate of formation of epi-aristolochene in vitro.14 Depending on the problem, pathway optimization strategies may require manipulations of the host’s metabolic network and of individual recombinant biosynthetic genes. Numerous examples of increasing the activity of rate-limiting steps can be found in the literature and in this book, and are discussed in this chapter only as they relate to evolutionary engineering. Examples illustrating the manipulation of the host’s metabolic network to accommodate an engineered pathway, providing an initial framework for further optimization by rational or evolutionary design, also are described.

A recombinant pathway needs to be connected to the host’s metabolic network for the supply of pathway precursors and cofactors, and the precursor routes may themselves need to be engineered. Like the assembly of the recombinant pathway, initial integration into the host metabolic network is done by rational design. However, evolutionary design methods increasingly are used for further optimization of precursor and cofactor supply. A recent example of the engineering of new precursor routes is the production of polyketide antibiotics in E. coli. Apart from achieving functional expression of the large polyketide synthases, it was necessary to introduce into E. coli the ability to synthesize the diverse starter and extender units (propionyl-, malonyl-, and methylmalonyl-CoA) for polyketide chain synthesis.15,16 Another challenge in using E. coli for the synthesis of complex natural products, such as polyketide antibiotics, is that E. coli does not synthesize many of the sugars that decorate these compounds and are critical for biological activity. Production of glycosylated natural products in E. coli therefore requires the installation of sugar pathways and overexpression of cognate glycosyltransferases able to transfer the specific sugars to the antibiotic molecule.17 Lombo et al.18 have constructed sugar biosynthetic cassettes to endow the organism with the capability to produce branched deoxysugars and have used the “flexible”
elloramycin glycosyltransferase to generate two tetracenomycin derivatives. The sugar cassettes have been extended to the antibiotic staurosporine, a potent protein kinase inhibitor,19 and future directions of this work include combining novel deoxysugars with aglycones using glycosyltransferases that accept many deoxysugars as substrates.20 Several glycosylated derivatives of erythromycin have been produced in an E. coli strain able to produce 6-deoxyerythronolide B (6dEB).21 The engineered pathway included various sugar cassettes based upon the megalomicin gene cluster to generate the specific deoxysugars, a glycosyltransferase to transfer this moiety to the completed polyketide, and an rRNA methyltransferase to protect the host cell from the mature antibiotic.

In many instances, levels of precursor compounds in a production host limit the flux through an engineered recombinant pathway. Supplementation of the host with an alternative heterologous precursor pathway may therefore become necessary.12 Terpenoid production in E. coli was achieved by overexpressing the yeast isoprenoid precursor pathway (mevalonate pathway).22 Previous efforts to increase isoprenoid precursor production in E. coli by manipulating gene expression levels of its native mevalonate-independent pathway failed, as this pathway appeared to be strictly regulated and perturbations of gene expression levels proved detrimental for E. coli. Intermediate molecules can accumulate if cofactor molecules required for subsequent reactions are not present. This was observed during the overproduction of porphyrin compounds in engineered E. coli cells, where intracellular iron concentrations appeared to be limiting. Despite high levels of the enzyme ferrochelatase, protoporphyrin IX was not converted into the iron-containing tetrapyrrole heme.
To overcome intracellular iron limitation, the iron transporter zupT was overexpressed to improve the uptake of iron, and heme concentrations doubled.23 The intracellular pool of reducing equivalents in a cell limits production levels of many engineered pathways. For example, the availability of reducing equivalents was found to limit the production of succinate from glucose, and Sanchez et al.24 approached the problem by minimizing the NADH requirement: they opened up the flow of acetyl-CoA through the glyoxylate pathway, increasing production of succinate from this nontraditional pathway.

4.3 Optimization of Metabolic Enzyme Levels Using Evolutionary Design

Once the initial framework of a recombinant pathway has been established in a host, production levels and diversity of products can be optimized by engineering specific genes or regulatory elements in the recombinant pathway or host metabolic network (Figure 4.1). This strategy follows the natural evolution of biosynthetic pathways which involves recruitment of duplicated genes for new biosynthetic functions (pathway assembly) followed by optimization of biosynthetic activities.6,25–27 Optimization of protein expression levels of both the enzymes directly involved in the pathway and the enzymes to generate the precursor molecules is a major design goal for improving flux through an engineered pathway. The relative activities of the enzymes in the pathway lead to different intermediate compound concentrations and the targets for protein design often can be determined by identifying bottlenecks in the pathway using metabolic flux analysis. Increasing the activity or expression of the enzyme downstream of the bottleneck can optimize flux through the pathway and improve the overall production. As it is difficult to predict mutations in coding or regulatory regions that would yield optimal enzyme expression levels, in vitro evolution strategies initially developed for individual enzymes are used for the optimization of protein expression levels. Evolutionary design of enzymes uses in vitro evolution methods that are reviewed in depth elsewhere in this book. Briefly, directed evolution selects for enzymes having a specific trait from a “library” of protein sequences that include random mutations or combinations of recombined genes from several related enzymes. Error-prone PCR and saturation mutagenesis methods can be used to generate libraries of random mutations, and DNA shuffling and PCR assembly of genes are efficient methods for making many correlated changes in the overall sequence of an enzyme.
The library of enzymes is screened

Figure 4.1 Optimization strategies for small-molecule production in microbial hosts expressing heterologous biosynthetic pathways. Methods are available that address enzyme regulation and activity at the level of the individual enzymes, assembled biosynthetic pathways and the genome.

[Figure 4.1 panel labels] Individual enzymes: 1. Site-directed mutagenesis; 2. Rational protein design; 3. Directed evolution of protein function. Assembled biosynthetic pathway: 4. Assembly of artificial pathways; 5. Protein expression levels; 6. Promoter evolution; 7. Cofactor regulation; 8. Transporter proteins; 9. Precursor supply. Microbial host genome: 10. Host strain selection; 11. Flux analysis; 12. Classical strain improvement; 13. Whole genome shuffling.

for the desired activity using absorbance, fluorescence, or cell-growth assays, which often take advantage of recent developments in high-throughput techniques. Directed evolution of protein function has been a powerful method for quickly generating desired protein variants without the need for structural information, and it frequently yields mutations that would not have been predicted by rational design methods.

Optimization of metabolic protein expression levels can occur at many levels, including the copy number of the plasmid,28 the regulation and strength of the promoter, and the efficiency of translation of the mRNA product into protein. Tao et al.28 have explored the effects of plasmid copy number on heterologous protein expression by generating a library of sequences for the regulatory protein that controls plasmid replication. Measurements of product formation and real-time PCR confirmed that the amount of product increased in proportion to the copy number of the plasmid. Continuous variation in gene expression can be obtained by creating a library of promoters and ribosome-binding sites that differ in efficiency. Meynial-Salles et al.29 describe a library of expression cassettes that vary in the sequences of the promoter, ribosome-binding site, and mRNA-stabilizing regions. When recombined into the E. coli chromosome, these sequences provided a 200-fold change in enzyme activity with the model enzyme β-galactosidase. E. coli cells expressing recombinant carotenoid pathways frequently have been used as a model system to explore pathway evolution and optimization strategies. Synthesis of colored carotenoid pigments in E. coli provides a convenient screen with which to identify clones in a library that differ in production level or produce novel chromophores (see below).
For example, Kim et al.30 used the carotenoid pathway as a system to report on the effects of copy number and found that the araBAD promoter on medium-copy-number plasmids provided the highest production of lycopene in E. coli. Directed evolution of the first committed carotenoid enzyme, geranylgeranyl diphosphate synthase, and the upstream promoter region identified mutations that increase promoter and enzyme activity, which was reflected by an increase in carotenoid production.31 In a systematic study of promoter strength, Alper et al.32 used error-prone PCR and a GFP fluorescence measurement to generate a series of promoters of different strengths. The series of promoters then was used to drive expression of deoxy-xylulose-P synthase (dxs), an enzyme at the beginning of the isoprenoid precursor pathway of carotenoid biosynthesis, and the amount of the carotenoid lycopene was measured. When enzymes downstream of dxs were expressed at low levels and the dxs promoter strength varied, lycopene production peaked at an
optimal expression level for dxs, and further increases in promoter strength decreased lycopene production, suggesting that intermediate products had accumulated and presumably had toxic effects. When the downstream enzymes were overexpressed, lycopene production followed the dxs promoter strength, suggesting that dxs was rate limiting.

The degradation of toxic chemicals by a series of enzymatic reactions is also an area of considerable interest.33 Organophosphates have been used as pesticides and chemical warfare agents and are contaminants in many environments. Phosphotriesterases catalyze the hydrolytic detoxification of these compounds and are targets for directed evolution to improve their ability to hydrolyze more diverse chemicals. Directed evolution of these enzymes has been difficult because of the lack of suitable high-throughput methods for library screening or selection. McLoughlin et al.34 have improved the screens by engineering a strain of E. coli to express a phosphodiesterase from E. aerogenes, which breaks down the phosphotriesterase product into methyl phosphate that E. coli then can use as a phosphorus source. Subsequently, McLoughlin et al.35 used directed evolution to improve protein expression of the phosphotriesterase and increase the growth rate of E. coli grown with methyl paraoxon as the sole phosphorus source. In another example, DNA shuffling of the enzymes that catalyze the hydrolysis of s-triazines by dechlorination or deamination produced a library of enzymes that transforms an increased range of substrates with reaction rates up to 150-fold higher than those of the parent enzymes.36 The toxicity of the substrates and products often prevents efficient bioremediation. DNA shuffling has improved the activity of toluene ortho-monooxygenase toward a range of chlorinated ethenes37; however, the toxic epoxides continue to limit growth.
Rui et al.38 reasoned that increased glutathione levels would improve biodegradation since glutathione is involved in the biotransformation of chlorinated ethenes in higher organisms. Consistent with this hypothesis, a strain of E. coli overproducing glutathione degraded chlorinated ethenes two- to fourfold faster. Entire operons that confer resistance to a particular chemical also can be optimized by evolutionary methods. Crameri et al.39 have used DNA shuffling to introduce point mutations throughout a Staphylococcus aureus arsenate resistance operon and obtained cells able to grow in 0.5 M arsenate, a 40-fold increase in resistance.

While it is possible to regulate the expression and activity of specific enzymes, the optimization of a multienzyme pathway is complex and it is difficult to predict the ideal activity of each enzyme. Computational methods have been developed to address these problems and a current review is included in this book. Experimentally, Jung et al.40 have reconstructed the entire trehalose biosynthetic pathway on an immobilized substrate by using mRNA-protein fusions. The relative concentration of each enzyme component is set by controlling the amount of capture DNA bound to the substrate; the concentrations of five enzymes were varied systematically, and an optimal ratio of the rate-limiting enzymes was determined. Recently, Pfleger et al.41 have generated a combinatorial library of tunable intergenic regions containing control elements that include mRNA secondary structures, RNase cleavage sites, and ribosome-binding sequences. These regions led to 100-fold changes in activity in a two-gene reporter system using GFP expression, and to a sevenfold increase in the mevalonate system, which includes three coordinated genes.
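
The promoter-strength experiment with dxs described earlier in this section can be pictured with a deliberately simple two-step model. The sketch below is an illustration only and is not taken from the cited study: the function names, the toxicity constant, and the arbitrary units are all assumptions. Flux to product is capped by the slower of the two steps, and any intermediate made in excess of downstream capacity is assumed to depress output.

```python
def lycopene_output(dxs_strength, downstream_capacity, toxicity=0.5):
    """Toy two-step pathway (hypothetical model, arbitrary units).

    Product flux is limited by the slower step; intermediate produced in
    excess of downstream capacity is assumed to be toxic and lowers output.
    """
    flux = min(dxs_strength, downstream_capacity)
    accumulated = max(0.0, dxs_strength - downstream_capacity)
    return flux / (1.0 + toxicity * accumulated)


def best_strength(strengths, downstream_capacity):
    """Return the promoter strength that gives the highest modeled output."""
    return max(strengths, key=lambda s: lycopene_output(s, downstream_capacity))


# A hypothetical promoter series spanning strengths 0.5 to 10.0.
strengths = [0.5 * i for i in range(1, 21)]

# Weak downstream enzymes: output peaks at an intermediate dxs strength.
print(best_strength(strengths, downstream_capacity=2.0))
# Overexpressed downstream enzymes: output follows dxs strength.
print(best_strength(strengths, downstream_capacity=20.0))
```

In this sketch the optimum sits where upstream supply matches downstream capacity, which mirrors the qualitative behavior reported in the text without claiming anything about the real kinetics.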

4.4 Evolving Metabolic Pathways for the Production of New Compounds

Biosynthesis of new compounds can be achieved by (i) combining enzymes from different sources and pathways into new biosynthetic reaction sequences; (ii) altering the product and substrate spectrum of individual enzymes in an assembled pathway; or (iii) a combination of both processes. In all three strategies, an efficient reaction sequence from precursor to product may require adaptation of the substrate and product spectra of enzymes located next to the key diversity generating enzyme(s), and in many cases the catalytic promiscuity of the enzymes is important for generating the new substrate and product spectra.42,43 Many biosynthetic enzymes, especially those from secondary metabolic pathways,
exhibit some flexibility in their substrate and product molecules, as it allows an organism to produce a series of related bioactive compounds, which may provide an advantage during growth in altered environmental conditions. In vitro evolution methods can be applied to increase conversion of a selected substrate from a range of related substrates to a desired new product, or to decrease catalytic promiscuity such that only one particular product is formed.

The biosynthesis of carotenoids provides a detailed example of using directed evolution together with gene combination to create product diversity.44 Carotenoids are colored pigments that have important biological functions in photosynthesis and coloration and act as antioxidants and radical scavengers. These pigments are derived from the general isoprenoid pathway by the head-to-head condensation of either two C15 farnesyl diphosphate molecules or two C20 geranylgeranyl diphosphate molecules to generate C30 or C40 carotenoids, respectively. Following the condensation reaction, a desaturase introduces double bonds along the carotenoid backbone to generate a chromophore that absorbs in the visible range. Modifications such as cyclization and various oxidation reactions at both ends of the linear carotenoid create diverse cyclic and linear carotenoid structures (Figure 4.2). A strategy termed “molecular pathway breeding”, involving first assembly of a series of genes in a microbial host and then in vitro evolution of key enzymes, has been used to synthesize novel carotenoid compounds in engineered, noncarotenogenic E. coli.
New carotenoid structures with fully conjugated electron systems have been obtained by screening a library of desaturase variants for mutants that would synthesize carotenoids with extended chromophores.45,46 DNA shuffling of phytoene desaturases from the genus Erwinia led to a desaturase variant which introduces six double bonds, compared to the typical four double bonds, into the backbone of the noncolored carotenoid phytoene, producing the fully conjugated carotenoid tetradehydrolycopene. The evolved pathway then was extended with a library of cyclase mutants and screened for variants capable of cyclizing the end-groups of carotenoids with

Figure 4.2 Pathway engineering of carotenoid biosynthesis has improved production levels and has generated novel and diverse types of carotenoids. Condensation of two diphosphate precursor molecules produces full-length natural (C30 and C40) and novel (C35, C45, and C50) carotenoid backbones (unsaturated molecules are shown). Desaturase variants introduce double bonds into the carotenoid backbone and subsequent enzymes including monooxygenases, hydroxylases, glucosylases, and ketolases modify the carotenoid scaffold. The solid arrow represents the enzymatic condensation reaction to form the carotenoid backbone and the dashed arrow represents one or more biosynthetic steps to produce a family of diverse molecules (only a few examples are shown).

[Figure 4.2 panel labels] Title: Diversity-oriented directed evolution of carotenoid pathways. Diphosphate precursors: C15PP, C20PP, C25PP. Carotenoid backbones: C30, C35, C40, C45, C50. Terminal groups.

extended chromophores. A mutant cyclase was identified in the library that produced a novel carotenoid compound, torulene, in E. coli. Carotenoid enzymes located downstream of the desaturase and cyclase possess sufficient catalytic flexibility to accept related carotenoid substrates, and the two in vitro evolved carotenoid pathways have been extended with additional carotenoid-modifying enzymes such as carotenoid monooxygenase, hydroxylase, glucosylase, or ketolase to synthesize a number of structurally novel carotenoid compounds in E. coli.47 Further, the diversity of carotenoids was increased by a carotenoid desaturase that generated more polar products.48

The chain length of carotenoids also was modified by directed evolution of the enzymes for these pathways. The C30 carotenoid synthase from Staphylococcus aureus was evolved to produce C40 carotenoids,49 and further rounds of directed evolution exploited the substrate plasticity of the carotenoid synthases to produce C45 and C50 carotenoids, previously unknown molecules.50 Another study showed that when the C30 carotene synthase is exposed to the precursors of both the C30 and C40 pathways, a novel asymmetric C35 carotenoid is produced.51 Additionally, the downstream enzymes continue to act on the respective half of the molecule: the cyclase from the C40 pathway continues to act on that half of the C35 carotenoid. With the success of modifying carotenoids of various lengths, the new backbone structures open up a wide variety of new compounds. The multiple mutations discovered by directed evolution shed light on the enzymology of chain elongation, which has been explored through a series of enzymes with increased activities.52,53
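
The backbone lengths discussed in this section follow from simple carbon arithmetic: head-to-head condensation joins two diphosphate precursors, so the backbone carries the sum of their carbons. The short sketch below enumerates the pairings of the three precursors named in Figure 4.2. It is bookkeeping only; the function name is an assumption, and whether a particular synthase accepts a given pair is an experimental question that the arithmetic cannot answer.

```python
from itertools import combinations_with_replacement

# Chain lengths (carbon atoms) of the diphosphate precursors named in Figure 4.2.
PRECURSORS = {"C15PP": 15, "C20PP": 20, "C25PP": 25}


def backbone_lengths(precursors):
    """Map each backbone length to the precursor pairs whose carbons sum to it."""
    result = {}
    for a, b in combinations_with_replacement(sorted(precursors), 2):
        result.setdefault(precursors[a] + precursors[b], []).append((a, b))
    return result


for length, pairs in sorted(backbone_lengths(PRECURSORS).items()):
    print(f"C{length}: {pairs}")
```

The enumeration recovers the natural C30 and C40 backbones and the novel C35, C45, and C50 backbones, with the asymmetric C35 arising only from the mixed C15 and C20 pairing.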

4.5 Addressing Complex Traits Using Evolutionary Methods

Activity outside the engineered pathway also influences growth characteristics and product yields. In the carotenoid pathway, potential bottlenecks in the precursor supply are related to the distribution of pyruvate and glyceraldehyde-3-phosphate.54 Farmer et al.55 designed a control element that stimulated expression of two carotenoid genes in times of excess glycolytic flux, increasing carotenoid production when the substrate concentration was high while reducing the stress of the external pathway during periods when the cells are limited by nutrients. Kang et al.56 identified genes that enhance the production of lycopene by screening a shotgun library of E. coli chromosomal DNA. Six clones with increased lycopene production, together containing 13 genes, were isolated; several of these genes had not been known to affect lycopene production. The genes included one that directly increases the precursor supply and several that indirectly affect metabolism by regulating processes related to stationary phase growth and transcription factors that regulate anaerobic growth and curli surface fibers. In a complementary study, a systematic in silico model covering all genes was used to identify genes that increase lycopene production.57 Of the several genes identified, all but one were directly involved in production of precursor molecules, and the other increased the concentration of NADPH. To extend this work with a combinatorial method for identifying genes that enhance lycopene production, transposon mutagenesis was applied and lycopene production was measured.58 These methods identified two sets of genes that alter product formation through precursor availability or through kinetic and regulatory mechanisms, and a metabolic landscape describing lycopene biosynthesis was generated for a small subset of genes.
An engineered pathway stresses the limited resources of the host as precursor molecules are diverted from natural metabolic pathways, and it requires energy resources for reactions such as protein synthesis and plasmid replication. To address the stresses and energy requirements of a population of E. coli cells expressing the lactose operon, the relative growth rate of induced cells versus noninduced cells was measured to determine the cost associated with expressing proteins for enzymatic function. Dekel et al.59 measured the cost of expressing the lactose operon in the absence of lactose and the benefit as a function of lactose concentration. The data allowed them to predict the optimal enzyme activity for a given level of lactose, and they observed that E. coli growing in constant lactose for a few hundred generations optimized their use of lactose, most likely by altering the promoter and ribosome-binding sequences. Additionally,
expression of plasmids also strains the host machinery for DNA replication and causes the cells to grow slowly; because many microbial biotechnologies rely on coexpression of genes, it is important to identify and reduce these stresses. To study this effect, a high-copy plasmid with an inducible promoter expressing glucose-6-phosphate dehydrogenase was transformed into E. coli.60 Induction of the dehydrogenase led to an increase in growth rate, suggesting that engineering the pentose-phosphate pathway increases the supply of the building blocks necessary for plasmid replication.

The genome of the microbial host responds to stresses and evolves to adapt to new metabolic activity and culture conditions; however, many of these mutations may not be predictable, and it may even be difficult to know which metabolic systems require changes to improve production. In many cases, it is efficient to recast the problem from one of rationally engineering an optimized pathway to one of introducing random mutations into the genome and screening for the desired phenotype. Strain improvement methods generally subject an initial strain to a mutagen, and the resulting mutants are screened for improvements in cell growth in specific environments or in production of an engineered product. In classical strain improvement, further gains are attained by iterating the mutation and selection steps, each time taking the best mutant as the starting point for the next round of mutation. Strain improvement strategies recently have addressed the problem that production of an engineered chemical often is linked to cell growth, so that production diminishes once stationary phase is reached. To overcome this limitation, Sonderegger et al.61 developed a screen to select for cells able to continue to live without spending energy on replication and growth processes in ammonia- and glycerol-limited conditions.
A metabolically active quiescent bacterial strain was developed by selecting for increased glucose uptake immediately after a nutrient-limited environment. Entry into stationary phase was impaired such that the E. coli strain retained metabolic vigor even when on the verge of starvation, and the metabolic activity was channeled toward production of the engineered product.

Toxic chemicals present during fermentation often limit the growth and efficiency of the microbe. The chemical may be the desired final product, as in ethanol or phenol bioproduction, or may be the target of biodegradation through enzymatic reactions. In either case, a strain resistant to the toxic chemical will improve the process. Wierckx et al.62 have started to engineer a host for the bioproduction of phenol from glucose. To avoid the toxicity of phenolic aromatic compounds, they introduced a tyrosine phenol lyase into the solvent-resistant microbe Pseudomonas putida S12 and used a combination of rational and random engineering and high-throughput screening to optimize production of phenol.

Whole genome shuffling approaches have been developed to incorporate multiple changes in the genome that improve the microbe as a host for the engineered biosynthetic pathway. This method is more efficient than classical strain improvement and depends on only one round of mutation and selection followed by successive rounds of genomic recombination. In whole genome shuffling, a population of cells is subjected to a chemical mutagen to introduce random mutations, and the fittest strains are identified by a high-throughput screening strategy. Using a recombination technique known as protoplast fusion, the genomes of the fittest strains are shuffled to create strains that have combinations of the beneficial mutations.
In this manner, after only a few rounds of selection the final strains have incorporated many beneficial changes and are significantly fitter than strains developed using classical strain improvement. The original development of genome shuffling focused on increasing the production of a complex polyketide antibiotic.63 Strains isolated after only two rounds of genome shuffling matched the production levels of strains developed over 20 years and 20 rounds of classical improvement methods. In a second study, genome shuffling was applied to growth of Lactobacillus at low pH.64 A mutagenized population of Lactobacillus was screened for growth on plates with a pH gradient, and five rounds of genome shuffling yielded strains that thrive at pH 4 and produce lactate at elevated levels. Genome shuffling also has been used to improve the degradation of the pesticide pentachlorophenol
(PCP) by Sphingobium chlorophenolicum.65 Successive rounds of protoplast fusion and selection led to strains that constitutively express the PCP-degrading enzymes, have an increased growth rate, and are resistant to the toxic effects of PCP. Recently, Dai et al.66 improved protoplast fusion in E. coli by a factor of 10^4 and have opened doors to new evolutionary strategies in this powerful genetic host.
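
The difference between the two procedures described in this section is mainly one of bookkeeping: classical strain improvement carries a single best mutant from round to round, whereas genome shuffling mutagenizes once and then recombines several of the fittest strains. The toy simulation below sketches that difference. Everything in it is an assumption made for illustration: the 20-locus bit-string genome, the fitness function that simply counts beneficial alleles, the mutation rate, and the free per-locus recombination standing in for protoplast fusion. It makes no claim about the behavior of real strains.

```python
import random

N_LOCI = 20  # hypothetical number of loci; 1 marks a beneficial allele


def fitness(genome):
    """Toy fitness: the number of beneficial alleles carried."""
    return sum(genome)


def mutagenize(genome, rng, rate=0.05):
    """Flip each locus with probability `rate` (gain or loss of an allele)."""
    return [g ^ int(rng.random() < rate) for g in genome]


def recombine(a, b, rng):
    """Stand-in for protoplast fusion: each locus comes from either parent."""
    return [rng.choice((x, y)) for x, y in zip(a, b)]


def classical(rounds, library_size, rng):
    """Repeat mutation and selection, carrying only the best strain forward."""
    best = [0] * N_LOCI
    history = []
    for _ in range(rounds):
        library = [mutagenize(best, rng) for _ in range(library_size)]
        best = max(library + [best], key=fitness)
        history.append(fitness(best))
    return history


def genome_shuffling(rounds, library_size, n_parents, rng):
    """One mutagenesis round, then recombination among the fittest strains."""
    start = [0] * N_LOCI
    library = [mutagenize(start, rng) for _ in range(library_size)]
    parents = sorted(library, key=fitness, reverse=True)[:n_parents]
    history = [fitness(parents[0])]
    for _ in range(rounds):
        progeny = []
        for _ in range(library_size):
            a, b = rng.sample(parents, 2)
            progeny.append(recombine(a, b, rng))
        parents = sorted(progeny + parents, key=fitness, reverse=True)[:n_parents]
        history.append(fitness(parents[0]))
    return history


rng = random.Random(0)  # fixed seed so a run can be repeated
print("classical best fitness per round:", classical(3, 50, rng))
print("shuffling best fitness per round:", genome_shuffling(2, 50, 4, rng))
```

Because the previous best strain or parents are kept in each selection step, the best fitness in either procedure never decreases from one round to the next; how quickly it rises depends entirely on the assumed parameters.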

4.6 Screening Technologies

Evolutionary strategies applied to metabolic pathways or whole genomes are limited by the ability to screen large libraries. In many instances, screening methods have to directly detect the product molecules in lysates obtained from whole cells without the amplification common to high-throughput assays used for the directed evolution of enzymes. The detection strategies are based on a primary detection scheme that uses physical properties of the molecule to quantify its concentration in the sample; alternatively, it is possible to modify chemical groups such as aldehydes to make the molecules visible to a simple screen. Each engineered metabolic pathway requires a specific screen, and creativity in developing the screen often plays a large role. Many methods in evolutionary pathway engineering have capitalized on high-throughput technologies developed for drug discovery. Many screening technologies for metabolites are based on the absorbance or fluorescence properties of the compound: the absorbance properties of carotenoids have made them ideal for developing methods to design and evolve metabolic pathways, and the biological processes that govern transcription and translation of enzymes can be monitored with GFP fusion proteins. Screening methods based on absorbance and fluorescence can be carried out in microplates using robotic devices and are compatible with a variety of extraction and filtering protocols. Additionally, fluorescence-activated cell sorting is a technique that can screen individual whole cells and has been used in conjunction with directed evolution of a metabolic pathway.23 Recent years have brought improvements in technologies for analysis of metabolic profiles from plants and microbes (Figure 4.3).67–69 High-throughput detection of secondary metabolites depends on fast, parallel, or multiplexed high-resolution chromatographic separations and a strategy for detection.
The gold standard for metabolic studies is gas chromatography-mass spectrometry (GC-MS) because of its ability to identify many compounds in a single experiment. Fiehn et al.70 have identified more than 300 secondary metabolites from leaf extracts of Arabidopsis thaliana, and Villas-Boas et al.71 have applied a derivatization strategy in conjunction with high-throughput detection to measure intracellular central carbon metabolism and amino acid biosynthesis. GC-MS has also been used to measure metabolic flux by following 13C through a series of metabolites.72,73 Liquid chromatography is a complementary strategy that often improves metabolite separation because nonvolatile molecules can be detected and sample preparation is simple.74 Reverse-phase, ion-exchange and other types of separations can be coupled to detection using a mass spectrometer. Capillary electrophoresis has also been used to analyze tissue from Arabidopsis for metabolites.75 In a recent study on polyketides, Kittell et al.76 used parallel capillary electrophoresis to screen a library of Aspergillus terreus for production of lovastatin compounds and increased the screening throughput by 50-fold compared to previous methods. In some applications it may be possible to avoid using separation methods and inject the sample directly into the mass spectrometer using electrospray ionization or laser desorption/ionization.77–79 Many clever detection strategies that do not depend on mass spectrometry will contribute to advances in strain improvement. 
Advances in FTIR and NMR have been applied toward screening metabolites in engineered organisms.80,81 Chemical gradients in solid media have been used to select for cell growth at low pH, and this approach can be extended to other types of specific culture conditions.64 Chip-based assays that measure ligand binding may also be used for screening, and a scintillation proximity assay that advances high-throughput screening for protein ligands may be extended to quantify production of small molecules.82

[Figure 4.3 panel text: Cells have a selectable phenotype. Growth characteristics: assay for growth on solid media using pH or chemical gradients. Physical characteristics: cell-sorting methods based on absorption or fluorescence (histogram of counts versus fluorescence intensity). Pathway products can be extracted for analytical characterization; detection strategies include GC-MS, LC-MS, and absorption and fluorescence assays. Functional genomics links phenotype and genotype (assembled pathway, host genome): transcription measurements, proteomics, and flux analysis can determine the specific mutations responsible for changes in pathway response.]

Figure 4.3 Evolutionary strategies applied to metabolic pathways or whole genomes are limited by the ability to screen large libraries. Screening methods used to detect small-molecule production include high-throughput absorbance or fluorescence based assays, analytical separations, and mass spectrometry. Functional genomics methods have become increasingly important for identification of mutations that improve production levels.

The specific genomic changes responsible for an improved strain can also be identified using functional genomics methods, which make it easier to match a phenotype to a genotype. This becomes important when searching for the mutations identified in strain selection strategies that affect production of secondary metabolites.83–85 A few genes have been identified by various strain selection techniques, but a truly genomic analysis will have great impact on inverse metabolic engineering, the process by which the genetic basis of a desired phenotype is identified and then engineered into a strain in a rational and intentional manner. The techniques used to identify genes involved in the metabolic network include sequencing the DNA of a selected set of genes from several strains to identify key enzymes, hybridization analysis of the entire genome, transcriptional profiling, proteomics, metabolic profiling, and flux analysis. For example, transcript profiling led to improved galactose uptake by identifying phosphoglucomutase as a key regulator of metabolite flux.86,87

In addition to advances in analytical chemistry, simple pooling strategies also increase the throughput of screening methods by analyzing multiple samples in a single experiment. For example, the samples in a single 96-well plate can be analyzed by combining aliquots from wells in a single row or column. After analyzing all of the rows and columns, the data can be deconvoluted to determine the individual wells that demonstrate higher product yield by finding the intersections of the rows and columns that have higher product yield. In this example, the number of runs required to identify a single well that


has improved production is reduced from 96 experiments to 21 (eight row pools, 12 column pools, and one individual well). For screening strategies with sufficient sensitivity, larger numbers of wells can be combined to further reduce the number of analyses.
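The row and column pooling scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the plate values are invented, summing signals stands in for physically combining equal aliquots, and the rule of flagging a pool that exceeds 1.5 times the median pool signal is a hypothetical choice, not one taken from the text.

```python
from itertools import product
from statistics import median

ROWS = "ABCDEFGH"        # 8 rows of a 96-well plate
COLS = range(1, 13)      # 12 columns


def pool_signals(plate):
    """Return the summed signal of each row pool and each column pool.

    `plate` maps (row, col) -> product signal of that well.
    """
    row_pool = {r: sum(plate[(r, c)] for c in COLS) for r in ROWS}
    col_pool = {c: sum(plate[(r, c)] for r in ROWS) for c in COLS}
    return row_pool, col_pool


def hot_pools(pool, factor=1.5):
    """Pools whose signal exceeds `factor` times the median pool signal
    (an assumed, illustrative cutoff)."""
    cutoff = factor * median(pool.values())
    return [key for key, signal in pool.items() if signal > cutoff]


# Hypothetical plate: baseline signal 1.0 everywhere, one improved well at C7.
plate = {(r, c): 1.0 for r, c in product(ROWS, COLS)}
plate[("C", 7)] = 10.0

row_pool, col_pool = pool_signals(plate)
# Candidate wells lie at the intersections of the flagged rows and columns.
hits = list(product(hot_pools(row_pool), hot_pools(col_pool)))
# 8 row pools + 12 column pools + one confirmation run per candidate well.
runs = len(ROWS) + len(COLS) + len(hits)
print(hits, runs)
```

With a single improved well the intersection is unambiguous and the total is 21 runs. When several improved wells sit in different rows and columns, the intersections include false candidates, so each candidate still has to be confirmed individually.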

4.7 Conclusions and Future Directions

Rational construction of metabolic traits in a microbial host began with the introduction of a few recombinant genes and progressed to the assembly of complex heterologous pathways and development of analytical and modeling tools for metabolic flux analysis. Researchers have gained increasing insight into the design principles and complexity of cellular metabolic networks, which has led to the development of the metabolic engineering strategies discussed in this chapter that utilize evolutionary strategies to optimize complex cellular machineries for metabolite production. Initially, evolutionary engineering focused on optimizing catalytic activities and expression levels of individual enzymes in engineered pathways. It has become clear that in order to achieve optimal production levels, an engineered pathway needs to be integrated into the metabolic network of the heterologous host. Of particular importance is the development of regulatory circuits that allow a recombinant pathway to become part of a heterologous metabolic network. Evolving regulatory circuits (or genetic circuits) that self-regulate gene expression and enzyme activity in response to precursor supply is therefore an important next step in the field of metabolic engineering.88,89 The field of synthetic biology recently has gained attention and holds promise to develop tools and methods for the redesign of metabolic networks.3,90,91 Within this field progress has been made in understanding and designing synthetic genetic networks. 
In early examples of the engineering of genetic networks, the periodic expression of GFP was controlled by a set of three transcription factors,92 and a bistable network that switched between states after transient chemical or thermal induction was constructed from two repressible promoters.93 The stochastic nature of protein expression has been measured in single cells,94 and mathematical representations of the stochastic behavior of genetic circuits have improved to the point of predicting the behavior of increasingly complex combinations of activator and repressor elements.95 The majority of the studies on the design of genetic networks involve transcriptional elements that have been characterized in detail. How can the experience gained from the model systems lead to design of industrially important biosynthetic pathways? The best design may arise through a combination of rational methods that provide an original design of the network, followed by evolutionary methods that select for network behavior. The modular nature of the regulatory circuits allows for the evolution of new elements and functions from existing regulatory circuits. Protein networks can evolve new functions while maintaining original functions during selection under varying environmental conditions; however, under static conditions the modularity is lost and the evolved network may be trapped in a local minimum.96 Notable achievements in this area include combinatorial synthesis of regulatory circuits, in which variations of promoters and transcription factors were assembled randomly to generate circuits with 13 different types of connections between the elements of the circuit that led to unique responses to environmental conditions.97 Directed evolution has been used to fine-tune the control region of a promoter to match protein levels so that the circuit can function properly. 
Using this strategy, a nonfunctional circuit was converted into one that is functional and the main mutation that made it work was a truncation of one of the genes to generate a different allosteric response.27 Further advances in synthetic biology also may arise from a simplified host genome. Fewer metabolic and regulatory systems may decrease the number of unintended interactions and may provide efficient biocatalytic pathways for natural and novel compounds. To this end, a fully synthetic genome with a characterized list of genes may be possible for applications in bioenergy or biosynthesis or, alternatively, the reduction of the size of the genome to the essential features may provide a route to the ideal microbial host.98–100
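The bistable two-repressor network mentioned above can be illustrated with a short numerical sketch. It assumes a dimensionless mutual-repression model of the kind commonly used to describe such switches; the function name, the parameter values (alpha = 10, cooperativity n = 2), and the simple Euler integration are illustrative assumptions rather than details taken from this chapter.

```python
def simulate_toggle(u0, v0, alpha=10.0, n=2.0, dt=0.01, steps=5000):
    """Euler integration of an assumed mutual-repression model:

        du/dt = alpha / (1 + v**n) - u
        dv/dt = alpha / (1 + u**n) - v

    u and v are the dimensionless levels of the two repressors.
    """
    u, v = u0, v0
    for _ in range(steps):
        du = alpha / (1.0 + v ** n) - u
        dv = alpha / (1.0 + u ** n) - v
        u += dt * du
        v += dt * dv
    return u, v


# Two different starting points settle into two different stable states.
u_high, v_low = simulate_toggle(u0=5.0, v0=0.1)
u_low, v_high = simulate_toggle(u0=0.1, v0=5.0)
print(f"start u-dominant: u={u_high:.2f}, v={v_low:.2f}")
print(f"start v-dominant: u={u_low:.2f}, v={v_high:.2f}")
```

In this picture, a transient induction corresponds to pushing the system from one basin of attraction into the other, after which it stays in the new state without further input.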


References 1. Windass, J. D. et al. Improved conversion of methanol to single-cell protein by Methylophilus methylotrophus. Nature, 287, 396, 1980. 2. Hibbert, E. G. et al. Directed evolution of biocatalytic processes. Biomol. Eng., 22, 11, 2005. 3. McDaniel, R. and Weiss, R. Advances in synthetic biology: on the path from prototypes to applications. Curr. Opin. Biotechnol., 16, 476, 2005. 4. Koffas, M. Evolutionary metabolic engineering. Metabol. Eng., 7, 1, 2005. 5. Chatterjee, R. and Yuan, L. Directed evolution of metabolic pathways. Trends Biotechnol., 24, 28, 2006. 6. McAdams, H. H., Srinivasan, B., and Arkin, A. P. The evolution of genetic regulatory systems in bacteria. Nat. Rev. Genet., 5, 169, 2004. 7. Brown, J. R. Ancient horizontal gene transfer. Nat. Rev. Genet., 4, 121, 2003. 8. Omelchenko, M. V. et al. Evolution of mosaic operons by horizontal gene transfer and gene displacement in situ. Genome Biol., 4, R55, 2003. 9. Szczebara, F. M. et al. Total biosynthesis of hydrocortisone from a simple carbon source in yeast. Nat. Biotechnol., 21, 143, 2003. 10. Hwang, E. I. et al. Production of plant-specific flavanones by Escherichia coli containing an artificial gene cluster. Appl. Environ. Microbiol., 69, 2699, 2003. 11. Watts, K. T., Lee, P. C., and Schmidt-Dannert, C. Exploring recombinant flavonoid biosynthesis in metabolically engineered Escherichia coli. Chembiochem, 5, 500, 2004. 12. Kwon, S. J. et al. High-level production of porphyrins in metabolically engineered Escherichia coli: Systematic extension of a pathway assembled from overexpressed genes involved in heme biosynthesis. Appl. Environ. Microbiol., 69, 4875, 2003. 13. Yan, Y. J. et al. Metabolic engineering of anthocyanin biosynthesis in Escherichia coli. Appl. Environ. Microbiol., 71, 3617, 2005. 14. Brodelius, M. et al. Fusion of farnesyldiphosphate synthase and epi-aristolochene synthase, a sesquiterpene cyclase involved in capsidiol biosynthesis in Nicotiana tabacum. Eur. J. 
Biochem., 269, 3570, 2002. 15. Pfeifer, B. et al. Process and metabolic strategies for improved production of Escherichia coli-derived 6-deoxyerythronolide B. Appl. Environ. Microbiol., 68, 3287, 2002. 16. Pfeifer, B. A. et al. Biosynthesis of complex polyketides in a metabolically engineered strain of E. coli. Science 291, 1790, 2001. 17. Rodriguez, L. et al. Engineering deoxysugar biosynthetic pathways from antibiotic-producing microorganisms. A tool to produce novel glycosylated bioactive compounds. Chem. Biol., 9, 721, 2002. 18. Lombo, F. et al. Engineering biosynthetic pathways for deoxysugars: branched-chain sugar pathways and derivatives from the antitumor tetracenomycin. Chem. Biol., 11, 1709, 2004. 19. Salas, A. P. et al. Deciphering the late steps in the biosynthesis of the anti-tumour indolocarbazole staurosporine: sugar donor substrate flexibility of the StaG glycosyltransferase. Mol. Microbiol., 58, 17, 2005. 20. Luzhetskyy, A. and Bechthold, A. It works: combinatorial biosynthesis for generating novel glycosylated compounds. Mol. Microbiol., 58, 3, 2005. 21. Peiru, S. et al. Production of the potent antibacterial polyketide erythromycin C in Escherichia coli. Appl. Environ. Microbiol., 71, 2539, 2005. 22. Martin, V. J. J. et al. Engineering a mevalonate pathway in Escherichia coli for production of terpenoids. Nat. Biotechnol., 21, 796, 2003. 23. Kwon, S. J. et al. A high-throughput screen for porphyrin metal chelatases: application to the directed evolution of ferrochelatases for metalloporphyrin biosynthesis. Chembiochem, 5, 1069, 2004.


24. Sanchez, A. M., Bennett, G. N., and San, K. Y. Novel pathway engineering design of the anaerobic central metabolic pathway in Escherichia coli to increase succinate yield and productivity. Metabol. Eng., 7, 229, 2005. 25. McAdams, H. H. and Arkin, A. Gene regulation: towards a circuit engineering discipline. Curr. Biol., 10, R318, 2000. 26. Hasty, J. Design then mutate. Proc. Natl. Acad. Sci. USA, 99, 16516, 2002. 27. Yokobayashi, Y., Weiss, R., and Arnold, F. H. Directed evolution of a genetic circuit, Proc Natl. Acad Sci USA, 99, 16587, 2002. 28. Tao, L., Jackson, R. E., and Cheng, Q. Directed evolution of copy number of a broad host range plasmid for metabolic engineering. Metabol. Eng., 7, 10, 2005. 29. Meynial-Salles, I., Cervin, M. A., and Soucaille, P. New tool for metabolic pathway engineering in Escherichia coli: one-step method to modulate expression of chromosomal genes. Appl. Environ. Microbiol., 71, 2140, 2005. 30. Kim, S. W. and Keasling, J. D. Metabolic engineering of the nonmevalonate isopentenyl diphosphate synthesis pathway in Escherichia coli enhances lycopene production. Biotechnol. Bioeng., 72, 408, 2001. 31. Wang, C. W., Oh, M. K., and Liao, J. C. Directed evolution of metabolically engineered Escherichia coli for carotenoid production. Biotechnol. Prog., 16, 922, 2000. 32. Alper, H. et al. Tuning genetic control through promoter engineering. Proc. Natl. Acad. Sci. USA, 102, 12678, 2005. 33. Parales, R. E. and Ditty, J. L. Laboratory evolution of catabolic enzymes and pathways. Curr. Opin. Biotechnol., 16, 315, 2005. 34. McLoughlin, S. Y. et al. Growth of Escherichia coli coexpressing phosphotriesterase and glycerophosphodiester phosphodiesterase, using paraoxon as the sole phosphorus source. Appl. Environ. Microbiol., 70, 404, 2004. 35. McLoughlin, S. Y. et al. Increased expression of a bacterial phosphotriesterase in Escherichia coli through directed evolution. Protein Exp. Purif., 41, 433, 2005. 36. Raillard, S. et al. 
Novel enzyme activities and functional plasticity revealed by recombining highly homologous enzymes. Chem. Biol., 8, 891, 2001. 37. Canada, K. A. et al. Directed evolution of toluene ortho-monooxygenase for enhanced 1-naphthol synthesis and chlorinated ethene degradation. J. Bacteriol., 184, 344, 2002. 38. Rui, L. Y. et al. Metabolic pathway engineering to enhance aerobic degradation of chlorinated ethenes and to reduce their toxicity by cloning a novel glutathione S-transferase, an evolved toluene o-monooxygenase, and gamma-glutamylcysteine synthetase. Environ. Microbiol., 6, 491, 2004. 39. Crameri, A. et al. Molecular evolution of an arsenate detoxification pathway DNA shuffling. Nat. Biotechnol., 15, 436, 1997. 40. Jung, G. Y. and Stephanopoulos, G. A functional protein chip for pathway optimization and in vitro metabolic engineering. Science, 304, 428, 2004. 41. Pfleger, B. F. et al. Combinatorial engineering of intergenic regions in operons tunes expression of multiple genes. Nat. Biotechnol., 24, 1027, 2006. 42. Aharoni, A. et al. The ‘evolvability’ of promiscuous protein functions. Nat. Genet., 37, 73, 2005. 43. Kazlauskas, R. J. Enhancing catalytic promiscuity for biocatalysis. Curr. Opin. Chem. Biol., 9, 195, 2005. 44. Umeno, D., Tobias, A. V., and Arnold, F. H. Diversifying carotenoid biosynthetic pathways by directed evolution. Microbiol. Mol. Biol. Rev., 69, 51, 2005. 45. Schmidt-Dannert, C., Umeno, D., and Arnold, F. H. Molecular breeding of carotenoid biosynthetic pathways. Nat. Biotechnol., 18, 750, 2000. 46. Wang, C. W. and Liao, J. C. Alteration of product specificity of Rhodobacter sphaeroides phytoene desaturase by directed evolution. J. Biol. Chem., 276, 41161, 2001.


47. Lee, P. C. et al. Biosynthesis of structurally novel carotenoids in Escherichia coli. Chem. Biol., 10, 453, 2003. 48. Mijts, B. N., Lee, P. C., and Schmidt-Dannert, C. Identification of a carotenoid oxygenase synthesizing acyclic xanthophylls: combinatorial biosynthesis and directed evolution. Chem. Biol., 12, 453, 2005. 49. Umeno, D., Tobias, A. V., and Arnold, F. H. Evolution of the C30 carotenoid synthase CrtM for function in a C40 pathway. J. Bacteriol., 184, 6690, 2002. 50. Umeno, D. and Arnold, F. H. Evolution of a pathway to novel long-chain carotenoids. J. Bacteriol., 186, 1531, 2004. 51. Umeno, D. and Arnold, F. H. A C-35 carotenoid biosynthetic pathway. Appl. Environ. Microbiol., 69, 3573, 2003. 52. Lee, P. C. et al. Directed evolution of Escherichia coli farnesyl diphosphate synthase (IspA) reveals novel structural determinants of chain length specificity. Metabol. Eng., 7, 18, 2005. 53. Lee, P. C. et al. Alteration of product specificity of Aeropyrum pernix farnesylgeranyl diphosphate synthase (Fgs) by directed evolution. Protein Eng. Des. Sel., 17, 771, 2004. 54. Farmer, W. R. and Liao, J. C. Precursor balancing for metabolic engineering of lycopene production in Escherichia coli. Biotechnol. Prog., 17, 57, 2001. 55. Farmer, W. R. and Liao, J. C. Improving lycopene production in Escherichia coli by engineering metabolic control. Nat. Biotechnol., 18, 533, 2000. 56. Kang, M. J. et al. Identification of genes affecting lycopene accumulation in Escherichia coli using a shot-gun method. Biotechnol. Bioeng., 91, 636, 2005. 57. Alper, H. et al. Identifying gene targets for the metabolic engineering of lycopene biosynthesis in Escherichia coli. Metabol. Eng., 7, 155, 2005. 58. Alper, H., Miyaoku, K., and Stephanopoulos, G. Construction of lycopene-overproducing E-coli strains by combining systematic and combinatorial gene knockout targets. Nat. Biotechnol., 23, 612, 2005. 59. Dekel, E. and Alon, U. 
Optimality and evolutionary tuning of the expression level of a protein. Nature, 436, 588, 2005. 60. Flores, S. et al. Growth-rate recovery of Escherichia coli cultures carrying a multicopy plasmid, by engineering of the pentose-phosphate pathway. Biotechnol. Bioeng., 87, 485, 2004. 61. Sonderegger, M., Schumperli, M., and Sauer, U. Selection of quiescent Escherichia coli with high metabolic activity. Metabol. Eng., 7, 4, 2005. 62. Wierckx, N. J. P. et al. Engineering of solvent-tolerant Pseudomonas putida S12 for bioproduction of phenol from glucose. Appl. Environ. Microbiol., 71, 8221, 2005. 63. Zhang, Y. X. et al. Genome shuffling leads to rapid phenotypic improvement in bacteria. Nature, 415, 644, 2002. 64. Patnaik, R. et al. Genome shuffling of Lactobacillus for improved acid tolerance. Nat. Biotechnol., 20, 707, 2002. 65. Dai, M. H. and Copley, S. D. Genome shuffling improves degradation of the anthropogenic pesticide pentachlorophenol by Sphingobium chlorophenolicum ATCC 39723. Appl. Environ. Microbiol., 70, 2391, 2004. 66. Dai, M. H. et al. Visualization of protoplast fusion and quantitation of recombination in fused protoplasts of auxotrophic strains of Escherichia coli. Metabol. Eng., 7, 45, 2005. 67. Dunn, W. B. and Ellis, D. I. Metabolomics: current analytical platforms and methodologies. Trac. Trends Anal. Chem., 24, 285, 2005. 68. Kell, D. B. et al. Metabolic footprinting and systems biology: the medium is the message. Nat. Rev. Microbiol., 3, 557, 2005. 69. Kell, D. B. et al. Metabolic footprinting: a high-throughput, high-information approach to cellular characterisation and functional genomics. Yeast, 20, S335, 2003.


70. Fiehn, O. et al. Metabolite profiling for plant functional genomics. Nat. Biotechnol., 18, 1157, 2000. 71. Villas-Boas, S. G. et al. High-throughput metabolic state analysis: the missing link in integrated functional genomics of yeasts. Biochem. J., 388, 669, 2005. 72. Wittmann, C., Kim, H. M., and Heinzle, E. Metabolic network analysis of lysine producing Corynebacterium glutamicum at a miniaturized scale. Biotechnol. Bioeng., 87, 1, 2004. 73. Fischer, E., Zamboni, N., and Sauer, U. High-throughput metabolic flux analysis based on gas chromatography-mass spectrometry derived C-13 constraints. Anal. Biochem., 325, 308, 2004. 74. Villas-Boas, S. G. et al. Mass spectrometry in metabolome analysis. Mass. Spectrom Rev., 24, 613, 2005. 75. von Roepenack-Lahaye, E. et al. Profiling of Arabidopsis secondary metabolites by capillary liquid chromatography coupled to electrospray ionization quadrupole time-of-flight mass spectrometry. Plant Physiol., 134, 548, 2004. 76. Kittell, J. et al. Parallel capillary electrophoresis for the quantitative screening of fermentation broths containing natural products. Metabol. Eng., 7, 53, 2005. 77. Allen, J. et al. High-throughput classification of yeast mutants for functional genomics using metabolic footprinting. Nat. Biotechnol., 21, 692, 2003. 78. Vaidyanathan, S., Gaskell, S., and Goodacre, R. Matrix-suppressed laser desorption/ionisation mass spectrometry and its suitability for metabolome analyses. Rapid Commun. Mass. Spectrom, 20, 1192, 2006. 79. Vaidyanathan, S., O’Hagan, S., and Goodacre, R. Direct infusion electrospray ionization mass spectra of crude cell extracts for microbial characterizations: influence of solvent conditions on the detection of proteins. Rapid Commun. Mass Spectrom, 20, 21, 2006. 80. Kaderbhai, N. N. et al. Functional genomics via metabolic footprinting: monitoring metabolite secretion by Escherichia coli tryptophan metabolism mutants using FT-IR and direct injection electrospray mass spectrometry. 
Comp. Funct. Genom., 4, 376, 2003. 81. Lindon, J. C., Holmes, E., and Nicholson, J. K. So whats the deal with metabonomics? Metabonomics measures the fingerprint of biochemical perturbations caused by disease, drugs, and toxins. Anal. Chem., 75, 384A, 2003. 82. Sun, S. X. et al. Assay development and data analysis of receptor-ligand binding based on scintillation proximity assay. Metabol. Eng., 7, 38, 2005. 83. Bro, C. and Nielsen, J. Impact of ‘ome’ analyses on inverse metabolic engineering. Metabol. Eng., 6, 204, 2004. 84. Gill, R. T. Enabling inverse metabolic engineering through genomics. Curr. Opin. Biotechnol., 14, 484, 2003. 85. Gill, R. T. Special issue on inverse metabolic engineering. Metabol. Eng., 6, 175, 2004. 86. Gill, R. T. et al. Genome-wide screening for trait conferring genes using DNA microarrays. Proc. Natl. Acad. Sci. USA, 99, 7033, 2002. 87. Bro, C. et al. Improvement of galactose uptake in Saccharomyces cerevisiae through overexpression of phosphoglucomutase: Example of transcript analysis as a tool in inverse metabolic engineering. Appl. Environ. Microbiol., 71, 6465, 2005. 88. Endy, D. Foundations for engineering biology. Nature, 438, 449, 2005. 89. Chin, J. W. Modular approaches to expanding the functions of living matter. Nat. Chem. Biol., 2, 304, 2006. 90. Benner, S. A. and Sismour, A. M. Synthetic biology. Nat. Rev. Genet., 6, 533, 2005. 91. Sismour, A. M. and Benner, S. A. Synthetic biology. Expert Opin. Biol. Ther., 5, 1409, 2005. 92. Elowitz, M. B. and Leibler, S. A synthetic oscillatory network of transcriptional regulators. Nature, 403, 335, 2000. 93. Gardner, T. S., Cantor, C. R., and Collins, J. J. Construction of a genetic toggle switch in Escherichia coli. Nature, 403, 339, 2000.


94. Cai, L., Friedman, N., and Xie, X. S. Stochastic protein expression in individual cells at the single molecule level. Nature, 440, 358, 2006. 95. Guido, N. J. et al. A bottom-up approach to gene regulation. Nature, 439, 856, 2006. 96. Kashtan, N. and Alon, U. Spontaneous evolution of modularity and network motifs. Proc. Natl. Acad. Sci. USA, 102, 13773, 2005. 97. Guet, C. C. et al. Combinatorial synthesis of genetic networks. Science, 296, 1466, 2002. 98. Smith, H. O. et al. Generating a synthetic genome by whole genome assembly: phi X174 bacteriophage from synthetic oligonucleotides. Proc. Natl. Acad. Sci. USA, 100, 15440, 2003. 99. Posfai, G. et al. Emergent properties of reduced-genome Escherichia coli. Science, 312, 1044, 2006. 100. Glass, J. I. et al. Essential genes of a minimal bacterium. Proc. Natl. Acad. Sci. USA, 103, 425, 2006.

5 Models Predicting Optimized Strategies for Protein Evolution

Jonathan J. Silberg, Rice University
Peter Q. Nguyen, Rice University

5.1 Introduction
5.2 Random Mutagenesis
    Choosing Parental Proteins • Identifying a Protocol • Caveats
5.3 Directed Mutagenesis
    Combinatorial Consensus Mutagenesis (CCM) • Structure-Guided Saturation Mutagenesis • Structure-Guided Consensus Mutagenesis
5.4 Recombination
    The Schema Algorithm • Residue Clash Maps (RCM) • The FamClash Algorithm • Statistical Coupling Analysis (SCA)
5.5 Optimizing Chimeric Libraries
    Tunable Parameters during Library Construction • Calibrating the Disruptive Nature of Recombination • Recombination as a Shortest Path Problem (RASPP) • Optimal Pattern of Tiling for Combinatorial Libraries (OPTCOMB) • Practical Considerations for Library Synthesis • Estimating Chimeric Library Diversity
5.6 Conclusions
References

5.1 Introduction

Naturally occurring enzymes are widely used for metabolic engineering, but they frequently do not exhibit the exact functional properties desired for a given application (e.g., they lack the desired stability, expression level, catalytic efficiency, or substrate specificity). Within the colossal space of possible polypeptide sequences, proteins are thought to exist with functions that meet the specifications for almost any engineering goal that you can imagine. In some cases, knowledge-based design can find proteins with structures and functions distinct from those observed in nature [1–6]. However, our understanding of protein sequence-structure-function relationships is not yet sophisticated enough to consistently alter enzyme functions in a desired manner, especially when the design goal is to optimize a preexisting property. Directed evolution, in contrast, has repeatedly proven quite effective at optimization when applied alone or when used in tandem with knowledge-based mutagenesis [7]. By coupling mutagenic and selection (or screening) strategies as occurs in nature, this approach sieves through libraries of variants to filter out the desired ones with improved fitness.

Directed evolution is restricted in its scope for many engineering goals. The number of variants that can be evaluated in a given experiment is minuscule (at best $10^{13}$ variants; see Ref. [8]) compared with the number of possible sequence variants that exist for an average size protein (~$20^{300}$ in prokaryotes; see

[Figure 5.1 plot: fraction of variants that are functional (log scale, 1 to $10^{-14}$) versus amino acid substitutions relative to parent 1 (0 to 100; equivalently, % chimera from parent 1 and from parent 2), with one curve for recombination and one for mutation.]

Figure 5.1. Amino acid substitutions created by recombination are more conservative than those created randomly. Random mutations yield variants whose function decreases exponentially with increasing amino acid substitutions $m$; this trend is approximated by $\nu^{m}$, where $\nu$ is the fraction of single mutations that do not disrupt function [13]. The fraction of functional proteins remaining after recombination of structurally related proteins is approximated by $\rho^{m(D-m)/(D-1)}$, where $\rho$ is the fraction of single amino acid substitutions created by recombination that do not alter function, and $D$ is the sequence distance between the recombined parents. For the example shown, $D = 100$ and $\nu$ and $\rho$ represent the values experimentally determined for lactamases, 0.54 and 0.79, respectively. (Adapted from Drummond, D.A., Silberg, J.J., Meyer, M.M., Wilke, C.O., and Arnold, F.H., Proc. Natl. Acad. Sci. USA, 102, 5380, 2005.)

Ref. [9]) and the frequency with which functional proteins are thought to occur in this sequence space (~1 in $10^{12}$ to 1 in $10^{77}$; see Refs. [10,11]). In addition, the most commonly used methods for creating sequence diversity (mutation and recombination) create a limited fraction of variants that are functional (see Figure 5.1), a prerequisite for evolving improved fitness. With mutation, this arises because random amino acid substitutions are deleterious to function nearly half of the time [12,13]. On average, the effects of multiple substitutions are statistically independent, leading to an exponential decline in the probability of finding anything functional in a combinatorial library [13,14]. Recombination of structurally related proteins is more efficient at finding functional proteins with a given level of amino acid substitution [15]. Single substitutions created by recombination are more conservative on average than those created randomly, and the probability of retaining function with increasing substitutions does not decrease as dramatically relative to random mutation [16].

Because screening capacity is limited and adaptive mutations are rare, the fraction of folded and functional proteins found in a library can influence the strides you make during laboratory evolution. One way to optimize an experiment is to increase the fraction of folded and potentially interesting variants in your library. This can be done by infusing into your library design some knowledge about the protein(s) being evolved. This can draw from our understanding of protein stability, family sequences, structure-function relationships, or previous laboratory evolution experiments. In this chapter, we survey models that can enrich the number of functional proteins found in combinatorial libraries that have unique polypeptide sequences. We primarily focus our discussion on computational approaches that have been directly tested using in vitro evolution experiments (see Table 5.1). 
In many cases, it remains unclear which strategy is best suited for a given design goal, since these models have not yet been rigorously benchmarked against each other. With continually accruing experimental data, however, we

Table 5.1 Summary of Models for Guiding Directed Evolution

| Strategy | Applications | Experimental Evidence | Citation |
|---|---|---|---|
| MUTATION-BASED APPROACHES | | | |
| Thermodynamic predictions | Finds optimal parents for random mutagenesis | Accelerated evolution of cytochromes P450 function | [12,25] |
| EP-PCR simulations | Identifies optimal protocols for random mutagenesis | Predicted qualities of scFv EP-PCR libraries | [19] |
| Combinatorial consensus mutagenesis | Predicts libraries that are enriched in stabilized variants | Accelerated lactamase and lactamase-antibody stabilization | [35,38] |
| Structure-guided saturation mutagenesis | Enriches libraries in variants with altered activity | Altered the specificity of various biocatalysts | [42–45] |
| Structure-guided consensus mutagenesis | Enriches libraries in variants with improved activity | Improved the specific activity of lactate dehydrogenase | [48] |
| RECOMBINATION-BASED APPROACHES | | | |
| Schema | Anticipates disruption for a single chimera | Active lactamase chimeras have lower disruption than non-functional variants | [55,56] |
| Residue clash maps | Anticipates disruption for a single chimera | Correlated with previously published chimera data | [57] |
| FamClash | Anticipates disruption for a single chimera | Correlated with DHFR chimera activity | [58] |
| Statistical coupling analysis | Identifies networks of residues that specify a protein's fold | Accelerated discovery of WW proteins with diverse function | [39,66] |
| Recombination as the shortest path problem | Finds chimeric libraries that minimize calculated disruption | Accelerated discovery of folded cytochromes P450 | [51] |
| Optimal pattern of tiling for library design | Finds chimeric libraries that minimize calculated disruption | Not yet experimentally validated | [52] |

should begin to glean which methods are optimal for a given design goal and how the existing strategies can be used in concert to further enhance the efficiency of directed evolution.

5.2 Random Mutagenesis

Typically the first approach used for any laboratory evolution goal is random mutagenesis, in which an error-prone polymerase chain reaction (EP-PCR) is used to create mutations in a gene [14]. This method is easy and useful for optimizing a preexisting function in an enzyme of interest (e.g., increasing the stability or catalytic efficiency, or altering the substrate specificity profile), even in the absence of structural or mechanistic information [17]. By using protocols that produce mutations at a low frequency, EP-PCR creates variants that have a reasonable probability of retaining parental-like structure, a prerequisite for improved fitness. When screening relatively small libraries, beneficial mutations can frequently be identified that yield small increases in protein fitness. Larger improvements can be achieved by performing multiple rounds of EP-PCR, using the best variants from each round of screening [18]. The real challenge in creating optimal libraries is to maximize the number of protein variants with unique sequences that are folded and functional [19]. Library size, mutation rate, and mutation spectrum dictate the number of distinct sequences evaluated in an experiment. The number of unique and functional proteins Uf analyzed also depends on the probability that each variant will retain function, which decreases approximately exponentially with increasing numbers of amino acid substitutions [12–14]. In this section, we review two models that have been reported for optimizing random mutagenesis. The first uses a thermodynamic approach to find optimal parents [12], and the second simulates EP-PCR reactions to identify protocols that maximize Uf [19].


5.2.1 Choosing Parental Proteins

Bloom and coworkers developed a simple thermodynamic model for estimating the fraction of proteins with m nonsynonymous mutations that retain parent-like function Pf(m) after random mutagenesis [12]. Their theory was motivated by the observations that the free energy of stability ΔGf for many proteins is similar in magnitude to the average stability change caused by a single amino acid substitution [20,21], and also by the reports that multiple amino acid substitutions have approximately additive thermodynamic effects on protein stability [13,14,22–24]. Their model predicts that at low mutation levels, homologous proteins with differing stabilities exhibit a distinct fraction of functional proteins Pf(m) at each level of m (see Figure 5.2). As m increases for structurally related proteins, each additional random mutation begins to have a similar average effect on protein function, and Pf(m) is approximately equal to ν^m, where ν is the average neutrality of each additional nonsynonymous mutation. Neutrality is a fundamental measure of a protein's tolerance to mutation in terms of the fraction of proteins that retain wild-type structure after a single substitution [12–14]; this parameter is thought to depend largely on protein structure. Bloom's model suggests that among protein homologs, those with higher stability yield EP-PCR libraries that are enriched in the number of unique, functional and potentially interesting proteins. Thus, when you have a choice of multiple proteins as parents for random mutagenesis, the model predicts that you should choose the parent that is more thermostable, provided that all else is equal. Two reports have recently provided direct experimental support of this hypothesis. In the first, the effects of random mutagenesis on β-lactamase function were measured for TEM-1 and an M182T mutant of TEM-1, which is 2.7 kcal/mol more stable than the native protein [12].
For each parent, identical mutagenesis conditions were used to create five distinct pairs of libraries with a range of mutation levels. The libraries derived from the M182T mutant consistently contained 15–60% more functional variants than those created from TEM-1. A recent study has also provided evidence that increases in protein stability promote the creation of new and improved cytochromes P450 [25]. As with the lactamases, a library derived from a more stable P450 encoded a greater fraction of variants with improved fitness properties. Protein variants created from a more thermostable P450 were ~two-fold more likely to be folded, as assessed by cofactor incorporation, and ~three-fold more likely to have new functions [26]. Provided that structural coordinates are available, Bloom's theory can be used to estimate the effects of nonsynonymous mutations on protein function for any protein without creating a combinatorial library. This is done via a simple calculation of the probability Pf(m) that a sequence is still folded and functional after m nucleotide mutations as

$$P_f(m) = \int_{-\infty}^{-\Delta G_f^{\mathrm{extra}}} p_m(\Delta\Delta G_m)\, d(\Delta\Delta G_m) \tag{5.1}$$

where ΔG_f^extra is the difference between the stability of the parental protein and the minimal threshold required to maintain its structure, and pm(ΔΔGm) is the probability distribution of the net stability change ΔΔGm caused by m random mutations [12]. The single-mutation distribution p1(ΔΔG) can be obtained for all single amino acid substitutions by calculating ΔΔG1 using available algorithms [27,28], weighting these values by the probability that they are introduced by a single-nucleotide mutation during a random mutagenesis protocol, and assigning a zero ΔΔG value to all synonymous mutations and a large ΔΔG value (25 kcal/mol) to insertions and deletions [12]. The distribution pm(ΔΔGm) is calculated using an m-fold convolution of p1(ΔΔG) as described [29]. Typically we do not know ΔG_f^extra. However, you can still estimate ν for any protein without this parameter by calculating the effects of a tenth mutation on folded variants that already contain nine mutations [12], i.e., the fraction of folded proteins with nine mutations that are inactivated by addition of another mutation. ΔG_f^extra can be calculated by fitting Bloom's model to experimentally determined values for Pf(m) [12].
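The calculation in Equation 5.1 can be sketched numerically. The following is a minimal illustration, not the published implementation: it assumes p1(ΔΔG) has already been tabulated on an evenly spaced grid, takes the upper integration limit as a non-negative tolerance `extra_stability` (the sign convention is an assumption), and uses an invented bell-shaped p1 in place of values from a stability-prediction algorithm. The function name `pf_curve` and all numbers are hypothetical.

```python
import numpy as np

def pf_curve(p1, ddg_min, step, extra_stability, m_max):
    """Return [Pf(0), ..., Pf(m_max)] for a discretized p1(ddG).

    p1[i] is the weight of a single mutation with ddG = ddg_min + i * step.
    extra_stability is the upper integration limit (kcal/mol): how much
    destabilization the parent tolerates before unfolding (assumed sign).
    """
    p1 = np.asarray(p1, dtype=float)
    p1 = p1 / p1.sum()                 # normalize to a probability distribution
    pm = np.array([1.0])               # m = 0: all probability at ddG = 0
    curve = []
    for m in range(m_max + 1):
        ddg = m * ddg_min + step * np.arange(pm.size)   # grid on which p_m lives
        curve.append(float(pm[ddg <= extra_stability + 1e-9].sum()))
        pm = np.convolve(pm, p1)       # one more mutation: p_(m+1) = p_m * p1
    return curve

# Hypothetical p1: bell-shaped, mean +1 kcal/mol (most mutations destabilize).
grid = np.arange(-3.0, 7.0 + 1e-9, 0.5)
p1 = np.exp(-0.5 * ((grid - 1.0) / 1.5) ** 2)

marginal = pf_curve(p1, grid[0], 0.5, extra_stability=1.0, m_max=5)
stable = pf_curve(p1, grid[0], 0.5, extra_stability=4.0, m_max=5)
for m, (a, b) in enumerate(zip(marginal, stable)):
    print(f"m={m}: Pf marginal={a:.3f}, Pf stable={b:.3f}")
```

With the same p1, the parent given the larger stability margin can never have a lower Pf(m) than the marginal parent, which is the qualitative prediction discussed above. A realistic p1 would also carry the weights for synonymous mutations, insertions, and deletions described in the text.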

[Figure 5.2: schematic with one panel per parent (P1 and P2), showing folded (F) and unfolded (U) variants at m = 0, 1, 2, and 10 along a ΔGf axis, with ΔG_f^extra marked for each parent.]

Figure 5.2 Parental stability affects the fraction of functional proteins obtained upon random mutation. On average, random single mutations yield a higher fraction of folded and functional proteins for homologs with increased stability; compare m = 1 for P1 and P2 (F = folded and U = unfolded). In the case of the marginally stable P1, the incorporation of single random mutations into the F1 variants yields a similar fraction of functional variants with m = 2 as observed in the m = 1 distribution. However, in the case of the P2 homolog with increased stability, subsequent mutation of F1 variants yields a reduced fraction of functional variants (compare m = 1 and m = 2 distributions for P2). As shown for m = 10, a single round of mutation to the functional P1 and P2 progeny yields similar distributions of functional variants at high m. This results in the convergence of P1 and P2 to a similar exponential decline in function with increasing mutation level. (Adapted from Bloom, J.D., Silberg, J.J., Wilke, C.O., Drummond, D.A., Adami, C., and Arnold, F.H., Proc. Natl. Acad. Sci. USA, 102, 606, 2005, and from Bloom, J.D., Labthavikul, S.T., Otey, C.O., and Arnold, F.H., Proc. Natl. Acad. Sci. USA, 103, 5869, 2006.)

5.2.2 Identifying a Protocol

Several strategies have been described for altering the spectrum of mutations created during EP-PCR, including increasing Mg2+ concentrations over those used in standard PCR reactions [30], doping reactions with Mn2+ [14], varying the relative concentrations of nucleotides [31], and using DNA polymerases with different biases in the types of errors that they create [32]. Typically, EP-PCR is optimized for a given application by screening a set of conditions and empirically evaluating each based on the number of variants identified with increased fitness [14,31]. In a few cases, there has been evidence that protocols yielding high mutation frequencies are better for the acquisition of improved or novel functions [33]. These findings have recently been rationalized by a model described by Drummond and coworkers [19], which posits that optimal libraries maximize the number of unique functional variants Uf. Drummond's model is by far the best available for optimizing EP-PCR protocols in cases where you have some information about: (i) the number of thermal cycles used for PCR n, (ii) the probability that DNA strands are duplicated during PCR λ, (iii) the average nucleotide substitution frequency created by a PCR protocol ⟨mnt⟩, (iv) the probability that a base pair mutation is nonsynonymous pns, (v) the probability that a nonsynonymous mutation will truncate and inactivate a protein ptr, and (vi) the average fraction of functional protein variants with one amino acid substitution ν. Many of these parameters can easily be obtained through simple experiments. The probability of strand duplication λ = d/n can be calculated by measuring the number of doublings d that occur in n PCR cycles using agarose gel electrophoresis. Sequencing a small number of variants in a library provides information on ⟨mnt⟩, pns, and ptr for a given protocol [12,16]. Furthermore, ν can be measured directly by creating a set of libraries, measuring the fraction of functional variants and ⟨mnt⟩ in each library, and fitting the data as previously described [16]. Alternatively, ν can be estimated as described in Section 5.2.1 [12]. Using these parameters, you first calculate the distribution of nonsynonymous mutations Pr(m) in your library as

$$\Pr(m) = (1+\lambda)^{-n} \sum_{k=0}^{n} \binom{n}{k} \lambda^{k}\, \frac{(ky)^{m} e^{-ky}}{m!} \tag{5.2}$$

where y = ⟨mnt⟩pns(1 + λ)/(nλ), and m is the total number of nonsynonymous mutations in a variant. You should avoid assuming that you have a Poisson-distributed library as previously described [14]. This assumption does not hold when high-frequency mutation rates are generated through a small number of PCR thermal cycles [19]. Variants created by EP-PCR have a significant probability of containing nonsynonymous mutations that are insertions, deletions, and premature stop codons [12,16]. These mutations typically disrupt protein function, so it is important to account for them when calculating Uf. In Drummond's model [19], the number of sequences that lack indels and premature stops at each level of nonsynonymous mutation, Nm, is N·Pr(m)·(1 − ptr/pns)^m, where N is the number of transformants analyzed in the experiment. The number of possible unique sequences at any level of m is limited by the conservative nature of the genetic code and given by

$$M_m = \binom{L/3}{m}\, 5.7^{m} \tag{5.3}$$

The number of unique, functional sequences Uf in your library is

$$U_f = \sum_{m=0}^{L/3} M_m \left(1 - e^{-N_m/M_m}\right) \nu^{m} \tag{5.4}$$

and the fraction of unique functional sequences in a library is Uf/N. Upon establishing Uf /N for a set of libraries, you simply choose the library that maximizes this value to optimize your directed evolution experiment.
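The steps of this calculation can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the authors' code: L is taken to be the gene length in nucleotides (so L/3 is the number of codons), the sum over m is truncated at a hypothetical `m_max` for numerical convenience, and every parameter value in the two example protocols is invented.

```python
from math import comb, exp, expm1, factorial

def pr_m(m, n, lam, mnt, pns):
    """Probability of m nonsynonymous mutations after n EP-PCR cycles."""
    y = mnt * pns * (1 + lam) / (n * lam)
    total = sum(comb(n, k) * lam**k * (k * y)**m * exp(-k * y)
                for k in range(n + 1))
    return (1 + lam) ** (-n) * total / factorial(m)

def unique_functional(N, L, n, lam, mnt, pns, ptr, nu, m_max=50):
    """Number of unique functional sequences U_f among N transformants."""
    codons = L // 3                      # assumes L is in nucleotides
    uf = 0.0
    for m in range(min(codons, m_max) + 1):
        n_m = N * pr_m(m, n, lam, mnt, pns) * (1 - ptr / pns) ** m
        m_m = comb(codons, m) * 5.7 ** m
        # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
        uf += m_m * -expm1(-n_m / m_m) * nu ** m
    return uf

# Two hypothetical protocols that differ only in mutation frequency.
common = dict(N=10_000, L=900, n=30, lam=0.5, pns=0.7, ptr=0.05, nu=0.6)
ratios = {label: unique_functional(mnt=mnt, **common) / common["N"]
          for label, mnt in [("low", 1.5), ("high", 6.0)]}
for label, ratio in ratios.items():
    print(f"{label} mutation frequency: Uf/N = {ratio:.3f}")
```

A useful sanity check is that Pr(m) sums to one over m and that its mean equals ⟨mnt⟩·pns, both of which follow from the form of the distribution.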

5.2.3 Caveats

Random mutagenesis can frequently optimize a preexisting enzymatic function through multiple rounds of evolution. However, this approach can be arduous and limited in the types of functional changes that can be accomplished. The fraction of functional variants that EP-PCR creates decreases exponentially with increasing mutation frequencies (see Figure 5.1), greatly restricting the number of potentially folded and functional variants evaluated when screening a library with multiple mutations. In addition, libraries created by EP-PCR contain fewer than a third of all possible variants with a single mutation. Nonsynonymous base pair mutations created by EP-PCR are only capable of producing at best an average of six out of 19 possible amino acid mutations at any given residue position. Because DNA polymerases like Taq exhibit biases in mutation frequency, an even more restricted sequence diversity can be encountered [14,31]. Such biases can be minimized by using strategies that produce a more uniform mutational spectrum than traditional approaches [14,31], e.g., using Stratagene's GeneMorph II kit.

5.3 Directed Mutagenesis

Many engineering goals necessitate the incorporation of amino acid substitutions at specific sites that cannot be accessed by random mutagenesis. For example, alterations in cofactor or substrate specificity frequently require multiple substitutions at key sites that are unlikely to occur during EP-PCR [34]. Family sequence or biophysical information can be used to identify these key sites and direct mutations during laboratory evolution. In addition, results from laboratory evolution experiments can be used to guide the design of second-generation libraries, in which mutations are created that have an increased probability of creating variants with increased fitness. In this section, we review approaches that constrain the location and/or the types of mutations that are allowed. These strategies have had success increasing the frequency of finding variants with improved thermostability [35–38], improving protein expression [38], and accelerating the discovery of proteins with altered functions [39].

5.3.1 Combinatorial Consensus Mutagenesis (CCM)

The herculean efforts in protein sequencing have yielded a tremendous number of homologous sequences for almost any protein that you might choose to evolve. There is emerging evidence that this information alone can direct mutations during library synthesis. One of the simpler methods that uses sequence information for library design is CCM, which creates mutations in a protein that correspond to highly conserved amino acids within its protein family [35,38]. CCM is derived from the idea that stabilizing residues are more highly conserved than other residues at a given position within a protein family. To perform CCM, two sequential libraries are created. In the initial library, all possible consensus mutations are generated with identical probabilities. Some of these mutations are destabilizing, so a small set of variants is screened for stability improvements, and the relative contribution that each possible mutation makes to stability is calculated using a simple model. Mutations that are predicted to be stabilizing are used to create a second library that is further enriched in variants with improved stability properties [35,38]. Laboratory evolution studies have provided evidence that CCM can accelerate the discovery of stabilizing mutations [35,38]. When screening a small pool of β-lactamase variants from a library created with 29 possible consensus mutations (2^29 ≈ 5 × 10^8 variants), almost one quarter of all isolates had improved stability compared to the parent [35]. In a second optimized library, created using the subset of the consensus mutations that were calculated to be stabilizing, a great majority of the variants had stability that surpassed the parent. In fact, a screen of just a few hundred variants from this library identified a protein with a mid-point for thermal denaturation that was 9°C higher than the parent.
In a more recent study, CCM has been used to optimize the expression and thermostability of a lactamase-antibody fusion, which is being developed as a prodrug therapy [38].

CCM is relatively easy to perform (see Figure 5.3). ClustalW is used to calculate a consensus sequence from a family sequence alignment [40], and an initial library is then synthesized using QuikChange multi site-directed mutagenesis, a method that requires one mutant primer for each possible consensus mutation [41]. The sequence and fitness of a small pool of variants (~100) in this library are evaluated, and the relative contribution that each consensus mutation k makes to fitness, Pk, is calculated using screening and sequencing data [35,38,41]. The second-generation library is then created using only those sites k where Pk>0, i.e., by including only those mutations that are found to be stabilizing.

[Figure 5.3: schematic showing a family sequence alignment with the parent sequence, the consensus mutations identified from it (m1 = E, m2 = V, m3 = L, m4 = A), a table of library #1 variants listing which mutations each contains and its relative fitness Ri, the sign of the calculated Pk for each mutation (+, –, +, –), and the mutations allowed in library #2 (m1 and m3).]

Figure 5.3 Optimizing libraries using combinatorial consensus mutagenesis. A family sequence alignment is used to identify mutation sites and amino acid substitutions for creating an initial library. The sequence and relative fitness (Ri) is determined for i variants in this library, the contribution that each mutation k makes to fitness Pk is calculated, and a further optimized library is created using only those mutations that are predicted to improve fitness, i.e., those with Pk>0. (Adapted from Amin, N., Liu, A.D., Ramer, S., Aehle, W., Meijer, D., Metin, M., Wong, S., Gualfetti, P., and Schellenberger, V., Protein Eng. Des. Sel., 17, 787, 2004, and from Roberge, M., Estabrook, M., Basler, J., Chin, R., Gualfetti, P., Liu, A., Wong, S.B., Rashid, M.H., Graycar, T., Babe, L., and Schellenberger, V., Protein Eng. Des. Sel., 19, 141, 2006.)

To determine Pk, a matrix Mki is created in which each possible mutation k in each characterized variant i is represented as 1 if it is present and 0 otherwise. Stability contributions of each mutation are assumed to be additive, and the remaining activity Ri of each protein variant after heat treatment is defined as:

$$\log R_i = \sum_{k=1}^{m} M_{ki} P_k + C \tag{5.5}$$

where C is a constant that corresponds to the remaining activity of the parental enzyme, on the same logarithmic scale, after a similar heat treatment. Finally, the set of values for Pk is calculated by minimizing the sum of the differences between the measured and calculated Ri for all n variants as follows:

$$\min \sum_{i=1}^{n} \left[ \log (R_i)_{\mathrm{obs}} - \log (R_i)_{\mathrm{calc}} \right] \tag{5.6}$$

This minimization can be performed using the solver function of Microsoft Excel, as was described for the optimization of the lactamase CCM library [35].
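As an alternative to a spreadsheet solver, the additive model can be fitted with a linear least-squares routine. The sketch below is only an illustration under explicit assumptions: it minimizes squared differences (the text specifies only a sum of differences), treats C as known from the parental measurement, stores variants as rows (the transpose of the Mki layout above), and uses invented, noise-free data.

```python
import numpy as np

# Rows = characterized variants i, columns = consensus mutations k
# (the transpose of the M_ki layout used in the text).
M = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
], dtype=float)

C = -0.5                                   # hypothetical log remaining activity of the parent
true_P = np.array([0.3, -0.2, 0.1, -0.4])  # hypothetical, used only to simulate data
log_R_obs = M @ true_P + C                 # synthetic, noise-free "measurements"

# Least-squares estimate of P_k from log R_i - C = sum_k M_ik P_k
# (squared-error objective is an assumption of this sketch).
P_fit, *_ = np.linalg.lstsq(M, log_R_obs - C, rcond=None)

# Mutations carried forward to the second-generation library: P_k > 0.
keep = [k for k, p in enumerate(P_fit) if p > 0]
print(P_fit.round(3), keep)
```

With real screening data the fit will not be exact, and mutations that appear in few sequenced variants will have poorly determined Pk values, so the number of characterized variants should comfortably exceed the number of mutations being fitted.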


5.3.2 Structure-Guided Saturation Mutagenesis

Saturation mutagenesis is one of the oldest strategies available for optimizing combinatorial libraries. In this approach, select codons within a gene are randomized to yield the full spectrum of possible amino acids at those positions. Saturation mutagenesis is extremely limited in the number of sites that can be effectively targeted and thoroughly screened for fitness improvements. When m sites are targeted, the ensuing sequence diversity is 20^m. Thus, in libraries with just four or five targeted sites, the number of unique sequences is a staggering 10^5–10^6. In some cases, structural information has been used to guide saturation mutagenesis. Such experiments have had the most success when the design goal is to alter the substrate selectivity of an enzyme. In these cases, structural information is used to target key sites that appear to directly mediate protein substrate specificity, e.g., those directly contacting specific functional groups on your ligand of interest. One of the most notable early successes with this approach was the expansion of the genetic code [42]. In this study, an aminoacyl-tRNA synthetase was created that charges a tRNA with the non-natural amino acid O-methyl-L-tyrosine. To alter the amino acid specificity for this tRNA synthetase, variants were created by performing saturation mutagenesis at residues that were within 6.5 Å of the aryl ring of tyrosine [42]. Since this initial success, similar structure-guided approaches have been successfully used to create altered substrate specificity in diverse enzymes, including cytochromes P450 [43], N-acetylneuraminic acid lyase [44], and β-galactosidase [45]. Due to the structure of the genetic code, randomized codons (NNN, where N can be A, C, G, or T) produce amino acids at varying frequencies. This occurs because the number of codons encoding each amino acid varies, e.g., tryptophan is encoded by one codon and serine is encoded by six.
If only one position is randomized, the difference in frequency between the most common variant and the rarest in a library is not that large. However, the ratio of these frequencies increases exponentially (6^m) with the number of codons m that are randomized. Thus, when even a small number of sites are targeted, e.g., m = 5, the most common variant is 7,776 times more likely to be encountered than the rarest one. Randomized codons also produce a significant number of variants with premature stop codons. The genetic code contains three stop codons, and the fraction of variants in a library that lack stop codons is (61/64)^m. Thus, for the example above of a library with five randomized sites, approximately 21% of the protein variants are truncated. One way to minimize these issues is to use a biased nucleotide mixture when synthesizing each codon [46,47]. If you create a randomized codon using a nucleotide mixture that corresponds to NNX (where X can be C, T, or G but not A), then the fraction of variants that lack stop codons is increased to (47/48)^m; with this mutation approach only 10% of variants in an m = 5 library are truncated. Biased nucleotide mixtures can also be used to reduce library sequence biases, e.g., the ratio of the most common variant and the rarest, as described in Ref. [46].
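The arithmetic behind these figures is easy to reproduce for other library sizes. The short sketch below is an illustration only; the function name and argument names are hypothetical, and the value of five serine codons remaining under NNX is counted from the standard genetic code rather than taken from the text.

```python
def library_stats(m, codons_total=64, stop_codons=3, max_codons_per_residue=6):
    """Summary statistics for a library with m fully randomized codons."""
    return {
        "protein_sequences": 20 ** m,
        # most common vs. rarest variant (rarest residues have one codon)
        "bias_ratio": max_codons_per_residue ** m,
        "fraction_truncated": 1 - (1 - stop_codons / codons_total) ** m,
    }

nnn = library_stats(5)                                   # NNN codons
nnx = library_stats(5, codons_total=48, stop_codons=1,   # NNX, X = C, T, or G
                    max_codons_per_residue=5)            # serine keeps 5 codons

print(nnn["protein_sequences"], nnn["bias_ratio"])
print(f"truncated: NNN {nnn['fraction_truncated']:.1%}, "
      f"NNX {nnx['fraction_truncated']:.1%}")
```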

5.3.3 Structure-Guided Consensus Mutagenesis

A modified consensus mutagenesis approach has been described that may accelerate the discovery of proteins with altered function [48]. This strategy borrows concepts from CCM and site-saturation mutagenesis, in which key residues are mutated in a protein to try to improve the probability of making large fitness jumps. As with CCM, this strategy uses a family protein sequence alignment to determine the natural variability of residues at each position within the family. Unlike CCM, however, this approach also uses structural information to limit the possible mutations to sites proximal to the ligand-binding site being modified. Only those residues that lie within 4 Å of the relevant cofactor (or substrate) are mutated. Residues are partially randomized at each site such that those positions can encode the wild type residue or any of the amino acids that were observed within the protein family at that position. Structure-guided CCM has been used to improve the specific activity of lactate dehydrogenase [48]. In this study, the PredictProtein server was used to find similar sequences to LDH from Bacillus stearothermophilus [49], and MaxHom was used to determine the sequence variability at each position within the family alignment [50]. Five of the 17 residues proximal to NAD+ in the LDH structure exhibited sequence variability and were targeted for mutation. The combinatorial library was then synthesized using PCR with oligonucleotide primers that encoded a programmed set of codons at each of the five positions [48]. The codons were designed to maximize the number of variants that exhibited protein family diversity at those positions. For each mutated codon, a minimal set of base pairs was chosen that encoded the parental residue and phylogenetically prevalent amino acids. It should be emphasized that this type of codon programming can at times generate undesired biases or substitutions at low frequencies in addition to the desired residues. As described in the previous section, the biases in programmed codon mixtures can be evaluated in silico to determine which best suits your sequence diversity goals.
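One simple way to evaluate a programmed codon in silico is to expand it into every codon it can produce and tally the encoded residues. The sketch below is a hypothetical helper, not the procedure used in Ref. [48]; it relies on the standard genetic code and standard IUPAC nucleotide ambiguity codes, and the example codons (NNB, which corresponds to the NNX mixture of the previous section, and RTT) are chosen only for illustration.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
         "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}

def residue_counts(degenerate_codon):
    """Count how many codons in the mixture encode each residue."""
    pools = [IUPAC[base] for base in degenerate_codon.upper()]
    return Counter(CODON_TABLE["".join(c)] for c in product(*pools))

nnb = residue_counts("NNB")          # third position C, G, or T
print(sum(nnb.values()), "codons,", nnb["*"], "stop,",
      len(set(nnb) - {"*"}), "amino acids")
print(dict(residue_counts("RTT")))   # a narrow, two-residue mixture
```

Equal codon counts here assume equimolar nucleotide mixtures at each degenerate position; unequal mixtures would require weighting each codon by the product of its base frequencies.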

5.4 Recombination

The best strategy for creating functional proteins with large numbers of amino acid substitutions is protein recombination, in which structurally related polypeptides are exchanged among homologous proteins [15]. Studies investigating the effects on protein structure of amino acid substitutions created in this way have shown that recombination can be up to 10^16-fold more effective than EP-PCR at finding functional proteins with high levels of substitution [16]. While the substitutions created by recombination are inherently more conservative than those created randomly, libraries containing chimeras with high levels of substitutions still contain a significant fraction of variants with disrupted structure and function. Several algorithms have been reported that use sequence and/or structural information to anticipate chimera disruption in a library. Typically these algorithms cannot be used alone for library optimization. They must be applied in concert with the library design models described in Section 5.5, e.g., RASPP or OPTCOMB [51,52], to design optimized libraries for laboratory evolution. More sophisticated structure-based models have been described for anticipating the folding of protein variants created through mutation or recombination [53,54]. However, it remains unclear how well these approaches anticipate chimera disruption compared with the simpler models reviewed herein.

5.4.1 The Schema Algorithm

The first major innovation in structure-guided recombination was the development of the Schema algorithm [55]. When swapping structurally related polypeptides among protein homologs, Schema posits that the best way to maximize your probability of conserving structure and function is to minimize the number of pairwise residue-residue interactions in the parental structures that are altered by recombination. Interactions are simply defined as any pair of residues that are within a defined cutoff distance. One of the major advantages of Schema is its simplicity and ease of use. As described in subsequent sections, additional models have been developed to anticipate chimera structural disruption. However, the ability of these methods to enrich libraries in folded and functional chimeras has not yet been rigorously tested or benchmarked against Schema for a range of protein families. Laboratory evolution studies have shown that Schema disruption E is a useful metric for anticipating whether a chimeric protein will retain function. In a well-defined library of β-lactamase chimeras, bacterial selections revealed that variants with low E were functional more frequently than expected if there was no correlation between E and the conservation of function [56]. Schema-guided recombination of cytochromes P450 has also yielded an amazing array of folded and functional chimeras with high levels of amino acid substitution [26]. In a well-defined library, created by recombining three cytochrome P450 heme domains (~60% pairwise sequence identity), approximately half of the variants were correctly folded. The folded chimeras differed from their parents by an average of 73 amino acid substitutions, and a great majority of these variants had some catalytic function.

[Figure 5.4: two-column schematic, (a) SCHEMA and (b) RCM, each drawn along a residue axis numbered 1 to 10. Parent sequences ALVSRAHDEF and ALTTEVRERY with PDB coordinates; chimera ALVTEVREEF. Panel (a) steps: all interactions in structure; remove interactions that cannot be disrupted; identify and count interactions disrupted (E = 2). Panel (b) steps: all interactions in structure; identify potentially disruptive interactions; count interactions that create clashes (residues 5 and 9 clash: 0 charge in both parents, –2 charge in chimera; Eclash = 1).]

Figure 5.4 Using structural coordinates to calculate the disruption of a chimera. (A) To calculate the Schema disruption E for a chimera arising from the recombination of two homologous proteins, a matrix representing all interacting residue-residue pairs within the parents is created using the PDB coordinates of one parent (top panel), residue-residue pairs that cannot be altered by recombination are removed from this matrix (middle panel), and the number of remaining interactions that are broken by recombination are counted (thick lines, bottom panel). (B) To calculate RCM clashes Eclash for the same chimera, coordinates are used to generate a contact matrix (top panel), residue-residue pairs that are proximal and not present in either of the parents are defined as potentially disruptive (thick lines, middle panel), and parental sequence data is used to determine which of these pairs create a clash upon recombination (bottom panel). A proximal residue-residue pair that alters the physicochemical properties of that pair compared to the parents is considered a clash, e.g., the chimera shown in the bottom panel contains a new residue-residue pair (E5–E9) whose charge is distinct from the structurally related pairs found in both of the parents (R5–E9 and E5–R9). (Adapted from Voigt, C.A., Martinez, C., Wang, Z.G., Mayo, S.L., and Arnold, F.H., Nat. Struct. Biol., 9, 553, 2002, and from Saraf, M.C., and Maranas, C.D., Protein Eng., 16, 1025, 2003.)

To calculate E for a single chimera (see Figure 5.4a), you first identify all pairwise residue-residue interactions in the structure of one parent. In the simplest case of recombining two parents, the contacts are scaled by the sequence identity of the proteins being recombined, i.e., all contacts that cannot be broken by recombination are ignored. The E of a chimera is simply calculated by counting the number of residue-residue contacts broken when that chimeric protein inherits portions of its sequence from different parents. Note that Schema does not differentiate between chimeras in a library that have identical crossover locations but different polypeptide inheritance, e.g., AA-BB versus BB-AA chimeras, even though these types of chimeras often exhibit distinct functional properties [56]. This can be contrasted with other metrics, such as FamClash and residue clash maps (RCM), which at times assign different calculated disruption values to chimeras with identical crossover locations [57,58].
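The counting procedure described above can be sketched as follows. This is a minimal illustration, not the published Schema code: the contact list is invented (a real one would come from the distance cutoff applied to a parent structure), positions are 0-based, and the function name `schema_disruption` is hypothetical. The parent sequences are the ten-residue examples shown in Figure 5.4.

```python
def schema_disruption(inheritance, parents, contacts):
    """Return (chimera sequence, E) for one chimera.

    inheritance: parent index (0-based) used at each aligned position
    parents:     aligned parent sequences of equal length
    contacts:    (i, j) pairs of 0-based positions that are in contact
    """
    chimera = "".join(parents[s][i] for i, s in enumerate(inheritance))
    e = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        # A contact is broken only if no parent has this residue pair
        # at the same two positions.
        if all((p[i], p[j]) != pair for p in parents):
            e += 1
    return chimera, e

parents = ["ALVSRAHDEF", "ALTTEVRERY"]
contacts = [(0, 1), (2, 5), (3, 6), (4, 8)]      # hypothetical contact list

aba = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]             # A-B-A inheritance
bab = [1 - s for s in aba]                       # same crossovers, parents swapped

chimera, e = schema_disruption(aba, parents, contacts)
swapped, e_swapped = schema_disruption(bab, parents, contacts)
print(chimera, e)
print(swapped, e_swapped)
```

For two parents, the two chimeras with swapped inheritance receive the same E, which illustrates why Schema cannot distinguish AA-BB from BB-AA chimeras.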


When recombining sequence elements from homologous proteins to create a single, monomeric chimera, si is used to indicate the parent incorporated at each position i in the chimeric sequence (e.g., s1 = 1 if the first residue in the chimera is inherited from the first parent, s2 = 2 if the second residue is from the second parent, etc.). The calculated disruption E of that chimera is

$$E = \sum_{i=1}^{N} \sum_{j=i+1}^{N} C_{ij} \Delta_{ij} \tag{5.7}$$

where N is the number of residues that have well-defined coordinates in the structure used for calculations, and Cij designates whether two residues are contacting (Cij = 1 if residues i and j are within the cutoff distance, otherwise Cij = 0). In some cases a pairwise contact (i.e., Cij = 1) cannot be broken by recombination, because the exchanged polypeptides do not create residue-residue pairs that are distinct from those observed in one or more of the parents. For this reason, the delta function ∆ij uses a parental sequence alignment to indicate which pairwise interactions in the chimera are distinct from those present at structurally related positions in either parent. If the pair of amino acids at positions i and j in the chimeric sequence is present at structurally related positions of either parent, then ∆ij = 0 (otherwise ∆ij = 1).

It remains unclear whether a particular distance cutoff is optimal for calculating Cij and anticipating the structural disruption in a chimera. Studies investigating the quality of Schema predictions have used a distance cutoff equal to 4.5 Å [26,55,56,59], and these studies have typically excluded hydrogen, backbone nitrogen, and backbone oxygen atoms from the calculation of E. In studies using Schema to recombine cofactor-containing proteins, interactions between atoms in the cofactors and residues in the proteins are excluded from the calculation of E [26,59]. Homologous proteins typically bind and use the same cofactor, so any interactions between the cofactor and protein cannot be broken upon recombination.

Provided that PDB coordinates are available for all the parents recombined, a structure-based alignment of the parental sequences should be performed to ensure that structurally related residues are numbered identically. 
Several free software packages can do this, including SwissProt and the combinatorial-extension algorithms [60–62]. If multiple conformational states are available for one or more of the parent proteins (e.g., structures in the presence and absence of substrates) you should assess E using all available coordinates to ensure that both conformational states of the chimeras are likely to exhibit similar disruption [59]. In cases where the structure of only one parent is available, a parental sequence alignment is generated using any multiple sequence alignment program, and all pairwise contacts Cij are identified using the available structure. When no direct structural information is available for any of the parents, a structural model can be used to identify residue-residue contacts in the parents. Structural models can be created using Swiss-Model [63], a fully automated protein structure homology-modeling server. It remains unclear how useful this latter approach is for optimizing libraries, although it seems likely that it will decrease as the parental sequences and the structural template used for modeling become more divergent. Protein homologs often differ in length, and alignment of the parents requires the insertion of gaps within the primary amino acid sequence of one or more of the parents. Such gaps should be accounted for during the calculation of E. When gaps are introduced into the parent whose structural coordinates are being used to generate the contact matrix Cij, the residues found in the other parents are ignored when calculating E because there is no corresponding structural information. In contrast, when gaps occur in any parent other than the one used for structural information, they should be treated like real residues that differ in identity from the residues in the other parents. Some proteins require oligomerization for their stability and function. 
In cases where such oligomeric proteins are recombined and your design goal is to create oligomeric chimeras, the number of broken interactions within a monomer may not be sufficient to account for all broken interactions. In

Models Predicting Optimized Strategies for Protein Evolution


addition to calculating the number of residue-residue interactions broken by recombination as described above (E), you must account for interactions that are disrupted between each of the peptide chains in the oligomer. The equation used to calculate interfacial disruption Einterface of an oligomeric chimera is the same as for calculating disruption in a monomer, except that Cij designates whether residue i from chain A is contacting residue j from chain B. The total calculated disruption for an oligomeric chimera Etotal = E + Einterface.
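The counting procedure behind Equation 5.7 can be sketched in a few lines of Python. This is a minimal illustration, not the published Schema implementation: the function names `contact_pairs` and `schema_disruption`, the 0-based indexing, and the toy three-residue parents are all assumptions made here. The caller is assumed to have already aligned the parents and to pass only the atoms that should count toward contacts (for example, side-chain heavy atoms).

```python
import math
from itertools import combinations


def contact_pairs(residue_atoms, cutoff=4.5):
    """Return residue pairs (i, j), i < j, with any two atoms within `cutoff` Å.

    residue_atoms[i] is a list of (x, y, z) tuples for residue i (hypothetical
    input format; choose which atoms to include before calling).
    """
    pairs = []
    for i, j in combinations(range(len(residue_atoms)), 2):
        if any(math.dist(a, b) <= cutoff
               for a in residue_atoms[i] for b in residue_atoms[j]):
            pairs.append((i, j))
    return pairs


def schema_disruption(contacts, parents, assignment):
    """Count contacts broken in a chimera (the double sum of C_ij * Delta_ij).

    contacts   -- residue pairs with C_ij = 1
    parents    -- aligned parent sequences of equal length
    assignment -- assignment[i] is the index of the parent supplying position i
    """
    chimera = [parents[assignment[i]][i] for i in range(len(assignment))]
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        # Delta_ij = 0 when this residue pair already occurs at (i, j) in a parent.
        if not any((p[i], p[j]) == pair for p in parents):
            broken += 1
    return broken


# Toy example: two parents that differ at positions 0 and 2, which are in contact.
parents = ["RGE", "EGR"]
contacts = [(0, 1), (0, 2)]
print(schema_disruption(contacts, parents, [1, 0, 0]))  # chimera "EGE"
```

Under the same assumptions, Einterface can be obtained by calling `schema_disruption` with a contact list built between residues of chain A and chain B rather than within one chain, and adding the result to E.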

5.4.2 Residue Clash Maps (RCM) Conceptually, RCM uses a similar approach as Schema [57]. Parental sequences and structures are used to identify and count potentially disruptive residue-residue interactions that occur in chimeras created by recombining structurally related proteins. Whereas Schema treats all broken pairwise interactions as having a similar probability of disrupting function, RCM only considers interactions that significantly alter charge, volume, or hydrogen bonding properties. To date, there have been no reports of libraries optimized using RCM as a guide. It remains unclear how well this algorithm can design libraries de novo that maximize the number of sequences with low disruption (and a user defined level of sequence diversity). However, comparisons of RCM predictions with published experimental results for a small number of recombination studies indicate that functional chimeras have fewer RCM clashes than would be observed if clash maps were randomly generated [57], i.e., if clashes were not selectively restricted to residue-residue pairs that alter charge, volume, or hydrogen bonding properties. In RCM (see Figure 5.4b), the number of clashes in a chimera Eclash is

$$E_{clash} = \sum_{i=1}^{N} \sum_{j=i+1}^{N} C_{ij} \Delta_{ij} \rho_{ij} \tag{5.8}$$

where N is the number of residues that have well-defined coordinates in the structure used for calculations, and Cij designates whether two residues are contacting. Cij = 1 if the Cβ carbons in residues i and j are within 8 Å (if a residue lacks Cβ, then the Cα carbon is used), otherwise Cij = 0 [57]. As with Schema, the delta function ∆ij uses a parental sequence alignment to determine which pairwise interactions in the chimera are distinct from those present at structurally related positions of either parent. If the pairwise interactions between amino acids in the chimera are present in structurally related positions of either parent, then ∆ij = 0 (otherwise ∆ij = 1). Alignments are performed as outlined for Schema. The rho function ρij indicates which pairwise interactions in the chimera significantly alter the charge, volume, or hydrogen bonding properties of that pair relative to those observed in the parental proteins. ρij = 1 (otherwise ρij = 0) for any residues i and j that: (i) contain repulsive charges that are not observed in the parents (e.g., if two basic or acidic residues are placed in contacting positions within a chimera), (ii) do not create the number of satisfied hydrogen bonds that are detected in the parents, (iii) are within the core of the protein and have a volume change score Sij > 15 Å³, or (iv) are surface exposed, have an additive volume greater than that of either of the parents at those positions, and have a volume change score Sij > 30 Å³ [57]. A residue-residue pair is only considered within the core of a protein if the accessible surface area of the side chain is <8 Å². The WHATIF software package can be used to calculate accessible surface area for a 1.4-Å water probe and identify hydrogen bonds [64]. In RCM [57], cavities created by recombination, i.e., residue pairs i and j created with additive volumes (Vi + Vj) that are less than the average additive volume Vp observed for the parents at those positions, have an Sij = |(Vi + Vj) - Vp - Dij|. 
In cases where steric hindrance arises from recombination (Vi + Vj > Vp), Sij = |(Vi + Vj) - Vp + Dij|. The Dij parameter accounts for the level of tolerable changes in volume that occur within the parental proteins. Dij equals |(Vi + Vj)Parent1 - (Vi + Vj)Parent2|, provided that it is greater than 0.1 Vp, otherwise Dij = 0.1 Vp to prevent artificially inflated scores.
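As a rough illustration of how the terms in the RCM clash count fit together, the following Python sketch builds the Cβ contact list and counts clashes using only criterion (i), repulsive charges not seen in the parents. It is a partial sketch under stated assumptions: the hydrogen-bond and volume criteria are omitted, the function names and the `CHARGE` table are hypothetical, and histidine is treated as neutral purely for simplicity.

```python
import math

# Assumed formal side-chain charges; every other residue is treated as neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def rcm_contacts(cb_coords, cutoff=8.0):
    """Pairs (i, j), i < j, whose C-beta (or C-alpha fallback) atoms lie within cutoff Å."""
    n = len(cb_coords)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(cb_coords[i], cb_coords[j]) <= cutoff]


def is_repulsive(pair):
    """True when both residues carry charges of the same sign."""
    a, b = (CHARGE.get(r, 0) for r in pair)
    return a * b > 0


def count_charge_clashes(contacts, parents, chimera):
    """Sum C_ij * Delta_ij * rho_ij with rho restricted to the charge criterion."""
    clashes = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        parent_pairs = [(p[i], p[j]) for p in parents]
        delta = pair not in parent_pairs            # pair is new to the chimera
        rho = is_repulsive(pair) and not any(is_repulsive(pp) for pp in parent_pairs)
        if delta and rho:
            clashes += 1
    return clashes


# Toy version of the Figure 5.4 example: parents carry R-E and E-R, chimera carries E-E.
print(count_charge_clashes([(0, 2)], ["RGE", "EGR"], "EGE"))
```

A fuller implementation would add the remaining ρij criteria as further predicates combined with a logical OR.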


5.4.3 The FamClash Algorithm FamClash was developed to guide the recombination of homologous proteins using family sequence data alone [58]. FamClash posits that chimeras that retain the observed consensus physiochemical properties of residue-residue pairs within a protein family are most likely to retain parent-like function. The number of FamClash predicted residue-residue incompatibilities (designated clashes) appears to be related to the specific activity of functional chimeras [58]. A strong inverse correlation was reported between the number of calculated clashes and the specific activity of functional dihydrofolate reductase (DHFR) chimeras present in a combinatorial library, which was created by swapping single polypeptides between E. coli and B. subtilis DHFR. In FamClash (see Figure 5.5), each pair of sequence positions i and j in the protein family is initially designated as conserved (or not) based on the relative variation of additive charge (c), volume (v), and hydrophobicity (h) for those positions throughout the family [58]. For each sequence m in the protein family, matrices representing the additive charge (Cij = ci + cj), hydrophobicity (Hij = hi + hj), and volume (Vij = vi + vj) are calculated for related pairs of positions i and j using published values of c, v, and h [58]. The values obtained for all m sequences at a given pair of positions are partitioned into bins φpqr that subdivide the possible combinations of charge (p), hydrophobicity (r), and volume (q) property ranges. In the case of charge, p = Cij, yielding five possible integer values, -2, -1, 0, 1, and 2. Additive hydrophobicity and volume, in contrast, are broken into ten bins each, which divide their possible continuous range of values (q = 0 to 300 Å³ and r = -2.3 to 3.7 kcal/mol) into equal sized bins. Any pair of sequence positions i and j where ≥20% of the m naturally occurring pairs reside in one of the 600 possible φpqr bins is designated as potentially conserved. 
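The binning step described above can be sketched as follows. This is a minimal illustration assuming the property ranges and the 20% occupancy threshold quoted in the text; the function names, the input format (one `(charge, volume, hydrophobicity)` tuple per family sequence), and the example values are hypothetical.

```python
from collections import Counter


def bin_index(value, low, high, n_bins=10):
    """Index of the equal-width bin containing value, clamped to [0, n_bins - 1]."""
    width = (high - low) / n_bins
    k = int((value - low) // width)
    return min(max(k, 0), n_bins - 1)


def potentially_conserved(pair_properties, threshold=0.20):
    """Return (flag, most occupied bin) for one pair of alignment positions.

    pair_properties holds one (C_ij, V_ij, H_ij) tuple per family sequence:
    additive charge, additive volume in Å^3, additive hydrophobicity in kcal/mol.
    Bins are keyed as (p, q, r) = (charge, volume bin, hydrophobicity bin).
    """
    bins = Counter(
        (int(c), bin_index(v, 0.0, 300.0), bin_index(h, -2.3, 3.7))
        for c, v, h in pair_properties
    )
    top_bin, count = bins.most_common(1)[0]
    return count / len(pair_properties) >= threshold, top_bin


# Hypothetical family of ten sequences: six share one bin, four are scattered.
family = [(0, 100.0, -2.0)] * 6 + [
    (1, 45.0, -1.4), (0, 220.0, 1.8), (-1, 175.0, 0.5), (0, 145.0, 0.4)]
print(potentially_conserved(family))
```

Note that with very small families almost any pair passes an occupancy threshold of 20%, so the test is only informative when the alignment contains many sequences.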
A threshold level of occupancy in a bin φpqr is not sufficient to designate a given pair of sequence positions as conserved by FamClash [58]. In addition, the identity of the amino acid at the first position

[Figure 5.5 graphic: example tables listing the additive charge (C), volume (V, in Å³), and hydrophobicity (H, in kcal/mol) of homologs H1 to H4 at residue pairs 1,2 and 1,3. The homologs are sorted into φpqr bins along the axes p = charge, q = volume, and r = hydrophobicity. Residue pair 1,2 = potentially conserved; residue pair 1,3 = not conserved.]

Figure 5.5 Finding residue pairs in a family with conserved physiochemical properties. In FamClash, a family sequence alignment is used to generate matrices that describe the additive physiochemical properties (charge Cij, volume Vij, and hydrophobicity Hij) of every pair of positions i and j in a protein. For each pair of positions, homologs are sorted into φpqr bins based on these properties, and the relative occupancy of the bins is used to establish whether a given pair of positions in the family displays conserved physiochemical properties. Pairs are considered as potentially conserved if a single bin has sufficient occupancy (>20% of the pairs). In the example shown, residue pair 1,2 is potentially conserved, but residue pair 1,3 is not conserved and is no longer considered in the calculation of chimera clashes. (Adapted from Saraf, M.C., Horswill, A.R., Benkovic, S.J., and Maranas, C.D., Proc. Natl. Acad. Sci. USA, 101, 4142, 2004.)


within the pair that occupies a highly populated bin must exhibit a defined level of dependence on the identity of the amino acids found at the second position. This relative level of dependence is given by the mutual information index score Mijpqr as

$$M_{ij}^{pqr} = \sum_{k} \sum_{l} P(a_{ik}, a_{jl}) \log_2 \left[ \frac{P(a_{ik}, a_{jl})}{P(a_{ik})\, P(a_{jl})} \right] \tag{5.9}$$

where aik is the frequency of residue k at position i, ajl is the frequency of residue l at position j, P(aik,ajl) is the joint probability that k and l are observed at positions i and j at the same time in the parents, P(aik) P(ajl) is the product of the individual probabilities of occupancy for residues k and l at positions i and j. A pair of positions is considered conserved if the mutual information index is greater than a threshold value Mc. Mc is calculated using a bootstrap replicate analysis [58]. In this procedure, the vectors encoding residues i and j are copied from the parental sequence alignment, and 10,000 copies are made by randomly permuting the residues found in the original vectors. For each possible φpqr, mutual information index scores are generated using the bootstrap replicates. A Mc is chosen using the obtained score distributions for each φpqr, such that only 0.5% of the φpqr bins have scores that are greater than that value; that is, only three out of the 600 bins scores are greater than Mc. 
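The mutual information score and the permutation-based threshold can be illustrated with a short Python sketch. It is deliberately simplified: it scores two whole alignment columns rather than the residues falling in a single φpqr bin, and it takes the 99.5th percentile of the permuted scores as a stand-in for Mc. The function names, the default seed, and the toy columns are assumptions, not part of FamClash as published.

```python
import math
import random
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two aligned columns of residues."""
    n = len(col_i)
    joint = Counter(zip(col_i, col_j))
    count_i, count_j = Counter(col_i), Counter(col_j)
    score = 0.0
    for (k, l), count in joint.items():
        p_kl = count / n
        score += p_kl * math.log2(p_kl / ((count_i[k] / n) * (count_j[l] / n)))
    return score


def permutation_threshold(col_i, col_j, n_replicates=10000, tail=0.005, seed=0):
    """Score exceeded by roughly `tail` of the replicates with column j shuffled."""
    rng = random.Random(seed)
    shuffled = list(col_j)
    scores = []
    for _ in range(n_replicates):
        rng.shuffle(shuffled)  # destroys any coupling between the two columns
        scores.append(mutual_information(col_i, shuffled))
    scores.sort()
    return scores[min(int((1 - tail) * n_replicates), n_replicates - 1)]


# Perfectly coupled columns versus independent columns.
print(mutual_information("AABB", "CCDD"), mutual_information("AABB", "CDCD"))
```

A pair of positions would then be called conserved when its observed score exceeds the threshold returned for that pair.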
In a chimera, two residues are considered as having a clash if that pair of residues i and j is designated as conserved in the family, the amino acids k and l participating in that pair are inherited from different parents p1 and p2, and if any of the following four conditions occurs: (i) the additive charge of residues i and j in the chimera differs from the average additive charge obtained for all sequences in the protein family, (ii) the absolute value of the additive hydrophobicity of residues i and j in the chimera (hi + hj) is greater than the average additive hydrophobicity obtained for all sequences in the protein family plus a cutoff value δhij, (iii) the additive volume of residues i and j in the chimera (vi + vj) is greater than the average additive volume obtained for all sequences in the protein family plus a cutoff value δvij, or (iv) the additive volume of residues i and j in the chimera (vi + vj) is less than the average additive volume obtained for all sequences in the protein family minus a cutoff value 2δvij [58]. The cutoff values used for δvij and δhij are

$$\delta v_{ij} = \max\left\{ V_{ij}^{p1} - V_{ij}^{p2},\; V_{ij}/10 \right\}, \qquad \delta h_{ij} = \max\left\{ H_{ij}^{p1} - H_{ij}^{p2},\; H_{ij}/10 \right\} \tag{5.10}$$
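The four clash conditions can be written as a single predicate. The sketch below is a literal transcription of conditions (i) to (iv) as worded in the text, with the cutoffs δvij and δhij passed in as precomputed arguments rather than derived inside the function; the function name, argument layout, and numerical values are hypothetical.

```python
def famclash_clash(chimera, family_mean, delta_v, delta_h,
                   conserved=True, different_parents=True):
    """Return True if a residue pair in a chimera counts as a FamClash clash.

    chimera      -- (additive charge, additive volume, additive hydrophobicity)
                    of the pair in the chimera
    family_mean  -- the same three quantities averaged over the protein family
    delta_v, delta_h -- cutoff values for volume and hydrophobicity
    conserved, different_parents -- the two preconditions from the text
    """
    if not (conserved and different_parents):
        return False
    charge, volume, hydro = chimera
    mean_charge, mean_volume, mean_hydro = family_mean
    return (
        charge != mean_charge                     # (i) charge differs
        or abs(hydro) > mean_hydro + delta_h      # (ii) hydrophobicity too high
        or volume > mean_volume + delta_v         # (iii) pair too large
        or volume < mean_volume - 2 * delta_v     # (iv) pair too small
    )


# Hypothetical pair: family average volume 190 Å^3 with a 20 Å^3 cutoff.
print(famclash_clash((0, 230.0, 0.5), (0, 190.0, 0.4), delta_v=20.0, delta_h=0.3))
```

The volume test is asymmetric by construction: a pair may fall up to 2δvij below the family average before it is flagged, but only δvij above it.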

5.4.4 Statistical Coupling Analysis (SCA) SCA is an area of great excitement in sequence-guided protein design [39,65,66]. Like FamClash, SCA posits that if you have enough sequence information for a particular protein family, you can identify networks of coevolving residues that must be inherited from the same parent in order to conserve protein structure. Residues are defined as coevolving if the amino acids found at two positions in the protein family are observed to change in concert during sequence evolution [65]. Protein variants that maintain parent-like coupling are predicted to be the most likely to retain structure and be functional. It should be noted that SCA provides a binary answer about whether a particular variant is expected to retain parent-like coupling (and structure), unlike FamClash which predicts the extent to which protein structure is likely to be disrupted by recombination [58]. There is emerging evidence that for small single-domain proteins, SCA yields much of the information required for specifying protein fold and function [39,66]. Experiments analyzing the structures of 43 artificial WW domains with conserved coupling found that approximately one third of the variants retained parent-like structures [66]. In contrast, a similar sized set of WW variants that did not conserve coupling contained zero proteins with parent-like structures [66]. Subsequent functional analysis of the coupled variants further showed that these designed proteins exhibit functional properties similar to


those observed in naturally occurring WW proteins [39]. Thus, it appears that in at least some protein families, this strategy can recapitulate the functional diversity observed in nature in a small combinatorial library. To perform SCA for a particular family, you must first generate a multiple sequence alignment. Ranganathan and coworkers have noted that such an alignment must be evolutionarily well sampled, such that the removal or addition of sequences does not significantly alter the amino acid distribution at each position [65]. Using the abundance of each amino acid x at each position j from this alignment, the binomial probability of observing amino acid x at position j (Pjx) is calculated as

$$P_j^x = \frac{N!}{n_x!\,(N - n_x)!}\; p_x^{\,n_x} (1 - p_x)^{N - n_x} \tag{5.11}$$

where N is the number of sequences in the family alignment, nx is the number of sequences with amino acid x at position j, and px is the mean frequency of amino acid x in all proteins [65]. This information is converted into statistical energies ∆Gxj for each amino acid x at site j as

$$\Delta G_j^x = kT \ln \frac{P_j^x}{P_{MSA}^x} \tag{5.12}$$

where PxMSA is the probability of observing amino acid x in the family sequence alignment [65]. The statistical coupling between two sites in the protein family i and j (∆Gi,jstat) is given as

$$\Delta G_{i,j}^{stat} = kT \sum_{x} \ln\left( \Delta G_{i|\delta j}^{x} - \Delta G_{i}^{x} \right)^{2} \tag{5.13}$$

where ∆Gixδ j is the statistical energy for amino acid x at site j derived from the perturbation at i [65]. In cases where two residues are not coupled, the statistical coupling is zero. SCA has not yet been used to design proteins with >100 residues or create large combinatorial libraries for laboratory evolution, e.g., those with ≥102 variants [39,66]. However, statistical coupling has been observed in proteins significantly larger than WW domains, including retinoid X receptors [67] and PDZ domains [65], suggesting that this theory will be useful for diverse protein engineering goals. One can easily envision using SCA concomitantly with other optimization strategies for site-directed recombination. SCA could be used to identify coupled residues within biocatalysts, and library optimization strategies like Schema and RASPP could be used to find the libraries that minimize the structural disruption of chimeras which conserve coupled residues [51,55].
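The first two SCA quantities, the binomial probability and the statistical energy, are simple enough to compute directly. The Python sketch below is illustrative only: the function names are hypothetical, energies are reported in units of kT, and the reference probability for the whole alignment is supplied by the caller rather than computed here.

```python
import math


def binomial_probability(n_x, n_seqs, p_x):
    """Probability of seeing amino acid x in n_x of n_seqs sequences at one site."""
    return math.comb(n_seqs, n_x) * p_x ** n_x * (1 - p_x) ** (n_seqs - n_x)


def statistical_energy(p_site, p_msa):
    """Statistical energy in units of kT: ln(P_site / P_MSA)."""
    return math.log(p_site / p_msa)


def site_energy(column, residue, p_x, p_msa):
    """Statistical energy of `residue` at one alignment column (a string)."""
    p_site = binomial_probability(column.count(residue), len(column), p_x)
    return statistical_energy(p_site, p_msa)


# Toy column of four sequences with a background frequency of 0.5 for alanine.
print(binomial_probability(2, 4, 0.5))
```

For realistic alignments with thousands of sequences the binomial probabilities become very small, so working with logarithms of the individual factors is usually safer than forming the product directly.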

5.5 Optimizing Chimeric Libraries In cases where you want to recombine two parents at a couple of sites, it is typically easy to exhaustively enumerate all possible libraries of that type on a desktop computer and identify those that are optimally enriched in chimeras of low calculated disruption. However, when recombining multiple proteins at a large number of sites, it is not feasible to exhaustively calculate the disruption of all chimeras in all possible libraries of a given size for a set of parents. Consider the recombination of two average size bacterial proteins (~300 residues in length), which exhibit 30% amino acid sequence identity. If you created a library by allowing ten crossover sites between two parents (i.e., allowing inheritance of eleven different polypeptides), that library would contain a total of 2^11 (2,048) unique variants. While complete enumeration of a single library is trivial, there are ~10^15 possible libraries that exist for the two parents in this example, far too many for complete enumeration on a reasonably fast desktop computer. In these cases, the optimization strategies described below should be used for library design.


Before implementing the algorithms reviewed in this section, you must first identify your diversity goal and choose a theoretical model to use for structural disruption, e.g., FamClash, Schema, and Residue clash maps [55,57,58]. Diversity goals could be as simple as creating a particular number of unique folded chimeras, or generating a specified number of folded chimeras with desired sequence properties, e.g., a particular average level of amino acid substitution. We still know very little about what constitutes optimal diversity when using recombination for laboratory evolution. There is emerging evidence that high levels of amino acid substitution may at times be better for creating chimeras with altered biocatalytic properties [26,59]. Once your goals have been established, you should calibrate the relationship between the theoretical predictions of your models and experimental reality. The models described in previous sections cannot estimate the fraction of folded and functional chimeras in a library a priori. They only generate noncalibrated estimates for the relative structural disruption of two or more chimeras. You must also have some prior knowledge about the disruptive nature of amino acid substitutions arising from recombination of your proteins of interest. Small experimental data sets, comprised of a handful of chimeras, can be used to establish this relationship and calibrate model predictions. With this data in hand, the algorithms described in this section can be used concomitantly with those described in Section 5.4 to rapidly identify near-optimal libraries that achieve your diversity goals.

5.5.1 Tunable Parameters during Library Construction Chimeric libraries optimized using computational approaches are typically created using site-directed recombination [26,56,68]. This experimental approach affords tight control over a number of library characteristics. The number of unique chimeras can easily be specified by how you build your library, since this parameter depends on the number of parents and crossover sites you use to build your library. Upon recombining p parents at n sites, p^(n+1) unique variants are created in cases where there is no bias in the polypeptide inheritance (see Figure 5.6). The amino acid substitution level accessible in your chimeras can also be adjusted through your parental choice. The accessible substitution level increases as the sequence identity among the parents decreases, and also as the number of parents recombined increases. In cases where your optimization goal is to create a library with chimeras exhibiting a defined number of amino acid substitutions, it is best to recombine the most closely related proteins that will yield your desired goal [16]. Libraries created in this way are predicted to contain a higher fraction of functional variants than libraries created using more distantly related parents [16]. Finally, the fraction of folded (and functional) proteins in your library can be maximized by choosing crossovers that maintain the calculated disruption below a threshold value. The design flexibility associated with the use of site-directed recombination for computation-guided library design is much greater than that of traditional strategies. In site-directed protein recombination, libraries of well-defined composition can be created using any parents that you can imagine. 
In contrast, annealing based strategies, like DNA shuffling [15], staggered extension process [69], or in vivo methods [70], are limited to recombining proteins with >70% sequence identity because these strategies can only create crossovers in regions where the parental genes exhibit high sequence identity. These restrictions not only limit the choice of parents, but they also limit the number of amino acid substitutions that can be incorporated into a given chimera, relative to the parents used for recombination. For example, when recombining two Class A β-lactamases [26,56], site-directed recombination can generate chimeras with >70 amino acid substitutions, but annealing-based strategies are limited to ≤40 substitutions. The sequence-independent methods SHIPREC, ITCHY, and SCRATCHY can also recombine more distantly related parents and access greater sequence diversity than annealing-based strategies [71–73]. However, these approaches cannot control crossover location, and they are grossly inefficient at recombining structurally related proteins; they frequently create insertions and deletions. In addition, they are limited to making chimeras with only a handful of crossover sites.
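The dependence of library size on the number of parents and crossovers is easy to check numerically. The following Python sketch assumes unbiased inheritance, so that every fragment can come from any parent; the function names and the toy fragments are hypothetical.

```python
from itertools import product


def library_size(n_parents, n_crossovers):
    """Number of unique chimeras when each of n + 1 fragments can come from any parent."""
    return n_parents ** (n_crossovers + 1)


def enumerate_chimeras(parent_fragments):
    """Yield every chimera sequence.

    parent_fragments[p][f] is fragment f of parent p; all parents are assumed to be
    cut at the same (structurally related) crossover positions.
    """
    n_fragments = len(parent_fragments[0])
    for choice in product(range(len(parent_fragments)), repeat=n_fragments):
        yield "".join(parent_fragments[p][f] for f, p in enumerate(choice))


print(library_size(2, 10))                                      # two parents, ten crossovers
print(list(enumerate_chimeras([["AA", "BB"], ["aa", "bb"]])))   # one crossover
```

Enumerating one library this way is cheap; the difficulty discussed in Section 5.5 comes from the number of alternative crossover placements, each of which defines a different library.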

[Figure 5.6 graphic: parental sequences divided into fragments 1 to 5 and reassembled into chimeras, with panels labeled Unbiased and Biased.]

Figure 5.6 Polypeptide inheritance in libraries. Sequence elements encoding structurally related fragments are swapped at user-defined locations during site-directed recombination. In this approach, structurally related fragments can be inherited with equal frequency (unbiased) or with defined biases.

5.5.2 Calibrating the Disruptive Nature of Recombination The effect on protein structure of a particular level of calculated disruption is expected to vary dramatically from experiment to experiment, as observed when comparing recombination studies with β-lactamases and cytochromes P450 [26,55,56,59]. Structural disruption can be influenced by a number of factors. First of all, protein topology and parental sequence identity affect the deleterious nature of amino acid substitutions created by recombination [16]. Second, the thermodynamic stability of the parent proteins is expected to influence the deleterious nature of recombination. Amino acid substitutions are predicted to be less deleterious when incorporated into proteins of higher thermostability, since the deleterious nature of random mutations created by EP-PCR decreases as the thermostability of the parents increases [12]. Protein expression is also expected to vary between recombination experiments, due to changes in chimera mRNA sequence. Some chimeric genes are expected to have altered mRNA half-life or secondary structure from the parental transcripts. Before going to the great effort of building a chimeric library, you should always ascertain the effects of amino acid substitutions by recombining your proteins of interest. In this way you can estimate a priori the fraction of folded (and potentially interesting) chimeras that are likely to be present in your combinatorial libraries. A quick way to perform such a calibration is to construct a small set of chimeras (~20), with varying levels of substitution and calculated disruption, and evaluate the structural conservation of those chimeras. In proteins that contain a cofactor, chimera folding can often be rapidly assessed by investigating whether the cofactor is incorporated [59]. 
In cases where the proteins being recombined do not contain a cofactor, retention of parent-like enzymatic activity can be used to assess structural conservation, as was reported with β-lactamases [55,56]. Several alternative screens have also been described for rapidly assessing protein solubility and stability [74,75]. Calibration experiments have had great success in estimating library quality and aiding in library design [59]. In the case of cytochromes P450, an initial calibration was performed on a small set of


chimeras by monitoring the incorporation of heme cofactors through CO-difference spectra [59]. The results from this study suggested that three distantly related P450s (60% sequence identity) could be recombined to create a library where almost half of the highly substituted chimeras (average mutation level = 76) folded. In fact, a subsequent study where this library was created and screened showed that 47% of the chimeras fold [26]. While the functional diversity in this library has not yet been established, it is expected to have rich functional diversity as observed in a small set of P450 chimeras derived from the same parents [59].

5.5.3 Recombination as a Shortest Path Problem (RASPP) RASPP is used for optimization in libraries where there are no selective restrictions imposed on the inheritance of polypeptide fragments [51], i.e., chimeras have an equal probability of obtaining each of their polypeptide fragments from the different parents (see Figure 5.6). For a given set of parents, RASPP uses dynamic programming to identify crossover locations that minimize the average energy of each library subject to constraints on the length of the fragments exchanged. This approach yields a set of libraries over the range of possible average amino acid substitution levels, which are near optimal compared with all possible libraries that can be enumerated. RASPP can be used with any metric of chimera structural disruption that involves pairwise-decomposable, residue-based energy functions. These include metrics as simple as SCHEMA calculated disruption [55], as well as rotamer-based energies that have been averaged to derive residue-residue interactions [53]. Several parameters must be defined when using RASPP, including: (i) the parents that will be recombined, (ii) the number of crossovers n, and (iii) the energy function that will be used for quantifying structural disruption (herein we use SCHEMA disruption for illustration purposes). Once these constraints have been established, a directed graph is created which represents every feasible library with n crossovers made from parents of length N (see Figure 5.8). In this graph, there are n columns, indexed by k, each representing one crossover in a library. The nodes shown in each column designate the possible recombination sites X within the primary sequence of the parents for each distinct crossover. In a library with n crossovers, the first node Xk that can be visited in any column is equal to the column number k; the last accessible node has a value = N-n + (k-1). 
A single library is represented by selectively connecting the nodes on adjacent columns with arcs, such that the node visited in column Xk represents the residue after which a recombination site occurs (see Figure 5.7). To calculate the disruptive nature of chimeras in a particular library, arc lengths are assigned such that the length of the total trajectory through the graph represents the average calculated disruption <E> of all chimeras within that library. Figure 5.7 shows that the arcs connecting node 0 to a node within the first column, designated A(0,X1), are simply given a length that corresponds to the average calculated disruption <E>(X1) of the chimeras found in a one-crossover library where the first crossover occurs after residue X1. Arcs between subsequent columns are given a length A(Xk-1,Xk) that corresponds to the incremental energy change associated with swapping the polypeptide defined by that arc. Figure 5.8 illustrates how an optimal library is identified by determining the shortest path through a directed graph representing all feasible three-crossover libraries. There is no need to optimize the arcs connecting node 0 to nodes in the first column because there is only one path to each node in column 1, i.e., all A(0,X1) are optimal. Optimal two-crossover libraries are identified by finding the shortest path to each node in the second column. All arcs representing suboptimal paths from node 0 to each relevant node in column 2 are removed. This process is continued to find the length of the shortest path U from node 0 to node j in column k using the shortest paths from node 0 to all nodes in column k-1:

$$u_j^k = \min_{i} \left( u_i^{k-1} + A(i, j) \right) \tag{5.14}$$
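The column-by-column recursion can be sketched as a small dynamic program. The Python code below is an illustration under stated assumptions rather than the published RASPP implementation: `arc_length(i, j)` is a caller-supplied function standing in for A(i, j), fragment length limits are passed as `min_len` and `max_len`, and the per-site costs in the example are invented purely to make the shortest path easy to verify by eye.

```python
def shortest_path_library(n_residues, n_crossovers, arc_length,
                          min_len=1, max_len=None):
    """Return (path length, crossover positions) for the shortest path, or None.

    Node j in column k means that crossover k occurs after residue j.
    Column k may use nodes k .. N - n + (k - 1), as described in the text.
    """
    N, n = n_residues, n_crossovers
    max_len = N if max_len is None else max_len
    best = {0: (0.0, [])}                      # node 0 is the start of the graph
    for k in range(1, n + 1):
        column = {}
        for j in range(k, N - n + k):
            # Keep only the shortest path reaching node j from the previous column.
            candidates = [(u + arc_length(i, j), path + [j])
                          for i, (u, path) in best.items()
                          if min_len <= j - i <= max_len]
            if candidates:
                column[j] = min(candidates, key=lambda c: c[0])
        best = column
    # The final fragment (after the last crossover) must also satisfy the limits.
    finals = [v for j, v in best.items() if min_len <= N - j <= max_len]
    return min(finals, key=lambda c: c[0]) if finals else None


# Hypothetical cost of placing a crossover after each residue of a 10-residue protein.
site_cost = {1: 5, 2: 1, 3: 4, 4: 1, 5: 3, 6: 1, 7: 9, 8: 2, 9: 7}
print(shortest_path_library(10, 3, lambda i, j: site_cost[j]))
```

Because only the best path to each node is kept at every column, the work grows roughly with n times N squared rather than with the number of possible libraries. Repeating the call over different `min_len` and `max_len` values reuses the same arc lengths.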

The <E> of chimeras in a library is intimately linked to the <m> of those variants. For this reason, you cannot use a single directed graph representing all feasible n-crossover libraries to find a near-optimal

[Figure 5.7 graphic: directed graph with start node 0 and columns Col1, Col2, and Col3, each listing residues 1 to 10 as nodes. The example path uses arcs A(0,2), A(2,4), and A(4,5); the bottom panel labels A(0,2) = <E>CO1.]

Figure 5.7 Directed graph representation of a single library. Top panel: a single library is represented by a set of arcs that form a continuous path through the graph. Each column k represents a crossover, and the node Xk within a given column designates the residue after which a crossover occurs. In this example, three arcs visit node 2 in column 1, node 4 in column 2, and node 5 in column 3, designating the four swapped polypeptides as those comprised of residues 1–2, 3–4, 5, and 6–10. Bottom: Arc weights are designated such that the complete path through the graph (representing a particular library) is equal to the average calculated disruption of chimeras defined by that library. In the case of arc A(0,2), the weight is simply the average calculated disruption of chimeras created by allowing this crossover. In the cases of A(2,4) and subsequent arcs, the assigned weight is equal to the incremental change in average disruption contributed by each additional crossover. (Adapted from Endelman, J.B., Silberg, J.J., Wang, Z.G., and Arnold, F.H., Protein Eng. Des. Sel., 17, 589, 2004.)

library with high <m>. To find near-optimal libraries over a range of <m> values, the shortest paths through a set of directed graphs satisfying all feasible constraints on the length L of the swapped polypeptide fragments are calculated. The ranges of possible length constraints are: Lmin = 1 to N/(n + 1) and Lmax = N/(n + 1) to N-nLmin. For the three-crossover example shown in Figure 5.7, there are 11 different sets of possible Lmin and Lmax combinations (Lmin, Lmax = 1,2; 1,3; 1,4; 1,5; 1,6; 1,7; 2,2; 2,3; 2,4; 2,5; 3,3), so the shortest paths through eleven different graphs must be determined. Even for relatively long proteins, optimization problem can be calculated quickly since all arc lengths are initially determined; they do not have to be recalculated when finding the shortest path through directed graphs corresponding to each set of length constraints. Once the set of shortest-path libraries are identified for all combinations of Lmin and Lmax for a given optimization problem, the <E> and <m> is calculated for each RASPP library identified. A “RASPP curve” is then generated that shows the lowest-energy RASPP libraries over the range of all possible <m> values. These libraries represent the optimal tradeoff surface for library design, i.e., the libraries that maximize the fraction of folded and functional variants over a range of <m> values [51]. In previous studies seeking to find optimal libraries with four-crossovers between the β-lactamases PSE-4 and TEM-1, the optimal energy-diversity tradeoff surface identified by RASPP was almost identical to that found by enumerating all four-crossover libraries created from these parents [51].
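The step of turning a set of shortest-path libraries into a RASPP curve can be sketched as follows. This is a hedged illustration only: the function name `raspp_curve`, the dictionary layout, the choice to bin <m> by rounding to the nearest integer, and every numerical value are assumptions made for this example rather than details taken from the RASPP report [51].

```python
from typing import Dict, List


def raspp_curve(libraries: List[dict]) -> List[dict]:
    """Keep the lowest-<E> library found at each (rounded) <m> value."""
    best: Dict[int, dict] = {}
    for lib in libraries:
        m_bin = round(lib["m"])  # assumption: one bin per integer <m>
        if m_bin not in best or lib["E"] < best[m_bin]["E"]:
            best[m_bin] = lib
    # Return the surviving libraries ordered by increasing <m>.
    return [best[b] for b in sorted(best)]


# Hypothetical libraries (crossover sites, <E>, <m>); the values are invented.
candidates = [
    {"sites": (2, 4, 5), "E": 12.0, "m": 20.2},
    {"sites": (2, 5, 7), "E": 9.5, "m": 20.4},
    {"sites": (3, 5, 8), "E": 15.0, "m": 25.6},
    {"sites": (3, 6, 8), "E": 17.0, "m": 25.7},
]
curve = raspp_curve(candidates)
print([lib["sites"] for lib in curve])
```

In this toy input, two libraries fall near <m> = 20 and two near <m> = 26, and only the lower-<E> member of each pair is retained. Any <m> value for which no library was found simply has no entry.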

Figure 5.8 Finding libraries enriched in low-disruption chimeras. Top: length constraints Lmin and Lmax (Lmin = 2 and Lmax = 4 in the example shown) dictate the arcs that are allowed when building a directed graph. Bottom: for a given set of length constraints, there is only one possible path from node 0 to each node in the first column, so all paths are retained. In the case of the paths from node 0 to each node in the second column, there are typically many possible paths. All arcs representing suboptimal paths from node 0 to each relevant node in column 2 are removed. This process is then repeated for all subsequent columns. To perform RASPP, the shortest paths are solved for directed graphs representing all possible combinations of constraints. (Adapted from Endelman, J.B., Silberg, J.J., Wang, Z.G., and Arnold, F.H., Protein Eng. Des. Sel., 17, 589, 2004.)

Figure 5.9 illustrates the libraries identified by RASPP when recombining adenylate kinases from Escherichia coli and Bacillus subtilis. By varying the length constraints for all libraries containing three crossovers, we generated 4,187 libraries using RASPP, 484 of which were unique. RASPP identified libraries enriched in low-disruption chimeras over nearly the full range of possible <m>. A number of gaps do occur in the RASPP curve, however, where no libraries are obtained with a particular <m>. For example, no libraries were identified with <m> = 22–24. Gaps in a RASPP curve are common because RASPP finds optimal libraries subject to length constraints rather than to a target average amino acid substitution level [51], as an ideal optimization algorithm would. One way to overcome gaps in a RASPP curve is to choose alternate sets of parents for recombination.

Figure 5.9 Evidence that RASPP curves approximate the optimal tradeoff surface. The plot shows average calculated disruption (y-axis) against average amino acid substitution level (x-axis). All possible three-crossover libraries were enumerated for the recombination of Bacillus subtilis and Escherichia coli adenylate kinase (N = 111 nonconserved residues), resulting in approximately 1.5 × 10^6 libraries (gray boxes). RASPP-optimized libraries were calculated, yielding 4,187 libraries. Of these, 484 were unique, and this set of libraries exhibited only 62 distinct pairs of values for <E> and <m> (see black boxes). The black line represents the optimal energy-diversity tradeoff surface. (Adapted from Endelman, J.B., Silberg, J.J., Wang, Z.G., and Arnold, F.H., Protein Eng. Des. Sel., 17, 589, 2004.)

5.5.4 Optimal Pattern of Tiling for Combinatorial Libraries (OPTCOMB)

OPTCOMB is an approach for library optimization that uses pairwise-decomposable, residue-based energy functions to identify optimal crossover sites for recombination of structurally related proteins [52]. For a given set of parents, OPTCOMB minimizes the number of disruptive interactions (or clashes) present in a library that meets user-defined constraints on the length and number of exchanged polypeptides. By varying these constraints, OPTCOMB establishes the tradeoff between library size and library quality, i.e., average number of clashes per chimera. In a theoretical study evaluating DHFR recombination, OPTCOMB consistently identified libraries that were enriched in low-clash chimeras compared to what is found on average in randomly generated libraries of similar size [52]. OPTCOMB displayed the most dramatic enrichment of low-clash DHFR chimeras when creating small libraries (<10^4 variants). In these libraries, OPTCOMB consistently identified chimeric libraries with 50% fewer clashes (as defined by FamClash) than found in randomly chosen libraries. However, the extent to which OPTCOMB enriches combinatorial libraries in folded and functional proteins compared to the true optimum remains unclear for libraries with a particular level of sequence diversity.

The initial report of OPTCOMB used the CPLEX solver for optimization [52]. CPLEX is a commercially available software package for mixed-integer linear programming that employs branch-and-bound (and cut) techniques for optimization (see www.ilog.com/products/cplex/). Several parameters must be defined to use OPTCOMB, including: (i) the energy function that will be used to score structural disruption (e.g., FamClash or SCHEMA disruption), (ii) the set of parents that will be recombined, (iii) the minimum number of exchanged polypeptides, and (iv) the minimum and maximum length constraints on the size of the swapped polypeptides. Once these constraints are defined, libraries of a given size that minimize the number of clashes are obtained by running the CPLEX solver for the set of OPTCOMB objective functions and mathematical constraints described [52].

The greatest strength of OPTCOMB is its ability to minimize the average number of clashes present in libraries that are generated with biased polypeptide inheritance (see Figure 5.6). This can be contrasted with RASPP, which can only optimize the design of libraries with unbiased polypeptide inheritance. However, unlike RASPP, OPTCOMB does not identify the optimal tradeoff surface for a given library size as shown in Figure 5.9. In theory, such a tradeoff surface could be obtained for OPTCOMB by identifying optimal libraries for all sets of possible polypeptide length constraints.


5.5.5 Practical Considerations for Library Synthesis

The best in vitro approaches for recombining distantly related proteins, sequence-independent site-directed chimeragenesis (SISDC) and complete chemical synthesis [56,68], create chimeric libraries through the ligation of gene fragments that contain single-stranded overhangs. SISDC is typically preferred over complete chemical synthesis of a library, even though the latter approach is simpler, because the gene fragments recombined during SISDC are derived from plasmids whose sequences can be verified before fragment assembly [68]. Any SISDC building blocks found to contain undesired mutations in the parent plasmids can be removed before the combinatorial library is assembled. This minimizes random mutations, which can significantly reduce the fraction of folded and functional variants in a library [16].

SISDC has one major limitation. As sequence identity among the parents decreases, the choice of recombination sites that do not introduce amino acids absent from both parents becomes increasingly limited. This happens because single-stranded overhangs (3–5 base pair sticky ends) are used at the crossover boundaries to mediate fragment association before ligation. In cases where the parents do not exhibit sufficient identity at the desired crossover locations, synonymous mutations can often be introduced to allow recombination at that site. If synonymous mutations are not sufficient to avoid amino acid substitutions, then alternative libraries with different crossover positions should be chosen.
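The identity requirement at crossover boundaries can be checked with a short script before committing to a library design. The sketch below is an illustration under stated assumptions, not part of the SISDC protocol [68]: it assumes the parental coding sequences are already aligned without gaps, uses a 4-nucleotide overhang (within the 3–5 range mentioned above), and introduces the hypothetical helper `shared_overhang_sites` and made-up sequences.

```python
from typing import List


def shared_overhang_sites(parents: List[str], overhang_len: int = 4) -> List[int]:
    """Return 0-based start positions where all parents share an identical window."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for start in range(length - overhang_len + 1):
        window = parents[0][start:start + overhang_len]
        # A window qualifies only if every other parent matches it exactly.
        if all(p[start:start + overhang_len] == window for p in parents[1:]):
            sites.append(start)
    return sites


# Hypothetical aligned gene fragments from two parents (they differ at two bases).
parent_a = "ATGGCTAAAGGT"
parent_b = "ATGGCAAAAGGC"
print(shared_overhang_sites([parent_a, parent_b]))
```

Positions missing from the returned list are those where a synonymous mutation, or a different crossover position, would be needed. The sketch does not check whether a candidate synonymous change preserves the encoded amino acids.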

5.5.6 Estimating Chimeric Library Diversity

When performing site-directed recombination, libraries can be generated where chimeras have a near-equal probability of inheriting polypeptides from any of the parents [16,56]. In such libraries, it is reasonable to assume that the probability of observing one chimera is independent of the number of occurrences of any other chimera, and the frequency at which each variant occurs can be approximated by a Poisson distribution. For this reason, the same approaches that are used to assess the statistics of libraries created by directed mutagenesis can be applied to those created by site-directed recombination [76]. The number of distinct chimeras C present in a library containing L clones and V possible total variants is $C = V(1 - e^{-L/V})$. The fractional completeness F of such a library is $F = C/V = 1 - e^{-L/V}$. F can be used to calculate how many clones L must be screened to sample a desired percentage of the V possible variants, i.e., $L = -V \ln(1 - F)$. Often when screening well-defined libraries of variants, it is more useful to determine the probability Pc that every chimera is present in the library [76]. In these cases, the number of clones L that must be screened to yield the desired probability Pc is given by

$$L = -V \ln\left( -\frac{\ln P_c}{V} \right) \qquad (5.15)$$
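The library statistics above are easy to evaluate numerically. The following sketch simply codes the expressions for C, F, and L, including Equation 5.15. The function names and the example library size (V = 256 possible chimeras) are assumptions chosen for illustration.

```python
import math


def expected_distinct(n_clones: float, n_variants: float) -> float:
    """C = V(1 - exp(-L/V)): expected number of distinct chimeras sampled."""
    return n_variants * (1.0 - math.exp(-n_clones / n_variants))


def fractional_completeness(n_clones: float, n_variants: float) -> float:
    """F = C/V = 1 - exp(-L/V)."""
    return 1.0 - math.exp(-n_clones / n_variants)


def clones_for_completeness(f: float, n_variants: float) -> float:
    """L = -V ln(1 - F): clones needed to sample a fraction F of the variants."""
    return -n_variants * math.log(1.0 - f)


def clones_for_full_coverage(p_c: float, n_variants: float) -> float:
    """Equation 5.15: clones needed so every variant is present with probability Pc."""
    return -n_variants * math.log(-math.log(p_c) / n_variants)


V = 256  # hypothetical library size
print(round(clones_for_completeness(0.95, V)))    # clones to sample 95% of variants
print(round(clones_for_full_coverage(0.95, V)))   # clones for 95% chance of all variants
```

For this hypothetical library, sampling 95% of the variants takes about 767 clones, whereas a 95% probability of sampling every variant takes about 2,180 clones. Demanding full coverage is therefore roughly three times more expensive in this example.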

5.6 Conclusions

There is now extensive evidence that theoretical models can be used to optimize directed evolution. Current algorithms can be easily implemented on desktop computers to accelerate the discovery of proteins with improved functional properties. In most cases, existing models seek to increase the frequency at which folded variants are observed in combinatorial libraries, since folding is typically a prerequisite for acquisition of new enzyme functions. However, these strategies still remain somewhat limited in their information content, i.e., their ability to consistently predict which protein variants are functional and which ones are not. New models that use a combination of sequence, structure, and biophysical knowledge will most certainly extend the fitness steps that can be achieved in a single round of screening or selection.


References

1. Allert, M., Rizk, S.S., Looger, L.L., and Hellinga, H.W. Computational design of receptors for an organophosphate surrogate of the nerve agent soman. Proc. Natl. Acad. Sci. USA, 101, 7907, 2004.
2. Dwyer, M.A., Looger, L.L., and Hellinga, H.W. Computational design of a biologically active enzyme. Science, 304, 1967, 2004.
3. Korkegian, A., Black, M.E., Baker, D., and Stoddard, B.L. Computational thermostabilization of an enzyme. Science, 308, 857, 2005.
4. Kuhlman, B., Dantas, G., Ireton, G.C., Varani, G., Stoddard, B.L., and Baker, D. Design of a novel globular protein fold with atomic-level accuracy. Science, 302, 1364, 2003.
5. Park, H.S., Nam, S.H., Lee, J.K., Yoon, C.N., Mannervik, B., Benkovic, S.J., and Kim, H.S. Design and evolution of new catalytic activity with an existing protein scaffold. Science, 311, 535, 2006.
6. Shifman, J.M. and Mayo, S.L. Exploring the origins of binding specificity through the computational redesign of calmodulin. Proc. Natl. Acad. Sci. USA, 100, 13274, 2003.
7. Arnold, F.H. Advances in Protein Chemistry, Vol. 55, 55 Edition. San Diego: Academic Press, 2001.
8. Takahashi, T.T., Austin, R.J., and Roberts, R.W. mRNA display: ligand discovery, interaction analysis and beyond. Trends Biochem. Sci., 28, 159, 2003.
9. Netzer, W.J. and Hartl, F.U. Recombination of protein domains facilitated by co-translational folding in eukaryotes. Nature, 388, 343, 1997.
10. Keefe, A.D. and Szostak, J.W. Functional proteins from a random-sequence library. Nature, 410, 715, 2001.
11. Axe, D.D. Estimating the prevalence of protein sequences adopting functional enzyme folds. J. Mol. Biol., 341, 1295, 2004.
12. Bloom, J.D., Silberg, J.J., Wilke, C.O., Drummond, D.A., Adami, C., and Arnold, F.H. Thermodynamic prediction of protein neutrality. Proc. Natl. Acad. Sci. USA, 102, 606, 2005.
13. Guo, H.H., Choe, J., and Loeb, L.A. Protein tolerance to random amino acid change. Proc. Natl. Acad. Sci. USA, 101, 9205, 2004.
14. Shafikhani, S., Siegel, R.A., Ferrari, E., and Schellenberger, V. Generation of large libraries of random mutants in Bacillus subtilis by PCR-based plasmid multimerization. Biotechniques, 23, 304, 1997.
15. Crameri, A., Raillard, S.A., Bermudez, E., and Stemmer, W.P. DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature, 391, 288, 1998.
16. Drummond, D.A., Silberg, J.J., Meyer, M.M., Wilke, C.O., and Arnold, F.H. On the conservative nature of intragenic recombination. Proc. Natl. Acad. Sci. USA, 102, 5380, 2005.
17. Bloom, J.D., Meyer, M.M., Meinhold, P., Otey, C.R., MacMillan, D., and Arnold, F.H. Evolving strategies for enzyme engineering. Curr. Opin. Struct. Biol., 15, 447, 2005.
18. Glieder, A., Farinas, E.T., and Arnold, F.H. Laboratory evolution of a soluble, self-sufficient, highly active alkane hydroxylase. Nat. Biotechnol., 20, 1135, 2002.
19. Drummond, D.A., Iverson, B.L., Georgiou, G., and Arnold, F.H. Why high-error-rate random mutagenesis libraries are enriched in functional and improved proteins. J. Mol. Biol., 350, 806, 2005.
20. Bava, K.A., Gromiha, M.M., Uedaira, H., Kitajima, K., and Sarai, A. ProTherm, version 4.0: thermodynamic database for proteins and mutants. Nucleic Acids Res., 32, D120, 2004.
21. Fersht, A.R. Structure and Mechanism in Protein Science. New York: Freeman, 1999.
22. Serrano, L., Day, A.G., and Fersht, A.R. Step-wise mutation of barnase to binase. A procedure for engineering increased stability of proteins and an experimental analysis of the evolution of protein stability. J. Mol. Biol., 233, 305, 1993.
23. Wells, J.A. Additivity of mutational effects in proteins. Biochemistry, 29, 8509, 1990.
24. Zhang, X.J., Baase, W.A., Shoichet, B.K., Wilson, K.P., and Matthews, B.W. Enhancement of protein stability by the combination of point mutations in T4 lysozyme is additive. Protein Eng., 8, 1017, 1995.


25. Bloom, J.D., Labthavikul, S.T., Otey, C.R., and Arnold, F.H. Protein stability promotes evolvability. Proc. Natl. Acad. Sci. USA, 103, 5869, 2006.
26. Otey, C.R., Landwehr, M., Endelman, J.B., Hiraga, K., Bloom, J.D., and Arnold, F.H. Structure-guided recombination creates an artificial family of cytochromes P450. PLoS Biol., 4, e112, 2006.
27. Guerois, R., Nielsen, J.E., and Serrano, L. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J. Mol. Biol., 320, 369, 2002.
28. Gilis, D. and Rooman, M. PoPMuSiC, an algorithm for predicting protein mutant stability changes: application to prion proteins. Protein Eng., 13, 849, 2000.
29. Van Kampen, N.G. Stochastic Processes in Physics and Chemistry. Amsterdam: Elsevier, 1992.
30. Cadwell, R.C. and Joyce, G.F. Mutagenic PCR. PCR Methods Appl., 3, S136, 1994.
31. Fromant, M., Blanquet, S., and Plateau, P. Direct random mutagenesis of gene-sized DNA fragments using polymerase chain reaction. Anal. Biochem., 224, 347, 1995.
32. Vanhercke, T., Ampe, C., Tirry, L., and Denolf, P. Reducing mutational bias in random protein libraries. Anal. Biochem., 339, 9, 2005.
33. Daugherty, P.S., Chen, G., Iverson, B.L., and Georgiou, G. Quantitative analysis of the effect of the mutation frequency on the affinity maturation of single chain Fv antibodies. Proc. Natl. Acad. Sci. USA, 97, 2029, 2000.
34. Arnold, F.H. Fancy footwork in the sequence space shuffle. Nat. Biotechnol., 24, 328, 2006.
35. Amin, N., Liu, A.D., Ramer, S., Aehle, W., Meijer, D., Metin, M., Wong, S., Gualfetti, P., and Schellenberger, V. Construction of stabilized proteins by combinatorial consensus mutagenesis. Protein Eng. Des. Sel., 17, 787, 2004.
36. Lehmann, M., Kostrewa, D., Wyss, M., Brugger, R., D’Arcy, A., Pasamontes, L., and van Loon, A.P. From DNA sequence to improved functionality: using protein sequence comparisons to rapidly design a thermostable consensus phytase. Protein Eng., 13, 49, 2000.
37. Lehmann, M., Loch, C., Middendorf, A., Studer, D., Lassen, S.F., Pasamontes, L., van Loon, A.P., and Wyss, M. The consensus concept for thermostability engineering of proteins: further proof of concept. Protein Eng., 15, 403, 2002.
38. Roberge, M., Estabrook, M., Basler, J., Chin, R., Gualfetti, P., Liu, A., Wong, S.B., Rashid, M.H., Graycar, T., Babe, L., and Schellenberger, V. Construction and optimization of a CC49-based scFv-β-lactamase fusion protein for ADEPT. Protein Eng. Des. Sel., 19, 141, 2006.
39. Russ, W.P., Lowery, D.M., Mishra, P., Yaffe, M.B., and Ranganathan, R. Natural-like function in artificial WW domains. Nature, 437, 579, 2005.
40. Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson, T.J., Higgins, D.G., and Thompson, J.D. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res., 31, 3497, 2003.
41. Hogrefe, H.H., Cline, J., Youngblood, G.L., and Allen, R.M. Creating randomized amino acid libraries with the QuikChange Multi Site-Directed Mutagenesis Kit. Biotechniques, 33, 1158, 2002.
42. Wang, L., Brock, A., Herberich, B., and Schultz, P.G. Expanding the genetic code of Escherichia coli. Science, 292, 498, 2001.
43. Kubo, T., Peters, M.W., Meinhold, P., and Arnold, F.H. Enantioselective epoxidation of terminal alkenes to (R)- and (S)-epoxides by engineered cytochromes P450 BM-3. Chemistry, 12, 1216, 2006.
44. Williams, G.J., Woodhall, T., Nelson, A., and Berry, A. Structure-guided saturation mutagenesis of N-acetylneuraminic acid lyase for the synthesis of sialic acid mimetics. Protein Eng. Des. Sel., 18, 239, 2005.
45. Parikh, M.R. and Matsumura, I. Site-saturation mutagenesis is more efficient than DNA shuffling for the directed evolution of beta-fucosidase from beta-galactosidase. J. Mol. Biol., 352, 621, 2005.
46. Patrick, W.M. and Firth, A.E. Strategies and computational tools for improving randomized protein libraries. Biomol. Eng., 22, 105, 2005.
47. Bosley, A.D. and Ostermeier, M. Mathematical expressions useful in the construction, description and evaluation of protein libraries. Biomol. Eng., 22, 57, 2005.


48. Flores, H. and Ellington, A.D. A modified consensus approach to mutagenesis inverts the cofactor specificity of Bacillus stearothermophilus lactate dehydrogenase. Protein Eng. Des. Sel., 18, 369, 2005.
49. Rost, B., Yachdav, G., and Liu, J. The PredictProtein server. Nucleic Acids Res., 32, W321, 2004.
50. Sander, C. and Schneider, R. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins, 9, 56, 1991.
51. Endelman, J.B., Silberg, J.J., Wang, Z.G., and Arnold, F.H. Site-directed protein recombination as a shortest-path problem. Protein Eng. Des. Sel., 17, 589, 2004.
52. Saraf, M.C., Gupta, A., and Maranas, C.D. Design of combinatorial protein libraries of optimal size. Proteins, 60, 769, 2005.
53. Gordon, D.B., Marshall, S.A., and Mayo, S.L. Energy functions for protein design. Curr. Opin. Struct. Biol., 9, 509, 1999.
54. Moore, G.L. and Maranas, C.D. Identifying residue-residue clashes in protein hybrids by using a second-order mean-field approach. Proc. Natl. Acad. Sci. USA, 100, 5091, 2003.
55. Voigt, C.A., Martinez, C., Wang, Z.G., Mayo, S.L., and Arnold, F.H. Protein building blocks preserved by recombination. Nat. Struct. Biol., 9, 553, 2002.
56. Meyer, M.M., Silberg, J.J., Voigt, C.A., Endelman, J.B., Mayo, S.L., Wang, Z.G., and Arnold, F.H. Library analysis of SCHEMA-guided protein recombination. Protein Sci., 12, 1686, 2003.
57. Saraf, M.C. and Maranas, C.D. Using a residue clash map to functionally characterize protein recombination hybrids. Protein Eng., 16, 1025, 2003.
58. Saraf, M.C., Horswill, A.R., Benkovic, S.J., and Maranas, C.D. FamClash: a method for ranking the activity of engineered enzymes. Proc. Natl. Acad. Sci. USA, 101, 4142, 2004.
59. Otey, C.R., Silberg, J.J., Voigt, C.A., Endelman, J.B., Bandara, G., and Arnold, F.H. Functional evolution and structural conservation in chimeric cytochromes p450: calibrating a structure-guided approach. Chem. Biol., 11, 309, 2004.
60. Shindyalov, I.N. and Bourne, P.E. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng., 11, 739, 1998.
61. Guda, C., Lu, S., Scheeff, E.D., Bourne, P.E., and Shindyalov, I.N. CE-MC: a multiple protein structure alignment server. Nucleic Acids Res., 32, W100, 2004.
62. Guex, N. and Peitsch, M.C. SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis, 18, 2714, 1997.
63. Schwede, T., Kopp, J., Guex, N., and Peitsch, M.C. SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res., 31, 3381, 2003.
64. Vriend, G. WHAT IF: a molecular modeling and drug design program. J. Mol. Graph., 8, 52, 1990.
65. Lockless, S.W. and Ranganathan, R. Evolutionarily conserved pathways of energetic connectivity in protein families. Science, 286, 295, 1999.
66. Socolich, M., Lockless, S.W., Russ, W.P., Lee, H., Gardner, K.H., and Ranganathan, R. Evolutionary information for specifying a protein fold. Nature, 437, 512, 2005.
67. Shulman, A.I., Larson, C., Mangelsdorf, D.J., and Ranganathan, R. Structural determinants of allosteric ligand activation in RXR heterodimers. Cell, 116, 417, 2004.
68. Hiraga, K. and Arnold, F.H. General method for sequence-independent site-directed chimeragenesis. J. Mol. Biol., 330, 287, 2003.
69. Zhao, H., Giver, L., Shao, Z., Affholter, J.A., and Arnold, F.H. Molecular evolution by staggered extension process (StEP) in vitro recombination. Nat. Biotechnol., 16, 258, 1998.
70. Shao, Z., Zhao, H., Giver, L., and Arnold, F.H. Random-priming in vitro recombination: an effective tool for directed evolution. Nucleic Acids Res., 26, 681, 1998.
71. Lutz, S., Ostermeier, M., Moore, G.L., Maranas, C.D., and Benkovic, S.J. Creating multiple-crossover DNA libraries independent of sequence identity. Proc. Natl. Acad. Sci. USA, 98, 11248, 2001.
72. Ostermeier, M., Shim, J.H., and Benkovic, S.J. A combinatorial approach to hybrid enzymes independent of DNA homology. Nat. Biotechnol., 17, 1205, 1999.


73. Sieber, V., Martinez, C.A., and Arnold, F.H. Libraries of hybrid proteins from distantly related sequences. Nat. Biotechnol., 19, 456, 2001.
74. Waldo, G.S., Standish, B.M., Berendzen, J., and Terwilliger, T.C. Rapid protein-folding assay using green fluorescent protein. Nat. Biotechnol., 17, 691, 1999.
75. Philipps, B., Hennecke, J., and Glockshuber, R. FRET-based in vivo screening for protein folding and increased protein stability. J. Mol. Biol., 327, 239, 2003.
76. Patrick, W.M., Firth, A.E., and Blackburn, J.M. User-friendly algorithms for estimating completeness and diversity in randomized protein-encoding libraries. Protein Eng., 16, 451, 2003.

II Gene Expression Tools for Metabolic Pathway Engineering

Christina D. Smolke, California Institute of Technology

6 Low-Copy Number Plasmids as Artificial Chromosomes
Kristala L. Jones Prather
Introduction • Development of Low- and Multicopy Plasmid Expression Vectors • The Metabolic Burden Effect, Plasmid Instability, and the Impact of Both on Engineered Metabolic Systems • Applications of Low-Copy Plasmids to Metabolic Engineering • Opportunities for Artificial Chromosome Use in Metabolic Engineering • Summary

7 Chromosomal Engineering Strategies
Kenan C. Murphy
Introduction • Strategies of λ Red-Promoted and SSR-Mediated Modification of the Bacterial Chromosome • Large Insertions • Applications of Red/ET Recombineering Technology for Metabolic Engineering • Summary

8 Regulating Gene Expression through Engineered RNA Technologies
Maung Nyan Win and Christina D. Smolke
Introduction • Basic RNA Regulatory Elements • RNA Sensory Elements • Natural and Engineered Riboswitches • Applications of RNA Control Elements in Metabolic Network Engineering • Enabling Technologies in Support of Constructing Integrated Network and Control Systems • Future Applications of Advanced RNA-Based Control Systems • Conclusions


9 Tools Designed to Regulate Translational Efficiency
Claes Gustafsson
Introduction • Synthetic Genes • Synthetic Genes for Synthetic Biology • Codon Usage in Different Hosts • Improving Expression by Modifying the Host • Improving Expression by Modifying the Gene • Codon Optimization Using the CAI = 1 Algorithm • Codon Optimization by Probability Score • DNA Sequence Features to Eliminate, Add, or Modify during Design • Incorporating Tools for Translational Control in Metabolic Engineering

10 Metabolic Engineering of the Secretory Processing Pathway in Eukaryotes
Mohak Mhatre, Maira P. Pellegrini, and Michael J. Betenbaugh
Introduction • Chaperones and Folding Catalysts • Glycoengineering • γ-Carboxylation • Furin Cleavage • Conclusion

11 Engineering Multifunctional Enzyme Systems for Optimized Metabolite Transfer between Sequential Conversion Steps
Robert J. Conrado, Thomas J. Mansell, and Matthew P. DeLisa
Introduction • Enzyme-to-Enzyme Channeling • Metabolic Channeling in Primary Metabolism • Metabolic Channeling in Secondary Metabolism • Advantages Conferred from Multifunctional Enzyme Systems • Engineering Multifunctional Enzymes • Engineering Metabolic Channels de Novo • Concluding Remarks

12 Practical Pathway Engineering—Demonstration in Integrating Tools
Sung Kuk Lee and Jay D. Keasling
Introduction • Gene Discovery • Protein Engineering • Metabolic Pathway Regulation • Pathway Optimization Using Functional Genomics • Improvement of Cellular Properties • Perspective

Metabolic engineering is the construction, redirection, and manipulation of cellular metabolism through the alteration of enzyme activities and levels to achieve the biosynthesis or biocatalysis of desired compounds [1]. The optimization of production or consumption of a target metabolite often requires precisely controlled expression of several natural and/or heterologous pathway enzymes such that individual conversion steps in the pathway do not limit the desired product yield (based on different Km and kcat values), and that cellular resources and energy are not being inefficiently utilized by the cell host. Therefore, genetically encoded tools that enable the control of enzyme expression levels and activities are required for the optimization of pathway fluxes.

Metabolic engineering is a larger application area within biological engineering, in which cells are used to process chemicals, materials, and energy. The strategies utilized in accomplishing these tasks are based on mutagenesis and screening techniques, genetic engineering strategies, and recombinant DNA techniques.

The field of metabolic engineering is being transformed by the emerging field of synthetic biology. Synthetic biology is the design, construction, and characterization of biological systems using engineering design principles [2,3]. To support a framework for engineering biology, synthetic biology is rooted in foundational technologies that enable the construction of more complex, heterologous networks in living systems. With advances in DNA sequencing and synthesis, it is becoming common practice to synthesize entire genes and pathways from scratch, no longer limiting researchers to physical DNA they obtain from organisms. In addition, abstraction frameworks have been proposed to enable rapid assembly and reassembly of basic biological components (or parts) into larger networks (or devices) and systems, supporting the reliable construction of complex metabolic pathways in cellular hosts (or chassis) [3]. There is also discussion in the field around the engineering of specific chassis, or cellular hosts, optimized for metabolic engineering applications [4,5]. Finally, enabling genetically encoded technologies are being developed for use in precise and quantitative manipulation of biological components such as enzymes. An example of a synthetic biology approach to the construction of a metabolic pathway in Escherichia coli was recently described [6].


Section II provides an overview of some of the more important molecular tools for regulating enzyme expression levels and activities in different cellular hosts in metabolic engineering applications. The chapter topics have been selected to cover different mechanisms of regulation, with a particular emphasis on tools that act at different points along the gene expression pathway. It should be noted that the successful application of these strategies to a target pathway will depend on the particulars of that pathway, such that no one strategy will apply to every pathway, and some level of knowledge about the pathway enzymes and their activities in the host cell is needed in order to design a successful regulation strategy. In addition, these tools are not meant to be used in isolation from one another, and the most effective strategies will be those that integrate different tools effectively to optimize regulation of enzyme levels and activities across different steps in the gene expression pathway [7,8].

In addition, this section does not cover all molecular tools that are important to optimizing flux through metabolic pathways in engineered hosts. For example, tools that allow transcriptional control, i.e., engineered promoter-based systems, are conspicuously absent from this section. The vast majority of metabolic engineering efforts to date have focused pathway design strategies at the level of transcriptional regulation. In addition, researchers have developed libraries of engineered promoter systems for fine-tuning expression levels in different cellular hosts [9–11]. Due to the prevalence of this expression tool, there are many good reviews on promoter-based systems in the literature [12,13], and a chapter on tools that regulate at the level of transcription is left out of this edition of the handbook.

Chapters 6 and 7 look at tools that regulate DNA copy number. DNA copy number becomes an important consideration in minimizing the metabolic burden placed on the host cell. Overexpression of enzymes can sometimes kill or significantly reduce the growth rate of an engineered host cell by pulling too much of the precursor or energy resources from other pathways required for cell growth. In addition, DNA copy number can also be an important consideration for the stability of the synthetic genetic constructs in the host cell. Two different tools are discussed that allow for lower copies of genetic constructs in host cells: low- or single-copy plasmid systems (Chapter 6) and genome engineering systems (Chapter 7). Both of these systems allow for the expression of pathway enzymes in a host cell without the need for selective maintenance pressure, significantly lowering the costs associated with performing large-scale fermentations with the antibiotics that are typically used for plasmid maintenance. The choice of expressing a synthetic pathway from a single-copy plasmid versus from the chromosome of the host organism will likely be determined by the specifics of the pathway and process stage.

Chapters 8 and 9 look at tools that regulate gene expression levels through post-transcriptional and translational mechanisms. Tools based on modulating post-transcriptional and translational processes can provide powerful strategies for precisely regulating enzyme expression levels to avoid overexpression and for balancing flux through pathways to avoid intermediate bottlenecks. Two different tools are discussed that ultimately allow one to tune enzyme expression levels: RNA-based regulatory systems (Chapter 8) and strategies that modify codon usage of the expressed genes (Chapter 9). Engineered RNA-based regulatory systems offer advantages in genetic control based on their tunability, programmability, and ease of implementation, particularly for differential expression of multi-gene control [14,15]. In addition, new advances in RNA engineering are enabling the implementation of metabolite-responsive control systems for optimizing metabolic pathways. The optimization of codon usage is becoming a standard strategy in conjunction with gene synthesis. If the genes encoding the enzymes of interest are taken from organisms that exhibit significantly different codon usage than the host cell, synthesis of that gene with a codon set optimized for that host cell can significantly improve the expression of that enzyme.

Chapters 10 and 11 look at tools that regulate the activity of enzymes through post-translational engineering strategies. Cells use many different types of post-translational mechanisms to modify the activities of proteins and to increase the specificity of and flux through metabolic pathways. Two different tools are discussed that allow one to regulate enzyme activity: strategies for engineering post-translational modifications on proteins (Chapter 10) and strategies for coupling enzymes to form


engineered enzyme complexes (Chapter 11). Cells, and in particular eukaryotic organisms, modify proteins post-translationally with a variety of different functional groups to alter and regulate protein activity. The ability to engineer different post-translational modifications is critical to the functional heterologous expression of certain classes of enzymes. General strategies for building engineered enzyme complexes offer several advantages in metabolic engineering, including methods to reduce the levels of pathway intermediates that may be toxic to the cell and to optimize flux through a metabolic pathway through enzyme channeling and increased effective intermediate concentrations. Both strategies can provide regulatory properties that strategies acting on enzyme levels cannot.

Chapter 12 looks at the integration of different types of tools with different metabolic pathways. The chapter highlights cases in which the implementation of different molecular tools resulted in significant changes to flux or product accumulation in engineered metabolic pathways. The case studies highlight the importance of molecular tools that enable the regulation of enzyme levels and activities, and draw attention to a challenge unique to metabolic engineering: different enzymes in a pathway are often needed at different levels in order to optimize the use of cellular resources and energy.

References

1. Khosla, C. and Keasling, J.D. Metabolic engineering for drug discovery and development. Nat. Rev. Drug Discov., 2, 1019–1025, 2003.
2. Voigt, C.A. Genetic parts to program bacteria. Curr. Opin. Biotechnol., 17, 548–557, 2006.
3. Endy, D. Foundations for engineering biology. Nature, 438, 449–453, 2005.
4. Sharma, S.S., Blattner, F.R., and Harcum, S.W. Recombinant protein production in an Escherichia coli reduced genome strain. Metab. Eng., 9, 133–141, 2007.
5. Posfai, G. et al. Emergent properties of reduced-genome Escherichia coli. Science, 312, 1044–1046, 2006.
6. http://parts.mit.edu/wiki/index.php/MIT_2006.
7. Smolke, C.D. and Keasling, J.D. Effect of gene location, mRNA secondary structures, and RNase sites on expression of two genes in an engineered operon. Biotechnol. Bioeng., 80, 762–776, 2002.
8. Smolke, C.D. and Keasling, J.D. Effect of copy number and mRNA processing and stabilization on transcript and protein levels from an engineered dual-gene operon. Biotechnol. Bioeng., 78, 412–424, 2002.
9. Nevoigt, E. et al. Engineering of promoter replacement cassettes for fine-tuning of gene expression in Saccharomyces cerevisiae. Appl. Environ. Microbiol., 72, 5266–5273, 2006.
10. Jensen, P.R. and Hammer, K. Artificial promoters for metabolic optimization. Biotechnol. Bioeng., 58, 191–195, 1998.
11. Jensen, P.R. and Hammer, K. The sequence of spacers between the consensus sequences modulates the strength of prokaryotic promoters. Appl. Environ. Microbiol., 64, 82–87, 1998.
12. Tyo, K.E., Alper, H.S., and Stephanopoulos, G.N. Expanding the metabolic engineering toolbox: more options to engineer cells. Trends Biotechnol., 25, 132–137, 2007.
13. Hammer, K., Mijakovic, I., and Jensen, P.R. Synthetic promoter libraries – tuning of gene expression. Trends Biotechnol., 24, 53–55, 2006.
14. Smolke, C.D., Martin, V.J., and Keasling, J.D. Controlling the metabolic flux through the carotenoid pathway using directed mRNA processing and stabilization. Metab. Eng., 3, 313–321, 2001.
15. Pfleger, B.F., Pitera, D.J., Smolke, C.D., and Keasling, J.D. Combinatorial engineering of intergenic regions in operons tunes expression of multiple genes. Nat. Biotechnol., 24, 1027–1032, 2006.

6 Low-Copy Number Plasmids as Artificial Chromosomes

Kristala L. Jones Prather
Massachusetts Institute of Technology

6.1 Introduction
6.2 Development of Low- and Multicopy Plasmid Expression Vectors
6.3 The Metabolic Burden Effect, Plasmid Instability, and the Impact of Both on Engineered Metabolic Systems
6.4 Applications of Low-Copy Plasmids to Metabolic Engineering
    Motivation for Reducing the Expression Level • Case Studies of Low-Copy Plasmids in Metabolic Engineering
6.5 Opportunities for Artificial Chromosome Use in Metabolic Engineering
    Minimal Perturbation of Host Metabolism: An Opportunity for Novel Metabolic Control Systems • Maintenance of Large DNA Fragments: Pathway Engineering and Discovery
6.6 Summary
References

6.1 Introduction

In the early 1970s, Stanley Cohen et al. reported the production of a new, autonomously replicating DNA molecule that had been constructed in vitro.1 It was produced by digesting separate tetracycline-resistant (pSC101) and kanamycin-resistant (pSC102) DNA fragments with the restriction endonuclease EcoRI and then covalently linking the two fragments with DNA ligase. Transformation of competent Escherichia coli with the resulting ligation mixture yielded clones with resistance to both tetracycline and kanamycin localized on a single contiguous DNA molecule (pSC105). This was the first published report of recombinant DNA technology.

Since that time, recombinant DNA molecules have been used to produce a variety of both small molecules and proteins in host organisms ranging from prokaryotic microbes to animal cells. Such productivity rests on the promiscuous nature of molecular biology, that is, its lack of fidelity to the genome, and on the demonstration by Cohen and colleagues that these recombined genes could come from a variety of sources and be readily transcribed in a bacterial host.1–3 Thus, the biotechnology revolution that began with these first experiments was predicated on the ability to introduce and maintain heterologous DNA in a host organism. The carriers of foreign DNA, or vectors, may be thought of as “artificial chromosomes,” enabling the in vivo expression of genes that are maintained separately from the genome.


Cohen’s vector of choice for many of these initial experiments, and for several experiments by other researchers that followed, was the R factor-derived plasmid pSC101, a 9.2 kb plasmid with a single EcoRI site for gene insertion.4,5 Vector pSC101 is a low-copy plasmid, existing at ~five copies per genome in a bacterial host.6 As the field of biotechnology advanced, plasmids remained the vectors of choice for recombinant gene expression. However, the low copy numbers of pSC101 and other naturally occurring replicons, such as the single-copy F factor of E. coli, proved to be suboptimal for many applications. As a result, the most widely available and widely used vectors to date continue to be those derived from plasmids with much higher copy numbers. Yet an examination of the impact of high-copy number plasmids on microbial cell physiology, and a review of several examples in the literature, suggest that the use of such vectors can work against the objectives of metabolic engineering. Indeed, low-copy number plasmids may prove to be an advantage for this field.

This chapter (i) provides an overview of the evolution of several representative plasmids that have found use in metabolic engineering; (ii) discusses the “metabolic burden effect” observed with multicopy plasmids and its significance for metabolic engineering; (iii) reviews examples of the use of low-copy plasmids for metabolite production in engineered systems; and (iv) presents opportunities for the unique application of low-copy plasmids as artificial chromosomes. For the purposes of this discussion, the low-copy plasmids include both the pSC101-type R-derived plasmids that exist at fewer than ten copies per cell, and the F-factor-derived plasmids that exist at one to four copies per cell.

6.2 Development of Low- and Multicopy Plasmid Expression Vectors

The most widespread and consistent use of expression vectors has been for the purpose of producing recombinant proteins. (The other significant use is for the replication of DNA fragments for sequencing.) Intuitively, a reasonable strategy for improving the yield of a recombinant protein is to increase the dosage of the recombinant gene by using plasmids with higher copy number. However, most naturally occurring replicons exist at low to moderate copy numbers.7,8 The highest of these are the ColE1-type plasmids, which naturally exist at growth-rate-dependent copy numbers of ~40–55 molecules per cell.9 This represents a significant increase in gene dosage over the pSC101-derived and similar plasmids. Yet the copy number of these plasmids can be increased even further. The ColE1-type plasmids (ColE1, pMB1) require only host-encoded proteins for in vivo maintenance and can replicate in the absence of active protein synthesis.10–12 The plasmid-encoded elements include two RNA transcripts, RNAI and RNAII, and a protein called Rop or Rom (for repressor of primer and RNA one modulator, respectively). Exclusion of the Rop/Rom protein from the plasmid results in an elevated copy number, yielding ~150 plasmid molecules per cell.13–15 Among these derivatives are the well-known and often-utilized pUC18 and pUC19 plasmids.16 However, the pUC plasmids were determined to have an additional modification relative to other Rop/Rom-deficient vectors, a G→A point mutation that maps to the RNAII sequence.17 This mutation results in a further amplification of the plasmid copy number, to ~500–700 copies per cell.18

As new vectors were being created with higher plasmid copy number, modifications to the basic genetic sequence were also being made to facilitate the cloning and expression of recombinant genes.
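The copy numbers quoted above can be turned into rough gene-dosage ratios. The following is a minimal sketch rather than a measurement: it uses only the approximate figures given in this chapter, treats "copies per genome" for pSC101 as comparable to "copies per cell" (an assumption), and the dictionary and function names are hypothetical.

```python
# Approximate copy numbers as quoted in this chapter, stored as (low, high).
# Assumption: the pSC101 figure (~5 per genome) is treated as copies per cell.
COPY_NUMBER = {
    "pSC101": (5, 5),
    "ColE1-type (Rop/Rom+)": (40, 55),
    "ColE1-type (Rop/Rom-)": (150, 150),
    "pUC18/pUC19": (500, 700),
}


def fold_dosage(plasmid, reference="pSC101"):
    """Return the (lowest, highest) fold increase in gene dosage over the reference."""
    low, high = COPY_NUMBER[plasmid]
    ref_low, ref_high = COPY_NUMBER[reference]
    # Most conservative and most generous ratios allowed by the quoted ranges.
    return low / ref_high, high / ref_low


if __name__ == "__main__":
    for name in COPY_NUMBER:
        lowest, highest = fold_dosage(name)
        print(f"{name}: {lowest:.0f}x to {highest:.0f}x the gene dosage of pSC101")
```

Gene dosage is only the starting point: as discussed later in the chapter, protein level and product formation do not scale linearly with copy number.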
A few key characteristics that are consistently included in new cloning vectors are
• Constitutive or inducible promoter to eliminate the need for promoters to be carried with the recombinant open reading frame, and to enable high-level (and possibly controlled) expression in the bacterial host;
• Multiple cloning site to provide multiple sites for insertion of the cloned gene behind the promoter; and
• Selectable marker for identification of cells that have been successfully transformed by the vector (with or without an insertion), which is typically an antibiotic resistance gene.


These features provide both greater consistency and flexibility in introducing a cloned gene into a vector and even moving the gene to other vectors. Many vectors include an additional feature to facilitate screening of clones for the presence of an inserted fragment. Typically, this is achieved by locating the multiple cloning site within the open reading frame of the lacZα gene fragment, encoding the alpha peptide of the enzyme β-galactosidase. Screening of the clones in the presence of IPTG, an inducer of the lac and related promoters, and X-gal, a chromogenic substrate for β-galactosidase, enables simple identification of clones carrying an inserted fragment based on visual inspection of the cells. Those without an insert will produce functional β-galactosidase, cleave the X-gal substrate, and become blue. Successful insertion will cause disruption of the lacZα fragment and prevent expression of the β-galactosidase enzyme. Such clones will remain white.

Low-copy plasmids with some or all of these same characteristics have also been developed. Several cloning vectors have been developed from the pSC101-type replicons, including pLG338/pLG339,19 pHSG575/pHSG576,20 pCL1920/pCL1921,21 pWSK29/pWSK30 and pWSK129/pWSK130.22 pLG338 and pLG339 do not contain a multiple cloning site, but differ in the number of unique sites available for insertion of a cloned gene. Each pair of the other vectors represents similar constructs with inverted orientations of the recognition sequences in the multiple cloning site. All vectors in this latter group have the multiple cloning site located within the lacZα gene fragment to enable blue/white screening, and all allow expression of the plasmid-encoded genes from a lac promoter. They differ primarily in the antibiotic resistance gene (chloramphenicol, spectinomycin/streptomycin, ampicillin, and kanamycin, respectively). A representative pSC101-based low-copy plasmid is shown in Figure 6.1a.
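The blue/white screening logic described above reduces to a small decision rule. The sketch below is a deliberate simplification with a hypothetical function name: it assumes a host strain in which an intact lacZα fragment yields active β-galactosidase, and it ignores leaky expression in the absence of IPTG.

```python
def colony_color(has_insert, iptg=True, xgal=True):
    """Predict colony color in a lacZ-alpha insertional screen (simplified)."""
    if not xgal:
        # No chromogenic substrate on the plate, so no color can develop.
        return "white"
    if not iptg:
        # Simplifying assumption: without inducer the lac promoter is treated as off.
        return "white"
    if has_insert:
        # The insert disrupts lacZ-alpha, so no functional beta-galactosidase is made.
        return "white"
    # Intact lacZ-alpha: X-gal is cleaved and the colony turns blue.
    return "blue"


if __name__ == "__main__":
    for insert in (False, True):
        print(f"insert present: {insert} -> {colony_color(insert)} colony")
```

Because a white colony only reports loss of lacZα function, candidate clones are normally confirmed by an independent check of the insert.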
While in much of the literature, “low-copy plasmids” most often refers to pSC101-derived vectors, there are plasmids that exist at even lower copy numbers. In particular, the mini-F plasmids derived from the f5 fragment of the naturally occurring F factor of E. coli are stably maintained at one to two copies per genome.23 This stability at such a low copy number is achieved by specialized systems for timing replication precisely with the cell cycle24 and for segregating replicated plasmids into daughter cells at division.25 A back-up “kill” function results in the inhibition of cell division in plasmid-free cells that would arise from a failure of the replication or segregation systems.26 The extreme stability of the low-copy plasmids makes them especially attractive for metabolic engineering purposes, as will be discussed in more depth in the next section. Moreover, these features, particularly stable maintenance at copy numbers equal or close to that of the genome, make mini-F plasmids true “artificial chromosomes.”

Figure 6.1 (See color insert following page 13-20.) Representative low-copy plasmids. (a) Plasmid pCL1920 (4549 bp), derived from the R-factor replicon pSC101. Labeled features: repA, Plac, lacZα, Spc/StrR, and the multiple cloning site HindIII, PstI, SalI, XbaI, BamHI, SmaI, KpnI, SacI. (b) Plasmid pKLJ12 (12396 bp), derived from the mini-F replicon pML31 (f5 fragment of the F factor plasmid). Labeled features: araC, PBAD, rrnB terminator, AmpR, the mini-F (f5) fragment, and the cloning sites NheI, EcoRI, SalI, SphI. In both diagrams, the restriction sites that constitute the multiple cloning site are shown. (From Cohen, S. N. and Chang, A. C. Y. Proc. Nat. Acad. Sci. USA 70 (5), 1293–1297, 1973a; Jones, K. L. and Keasling, J. D. Biotechnol. Bioeng. 59, 659–665, 1998. With permission.)
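To see why a plasmid held at one to two copies per genome depends on dedicated replication-timing and partitioning systems, it helps to ask what purely random segregation would do. The sketch below is an illustrative assumption, not a model taken from the chapter: each plasmid copy present at division is assigned to either daughter independently with probability 1/2, and the function name and the example copy numbers are hypothetical.

```python
def plasmid_free_daughter_fraction(copies_at_division):
    """Fraction of daughter cells that receive no plasmid under random segregation.

    Assumes each copy goes to either daughter independently with probability 1/2,
    so a given daughter receives none with probability (1/2) ** n.
    """
    if copies_at_division < 1:
        raise ValueError("the dividing cell must carry at least one plasmid copy")
    return 0.5 ** copies_at_division


if __name__ == "__main__":
    # Hypothetical copy numbers at the moment of division.
    for n in (2, 4, 10, 40):
        fraction = plasmid_free_daughter_fraction(n)
        print(f"{n:>3} copies at division: {fraction:.2e} of daughters plasmid-free")
```

Under this assumption the loss rate falls off geometrically with copy number, which is why random segregation can be tolerable for multicopy plasmids but not for a replicon maintained near single copy.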


Fewer cloning vectors derived from the mini-F plasmids are available than those from either pUC or pSC101. Two vectors that maintain each of the stability/maintenance functions described above have been constructed.27,28 Both contain a multiple cloning site (though limited in scope) behind an inducible promoter and the ampicillin resistance gene, bla. These two vectors differ in the choice of promoter, with one carrying the IPTG-inducible tac promoter (pKLJ03) and the other carrying the arabinose-inducible araBAD promoter (pKLJ12). A representative mini-F plasmid is shown in Figure 6.1b.

6.3 The Metabolic Burden Effect, Plasmid Instability, and the Impact of Both on Engineered Metabolic Systems

Plasmids with such high copy numbers as the ColE1-type pUC vectors do indeed typically result in a higher level of recombinant protein expression relative to their low-copy counterparts. A cursory review of the offerings of the major suppliers of biological research materials will reveal that most of the commercially available expression vectors are directly or indirectly derived from the high-copy pUC plasmids. As a consequence, these vectors are often the first chosen for applications in metabolic engineering. However, the use of such vectors carries some disadvantages. Principal among these are metabolic burden and plasmid instability.

Introduction of plasmid DNA into a cell affects metabolism due to the need both to replicate plasmid DNA and to express plasmid-encoded proteins. Because these tasks utilize cellular machinery (DNA and RNA polymerases, ribosomes) and cellular building blocks (nucleotides, amino acids), the availability of the machinery and building blocks for normal cellular metabolism is altered. This effect has been termed “metabolic load” or “metabolic burden” and has been studied extensively, particularly with regard to limitations on recombinant protein production.28–35 Key findings related to the metabolic burden effect in E. coli include the following: (i) mRNA synthesis rates of a recombinant gene increase with increasing copy number, but the maximum level of recombinant protein is observed at intermediate plasmid copy numbers36,37; (ii) translation, and not transcription, is the bottleneck for recombinant protein synthesis with high plasmid copy numbers37,38; (iii) the total protein content of plasmid-bearing cells is lower than that of plasmid-free cells37; and (iv) high-level expression of recombinant proteins induces the heat-shock response.34,39

At the most basic level, plasmid-mediated metabolic burden can be thought of as any perturbation in the normal metabolism of the host cell as a result of plasmid presence. However, metabolic burden is most often quantified at a macroscopic level by measuring the growth rate. Several studies have shown that for plasmids with relaxed replication, including the ColE1-type vectors, plasmid copy number and growth rate are inversely correlated.9,29,40–42 It has also been shown that although plasmids at moderate copy numbers do cause a shift in the relative abundance of many proteins in the cell, a significant impact on growth rate is not observed until the copy number exceeds ~60 copies/cell.34 In other words, the cell is able to compensate for the extra resources allocated toward replication of plasmid DNA and expression of plasmid-encoded genes up to this point. In this report, the only plasmid-encoded gene was a constitutively expressed bla, encoding ampicillin resistance. The effect of plasmid presence on growth rate is much more pronounced when high-level expression of a recombinant protein is achieved. In this case, a reduction in growth rate is often seen following the induction of a strong promoter.
In one example, a reduction in growth rate of ~25% was observed when lacZ carried on a high-copy plasmid (~100 copies/cell) was expressed from an inducible promoter.27 When a low copy, mini-F plasmid was used instead, the reduction in growth rate was less than 10%. Since the culture growth rate affects the overall process productivity, minimizing this perturbation is often the goal for processes that involve recombinant protein production. In the case of metabolic engineering, however, one must be concerned with perturbations to metabolite flow, since most often, the precursors for the desired end products are provided from cellular metabolism. If the growth rate is maintained, but the fluxes toward these precursors are reduced as a result of increasing the plasmid copy number, then the impact on end metabolite production is likely to be


greater than that on recombinant protein production alone. Unfortunately, knowledge of the impact of plasmid presence on the metabolome profile is limited. In the next section, examples of such an effect, i.e., a decrease in metabolite productivity as the result of an increase in plasmid copy number, are described. The second primary disadvantage of multicopy plasmids, plasmid instability, may be observed in two different contexts: structural instability and segregational instability. Structural instabilities arise when inserted gene fragments are either partially or fully excluded from the vector, or disrupted through insertion sequences, following transformation of a host strain. Such instabilities may be observed if expression of the recombinant gene causes undue stress upon the host cell,43 for example, because of toxicity or unbalanced GC content, or if the inserted fragment is too large.44 Structural instabilities are likely to arise early in the strain development cycle; however, it is difficult to determine a priori whether a particular insertion will cause such instabilities.45 Segregational instabilities are the result of the failure of plasmid DNA to be maintained in every cell of a population. These instabilities arise, first, from the uneven distribution of plasmids into daughter cells following cell division.46,47 As discussed previously, growth rate and plasmid copy number are inversely correlated; therefore, plasmid-free cells will exhibit a faster growth rate. Competition will eventually result in the dominance of plasmid-free cells in the population. As stated previously, growth rate differences are more pronounced with high-level expression of a cloned gene from the vector. 
Similarly, the rates of plasmid loss under these circumstances are also much more pronounced.27 As with structural instability, prediction of which inserts are more likely to induce segregational instability is also quite difficult.48,49 Structural instabilities usually have to be resolved by a modification of the DNA. Segregational instability is most often resolved through the use of selective pressure. Since most cloning vectors include antibiotic resistance genes for selection of transformed cells, it is common to grow the culture in the presence of an antibiotic to suppress the growth of plasmid-free cells. This method is effective on a small scale; however, metabolic engineering frequently targets large-scale chemical production in which cost becomes a critical factor for process feasibility.50 The use of antibiotics in this case is not favorable. The low-copy plasmids, especially those derived from the F-factor, are extremely stable in the absence of antibiotics, even with maximum expression of a cloned gene product.27 The low-copy plasmids are therefore an attractive alternative to multicopy ones for large-scale processes.
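The competition argument in this section (plasmid-free cells arise by segregation failure, grow faster, and eventually dominate) can be illustrated with a toy two-population model. This is a minimal sketch under stated assumptions, not the analysis used in the cited studies: the function name is hypothetical, growth is idealized as synchronous doublings, and the per-generation loss rate is an arbitrary illustrative value. The growth penalties of 0.25 and 0.10 echo the growth-rate reductions quoted earlier for induced lacZ expression from high-copy and mini-F plasmids.

```python
def plasmid_bearing_fraction(generations, growth_penalty, loss_per_generation, start=1.0):
    """Fraction of cells still carrying the plasmid after unselected growth.

    One generation is one doubling of the plasmid-free population.
    growth_penalty: fractional growth-rate reduction caused by the plasmid.
    loss_per_generation: fraction of plasmid-bearing cells converted to
        plasmid-free cells each generation (hypothetical parameter).
    """
    bearing, free = start, 1.0 - start
    for _ in range(generations):
        # Plasmid-bearing cells complete less than one full doubling.
        grown = bearing * 2 ** (1.0 - growth_penalty)
        lost = grown * loss_per_generation
        bearing = grown - lost
        free = free * 2 + lost
        # Renormalize so the two values remain population fractions.
        total = bearing + free
        bearing, free = bearing / total, free / total
    return bearing


if __name__ == "__main__":
    for label, penalty in (("high-copy, induced", 0.25), ("mini-F, induced", 0.10)):
        for gens in (10, 30, 50):
            kept = plasmid_bearing_fraction(gens, penalty, loss_per_generation=0.01)
            print(f"{label}: {kept:.3f} plasmid-bearing after {gens} generations")
```

The model applies the same loss rate to both cases in order to isolate the effect of the growth penalty; it does not represent the partitioning and “kill” functions that keep real mini-F plasmids stable, which would correspond to a loss rate near zero.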

6.4 Applications of Low-Copy Plasmids to Metabolic Engineering

Although expression of recombinant proteins from multicopy plasmids can have a significant impact on the health and growth of the bacterial host, various control strategies can be used to balance growth and production. This is less true if the recombinant protein is inherently toxic to the cell. Inducible promoters are known to be leaky, with measurable and sometimes significant production of a protein or metabolite in the absence of the inducer.27,28 Reducing the gene dosage also reduces the background activity. Thus, one of the earliest (and continued) uses of low-copy plasmids was for the expression of toxic proteins in E. coli.51

As the field of metabolic engineering has grown, attention has shifted from the ability to produce a single desired protein to the ability to produce a single desired metabolite as the result of the production of several proteins. As discussed in the previous section, the impact of metabolic burden, and attempting to alleviate it, must therefore be considered to a greater extent. Equally important is the issue of vector stability. Metabolic engineering projects are more likely to be employed for large-scale, high-volume products than small-scale, low-volume ones; therefore, long-term stability of the recombinant strain is critical.

6.4.1 Motivation for Reducing the Expression Level

The goal of most metabolic engineering projects remains the construction of a highly productive strain, and in many cases, this will be accomplished by overexpressing a pathway gene. What must be determined then is the expression level that is most appropriate for maximizing not protein activity but product formation. Several examples indicate that there are circumstances under which maximum product formation is not correlated with maximum gene expression. In one example, the biopolymer poly-3-hydroxybutyrate (PHB) was produced in E. coli.52 Both the plasmid copy number (~12 and >100 copies/cell) and type of expression (constitutive vs. inducible) were varied to determine the impact on overall product yield and molecular weight distribution. The results indicate that the highest productivity and molecular weight were achieved with induced expression from the lower copy plasmid. In a second example, a pathway was constructed for the production of plant flavonols.53 Completion of the pathway required the expression of a plant P450 hydroxylase. The P450 was introduced on plasmids with two different copy numbers (~10–12 and ~20–40 copies/cell). It was observed that growth of the culture carrying this enzyme on the higher copy plasmid was less stable, resulting in lower productivity.

While protein activity does not linearly increase with increases in gene dosage,36 it is still true that multicopy plasmids will almost always produce higher levels of protein than low-copy counterparts. In both of the previous examples, positive results were obtained by lowering the plasmid copy number and/or expression level; however, the final vectors employed were still multicopy (≥10 copies/cell). It is therefore appropriate to ask: can low-copy plasmids meet the needs of the metabolic engineer? To answer this question, consider first the stability of maintenance and inheritance and the metabolite productivity as a result of the enzymatic activities that are encoded by the genome, a single copy vector. A healthy bacterial cell produces a variety of metabolic products in a variety of concentrations, mostly with single-copy genes. One approach to metabolic engineering is to increase the flux through an existing pathway in order to increase productivity.
In this case, it may appear as if multicopy plasmids are the only effective means of boosting the levels of a particular enzyme. However, expression levels of genome-encoded enzymes can be altered through chromosomal engineering. Frost and coworkers used this approach to improve the synthesis of aromatics from glucose using an E. coli host.54,55 The choice of chromosomal manipulation to engineer the system was made deliberately in order to avoid metabolic burden effects from the expression of multiple genes on multiple plasmids. To increase the activity of enzymes in the biosynthetic pathway for aromatic amino acids, single additional copies of the aroB, aroA, and aroC genes placed behind the strong tac promoter were inserted into the chromosome.54 This strain was compared to others in which the same genes were encoded on multicopy plasmids (tens of copies/cell) and was found to produce an equivalent amount of the aromatic end products phenylalanine, phenyllactate, and prephenate. Measures of enzyme activity revealed that increases of 3.6–11-fold were observed for the three genes following chromosomal insertion. Interestingly, the addition of a fourth and fifth gene for increasing upstream pathway activities (tktA and aroF) on a ColE1-type multicopy plasmid (~40 copies/cell) reduced the aroB, aroA, and aroC activities by two-fold, presumably due to metabolic burden effects. Using the same chromosomal engineering strategy in a different host enabled the high-level production of the aromatic compound p-hydroxybenzoic acid.55 In another example of chromosomal engineering, the objective was the enhancement of β-carotene production in a modified strain of E. coli.56 Protein products of native genes, including dxs, idi, ispA, and ispD, participate in the production of isopentenyl diphosphate (IPP), the universal precursor for isoprenoids. 
Several researchers had previously increased the copy number of one or more of these genes with multicopy plasmids and observed a corresponding increase in carotenoid production.57,58 However, the approach taken by Yuan et al. was to replace the native promoters of several of the pathway genes with the strong bacteriophage T5 promoter.56 In doing so, β-carotene production increased 6.3-fold over the strains carrying the native promoters. It was also observed that expression of only the dxs gene from the T5 promoter increased β-carotene production by 2–3.3-fold. This increase was comparable to that observed in another carotenoid, lycopene, when dxs was overexpressed on multicopy plasmids.28

Finally, consider a third example where the native promoter was replaced, but rather than testing a single replacement, a promoter library of varying strengths was created, enabling a range of expression levels.59 With this system two case studies were examined: (i) the effect of altered expression of ppc on biomass yield on glucose; and (ii) the effect of dxs expression on lycopene production. In both cases, it was determined that the best phenotypes were observed with intermediate expression levels, similar to an observation made several years prior with respect to protein activity.36 The response of dxs to the promoter library changed, however, in a strain background where the expression of downstream pathway genes ispFD and idi had been increased through chromosomal engineering. In this case, lycopene increased nearly monotonically with increasing promoter strength, reflecting a new bottleneck in the pathway. Recall as well that flux through this carotenoid pathway was further modified through additional chromosome engineering.56

From these examples, it can be seen that there are cases in which a reduction in plasmid copy number is a competitive advantage. However, it is worth noting another motivation for using low-copy plasmids, namely, the need to express multiple genes in a single host. Because metabolic engineering is concerned with pathways and systems, it is rare that overexpression of a single gene will result in an optimally productive host. This is certainly the case when entire pathways are transferred between hosts. Instead, and as has been demonstrated in the previous examples, multiple genes are required to assemble, first, a fully functional pathway, and second, an optimized one. It is certainly possible to introduce multiple genes on one vector,53–55,60 but reduction to a single vector (plasmid or genome) carrying the entire pathway of interest with all of its modifications is usually a long-term approach. In the short term, researchers are much more likely to clone the desired genes onto several different vectors and simultaneously introduce them into the same cell for coexpression. In order to achieve this, the various vectors must belong to different incompatibility groups.
Because the pSC101 and mini-F plasmids belong to incompatibility groups that are distinct both from one another and from those of the commonly used multicopy plasmids, they are easily employed to co-express genes from multiple plasmids.
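The compatibility requirement described above can be sketched as a simple check. This is a minimal illustration, not a cloning tool: the plasmid names are hypothetical, and the sketch assumes that each replicon family (pSC101, mini-F, ColE1-type) forms its own incompatibility group, which is a simplification.

```python
from collections import Counter

# Hypothetical vector set: plasmid name -> replicon family.
# The plasmid names are invented for illustration; only the replicon
# labels (pSC101, mini-F, ColE1-type) come from the text.
VECTORS = {
    "pLow1": "pSC101",
    "pLow2": "mini-F",
    "pHigh1": "ColE1",
    "pHigh2": "ColE1",
}


def incompatible_replicons(plasmids, vectors=VECTORS):
    """Return the replicon families used by more than one chosen plasmid."""
    counts = Counter(vectors[name] for name in plasmids)
    return sorted(rep for rep, n in counts.items() if n > 1)


def can_coexist(plasmids, vectors=VECTORS):
    """True if no two chosen plasmids share a replicon family.

    Assumption: one replicon family equals one incompatibility group.
    """
    return not incompatible_replicons(plasmids, vectors)
```

Under these assumptions, `can_coexist(["pLow1", "pLow2", "pHigh1"])` holds, whereas a pair of ColE1-type vectors is flagged as sharing a replicon.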

6.4.2 Case Studies of Low-Copy Plasmids in Metabolic Engineering

Applications of low-copy plasmids in metabolic engineering case studies are summarized in Table 6.1. As shown, in three of these cases, the use of a low-copy vector seemed to be driven primarily by the need to utilize several different plasmid vectors in the same cell.57,61,62 However, in three other cases, use of a low-copy plasmid provided a clear advantage over multicopy counterparts. These three cases will be discussed in more detail below.

Table 6.1 Applications of Low-Copy Plasmids in Metabolic Engineering

| Product | Replicon | Comments | Reference |
| --- | --- | --- | --- |
| Lycopene | pSC101 | Vector apparently chosen due to need to express multiple plasmid-encoded genes in a single host | 57,75 |
| Lycopene | Mini-F | Uninduced expression of dxs from low-copy plasmid with tac promoter resulted in moderate improvement over expression from a high-copy plasmid (2.9- vs. 2.4-fold over wild type); however, induced expression suppressed productivity when high-copy plasmids were used | 28 |
| Polyphosphate | Mini-F | Comparable production of steady-state levels of polyphosphate with ppk gene present on low- or high-copy plasmids on a per cell basis; higher cell density leads to higher volumetric yields with low-copy plasmid | 28 |
| Indigo | pSC101 | Vector apparently chosen due to need to express multiple plasmid-encoded genes in a single host | 61 |
| Tetrahydrobiopterin (BH4) | pSC101 | Low-copy expression of guaBA gave ~60% higher yield of BH4 compared to expression of the same genes from a high-copy plasmid | 67 |
| Chitinbiose | pSC101 | Vector apparently chosen due to need to express multiple plasmid-encoded genes in a single host | 62 |

Note: The host organism in each of these examples is E. coli.


6.4.2.1 Enhanced Production of Polyphosphate through Overexpression of ppk from a Mini-F Plasmid (Figure 6.2a)28

Polyphosphate (polyP) molecules are produced by the enzyme polyphosphate kinase (PPK) in a one-step mechanism in which the terminal phosphate of an ATP molecule is transferred to a growing polyP chain, producing a chain of length n + 1 and ADP (Figure 6.2a). PPK can also operate in reverse to synthesize ATP from ADP and a polyP chain. The ppk gene is naturally transcribed as part of an operon that also encodes polyphosphatase (PPX), which processively degrades polyP to inorganic phosphate (Pi). PolyP accumulates in E. coli in response to a shift from low Pi to high Pi medium.63 The production of polyP can also be easily enhanced by introducing additional copies of the ppk gene into the host cell, performing the Pi-shifts, and inducing expression of the recombinant genes following the shift to low Pi. These shift experiments were conducted with both the chromosomal copies of ppk and ppx, and plasmid-based copies of ppk at either low- or high-copy number. Since polyP is a storage polymer, steady-state levels were measured 24 hours following the shift to excess phosphate. The results indicate that the amount of Pi incorporated into polyP at steady state was ~200 µmol/g DCW for both the low- and high-copy strains (Table 6.2). However, the observed maximum cell density was ~30% lower for the multicopy plasmids. This is an indication of the metabolic burden effects that can be observed with


Figure 6.2 (See color insert following page 13-20.) Metabolic pathways in which the use of low-copy plasmids improved production. (a) One-step pathway for the production of polyphosphates from ATP. PPK = polyphosphate kinase; PPX = polyphosphatase. (b) Abbreviated pathway for lycopene formation. dxs = DXP synthase; idi = IPP isomerase; G3P = glyceraldehyde-3-phosphate; IPP = isopentenyl diphosphate; DMAPP = dimethylallyl diphosphate; C10, C15, C20 symbolize 10-, 15-, and 20-carbon chain length intermediates. (c) Tetrahydrobiopterin (BH4) biosynthetic pathway from GTP (guanosine 5′-triphosphate). GCHI = GTP cyclohydrolase I; PTPS = 6-pyruvoyl-tetrahydropterin synthase; SPR = sepiapterin reductase. (d) Abbreviated pathway for GTP synthesis from IMP. guaB = IMP dehydrogenase; guaA = GMP synthase; gmk = GMP kinase; IMP = inosine 5′-monophosphate; XMP = xanthosine 5′-monophosphate; GMP = guanosine 5′-monophosphate; GDP = guanosine 5′-diphosphate; GTP = guanosine 5′-triphosphate.


Table 6.2 Comparison of Productivities Observed with the Use of High- and Low-Copy Plasmids in Metabolic Engineering Case Studies

| Case and condition | Low-Copy Plasmids | Multicopy Plasmids |
| --- | --- | --- |
| Case 1: Polyphosphate28 (per unit cell mass) | 219 µmol Pi/g DCW (1X) | 223 µmol Pi/g DCW (1X) |
| Case 1: Polyphosphate28 (volumetric) | 164 µmol/L (1.3X) | 127 µmol/L (1X) |
| Case 2: Lycopene28, tac promoter (0 mM IPTG) | 8.5 mg/L (1.2X) | 6.9 mg/L (1X) |
| Case 2: Lycopene28, tac promoter (0.1 mM IPTG) | 8.5 mg/L (7.1X) | 1.2 mg/L (1X) |
| Case 2: Lycopene28, araBAD promoter (13.3 mM arabinose) | 7.5 mg/L (0.7X) | 10.3 mg/L (1X) |
| Case 3: Tetrahydrobiopterin (BH4)*,67 | ~120 AU (1.6X) | ~75 AU (1X) |

Note: Enhancement factors relative to the multicopy plasmids are given in parentheses next to the relevant data points. *For the tetrahydrobiopterin, reported product concentrations are estimated from three data points presented in a scatter plot in the original reference.67 Units were originally reported as “relative intensity” and are given here as “arbitrary units” (AU).

engineered microbes. Because the cell densities differed so significantly, the volumetric yield of polyP was ~30% higher in the culture that utilized low-copy plasmids.

6.4.2.2 Enhanced Lycopene Production by Overexpression of dxs from a Mini-F Plasmid (Figure 6.2b)28

The production of lycopene, a carotenoid, has already been presented in previous examples. Briefly, E. coli contains a mevalonate-independent pathway for the production of IPP, the universal precursor for the isoprenoids. In this pathway, the first step toward IPP is the condensation of pyruvate and glyceraldehyde-3-phosphate to form 1-deoxyxylulose 5-phosphate (DXP).64 The gene responsible for the DXP synthase activity is dxs.65,66 Introduction of the carotenoid synthesis genes crtEBI from Erwinia herbicola completes the pathway for lycopene synthesis in E. coli. As described previously, increased expression of the dxs gene results in elevated levels of lycopene.56,58,59 In this case, lycopene levels were measured as a function of plasmid copy number, comparing a mini-F-derived vector to a high-copy ColE1-type plasmid. Additionally, two different promoters were examined, tac and araBAD. Use of the strong tac promoter resulted in increased levels of lycopene for both low- and multicopy plasmids, but only in the absence of any inducer (Table 6.2). The leaky expression was sufficiently high to improve product formation. However, the addition of even a small amount of IPTG resulted in a significant inhibition of growth in the multicopy system, yet another sign of metabolic burden. The loss of cell growth correlated with a loss of productivity, such that an induced tac promoter is far more desirable on a low-copy plasmid than on a multicopy one. Because the tac promoter is known to be strong, the experiment was repeated using the weaker, arabinose-inducible araBAD promoter. With this promoter fully induced, increased lycopene production was observed from both low- and multicopy plasmids. Here, the low-copy plasmid was about 30% less productive than its multicopy counterpart. It is interesting to note that reduction of the promoter strength on the multicopy plasmid improved lycopene production, whereas the same reduction in a


low-copy plasmid had only a very small effect (Table 6.2). This suggests once again that alteration of the promoter strength may be an effective means of altering gene expression from low-copy vectors, including the genome,56 while minimizing metabolic burden effects.

6.4.2.3 De Novo Biosynthesis of Tetrahydrobiopterin (BH4) by Overexpression of guaBA from a pSC101-Derived Plasmid (Figure 6.2c and d)67

The third example involves the synthesis of tetrahydrobiopterin (BH4), an essential cofactor found in higher organisms.67 BH4 is used in human therapeutics to treat natural deficiencies, which are implicated in some neurological diseases. Currently, BH4 is chemically synthesized, but the presence of three chiral centers in the molecule makes this difficult. The compound can be biologically synthesized in three steps from GTP using mammalian enzymes, all of which have been cloned (Figure 6.2c). Yamamoto and coworkers succeeded in introducing all three of these enzymes into E. coli to construct a pathway for BH4 production.67 To further increase the flux to BH4, the researchers decided to increase the flux to GTP. Two genes were chosen for expression from the purine nucleotide synthesis pathway (Figure 6.2d). Overexpression of gmk, whose product converts GMP to GDP, the immediate precursor to GTP, had no effect on BH4 levels. However, overexpression of guaB and guaA in an operon did lead to enhancements in BH4. The two encoded enzymes act upstream of GMP in the purine nucleotide biosynthetic pathway and convert inosine 5′-monophosphate (IMP) to xanthosine 5′-monophosphate (XMP), then XMP to GMP, respectively. In this case, expression of the operon from a low-copy plasmid resulted in the production of 60% more BH4 relative to the high-copy plasmids. It should be noted that introducing additional copies of gmk into a strain overexpressing guaBA did not lead to further increases, indicating that the GMP kinase is not rate controlling.
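The enhancement factors listed in Table 6.2 can be recomputed directly from the reported productivities. The short sketch below simply divides each low-copy value by its multicopy counterpart. The dictionary keys are labels chosen here for illustration, and the BH4 entries are the estimates noted in the table footnote.

```python
# Values transcribed from Table 6.2 as (low-copy, multicopy) pairs.
# Key names are illustrative labels, not terms from the original study.
TABLE_6_2 = {
    "polyP, umol Pi/g DCW": (219, 223),
    "polyP, umol/L": (164, 127),
    "lycopene, tac, 0 mM IPTG, mg/L": (8.5, 6.9),
    "lycopene, tac, 0.1 mM IPTG, mg/L": (8.5, 1.2),
    "lycopene, araBAD, 13.3 mM arabinose, mg/L": (7.5, 10.3),
    "BH4, AU (estimated)": (120, 75),
}


def enhancement_factor(low_copy, multicopy):
    """Low-copy productivity expressed relative to the multicopy value."""
    return low_copy / multicopy


def enhancement_table(data=TABLE_6_2, digits=1):
    """Return the rounded enhancement factor for every table entry."""
    return {
        name: round(enhancement_factor(low, multi), digits)
        for name, (low, multi) in data.items()
    }


if __name__ == "__main__":
    for name, factor in enhancement_table().items():
        print(f"{name}: {factor}X")
```

Rounded to one decimal place, the ratios reproduce the factors given in the table, including the 7.1X advantage of the low-copy plasmid under induced tac expression and the 0.7X disadvantage under araBAD.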
6.4.2.4 Summary of Case Studies

These three cases provide illustrations of the benefits of low-copy plasmids over multicopy ones, and also highlight the potential impact of metabolic burden. However, they do not provide enough information to allow one to predict which systems are more likely to suffer from adverse metabolic burden effects if expression levels are too high. Both polyP and BH4 production consume nucleotide cofactors, ATP and GTP, so perhaps this is a key indicator of vulnerable systems. However, lycopene synthesis has no explicit requirement for a nucleotide cofactor. On the other hand, siphoning of pyruvate and glyceraldehyde-3-phosphate into DXP prevents their entry into the energy-generating pathways of the TCA cycle and glycolysis, respectively. ATP is also the primary “energy currency” of the cell, which suggests that compounds involved in energy generation are critical. Yet one counter-example is the production of a monoterpene, another isoprenoid, in E. coli.68 This pathway uses the same mevalonate-independent route to IPP as the lycopene case above; however, the use of low-copy (pSC101-derived) vectors resulted in very low expression levels and poor product yields. And, although the use of a reduced copy number vector did result in an improvement in both yield and product quality for PHB production in E. coli in one case,52 expression of the pha operon from a single-copy plasmid produced no measurable amounts of product in another.69 Hence, it is clear that although low-copy plasmids can perform as well as or better than high-copy ones for metabolic engineering, there is no guarantee that they will. Further, there is little understanding of how to determine which systems are most amenable to low-copy expression. Clearly, there is much work yet to be done in uncovering the intricate balance between maximum expression and minimum burden effects that must be achieved to reach optimal product yields.

6.5 Opportunities for Artificial Chromosome Use in Metabolic Engineering

While the above examples provide an “existence proof” for the usefulness of low-copy plasmids in metabolic engineering applications, it is worthwhile to consider some aspects of metabolic engineering for which the low-copy plasmids are uniquely qualified. In particular, there are two properties of these plasmids that are worth exploring and exploiting further in this context: (1) the ability to minimally perturb


the host cell upon introduction and expression of foreign genes; and (2) the ability to stably maintain extremely large fragments of DNA.

6.5.1 Minimal Perturbation of Host Metabolism: An Opportunity for Novel Metabolic Control Systems

As discussed previously, altering the dosage of a recombinant gene or changing the promoter that controls its expression are both effective ways of influencing the amount of the corresponding recombinant protein that is produced. Another method that has been investigated involves the manipulation of the messenger RNA transcript. Engineered secondary structures, particularly in the 5′ untranslated region of the transcript, can influence the stability (half-life) of mRNA.70 Based on the central dogma of molecular biology, it is therefore expected that increasing the half-life of a message would increase the average concentration of the molecule within the cell and by extension, increase the amount of the corresponding protein that is produced. This was observed in the case of hairpins applied to a lacZ transcript.71 Several hairpins of varying strength were subsequently tested with inducible expression from either ColE1-type high-copy plasmids (~100 copies/cell) or mini-F-derived low-copy plasmids (~1 copy/cell).72,73 This experiment revealed that, in the case of high-copy plasmids, stabilizing the transcript resulted in increased β-galactosidase activity only at low to moderate induction levels. At higher induction levels, the activities were comparable between stabilized and nonstabilized transcripts, indicating that transcript level was no longer rate-controlling in the system. Additionally, the growth rate of the cultures was significantly reduced with high-level expression from the high-copy plasmids. By contrast, while the low-copy plasmids gave lower expression than their higher copy counterparts across the various stabilizing structures examined, expression from stabilized transcripts was always higher than the nonstabilized transcripts. The effect on growth rate was minimal across all expression levels.
This resulted in a library of structures that gave a range of activities as a function of both inducer concentration and secondary structure, with minimal perturbation to native metabolism. This work was extended to examine the expression of genes from a synthetic two-gene operon composed of lacZ and gfp in which a 5′ hairpin was inserted before the downstream gene.74 Although the results indicate that a number of factors ultimately influence gene expression due to the complexity of the systems examined (i.e., changing copy number, mRNA secondary structure, and relative position of the two genes), the general trend of low-copy plasmids responding to changes in hairpin structure over a wider range of inducer controls was still observed. In the case of the expression of the lacZ gene, the operons also appeared to be better insulated against downstream effects when low-copy plasmids were used. For example, a decrease in the β-galactosidase activity was observed when lacZ was in the upstream position and a hairpin was inserted in front of the downstream gfp gene on a high-copy plasmid. By contrast, there was no significant impact on upstream lacZ expression when hairpins were inserted in front of the downstream gene in low-copy plasmids. This suggests, again, that precise, even predictable, control of gene expression may be more easily achieved with low-copy vectors. This ability to minimally perturb the host (e.g., to minimally impact growth rate) while controlling gene expression may find other uses in metabolic engineering as well. Recently, more researchers have begun to examine regulatory networks to both co-opt existing systems75 and design new ones for novel purposes.76 (Much of this work falls under the heading of “synthetic biology,” yet it still involves a manipulation of the cell’s metabolism through heterologous gene expression and is, therefore, still rightly viewed as an aspect, perhaps even outgrowth, of metabolic engineering.) 
In at least one example, the integration of regulation and metabolism produced a cell with an oscillatory metabolic state.77 As such systems become more complex and more integrated—for example, engineering both gene expression levels for optimal flux through a key pathway step and the regulation of expression relative to cellular metabolism—utilizing expression vectors that have minimal impact on basic cell metabolism is likely to produce more stable, reproducible, and predictable systems. Indeed, it may be considered an engineering “fail-safe” for many applications in metabolic engineering.
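The expectation stated earlier in this section, that a longer mRNA half-life raises the average transcript concentration, can be illustrated with a minimal kinetic sketch. The model is an assumption introduced here and is not taken from the cited studies: transcripts are synthesized at a constant rate and decay with first-order kinetics, and dilution by cell growth is ignored. The function names and numerical values are illustrative only.

```python
import math


def decay_constant(half_life_min):
    """First-order decay constant (1/min) for a given mRNA half-life."""
    return math.log(2) / half_life_min


def steady_state_mrna(synthesis_rate, half_life_min):
    """Steady-state transcripts per cell.

    Assumes dm/dt = synthesis_rate - k_deg * m, so at steady state
    m = synthesis_rate / k_deg. synthesis_rate is in transcripts/min.
    """
    return synthesis_rate / decay_constant(half_life_min)


if __name__ == "__main__":
    # Illustrative numbers: same promoter activity, hairpin doubles half-life.
    unstabilized = steady_state_mrna(synthesis_rate=2.0, half_life_min=2.0)
    stabilized = steady_state_mrna(synthesis_rate=2.0, half_life_min=4.0)
    print(f"fold change in transcript level: {stabilized / unstabilized:.1f}")
```

In this simple model the steady-state transcript level scales linearly with half-life. The model does not capture the saturation seen with high-copy plasmids at high induction, where transcript level was no longer rate-controlling.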


6.5.2 Maintenance of Large DNA Fragments: Pathway Engineering and Discovery

The ability of low-copy plasmids, particularly those of the mini-F variety, to stably replicate large DNA fragments has been appreciated for quite some time.78,79 Reports of the stable maintenance of 300-Kb fragments of human DNA led to interest in the use of these “bacterial artificial chromosomes (BACs)” for large-scale sequencing projects, especially during the period of the Human Genome Project.80 BACs became a viable alternative to yeast artificial chromosomes (YACs) because they can maintain fragment sizes that, though smaller, begin to approach those maintained in YACs (>300-Kb vs. >1,000-Kb) and that are much larger than what can be maintained in cosmids (~50-Kb). Additionally, BACs appear not to carry the difficulties associated with YACs, including the formation of chimeras from noncontiguous fragments81 and the contamination of recovered insert DNA with host chromosomal DNA.82 The recent ability to construct large DNA fragments through primarily chemical (as opposed to biological) means83 warrants a new look at the low-copy plasmids as vectors for metabolic engineering. This new technology can produce DNA segments that are on the order of tens to hundreds of Kb. It is very unlikely that high-copy plasmids could sustain such large fragments with high fidelity.84 Further, it is quite likely that introduction of multiple copies of large DNA fragments into the host organism would have deleterious effects. By contrast, the low-copy plasmids have the demonstrated capacity to maintain >300-Kb insert fragments, which should be sufficient to carry entire metabolic pathways.
Additionally, as described in the previous section, the use of low-copy vectors enables the use of other technologies for regulating the expression of multiple genes.72 Other improvements in expression methods, including the incorporation of strong promoter sequences56,85 and the optimization of codon usage in heterologous genes to favor the biosynthetic machinery of the host60 also make it more likely that expression levels from low- and even single-copy vectors would be sufficient to achieve an efficiently engineered system. Carrying such large pathways on artificial chromosomes instead of inserting them into the genome also avoids the potential for long-range epigenetic effects or polar mutations that may result from the incorporation of such a large amount of DNA into the host genome. With the assembly of “mega” fragments now possible, the challenge now shifts to the introduction of such large pieces of DNA into the organism. It has been suggested that the current limitation on the size of BAC libraries is not due to any inherent limitation in the carrying capacity of the vector but rather in the ability to transform the microbial host using existing technologies.84 Indeed, F-prime factors naturally carry chromosomal DNA that ranges from <1 to >30% of the E. coli genome.86 One way to address the challenge of introduction of large DNA fragments into the host cell is through in vivo assembly. Such an approach has been described with F-derived low-copy plasmids.44 Although this specific demonstration resulted in the production of a ~125-Kb plasmid, there is no obvious reason why in vivo methods could not create F-based plasmids of the size naturally observed with the F-primes. 
It should also be noted that more recently, low-copy vectors with copy numbers closer to those of the pSC101 replicons (four to five copies per cell) have also shown the capability of stably maintaining insert DNA fragments of >300-Kb.84 Conventional vectors with higher copy numbers (30–40 copies/cell) were not stably maintained. However, although vectors with several copies per cell may be able to stably maintain large fragments, an in vivo assembly approach is likely to be less effective with such vectors because of the presence of multiple targets for homologous recombination. The ability of low-copy plasmids to stably maintain large DNA fragments may prove to be a significant advantage for future applications of metabolic pathway engineering, especially for the optimization of existing pathways. This characteristic also provides an interesting opportunity for the discovery of novel enzymes for use in pathway design and construction, or the discovery of entire pathways. BAC libraries have been used to profile “metagenomes,” the composite genomes of bacterial consortia.87,88 These metagenomic libraries provide access to microbial diversity while circumventing the need to culture individual species, akin to 16S rRNA genotyping of consortia.89 However, such libraries provide a further opportunity to screen by phenotype, providing clues to the functional diversity of the system as


well. This technique was first tested with a BAC library of a single genome, in which the expression of Bacillus cereus DNA in E. coli was analyzed by screening for established phenotypes, including ampicillin resistance and orange pigment production.90 The use of BACs has since been expanded to profile antibacterial, lipase, amylase, nuclease, and hemolytic activities in soil bacteria,87 and to isolate a novel alcohol/aldehyde dehydrogenase from the microorganisms in a wastewater treatment plant.88 While one objective of metabolic engineering is to enable the efficient production of natural products in scalable processes,60 achieving this goal requires the identification and recruitment of the associated enzyme activities from the native host. In the case of bacteria, functional genomics in BAC libraries could significantly accelerate this process, leading to the identification of new activities and/or whole pathways.
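The size comparison made earlier in this section (cosmids, BACs, YACs, and F-primes) can be put into a small lookup as a rough planning aid. This is only a sketch under stated assumptions: the capacities are the approximate figures quoted in the text and are treated here as hard cutoffs, which they are not, and the E. coli genome size of about 4,600 Kb is an assumed round figure that does not come from the chapter.

```python
# Nominal insert capacities in Kb, from the figures quoted in the text
# (~50 Kb cosmid, >300 Kb BAC, >1,000 Kb YAC). Treating them as hard
# cutoffs is a simplification made for this sketch.
NOMINAL_CAPACITY_KB = [
    ("cosmid", 50),
    ("BAC", 300),
    ("YAC", 1000),
]

# Assumed approximate E. coli genome size in Kb (not from the chapter).
ECOLI_GENOME_KB = 4600


def smallest_suitable_vector(insert_kb):
    """Return the first vector type whose nominal capacity fits the insert."""
    for name, capacity in NOMINAL_CAPACITY_KB:
        if insert_kb <= capacity:
            return name
    return None  # larger than every nominal capacity listed


def f_prime_insert_kb(genome_fraction, genome_kb=ECOLI_GENOME_KB):
    """Size of chromosomal DNA carried by an F-prime, in Kb."""
    return genome_fraction * genome_kb
```

With these assumptions, the ~125-Kb plasmid assembled in vivo falls within nominal BAC capacity, and an F-prime carrying 30% of the genome would hold roughly 1,400 Kb, well beyond the >300-Kb inserts demonstrated for BAC libraries.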

6.6 Summary

Until very recently, the low-copy plasmids have been used almost exclusively for the following purposes: (i) low-level expression of toxic genes; (ii) cloning of large eukaryotic DNA fragments for sequencing (i.e., in BACs); and (iii) expression of multiple genes from independent, compatible plasmids. The metabolic burden effect of high-copy plasmids has been well-documented and appreciated in the context of recombinant protein production, with the protein as the final product of interest. In this case, it is easy, and perhaps warranted, to think of the metabolic burden effect as an engineering challenge that can be addressed in the context of bioprocess development if large quantities of the product are desired, for example, as a biopharmaceutical. However, the focus of metabolic engineering on the production of small molecules (metabolites) requires one to consider the metabolic burden effect to a greater extent. The few examples discussed here of superior performance achieved through the use of a low-copy plasmid rather than a high-copy one to overexpress a gene of interest are unlikely to be the only cases for which this phenomenon is true. Unfortunately, at the current time, there is no simple way to predict which systems will be affected the most by plasmid-mediated metabolic stresses. As a deeper understanding of the effects of high-copy number plasmid maintenance on the metabolome of the cell emerges, perhaps such predictions may become possible. It is also worth noting the unique features of low-copy plasmids that can be exploited in metabolic engineering.
The desire to create increasingly sophisticated microbes has resulted in increasing interest in engineering the regulatory network of a cell.75–77 As demonstrated in the case of regulating gene expression at the transcript level, it is very likely that such engineered systems will behave more reproducibly and with tighter control if the vector causes a minimum perturbation to the native metabolism of the host cell. Finally, the ability of low-copy vectors to maintain very large DNA fragments truly warrants their designation as “artificial chromosomes.” Such chromosomes are capable of carrying entire metabolic pathways—with individual promoters, in natural or synthetic operons, or both—for the production of novel compounds in the microbial host through a parallel metabolic network.

References

1. Cohen, S. N., Chang, A. C. Y., Boyer, H. W., and Helling, R. B. Construction of biologically functional bacterial plasmids in vitro. Proc. Nat. Acad. Sci. USA 70 (11), 3240–3244, 1973b.
2. Chang, A. C. Y. and Cohen, S. N. Genome construction between bacterial species in vitro: replication and expression of Staphylococcus plasmid genes in Escherichia coli. Proc. Nat. Acad. Sci. USA 71 (4), 1030–1034, 1974.
3. Morrow, J. F., Cohen, S. N., Chang, A. C. Y., Boyer, H. W., Goodman, H. M., and Helling, R. B. Replication and transcription of eukaryotic DNA in Escherichia coli. Proc. Nat. Acad. Sci. USA 71 (5), 1743–1747, 1974.
4. Cohen, S. N. and Chang, A. C. Y. Recircularization and autonomous replication of a sheared R-factor DNA segment in Escherichia coli transformants. Proc. Nat. Acad. Sci. USA 70 (5), 1293–1297, 1973a.


5. Cohen, S. N. and Chang, A. C. Y. Revised interpretation of the origin of the pSC101 plasmid. J. Bacteriol. 132 (2), 734–737, 1977.
6. Cabello, F., Timmis, K., and Cohen, S. N. Replication control in a composite plasmid constructed by in vitro linkage of two distinct replicons. Nature 259, 285–290, 1976.
7. Keasling, J. D. Gene-expression tools for the metabolic engineering of bacteria. TIBTECH 17, 452–460, 1999.
8. Prather, K. J., Sagar, S., Murphy, J., and Chartrain, M. Industrial scale production of plasmid DNA for vaccine and gene therapy: plasmid design, production, and purification. Enzyme Microb. Tech. 33, 865–883, 2003.
9. Lin-Chao, S. and Bremer, H. Effect of the bacterial growth rate on replication control of plasmid pBR322 in Escherichia coli. Mol. Gen. Genet. 203, 143–149, 1986.
10. Davison, J. Mechanism of control of DNA replication and incompatibility in ColE1-type plasmids: a review. Gene 28, 1–15, 1984.
11. Lacatena, R. M., Banner, D. W., Castagnoli, L., and Cesareni, G. Control of initiation of pMB1 replication: purified Rop protein and RNA I affect primer formation in vitro. Cell 37, 1009–1014, 1984.
12. Tomizawa, J-i. and Som, T. Control of ColE1 plasmid replication: enhancement of binding of RNA I to the primer transcript by the Rom protein. Cell 38, 871–878, 1984.
13. Balbás, P., Soberón, X., Merino, E., Zurita, M., Lomeli, H., Valle, F., Flores, N., and Bolivar, F. Plasmid vector pBR322 and its special-purpose derivatives: a review. Gene 50, 3–40, 1986.
14. Twigg, A. J. and Sherratt, D. Trans-complementable copy-number mutants of plasmid ColE1. Nature 283, 216–218, 1980.
15. Vieira, J. and Messing, J. The pUC plasmids, an M13mp7-derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene 19, 259–268, 1982.
16. Norrander, J., Kempe, T., and Messing, J. Construction of improved M13 vectors using oligodeoxynucleotide-directed mutagenesis. Gene 26, 101–106, 1983.
17. Lin-Chao, S., Chen, W-T., and Wong, T-T. High copy number of the pUC plasmid results from a Rom/Rop-suppressible point mutation in RNA II. Mol. Microbiol. 6 (22), 3385–3393, 1992.
18. Chambers, S. P., Prior, S. E., Barstow, D. A., and Minton, N. P. The pMTL nic- cloning vectors. I. Improved pUC polylinker regions to facilitate the use of sonicated DNA for nucleotide sequencing. Gene 68, 139–149, 1988.
19. Stoker, N. G., Fairweather, N. F., and Spratt, B. G. Versatile low-copy-number plasmid vectors for cloning in Escherichia coli. Gene 18, 335–341, 1982.
20. Takeshita, S., Sato, M., Toba, M., Masahashi, W., and Hashimoto-Gotoh, T. High-copy-number and low-copy-number plasmid vectors for lacZα-complementation and chloramphenicol or kanamycin-resistance selection. Gene 61, 63–74, 1987.
21. Lerner, C. G. and Inouye, M. Low copy number plasmids for regulated low-level expression of cloned genes in Escherichia coli with blue/white insert screening capability. Nucleic Acids Res. 18 (15), 4631, 1990.
22. Wang, R. F. and Kushner, S. R. Construction of versatile low-copy-number vectors for cloning, sequencing, and gene expression in Escherichia coli. Gene 100, 195–199, 1991.
23. Kline, B. C. Mechanism and biosynthetic requirements for F plasmid replication in Escherichia coli. Biochemistry 13, 139–146, 1974.
24. Keasling, J. D., Palsson, B. O., and Cooper, S. Cell-cycle-specific F plasmid replication: regulation by cell size control of initiation. J. Bacteriol. 173 (8), 2673–2680, 1991.
25. Ogura, T. and Hiraga, S. Partition mechanism of F plasmid: two plasmid gene-encoded products and a cis-acting region are involved in partition. Cell 32, 351–360, 1983.
26. Hiraga, S., Jaffe, A., Ogura, T., Mori, H., and Takahashi, H. F plasmid ccd mechanism in Escherichia coli. J. Bacteriol. 166 (1), 100–104, 1986.
27. Jones, K. L. and Keasling, J. D. Construction and characterization of F plasmid-based expression vectors. Biotechnol. Bioeng. 59, 659–665, 1998.


28. Jones, K. L., Kim, S-W., and Keasling, J. D. Low-copy plasmids can perform as well as or better than high-copy plasmids for metabolic engineering of bacteria. Metabolic Eng. 2, 328–338, 2000.
29. Seo, J-H. and Bailey, J. E. Effects of recombinant plasmid content on growth properties and cloned gene product formation in Escherichia coli. Biotechnol. Bioeng. 27, 1668–1674, 1985.
30. Bailey, J. E., Silva, N. A. d., Peretti, S. W., Seo, J.-H., and Srienc, F. Studies of host-plasmid interactions in recombinant microorganisms. Ann. NY Acad. Sci. 469, 194–211, 1986.
31. Mason, C. A. and Bailey, J. E. Effects of plasmid presence on growth and enzyme activity of Escherichia coli DH5α. Appl. Microbiol. Biotechnol. 32, 54–60, 1989.
32. Betenbaugh, M. J., Beaty, C., and Dhurjati, P. Effects of plasmid amplification and recombinant gene expression on the growth kinetics of recombinant E. coli. Biotechnol. Bioeng. 33, 1425–1436, 1989.
33. Betenbaugh, M. J. and Dhurjati, P. Effects of promoter induction and copy number amplification on cloned gene expression and growth of recombinant cell cultures. Ann. NY Acad. Sci. 589, 111–120, 1990.
34. Birnbaum, S. and Bailey, J. E. Plasmid presence changes the relative levels of many host cell proteins and ribosome components in recombinant Escherichia coli. Biotechnol. Bioeng. 37, 736–745, 1991.
35. Glick, B. R. Metabolic load and heterologous gene expression. Biotechnol. Adv. 13 (2), 247–261, 1995.
36. Peretti, S. W., Bailey, J. E., and Lee, J. J. Transcription from plasmid genes, macromolecular stability, and cell-specific productivity in Escherichia coli carrying copy number mutant plasmids. Biotechnol. Bioeng. 34, 902–908, 1989.
37. Wood, T. K. and Peretti, S. W. Depression of protein synthetic capacity due to cloned-gene expression in E. coli. Biotechnol. Bioeng. 36, 865–878, 1990.
38. Vind, J., Sorensen, M. A., Rasmussen, M. D., and Pedersen, S. Synthesis of proteins in Escherichia coli is limited by the concentration of free ribosomes. J. Mol. Biol. 231, 678–688, 1993.
39. Dong, H., Nilsson, L., and Kurland, C. G. Gratuitous overexpression of genes in Escherichia coli leads to growth inhibition and ribosome destruction. J. Bacteriol. 177 (6), 1497–1504, 1995.
40. Seo, J-H. and Bailey, J. E. Continuous cultivation of recombinant Escherichia coli: existence of an optimum dilution rate for maximum plasmid and gene-product concentration. Biotechnol. Bioeng. 28 (10), 1590–1594, 1986.
41. Klotsky, R-A. and Schwartz, I. Measurement of cat expression from growth-rate-regulated promoters employing β-lactamase activity as an indicator of plasmid copy number. Gene 55, 141–146, 1987.
42. Reinikainen, P. and Virkajärvi, I. Escherichia coli growth and plasmid copy numbers in continuous cultivations. Biotechnol. Lett. 11 (4), 225–230, 1989.
43. Nakamura, K. and Inouye, M. Inactivation of the Serratia marcescens gene for the lipoprotein in Escherichia coli by insertion sequences, IS1 and IS5; sequence analysis of junction points. Mol. Gen. Genet. 183, 107–114, 1981.
44. O’Connor, M., Peifer, M., and Bender, W. Construction of large DNA segments in Escherichia coli. Science 244, 1307–1312, 1989.
45. Balbas, P. and Bolivar, F. Design and construction of expression plasmid vectors in Escherichia coli. In Methods in Enzymology, Goeddel, D. D. Academic Press, Inc., San Diego, CA, 1990, 14–37.
46. Summers, D. K. The kinetics of plasmid loss. TIBTECH 9, 273–278, 1991.
47. Summers, D. Timing, self-control and a sense of direction are the secrets of multicopy plasmid stability. Mol. Microbiol. 29 (5), 1137–1145, 1998.
48. Noack, D., Roth, M., Geuther, R., Muller, G., Undisz, K., Hoffmeier, C., and Gaspar, S. Maintenance and genetic stability of vector plasmids pBR322 and pBR325 in Escherichia coli K12 strains grown in a chemostat. Mol. Gen. Genet. 184, 121–124, 1981.
49. Jones, S. A. and Melling, J. Persistence of pBR322-related plasmids in Escherichia coli grown in chemostat cultures. FEMS Microbiol. Lett. 22, 239–243, 1984.

6-16

Gene Expression Tools for Metabolic Pathway Engineering

50. Chotani, G., Dodge, T., Hsu, A., Kumar, M., LaDuca, R., Trimbur, D., Weyler, W., and Sanford, K. The commercial production of chemicals using pathway engineering. Biochim. Biophys. Acta 1543, 434–455, 2000. 51. Shi, J. and Biek, D. P. A versatile low-copy-number cloning vector derived from plasmid F. Gene 164, 55–58, 1995. 52. Kahar, P., Agus, J., Kikkawa, Y., Taguchi, K., Doi, Y., and Tsuge, T. Effective production and kinetic characterization of ultra-high-molecular-weight poly[(R)-3-hydroxybutyrate] in recombinant Escherichia coli. Polym. Degrad. Stab. 87, 161–169, 2005. 53. Leonard, E., Yan, Y., and Koffas, M. Functional expression of a P450 flavonoid hydroxylase for the biosynthesis of plant-specific hydroxylated flavonols in Escherichia coli. Metab. Eng. 8, 172–181, 2005. 54. Snell, K. D., Draths, K. M., and Frost, J. W. Synthetic modification of the Escherichia coli chromosome: enhancing the biocatalytic conversion of glucose into aromatic chemicals. J. Am. Chem. Soc. 118, 5605–5614, 1996. 55. Barker, J. L. and Frost, J. W. Microbial synthesis of p-hydroxybenzoic acid from glucose. Biotechnol. Bioeng. 76 (4), 376–390, 2001. 56. Yuan, L. Z., Rouviere, P. E., LaRossa, R. A., and Suh, W. Chromosomal promoter replacement of the isoprenoid pathway for enhancing carotenoid production in E. coli. Metab. Eng. 8, 79–80, 2006. 57. Wang, C-W., Oh, M.-K., and Liao, J. C. Engineered isoprenoid pathway enhances astaxanthin production in Escherichia coli. Biotechnol. Bioeng. 62 (2), 235–241, 1999. 58. Kim, S-W. and Keasling, J. D. Metabolic engineering of the nonmevalonate isopentenyl diphosphate synthesis pathway in Escherichia coli enhances lycopene production. Biotechnol. Bioeng. 72 (4), 408–415, 2001. 59. Alper, H., Fischer, C., Nevoigt, E., and Stephanopoulos, G. Tuning genetic control through promoter engineering. Proc. Natl. Acad. Sci. USA 102 (36), 12678–12683, 2005. 60. Martin, V. J. J., Pitera, D. J., Withers, S. T., Newman, J. D., and Keasling, J. D. 
Engineering a mevalonate pathway in Escherichia coli for production of terpenoids. Nature Biotech. 21 (7), 796–802, 2003. 61. Berry, A., Dodge, T. C., Pepsin, M., and Weyler, W. Application of metabolic engineering to improve both the production and use of biotech indigo. J. Ind. Microbiol. Biotechnol. 28, 127–133, 2002. 62. Cottaz, S. and Samain, E. Genetic engineering of Escherichia coli for the production of NI,NIIdiacetylchitobiose (chitinbiose) and its utilization as a primer for the synthesis of complex carbohydrates. Metab. Eng. 7, 311–317, 2005. 63. Sharfstein, S. T. and Keasling, J. D. Polyphosphate metabolism in Escherichia coli. Ann. NY Acad. Sci. 745, 77–91, 1994. 64. Harker, M. and Bramley, P. M. Expression of prokaryotic 1-deoxy-D-xylulose-5-phosphatases in Escherichia coli increases carotenoid and ubiquinone biosynthesis. FEBS Lett. 448, 115–119, 1999. 65. Sprenger, G. A., Schörken, U., Wiegert, T., Grolle, S., Graaf, A. A. d., Taylor, S. V., Begley, T. P., Bringer-Meyer, S., and Sahm, H. Identification of a thiamin-dependent synthase in Escherichia coli required for the formation of the 1-deoxy-D-sylulose 5-phosphate precursor to isoprenoids, thiamin, and pyridoxol. Proc. Natl. Acad. Sci. USA 94, 12857–12862, 1997. 66. Lois, L. M., Campos, N., Putra, S. R., Danielsen, K., Rohmer, M., and Boronat, A. Cloning and characterization of a gene from Escherichia coli encoding a transketolase-like enzyme that catalyzes the synthesis of D-1-deoxyxylulose 5-phosphate, a common precursor of isoprenoid, thiamin, and pyridoxol biosynthesis. Proc. Natl. Acad. Sci. USA 95, 2105–2110, 1998. 67. Yamamoto, K., Kataoka, E., Miyamoto, N., Furukawa, K., Ohsuye, K., and Yabuta, M. Genetic engineering of Escherichia coli for production of tetrahydrobiopterin. Metab. Eng. 5, 246–254, 2003. 68. Carter, O. A., Peters, R. J., and Croteau, R. Monoterpene biosynthesis pathway construction in Escherichia coli. Phytochem. 64, 425–433, 2003. 69. Fidler, S. and Dennis, D. 
Polyhydroxybutyrate production in recombinant Escherichia coli. FEMS Microbiol. Rev. 103, 231–36, 1992.

Low-Copy Number Plasmids as Artificial Chromosomes

6-17

70. Carrier, T. A. and Keasling, J. D. Controlling messenger RNA stability in bacteria: strategies for engineering gene expression. Biotechnol. Prog. 13, 699–708, 1997. 71. Carrier, T. A. and Keasling, J. D. Engineering mRNA stability in E. coli by the addition of synthetic hairpins using a 5’ cassette system. Biotechnol. Bioeng. 55 (3), 577–580, 1997. 72. Carrier, T., Jones, K. L., and Keasling, J. D. mRNA stability and plasmid copy number effects on gene expression from an inducible promoter system. Biotechnol. Bioeng. 59 (6), 666–672, 1998. 73. Carrier, T., Jones, K. L., and Keasling, J. D., unpublished results. 74. Smolke, C. D. and Keasling, J. D. Effect of copy number and mRNA processing and stabilization on transcript and protein levels from an engineered dual-gene operon. Biotechnol. Bioeng. 78 (4), 412–424, 2002. 75. Farmer, W. R. and Liao, J. C. Improving lycopene production in Escherichia coli by engineering metabolic control. Nat. Biotechnol. 18, 533–537, 2000. 76. Elowitz, M. B. and Leibler, S. A synthetic oscillatory network of transcriptional regulators. Nature 403, 335–338, 2000. 77. Fung, E., Wong, W. W., Suen, J. K., Bulter, T., Lee, S-g., and Liao, J. C. A synthetic gene-metabolic oscillator. Nature 435, 118–122, 2005. 78. Hosoda, F., Nishimura, S., Uchida, H., and Ohki, M. An F factor based cloning system for large DNA fragments. Nucleic Acids Res. 18 (13), 3863–3869, 1990. 79. Shizuya, H., Birren, B., Kim, U-J., Mancino, V., Slepak, T., Tachiri, Y., and Simon, M. Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc. Natl. Acad. Sci. USA 89, 8794–8797, 1992. 80. Anderson, C. Genome shortcut leads to problems. Science 259, 1684–1687, 1993. 81. Green, E. D., Riethman, H. C., Dutchik, J. E., and Olson, M. V. Detection and characterization of chimeric yeast artificial-chromosome clones. Genomics 11 (3), 658–669, 1991. 82. Zhang, H-B. and Wing, R. A. 
Physical mapping of the rice genome with BACs. Plant Mol. Biol. 35, 115–127, 1997. 83. Tian, J., Gong, H., Sheng, N., Zhou, X., Gulari, E., Gao, X., and Church, G. Accurate multiplex gene synthesis from programmable DNA microchips. Nature 432, 1050–1054, 2004. 84. Tao, Q. and Zhang, H-B. Cloning and stable maintenance of DNA fragments over 300kb in Escherichia coli with conventional plasmid-based vectors. Nucleic Acids Res. 26 (21), 4901–4909, 1998. 85. Chang, T-S., Wu, W-J., Shiu, T-R., and Wu, W-T. High-level expression of a IacZ gene from a bacterial artificial chromosome in Escherichia coli. Appl. Microbiol. Biotechnol. 61, 234–239, 2003. 86. Holloway, B. and Low, K. B. F-prime and R-prime factors. In Escherichia coli and Salmonella: Cellular and Molecular Biology, 2d ed., Neidhardt, F. C., III, R. C., Ingraham, J. L., Lin, E. C. C., Low, K. B., Magasanik, B., Reznikoff, W. S., Riley, M., Schaechter, M., and Umbarger, H. E. Eds. ASM Press, Washington, DC, 1996, 2413–2420. 87. Rondon, M. E., August, P. R., Bettermann, A. D., Brady, S. F., Grossman, T. H., Liles, M. R., Loiacono, K. A., Lynch, B. A., MacNeil, I. A., Minor, C., Tiong, C. L., Gilman, M., Osburne, M. S., Clardy, J., Handelsman, J., and Goodman, R. M. Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl. Environ. Microbiol. 66 (6), 2541–2547, 2000. 88. Wexler, M., Bond, P. L., Richardson, D. J., and Johnston, A. W. B. A wide host-range metagenomic library from a waste water treatment plant yields a novel alcohol/aldehyde dehydrogenase. Environ. Microbiol. 7 (12), 1917–1926, 2005. 89. Lane, D. J., Pace, B., Olsen, G. J., Stahl, D. A., Sogin, M. L., and Pace, N. R. Rapid determination of 16S ribosomal RNA sequences for phylogenetic analyses. Proc. Nat. Acad. Sci. USA 82, 6955–6959, 1985. 90. Rondon, M. E., Raffel, S. J., Goodman, R. M., and Handelsman, J. 
Toward functional genomics in bacteria: analysis of gene expression in Escherichia coli from a bacterial artificial chromosome library of Bacillus cereus. Proc. Natl. Acad. Sci. USA 96, 6451–6455, 1999.

7 Chromosomal Engineering Strategies

Kenan C. Murphy
University of Massachusetts Medical School

7.1 Introduction
    Mechanism of Red/ET Recombination and Recombineering • Development of Recombineering Technology • Materials and Methods • Recombineering with Single-Stranded DNA Oligos • Counter-Selection Schemes • Flp and Cre Site-Specific Recombination (SSR) Systems
7.2 Strategies of λ Red-Promoted and SSR-Mediated Modification of the Bacterial Chromosome
    Gene or Operon Replacement • SSR-Mediated Excision of Drug Markers • Large Deletions • Seamless Deletions • Insertions • Duplications and Inversions • Reporter Fusions • Random Mutagenesis of Chromosomal Genes • Promoter Engineering • Genome Reduction
7.3 Large Insertions
7.4 Applications of Red/ET Recombineering Technology for Metabolic Engineering
7.5 Summary
References

7.1 Introduction

Central to the concept of metabolic engineering is the modification of a microorganism’s genome to help overproduce a target metabolite, inhibit the formation of an undesirable byproduct, and/or improve the fitness of the microorganism for industrial applications. In the early years of metabolic engineering, genetic modification was carried out randomly by chemical or transposon mutagenesis, followed by screening or selecting for a desired phenotype. With the onset of recombinant DNA technology, the addition, deletion, or modification of specific genes or operons began to be carried out in a systematic manner. Since the early 1990s, researchers employing Saccharomyces cerevisiae have enjoyed a distinct advantage in chromosomal engineering strategies, since the high rate of homologous recombination in yeast afforded the means to generate gene deletions via transformation of a PCR product [1–3].

Alterations to bacterial genomes, especially with non-E. coli bacteria, were not as easily performed. Bacterial gene replacements typically involved transformation of a nonreplicating plasmid containing a deleted or modified gene, followed by a low-frequency plasmid integration event into the chromosome. Selection or screening for resolution events was followed by phenotypic screens and/or Southern blots to identify gene replacement candidates. Studies of homologous recombination pathways in E. coli led to the development of strains that were genetically modified to promote transformation with linear DNA via the RecBC(D⁻), RecE, or RecF pathways [4–7]. However, recombinant transformants required large amounts of homology to the target gene (1–2 kb) and often did not occur at high frequency. Furthermore, transformation with linear DNA substrates was restricted to these recombination-proficient strains; wild type E. coli could not be efficiently transformed with linear DNA fragments, let alone small PCR-generated substrates.

Figure 7.1 Red/ET recombineering. Primers 5KO and 3KO are used to generate a PCR product that contains a drug marker flanked by 40 bp of target sequence. The PCR product is purified and electroporated into E. coli containing the λ Red + Gam (or RecET + λ Gam) recombination system. After the cells are grown for 1–2 hours, the culture is plated on antibiotic-selection media to recover drug-resistant transformants. The gene replacement can be verified phenotypically, or by PCR using primers upstream (U), downstream (D), or within the drug marker (M).

What is it about wild type E. coli that prevents efficient transformation with linear DNA? Linear DNA substrates are destroyed by the RecBCD dsDNA exonuclease [8,9]. Paradoxically, this enzyme is the major component of E. coli’s principal pathway of recombination (the RecBCD pathway), but it works efficiently only on DNA substrates of 50–100 kb (the typical size of DNA transferred during conjugation and transduction). Conversion of RecBCD from a destructive exonuclease into a recombination-promoting, RecA-loading helicase depends on its interaction with Chi sites (crossover hotspot instigator, GCTGGTGG), which occur on average every 5 kb in the E. coli genome. Small linear DNA substrates of about 1 kb (like PCR products) are unlikely to carry the right number of Chi sites (at least one at each end) in the right orientation (Chi is active in only one orientation) to be activated for recombination by RecBCD. Interestingly, if one designs linear DNAs to contain such properly oriented Chi sites, gene replacement using such substrates is elevated to levels reaching one event per 10⁵ cells, as was shown by Dabert and Smith [10].

Starting nine years ago, however, a new technology for highly efficient gene replacement in E. coli emerged from the recognition that the bacteriophage λ Red and RecET recombination systems are very efficient at crossing small linear substrates into the E. coli chromosome [11–17].
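Returning to the Chi-site requirement described above for RecBCD, the point about site number and orientation can be checked on any candidate substrate. The following is a minimal Python sketch, not taken from the chapter, that lists every occurrence of the Chi octamer GCTGGTGG on either strand of a linear DNA sequence. The function names are hypothetical, and the sketch only reports position and strand; deciding whether a given site is productively oriented relative to a particular DNA end is left to the reader.

```python
# Minimal sketch (assumption: sequence is given as a top-strand string, 5'->3').
CHI = "GCTGGTGG"  # Chi octamer as quoted in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_chi_sites(seq: str) -> dict:
    """Return 0-based start positions of Chi on the top strand ('+')
    and of its reverse complement, i.e. Chi on the bottom strand ('-')."""
    seq = seq.upper()
    hits = {"+": [], "-": []}
    for strand, motif in (("+", CHI), ("-", reverse_complement(CHI))):
        start = seq.find(motif)
        while start != -1:
            hits[strand].append(start)
            start = seq.find(motif, start + 1)  # allow overlapping matches
    return hits


# Hypothetical toy substrate: one Chi near each end, in opposite orientations.
substrate = "A" * 100 + CHI + "T" * 200 + reverse_complement(CHI) + "C" * 50
print(find_chi_sites(substrate))
```

A typical PCR product of about 1 kb would be expected to return empty lists for both strands, which is consistent with the argument that such substrates are degraded rather than recombined by RecBCD.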
The λ Red and RecET systems are notable for their ability to cross PCR-generated substrates bearing short regions of homology to the target gene (40–50 bp) into bacterial chromosomes, allowing bacterial geneticists to perform the PCR-mediated gene replacement that yeast geneticists have enjoyed for years (see Figure 7.1). The use of phage recombination systems acting on PCR-generated substrates to promote engineering of bacterial chromosomes and BACs has been termed recombineering [18] and presents new options for the cloning and manipulation of DNA [19]. It has altered the way investigators approach chromosomal engineering strategies, and is the focus of the methodology described in this chapter.
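The substrate design summarized in Figure 7.1 can be sketched in a few lines of Python. The sketch below is illustrative only and makes several assumptions not stated in this section: the chromosome is supplied as a top-strand string, the target is a 0-based half-open interval, and each primer ends in a 20-base priming sequence for the drug-marker template (the primer layout given in Section 7.1.3.2). The function name, the toy chromosome, and the assignment of the two Tn9 cat priming sequences from Table 7.2 to the 5KO and 3KO primers are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_replacement_primers(chromosome, gene_start, gene_end,
                               marker_fwd, marker_rev, flank=40):
    """Build the two long primers for a drug-marker replacement substrate.

    chromosome  : top-strand sequence, 5'->3' (treated as linear in this sketch)
    gene_start  : 0-based index of the first base to be replaced
    gene_end    : 0-based index one past the last base to be replaced
    marker_fwd  : priming sequence for one end of the marker, written 5'->3'
    marker_rev  : priming sequence for the other end, written 5'->3'
    flank       : length of target homology carried on each primer's 5' tail
    """
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    if gene_start < flank or gene_end + flank > len(chromosome):
        raise ValueError("not enough flanking sequence on one side of the target")

    upstream = chromosome[gene_start - flank:gene_start].upper()
    downstream = chromosome[gene_end:gene_end + flank].upper()

    # 5KO: upstream homology (top strand) followed by the forward priming site.
    # 3KO: downstream homology (bottom strand) followed by the reverse priming site.
    return {
        "5KO": upstream + marker_fwd.upper(),
        "3KO": reverse_complement(downstream) + marker_rev.upper(),
    }


# Hypothetical toy example: replace bases 100-139 of a made-up 240-base sequence.
toy_chromosome = "ACGTTGCA" * 30
primers = design_replacement_primers(
    toy_chromosome, 100, 140,
    marker_fwd="TGAGACGTTGATCGGCACGT",  # assumed forward priming site (Tn9 cat)
    marker_rev="ATTCAGGCGTAGCACCAGGC",  # assumed reverse priming site (Tn9 cat)
)
for name, seq in primers.items():
    print(name, len(seq), seq)
```

With 40-base flanks and 20-base priming sites, each primer comes out as a 60-mer. A real design would additionally handle a circular chromosome and check the primers for secondary structure, which this sketch does not attempt.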

7.1.1 Mechanism of Red/ET Recombination and Recombineering

Two phage recombination systems have been used to promote recombineering in E. coli and related bacteria: the λ Red system and the rac prophage RecET system. Hence, the term Red/ET recombineering is used here to denote use of either system to promote chromosomal gene replacements and alterations. The λ Red system (initially identified through recombination-defective λ phage) consists of three genes: exo, bet, and gam (originally described as redα, redβ, and redγ) [20]. The exo gene encodes λ exonuclease, a trimeric 5′-3′ dsDNA exonuclease that generates a 3′-ended ssDNA tail [21,22], a common intermediate in virtually all recombination pathways. The Bet protein, which forms multimeric rings (12–18 subunits per ring), binds ssDNA, promotes the annealing of complementary ssDNA strands, and can promote limited strand exchange [23–26]. The Exo and Bet proteins form a complex in vitro and presumably work in concert in vivo to prepare dsDNA ends for recombination. The λ Gam protein plays an important role in recombineering by binding to RecBCD and preventing it from binding to dsDNA ends [27], thus inhibiting all of the enzyme’s activities [28,29]. This inactivation of RecBCD allows Exo and Bet to work unimpeded, which in turn promotes high-frequency linear DNA recombination.

The RecE and RecT rac prophage recombination functions are biochemically similar to λ Exo and Bet, respectively. Gene replacement in E. coli using PCR products with short regions of homology was first demonstrated with RecET [12], but the λ Red functions have been claimed to be up to three-fold more efficient than RecET [19]. The RecT and λ Bet single-strand annealing proteins are members of the same superfamily [30], but share little sequence similarity. RecE exonuclease (originally identified as ExoVIII of E. coli) and λ Exo both degrade the 5′ strand of dsDNA ends and prefer dsDNA to ssDNA.
However, the proteins share no sequence similarity, are immunologically distinct [31], and differ in size: λ Exo has a monomer MW of 25.9 kD and has been crystallized as a trimer [22], whereas RecE exonuclease is a monomer of 140 kD [32].

Genetic analyses of phage λ crosses have demonstrated that Red acts at dsDNA breaks in vivo, and have led to a model whereby Bet acts on the 3′ ssDNA generated by Exo and promotes strand invasion and/or strand annealing, depending on the nature of the substrates involved, the presence or absence of RecA, and whether or not replication is ensuing [33–37]. These studies have led to two models of Red-promoted recombination. In Figure 7.2a, Red-promoted recombination is assisted by RecA (as well as a number of other RecF pathway and phage functions, not shown) and involves 3′ ssDNA invasion into a homologous duplex. Branch migration proteins (RuvAB/RecG) lead to the formation of a classical Holliday junction, which serves as a substrate for RuvC resolvase. In the absence of replication, RecA is strictly required. The role of Red in this pathway may simply be to provide the ssDNA for RecA.

An alternative model of Red recombination (Figure 7.2b) is the ssDNA-annealing pathway. In this pathway of recombination, λ Red acts on dsDNA ends containing overlapping sequences (such as might occur on the ends of circularly permuted phage chromosomes). In this scenario, the processive action of Exo degrades the 5′ strand at the sites of entry, and Bet binds (or is loaded by Exo) to the ssDNA ends generated. Bet then aligns and anneals the ssDNA substrates containing homologous regions of the overlap. In this case, RecA is not required (as no strand invasion is necessary). This model of λ Red recombination is based largely on the in vitro properties of the Exo and Bet proteins [21,23,24].
Key elements of this model were demonstrated by Red-dependent packaging of phage following transfection of recB spheroplasts with sheared (half-length) molecules of the λ chromosome [38], and by physical analysis of recombinant products generated by cutting two parental nonreplicating λ chromosomes at nonallelic sites in recA hosts [37]. However, attempts to reconstitute this reaction in vitro have failed to provide strong biochemical support for this model (Poteete and Murphy, unpublished observations). Instead, an alternative model suggests that Red generates a dsDNA end with a Bet-coated ssDNA tail, which then serves as a substrate for the phage chromosome to invade the replication fork and promote strand switching by DNA polymerase [39]. This model was inspired by the observation of Ellis et al. [40], who showed that Bet can promote annealing of ssDNA oligos to the E. coli chromosome. In that study, oligos that targeted the lagging strand of the replication fork were more efficient for oligo-mediated repair than those that targeted the leading strand, suggesting that ssDNA regions of the replication fork are targets for Bet-promoted ssDNA annealing activity.

Figure 7.2 Three pathways of λ Red-promoted recombination. (a) RecA-dependent pathway. At a dsDNA end, λ Exo (toroid) digests the 5′ strand of the dsDNA duplex. RecA (triangle) promotes strand invasion of the 3′ ssDNA into a homologous duplex. The exact role of Bet (circle) is not known, but it is thought to protect the 3′ ends generated by λ Exo. Bet is then (presumably) replaced by RecA in a RecFOR-dependent mechanism. Branch migration and resolution of the Holliday junction are carried out by host recombination functions (e.g., RuvABC). Only one resolution product is shown. (b) Single-stranded DNA annealing (SSA) pathway. Bet and Exo act on dsDNA ends that share terminal homology. Exo produces 3′ ssDNA tails and Bet anneals the overlapping ssDNA regions; PolI then fills the gaps and DNA ligase seals the nicks. (c) Red-promoted recombineering. λ Exo’s 5′ nuclease activity generates ssDNA, which serves as a substrate for Bet-promoted annealing to ssDNA in the lagging strand of a replication fork. The “x” in the recombinant represents a deletion, an insertion, or a point mutation. The model shown is a proposed mechanism for oligo-mediated recombineering. The mechanism for recombineering with substrates containing a region of nonhomology (e.g., a PCR product containing a drug marker flanked by 50 bp of target DNA) is not known, but is thought to target a replication fork by a similar type of mechanism.

Recombineering with PCR substrates containing short regions of homology is independent of many of the host recombination functions (including RecA) and thus cannot proceed via the model shown in Figure 7.2a. While the actual mechanism is not known, Figure 7.2c suggests a model whereby Exo generates either a fully ssDNA substrate coated with Bet (in this case, the action of Exo is limited to one end) or a dsDNA substrate containing Bet-coated ssDNA tails (with Exo acting at both ends of the linear substrate; not shown). In either case, Bet-ssDNA complexes are targeted to regions of the replication fork, where they are incorporated at high frequency. Here, the “x” in the recombinant (Figure 7.2c) represents a deletion, an insertion, or a point mutation. This model is supported by the recent finding that a biotinylated single-stranded DNA oligo is directly incorporated into a “repaired” plasmid following λ Red-mediated repair of a defective chloramphenicol acetyl transferase (cat) gene [41].
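The lagging-strand preference reported by Ellis et al. [40] has a practical consequence for oligo design: which strand an oligo should correspond to depends on the direction in which the replication fork passes through the target. The following Python sketch shows one way to make that choice on a circular chromosome. It rests on assumptions that the chapter does not spell out: replication is taken to proceed bidirectionally from a single origin to a single terminus, the coordinates of both are supplied by the user, and the strand rule is ordinary replication-fork geometry (the lagging-strand template runs 5′ to 3′ in the direction of fork movement). All function names and the toy coordinates are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def fork_moves_up(pos: int, ori: int, ter: int, genome_len: int) -> bool:
    """True if the fork that replicates `pos` travels toward increasing
    coordinates, i.e. `pos` lies on the arc running from ori up to ter
    (wrapping around the end of the circular map if necessary)."""
    return (pos - ori) % genome_len < (ter - ori) % genome_len


def lagging_strand_oligo(top_strand_window: str, pos: int,
                         ori: int, ter: int, genome_len: int) -> str:
    """Return the oligo sequence (5'->3') that anneals to the
    lagging-strand template at `pos`.

    top_strand_window : top-strand sequence of the target region, already
                        carrying the desired change
    """
    if fork_moves_up(pos, ori, ter, genome_len):
        # Fork moves toward higher coordinates: the top strand is the
        # lagging-strand template, so the oligo must be its complement.
        return reverse_complement(top_strand_window)
    # Fork moves toward lower coordinates: the bottom strand is the
    # lagging-strand template, so the oligo has the top-strand sequence.
    return top_strand_window.upper()


# Hypothetical toy genome of 1000 bp with ori at 800 and ter at 300.
print(lagging_strand_oligo("AAGC", pos=900, ori=800, ter=300, genome_len=1000))
print(lagging_strand_oligo("AAGC", pos=500, ori=800, ter=300, genome_len=1000))
```

Taking the origin and terminus as parameters keeps the sketch independent of any particular genome annotation; real coordinates would have to be taken from the sequence record of the strain in use.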

7.1.2 Development of Recombineering Technology

The early years of λ red genetics, and the development of the break-join model of λ Red recombination, are nicely reviewed by Stahl [42]. That same year, chromosomal gene replacement using the λ red and gam functions was first demonstrated using drug markers flanked by large regions of homology (~1 kb) [11]. A strain that had the recBCD chromosomal region replaced with a Plac-bet-exo operon generated recombination with linear DNA substrates at a rate 50-fold higher than the standard recombination-proficient strain JC9387 (recBC sbcCD), a RecF pathway E. coli strain typically used at the time for gene replacement. (In my laboratory, the use of the RecF pathway to generate the ∆recBCD::Plac-bet-exo allele was the last time the RecF pathway was used for gene replacement.)

Zhang and coworkers, using the Rac prophage RecET recombination system (accompanied by λ Gam), were the first to recognize that PCR-generated substrates with as little as 42 bp of homology could serve as appropriate substrates for recombination [12]. They demonstrated targeting of both plasmid and chromosomal DNA in E. coli, as well as the use of the site-specific recombinases Flp and Cre to generate marker-free deletions. The Stewart lab has gone on to show that RecET can be used to modify bacterial artificial chromosomes (BACs) [19] and to promote DNA cloning of chromosomal fragments [13] (an extension of the RecET-promoted in vivo cloning of PCR products described by Oliner et al. [43]), and that λ phage annealing functions can promote single-stranded oligo (SSO)-mediated repair in E. coli and mouse ES cells [44].

Concentrating more on the λ Red system, Datsenko and Wanner generated plasmid pKD46, which drives the red and gam functions from the controllable PBAD promoter, and also described the use of FRT sites to generate markerless deletions [15]. Yu et al. [14] developed a Red recombineering strain in which the red and gam functions are expressed from a defective prophage (from which the lysis, replication, and structural functions of the λ prophage have been removed). The defective prophage carries the temperature-sensitive cI repressor allele cI857; thus, red and gam are induced by heating to 42°C for 15 minutes during preparation of the cells for electroporation. The tight control of the red and gam functions in this system has the potential to restrict their expression to a “recombinogenic window,” preventing the generation of unwanted recombinational events, especially with highly susceptible BACs that contain large DNA inserts.

The heating step, besides allowing for controlled expression of the red and gam functions, also offers an advantage for the recovery of recombinants. Murphy and Campellone have found that heat shocking E. coli cells at 42°C for 15 minutes (in this case, where plasmid-borne red and gam are expressed from the Ptac promoter) generates levels of total recombinants that are two- to ten-fold higher [45]. It may be that the heat-shock step prepares the cells for the electroporation shock, allowing higher rates of cell survival and thus higher numbers of total recombinants, though its actual effect is not known. The prophage system for λ Red delivery has been used to generate precise and unmarked modification of BACs [46], to demonstrate the efficacy of Bet-promoted single-stranded oligo (SSO) DNA-mediated chromosomal modification [40], and to identify mismatch repair-deficient genetic backgrounds (e.g., mutS) as hosts that generate high levels of SSO-mediated DNA repair (up to 25% of cells surviving electroporation; see below) [47].

Many researchers have used the Red/RecET recombineering technology to manipulate bacterial chromosomes in a variety of interesting ways. What follows is a basic description of the materials and methods used for Red/RecET recombineering, and examples of how the technology has been (and can be) used to generate specific types of chromosomal modifications.

7.1.3 Materials and Methods

7.1.3.1 Source of λ Red

By far, the simplest way to deliver the λ red and gam functions is by expression from a controllable low-copy-number plasmid containing a temperature-sensitive origin of replication. The red and gam functions should be inducible, since constitutive expression of the Red and Gam proteins at high levels can be toxic and mutagenic [45,48]. The temperature-sensitive ori serves as a means to cure the host of the Red plasmid following gene replacement of the target gene.

Table 7.1 Plasmids Expressing λ red and gam

Plasmid       Operon                   Drug Marker  Reference  Source*
pKD46         PBAD-gam-bet-exo         AmpR         [15]       http://cgsc.biology.yale.edu/
pSIM5         PL-gam-bet-exo           CamR         [48]       [email protected]
pKM208        Ptac-gam-bet-exo         AmpR         [44]       www.addgene.org
pRed/ET       PBAD-gam-bet-exo-recA    TetR         [49]       www.genebridges.com
pTP223        Plac-gam-bet-exo         TetR         [119]      www.addgene.org
pKOBEG-SacB   PBAD-gam-bet-exo-sacB    AmpR         [120]      [email protected]

*Other plasmid constructs, with different drug markers, are also available from most of these resources.

Table 7.1 lists six sources of red- and gam-expressing plasmids from various labs. Plasmid pKD46 (Wanner lab) has been used extensively and contains the red and gam genes under control of PBAD; red and gam are thus inducible with arabinose [15]. In addition, pKD46 expresses araC, the repressor/activator for PBAD, thus allowing control of red and gam gene expression in various backgrounds. The pSIM5 vector (Court lab) was generated from λ phage sequences and contains the red and gam functions driven by the endogenous (strong) PL promoter [49]. The plasmid contains λ sequence devoid of genes between N and kil (including the transcription terminators), making expression of red and gam wholly dependent on the thermolabile cI857 λ repressor. Thus, the recombination functions are highly repressed at 30°C and induced by a 15-minute incubation at 42°C. In side-by-side comparisons, pSIM5 generated nine-fold more recombinants per 10⁸ viable cells than a pKD46 derivative, pKD119 [49].

In plasmid pKM208 (Murphy lab), the red and gam functions are driven by Ptac, and are thus inducible with IPTG [45]. pKM208 worked better than pKD46 in pathogenic strains of E. coli, probably because of more efficient expression from Ptac relative to PBAD in this background (unpublished observation). Also, pKM208 contains the lacI repressor gene, allowing for IPTG-controlled expression in various host backgrounds. The pRed/ET vector (Stewart lab) contains red and gam driven by PBAD and araC, but in addition expresses the E. coli recA function [50]. The transient expression of recA allows for increased viability, and thus increased recovery of total recombinants, when working in recA host backgrounds. Such backgrounds are typically used when engineering BACs in order to prevent unwanted DNA rearrangements.
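Plasmid comparisons such as the one quoted above are expressed as recombinants per 10⁸ viable cells. A minimal Python sketch of that normalization follows, assuming colony counts from a selective plate (recombinants) and a nonselective plate (viable cells surviving electroporation). The function names and the example colony counts are invented for illustration and are not data from the chapter.

```python
def cells_per_ml(colonies: int, dilution_factor: float, plated_volume_ml: float) -> float:
    """Convert a plate count to a titer.

    dilution_factor  : total fold dilution before plating
                       (1 for undiluted, 1e5 for a 10^-5 dilution)
    plated_volume_ml : volume spread on the plate, in ml
    """
    if plated_volume_ml <= 0:
        raise ValueError("plated_volume_ml must be positive")
    return colonies * dilution_factor / plated_volume_ml


def recombinants_per_1e8_viable(recombinants_per_ml: float, viable_per_ml: float) -> float:
    """Normalize a recombinant titer to 10^8 viable cells."""
    if viable_per_ml <= 0:
        raise ValueError("viable_per_ml must be positive")
    return recombinants_per_ml / viable_per_ml * 1e8


# Hypothetical counts: 150 colonies from 0.1 ml undiluted on selective medium,
# 200 colonies from 0.1 ml of a 10^-5 dilution on nonselective medium.
recombinant_titer = cells_per_ml(150, 1, 0.1)
viable_titer = cells_per_ml(200, 1e5, 0.1)
print(recombinants_per_1e8_viable(recombinant_titer, viable_titer))
```

Normalizing to viable cells rather than to input cells separates recombination efficiency from losses caused by electroporation, which matters when comparing induction regimes that affect survival.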
Two other plasmids used for recombineering and listed in Table 7.1 are noteworthy. The plasmid pTP223 (Poteete lab) is a multi-copy vector (colE1 ori) that drives red and gam from the Plac promoter, is TetR and expresses lacI. While it generates lower frequencies of recombineering relative to the other plasmids (perhaps because it makes too much Red proteins), it was able to transform enteropathogenic E. coli (when pSC101 replicons could not) and promoted low but measurable recombineering events in this pathogen with short homology substrates [45]. pKOBEG-sacB has been used for recombineering of Yersinia pseudotuberculosis [51]. Gene replacement occurred readily with plasmid-derived linear substrates containing 0.5 kb of flanking target homology, but with low frequency with short homology substrates. In this bacterium, temperature-sensitive pSC101 replicons could not be easily cured, so this plasmid includes sacB to allow curing by growth on sucrose. 7.1.3.2 Preparation of Recombineering Substrates The extreme ease and simplicity of recombineering technology is based on the use of PCR products as gene replacement substrates. A number of different E. coli drug markers (see Table 7.2) can serve as templates for a PCR. Template sequences can also include site-specific recombination sites (e.g., FRT or loxP sites) for removal of the drug marker following chromosomal incorporation [15], or origins of transfer (oriT) for moving modified plasmid or cosmid constructs into non-E. coli hosts [52]. Typically, two 60 mer primers are used to amplify a drug marker. The drug marker (including its promoter) is targeted

Table 7.2 Drug Cassettes Used for λ Red Recombineering

Antibiotic | Gene(s) | Primer Pair | Suggested Template (Source) | Drug Conc. (µg/ml) | Cassette Length (Base Pairs)
Kanamycin | Tn903 aph (type I) | CACGTTGTGTCTCAAAATCTC / TACAACCAATTAACCAATTCTG | pTP858 (www.addgene.com) | 20 | 944
Kanamycin | Tn5 aph (type II) | TATGGACAGCAAGCGAACCG / TCAGAAGAACTCGTCAAGAAG | pBBR1MCS-2 [121] | 20 | 949
Ampicillin | Tn3 bla | CGCGGAACCCCTATTTGTTT / GGTCTGACAGTTACCAATGC | pBR322 (New England Biolabs) | 50 | 975
Tetracycline | Tn10 tetRA | CTCGACATCTTGGTTACCGT / CGCGGAATAACATCATTTGG | pTP857 (www.addgene.com) | 7 | 1996
Chloramphenicol | Tn9 cat | TGAGACGTTGATCGGCACGT / ATTCAGGCGTAGCACCAGGC | pACYC184 (New England Biolabs) | 10 | 822
Spectinomycin | Tn21 aadA | AAACGGATGAAGGCACGAA / TTATTTGCCGACTACCTTGG | pGB2 (www.addgene.com) | 20 | 1080
Gentamicin | Tn1696 aacC | CGAATCCATGTGGGAGTTTA / TTAGGTGGCGGTACTTGGGT | strain TP997 (www.addgene.com) | 10 | 616

Source: Poteete, A.R., C. Rosadini, and C. St. Pierre, Biotechniques, 41, 261–264, 2006. With permission.

with 20 bases at the 3′ end of these primers. The 5′ tails of the primers (~40 bases) contain sequences upstream and downstream of the target gene (see Figure 7.1). The PCR product, which has the drug marker flanked by 40 bases upstream and downstream of the gene of interest, is purified using a PCR cleaning kit, suspended in a small volume of TE buffer or water, and electroporated into an E. coli strain that expresses the λ red and gam functions (as described below).

7.1.3.3 Preparation of Electrocompetent/Recombinogenic E. coli Cells

Below is the protocol we have used for preparing electrocompetent cells containing pKM208; it can be modified accordingly for the various red- and gam-expressing constructs listed in Table 7.1. We typically grow 20 ml of cells containing pKM208 (enough for two electroporations) in a 125-ml flask at 30°C; multiple flasks are used depending on the number of electroporations required. A single colony (or 100 µl of an overnight culture) is used to inoculate 20 ml of LB containing 100 µg/ml ampicillin; the culture is incubated at 30°C with shaking. When the culture reaches 10⁷ cells/ml (slight cloudiness, around 0.5–1 hour prior to collection), IPTG is added to a final concentration of 1 mM. When the culture reaches 0.5–1 × 10⁸ cells/ml, the cells are heat shocked for 15 minutes by swirling at 42°C, transferred to an ice-water bath for 10 minutes with swirling, then collected by centrifugation. The cells are resuspended in 1 ml of ice-cold 20% glycerol–1 mM MOPS (unbuffered), transferred to a 1.5 ml sterile Eppendorf tube, and spun in a microfuge for 30 seconds (moderate speed). The supernatant is removed by decanting without disturbing the pellet, followed by removal of residual buffer with a pipet tip. The cells are resuspended in 1 ml of 20% glycerol–1 mM MOPS and recentrifuged. This step is repeated. The cells are finally resuspended in 90–100 µl of ice-cold 20% glycerol–1 mM MOPS.
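The primer layout described in Section 7.1.3.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a validated design tool: the function names and the toy flanking sequences are hypothetical, both flanks are assumed to be supplied as top-strand 5′ to 3′ sequence, and the two 20-base priming sequences are assumed to be the Tn9 cat primer pair from Table 7.2, used in the order listed there.

```python
# Sketch: assemble two 60-mer recombineering primers, each made of a
# 40-base homology tail (5' end) plus a 20-base cassette-priming 3' end.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def recombineering_primers(upstream, downstream, fwd_priming, rev_priming, tail=40):
    """Build (forward, reverse) primers.

    upstream:   top-strand sequence ending immediately before the target gene
    downstream: top-strand sequence starting immediately after the target gene
    fwd_priming / rev_priming: cassette-priming sequences, written 5'->3'
    """
    if len(upstream) < tail or len(downstream) < tail:
        raise ValueError("each flank must supply at least `tail` bases")
    forward = upstream[-tail:].upper() + fwd_priming
    # The reverse primer anneals to the top strand, so its homology tail
    # is the reverse complement of the downstream flank.
    reverse = reverse_complement(downstream[:tail]) + rev_priming
    return forward, reverse


# Hypothetical flanking sequences (toy data, not a real locus).
upstream_flank = "A" * 10 + "ACGTTGCA" * 5
downstream_flank = "GGCATCGA" * 5 + "T" * 10

# Assumed priming sequences: the Tn9 cat primer pair listed in Table 7.2.
CAT_FWD = "TGAGACGTTGATCGGCACGT"
CAT_REV = "ATTCAGGCGTAGCACCAGGC"

fwd_primer, rev_primer = recombineering_primers(
    upstream_flank, downstream_flank, CAT_FWD, CAT_REV
)
print(len(fwd_primer), fwd_primer)
print(len(rev_primer), rev_primer)
```

In practice the orientation of the cassette relative to the target gene, and therefore which priming sequence is paired with which flank, is a design decision (see the considerations on marker orientation in Section 7.1.3.6).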
This protocol is a variation of descriptions for the preparation of electrocompetent cells for recombineering that use either 10% glycerol [12,15] or ice-cold water [53]. We have found that the use of water in preparing cells for electrocompetence sometimes leads to partial cell lysis, resulting in a low percentage of, or a lack of, recombinants.

7.1.3.4 Electroporation and Plating

A 50 µl sample of cells is electroporated with 1–5 µl of DNA (0.1–0.5 µg) in TE or water (2000 volts in 0.1 cm cuvettes). Include a control electroporation that contains no DNA. Cuvettes are cooled in ice-water for at least 10 minutes prior to use. DNA and cells are mixed in a 1.5 ml Eppendorf tube,


transferred to the cuvette, and incubated on ice for 1 minute. The cuvette is thoroughly (but quickly) dried, and the cells are shocked. Immediately following electroporation, the cells are resuspended in 0.5 ml of LB, diluted into 2.5 ml of LB, and grown in a roller for 1.5–2 hours at 37°C. The culture is plated directly, or concentrated by a brief microcentrifugation step, and then plated on appropriate antibiotic plates. In some cases (e.g., counterselection with sacB), one should let the culture grow for several hours (or even overnight) prior to plating to allow cells that contain several copies of their chromosomes (only one of which is likely targeted) to replicate and segregate their DNA.

7.1.3.5 Selection and Verification of Recombinants

In most cases, the recombineering substrate is simply a drug cassette flanked by 40–50 bp of target DNA (see substrate preparation above). Thus, selection of recombinants is performed by plating the electroporated cells on antibiotic-containing plates. One must remember that targeting the chromosome will involve selection of the drug marker at single copy. This usually (though not always) means selecting recombinants at drug concentrations much lower than one would typically use for growing the same marker on a multicopy plasmid. Drug markers we have used for generating recombineering substrates, and the concentrations of the drugs used to select recombinants, are listed in Table 7.2. In practice, it is best to use the lowest concentration of drug that will allow the selection of recombinants, but that still efficiently suppresses the growth of nonrecombinants. Restreak transformants on fresh drug-selection plates, applying ~40 stabs per plate, especially if multiple-sized and colored colonies are evident.
Sometimes, false-positive KanR and CamR colonies arise among the “true recombinants.” True recombinants are identified by their ability to grow well after restreak (relative to the false-positives) on fresh antibiotic-containing plates. At this time, one can also streak onto ampicillin or tetracycline plates (depending on the Red plasmid used) to identify candidates that have spontaneously lost the Red-producing plasmid. We have found that Red plasmids pKM208 and pTP223 are lost at variable frequencies (10–50%) during competent cell preparation and electroporation. A minilysate preparation can be used to determine whether the strain is truly cured, or whether a deletion variant of the Red plasmid that lost the drug marker is still present. For most of the plasmids listed in Table 7.1, growth at 42°C (pKD46 or pKM208) or growth at 37°C (pSIM5) can be used to cure the recombinant of the Red-producing plasmid. Alternatively, P1 transduction (if possible) of the marked deletion or insertion to a clean genetic background is usually the best policy. Phenotypic screens (if available) and/or PCR analysis can be used to verify the presence of the gene deletion or insertion. For PCR analysis, one typically uses primers that target the recombination substrate (i.e., the drug marker) and a region of the chromosome not present in the electroporated substrate (see Figure 7.1). Alternatively, one can target regions flanking the deletion or insertion site, and look for differences in the restriction enzyme pattern or size of the PCR product (relative to the PCR product generated from the unmodified chromosome).

7.1.3.6 Other Considerations for Recombineering Success

a. Design the primers to place the drug marker colinear with genes surrounding your target gene (if possible). If one orientation doesn’t work, try the other. We have seen context effects on drug expression from the chromosome (i.e., the drug marker works well in one direction, but not at all when turned around).

b. Don’t use intact plasmids for PCR templates. The plasmids (even at low concentration) transform at a high efficiency during electroporation, creating false-positives. If one uses a drug marker from a plasmid as a PCR template, gel purify a fragment of the plasmid that contains the target sequence (but does not contain the origin of replication) prior to PCR. Better yet, use a colony or an overnight culture containing a chromosomal drug marker for the template (5 µl of overnight culture in a 100 µl PCR works well), or use a target contained within a nonreplicative plasmid. If a plasmid is used as a template, digestion with DpnI (which requires a methylated GATC target site for efficient restriction) prior to electroporation can also be used to reduce plasmid transformants [12].


c. If there are problems obtaining recombinants, follow cell survival after electroporation by plating the culture on LB plates. In the final 3 ml culture, a survival count of 10⁷–10⁸/ml or more is desired. Cell survivals of less than 10⁶/ml indicate excess killing. Since frequencies of gene replacement are typically in the range of 10⁻⁴–10⁻⁵ per survivor, excess killing will prevent successful recovery of recombinants. If this happens, search for possible salt contamination in the electrocompetent cell preparation or DNA substrate, lower the electroporation voltage, and/or decrease or eliminate the heat-shock exposure.
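The reasoning in item c can be made concrete with a short calculation. The sketch below simply multiplies the survivor density, the 3 ml outgrowth volume, and the gene replacement frequency quoted above; the function name is hypothetical, and it assumes the whole culture is plated and that the quoted frequency range applies to the strain and substrate in question.

```python
# Sketch: expected number of recombinants in the outgrowth culture,
# using the survivor densities and frequencies quoted in the text.

def expected_recombinants(survivors_per_ml, frequency_per_survivor, culture_ml=3.0):
    """Expected recombinants = survivors/ml x culture volume x frequency."""
    return survivors_per_ml * culture_ml * frequency_per_survivor


for survivors in (1e8, 1e7, 1e6):
    low = expected_recombinants(survivors, 1e-5)
    high = expected_recombinants(survivors, 1e-4)
    print(f"{survivors:.0e} survivors/ml: about {low:.0f} to {high:.0f} recombinants per 3 ml culture")
```

Under these assumptions, 10⁷ survivors/ml corresponds to roughly 300 to 3,000 recombinants in the culture, whereas 10⁶ survivors/ml leaves only about 30 to 300, and fewer still if only part of the culture is plated.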

7.1.4 Recombineering with Single-Stranded DNA Oligos

In the reaction of Red proteins with a dsDNA linear substrate, Exo binds to a dsDNA end and generates a 3′ ssDNA end, which then serves as the substrate for binding by Bet protein. It was reasoned that if one started with ssDNA (say, a DNA oligo), then λ Exo might not be required to generate a recombination event. This idea was verified by Ellis et al. [40], who showed that Bet alone could promote the annealing of a 70-base ssDNA oligo (SSO) to the E. coli chromosome. Targeting an amber stop codon within the chromosomal galK locus, they generated Gal+ recombinants by electroporation with a 70-base oligo that contained the wild-type sequence of the galK gene. The frequency of recombination was 2 × 10⁵ recombinants per 10⁸ viable cells, a higher rate than typically seen with similar dsDNA substrates. They showed that Bet is required, Gam is stimulatory (five-fold), and Exo and RecA are not necessary for SSO-mediated recombineering. Oligos containing 40–60 bases resulted in a five-fold drop in recombination. The small size of the oligo means it cannot carry a selection marker, limiting the usefulness of ssDNA oligo-mediated recombineering in many cases. However, if a phenotype can be selected or screened, it offers the easiest way to make single and multiple base pair changes in the E. coli chromosome. Furthermore, performing recombineering in hosts deficient for MMR (see below) can greatly increase the frequency of recombinants (in some cases, up to 25% of the viable cells).

The procedure for SSO-mediated recombineering is identical to that described above for PCR-generated dsDNA substrates, except that one electroporates with a ssDNA oligo. The salt-free oligo (70 bases is optimal) should be dissolved in TE buffer to a final concentration of 1 mM and stored at –20°C. A working solution of the oligo is made (10 µM); 0.5–1.0 µl of this dilution is mixed with 50 µl of electrocompetent cells and shocked as described above.
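The dilution arithmetic in the procedure above can be checked with a short sketch. The helper names and the 100 µl working volume are assumptions made for illustration; only the 1 mM stock, the 10 µM working solution, and the 0.5–1.0 µl electroporation volumes come from the text.

```python
# Sketch: oligo dilution and amount delivered per electroporation.

def stock_volume_ul(stock_uM, working_uM, final_volume_ul):
    """Volume of stock needed to make `final_volume_ul` of working solution."""
    return final_volume_ul * working_uM / stock_uM


def picomoles(concentration_uM, volume_ul):
    """Amount of oligo in a given volume (1 uM x 1 ul = 1 pmol)."""
    return concentration_uM * volume_ul


# 1 mM stock (1000 uM) diluted to 10 uM; a 100 ul working volume is assumed.
stock_ul = stock_volume_ul(stock_uM=1000, working_uM=10, final_volume_ul=100)
print(f"Mix {stock_ul:.1f} ul of stock with {100 - stock_ul:.1f} ul of TE")

for volume in (0.5, 1.0):
    print(f"{volume} ul of 10 uM oligo delivers {picomoles(10, volume):.0f} pmol")
```

The 1 mM to 10 µM step is thus a 1:100 dilution, and each electroporation receives about 5–10 pmol of oligo.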
The base change desired should be positioned toward the middle of the oligo, allowing the 5′ and 3′ ends to become fully annealed to the target site. Either DNA strand of a target site can be the source for the oligo, but oligos that are complementary to the lagging strand of the replicating target site display a higher frequency of recombination relative to oligos that target the leading strand. It is based on this observation that Ellis et al. [40] suggested that the Bet protein targets the replication fork during SSO-mediated recombineering. Once annealed to ssDNA regions of the replication fork, an oligo designed to introduce a base change forms a base-pair mismatch, which serves as a substrate for the E. coli MutSLH MMR system [54,55]. MMR will specifically remove the oligo-directed base change, since it resides in the newly synthesized unmethylated strand. (The unmethylated strand is the signal that directs the MMR system to identify which of the mismatched bases is the “error”.) Thus, MMR should be inhibitory to Red-mediated SSO-promoted recombineering. Costantino and Court showed this to be the case, demonstrating that SSO-recombineering frequencies were increased by a factor of 100 or more when performed in cells deficient for MMR [47]. The problem with this scheme, of course, is that working in an MMR-deficient genetic background can lead to the generation of unintended mutations, so the target region of the recombinant will need to be sequenced. Also, after modification, the altered target should be moved to a nonmutagenic background. As an alternative to the use of MMR mutants, incubation with 75 µg/ml 2-aminopurine has been shown to increase the levels of SSO-mediated recombineering in wild type hosts [47], but not to the levels seen in MMR-deficient strains.

By careful design of the SSO, however, one can increase the frequency of SSO-mediated recombineering in wild type cells by recognizing that certain mismatches are not repaired well. The E. coli MMR system repairs mismatches in the following order: G/T, A/C, A/A, G/G > A/G, T/C, T/T > C/C. In fact, C/C mismatches are not repaired at all, meaning that G to C changes can be made at high frequency in wild type cells by the use of ssDNA oligos [47]. Also, the E. coli MMR system acts efficiently only on short regions of one to three mismatches. Thus, by including in the oligo a sequence that generates a series of four to six consecutive mismatched bases, one can avoid interference of gene modification by MMR. In a two-step method to generate a single base pair change using SSO-mediated recombineering, one can design an oligo to make six changes in a small region of the target gene at high frequency. In the second recombineering step, all of them are changed back to the original sequence, with the exception of the desired base pair change [56].
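The mismatch ranking above can be turned into a small lookup that predicts, for a single intended base change, which mismatch an oligo creates and how that mismatch ranks. This is a sketch under stated assumptions: the function names and class labels are hypothetical, the three classes simply transcribe the ordering quoted in the text, and the mismatch is taken to be the new base carried by the oligo paired with the complement of the base it replaces.

```python
# Sketch: classify the mismatch formed by an oligo-directed base change.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Ordering transcribed from the text: G/T, A/C, A/A, G/G > A/G, T/C, T/T > C/C
REPAIR_ORDER = [
    ("repaired efficiently", ["G/T", "A/C", "A/A", "G/G"]),
    ("repaired less efficiently", ["A/G", "T/C", "T/T"]),
    ("not repaired", ["C/C"]),
]
REPAIR_CLASS = {
    frozenset(pair.split("/")): label
    for label, pairs in REPAIR_ORDER
    for pair in pairs
}


def mismatch_for_change(old_base, new_base):
    """Mismatch formed when an oligo carrying `new_base` anneals opposite
    the base that normally pairs with `old_base` (written alphabetically)."""
    old, new = old_base.upper(), new_base.upper()
    if old == new:
        raise ValueError("old and new base are identical; no mismatch is formed")
    template_base = COMPLEMENT[old]
    return "/".join(sorted((new, template_base)))


def repair_class(old_base, new_base):
    """Look up the repair class of the mismatch created by the change."""
    mismatch = mismatch_for_change(old_base, new_base)
    return REPAIR_CLASS[frozenset(mismatch.split("/"))]


for old, new in (("G", "C"), ("C", "G"), ("A", "G")):
    print(f"{old}->{new}: {mismatch_for_change(old, new)} mismatch, {repair_class(old, new)}")
```

In this model a G to C change on the oligo strand creates a C/C mismatch, whereas the same base-pair change specified from the complementary strand (C to G) creates a G/G mismatch, so the choice of oligo strand affects which repair class applies.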

7.1.6 Counter-Selection Schemes

Selecting for the absence of a particular gene (counter-selection) has been a popular way to introduce an unmarked mutant gene or engineered operon into bacterial chromosomes. By far, the most popular counter-selection function for this purpose is sacB, a gene from B. subtilis that encodes a levansucrase that transfers fructosyl residues from sucrose to various cellular constituents [57]. Expression of sacB kills E. coli, and many other bacteria, in the presence of sucrose. When sacB is placed alongside an antibiotic drug marker (e.g., cat), one can select for both insertion (chloramphenicol resistance) and removal of DNA containing this cassette (sucrose resistance). The removal of a cat-sacB cassette (and insertion of the DNA of interest) is accompanied by conversion to chloramphenicol sensitivity. Such a cassette containing a kan resistance marker and sacB was first described by Ried and Collmer [58] for use in E. chrysanthemi. While this scheme has been used for years in conventional gene replacement strategies, its use in conjunction with Red recombineering technology has allowed for quick and easy construction of marker-free BAC and chromosomal gene replacements and modifications [16,45,46].

The cat-sacB cassette is amplified by PCR with primers that contain target DNA sequences on their 5′ ends, as described above for single drug cassettes. One disadvantage of this cassette is its rather large size (~3 kb), making for larger PCR products that may be more difficult to amplify relative to single drug cassettes. However, high fidelity/processive polymerases have made this less of a problem in recent years. After electroporation of the PCR product into Red/ET-expressing cells, one selects for CamR colonies (screening them for sucrose sensitivity), as well as for retention of the AmpR Red-producing plasmid, since a second recombineering step is required.
In the next step, electroporation of the desired construct is performed (prepared by PCR or from a plasmid digest), which replaces the cassette to generate SucR CamS colonies. While SucS cells are known to revert to SucR at high frequency (1 × 10⁻⁴), these spontaneous revertants can be distinguished from true gene replacements by their resistance to chloramphenicol. In most cases, the frequency of Red recombination is at least as high as (and many times higher than) spontaneous reversion to SucR, such that many recombinants can be easily found. Importantly in this process, the cells must be allowed to grow 4–5 hours (or overnight) following electroporation to segregate the chromosomes that have lost sacB from the ones that still retain sacB. This step is critical, since as far as sucrose sensitivity is concerned, sacB⁺ chromosomes are dominant to sacB⁻ chromosomes.

Other counterselection schemes used previously for plasmid counterselection have recently been used for λ Red-promoted chromosomal manipulations [59,60]. In these cases, the genes galK and thyA are used for both the initial selection (by growth on minimal media) and counterselection (by growth in the presence of 2-deoxy-galactose and trimethoprim, respectively). A disadvantage with these cassettes is that they can only be used in strains that have been previously deleted of galK (or thyA), thus creating the auxotrophy that is used in the first selection step. Nonetheless, they are powerful tools to make seamless modifications of BAC constructs and bacterial chromosomes.

Another counterselection scheme that has been examined for use in Red recombineering is the rpsL gene (Poteete and Murphy, unpublished observations). The rpsL gene of E. coli encodes the ribosomal protein S12. A mutant version of the gene encoding a K43R mutant protein confers resistance to streptomycin (rpsL31). When present in the chromosome, the streptomycin resistance conferred by the mutant allele can be suppressed by the wild type version of rpsL, making cells sensitive to streptomycin [61]. A cat-rpsL allele has been constructed that contains a strong artificial IPTG-dependent promoter that allows for overexpression of wild type rpsL. When placed into the chromosome of an E. coli (rpsL31) strain, it suppresses the streptomycin resistance conferred by this allele. However, in wild type cells, the cassette is lost spontaneously at high frequency following induction with IPTG, probably as a result of high expression from the artificial promoter at the genetic loci tested. Nonetheless, when used in cells overexpressing the LacI repressor (where it is assumed the promoter is more moderately expressed), it has been used successfully to replace lacZ with an inactive version of the tet gene (Murphy, unpublished). Also, a “green cat” cassette has been constructed that contains the chloramphenicol resistance gene and the green fluorescent protein (Poteete, unpublished results). Insertion of the cassette is selected by both CamR and green colored colonies, and the loss of the cassette is monitored by loss of GFP and CamS. Using this cassette, an ampicillin enrichment protocol [16] was used to find a markerless deletion of the E. coli recG gene (Poteete, unpublished results).
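The screening logic of the two-step cat-sacB scheme described earlier in this section (select CamR and confirm SucS after the first step; select SucR and confirm CamS after the second) can be summarized as a small decision function. The function name and the wording of the returned labels are hypothetical; the phenotype combinations themselves follow the text.

```python
# Sketch: interpret colony phenotypes in the two-step cat-sacB scheme.

def classify_colony(sucrose_resistant, cam_resistant):
    """Interpret a colony from its sucrose and chloramphenicol phenotypes."""
    if cam_resistant and not sucrose_resistant:
        return "cassette present (expected after step 1: CamR SucS)"
    if sucrose_resistant and not cam_resistant:
        return "cassette replaced (candidate recombinant: SucR CamS)"
    if sucrose_resistant and cam_resistant:
        return "likely spontaneous SucR revertant (still CamR)"
    return "unexpected phenotype (SucS CamS); re-test"


# Hypothetical screening results for three colonies picked from a sucrose plate.
colonies = {
    "colony 1": (True, False),
    "colony 2": (True, True),
    "colony 3": (True, False),
}
for name, (suc_r, cam_r) in colonies.items():
    print(name, "->", classify_colony(suc_r, cam_r))
```

A candidate identified this way would still need to be verified by PCR or sequencing, as described for other recombinants.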

7.1.7 Flp and Cre Site-Specific Recombination (SSR) Systems

Flp and Cre are members of the λ integrase family of site-specific recombinases, and have been used in bacteria, yeast, plants, Drosophila, and mammalian cells over the last 15 years [62–69]. The Flp recombinase (from the Saccharomyces cerevisiae 2 µm plasmid) and the Cre recombinase (from bacteriophage P1) recombine their target sequences (FRT and loxP, respectively) at high efficiency and without the need of endogenous host factors. It is these properties that allow Flp and Cre to be used in a variety of different organisms to create insertions, deletions, inversions, duplications, and translocations. Their use has actually been more prolific in mammalian systems, where they are the key genetic tools for the construction of conditional mouse mutations [68,69]. Flp and Cre, both tyrosine recombinases, share a common mechanism of recombination in which the strands of the recombination targets are cut, exchanged, and religated one pair at a time through a Holliday junction intermediate [70]. The target sites FRT (Flp recognition target) and loxP (locus of crossover X in P1) are different sequences, though they share common structural motifs, which include 13 bp inverted repeats separated by an 8 bp asymmetric spacer sequence. These 34 bases are the minimal functional sites for both these recombinases (see Table 7.3). Because of the asymmetry in the spacer regions, the relative orientation of the target sites to one another defines the type of chromosomal rearrangement. Recombination between two directly repeated target sites generates a deletion of the intervening DNA and a circular molecule, whereas recombination between two inverted target sites generates an inversion of the sequence between the sites. The first event is essentially irreversible (due to loss of the excised circle), while the latter event is fully reversible (since both target sites are maintained on the same linear DNA). The use of these site-specific recombination systems, in combination with Red/ET recombineering, is described in various chromosomal engineering strategies as outlined in the next section.
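The dependence of the outcome on site orientation can be illustrated with a toy model. In the sketch below the chromosome is a list of (name, strand) elements, with strand +1 or -1; the representation, the function name, and the example elements are assumptions made for illustration and do not model the recombinase chemistry itself.

```python
# Sketch: outcome of recombination between two target sites,
# depending on their relative orientation.

def recombine(chromosome, i, j, site_name="loxP"):
    """Recombine the sites at indices i < j of `chromosome`.

    Returns (event, new_chromosome, excised_circle_or_None).
    """
    if not i < j:
        raise ValueError("i must be smaller than j")
    first, second = chromosome[i], chromosome[j]
    if first[0] != site_name or second[0] != site_name:
        raise ValueError("both indices must point at target sites")
    between = chromosome[i + 1:j]
    if first[1] == second[1]:
        # Direct repeats: the intervening DNA leaves as a circle carrying
        # one site; a single site remains on the chromosome.
        remaining = chromosome[:i + 1] + chromosome[j + 1:]
        circle = between + [second]
        return "deletion", remaining, circle
    # Inverted repeats: the intervening DNA is reversed and each element
    # changes strand; both sites stay on the chromosome.
    flipped = [(name, -strand) for name, strand in reversed(between)]
    return "inversion", chromosome[:i + 1] + flipped + chromosome[j:], None


direct = [("A", 1), ("loxP", 1), ("drug", 1), ("B", 1), ("loxP", 1), ("C", 1)]
inverted = [("A", 1), ("loxP", 1), ("drug", 1), ("B", 1), ("loxP", -1), ("C", 1)]

print(recombine(direct, 1, 4))
print(recombine(inverted, 1, 4))
```

Applying the function a second time to the inverted product restores the starting arrangement, which mirrors the reversibility noted above, while the deletion product retains only one site on the chromosome.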

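Table 7.3 Site-Specific Recombination Systems Used for Drug Marker Eviction in E. coli

System | Target Site | Plasmid | Details | Reference
Flp/FRT | GAAGTTCCTATTC (N)8 GTATAGGAACTTC | pCD20 | λ PR cI857, Repts, AmpR, CamR | [122]
Flp/FRT | GAAGTTCCTATTC (N)8 GTATAGGAACTTC | PBAD33-Flp | PBAD, R1 origin, KanR | [123]
Flp/FRT | GAAGTTCCTATTC (N)8 GTATAGGAACTTC | 1921-cIFLP | λ PR cI857, SpecR | [124]
Cre/loxP | ATAACTTCGTATA (N)8 TATACGAAGTTAT | pJW168 | PlacUV5, Repts, AmpR | [125]
Cre/loxP | ATAACTTCGTATA (N)8 TATACGAAGTTAT | PBAD75Cre | PBAD, Repts, CamR | [126]
Cre/loxP | ATAACTTCGTATA (N)8 TATACGAAGTTAT | 1921-cICre | λ PR cI857, SpecR | [124]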


7.2 Strategies of λ Red-Promoted and SSR-Mediated Modification of the Bacterial Chromosome

7.2.1 Gene or Operon Replacement

Red/ET recombineering is most commonly used to generate simple gene replacements, substituting a drug marker in place of the gene of interest. Given the simplicity of the process, it is often just as easy to order the primers one needs to make a knockout, even when a mutant allele of the gene of interest already exists. When an existing mutant allele contains a drug marker that is not compatible with the genetic system under study, the easiest route is to simply recombineer the gene knockout with the marker of one’s choice. (Also, ordered primers typically arrive long before requested materials.) Since most of the drug markers used in recombineering tend to be 1 kb or less (see Table 7.2), gene replacements usually involve replacing a typically sized gene of 1–2 kb with a similar-sized fragment. The size of the deletion seems to have little effect on the success of the gene or operon replacement. Using a 70-base ssDNA oligo, Ellis et al. generated both a 3,300 bp deletion and a single base pair change with equal frequencies [40]. In enterohemorrhagic E. coli (EHEC), Campellone et al. generated multiple deletions of pathogenicity islands ranging from 5 to 45 kb in size [71]. λ Red-mediated disruption is now the method of choice for E. coli gene knockouts, and has led to the systematic knockout of 3985 of 4288 targeted E. coli genes (the Keio collection) [72]. This collection, in which the open reading frames were replaced with a kanamycin marker flanked by FRT sites, is now available from the E. coli Genetic Stock Center (http://cgsc2.biology.yale.edu). The straightforward replacement of a target gene by a drug marker is depicted in Figures 7.1 and 7.3a.

7.2.2 SSR-Mediated Excision of Drug Markers

The scheme for excising drug markers with Flp is outlined in Figure 7.3b. This is most easily done by using a drug marker cassette that is flanked by FRT sites as a template for PCR. (The drug marker is said to be “flexed”; if loxP sites were used instead, the marker is “floxed.”) Alternatively, the FRT sites can be included within the primer sequences used for the PCR. Following target gene replacement, the drug marker can be removed by transformation with a Flp-expressing plasmid that contains a temperature-sensitive origin of replication. Induction of Flp excises the drug marker, and the drug-free recombinant can be cured of the Flp plasmid by growth at the nonpermissive temperature. This procedure then allows the same drug to be used to target a different location in the chromosome. The sequences of the FRT and loxP target sites, and plasmids that express the Flp and Cre recombinases, are listed in Table 7.3.

A word of caution is in order with this procedure. Once the Flp-promoted recombination event has occurred, a single FRT site is left behind. Together with any exogenous primer sequences, the FRT site represents a “scar” that is left over within the chromosomal deletion. This scar can have two general (unintended) effects. Once inserted into the target gene (depending on its orientation), the scar could have polar effects on the expression of genes downstream of the target gene. In recognition of this fact, Datsenko and Wanner included an idealized ribosome binding site and start codon within one scar to prevent such polar effects [15]. The second concern is that if the excision of a drug marker were used in a reiterative fashion, multiple scars could become targets for the Red recombinase, generating large unintended deletions or inversions in the bacterial chromosome. Given the small sequence of the scar, the frequency of such events is likely low and would depend on the number and proximity of multiple scar sites.

7.2.3 Large Deletions

An efficient way to generate large precise deletions involves the use of site-specific recombinases Flp or Cre. Ayres et al. [73] have shown that loxP sites transferred to the broad host range plasmid RK2 could be used to generate precise deletions in RK2 following expression of Cre. More recently, others have

[Figure 7.3 diagram. Column headings: Chromosomal manipulation; Starting or intermediate structure; Final product. Panels: (a) Marked gene deletion; (b) Unmarked gene deletion; (c) Chromosomal deletion; (d) Chromosomal duplication; (e) Chromosomal inversion; (f) Reporter fusion; (g) Promoter engineering.]

Figure 7.3 Chromosomal manipulations promoted by Red/ET and Cre/Flp site-specific recombination (SSR) systems. Chromosomal target genes (dark gray boxes); drug markers (light gray boxes); loxP or FRT sites (black triangles); PCR primers used for recombineering (black arrows); reporter gene (hatched box). (a) Generation of a simple marked gene deletion. (b) Unmarked deletion—site-specific recombination (SSR) sites flank the drug marker, allowing for removal of the drug cassette by in vivo expression of the SSR recombinase following Red/ET-promoted gene replacement. (c) Chromosomal deletion—two SSR target sites are placed in the same orientation at different regions of the chromosome by Red/ET-promoted recombineering (via incorporation of 2 different drug markers). In vivo expression of the SSR recombinase generates a large deletion of the chromosomal region between the 2 SSR target sites, including the drug markers. (d) Gene duplication—an amplified drug marker is flanked by chromosomal sequences defined by primers 1 and 2. Red/ET recombination between the sequences defined by primer #1 is followed by a second recombination event at sequences defined by primer #2, generating a duplicated region of the chromosome. Growth of the transformant on antibiotic-selection plates ensures the presence of the duplication. (e) Chromosomal inversion—two SSR target sites are placed in the opposite orientation at different regions of the chromosome by Red/ET-promoted recombineering (via incorporation of 2 different drug markers). In vivo expression of the SSR recombinase generates a large inversion of the chromosomal region between the 2 SSR target sites. (f) Reporter fusions—a cassette containing a reporter gene (hatched box) and a drug marker (light grey box) is amplified by PCR, and recombineered in place of the target gene. The cassette can contain SSR target sites to create an unmarked fusion (if so desired). 
(g) Promoter engineering—a drug marker is incorporated just upstream of the promoter region (Po) of a target gene by Red/ET recombineering. One primer (denoted by asterisk) is degenerate at selected positions to create a variety of promoter mutations on the chromosome (top picture). In this case, however, one might interfere with regulatory features upstream of the promoter under study. As an alternative (see bottom picture) one could use the degenerate primer to amplify the target operon. The PCR product is designed to overlap a second PCR product containing a selectable drug marker. Overlap PCR between the two PCR products would generate the recombineering substrate containing variable promoter mutants that can be crossed into the chromosome.


modified the system by delivery of loxP sites to the E. coli chromosome by Tn5 transposons [74] or λ Red recombineering [75]. The random insertion of loxP sites into the chromosome by transposons, followed by Cre expression, is useful to assess the essentiality of various regions of the genome, though it takes time to identify the endpoints of the deletions. Alternatively, the placement of loxP sites at defined regions of the chromosome by λ Red recombineering allows one to design precise chromosomal deletions in a systematic manner. In this latter scheme (diagrammed in Figure 7.3c), a loxP site can be engineered to be adjacent to a drug marker, which is then selectively targeted by Red to a specific region of the chromosome. A second loxP site (in the same orientation as the first) is recombineered to a second site using a different drug marker. The arrangement is set up so that one or both drug markers lie between the two loxP sites, depending on whether one wishes to have a marked or unmarked deletion. Once isolated, the strain is transformed with an AmpR Cre-expressing plasmid, transformants are allowed to form colonies on Amp plates, and the deletion-containing drug-free segregants are found by plating (and verified by PCR). Fukiya et al. [75] have reported that, following isolation of the double-loxP-containing intermediate, 117- and 165-kbp deletions were formed in this manner in E. coli with 100% efficiency, an indication of the high efficiency of the Cre-loxP recombination system.

7.2.4 Seamless Deletions

The use of Flp and Cre to remove drug markers is a highly efficient way to generate marker-free deletions. However, as discussed above, the site-specific target site is left over as a “scar.” Many times, one desires an in-frame deletion in the gene of interest in order to prevent polar effects on genes downstream. For the generation of a precise deletion in the E. coli chromosome, a two-step method that includes the use of a counter-selection step is required. Any one of the counter-selection markers discussed above is crossed into the target site by Red recombineering. Once selected via the drug marker, the presence of the counter-selection marker is verified (e.g., sucrose sensitivity with sacB). In the second step, replacement of the counter-selection marker with a specifically designed DNA fragment that contains the precise deletion, insertion, or point mutation is carried out. Verification of the chromosomal modification is done by PCR size analysis with primers flanking the target site, and/or by sequencing of the PCR product. Typically, for a simple clean deletion, we have used the cat-sacB cassette for the first step, then crossed in a DNA fragment that contains the first three codons fused to the last three codons of the target gene [16]. This convention allows for precise deletion of the gene of interest without generating any polar effects on the expression of downstream genes. Alternatively, a 70-base ssDNA oligo can be used which contains 3′ and 5′ sequences that flank the sequence to be deleted, as described in Section 7.1.4 and by Ellis et al. [40]. Such a deletion can be generated in one step, though the frequency would have to be high enough (~1 in 10²–10³ events) to screen for the desired deletion (i.e., one would need a PCR-based screen or an expected phenotype in order to find the transformants containing the deletion).

A second way to generate markerless gene replacements in E. coli was first described by Posfai et al. using suicide plasmids [76], and further developed by Kolisnychenko et al. using linear DNA substrates and Red recombineering [77]. This scheme is outlined in Figure 7.4 and takes advantage of the I-SceI meganuclease, which recognizes an 18 bp restriction site not present in the E. coli chromosome. In this procedure, an initial marked deletion is generated in the chromosome using Red as described above, with two modifications. First, the template for the PCR is a drug marker flanked by two I-SceI sites. Second, primer 1 is composed of sequences that fuse region A (which defines the left side of the final deletion) with region B (which defines the right side of the final deletion; see Figure 7.4). A PCR product generated with primers 1 and 2, after being crossed into the chromosome (shaded region in Figure 7.4), generates a 40–60 bp duplication of chromosomal DNA (marked as box B in Figure 7.4). The strain carrying the deletion is isolated, verified, and electroporated with a SceI-expressing plasmid that carries a temperature-sensitive origin of replication. The SceI sites flanking the drug marker are cut, generating two broken ends in the chromosome. The dsDNA break is repaired via RecA-mediated


[Figure 7.4 schematic, panels I–IV. Labels: chromosomal regions A, B, and C; primers 1 and 2; drug marker flanked by SceI sites. Steps between panels: Red recombineering, SceI cutting, RecA recombinational repair.]

Figure 7.4 Markerless gene replacement in E. coli using Red/ET recombineering and the meganuclease I-SceI. (I) Region of chromosome to be deleted (sequence between A and B). (II) Primers 1 and 2 are used to generate a PCR product that is crossed into the chromosome by Red/ET recombineering. The drug marker used as the template for the PCR is flanked by SceI sites. The endpoints of the deletion are ultimately defined by the sequence of primer #1 (see text for details). (III) In vivo expression of SceI restriction enzyme generates a dsDNA break. (IV) RecA-promoted recombinational repair (between sequences denoted by B) occurs at high efficiency to generate a precise (unmarked) deletion.

recombination (which occurs at high frequency since nonrepair is lethal) using the B box sequences. The intramolecular recombinational repair event results in an unmarked chromosomal deletion whose endpoints are defined by the sequence within primer 1. Primer 1 (100–140 bp) is actually a composite primer, made by annealing two overlapping oligos and then filling in the single-stranded regions. Using this method, Kolisnychenko et al. [75] deleted 0.37 Mb of E. coli DNA consisting of cryptic prophages, transposons, damaged genes, and genes of unknown function. The resulting strain, MDS12, had an 8.3% reduction in genome content (see genome reduction in Section 7.2.10 below). This method for generating seamless knockouts was later adapted for use in BAC vectors [78].
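
The clean-deletion convention described in this section (first three codons fused to the last three) and the 70-base ssDNA oligo alternative can be illustrated with a short Python sketch. The function names, the 35-nt arm on each side of the deletion, and the 0-based coordinates are assumptions made for illustration; they are not taken from the cited protocols.

```python
def clean_deletion_allele(orf, keep_codons=3):
    """Fuse the first and last `keep_codons` codons of an ORF.

    The start and stop codons are retained, so the fusion stays in frame.
    """
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    n = 3 * keep_codons
    if len(orf) <= 2 * n:
        raise ValueError("ORF is too short for this deletion")
    return orf[:n] + orf[-n:]


def deletion_oligo(chromosome, del_start, del_end, arm=35):
    """Return an ssDNA oligo that joins the sequences flanking a deletion.

    Coordinates are 0-based; `del_end` is exclusive. With arm=35 the
    oligo is 70 bases long (hypothetical default, chosen to match the
    70-base oligo mentioned in the text).
    """
    if del_start < arm or del_end + arm > len(chromosome):
        raise ValueError("not enough flanking sequence for the arms")
    if del_end <= del_start:
        raise ValueError("deletion interval is empty")
    return chromosome[del_start - arm:del_start] + chromosome[del_end:del_end + arm]
```

To combine the two ideas, one would set `del_start` nine bases after the start of the gene and `del_end` nine bases before its end, so that the oligo itself encodes the six-codon in-frame fusion.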

7.2.5 Insertions

With recombineering, various antibiotic and counter-selection cassettes (between 0.8 and 3.5 kb in size) have been typically used as foreign DNA inserts into the E. coli chromosome (most often replacing an endogenous gene). While there have been no direct comparisons, large insertions (greater than 4 kb) using λ Red/ET technology tend not to be as successful as large deletions. The failure to generate large insertions via recombineering may have to do with the difficulty in coordinating the annealing of two spatially distant dsDNA ends to a replication fork, though the exact mechanism of this process is not yet known. However, other nonrecombineering schemes for generating large insertions into bacterial chromosomes are described in Section 7.3 below.

7.2.6 Duplications and Inversions

Duplications and inversions of defined sequences within the E. coli chromosome can be constructed by classical genetic means [79,80], but the use of λ Red/ET recombineering has allowed such manipulations to be made very easily and precisely. For generating duplications of a region of the E. coli chromosome,


a proposed mechanism of duplication formation in S. typhimurium by Galitski and Roth [81] can be engineered by λ Red to promote such duplications at high frequency. As shown in Figure 7.3d, a drug marker is generated by PCR with the indicated flanking sequences. A recombination event occurring in the interval represented by arrow #1 generates a dsDNA break that has as a terminus the sequence represented by arrow #2. Subsequent λ Red-promoted DNA repair, via recombination at the sequence represented by arrow #2 on a sister chromosome, generates a duplicated region of the chromosome. The duplicated regions are separated by the drug marker, so that growth of the cells in the presence of the drug selects against loss of the duplication. It was just this sort of event that allowed Poteete et al. [82] to identify cryptic prophage sequences in the E. coli chromosome, following the electroporation of PCR substrates that contained homology to lacZ on only one side of a cat marker. The size of the duplications allowed by such a mechanism has not been fully investigated, but duplications over 1 Mbp have been observed [82].

For inversions, the easiest method is a two-step process. In the first step, the region to be inverted is deleted and replaced with a counter-selectable marker. In the second step, primers are selected that target the region to be inverted, with flanking homologous sequences that result in inversion of the segment following excision of the counter-selectable marker. For inversion of much larger regions (3 kb or more), λ Red technology can be combined with site-specific recombination systems (SSRs). In this method, one places an SSR target site (e.g., loxP) at each endpoint of the region one desires to invert, as described above for creating large deletions, with the exception that the two loxP sites are placed in inverted orientation (see Figure 7.3e). Again, two different drug markers are used to precisely place loxP sites at the endpoints of the region to invert. Transformation of the double-loxP strain with a Cre-expressing plasmid results in inversion of the region between the loxP sites. Following curing of the Cre plasmid from the strain (typically by growth at 42°C), the inversion event is verified by PCR analysis. Since the inversion is fully reversible, only about half of the cells will contain an inversion of the target sequence.
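
The dependence of the Cre outcome on loxP orientation (deletion for directly repeated sites, inversion for inverted sites) can be sketched as a string operation. This is an idealized model that assumes exactly two sites and a single recombination event; the 34-bp sequence is the commonly cited wild type loxP site, and the function names are hypothetical.

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # 13-bp arm, 8-bp spacer, 13-bp arm
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq, site=LOXP):
    """Return sorted (position, strand) pairs for the site on either strand."""
    hits = []
    for strand, pattern in (("+", site), ("-", revcomp(site))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def cre_recombine(seq, site=LOXP):
    """Model one Cre event between exactly two sites.

    Same orientation: the intervening DNA and one site copy are excised.
    Opposite orientation: the DNA between the sites is inverted.
    """
    hits = find_sites(seq, site)
    if len(hits) != 2:
        raise ValueError("expected exactly two loxP sites")
    (p1, s1), (p2, s2) = hits
    n = len(site)
    if s1 == s2:
        return seq[:p1 + n] + seq[p2 + n:]
    return seq[:p1 + n] + revcomp(seq[p1 + n:p2]) + seq[p2:]
```

Applying `cre_recombine` twice to an inverted-site construct returns the starting sequence, which is the reversibility that leaves a mixed population of inverted and non-inverted cells.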

7.2.7 Reporter Fusions

The construction of transcriptional and translational fusions to chromosomal genes has been key to the study of gene regulation and expression in E. coli and other bacteria. Methods for the construction of genetic fusions have been described that place gene fusions on multi-copy plasmids, or in single copy within the chromosome via plasmid-chromosomal integrants [83–86], at phage-attachment sites [87,88], or by the use of the site-specific Flp recombinase system [64]. Transposons have also been widely used to generate genetic fusions [86], which by necessity are randomly generated and generally create null mutations in the target gene. The advent of λ Red/ET recombineering has made the construction of genetic fusions a straightforward manipulation, though the full potential of λ Red for such constructions has yet to be exploited.

Uzzau et al. have shown that epitope tags can be easily placed into the C-terminus-encoding regions of a number of chromosomal genes in Salmonella enterica serovar Typhimurium [89]. FLAG tag sequences were incorporated into primers that targeted a Cam or Kan drug cassette flanked by FRT sites. The 5′ region of one of the PCR primers targeted the translation stop signal of the target gene. The chromosomal fusion was then generated by λ Red recombineering; Flp recombinase could be used (if needed) to remove the downstream drug marker. Ellermeier et al. [90] used the combination of FRT sites and the λ Red system to target single-copy lac fusions to the E. coli chromosome. In this method, a drug marker flanked by FRT sites is placed downstream of the promoter of interest, followed by excision of the drug marker by Flp recombinase. The single remaining chromosomal FRT site then becomes the substrate for recombination with FRT-containing, replication-deficient plasmids that carry lacZ and lacY, designed in such a way as to create either a transcriptional or translational fusion to the promoter of interest. No molecular cloning is necessary.

In another study, λ Red recombineering was used to recombineer the GFP+-encoding sequence into the proV gene of Salmonella strain LT2 to follow expression of the


osmotically regulated proU operon [91]. The use of the GFP+ variant increased the sensitivity of the assay, allowing the chromosomal gfp fusion to work as well as an identically located lacZ fusion for measuring the effects of different salt concentrations on proU expression. In the same study, Hautefort and colleagues also showed that λ Red can be used to place promoter-gfp fusions anywhere in the chromosome by recombineering. To do this, the promoter region of interest was generated by PCR, then cloned into a plasmid downstream of the T1 terminator and upstream of a promoterless gfp. This segment, including an adjacent cat gene, was then amplified with primers that contained 50-nucleotide tails homologous to the chromosomal target site. The transcriptional fusions were placed in single copy at a site different from the gene under study (in this case, the putPA genes, a site not important for Salmonella infectivity). In this scenario, the target genes themselves are left intact at their endogenous positions, so that studies geared toward the regulation of virulence genes can be performed without attenuation of the pathogen.

In a two-step process utilizing an rpsL-tetA counter-selection marker, Dolphin and Hope recently constructed marker-free gfp translational fusions in a C. elegans fosmid clone, which was subsequently used to generate transgenic worms for expression analysis [92]. Gerlach et al. [93] have described a set of plasmids containing various reporter functions (including luciferase, DsRed, and phoA) adjacent to a flexed aph gene (KanR) that can be used as templates for PCR to generate recombineering substrates for the construction of transcriptional or translational fusions on bacterial chromosomes. As such, fusions can simply be engineered by recombineering the reporter-drug cassette into the region just downstream of the Shine-Dalgarno ribosome binding site of the target gene (for transcriptional fusions), or within the coding region of the target gene (for translational fusions) (see Figure 7.3f).
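
The use of primers carrying 50-nucleotide homology tails, as in the promoter-gfp work described in this section, can be sketched as follows. This is a minimal illustration: the 20-nt priming length, the function names, and the coordinate convention (0-based insertion point, optional replaced length) are assumptions, and no attempt is made to check melting temperature or secondary structure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def recombineering_primers(chromosome, insert_at, cassette,
                           arm=50, prime=20, replace_len=0):
    """Design forward and reverse primers, each a homology tail plus a
    cassette-priming sequence.

    The cassette is inserted at `insert_at`, replacing `replace_len`
    bases of chromosome (0 gives a pure insertion).
    """
    right_start = insert_at + replace_len
    if insert_at < arm or right_start + arm > len(chromosome):
        raise ValueError("not enough chromosomal sequence for the arms")
    if len(cassette) < prime:
        raise ValueError("cassette shorter than priming region")
    left_arm = chromosome[insert_at - arm:insert_at]
    right_arm = chromosome[right_start:right_start + arm]
    forward = left_arm + cassette[:prime]
    # The reverse primer is written 5'->3' on the bottom strand.
    reverse = revcomp(cassette[-prime:] + right_arm)
    return forward, reverse


def pcr_product(forward, reverse, cassette, prime=20):
    """Top strand of the expected linear recombineering substrate."""
    return forward[:-prime] + cassette + revcomp(reverse[:-prime])
```

The expected product is the cassette flanked by the two chromosomal arms, which is the linear substrate that would be electroporated into a Red-expressing host.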

7.2.8 Random Mutagenesis of Chromosomal Genes

Random mutagenesis of a specific gene or promoter has generally been done with plasmids. Chemical mutagenesis of a plasmid containing the gene of interest, passage of the plasmid through a mutagenic host (e.g., mutD), or cloning of a gene following PCR mutagenesis are typical ways in which random mutations are generated. Characterization of the plasmid-encoded mutated gene typically follows, which can at times (because of overexpression) lead to erroneous conclusions regarding the nature of the mutant protein. For this reason, investigators cross mutated genes into the bacterial chromosome in place of the wild type gene, so that phenotypic comparisons to the wild type host can be made directly. These laborious substitutions are usually done one at a time.

Random mutagenesis of a chromosomal gene can instead be performed directly using λ Red/ET recombineering technology. De Lay and Cronan recently described mutagenesis of the acpP (acyl carrier protein) gene in the E. coli chromosome [94]. The authors derived three temperature-sensitive mutants in acpP using this technique (where no conditional lethal acpP mutants had been found previously). The process entails a combination of error-prone PCR mutagenesis [95], overlap extension PCR [96], and λ Red recombineering. In the first step, mutagenic PCR is carried out targeting the gene or region of interest. A second, nonmutagenic PCR is performed on an adjacent chromosomal sequence that contains a drug marker (previously introduced by recombineering). Primers for both PCRs are designed so that the products contain a 20 bp overlap, and fusion of the PCR products is carried out by overlap extension PCR. In the final step, the overlap extension product is electroporated into λ Red-expressing hosts. Mutant screens can then be applied to the drug-resistant transformants to identify chromosomal mutants in the gene of interest.

It is presumed, though not reported, that chromosomal random mutagenesis could be performed without the need to incorporate a selectable marker if the mutated PCR product were transformed into a mismatch repair (MMR)-deficient host (e.g., E. coli mutS) expressing the red and gam functions. In the absence of MMR, mutated sequences would be incorporated at high frequency, allowing screens or selections to be simply applied to the survivors following electroporation.
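
The fusion of a mutagenized gene fragment to a marker-containing fragment through a 20 bp overlap can be modeled in a few lines. The sketch below is a toy simulation, not the published protocol: the substitution-only error model, the per-base rate, and the choice to leave the primer-defined ends unmutated are all assumptions made for illustration.

```python
import random


def mutagenize(seq, rate, rng, protect=20):
    """Introduce random base substitutions at `rate` per base.

    The first and last `protect` bases are left unchanged, on the
    assumption that they are defined by the PCR primers.
    """
    out = list(seq)
    for i in range(protect, len(seq) - protect):
        if rng.random() < rate:
            out[i] = rng.choice([b for b in "ACGT" if b != seq[i]])
    return "".join(out)


def overlap_extension(product_a, product_b, overlap=20):
    """Fuse two PCR products that share `overlap` identical bases
    (the 3' end of A and the 5' end of B)."""
    if len(product_a) < overlap or len(product_b) < overlap:
        raise ValueError("products are shorter than the overlap")
    if product_a[-overlap:] != product_b[:overlap]:
        raise ValueError("3' end of A does not match 5' end of B")
    return product_a + product_b[overlap:]
```

In this model, each call to `mutagenize` followed by `overlap_extension` yields one member of the library of linear substrates; the drug marker carried on the second product is what allows transformants to be selected.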


7.2.9 Promoter Engineering

The protocol for chromosomal random mutagenesis described above could also be applied to promoters (and other regulatory elements) of genes within a metabolic pathway. However, a more systematic approach (for example, to increase metabolic flux or biological yield) would be to replace the promoters of key genes within a pathway with one or more variant promoters. This method of fine-tuning the genetic control of a metabolic process by altering the promoters of genes in the pathway is known as promoter engineering. In this scheme, libraries of artificial promoters are created and transferred to the chromosome to test their effects on the metabolic flux of a pathway of interest. Jensen and Hammer [97] constructed a library of artificial promoters (in Lactococcus lactis) by using PCR mutagenesis to vary the spacer region between the –10 and –35 regions of a consensus promoter, though unintended mutations in the consensus regions (in some of their clones) resulted in a library with promoter strengths covering three orders of magnitude. While initial characterization of these promoters was done with plasmid constructs, the real value of such a collection is evident when the promoters can be engineered into target genes on the chromosome, where their effects on metabolic flux and yield can be determined. This process can be tedious when done by the classical route of gene replacement via plasmid integration and resolution [98].

More recently, the λ Red recombination system has made this approach more feasible, allowing chromosomal promoters to be easily replaced with artificial constructs. Meynial-Salles et al. reported the construction of a small library of clones that express the lacZ gene from three short glucose isomerase promoters of different strengths [99]. They demonstrated in vivo β-galactosidase activities ranging from two-fold higher to four-fold lower than the level seen with the wild type lacZ locus. This group went on to show that other regulatory elements, such as the Shine-Dalgarno sequence (ribosome binding site) and mRNA stabilizing sequences, could be randomly modified to create chromosomal operons exhibiting an even wider range of lacZ expression. The authors also generated a library of E. coli clones that expressed glyceraldehyde-3-phosphate dehydrogenase at different levels, and subsequently screened for the clone with the highest level of glycerol production. A diagram for this type of protocol is presented in Figure 7.3g.

In a different study [100], Alper et al. used mutagenic PCR of λ PL driving GFP to generate an artificial promoter library. For chromosomal promoter delivery, overlap extension PCR was used to generate a fusion between a dozen selected promoter mutants and a selectable marker (kan), and λ Red recombineering was used to transfer the linear substrates onto the chromosome in place of two endogenous promoters (those of the ppc and dxs genes). In this study, using λ Red, they were able to transfer twelve well-defined promoters, which varied step-wise over the entire range of expression levels (325-fold when analyzed at the transcriptional level), to two different genes in three different hosts. The authors went on to find an optimum level of phosphoenolpyruvate carboxylase (ppc) expression for growth yield using glucose as the sole carbon source, as well as the optimum level of dxs expression for the production of lycopene in both wild type and an engineered strain of E. coli.
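
Choosing a small set of promoters that vary step-wise across the full range of a characterized library can be framed as picking the members closest to log-evenly spaced target strengths. The sketch below is one simple way to do this and is not the selection procedure used in the cited study; the function name, the greedy strategy, and the example strengths are hypothetical.

```python
import math


def pick_stepwise(strengths, k):
    """Pick k promoters whose strengths best match k log-evenly spaced
    targets between the weakest and strongest library members.

    `strengths` maps promoter name to a positive relative strength.
    """
    if k < 2 or k > len(strengths):
        raise ValueError("k must be between 2 and the library size")
    lo, hi = min(strengths.values()), max(strengths.values())
    if lo <= 0:
        raise ValueError("strengths must be positive")
    targets = [lo * (hi / lo) ** (i / (k - 1)) for i in range(k)]
    remaining = dict(strengths)
    chosen = []
    for target in targets:
        # Greedy choice: nearest remaining promoter on a log scale.
        name = min(remaining,
                   key=lambda n: abs(math.log(remaining[n] / target)))
        chosen.append(name)
        del remaining[name]
    return chosen
```

For a library spanning a 325-fold range, `k = 12` would give targets spaced by a factor of roughly 1.7, provided the library actually contains members near each target.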

7.2.10 Genome Reduction

One area of chromosomal engineering that will have a direct impact on metabolic engineering studies is the development of strains with reduced genome content. Bacteria possess many genes that play important roles for survival in the environment, but are dispensable when grown in the laboratory. Among these nonessential elements (imbedded throughout bacterial genomes) are cryptic prophages, transposons, insertion sequence elements, damaged genes, and genes of unknown function. Since E. coli and other bacteria are critical for the production of metabolites and proteins of commercial value, it is of interest to create strains that are deleted of as many genes as possible without hindering growth and metabolic efficiency. As noted by Kolisnychenko et al. [78], the simple reduction in the content of needless protein contaminants in the host makes purification of recombinant proteins that much easier. Also, metabolic energy is not wasted maintaining and expressing functions that are not useful,


and instead is directed more to the synthesis of the desired metabolite. Finally, genome reduction may decrease redundancies in gene function and regulatory mechanisms, making it easier to manipulate and design improvements in metabolic pathways.

Using Red/ET recombineering and RecA-mediated dsDNA break repair (see method described in Figure 7.4), the labs of Gyorgy Posfai and Frederick Blattner engineered a reduced E. coli genome by identifying and removing nonessential genes, mobile elements, and cryptic phage sequences [78,101]. They did this by comparing genomic sequences between E. coli K-12 and five other E. coli strains, and identifying DNA segments present in K-12 but absent from the other genomes. These segments were removed from E. coli K-12, generating a strain that had over 15% of its genome removed (708,267 bp; 743 genes). These multiple-deletion strains (MDS41–43) grew similarly to the starting strain MG1655, possessed a 100-fold increase in electroporation efficiency, and expressed recombinant proteins as well as MG1655. These strains were designed to be free of all insertion sequences (IS), which would be expected to decrease the number of spontaneous transpositional mutagenic events. The reduced genome resulted in a strain that displayed an increase in plasmid stability and a 20% drop in spontaneous mutation rates. Clearly, these attributes are attractive for strains to be used for metabolic engineering applications.

Other, nonrecombineering approaches have also been used to generate minimized genomes of E. coli K-12. Yu et al. [75] have generated and precisely mapped two libraries of independent transposon mutants (400 per library) using two different selection markers (Cam and Kan). The modified transposons contain loxP sites. Individual members from each library were selected and combined into one strain by P1 transduction. Subsequently, the region between the insertion sites was deleted by Cre recombinase. Using this method, this group generated an E. coli strain with a reduced genome containing a total deletion of 313 kbp (287 orfs). Similar to the reduced MDS strains described above, the cumulative-deletion mutant exhibited no growth defect when grown under standard laboratory conditions. Out of 13 deletions attempted, only six were obtained in this study. The failure to obtain the other seven suggests that these regions likely contain one or more essential functions. Thus, the inability to obtain a deletion of a chromosomal region offers an informative approach for studies designed to identify essential genes, and thus regions of bacterial chromosomes that may contain potential drug targets for antibiotic development.

B. subtilis has also been a target of genome reduction. Using plasmid-based chromosomal integration-excision systems, Westers et al. [102] made a strain that was deleted of two prophages, the prophage-like elements, and a polyketide synthase operon, together representing a 7.7% reduction (320 kb) of the B. subtilis chromosome. As these authors pointed out, the deletion of phage and phage-like elements removes potential lysins that might interfere with the growth of bacteria during industrial fermentations. Much like what was found for the other reduced genomes discussed above, the reduced genome of B. subtilis had no effect on growth rates and biomass yields. A further examination of the properties of this reduced-genome strain was performed using metabolic flux analysis, proteomics, and a variety of physiological assays. The study found that the parent and deleted strains of B. subtilis exhibited similar levels of carbon metabolism, protein secretion, competence, and sporulation, with some measurable changes in bacterial motility observed in the presence of different concentrations of agarose. It has yet to be demonstrated whether this deleted strain represents an improved strain for industrial fermentation.
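
The reduction figures quoted in this section follow from simple arithmetic on the deleted length and the parental genome size. The short sketch below makes that calculation explicit; the MG1655 genome size of roughly 4.64 Mb is an assumed round value that does not appear in the text, and the function names are hypothetical.

```python
MG1655_GENOME_BP = 4_640_000  # assumed approximate size of the E. coli K-12 MG1655 genome


def percent_reduction(deleted_bp, genome_bp):
    """Percentage of the parental genome removed."""
    if genome_bp <= 0 or not 0 <= deleted_bp <= genome_bp:
        raise ValueError("invalid genome or deletion size")
    return 100.0 * deleted_bp / genome_bp


def implied_genome_size(deleted_bp, percent):
    """Parental genome size implied by a deletion length and its
    reported percent reduction."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return deleted_bp * 100.0 / percent
```

Under the assumed genome size, `percent_reduction(708_267, MG1655_GENOME_BP)` is a little over 15, in line with the "over 15%" quoted for the MDS strains, and `implied_genome_size(320_000, 7.7)` gives about 4.2 Mb for the B. subtilis parent.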

7.3 Large Insertions

When considering the engineering of a metabolic pathway, it is often necessary to transfer all the genes of that pathway from an exogenous source into the target strain chromosome. Since genes within a metabolic pathway tend to be clustered, this process can involve the insertion of a very large DNA segment into the bacterial chromosome. As mentioned above, recombineering does not work efficiently with large DNA inserts (~5 kb and greater). However, a number of nonrecombineering approaches to the insertion of foreign DNA into the E. coli chromosome have been described, protocols that include the use of site-specific recombinases, transposons, and general homologous recombination.


A method for DNA insertion into the E. coli chromosome described by Haldimann and Wanner uses a series of plasmid-host systems for the introduction of multiple genes into the same cell at single copy [88]. This method should be useful for the development of strains for metabolic engineering that require multiple genes to be inserted, with various regulatory regions governing each of the inserts. The conditional-replication, integration, and modular (CRIM) plasmids are a series of plasmids that include the attB and attP sites for phages λ, HK022, φ80, P21, and P22, and that can be used in conjunction with a series of different promoters for ectopic expression of cloned inserts. The plasmids replicate in a pir-dependent manner (pir encodes the trans-acting Π protein for the R6K origin of replication), allowing these plasmids to be grown under either low or high copy-number conditions [103]. The integration events are governed by a series of Int (integrase) helper plasmids that promote integration of the attP-containing plasmids into the attB site of the chromosome of non-pir hosts (hosts that cannot support replication of the plasmids). Following integration, the inserts are stably maintained in the absence of antibiotic selection. The modular design of these plasmids allows different inserts to be placed at alternate positions in the chromosome, and expressed from various promoters.

An alternative method of chromosomal DNA insertion involves the use of the yeast Flp recombinase system. Huang et al. [64] have described the FLIRT system (Flp-mediated DNA integration and rearrangement at prearranged genomic targets). In this method, FRT sites are delivered to various locations in the E. coli genome by a modified Tn5 transposon. A plasmid that contains a cloned DNA insert (and TetR marker) flanked by FRT sites is processed by restriction enzyme digestion to remove the origin of replication. Following treatment of the linearized DNA with Flp in vitro, the circular product is purified and electroporated into E. coli containing the Flp-expressing plasmid pLH29. Site-specific recombination promotes insertion of the cloned DNA into the FRT site on the chromosome, generating TetR colonies. The targeted integrants are stable in the absence of Flp expression. Clearly, as an extension of this technology, Red/ET recombineering could be used to deliver FRT sites (and subsequently the insert) to any desired location in the chromosome.

A third method for chromosomal insertion takes advantage of the site-specific recombination mechanism of transposon Tn7 [104–107]. In this scheme, the DNA to be inserted is cloned into the multiple cloning site of a plasmid, where it is flanked by the left and right ends of Tn7. The nonreplicative plasmid is then transformed, conjugated, or electroporated into a target bacterium, along with a helper plasmid expressing the Tn7 transposase functions. The cloned insert is transposed to the attTn7 attachment site, a region downstream of the glmS gene (encoding glucosamine-6-phosphate synthetase). Insertion into this site does not interfere with any gene function, and thus it is a neutral site for insertion elements. A broad-host-range Tn7-based system, utilized in E. coli, Pseudomonas aeruginosa, Pseudomonas putida, and Yersinia pestis, has been described [108]. A system based on mini-Tn5 transposition has also been described [109], but in this case the transposition is random, and thus requires characterization of the insertion site prior to analysis of the transformant. This system is more applicable to insertional mutagenesis and promoter probing applications.

In the most recent description of the Tn7 method [107], the mini-Tn7 plasmid pGRG25 contains a multiple cloning site flanked by Tn7 ends, the tnsABCD genes under control of the PBAD promoter, araC (for controlled expression of tnsABCD), the bla gene (for selection with ampicillin), and a temperature-sensitive origin of replication (allowing for curing of the plasmid following the transposition event). Thus no helper plasmid is required. Following cloning of the insert into pGRG25 and transformation of the plasmid into E. coli, a cell population is grown up at 32°C in the presence of ampicillin. The culture is then grown nonselectively to saturation in the presence of arabinose to induce transposition, then plated out at 42°C for colonies. Colonies are screened for the loss of the temperature-sensitive plasmid (AmpS) and for the presence of the insertion by PCR across the Tn7 insertion site (attTn7). The frequency of insertion is reported to be as high as 79%, making it the most efficient method of transferring insertions to the E. coli chromosome. Given this high frequency of insertion, a selection scheme is not required to identify insertion events. The high frequency of this system likely comes from the outgrowth of the culture following transformation of the mini-Tn7 plasmid, ensuring that a majority of the cells will be subject to the transposition reaction.
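
Why a 79% insertion frequency removes the need for selection can be seen from a simple screening calculation. Assuming that colonies are independent and that each carries the desired event with probability f, the number n of colonies needed to find at least one positive with probability P is the smallest integer satisfying 1 − (1 − f)^n ≥ P. The sketch below implements this formula; the independence assumption and the function name are illustrative and do not come from the cited work.

```python
import math


def colonies_to_screen(frequency, confidence=0.95):
    """Smallest number of colonies to screen so that at least one carries
    the desired event with probability >= `confidence`, assuming each
    colony is an independent trial with success probability `frequency`."""
    if not 0 < frequency < 1:
        raise ValueError("frequency must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - frequency))
```

At f = 0.79, a handful of colonies checked by PCR across attTn7 is sufficient, whereas events occurring at 1 in 100 or 1 in 1,000 would require hundreds to thousands of colonies, which is the regime in which a selection or an efficient screen becomes necessary.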


A method for inserting very large regions of exogenous DNA (~100 kb) into the E. coli chromosome that takes advantage of the endogenous RecA pathway of recombination has been described [110]. Two large regions (5–6 kb) of defined E. coli sequence were cloned into a BAC. The two genetic segments, the gpt region (~5.5 kb) and the lacI region (~6.5 kb), are separated by a polylinker for cloning exogenous DNA and the cat gene for genetic selection of the insertion. Following random cloning of L. lactis DNA into the cloning site, two clones were selected for transfer of the inserts to the region between gpt and lacI in E. coli, a nonessential region of the E. coli chromosome (~110 kb). Following linearization of the clones and electroporation into E. coli, transformants were selected (and verified) that had transferred either 20 kb (17 orfs) or 11 kb (12 orfs) of L. lactis DNA into the E. coli chromosome. A further development of this protocol might include construction of an E. coli strain that has a counter-selectable marker already in place of the chromosomal gpt-lacI intergenic region (e.g., a ∆gpt-lacI::cat-sacB strain). One then would not need to incorporate a selectable marker into the BAC clone to select for the insert.

7.4 Applications of Red/ET Recombineering Technology for Metabolic Engineering

Red/ET recombineering is now the preferred method for making gene knockouts in E. coli, simplifying the generation of bacterial mutants. The simplicity, speed, and accuracy of the method virtually ensure finding the desired chromosomal alteration in most cases. In fact, this technology has been used to generate a knockout of every nonessential gene in the E. coli chromosome [72]. In more recent developments, Red/ET recombineering has been carried out in a 96-well format for the construction of conditional knockout targeting vectors for the generation of mutant mice [111], and has been used to fluorescently label numerous genes within the E. coli genome for visualization of the E. coli proteome [112]. Red/ET technology is also moving into the field of bacterial pathogenesis, allowing investigators to make gene insertions and deletions in a day or two that would normally have taken two or more weeks [45,113].

For metabolic engineering, recombineering is now being recognized as a critical tool in the search for novel pharmaceutical compounds, where, for example, one might want to manipulate large biosynthetic gene clusters for expression in heterologous hosts, or for greater flexibility in the use of combinatorial biosynthetic technologies. Many of the bacterial species used by metabolic engineers possess genetic systems that are either time-consuming (e.g., Pseudomonas) or nonexistent (Mycobacteria). Many investigators are cloning the DNA of interest into BAC vectors, manipulating its genes using Red/ET recombineering in E. coli, then incorporating mobilization determinants into these vectors for transfer to the original (or heterologous) recipients. A good example of this gene replacement method was described by Gust et al. [52], in which cosmid clones of Streptomyces coelicolor DNA were modified in E. coli by Red/ET recombineering. The inclusion of FRT sites flanking the drug marker allowed marker-free in-frame deletions of the gene of interest to be generated. The clone contained an origin of transfer (oriT) that allowed the investigators to transfer the modified DNA into S. coelicolor, where it was integrated into the chromosome. This method has been used to generate numerous gene deletions in S. coelicolor.

The value of recombineering for metabolic engineering is also evident in a report by R. Müller and colleagues [114]. In their study, the authors were interested in cloning a large biosynthetic gene cluster from the myxobacterium Stigmatella aurantiaca and expressing it in Pseudomonas putida. The gene cluster, which encodes a PKS/NRPS hybrid that synthesizes various forms of the cyclic peptide myxochromide S, was first cloned into a BAC vector. The vector was then modified by Red/ET recombineering to insert the following elements: oriT (for transfer to P. putida), a tetracycline-resistance gene (for selection in P. putida), and the P. putida trpE gene (for integration of the vector into the P. putida chromosome by homologous recombination). These steps were followed by the addition of a (missing) thioesterase domain to one of the nonribosomal peptide synthetases (NPSs), and the insertion of the toluic acid-inducible promoter (Pm) to drive expression of the myxochromide S gene cluster.

These manipulations were carried out by the methods described above in Section 7.2 (and outlined in Figure 7.3). The final construct (43 kbp), containing the complete biosynthetic cluster expressed from Pm, was transferred to P. putida by triparental conjugation [115] and integrated into the chromosome. The P. putida strain containing the myxochromide S gene cluster generated a maximum yield of myxochromide S fivefold greater than that of the natural producer, S. aurantiaca. Manipulating such large DNA regions with conventional restriction enzyme technology would have been extremely difficult, if not impossible. With Red/ET recombineering, the modifications were made easily and quickly, allowing a novel approach to heterologous expression of synthetic genes in pseudomonads.

In another study, Eustaquio et al. [116] used Red recombineering to place the attachment site (attP) and integrase gene (int) of phage ΦC31 into cosmids containing the biosynthetic gene clusters of novobiocin (from Streptomyces spheroides) and clorobiocin (from Streptomyces roseochromogenes), enabling them to place these constructs into the genomes of the heterologous hosts S. coelicolor and S. lividans (hosts that can be easily genetically manipulated). In E. coli, the cosmids were modified by λ Red to remove nonessential regions of both gene clusters prior to heterologous expression.

The use of Red/ET recombineering for combinatorial biosynthesis has recently been reported. Nguyen et al. [117] generated a library of novel daptomycin derivatives using λ Red technology. Daptomycin, used for the treatment of skin infections caused by gram-positive pathogens, is a cyclic anionic 13-amino acid lipopeptide produced in Streptomyces roseosporus by an NPS. Using λ Red recombineering, these authors exchanged modules of the DptBC subunit, replaced them with other NRPSs, combined them with DptD subunit modules, inactivated a Glu12 methylase activity, and generated variations of the lipid tail, yielding 30 novel NRPS biosynthetic pathways. Though none of the active antibiotics produced were superior to daptomycin, several were equally potent. This study led to an understanding of which structural features of daptomycin are important for its clinical potency, and demonstrated the usefulness of combinatorial biosynthesis for the generation of antibiotic derivatives that cannot be effectively generated by chemical modification.

Red/ET recombineering technology has also been used to make simple gene knockouts or replacements in a variety of metabolic engineering applications. Kim et al. [118] describe a method for the rapid and efficient generation of a library of hybrid type I polyketide synthases for pikromycin production. In this scheme, the plasmid sequence encoding the loading domain (ATo-ACPo) was replaced with sacB, which was subsequently replaced by a library of chimeric loading domain fragments via Red/ET recombineering. These authors described a rapid, precise, and efficient integration of shuffled DNA into a PKS complementation plasmid that generated a library of modified polyketide synthases. Other investigators have reported using Red/ET recombineering for engineering of the geldanamycin biosynthesis pathway [119], the engineering of a biofilm-deficient E. coli strain for biotechnology applications [120], and the optimization of biomass yield of E. coli using inverse metabolic engineering [121]. Clearly, the Red/ET recombineering system is an efficient, simple, and versatile tool for metabolic engineers, allowing investigators to make changes in large gene clusters that could not easily have been made with classical techniques of gene manipulation.

7.5 Summary

The insertion, knockout, and replacement of gene sequences have always been the cornerstone of understanding the biochemistry and physiology of microorganisms. A myriad of antibiotic markers, counter-selection schemes, phage-integration sites, and transposons have been developed over the past few decades to remove, introduce, or rearrange genes (and operons) in the bacterial chromosome. These same techniques of genetic manipulation are now being used by the metabolic engineer to develop strains that are more fit, less toxic, and rationally designed for production of a particular metabolite of interest. More recently, the development of the highly efficient Red/ET recombineering technology, together with the use of the SSRs Cre and Flp, has given the investigator/engineer an almost limitless ability to modify bacterial genomes. In particular, recombineering changes the way research is performed by simplifying the methods of chromosomal manipulation (i.e., use of a PCR product rather than a plasmid construct), encouraging the investigator to create DNA constructs of increasing complexity in a shorter amount of time. The use of E. coli-borne BAC vectors to house the genetic elements of bacteria of interest to the metabolic engineer, and the subsequent modification of these sequences for transfer to the original or a heterologous host, should become a standard method for metabolic engineering of bacterial chromosomes. In time, it is possible that phage elements within the metagenomic universe will be found to contain Red or RecET-like recombination systems that can be used directly to recombineer the chromosomes of non-E. coli bacterial hosts used in metabolic engineering.

References

1. Baudin, A., et al. A simple and efficient method for direct gene deletion in Saccharomyces cerevisiae. Nucleic Acids Res., 21, 3329, 1993.
2. Lorenz, M.C., et al. Gene disruption with PCR products in Saccharomyces cerevisiae. Gene, 158, 113, 1995.
3. Wach, A., et al. New heterologous modules for classical or PCR-based gene disruptions in Saccharomyces cerevisiae. Yeast, 10, 1793, 1994.
4. Jasin, M. and P. Schimmel. Deletions of an essential gene in Escherichia coli by site-specific recombination with linear DNA fragments. J. Bacteriol., 159, 783, 1984.
5. Marinus, M.G., et al. Insertion mutations in the dam gene of Escherichia coli K-12. Mol. Gen. Genet., 191, 288, 1983.
6. Russell, C.B., D.S. Thaler, and F.W. Dahlquist. Chromosomal transformation of Escherichia coli recD strains with linearized plasmids. J. Bacteriol., 171, 2609, 1989.
7. Winans, S.C., et al. Site-directed insertion and deletion mutagenesis with cloned fragments in Escherichia coli. J. Bacteriol., 161, 1219, 1985.
8. Spies, M. and S.C. Kowalczykowski. Homologous recombination by RecBCD and RecF pathways. In The Bacterial Chromosome, N.P. Higgins, Editor. ASM Press: Washington, DC, 2005, 389.
9. Kuzminov, A. Recombinational repair of DNA damage in Escherichia coli and bacteriophage lambda. Microbiol. Mol. Biol. Rev., 63 (4), 751, 1999.
10. Dabert, P. and G.R. Smith. Gene replacement with linear DNA fragments in wild type Escherichia coli: enhancement by chi sites. Genetics, 145, 877, 1997.
11. Murphy, K.C. Use of bacteriophage λ recombination functions to promote gene replacement in Escherichia coli. J. Bacteriol., 180, 2063, 1998.
12. Zhang, Y., et al. A new logic for DNA engineering using recombination in Escherichia coli. Nat. Genet., 20 (2), 123, 1998.
13. Zhang, Y., et al. DNA cloning by homologous recombination in Escherichia coli. Nat. Biotechnol., 18 (12), 1314, 2000.
14. Yu, D., et al. An efficient recombination system for chromosome engineering in Escherichia coli. Proc. Natl. Acad. Sci. USA, 97 (11), 5978, 2000.
15. Datsenko, K.A. and B.L. Wanner. One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc. Natl. Acad. Sci. USA, 97 (12), 6640, 2000.
16. Murphy, K.C., K.G. Campellone, and A.R. Poteete. PCR-mediated gene replacement in Escherichia coli. Gene, 246, 321, 2000.
17. Muyrers, J.P., Y. Zhang, and A.F. Stewart. ET-cloning: think recombination first. Genet. Eng. (NY), 22, 77, 2000.
18. Copeland, N.G., N.A. Jenkins, and D.L. Court. Recombineering: a powerful new tool for mouse functional genomics. Nat. Rev. Genet., 2 (10), 769, 2001.
19. Muyrers, J.P., et al. Rapid modification of bacterial artificial chromosomes by ET-recombination. Nucleic Acids Res., 27 (6), 1555, 1999.

20. Signer, E.R. and J. Weil. Recombination in bacteriophage λ. Mutants defective in general recombination. J. Mol. Biol., 34, 261, 1968.
21. Cassuto, E. and C.M. Radding. Mechanism for the action of lambda exonuclease in genetic recombination. Nat. New Biol., 229 (1), 13, 1971.
22. Kovall, R. and B.W. Matthews. Toroidal structure of lambda-exonuclease. Science, 277 (5333), 1824, 1997.
23. Kmiec, E. and W.K. Holloman. Beta protein of bacteriophage lambda promotes renaturation of DNA. J. Biol. Chem., 256 (24), 12636, 1981.
24. Muniyappa, K. and C.M. Radding. The homologous recombination system of phage lambda. Pairing activities of beta protein. J. Biol. Chem., 261 (16), 7472, 1986.
25. Li, Z., et al. The beta protein of phage lambda promotes strand exchange. J. Mol. Biol., 276 (4), 733, 1998.
26. Passy, S.I., et al. Rings and filaments of beta protein from bacteriophage lambda suggest a superfamily of recombination proteins. Proc. Natl. Acad. Sci. USA, 96 (8), 4279, 1999.
27. Murphy, K.C. The lambda Gam protein inhibits RecBCD binding to dsDNA ends. J. Mol. Biol., 371 (1), 19, 2007.
28. Karu, A.E., et al. The gamma protein specified by bacteriophage lambda. Structure and inhibitory activity for the recBC enzyme of Escherichia coli. J. Biol. Chem., 250 (18), 7377, 1975.
29. Murphy, K.C. Lambda Gam protein inhibits the helicase and chi-stimulated recombination activities of Escherichia coli RecBCD enzyme. J. Bacteriol., 173 (18), 5808, 1991.
30. Iyer, L.M., E.V. Koonin, and L. Aravind. Classification and evolutionary history of the single-strand annealing proteins, RecT, Redbeta, ERF and RAD52. BMC Genomics, 3 (1), 8, 2002.
31. Gottesman, M.M., et al. Characterization of bacteriophage lambda reverse as an Escherichia coli phage carrying a unique set of host-derived recombination functions. J. Mol. Biol., 88 (2), 471, 1974.
32. Kushner, S.R., H. Nagaishi, and A.J. Clark. Isolation of exonuclease VIII: the enzyme associated with sbcA indirect suppressor. Proc. Natl. Acad. Sci. USA, 71 (9), 3593, 1974.
33. Thaler, D.S., M.M. Stahl, and F.W. Stahl. Double-chain-cut sites are recombination hotspots in the Red pathway of phage λ. J. Mol. Biol., 195, 75, 1987.
34. Stahl, F.W., et al. Break-join recombination in phage λ. Genetics, 125, 463, 1990.
35. Poteete, A.R. and A.C. Fenton. Efficient double-strand break-stimulated recombination promoted by the general recombination systems of phages λ and P22. Genetics, 134, 1013, 1993.
36. Hill, S.A., M.M. Stahl, and F.W. Stahl. Single-strand DNA intermediates in phage λ's Red recombination pathway. Proc. Natl. Acad. Sci. USA, 94, 2951, 1997.
37. Stahl, M.M., et al. Annealing vs. invasion in phage lambda recombination. Genetics, 147 (3), 961, 1997.
38. Wackernagel, W. and C.M. Radding. Transfection by half-molecules and inverted molecules of λ DNA: requirement for exo and β-promoted recombination. Virology, 52, 425, 1973.
39. Poteete, A.R. Involvement of DNA replication in phage lambda Red-mediated recombination. Mol. Microbiol., 68 (1), 66, 2008.
40. Ellis, H.M., et al. High efficiency mutagenesis, repair, and engineering of chromosomal DNA using single-stranded oligonucleotides. Proc. Natl. Acad. Sci. USA, 98, 6742, 2001.
41. Huen, M.S., et al. The involvement of replication in single stranded oligonucleotide-mediated gene repair. Nucleic Acids Res., 34 (21), 6183, 2006.
42. Stahl, F.W. Recombination in phage λ: one geneticist's historical perspective. Gene, 223, 95, 1998.
43. Oliner, J.D., K.W. Kinzler, and B. Vogelstein. In vivo cloning of PCR products in E. coli. Nucleic Acids Res., 21 (22), 5192, 1993.
44. Zhang, Y., et al. Phage annealing proteins promote oligonucleotide-directed mutagenesis in Escherichia coli and mouse ES cells. BMC Mol. Biol., 4 (1), 1, 2003.

45. Murphy, K. and K. Campellone. Lambda Red-mediated recombinogenic engineering of enterohemorrhagic and enteropathogenic E. coli. BMC Mol. Biol., 4, 11, 2003.
46. Lee, E.C., et al. A highly efficient Escherichia coli-based chromosome engineering system adapted for recombinogenic targeting and subcloning of BAC DNA. Genomics, 73 (1), 56, 2001.
47. Costantino, N. and D. Court. Enhanced levels of lambda Red-mediated recombinants in mismatch repair mutants. Proc. Natl. Acad. Sci. USA, 100, 15748, 2003.
48. Sergueev, K., et al. Cell toxicity caused by products of the p(L) operon of bacteriophage lambda. Gene, 272 (1–2), 227, 2001.
49. Datta, S., N. Costantino, and D.L. Court. A set of recombineering plasmids for gram-negative bacteria. Gene, 379, 109, 2006.
50. Wang, J., et al. An improved recombineering approach by adding RecA to lambda Red recombination. Mol. Biotechnol., 32 (1), 43, 2006.
51. Derbise, A., et al. A rapid and simple method for inactivating chromosomal genes in Yersinia. FEMS Immunol. Med. Microbiol., 38 (2), 113, 2003.
52. Gust, B., et al. PCR-targeted Streptomyces gene replacement identifies a protein domain needed for biosynthesis of the sesquiterpene soil odor geosmin. Proc. Natl. Acad. Sci. USA, 100 (4), 1541, 2003.
53. Yu, D., et al. Recombineering with overlapping single-stranded DNA oligonucleotides: testing a recombination intermediate. Proc. Natl. Acad. Sci. USA, 100, 7207, 2003.
54. Modrich, P. Methyl-directed DNA mismatch correction. J. Biol. Chem., 264 (12), 6597, 1989.
55. Lobner-Olesen, A., O. Skovgaard, and M.G. Marinus. Dam methylation: coordinating cellular processes. Curr. Opin. Microbiol., 8 (2), 154, 2005.
56. Yang, Y. and S.K. Sharan. A simple two-step, ‘hit and fix’ method to generate subtle mutations in BACs using short denatured PCR fragments. Nucleic Acids Res., 31 (15), e80, 2003.
57. Gay, P., et al. Cloning structural gene sacB, which codes for exoenzyme levansucrase of Bacillus subtilis: expression of the gene in Escherichia coli. J. Bacteriol., 153, 1424, 1983.
58. Ried, J.L. and A. Collmer. An nptI-sacB-sacR cartridge for constructing directed, unmarked mutations in Gram-negative bacteria by marker exchange-eviction mutagenesis. Gene, 57, 239, 1987.
59. Warming, S., et al. Simple and highly efficient BAC recombineering using galK selection. Nucleic Acids Res., 33 (4), e36, 2005.
60. Wong, Q.N., et al. Efficient and seamless DNA recombineering using a thymidylate synthase A selection system in Escherichia coli. Nucleic Acids Res., 33 (6), e59, 2005.
61. Russell, C.B. and F.W. Dahlquist. Exchange of chromosomal and plasmid alleles in Escherichia coli by selection for loss of a dominant antibiotic sensitivity marker. J. Bacteriol., 171, 2614, 1989.
62. Cox, M.M. The FLP protein of the yeast 2-microns plasmid: expression of a eukaryotic genetic recombination system in Escherichia coli. Proc. Natl. Acad. Sci. USA, 80 (14), 4223, 1983.
63. Huang, L.C., E.A. Wood, and M.M. Cox. A bacterial model system for chromosomal targeting. Nucleic Acids Res., 19 (3), 443, 1991.
64. Huang, L.C., E.A. Wood, and M.M. Cox. Convenient and reversible site-specific targeting of exogenous DNA into a bacterial chromosome by use of the FLP recombinase: the FLIRT system. J. Bacteriol., 179 (19), 6076, 1997.
65. Siegal, M.L. and D.L. Hartl. Application of Cre/loxP in Drosophila. Site-specific recombination and transgene coplacement. Methods Mol. Biol., 136, 487, 2000.
66. Schweizer, H.P. Applications of the Saccharomyces cerevisiae Flp-FRT system in bacterial genetics. J. Mol. Microbiol. Biotechnol., 5 (2), 67, 2003.
67. Gilbertson, L. Cre-lox recombination: Cre-ative tools for plant biotechnology. Trends Biotechnol., 21 (12), 550, 2003.
68. Branda, C.S. and S.M. Dymecki. Talking about a revolution: The impact of site-specific recombinases on genetic analyses in mice. Dev. Cell, 6 (1), 7, 2004.

69. Schnutgen, F., et al. Engineering embryonic stem cells with recombinase systems. Methods Enzymol., 420, 100, 2006.
70. Grindley, N.D., K.L. Whiteson, and P.A. Rice. Mechanisms of site-specific recombination. Ann. Rev. Biochem., 75, 567, 2006.
71. Campellone, K.G., D. Robbins, and J.M. Leong. EspFU is a translocated EHEC effector that interacts with Tir and N-WASP and promotes Nck-independent actin assembly. Dev. Cell, 7 (2), 217, 2004.
72. Baba, T., et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol., 2, 2006.0008, 2006.
73. Ayres, E.K., et al. Precise deletions in large bacterial genomes by vector-mediated excision (VEX). The trfA gene of promiscuous plasmid RK2 is essential for replication in several gram-negative hosts. J. Mol. Biol., 230 (1), 174, 1993.
74. Yu, B.J., et al. Minimization of the Escherichia coli genome using a Tn5-targeted Cre/loxP excision system. Nat. Biotechnol., 20 (10), 1018, 2002.
75. Fukiya, S., H. Mizoguchi, and H. Mori. An improved method for deleting large regions of Escherichia coli K-12 chromosome using a combination of Cre/loxP and lambda Red. FEMS Microbiol. Lett., 234 (2), 325, 2004.
76. Posfai, G., et al. Markerless gene replacement in Escherichia coli stimulated by a double-strand break in the chromosome. Nucleic Acids Res., 27 (22), 4409, 1999.
77. Kolisnychenko, V., et al. Engineering a reduced Escherichia coli genome. Genome Res., 12 (4), 640, 2002.
78. Tischer, B.K., et al. Two-step red-mediated recombination for versatile high-efficiency markerless DNA manipulation in Escherichia coli. Biotechniques, 40 (2), 191, 2006.
79. Schmid, M.B. and J.R. Roth. Genetic methods for analysis and manipulation of inversion mutations in bacteria. Genetics, 105 (3), 517, 1983.
80. Segall, A., M.J. Mahan, and J.R. Roth. Rearrangement of the bacterial chromosome: forbidden inversions. Science, 241 (4871), 1314, 1988.
81. Galitski, T. and J.R. Roth. Pathways for homologous recombination between chromosomal direct repeats in Salmonella typhimurium. Genetics, 146 (3), 751, 1997.
82. Poteete, A.R., A.C. Fenton, and A. Nadkarni. Chromosomal duplications and cointegrates generated by the bacteriophage lambda Red system in Escherichia coli K-12. BMC Mol. Biol., 5 (1), 22, 2004.
83. Hand, N.J. and T.J. Silhavy. A practical guide to the construction and use of lac fusions in Escherichia coli. Methods Enzymol., 326, 11, 2000.
84. Silhavy, T.J. and J.R. Beckwith. Uses of lac fusions for the study of biological problems. Microbiol. Rev., 49 (4), 398, 1985.
85. Simons, R.W., F. Houman, and N. Kleckner. Improved single and multicopy lac-based cloning vectors for protein and operon fusions. Gene, 53 (1), 85, 1987.
86. Slauch, J.M. and T.J. Silhavy. Genetic fusions as experimental tools. Methods Enzymol., 204, 213, 1991.
87. Platt, R., et al. Genetic system for reversible integration of DNA constructs and lacZ gene fusions into the Escherichia coli chromosome. Plasmid, 43 (1), 12, 2000.
88. Haldimann, A. and B.L. Wanner. Conditional-replication, integration, excision, and retrieval plasmid-host systems for gene structure-function studies of bacteria. J. Bacteriol., 183 (21), 6384, 2001.
89. Uzzau, S., et al. Epitope tagging of chromosomal genes in Salmonella. Proc. Natl. Acad. Sci. USA, 98 (26), 15264, 2001.
90. Ellermeier, C.D., A. Janakiraman, and J.M. Slauch. Construction of targeted single copy lac fusions using lambda Red and FLP-mediated site-specific recombination in bacteria. Gene, 290 (1–2), 153, 2002.

91. Hautefort, I., M.J. Proenca, and J.C. Hinton. Single-copy green fluorescent protein gene fusions allow accurate measurement of Salmonella gene expression in vitro and during infection of mammalian cells. Appl. Environ. Microbiol., 69 (12), 7480, 2003.
92. Dolphin, C.T. and I.A. Hope. Caenorhabditis elegans reporter fusion genes generated by seamless modification of large genomic DNA clones. Nucleic Acids Res., 34 (9), e72, 2006.
93. Gerlach, R.G., et al. Rapid engineering of bacterial reporter gene fusions by using Red recombination. Appl. Environ. Microbiol., 73 (13), 4234, 2007.
94. De Lay, N.R. and J.E. Cronan. Gene-specific random mutagenesis of Escherichia coli in vivo: isolation of temperature-sensitive mutations in the acyl carrier protein of fatty acid synthesis. J. Bacteriol., 188 (1), 287, 2006.
95. Cadwell, R.C. and G.F. Joyce. Randomization of genes by PCR mutagenesis. PCR Methods Appl., 2 (1), 28, 1992.
96. Horton, R.M., et al. Engineering hybrid genes without the use of restriction enzymes: gene splicing by overlap extension. Gene, 77 (1), 61, 1989.
97. Jensen, P.R. and K. Hammer. The sequence of spacers between the consensus sequences modulates the strength of prokaryotic promoters. Appl. Environ. Microbiol., 64 (1), 82, 1998.
98. Solem, C. and P.R. Jensen. Modulation of gene expression made easy. Appl. Environ. Microbiol., 68 (5), 2397, 2002.
99. Meynial-Salles, I., M.A. Cervin, and P. Soucaille. New tool for metabolic pathway engineering in Escherichia coli: one-step method to modulate expression of chromosomal genes. Appl. Environ. Microbiol., 71 (4), 2140, 2005.
100. Alper, H., et al. Tuning genetic control through promoter engineering. Proc. Natl. Acad. Sci. USA, 102 (36), 12678, 2005.
101. Posfai, G., et al. Emergent properties of reduced-genome Escherichia coli. Science, 312 (5776), 1044, 2006.
102. Westers, H., et al. Genome engineering reveals large dispensable regions in Bacillus subtilis. Mol. Biol. Evol., 20 (12), 2076, 2003.
103. Metcalf, W.W., W. Jiang, and B.L. Wanner. Use of the rep technique for allele replacement to construct new Escherichia coli hosts for maintenance of R6K gamma origin plasmids at different copy numbers. Gene, 138 (1–2), 1, 1994.
104. Bao, Y., et al. An improved Tn7-based system for the single-copy insertion of cloned genes into chromosomes of gram-negative bacteria. Gene, 109 (1), 167, 1991.
105. Koch, B., L.E. Jensen, and O. Nybroe. A panel of Tn7-based vectors for insertion of the gfp marker gene or for delivery of cloned DNA into Gram-negative bacteria at a neutral chromosomal site. J. Microbiol. Methods, 45 (3), 187, 2001.
106. Choi, K.H., et al. A Tn7-based broad-range bacterial cloning and expression system. Nat. Methods, 2 (6), 443, 2005.
107. McKenzie, G.J. and N.L. Craig. Fast, easy and efficient: site-specific insertion of transgenes into enterobacterial chromosomes using Tn7 without need for selection of the insertion event. BMC Microbiol., 6, 39, 2006.
108. Choi, K.H. and H.P. Schweizer. An improved method for rapid generation of unmarked Pseudomonas aeruginosa deletion mutants. BMC Microbiol., 5 (1), 30, 2005.
109. de Lorenzo, V., et al. Mini-Tn5 transposon derivatives for insertion mutagenesis, promoter probing, and chromosomal insertion of cloned DNA in gram-negative eubacteria. J. Bacteriol., 172 (11), 6568, 1990.
110. Rong, R., et al. Engineering large fragment insertions into the chromosome of Escherichia coli. Gene, 336 (1), 73, 2004.
111. Chan, W., et al. A recombineering based approach for high-throughput conditional knockout targeting vector construction. Nucleic Acids Res., 35 (8), e64, 2007.

112. Watt, R.M., et al. Visualizing the proteome of Escherichia coli: an efficient and versatile method for labeling chromosomal coding DNA sequences (CDSs) with fluorescent protein genes. Nucleic Acids Res., 35 (6), e37, 2007.
113. van Kessel, J.C. and G.F. Hatfull. Recombineering in Mycobacterium tuberculosis. Nat. Methods, 4 (2), 147, 2007.
114. Wenzel, S.C., et al. Heterologous expression of a myxobacterial natural products assembly line in Pseudomonas via red/ET recombineering. Chem. Biol., 12 (3), 349, 2005.
115. Hill, D.S., et al. Cloning of genes involved in the synthesis of pyrrolnitrin from Pseudomonas fluorescens and role of pyrrolnitrin synthesis in biological control of plant disease. Appl. Environ. Microbiol., 60 (1), 78, 1994.
116. Eustaquio, A.S., et al. Heterologous expression of novobiocin and clorobiocin biosynthetic gene clusters. Appl. Environ. Microbiol., 71 (5), 2452, 2005.
117. Nguyen, K.T., et al. Combinatorial biosynthesis of novel antibiotics related to daptomycin. Proc. Natl. Acad. Sci. USA, 103 (46), 17462, 2006.
118. Kim, B.S., D.H. Sherman, and K.A. Reynolds. An efficient method for creation and functional analysis of libraries of hybrid type I polyketide synthases. Protein Eng. Des. Sel., 17 (3), 277, 2004.
119. Vetcher, L., et al. Rapid engineering of the geldanamycin biosynthesis pathway by Red/ET recombination and gene complementation. Appl. Environ. Microbiol., 71 (4), 1829, 2005.
120. Sung, B.H., et al. Development of a biofilm production-deficient Escherichia coli strain as a host for biotechnological applications. Appl. Environ. Microbiol., 72 (5), 3336, 2006.
121. Trinh, C.T., et al. Design, construction and performance of the most efficient biomass producing E. coli bacterium. Metab. Eng., 8 (6), 628, 2006.
122. Poteete, A.R. and A. Fenton. λ red dependent growth and recombination of phage P22. Virology, 134, 161, 1984.
123. Chaveroche, M., J. Ghigo, and C. d’Enfert. A rapid method for efficient gene replacement in the filamentous fungus Aspergillus nidulans. Nucleic Acids Res., 28, E97, 2000.
124. Kovach, M.E., et al. Four new derivatives of the broad-host-range cloning vector pBBR1MCS, carrying different antibiotic-resistance cassettes. Gene, 166, 175, 1995.
125. Cherepanov, P.P. and W. Wackernagel. Gene disruption in Escherichia coli: TcR and KmR cassettes with the option of Flp-catalyzed excision of the antibiotic-resistance determinant. Gene, 158, 9, 1995.
126. Voziyanov, Y., A.F. Stewart, and M. Jayaram. A dual reporter screening system identifies the amino acid at position 82 in Flp site-specific recombinase as a determinant for target specificity. Nucleic Acids Res., 30, 1656, 2002.
127. Buchholz, F., et al. Different thermostabilities of FLP and Cre recombinases: implications for applied site-specific recombination. Nucleic Acids Res., 24, 4256, 1996.
128. Palmeros, B., et al. A family of removable cassettes designed to obtain antibiotic-resistance-free genomic modifications of Escherichia coli and other bacteria. Gene, 247, 255, 2000.
129. Bigger, B.W., et al. An araC-controlled bacterial cre expression system to produce DNA minicircle vectors for nuclear and mitochondrial gene therapy. J. Biol. Chem., 276, 23018, 2001.

8 Regulating Gene Expression through Engineered RNA Technologies

Maung Nyan Win, California Institute of Technology
Christina D. Smolke, California Institute of Technology

8.1 Introduction
8.2 Basic RNA Regulatory Elements
RNA Elements That Direct Translation Initiation • RNA Elements That Direct RNase Activity • RNA Elements That Direct Processing and Degradation of Transcripts through Direct Cleavage • RNA Elements That Regulate Gene Expression through Antisense Mechanisms • RNA Elements That Regulate Gene Expression through the RNAi Pathway
8.3 RNA Sensory Elements
RNA Elements That Serve as Thermosensors • RNA Elements That Bind Nucleic Acids • RNA Elements That Bind Molecular Ligands
8.4 Natural and Engineered Riboswitches
General Composition and Conformational Dynamics of Riboswitches • Mechanisms of Ligand-Controlled Gene Regulation by Riboswitches • Riboswitch Targets and Implementation in Metabolic Networks • Synthetic Riboswitches That Control Target Gene Expression Levels
8.5 Applications of RNA Control Elements in Metabolic Network Engineering
8.6 Enabling Technologies in Support of Constructing Integrated Network and Control Systems
8.7 Future Applications of Advanced RNA-Based Control Systems
8.8 Conclusions
References

8.1 Introduction

Much effort has traditionally been directed to the development of organic synthesis methods for the production of valuable chemicals and small-molecule pharmaceuticals. However, despite the significant achievements in this field, many compounds remain difficult to synthesize through such means. Advances in enzyme and biochemical pathway engineering have made biological synthesis a viable alternative to chemical synthesis for certain compounds. Cellular biosynthesis strategies have several advantages over traditional chemical synthesis strategies: they are often conducted under less harsh conditions, thereby enabling “green” synthesis strategies that are associated with the production of fewer toxic byproducts. In addition, cellular biosynthesis takes advantage of the cell’s natural ability to replenish enzymes and cofactors and to provide precursors from often inexpensive and renewable starting materials.

Metabolic engineering is the construction, redirection, and manipulation of cellular metabolism through the alteration of enzyme activities and levels to achieve the biosynthesis or biocatalysis of desired compounds [1]. The optimization of production or consumption of a target metabolite often requires precisely controlled expression of several natural and/or heterologous pathway enzymes, such that individual conversion steps in the pathway do not limit the desired product yield (given their different Km and kcat values) and cellular resources and energy are not used inefficiently. Therefore, programmable and tunable expression technologies that enable the reliable and dynamic control of enzyme levels, including coordinated and differential regulation of multiple gene products, are required for the optimization of pathway fluxes.

Technologies have been developed that target many aspects of gene expression, such as gene copy number (plasmid and chromosome engineering), transcriptional regulation (promoter engineering), and post-transcriptional regulation (targeted RNA processing and decay). The vast majority of metabolic engineering efforts to date have focused network design strategies at the level of transcriptional regulation. However, recent scientific and technological advances in RNA biology and engineering have highlighted the significant and diverse roles that RNA plays in controlling gene expression in both prokaryotic and eukaryotic organisms.
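The remark above about conversion steps with different Km and kcat values can be illustrated numerically. The sketch below assumes simple Michaelis-Menten kinetics for each step, which the text does not state explicitly, and every number and name in it is hypothetical. It estimates how much enzyme each step would need to carry the same flux, which is the kind of differential expression requirement the paragraph describes.

```python
# Sketch: enzyme levels needed to balance flux across pathway steps,
# assuming Michaelis-Menten kinetics (an assumption, not from the text).


def mm_rate(kcat, enzyme, substrate, km):
    """Michaelis-Menten rate v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


def enzyme_needed(target_rate, kcat, substrate, km):
    """Enzyme concentration required to sustain target_rate at a given [S]."""
    return target_rate * (km + substrate) / (kcat * substrate)


if __name__ == "__main__":
    # Hypothetical two-step pathway; all parameter values are illustrative.
    steps = {
        "step1": {"kcat": 50.0, "km": 0.1},  # fast enzyme, low Km
        "step2": {"kcat": 2.0, "km": 1.0},   # slow enzyme, high Km
    }
    substrate = 0.5    # assumed substrate concentration for both steps
    target_rate = 1.0  # desired flux through each step

    for name, p in steps.items():
        e = enzyme_needed(target_rate, p["kcat"], substrate, p["km"])
        print(f"{name}: enzyme level needed = {e:.3f}")
```

With these made-up parameters the slower step needs roughly 60-fold more enzyme than the faster one to carry the same flux, so expressing both enzymes at the same level would either starve the pathway or waste resources on the fast step.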
Through specific binding interactions with a wide range of biological molecules such as DNA, transcripts, proteins, and metabolites, RNA has been shown to direct a variety of complex cellular functions. The traditional model of RNA’s role in cellular processes has been that of a passive messenger of genetic information between the genome and the proteome in all living systems. However, there have been an increasing number of examples of naturally occurring RNAs that act as regulatory elements in the gene expression pathway, revealing RNA as a functionally versatile molecule. Antisense RNAs [2,3], ribozymes [4,5], riboswitches [6–11], and small interfering RNAs and microRNAs (siRNAs and miRNAs, respectively) [12–15] are examples of RNA elements that exert their regulatory effects at different levels of the gene expression pathway. Unlike messenger RNAs (mRNAs), these regulatory RNAs are often noncoding RNAs (ncRNAs), that is, they do not encode protein information. In addition, these regulatory elements are implemented through diverse physical compositions that can be grouped more generally into cis- and trans-acting elements. In the former composition, the regulatory element is integrated within the transcript that harbors the target gene, whereas in the latter composition the regulatory element is a separate RNA molecule that acts on the transcript harboring the target gene through RNA–RNA binding interactions between the two individual molecules. RNA exhibits a wide variety of functional properties, including catalytic, gene regulatory, and ligand-binding activities. In addition, integrated RNA regulatory molecules have been characterized that achieve more sophisticated control over the expression of target proteins through a combination of these functional properties. The functional properties are encoded within the nucleotide sequence of an RNA molecule, which dictates its secondary and tertiary structure and ultimately its function.
RNA adopts different conformations by folding into complex secondary and tertiary structures, which interact with various cellular constituents such as DNA, proteins, small molecules, and other RNA molecules.16,17 Furthermore, RNA molecules exhibit structural flexibility, which enables them to dynamically adopt different conformations. The binding of cellular and environmental molecules to particular conformations has been demonstrated to regulate the equilibrium distribution between stable conformational states.18,19 Compared with larger biomolecules such as proteins, the functional activity of RNA is more directly defined by its secondary structure. This relationship between RNA secondary structure and function, in combination with predictive RNA secondary structure/energetic folding programs, has recently been used by molecular engineers to construct synthetic “designer” regulatory RNA elements.19–21 Recent foundational advances in RNA engineering have demonstrated the programming of regulatory properties through alteration of nucleotide composition and ultimately RNA structure-function relationships. In addition, technological advances in RNA engineering have made it possible to generate desired regulatory properties through rational and/or combinatorial design strategies.19–21 Recent research in RNA biology and engineering supports the model of RNA as a versatile molecule possessing biologically relevant gene regulatory properties. Recent advances in RNA engineering demonstrate that the design strategies connecting RNA structure and function are well developed and provide technologies for the construction of tailored gene regulatory systems.

This chapter provides an overview of several classes of naturally occurring RNAs that are involved in gene expression regulation and their synthetic counterparts that have been generated through various molecular design strategies. In addition, examples of applications of these engineered regulators in metabolic pathway engineering will be described. Recent technological advances in RNA engineering will allow researchers to more broadly implement RNA elements as basic regulatory systems and to develop more sophisticated regulatory systems that involve integrated designs of multiple functional domains. These synthetic regulatory systems enable target gene expression to be precisely and dynamically controlled in response to changing intracellular environments and represent powerful tools that will advance current capabilities in metabolic pathway engineering. Such RNA-based tools can be implemented to redirect pathway fluxes for optimum metabolic yield, tune the expression levels of enzymes involved in a particular metabolic pathway for optimum energy usage, and establish general screens for noninvasively detecting accumulation of key metabolites.

8.2 Basic RNA Regulatory Elements

RNA plays a critical role in the control of protein expression and achieves its regulatory properties through diverse mechanisms that act at each stage along the gene expression pathway. Many regulatory RNAs are cis-acting elements that are commonly located in the 5′ or 3′ untranslated regions (UTRs) of transcripts harboring the target gene(s), whereas others are trans-acting elements that are transcribed independently and act on appropriate transcript targets through intermolecular binding events.21 These RNA regulatory elements typically lack a protein-coding capacity, and most examples characterized to date act at the post-transcriptional level.22 Over the past few years an increasing number of these regulatory RNAs have been discovered and characterized, and their specific roles in the control of gene expression have been demonstrated in both prokaryotes and eukaryotes, including mammals.22 Five major functional classes of RNA regulatory elements are described in this section and are categorized by their mechanism of action: translation initiation, RNase activity, ribozyme, antisense, and RNA interference (RNAi). These classes are selected for their utility and prevalence in use as synthetic RNA regulatory systems.

8.2.1 RNA Elements That Direct Translation Initiation

The first class of RNA elements is associated with the regulation of the efficiency of translation initiation. These elements regulate the efficiency of translation through their sequence and structure.23,24 In prokaryotes, mRNAs contain a region in the 5′ end of transcripts called the ribosome binding site (RBS), consisting of a three- to nine-nucleotide region upstream of the start codon (Figure 8.1A). This sequence is complementary to and interacts with a sequence near the 3′ end of the 16S rRNA (3′-AUUCCUCCA-5′). Translation efficiency is dictated by the number of base pairs involved in the interaction and the location of these base pairs in relation to that of the start codon. The sequence 5′-AGGAGG-3′ is a well-known consensus sequence called the Shine–Dalgarno (SD) sequence that directs efficient translation in prokaryotes.25 Four to five base-pairing interactions are often sufficient to mediate efficient translation in most prokaryotic mRNAs. The spacing between the RBS sequence and the start codon is also essential; while it typically ranges from five to eight nucleotides, the optimum spacing has been shown to be five.26
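The two determinants described above (the number of base pairs formed with the 16S rRNA and the spacing to the start codon) can be checked with a short script. The following Python code is a minimal sketch rather than a validated predictor: the function name `find_sd_site`, the 20-nucleotide search window, the example leader sequence, and the restriction to contiguous Watson–Crick pairs (no G–U wobble, no structural effects) are all illustrative assumptions.

```python
# Illustrative sketch only: scores an mRNA leader for a Shine-Dalgarno-like
# motif by contiguous Watson-Crick complementarity to the 16S rRNA 3' end.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# 3' end of the 16S rRNA as given in the text, written 3'->5'.
ANTI_SD_3_TO_5 = "AUUCCUCCA"

# mRNA sequence (5'->3') that would pair with every anti-SD position.
FULL_SD = "".join(COMPLEMENT[base] for base in ANTI_SD_3_TO_5)  # "UAAGGAGGU"


def find_sd_site(mrna, start_codon_index, window=20, min_pairs=3):
    """Return (paired_bases, spacing, motif) for the longest SD-like motif.

    Only the `window` nucleotides upstream of the start codon are searched.
    `spacing` is the number of nucleotides between the motif and the start
    codon. Returns (0, None, "") when no motif of at least `min_pairs`
    contiguous pairs is found.
    """
    search_from = max(0, start_codon_index - window)
    # Try the longest possible motifs first, then progressively shorter ones.
    for length in range(len(FULL_SD), min_pairs - 1, -1):
        for offset in range(len(FULL_SD) - length + 1):
            motif = FULL_SD[offset:offset + length]
            # rfind returns the occurrence of this motif nearest the start codon.
            position = mrna.rfind(motif, search_from, start_codon_index)
            if position != -1:
                spacing = start_codon_index - (position + length)
                return length, spacing, motif
    return 0, None, ""


# Hypothetical leader: an AGGAGG motif followed by five nucleotides and AUG.
leader = "ACUAGGAGGAAUUCAUGGCA"
pairs, spacing, motif = find_sd_site(leader, leader.index("AUG"))
print(pairs, spacing, motif)
```

For the hypothetical leader shown, the sketch reports six paired bases and a spacing of five nucleotides, which corresponds to the consensus SD sequence at the optimum spacing cited in the text.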

Figure 8.1 RNA elements that enable sequence- or structure-specific recognition by the ribosome within the 5′ UTR of (A) prokaryotic and (B) eukaryotic mRNAs for ribosomal loading and subsequent translation initiation.

While prokaryotic mRNAs contain an internal region for ribosome binding to initiate translation, eukaryotic mRNAs are capped at their 5′ ends with a 7-methyl guanosine (7mG), which serves to recruit the translational machinery. Most translation in eukaryotes is cap-dependent, where translation is initiated by recruitment of the 40S ribosomal subunit to the 5′ cap region of the transcript.27,28 However, there are some eukaryotic mRNAs where translation is initiated in a cap-independent manner through specific RNA sequences located upstream of the coding region. These sequences are called internal ribosome entry sites (IRESes) (Figure 8.1B); they were first discovered in viral transcripts that lack the cap structure29,30 and subsequently identified in cellular transcripts.27 Typical IRES elements are large, highly structured, cis-acting RNA sequences, analogous to the 5′ cap structure, that are capable of recruiting the 40S ribosomal subunit, or other translation initiation factors, through binding upstream of a translational initiation codon.27 IRES elements characterized to date are diverse and can range from nine nucleotides to several hundred nucleotides.31,32 IRESes have been demonstrated to facilitate translation initiation under conditions such as cell-cycle progression, apoptosis, or hypoxia.28

Researchers have demonstrated that short synthetic RNA elements that mediate translation initiation events can be generated.
Synthetic RBS sequences have been generated with varying strengths through mutagenesis and screening procedures in various bacterial systems.33–35 In addition, IRES sequences have been selected from random sequence libraries in yeast36 and mammalian systems.37 These short synthetic IRES elements are assumed to function similarly to bacterial RBS sequences through direct hybridization to regions of the 18S rRNA.36,37 These IRES elements have been shown to exhibit varying translation initiation activities and increased function when present in multiple copies.

RNA elements that enable internal ribosome entry and regulate translation, such as RBS and IRES sequences, are attractive tools for pathway engineering in that they enable coordinated production of multiple gene products from a single transcript when placed within intercistronic regions. Furthermore, different RBS and IRES sequences have been shown to have varying translational efficiencies28,36–38 and can also be genetically manipulated to exhibit differential activities.27 This tunability is a valuable feature for metabolic engineering: RNA elements that impart different translational efficiencies may be used to control the differential expression of several enzymes encoded within a multicistronic transcript, so that balanced enzyme expression levels, and thereby optimized yield of the desired metabolite product, can be achieved.
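The idea of balancing enzyme levels within a multicistronic transcript can be illustrated with a small selection routine. The Python sketch below assumes that each cistron's expression scales with the relative strength of its RBS or IRES element, which is a simplification; the element names, their relative strengths, the gene names, and the function `assign_elements` are all hypothetical and are not taken from the chapter or its references.

```python
# Illustrative sketch: pick, for each gene in a multicistronic transcript,
# the translation initiation element whose relative strength is closest to
# the desired relative expression level. All names and values are invented.

def assign_elements(target_ratios, element_library):
    """Map each gene to the library element closest to its desired level.

    target_ratios: desired relative enzyme levels, e.g. {"enzA": 10, ...}.
    element_library: relative translational strengths on a 0-1 scale.
    Desired levels are normalized so that the highest target equals 1.0.
    """
    highest = max(target_ratios.values())
    assignment = {}
    for gene, ratio in target_ratios.items():
        desired = ratio / highest
        best = min(element_library,
                   key=lambda name: abs(element_library[name] - desired))
        assignment[gene] = best
    return assignment


# Hypothetical library of characterized elements (relative strengths).
library = {"rbs_strong": 1.0, "rbs_medium": 0.3, "rbs_weak": 0.05}

# Hypothetical target: enzyme levels in a 10:3:1 ratio.
targets = {"enzA": 10, "enzB": 3, "enzC": 1}

print(assign_elements(targets, library))
```

In practice, the strength of an element depends on its sequence context, so values such as these would need to be measured within the actual intercistronic regions rather than assumed.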

8.2.2 RNA Elements That Direct RNase Activity

The next class of regulatory RNAs is associated with the mediation of RNase activity on cellular transcripts. RNases can function through endoribonuclease activity, where cleavage occurs internal to the transcript, or exoribonuclease activity, where degradation occurs processively from one end of the transcript. RNA elements can affect the rates of transcript cleavage and processing by these enzymes.

The RNase III family of endoribonucleases is found in both prokaryotic and eukaryotic organisms, and its members are known to cleave hairpin structures or double-stranded regions within RNA.39–42 Substrate recognition, sequence dependence, and cleavage activity of various RNase III endonucleases have been characterized and demonstrated to vary among different organisms.43–46 In addition to being present within ncRNA molecules, RNase III substrates have been discovered within mRNAs, where they serve as cleavage sites for an RNase III family enzyme such as Rnt1p, the yeast orthologue of RNase III, and silence target gene expression through rapid transcript degradation following cleavage.47,48 These examples suggest that RNase III substrates can be engineered into transcripts to potentially control gene silencing.

Endoribonucleases have also been characterized that cleave single-stranded sequences within cellular RNAs. One example is RNase E, which is responsible for bulk mRNA processing in Escherichia coli and cleaves AU-rich sequences.49–53 Some studies have indicated that a free 5′ end is required for RNase E to bind to the mRNA before subsequent cleavage at sites internal to the transcript.54–56 Two types of RNA elements can be used to modulate RNase E activity on cellular transcripts. The first element is an RNase E substrate site, which is a single-stranded AU-rich sequence that directs cleavage at a particular location in the transcript. The second element is a hairpin structure located close to (no more than five nucleotides from) the 5′ end of the transcript.41,55 Such hairpins have been shown to inhibit cleavage activity of RNase E on the transcript. The strengths of these two elements can be altered by sequence and structural modification, respectively.57,58

Exoribonucleases are responsible for bulk mRNA degradation, and these enzymes generally degrade cellular RNAs processively starting from one end of the transcript.59 RNA elements that form hairpins and are located at the 3′ end of cellular transcripts have been shown to inhibit degradation by exoribonucleases.41 These RNA elements stabilize the transcript through their secondary structure, which potentially serves as a barrier to exonucleolytic activities.53 The strengths of these elements can be altered by structural modification.60

8.2.3 RNA Elements That Direct Processing and Degradation of Transcripts through Direct Cleavage

The third class of RNA regulatory elements mediates direct cleavage of cellular transcript targets without assistance from protein factors. Ribozymes are catalytic RNAs that were discovered over 20 years ago.61–65 They are naturally found in plant RNA viruses, satellite RNAs, and viroids66 and most commonly catalyze the cleavage and/or ligation of RNA molecules.67 There are four types of self-cleaving ribozymes:4 hammerhead, hairpin, hepatitis delta virus, and Varkud satellite. Self-cleavage occurs at a specific phosphodiester bond via an internal phosphoester transfer reaction68 in a divalent metal ion-dependent manner.69 Each class of natural ribozymes exhibits a highly conserved catalytic core sequence.

The hammerhead ribozyme has been extensively studied and is the most commonly used for targeting messages in biotechnological and biomedical applications because of its small size, ease of design, and rapid kinetics.67,70 Hammerhead ribozymes can generally cleave at any 5′-NUX-3′ triplet of an RNA sequence, on the 3′ side of the X, where N is any nucleotide, U is uridine, and X is any nucleotide except guanosine (Figure 8.2). The sequence 5′-GUC-3′ has been the conventional and most commonly found cleavage triplet in nature.69,71 Two of the hammerhead ribozyme’s stems are closed by a loop, and the majority of hammerhead ribozymes are embedded within viral RNA sequences through stem III (Figure 8.2A).72 The presence of loops closing stem I and stem II has been shown to be critical to the catalytic activity of the hammerhead ribozyme in cellular environments, suggesting possible loop I–loop II interactions.72–74 Naturally occurring hammerhead ribozymes catalyze self-cleavage in cis; however, synthetic ribozymes that cleave in trans have been constructed by eliminating one of the two stem loops.
Such genetic manipulations result in three possible designs, depending on which strands of the stems are targeted to bind the target transcript for cleavage.69 In one example, the loop sequence of stem I is removed, and the stem I and stem III strands are used as targeting arms to bind a transcript (Figure 8.2B).71

Ribozymes are attractive molecular tools in metabolic engineering for their potential applications in regulating enzyme expression in diverse cellular systems. For cis-acting regulation systems, a hammerhead ribozyme can be integrated into a noncoding region of the target transcript, where self-cleavage and subsequent degradation of the transcript lower the expression levels of the target enzyme(s). Alternatively, trans-acting ribozymes can regulate gene expression through identical mechanisms, but they must be engineered to bind to the target transcript in a way that satisfies the requirements for catalytic activity of the regulatory element. One basic strategy is to target regions of transcripts that harbor an “NUX” triplet, which defines the cleavage site on the transcript, and to design a ribozyme carrying the catalytic core and targeting arm sequences that are complementary to the appropriate region of the target transcript for sequence-specific base-pairing (Figure 8.2B).65

Figure 8.2 Structures of (A) a self- or cis-cleaving hammerhead ribozyme (adapted from Salehi-Ashtiani, K. and Szostak, J.W. Nature 414, 82–84, 2001) and (B) a trans-cleaving ribozyme. The conserved catalytic core sequence of the ribozyme is shown; N represents any nucleotide. The arrow indicates the cleavage site. Stems I and II are closed by loop sequences in the cis-cleaving ribozyme and therefore the cleavage is intramolecular, whereas only one of the stems is closed by a loop sequence in the trans-cleaving ribozyme and therefore the cleavage is intermolecular. Nonconserved loop sequences are illustrated as circular lines.

In vitro selection processes have been used to generate self-cleaving hammerhead ribozymes with different kinetic properties.75,76 Such engineered ribozymes may be useful in metabolic pathway engineering, as they can be used as regulatory elements to obtain varying enzyme levels through their different cleavage activities. Similarly, trans-cleaving hammerhead ribozyme variants exhibiting varying kinetic properties have been developed in vitro.77–79 The in vivo regulatory activities of both cis- and trans-cleaving hammerhead ribozymes have been demonstrated in various cellular systems.72,80–87
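The basic targeting strategy for trans-cleaving hammerhead ribozymes (locate an NUX triplet, then design arms complementary to the flanking target sequence) can be sketched in a few lines of Python. This is an illustrative outline under simplifying assumptions: the function names, the default arm length of six nucleotides, and the choice to take the flanks immediately outside the triplet are hypothetical, and the sketch does not assemble the catalytic core or check target accessibility.

```python
# Illustrative sketch: enumerate candidate NUX cleavage triplets in a target
# RNA and report arm sequences complementary to the flanking regions.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_nux_sites(target, arm_length=6):
    """List NUX triplets that have full-length flanks on both sides.

    N is any nucleotide, the middle position must be U, and X is any
    nucleotide except G. Cleavage occurs 3' of X.
    """
    sites = []
    last_start = len(target) - 3 - arm_length
    for i in range(arm_length, last_start + 1):
        triplet = target[i:i + 3]
        if triplet[1] == "U" and triplet[2] != "G":
            upstream = target[i - arm_length:i]
            downstream = target[i + 3:i + 3 + arm_length]
            sites.append({
                "triplet": triplet,
                "x_index": i + 2,  # 0-based position of X; cut is 3' of it
                "arm_pairing_upstream": reverse_complement(upstream),
                "arm_pairing_downstream": reverse_complement(downstream),
            })
    return sites


# Hypothetical target fragment containing a single GUC triplet.
for site in find_nux_sites("AAAAAAGUCCCCCCC"):
    print(site)
```

A practical design would also need to embed these arms around the conserved catalytic core and screen candidate sites for secondary structure in the target transcript, neither of which is attempted here.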

8.2.4 RNA Elements That Regulate Gene Expression through Antisense Mechanisms

The fourth class of RNA regulatory elements mediates gene expression through antisense mechanisms. The regulation of gene expression by an antisense RNA was first discovered as a natural process in prokaryotes.88 Antisense RNA molecules are single-stranded, trans-acting regulatory elements that bind to their target transcripts through a sequence-specific hybridization event, resulting in the inhibition of expression from the bound mRNA through one of two mechanisms.89,90 The first mechanism involves targeting by double-stranded RNA (dsRNA)-cleaving enzymes such as the RNase III family of endoribonucleases described previously (Figure 8.3A). The second mechanism involves the binding of the regulatory element to the target transcript around the translation start site or the 5′ cap region such that access or scanning of the ribosome is inhibited, thereby interfering with the translation initiation process (Figure 8.3B). Since their initial identification in bacteria,88 these regulatory elements have been characterized in many cellular processes, including metabolism. As one example, SgrS is an antisense RNA that down-regulates the expression of the glucose transporter through base-pairing with the ptsG transcript when localized to the cell membrane, thereby lowering the accumulation of toxic phosphosugar metabolites.91,92

Figure 8.3 Two major mechanisms through which antisense RNA mediates inhibition of target gene expression. (A) RNase III-type nuclease-mediated cleavage of double-stranded RNA molecules. (B) Antisense-directed steric hindrance, based on RNA–RNA interactions, of ribosomal loading or scanning on the target mRNA for proper translation. (Adapted from Kurreck, J. Eur. J. Biochem., 270, 1628–1644, 2003.)

Antisense RNA molecules are powerful regulatory elements and are potentially useful molecular tools in metabolic engineering, where these elements can be designed to target specific metabolic genes to alter cellular enzyme levels and pathway fluxes. Synthetic RNA elements have been designed to regulate target gene expression through the antisense pathway. These regulatory elements can be synthesized intracellularly from a plasmid and have been shown to selectively target and inhibit a variety of genes. As one example, Bunch and Goldstein demonstrated that the induction of promoter-driven antisense RNAs targeting the alcohol dehydrogenase gene significantly reduced the expression level of the target gene in cultured Drosophila cells.93 In a similar example, Blomberg et al. expressed a natural antisense RNA from a plasmid and demonstrated translational inhibition of its target transcript in bacteria, where the antisense RNA binds to the leader region of the mRNA.94 In another example, Bonoli et al. employed a long antisense RNA regulatory element complementary to the 5′ UTR of a target transcript and demonstrated effective gene silencing in the budding yeast S. cerevisiae.95

Antisense regulatory elements have also been shown to silence gene expression through different mechanisms. As one example, Novick et al. showed that antisense RNAs inhibited the expression of a target mRNA encoding the plasmid replication initiator protein by inducing the formation of a terminator hairpin upstream of the start codon, and that in the absence of these antisense RNAs an upstream sequence hybridizes with part of the terminator hairpin, thereby preventing termination and allowing transcription of the target gene.96 In another example, a sequence segment of a short RNA molecule (DsrA) was proposed to act as an antisense RNA that suppresses gene expression by RNA–RNA base-pairing interactions with the start and stop codon regions of a target transcript, while another segment of this short RNA molecule enhances the translation of a different target mRNA by releasing the translation initiation region initially sequestered by a cis-acting antisense sequence.97–99
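The translation-blocking mechanism described above, in which the antisense RNA binds around the translation start site, suggests a simple starting point for design: take the reverse complement of a window spanning the start codon. The Python sketch below illustrates only that step. The function name `design_antisense`, the default window sizes, and the example transcript are assumptions for illustration; effective antisense design also depends on factors such as target structure and antisense stability, which are not modeled here.

```python
# Illustrative sketch: build an antisense RNA complementary to the region
# surrounding a start codon. Window sizes are arbitrary illustrative choices.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def design_antisense(mrna, start_codon_index, upstream=10, downstream=10):
    """Return an antisense RNA (5'->3') covering the start codon region.

    The targeted window spans `upstream` nucleotides before the start codon,
    the three-nucleotide codon itself, and `downstream` nucleotides after it,
    clipped to the ends of the transcript.
    """
    window_start = max(0, start_codon_index - upstream)
    window_end = min(len(mrna), start_codon_index + 3 + downstream)
    return reverse_complement(mrna[window_start:window_end])


# Hypothetical transcript with the start codon at index 4.
print(design_antisense("CCCCAUGGGGG", 4, upstream=2, downstream=2))  # CCCAUGG
```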

8.2.5 RNA Elements That Regulate Gene Expression through the RNAi Pathway

The final class of RNA regulatory elements mediates gene expression through the RNAi pathway. RNAi is an evolutionarily highly conserved RNA-directed gene-silencing pathway.100 This pathway was originally observed in plants,101,102 was first demonstrated by Fire and Mello in the nematode worm Caenorhabditis elegans,103 and was later found in a wide variety of metazoans.12 This silencing pathway is triggered by dsRNAs, which cause sequence-specific degradation of target transcripts when introduced into a cell (Figure 8.4).13 The dsRNA substrates can be generated from numerous sources including viruses, overlapping transcripts, and transposons.12,104 These RNA regulatory elements are processed into short RNA duplexes known as siRNAs by an RNase III-type enzyme called Dicer.105 The resulting siRNAs are generally 21–23-nucleotide-long dsRNAs with two-nucleotide overhangs at the 3′ ends and with 5′ phosphate and 3′ hydroxyl groups.12,13 These cleaved products are subsequently unwound and incorporated into endoribonuclease-containing complexes known as RNA-induced silencing complexes (RISCs).106 The functional RISC contains the antisense strand and acts as a multiple-turnover enzyme complex.107 The antisense strand hybridizes in a sequence-specific manner to the target mRNA, which subsequently leads to ATP-dependent cleavage of the transcript by endonucleases within RISC at a single site near the center of the siRNA strand.108,109 In some organisms, RNAi-mediated gene silencing involves enzymes known as RNA-dependent RNA polymerases (RdRPs), which are thought to amplify the silencing effect by replicating siRNAs.110,111 This amplification also has been shown to enable the silencing effect to be spread across cells.12,14

Figure 8.4 A schematic diagram of the RNA interference (RNAi) pathway. dsRNAs introduced artificially or generated inside the cell can induce sequence-specific gene silencing through the RNAi pathway. These molecules are initially processed into siRNA duplexes by Dicer, followed by unwinding of these duplexes and incorporation of one of the duplex strands into RISC. The functional RISC carries the antisense strand of the duplex, which mediates gene silencing of the target mRNA through cleavage. Primary miRNA transcripts (pri-miRNAs) are processed by Drosha in the nucleus to yield miRNA precursors (pre-miRNAs or shRNAs), which are exported into the cytoplasm by Exportin-5. Once in the cytoplasm, the pre-miRNA is further processed by Dicer to yield siRNA-like duplexes known as miRNAs. The duplex is unwound, assembled into miRNP/RISC, and directs translational repression or cleavage of the target mRNA depending on the degree of complementarity between the miRNA antisense strand and its target transcript sequence. (Adapted from Meister, G. and Tuschl, T. Nature, 431, 343–349, 2004.)

RNAi was initially recognized as an innate host defense mechanism against RNA viruses and transposable elements.12,112 However, certain genes in plants and animals naturally encode RNA hairpins113–115 that are approximately 70 nucleotides in length.12–14,104 These endogenous structural elements are the precursors of another class of regulatory elements known as miRNAs that have been shown to result in sequence-specific gene silencing (Figure 8.4).116–118 The first step in miRNA maturation is the nuclear cleavage of the primary hairpin transcript (pri-miRNA)119,120 by a second RNase III-type endonuclease called Drosha.120 The cleaved hairpin has a 5′ phosphate and a two-nucleotide 3′ overhang remaining
at the base of the stem.120,121 This cleaved hairpin product, known as the pre-miRNA, is subsequently exported to the cytoplasm by the RanGTP-dependent nuclear export receptor, Exportin-5.122–124 The pre-miRNA is processed in the cytoplasm by Dicer, following processing steps similar to those described for siRNAs above.12,104 Once loaded into the ribonucleoprotein complex (miRNP),125 the single-stranded miRNA can direct silencing through two different mechanisms: cleavage or translational repression.12,104 The miRNA-guided cleavage mechanism is similar to that of siRNAs, whereas miRNA-directed translational inhibition occurs through imperfect binding of miRNAs to their target transcripts in the 3′ UTR.12

Researchers have taken advantage of the effectiveness of RNAi-mediated gene silencing to construct synthetic RNA regulatory elements that silence target genes through this pathway. dsRNA elements have been engineered through a variety of methods to down-regulate expression of various target genes.13 In vitro-synthesized dsRNAs have been shown to be processed by Dicer into siRNAs, resulting in target gene silencing when introduced into cells.108,109 It has also been demonstrated that chemically synthesized siRNAs, which bypass the Dicer cleavage event and load directly into RISC, effectively suppress gene expression.126,127 This strategy eliminates potential induction of the interferon response by dsRNAs longer than 30 nucleotides.126 However, other studies have demonstrated that slightly larger RNA duplexes (~27-mers) can effectively target sites that are refractory to silencing by 21-mer siRNAs without inducing the interferon response.128 Synthetic siRNAs expressed from RNA polymerase III (pol III) promoters, where individual sense and antisense strands of each siRNA are separately synthesized and assembled in trans, have also been demonstrated to silence target gene expression in vivo.129,130

As synthetic RNAi substrates gain recognition as powerful research and application-based tools and as our understanding of the pathway mechanism deepens, the molecular design strategies guiding the construction of these regulatory elements have evolved. As an example, short hairpin RNAs (shRNAs) have been constructed to produce Dicer-processable substrates through in vitro or in vivo transcription events, and have been demonstrated to induce efficient gene silencing.131–135 In addition, miRNA-based hairpins, with either perfect or imperfect complementarity to their target transcripts, have been constructed and expressed as part of longer transcripts from an RNA polymerase II (pol II) promoter and shown to effectively silence target gene expression.136 Compared with earlier designs, this implementation more closely mimics that of natural miRNAs, as miRNA precursors are usually encoded within the context of longer transcripts.
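The siRNA geometry described in this section (a short duplex with two-nucleotide 3′ overhangs, with the antisense strand guiding RISC to the target) can be made concrete with a small sketch. The Python code below assumes a common layout in which a 23-nucleotide target window yields two 21-nucleotide strands sharing a 19-base-pair duplex; that layout, the function name `design_sirna`, and the example sequence are assumptions for illustration, and no site-selection or off-target criteria are applied.

```python
# Illustrative sketch: derive sense and antisense siRNA strands from a target
# window so that the duplex carries two-nucleotide 3' overhangs on each strand.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def design_sirna(target, site_start, strand_length=21, overhang=2):
    """Return (sense, antisense) strands, each written 5'->3'.

    The target window is strand_length + overhang nucleotides long. The sense
    strand is the window minus its first `overhang` nucleotides; the antisense
    strand is the reverse complement of the window minus its last `overhang`
    nucleotides. The shared paired region is strand_length - overhang long.
    """
    window = target[site_start:site_start + strand_length + overhang]
    if len(window) < strand_length + overhang:
        raise ValueError("target too short for the requested site")
    sense = window[overhang:]
    antisense = reverse_complement(window[:strand_length])
    return sense, antisense


# Hypothetical target fragment (28 nucleotides).
sense, antisense = design_sirna("AUGGCUAGCUAGGAUCCGAUCGAUUAGC", 0)
paired = len(sense) - 2
# The first 19 nucleotides of each strand are mutually complementary.
print(sense, antisense, reverse_complement(antisense[:paired]) == sense[:paired])
```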

8.3 RNA Sensory Elements

RNA molecules are functionally diverse and structurally flexible, and exhibit a wide range of regulatory properties such as catalytic, interactive, and allosteric binding properties. Another distinctive property of RNA molecules is their ability to sense various types of inputs, ranging from temperature to diverse molecular ligands. As with its other regulatory properties, RNA employs this sensing ability in exerting its diverse functional roles.

8.3.1 RNA Elements That Serve as Thermosensors

RNA secondary structure is known to be highly dependent on temperature, such that RNA assumes different structures in response to changes in temperature. Examples of such temperature-responsive RNA elements have been described.8 For instance, genes that encode small heat-shock proteins and regulators of heat shock-responsive genes were found to contain sequences in their 5′ UTRs that are capable of sequestering the RBS sequence and the start codon. The secondary structures adopted at different temperatures modulate the accessibility of these sequences and, in turn, target gene expression levels, thereby enabling these RNA elements to serve as thermosensing gene expression regulators.
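One way to reason about such a thermosensor is with a two-state model, in which the 5′ UTR is either folded (RBS sequestered) or unfolded (RBS accessible). The Python sketch below is an assumption-laden illustration and is not drawn from the chapter: it treats unfolding as a single equilibrium with temperature-independent enthalpy and entropy, and the parameter values are invented so that the midpoint falls at 310 K.

```python
# Illustrative two-state sketch of an RNA thermosensor. The unfolding free
# energy is dG = dH - T*dS, and the unfolded (RBS-accessible) fraction follows
# from the equilibrium constant K = exp(-dG / (R*T)). Values are hypothetical.

import math

GAS_CONSTANT = 1.987e-3  # kcal / (mol K)


def fraction_accessible(temperature_k, delta_h, delta_s):
    """Fraction of transcripts with the RBS exposed at a given temperature.

    delta_h (kcal/mol) and delta_s (kcal/(mol K)) describe unfolding of the
    sequestering structure and are assumed independent of temperature.
    """
    delta_g = delta_h - temperature_k * delta_s
    # K/(1+K) rewritten as 1/(1 + exp(dG/RT)).
    return 1.0 / (1.0 + math.exp(delta_g / (GAS_CONSTANT * temperature_k)))


# Hypothetical hairpin with a melting midpoint of dH/dS = 310 K.
DELTA_H = 62.0
DELTA_S = 0.2

for celsius in (25, 30, 37, 42, 45):
    kelvin = celsius + 273.15
    print(celsius, round(fraction_accessible(kelvin, DELTA_H, DELTA_S), 3))
```

Under these assumptions the accessible fraction rises steeply around the midpoint temperature, which is the qualitative behavior expected of a structure that releases the RBS upon heat shock.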

8.3.2 RNA Elements That Bind Nucleic Acids

Instances where RNA molecules exert their regulatory activities through base-pairing interactions with other RNA molecules have been described above. Another example of a regulatory function achieved through RNA–RNA interactions was described by Isaacs et al.137 In their engineered system, an RNA sequence segment was integrated upstream of the RBS of a reporter gene to serve as a nucleic acid-sensing domain. This segment sequesters the RBS in the absence of the target RNA molecule, thereby suppressing the expression of the reporter gene. In the presence of the target RNA molecule, the sensor domain becomes bound to the target through RNA–RNA base-pairing interactions and releases the RBS for efficient translation, thereby serving as a nucleic acid-binding sensor domain.

8.3.3 RNA Elements That Bind Molecular Ligands

The ability of RNA structural elements to bind specific molecular ligands has been characterized in several natural systems. However, researchers have also generated many examples of synthetic RNA ligand-binding elements, referred to as aptamers, in the laboratory. Synthetic aptamers are generated through in vitro selection or Systematic Evolution of Ligands by EXponential enrichment (SELEX) processes (Figure 8.5).138,139 SELEX provides a very powerful selection method through which nucleic acid molecules exhibiting rare and specific binding properties to a ligand of interest can be generated de novo by selecting for functional binding activities from large randomized nucleic acid pools through iterative in vitro selection and amplification cycles.

Figure 8.5 A schematic illustration of an in vitro selection process known as SELEX. The process starts with a large randomized pool of single-stranded RNA molecules transcribed from their DNA templates. The RNA pool is then incubated with target molecules of interest, followed by separation of the target-bound pool, which is reverse-transcribed to cDNA and amplified for the next selection cycle. The selection cycles are typically repeated for eight to 15 cycles.

In vitro aptamer selection schemes begin with a large pool of single-stranded RNA molecules generated through in vitro transcription from a DNA library. Aptamer pools are usually composed of 30–70 randomized nucleotides and encompass an initial sequence diversity between 10¹⁴ and 10¹⁵ molecules.140 The pool is incubated with the target ligand of interest and subjected to a partitioning event to separate bound members from unbound members. The most commonly used partitioning schemes are based on affinity chromatography. Bound (functional) members are recovered and then amplified through reverse transcription and polymerase chain reaction (PCR) to yield a pool enriched for target binding. This enriched pool becomes the input pool for the next round of selection.

SELEX is particularly powerful in that RNA sequences that bind particular ligands can be generated de novo and the binding properties of the resulting aptamers can be programmed as desired. Specifically, aptamer affinities are tailored through the stringency of each selection cycle, normally by modifying wash volumes and target concentrations, whereas aptamer specificities are tailored through counter-selections performed with molecular analogs of the target. Typically eight to 15 selection cycles are required to generate aptamers with high binding affinities and specificities. Recent work has demonstrated that protein aptamer selection schemes can be automated using standard robotics.141–143 In addition, partitioning schemes for protein aptamer selections based on capillary electrophoresis have been recently developed that provide several advantages over conventional affinity-based partitioning schemes.144–146 In particular, the efficiency of separation between the bound and unbound pools is significantly greater, such that aptamers can be generated in fewer selection cycles.
Synthetic aptamers have been generated to a wide range of target ligands including small molecules, antibiotics, carbohydrates, amino acids, peptides, proteins,147 and even organelles such as phospholipid bilayers,148,149 indicating that synthetic aptamers can be potentially generated to targets of interest for metabolic engineering applications.

8.4 Natural and Engineered Riboswitches Cells must regulate the expression of various genes in a controlled manner in response to diverse intracellular and extracellular signals such as changes in metabolic demands and environmental fluctuations.10 This degree of regulation requires highly responsive genetic sensors and regulators that can detect and quantify the magnitude of a particular signal and subsequently modulate, as necessary, the levels and activities of appropriate gene products. Traditionally, proteins have been recognized as the molecules responsible for performing the molecular sensing and actuation functions; however, recent discoveries have revealed that elements within mRNAs, called riboswitches, are also capable of performing biological sensor-actuator functions. Riboswitches are cis-acting RNA elements that modulate the expression of target genes through integrated sensor and regulatory domains. This integration scheme enables riboswitches to sense their target ligands, typically cellular metabolites, through direct binding interactions and thus autonomously mediate their own functional activity in response to changing metabolite levels. While the majority of riboswitches characterized to-date have been discovered in bacteria, it has been shown more recently that these complex RNA regulatory elements are also present in eukaryotes.9,150

8.4.1 General Composition and Conformational Dynamics of Riboswitches Riboswitches are naturally occurring, metabolite-responsive gene control elements primarily located within the 5′ UTRs of cellular transcripts.10 A riboswitch is typically composed of two domains: the ligand-binding or sensor domain known as the aptamer domain, and the gene regulatory domain known as the expression platform (Figure 8.6).10 Both domains are structurally flexible and capable of adopting different conformations. Riboswitches accomplish ligand-controlled regulation of gene expression through targeted dynamic switching between two primary conformations at equilibrium: one in which the regulatory domain is active and the other in which it is inactive. One of these conformational states is associated with the formation of the ligand-binding pocket within the aptamer domain, whereas the other carries an incorrectly formed binding pocket. Therefore, riboswitches can either repress or activate the expression of the target gene by assuming an appropriate combination of different conformational states adopted by the aptamer and regulatory domains. Ligand binding to the riboswitch shifts the distribution between these stable conformations to favor the ligand-bound form, thereby resulting in an allosteric gene regulation event.

Gene Expression Tools for Metabolic Pathway Engineering

Figure 8.6 A schematic diagram of a typical riboswitch composed of two distinct domains: the ligand-binding domain known as the aptamer domain and the regulatory domain known as the expression platform. Metabolite binding to the aptamer domain enables the stabilization of the rearranged conformation of the riboswitch (right), resulting in a shift in the equilibrium distribution between the two regulatory conformations and metabolite-dependent regulation of target gene expression. (Adapted from Winkler, W.C. and Breaker, R.R. Chembiochem, 4, 1024–1032, 2003.)
[Figure 8.6 panel labels: Gene expression ON (mRNA 5′, aptamer domain, expression platform); Gene expression OFF (metabolite, metabolite binding-stabilized conformation).]
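The equilibrium shift described in Section 8.4.1 can be sketched with a minimal two-state model. This is an illustrative assumption rather than a model given in the chapter: the riboswitch is taken to interconvert between a binding-competent and a binding-incompetent conformation with equilibrium constant k_conf, and the ligand is assumed to bind only the competent conformation with dissociation constant kd. The function name and all parameter values are hypothetical.

```python
def binding_competent_fraction(ligand, kd, k_conf):
    """Fraction of riboswitch molecules in the binding-competent conformation.

    Assumed two-state model (hypothetical parameters):
      k_conf = [competent] / [incompetent] in the absence of ligand
      kd     = dissociation constant of ligand for the competent conformation
    `ligand` and `kd` must be given in the same concentration units.
    """
    # Statistical weight of the competent state (free plus ligand-bound),
    # relative to a weight of 1 for the incompetent state.
    weight_competent = k_conf * (1.0 + ligand / kd)
    return weight_competent / (1.0 + weight_competent)


if __name__ == "__main__":
    kd = 10.0      # hypothetical Kd, nM
    k_conf = 0.1   # hypothetical: 1 in 11 molecules competent without ligand
    for ligand in (0.0, 10.0, 100.0, 1000.0):
        f = binding_competent_fraction(ligand, kd, k_conf)
        print(f"ligand = {ligand:7.1f} nM -> competent fraction = {f:.3f}")
```

Under these assumptions the competent conformation is a minority species without ligand and becomes dominant as the ligand concentration rises well above kd, which is the population shift the text describes. Whether the competent conformation corresponds to gene expression ON or OFF depends on the particular expression platform.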

8.4.2 Mechanisms of Ligand-Controlled Gene Regulation by Riboswitches Riboswitches regulate target gene expression in cis in response to changing metabolite levels through different mechanisms including transcription termination,151,152 translation initiation,153 mRNA processing,154 and splicing150 (Figure 8.7). Transcription termination takes place through the mediated formation of a rho-independent terminator stem, which is usually GC rich, thereby destabilizing the transcription elongation complex.151 Regulation can also target the disruption of the formation of a terminator stem upon metabolite binding, which allows proper transcription and thus up-regulation of target expression levels.152 Riboswitches can also mediate translation initiation by adopting a secondary structure that interferes with ribosomal access to the target gene, such as sequestering the RBS sequence in prokaryotic cells.153 Regulation targeting transcript processing or deactivation can be achieved through expression platforms composed of self-cleaving ribozymes, where the target transcripts undergo a ligand-directed cleavage event.154 Regulation through splicing has recently been demonstrated in a filamentous fungus, in which metabolite binding to its riboswitch can either repress or activate the expression of the main protein product by modulating the splice site choice through structural rearrangements.150 Metabolite-binding domains have also been found within the 3′ UTRs of transcripts in certain organisms, suggesting that riboswitch-mediated gene control may also occur through the regulation of mRNA stability.9 Riboswitches exhibiting unique mechanistic properties have also been recently identified, such as cooperative binding of the target metabolite162 and a Boolean two-input NOR gate behavior.165 Different classes of natural riboswitches including those with unique properties and their functional roles in metabolic networks in various organisms are discussed in the following section.

Figure 8.7 A schematic illustration of mechanisms through which riboswitches achieve gene expression regulation in response to binding their target metabolites. Ligand-regulated mechanisms involve (A) the formation of a transcription terminator stem, (B) sequestering the RBS and inhibiting translation initiation, (C) mRNA processing through catalytic cleavage of the transcript, and (D) alternative splicing using different sets of splice sites, I and II, respectively. (A and B adapted from Nudler, E. and Mironov, A.S. Trends Biochem. Sci., 29, 11–17, 2004.)
[Figure 8.7 panel labels: (A) Transcription ON (antiterminator) / Transcription OFF (terminator); (B) Translation ON (ribosome loaded) / Translation OFF (ribosome unable to load); (C) Transcript active (ribozyme cleavage inactivated) / Transcript inactive (ribozyme cleavage activated); (D) Constitutive splicing (spliced product I) / Alternative splicing (spliced product II).]

8.4.3 Riboswitch Targets and Implementation in Metabolic Networks Most of the riboswitches characterized to-date have been found in bacteria. Cells employ these elements as genetic regulators in many fundamental metabolic pathways in response to changing metabolite levels. Target metabolites include various classes of small molecules such as amino acids, nucleotide bases, and cofactors. The presence of the integrated sensor domain enables riboswitches to sense intracellular metabolite concentrations through specific binding interactions and subsequently regulate expression levels of the associated gene product through allosteric conformational changes. Typically, this gene product is an enzyme directly involved in the biosynthesis, biodegradation, and/or transport of the target metabolite.7 This mode of regulation provides a direct dynamic relationship between the intracellular metabolite concentration and the expression levels of the enzyme responsible for the metabolism, catabolism, and transport of the target metabolite. In addition, riboswitches are capable of binding their
target metabolites with specificities and affinities appropriate to the intracellular environment. In this section, several different classes of natural riboswitches, their corresponding target metabolites, and functional roles are described.

8.4.3.1 Thiamine Pyrophosphate (TPP) Riboswitch TPP is a cofactor of decarboxylase enzymes involved in carbohydrate metabolism and is derived from the precursor thiamine, also known as vitamin B1.6 This riboswitch class is the most widespread in nature and has been identified in bacteria, archaea, fungi, and plants, where it has been shown to regulate the expression levels of enzymes responsible for the biosynthesis of thiamine and thus its derivative TPP.9 The sensor domain, referred to as the “thi box”, present in the thiC gene of E. coli recognizes its target TPP with an apparent Kd of ~100 nM and discriminates against a precursor by ~1000-fold through the pyrophosphate moiety of TPP.153 This riboswitch employs various mechanisms in gene expression regulation, including transcription termination and translation initiation in bacteria,153,155 alternative splicing in fungi,150 and proposed mRNA processing in plants.9 While most TPP riboswitches are composed of a single metabolite-binding domain and a single expression platform, a recent study revealed that the TPP riboswitch system of the bacterium Bacillus anthracis contains two complete tandem riboswitches that respond independently to changing concentration levels of the same metabolite.156

8.4.3.2 Adenosylcobalamin (Coenzyme B12) Riboswitch This class of riboswitch is widespread across diverse bacterial species and binds coenzyme B12, a derivative of vitamin B12.6 Studies revealed that the riboswitch recognizes its target metabolite through the adenosyl moiety, the dimethylbenzimidazole moiety, and the cobinamide ring.157 In addition, this riboswitch regulates the expression levels of the cobalamin biosynthetic and transport genes through two different mechanisms: transcription termination and translation initiation.157,158

8.4.3.3 Flavin Mononucleotide (FMN) Riboswitch FMN is an essential cofactor for many redox-active enzymes6 and is derived from the precursor riboflavin, or vitamin B2. The FMN riboswitch is another riboswitch class widespread in bacteria. Comparative sequence analysis has revealed that a conserved metabolite-binding domain called the RFN element is commonly found in the untranslated leader regions of mRNAs encoding enzymes involved in the biosynthesis of riboflavin, FMN, and their transport.159 This element, located in the leader region of the ribDEAHT operon of Bacillus subtilis encoding the FMN biosynthetic genes, recognizes FMN with an apparent Kd of ~5 nM and exhibits a molecular discrimination against riboflavin, which lacks a single phosphate group, by almost 1000-fold (Kd ~3 µM).159 Through this precise molecular recognition, this riboswitch regulates gene expression in response to changing levels of its target metabolite through either transcription termination (ribDEAHT) or translation initiation (ypaA).159

8.4.3.4 S-Adenosylmethionine (SAM) Riboswitch SAM is an essential coenzyme derived from methionine by SAM synthetase and serves as a source of methyl groups for methylase enzymes.6,160 The aptamer domain of this riboswitch class, called the S-box, is a highly conserved RNA domain and is located in the leader sequences of genes involved in sulfur metabolism and the biosynthesis of cysteine, methionine, and SAM.160 This riboswitch has been mostly found in gram-positive bacteria where it down-regulates target expression through transcription termination in response to SAM binding with high affinity (Kd of SAM to the S-box in B. subtilis ~4 nM) and molecular discrimination against metabolite analogs that lack a single methyl or methylene group.151

8.4.3.5 Lysine Riboswitch The lysine riboswitch contains a conserved aptamer domain known as the L-box that serves as a sensor in detecting the levels of the target amino acid lysine and subsequently regulates its biosynthesis.161

The L-box of the B. subtilis lysine riboswitch, located in the 5′ UTR of the lysC gene, which encodes the first enzyme in the biosynthetic pathway of L-lysine from L-aspartic acid, binds L-lysine with an apparent Kd of ~1 µM and subsequently down-regulates the expression of this enzyme through transcription termination.161 This aptamer domain was shown to exhibit high target specificity by discriminating against closely related analogs including D-lysine.161

8.4.3.6 Glycine Riboswitch This amino acid-responsive riboswitch162 is found in a wide variety of bacteria and regulates the levels of the target amino acid glycine. While most riboswitch classes characterized to-date contain a single conserved metabolite-binding domain, two similar and highly structured aptamer domains connected through a short linker are often found in this class of riboswitch. In B. subtilis these conserved domains are located in the leader region of the gcvT operon, encoding proteins involved in the glycine cleavage system, and up-regulate the expression of target genes as the amino acid becomes present in excess and binds to the riboswitch. Both aptamer domains are specific to glycine and exhibit molecular discrimination against closely related analogs such as alanine and serine. Strikingly, these two domains exhibit cooperative binding to their target, where binding of glycine to the first aptamer enhances the second aptamer’s target affinity, resulting in a more digital gene control response. This property ensures that glycine is sufficiently available for protein synthesis before the amino acid gets degraded upon activation of its cleavage system by this sophisticated genetic ON riboswitch.

8.4.3.7 Guanine and Adenine Riboswitches These purine-responsive riboswitches exhibit conserved sequences and structures of the binding domains for their corresponding target metabolites termed the G-box.6 The guanine-specific riboswitch class was first identified in several operons of B. subtilis, encoding genes mainly involved in purine biosynthesis and transport.163 The G-box of the xpt-pbuX operon selectively binds guanine and hypoxanthine with apparent Kd values of ~5 nM and ~50 nM, respectively, and has been shown to down-regulate target gene expression through transcription termination in response to ligand binding. Many purine analogs fail to bind this G-box; for example, adenine binds with an affinity approximately five orders of magnitude lower (Kd ~300 µM) than that of guanine.163,164 Interestingly, the G-box located upstream of the ydhL gene, which encodes a putative purine efflux pump,164 is specific for adenine (Kd ~300 nM) and discriminates against guanine (Kd > 10000 nM).152 Comparative sequence analysis reveals that the aptamer core of this G-box differs from that of the previous G-box by a single nucleotide (C for guanine and U for adenine), indicating that this single nucleotide is the determinant in discriminating between the two target ligands.152 In addition, unlike the guanine riboswitch, the adenine riboswitch up-regulates target gene expression through inhibition of transcription termination in response to adenine binding.152

8.4.3.8 Glucosamine-6-Phosphate (GlcN6P) Riboswitch This riboswitch class was identified in gram-positive bacteria and is the only riboswitch class discovered to-date that employs ribozyme-based self-cleavage as its regulatory mechanism. The ribozyme is responsive to GlcN6P and located upstream of the glmS gene, which encodes an enzyme responsible for the biosynthesis of GlcN6P from fructose-6-phosphate and glutamine, and down-regulates target gene expression through transcript cleavage activated in response to ligand binding.154

8.4.3.9 SAM-Coenzyme B12 Riboswitch The riboswitch system of B. clausii contains two tandem riboswitches that respond to two different metabolites, SAM and coenzyme B12.165 This tandem genetic control element located in the 5′ region of the metE gene serves as a two-input NOR logic gate, where each metabolite can bind to its aptamer, induce transcription termination, and independently repress the expression of the target gene. The genes metE and metH encode proteins that independently catalyze the biosynthesis of methionine from homocysteine, and the latter requires a cofactor that is a derivative of coenzyme B12. Therefore, when coenzyme B12 is abundant, only the expression of metE should be decreased, as cells can synthesize methionine and its derivative SAM more efficiently through the gene product of metH, which uses the cofactor derived from coenzyme B12. This level of metabolite resource exploitation may explain why such a sophisticated genetic control element is required and employed by cells.
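Two quantitative ideas from Section 8.4.3 can be made concrete with a short sketch: ligand discrimination expressed as a ratio of apparent Kd values, and the two-input NOR behavior of a tandem riboswitch. The Kd values for the xpt-pbuX G-box are those quoted in the text. Everything else is an assumption made for illustration: simple 1:1 binding with ligand in excess, independent termination by the two aptamers, and hypothetical Kd values of 10 nM for the SAM and coenzyme B12 aptamers, which the text does not report for B. clausii.

```python
def fraction_bound(ligand_nM, kd_nM):
    """Equilibrium occupancy of an aptamer, assuming simple 1:1 binding
    with ligand in large excess over RNA."""
    return ligand_nM / (ligand_nM + kd_nM)


def discrimination(kd_target_nM, kd_analog_nM):
    """Fold preference for the target over an analog (ratio of Kd values)."""
    return kd_analog_nM / kd_target_nM


def nor_gate_expression(sam_nM, b12_nM, kd_sam_nM, kd_b12_nM):
    """Relative read-through of an idealized tandem riboswitch.

    Assumes each ligand-bound aptamer terminates transcription on its own and
    that the two aptamers act independently, so expression requires both to
    be unoccupied (a NOR gate).
    """
    return ((1.0 - fraction_bound(sam_nM, kd_sam_nM))
            * (1.0 - fraction_bound(b12_nM, kd_b12_nM)))


# Apparent Kd values quoted in the text for the xpt-pbuX G-box, in nM.
XPT_KD_NM = {"guanine": 5.0, "hypoxanthine": 50.0, "adenine": 300_000.0}

if __name__ == "__main__":
    ligand_nM = 1000.0  # hypothetical intracellular concentration (1 uM)
    for name, kd in XPT_KD_NM.items():
        print(f"{name:12s} occupancy at 1 uM: {fraction_bound(ligand_nM, kd):.4f}")
    fold = discrimination(XPT_KD_NM["guanine"], XPT_KD_NM["adenine"])
    print(f"guanine over adenine: {fold:.0f}-fold")

    kd_sam = kd_b12 = 10.0  # hypothetical Kd values, nM
    high = 10_000.0         # hypothetical saturating concentration, nM
    for sam, b12 in ((0.0, 0.0), (high, 0.0), (0.0, high), (high, high)):
        out = nor_gate_expression(sam, b12, kd_sam, kd_b12)
        print(f"SAM={sam:8.0f} nM  B12={b12:8.0f} nM  relative expression={out:.3f}")
```

With these assumptions, expression is high only when both metabolites are absent, matching the NOR logic described for the metE control element. The multiplicative form is a design choice that encodes independence of the two termination events; a real tandem element may deviate from it.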

8.4.4 Synthetic Riboswitches that Control Target Gene Expression Levels Riboswitches are sophisticated gene control elements that achieve regulation by direct sensing of target metabolite levels and exhibit molecular recognition, high affinities, and precise control. Examples of the level of complexity achieved by these genetic regulatory elements include the self-cleaving ability of the glmS riboswitch,154 the alternative splicing control of the TPP riboswitch,156 the cooperative binding property of the glycine riboswitch,162 and the NOR gate signal processing behavior of the SAM-coenzyme B12 riboswitch,165 demonstrating that riboswitches are powerful sensor-actuator systems for autonomous gene expression control. Thus, engineered riboswitch control elements are attractive molecular tools for applications in synthetic metabolic network design, by directing pathway fluxes through differential and/or coordinated regulation of a set of associated gene products in response to specific metabolites. Many recent engineering efforts have focused on the construction of such synthetic, ligand-controlled, RNA-based gene regulatory elements through the integration of sensor and regulatory domains. The flexibility in RNA regulatory systems, the programmability inherent in RNA design strategies, and the ability to generate sensor domains to potentially any molecular ligand of interest, enable such synthetic riboswitch systems to hold significant promise in transforming our ability to engineer cellular metabolic networks.

8.4.4.1 Riboswitch Construction Based on Aptamer Insertion within a Transcript Synthetic riboswitches have been constructed by inserting an aptamer or multiple aptamers directly into the 5′ UTR of a target mRNA in eukaryotes (Figure 8.8). Although the insertion location may be any region in the 5′ UTR, it is often chosen to be in the vicinity of the cap region or the start codon of the transcript. This insertion strategy is a trial-and-error strategy, as the inserted aptamer(s) often do not result in ligand-mediated regulation of gene expression and may even cause substantial knockdown of the target gene in the absence of ligand. This strategy requires that the insertion of the aptamer itself and its associated secondary structure does not interfere with translation in the absence of the target ligand. The binding of ligand to the aptamer results in structural stabilization due to the molecular binding interaction between the aptamer and its target.147,166 Similar to binding of a protein to the 5′ UTR,167,168 this stabilized secondary structure can repress translation presumably by interfering with ribosomal scanning or ribosome-mRNA interactions required for effective translation.169 Werstuck and Green constructed the first examples of such riboswitches by inserting small molecule-binding RNA aptamers into the 5′ UTR of transcripts.170 Translation was demonstrated to be repressed by the addition of the appropriate ligands both in vitro and in vivo in mammalian cells. Following this initial work, different research groups have constructed synthetic riboswitches through this design strategy using theophylline-,171 biotin-,172 and tetracycline-binding173 aptamers and demonstrated similar ligand-controlled gene regulation in different systems, including wheat germ extracts,174 Xenopus oocytes,174 and the budding yeast S. cerevisiae.175,176 Synthetic riboswitches have also been constructed to regulate translation of target genes in prokaryotes through a similar aptamer insertion strategy. Although the aptamer is still located in the 5′ UTR of the target transcript, prokaryotes do not exhibit the same type of ribosomal scanning as eukaryotic organisms. Therefore, the physical implementation of these switches requires slightly different design strategies.

Figure 8.8 A schematic illustration of riboswitches constructed through methods based on aptamer insertion within a target transcript. An aptamer or multiple aptamers can be inserted into the 5′ UTR of a transcript near the 5′ cap region or the start codon. Such an insertion may allow the aptamer-fused transcript region to adopt two primary conformations, one in which the aptamer binding pocket is disrupted (top) and the other in which the aptamer is correctly formed to bind its ligand (bottom), and ligand binding shifts the equilibrium toward the latter conformation by stabilizing this conformation. This insertion strategy requires that the former conformation not introduce significant steric hindrance to the ribosome for proper translation and that the ligand-bound conformation effectively inhibit translation through its stabilized structure.
[Figure 8.8 panel labels: Translation ON (5′ UTR-inserted aptamer sequence, ribosome scanning); Translation OFF (ligand, aptamer bound, interrupted ribosome scanning).]

In bacteria, the sequence distance between the RBS and the start codon is relatively short and varies between 5 and 13 nucleotides.24 As a result, targeted insertion of an aptamer in this region to interfere with ribosomal scanning through a ligand-induced secondary structure is generally not applicable without the addition of a peptide fusion to the 5′ end of the coding region.177 In most bacteria, translation initiation relies on ribosomal accessibility to the RBS and the start codon, and thus mRNA secondary structure in this translation initiation region can dictate the efficiency of translation.178–180 Desai and Gallivan developed a synthetic riboswitch system in E. coli where they inserted the theophylline aptamer171 at a location five base-pairs upstream of the RBS to modulate ribosomal access to the RBS through ligand binding.181 The theophylline-dependent up-regulation of gene expression by this synthetic riboswitch was demonstrated through plate-based screening and liquid culture assays. In addition, the extent of up-regulation from this synthetic riboswitch was observed to be dramatically affected when the aptamer was moved to a slightly different location, two or eight base-pairs upstream of the RBS, indicating the functional sensitivity of this riboswitch system to the aptamer insertion location.

8.4.4.2 Riboswitch Construction Based on Direct Attachment of the Aptamer to a Regulatory Element Synthetic riboswitches have been constructed by directly attaching an aptamer to the regulatory domain, such that ligand binding to the aptamer inhibits the activity of the regulatory domain through some mechanism (Figure 8.9). This construction strategy is also a trial-and-error strategy, since the desired ligand-responsive regulatory activity may be highly specific to the location of attachment, the mechanism of action, and the specific aptamer-regulator pair. A riboswitch system based on this strategy was developed by An et al. to modulate Dicer processing of an shRNA molecule through a small molecule ligand-aptamer interaction within this RNAi substrate.182 In this example, the theophylline aptamer171 was directly fused to an shRNA molecule, in place of the loop sequence. This shRNA construct silenced reporter gene expression in mammalian cells in a theophylline dose-dependent manner. Dicer cleavage of the aptamer-fused shRNA for subsequent generation of siRNAs was demonstrated to be modulated in vitro and in vivo by theophylline. This ligand-mediated regulation of Dicer processing was likely achieved due to locating the ligand-binding site of the aptamer sufficiently close to the Dicer processing site, such that theophylline binding to its aptamer blocks Dicer cleavage of the shRNA molecule, resulting in regulatable siRNA-based gene silencing. This proposed mechanism is supported by the observation that the ligand-mediated regulatory effect was abolished when the shRNA stem was extended by one or two base-pairs, resulting in a small shift in the aptamer fusion point compared to the initial design. This direct attachment strategy has also been employed in constructing a synthetic riboswitch that regulates gene expression at the level of splicing. This riboswitch was designed by insertion of the theophylline aptamer171 near the 3′ consensus splice site region of a model pre-mRNA to modulate the splicing of the pre-mRNA through ligand-aptamer complex interactions.183 The addition of theophylline was shown to repress the in vitro splicing of the pre-mRNA harboring the aptamer and have no regulatory effect on the pre-mRNA without the aptamer. In addition, the aptamer’s effect on splicing was demonstrated to be location-dependent and explained by modulating the spliceosome’s accessibility to the splice site. The pre-mRNA harboring an aptamer with a stable base stem, inserted to encompass the 3′ splice site AG within the ligand-binding sequence, exhibited the most efficient splicing inhibition, as the splice site becomes less accessible when theophylline resides in the aptamer binding pocket.

Figure 8.9 A schematic illustration of riboswitches constructed through methods based on direct attachment between the aptamer and regulatory domains. An aptamer is directly attached to the regulatory element in such a way that the ligand binding pocket within the aptamer is sufficiently close to the regulatory element. In the absence of ligand, the partner regulator is capable of loading onto this element and enabling the corresponding regulatory event to occur. In the presence of ligand, ligand binding to the aptamer and residing within the binding pocket results in steric hindrance for the partner regulator loading onto this element, resulting in the inhibition of the normal regulatory event.
[Figure 8.9 panel labels: Aptamer + regulatory element (shRNA, splice site, etc.) joined by direct attachment to give the aptamer-fused regulatory element; partner regulator for regulatory element (Dicer, spliceosome, etc.) loaded, regulatory activity enabled; partner regulator unable to load, regulatory activity inhibited.]

Synthetic riboswitches constructed through the direct attachment strategy between the aptamer and regulatory domains exhibit functional dependence on the attachment location of the aptamer to the regulatory domain. This is because the ligand-mediated regulatory mechanism relies solely on the efficacy of the ligand-aptamer complex interaction in affecting the functional activity of the regulatory domain. This mechanism is not standardized and is highly specific to the particular aptamer-regulator pair. Therefore, such engineered riboswitches lack a reliable composition framework for integrating sensor and regulatory domains that results in allosteric binding properties.

8.4.4.3 Riboswitch Construction Based on an Evolved Linker between the Aptamer and Regulatory Domains Synthetic riboswitches have also been constructed by using a linker region that couples the aptamer and regulatory domains and serves as an element that translates the ligand-binding event in the aptamer domain to the adjacent regulatory domain. Early examples of evolved linker regions implemented a mechanism of information transmission known as “helix slipping”, in which a nucleotide shift event within the element is translated to a small-scale change in the conformation of the regulatory domain in a ligand-dependent manner.184 Such functional elements are often evolved through in vitro selection procedures and referred to as “communication modules.”184 These dynamic elements are typically three to five base-pairs long and often contain non-Watson-Crick base-pairing. Significant effort has been directed toward the construction of communication module-based riboswitches that use a hammerhead ribozyme as the regulatory domain (Figure 8.10), as ribozymes have proven to be a powerful platform for controlling gene expression. Several research groups have engineered a class of in vitro riboswitches called allosteric ribozymes.184–190 Allosteric ribozymes resemble allosteric enzymes in that binding of specific effectors, typically small molecule ligands, modulates the functional activities of the molecule.191 An allosteric ribozyme contains two separate domains, a catalytic, or regulatory, domain and a ligand-binding, or aptamer, domain, which interact in a ligand-dependent manner to control the catalytic activity of the molecule.191 Thus, the allosteric property of these ribozymes enables their catalytic activity to be regulated through specific ligands, and therefore may represent a modular design platform that can directly make use of different ligand-aptamer pairs. Different strategies including rational design, library screening, and combinatorial approaches have been employed to generate allosteric hammerhead ribozymes.191,192 Rational design strategies involve integration of an existing aptamer domain directly to the catalytic domain of the ribozyme through different linkers, followed by examination of the activities of the resulting integrated constructs.185–187 Library screening strategies involve screening randomized sequence libraries for novel aptamer domains (sometimes including communication modules) that function allosterically with the attached catalytic domain.193,194 Finally, the combined approach involves integration of an existing aptamer domain to the ribozyme’s catalytic domain through a randomized linker region, and screening for functional linker sequences from this library that result in allosteric binding (Figure 8.10).184,188,189 The majority of synthetic allosteric hammerhead ribozymes constructed to-date are responsive to small molecule ligands such as theophylline,171,184 adenosine triphosphate (ATP),185,195 and FMN,165,196 indicating that these riboswitches may be adapted to respond to metabolites, and thus hold potential for implementation in metabolic
networks as allosteric gene control elements. While allosteric hammerhead ribozymes have been mostly demonstrated in vitro, Win and Smolke recently specified a set of design principles that enable these engineered allosteric ribozymes to function in the cellular environment.197 The critical design strategy that enabled translation of in vitro allosteric function to in vivo environments addressed the coupling method of the ribozyme and aptamer domains. In earlier systems, one of the loop and/or stem sequences was replaced by part of the aptamer domain, eliminating the tertiary interactions between loop I–II that had been demonstrated to be essential for the in vivo cleavage activity of the ribozyme.72 In the coupling strategy proposed by Win and Smolke, neither of the two loop or stem sequences of the ribozyme was replaced by the aptamer domain, allowing these tertiary interactions to be maintained, and therefore cleavage activity

[Figure 8.10 panels: (A) coupling of the regulatory domain (hammerhead ribozyme, stem III, stem loop I-II interactions) and the aptamer by loop replacement, giving an aptamer-coupled ribozyme with loop I-II interactions abolished, with targeted random library regions for the communication module and ligand binding; (B) functional screening of a randomized (NNNN) communication module region, with helix slipping shown in the absence and in the presence of ligand.]

Figure 8.10 A schematic illustration of riboswitches constructed based on an evolved linker between the aptamer and regulatory domains. An in vitro functional communication module-based allosteric hammerhead ribozyme system is shown as an example. The ribozyme’s cleavage site is indicated by an arrow. In general, an aptamer is attached to the regulatory domain through a linker (top), whose function is evolved to be ligand-dependent through selection from a random sequence library (bottom). The functional linkers are called “communication modules,” which employ the helix slipping mechanism in mediating the activity of the regulatory domain. An existing functional linker can also be used to mediate the activities of other regulatory platforms such as the RBS. In addition, an existing functional linker can also be used to couple a regulatory domain to an aptamer domain comprised of a random sequence library to evolve the latter to bind a new ligand of interest and function in this ligand-dependent manner (top). In this particular example of the in vitro allosteric ribozyme system, part of the aptamer domain replaces one loop of the ribozyme domain, thereby abolishing loop I-II interactions required for in vivo functionality.

of the ribozyme in vivo. Following these composition strategies, the authors coupled the aptamer and ribozyme domains through a third domain, the information transmission domain, within which communication modules previously selected in vitro were placed and screened for their in vivo functional activities. One quarter of the tested communication modules were demonstrated to be functional in this in vivo system, indicating that the in vitro functionality of these elements translates to in vivo activity only selectively. In addition, these riboswitches did not exhibit aptamer-domain modularity in vivo. The selective functionality and lack of modularity are presumably due to the weak non-Watson-Crick base pairs comprising these linkers, which are likely sensitive to surrounding sequences. This system may be used to screen for new linker sequences that function with specific aptamer-ribozyme pairs to generate new ribozyme-based switches.

Researchers have also developed communication module-based riboswitches using a different regulatory platform such as the RBS. Suess et al. engineered a synthetic riboswitch composed of a theophylline aptamer171 and a previously developed communication module191 placed at a position proximal to the RBS.198 This linker element had been proposed to perform helix slipping by one nucleotide between the ligand-bound and unbound states.191 In their design, the communication module served as a helix bridge between the aptamer and the RBS such that binding of theophylline to its aptamer causes a single-nucleotide shift in the communication module, thereby enabling ribosome binding to the RBS without steric interference, and thus efficient translation in the presence of theophylline. This design scheme is similar to the direct coupling design between a theophylline aptamer and RBS described above, except that a distinct communication module was incorporated between the aptamer and regulatory domains.


Linker regions have also been evolved that implement a different mechanism of information transmission known as “strand displacement”, a functionally similar mechanism to “helix slipping.” Gallivan and colleagues developed a second riboswitch system as an extension of their initial direct attachment riboswitch design, using a combined rational and library screening design strategy.199 A linker region adjoining the theophylline aptamer and the RBS was randomized, and sequences that translated a ligand-binding event in the aptamer domain to a structural change in the RBS, thereby regulating ribosomal access to the RBS, were screened through plate-based assays. These sequences are functionally similar to communication modules in that they translate ligand-binding events at the aptamer domain to the regulatory domain, but they are compositionally and mechanistically distinct. Gene expression regulation through this second class of linker regions takes place through the strand-displacement mechanism instead of a helix slipping mechanism. The functional sequences are complementary to regions of the theophylline aptamer such that base-pairing with a region of the aptamer sequesters the RBS and thus inhibits ribosome access to the RBS, whereas binding of theophylline to its aptamer disrupts this conformation and releases the RBS, resulting in up-regulation of target gene expression. As such, this regulatory mechanism is specific to the theophylline aptamer employed in this system and is not functionally independent. Therefore, this riboswitch system is not readily amenable to the insertion of different aptamers and thus lacks modularity, such that new linker regions would need to be generated by screening for specific aptamer-regulator pairs. 
8.4.4.4 A General Framework for the Construction of Modular, Extensible Riboswitches

Allosteric hammerhead ribozymes with in vivo functionality are highly attractive gene control elements due to their general applicability to a broad range of organisms, as their regulatory mechanism does not require cell-specific machinery and thus represents a universal platform for controlling gene expression. The modular, in vivo functional, allosteric hammerhead ribozyme system developed by Win and Smolke, called ribozyme switches,197 contains distinct functional domains: a ligand-binding or sensor domain, composed of an RNA aptamer, and a regulatory or actuator domain, composed of a hammerhead ribozyme (Figure 8.11). The authors specified key design strategies in the development of this gene regulatory platform. First, the aptamer sequence is directly coupled to the ribozyme through one of two stem loops without removing or replacing the loop sequences, which have previously been shown to form tertiary interactions essential for in vivo functionality by stabilizing the ribozyme’s active conformation (Figure 8.11A). Second, the ribozyme constructs are integrated into the 3′ UTRs of target transcripts through stem III (Figure 8.11A), maintaining loop I–II interactions, removing regulatory artifacts that can be observed when placing RNA secondary structures in the 5′ UTR due to non-specific effects of these elements on translation initiation, and implementing a universal mechanism through which to destabilize transcripts across bacterial and mammalian cells (cleavage in the 3′ UTR). Third, the ribozyme switch was flanked by spacer sequences to insulate the control system from surrounding sequences (Figure 8.11A). Finally, a standardized information transmission domain was rationally constructed, composed of competing strands that are independent of the other domains within the switch molecule, in order to implement a modular strand displacement-based mechanism (Figure 8.11). This domain serves as a computation element capable of transmitting binding information in the aptamer domain to the activity of the ribozyme domain. The strand-displacement regulatory mechanism is based on the conformational dynamics characteristic of RNA molecules that enable them to distribute between two primary conformations at equilibrium. In one conformation, the competing strand is either not base-paired or is base-paired such that the ligand-binding pocket is not formed, and in the other conformation, the competing strand is base-paired with the aptamer base stem, displacing the switching strand and thus allowing the formation of the ligand-binding pocket. Binding of ligand to the latter conformation shifts the equilibrium distribution to favor the aptamer-bound form as a function of increasing ligand concentration. Strand displacement results in the disruption (ON switch, Figure 8.11A, bottom) or restoration (OFF switch,

[Figure 8.11 panels: (A) direct coupling of the sensor domain (aptamer) to the regulatory domain (sTRSV ribozyme) with stem loop I-II interactions retained, integration of the competing strand to give the aptamer-coupled ribozyme ON switch platform, and insertion into the 3′ UTR of a target gene (GFP) through stem III; aptamer unbound, ribozyme active conformation, suppressing gene expression; aptamer bound (ligand), ribozyme inactive conformation, allowing gene expression. (B) Aptamer unbound, ribozyme inactive conformation, allowing gene expression; aptamer bound (ligand), ribozyme active conformation, suppressing gene expression.]

Figure 8.11 (See color insert following page 13-20.) General compositional framework and design strategy for engineering universal, ligand-controlled cis-acting hammerhead ribozyme-based regulatory systems. The color scheme is as follows: catalytic core, purple; loop sequences, blue; aptamer sequence, brown; competing strand, green; switching strand, red; spacer sequences, orange; cleavage site, brown arrow. Modular strategies for coupling the aptamer and regulatory domains and systematic integration of the coupled control molecule comprising these domains into a target mRNA are shown. An aptamer is directly attached to the ribozyme through one of its loops without replacing any part of the ribozyme, thereby maintaining loop I-II interactions required for in vivo functionality. Spacer sequences are included on both ends of the control molecule to insulate from non-specific interactions with the surrounding sequences. A competing strand, whose sequence is similar to that of the switching strand, is integrated into the aptamer-coupled ribozyme, which enables the control molecule to adopt two primary conformations through the strand displacement mechanism, as the competing strand displaces the switching strand. Through this mechanism, (A) an ON switch, in which ligand binding stabilizes the conformation with the disrupted catalytic core, and (B) an OFF switch, in which ligand binding stabilizes the conformation with the restored catalytic core, are constructed. (Adapted from Win, M.N. and Smolke, C.D. Proc. Natl. Acad. Sci. USA, 104, 14283–14288, 2007.)

Figure 8.11B) of the ribozyme’s catalytic core, thereby enabling up- or down-regulation of target gene expression. The authors demonstrated that both ON and OFF switches can be tuned to exhibit different dynamic ranges of ligand-dependent responses. This tunability was achieved by altering the nucleotide composition of the information transmission domain, which affects the stabilities of individual switch constructs and the energies required for a construct to switch between its two primary conformations. In addition, the modularity of the strand-displacement switch platform was demonstrated by direct replacement of the aptamer domain with another aptamer sequence, resulting in a new switch that maintains similar switch response properties to a new ligand (Figure 8.12). In addition to exhibiting aptamer domain modularity, this standardized platform also demonstrates modularity in the regulatory domain. For instance, Smolke’s group has also

[Figure 8.12 panels: sensor domain replacement converts ON switch I (aptamer I, active ribozyme) into ON switch II (aptamer II), shown in its active-ribozyme and inactive-ribozyme conformations.]

Figure 8.12 (See color insert following page 13-20.) Modular design strategies for the construction of new ribozyme switches comprising aptamer domains responsive to diverse ligands. The color scheme corresponds to that used in Figure 8.11. An ON switch platform is used for illustration where aptamer I (left dashed box) is directly replaced with aptamer II (right dashed box) to construct an ON switch II. (Adapted from Win, M.N. and Smolke, C.D. Proc. Natl. Acad. Sci. USA, 104, 14283–14288, 2007.)

developed another class of switches, called antiswitches, constructed through a similar compositional strategy, where an antisense RNA was employed as the regulatory domain in place of the hammerhead ribozyme.200 This class of synthetic riboswitches also enables ligand-dependent up- or down-regulation of target gene expression. Finally, the versatility of this switch platform was demonstrated by implementing several ribozyme switches in an application-specific control system where an amino acid biosynthesis gene responsible for cell growth was regulated in a ligand-dependent manner. Cells harboring switches with different dynamic response ranges exhibited corresponding extents of cell growth in response to the presence of ligand. In addition to demonstrating the first examples of small molecule-responsive, in vivo functional allosteric hammerhead ribozymes, this work describes a modular and extensible RNA-based framework for the reliable construction of ligand-controlled gene regulatory systems that is portable across diverse organisms. This engineering framework provides a flexible and programmable platform in which the aptamer and regulatory domains can be modularly coupled to one another, enabling a plug-and-play type capability for the rapid design and construction of new switches.
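The two-conformation strand-displacement mechanism described in this section can be illustrated with a minimal thermodynamic sketch. It assumes, for illustration only, that a switch populates just two conformations at equilibrium, that ligand binds only the aptamer-formed conformation, and that relative gene expression is proportional to the fraction of transcripts whose ribozyme is in the inactive conformation. The function names, free energy values, and dissociation constant are hypothetical and are not taken from Win and Smolke.

```python
import math

# Approximate thermal energy RT at 37 °C in kcal/mol (assumed temperature).
RT_KCAL_PER_MOL = 0.616


def conformational_constant(delta_g_kcal):
    """Return K_conf = [aptamer-formed] / [aptamer-disrupted] without ligand.

    delta_g_kcal is a hypothetical free energy of the aptamer-formed
    conformation relative to the aptamer-disrupted one. Positive values
    mean the aptamer-formed conformation is disfavored.
    """
    return math.exp(-delta_g_kcal / RT_KCAL_PER_MOL)


def fraction_aptamer_formed(ligand, k_d, k_conf):
    """Fraction of switch molecules in the aptamer-formed conformation.

    Counts both the free and the ligand-bound aptamer-formed species.
    ligand and k_d must share the same concentration units.
    """
    weight = k_conf * (1.0 + ligand / k_d)
    return weight / (1.0 + weight)


def on_switch_expression(ligand, k_d, k_conf):
    """ON switch: the aptamer-formed conformation has a disrupted catalytic
    core, so (by assumption) expression tracks its population."""
    return fraction_aptamer_formed(ligand, k_d, k_conf)


def off_switch_expression(ligand, k_d, k_conf):
    """OFF switch: the aptamer-formed conformation has a restored catalytic
    core, so (by assumption) expression tracks the other conformation."""
    return 1.0 - fraction_aptamer_formed(ligand, k_d, k_conf)


if __name__ == "__main__":
    # Hypothetical free energy penalties standing in for different
    # information transmission domain sequences.
    for delta_g in (0.5, 1.5, 2.5):
        k_conf = conformational_constant(delta_g)
        basal = on_switch_expression(0.0, 1.0, k_conf)
        induced = on_switch_expression(10.0, 1.0, k_conf)
        print(f"dG={delta_g} kcal/mol  basal={basal:.3f}  induced={induced:.3f}")
```

Under these assumptions, a larger free energy penalty lowers both the basal and the ligand-induced expression of an ON switch, which is one way to read the tuning of the information transmission domain described above.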

8.5 Applications of RNA Control Elements in Metabolic Network Engineering

RNA plays a critical role in regulating gene expression and significant effort has been directed toward the design of synthetic RNA elements that function as engineered gene regulatory systems. Engineered biological systems design has traditionally focused on transcriptional regulation schemes, without much effort being directed to the design and integration of RNA-based control systems. However, several recent examples demonstrate the application of these engineered regulators to metabolic pathway engineering. Such RNA-based tools have been implemented to redirect pathway fluxes and to tune the expression levels of enzymes involved in a particular metabolic pathway for optimal energy usage. Combinatorial implementation of RNA regulatory elements has been used to achieve precise control over the differential expression levels of multiple enzymes. Bacteria often locate multiple related


pathway genes into operons to coordinate the levels of the encoded enzymes. RNA regulatory elements can be used to layer additional control on to the system and achieve differential expression of the genes encoded in that operon construct.41,201,202 Therefore, synthetic RNA regulatory elements such as endonuclease cleavage elements and hairpin elements that result in transcript stabilization (as previously described) have been placed within intergenic regions of a polycistronic transcript to direct the processing and stability of different regions of the polycistronic transcript to result in differential expression of the genes encoded in that operon construct.203,204 In one example, Smolke et al. developed an engineered operon system where multiple genes were placed under the control of a single promoter and the encoded proteins were produced at different ratios depending on the combination of control elements used.58,60,205 In this system, an RNase E cleavage site was placed between two coding regions to direct cleavage in this location and to generate two secondary transcripts, and novel hairpin structures were incorporated into the 5′ and 3′ ends of the coding regions to protect the resulting transcripts from exoribonuclease activity. This RNA regulatory system was implemented to control the metabolic flux through a carotenoid pathway, where the production of the product β-carotene was modulated up to 300-fold relative to levels of an intermediate metabolite in the pathway, lycopene.206 Accumulation levels of these and other intermediates in the pathway were varied with different combinations of synthetic RNA regulatory elements, demonstrating the ability of these systems to precisely regulate pathway enzyme levels and balance the production of the intermediate metabolites. 
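The effect of differential transcript stability on expression ratios within an operon can be illustrated with a minimal steady-state sketch. It assumes first-order mRNA decay, rapid and complete cleavage at the intergenic RNase E site, and protein output proportional to steady-state mRNA level. The function names and half-lives are hypothetical and are not measurements from the cited studies.

```python
import math


def steady_state_mrna(transcription_rate, half_life_min):
    """Steady-state mRNA level for constant synthesis and first-order decay.

    The level equals the synthesis rate divided by the decay constant,
    where the decay constant is ln(2) / half-life.
    """
    decay_constant = math.log(2) / half_life_min
    return transcription_rate / decay_constant


def expression_ratio(half_life_a_min, half_life_b_min,
                     translation_a=1.0, translation_b=1.0):
    """Ratio of protein A to protein B made from one operon.

    Both secondary transcripts derive from the same primary transcript,
    so they share one transcription rate (set to 1.0 here). Their
    half-lives differ according to the protective hairpins they carry.
    """
    level_a = steady_state_mrna(1.0, half_life_a_min) * translation_a
    level_b = steady_state_mrna(1.0, half_life_b_min) * translation_b
    return level_a / level_b


if __name__ == "__main__":
    # Hypothetical half-lives (minutes) for a hairpin-protected and an
    # unprotected secondary transcript.
    print(expression_ratio(half_life_a_min=10.0, half_life_b_min=2.0))
```

In this simplified picture the expression ratio is set by the ratio of secondary transcript half-lives, so swapping the hairpin on one coding region changes the ratio without touching the promoter.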
A similar system, based on engineering RNA regulatory elements into intergenic regions of operon constructs, was developed through combinatorial assembly strategies.207 A library of intergenic regions was generated through combinatorial assembly of oligonucleotide sequences and a screening method was developed based on fluorescence-activated cell sorting (FACS) to generate tunable intergenic regions (TIGRs) consisting of combinations of RNA regulatory elements that result in desired expression ratios.207 Functional TIGRs were demonstrated to vary the relative expression of two reporter genes over a 100-fold range. This combinatorial library generation method was applied to a three-gene operon that encodes a heterologous mevalonate pathway and a biosensor assay was used to screen for high mevalonate producers. Several selected intergenic regions optimized the relative expression of the three pathway enzymes, resulting in a seven-fold increase in mevalonate production. This system provides another example of the regulatory capabilities of these RNA-based systems as powerful tools in metabolic pathway engineering.

Synthetic regulatory elements that act through the RNAi pathway have significant application in plant metabolic engineering systems.208–211 In one recent example, RNAi substrates were employed to induce gene silencing in Papaver somniferum,212 the opium poppy, to alter accumulation of benzylisoquinoline alkaloid (BIA) metabolites in the plant. A hairpin RNA (hpRNA) construct was designed to target codeinone reductase (COR), encoded by a multi-gene family consisting of seven members. COR is an enzyme responsible for codeine biosynthesis and the penultimate enzyme for morphine biosynthesis from codeine. Expression of this RNA regulatory element in P. somniferum resulted in the unexpected accumulation of an early BIA metabolite, reticuline, at the expense of later metabolites such as thebaine, oripavine, morphine, and codeine. Reticuline is a non-narcotic alkaloid that is synthesized seven enzymatic steps upstream of COR and is also a valuable common precursor metabolite that is shared by other alkaloid pathways. Although it is currently unclear how silencing of the downstream enzyme affected metabolic processing to result in accumulation of an intermediate metabolite synthesized far upstream in the pathway, this example highlights RNAi-based regulatory elements as effective tools in plant metabolic pathway engineering.

Efforts in metabolic engineering have been primarily directed to the development of gene regulatory tools and the application of these tools to optimize the expression levels of metabolic genes of interest with the goal of maximizing the production of desired metabolites. However, tools that provide general strategies for detecting the accumulation levels of key metabolites without mechanically disrupting the cells, and thus enable non-invasive sensing of metabolite production, are critical to advancing metabolic engineering efforts. Recently, Win and Smolke developed a


modular platform for constructing molecular sensors to noninvasively detect biosynthesis of a target metabolite in real time through a fluorescent reporter signal.197 This platform provides a framework for the construction of ribozyme switches, discussed earlier as extensible ligand-responsive gene regulation systems. The application of these switches to detecting the biosynthesis of the metabolite xanthine from a precursor fed to cells was demonstrated. Ribozyme switches constructed with a sensor domain responsive to xanthine were demonstrated to up-regulate reporter gene expression as the metabolite accumulated in the cell, thereby serving as non-invasive molecular sensors of metabolite production.

8.6 Enabling Technologies in Support of Constructing Integrated Network and Control Systems

RNA engineering is a rapidly developing field that holds great promise for implementation of synthetic RNA regulatory systems toward the construction of reliable and robust engineered cellular systems for various biotechnological applications including metabolic engineering. Advances in RNA biology and engineering have enabled researchers to develop numerous regulatory platforms, which will be expanded on to develop more sophisticated systems that will enable the design and construction of biological control systems tailored to application-specific performance requirements. The development of engineered riboswitches highlights such efforts to develop sophisticated control systems, where various regulatory elements such as RBS sequences, antisense RNAs, siRNAs, alternative splice sites, and ribozymes have been employed in combination with a second functional element (aptamers) capable of binding specific ligands. While RNA regulatory elements are generally designed through well-defined forward design principles, the aptamer or sensor elements are generated through standardized in vitro selection processes known as SELEX. Technological advances have resulted in SELEX being adapted to automated, high-throughput platforms.141–143 In addition, more efficient SELEX strategies have been developed by replacing the standard affinity-based partitioning schemes with partitioning through capillary electrophoresis.144–146 Selections performed with this partitioning strategy can attain higher-affinity aptamers in fewer selection cycles than are required in conventional SELEX, due to the substantially enhanced separation of target-bound from unbound sequences.
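The link between partitioning efficiency and the number of selection cycles can be illustrated with a minimal enrichment sketch. It assumes a pool made of only two classes of sequences (binders and non-binders) and a fixed retention probability for each class in every round. The function names and retention values are hypothetical, chosen only to contrast a weaker and a stronger partitioning step, and are not values reported for conventional or capillary electrophoresis SELEX.

```python
def enrich_one_round(binder_fraction, binder_retention, background_retention):
    """Return the binder fraction of the pool after one partitioning round."""
    retained_binders = binder_fraction * binder_retention
    retained_background = (1.0 - binder_fraction) * background_retention
    return retained_binders / (retained_binders + retained_background)


def rounds_to_enrich(initial_fraction, binder_retention, background_retention,
                     target_fraction, max_rounds=100):
    """Count the rounds needed for binders to reach target_fraction.

    At least one round is always performed. Raises ValueError if the
    target is not reached within max_rounds.
    """
    fraction = initial_fraction
    for round_number in range(1, max_rounds + 1):
        fraction = enrich_one_round(fraction, binder_retention,
                                    background_retention)
        if fraction >= target_fraction:
            return round_number
    raise ValueError("target fraction not reached within max_rounds")


if __name__ == "__main__":
    # Hypothetical pools: one binder per 10^10 sequences at the start.
    weaker = rounds_to_enrich(1e-10, 0.5, 0.005, 0.9)
    stronger = rounds_to_enrich(1e-10, 0.5, 0.00005, 0.9)
    print("weaker partitioning:", weaker, "rounds")
    print("stronger partitioning:", stronger, "rounds")
```

Because each round multiplies the odds of finding a binder by the ratio of the two retention probabilities, lowering the background retention reduces the number of rounds roughly in proportion to the logarithm of that ratio.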
Although these improved aptamer selection schemes have been demonstrated for the generation of aptamers to protein ligands, in principle these schemes can be adapted to the selection of aptamers to small molecule ligands, which are more relevant sensor components to metabolic engineering efforts. Synthetic riboswitches composed of distinct, modular ligand-binding domains are excellent candidates for the construction of metabolite-specific gene control systems. For example, the previously described strand displacement-based ribozyme switch platform is modular in design, tunable in regulation, and specific in target recognition, and thus represents a versatile foundational technology that can be applied to the generation of tailor-made synthetic riboswitches. In addition, as the regulatory mechanism employed by this control system does not require cell-specific machinery, it is therefore applicable to diverse organisms offering platform portability. Since the functional domains of the switch are distinct and exhibit sequence independence from each other, they enable component reuse strategies where domains can be integrated into the platform without disrupting the switch activity. This property potentially enables the construction of higher-order genetic devices and signal processing systems that involve extended integrated design schemes. For instance, tailor-made synthetic riboswitches that exhibit logical processing of multiple inputs can be potentially engineered through tandem ribozyme switch design. Integration schemes may also implement two different sensor domains within a single switch—tandem sensor domain systems—to exhibit higher order signal processing properties. Such higher order signal processing systems will enable more sophisticated control of desired metabolic genes in a pathway, which otherwise cannot be achieved with single riboswitch systems.
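The idea of logical processing through tandem ribozyme switches can be illustrated with a minimal sketch. It assumes that the switches act independently and that a transcript is expressed only if none of the ribozymes in its 3′ UTR cleaves it, which follows from the cleavage mechanism described earlier but is an idealization. The function names and the leak value are hypothetical.

```python
def uncleaved_fraction(ligand_present, switch_type, leak=0.05):
    """Idealized fraction of transcripts that one switch leaves uncleaved.

    ON switch: ligand binding inactivates the ribozyme.
    OFF switch: ligand binding activates the ribozyme.
    leak is a hypothetical residual activity in the 'wrong' state.
    """
    if switch_type not in ("ON", "OFF"):
        raise ValueError("switch_type must be 'ON' or 'OFF'")
    if switch_type == "ON":
        ribozyme_inactive = ligand_present
    else:
        ribozyme_inactive = not ligand_present
    return 1.0 - leak if ribozyme_inactive else leak


def tandem_expression(inputs, switch_types, leak=0.05):
    """Relative expression of a transcript carrying several switches.

    The transcript survives only if every ribozyme fails to cleave, so
    the uncleaved fractions multiply (independence is assumed).
    """
    expression = 1.0
    for ligand_present, switch_type in zip(inputs, switch_types):
        expression *= uncleaved_fraction(ligand_present, switch_type, leak)
    return expression


if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            level = tandem_expression((a, b), ("ON", "ON"))
            print(f"ligand A={a!s:5} ligand B={b!s:5} expression={level:.4f}")
```

Under these assumptions two ON switches in tandem behave like an AND gate, since expression is high only when both ligands are present, and two OFF switches behave like a NOR gate.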


8.7 Future Applications of Advanced RNA-Based Control Systems

Riboswitches are sophisticated RNA-based control systems capable of mediating gene expression autonomously without additional aid from auxiliary protein components, and provide a direct link between key metabolites and their biosynthetic genes. Therefore, application of engineered riboswitches in synthetic metabolic networks will enable such autonomous regulation in a specific metabolite-dependent manner. While other synthetic RNA regulatory elements have been employed in metabolic engineering applications, implementation of engineered riboswitches as gene regulatory tools in rewiring metabolic fluxes has not yet been demonstrated. This is because the development of engineered riboswitch systems is recent and most of these systems were constructed as artisanal examples to demonstrate the gene regulatory capabilities of these synthetic systems. As a result, most of these designs do not support modularity and portability and present challenges in adaptation to practical applications. Nevertheless, novel design strategies for engineering adaptable, scalable systems are increasingly emerging. The first example of employing synthetic riboswitches in an application associated with metabolic engineering was recently demonstrated, in which ribozyme switches were implemented as non-invasive sensors for monitoring the biosynthesis of a target metabolite as previously described.197 In addition to serving as cellular biosensors, engineered riboswitches can also be implemented as metabolite-specific gene regulatory tools controlling target enzyme levels in response to key metabolites. A schematic illustration of the application of such advanced RNA control systems in a metabolic network is shown in Figure 8.13, where pathway fluxes are directed and controlled through metabolite-dependent dynamic regulatory schemes. This type of control scheme requires integration of specific aptamer domains into the engineered switch platform. While SELEX has been a powerful technique in generating diverse ligand-binding domains in vitro, it does not assure that these in vitro selected ligand-binding domains will function in vivo within the context of the integrated switch platform. Therefore, a technique complementary to SELEX that enables the generation of new in vivo functional ligand-binding domains that are compatible with integration into the switch platform is needed. Closely related aptamer domains have been evolved in vitro from existing ones through small

[Figure 8.13: pathway schematic with metabolites M1 to M9, enzymes E12, E23, E34, E45, E56, and E67, and switches S2, S4, and S7.]

Figure 8.13 An abstracted scheme of metabolite-specific RNA switch-based directed gene expression regulation where various switches control the regulation of specific pathway genes and direct metabolic fluxes. Abbreviations are as follows: M, metabolite; E, enzyme; S, switch. S2 up-regulates the downstream enzyme E23 in response to M2 accumulation to conserve cellular energy. S4 up-regulates E45 in response to M4 accumulation, which is converted from M3, to minimize an undesired conversion of M3 to M8 by E45. In response to certain levels of M7 accumulation, S7 down-regulates the immediate upstream enzyme E67, which is the biosynthetic enzyme for M7, allowing the metabolite to self-regulate its own production levels. This feedback regulation scheme can be used to specify set points for M7 accumulation.


sequence libraries placed within the latter (“doped libraries”),213,214 supporting the feasibility of generating orthogonal aptamers to metabolite families through in vivo selection strategies. Ribozyme switches, or similar engineered switch platforms, may serve as a platform for such in vivo strategies to generate new aptamer domains from smaller randomized libraries. This in vivo screening technique is potentially complementary to standard SELEX techniques: in vitro selection can first be performed on larger switch libraries for fewer selection rounds, and the resulting partially enriched, smaller libraries can then be subjected to in vivo screening, thereby enabling the generation of RNA aptamers for building synthetic riboswitches that are functional in the cellular environment.
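The feedback scheme in Figure 8.13, in which S7 down-regulates E67 in response to M7 accumulation, can be illustrated with a minimal dynamic sketch. It assumes simple first-order kinetics, a hyperbolic repression of E67 synthesis by M7, and forward Euler integration. The function name and all rate constants are hypothetical and serve only to show how such a loop fixes a set point.

```python
def simulate_m7(feedback=True, beta=1.0, k_cat=10.0, deg_e=1.0, deg_m=1.0,
                k_half=1.0, dt=0.01, t_end=50.0):
    """Integrate a two-variable model and return the final M7 level.

    E67 is synthesized at rate beta (reduced by M7 when feedback is on)
    and degraded at rate deg_e. M7 is produced by E67 at rate k_cat per
    unit enzyme and removed at rate deg_m. k_half is the M7 level at
    which the switch halves E67 synthesis.
    """
    e67 = 0.0
    m7 = 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        if feedback:
            synthesis = beta / (1.0 + m7 / k_half)
        else:
            synthesis = beta
        d_e67 = synthesis - deg_e * e67
        d_m7 = k_cat * e67 - deg_m * m7
        e67 += dt * d_e67
        m7 += dt * d_m7
    return m7


if __name__ == "__main__":
    print("M7 without feedback:", round(simulate_m7(feedback=False), 3))
    print("M7 with feedback:   ", round(simulate_m7(feedback=True), 3))
    # A more sensitive switch (smaller k_half) gives a lower set point.
    print("M7 with k_half=0.2: ", round(simulate_m7(k_half=0.2), 3))
```

In this sketch the steady-state M7 level is determined by the switch sensitivity (k_half), which is one way to interpret the statement that the feedback scheme can be used to specify set points for M7 accumulation.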

8.8 Conclusions

In natural bacterial and eukaryotic systems, many metabolic pathways have been identified that employ RNA elements such as riboswitches in controlling the expression levels of enzymes responsible for the biosynthesis, biodegradation, and transport of key metabolites in these pathways, indicating that RNA is a well-suited regulatory platform for metabolic networks. Naturally occurring riboswitches control gene expression by regulating the efficiency of transcriptional and post-transcriptional processes upon binding small molecule ligands through different mechanisms that act through base-pairing interactions (terminator formation, RBS sequestering), transcript processing (ribozyme cleavage), and mRNA maturation (splicing). Building on these natural examples and exploiting the interdependent relationship between RNA sequence, structure, and function, various synthetic RNA regulatory systems including engineered riboswitches have been developed and their use as gene control elements has been successfully demonstrated. These natural and engineered RNA-based regulatory systems have highlighted the design flexibility and functional versatility of RNA, and demonstrated their unique and powerful capabilities as tools in metabolic engineering applications.

References

1. Khosla, C. and Keasling, J.D. Metabolic engineering for drug discovery and development. Nat. Rev. Drug Discov., 2, 1019–1025, 2003.
2. Simons, R.W. Naturally occurring antisense RNA control—a brief review. Gene, 72, 35–44, 1988.
3. Inouye, M. Antisense RNA: its functions and applications in gene regulation—a review. Gene, 72, 25–34, 1988.
4. Doudna, J.A. and Cech, T.R. The chemical repertoire of natural ribozymes. Nature, 418, 222–228, 2002.
5. Fedor, M.J. and Williamson, J.R. The catalytic diversity of RNAs. Nat. Rev. Mol. Cell Biol., 6, 399–412, 2005.
6. Mandal, M. and Breaker, R.R. Gene regulation by riboswitches. Nat. Rev. Mol. Cell Biol., 5, 451–463, 2004.
7. Winkler, W.C. and Breaker, R.R. Regulation of bacterial gene expression by riboswitches. Ann. Rev. Microbiol., 59, 487–517, 2005.
8. Lai, E.C. RNA sensors and riboswitches: self-regulating messages. Curr. Biol., 13, R285–291, 2003.
9. Sudarsan, N., Barrick, J.E., and Breaker, R.R. Metabolite-binding RNA domains are present in the genes of eukaryotes. RNA, 9, 644–647, 2003.
10. Winkler, W.C. and Breaker, R.R. Genetic control by metabolite-binding riboswitches. Chembiochem, 4, 1024–1032, 2003.
11. Tucker, B.J. and Breaker, R.R. Riboswitches as versatile gene control elements. Curr. Opin. Struct. Biol., 15, 342–348, 2005.
12. Meister, G. and Tuschl, T. Mechanisms of gene silencing by double-stranded RNA. Nature, 431, 343–349, 2004.


13. Dykxhoorn, D.M., Novina, C.D., and Sharp, P.A. Killing the messenger: short RNAs that silence gene expression. Nat. Rev. Mol. Cell Biol., 4, 457–467, 2003.
14. Novina, C.D. and Sharp, P.A. The RNAi revolution. Nature, 430, 161–164, 2004.
15. Tang, G. siRNA and miRNA: an insight into RISCs. Trends Biochem. Sci., 30, 106–114, 2005.
16. Breaker, R.R. Natural and engineered nucleic acids as tools to explore biology. Nature, 432, 838–845, 2004.
17. Schroeder, R., Barta, A., and Semrad, K. Strategies for RNA folding and assembly. Nat. Rev. Mol. Cell Biol., 5, 908–919, 2004.
18. Soukup, J.K. and Soukup, G.A. Riboswitches exert genetic control through metabolite-induced conformational change. Curr. Opin. Struct. Biol., 14, 344–349, 2004.
19. Bauer, G. and Suess, B. Engineered riboswitches as novel tools in molecular biology. J. Biotechnol., 124, 4–11, 2006.
20. Isaacs, F.J., Dwyer, D.J., and Collins, J.J. RNA synthetic biology. Nat. Biotechnol., 24, 545–554, 2006.
21. Davidson, E.A. and Ellington, A.D. Engineering regulatory RNAs. Trends Biotechnol., 23, 109–112, 2005.
22. Erdmann, V.A. et al. The non-coding RNAs as riboregulators. Nucleic Acids Res., 29, 189–193, 2001.
23. Kozak, M. Regulation of translation via mRNA structure in prokaryotes and eukaryotes. Gene, 361, 13–37, 2005.
24. Kozak, M. Initiation of translation in prokaryotes and eukaryotes. Gene, 234, 187–208, 1999.
25. Shine, J. and Dalgarno, L. Determinant of cistron specificity in bacterial ribosomes. Nature, 254, 34–38, 1975.
26. Chen, H., Bjerknes, M., Kumar, R., and Jay, E. Determination of the optimal aligned spacing between the Shine-Dalgarno sequence and the translation initiation codon of Escherichia coli mRNAs. Nucleic Acids Res., 22, 4953–4957, 1994.
27. Hellen, C.U. and Sarnow, P. Internal ribosome entry sites in eukaryotic mRNA molecules. Genes Dev., 15, 1593–1612, 2001.
28. Bonnal, S., Boutonnet, C., Prado-Lourenco, L., and Vagner, S. IRESdb: the Internal Ribosome Entry Site database. Nucleic Acids Res., 31, 427–428, 2003.
29. Pelletier, J. and Sonenberg, N. Internal initiation of translation of eukaryotic mRNA directed by a sequence derived from poliovirus RNA. Nature, 334, 320–325, 1988.
30. Jang, S.K. et al. A segment of the 5′ nontranslated region of encephalomyocarditis virus RNA directs internal entry of ribosomes during in vitro translation. J. Virol., 62, 2636–2643, 1988.
31. Chappell, S.A., Edelman, G.M., and Mauro, V.P. A 9-nt segment of a cellular mRNA can function as an internal ribosome entry site (IRES) and when present in linked multiple copies greatly enhances IRES activity. Proc. Natl. Acad. Sci. USA, 97, 1536–1541, 2000.
32. Kieft, J.S., Zhou, K., Grech, A., Jubin, R., and Doudna, J.A. Crystal structure of an RNA tertiary domain essential to HCV IRES-mediated translation initiation. Nat. Struct. Biol., 9, 370–374, 2002.
33. Chapon, C. Expression of malT, the regulator gene of the maltose region in Escherichia coli, is limited both at transcription and translation. EMBO J., 1, 369–374, 1982.
34. De Boer, H.A., Comstock, L.J., Hui, A., Wong, E., and Vasser, M. A hybrid promoter and portable Shine-Dalgarno regions of Escherichia coli. Biochem. Soc. Symp., 48, 233–244, 1983.
35. Rackham, O. and Chin, J.W. A network of orthogonal ribosome x mRNA pairs. Nat. Chem. Biol., 1, 159–166, 2005.
36. Zhou, W., Edelman, G.M., and Mauro, V.P. Isolation and identification of short nucleotide sequences that affect translation initiation in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA, 100, 4457–4462, 2003.
37. Owens, G.C., Chappell, S.A., Mauro, V.P., and Edelman, G.M. Identification of two short internal ribosome entry sites selected from libraries of random oligonucleotides. Proc. Natl. Acad. Sci. USA, 98, 1471–1476, 2001.

Regulating Gene Expression through Engineered RNA Technologies

8-29

38. Borman, A.M., Bailly, J.L., Girard, M., and Kean, K.M. Picornavirus internal ribosome entry segments: comparison of translation efficiency and the requirements for optimal internal initiation of translation in vitro. Nucleic Acids Res., 23, 3656–3663, 1995. 39. Nicholson, A.W. Function, mechanism and regulation of bacterial ribonucleases. FEMS Microbiol. Rev., 23, 371–390, 1999. 40. Lamontagne, B., Larose, S., Boulanger, J., and Elela, S.A. The RNase III family: a conserved structure and expanding functions in eukaryotic dsRNA metabolism. Curr. Issues Mol. Biol., 3, 71–78, 2001. 41. Carrier, T.A. and Keasling, J.D. Controlling messenger RNA stability in bacteria: strategies for engineering gene expression. Biotechnol. Prog., 13, 699–708, 1997. 42. Saunders, L.R. and Barber, G.N. The dsRNA binding protein family: critical roles, diverse cellular functions. FASEB J., 17, 961–983, 2003. 43. Rotondo, G. and Frendewey, D. Purification and characterization of the Pac1 ribonuclease of Schizosaccharomyces pombe. Nucleic Acids Res., 24, 2377–2386, 1996. 44. Zhang, K. and Nicholson, A.W. Regulation of ribonuclease III processing by double-helical sequence antideterminants. Proc. Natl. Acad. Sci. USA, 94, 13437–13441, 1997. 45. Nagel, R. and Ares, M., Jr. Substrate recognition by a eukaryotic RNase III: the double-stranded RNA-binding domain of Rnt1p selectively binds RNA containing a 5′-AGNN-3′ tetraloop. RNA, 6, 1142–1156, 2000. 46. Lamontagne, B. and Elela, S.A. Evaluation of the RNA determinants for bacterial and yeast RNase III binding and cleavage. J. Biol. Chem., 279, 2231–2241, 2004. 47. Ge, D., Lamontagne, B., and Elela, S.A. RNase III-mediated silencing of a glucose-dependent repressor in yeast. Curr. Biol., 15, 140–145, 2005. 48. Larose, S. et al. RNase III-dependent regulation of yeast telomerase. J. Biol. Chem., 282, 4373–4381, 2007. 49. Lundberg, U., von Gabain, A., and Melefors, O. 
Cleavages in the 5′ region of the ompA and bla mRNA control stability: studies with an E. coli mutant altering mRNA stability and a novel endoribonuclease. EMBO J., 9, 2731–2741, 1990. 50. Taraseviciene, L., Miczak, A., and Apirion, D. The gene specifying RNase E (rne) and a gene affecting mRNA stability (ams) are the same gene. Mol. Microbiol., 5, 851–855, 1991. 51. McDowall, K.J., Lin-Chao, S., and Cohen, S.N. A+U content rather than a particular nucleotide order determines the specificity of RNase E cleavage. J. Biol. Chem., 269, 10790–10796, 1994. 52. Zubiaga, A.M., Belasco, J.G., and Greenberg, M.E. The nonamer UUAUUUAUU is the key AU-rich sequence motif that mediates mRNA degradation. Mol. Cell. Biol., 15, 2219–2230, 1995. 53. Alifano, P., Bruni, C.B., and Carlomagno, M.S. Control of mRNA processing and decay in prokaryotes. Genetica, 94, 157–172, 1994. 54. Emory, S.A. and Belasco, J.G. The ompA 5′ untranslated RNA segment functions in Escherichia coli as a growth-rate-regulated mRNA stabilizer whose activity is unrelated to translational efficiency. J. Bacteriol., 172, 4472–4481, 1990. 55. Emory, S.A., Bouvet, P., and Belasco, J.G. A 5′-terminal stem-loop structure can stabilize mRNA in Escherichia coli. Genes Dev., 6, 135–148, 1992. 56. Hansen, M.J., Chen, L.H., Fejzo, M.L., and Belasco, J.G. The ompA 5′ untranslated region impedes a major pathway for mRNA degradation in Escherichia coli. Mol. Microbiol., 12, 707–716, 1994. 57. Carrier, T.A. and Keasling, J.D. Library of synthetic 5′ secondary structures to manipulate mRNA stability in Escherichia coli. Biotechnol. Prog., 15, 58–64, 1999. 58. Smolke, C.D., Carrier, T.A., and Keasling, J.D. Coordinated, differential expression of two genes through directed mRNA cleavage and stabilization by secondary structures. Appl. Environ. Microbiol., 66, 5399–5405, 2000. 59. Zuo, Y. and Deutscher, M.P. Exoribonuclease superfamilies: structural analysis and phylogenetic distribution. 
Nucleic Acids Res., 29, 1017–1026, 2001.

8-30

Gene Expression Tools for Metabolic Pathway Engineering

60. Smolke, C.D. and Keasling, J.D. Effect of gene location, mRNA secondary structures, and RNase sites on expression of two genes in an engineered operon. Biotechnol. Bioeng., 80, 762–776, 2002. 61. Zaug, A.J. and Cech, T.R. In vitro splicing of the ribosomal RNA precursor in nuclei of Tetrahymena. Cell, 19, 331–338, 1980. 62. Cech, T.R., Zaug, A.J., and Grabowski, P.J. In vitro splicing of the ribosomal RNA precursor of Tetrahymena: involvement of a guanosine nucleotide in the excision of the intervening sequence. Cell, 27, 487–496, 1981. 63. Kruger, K. et al. Self-splicing RNA: autoexcision and autocyclization of the ribosomal RNA intervening sequence of Tetrahymena. Cell, 31, 147–157, 1982. 64. Guerrier-Takada, C., Gardiner, K., Marsh, T., Pace, N., and Altman, S. The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme. Cell, 35, 849–857, 1983. 65. Symons, R.H. Small catalytic RNAs. Ann. Rev. Biochem., 61, 641–671, 1992. 66. Forster, A.C., Jeffries, A.C., Sheldon, C.C., and Symons, R.H. Structural and ionic requirements for self-cleavage of virusoid RNAs and trans self-cleavage of viroid RNA. Cold Spring Harb. Symp. Quant. Biol., 52, 249–259, 1987. 67. Sun, L.Q., Cairns, M.J., Saravolac, E.G., Baker, A., and Gerlach, W.L. Catalytic nucleic acids: from lab to applications. Pharmacol. Rev., 52, 325–347, 2000. 68. Tang, J. and Breaker, R.R. Structural diversity of self-cleaving ribozymes. Proc. Natl. Acad. Sci. USA, 97, 5784–5789, 2000. 69. Birikh, K.R., Heaton, P.A., and Eckstein, F. The structure, function and application of the hammerhead ribozyme. Eur. J. Biochem., 245, 1–16, 1997. 70. Jen, K.Y. and Gewirtz, A.M. Suppression of gene expression by targeted disruption of messenger RNA: available options and current strategies. Stem Cells, 18, 307–319, 2000. 71. Vaish, N.K., Kore, A.R., and Eckstein, F. Recent developments in the hammerhead ribozyme field. Nucleic Acids Res., 26, 5237–5242, 1998. 72. 
Khvorova, A., Lescoute, A., Westhof, E., and Jayasena, S.D. Sequence elements outside the hammerhead ribozyme catalytic core enable intracellular activity. Nat. Struct. Biol., 10, 708–712, 2003. 73. De la Pena, M., Gago, S., and Flores, R. Peripheral regions of natural hammerhead ribozymes greatly increase their self-cleavage activity. EMBO J., 22, 5561–5570, 2003. 74. Blount, K.F. and Uhlenbeck, O.C. The structure-function dilemma of the hammerhead ribozyme. Ann. Rev. Biophys. Biomol. Struct., 34, 415–440, 2005. 75. Eckstein, F., Kore, A.R., and Nakamaye, K.L. In vitro selection of hammerhead ribozyme sequence variants. Chembiochem, 2, 629–635, 2001. 76. Salehi-Ashtiani, K. and Szostak, J.W. In vitro evolution suggests multiple origins for the hammerhead ribozyme. Nature, 414, 82–84, 2001. 77. Ishizaka, M., Ohshima, Y., and Tani, T. Isolation of active ribozymes from an RNA pool of random sequences using an anchored substrate RNA. Biochem. Biophys. Res. Commun., 214, 403–409, 1995. 78. Conaty, J., Hendry, P., and Lockett, T. Selected classes of minimised hammerhead ribozyme have very high cleavage rates at low Mg2+ concentration. Nucleic Acids Res., 27, 2400–2407, 1999. 79. Persson, T., Hartmann, R.K., and Eckstein, F. Selection of hammerhead ribozyme variants with low Mg2+ requirement: importance of stem-loop II. Chembiochem, 3, 1066–1071, 2002. 80. Marschall, P., Thomson, J.B., and Eckstein, F. Inhibition of gene expression with ribozymes. Cell Mol. Neurobiol., 14, 523–538, 1994. 81. Fujita, S. et al. Discrimination of a single base change in a ribozyme using the gene for dihydrofolate reductase as a selective marker in Escherichia coli. Proc. Natl. Acad. Sci. USA, 94, 391–396, 1997. 82. Kawasaki, H. et al. Selection of the best target site for ribozyme-mediated cleavage within a fusion gene for adenovirus E1A-associated 300 kDa protein (p300) and luciferase. Nucleic Acids Res., 24, 3010–3016, 1996. 83. Sakamoto, N., Wu, C.H., and Wu, G.Y. 
Intracellular cleavage of hepatitis C virus RNA and inhibition of viral protein translation by hammerhead ribozymes. J. Clin. Invest., 98, 2720–2728, 1996.

Regulating Gene Expression through Engineered RNA Technologies

8-31

84. Gavin, D.K. and Gupta, K.C. Efficient hammerhead ribozymes targeted to the polycistronic Sendai virus P/C mRNA. Structure-function relationships. J. Biol. Chem., 272, 1461–1472, 1997. 85. Kijima, H. et al. Hammerhead ribozymes against gamma-glutamylcysteine synthetase mRNA down-regulate intracellular glutathione concentration of mouse islet cells. Biochem. Biophys. Res. Commun., 247, 697–703, 1998. 86. Kim, Y.K. et al. Repression of hepatitis B virus X gene expression by hammerhead ribozymes. Biochem. Biophys. Res. Commun., 257, 759–765, 1999. 87. Yen, L. et al. Exogenous control of mammalian gene expression through modulation of RNA selfcleavage. Nature, 431, 471–476, 2004. 88. Green, P.J., Pines, O., and Inouye, M. The role of antisense RNA in gene regulation. Ann. Rev. Biochem., 55, 569–597, 1986. 89. Crooke, S.T. Progress in antisense technology. Ann. Rev. Med., 55, 61–95, 2004. 90. Kurreck, J. Antisense technologies. Improvement through novel chemical modifications. Eur. J. Biochem., 270, 1628–1644, 2003. 91. Kawamoto, H., Morita, T., Shimizu, A., Inada, T., and Aiba, H. Implication of membrane localization of target mRNA in the action of a small RNA: mechanism of post-transcriptional regulation of glucose transporter in Escherichia coli. Genes Dev., 19, 328–338, 2005. 92. Vanderpool, C.K. and Gottesman, S. Noncoding RNAs at the membrane. Nat. Struct. Mol. Biol., 12, 285–286, 2005. 93. Bunch, T.A. and Goldstein, L.S. The conditional inhibition of gene expression in cultured Drosophila cells by antisense RNA. Nucleic Acids Res., 17, 9761–9782, 1989. 94. Blomberg, P., Nordstrom, K., and Wagner, E.G. Replication control of plasmid R1: RepA synthesis is regulated by CopA RNA through inhibition of leader peptide translation. EMBO J., 11, 2675–2683, 1992. 95. Bonoli, M., Graziola, M., Poggi, V., and Hochkoeppler, A. RNA complementary to the 5′ UTR of mRNA triggers effective silencing in Saccharomyces cerevisiae. Biochem. Biophys. Res. 
Commun., 339, 1224–1231, 2006. 96. Novick, R.P., Iordanescu, S., Projan, S.J., Kornblum, J., and Edelman, I. pT181 plasmid replication is regulated by a countertranscript-driven transcriptional attenuator. Cell, 59, 395–404 (1989). 97. Lease, R.A., Cusick, M.E., and Belfort, M. Riboregulation in Escherichia coli: DsrA RNA acts by RNA:RNA interactions at multiple loci. Proc. Natl. Acad. Sci. USA, 95, 12456–12461, 1998. 98. Majdalani, N., Cunning, C., Sledjeski, D., Elliott, T., and Gottesman, S. DsrA RNA regulates translation of RpoS message by an anti-antisense mechanism, independent of its action as an antisilencer of transcription. Proc. Natl. Acad. Sci. USA, 95, 12462–12467, 1998. 99. Lease, R.A. and Belfort, M. A trans-acting RNA as a control switch in Escherichia coli: DsrA modulates function by forming alternative structures. Proc. Natl. Acad. Sci. USA, 97, 9919–9924, 2000. 100. Sharp, P.A. RNA interference—2001. Genes Dev., 15, 485–490, 2001. 101. Napoli, C., Lemieux, C., and Jorgensen, R. Introduction of a chimeric chalcone synthase gene into Petunia results in reversible co-suppression of homologous genes in trans. Plant Cell., 2, 279–289, 1990. 102. van der Krol, A.R., Mur, L.A., Beld, M., Mol, J.N., and Stuitje, A.R. Flavonoid genes in petunia: addition of a limited number of gene copies may lead to a suppression of gene expression. Plant Cell., 2, 291–299, 1990. 103. Fire, A. et al. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature, 391, 806–811, 1998. 104. Bartel, D.P. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell, 116, 281–297, 2004. 105. Bernstein, E., Caudy, A.A., Hammond, S.M. and Hannon, G.J. Role for a bidentate ribonuclease in the initiation step of RNA interference. Nature, 409, 363–366, 2001. 106. Hammond, S.M., Bernstein, E., Beach, D., and Hannon, G.J. An RNA-directed nuclease mediates post-transcriptional gene silencing in Drosophila cells. Nature, 404, 293–296, 2000.

8-32

Gene Expression Tools for Metabolic Pathway Engineering

107. Hutvagner, G. and Zamore, P.D. A microRNA in a multiple-turnover RNAi enzyme complex. Science, 297, 2056–2060, 2002. 108. Elbashir, S.M., Lendeckel, W., and Tuschl, T. RNA interference is mediated by 21- and 22-nucleotide RNAs. Genes Dev., 15, 188–200, 2001. 109. Zamore, P.D., Tuschl, T., Sharp, P.A., and Bartel, D.P. RNAi: double-stranded RNA directs the ATPdependent cleavage of mRNA at 21 to 23 nucleotide intervals. Cell, 101, 25–33, 2000. 110. Wassenegger, M. and Pelissier, T. A model for RNA-mediated gene silencing in higher plants. Plant Mol. Biol., 37, 349–362, 1998. 111. Sijen, T. et al. On the role of RNA amplification in dsRNA-triggered gene silencing. Cell, 107, 465–476, 2001. 112. Waterhouse, P.M., Wang, M.B., and Lough, T. Gene silencing as an adaptive defence against viruses. Nature, 411, 834–842, 2001. 113. Lee, R.C., Feinbaum, R.L., and Ambros, V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell, 75, 843–854, 1993. 114. Reinhart, B.J. et al. The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature, 403, 901–906, 2000. 115. Mello, C.C. and Conte, D., Jr. Revealing the world of RNA interference. Nature, 431, 338–342, 2004. 116. Hutvagner, G. et al. A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science, 293, 834–838, 2001. 117. Ketting, R.F. et al. Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans. Genes Dev., 15, 2654–2659, 2001. 118. Grishok, A. et al. Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell, 106, 23–34, 2001. 119. Lee, Y., Jeon, K., Lee, J.T., Kim, S., and Kim, V.N. MicroRNA maturation: stepwise processing and subcellular localization. EMBO J., 21, 4663–4670, 2002. 120. Lee, Y. et al. 
The nuclear RNase III Drosha initiates microRNA processing. Nature, 425, 415–419, 2003. 121. Basyuk, E., Suavet, F., Doglio, A., Bordonne, R., and Bertrand, E. Human let-7 stem-loop precursors harbor features of RNase III cleavage products. Nucleic Acids Res., 31, 6593–6597, 2003. 122. Yi, R., Qin, Y., Macara, I.G., and Cullen, B.R. Exportin-5 mediates the nuclear export of premicroRNAs and short hairpin RNAs. Genes Dev., 17, 3011–3016, 2003. 123. Lund, E., Guttinger, S., Calado, A., Dahlberg, J.E., and Kutay, U. Nuclear export of microRNA precursors. Science, 303, 95–98, 2004. 124. Bohnsack, M.T., Czaplinski, K., and Gorlich, D. Exportin 5 is a RanGTP-dependent dsRNA-binding protein that mediates nuclear export of pre-miRNAs. RNA, 10, 185–191, 2004. 125. Mourelatos, Z. et al. miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs. Genes Dev., 16, 720–728, 2002. 126. Elbashir, S.M. et al. Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Nature, 411, 494–498, 2001. 127. Caplen, N.J., Parrish, S., Imani, F., Fire, A., and Morgan, R.A. Specific inhibition of gene expression by small double-stranded RNAs in invertebrate and vertebrate systems. Proc. Natl. Acad. Sci. USA, 98, 9742–9747, 2001. 128. Kim, D.H. et al. Synthetic dsRNA Dicer substrates enhance RNAi potency and efficacy. Nat. Biotechnol., 23, 222–226, 2005. 129. Miyagishi, M. and Taira, K. U6 promoter-driven siRNAs with four uridine 3′ overhangs efficiently suppress targeted gene expression in mammalian cells. Nat. Biotechnol, 20, 497–500, 2002. 130. Lee, N.S. et al. Expression of small interfering RNAs targeted against HIV-1 rev transcripts in human cells. Nat. Biotechnol., 20, 500–505, 2002. 131. Brummelkamp, T.R., Bernards, R., and Agami, R. A system for stable expression of short interfering RNAs in mammalian cells. Science, 296, 550–553, 2002.

Regulating Gene Expression through Engineered RNA Technologies

8-33

132. Paddison, P.J., Caudy, A.A., Bernstein, E., Hannon, G.J., and Conklin, D.S. Short hairpin RNAs (shRNAs) induce sequence-specific silencing in mammalian cells. Genes Dev., 16, 948–958, 2002. 133. Yu, J.Y., DeRuiter, S.L., and Turner, D.L. RNA interference by expression of short-interfering RNAs and hairpin RNAs in mammalian cells. Proc. Natl. Acad. Sci. USA, 99, 6047–6052, 2002. 134. Paul, C.P., Good, P.D., Winer, I., and Engelke, D.R. Effective expression of small interfering RNA in human cells. Nat. Biotechnol., 20, 505–508, 2002. 135. McManus, M.T., Petersen, C.P., Haines, B.B., Chen, J. and Sharp, P.A. Gene silencing using microRNA designed hairpins. RNA, 8, 842–850, 2002. 136. Zeng, Y., Wagner, E.J., and Cullen, B.R. Both natural and designed micro RNAs can inhibit the expression of cognate mRNAs when expressed in human cells. Mol. Cell., 9, 1327–1333, 2002. 137. Isaacs, F.J. et al. Engineered riboregulators enable post-transcriptional control of gene expression. Nat. Biotechnol., 22, 841–847, 2004. 138. Tuerk, C. and Gold, L. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science, 249, 505–510, 1990. 139. Ellington, A.D. and Szostak, J.W. In vitro selection of RNA molecules that bind specific ligands. Nature, 346, 818–822, 1990. 140. Rimmele, M. Nucleic acid aptamers as tools and drugs: recent developments. Chembiochem, 4, 963–971, 2003. 141. Cox, J.C. and Ellington, A.D. Automated selection of anti-protein aptamers. Bioorg. Med. Chem., 9, 2525–2531, 2001. 142. Cox, J.C. et al. Automated acquisition of aptamer sequences. Comb. Chem. High Throughput Screen, 5, 289–299, 2002. 143. Cox, J.C. et al. Automated selection of aptamers against protein targets translated in vitro: from gene to aptamer. Nucleic Acids Res., 30, e108, 2002. 144. Berezovski, M. et al. Nonequilibrium capillary electrophoresis of equilibrium mixtures: a universal tool for development of aptamers. J. Am. Chem. 
Soc., 127, 3165–3171, 2005. 145. Drabovich, A., Berezovski, M., and Krylov, S.N. Selection of smart aptamers by equilibrium capillary electrophoresis of equilibrium mixtures (ECEEM). J. Am. Chem. Soc., 127, 11224–11225, 2005. 146. Krylov, S.N. Nonequilibrium capillary electrophoresis of equilibrium mixtures (NECEEM): A novel method for biomolecular screening. J. Biomol. Screen, 11, 115–122, 2006. 147. Hermann, T. and Patel, D.J. Adaptive recognition by nucleic acid aptamers. Science, 287, 820–825, 2000. 148. Khvorova, A., Kwak, Y.G., Tamkun, M., Majerfeld, I., and Yarus, M. RNAs that bind and change the permeability of phospholipid membranes. Proc. Natl. Acad. Sci. USA, 96, 10649–10654, 1999. 149. Vlassov, A., Khvorova, A., and Yarus, M. Binding and disruption of phospholipid bilayers by supramolecular RNA complexes. Proc. Natl. Acad. Sci. USA, 98, 7706–7711, 2001. 150. Cheah, M.T., Wachter, A., Sudarsan, N., and Breaker, R.R. Control of alternative RNA splicing and gene expression by eukaryotic riboswitches. Nature, 447, 497–500, 2007. 151. Winkler, W.C., Nahvi, A., Sudarsan, N., Barrick, J.E., and Breaker, R.R. An mRNA structure that controls gene expression by binding S-adenosylmethionine. Nat. Struct. Biol., 10, 701–707, 2003. 152. Mandal, M. and Breaker, R.R. Adenine riboswitches and gene activation by disruption of a transcription terminator. Nat. Struct. Mol. Biol., 11, 29–35, 2004. 153. Winkler, W., Nahvi, A., and Breaker, R.R. Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression. Nature, 419, 952–956, 2002. 154. Winkler, W.C., Nahvi, A., Roth, A., Collins, J.A., and Breaker, R.R. Control of gene expression by a natural metabolite-responsive ribozyme. Nature, 428, 281–286, 2004. 155. Mironov, A.S. et al. Sensing small molecules by nascent RNA: a mechanism to control transcription in bacteria. Cell, 111, 747–756, 2002.

8-34

Gene Expression Tools for Metabolic Pathway Engineering

156. Welz, R. and Breaker, R.R. Ligand binding and gene control characteristics of tandem riboswitches in Bacillus anthracis. RNA, 13, 573–582, 2007. 157. Nahvi, A. et al. Genetic control by a metabolite binding mRNA. Chem. Biol., 9, 1043–1049, 2002. 158. Nahvi, A., Barrick, J.E., and Breaker, R.R. Coenzyme B12 riboswitches are widespread genetic control elements in prokaryotes. Nucleic Acids Res., 32, 143–150, 2004. 159. Winkler, W.C., Cohen-Chalamish, S. and Breaker, R.R. An mRNA structure that controls gene expression by binding FMN. Proc. Natl. Acad. Sci. USA, 99 15908–15913, 2002. 160. Nudler, E. and Mironov, A.S. The riboswitch control of bacterial metabolism. Trends Biochem. Sci., 29, 11–17, 2004. 161. Sudarsan, N., Wickiser, J.K., Nakamura, S., Ebert, M.S., and Breaker, R.R. An mRNA structure in bacteria that controls gene expression by binding lysine. Genes Dev., 17, 2688–2697, 2003. 162. Mandal, M. et al. A glycine-dependent riboswitch that uses cooperative binding to control gene expression. Science, 306, 275–279, 2004. 163. Mandal, M., Boese, B., Barrick, J.E., Winkler, W.C., and Breaker, R.R. Riboswitches control fundamental biochemical pathways in Bacillus subtilis and other bacteria. Cell, 113, 577–586, 2003. 164. Johansen, L.E., Nygaard, P., Lassen, C., Agerso, Y., and Saxild, H.H. Definition of a second Bacillus subtilis pur regulon comprising the pur and xpt-pbuX operons plus pbuG, nupG (yxjA), and pbuE (ydhL). J. Bacteriol., 185, 5200–5209, 2003. 165. Sudarsan, N. et al. Tandem riboswitch architectures exhibit complex gene control functions. Science, 314, 300–304, 2006. 166. Patel, D.J. et al. Structure, recognition and adaptive binding in RNA aptamer complexes. J. Mol. Biol., 272, 645–664, 1997. 167. Stripecke, R., Oliveira, C.C., McCarthy, J.E., and Hentze, M.W. Proteins binding to 5′ untranslated region sites: a general mechanism for translational regulation of mRNAs in human and yeast cells. Mol. Cell. Biol., 14, 5898–5909, 1994. 168. 
Paraskeva, E., Atzberger, A., and Hentze, M.W. A translational repression assay procedure (TRAP) for RNA-protein interactions in vivo. Proc. Natl. Acad. Sci. USA, 95, 951–956, 1998. 169. Pelletier, J. and Sonenberg, N. Insertion mutagenesis to increase secondary structure within the 5′ noncoding region of a eukaryotic mRNA reduces translational efficiency. Cell, 40, 515–526, 1985. 170. Werstuck, G. and Green, M.R. Controlling gene expression in living cells through small moleculeRNA interactions. Science, 282, 296–298, 1998. 171. Jenison, R.D., Gill, S.C., Pardi, A. and Polisky, B. High-resolution molecular discrimination by RNA. Science, 263, 1425–1429, 1994. 172. Wilson, C., Nix, J. and Szostak, J. Functional requirements for specific ligand recognition by a biotin-binding RNA pseudoknot. Biochemistry, 37, 14410–14419, 1998. 173. Berens, C., Thain, A. and Schroeder, R. A tetracycline-binding RNA aptamer. Bioorg. Med. Chem., 9, 2549–2556, 2001. 174. Harvey, I., Garneau, P. and Pelletier, J. Inhibition of translation by RNA-small molecule interactions. RNA, 8, 452–463, 2002. 175. Suess, B. et al. Conditional gene expression by controlling translation with tetracycline-binding aptamers. Nucleic Acids Res., 31, 1853–1858, 2003. 176. Hanson, S., Berthelot, K., Fink, B., McCarthy, J.E., and Suess, B. Tetracycline-aptamer-mediated translational regulation in yeast. Mol. Microbiol., 49, 1627–1637, 2003. 177. Topp, S. and Gallivan, J.P. Riboswitches in unexpected places—a synthetic riboswitch in a protein coding region. RNA, 14, 2498–503, 2008. 178. de Smit, M.H. and van Duin, J. Control of prokaryotic translational initiation by mRNA secondary structure. Prog. Nucleic Acid Res. Mol. Biol., 38, 1–35, 1990. 179. de Smit, M.H. and van Duin, J. Secondary structure of the ribosome binding site determines translational efficiency: a quantitative analysis. Proc. Natl. Acad. Sci. USA, 87, 7668–7672, 1990.

Regulating Gene Expression through Engineered RNA Technologies

8-35

180. de Smit, M.H. and van Duin, J. Control of translation by mRNA secondary structure in Escherichia coli. A quantitative analysis of literature data. J. Mol. Biol., 244, 144–150, 1994. 181. Desai, S.K. and Gallivan, J.P. Genetic screens and selections for small molecules based on a synthetic riboswitch that activates protein translation. J. Am. Chem. Soc., 126, 13247–13254, 2004. 182. An, C.I., Trinh, V.B., and Yokobayashi, Y. Artificial control of gene expression in mammalian cells by modulating RNA interference through aptamer-small molecule interaction. RNA, 12, 710–716, 2006. 183. Kim, D.S., Gusti, V., Pillai, S.G., and Gaur, R.K. An artificial riboswitch for controlling pre-mRNA splicing. RNA, 11, 1667–1677, 2005. 184. Soukup, G.A. and Breaker, R.R. Engineering precision RNA molecular switches. Proc. Natl. Acad. Sci. USA, 96, 3584–3589, 1999. 185. Tang, J. and Breaker, R.R. Rational design of allosteric ribozymes. Chem. Biol., 4, 453–459, 1997. 186. Araki, M., Okuno, Y., Hara, Y., and Sugiura, Y. Allosteric regulation of a ribozyme activity through ligand-induced conformational change. Nucleic Acids Res., 26, 3379–3384, 1998. 187. Soukup, G.A. and Breaker, R.R. Design of allosteric hammerhead ribozymes activated by ligandinduced structure stabilization. Structure, 7, 783–791, 1999. 188. Soukup, G.A., Emilsson, G.A., and Breaker, R.R. Altering molecular recognition of RNA aptamers by allosteric selection. J. Mol. Biol., 298, 623–632, 2000. 189. Kertsburg, A. and Soukup, G.A. A versatile communication module for controlling RNA folding and catalysis. Nucleic Acids Res., 30, 4599–4606, 2002. 190. Link, K.H. et al. Engineering high-speed allosteric hammerhead ribozymes. Biol. Chem., 388, 779–786, 2007. 191. Soukup, G.A. and Breaker, R.R. Nucleic acid molecular switches. Trends Biotechnol., 17, 469–476, 1999. 192. Wilson, D.S. and Szostak, J.W. In vitro selection of functional nucleic acids. Ann. Rev. Biochem., 68, 611–647, 1999. 193. 
Koizumi, M., Soukup, G.A., Kerr, J.N., and Breaker, R.R. Allosteric selection of ribozymes that respond to the second messengers cGMP and cAMP. Nat. Struct. Biol., 6, 1062–1071, 1999. 194. Piganeau, N., Thuillier, V., and Famulok, M. In vitro selection of allosteric ribozymes: theory and experimental validation. J. Mol. Biol., 312, 1177–1190, 2001. 195. Sassanfar, M. and Szostak, J.W. An RNA motif that binds ATP. Nature, 364, 550–553, 1993. 196. Burgstaller, P. and Famulok, M. Isolation of RNA apatmers for biological cofactors by in vitro selection. Angew. Chem. Int. Ed. Engl., 33, 1084–1087, 1994. 197. Win, M.N. and Smolke, C.D. From the cover: A modular and extensible RNA-based generegulatory platform for engineering cellular function. Proc. Natl. Acad. Sci. USA, 104, 14283– 14288, 2007. 198. Suess, B., Fink, B., Berens, C., Stentz, R., and Hillen, W. A theophylline responsive riboswitch based on helix slipping controls gene expression in vivo. Nucleic Acids Res., 32, 1610–1614, 2004. 199. Lynch, S.A., Desai, S.K., Sajja, H.K., and Gallivan, J.P. A high-throughput screen for synthetic riboswitches reveals mechanistic insights into their function. Chem. Biol., 14, 173–184, 2007. 200. Bayer, T.S. and Smolke, C.D. Programmable ligand-controlled riboregulators of eukaryotic gene expression. Nat. Biotechnol., 23, 337–343, 2005. 201. Eddy, C.K. et al. Segmental message stabilization as a mechanism for differential expression from the Zymomonas mobilis gap operon. J. Bacteriol., 173, 245–254, 1991. 202. Mejia, J.P. et al. Coordination of expression of Zymomonas mobilis glycolytic and fermentative enzymes: a simple hypothesis based on mRNA stability. J. Bacteriol., 174, 6438–6443, 1992. 203. Higgins, C.F. Stability and degradation of mRNA. Curr. Opin. Cell Biol., 3, 1013–1018, 1991. 204. Keasling, J.D. Gene-expression tools for the metabolic engineering of bacteria. Trends Biotechnol., 17, 452–460, 1999.

8-36

Gene Expression Tools for Metabolic Pathway Engineering

205. Smolke, C.D. and Keasling, J.D. Effect of copy number and mRNA processing and stabilization on transcript and protein levels from an engineered dual-gene operon. Biotechnol. Bioeng., 78, 412–424, 2002. 206. Smolke, C.D., Martin, V.J., and Keasling, J.D. Controlling the metabolic flux through the carotenoid pathway using directed mRNA processing and stabilization. Metab. Eng., 3, 313–321, 2001. 207. Pfleger, B.F., Pitera, D.J., Smolke, C.D., and Keasling, J.D. Combinatorial engineering of intergenic regions in operons tunes expression of multiple genes. Nat. Biotechnol., 24, 1027–1032, 2006. 208. Hamilton, A.J. and Baulcombe, D.C. A species of small antisense RNA in posttranscriptional gene silencing in plants. Science, 286, 950–952, 1999. 209. Smith, N.A. et al. Total silencing by intron-spliced hairpin RNAs. Nature, 407, 319–320, 2000. 210. Wesley, S.V. et al. Construct design for efficient, effective and high-throughput gene silencing in plants. Plant J., 27, 581–590, 2001. 211. Wang, M.B. and Waterhouse, P.M. Application of gene silencing in plants. Curr. Opin. Plant Biol., 5, 146–150, 2002. 212. Allen, R.S. et al. RNAi-mediated replacement of morphine with the nonnarcotic alkaloid reticuline in opium poppy. Nat. Biotechnol., 22, 1559–1566, 2004. 213. Geiger, A., Burgstaller, P., von der Eltz, H., Roeder, A., and Famulok, M. RNA aptamers that bind L-arginine with sub-micromolar dissociation constants and high enantioselectivity. Nucleic Acids Res., 24, 1029–1036, 1996. 214. Mannironi, C., Scerch, C., Fruscoloni, P., and Tocchini-Valentini, G.P. Molecular recognition of amino acids by RNA aptamers: the evolution into an L-tyrosine binder of a dopamine-binding RNA motif. RNA, 6, 520–527, 2000.

9 Tools Designed to Regulate Translational Efficiency

Claes Gustafsson, DNA2.0, Inc.

9.1 Introduction
9.2 Synthetic Genes
9.3 Synthetic Genes for Synthetic Biology
9.4 Codon Usage in Different Hosts
9.5 Improving Expression by Modifying the Host
9.6 Improving Expression by Modifying the Gene
9.7 Codon Optimization Using the CAI = 1 Algorithm
9.8 Codon Optimization by Probability Score
9.9 DNA Sequence Features to Eliminate, Add, or Modify during Design
    GC Content • Avoiding mRNA Sequence Motifs • Avoiding mRNA Secondary Structure • Downstream Region • Homology to Wild-Type Sequence • Codon Context
9.10 Incorporating Tools for Translational Control in Metabolic Engineering
Acknowledgments
References

9.1 Introduction

The genetic information encoded in an open reading frame (ORF) goes far beyond simply stating the order of amino acids in the protein. Information about temporal and spatial expression levels, DNA tertiary and quaternary structure, nucleosome positioning, and much more is intertwined with the protein-coding information. It is estimated that 40–60% of all human multiexon genes undergo alternative splicing, antisense transcription occurs in 10–20% of all genes, mRNA editing is common (at least in neural cells), cis-regulatory elements are abundant, and mRNA degradation signals have been identified in genes throughout the human genome. As we peel back the different layers of complex and integrated information present within the coding regions of DNA, we can make more informed decisions about how to design genes and genetic networks.

The design and use of synthetic genes offers a mechanism by which researchers can assume much greater control over protein expression and regulation in the cellular environment. Codon biases can be manipulated, peptide tags can be added, splice sites removed, and restriction sites placed as desired. The cost, speed, and fidelity of gene synthesis appear to be following a trajectory similar to that seen for synthetic oligonucleotides over the past two decades, making synthetic genes increasingly time- and cost-effective [1–4]. This trend allows the scientist and metabolic engineer to focus on designing the end product instead of spending time and resources obtaining the tools with which to do the work. Direct synthesis of genes is now the most efficient way to make functional genetic constructs and enables applications such as codon optimization [5], making RNAi-resistant genes [6], and protein engineering [7].

9.2 Synthetic Genes

The technology to produce synthetic genes predates both polymerase chain reaction (PCR) and cloning from cDNA libraries. The first synthetic gene, encoding the yeast alanine tRNA, was synthesized as early as 1970 [8]. The first gene encoding a translated product, a 14-amino-acid hormone, was made in 1977 by Genentech and their academic collaborators [9]. A few years later, the first full-length gene encoding a protein was synthesized: the 166-codon alpha interferon gene [10]. In 2002 came the first report of the synthesis of an entire genome, the 7.4 kb poliovirus genome [2]. Today, there is no technical upper limit to the length of DNA fragments that can be synthesized. It is common for DNA2.0 (www.dna20.com, a gene synthesis company based in Menlo Park, CA) to receive orders for DNA fragments of 20 kb and larger.

Several alternative methods for gene synthesis have been developed over the years, including gene synthesis on solid phase in several related formats [11–14], FokI-based synthesis [15], polymerase cycling assembly [16,17], ligase chain reaction [18], shotgun synthesis [19], synthesis by serial cloning [20], and insertional gene synthesis [21]. Each method has distinctive advantages and disadvantages. Some methods are better for GC-rich sequences, some for highly repetitive sequences. Some are faster and some have higher accuracy. There is no “best method”; instead, the methods complement each other, giving the user a plethora of gene synthesis tools to choose from. With the appearance of several commercial gene synthesis vendors, such as DNA2.0, most scientists can now quickly and routinely order any synthetic DNA fragment without having to weigh the alternative synthesis methods. With the many rapidly advancing gene synthesis technologies available, essentially any sequence can be synthesized, independent of sequence characteristics and requirements.

9.3 Synthetic Genes for Synthetic Biology

It is difficult to imagine conducting molecular biology research today without access to custom oligonucleotides. Similarly, it is becoming increasingly hard to envision creating artificial metabolic pathways and other synthetic biology applications without access to synthetic genes. Natural, nonsynthetic genes encode not only the protein sequence of interest but also a complete evolutionary history of the gene and of the organism carrying it [22]. This interpretive proteomics, as revealed by cis-regulatory nucleotide sequences, alternative splicing, codon bias, etc., can be very useful for analyzing ancient metabolic pathways, regulatory networks, and other features under various degrees of selective pressure. The wild-type DNA sequence, however, is not useful when the purpose of manipulating the gene is to express the protein in a new context, under non-native conditions, as defined by the experimenter. Furthermore, wild-type genes are rarely selected by evolution to express at high levels, even in their native organism. Instead, the expression levels of a wild-type gene are finely tuned, by transcriptional and translational features as well as by regulatory circuits, to express the gene product at the evolutionarily favored levels and under the evolutionarily favored conditions. The construction of novel genetic pathways, regulatory circuits, and designed organisms requires biological building blocks that contain solely the information content needed for the relevant application, and not any spurious information content that may or may not be known, and may or may not interfere with the designed application. Just as it is crucial for the software engineer to use only clean blocks of programming code in which each block encapsulates one function, it is important for the genetic engineer to use clean blocks of DNA sequence, so that each function introduced into the system is a known entity and encodes only the function required.


9.4 Codon Usage in Different Hosts

The genetic code uses 64 nucleotide triplets (codons) to encode 20 amino acids and stop. Each amino acid is encoded by three codons on average, which are “read” during translation by tRNAs charged with the cognate amino acid. The degeneracy of the genetic code enables many alternative nucleotide sequences to encode the same protein. The frequencies with which different codons are used vary significantly between organisms and between types of genes [23] and are correlated with the concentration of the corresponding tRNA population in the cell [24]. Rare codons are not only strongly associated with low levels of protein expression due to ribosome stalling and abortive translation [25], but are also implicated in frameshifting and amino acid misincorporation [26,27]. Codon usage has been identified as the single most important factor in increasing prokaryotic gene expression [28].

Principal component analysis (PCA) can be used to compress high-dimensional information into lower-dimensional maps. PCA can be applied to compress large codon usage tables for a multitude of organisms into a two-dimensional graph that captures the majority of the codon distribution information. This type of PCA plot provides a convenient way to visualize differences in codon preferences between organisms. Figure 9.1 shows the codon preferences of ten common hosts for heterologous protein expression. The 62-dimensional information is derived from the codon usage for 62 codons (the Trp and Met codons are excluded because each of these amino acids is encoded by a single codon, i.e., the distribution bias is one). The multidimensional data are projected onto a new coordinate system so that the maximum variance is captured by the first coordinate. In this example, 42% of all variance is captured by the first coordinate (Principal Component 1) and 27% by the second coordinate (Principal Component 2). Accordingly, 69% of all variance present in the 62-dimensional dataset can be visualized in only two dimensions.

The codon usage of human, mouse, and CHO cells maps very closely in the lower right quadrant. This is consistent with all three organisms being mammals of close evolutionary origin. Codons in the same lower right quadrant, such as GAG and CCC, are preferentially favored by the mammals, whereas codons in the opposite (upper left) quadrant, such as GTA and TTA, are preferentially avoided compared to the other hosts in the graph. Figure 9.1 also makes immediately obvious the considerable divergence between E. coli and human codon preferences. This confirms what many researchers have learned through trial and error: E. coli is not the optimal host for expressing proteins encoded with a human codon usage profile.

The “Coli 2” table is derived from a subset of highly expressed E. coli genes [29]. This codon table, often also called the “Coli class II table”, is the most extreme codon table present in Figure 9.1, as is evident from its location at the far top right of the graph. This result is consistent with the finding that codon bias is usually increased in highly expressed genes [23,24,30]. This may reflect a direct correlation between expression levels and codon bias or, more likely, an indirect correlation, as both expression levels and codon bias depend on other evolutionary constraints [31]. Codons in the lower left quadrant are significantly underrepresented, and upper right quadrant codons overrepresented, in the “Coli 2” table. From Figure 9.1 it can be deduced that using the “Coli 2” codon usage table is not advisable if the designed gene is to be expressed in any host other than E. coli. Pichia pastoris, located orthogonally to “Coli 2” in the figure, would be the worst host of the organisms depicted here for expressing a “Coli 2” gene.
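The projection behind a plot like Figure 9.1 can be illustrated with a minimal sketch. It finds only the first principal component, using power iteration on the covariance matrix, and reports the fraction of variance that component captures. The function name, the four-column table, and its values are hypothetical and are not the data behind the figure; a real analysis would use all 62 codon columns and a numerical library.

```python
import math
import random


def first_principal_component(rows, iterations=500, seed=0):
    """Return (unit direction, fraction of total variance) for the first PC.

    rows: list of equal-length numeric lists, one row per organism.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d) of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    total = sum(cov[i][i] for i in range(d))
    if total == 0.0:
        raise ValueError("data have no variance")
    # Random start vector: a uniform start can be exactly orthogonal to the
    # leading eigenvector when codon fractions within an amino acid sum to 1.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue / total


# Hypothetical codon fractions (columns: GAA, GAG, AAA, AAG) for four made-up hosts.
table = [
    [0.69, 0.31, 0.76, 0.24],
    [0.42, 0.58, 0.43, 0.57],
    [0.41, 0.59, 0.39, 0.61],
    [0.70, 0.30, 0.58, 0.42],
]
direction, fraction = first_principal_component(table)
print(f"PC 1 captures {fraction:.0%} of the variance")
```

Each organism's coordinate on PC 1 is the dot product of its centered row with `direction`. A second component could be obtained by removing the first from the covariance matrix and repeating the iteration.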
[Figure 9.1: PCA plot of codon usage space. Horizontal axis: PC 1 (42%); vertical axis: PC 2 (27%). Plotted tables: Coli 2, Coli, B. subtilis, Yeast, Pichia, Sf9, Sf9/Human, Human, CHO, Mouse; individual codons are also labeled.]

Figure 9.1 Graphical representation of “codon usage space”. The codon bias for each of the 64 codons in ten common codon usage tables was compressed into two dimensions using PCA, where each codon is a vector. Two of the codons, ATG and TGG, are omitted since the bias is 1.0 (only one option) for all organisms. Principal Component 1 (PC 1) captures 42% of all information and Principal Component 2 (PC 2) captures 27%, i.e., 69% of all codon distribution information from the ten organisms × 62 codons can be visually inspected in the graph. “Yeast” denotes the codon distribution for all ORFs in Saccharomyces cerevisiae, “Pichia” for all sequenced ORFs in Pichia pastoris, “B. subtilis” for all ORFs in Bacillus subtilis, “Coli” for all ORFs in Escherichia coli, “CHO” for all sequenced ORFs in Cricetulus griseus (from which CHO cells are derived), “Human” for all ORFs in Homo sapiens, “Mouse” for all ORFs in Mus musculus, and “Sf9” for all sequenced ORFs in Spodoptera frugiperda. “Coli 2” denotes the codon distribution for highly expressed E. coli genes (see text), and “Sf9/Human” denotes the codon distribution in a combined Sf9/Human table (see text). The genome codon usage distributions are derived from the codon usage database (http://www.kazusa.or.jp/codon).

Combined codon usage tables are useful in cases where a gene is initially screened for expression of a protein with a defined activity in a high-throughput system such as E. coli and later transferred to a more rigorous low-throughput system such as CHO cells for biochemical characterization. The Sf9/Human table is an example of a combined codon usage table. For this table, the codon usages of human and Sf9 are averaged, and any codon that is below 10% in either organism is set below the threshold in the combined table. The Sf9/Human codon table is a reasonable compromise for the design of genes that are intended to be expressed in both baculovirus and mammalian expression systems. For some host pairs, such as Arabidopsis thaliana and Streptomyces coelicolor, combined codon usage tables are impossible to create without severely adjusting the threshold levels. The only Arg codons above the 10% threshold in S. coelicolor are CGG and CGC, whereas in Arabidopsis thaliana both of those codons are below the 10% threshold (9 and 7%, respectively).
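The averaging-plus-threshold rule for combined tables can be sketched as follows. The function name and the renormalization of the surviving codons are assumptions made for illustration. The Arg fractions are invented, except that CGG and CGC are set to 9% and 7% in the second table to mirror the Arabidopsis values quoted above.

```python
def combine_codon_tables(table_a, table_b, threshold=0.10):
    """Average two codon-fraction tables, dropping codons rare in either host.

    Tables map amino acid -> {codon: fraction}. Surviving codons are
    renormalized to sum to 1 (an assumption of this sketch).
    """
    combined = {}
    for aa, codons_a in table_a.items():
        kept = {}
        for codon, frac_a in codons_a.items():
            frac_b = table_b[aa][codon]
            # A codon must clear the threshold in BOTH hosts to be usable.
            if frac_a >= threshold and frac_b >= threshold:
                kept[codon] = (frac_a + frac_b) / 2.0
        total = sum(kept.values())
        combined[aa] = {c: f / total for c, f in kept.items()} if total else {}
    return combined


# Hypothetical Arg usage for a high-GC host (A) and a plant-like host (B).
host_a = {"R": {"CGC": 0.46, "CGG": 0.40, "CGT": 0.06,
                "CGA": 0.03, "AGA": 0.02, "AGG": 0.03}}
host_b = {"R": {"CGC": 0.07, "CGG": 0.09, "CGT": 0.17,
                "CGA": 0.12, "AGA": 0.35, "AGG": 0.20}}
print(combine_codon_tables(host_a, host_b))
```

With these values no Arg codon clears 10% in both hosts, so the combined entry for Arg comes out empty. This is the situation in which a combined table cannot be built without relaxing the threshold.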

9.5 Improving Expression by Modifying the Host

If the negative effect of differing codon usage biases on heterologous gene expression results from differences in tRNA levels, one simple solution appears to be to expand the intracellular tRNA pool of the host. This can be accomplished by overexpressing genes encoding the rare tRNAs. For E. coli, the primary targets to facilitate expression of human genes are the argU gene encoding the minor tRNA(Arg4) that reads the rare AGG and AGA codons, tRNA(Ile2) that reads AUA, tRNA(Leu3) that reads CUA and CUG, and tRNA(Pro2) that reads CCC and CCU [27]. E. coli strains overexpressing these tRNA genes are commercially available from companies such as Stratagene and Novagen. Several laboratories have shown that expression yields of proteins whose genes contain rare codons can be improved when the level of the cognate tRNA is increased within the host [27].

Even though tRNA overexpression initially appears to be an attractive solution, there are several caveats. Different tRNAs need to be overexpressed for genes originating from different organisms, and the strategy is less appealing for hosts that are more difficult to manipulate than E. coli. There are also general pleiotropic effects of changing the concentrations of charged tRNA in the cell [32]. Perhaps the most important concern with expanding the intracellular tRNA pool is how increasing the tRNA concentration will affect aminoacylation and tRNA modification, and thus whether the composition of the overexpressed protein will be consistent.

Transfer RNA molecules are extensively processed before aminoacylation and participation in the translational process. More than 30 modified nucleotides have been found in E. coli tRNAs; some are present at the same position in all tRNAs, whereas others are found in one or a few tRNAs [33]. Many of the tRNA modifications scattered throughout the tRNA molecule, and especially those located in the anticodon loop, have been shown to improve reading frame maintenance [34]; one purpose of these modifications is to reduce the level of translational frameshifting. The lack of some tRNA modifications has been experimentally linked to missense and nonsense errors during translation [33]; for example, tRNAs lacking methylation at the N-1 position of guanosine (m1G) at position 37 cause translational frameshifting [35]. Thus, one problem with the tRNA overexpression strategy is that producing a fully functional tRNA requires other cellular components that might be in limiting supply when the tRNA alone is overproduced.

When tRNA(Leu1) is overexpressed in E. coli, the tRNA is significantly undermodified in at least two ways: m1G at position 37 and pseudouridine (Ψ) at position 32. Only 40% of the tRNA(Leu1) molecules are aminoacylated, the strain grows very slowly, and the ribosomal step time is reduced two- to three-fold [36]. Similarly, overexpression of tRNA(Tyr) results in at least a decrease of the 2-methylthio-N6-isopentenyladenosine (ms2i6A) modification at position 37 and a tRNA that is less efficient in vitro [37]. Loss of the ms2i6A modification following tRNA(Phe) overexpression led to decreased fidelity of translation [38,39]. Translational missense substitution frequencies can increase by more than an order of magnitude as a function of underacetylated tRNA. One particular concern over such loss of fidelity is the possibility that the resulting heterogeneous mixture of proteins might induce an immune response if introduced into vertebrates [40,41].

In addition to translational fidelity and host metabolic load issues, the tRNA overexpression strategy is not very flexible. It is much more difficult to engineer fungal or mammalian host cells than E. coli. In eukaryotic cells, tRNA expression is primarily driven by copy number, not promoter strength, further complicating the issue. For some applications, such as the emerging field of DNA vaccines, host engineering is out of the question.

9.6 Improving Expression by Modifying the Gene

Given the constraints of modifying the host, a more practical way of controlling the functional characteristics of the engineered biological system is to modify the gene itself. With access to synthetic genes, the exact nucleotide sequence of the synthetic construct is limited only by the user's gene design creativity. Any sequence designed in silico can now be quickly and efficiently converted into a physical gene, operon, or genome. But how does one design a synthetic gene?

When Genentech scientists and their academic collaborators produced the first human protein (somatostatin) in a bacterium in 1977 [9], expression of proteins in heterologous hosts played a crucial role in the launch of the entire biotechnology industry. At the time, only the amino acid sequence of somatostatin was known, so the Genentech group synthesized the 14-codon somatostatin gene from oligonucleotides instead of cloning it from the human genome. Itakura and coworkers designed these oligonucleotides based on three criteria. First, codons favored by the phage MS2 were used preferentially; very little of the E. coli genome sequence was known at the time, but the MS2 phage had just been sequenced and was assumed to provide a good guide to the codons used in highly expressed E. coli genes. Second, care was taken to eliminate undesirable inter- and intramolecular pairing of the overlapping oligonucleotides, both because such pairing would compromise the gene synthesis process and because eliminating it also avoids mRNA repeats or inverted repeats, i.e., secondary structures. Third, sequences that were GC-rich followed by AT-rich were avoided, as it was believed that such sequences could terminate transcription. The result was the first production of a functional polypeptide from a synthetic gene.

Even for the tiny somatostatin ORF, the number of alternative nucleotide sequences available for synthesis is a staggering five million, all coding for the same 14-amino-acid somatostatin. Which of the five million solutions is “best”?
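How quickly the design space grows can be seen by multiplying the number of synonymous codons at each position. The sketch below uses the degeneracies of the standard genetic code. The function name and the four-residue example peptide are hypothetical, and whether start and stop codons are counted is left to the caller.

```python
# Number of synonymous codons per amino acid in the standard genetic code
# ("*" stands for stop). The values sum to 64.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4, "*": 3,
}


def count_synonymous_sequences(protein):
    """Return how many distinct nucleotide sequences encode `protein`."""
    count = 1
    for aa in protein:
        count *= DEGENERACY[aa]
    return count


# Hypothetical peptide Met-Lys-Trp-Leu: 1 * 2 * 1 * 6 alternatives.
print(count_synonymous_sequences("MKWL"))  # → 12
```

Because the count is a product, every additional residue with two or more codons multiplies the number of candidate genes, which is why exhaustive evaluation is impractical for proteins of ordinary length.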

9.7 Codon Optimization Using the CAI = 1 Algorithm

An often-used way to design a DNA sequence from an amino acid sequence is to assign the most abundant codon to all instances of each amino acid in the sequence. Codon usage preference in a gene can be measured by the Codon Adaptation Index (CAI score) [42]. The CAI score for such a construct is 1.0, i.e., in each case only the most abundant codon is used. This “one amino acid, one codon” or “CAI = 1.0” approach is very simple and straightforward, but unfortunately has several drawbacks. Strongly transcribed mRNA from a CAI = 1.0 gene will generate high codon concentrations for a subset of the tRNA populations, resulting in an imbalanced tRNA pool, a skewed codon usage pattern, and increased translational error [40,41]. Heterologously expressed proteins may be produced at levels as high as 80% of total cell mass, making an imbalanced tRNA pool a significant problem with severe pleiotropic effects [43,44]. Furthermore, protein expression levels are not necessarily directly correlated with the CAI score. Silent mutations in CAI = 1 genes have been shown to lead to increased protein expression yield [45].

A CAI = 1.0 design also offers no flexibility in adjusting the nucleotide sequence. It is impossible to avoid repetitive elements and mRNA secondary structures in the gene if the codons are fixed to only one option. It is often desirable to incorporate or exclude sequence elements such as restriction sites to facilitate subsequent manipulations. These modifications are impossible to accommodate if the codon usage is rigidly fixed. Other sequence features, such as cis-regulatory signals, internal ribosome binding sites, splice sites, and codon context effects, cannot be avoided if the nucleotide sequence is defined solely by the CAI = 1 principle.
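As a minimal sketch of the score itself, the CAI is commonly computed as the geometric mean of each codon's relative adaptiveness, that is, its frequency divided by the frequency of the most abundant synonymous codon. The usage fractions below are invented for illustration, and the function names are hypothetical.

```python
import math

# Hypothetical codon fractions for three amino acids (illustrative values only).
USAGE = {
    "K": {"AAA": 0.75, "AAG": 0.25},
    "E": {"GAA": 0.70, "GAG": 0.30},
    "L": {"CTG": 0.50, "TTA": 0.125, "TTG": 0.125,
          "CTT": 0.10, "CTC": 0.10, "CTA": 0.05},
}


def relative_adaptiveness(usage):
    """Map each codon to its fraction divided by the best synonymous fraction."""
    weights = {}
    for codons in usage.values():
        best = max(codons.values())
        for codon, fraction in codons.items():
            weights[codon] = fraction / best
    return weights


def cai(dna, usage):
    """Geometric mean of relative adaptiveness over the codons of `dna`.

    Codons absent from the table (e.g. ATG, TGG) are skipped.
    """
    weights = relative_adaptiveness(usage)
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


def most_abundant_design(protein, usage):
    """The 'one amino acid, one codon' design: always take the top codon."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in protein)


gene = most_abundant_design("KEL", USAGE)
print(gene, cai(gene, USAGE))
```

Swapping a single codon for a less abundant synonym lowers the score below 1.0. This is the flexibility that a CAI = 1.0 design gives up.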

9.8 Codon Optimization by Probability Score

In contrast to the “CAI = 1.0” method, an alternative codon optimization algorithm matches the codon distribution profile to a predetermined probability score table [46]. Each codon is given a probability score based on its abundance in, e.g., a set of highly expressed genes, or alternatively in the genome of the host organism of interest [5]. For E. coli expression, it is common practice to use the “Coli 2” table, which is derived from a collection of highly expressed E. coli genes [29]. Typically, any codon below a given threshold value of, say, 10% is not used, and the remaining codons are used according to their defined codon usage probability. The threshold level directly determines which codons are omitted from the design of the synthetic gene. At the lowest level (10%), any codon whose frequency in the codon bias table is below 10% for a given amino acid cannot be used in the design. The threshold is strongly correlated with the CAI score: the higher the threshold, the higher the CAI score. Designing genes by probability score ensures that the “fingerprint” of the synthetic gene looks like that of the host organism (or of the genes that were used to build the codon usage table), and that the synthetic gene will use the tRNA pool of the host at the same relative frequency as genes from the host organism.

The probability score method further ensures that there is no one sequence that is “the best”; instead, many sequences fulfill the constraints of the algorithm. Candidate sequences can be enumerated in silico using a Monte Carlo algorithm. The algorithm selects codons based on the probabilities obtained from the codon usage table and combines the codons into a proposed optimized sequence. Depending on design constraints, computing power, and the length of the protein sequence, millions of in silico sequences can quickly be generated, each equally good. Each designed sequence can then be passed through subsequent filters to ensure a match with any additional design criteria. The in silico filters can include filtering out mRNA secondary structures and DNA repeats, eliminating or incorporating restriction sites or other sequence motifs, avoiding methylation sites that overlap methylation-sensitive restriction sites, etc. [5]. Software for the design of expression-optimized synthetic gene constructs includes the commercial Gene Composer from Emerald BioSystems [47] and the free alternative Gene Designer from DNA2.0 [48], as well as software referenced therein.
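The threshold-then-sample procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the algorithm of any particular software package. The usage fractions are invented, the function names are hypothetical, and the only filter applied is removal of candidates containing an EcoRI site (GAATTC).

```python
import random

# Hypothetical codon fractions (illustrative values only).
USAGE = {
    "K": {"AAA": 0.75, "AAG": 0.25},
    "E": {"GAA": 0.70, "GAG": 0.30},
    "F": {"TTT": 0.55, "TTC": 0.45},
    "L": {"CTG": 0.50, "TTA": 0.14, "TTG": 0.13,
          "CTT": 0.11, "CTC": 0.08, "CTA": 0.04},
}


def apply_threshold(usage, threshold=0.10):
    """Drop codons below the threshold and renormalize the remainder."""
    table = {}
    for aa, codons in usage.items():
        kept = {c: f for c, f in codons.items() if f >= threshold}
        total = sum(kept.values())
        table[aa] = {c: f / total for c, f in kept.items()}
    return table


def sample_gene(protein, table, rng):
    """Pick one codon per residue with probability equal to its table fraction."""
    return "".join(
        rng.choices(list(table[aa]), weights=list(table[aa].values()))[0]
        for aa in protein
    )


def design_candidates(protein, usage, n=1000, threshold=0.10,
                      forbidden=("GAATTC",), seed=0):
    """Monte Carlo sampling followed by a simple forbidden-motif filter."""
    rng = random.Random(seed)
    table = apply_threshold(usage, threshold)
    sampled = {sample_gene(protein, table, rng) for _ in range(n)}
    return sorted(s for s in sampled if not any(m in s for m in forbidden))


candidates = design_candidates("KEFLKEL", USAGE, n=200)
print(len(candidates), "candidates pass the filter")
```

In this example the Glu-Phe pair can be encoded as GAA TTC, which creates the forbidden site, so the filter removes those candidates while leaving many equivalent alternatives. Further filters of the kind listed above would be applied to the surviving list in the same way.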

9.9 DNA Sequence Features to Eliminate, Add, or Modify during Design

Given the vast number of possible nucleotide solutions for even stringently constrained short proteins, the possibilities for modifying a synthetic gene without altering the protein sequence are endless. A convenient way to approach the design is to generate thousands of in silico sequences and sort them based on various criteria.

9.9.1 GC Content

It is often desirable to constrain both the local and global GC content of a synthetic gene to fall relatively close to the GC content of the host organism. By generating a multitude of alternative solutions in silico based on the same constraints, the sequences can be sorted by GC content and the sequence closest to the preferred GC percentage can be synthesized. However, to a large degree the GC content will be determined by the codon usage table applied, in combination with the abundance of certain amino acids. A glycine-rich gene (codons GGC, GGG, GGA, and GGT in order of abundance in the mammalian codon usage table) optimized for expression in mammalian cells, for example, will be relatively GC rich, no matter how much the design is focused on avoiding high GC content. Runs of C and/or G in an ORF have been implicated in ribosomal slippage and frameshifting during translation [49,50]. We generally filter out any runs of C and/or G longer than six nucleotides to keep the GC content reasonably uniform along the translated sequence. In mammalian systems, both correlation analysis [51] and experimental studies [52] suggest a weak but consistent correlation between GC in the 3′ wobble position of translated codons and increased expression levels. This is in direct contradiction to other studies showing no correlation between codon usage and expression levels in mammalian systems [53].
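Sorting candidates by GC content and rejecting long G/C runs can be sketched as below. The function names, the three candidate sequences (all encoding the hypothetical peptide Met-Ala-Gly-Ile), and the 55% target are illustrative assumptions.

```python
import re


def gc_fraction(seq):
    """Fraction of bases in `seq` that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_gc_run(seq):
    """Length of the longest uninterrupted stretch of G and/or C."""
    return max((len(run) for run in re.findall(r"[GC]+", seq)), default=0)


def pick_by_gc(candidates, target_gc, max_run=6):
    """Return the candidate closest to `target_gc` whose G/C runs are all
    no longer than `max_run`, or None if every candidate is rejected."""
    allowed = [s for s in candidates if longest_gc_run(s) <= max_run]
    if not allowed:
        return None
    return min(allowed, key=lambda s: abs(gc_fraction(s) - target_gc))


# Three synonymous encodings of Met-Ala-Gly-Ile (hypothetical example).
candidates = ["ATGGCCGGCATT", "ATGGCAGGAATT", "ATGGCGGGAATT"]
print(pick_by_gc(candidates, target_gc=0.55))  # → ATGGCGGGAATT
```

The first candidate is closest to the target by GC content alone, but it contains a seven-nucleotide G/C run and is rejected, so the choice falls to the third.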

9.9.2 Avoiding mRNA Sequence Motifs

Intragenic motifs such as cryptic Shine–Dalgarno sequences, splice site motifs, IRE elements, and internal Kozak sequences can all be screened for and removed from the synthetic gene design. For some motifs, such as internal Shine–Dalgarno sequences, it is clear that avoiding the motif in the synthetic gene design will increase the level of gene expression in E. coli [54]. For many other motifs, however, data on the effect the motif may have on expression are limited. Furthermore, the functional effect of a motif will vary between host systems; e.g., the presence of a splice site motif in a synthetic gene will not have any effect on protein expression in E. coli.

Thousands of short regulatory mRNA motifs have been identified, and more are identified every day. Many of these sequences are short and have promiscuous consensus sequences, guaranteeing that any designed synthetic gene will encode one or more of them. For those who cannot resist the urge to analyze their designed synthetic gene sequences against all possible eukaryotic mRNA sequence motifs, we recommend the online resource “Regulatory RNA Motifs and Elements Finder” (RegRNA database) [55]. The results derived from searches like these should, however, be treated with caution; the motifs are often not experimentally validated, are often unique to a particular organism or family of organisms, or are part of a multicomponent network.

For genes to be expressed in the standard laboratory workhorse E. coli, we typically avoid motifs such as internal Shine–Dalgarno sequences in all three reading frames and runs of five or more of a single nucleotide. Depending on the application, it may also be relevant to specifically avoid endoribonuclease target sequences for RNase III [56] and the less well-defined RNase E [57]. For genes to be expressed in other hosts, other host-specific motifs have to be scanned for and avoided during synthesis.
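A motif screen of this kind reduces to pattern matching over the candidate sequence. The sketch below checks two of the features mentioned for E. coli: an internal Shine–Dalgarno-like core and homopolymer runs of five or more. The patterns are illustrative assumptions, with AGGAGG standing in for the Shine–Dalgarno consensus, and do not form a validated motif set. Scanning the raw string, rather than codon by codon, covers all three reading frames.

```python
import re

# Illustrative patterns only; a production screen would use a curated set.
MOTIFS = {
    "Shine-Dalgarno-like": re.compile(r"AGGAGG"),
    "homopolymer run (>=5)": re.compile(r"A{5,}|C{5,}|G{5,}|T{5,}"),
}


def scan_motifs(seq):
    """Return (motif name, start index, matched text) tuples sorted by position."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start(), match.group()))
    return sorted(hits, key=lambda hit: hit[1])


for hit in scan_motifs("ATGAAGGAGGTTTTTTGA"):
    print(hit)
```

A candidate with any hits can be discarded or re-sampled, and host-specific patterns can be added to the dictionary as needed.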

9.9.3 Avoiding mRNA Secondary Structure

Much controversy has been generated in the literature over the inhibitory effects of RNA secondary structure elements on translational processivity. It is generally assumed that mRNA secondary structures should be avoided whenever possible. However, a significant stem-loop structure in the coding part of the mRNA does not seem to hinder the progress of the translational machinery. It has been shown that actively translating ribosomes can break up such structures, either through the energy-driven translation process itself or with the support of RNA helicases [58,59]. It has even been suggested that it could be desirable to incorporate mRNA secondary structures into synthetic genes. Analysis of coding regions in bacteria shows that RNA secondary structures are overrepresented within ORFs, and they have been suggested to have important functions in RNA processing, regulation of mRNA stability, and translational control [60].

There are numerous publicly available software programs that predict RNA secondary structures, of which mFold [61] is probably the most popular. These programs are designed to calculate secondary structures for naked RNA and work reasonably well for short nontranslated RNAs such as tRNA. The translated mRNA within an ORF, however, is densely covered by ribosomes, which protect the mRNA from long-distance cis or trans annealing. Chemical footprinting of mRNA-ribosome complexes shows that up to 20 codons (60 nucleotides) are covered by a single translating ribosome [62], and the ribosomes translate at ~18 codons (54 nucleotides) per second with one ribosome every 2 seconds [63], leaving only ~50 nucleotides available between translating ribosomes for folding into an mRNA secondary structure. It is often recommended to avoid mRNA secondary structures, even though the experimental data supporting that recommendation are weak. In any case, any mRNA secondary structure requiring components outside of a 50-nucleotide sliding window likely exists only in the virtual world, and not in actively translated mRNA.
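The ~50-nucleotide figure follows from simple arithmetic on the numbers quoted above. The sketch assumes evenly spaced ribosomes and reads the 2-second interval as the time between consecutive ribosomes passing a given point; the function name is hypothetical.

```python
def free_nucleotides_between_ribosomes(codons_per_second=18,
                                       seconds_between_ribosomes=2,
                                       footprint_codons=20):
    """Nucleotides left uncovered between two consecutive translating ribosomes."""
    # Distance travelled by one ribosome before the next one follows (in nt).
    spacing_nt = codons_per_second * seconds_between_ribosomes * 3
    # Subtract the stretch of mRNA physically covered by one ribosome (in nt).
    return spacing_nt - footprint_codons * 3


print(free_nucleotides_between_ribosomes())  # → 48
```

With the default values the spacing is 108 nucleotides and the footprint 60, leaving 48 free nucleotides, consistent with the approximate window of 50 used in the text.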

9.9.4 Downstream Region

The region stretching from the initiation fMet-ATG codon to approximately codon 10 has long been implicated in controlling protein expression levels. The existence and importance of the downstream region have been experimentally verified for many bacterial genes, and the region seems to be a generic feature of bacterial translation [64–70]. Despite the widespread acceptance of the downstream region, the mechanistic explanation for its importance is unclear. Several studies point to the unique codon usage in the downstream box [71–73], whereas other experiments suggest that the regulation of translational initiation/efficiency by the region is due to secondary structure formation [74,75], peptidyl-tRNA drop-off [76,77], avoidance of NGG codons [78,79], or base pairing to 16S rRNA [64,66]. Whatever the reason, special care is needed when designing the first ten codons. We constantly monitor the literature as well as our own experimental data to capture as many informational constraints as possible about the downstream region and incorporate them into the publicly available Gene Designer software.


During the first translated codons of an ORF, the ribosome is much more prone to premature termination, and the mRNA-ribosome initiation complex is not stabilized until codon 10–15. It has been suggested that the complex is stabilized once the nascent peptide begins to extrude from the ribosome.

9.9.5 Homology to Wild-Type Sequence

Many genes encode known and unknown cis-regulatory elements within the ORF. This has been suggested to be more common in viral genes, as these are much more densely packed in viral genomes. By choosing a codon optimization solution that is as distant as possible from the wild-type sequence, it can be inferred that any cis-regulatory elements have been removed and that elements such as feedback loops no longer exist. Depending on the gene sequence, the level of identity between the wild-type gene and the synthetic gene can be as low as ~65% while still encoding the exact amino acid sequence of the designed protein with a codon bias consistent with the host organism. This feature has also been used successfully to design RNAi-resistant genes [6]. For many optimizations, we routinely take advantage of the ability to identify sequences that are as far as possible from the wild-type sequence. In other instances (e.g., when optimizing a set of variant genes), we optimize the sequences to be as close as possible to a given reference sequence.
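Ranking candidates by their distance from a reference sequence needs only a position-by-position identity count, because synonymous designs of the same protein have equal length. The function names and the short example sequences (all encoding Met-Leu-Lys) are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def rank_by_distance(reference, candidates):
    """Sort candidates from most distant to most similar to the reference."""
    return sorted(candidates, key=lambda s: percent_identity(reference, s))


wild_type = "ATGCTGAAA"  # hypothetical reference encoding Met-Leu-Lys
candidates = ["ATGCTGAAG", "ATGTTAAAG", "ATGCTCAAA"]
ranked = rank_by_distance(wild_type, candidates)
print(ranked[0], round(percent_identity(wild_type, ranked[0]), 1))
```

Taking the first element gives the design furthest from the wild type. Taking the last gives the design closest to the reference, as when a set of variant genes is optimized.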

9.9.6 Codon Context
The local context of a codon can sometimes influence protein expression levels. As early as the early 1980s, it was shown that the efficiency of the UGA stop codon in E. coli is decreased if it is followed by an adenine [80,81]. Since then, a multitude of experimentally validated codon contexts have been shown to affect ribosomal frameshifting, missense and nonsense incorporation, and translational efficiency [82–85]. Separate from these experimentally validated cases, several publications have proposed codon context effects based on in silico analysis of genomes [86–88]. The absence of certain codon contexts from entire genomes does not necessarily mean that the identified sequences affect protein expression of a recombinant gene when the host is grown in rich media; it is more likely a consequence of other evolutionary pressures, such as facilitating DNA replication, mutational bias, expression during starvation, and intrinsic metabolic regulation [89]. In at least one case [90], the predicted effect of codon pair bias on protein expression could not be experimentally validated [91].
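The in silico analyses mentioned above rest on comparing how often a pair of adjacent codons is observed with how often it would be expected if codons were placed independently. The sketch below illustrates that idea in its simplest form. It is an assumption-laden simplification: the function names are hypothetical, and published codon-pair measures typically also correct for amino acid pair frequencies, which this version does not.

```python
from collections import Counter


def codons(orf: str) -> list[str]:
    """Split an ORF into codons, ignoring any trailing partial codon."""
    seq = orf.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def codon_pair_ratios(orfs: list[str]) -> dict[tuple[str, str], float]:
    """Observed/expected ratio for every adjacent codon pair seen in `orfs`.

    Expected counts assume independent codon placement (a simplification).
    """
    codon_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for orf in orfs:
        cs = codons(orf)
        codon_counts.update(cs)
        pair_counts.update(zip(cs, cs[1:]))  # pairs within one ORF only

    total_codons = sum(codon_counts.values())
    total_pairs = sum(pair_counts.values())
    ratios = {}
    for (first, second), observed in pair_counts.items():
        expected = (
            (codon_counts[first] / total_codons)
            * (codon_counts[second] / total_codons)
            * total_pairs
        )
        ratios[(first, second)] = observed / expected
    return ratios


# Toy input; a real analysis would use a full set of genomic ORFs.
for pair, ratio in codon_pair_ratios(["ATGAAAATGAAA"]).items():
    print(pair, round(ratio, 2))
```

As the text cautions, a ratio far from 1 in a genome-wide survey identifies a statistical bias, not a demonstrated effect on recombinant protein expression.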

9.10 Incorporating Tools for Translational Control in Metabolic Engineering
Fifty years ago, upon unveiling the structure of DNA, Francis Crick and James Watson modestly stated “this structure has novel features which are of considerable biological interest.” Since then, an abundance of masterful artistry based on DNA has further revealed and exploited its many secrets. As a sculptor chisels wood, the genetic engineer manipulates DNA; both must work with the many peculiarities of their respective natural substrates. However, unlike the sculptor, who can easily find a knot-free piece of wood, genetic engineers have had to accept their raw materials as they come from nature, despite features that may be incompatible with their vision and tools. Thanks to the convergence of molecular biology, computational sciences, and manufacturing engineering, the genetic engineer is no longer confined to building things from nature's leftover DNA. Instead, using the tools developed over the last few years, single genes, operons, complete metabolic pathways, and even entire chromosomes and genomes can now be designed, synthesized, and assembled into fully functional units starting from only the simplest of phosphoramidites.


Gene Expression Tools for Metabolic Pathway Engineering

DNA is the information that defines the scope and extension of the application, just like software code is the information that defines the scope and extension of any application requiring computation (such as writing this text in Microsoft Word). The applications made possible by gene synthesis and metabolic engineering are only constrained by the imagination and creativity of the user.

Acknowledgments
Jeremy Minshull, Louise Rafty, Sridhar Govindarajan, and Jon Ness, all at DNA2.0, are gratefully acknowledged for providing technical, linguistic, and moral support, and Magda Bartilson for editing. DNA2.0 is acknowledged for providing financial support.

References 1. Casimiro D.R., Wright P.E., and Dyson H.J. PCR-based gene synthesis and protein NMR spectroscopy. Structure, 1997, 5:1407–1412. 2. Cello J., Paul A.V., and Wimmer E. Chemical synthesis of poliovirus cDNA. generation of infectious virus in the absence of natural template. Science, 2002, 297:1016–1018. 3. Prodromou C. and Pearl L.H. Recursive PCR: a novel technique for total gene synthesis. Protein Eng., 1992, 5:827–829. 4. Withers-Martinez C., Carpenter E.P., Hackett F., Ely B., Sajid M., Grainger M., and Blackman M.J. PCR-based gene synthesis as an efficient approach for expression of the A + T-rich malaria genome. Protein Eng., 1999, 12:1113–1120. 5. Gustafsson C., Govindarajan S., and Minshull J. Codon bias and heterologous protein expression. Trends Biotechnol., 2004, 22:346–353. 6. Kumar D., Gustafsson C., and Klessig D.F. Validation of RNAi silencing specificity Using synthetic genes: Salicylic acid-binding protein 2 is required for plant innate immunity. Plant J., 2006, 45:863–868. 7. Gustafsson C., Govindarajan S., and Minshull J. Putting engineering back into protein engineering. bioinformatic approaches to catalyst design. Curr. Opin. Biotechnol., 2003, 14:366–370. 8. Agarwal K.L., Buchi H., Caruthers M.H., Gupta N., Khorana H.G., Kleppe K., Kumar A., Ohtsuka E., Rajbhandary U.L., Van de Sande J.H., Sgaramella V., Weber H., and Yamada T. Total synthesis of the gene for an alanine transfer ribonucleic acid from yeast. Nature, 1970, 227:27–34. 9. Itakura K., Hirose T., Crea R., Riggs A.D., Heyneker H.L., Bolivar F., and Boyer H.W. Expression in Escherichia coli of a chemically synthesized gene for the hormone somatostatin. Science, 1977, 198:1056–1063. 10. Edge M.D., Green A.R., Heathcliffe G.R., Meacock P.A., Schuch W., Scanlon D.B., Atkinson T.C., Newton C.R., and Markham A.F. Total synthesis of a human leukocyte interferon gene. Nature, 1981, 292:756–762. 11. Beattie K.L. and Fowler R.F. Solid-phase gene assembly. Nature, 1991, 352:548–549. 
12. Richmond K.E., Li M.H., Rodesch M.J., Patel M., Lowe A.M., Kim C., Chu L.L., Venkataramaian N., Flickinger S.F., Kaysen J., Belshaw P.J., Sussman M.R., and Cerrina F. Amplification and assembly of chip-eluted DNA (AACED): a method for high-throughput gene synthesis. Nucleic Acids Res., 2004, 32:5011–5018. 13. Tian J., Gong H., Sheng N., Zhou X., Gulari E., Gao X., and Church G.M. Accurate multiplex gene synthesis from programmable DNA microchips. Nature, 2004, 432:1050–1054. 14. Zhou X., Cai S., Hong A., You Q., Yu P., Sheng N., Srivannavit O., Muranjan S., Rouillard J.M., Xia Y., Zhang X., Xiang Q., Ganesh R., Zhu Q., Matejko A., Gulari E., and Gao X. Microfluidic PicoArray synthesis of oligodeoxynucleotides and simultaneous assembling of multiple DNA sequences. Nucleic Acids Res., 2004, 32:5409–5417.


15. Mandecki W. and Bolling T.J. FokI method of gene synthesis. Gene, 1988, 68:101–107. 16. Dillon P.J. and Rosen C.A. A rapid method for the construction of synthetic genes using the polymerase chain reaction. Biotechniques, 1990, 9:298–300. 17. Jayaraman K., Fingar S.A., Shah J., and Fyles J. Polymerase chain reaction-mediated gene synthesis. synthesis of a gene coding for isozyme c of horseradish peroxidase. Proc. Natl. Acad. Sci. USA, 1991, 88:4084–4088. 18. Au L.C., Yang F.Y., Yang W.J., Lo S.H., and Kao C.F. Gene synthesis by a LCR-based approach. high-level production of leptin-L54 using synthetic gene in Escherichia coli. Biochem. Biophys. Res. Commun., 1998, 248:200–203. 19. Grundstrom T., Zenke W.M., Wintzerith M., Matthes H.W., Staub A., and Chambon P. Oligonucleotide-directed mutagenesis by microscale “shot-gun” gene synthesis. Nucleic Acids Res., 1985, 13:3305–3316. 20. Hayden M.A. and Mandecki W. Gene synthesis by serial cloning of oligonucleotides. DNA, 1988, 7:571–577. 21. Ciccarelli R.B., Loomis L.A., McCoon P.E., and Holzschu D.L. Insertional gene synthesis, a novel method of assembling consecutive DNA sequences within specific sites in plasmids: Construction of the HIV-1 tat gene. Nucleic Acids Res., 1990, 18:1243–1248. 22. Benner S.A., Caraco M.D., Thomson J.M., and Gaucher E.A. Planetary biology—paleontological, geological, and molecular histories of life. Science, 2002, 296:864–868. 23. Gouy M. and Gautier C. Codon usage in bacteria. correlation with gene expressivity. Nucleic Acids Res., 1982, 10:7055–7074. 24. Ikemura T. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol., 1981, 151:389–409. 25. Hayes C., Bose B., and Sauer R. Stop codons preceded by rare arginine codons are efficient determinants of SsrA tagging in Escherichia coli. Proc. Natl. Acad. 
Sci. USA, 2002, 99:3440–3445. 26. McNulty D., Claffee B., Huddleston M., Porter M., Cavnar K., and Kane J. Mistranslational errors associated with the rare arginine codon CGG in Escherichia coli. Protein Exp. Purif., 2003, 27:365–374. 27. Kane J.F. Effects of rare codon clusters on high-level expression of heterologous proteins in Escherichia coli. Curr. Opin. Biotechnol., 1995, 6:494–500. 28. Lithwick G. and Margalit H. Hierarchy of sequence-dependent features associated with prokaryotic translation. Genome Res., 2003, 13:2665–2673. 29. Henaut A. and Danchin A. Analysis and predictions from Escherichia coli sequences. In Escherichia coli and Salmonella typhimurium Cellular and Molecular Biology. Neidhardt F.C., Curtiss R.I., Ingraham J., Lin E., Brooks Low K., Magasanik B., Reznikoff W., Riley M., M.S., and Umbarger H. Editors. ASM Press: Washington, DC, 1996, 2047–2066. 30. Ikemura T. Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs. J. Mol. Biol., 1982, 158:573–597. 31. Drummond D.A., Bloom J.D., Adami C., Wilke C.O., and Arnold F.H. Why highly expressed proteins evolve slowly. Proc. Natl. Acad. Sci. USA, 2005, 102:14338–14343. 32. Sorensen M.A. Charging levels of four tRNA species in Escherichia coli Rel(+) and Rel(-) strains during amino acid starvation. a simple model for the effect of ppGpp on translational accuracy. J. Mol. Biol., 2001, 307:785–798. 33. Björk G.R. Stable RNA modification. In Escherichia coli and Salmonella. Cellular and Molecular Biology. Neidhardt F.C., Curtiss III R., Ingraham J.L., Lin ECC., Low Jr. K.B., Magasanik B., Reznikoff W.S., Riley M., Schaechter M., and Umbarger H.E., Editors. ASM Press: Washington, DC, 1996, 861–886.


34. Urbonavicius J., Qian Q., Durand J.M., Hagervall T.G., and Bjork G.R. Improvement of reading frame maintenance is a common function for several tRNA modifications. EMBO J., 2001, 20:4863–4873. 35. Li J.N. and Björk G.R. 1-methylguanosine deficiency of tRNA influences cognate codon interaction and metabolism in Salmonella typhimurium. J. Bact., 1995, 177:6593–6600. 36. Wahab S.Z., Rowley K.O., and Holmes W.M. Effects of tRNA(1Leu) overproduction in Escherichia coli. Mol. Microbiol., 1993, 7:253–263. 37. Gefter M.L. The in vitro synthesis of 2′-omethylguanosine and 2-methylthio 6N (gamma,gamma, dimethylallyl) adenosine in transfer RNA of Escherichia coli. Biochem. Biophys. Res. Commun., 1969, 36:435–441. 38. Wilson R.K. and Roe B.A. Presence of the hypermodified nucleotide N6-(delta 2-isopentenyl)-2methylthioadenosine prevents codon misreading by Escherichia coli phenylalanyl-transfer RNA. Proc. Natl. Acad. Sci. USA, 1989, 86:409-413. 39. Smith D.W. Problems of translating heterologous genes in expression systems. the role of tRNA. Biotechnol. Prog., 1996, 12:417-422. 40. Kurland C. and Gallant J. Errors of heterologous protein expression. Curr. Opin. Biotechnol., 1996, 7:489–493. 41. Rosenberger R.F. Translational errors during recombinant protein synthesis. Dev. Biol. Stand., 1994, 83:21–26. 42. Sharp P.M. and Li W.H. The codon Adaptation Index—a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res., 1987, 15:1281–1295. 43. Gong M. and Gong F.C.Y. Overexpression of tnaC of Escherichia coli inhibits growth by depleting tRNA2Pro availability. J. Bacteriol., 2006, 188:1892–1898. 44. Farabaugh P.J. and Björk G.R. How translational accuracy influences reading frame maintenance. EMBO J., 1999, 18:1427–1434. 45. Klasen M. and Wabl M. Silent point mutation in DsRed resulting in enhanced relative fluorescence intensity. BioTechniques, 2004, 36:236–237. 46. Tamura T., Holbrook S.R., and Kim S.H. 
A Macintosh computer program for designing DNA sequences that code for specific peptides and proteins. Biotechniques, 1991, 10:782–784. 47. Stewart L. and Burgin A.B. Whole gene synthesis: A Gene-O-Matic future. Front. Drug Design Discovery, 2005, 1:297–341. 48. Villalobos A., Ness J.E., Gustafsson C., Minshull J., and Govindarajan S. Gene Designer: A synthetic biology tool for constructing artificial DNA segments. BMC Bioinformatics, 2006, 7:285. 49. Björk G.R., Wikström P.M., and Byström A.S. Prevention of translational frameshifting by the modified nucleoside 1-methylguanosine. Science, 1989, 244:986–989. 50. O’Connor M. Imbalance of tRNA(Pro) isoacceptors induces +1 frameshifting at near-cognate codons. Nucleic Acids Res., 2002, 30:759–765. 51. Versteeg R., van Schaik B.D., van Batenburg M.F., Roos M., Monajemi R., Caron H., Bussemaker H.J., and van Kampen A.H. The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Res., 2003, 13:1998–2004. 52. Kudla G., Lipinski L., Caffin F., Helwak A., and Zylicz M. High guanine and cytosine content increases mRNA levels in mammalian cells. PLoS Biol., 2006, 4:e180. 53. Semon M., Mouchiroud D., and Duret L. Relationship between gene expression and GC-content in mammals. statistical significance and biological relevance. Hum. Mol. Genet., 2005, 14:421–427. 54. Jin H., Zhao Q., Gonzalez de Valdivia E., Ardell D.H., Stenström M., and Isaksson L.A. Influences on gene expression in vivo by a Shine–Dalgarno sequence. Mol. Microbiol., 2006, 60:480–492. 55. Huang H.Y., Chien C.H., Jen K.H., and Huang H.D. RegRNA. An integrated web server for identifying regulatory RNA motifs and elements. Nucleic Acids Res., 2006, 34:W429–434.


56. Pertzev A.V. and Nicholson A. Characterization of RNA sequence determinants and antideterminants of processing reactivity for a minimal substrate of Escherichia coli ribonuclease III. Nucleic Acids Res., 2006, 34:3708–3721. 57. Ehretsmann C.P., Carpousis A.J., and Krisch H.M. Specificity of Escherichia coli endoribonuclease RNase E. in vivo and in vitro analysis of mutants in a bacteriophage T4 mRNA processing site. Genes Dev., 1992, 6:149–159. 58. Takyar S., Hickerson R.P., and Noller H.F. mRNA helicase activity of the ribosome. Cell, 2005, 120:49–58. 59. Iost I. and Dreyfus M. mRNAs can be stabilized by DEAD-box proteins. Nature, 1994, 372:193–196. 60. Katz L. and Burge C.B. Widespread selection for local RNA secondary structure in coding regions of bacterial genes. Genome Res., 2003, 13:2042–2051. 61. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res., 2003, 31:3406–3415. 62. Green R. and Noller H. Ribosomes and translation. Ann. Rev. Biochem., 1997, 66:679–716. 63. Ingraham J.L. and Maaloe F.C.N. Growth rate as a variable. In Growth of the Bacterial Cell. Ingraham JL, Editor. Sinauer Associates Inc.: Sunderland, MA, 1983, 267–315. 64. Sprengart M.L., Fatscher H.P., and Fuchs E. The initiation of translation in E. coli. apparent base pairing between the 16srRNA and downstream sequences of the mRNA. Nucleic Acids Res., 1990, 18:1719–1723. 65. Ito K. and Kawakami K.Y.N. Multiple control of Escherichia coli lysyl-tRNA synthetase expression involves a transcriptional repressor and a translational enhancer element. Proc. Natl. Acad. Sci. USA, 1993, 90:302–306. 66. Faxen M., Plumbridge J., and Isaksson L.A. Codon choice and potential complementarity between mRNA downstream of the initiation codon and bases 1471–1480 in 16S ribosomal RNA affects expression of glnS. Nucleic Acids Res., 1991, 19:5247–5251. 67. Nagai H., Yuzawa H., and Yura T. 
Interplay of two cis-acting mRNA regions in translational control of sigma 32 synthesis during the heat shock response of Escherichia coli. Proc. Natl. Acad. Sci. USA, 1991, 88:10515–10519. 68. Stenstrom C.M., Holmgren E., and Isaksson L.A. Cooperative effects by the initiation codon and its flanking regions on translation initiation. Gene, 2001, 273:259–265. 69. Stenstrom C.M., Jin H., Major L.L., Tate W.P., and Isaksson L.A. Codon bias at the 3′–side of the initiation codon is correlated with translation initiation efficiency in Escherichia coli. Gene, 2001, 263:273–284. 70. Stenstrom C.M. and Isaksson L.A. Influences on translation initiation and early elongation by the messenger RNA region flanking the initiation codon at the 3’ side. Gene, 2002, 288:1–8. 71. Sorensen M.A., Kurland C.G., and Pedersen S. Codon usage determines translation rate in Escherichia coli. J. Mol. Biol., 1989, 207:365–377. 72. Bulmer M. Codon usage and intragenic position. J. Theor. Biol., 1988, 133:67–71. 73. Robinson M., Lilley R., Little S., Emtage J.S., Yarranton G., Stephens P., Millican A., Eaton M., and Humphreys G. Codon usage can affect efficiency of translation of genes in Escherichia coli. Nucleic Acids Res., 1984, 12:6663–6671. 74. de Smit M.H. and van Duin J. Control of translation by mRNA secondary structure in Escherichia coli. A quantitative analysis of literature data. J. Mol. Biol., 1994, 244:144–150. 75. de Smit M.H. and van Duin J. Secondary structure of the ribosome binding site determines translational efficiency. a quantitative analysis. Proc. Natl. Acad. Sci. USA, 1990, 87:7668–7672. 76. Karimi R. and Ehrenberg M. Dissociation rate of cognate peptidyl-tRNA from the A-site of hyperaccurate and error-prone ribosomes. Eur. J. Biochem., 1994, 226:355–360. 77. Dincbas V., Heurgue-Hamard V., Buckingham R.H., Karimi R., and Ehrenberg M. Shutdown in protein synthesis due to the expression of mini-genes in bacteria. J. Mol. Biol., 1999, 291:745–759.


78. Gonzalez de Valdivia E.I. and Isaksson L.A. A codon window in mRNA downstream of the initiation codon where NGG codons give strongly reduced gene expression in Escherichia coli. Nucleic Acids Res., 2004, 32:5198–5205. 79. Gonzalez de Valdivia E. and Isaksson L.A. Abortive translation caused by peptidyl-tRNA drop-off at NGG codons in the early coding region of mRNA. FEBS J., 2005, 272:5306–5316. 80. Bossi L. and Roth J.R. The influence of codon context on genetic code translation. Nature, 1980, 286:123–128. 81. Engelberg-Kulka H. UGA suppression by normal tRNA Trp in Escherichia coli. codon context effects. Nucleic Acids Res., 1981, 9:983–991. 82. Murgola E., Pagel F.T., and Hijazi K.A. Codon context effects in missense suppression. J. Mol. Biol., 1984, 175:19–27. 83. Carrier M.J. and Buckingham R.H. An effect of codon context on the mistranslation of UGU codons in vitro. J. Mol. Biol., 1984, 175:29–38. 84. Bouadloun F., Srichaiyo T., Isaksson L.A., and Bjork G.R. Influence of modification next to the anticodon in tRNA on codon context sensitivity of translational suppression and accuracy. J. Bacteriol., 1986, 166:1022–1027. 85. Hagervall T. and Bjork G. Undermodification in the first position of the anticodon of supG-tRNA reduces translational efficiency. Mol. Gen. Genet., 1984, 196:194–200. 86. Shpaer E.G. Constraints on codon context in Escherichia coli genes. Their possible role in modulating the efficiency of translation. J. Mol. Biol., 1986, 188:555–564. 87. Gouy M. Codon contexts in enterobacterial and coliphage genes. Mol. Biol. Evol., 1987, 4:426–444. 88. Gutman G.A. and Hatfield G.W. Nonrandom utilization of codon pairs in Escherichia coli. Proc. Natl. Acad. Sci. USA, 1989, 86:3699–3703. 89. Moura G., Pinheiro M., Silva R., Miranda I., Afreixo V., Dias G., Freitas A., Oliveira JL., and Santos M.A. Comparative context analysis of codon pairs on an ORFeome scale. Genome Biol., 2005, 6:R28. 90. Irwin B., Heck J.D., and Hatfield G.W. 
Codon pair utilization biases influence translational elongation step times. J. Biol. Chem., 1995, 270:22801–22806. 91. Cheng L. and Goldman E. Absence of effect of varying Thr-Leu codon pairs on protein synthesis in a T7 system. Biochemistry, 2001, 40:6102–6106.

10 Metabolic Engineering of the Secretory Processing Pathway in Eukaryotes Mohak Mhatre Johns Hopkins University

Maira P. Pellegrini Federal University of Rio de Janeiro

Michael J. Betenbaugh Johns Hopkins University

10.1 Introduction
10.2 Chaperones and Folding Catalysts
Protein Disulfide Isomerase (PDI) • Binding Protein • Calnexin (CNX) • Signal Peptidase
10.3 Glycoengineering
Glycoengineering in Mammalian Cells • Glycoengineering in Insects • Glycoengineering in Plants • Glycoengineering in Yeast
10.4 γ-Carboxylation
10.5 Furin Cleavage
10.6 Conclusion
References

10.1 Introduction
There is great demand in the pharmaceutical and biotechnology industries for synthesizing proteins at very high production rates. Although mammalian hosts produce proteins that most closely resemble human proteins, mammalian cell lines are expensive to culture and often do not produce proteins at the desired rate. Consequently, researchers have also turned to nonmammalian hosts, such as insect, plant, and fungal cells, to produce recombinant proteins at high levels and at relatively low cost. However, researchers have discovered that the secretory pathway (the designated steps for proteins destined to leave the cell) has numerous points that limit the production of recombinant proteins in all these expression systems. These rate-limiting steps can lead to inactive or misfolded proteins, protein aggregation, and, ultimately, protein degradation. The bottlenecks are mainly due to the many post-translational modifications that a protein must undergo before it can leave the cell. Researchers are attempting to overcome them through metabolic engineering of the secretory pathway, specifically by overexpressing or otherwise manipulating the expression levels of various enzymes involved in protein processing. In this chapter, we discuss research contributions to metabolic engineering of protein folding and assembly in the endoplasmic reticulum (ER), covering protein disulfide isomerase (PDI), binding immunoglobulin protein (BiP), calnexin (CNX), and signal peptidase; N-site occupancy and processing in the N-glycosylation pathway; and gamma-carboxylation and furin cleavage.


Figure 10.1 Schematic of the catalytic oxidation and isomerase functions of protein disulfide isomerase (PDI). Diagram labels: PDI; SH (free thiol groups); S–S (disulfide bonds); inactive protein; active protein; inactive protein.

10.2 Chaperones and Folding Catalysts
10.2.1 Protein Disulfide Isomerase (PDI)
PDI is an enzyme involved in the protein folding and assembly process that occurs in the cell's ER. It is specifically responsible for catalyzing the formation and breakage of disulfide bonds in a protein until the most stable conformation is obtained (Figure 10.1). Disulfide bonds increase the stability of the folded protein and allow it to be secreted in its proper conformation. When a protein is incorrectly folded, the unfolded protein response (UPR) may be triggered, resulting in aggregation and subsequent degradation of the protein. This problem may be especially evident when recombinant proteins are expressed in a host. Currently, researchers are engineering changes in the secretory pathways of a variety of host cells in the hope of overcoming the UPR and increasing protein secretion.

Significant work has examined increasing secretion levels of heterologous proteins in Saccharomyces cerevisiae. One group found that overexpression of PDI improved secretion of human platelet-derived growth factor B homodimer tenfold [1]. The group also found that overexpression of PDI quadrupled the secretion of Schizosaccharomyces pombe acid phosphatase [1]. These results show that overexpression of PDI can significantly improve secretion rates of recombinant proteins. In addition to improving protein secretion levels, researchers are attempting to use PDI to decrease the UPR [2]. One group studied the effect of overexpressing the chaperone BiP and PDI in S. cerevisiae on the expression of a single-chain antibody [2]. Their results demonstrate that while the UPR was not lowered by overexpression of BiP alone, overexpression of PDI alone or in conjunction with BiP led to a decrease in the UPR. In addition, the pairing of the two proteins produced the highest level of antibody secretion [2]. This suggests that PDI plays a pivotal role in both increasing protein secretion and decreasing the UPR, at least in this host.

Besides working with S. cerevisiae, researchers are also interested in the effects on recombinant protein secretion of overexpressing single chaperones and pairs of chaperones in Pichia pastoris. One study revealed that overexpression of PDI, Ssa1p, and other chaperones such as Kar2p, the BiP homologue in yeast, increases heterologous protein secretion four- to sevenfold [3]. Ssa1p is an Hsp70 chaperone found in the yeast cytosol whose primary functions involve assisting protein folding and translocation [4,5]. The study also evaluated pairs of these chaperones and found that YDJ1p/PDI improved secretion more than pairs of chaperones that did not include PDI [3]. Thus, the significance of PDI, alone and in combination with chaperone proteins, in alleviating the bottleneck cannot be ignored. These results show that chaperones and folding enzymes can act together to improve the output of heterologous proteins [3].

Another study worked with P. pastoris to increase secretion of recombinant antibody fragments. The study noted that while heavy chain antibody fragments accumulated inside the host cells, whole antibodies and single light chain fragments were present only in the supernatant [6]. The group engineered the host cell to upregulate both PDI and the UPR transcription factor HAC1 [6]. PDI improved antibody secretion almost twice as much as HAC1 did, demonstrating that the formation of disulfide bonds during protein folding significantly affects the production and secretion rates of Fab [6].

PDI overexpression has also been applied to increase secretion in P. pastoris of a potential vaccine against hookworm infection, the Necator americanus secretory protein (Na-ASP1) [7]. The group had initially attempted to increase Na-ASP1 secretion by increasing the number of Na-ASP1 genes in the host cell line, but clones with multiple gene copies showed a reduction in protein secretion [7]. To overcome this problem, the group overexpressed PDI at variable levels and measured the resulting amount of Na-ASP1 secreted. The results demonstrate that although Na-ASP1 still aggregated inside the host cells, the protein secretion level did improve with multiple gene copies of PDI [7].

The problem of producing recombinant antibodies has also been studied in insect and mammalian cells. Expression of PDI was found to salvage insoluble immunoglobulins in a form that allowed the immunoglobulin monomers to assemble and be secreted into the medium, giving higher protein yields [8]. The original PDI work was then expanded in another paper [9], in which particular thioredoxin-like domains within PDI were found to facilitate immunoglobulin assembly. A study with a Chinese hamster ovary (CHO) cell line found that the cell line did not increase its amounts of PDI and BiP with rising production of the recombinant human antibody [10]. The findings also demonstrate that antibody secretion levels are enhanced with increased expression of light chain, and that heavy chain aggregates when not enough light chain is present; this may be another significant bottleneck in the pathway [10]. To alleviate this bottleneck, PDI and BiP were introduced into the host cells both separately and together [10]. Overexpression of BiP, alone or paired with PDI, decreased the rate of antibody secretion, while overexpression of PDI alone enhanced the level of secretion [10].

10.2.2 Binding Protein
Another ER protein that has been overexpressed to enhance post-translational processing in a variety of expression systems is BiP, also known as the ER glucose-regulated protein (GRP78) of the heat shock protein 70 family. Researchers have investigated how BiP's association with proteins affects the protein folding and assembly pathway (Figure 10.2).

Figure 10.2 Schematic of the polypeptide binding function of the chaperone immunoglobulin binding protein (BiP). Diagram labels: ribosome; 5′; 3′; ER membrane; BiP; polypeptide.

Early research efforts investigated the effect of overexpressing Kar2p, the BiP homologue in yeast, on the secretion levels of heterologous proteins from Saccharomyces cerevisiae. Researchers found that Kar2p increased secretion rates of single-chain antibody fragments (scFv) two- to eightfold [11]. These researchers also wanted to understand the specific role that BiP plays in the processing of secreted proteins. It was hypothesized that higher amounts of BiP would either improve secretion levels by enhancing the solubility of the heterologous protein or reduce secretion rates by preventing improperly folded proteins from leaving the cell [12]. One study created a strain of S. cerevisiae in which the level of BiP could be adjusted from 5 to 250% of the normal expression level [12]. The authors studied the effect of different BiP expression levels on the secretion rates of bovine pancreatic trypsin inhibitor, Schizosaccharomyces pombe acid phosphatase, and human granulocyte colony-stimulating factor [12]. When BiP was underexpressed, secretion of all three proteins decreased correspondingly [12], suggesting that BiP's primary role is to solubilize proteins rather than to participate in the aggregation of recombinant proteins [12]. On the other hand, overexpression of BiP did not significantly affect the secretion rates in all cases, suggesting that BiP plays different roles in the solubilization and secretion of different proteins.

Indeed, in mammalian cell lines, overexpression of BiP had a variable and often negative effect on the secretion of von Willebrand factor (vWF), factor VIII, and macrophage colony-stimulating factor (M-CSF) [13]. BiP was found to associate with vWF and factor VIII, and its overexpression inhibited their secretion. In contrast, BiP did not associate with M-CSF and thus did not affect the secretion of that protein [13]. In addition, a reduction in BiP by antisense RNA was observed to increase the secretion of tissue plasminogen activator (t-PA) in mammalian cells because of reduced association of BiP with the heterologous protein [14]. However, more recent proteomics and genomics studies have suggested that increases in BiP and other folding catalysts, such as PDI, often correlate with increased levels of IgG antibody production in mammalian cell lines such as NS0 [15,16]. Interestingly, overexpression of BiP was nevertheless observed to decrease secretion of an IgG antibody in mammalian CHO cells [10]. In another study, one of the central regulators of ER processing capacity, the transcription factor XBP-1, which controls the expression of endogenous BiP and other ER chaperones, was upregulated in CHO-K1 cells, leading to an expansion of the ER and Golgi and an increase in the production of multiple secretory products [17].

Production of recombinant proteins in insect cells had also reached a roadblock in the early 1990s, as researchers were obtaining high yields of intracellular proteins but only small amounts of some secreted heterologous proteins. Our research group suspected that post-translational modification steps, such as folding and assembly, were the rate-limiting steps in the secretory pathway and thus represented a bottleneck that could potentially be overcome by metabolic cell engineering [18,19]. These researchers expressed BiP in both Sf-9 and Trichoplusia ni (High Five) cells using the baculovirus vector and observed that BiP interacted with the immunoglobulin chains in cells and increased the intracellular solubility of IgG [20]. In High Five cell lines, overexpression of BiP led to a 90% improvement in secretion rates after 3½ days [20]. These findings demonstrated that BiP was not an antagonist of protein folding and secretion; instead, it could significantly enhance secretion levels and reduce the accumulation of soluble protein within the cell [20,21]. Thus, the appropriate cell engineering strategy likely depends on the specific cell type and microenvironment encountered, and may require either increasing or decreasing the levels of certain chaperones or folding enzymes in the cells.

Figure 10.3 Simplified schematic of the calnexin (CNX)-calreticulin (CRT) chaperone cycle along with the disulfide isomerase (ERp57), EDEM, and the ER-associated degradation (ERAD) pathway. Labels in the figure: translocon complex; calnexin with ERp57; EDEM; folded protein leaving the CNX–CRT cycle to continue on the secretory pathway; ERAD for degradation.

10.2.3 Calnexin (CNX)

Another resident ER chaperone protein is CNX, which works with calreticulin (CRT) to ensure proper folding of glycoproteins (Figure 10.3). CNX and CRT are carbohydrate-specific chaperones in that they recognize glycoproteins that have one glucose residue attached to the end of the N-glycan.22 CNX and CRT work to prevent the accumulation of misfolded proteins and inhibit the UPR.22 In order to interact with glycoproteins, CNX and CRT first bind the unfolded glycoprotein and then form a complex with ERp57, a member of the PDI family.23 This complex assists the glycoprotein in folding into the correct conformation.23–27 Glucosidase II allows the glycoprotein to leave the CNX–CRT complex by removing the one remaining glucose from the glycoprotein.23,28–31 If the glycoprotein is not folded correctly upon dissociation, UDP-glucose:glycoprotein glucosyltransferase (GT) reattaches the glucose residue and the cycle begins again.23,31–33 Once the protein has folded properly, it leaves the cycle and moves on to the next steps in the processing pathway23 (Figure 10.3). If, after many CNX–CRT cycles, the glycoprotein is unable to fold into its native conformation, the polypeptide is retrotranslocated through the translocon complex, deglycosylated by a cytosolic glycoamidase, ubiquitinated, and degraded by proteolysis through the ER-associated degradation (ERAD) pathway23,30,34–38 (Figure 10.3). EDEM (or Htm1/Mnl1) is another lectin-binding protein that assists in identifying glycoproteins that need to be destroyed. EDEM recognizes an N-glycan with eight mannose residues in order to target the glycoprotein for degradation (Figure 10.3).
One study found that overexpression of EDEM caused an increase in ERAD for improperly folded glycoproteins but did not increase the rate of proteolysis of nonglycosylated proteins.23,39–41 Furthermore, the coexpression of EDEM and a mannosidase enzyme accelerated the degradation of unfolded proteins in human embryonic kidney (HEK) 293 cell lines.23,42 Thus, the glycoprotein has the opportunity to attain its proper conformation with the assistance of CNX and CRT, but if it fails to achieve this conformation, lectin-binding proteins ensure that the polypeptide is removed from the ER and degraded.

Researchers attempted to pinpoint where CNX has the most influence in the protein-folding pathway. In one study, CNX or CRT was overexpressed in a mammalian cell line expressing human thyroperoxidase (hTPO).43 Although the presence of the chaperone proteins appeared to increase the rate of the initial steps in the hTPO-folding pathway, there was no significant rise in hTPO at its final processing steps.43 These findings suggest that while CNX and CRT play an important role initially, other chaperone proteins are likely involved in the later stages of the protein-folding pathway.43 In order to distinguish the roles of BiP and CNX in the folding pathway, researchers performed pulse-chase radiolabeling studies on mammalian cells expressing hTPO.44 CHO cells that overexpressed BiP had a reduced level of properly folded hTPO, consistent with the results of previous studies,13,14 while cells overexpressing CNX and ERp57 had increased levels of partially folded hTPO.44 These results support the hypothesis that CNX and BiP influence protein folding differently and that the levels of these chaperones can alter the folding rates significantly in either a positive or negative fashion.44

Researchers have also explored the overexpression of CNX and CRT to improve productivity levels of other proteins, such as secreted thrombopoietin (TPO), in CHO cells. First, they engineered a cell line that would allow control of CNX and CRT levels by varying the level of doxycycline in the culture medium.45 When no doxycycline was present, the levels of CNX and CRT almost tripled and the productivity of TPO doubled.45 These findings suggest that high levels of CNX and CRT can improve the productivity of recombinant proteins in mammalian cell lines.45

Researchers have also studied the effects of CNX overexpression in insect cells in order to improve the production of membrane proteins, such as the serotonin transporter (SERT).46 Overexpression of recombinant SERT resulted in high amounts of improperly folded and inactive protein, and only low levels of properly folded and active transporter.46 These findings suggested that there was a bottleneck due to the lack of chaperones, and thus the researchers overexpressed recombinant SERT along with CNX, CRT, BiP, and ERp57.46 The expression of CNX led to a tripling in the production of active transporter, with lesser increases facilitated by BiP and CRT.46 CNX has also been found to improve levels of properly folded Shaker H4 potassium channel proteins in insect cells.47

In fungal cell lines such as Aspergillus niger, the production of heme-containing peroxidases is inefficient, perhaps due to a post-translational processing bottleneck.48 Researchers evaluated the coexpression of CNX as a way to enhance processing and found that the levels of extracellular manganese peroxidase increased four- to fivefold.48 In contrast, the coexpression of BiP resulted in lower levels of peroxidase. Addition of heme to the medium negated these effects, but its presence was also found to increase the intracellular expression of BiP and CNX.48
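The CNX–CRT quality-control cycle described at the start of this section can be read as a small state machine: bind, attempt folding, release by glucosidase II, then either exit or be reglucosylated by GT for another round, with ERAD as the fate after repeated failure. The following Python sketch is only an illustration of that logic. The per-cycle folding probability `p_fold` and the cycle limit `max_cycles` are hypothetical parameters chosen for the example, not measured values from the studies cited above.

```python
import random


def calnexin_cycle(p_fold=0.3, max_cycles=5, rng=None):
    """Follow one glycoprotein through the CNX-CRT cycle.

    p_fold and max_cycles are illustrative assumptions, not measured values.
    Returns a tuple (fate, cycles_used).
    """
    rng = rng or random.Random()
    for cycle in range(1, max_cycles + 1):
        # CNX/CRT bind the monoglucosylated N-glycan and recruit ERp57;
        # glucosidase II then removes the last glucose and releases the protein.
        if rng.random() < p_fold:
            return "secretory pathway", cycle
        # Still misfolded: GT re-adds one glucose and the protein re-enters.
    # Not folded after max_cycles: EDEM-assisted targeting to ERAD.
    return "ERAD", max_cycles


def fraction_secreted(n=10_000, seed=0, **kwargs):
    """Estimate the fraction of molecules that fold before being sent to ERAD."""
    rng = random.Random(seed)
    fates = [calnexin_cycle(rng=rng, **kwargs)[0] for _ in range(n)]
    return fates.count("secretory pathway") / n
```

Under these assumptions the secreted fraction approaches 1 - (1 - p_fold)^max_cycles, so raising either the folding probability per cycle or the number of permitted cycles increases the yield. This is a toy picture of why chaperone levels can shift the balance between secretion and degradation, not a model of any specific experiment.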

10.2.4 Signal Peptidase

The initial ER post-translational modification is the cleavage of the signal peptide of a polypeptide by signal peptidase as it passes into the ER compartment (Figure 10.4). The signal peptide is used as a marker for proteins that should be directed to the secretory pathway, and researchers hypothesized that this step may represent another potential bottleneck in the secretory pathway. In one study, multiple protein bands were observed when a scFv was expressed in insect cells using baculovirus expression vectors.49 The researchers hypothesized that the multiple bands may represent incomplete processing of the signal peptide.49 The coexpression of wild-type SipS, a signal peptidase from Bacillus subtilis, eliminated the multiple protein bands in the cell lysates, while the expression of a mutant SipS did not affect the presence of the protein bands.49 This study indicated that signal peptide processing can be a bottleneck in the insect cell expression system. Although wild-type SipS did increase the amount of scFv in the medium, the rate of secretion of the protein was still low, suggesting that subsequent processing bottlenecks limited secretion of the protein from insect cells.49

10.3 Glycoengineering

In the ER, another post-translational modification of proteins occurs: glycosylation, which is the addition of sugars onto proteins. There are two forms of glycosylation: N-linked and O-linked. For N-linked processing, enzymes add carbohydrates to asparagine residues, while enzymes for O-linked processing glycosylate serine or threonine side chains. This chapter will focus on the metabolic engineering research contributions concerning N-glycosylation. In most eukaryotic cells, the process of N-glycosylation starts with the addition of Glc3Man9GlcNAc2 onto an asparagine side chain by the oligosaccharyltransferase (OST) complex to generate a glycoprotein50 (Figure 10.5).
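Potential N-glycosylation sites can be located in a protein sequence by searching for the Asn-X-Ser/Thr consensus sequence that this chapter refers to in its discussion of glycoengineered erythropoietin. The short Python sketch below shows one way to do this. It assumes the commonly used refinement that X may be any residue except proline, which is general biochemical knowledge rather than a statement from this chapter, and the example sequence is hypothetical. A match marks only a candidate site; whether it is actually occupied depends on the host cell and the protein.

```python
import re

# Zero-width lookahead so that overlapping sequons (e.g. "NNSS") are all found.
# Pattern: Asn, then any residue except Pro (assumed rule), then Ser or Thr.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


# Hypothetical peptide, used only to exercise the function.
print(find_sequons("MNGTANPSNNSS"))  # [2, 9, 10]
```

In the example, the Asn at position 6 is skipped because it is followed by proline, while the overlapping sites at positions 9 and 10 are both reported.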

Figure 10.4 Schematic of the cleavage of signal peptides by the ER-associated membrane protein, signal peptidase. Labels in the figure: ribosome, mRNA (5′ to 3′), signal peptidase, signal peptide, polypeptide.

Figure 10.5 Transfer of Glc3Man9GlcNAc2 from the lipid (dolichol)-linked oligosaccharide onto the asparagine residue of a growing polypeptide in the endoplasmic reticulum by the multisubunit enzyme oligosaccharyltransferase (OST). Labels in the structural drawing: the Asn-X-Ser/Thr acceptor sequence, the dolichol-linked Glc3Man9GlcNAc2 donor, and the OST base.

During its travel through the ER and Golgi apparatus, the glycoprotein is subject to processing, as various enzymes will both remove and add specific sugars onto the oligosaccharide. These changes will ultimately shape the final structure of the sugars on the glycoprotein; many different sugar structures may be generated for a single glycoprotein. This N-glycan attachment plays a significant role in a glycoprotein’s stability, folding capabilities, secretion rate, and in vivo biological activity.50

Figure 10.6 Typical N-linked oligosaccharides produced in the insect, plant, and yeast hosts and the desirable mammalian oligosaccharide researchers are attempting to generate in these hosts through glycoengineering. Panels: insect, plant, mammalian, yeast. Legend symbols: GlcNAc, Man, Fuc, Xyl, Gal, sialic acid, phosphate (P).

In N-glycan processing in the ER, insects, plants, and yeasts share many similarities with mammals, particularly in the trimming of the Glc3Man9GlcNAc2 oligosaccharide to Man8GlcNAc2.51 However, in the Golgi apparatus, N-glycan processing varies significantly among the species (Figure 10.6). Insect cells often remove most mannose sugars to form paucimannosidic N-glycans, while yeast cells often add at least 50 mannose residues during processing steps. Plants will often remove mannose residues, frequently attach β(1,2)-linked Xyl and α(1,3)-linked Fuc residues, and sometimes add GlcNAc or Gal residues to the oligosaccharide.23,51

The nonmammalian cell hosts are of significant pharmaceutical interest because of their relatively low cost, easy and safe use, and their adaptable system of expressing recombinant proteins at high levels.52 The problem with producing these heterologous proteins in nonmammalian hosts is that the differences in the N-glycans are potentially immunogenic to humans and can cause the recombinant glycoprotein to be quickly removed from the patient’s circulatory system. Researchers are trying to engineer insect, plant, and yeast cells to create sialylated or galactosylated N-glycans in order to produce glycoproteins similar to those of mammalian cells, so that they can be used in a clinical setting with increased biological activity in vivo.51
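Glycan names such as Glc3Man9GlcNAc2 are compact composition formulas, and comparing two of them shows which residues a processing step removes. As a minimal sketch, the Python below parses such a name into residue counts and reports the difference between two glycans. The residue vocabulary and the function names `composition` and `trimmed` are choices made for this illustration. The parser tracks composition only and carries no information about linkage or branching.

```python
import re
from collections import Counter

# Longer names come first so that "GlcNAc" is never read as "Glc" plus junk.
_RESIDUE = re.compile(r"(GlcNAc|Neu5Ac|Glc|Man|Gal|Fuc|Xyl)(\d*)")


def composition(formula):
    """Parse a name such as 'Glc3Man9GlcNAc2' into residue counts."""
    counts = Counter()
    pos = 0
    for m in _RESIDUE.finditer(formula):
        if m.start() != pos:
            raise ValueError(f"unrecognized text in {formula!r}")
        counts[m.group(1)] += int(m.group(2) or 1)
        pos = m.end()
    if pos != len(formula):
        raise ValueError(f"unrecognized text in {formula!r}")
    return counts


def trimmed(before, after):
    """Residues removed when the glycan 'before' is processed to 'after'."""
    return dict(composition(before) - composition(after))


print(trimmed("Glc3Man9GlcNAc2", "Man8GlcNAc2"))  # {'Glc': 3, 'Man': 1}
```

The example reproduces the ER trimming mentioned above: three glucose residues and one mannose are removed in going from Glc3Man9GlcNAc2 to Man8GlcNAc2.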

10.3.1 Glycoengineering in Mammalian Cells

Researchers have begun engineering glycosylation changes in mammalian cells to create glycoproteins that more closely resemble human proteins. Current roadblocks with these cells are their lower production levels in some cases and their high culture costs.53 Nonetheless, researchers have made significant progress in this area, particularly in genetically manipulating mammalian cells to add N-glycans onto erythropoietin (EPO).53 One study investigated the role of site occupancy in human erythropoietin (hEPO) on biological activity in vivo and in vitro. EPO has three sites for N-glycosylation. While unglycosylated EPO derivatives were comparably active in vitro with respect to wild-type EPO, the study revealed that the elimination of these sites significantly reduced the in vivo activity.54,55 This shows that N-linked oligosaccharides are essential for in vivo activity.55 Another study focused on the effect of N-glycans on EPO secretion in BHK cells, where EPO mutants with all three glycosylation sites removed had a 90% reduction in production.54 The researchers then investigated the effect of engineering glycosylation consensus sequences into proteins such as recombinant human erythropoietin (rHuEPO).56 In this case, the researchers modified the genes so that the glycoproteins included two additional target sites for N-glycosylation (Asn-X-Ser/Thr); the protein would then carry five N-glycans rather than three. They discovered that these sequences caused a significant improvement in in vivo activity and duration of action.56 In particular, glycoengineered rHuEPO has had clinical success, resulting in fewer required doses of the drug for each patient.56 These findings demonstrate that glycoengineering could be a potential solution to the dosage problem of some protein therapeutics.

Transferrin, a protein with two N-glycosylation sites, has also been used in studies concerning the effect of site occupancy on protein secretion. One study created BHK cells producing recombinant transferrin. While wild-type transferrin had a secretion level of 125 mg/L, BHK cells expressing mutant transferrin, which had the glycosylation sites removed, had a decrease in protein secretion to only 25 mg/L.57

N-glycosylation can also be clinically relevant in congenital disorders of glycosylation (CDGs).
CDGs involve genetic deficiencies in the enzymes responsible for site occupancy or glycan processing.58 CDGs have a variety of clinical features, including developmental delays, seizures, organ failure, and death.23 The diseases can be classified into two groups: CDG-I, where there is a defect in the production of Glc3Man9GlcNAc2-P-P-Dol, and CDG-II, where problems arise in the later processing of the sugars attached to the protein.58 The most prevalent indicator for CDG-I is the aggregation of improperly glycosylated forms of transferrin in the serum and cerebrospinal fluid of the patient.23 Clinical studies have demonstrated that healthy humans generate human transferrin (hTf) with both of its N-glycosylation sites occupied, while CDG patients have a higher level of hTf with only one of the two N-glycan sites occupied.59

In addition, researchers have been engineering mammalian cells to create N-glycans for monoclonal antibodies that target tumor cells and trigger the immune system to respond.53 The ability of the antibody to bind to the tumor cell is affected by the sugars attached to the antibody, particularly the ones on the heavy chains in the Fc region of the antibody53 (Figure 10.7). Researchers manipulated the glycosylation pathway by eliminating fucose, and subsequently increased binding affinities to Fc receptors on immune cells by 50-fold.53 To eliminate fucose, some groups have expressed an enzyme, β(1,4)-N-acetylglucosaminyltransferase III (GnT-III), that adds a GlcNAc residue and thereby competes with the native mammalian fucosyltransferase to inhibit fucosylation, while other researchers knocked out fucosyltransferase from the mammalian host to produce a similar reduction in fucosylation on the monoclonal antibody.53,60 These glycoengineered antibodies exhibit enhanced antibody-dependent cellular cytotoxicity (ADCC).60 In addition, the researchers have observed that changing the intracellular location of the GnT-III domain in the Golgi apparatus can enhance its ability to inhibit fucosylation of the antibodies and enhance ADCC.60 Coexpressing GnT-III and mannosidase II (ManII) in concert was found to increase the number of complex-type oligosaccharides relative to the expression of GnT-III alone.60

Figure 10.7 Schematic of an antibody with glycans attached to its fragment crystallizable (Fc) region near the carboxyl terminus of the heavy chain. Labels in the figure: Fv region, Fc region.

10.3.2 Glycoengineering in Insects

The difference in N-glycans between insect and mammalian cells results from the lack of enzymes involved in creating sugars and adding these sugars to the N-glycan structure.52 Research has indicated that insect cells may express these enzymes early in development but that cell lines seem to have insufficient expression. To manipulate the glycosylation pathway in insect cells, researchers have had to introduce enzymes covering many steps that are found in the mammalian pathway. First, researchers must inhibit or minimize the effect of β-N-acetylglucosaminidase (GlcNAcase), which removes the terminal GlcNAc on the Manα(1,3)-branch that is present on mammalian glycoproteins.23 If a cell line lacking GlcNAcase, or an inhibitor of this enzyme, cannot be found, other options include the overexpression of N-acetylglucosamine transferase I (GnT-I) or galactose transferase (GalT).23 This approach will either compete against the GlcNAcase (GnT-I) or cap the GlcNAc residue (GalT) in order to prevent its removal. Indeed, expressing mammalian GalT genes in insect cells has been shown to increase the level of galactosylated glycans from 0% to 13% and provide a simultaneous rise from 6.5% to 15.7% in N-glycans having a terminal GlcNAc on the Manα(1,3)-branch.23,61 In this way, overexpressing GalT both suppressed GlcNAcase activity and enhanced the number of galactosylated N-glycans produced by insect cells. Unfortunately, the galactose residues were present on only one of the biantennary N-glycan branches (the Manα(1,3)-branch), and there were only a few GlcNAc additions to the alternate Manα(1,6)-branch of the N-glycan.
In order to obtain Gal additions on both branches, it was necessary to coexpress both GalT and N-acetylglucosamine transferase II (GnT-II).23 One study showed that expression of GnT-II alone in Tn-5B1-4 cells created only 0.7% of biantennary N-glycans on a heterologous transferrin, while the coexpression of GalT and GnT-II resulted in heterologous transferrin with 50% of its N-glycans containing Gal on both branches.23,52 In order to create N-glycans with more than two branches on therapeutic glycoproteins such as EPO and t-PA, the mammalian genes for GnT-III, IV, and V will also need to be genetically engineered into insect cells.23

The terminal step of the mammalian glycosylation processing pathway is typically the addition of sialic acid onto both branches of the N-glycan, a step that is not replicated in insect cells (Figure 10.6). The attachment of sialic acid plays a significant role in the functional activity and in vivo circulatory half-life of recombinant glycoproteins.62 For the addition of this molecule, cytidine monophospho-sialic acid (CMP-SA) is generated and the sialic acid is transferred to an oligosaccharide by the enzyme sialyltransferase (SiaT). Insect cells typically lack or have very low levels of CMP-Neu5Ac, the single most ubiquitous sialylation substrate. Consequently, researchers have been working to engineer changes in the glycosylation processing pathway in insect cells so that they can generate the proper CMP-sialic acid molecule for sialylation. Initially, researchers engineered the enzymes required to generate CMP-sialic acid from the intermediate, N-acetylmannosamine (ManNAc), by expressing sialic acid phosphate synthase (SAS) and CMP-sialic acid synthase (CMP-SAS) (Figure 10.8). When the researchers expressed both CMP-SAS and SAS and included a large amount of N-acetylmannosamine, the levels of CMP-Neu5Ac increased by 30-fold.62 In subsequent studies, researchers further engineered the sialic acid processing pathway of insects.
The researchers expressed SAS, UDP-GlcNAc 2-epimerase/ManNAc kinase (EK), the enzyme that converts UDP-GlcNAc to ManNAc-6-phosphate, and CMP-SAS and found a significant increase in CMP-Neu5Ac without the need for ManNAc supplementation of the medium.63 Furthermore, by mutating the EK gene to eliminate feedback inhibition of the enzyme by CMP-sialic acid, the researchers were able to increase CMP-sialic acid levels fourfold.64 Researchers also found that supplying GlcNAc enhanced levels of CMP-Neu5Ac even more than that which could be obtained by feeding ManNAc alone. These results suggested a limitation in the transport and phosphorylation of ManNAc in the cells.63 Such a strategy of generating CMP-Neu5Ac by metabolic engineering eliminated the need to add serum, which includes sialic acid, to the insect cell culture medium. Unfortunately, some insect cell lines also add a potentially immunogenic α(1,3)-fucose to some N-glycans through the expression of the α(1,3)-fucosyltransferase (α3FucT) gene, implying that future metabolic engineering strategies may need to restrict the expression of this gene. Alternatively, researchers have also utilized particular cell lines, such as Sf-9, Ea4, and gypsy moth cells, that have low native levels of α3FucT.

Figure 10.8 Biochemical pathway for the synthesis of CMP-N-acetylneuraminic acid (CMP-Neu5Ac), also known as CMP-sialic acid, from UDP-N-acetylglucosamine (UDP-GlcNAc). Species labeled in the figure: UDP-GlcNAc, ManNAc-6-P, Neu5Ac-9-P, Neu5Ac, and CMP-Neu5Ac; additional labels: PEP, Pi, PPi, CMP.
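The engineering logic in this section, namely which enzymes must be introduced depending on the precursor that is available, can be sketched as a walk along the pathway of Figure 10.8. The Python below is an illustrative sketch only. The table `PATHWAY` follows the intermediates named in the figure, but the two steps marked as endogenous (phosphorylation of fed ManNAc and dephosphorylation of Neu5Ac-9-P) are assumptions made for the example, since the text does not name the enzymes responsible for them in insect cells.

```python
# (substrate, product, enzyme, needs_to_be_engineered)
PATHWAY = [
    ("UDP-GlcNAc", "ManNAc-6-P", "EK", True),
    # Assumption: fed ManNAc is phosphorylated by a host kinase.
    ("ManNAc", "ManNAc-6-P", "endogenous kinase (assumed)", False),
    ("ManNAc-6-P", "Neu5Ac-9-P", "SAS", True),
    # Assumption: a host phosphatase removes the 9-phosphate.
    ("Neu5Ac-9-P", "Neu5Ac", "endogenous phosphatase (assumed)", False),
    ("Neu5Ac", "CMP-Neu5Ac", "CMP-SAS", True),
]


def enzymes_to_express(start, target="CMP-Neu5Ac"):
    """List the recombinant enzymes needed to reach target from start."""
    next_step = {s: (p, e, eng) for s, p, e, eng in PATHWAY}
    needed, current = [], start
    while current != target:
        if current not in next_step:
            raise ValueError(f"no route from {current!r} to {target!r}")
        current, enzyme, engineered = next_step[current]
        if engineered:
            needed.append(enzyme)
    return needed


print(enzymes_to_express("ManNAc"))      # ['SAS', 'CMP-SAS']
print(enzymes_to_express("UDP-GlcNAc"))  # ['EK', 'SAS', 'CMP-SAS']
```

The two calls mirror the two strategies described above: feeding ManNAc with SAS and CMP-SAS expressed, or adding EK so that the cell's own UDP-GlcNAc serves as the starting point and no ManNAc supplementation is required.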

10.3.3 Glycoengineering in Plants

The use of plant cells for producing heterologous glycoproteins is now beginning to be explored. As with other nonmammalian expression systems, glycosylation engineering in plants is imperative because plants add fucose and xylose sugars that can trigger an immune response in humans53 (Figure 10.6). To alleviate this problem, researchers have identified the genes responsible for enzymes that attach xylose and fucose to glycoproteins, and have knocked them out. In one study, both genes were eliminated from a strain of Arabidopsis thaliana, resulting in synthesized glycoproteins that had no β(1,2)-xylosylation and no core α(1,3)-fucosylation.23,65 Similar glycoengineering results were obtained when the genes were knocked out of Physcomitrella patens.23,66 Another strategy for restricting the addition of xylose and fucose is to prevent glycoproteins from entering the Golgi apparatus, where these glycosyltransferases are enzymatically active. In order to retain glycoproteins in the ER, researchers attached a KDEL sequence to the glycoprotein and observed that no immunogenic fucoses were attached to the N-glycan of a monoclonal antibody fragment retained in the ER.23,67 Researchers have also explored the possibility of overexpressing GalT to enhance galactosylation on plant N-glycans.
In one study, tobacco plants were engineered to express a hybrid GalT that was made of the N-terminal domain of Arabidopsis core XylT and the catalytic domain of human β4GalT.23,68 The expression of this recombinant gene caused a 60% decrease in the level of glycoproteins with xylose and fucose residues, but the structures that were obtained included Gal on only one branch of these so-called hybrid N-glycans; no di-galactosylated structures were obtained with Gal attached to both branches of the N-glycan.23,68 Another study knocked out the XylT and FucT genes in the moss Physcomitrella patens and coexpressed the β4GalT gene to secrete recombinant glycoproteins lacking the immunogenic Xyl and Fuc attachments while including attached Gal residues.23,69

In order to generate fully humanized glycoproteins, plants must also be able to add the terminal sialic acid sugar onto recombinant glycoproteins. Unfortunately, plants have few, if any, of the many enzymes responsible for sialylation. However, researchers have begun to address this roadblock by engineering plants to express mammalian SiaT.23 These studies must also address the inability of most plants to produce CMP-SA, a substrate limitation similar to that observed in insect cell lines.23,70

10.3.4 Glycoengineering in Yeast

Researchers have also used yeast as a lower cost method for generating recombinant glycoproteins. However, natural yeast oligosaccharides do not resemble those of humans, and therefore could trigger an immune reaction in patients53 (Figure 10.6). Consequently, researchers have engineered yeast to express recombinant enzymes needed for glycosylation, such as glycosyltransferases, and restricted the yeast’s own undesirable glycosylation processing enzymes.53 Another advantage of this system is that yeast can potentially be engineered to generate homogeneously N-glycosylated proteins similar to those found in humans.53 Such control of glycosylation may have a positive effect on therapeutic activity: one study demonstrated that an engineered yeast version of a commercial antibody was ten times more efficient in attaching to the Fc target receptor in receptor binding assays than the commercial drug.71 The success of this study suggests that other glycoforms found in humans may be produced in yeast cells in the future.

Unlike mammals, yeast add significant numbers of mannose residues onto glycoproteins. In order for yeast to synthesize humanlike glycoproteins, polymannosylation must be restricted, and the yeast must be engineered to create the Man5GlcNAc2 acceptor found in humans.23 Researchers have been able to create N-glycans with Man5GlcNAc2 in S. cerevisiae by inhibiting the functions of some mannosyltransferase genes and, alternatively, by retaining A. saitoi α(1,2)-mannosidase in the ER using the ER retention signal HDEL23,72 in order to increase mannosidase activity. However, the production levels of the target Man5GlcNAc2 N-glycan structure were low. To improve these results, researchers examined combinatorial libraries of the α(1,2)-mannosidase gene fused to fungal membrane leaders in order to increase processing of N-glycans to include more than 90% of the desired precursor N-glycan, Man5GlcNAc2.
The inclusion of a UDP-GlcNAc transporter and a GnT-I fused to a membrane leader resulted in a strain of P. pastoris that could create large amounts of the hybrid N-glycan, GlcNAcMan5GlcNAc2.23,73 Subsequent genetic engineering removed an additional two mannose residues using α-mannosidase II, and coexpression of N-acetylglucosamine transferase II (GnT-II) led to a P. pastoris mutant that produced the complex-type N-glycan GlcNAc2Man3GlcNAc2 with GlcNAc sugars on both branches.23,74

Another strategy developed by researchers for creating Man5GlcNAc2 is to inhibit the formation of Glc3Man9GlcNAc2-PP-Dolichol, which would be modified to the high mannose forms in the yeast glycosylation pathway. Researchers discovered S. cerevisiae and P. pastoris mutants missing the mannosyltransferase Alg3p that produced Man5GlcNAc2-PP-Dol,23,75 and these mutants subsequently add only shortened Man5GlcNAc2 structures to the acceptor polypeptide.23,76,77 These modified N-glycans will not allow the subsequent modification of the N-glycan to generate undesirable high mannose glycoforms. Consequently, endoH-resistant sugar structures can be generated in this mutant23,78,79 by expressing α(1,2)-mannosidase23 in order to generate the trimannosyl Man3GlcNAc2 core structure.

Researchers have also explored how to obtain effective galactosylation in yeast, which involves the transfer of a Gal sugar onto the acceptor GlcNAc by the GalT enzyme. Unlike the efficient transfer of galactose in insect cells after expressing recombinant mammalian GalT,23,61,80–83 yeast cells were not able to efficiently galactosylate GlcNAc residues, signifying that UDP-galactose or its transporter is lacking in yeast.23 Researchers overcame this limitation by introducing a fusion protein of UDP-galactose 4-epimerase (which transforms UDP-Glc to UDP-Gal), the catalytic domain of human β4GalT I, and a yeast leader sequence into the Golgi apparatus of an engineered P. pastoris strain that synthesized GlcNAc2Man3GlcNAc2.23,84 Since P. pastoris includes significant levels of UDP-Glc sugar in the Golgi, the inclusion of this fusion protein resulted in the production of glycoproteins with galactose at the ends of both branches.23,85 Alternatively, production of monoclonal antibodies containing the fully galactosylated structure, Gal2GlcNAc2Man3GlcNAc2, was also achieved by expressing the UDP-Gal 4-epimerase in the cytosol along with a UDP-Gal transporter and GalT in the Golgi apparatus of a similarly engineered P. pastoris strain.23,71 Another study demonstrated that the presence of both GalT and a UDP-Gal transporter produced high levels of galactosylated N-glycans in S. cerevisiae.23,86

Like insects and plants, yeast are unable to create sialylated glycoproteins because they are unable to produce CMP-sialic acid. They also cannot transport CMP-sialic acid from the cytoplasm into the Golgi apparatus, and they do not have the enzyme necessary to transfer sialic acid onto the last galactose residue of an acceptor N-glycan.23 Researchers recently were able to overcome these obstacles and produce recombinant EPO containing N-glycans terminating in sialic acid on both branches by introducing human SAS, human CSAS, mouse CMP–sialic acid transporter, and mouse SiaT into the engineered P. pastoris strain producing the proper Gal2GlcNAc2Man3GlcNAc2 acceptor.23,87 These results and all the studies listed above illustrate convincingly the potential of metabolic engineering for remodeling the glycan structures in yeast, insect cells, and plants to forms that are similar to those obtained from mammalian hosts.
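The stepwise humanization of the yeast N-glycan described in this section can be followed as bookkeeping on residue counts, starting from the Man5GlcNAc2 acceptor. The Python sketch below applies each engineered activity in the order given in the text. The names `HUMANIZATION_STEPS` and `remodel` are hypothetical, and the model tracks composition only: it ignores branch positions, transporters, and nucleotide-sugar supply, all of which the text identifies as real constraints.

```python
# Each step: (engineered activity, change in residue counts), in text order.
HUMANIZATION_STEPS = [
    ("GnT-I", {"GlcNAc": +1}),           # -> GlcNAcMan5GlcNAc2 (hybrid)
    ("alpha-mannosidase II", {"Man": -2}),
    ("GnT-II", {"GlcNAc": +1}),          # -> GlcNAc2Man3GlcNAc2 (complex)
    ("GalT", {"Gal": +2}),               # -> Gal2GlcNAc2Man3GlcNAc2
    ("SiaT", {"Neu5Ac": +2}),            # sialic acid on both branches
]


def remodel(start, steps=HUMANIZATION_STEPS):
    """Apply each step in order; return [(activity, composition), ...]."""
    glycan = dict(start)
    history = []
    for activity, delta in steps:
        for residue, change in delta.items():
            count = glycan.get(residue, 0) + change
            if count < 0:
                raise ValueError(f"{activity}: not enough {residue} to remove")
            glycan[residue] = count
        history.append((activity, dict(glycan)))
    return history


# Man5GlcNAc2 acceptor as the starting composition.
for activity, comp in remodel({"Man": 5, "GlcNAc": 2}):
    print(f"{activity:>22}: {comp}")
```

Reordering or omitting steps in `HUMANIZATION_STEPS` gives a quick way to see which intermediate composition a partially engineered strain would be expected to reach under this simplified accounting.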

10.4 γ-Carboxylation

Vitamin K-dependent (VKD) proteins require an unusual post-translational modification for their biological activity, namely γ-carboxylation. This modification consists of the addition of an extra carboxyl group onto the γ-carbon of specific glutamyl (Glu) residues, converting them to γ-carboxylated glutamyl (Gla) residues (Figure 10.9a). The carboxylation confers metal binding properties onto these proteins. In the presence of calcium ions, mature VKD proteins undergo a conformational change that enables them to interact with anionic phospholipids exposed on the surfaces of cells or with hydroxyapatite present in the extracellular matrix. Until the 1980s, this specialized post-translational modification was considered significant only for hemostatic proteins (prothrombin [factor II]; factors VII, IX, and X; and proteins C, S, and Z). However, other VKD proteins are now known to play a role in growth control, bone development, and regulation of biomineralization and signal transduction.88,89

The enzyme responsible for the conversion of Glu to Gla residues is the γ-glutamyl carboxylase, also known as VKD-carboxylase or GGCX. This enzyme is an integral membrane enzyme that resides in the ER and requires the energy of the reduced hydroquinone form of vitamin K (KH2) as the active cofactor.90 Generally, mammalian VKD proteins contain a highly conserved propeptide sequence located after the signal peptide.91 This propeptide acts as a recognition sequence for the γ-carboxylase to bind and perform γ-carboxylation.92 The cluster of Glu residues to be modified is found in a region called the Gla domain, which is located adjacent to the propeptide region. VKD proteins are carboxylated during their passage through the secretory pathway to the cell surface.
Subsequent to γ-carboxylation, the propeptide is cleaved from the VKD proteins in a late Golgi event.92 To date, γ-carboxylase activity has only been detected in multicellular organisms, i.e., mammals, Drosophila, and the marine snail Conus. Homology searches have also revealed carboxylase orthologs in bacteria, but whether they have activity or not is unknown.90 The enzymatic reaction in the Conus system has been shown to have many striking similarities to mammalian systems, such as the requirement for reduced vitamin K, the presence of a γ-carboxylation recognition site on the propeptide, and the presence of a propeptide adjacent to the Glu residues undergoing carboxylation.93 Despite these similarities, the mammalian and Conus propeptides are not homologous.94 In addition, substrates from one organism are not or are poorly carboxylated by γ-carboxylases from another phyla.93,95–99 Drosophila VKD proteins have not been isolated yet, so the role of γ-carboxylation in this organism is unknown. Many of the recombinant VKD (r-VKD) proteins have been expressed in mammalian cells. However, the production and secretion of r-VKD proteins is complicated by the limited γ-carboxylation capacity observed in mammalian cell systems. At low levels of VKD protein production, the protein is secreted as fully γ-carboxylated and biological active proteins. However, when expression levels are increased, γ-carboxylation is impaired. Consequently, only a small fraction of fully γ-carboxylated proteins is obtained, while uncarboxylated and undercarboxylated forms are observed as well.90 Individual VKD proteins respond differently to this overexpression—secreted factor X is a mixture of uncarboxylated

Figure 10.9 The γ-carboxylation system. (a) The γ-carboxylase converts Glu to Gla residues in VKD proteins by the addition of CO2 to Glu in a reaction requiring O2 and reduced vitamin K (KH2). This reaction results in oxidation of the KH2 to KO. The VKOR enzyme recycles KO back to KH2. There is also a second route for KH2 generation, via one or more quinone reductases (QR) that convert vitamin K to KH2. This cyclic interconversion of vitamin K metabolites constitutes the vitamin K cycle. (b) Hypothetical model of the γ-carboxylation system. γ-Carboxylase and VKOR are parts of a supramolecular assembly of proteins responsible for γ-carboxylation in the ER membrane. VKD proteins contain propeptides (indicated in black) that can bind to the γ-carboxylase, resulting in the conversion of multiple Glu residues to Gla residues. Each Glu to Gla conversion requires one reduced vitamin K (KH2), which is recycled from KO by VKOR.

and fully carboxylated forms, but not partially γ-carboxylated proteins, while secreted factor IX comprises all three forms.100,101 To improve γ-carboxylation in these systems, researchers overexpressed γ-carboxylase in CHO, BHK, and HEK-293 cell lines expressing recombinant factor IX (r-fIX). However, this overexpression reduced the release of r-fIX from these cells and did not improve the specific activity of r-fIX, which suggests that the γ-carboxylase is not the limiting factor.102–104 Researchers have shown that although r-fIX was fully γ-carboxylated, 95% of the r-fIX remained associated with the γ-carboxylase.103 They also suggested that the availability of the reduced vitamin K cofactor limits γ-carboxylation in mammalian cells.103

The VKD γ-carboxylation system does not consist of only γ-carboxylase; rather, it is a multicomponent enzyme system in the ER. The modification is carried out by (i) the γ-carboxylase, which requires the vitamin KH2 cofactor, CO2, and O2, and (ii) the vitamin K epoxide reductase (VKOR), which produces the KH2 cofactor. Concomitantly with γ-carboxylation, vitamin KH2 is converted to the metabolite vitamin K epoxide (KO), which is reduced back to the vitamin KH2 cofactor by VKOR. Thus, VKOR recycles the vitamin KH2 cofactor, and this interconversion of vitamin K metabolites is known as the vitamin K cycle.88,89,105 It is also known that lipids from the ER membrane are essential for the operation of this
system, probably by protecting vitamin KH2 from the oxidative environment of the ER compartment.106 Thus, VKOR and γ-carboxylase must be positioned closely enough together in the membrane to allow transfer of the vitamin KH2 cofactor from VKOR to γ-carboxylase (Figure 10.9). One study has shown, in an in vitro preparation, that VKOR, and not the γ-carboxylase, catalyzes the rate-limiting step of the γ-carboxylation system.107 Confirmation of these results in vivo has been hampered because the components of the VKOR lipid–enzyme complex have not been fully identified, which impairs its expression in cell lines. Recently, the gene that encodes a proposed subunit (VKORC1) of the VKOR complex was identified.108,109 Some researchers have been able to clone and express VKORC1 in a BHK cell line.110 These stably transfected cells, containing either the VKORC1 gene alone or both the γ-carboxylase and VKORC1 genes, showed increased activity of the γ-carboxylation system toward the synthetic γ-carboxylase peptide substrate FLEEL. The expression of r-fIX in these engineered BHK cell lines showed that overexpression of VKORC1 increases the level of γ-carboxylated r-fIX secreted into the medium.104 Overexpression of VKORC1 alone and of VKORC1 together with γ-carboxylase resulted in 50% and 34% γ-carboxylated r-fIX in the medium, respectively, significantly higher than the 18% found for the parent BHK cells.104 The researchers suggest that overexpression of γ-carboxylase is not needed for increased production of r-VKD proteins; rather, what is needed is a sufficient supply of its cofactor.104 Although engineered BHK cell lines have demonstrated an increase in γ-carboxylation, the specific activity of r-fIX did not change significantly between the engineered and parent cell lines.
In a similar study, other researchers also reported a 2.2-fold increase in γ-carboxylated r-fIX secreted from BHK cells overexpressing VKORC1,101 which is in agreement with the 2.9-fold increase reported by the first study.104 The second study also measured the overexpression of VKORC1 and reported it to be 15-fold.101 Thus, the increases in γ-carboxylation were significantly smaller than the degree of VKORC1 overexpression. These data indicate that the effect of VKORC1 is limited, possibly due to saturation of at least one additional factor that is also required for VKD protein carboxylation. Since VKOR is oxidized and inactivated during each conversion of KO to KH2, and the active enzyme is regenerated by a redox protein, the additional component could be a redox protein. Researchers have also demonstrated that calumenin, an ER chaperone of the CREC family, is an inhibitory protein of γ-carboxylase and is associated with γ-carboxylase in the ER.111 Recently, these researchers used siRNA silencing to eliminate calumenin expression in BHK cell lines expressing r-fIX alone or both r-fIX and VKORC1.
The production yield of functional r-fIX was 80%, in contrast to 18% and 50% for BHK cell lines expressing r-fIX only and BHK cell lines coexpressing r-fIX and VKORC1, respectively.112 Nevertheless, it is not known whether cell lines engineered to stably express calumenin siRNA would be viable, since gene deletions of members of the CREC family of proteins have been shown to be lethal.113

Although mammalian VKD propeptides share sequence homology, as mentioned previously, their relative affinities for the γ-carboxylase vary considerably, with the propeptide of factor X (fX) binding with the highest affinity, followed by the propeptides of fVII, protein S, fIX, protein C, and prothrombin.114 One study constructed a chimeric fX cDNA harboring the prothrombin propeptide and stably transfected it into HEK-293 cells.115 The authors demonstrated that expression of r-fX with the prothrombin propeptide could yield 70–90% fully γ-carboxylated material, compared to 20–45% with the native r-fX construct, depending on the expression level of the recombinant protein.115 They hypothesized that the rate of substrate turnover by the γ-carboxylase may be influenced by the affinity of the propeptide, with high-affinity propeptides giving the lowest substrate turnover.115 These results agree with observations that expression of r-prothrombin and r-protein C results in fully γ-carboxylated proteins in CHO and HEK-293 cells, respectively.116–121 In a later study, this group also overexpressed the γ-carboxylase alone, VKORC1 alone, and both in combination in a HEK-293 cell line expressing the chimeric r-fX.100 In the cell line expressing only the chimeric r-fX, 52% of the protein was fully γ-carboxylated.
Upon coexpression with γ-carboxylase alone, VKORC1 alone, and γ-carboxylase together with VKORC1, the fraction of fully γ-carboxylated r-fX increased to 57%, 92%, and 100%, respectively.100 They also observed that overexpression of the γ-carboxylase alone did not affect the release of r-fX from cells, in contrast to
reports for r-fIX mentioned above. These results support the idea that the mechanism for processing each VKD protein may differ slightly, resulting in different yields upon overexpression of either γ-carboxylase or VKORC1.

These studies have shown that γ-carboxylation can be a limiting step in the production of several heterologous proteins. However, the particular bottleneck may lie either in the enzymatic activity step or in the turnover of the cofactors. A complete understanding of the γ-carboxylation system at the molecular level and the identification of all components involved in the process will be essential for the optimal synthesis of competent r-VKD proteins in mammalian cell lines.
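
The yields quoted in this section can be put on a common scale by expressing each as a fold change over its parental line. The short Python sketch below does only that arithmetic. The dictionary layout, the condition labels, and the function name `fold_over_baseline` are illustrative assumptions; the percentages are the ones quoted above for r-fIX in BHK cells104 and for chimeric r-fX in HEK-293 cells.100

```python
# Percentage of carboxylated product as quoted in the text above.
# The labels and data layout are illustrative, not taken from the cited studies.
REPORTED_YIELDS = {
    "r-fIX in BHK": {
        "parent": 18.0,
        "VKORC1": 50.0,
        "VKORC1 + carboxylase": 34.0,
    },
    "chimeric r-fX in HEK-293": {
        "parent": 52.0,
        "carboxylase": 57.0,
        "VKORC1": 92.0,
        "VKORC1 + carboxylase": 100.0,
    },
}


def fold_over_baseline(yields, baseline="parent"):
    """Return each condition's yield divided by the baseline yield."""
    base = yields[baseline]
    if base <= 0:
        raise ValueError("baseline yield must be positive")
    return {
        name: value / base
        for name, value in yields.items()
        if name != baseline
    }


if __name__ == "__main__":
    for system, yields in REPORTED_YIELDS.items():
        for condition, fold in fold_over_baseline(yields).items():
            print(f"{system:26s} {condition:22s} {fold:.2f}-fold")
```

By this arithmetic the VKORC1-only r-fIX line comes out at roughly 2.8-fold over the parent, in the same range as the 2.2-fold and 2.9-fold increases cited above and far below the 15-fold overexpression of VKORC1 itself.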

10.5 Furin Cleavage

Like γ-carboxylation, processing by furin and its related proprotein convertases plays an important role in the secretory pathway, in this case in the trans-Golgi network, by transforming many different protein precursors into their active forms.122 These enzymes typically cleave at the consensus basic amino acid sequence Arg-X-(Lys/Arg)-Arg.122 One limitation encountered when engineering mammalian cells to produce high levels of recombinant proteins is the low native level of furin available in the cells.123 The low level of furin creates a bottleneck that limits the amount of active protein secreted from the engineered cell line.123 Researchers have attempted to overcome this bottleneck by overexpressing furin in CHO cells in combination with the production of recombinant proteins such as vWF and transforming growth factor beta 1 (TGFβ1).123 When furin is coexpressed in animal cell lines, active and mature forms of both recombinant proteins are found at higher levels, demonstrating the positive effect of furin overexpression on protein production.
Researchers also investigated the possibility of using furin to transform recombinant prosomatostatin (rPSS) into the end product somatostatin-28 (SS-28) in COS-7 cells.124 In this study, engineered COS-7 cell lines expressing rPSS had varying levels of furin.124 When the level of furin doubled, there was a corresponding rise in SS-28 expression and a corresponding reduction in the amount of intact rPSS.124

The intracellular location and level of furin appear to be important factors in the processing of proproteins.125 One study found that α-1-microglobulin-bikunin was processed equally in both wild-type CHO cells and RPE.40 cells, a mutated CHO cell line which likely has a nonfunctional furin, suggesting that furin did not cleave the target protein.125 However, when researchers overexpressed the processing enzyme and α-1-microglobulin-bikunin in COS, CHO, and RPE.40 cells, processing of the target protein was enhanced, indicating that both the cellular compartment and the level of intracellular protease are important for enhancing the cleavage of precursors.125

Researchers have also evaluated the effects of furin expression on precursor processing in insect cells. One study involved production of simian-human immunodeficiency virus-like particles (SHIV VLPs) in baculovirus-infected insect cells. These cells produced VLPs containing primarily the precursor envelope (Env) protein, gp160.126 However, when researchers coexpressed recombinant furin in the insect cell lines, there was an increase in the cleavage of gp160 into the processed gp120 Env protein.126 Researchers also attempted to increase the production level of TGFβ1 in insect cells by overexpressing furin.
To do so, the precursor form of the growth factor and human furin convertase were both expressed in High Five insect cells.127 This coexpression resulted in a nearly eightfold increase in the production of active growth factor compared to its level in the absence of furin coexpression.127 Thus, this study and the similar ones described above demonstrate that processing by furin and other convertases can represent a critical rate-limiting step in the production of complex proteins that require proteolytic processing for proper activity.
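
The consensus site Arg-X-(Lys/Arg)-Arg cited at the start of this section can be written in one-letter code as R-X-[K/R]-R. As a rough illustration, the Python sketch below scans a protein sequence for that pattern with a regular expression. The function name `find_furin_sites` and the example sequence are hypothetical, and the reported cut position assumes that cleavage occurs immediately after the final Arg of the motif. A sequence match alone does not establish that furin cleaves there, since, as the studies above indicate, cellular compartment and protease level also matter.

```python
import re

# Arg-X-(Lys/Arg)-Arg in one-letter code. The lookahead makes each match
# zero-width, so overlapping candidate sites are all reported.
FURIN_CONSENSUS = re.compile(r"(?=(R[A-Z][KR]R))")


def find_furin_sites(sequence):
    """Return (cut_position, motif) pairs for every consensus match.

    cut_position is the number of residues on the N-terminal side of the
    putative scissile bond, assuming cleavage just after the final Arg.
    """
    seq = sequence.upper()
    return [
        (match.start() + 4, match.group(1))
        for match in FURIN_CONSENSUS.finditer(seq)
    ]


if __name__ == "__main__":
    # Hypothetical precursor sequence, used only to exercise the function.
    example = "MKTAYRSKRDLGAARAKRRVE"
    print(find_furin_sites(example))  # [(9, 'RSKR'), (18, 'RAKR')]
```

A lookahead is used rather than a plain pattern so that runs of basic residues, which can contain several overlapping candidate sites, are not under-counted.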

10.6 Conclusion

In this chapter, we have discussed studies that have provided insight into a range of post-translational modifications involved in the secretory pathway. These discoveries have led to improved efficiencies in recombinant protein processing and in the production of heterologous proteins in mammalian
and nonmammalian hosts. As genomic and cell biology studies increase our understanding of the secretory pathway of eukaryotes, many more metabolic engineering enhancements are likely in the future in order to improve the processing capacities of these hosts and to augment their pathways with new capabilities not possible in the native organisms.

References 1. Robinson, A.S., Hines, V., and Wittrup, K.D. Protein disulfide isomerase overexpression increases secretion of foreign proteins in Saccharomyces cerevisiae. Biotechnology (NY), 12, 381, 1994. 2. Xu, P., et al. Analysis of unfolded protein response during single-chain antibody expression in Saccaromyces cerevisiae reveals different roles for BiP and PDI in folding. Metab. Eng., 7, 269, 2005. 3. Zhang, W., et al. Enhanced secretion of heterologous proteins in Pichia pastoris following overexpression of Saccharomyces cerevisiae chaperone proteins. Biotechnol. Prog., 22, 1090, 2006. 4. Wegele, H., Haslbeck, M., and Buchner, J. Recombinant expression and purification of Ssa1p (Hsp70) from Saccharomyces cerevisiae using Pichia pastoris. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci., 786, 109, 2003. 5. Ngosuwan, J., et al. Roles of cytosolic Hsp70 and Hsp40 molecular chaperones in post-translational translocation of presecretory proteins into the endoplasmic reticulum. J. Biol. Chem., 278, 7034, 2003. 6. Gasser, B., et al. Engineering of Pichia pastoris for improved production of antibody fragments. Biotechnol. Bioeng., 94, 353, 2006. 7. Inan, M., et al. Enhancement of protein secretion in Pichia pastoris by overexpression of protein disulfide isomerase. Biotechnol. Bioeng., 93, 771, 2006. 8. Hsu, T.A., et al. Rescue of immunoglobulins from insolubility is facilitated by PDI in the baculovirus expression system. Protein Exp. Purif., 7, 281, 1996. 9. Whiteley, E.M., Hsu, T.A., and Betenbaugh, M.J. Thioredoxin domain non-equivalence and antichaperone activity of protein disulfide isomerase mutants in vivo. J. Biol. Chem., 272, 22556, 1997. 10. Borth, N., et al. Effect of increased expression of protein disulfide isomerase and heavy chain binding protein on antibody secretion in a recombinant CHO cell line. Biotechnol. Prog., 21, 106, 2005. 11. Shusta, E.V., et al. 
Increasing the secretory capacity of Saccharomyces cerevisiae for production of single-chain antibody fragments. Nat. Biotechnol., 16, 773, 1998. 12. Robinson, A.S., et al., Reduction of BiP levels decreases heterologous protein secretion in Saccharomyces cerevisiae. J. Biol. Chem., 271, 10017, 1996. 13. Dorner, A.J., Wasley, L.C., and Kaufman, R.J. Overexpression of GRP78 mitigates stress induction of glucose regulated proteins and blocks secretion of selective proteins in Chinese hamster ovary cells. EMBO J., 11, 1563, 1992. 14. Dorner, A.J., Krane, M.G., and Kaufman, R.J. Reduction of endogenous GRP78 levels improves secretion of a heterologous protein in CHO cells. Mol. Cell. Biol., 8, 4063, 1988. 15. Smales, C.M., et al. Comparative proteomic analysis of GS-NS0 murine myeloma cell lines with varying recombinant monoclonal antibody production rate. Biotechnol. Bioeng., 88, 474, 2004. 16. Alete, D.E., et al. Proteomic analysis of enriched microsomal fractions from GS-NS0 murine myeloma cells with varying secreted recombinant monoclonal antibody productivities. Proteomics, 5, 4689, 2005. 17. Tigges, M. and Fussenegger, M. Xbp1-based engineering of secretory capacity enhances the productivity of Chinese hamster ovary cells. Metab. Eng., 2006. 18. Hsu, T.A., Eiden, J.J., and Betenbaugh, M.J. Engineering the assembly pathway of the baculovirusinsect cell expression system. Ann. NY Acad. Sci., 721, 208, 1994. 19. Ailor, E. and Betenbaugh, M.J. Modifying secretion and post-translational processing in insect cells. Curr. Opin. Biotechnol., 10, 142, 1999.

20. Hsu, T.A. and Betenbaugh, M.J. Coexpression of molecular chaperone BiP improves immunoglobulin solubility and IgG secretion from Trichoplusia ni insect cells. Biotechnol. Prog., 13, 96, 1997. 21. Whiteley, E., Hsu, T.A., and Betenbaugh, M.J. Modeling assembly, aggregation, and chaperoning of immunoglobulin G production in insect cells. Biotechnol. Bioeng., 56, 106, 1997. 22. Hebert, D.N., Foellmer, B., and Helenius, A. Calnexin and calreticulin promote folding, delay oligomerization and suppress degradation of influenza hemagglutinin in microsomes. EMBO J., 15, 2961, 1996. 23. Betenbaugh, M.J., Tomiya, N., and Narang, S. Glycoengineering: recombinant glycoproteins. In: Comprehensive Glycoscience. Elsevier, Amsterdam, NL, 2007. 24. Huppa, J.B. and Ploegh, H.L. The eS-Sence of -SH in the ER. Cell, 92, 145, 1998. 25. Molinari, M., and Helenius, A. Glycoproteins form mixed disulphides with oxidoreductases during folding in living cells. Nature, 402, 90, 1999. 26. Oliver, J.D., et al. ERp57 functions as a subunit of specific complexes formed with the ER lectins calreticulin and calnexin. Mol. Biol. Cell, 10, 2573, 1999. 27. Oliver, J.D., et al. Interaction of the thiol-dependent reductase ERp57 with nascent glycoproteins. Science, 275, 86, 1997. 28. Helenius, A. and Aebi, M. Intracellular functions of N-linked glycans. Science, 291, 2364, 2001. 29. Ellgaard, L. and Helenius, A. ER quality control: towards an understanding at the molecular level. Curr. Opin. Cell Biol., 13, 431, 2001. 30. Ellgaard, L. and Helenius, A. Quality control in the endoplasmic reticulum. Nat. Rev. Mol. Cell Biol., 4, 181, 2003. 31. Parodi, A.J. Role of N-oligosaccharide endoplasmic reticulum processing reactions in glycoprotein folding and degradation. Biochem. J., 348 (1), 1, 2000. 32. Ritter, C. and Helenius, A. Recognition of local glycoprotein misfolding by the ER folding sensor UDP-glucose:glycoprotein glucosyltransferase. Nat. Struct. Biol., 7, 278, 2000. 33. Trombetta, E.S. 
and Helenius, A. Conformational requirements for glycoprotein reglucosylation in the endoplasmic reticulum. J. Cell Biol., 148, 1123, 2000. 34. Hammond, C. and Helenius, A. Quality control in the secretory pathway. Curr. Opin. Cell Biol., 7, 523, 1995. 35. Tsai, B., Ye, Y., and Rapoport, T.A. Retro-translocation of proteins from the endoplasmic reticulum into the cytosol. Nat. Rev. Mol. Cell Biol., 3, 246, 2002. 36. Brodsky, J.L. and McCracken, A.A. ER protein quality control and proteasome-mediated protein degradation. Semin. Cell Dev. Biol., 10, 507, 1999. 37. Jarosch, E., et al. Protein dislocation from the endoplasmic reticulum—pulling out the suspect. Traffic, 3, 530, 2002. 38. Spiro, R.G. Role of N-linked polymannose oligosaccharides in targeting glycoproteins for endoplasmic reticulum-associated degradation. Cell Mol. Life Sci., 61, 1025, 2004. 39. Hosokawa, N., et al. A novel ER alpha-mannosidase-like protein accelerates ER-associated degradation. EMBO Rep., 2, 415, 2001. 40. Mast, S.W., et al. Human EDEM2, a novel homolog of family 47 glycosidases, is involved in ER-associated degradation of glycoproteins. Glycobiology, 15, 421, 2005. 41. Molinari, M., et al. Role of EDEM in the release of misfolded glycoproteins from the calnexin cycle. Science, 299, 1397, 2003. 42. Hosokawa, N., et al. Enhancement of endoplasmic reticulum (ER) degradation of misfolded Null Hong Kong alpha1-antitrypsin by human ER mannosidase I. J. Biol. Chem., 278, 26287, 2003. 43. Fayadat, L., et al. Calnexin and calreticulin binding to human thyroperoxidase is required for its first folding step(s) but is not sufficient to promote efficient cell surface expression. Endocrinology, 141, 959, 2000. 44. Le Fourn, V., et al. Competition between calnexin and BiP in the endoplasmic reticulum can lead to the folding or degradation of human thyroperoxidase. Biochemistry, 45, 7380, 2006.

45. Chung, J.Y., et al. Effect of doxycycline-regulated calnexin and calreticulin expression on specific thrombopoietin productivity of recombinant Chinese hamster ovary cells. Biotechnol. Bioeng., 85, 539, 2004. 46. Tate, C.G., Whiteley, E., and Betenbaugh, M.J. Molecular chaperones stimulate the functional expression of the cocaine-sensitive serotonin transporter. J. Biol. Chem., 274, 17551, 1999. 47. Higgins, M.K., Demir, M., and Tate, C.G. Calnexin co-expression and the use of weaker promoters increase the expression of correctly assembled Shaker potassium channel in insect cells. Biochim. Biophys. Acta, 1610, 124, 2003. 48. Conesa, A., et al. Calnexin overexpression increases manganese peroxidase production in Aspergillus niger. Appl. Environ. Microbiol., 68, 846, 2002. 49. Ailor, E., et al. A bacterial signal peptidase enhances processing of a recombinant single chain antibody fragment in insect cells. Biochem. Biophys. Res. Commun., 255, 444, 1999. 50. Jones, J., Krag, S.S., and Betenbaugh, M.J. Controlling N-linked glycan site occupancy. Biochim. Biophys. Acta, 1726, 121, 2005. 51. Betenbaugh, M.J., et al. Biosynthesis of human-type N-glycans in heterologous systems. Curr. Opin. Struct. Biol., 14, 601, 2004. 52. Tomiya, N., et al. Comparing N-glycan processing in mammalian cell lines to native and engineered lepidopteran insect cell lines. Glycoconj. J., 21, 343, 2004. 53. Borman, S. Glycosylation Engineering: controlling personalities tame wild sugars on proteins and natural products. Chem. Eng. News, 84, 13, 2006. 54. Yamaguchi, K., et al. Effects of site-directed removal of N-glycosylation sites in human erythropoietin on its production and biological properties. J. Biol. Chem., 266, 20434, 1991. 55. Higuchi, M., et al. Role of sugar chains in the expression of the biological activity of human erythropoietin. J. Biol. Chem., 267, 7703, 1992. 56. Elliott, S., et al. Enhancement of therapeutic protein in vivo activities through glycoengineering. Nat. 
Biotechnol., 21, 414, 2003. 57. Mason, A.B., et al. Expression of glycosylated and nonglycosylated human transferrin in mammalian cells. Characterization of the recombinant proteins with comparison to three commercially available transferrins. Biochemistry, 32, 5472, 1993. 58. Eklund, E.A. and Freeze, H.H. The congenital disorders of glycosylation: a multifaceted group of syndromes. NeuroRx, 3, 254, 2006. 59. Wada, Y., et al. Structure of serum transferrin in carbohydrate-deficient glycoprotein syndrome. Biochem. Biophys. Res. Commun., 189, 832, 1992. 60. Ferrara, C., et al. Modulation of therapeutic antibody effector functions by glycosylation engineering: influence of Golgi enzyme localization domain and co-expression of heterologous beta1, 4-N-acetylglucosaminyltransferase III and Golgi alpha-mannosidase II. Biotechnol. Bioeng., 93, 851, 2006. 61. Ailor, E., et al. N-glycan patterns of human transferrin produced in Trichoplusia ni insect cells: effects of mammalian galactosyltransferase. Glycobiology, 10, 837, 2000. 62. Lawrence, S.M., et al. Cloning and expression of human sialic acid pathway genes to generate CMPsialic acids in insect cells. Glycoconj. J., 18, 205, 2001. 63. Viswanathan, K., et al. Engineering sialic acid synthetic ability into insect cells: identifying metabolic bottlenecks and devising strategies to overcome them. Biochemistry, 42, 15215, 2003. 64. Viswanathan, K., et al. Engineering intracellular CMP-sialic acid metabolism into insect cells and methods to enhance its generation. Biochemistry, 44, 7526, 2005. 65. Strasser, R., et al. Generation of Arabidopsis thaliana plants with complex N-glycans lacking beta1, 2-linked xylose and core alpha1,3-linked fucose. FEBS Lett., 561, 132, 2004. 66. Koprivova, A., et al. Targeted knockouts of Physcomitrella lacking plant-specific immunogenic N-glycans. Plant Biotechnol. J., 2, 517, 2004. 67. Ko, K., et al. Function and glycosylation of plant-derived antiviral monoclonal antibody. Proc. Natl. Acad. 
Sci. USA, 100, 8013, 2003.

68. Bakker, H., et al. An antibody produced in tobacco expressing a hybrid beta-1,4-galactosyltransferase is essentially devoid of plant carbohydrate epitopes. Proc. Natl. Acad. Sci. USA, 103, 7577, 2006. 69. Huether, C.M., et al. Glyco-engineering of moss lacking plant-specific sugar residues. Plant Biol. (Stuttg), 7, 292, 2005. 70. Wee, E.G., et al. Targeting of active sialyltransferase to the plant Golgi apparatus. Plant Cell, 10, 1759, 1998. 71. Li, H., et al. Optimization of humanized IgGs in glycoengineered Pichia pastoris. Nat. Biotechnol., 24, 210, 2006. 72. Chiba, Y., et al. Production of human compatible high mannose-type (Man5GlcNAc2) sugar chains in Saccharomyces cerevisiae. J. Biol. Chem., 273, 26298, 1998. 73. Choi, B.K., et al. Use of combinatorial genetic libraries to humanize N-linked glycosylation in the yeast Pichia pastoris. Proc. Natl. Acad. Sci. USA, 100, 5022, 2003. 74. Hamilton, S.R., et al. Production of complex human glycoproteins in yeast. Science, 301, 1244, 2003. 75. Huffaker, T.C. and Robbins, P.W. Yeast mutants deficient in protein glycosylation. Proc. Natl. Acad. Sci. USA, 80, 7466, 1983. 76. Aebi, M., et al. Cloning and characterization of the ALG3 gene of Saccharomyces cerevisiae. Glycobiology, 6, 439, 1996. 77. Sharma, C.B., Knauer, R., and Lehle, L. Biosynthesis of lipid-linked oligosaccharides in yeast: the ALG3 gene encodes the Dol-P-Man:Man5GlcNAc2-PP-Dol mannosyltransferase. Biol. Chem., 382, 321, 2001. 78. Verostek, M.F., Atkinson, P.H., and Trimble, R.B. Glycoprotein biosynthesis in the alg3 Saccharomyces cerevisiae mutant. I. Role of glucose in the initial glycosylation of invertase in the endoplasmic reticulum. J. Biol. Chem., 268, 12095, 1993. 79. Verostek, M.F., Atkinson, P.H., and Trimble, R.B. Glycoprotein biosynthesis in the alg3 Saccharomyces cerevisiae mutant. II. Structure of novel Man6-10GlcNAc2 processing intermediates on secreted invertase. J. Biol. Chem., 268, 12104, 1993. 80. 
Hollister, J.R., Shaper, J.H., and Jarvis, D.L. Stable expression of mammalian beta 1,4-galactosyltransferase extends the N-glycosylation pathway in insect cells. Glycobiology, 8, 473, 1998. 81. Jarvis, D.L. and Finn, E.E. Modifying the insect cell N-glycosylation pathway with immediate early baculovirus expression vectors. Nat. Biotechnol., 14, 1288, 1996. 82. Wolff, M.W., et al. Electrophoretic analysis of glycoprotein glycans produced by lepidopteran insect cells infected with an immediate early recombinant baculovirus encoding mammalian beta1, 4galactosyltransferase. Glycoconj. J., 16, 753, 1999. 83. Breitbach, K. and Jarvis, D.L. Improved glycosylation of a foreign protein by Tn-5B1-4 cells engineered to express mammalian glycosyltransferases. Biotechnol. Bioeng., 74, 230, 2001. 84. Davidson, R.C., et al. Functional analysis of the ALG3 gene encoding the Dol-P-Man: Man5GlcNAc2PP-Dol mannosyltransferase enzyme of P. pastoris. Glycobiology, 14, 399, 2004. 85. Bobrowicz, P., et al. Engineering of an artificial glycosylation pathway blocked in core oligosaccharide assembly in the yeast Pichia pastoris: production of complex humanized glycoproteins with terminal galactose. Glycobiology, 14, 757, 2004. 86. Kainuma, M., et al. Coexpression of alpha1,2 galactosyltransferase and UDP-galactose transporter efficiently galactosylates N- and O-glycans in Saccharomyces cerevisiae. Glycobiology, 9, 133, 1999. 87. Hamilton, S.R., et al. Humanization of yeast to produce complex terminally sialylated glycoproteins. Science, 313, 1441, 2006. 88. Oldenburg, J., et al. Vitamin K epoxide reductase complex subunit 1 (VKORC1): the key protein of the vitamin K cycle. Antioxid. Redox Signal, 8, 347, 2006. 89. Stafford, D.W. The vitamin K cycle. J. Thromb. Haemost., 3, 1873, 2005. 90. Berkner, K.L. The vitamin K-dependent carboxylase. Ann. Rev. Nutr., 25, 127, 2005.

91. Furie, B. and Furie, B.C. Molecular basis of vitamin K-dependent gamma-carboxylation. Blood, 75, 1753, 1990. 92. Furie, B., Bouchard, B.A., and Furie, B.C. Vitamin K-dependent biosynthesis of gamma-carboxyglutamic acid. Blood, 93, 1798, 1999. 93. Walker, C.S., et al. On a potential global role for vitamin K-dependent gamma-carboxylation in animal systems. Evidence for a gamma-glutamyl carboxylase in Drosophila. J. Biol. Chem., 276, 7769, 2001. 94. Bush, K.A., et al. Hydrophobic amino acids define the carboxylation recognition site in the precursor of the gamma-carboxyglutamic-acid-containing conotoxin epsilon-TxIX from the marine cone snail Conus textile. Biochemistry, 38, 14660, 1999. 95. Stanley, T.B., et al. Identification of a vitamin K-dependent carboxylase in the venom duct of a Conus snail. FEBS Lett., 407, 85, 1997. 96. Bandyopadhyay, P.K., et al. Conantokin-G precursor and its role in gamma-carboxylation by a vitamin K-dependent carboxylase from a Conus snail. J. Biol. Chem., 273, 5447, 1998. 97. Li, T., et al. Identification of a Drosophila vitamin K-dependent gamma-glutamyl carboxylase. J. Biol. Chem., 275, 18291, 2000. 98. Bandyopadhyay, P.K., et al. gamma-Glutamyl carboxylation: an extracellular posttranslational modification that antedates the divergence of molluscs, arthropods, and chordates. Proc. Natl. Acad. Sci. USA, 99, 1264, 2002. 99. Czerwiec, E., et al. Expression and characterization of recombinant vitamin K-dependent gammaglutamyl carboxylase from an invertebrate, Conus textile. Eur. J. Biochem., 269, 6162, 2002. 100. Sun, Y.M., et al. Vitamin K epoxide reductase significantly improves carboxylation in a cell line overexpressing factor X. Blood, 106, 3811, 2005. 101. Hallgren, K.W., et al. r-VKORC1 expression in factor IX BHK cells increases the extent of factor IX carboxylation but is limited by saturation of another carboxylation component or by a shift in the rate-limiting step. Biochemistry, 45, 5587, 2006. 102. 
Rehemtulla, A., et al. In vitro and in vivo functional characterization of bovine vitamin K-dependent gamma-carboxylase expressed in Chinese hamster ovary cells. Proc. Natl. Acad. Sci. USA, 90, 4611, 1993. 103. Hallgren, K.W., et al. Carboxylase overexpression effects full carboxylation but poor release and secretion of factor IX: implications for the release of vitamin K-dependent proteins. Biochemistry, 41, 15045, 2002. 104. Wajih, N., et al. Increased production of functional recombinant human clotting factor IX by baby hamster kidney cells engineered to overexpress VKORC1, the vitamin K 2,3-epoxide-reducing enzyme of the vitamin K cycle. J. Biol. Chem., 280, 31603, 2005. 105. Wallin, R. and Hutson, S.M. Warfarin and the vitamin K-dependent gamma-carboxylation system. Trends Mol. Med., 10, 299, 2004. 106. Cain, D., Hutson, S.M., and Wallin, R. Assembly of the warfarin-sensitive vitamin K 2,3-epoxide reductase enzyme complex in the endoplasmic reticulum membrane. J. Biol. Chem., 272, 29068, 1997. 107. Wallin, R., Sane, D.C., and Hutson, S.M. Vitamin K 2,3-epoxide reductase and the vitamin K-dependent gamma-carboxylation system. Thromb. Res., 108, 221, 2002. 108. Rost, S., et al. Mutations in VKORC1 cause warfarin resistance and multiple coagulation factor deficiency type 2. Nature, 427, 537, 2004. 109. Li, T., et al. Identification of the gene for vitamin K epoxide reductase. Nature, 427, 541, 2004. 110. Wajih, N., et al. Engineering of a recombinant vitamin K-dependent gamma-carboxylation system with enhanced gamma-carboxyglutamic acid forming capacity: evidence for a functional CXXC redox center in the system. J. Biol. Chem., 280, 10540, 2005. 111. Wajih, N., et al. The inhibitory effect of calumenin on the vitamin K-dependent gammacarboxylation system. Characterization of the system in normal and warfarin-resistant rats. J. Biol. Chem., 279, 25276, 2004.


11 Engineering Multifunctional Enzyme Systems for Optimized Metabolite Transfer between Sequential Conversion Steps

Robert J. Conrado Cornell University

Thomas J. Mansell Cornell University

Matthew P. DeLisa Cornell University

11.1 Introduction
11.2 Enzyme-to-Enzyme Channeling
11.3 Metabolic Channeling in Primary Metabolism
11.4 Metabolic Channeling in Secondary Metabolism
11.5 Advantages Conferred from Multifunctional Enzyme Systems
11.6 Engineering Multifunctional Enzymes
11.7 Engineering Metabolic Channels de Novo
11.8 Concluding Remarks
References

11.1 Introduction

A growing body of evidence indicates that many cellular metabolic pathways are catalyzed not by free-floating “soluble” enzymes, but via one or more membrane-associated multienzyme complexes. This type of macromolecular organization has important implications for the overall efficiency, specificity, and regulation of metabolic pathways. An ever-increasing number of biochemical and genetic studies on primary and secondary metabolism have laid a solid foundation for this model, providing compelling evidence in favor of the so-called channeling of intermediates between enzyme active sites and co-localization of enzymes inside of a cell. Information from these studies offers new insights into the structuring of biosynthetic pathways within cells, which should lead to more effective means for engineering the production of valuable metabolites with medical and industrial importance.

To gain an appreciation for how such complexes may have arisen, one need look no further than “under the hood” of any living cell. The aqueous interior of a cell (hereafter cytoplasm) is a non-Newtonian fluid full of solutes and macromolecules solvated by water. The cytoplasm is bounded by a lipid bilayer and is composed of a heterogeneous mixture of carbohydrates, proteins, and lipids. In Escherichia coli, the concentration of protein, DNA, and RNA can reach 300–400 g/l, which corresponds to a significant volume fraction of 20–30% (Ellis, 2001; Ellis and Minton, 2003; Elowitz et al., 1999; Fulton, 1982). Despite this dense macromolecular presence, the bulk fluid-phase cytoplasm viscosity remains similar to that of water at 1.2–1.4 cP with little spatial variation (Fushimi and Verkman, 1991; Kao et al., 1993; Luby-Phelps et al., 1993). As in protein crystals, however, this macromolecular crowding leads to two phases of water, bulk water and water of hydration, the latter of which encompasses the macromolecules and is osmotically inactive (Fulton, 1982). This reduced volume of bulk water lowers the cytoplasm solvent capacity, which the cell must work to conserve (Mendes et al., 1992). Living cells have evolved to maintain a cytoplasm dense with macromolecules, and it is this crowded environment that dominates the thermodynamic and entropic energies that drive much of cellular chemistry, including movement, kinetics, and interactions.

As a result of the crowded interior, the translational diffusion of small molecules differs greatly from that of macromolecules. The combination of increased viscosity, transient interactions, and hindered movement reduces small molecule diffusion three- to fourfold in the cytoplasm as compared with pure water, and the hindrance grows steeply with increasing molecular weight (Arrio-Dupont et al., 2000; Kao et al., 1993). Proteins such as green fluorescent protein (GFP) experience an 11-fold reduction in their diffusion in the E. coli cytoplasm (Elowitz et al., 1999), while β-galactosidase is slowed more than 1000-fold in cultured muscle cells (Arrio-Dupont et al., 2000). As a result of the macromolecular confinement within cells, there is a large thermodynamic driving force to minimize volume. The impact of this is seen with an exponential increase in the reaction rate with increased crowding, as governed by the thermodynamic activity of the cell constituents. For enzymatic reactions in the cell that are diffusion-limited, this molecular crowding dramatically slows the rate, while reaction-limited steps proceed up to 50- to 100-fold more quickly in a crowded environment (Ellis, 2001; Minton, 2001).

The energetic and entropic driving force to reduce volume within the cytoplasm also drives enzyme associations in both a specific (e.g., protein folding and assembly) and a nonspecific manner (e.g., protein misfolding and aggregation). For example, the activity coefficient for a 50 kDa subunit of a tetrameric protein may increase 100-fold in a crowded environment, and therefore homodimer equilibrium can shift 8- to 40-fold while tetramerization becomes 10^3–10^5 times more favorable (Ellis, 2001; Zimmerman and Trach, 1991). However, molecular confinement can also strongly enhance the propensity of incompletely folded polypeptides to misfold and aggregate (Eggers and Valentine, 2001; van den Berg et al., 1999), a process that requires the assistance of powerful molecular chaperone machines (e.g., Hsp60/GroEL) to be prevented or reversed (Hartl and Hayer-Hartl, 2002). Thus, it is clear from the above examples that the extreme protein crowding in the cytoplasm promotes many unique protein interactions that can be productive or nonproductive with respect to cellular function.
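The link between activity coefficients and association equilibria quoted above can be made explicit with a short worked relation. This is a sketch under stated assumptions, not a derivation taken from the cited sources: the symbols (K° for the thermodynamic equilibrium constant, K_app for the concentration-based apparent constant, γ_1 and γ_n for the activity coefficients of monomer and n-mer, c for concentration) are introduced here for illustration, and the back-calculated values of γ_2 and γ_4 are simply what the quoted fold-changes would imply if γ_1 = 100 is taken at face value.

```latex
% Self-association of n monomers A into an n-mer A_n
\[
n\,\mathrm{A} \rightleftharpoons \mathrm{A}_n , \qquad
K^{\circ} = \frac{a_n}{a_1^{\,n}} = \frac{\gamma_n c_n}{(\gamma_1 c_1)^{n}}
\quad\Longrightarrow\quad
K_{\mathrm{app}} \equiv \frac{c_n}{c_1^{\,n}} = K^{\circ}\,\frac{\gamma_1^{\,n}}{\gamma_n}
\]
% In dilute solution all gamma -> 1, so K_app / K^o is the fold-shift caused by crowding.
% Dimer (n = 2) with gamma_1 = 100:
\[
\frac{K_{\mathrm{app}}}{K^{\circ}} = \frac{10^{4}}{\gamma_2} \in [8,\,40]
\quad\Longleftrightarrow\quad \gamma_2 \approx 250\text{--}1250
\]
% Tetramer (n = 4) with gamma_1 = 100:
\[
\frac{K_{\mathrm{app}}}{K^{\circ}} = \frac{10^{8}}{\gamma_4} \in [10^{3},\,10^{5}]
\quad\Longleftrightarrow\quad \gamma_4 \approx 10^{3}\text{--}10^{5}
\]
```

The point of the relation is that the monomer activity coefficient enters raised to the power n, while the oligomer coefficient enters only once, so higher-order assemblies are favored disproportionately by crowding.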

11.2 Enzyme-to-Enzyme Channeling

One remarkable instance where macromolecular crowding plays an important and oftentimes unexpected role is in cellular metabolism, specifically in the association of metabolic enzymes (Jorgensen et al., 2005; Ovadi and Srere, 2000; Srere, 1987; Winkel, 2004) (Figure 11.1). Enzyme-to-enzyme channeling (a.k.a. metabolic channeling) and compartmentalization of biochemical reaction modules have long been suggested as key components of cellular metabolism. The first direct evidence of this behavior came in the 1940s when David Green isolated all of the Krebs tricarboxylic acid cycle (TCA) enzymes as an aggregated system, which he termed a “multienzyme complex” (Green, 1957). Meanwhile, cell centrifugation techniques were coming online as a means to isolate cell organelles. Studies of whole-cell Neurospora (Zalokar, 1960) and Euglena (Kempner and Miller, 1968) demonstrated that the stratified layers that result from centrifugation correspond roughly to the varying cellular constituents. Remarkably, Kempner and Miller reported that “there may be no free or unbound protein in Euglena,” in agreement with Zalokar’s results a decade earlier. Biochemical studies served to corroborate these ideas of multienzyme complexes, as Yanofsky reported that no free intermediate was detected within the bifunctional tryptophan synthase system (Yanofsky, 1960). Despite the mounting evidence for multifunctional enzymes and enzyme systems, Yanofsky’s finding was not widely accepted until X-ray crystallographic evidence from Salmonella revealed a physical tunnel that connects the active sites of the tryptophan synthase complex and serves to prevent diffusion of the intermediate into the bulk (Hyde et al., 1988). With the more recent development of the yeast two-hybrid system (Fields and Song, 1989), further protein interaction studies have emerged that demonstrate specific interactions between sequential enzymes in glycolysis and the TCA cycle (Srere, 1987; Sullivan et al., 2003; Winkel, 2004).

This progression of ideas has not developed without controversy. It has been argued that diffusion of intermediates is sufficiently fast to remove the need for channeling within living cells (Cornish-Bowden, 1991a,b; Cornish-Bowden and Cardenas, 1993), while others argue that the kinetic data used to support the notion of channeling between sequential enzymes in vivo have been misinterpreted (Petersson, 1991; Wu et al., 1991). Nonetheless, a thorough analysis of the temporal behavior of metabolite pools and cellular control mechanisms demonstrates that channeling between enzymes is important in at least a subset of cases (Kholodenko et al., 1996; Welch and Easterby, 1994).

Figure 11.1 Schematic of complex assembly of known Calvin cycle enzyme interactions. An efficient multienzyme system is formed in vivo by sequential metabolic enzyme associations between ribose-5-phosphate isomerase (R5PI), phosphoribulokinase (PRK), ribulose-bisphosphate carboxylase/oxygenase (RuBisCO), phosphoglycerate kinase (PGK), and glyceraldehyde-phosphate dehydrogenase (GPD).
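The argument that free diffusion may be fast enough to make channeling unnecessary can be put on a rough numerical footing. The sketch below uses the standard three-dimensional relation ⟨r²⟩ = 6Dt to estimate how long a molecule takes to diffuse across a cell, and applies the fold-reductions in diffusion quoted in Section 11.1. The cell length and the diffusion coefficients in water are assumed, order-of-magnitude values chosen for illustration; they are not taken from the studies cited in this chapter.

```python
def diffusion_time_s(distance_um: float, d_um2_per_s: float) -> float:
    """Time at which the 3D mean-squared displacement <r^2> = 6*D*t equals distance^2."""
    return distance_um ** 2 / (6.0 * d_um2_per_s)


# Assumed, illustrative values (not from the chapter's references).
CELL_LENGTH_UM = 1.0         # rough bacterial length scale, in micrometers
D_METABOLITE_WATER = 1000.0  # small metabolite in water, um^2/s (order of magnitude)
D_PROTEIN_WATER = 100.0      # mid-sized protein in water, um^2/s (order of magnitude)

# Fold-reductions are the ones quoted in the text for cytoplasmic diffusion.
cases = {
    "metabolite, water": D_METABOLITE_WATER,
    "metabolite, cytoplasm (4-fold slower)": D_METABOLITE_WATER / 4.0,
    "protein, cytoplasm (11-fold slower)": D_PROTEIN_WATER / 11.0,
    "protein, cytoplasm (1000-fold slower)": D_PROTEIN_WATER / 1000.0,
}

times = {label: diffusion_time_s(CELL_LENGTH_UM, d) for label, d in cases.items()}

for label, t in times.items():
    print(f"{label}: {t:.2e} s")
```

Under these assumptions a small metabolite still crosses a micrometer-scale cell in well under a millisecond, which is the essence of the objection to channeling on purely kinetic grounds, whereas a strongly hindered macromolecule needs on the order of a second. The estimate says nothing about intermediate stability, toxicity, or competing pathways, which are treated later in the chapter.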

11.3 Metabolic Channeling in Primary Metabolism

Despite the controversy over the evolutionary basis for channeling in terms of kinetics and cellular control, numerous multifunctional enzyme systems have been identified, spanning a multitude of organisms and impacting both primary and secondary metabolism. Examples of channeling and of multienzyme systems, termed metabolons (Srere et al., 1987), abound in primary metabolism including, but not limited to: glycolysis (Malaisse et al., 2004), fatty acid oxidation (Ishikawa et al., 2004), the TCA cycle (Beeckmans and Kanarek, 1987), pyruvate dehydrogenase (Aevarsson et al., 1999) and glycine decarboxylase (Douce et al., 2001) activities, the carboxysome (Kerfeld et al., 2005), the proteasome (Schmidt et al., 2005), the Calvin cycle (Suss et al., 1993), and the biosynthesis of aromatic amino acids (Welch and Gaertner, 1980). The latter two cases, as well as bacterial carboxysomes and the urea cycle, are discussed in detail below.

From the standpoint of primary metabolism, the integration of CO2 into carbohydrate structures through the Calvin cycle in plants demonstrates a crucial, highly conserved sequence of enzymes, with many glycolytic enzyme homologs in prokaryotes and other eukaryotic systems. This coordinated enzyme system provides an experimental model for direct enzyme interaction between both consecutive and nonconsecutive catalytic steps. There is strong evidence from both pea and spinach chloroplasts for an isolable multienzyme complex comprising as many as ten enzymes of the Calvin cycle (Anderson et al., 1996; Gontero et al., 1988; Rault et al., 1993). When carefully performed, the isolation of this multienzyme system reveals that these enzymes are bound to the stroma-faced thylakoid membrane in situ (Suss et al., 1993), which provides direct access to a necessary ATP pool generated by the electron-transport chain. Kinetic data further demonstrate that enzymatic activity increases when the individual enzymes are studied as an aggregated system (Gontero et al., 1993). The observation that this multienzyme system could be functionally reconstituted in vitro from individually purified enzymes (Graciet et al., 2003) provides the final evidence for the direct interaction between Calvin cycle proteins.


The polyaromatic pathway, ubiquitous in microorganisms and higher plants, yields additional examples of multifunctional enzyme systems that operate in primary metabolism. Specifically, production of tryptophan requires 13 total enzyme reactions, seven of which constitute the polyaromatic branch, including the production of phenylalanine and tyrosine as aromatic amino acids. In Neurospora and higher fungi, a coordinated and highly efficient multifunctional enzyme has evolved to couple five distinct enzymatic activities in a single polypeptide. This enzymatic aromatic complex (or AROM protein) is encoded by the arom locus, a continuous 4,812-bp open reading frame without introns that arose by multiple gene fusions (Charles et al., 1986). The output of this locus is a single mRNA transcript that specifies a pentafunctional polypeptide catalyzing five consecutive steps leading to the production of 5-enolpyruvylshikimate 3-phosphate in the shikimate pathway (Hawkins and Smith, 1991; Hawkins et al., 1993; Welch and Gaertner, 1980). Interestingly, the same enzymatic activities in prokaryotes arise from five uncoupled and monofunctional enzymes (Hawkins, 1987). Evidence that the five active sites of the AROM protein behave as a coordinated multienzyme system comes from kinetic studies, where higher catalytic activity and lower Km values were observed for the individual catalytic sites within wild-type complexes than within complexes carrying a mutation in another active site (Welch and Gaertner, 1980). Similar studies demonstrated that the AROM protein has a low-level channeling function, probably as a result of the close juxtaposition of five active sites, and that this channeling function is only physiologically significant when carbon sources are not constant, a condition that organisms frequently experience in situ (Lamb et al., 1992).
Another multifunctional enzyme in the polyaromatic pathway of bacteria, fungi, and plants is the tryptophan synthase complex, which is known to exhibit channeling behavior (Srere, 1987). Tryptophan synthase, an α2β2 complex, is a classic example of an enzyme that is thought to “channel” a metabolic intermediate (indole) from the active site of the α subunit to the active site of the β subunit (Figure 11.2). As mentioned earlier, a physical tunnel with a diameter matching that of the indole intermediate exists within the αββα tryptophan synthase complex, which allows indole to pass between active sites without diffusing into the bulk (Houben and Dunn, 1990; Hyde et al., 1988; Matchett, 1974). The use of a stochastic simulation to model tryptophan synthase activity verified this channeling phenomenon and found that the free cellular indole concentration remained well below 1 µM, further demonstrating the kinetic advantage of channeling indole (Degenring et al., 2004). In addition to channeling indole, the reaction rates of the tryptophan synthase complex increase one to two orders of magnitude when the α and β subunits combine, relative to the uncomplexed subunits (Hyde et al., 1988; Yanofsky, 1960). The use of rapid quench methods combined with channel-impaired tryptophan synthase mutants provides additional direct evidence for a substrate channeling mechanism (Anderson et al., 1991; Anderson et al., 1995; Schlichting et al., 1994).

Figure 11.2 Tryptophan synthase αββα complex. A physical tunnel of 25 Å in length (dotted lines) channels indole (C8H7N) from the active site of the α subunit (dark gray) to that of the β subunit (light gray). Each α/β pair acts as a functional unit. (Adapted from Hyde, C.C., Ahmed, S.A., Padlan, E.A., Miles, E.W., and Davies, D.R., J. Biol. Chem., 263: 17857–17871, 1988.)

Multienzyme systems in primary metabolism are not limited to the plant kingdom. Single-celled organisms have evolved several types of polyhedral protein-based organelles in which highly organized enzyme sequestration is employed for carbon fixation, and 1,2-propanediol and ethanolamine degradation (Bobik, 2006). Of these organelles, the carboxysome, which can achieve rudimentary compartmentalization for CO2 fixation at low CO2 levels, represents the first (Gantt and Conti, 1969) and best-documented example of macromolecular protein organelles in bacteria, notably chemoautotrophs (e.g., H. neapolitanus) and cyanobacteria (e.g., Synechocystis). Here, interacting coat proteins, in the absence of any lipid bilayer, encapsulate the enzymatic activity of the carboxysome, forming complexes approaching 200 nm in diameter. Crystal structure analysis of two such coat proteins from Synechocystis carboxysomes, CcmK4 and CcmK2, revealed hexameric structures which, when packed into sheets, leave only 6 Å gaps between subunits (Kerfeld et al., 2005). Under low CO2 levels, these hexamers form capsids that enclose most, if not all, of cellular ribulose bisphosphate carboxylase oxygenase (RuBisCO). RuBisCO activity fixes carbon through the incorporation of CO2 into ribulose bisphosphate to form 3-phosphoglycerate (Bobik, 2006), which can then be consumed through glycolysis. Furthermore, the CcmK4 and CcmK2 coat proteins have evolved a central pore that is critical to controlling metabolite flow, so the shell does more than merely localize the carbon-fixing activity of RuBisCO (Kerfeld et al., 2005).

In a final example, the urea cycle, which spans the cytoplasmic and mitochondrial compartments, provides a compelling case for enzyme compartmentalization along this multienzyme pathway. Radioactive labeling experiments, both from permeabilized cells and in situ, revealed a high degree of channeling between argininosuccinate synthetase, argininosuccinate lyase, and arginase, the three cytoplasmic reactions in the pathway (Cheung et al., 1989). Mathematical simulations supported these findings: the reaction network was modeled with varying degrees of channeling between the cytoplasmic enzymes, and the results from the fully channeled case matched the metabolite concentrations measured in the radioactive labeling experiments most accurately, while results deviated as the simulated channel became increasingly leaky (Maher et al., 2003).
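The idea of a channel that is fully tight, partly leaky, or absent can be illustrated with a deliberately simple toy model. The sketch below is not the urea cycle model of Maher et al. (2003); it is a hypothetical two-enzyme pathway in which a fraction of the intermediate made by the first enzyme is handed directly to the second enzyme and the remainder leaks into a bulk pool, where it is either taken up by the second enzyme with Michaelis–Menten kinetics or lost to a competing first-order reaction. All parameter names and values are assumptions chosen for illustration, and the equations are integrated with a plain forward-Euler step.

```python
def simulate_leaky_channel(channel_fraction, v1=1.0, vmax2=2.0, km2=0.5,
                           k_loss=0.1, dt=0.01, t_end=50.0):
    """Toy two-enzyme pathway with a partly leaky channel (hypothetical model).

    channel_fraction: share of the intermediate passed directly from enzyme 1
    to enzyme 2 (1.0 = perfect channel, 0.0 = no channeling).
    Returns (bulk intermediate at t_end, product formed by t_end).
    """
    bulk = 0.0      # free intermediate in the bulk pool
    product = 0.0   # end product accumulated
    for _ in range(int(t_end / dt)):
        direct = channel_fraction * v1           # flux handed over inside the channel
        leak = (1.0 - channel_fraction) * v1     # flux escaping into the bulk
        uptake = vmax2 * bulk / (km2 + bulk)     # enzyme 2 acting on bulk intermediate
        loss = k_loss * bulk                     # competing reaction draining the pool
        bulk += dt * (leak - uptake - loss)
        product += dt * (direct + uptake)
    return bulk, product


results = {f: simulate_leaky_channel(f) for f in (0.0, 0.5, 1.0)}

for fraction, (bulk, product) in results.items():
    print(f"channel fraction {fraction:.1f}: bulk intermediate {bulk:.3f}, product {product:.1f}")
```

In this toy setting the free intermediate pool shrinks as the channel becomes tighter and disappears entirely for a perfect channel, while more of the flux reaches the end product because less is diverted by the competing reaction. Fitting such a channel fraction against measured metabolite pools is the general kind of comparison described above, although a real model would need the actual pathway stoichiometry and kinetics.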

11.4 Metabolic Channeling in Secondary Metabolism

The underpinnings of metabolic channels come from observations over the past 50 years of primary metabolism in microorganisms, plants, and animals; however, more recent studies have documented that multienzyme systems are also important in secondary metabolism. In contrast to the high-affinity protein interactions typical of the Calvin cycle and the tryptophan biosynthetic machinery, channeling in secondary metabolism is often characterized by dynamic, low-affinity enzyme interactions. An illustrative example of this comes from bacteria, where chemically and structurally diverse polyketide molecules are synthesized by modular multifunctional polyketide synthases, highly organized enzyme structures that are tethered to cell membranes (Cane et al., 1998; Pfeifer and Khosla, 2001). Similarly, in plant secondary metabolism, numerous studies have revealed the existence of metabolic channeling and compartmentalization in secondary product formation (Jorgensen et al., 2005). Examples include the biosynthesis of isoprenoids (Leivar et al., 2005), alkaloids (Panicot et al., 2002), flavonoids (Burbulis and Winkel-Shirley, 1999; Winkel-Shirley, 1999), cyanogenic glucosides (Kristensen et al., 2005), and phenylpropanoids (Achnine et al., 2004; Winkel-Shirley, 1999), with the latter two products representing the best-documented cases.

Cyanogenic glucosides, namely dhurrin, represent a class of secondary compounds produced by a number of plant species for protection against herbivores. In Sorghum bicolor, the membrane-bound, multifunctional dhurrin enzyme system forms an efficient metabolon whereby only three enzymes are necessary to carry out the seven catalytic steps for production of dhurrin from L-tyrosine (Kahn et al., 1999). Channeling behavior in this pathway has long been suspected, as early studies of the dhurrin metabolon were hindered by the low amount of pathway intermediates present in the enzyme reaction mixtures (Moller and Conn, 1979). Radioactive labeling experiments confirmed this hypothesis, as dhurrin was derived primarily from tyrosine, even in the presence of saturating intermediates (Moller and Conn, 1980). According to the proposed reaction mechanism, the first two enzymes of this system, CYP79A1 and CYP71E1, individually exhibit multifunctional qualities, catalyzing four and two reactions, respectively. This behavior was evidenced both endogenously in S. bicolor and in recombinant E. coli clones (Bak et al., 1998; Halkier et al., 1995; Kahn et al., 1999). The channeling properties of this system were not fully appreciated until the S. bicolor three-enzyme pathway was introduced into and studied in Arabidopsis thaliana. In the engineered plant lines carrying only the first two enzymes for dhurrin biosynthesis, a stunted phenotype resulted from accumulation of the toxic intermediate, p-hydroxymandelonitrile. When the third enzyme of the dhurrin pathway was introduced into these transgenic plants, the normal growth phenotype was restored, dhurrin accumulated up to 4% by dry weight, and toxic byproducts were no longer detectable within the plant material (Kristensen et al., 2005). These dramatic results provide compelling evidence for the efficient coupling between the individual enzymes in dhurrin biosynthesis.

Another large set of secondary molecules, phenylpropanoids, constitutes one of the best-studied metabolic systems in plants. Various membrane-bound multienzyme complexes, with similar basic architecture, can assemble to transform phenylalanine into several classes of valuable chemical compounds needed by the cell (Figure 11.3). Despite the existence of several branch points in the phenylpropanoid pathway, the first two enzymes, phenylalanine ammonia-lyase (PAL) and cinnamate 4-hydroxylase (C4H), are required for phenylalanine conversion to p-coumaric acid, which can then be used to produce lignins, esters, flavonols, tannins, and anthocyanins. Early radioactive labeling experiments of cucumber microsomal fractions suggested channeling activity between the first two required enzymes, PAL and C4H. These studies demonstrated that the channeled intermediate, [14C]cinnamic acid, was a less efficient precursor than the upstream substrate, [3H]phenylalanine, in the production of p-coumaric acid (Czichi and Kindl, 1977).
More recent experiments in transgenic tobacco plants have confirmed the loose enzymatic interaction between PAL and C4H through FRET analysis and have identified membrane association of these two enzymes through GFP fusions to PAL (Achnine et al., 2004). These results corroborated earlier cell centrifugation experiments in which PAL, C4H, and chalcone synthase (CHS) activities were observed within the ER marker fraction (Wagner and Hrazdina, 1984). Similarly, immunolocalization studies demonstrated that this membrane complex is tethered at the cytosolic side by several membrane-bound protein anchors, including the second enzyme in the pathway, namely C4H, as well as one of the later enzymes in the production of flavonols and anthocyanins, F3’H (Hrazdina, 1992). The final evidence for direct interaction between several of the sequential pathway enzymes came from both yeast two-hybrid and coimmunoprecipitation studies, which demonstrated that the flavonoid enzymes assemble as a macromolecular complex with contacts between multiple proteins including CHS, chalcone isomerase (CHI), and dihydroflavonol 4-reductase (DFR) (Burbulis and Winkel-Shirley, 1999). The combination of these identified protein–protein interactions with the known membrane-bound proteins has led to widely accepted models for the multienzyme organization within the various phenylpropanoid pathways.

Figure 11.3 Proposed model of phenylpropanoid metabolism in plants. Several ER membrane-bound enzymes allow multienzyme complex assembly for metabolite transfer between sequential reaction steps. Key branch points exist resulting from a common set of intermediates between the specific phenylpropanoid pathways. (Adapted from Winkel, B.S., Annu. Rev. Plant Biol., 55: 85–107, 2004.)

11.5 Advantages Conferred from Multifunctional Enzyme Systems

There are at least six advantages associated with either the partial or complete assembly of a metabolic pathway into a multi-protein metabolon. First, one of the most significant and direct implications of metabolic channel formation is that sequential active sites are brought into close proximity, which serves both to decrease the intermediate transit time and to increase the catalytic efficiency that results from a net decrease in the Km value for channeled substrates. This is seen most strongly with the tryptophan synthase channel, where direct substrate channeling increases the reaction rates by one to two orders of magnitude over the uncomplexed enzymes (Hyde et al., 1988; Yanofsky, 1960). Second, metabolic channels offer cells a further opportunity to relieve the kinetic constraints within the dynamic intracellular environment. Despite the numerous and complex pathways inside the cytoplasm, the cell maintains its organization through thousands of enzymes that regulate these metabolic intermediates. Of the common metabolic reactions, about 80% of the cellular intermediates have only one function in the cell as they are passed between enzymes (Srere, 1987). Thus, channeling could specifically reduce the cellular levels of intermediates in these reaction networks, while keeping the effective local substrate concentrations high. Third, cellular pathways oftentimes include toxic or unstable intermediates that require swift conversion to ensure organism survival. For instance, in the absence of channeling, dhurrin metabolism was observed to accumulate a toxic intermediate that caused a severe growth defect in A. thaliana (Kristensen et al., 2005). Fourth, channeling can physically prevent inhibitory compounds from accessing and deactivating the active site, and thus can increase the effective kcat of the multienzyme complex. For example, the AROM multifunctional enzyme coordinately protects each of the five active sites from proteolytic inactivation by the first substrate (Welch and Gaertner, 1980). Fifth, direct channeling of a particular substrate can prevent metabolic cross talk between competing pathways. This phenomenon is evidenced in several of the metabolons described above, including the dhurrin metabolon, where the first intermediate, Z-p-hydroxyphenylacetaldoxime, can be converted to various glucosinolates by a competing pathway in the absence of channeling; this phenotype disappears in the presence of the metabolon (Kholodenko et al., 1996). Sixth and finally, dynamic multienzyme complexes, which can rapidly and reversibly assemble and disassemble, allow for differential control over the synthesis of secondary products in response to the intra- and extracellular environment. For instance, the PAL isozymes (part of the plant phenylpropanoid pathway, see above) are differentially expressed in response to stress and light, a phenomenon that likely reflects the highly adaptable nature of the phenylpropanoid enzyme complex and provides alternative branch points that allow for the production of an array of secondary compounds (Winkel-Shirley, 1999).
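The first advantage, a lower apparent Km for a channeled substrate, can be examined with the Michaelis–Menten rate law, v = Vmax·[S]/(Km + [S]). The sketch below compares the rate of a second enzyme at two Km values. The twofold difference and the arbitrary concentration units are assumptions chosen for illustration, and the function names are hypothetical.

```python
def mm_rate(substrate, vmax, km):
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


def fold_gain_from_lower_km(substrate, km_free, km_channeled, vmax=1.0):
    """Ratio of the rate at the lower (channeled) Km to the rate at the free-enzyme Km."""
    return mm_rate(substrate, vmax, km_channeled) / mm_rate(substrate, vmax, km_free)


KM_FREE = 1.0       # assumed Km of the free enzyme, arbitrary concentration units
KM_CHANNELED = 0.5  # assumed twofold lower apparent Km in the complex

for s in (0.01, 0.1, 1.0, 10.0, 100.0):
    gain = fold_gain_from_lower_km(s, KM_FREE, KM_CHANNELED)
    print(f"[S] = {s:>6.2f}: rate gain = {gain:.2f}-fold")
```

The gain approaches the full twofold only when the intermediate is scarce relative to Km and vanishes at saturation. This suggests that a lower apparent Km matters most for pathways in which free intermediates are kept at low levels, which is the situation described in the second advantage above.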

11.6 Engineering Multifunctional Enzymes

Metabolic channels can be recast as multifunctional enzymes, and engineering of these enzyme systems has followed the trajectory of basic multifunctional enzymes in nature, with the tryptophan synthase metabolon (detailed above) standing out as an important example. In recent years, many groups have attempted to mimic this bifunctionality by engineering C-to-N-terminal fusions between two proteins that are normally not fused (Beaujean et al., 2000; Lindbladh et al., 1994; Shatalin et al., 1999; Tian and Dixon, 2006). These fusions are chimeric proteins with domains comprising two functional enzymes, usually joined by a short linker sequence. The use of fusion proteins is not a new technology (Gausing, 1977), but a recent development is the analysis of the kinetic properties of bifunctional fusion enzymes compared with those of the individual enzymes. For instance, Lindbladh and colleagues (1994) reported that a fusion of two yeast enzymes showed a twofold decrease in each Km and a marked resistance to inhibition by an intermediate-scavenger when compared to the two free enzymes. In another work, a bifunctional fusion enzyme was found to produce a metabolic product (genistein) not observed in the case of free enzymes, demonstrating that the bifunctional enzyme decreased diversion of metabolic flux by acting on an otherwise transient intermediate (Tian and Dixon, 2006). A final example comes from Bulow and coworkers, who engineered a bifunctional enzyme that contributed to the recycling of cofactors (Prachayasittikul et al., 2006). In that work, a chimeric enzyme that made use of the cofactors NAD and NADH was shown to have up to a twofold higher recycling rate of cofactors than the respective free enzymes. Collectively, the analyses of these simple engineered fusion proteins demonstrate the advantages of bifunctional enzymes engineered by C-to-N-terminal fusions. It is noteworthy, however, that expression of engineered fusion proteins can be problematic, especially in bacteria, as misfolding can arise from nonproductive interactions between the non-natural enzymes (Chang et al., 2005; Frydman et al., 1999; Netzer and Hartl, 1997).
Furthermore, there is no guarantee that two translationally fused enzymes will retain their individual activities in the context of the fusion protein. Indeed, many fusion proteins result in one or both of the linked enzymes losing partial or complete functionality (our unpublished observations). Other approaches have been used to engineer bifunctional proteins. One technique, known as domain insertion, was employed by Betton and coworkers (Betton et al., 1997). The protein sequence of TEM-1 β-lactamase was inserted into that of the E. coli maltose binding protein at sites known to be permissive to amino acid insertions. The resulting protein exhibited both maltose binding activity and β-lactamase activity identical to the wild type (free) proteins. Also, maltose was shown to stabilize the active site of the β-lactamase domain. This observation served as evidence that fusion proteins need not always act independently of one another. In fact, many attempts have since been made to couple the functions of two proteins, thereby creating a protein “switch.” Progress toward the creation of engineered allosteric enzymes with switch-like behavior has been made using directed evolution and rational design. Domain insertion has been widely used to rationally engineer control of protein activity (Buskirk et al., 2004; Guntas et al., 2004; Skretas and Wood, 2005) and a review of domain insertion-based strategies for regulating protein function can be found in Ostermeier (2005). Rational protein engineering was also used by Skretas and Wood (2005) to create self-cleaving inteins that are inducible by small molecules. By inserting hormone-binding domains into an engineered intein domain, dose-dependent changes in intein cleavage activity were observed. 
Finally, computational methods have been instrumental in guiding the construction of bifunctional enzymes as evidenced by the pioneering studies of Hellinga and colleagues who redesigned ligand-binding proteins for new substrates in order to create novel biosensors (Looger et al., 2003; Marvin and Hellinga, 2001). At the same time, the use of directed evolution techniques, whereby combinatorial libraries of protein switches are screened via high-throughput assays, have given rise to potent bifunctional enzymes. For instance, small-molecule dependent inteins similar to those of engineered by Skretas and Wood were also isolated using directed evolution techniques (Buskirk et al., 2004). Another particularly noteworthy case comes from Ostermeier and coworkers who created a family of engineered allosteric enzymes (β-lactamases with activity regulated by the presence or absence of maltose) using a combination of domain insertion and circular permutation techniques (Guntas et al., 2005). By following a repeated algorithm of random domain insertion (inserting the β-lactamase domain randomly throughout the maltose-binding domain), screening, and random circular permutation of β-lactamase at favorable domain insertion points, they were able to create switches with up to 600 fold increases in catalytic activity in the presence of maltose


Figure 11.4 (See color insert following page 13-20.) Influencing metabolic pathways via the addition of small molecule effectors. (a) Substrate and product molecule resulting from the presence or absence of (b) small molecule effectors that influence (c) active and (d) inactive enzymes within (e, f) metabolic pathways engineered for function that is coupled to ligand-binding events.

compared to activity in the absence of maltose. They also were able to alter the binding specificity of the switch, creating a sucrose-activated β-lactamase. Thus, using the molecular biological tools provided by experimental and computational rational design along with directed evolution, protein activity in several cases has been shown to be easily controlled in a ligand-dependent manner. While it remains to be explicitly demonstrated, the importance of controlling protein function in the context of metabolic pathway engineering cannot be overstated. Facile methods of diverting metabolic flux through a pathway (e.g., by adding a small molecule or changing temperature or pH) could lead to greatly increased metabolic control. Moreover, coupling the functions of multiple metabolic enzymes using multiple small molecule effectors could greatly influence the efficiency of a pathway (Figure 11.4). Thus, one can expect that multifunctional enzymes such as engineered protein switches will likely play a vital role in the future of metabolic engineering.
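The library-construction cycle described above (random domain insertion, followed by circular permutation of the inserted domain) can be illustrated at the sequence level. The following is only a minimal sketch and not the procedure of Guntas et al. (2005): the function names and the short toy sequences are hypothetical, and real libraries are built at the DNA level and then screened experimentally.

```python
def circular_permutations(seq):
    """Return every circular permutation of seq (new termini at each position)."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def domain_insertions(host, insert):
    """Return the host sequence with insert placed at every internal position."""
    return [host[:i] + insert + host[i:] for i in range(1, len(host))]


def switch_library(host, insert):
    """Insert every circular permutant of insert at every internal host site."""
    library = set()
    for permutant in circular_permutations(insert):
        library.update(domain_insertions(host, permutant))
    return library


# Toy placeholder sequences (assumptions for illustration, not real proteins).
host = "MKIVWG"   # stands in for the ligand-binding domain
insert = "HPET"   # stands in for the catalytic domain
library = switch_library(host, insert)
print(len(library), "candidate switch sequences")
```

Even for these toy sequences the number of candidates grows as the product of insertion sites and permutation points, which is why such libraries are paired with a screen or selection rather than tested one member at a time.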

11.7 Engineering Metabolic Channels de Novo

When thinking about the potential of metabolic channeling in a cellular engineering context, it is instructive to consider the experiments of Moller and colleagues, who demonstrated that the S. bicolor dhurrin metabolon maintains its channeling behavior when this three-enzyme system is introduced into A. thaliana (Kristensen et al., 2005; Moller and Conn, 1979, 1980). This fortuitous and remarkable result raises the question of whether metabolic pathways, assembled from enzymes that have never coexisted in nature, can be intentionally engineered to exhibit channeling. Such a goal of building metabolic channels de novo will minimally require robust methods for in vivo assembly of enzymes either statically (covalently tethered enzymes) or dynamically (noncovalently tethered enzymes). Unfortunately, while many remarkable examples of assembling and engineering bifunctional enzymes


have been demonstrated, as detailed above, the ability to move beyond two-protein systems is currently hampered by a lack of tools for assembling protein complexes composed of more than two proteins. To this point, engineering of multienzyme systems has focused on techniques to genetically fuse or insert catalytic domains rather than the more challenging task of post-translational assembly of three-dimensional enzyme structures in vivo, as is the case for naturally occurring metabolic machinery. However, translational fusion of more than two proteins is rather impractical since even the simple fusion of two proteins can pose significant challenges from an expression and folding standpoint (Marston, 1986; Netzer and Hartl, 1997). Fortunately, the molecular toolkit available to the metabolic engineer has expanded greatly over the last decade. Indeed, one can envision the use of protein interacting domains (IDs) as an enabling technology for the post-translational assembly of dynamic enzyme channels. Known for their high-affinity interaction with one another, leucine zippers remain one of the best-studied examples of protein IDs. Several investigators have employed the characteristic homo- and heterodimerization between leucine zippers, such as c-Jun and c-Fos, to create artificial protein interactions in vivo (Pandya et al., 2004; Sellers and Struhl, 1989). It is also noteworthy that this artificial assembly of peptide complexes can have phenotypic consequences. For instance, by genetically fusing known IDs of certain proteins, specifically a single chain Fv (scFv) antibody fragment and its corresponding antigen, Mayer and colleagues were able to mandate specific protein assembly in vivo while excluding endogenous interactions (Fujiwara et al., 2002). Since interacting peptide domains of variable affinity are ubiquitous in nature, their continued discovery and pursuit will permit well-defined de novo metabolon creation for optimizing metabolic pathway kinetics.

Figure 11.5 (See color insert following page 13-20.) Proposed model for engineering bacterial multienzyme systems between sequential reactions. One approach to the engineering of dynamic metabolic channels is through the use of a library of generic interacting domains (a) to directionally tether specific recombinant enzymes to one another (b, c) in order to produce any metabolic product of interest. Efficiently directing cellular substrates into metabolons will lead to the creation of engineered metabolic machines in bacteria (d) that mimic, and even rival, those found in higher organisms such as plants. Labels in panel (d): periplasm, cytoplasm, NAD+, NADH, ADP + Pi, ATP.
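The idea of directionally tethering enzymes with a library of interacting domains can be made concrete with a small bookkeeping sketch. It assumes, hypothetically, that each domain pair in the library binds only its own partner and that one pair is spent per junction between consecutive enzymes; the enzyme names, domain names, and the function itself are illustrative placeholders and do not come from the chapter.

```python
def assign_interacting_domains(enzymes, id_pairs):
    """Give each junction between consecutive enzymes its own domain pair.

    enzymes  : unique enzyme names in pathway order
    id_pairs : (domain_a, domain_b) tuples assumed to bind only each other
    Returns a dict: enzyme name -> {"N": domain or None, "C": domain or None}.
    """
    junctions = len(enzymes) - 1
    if junctions > len(id_pairs):
        raise ValueError("need one orthogonal domain pair per junction")
    design = {name: {"N": None, "C": None} for name in enzymes}
    neighbours = zip(enzymes, enzymes[1:])
    for (upstream, downstream), (dom_a, dom_b) in zip(neighbours, id_pairs):
        design[upstream]["C"] = dom_a    # C-terminal tag on the upstream enzyme
        design[downstream]["N"] = dom_b  # N-terminal tag on the downstream enzyme
    return design


# Hypothetical three-enzyme pathway and two-pair domain library.
pathway = ["E1", "E2", "E3"]
pairs = [("ID1a", "ID1b"), ("ID2a", "ID2b")]
design = assign_interacting_domains(pathway, pairs)
for enzyme in pathway:
    print(enzyme, design[enzyme])
```

Using a distinct pair at every junction is what would make the assembly directional in this sketch: if the same pair were reused, any tagged enzyme could in principle dock with any other, and the pathway order would no longer be specified.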


11.8 Concluding Remarks

A longstanding and overarching goal of metabolic engineering is to assemble efficient metabolic pathways in robust production hosts. Since the toolkit for metabolic and biomolecular engineering has evolved to a state where protein structures can be readily optimized for activity (Endelman et al., 2004; Joo et al., 1999; Otey et al., 2006), solubility (Fisher et al., 2006; Pedelacq et al., 2002; Waldo et al., 1999), and a broad host of other features, it is time to consider enzyme function beyond the context of its substrate and start to take into account the spatial organization of each enzyme in relation to the pathway in which it participates. In the future, we envision an alphabet of IDs (or post-translational covalent linkages such as disulfide bond formation), such that enzymes from any host can be sequentially and post-translationally organized into a high-flux pathway for the production of a particular product of interest (Figure 11.5a through c). The ultimate goal is to achieve fully competent bacterial factories in which the substrate is transported into the cell and delivered directly to an efficient assembly line of sequential enzymes, or metabolic channel, from which the final product is pumped back into the medium (Figure 11.5d). While the final achievement of this vision remains far in the distance, immediate opportunities exist for engineering multifunctional enzyme systems whereby we can fully realize the ancient maxim: “the whole is greater than the sum of the parts.”

References

Achnine, L., Blancaflor, E.B., Rasmussen, S., and Dixon, R.A. 2004. Colocalization of L-phenylalanine ammonia-lyase and cinnamate 4-hydroxylase for metabolic channeling in phenylpropanoid biosynthesis. Plant Cell, 16: 3098–3109.
Aevarsson, A., Seger, K., Turley, S., Sokatch, J.R., and Hol, W.G. 1999. Crystal structure of 2-oxoisovalerate and dehydrogenase and the architecture of 2-oxo acid dehydrogenase multienzyme complexes. Nat. Struct. Biol., 6: 785–792.
Anderson, K.S., Miles, E.W., and Johnson, K.A. 1991. Serine modulates substrate channeling in tryptophan synthase. A novel intersubunit triggering mechanism. J. Biol. Chem., 266: 8020–8033.
Anderson, K.S., Kim, A.Y., Quillen, J.M., Sayers, E., Yang, X.J., and Miles, E.W. 1995. Kinetic characterization of channel impaired mutants of tryptophan synthase. J. Biol. Chem., 270: 29936–29944.
Anderson, L.E., Gibbons, J.T., and Wu, X. 1996. Distribution of 10 enzymes of carbon metabolism in pea (Pisum sativum) chloroplasts. Int. J. Plant Sci., 157: 525–538.
Arrio-Dupont, M., Foucault, G., Vacher, M., Devaux, P.F., and Cribier, S. 2000. Translational diffusion of globular proteins in the cytoplasm of cultured muscle cells. Biophys. J., 78: 901–907.
Bak, S., Kahn, R.A., Nielsen, H.L., Moller, B.L., and Halkier, B.A. 1998. Cloning of three A-type cytochromes P450, CYP71E1, CYP98, and CYP99 from Sorghum bicolor (L.) Moench by a PCR approach and identification by expression in Escherichia coli of CYP71E1 as a multifunctional cytochrome P450 in the biosynthesis of the cyanogenic glucoside dhurrin. Plant Mol. Biol., 36: 393–405.
Beaujean, A., Ducrocq-Assaf, C., Sangwan, R.S., Lilius, G., Bulow, L., and Sangwan-Norreel, B.S. 2000. Engineering direct fructose production in processed potato tubers by expressing a bifunctional alpha-amylase/glucose isomerase gene complex. Biotechnol. Bioeng., 70: 9–16.
Beeckmans, S. and Kanarek, L. 1987. Enzyme-enzyme interactions as modulators of the metabolic flux through the citric acid cycle. Biochem. Soc. Symp., 54: 163–172.
Betton, J.M., Jacob, J.P., Hofnung, M., and Broome-Smith, J.K. 1997. Creating a bifunctional protein by insertion of beta-lactamase into the maltodextrin-binding protein. Nat. Biotechnol., 15: 1276–1279.
Bobik, T.A. 2006. Polyhedral organelles compartmenting bacterial metabolic processes. Appl. Microbiol. Biotechnol., 70: 517–525.
Burbulis, I.E. and Winkel-Shirley, B. 1999. Interactions among enzymes of the Arabidopsis flavonoid biosynthetic pathway. Proc. Natl. Acad. Sci. USA, 96: 12929–12934.


Buskirk, A.R., Ong, Y.C., Gartner, Z.J., and Liu, D.R. 2004. Directed evolution of ligand dependence: small-molecule-activated protein splicing. Proc. Natl. Acad. Sci. USA, 101: 10505–10510.
Cane, D.E., Walsh, C.T., and Khosla, C. 1998. Harnessing the biosynthetic code: combinations, permutations, and mutations. Science, 282: 63–68.
Chang, H.C., Kaiser, C.M., Hartl, F.U., and Barral, J.M. 2005. De novo folding of GFP fusion proteins: high efficiency in eukaryotes but not in bacteria. J. Mol. Biol., 353: 397–409.
Charles, I.G., Keyte, J.W., Brammar, W.J., Smith, M., and Hawkins, A.R. 1986. The isolation and nucleotide sequence of the complex AROM locus of Aspergillus nidulans. Nucleic Acids Res., 14: 2201–2213.
Cheung, C.W., Cohen, N.S., and Raijman, L. 1989. Channeling of urea cycle intermediates in situ in permeabilized hepatocytes. J. Biol. Chem., 264: 4038–4044.
Cornish-Bowden, A. 1991a. How much effect on free metabolite concentrations does channelling have? J. Theor. Biol., 152: 39–40.
Cornish-Bowden, A. 1991b. Failure of channelling to maintain low concentrations of metabolic intermediates. Eur. J. Biochem., 195: 103–108.
Cornish-Bowden, A. and Cardenas, M.L. 1993. Channelling can affect concentrations of metabolic intermediates at constant net flux: artefact or reality? Eur. J. Biochem., 213: 87–92.
Czichi, U. and Kindl, H. 1977. Phenylalanine ammonia-lyase and cinnamic acid hydrolase as assembled consecutive enzymes on microsomal membranes of cucumber cotyledons: Co-operation and subcellular distribution. Planta, 134: 133–143.
Degenring, D., Rohl, M., and Uhrmacher, A.M. 2004. Discrete event, multi-level simulation of metabolite channeling. Biosystems, 75: 29–41.
Douce, R., Bourguignon, J., Neuburger, M., and Rebeille, F. 2001. The glycine decarboxylase system: a fascinating complex. Trends Plant Sci., 6: 167–176.
Eggers, D.K. and Valentine, J.S. 2001. Crowding and hydration effects on protein conformation: a study with sol-gel encapsulated proteins. J. Mol. Biol., 314: 911–922.
Ellis, R.J. 2001. Macromolecular crowding: obvious but underappreciated. Trends Biochem. Sci., 26: 597–604.
Ellis, R.J. and Minton, A.P. 2003. Cell biology: join the crowd. Nature, 425: 27–28.
Elowitz, M.B., Surette, M.G., Wolf, P.E., Stock, J.B., and Leibler, S. 1999. Protein mobility in the cytoplasm of Escherichia coli. J. Bacteriol., 181: 197–203.
Endelman, J.B., Silberg, J.J., Wang, Z.G., and Arnold, F.H. 2004. Site-directed protein recombination as a shortest-path problem. Protein Eng. Des. Sel., 17: 589–594.
Fields, S. and Song, O. 1989. A novel genetic system to detect protein-protein interactions. Nature, 340: 245–246.
Fisher, A.C., Kim, W., and DeLisa, M.P. 2006. Genetic selection for protein solubility enabled by the folding quality control feature of the twin-arginine translocation pathway. Protein Sci., 15: 449–458.
Frydman, J., Erdjument-Bromage, H., Tempst, P., and Hartl, F.U. 1999. Co-translational domain folding as the structural basis for the rapid de novo folding of firefly luciferase. Nat. Struct. Biol., 6: 697–705.
Fujiwara, K., Poikonen, K., Aleman, L., Valtavaara, M., Saksela, K., and Mayer, B.J. 2002. A single-chain antibody/epitope system for functional analysis of protein-protein interactions. Biochemistry, 41: 12729–12738.
Fulton, A.B. 1982. How crowded is the cytoplasm? Cell, 30: 345–347.
Fushimi, K. and Verkman, A.S. 1991. Low viscosity in the aqueous domain of cell cytoplasm measured by picosecond polarization microfluorimetry. J. Cell Biol., 112: 719–725.
Gantt, E. and Conti, S.F. 1969. Ultrastructure of blue-green algae. J. Bacteriol., 97: 1486–1493.
Gausing, K. 1977. Regulation of ribosome production in Escherichia coli: synthesis and stability of ribosomal RNA and of ribosomal protein messenger RNA at different growth rates. J. Mol. Biol., 115: 335–354.
Gontero, B., Cardenas, M.L., and Ricard, J. 1988. A functional five-enzyme complex of chloroplasts involved in the Calvin cycle. Eur. J. Biochem., 173: 437–443.


Gontero, B., Mulliert, G., Rault, M., Giudici-Orticoni, M.T., and Ricard, J. 1993. Structural and functional properties of a multi-enzyme complex from spinach chloroplasts. 2. Modulation of the kinetic properties of enzymes in the aggregated state. Eur. J. Biochem., 217: 1075–1082.
Graciet, E., Gans, P., Wedel, N., Lebreton, S., Camadro, J.M., and Gontero, B. 2003. The small protein CP12: a protein linker for supramolecular complex assembly. Biochemistry, 42: 8163–8170.
Green, D. 1957. Studies in organized enzyme systems. The Harvey Lectures, 52: 177–227.
Guntas, G., Mitchell, S.F., and Ostermeier, M. 2004. A molecular switch created by in vitro recombination of nonhomologous genes. Chem. Biol., 11: 1483–1487.
Guntas, G., Mansell, T.J., Kim, J.R., and Ostermeier, M. 2005. Directed evolution of protein switches and their application to the creation of ligand-binding proteins. Proc. Natl. Acad. Sci. USA, 102: 11224–11229.
Halkier, B.A., Nielsen, H.L., Koch, B., and Moller, B.L. 1995. Purification and characterization of recombinant cytochrome P450TYR expressed at high levels in Escherichia coli. Arch. Biochem. Biophys., 322: 369–377.
Hartl, F.U. and Hayer-Hartl, M. 2002. Molecular chaperones in the cytosol: from nascent chain to folded protein. Science, 295: 1852–1858.
Hawkins, A.R. 1987. The complex Arom locus of Aspergillus nidulans. Evidence for multiple gene fusions and convergent evolution. Curr. Genet., 11: 491–498.
Hawkins, A.R. and Smith, M. 1991. Domain structure and interaction within the pentafunctional arom polypeptide. Eur. J. Biochem., 196: 717–724.
Hawkins, A.R., Moore, J.D., and Lamb, H.K. 1993. The molecular biology of the pentafunctional AROM protein. Biochem. Soc. Trans., 21: 181–186.
Houben, K.F. and Dunn, M.F. 1990. Allosteric effects acting over a distance of 20–25 Å in the Escherichia coli tryptophan synthase bienzyme complex increase ligand affinity and cause redistribution of covalent intermediates. Biochemistry, 29: 2421–2429.
Hrazdina, G. 1992. Compartmentation in aromatic metabolism. In: Recent Advances in Phytochemistry. Vol. 26. Stafford, H. and Ibrahim, R. (Eds). New York: Plenum Press, 1–23.
Hyde, C.C., Ahmed, S.A., Padlan, E.A., Miles, E.W., and Davies, D.R. 1988. Three-dimensional structure of the tryptophan synthase alpha 2 beta 2 multienzyme complex from Salmonella typhimurium. J. Biol. Chem., 263: 17857–17871.
Ishikawa, M., Tsuchiya, D., Oyama, T., Tsunaka, Y., and Morikawa, K. 2004. Structural basis for channelling mechanism of a fatty acid beta-oxidation multienzyme complex. EMBO J., 23: 2745–2754.
Joo, H., Lin, Z., and Arnold, F.H. 1999. Laboratory evolution of peroxide-mediated cytochrome P450 hydroxylation. Nature, 399: 670–673.
Jorgensen, K., Rasmussen, A.V., Morant, M., Nielsen, A.H., Bjarnholt, N., Zagrobelny, M., Bak, S., and Moller, B.L. 2005. Metabolon formation and metabolic channeling in the biosynthesis of plant natural products. Curr. Opin. Plant Biol., 8: 280–291.
Kahn, R.A., Fahrendorf, T., Halkier, B.A., and Moller, B.L. 1999. Substrate specificity of the cytochrome P450 enzymes CYP79A1 and CYP71E1 involved in the biosynthesis of the cyanogenic glucoside dhurrin in Sorghum bicolor (L.) Moench. Arch. Biochem. Biophys., 363: 9–18.
Kao, H.P., Abney, J.R., and Verkman, A.S. 1993. Determinants of the translational mobility of a small solute in cell cytoplasm. J. Cell Biol., 120: 175–184.
Kempner, E.S. and Miller, J.H. 1968. The molecular biology of Euglena gracilis. V. Enzyme localization. Exp. Cell Res., 51: 150–156.
Kerfeld, C.A., Sawaya, M.R., Tanaka, S., Nguyen, C.V., Phillips, M., Beeby, M., and Yeates, T.O. 2005. Protein structures forming the shell of primitive bacterial organelles. Science, 309: 936–938.
Kholodenko, B.N., Westerhoff, H.V., and Cascante, M. 1996. Effect of channelling on the concentration of bulk-phase intermediates as cytosolic proteins become more concentrated. Biochem. J., 313 (3): 921–926.


Kristensen, C., Morant, M., Olsen, C.E., Ekstrom, C.T., Galbraith, D.W., Moller, B.L., and Bak, S. 2005. Metabolic engineering of dhurrin in transgenic Arabidopsis plants with marginal inadvertent effects on the metabolome and transcriptome. Proc. Natl. Acad. Sci. USA, 102: 1779–1784.
Lamb, H.K., van den Hombergh, J.P., Newton, G.H., Moore, J.D., Roberts, C.F., and Hawkins, A.R. 1992. Differential flux through the quinate and shikimate pathways. Implications for the channelling hypothesis. Biochem. J., 284 (1): 181–187.
Leivar, P., Gonzalez, V.M., Castel, S., Trelease, R.N., Lopez-Iglesias, C., Arro, M., Boronat, A., Campos, N., Ferrer, A., and Fernandez-Busquets, X. 2005. Subcellular localization of Arabidopsis 3-hydroxy-3-methylglutaryl-coenzyme A reductase. Plant Physiol., 137: 57–69.
Lindbladh, C., Rault, M., Hagglund, C., Small, W.C., Mosbach, K., Bulow, L., Evans, C., and Srere, P.A. 1994. Preparation and kinetic characterization of a fusion protein of yeast mitochondrial citrate synthase and malate dehydrogenase. Biochemistry, 33: 11692–11698.
Looger, L.L., Dwyer, M.A., Smith, J.J., and Hellinga, H.W. 2003. Computational design of receptor and sensor proteins with novel functions. Nature, 423: 185–190.
Luby-Phelps, K., Mujumdar, S., Mujumdar, R.B., Ernst, L.A., Galbraith, W., and Waggoner, A.S. 1993. A novel fluorescence ratiometric method confirms the low solvent viscosity of the cytoplasm. Biophys. J., 65: 236–242.
Maher, A.D., Kuchel, P.W., Ortega, F., de Atauri, P., Centelles, J., and Cascante, M. 2003. Mathematical modelling of the urea cycle. A numerical investigation into substrate channelling. Eur. J. Biochem., 270: 3953–3961.
Malaisse, W., Zhang, Y., and Sener, A. 2004. Enzyme-to-enzyme channeling in the early steps of glycolysis. Endocrine, 24: 105–109.
Marston, F.A. 1986. The purification of eukaryotic polypeptides synthesized in Escherichia coli. Biochem. J., 240: 1–12.
Marvin, J.S. and Hellinga, H.W. 2001. Conversion of a maltose receptor into a zinc biosensor by computational design. Proc. Natl. Acad. Sci. USA, 98: 4955–4960.
Matchett, W.H. 1974. Indole channeling by tryptophan synthase of Neurospora. J. Biol. Chem., 249: 4041–4049.
Mendes, P., Kell, D.B., and Westerhoff, H.V. 1992. Channelling can decrease pool size. Eur. J. Biochem., 204: 257–266.
Minton, A.P. 2001. The influence of macromolecular crowding and macromolecular confinement on biochemical reactions in physiological media. J. Biol. Chem., 276: 10577–10580.
Moller, B.L. and Conn, E.E. 1979. The biosynthesis of cyanogenic glucosides in higher plants. N-Hydroxytyrosine as an intermediate in the biosynthesis of dhurrin by Sorghum bicolor (Linn) Moench. J. Biol. Chem., 254: 8575–8583.
Moller, B.L. and Conn, E.E. 1980. The biosynthesis of cyanogenic glucosides in higher plants. Channeling of intermediates in dhurrin biosynthesis by a microsomal system from Sorghum bicolor (Linn) Moench. J. Biol. Chem., 255: 3049–3056.
Netzer, W.J. and Hartl, F.U. 1997. Recombination of protein domains facilitated by co-translational folding in eukaryotes. Nature, 388: 343–349.
Ostermeier, M. 2005. Engineering allosteric protein switches by domain insertion. Protein Eng. Des. Sel., 18: 359–364.
Otey, C.R., Landwehr, M., Endelman, J.B., Hiraga, K., Bloom, J.D., and Arnold, F.H. 2006. Structure-guided recombination creates an artificial family of cytochromes P450. PLoS Biol., 4: e112, 789–798.
Ovadi, J. and Srere, P.A. 2000. Macromolecular compartmentation and channeling. Int. Rev. Cytol., 192: 255–280.
Pandya, M.J., Cerasoli, E., Joseph, A., Stoneman, R.G., Waite, E., and Woolfson, D.N. 2004. Sequence and structural duality: designing peptides to adopt two stable conformations. J. Am. Chem. Soc., 126: 17016–17024.


Panicot, M., Minguet, E.G., Ferrando, A., Alcazar, R., Blazquez, M.A., Carbonell, J., Altabella, T., Koncz, C., and Tiburcio, A.F. 2002. A polyamine metabolon involving aminopropyl transferase complexes in Arabidopsis. Plant Cell, 14: 2539–2551.
Pedelacq, J.D., Piltch, E., Liong, E.C., Berendzen, J., Kim, C.Y., Rho, B.S., Park, M.S., Terwilliger, T.C., and Waldo, G.S. 2002. Engineering soluble proteins for structural genomics. Nat. Biotechnol., 20: 927–932.
Petersson, G. 1991. No convincing evidence is available for metabolic channeling between enzymes forming dynamic complexes. J. Theor. Biol., 152: 65–69.
Pfeifer, B.A. and Khosla, C. 2001. Biosynthesis of polyketides in heterologous hosts. Microbiol. Mol. Biol. Rev., 65: 106–118.
Prachayasittikul, V., Ljung, S., Isarankura-Na-Ayudhya, C., and Bulow, L. 2006. NAD(H) recycling activity of an engineered bifunctional enzyme galactose dehydrogenase/lactate dehydrogenase. Int. J. Biol. Sci., 2: 10–16.
Rault, M., Giudici-Orticoni, M.T., Gontero, B., and Ricard, J. 1993. Structural and functional properties of a multi-enzyme complex from spinach chloroplasts. 1. Stoichiometry of the polypeptide chains. Eur. J. Biochem., 217: 1065–1073.
Schlichting, I., Yang, X.J., Miles, E.W., Kim, A.Y., and Anderson, K.S. 1994. Structural and kinetic analysis of a channel-impaired mutant of tryptophan synthase. J. Biol. Chem., 269: 26591–26593.
Schmidt, M., Hanna, J., Elsasser, S., and Finley, D. 2005. Proteasome-associated proteins: regulation of a proteolytic machine. Biol. Chem., 386: 725–737.
Sellers, J.W. and Struhl, K. 1989. Changing fos oncoprotein to a jun-independent DNA binding protein with GCN4 dimerization specificity by swapping “leucine zippers”. Nature, 341: 74–76.
Shatalin, K., Lebreton, S., Rault-Leonardon, M., Velot, C., and Srere, P.A. 1999. Electrostatic channeling of oxaloacetate in a fusion protein of porcine citrate synthase and porcine mitochondrial malate dehydrogenase. Biochemistry, 38: 881–889.
Skretas, G. and Wood, D.W. 2005. Regulation of protein activity with small-molecule-controlled inteins. Protein Sci., 14: 523–532.
Srere, P.A. 1987. Complexes of sequential metabolic enzymes. Ann. Rev. Biochem., 56: 89–124.
Srere, P.A., Sumegi, B., and Sherry, A.D. 1987. Organizational aspects of the citric acid cycle. Biochem. Soc. Symp., 54: 173–178.
Sullivan, D.T., MacIntyre, R., Fuda, N., Fiori, J., Barrilla, J., and Ramizel, L. 2003. Analysis of glycolytic enzyme co-localization in Drosophila flight muscle. J. Exp. Biol., 206: 2031–2038.
Suss, K.H., Arkona, C., Manteuffel, R., and Adler, K. 1993. Calvin cycle multienzyme complexes are bound to chloroplast thylakoid membranes of higher plants in situ. Proc. Natl. Acad. Sci. USA, 90: 5514–5518.
Tian, L. and Dixon, R.A. 2006. Engineering isoflavone metabolism with an artificial bifunctional enzyme. Planta, 224: 496–507.
van den Berg, B., Ellis, R.J., and Dobson, C.M. 1999. Effects of macromolecular crowding on protein folding and aggregation. EMBO J., 18: 6927–6933.
Wagner, G. and Hrazdina, G. 1984. Endoplasmic reticulum as a site of phenylpropanoid and flavonoid metabolism in Hippeastrum. Plant Physiol., 74: 901–906.
Waldo, G.S., Standish, B.M., Berendzen, J., and Terwilliger, T.C. 1999. Rapid protein-folding assay using green fluorescent protein. Nat. Biotechnol., 17: 691–695.
Welch, G.R. and Gaertner, F.H. 1980. Enzyme organization in the polyaromatic-biosynthetic pathway: the arom conjugate and other multienzyme systems. Curr. Top. Cell Regul., 16: 113–162.
Welch, G.R. and Easterby, J.S. 1994. Metabolic channeling versus free diffusion: transition-time analysis. Trends Biochem. Sci., 19: 193–197.
Winkel, B.S. 2004. Metabolic channeling in plants. Annu. Rev. Plant Biol., 55: 85–107.
Winkel-Shirley, B. 1999. Evidence for enzyme complexes in phenylpropanoid and flavonoid pathways. Physiol. Plant., 107: 142–149.


Wu, X.M., Gutfreund, H., Lakatos, S., and Chock, P.B. 1991. Substrate channeling in glycolysis: a phantom phenomenon. Proc. Natl. Acad. Sci. USA, 88: 497–501.
Yanofsky, C. 1960. The tryptophan synthetase system. Bacteriol. Rev., 24: 221–245.
Zalokar, M. 1960. Cytochemistry of centrifuged hyphae of Neurospora. Exp. Cell Res., 19: 114–132.
Zimmerman, S.B. and Trach, S.O. 1991. Estimation of macromolecule concentrations and excluded volume effects for the cytoplasm of Escherichia coli. J. Mol. Biol., 222: 599–620.

12 Practical Pathway Engineering: Demonstration in Integrating Tools

Sung Kuk Lee, Ulsan National Institute of Science and Technology (UNIST)
Jay D. Keasling, University of California

12.1 Introduction
12.2 Gene Discovery
12.3 Protein Engineering
12.4 Metabolic Pathway Regulation
Promoters • Regulation of Multiple Genes in Operons • Expression Vectors
12.5 Pathway Optimization Using Functional Genomics
Comparative Genome Analysis • Transcriptome Analysis • Proteome Analysis • Metabolome Analysis • Fluxome Analysis
12.6 Improvement of Cellular Properties
Extension of the Substrate Range • Elimination of Substrate and Product Toxicity • Improvement of Global Regulatory Functions
12.7 Perspective
References

12.1 Introduction

Pathway engineering is the use of recombinant DNA technology to modify existing, or to introduce entirely new, metabolic pathways and regulatory systems within cells in order to improve their capacity to overproduce a desired molecule [10,15,26,41,43]. By channeling metabolic pathways in organisms toward a desired metabolite through rational introduction, modification, and removal of genes, a wide range of valuable products can be produced. Unfortunately, many desired compounds are produced only in small amounts in their native or engineered hosts. In the past, the improvement of metabolic pathways has been done primarily by evolutionary breeding methods or repeated rounds of mutagenesis and selection of a desired phenotype [61,66]. Recent advances in recombinant DNA technology allow more rational approaches to be applied to metabolic pathway engineering. Productivity of the desired metabolite can be increased by carefully balancing the expression of the genes and metabolic flux, both within the metabolic pathway and between the pathway and the host’s native metabolism. Pathway engineering also involves improvement of the overall cellular physiology, extension of the host’s substrate range, and deletion or reduction of by-product formation [24]. In this chapter, we review the most important considerations for successful metabolic pathway engineering and illustrate the concepts with selected examples.


12.2 Gene Discovery

One of the most significant advances in the discovery of genes encoding the enzymes that make a desired metabolic product has been the ability to sequence entire microbial genomes as well as the genomes of important plants and animals. The first whole genome sequence of the bacterium Haemophilus influenzae was reported in July 1995 [14]. To date, according to the National Center for Biotechnology Information (NCBI), 540 microbial genomes have been sequenced and over 800 more are registered as being in progress. The genome sequencing efforts have greatly enhanced our ability to identify genes from different organisms that can work together in order to design and reconstruct metabolic pathways. Despite the ever-increasing number of sequenced genomes and specific genes encoding various enzymes, plant gene sequence information is still so limited that cloning of biosynthetic pathway genes for plant-derived natural products remains a labor-intensive and time-consuming area of research. One example is the morphine biosynthesis pathway, which consists of more than 15 enzymatic steps. Although all of the enzymes in the pathway have been characterized, the genes encoding only eight of the enzymes have been identified [45]. Because the genes encoding plant metabolic pathways are not located together in the genome, the entire genome must potentially be sequenced to identify all of the genes in the pathway and avoid the time-consuming process of identifying the genes one by one.

12.3 Protein Engineering

Even after the genes encoding the enzymes in a biosynthetic pathway have been identified in the natural producer, one or more of the enzymes may not be functional in the heterologous host [73]. Also, protein properties such as yield, kinetics, substrate specificity, reaction selectivity, and thermal stability may need to be improved or altered in order to increase target compound production. There is a need to either find or redesign enzymes that will catalyze the production of the desired molecule with the desired kinetics in the heterologous host. Protein engineering is a widely accepted methodology to construct proteins with desired functions in the production host. There are two general strategies for protein engineering. The first is directed evolution [4,51,55], where an existing protein is subjected to random mutagenesis and the resulting mutants are screened for desired qualities. The advantage of directed evolution is that structural knowledge of the target protein is not required. However, a key requirement for the success of this strategy is the availability and quality of a high-throughput screen (HTS). Since many metabolic products do not fluoresce or have a particular color, or do not provide a selective benefit to the host, HTSs are often not available for improved production of the desired products [73]. Thus, laboratory evolution may not always be appropriate or possible for protein engineering. The other protein engineering strategy is rational design (a structure-based computational method), which uses detailed knowledge of the structure and function of the protein to identify and make the changes to the amino acid sequence that effect the desired change in the protein structure and/or function [34,35,69]. When it works, this approach can be relatively inexpensive and more rapid than laboratory evolution, as screening or selection for the desired activity is not necessary. However, the drawback of this technique is that detailed structural knowledge of a protein is often unavailable, and even when it is available, it can be very difficult to predict the effects of various mutations. These strategies are not mutually exclusive. Indeed, an approach that combines the best of rational design and directed evolution will probably represent the most powerful approach for development of desired enzymes for a metabolic pathway. Moreover, the exponential growth in protein structure and function information as well as advancements in computer programs and HTSs will greatly expand the capabilities of protein engineering. Recently, a newly designed divergent evolution approach [72] has been

Practical Pathway Engineering—Demonstration in Integrating Tools

proposed by Keasling and coworkers and does not require HTSs. The methodology uses a mathematical model to mimic the mechanisms of divergent molecular evolution of enzymes for creation of new, biosynthetic enzymes. Using this methodology, they constructed seven specific and active terpene synthases, each catalyzing the synthesis of one or a few very different products, from a single, promiscuous terpene synthase (Figure 12.1). When the redesigned enzymes were incorporated into a heterologous host optimized for production of high levels of the terpene precursor, the microbial host was capable of producing the desired final products. Although they focused on a promiscuous enzyme (which may be one of the easiest test cases), the study demonstrates the feasibility of exploiting the evolvability of an enzyme scaffold to design enzymes with more specificity and higher activity or to create new molecules that do not exist in nature. There is great promise that the use of multiple redesigned or new enzymes in novel or reconstructed metabolic pathways will enable the efficient production of natural and unnatural desired compounds.
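
The mutagenesis-and-screen cycle of directed evolution described in this section can be sketched as a toy simulation. This is only an illustration under stated assumptions: the ten-residue sequences, the `fitness` score (a stand-in for a high-throughput screen that counts matches to a made-up target), and the library size and mutation rate are all hypothetical and are not taken from the cited studies.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(seq, rate, rng):
    """Random mutagenesis: each position is redrawn with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)

def fitness(seq, target):
    """Hypothetical screen readout: number of positions matching a made-up target."""
    return sum(a == b for a, b in zip(seq, target))

def directed_evolution(parent, target, rounds=30, library_size=200, rate=0.05, seed=0):
    """Repeat: build a mutant library, screen it, carry the best variant forward."""
    rng = random.Random(seed)
    best = parent
    history = [fitness(best, target)]
    for _ in range(rounds):
        library = [mutate(best, rate, rng) for _ in range(library_size)]
        library.append(best)  # keep the parent so the score never decreases
        best = max(library, key=lambda s: fitness(s, target))
        history.append(fitness(best, target))
    return best, history

parent = "MKTAYIAKQR"   # hypothetical starting enzyme fragment
target = "MKWAYLAKHR"   # hypothetical optimum; unknown in a real experiment
best, history = directed_evolution(parent, target)
print(best, history[0], history[-1])
```

The sketch also shows why the screen matters: every round calls `fitness` on the whole library, so without a fast assay the loop cannot be run at useful library sizes.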

12.4 Metabolic Pathway Regulation Recently, successful reconstruction of metabolic pathways in microorganisms to produce valuable natural products such as terpenoids [37,54,72,74], polyketides [47] and nonribosomal peptides [70] in metabolically engineered Escherichia coli or Saccharomyces cerevisiae has been reported. In each of

[Figure 12.1 graphic not reproduced: GC-MS chromatograms plotting abundance (%) against retention time (min, 12 to 18) for the wild-type γ-humulene synthase (center) and its variants. Panel titles: β-bisabolene synthase; E-β-farnesene/Z,E-α-farnesene synthase; sibirene synthase; α-longipinene synthase; longifolene synthase; another γ-humulene synthase; optimized α-ylangene production. Each variant panel is annotated with its amino acid substitutions (for example W315P, S484A, A336V, Y566F, M447H), and peaks are numbered 1 to 8.]

Figure 12.1 Divergent evolution of novel sesquiterpene synthases from γ-humulene synthases. Chromatograms show the GC-MS analysis for sesquiterpene production from both wild-type (center) and variants of γ-humulene synthases. All γ-humulene synthase variants were designed based on the systematic remodeling and constructed by site-directed saturation mutagenesis and site-directed mutagenesis. (From Yoshikuni, Y., T. E. Ferrin, and J. D. Keasling, Nature 440, 1078–1082, 2006. With permission.)

Gene Expression Tools for Metabolic Pathway Engineering

these cases, multiple genes encoding the enzymes of the complicated biosynthetic pathways needed to be introduced into the heterologous host, and their expression needed to be coordinated to minimize metabolic burden and the accumulation of toxic intermediates that would decrease final product yields. Coordinating the expression of all of these genes requires the use of a combination of gene expression control elements: promoters and ribosome binding sites (RBSs) of the appropriate strength, mRNA stability elements, riboregulators, etc.

12.4.1 Promoters One of the first considerations in expressing the genes of a biosynthetic pathway is the promoter choice for varying the level of gene expression in individual cells of the culture. The ability to make subtle changes in the expression of enzymes that catalyze the synthesis of desired molecules is important for balancing and optimizing metabolic pathways. Although there are very good expression systems for high-level production of recombinant proteins, simultaneous overexpression of several enzymes from strong promoters may not improve existing pathways but instead stress the organisms by increasing the metabolic burden [23,38]. In most cases, it is sufficient to express the genes encoding a metabolic pathway at relatively low levels, such that only catalytic amounts of the enzymes in the pathway are produced. The ability to fine-tune production of the desired enzymes allows one to balance the metabolic pathway and requires the use of inducible promoters [30,31]. Although regulatable promoters (propionate-inducible [32], salicylate-inducible [71], and tetracycline-inducible [58] promoters) are available, most widely used, inducible promoters (e.g., lac- and PBAD-type promoters) do not provide a wide range of promoter strengths in a continuous manner. Recently, several laboratories have created artificial, constitutive promoters that cover a wide range of gene expression levels [2,22,39,60]. These promoters are of great value for steady-state expression of genes because they have a single expression level without the need for an inducer. On the other hand, the use of regulatable promoters may be better for tuning metabolic pathways and expressing toxic proteins that should be tightly repressed until a certain time point or density of the culture is reached. When tuning the expression of multiple genes encoding the enzymes of a metabolic pathway, it is often desirable to place the genes under the control of independent, inducible promoters. 
This allows one to alter the expression level of one or more genes independently of other genes in the same metabolic pathway or in competing metabolic pathways [29]. However, some pairs of promoters suffer from cross-talk—an inducer of one promoter affects expression from another promoter—making it difficult to simultaneously and independently control the expression of multiple genes. Lee et al. [29] found that cross-talk between two of the most useful promoters in E. coli (the arabinose-inducible araBAD and the lactose-inducible lac promoters) prevents them from being used simultaneously in the same cell over wide ranges of expression levels. They engineered an araBAD promoter expression system that is more compatible with the IPTG-inducible lac promoter system by the directed evolution of AraC, which allows the promoters to be independently regulated in a cell.

12.4.2 Regulation of Multiple Genes in Operons The synthesis of natural or unnatural products in microorganisms usually involves the introduction of a large number of genes encoding the enzymes of a metabolic pathway. Since the number of inducible promoters available is limited, it is necessary to place several genes under control of a single promoter (Figure 12.2), much like native operons in prokaryotes [37]. In operons, balanced expression of multiple genes can be controlled by altering post-transcriptional processes such as mRNA stability and translation initiation. A good example was reported by Pfleger et al. [48] who described a method for tuning the expression of multiple genes within operons by generating libraries of tunable intergenic regions (TIGRs) (Figure 12.3). Balancing expression of three genes in an operon that encodes a heterologous mevalonate

[Figure 12.2 graphic not reproduced: pathway schematic in E. coli DYM1. Mevalonate pathway: acetyl-CoA (A-CoA), acetoacetyl-CoA (AA-CoA), HMG-CoA, mevalonate, Mev-P, Mev-PP, IPP and DMAPP, FPP, amorphadiene. DXP pathway: G3P and pyruvate, DXP, MEP, CDP-ME, CDP-ME2P, ME-2,4cPP, HMB4PP, IPP and DMAPP. Synthetic operon labels: pMevT, pMKPMK, pMevB, pMBI, pMBIS, pSOE4. Gene labels: atoB, HMGS, tHMGR, ERG12, ERG8, MVD1, idi, ispA, dxs, ippHp, ispC, ADS.]

Figure 12.2 Production of amorphadiene via the DXP or mevalonate isoprenoid pathways and depiction of the synthetic operons used in this study. Black triangles represent the PLAC promoter. (From Martin, V. J. J., D. J. Pitera, S. T. Withers, J. D. Newman, and J. D. Keasling, Nature Biotechnology 21, 796–802, 2003. With permission.)

biosynthetic pathway resulted in a seven-fold increase in mevalonate production [48]. In addition, the use of RBSs of different strengths allows control of the yield of protein from a given coding region [11].

12.4.3 Expression Vectors Once the genes encoding a metabolic pathway have been placed under control of an optimal expression system, they can be integrated via a homologous recombination process into the bacterial genome in order to guarantee genetic stability and expression. Alternatively, low-copy plasmids are excellent for expression of heterologous genes when extreme stability and low metabolic burden imposed on the cells are desired [23]. Due to the metabolic burden associated with high plasmid copy number, multi-copy plasmids may not be appropriate for some metabolic engineering applications, even when the objective is to increase the intracellular concentration of an enzyme in order to improve flux through an existing pathway.

12.5 Pathway Optimization Using Functional Genomics The introduction of a heterologous metabolic pathway can have deleterious effects on the host due to the metabolic burden of the introduced pathway, toxicity of the product or metabolic intermediates, improper folding or localization of the enzymes in the metabolic pathway, etc. To maximize product titers/yields, it is important to alleviate, to the extent possible, the negative impacts of the heterologous pathway on the host. Because cellular metabolism is complex, it is often difficult to determine how a heterologous pathway is deleterious to the host and then use directed pathway engineering to alleviate the negative interactions. In the past, such deleterious effects of the introduced or native, up-regulated pathways were alleviated by random mutagenesis of the host and selection for improved producers. However effective this approach may be, little information about the reasons for the improvements is gleaned from such efforts, and therefore little can be learned for future directed pathway engineering projects.

[Figure 12.3 graphic not reproduced. Panel (a): glycolysis feeds acetyl-CoA (AcCoA), which enters either the TCA cycle (OAA, MAL, CIT) or the mevalonate pathway (MEV, IPP, DMAPP, FPP); amorphadiene synthase converts FPP to amorphadiene, a cytochrome P450 yields artemisinic acid, and the final step to artemisinin is marked "?". Panel (b): glycolysis, acetyl-CoA, acetoacetyl-CoA (AtoB), HMG-CoA (HMGS), mevalonate (tHMGR). Panel (c): predicted RNA secondary structures of the two intergenic regions, between atoB and HMGS and between HMGS and tHMGR; the nucleotide sequences are not recoverable from the extracted text.]

Figure 12.3 Optimization of mevalonate production in E. coli using tunable intergenic regions (TIGRs). (a) Biosynthetic pathway of heterologous artemisinin production. (b) Reactions and metabolic intermediates in the top half of the mevalonate-based isopentenyl pyrophosphate biosynthetic pathway (shown in less detail in the dashed box of (a)). AtoB (acetoacetyl-CoA thiolase), HMGS (HMG-CoA synthase), and tHMGR (truncated HMG-CoA reductase). (c) Structure of the two intergenic regions that resulted from the TIGR approach. Dashed boxes represent RBSs. (From Pfleger, B. F., D. J. Pitera, C. D. Smolke, and J. D. Keasling, Nature Biotechnology 24, 1027–1032, 2006. With permission.)

The recently developed, functional genomics techniques that have been employed to examine the expression of all genes and the production (or consumption) of proteins and metabolites can be exploited to better diagnose the problems created by the introduction of the heterologous metabolic pathway. The techniques allow a better understanding of the impact of the heterologous pathway on the host by mapping the cellular effects of genetic modifications at the level of DNA sequence (comparative genomic analysis), mRNA (transcriptome analysis), proteins (proteome analysis), metabolites (metabolome analysis), and fluxes (flux analysis) between strains with the desired phenotype and corresponding wild-type strains. In pathway optimization, therefore, several of these different techniques should be used in parallel to obtain a global understanding of the host and the introduced pathway and to identify and engineer targets to improve production [7] (Figure 12.4).

12.5.1 Comparative Genome Analysis Analysis of the genomes of two or more related organisms that have different functions, products, or product titers can be useful for identifying genes responsible for the differences among the organisms and for creating modified organisms with the desired traits. A good example of comparative genomic analysis was reported by Ohnishi et al. [42] who identified mutations beneficial for increased production of L-lysine in Corynebacterium glutamicum. Sixteen genes known to be involved in production

[Figure 12.4 graphic not reproduced: schematic comparing reference strains and desired strains at the levels of genome, transcripts, proteins, and metabolites. The protein panel outlines a workflow: denature and reduce cysteine bridges; digest four samples (labeled 114, 115, 116, 117); pool; cation exchange cleanup; analyze by LC-MS/MS. The metabolite panel shows a trace against time (min, 15 to 40) with peaks labeled methionine sulfone (ISD), valine, glutamine, lysine, betaine, and arginine.]

Figure 12.4 Analytical techniques employed in functional genomics. Genes and factors conferring desired phenotype can be identified through comparison of DNA sequence, transcript profiles, proteome analysis, metabolite profiling, or flux analyses of strains with the desired phenotype with corresponding reference strains.

of L-lysine were selected and sequenced. As a result, point mutations were identified in five genes, and introduction of three of these mutations into the wild-type strain increased L-lysine productivity two-fold. Because they compared only a limited fraction of the genome (only those genes directly responsible for production of L-lysine), it is likely that several mutations in other genes were present and contributed to the high product titers. Although it might have been too expensive or impractical at the time to sequence the entire genome to find all mutations, continued decreases in the cost and time to sequence whole genomes are making genome-wide sequence comparisons less costly and more routine, enabling one to determine all mutations in an evolved host.
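
The core comparison in such a study, finding point mutations between a producer strain and its wild-type parent, can be sketched in a few lines. The sketch assumes the two sequences are already aligned, have equal length, and differ only by substitutions; the gene names and sequences are invented for illustration and are not data from Ohnishi et al.

```python
def point_mutations(reference, variant):
    """Return (position, ref_base, var_base) per substitution; positions are 1-based."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, r, v)
            for i, (r, v) in enumerate(zip(reference, variant)) if r != v]

# Hypothetical gene fragments; names and sequences are made up for illustration.
wild_type = {"geneA": "ATGGCTAAAGGT", "geneB": "ATGCCGTTAGAC"}
producer = {"geneA": "ATGGCTAAAGGT", "geneB": "ATGCCGTCAGAC"}

for gene in wild_type:
    for pos, ref, var in point_mutations(wild_type[gene], producer[gene]):
        print(f"{gene}: {ref}{pos}{var}")
```

Real comparisons also have to handle insertions, deletions, and sequencing errors, which need a proper alignment step rather than a position-by-position scan.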

12.5.2 Transcriptome Analysis The transcriptome is the collection of all mRNA molecules present in one cell or a population of cells. Unlike the genome, which is roughly fixed for a given cell, the transcriptome can vary dramatically with external environmental conditions. DNA arrays are the most commonly used tool for genome-wide transcription analysis, and the most direct application of DNA arrays to metabolic engineering is the identification of discriminatory genes characteristic of desired physiological states, such as those contributing to high productivity [62]. Mutations cannot be identified directly by genome-wide transcription analysis, but transcriptome analysis allows mapping of all changes at the mRNA level, which should be diagnostic of a particular genetic or regulatory change. The technique has been used in several metabolic engineering studies. For example, it was applied to ethanol-tolerant E. coli strains [16]. The transcript profile of the ethanol-tolerant strain indicated increased glycine metabolism, a loss of function of a regulatory protein, and increased metabolism of serine and pyruvate. Similar work in a mutant strain of S. cerevisiae capable of importing galactose at a rate three-fold higher than wild-type strains found that PGM2, which encodes the major isoenzyme of phosphoglucomutase, was slightly up-regulated relative to that in wild-type strains [6,44]. By overexpressing PGM2, the galactose uptake rate could be increased by 70% compared to that of the reference strain [6,44]. One disadvantage of the application of transcript profiling to metabolic engineering is that mRNA levels are not always proportional to the levels of the proteins they encode. Thus, it is necessary to also profile biomolecules that are more indicative of the physiology of the cell, namely the proteome, metabolome, and fluxome.
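
Identifying discriminatory genes from array data usually starts with a fold-change comparison between the strain of interest and a reference strain. The following is a minimal sketch under assumptions: the signal intensities are invented, the pseudocount and the two-fold threshold are arbitrary illustrative choices, and a real analysis would add normalization, replicates, and statistical testing.

```python
import math

def log2_fold_changes(reference, mutant, pseudocount=1.0):
    """log2(mutant/reference) per gene; the pseudocount avoids division by zero."""
    return {g: math.log2((mutant[g] + pseudocount) / (reference[g] + pseudocount))
            for g in reference if g in mutant}

def discriminatory_genes(fold_changes, threshold=1.0):
    """Genes whose absolute log2 fold change meets the threshold (two-fold by default)."""
    return sorted(g for g, lfc in fold_changes.items() if abs(lfc) >= threshold)

# Invented signal intensities, for illustration only.
reference = {"gene1": 99.0, "gene2": 399.0, "gene3": 49.0}
mutant = {"gene1": 399.0, "gene2": 409.0, "gene3": 11.5}

lfc = log2_fold_changes(reference, mutant)
print(discriminatory_genes(lfc))
```

As the PGM2 case above suggests, a fixed threshold can miss genes that are only slightly up-regulated yet still matter, so the cutoff is a judgment call rather than a rule.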

12.5.3 Proteome Analysis The proteome of a given cell is the collection of proteins produced by it. Proteome analysis, the large-scale analysis of all, or most, proteins in an organism [46], potentially allows one to understand gene regulation better than transcriptome analysis, but it is one step away from the direct effects of genetic changes. The changes in the protein levels in different mutants or under different environmental conditions can be determined and used to identify target enzymes/proteins for further manipulation [20]. Most commonly, the complex protein mixture is separated using either two-dimensional gel electrophoresis or liquid chromatography prior to identification and quantification of individual proteins using mass spectrometry [1,17,59]. Lee and coworkers examined variations in protein profiles of E. coli in response to the overproduction of human leptin, a serine-rich (11.6% of total amino acids) protein, using two-dimensional gel electrophoresis. Based on the information gleaned from the proteomics analysis, they coexpressed cysK, encoding cysteine synthase, with the leptin gene and successfully enhanced both leptin productivity (two-fold) and the host cell growth rate [19]. Using a quantitative shotgun proteomics technique, Lee et al. [28] examined the proteomic changes in E. coli engineered to express seven to nine genes for the degradation of cis-1,2-dichloroethylene (cis-DCE). They found that the metabolic engineering that leads to enhanced aerobic degradation of cis-DCE and reduced toxicity from cis-DCE epoxide resulted in enhanced synthesis of glutathione coupled with a stress response to reactive oxygen species and repression of enzymes involved in fatty acid synthesis, gluconeogenesis, and the tricarboxylic acid cycle [28].

12.5.4 Metabolome Analysis The metabolome is the collection of all metabolites in an organism, which are the end products of gene expression [8,9]. Metabolome analysis covers the quantification of intracellular and extracellular metabolite concentrations in the response of living systems to physiological stimuli or genetic modification. Analysis of the metabolome might aid inverse metabolic engineering by giving insight into the metabolic function of mutated genes by comparison with a reference strain. Since the metabolism of an organism directly impacts synthesis of macromolecules and thus the physiology of the cells, metabolomics should yield a more complete picture of the impact of pathway engineering on the cell. With the aid of sophisticated nuclear magnetic resonance (NMR), gas chromatography-mass spectrometry (GC-MS), capillary electrophoresis-mass spectrometry (CE-MS), and liquid chromatography-mass spectrometry (LC-MS), high-throughput quantitative analysis of metabolites has become possible [33]. Its use for diagnosing changes to the host cell has yet to be demonstrated.

12.5.5 Fluxome Analysis The amount of material that flows through a cell’s metabolic pathways is perhaps the largest determining factor in the growth and physiology of the host and in the productivity of the desired metabolite. Determination of as many fluxes as possible is the goal of fluxome analysis, and the result of this analysis is potentially the most insightful for improving metabolic pathways. By determining how the cell’s metabolic fluxes change in response to environmental or genetic perturbation, one can begin to determine the flexibility of the network to genetic changes that might improve product titers [5,25]. Metabolic flux analysis can be posed in several ways: as an underdetermined, exactly determined, or overdetermined problem [63]. All methods rely on knowing the underlying metabolic network, which has become easier to obtain with advances in genome sequencing and annotation. Underdetermined analysis of metabolic fluxes typically relies on measuring the inputs (substrate uptake rates) and outputs (biomass composition and growth rate, synthesis rates of extracellular products, etc.), posing an objective function, and using an optimization routine to calculate the intracellular metabolic fluxes [52,67,68]. Exactly determined methods typically rely on the same types of information but require the determination of one or more additional fluxes or the elimination of unknowns to make the system solvable. Overdetermined methods typically use stable isotopes (e.g., 13C) to trace the isotope transition from substrate to product (typically, but not exclusively, proteinogenic amino acids) [21,56,64,65]. In addition to its other benefits, isotopomer analysis allows one to determine the error in specific metabolic fluxes. 
The major challenge in fluxome analysis is that flux data rarely reveal a direct engineering target, primarily because fluxes are controlled at multiple levels (e.g., transcription, translation, mRNA degradation, protein activation/inactivation, and allosteric control of enzymes) and because changes in the expression of a particular enzyme can have dramatic impacts on the metabolic network and in unpredictable ways [40]. Thus, metabolic flux analysis is most often used to document changes in metabolic fluxes after a particular genetic or environmental change has been effected. The challenge going forward will be to integrate the data from the various omics methods into a coherent representation of the cell to allow one to predict what genes should be changed to effect the desired impact on the productivity of the desired metabolite [57].
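
The exactly determined case can be illustrated with a small worked example. At steady state the stoichiometric matrix S and flux vector v satisfy S·v = 0; once enough fluxes are measured, the remaining ones follow from a square linear system. The five-reaction network below (uptake v1, branch reactions v2 and v3, product secretion v4, biomass drain v5), the measured values, and the helper `solve` are all assumptions made for this sketch and do not come from the cited work.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("system is singular: not exactly determined")
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[-1] for row in M]

# Hypothetical network: v1: -> A, v2: A -> B, v3: A -> C, v4: B ->, v5: C ->
reactions = ["v1", "v2", "v3", "v4", "v5"]
S = [
    [1, -1, -1, 0, 0],   # mass balance on metabolite A
    [0, 1, 0, -1, 0],    # mass balance on metabolite B
    [0, 0, 1, 0, -1],    # mass balance on metabolite C
]
measured = {"v1": 10, "v4": 3}   # invented uptake and secretion rates

# Split S v = 0 into unknown and measured parts: S_u v_u = -S_m v_m
unknown = [r for r in reactions if r not in measured]
A = [[row[reactions.index(r)] for r in unknown] for row in S]
b = [-sum(row[reactions.index(r)] * v for r, v in measured.items()) for row in S]

fluxes = dict(zip(unknown, solve(A, b)))
print({name: float(value) for name, value in fluxes.items()})
```

With fewer measurements the system becomes underdetermined and `solve` raises an error, which is where an objective function and an optimization routine take over; with more measurements than unknowns the redundancy can be used to estimate errors.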

12.6 Improvement of Cellular Properties Yeast, bacterial, animal, and plant cells can be engineered into excellent and environmentally friendly “factories” for commercially interesting chemicals. For cost savings and high production, it is important to improve the biological cell factories by manipulating enzymatic, transport and regulatory functions that are not directly related to the engineered metabolic pathways. These cell properties include extension of the substrate range, elimination of substrate and product toxicity, and improvement of global

regulatory functions. The introduction of recombinant DNA technology has allowed a more directed intervention into the genetics of the production hosts.

12.6.1 Extension of the Substrate Range To reduce the high cost of industrial biosynthetic production of useful compounds, it is desirable to increase the use of inexpensive and widely available substrates. Sugars (from sugarcane and sugar beets) and starches (from corn and root crops) have been used as substrates for ethanol fermentation by microbial processes. Recently, attention has shifted from starch to cellulose and hemicellulose (from wood and plants), because they are the most abundant source of carbohydrates in biomass for the production of bioenergy and biomaterials [53]. The major fermentable sugars derived from cellulose and hemicellulose hydrolysis are glucose and xylose [75]. Unlike glucose, xylose cannot be fermented by S. cerevisiae, the traditional ethanol producer. Through metabolic engineering, S. cerevisiae has been engineered to ferment xylose to ethanol [61]. In another example, lactose is abundant in milk and a major constituent of whey, a by-product of cheese production. Some attempts to use whey as a substrate for biotech processes have been made by expressing either lactose permease plus β-galactosidase from Kluyveromyces [12] or a secreted β-galactosidase from Aspergillus [27] in order to convert lactose to the fermentable sugars galactose and glucose.

12.6.2 Elimination of Substrate and Product Toxicity In many biological production processes, high concentrations of substrate or product are inhibitory or toxic to the enzymatic or cellular catalyst [18,36]. Production stops when the product concentration reaches an inhibitory level. The same problem is true for toxic substrates. The general solution for such inhibitory phenomena is to maintain the concentrations of the inhibitors as low as possible by altering and improving process conditions. The product concentration can be kept low inside the cell by transporting the final products out of the cell (by overexpressing native or engineered transporters) or by extracting the product from the medium. In the case of toxic intermediates, balancing production and consumption of the intermediate without significantly slowing product formation is essential [49,50].
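
Why balancing production and consumption of an intermediate matters can be seen in a simple kinetic sketch. It assumes, purely for illustration, that the intermediate I is supplied at a constant rate `v_in` and consumed by a downstream enzyme with Michaelis-Menten kinetics, so dI/dt = v_in - vmax·I/(km + I). The kinetic form, the parameter values, and the function names are assumptions and are not taken from the cited studies.

```python
def steady_state_intermediate(v_in, vmax, km):
    """Steady-state level of I, or None if supply exceeds downstream capacity."""
    if v_in >= vmax:
        return None  # no steady state: the intermediate keeps accumulating
    return km * v_in / (vmax - v_in)

def simulate(v_in, vmax, km, t_end=200.0, dt=0.01):
    """Forward-Euler integration of dI/dt = v_in - vmax*I/(km + I), starting at I = 0."""
    level = 0.0
    for _ in range(int(t_end / dt)):
        level += dt * (v_in - vmax * level / (km + level))
    return level

# Arbitrary units; values chosen only to show the two regimes.
balanced = steady_state_intermediate(v_in=1.0, vmax=2.0, km=0.5)
overloaded = steady_state_intermediate(v_in=3.0, vmax=2.0, km=0.5)
print(balanced, overloaded, simulate(1.0, 2.0, 0.5))
```

In this model the intermediate stays bounded only while the upstream supply is below the downstream capacity, and its level rises steeply as `v_in` approaches `vmax`, which is one way to read the need to tune relative enzyme expression.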

12.6.3 Improvement of Global Regulatory Functions Reprogramming the cell for improved product formation or tolerance to a particular product or environment can be an arduous task, typically involving the manipulation of several to tens to hundreds of genes. Using the typical, single-gene modification methods such a task can be daunting, if not impossible. To enable multigene modifications to a host cell, an alternative approach for reprogramming gene transcription by altering global transcriptional regulators was proposed by Stephanopoulos and coworkers [3]. This cellular engineering approach, termed global transcription machinery engineering (gTME), generates diversity at the transcriptional level by altering key proteins that regulate the transcription of many genes simultaneously. Using this technique, Stephanopoulos and coworkers improved ethanol tolerance and production by S. cerevisiae [3].

12.7 Perspective Metabolic pathway engineering has been used to improve the production of existing metabolites and enable the production of new metabolites. It is now a rapidly growing area with great potential to impact industrial biocatalysis. Rapid advances in recombinant DNA technology, functional genomics, analytical technologies, the design of artificial biological systems and the understanding of their natural counterparts, known as synthetic biology [13], will extend the applications of pathway

engineering. In the future, it will be a powerful tool to synthesize structurally diverse and complex chemicals and useful compounds from abundant renewable biomass [53]. Thus, metabolic pathway engineering will have a significant impact on society in terms of the production of fuels, chemicals, materials and novel drugs.

References 1. Aebersold, R. and M. Mann. 2003. Mass spectrometry-based proteomics. Nature 422:198–207. 2. Alper, H., C. Fischer, E. Nevoigt, and G. Stephanopoulos. 2005. Tuning genetic control through promoter engineering. Proceedings of the National Academy of Sciences of the United States of America 102: 12678–12683. 3. Alper, H., J. Moxley, E. Nevoigt, G. R. Fink, and G. Stephanopoulos. 2006. Engineering yeast transcription machinery for improved ethanol tolerance and production. Science 314:1565–1568. 4. Arnold, F. H. 2001. Combinatorial and computational challenges for biocatalyst design. Nature 409:253–257. 5. Becker, J., C. Klopprogge, O. Zelder, E. Heinzle, and C. Wittmann. 2005. Amplified expression of fructose 1,6-bisphosphatase in Corynebacterium glutamicum increases in vivo flux through the pentose phosphate pathway and lysine production on different carbon sources. Applied and Environmental Microbiology 71:8587–8596. 6. Bro, C., S. Knudsen, B. Regenberg, L. Olsson, and J. Nielsen. 2005. Improvement of galactose uptake in Saccharomyces cerevisiae through overexpression of phosphoglucomutase: Example of transcript analysis as a tool in inverse metabolic engineering. Applied and Environmental Microbiology 71:6465–6472. 7. Bro, C. and J. Nielsen. 2004. Impact of ‘ome’ analyses on inverse metabolic engineering. Metabolic Engineering 6:204–211. 8. Buchholz, A., J. Hurlebaus, C. Wandrey, and R. Takors. 2002. Metabolomics: quantification of intracellular metabolite dynamics. Biomolecular Engineering 19:5–15. 9. Burja, A. M., S. Dhamwichukorn, and P. C. Wright. 2003. Cyanobacterial postgenomic research and systems biology. Trends in Biotechnology 21:504–511. 10. Chotani, G., T. Dodge, A. Hsu, M. Kumar, R. LaDuca, D. Trimbur, W. Weyler, and K. Sanford. 2000. The commercial production of chemicals using pathway engineering. Biochimica Et Biophysica Acta-Protein Structure and Molecular Enzymology 1543:434–455. 11. Desmit, M. H. and J. Vanduin. 1994. 
Translational initiation on structured messengers: another role for the Shine-Dalgarno interaction. Journal of Molecular Biology 235:173–184. 12. Domingues, L., J. A. Teixeira, and N. Lima. 1999. Construction of a flocculent Saccharomyces cerevisiae fermenting lactose. Applied Microbiology and Biotechnology 51:621–626. 13. Drubin, D. A., J. C. Way, and P. A. Silver. 2007. Designing biological systems. Genes & Development 21:242–254. 14. Fleischmann, R. D., M. D. Adams, O. White, R. A. Clayton, E. F. Kirkness, A. R. Kerlavage, C. J. Bult, J. F. Tomb, B. A. Dougherty, J. M. Merrick, K. Mckenney, G. Sutton, W. Fitzhugh, C. Fields, J. D. Gocayne, J. Scott, R. Shirley, L. I. Liu, A. Glodek, J. M. Kelley, J. F. Weidman, C. A. Phillips, T. Spriggs, E. Hedblom, M. D. Cotton, T. R. Utterback, M. C. Hanna, D. T. Nguyen, D. M. Saudek, R. C. Brandon, L. D. Fine, J. L. Fritchman, J. L. Fuhrmann, N. S. M. Geoghagen, C. L. Gnehm, L. A. Mcdonald, K. V. Small, C. M. Fraser, H. O. Smith, and J. C. Venter. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496–512. 15. Flores, N., J. Xiao, A. Berry, F. Bolivar, and F. Valle. 1996. Pathway engineering for the production of aromatic compounds in Escherichia coli. Nature Biotechnology 14:620–623. 16. Gonzalez, R., H. Tao, J. E. Purvis, S. W. York, K. T. Shanmugam, and L. O. Ingram. 2003. Gene array-based identification of changes that contribute to ethanol tolerance in ethanologenic Escherichia coli: Comparison of KO11 (Parent) to LY01 (resistant mutant). Biotechnology Progress 19:612–623.

17. Gygi, S. P., G. L. Corthals, Y. Zhang, Y. Rochon, and R. Aebersold. 2000. Evaluation of two-dimensional gel electrophoresis-based proteome analysis technology. Proceedings of the National Academy of Sciences of the United States of America 97:9390–9395. 18. Hack, C. J., J. M. Woodley, M. D. Lilly, and J. M. Liddell. 2000. Design of a control system for biotransformation of toxic substrates: toluene hydroxylation by Pseudomonas putida UV4. Enzyme and Microbial Technology 26:530–536. 19. Han, M. J., K. J. Jeong, J. S. Yoo, and S. Y. Lee. 2003. Engineering Escherichia coli for increased productivity of serine-rich proteins based on proteome profiling. Applied and Environmental Microbiology 69:5772–5781. 20. Han, M. J. and S. Y. Lee. 2003. Proteome profiling and its use in metabolic and cellular engineering. Proteomics 3:2317–2324. 21. Hellerstein, M. K. and E. Murphy. 2004. Stable isotope-mass spectrometric measurements of molecular fluxes in vivo: Emerging applications in drug development. Current Opinion in Molecular Therapeutics 6:249–264. 22. Jensen, P. R. and K. Hammer. 1998. Artificial promoters for metabolic optimization. Biotechnology and Bioengineering 58:191–195. 23. Jones, K. L., S.-W. Kim, and J. D. Keasling. 2000. Low-copy plasmids can perform as well as or better than high-copy plasmids for metabolic engineering of bacteria. Metabolic Engineering 2:328–338. 24. Kern, A., E. Tilley, I. S. Hunter, M. Legisa, and A. Glieder. 2007. Engineering primary metabolic pathways of industrial micro-organisms. Journal of Biotechnology 129:6–29. 25. Kiefer, P., E. Heinzle, O. Zelder, and C. Wittmann. 2004. Comparative metabolic flux analysis of lysine-producing Corynebacterium glutamicum cultured on glucose, or fructose. Applied and Environmental Microbiology 70:229–239. 26. Kleerebezem, M. and J. Hugenholtz. 2003. Metabolic pathway engineering in lactic acid bacteria. Current Opinion in Biotechnology 14:232–237. 27. Kumar, V., S. Ramakrishnan, T. T. Teeri, J. K. 
Knowles, and B. S. Hartley. 1992. Saccharomyces cerevisiae cells secreting an Aspergillus niger beta-galactosidase grow on whey permeate. Biotechnology (NY) 10:82–85. 28. Lee, J., L. Cao, S. Y. Ow, M. E. Barrios-Llerena, W. Chen, T. K. Wood, and P. C. Wright. 2006. Proteome changes after metabolic engineering to enhance aerobic mineralization of cis-1,2-dichloroethylene. Journal of Proteome Research 5:1388–1397. 29. Lee, S. K., H. H. Chou, B. F. Pfleger, J. D. Newman, Y. Yoshikuni, and J. D. Keasling. 2007. Directed evolution of AraC for improved compatibility of arabinose and lactos-inducible promoters. Applied and Environmental Microbiology 73: 5711–5715. 30. Lee, S. K. and J. D. Keasling. 2006. Effect of glucose or glycerol as the sole carbon source on gene expression from the Salmonella prpBCDE promoter in Escherichia coli. Biotechnology Progress 22:1547–1551. 31. Lee, S. K. and J. D. Keasling. 2005. A propionate-inducible expression system for enteric bacteria. Applied and Environmental Microbiology 71:6856–6862. 32. Lee, S. K. and J. D. Keasling. 2006. Propionate-regulated high-yield protein production in Escherichia coli. Biotechnology and Bioengineering 93:912–918. 33. Lee, S. Y., D. Y. Lee, and T. Y. Kim. 2005. Systems biotechnology for strain improvement. Trends in Biotechnology 23:349–358. 34. Li, Q. S., U. Schwaneberg, M. Fischer, J. Schmitt, J. Pleiss, S. Lutz-Wahl, and R. D. Schmid. 2001. Rational evolution of a medium chain-specific cytochrome P-450 BM-3 variant. Biochimica Et Biophysica Acta-Protein Structure and Molecular Enzymology 1545:114–121. 35. Looger, L. L., M. A. Dwyer, J. J. Smith, and H. W. Hellinga. 2003. Computational design of receptor and sensor proteins with novel functions. Nature 423:185–190. 36. Marshall, C. T. and J. M. Woodley. 1995. Process synthesis for multistep microbial conversions. Bio-Technology 13:1072–1078.

Practical Pathway Engineering—Demonstration in Integrating Tools

12-13

37. Martin, V. J. J., D. J. Pitera, S. T. Withers, J. D. Newman, and J. D. Keasling. 2003. Engineering a mevalonate pathway in Escherichia coli for production of terpenoids. Nature Biotechnology 21:796–802. 38. Mattanovich, D., B. Gasser, H. Hohenblum, and M. Sauer. 2004. Stress in recombinant protein producing yeasts. Journal of Biotechnology 113:121–135. 39. Mijakovic, I., D. Petranovic, and P. R. Jensen. 2005. Tunable promoters in systems biology. Current Opinion in Biotechnology 16:329–335. 40. Nielsen, J. 2001. Metabolic engineering. Applied Microbiology and Biotechnology 55:263–283. 41. Nielsen, J. 1998. The role of metabolic engineering in the production of secondary metabolites. Current Opinion in Microbiology 1:330–336. 42. Ohnishi, J., S. Mitsuhashi, M. Hayashi, S. Ando, H. Yokoi, K. Ochiai, and M. Ikeda. 2002. A novel methodology employing Corynebacterium glutamicum genome information to generate a new L-lysine-producing mutant. Applied Microbiology and Biotechnology 58:217–223. 43. Ostergaard, S., L. Olsson, and J. Nielsen. 2000. Metabolic engineering of Saccharomyces cerevisiae. Microbiology and Molecular Biology Reviews 64:34–50. 44. Ostergaard, S., C. Roca, B. Ronnow, J. Nielsen, and L. Olsson. 2000. Physiological studies in aerobic batch cultivations of Saccharomyces cerevisiae strains harboring the MEL1 gene. Biotechnology and Bioengineering 68:252–259. 45. Page, J. E. 2005. Silencing nature’s narcotics: metabolic engineering of the opium poppy. Trends in Biotechnology 23:331–333. 46. Pandey, A. and M. Mann. 2000. Proteomics to study genes and genomes. Nature 405:837–846. 47. Pfeifer, B. A., C. C. C. Wang, C. T. Walsh, and C. Khosla. 2003. Biosynthesis of yersiniabactin, a complex polyketide-nonribosomal peptide, using Escherichia coli as a heterologous host. Applied and Environmental Microbiology 69:6698–6702. 48. Pfleger, B. F., D. J. Pitera, C. D Smolke, and J. D. Keasling. 2006. 
Combinatorial engineering of intergenic regions in operons tunes expression of multiple genes. Nature Biotechnology 24:1027–1032. 49. Pfleger, B. F., D. J. Pitera, J. D. Newman, V. J. J. Martin, and J. D. Keasling. 2007. Microbial sensors for small molecules: Development of a mevalonate biosensor. Metabolic Engineering 9:30–38. 50. Pitera, D. J., C. J. Paddon, J. D. Newman, and J. D. Keasling. 2007. Balancing a heterologous mevalonate pathway for improved isoprenoid production in Escherichia coli. Metabolic Engineering 9:193–207. 51. Powell, K. A., S. W. Ramer, S. B. del Cardayre, W. P. C. Stemmer, M. B. Tobin, P. F. Longchamp, and G. W. Huisman. 2001. Directed evolution and biocatalysis. Angewandte Chemie-International Edition 40:3948–3959. 52. Pramanik, J. and J. D. Keasling. 1997. Stoichiometric model of Escherichia coli metabolism: Incorporation of growth-dependent biomass composition and mechanistic engergy requirements. Biotechnology and Bioengineering 50:398–421. 53. Ragauskas, A. J., C. K. Williams, B. H. Davison, G. Britovsek, J. Cairney, C. A. Eckert, W. J. Frederick, J. P. Hallett, D. J. Leak, C. L. Liotta, J. R. Mielenz, R. Murphy, R. Templer, and T. Tschaplinski. 2006. The path forward for biofuels and biomaterials. Science 311:484–489. 54. Ro, D. K., E. M. Paradise, M. Ouellet, K. J. Fisher, K. L. Newman, J. M. Ndungu, K. A. Ho, R. A. Eachus, T. S. Ham, J. Kirby, M. C. Y. Chang, S. T. Withers, Y. Shiba, R. Sarpong, and J. D. Keasling. 2006. Production of the antimalarial drug precursor artemisinic acid in engineered yeast. Nature 440:940–943. 55. Rohlin, L., M. K. Oh, and J. C. Liao. 2001. Microbial pathway engineering for industrial processes: evolution, combinatorial biosynthesis and rational design. Current Opinion in Microbiology 4:330–335. 56. Sauer, U. 2004. High-throughput phenomics: experimental methods for mapping fluxomes. Current Opinion in Biotechnology 15:58–63.

12-14

Gene Expression Tools for Metabolic Pathway Engineering

57. Sauer, U. 2006. Metabolic networks in motion: C-13-based flux analysis. Molecular Systems Biology 2: 1774. 58. Skerra, A. 1994. Use of the tetracycline promoter for the tightly regulated production of a murine antibody fragment in Escherichia coli. Gene 151:131–135. 59. Smolka, M. B., H. L. Zhou, S. Purkayastha, and R. Aebersold. 2001. Optimization of the isotopecoded affinity tag-labeling procedure for quantitative proteome analysis. Analytical Biochemistry 297:25–31. 60. Solem, C. and P. R. Jensen. 2002. Modulation of gene expression made easy. Applied and Environmental Microbiology 68:2397–2403. 61. Sonderegger, M. and U. Sauer. 2003. Evolutionary engineering of Saccharomyces cerevisiae for anaerobic growth on xylose. Applied and Environmental Microbiology 69:1990–1998. 62. Stafford, D. E. and G. Stephanopoulos. 2001. Metabolic engineering as an integrating platform for strain development. Current Opinion in Microbiology 4:336–340. 63. Stephanopoulos, G. N., A. A. Aristidou, and J. Nielsen. 1998. Metabolic Engineering Principles and Methodologies. Academic Press, San Diego. 64. Tang, Y., F. Pingitore, A. Mukhopadhyay, R. Phan, T. C. Hazen, and J. D. Keasling. 2007. Pathway confirmation and flux analysis of central metabolic pathways in Desulfovibrio vulgaris Hildenborough using gas chromatography-mass spectrometry and Fourier transform-ion cyclotron resonance mass spectrometry. Journal of Bacteriology 189:940–949. 65. Tang, Y. J. J., J. S. Hwang, D. E. Wemmer, and J. D. Keasling. 2007. Shewanella oneidensis MR-1 fluxome under various oxygen conditions. Applied and Environmental Microbiology 73:718–729. 66. van Maris, A. J. A., A. A. Winkler, D. Porro, J. P. van Dijken, and J. T. Pronk. 2004. Homofermentative lactate production cannot sustain anaerobic growth of engineered Saccharomyces cerevisiae: Possible consequence of energy-dependent lactate export. Applied and Environmental Microbiology 70:2898–2905. 67. Varma, A., B. W. Boesch, and B. O. Palsson. 1993. 
Biochemical production capabilities of Escherichia coli. Biotechnology and Bioengineering 42:59–73. 68. Varma, A., B. W. Boesch, and B. O. Palsson. 1993. Stoichimetric interpretation of Escherichia coli glucose catabolism under various oxygenation rates. Applied and Environmental Microbiology 59:2465–2473. 69. Voigt, C. A., S. L. Mayo, F. H. Arnold, and Z. G. Wang. 2001. Computational method to reduce the search space for directed protein evolution. Proceedings of the National Academy of Sciences of the United States of America 98:3778–3783. 70. Watanabe, K., K. Hotta, A. P. Praseuth, K. Koketsu, A. Migita, C. N. Boddy, C. C. C. Wang, H. Oguri, and H. Oikawa. 2006. Total biosynthesis of antitumor nonribosomal peptides in Escherichia coli. Nature Chemical Biology 2:423–428. 71. Yen, K. M. 1991. Construction of cloning cartridges for development of expression vectors in Gramnegative bacteria. Journal of Bacteriology 173:5328–5335. 72. Yoshikuni, Y., T. E. Ferrin, and J. D. Keasling. 2006. Designed divergent evolution of enzyme function. Nature 440:1078–1082. 73. Yoshikuni, Y. and J. D. Keasling. 2007. Pathway engineering by designed divergent evolution. Current Opinion in Chemical Biology 11:233–239. 74. Yoshikuni, Y., V. J. J. Martin, T. E. Ferrin, and J. D. Keasling. 2006. Engineering cotton (+)-delta-cadinene synthase to an altered function: Germacrene D-4-ol synthase. Chemistry & Biology 13:91–98. 75. Yu, Z. and H. Zhang. 2003. Ethanol fermentation of acid-hydrolyzed cellulosic pyrolysate with Saccharomyces cerevisiae. Bioresource Technology 93:199–204.

Application of Emerging Technologies to Metabolic Engineering

III

Jay D. Keasling University of California

13. Genome-Wide Technologies: DNA Microarrays, Phenotypic Microarrays, and Proteomics
Seh Hee Jang, Mee-Jung Han, Sang Yup Lee, Jong Hyun Choi, and Xiao Xia Xia
Introduction • DNA Microarray • Phenotypic Microarray • Proteomics • Combined Genome-Wide Analysis • Conclusions and Future Prospects

14. Monitoring and Measuring the Metabolome
Maria Rowena N. Monton and Tomoyoshi Soga
Introduction • Mass Spectrometry • Nuclear Magnetic Resonance Spectroscopy

The richness and versatility of biological systems make them ideally suited to solve some of the world’s most significant challenges, such as converting cheap, renewable resources into energy-rich molecules; producing high-quality, inexpensive drugs to fight disease; and remediating polluted sites. Over the years, significant strides have been made in engineering microorganisms to produce fuels, bulk chemicals, and valuable drugs from inexpensive starting materials; to detect and degrade nerve agents as well as less toxic organic pollutants; and to accumulate metals and reduce radionuclides.

The components needed to engineer the chemistry inside a microbial cell are significantly different from those commonly used to overproduce pharmaceutical proteins. Besides gene expression tools to control metabolism and mathematical models to assess metabolic flux, there is a general need for functional genomics tools to assess the impact of metabolic pathways on the host. Engineering metabolic chemistry generally begins with the introduction of one or more genes encoding enzymes that transform readily available intracellular intermediates into a desired chemical. Unlike the production of pharmaceutical proteins, in which the genes are highly expressed to maximize production of the target protein, the genes encoding the transformational enzymes do not need to be highly expressed; rather, the enzymes need to be produced only in catalytic amounts sufficient to transform the metabolic intermediates into the desired products at an adequate rate. Expression of the desired genes at too high a level will rob the cell of metabolites (nucleotides for the excess mRNA, amino acids for the excess protein, etc.) that might otherwise be used to produce the molecule of interest [11]. In addition to decreasing final product titers, overexpression of the genes in a metabolic pathway may elicit a number of stress responses in the host cell that will decrease cell growth and further decrease product titers. Furthermore, because intermediates of a foreign metabolic pathway can be toxic to a heterologous host [5], which can result in decreased production of the desired final compound, it is essential that the relative levels of the enzymes be coordinated so that no intermediate in the pathway accumulates to toxic levels. The ability to assess the impact of metabolic pathways on cellular metabolism is essential to optimize cells for production of the desired product.
By analogy with computer programming, software development has been made possible only through debugging tools that quickly identify and correct problems in code [2,8]. The development of similar tools for biological debugging would reduce development times for building and optimizing engineered cells. For the development of microbial chemical factories, functional genomics can play the role of debugging routines [1,9,10], because the introduction of a metabolic pathway often elicits the equivalent of a “bug” in the cell’s natural program, which is revealed by various stress responses (e.g., the stringent response and the unfolded protein response) [3,6,7]. These stresses are reflected in the mRNAs and proteins expressed at that time and are thus detectable using DNA arrays or proteomics [3,6,7]. Overexpression of a metabolic pathway might rob the cell of central metabolic intermediates, resulting in decreased cell growth; a change in the profiles of intracellular and extracellular metabolites, which would be reflected in the metabolite profile; and a change in metabolic fluxes, which would be reflected in the flux profile. Imbalances in a metabolic pathway will result in an accumulation of potentially toxic metabolites that inhibit growth [5]. Information from one or more of these techniques can then be used to modify the expression of genes in the metabolic pathway or in the host to improve titers and/or productivity of the final product. Because of the complexity and amount of information that must be collected and correlated, computational bioinformatics methods to manage and analyze the data are essential.

An example of the use of functional genomics to assess the toxicity associated with a heterologous metabolic pathway, and to correct the problems associated with the pathway, was recently demonstrated by Kizer and coworkers [4].
They analyzed the toxicity associated with a heterologous mevalonate-based isopentenyl pyrophosphate biosynthetic pathway that had been engineered into Escherichia coli so that the strain would produce large quantities of isoprenoids. Although the engineered E. coli produced high levels of isoprenoids, further pathway optimization led to an imbalance in carbon flux and the accumulation of the pathway intermediate 3-hydroxy-3-methylglutaryl-coenzyme A (HMG-CoA), which proved to be cytotoxic to E. coli. Using both DNA microarray analysis and metabolite profiling, Kizer and coworkers studied E. coli strains inhibited by the intracellular accumulation of HMG-CoA. They showed that HMG-CoA inhibits fatty acid biosynthesis in the microbial host, leading to a generalized membrane stress. The cytotoxic effects of HMG-CoA accumulation were counteracted by the addition of palmitic acid (16:0) and, to a lesser extent, oleic acid (cis-∆9-18:1) to the growth medium. This work demonstrates the utility of transcriptomic and metabolomic methods for optimizing synthetic biological systems. We anticipate that this type of work will become commonplace as the technologies become easier to use and more affordable.


In Chapters 13 and 14, we describe several emerging technologies and their potential to impact metabolic engineering. Jang and coworkers describe the use of DNA microarrays, phenotypic microarrays, and proteomics techniques for assessing stress responses in the cell. Monton and Soga describe the use of metabolomics to assess changes in cellular metabolism. And Alm and Arkin describe the use of bioinformatics to compile and analyze the large datasets that arise from these techniques. The integration of these debugging technologies will certainly make metabolic engineering more predictable, easier, and faster.

References

1. Bro, C. and J. Nielsen. 2004. Impact of ‘ome’ analyses on inverse metabolic engineering. Metab. Eng., 6:204–11.
2. Campbell, R. V. D. 1952. Presented at the ACM Annual Conference/Annual Meeting, Pittsburgh, Pennsylvania.
3. Gill, R. T., J. J. Valdes, and W. E. Bentley. 2000. A comparative study of global stress gene regulation in response to overexpression of recombinant proteins in Escherichia coli. Metab. Eng., 2:178–89.
4. Kizer, L., D. J. Pitera, B. Pfleger, and J. D. Keasling. 2008. Functional genomics for pathway optimization: application to isoprenoid production. Appl. Environ. Microbiol., 74:3229–41.
5. Martin, V. J. J., D. J. Pitera, S. T. Withers, J. D. Newman, and J. D. Keasling. 2003. Engineering a mevalonate pathway in Escherichia coli for production of terpenoids. Nat. Biotechnol., 21:796–802.
6. Oh, M. K. and J. C. Liao. 2000. DNA microarray detection of metabolic responses to protein overproduction in Escherichia coli. Metab. Eng., 2:201–9.
7. Oh, M. K. and J. C. Liao. 2000. Gene expression profiling by DNA microarrays and metabolic fluxes in Escherichia coli. Biotechnol. Prog., 16:278–86.
8. Orden, A. 1952. Presented at the ACM Annual Conference/Annual Meeting, Pittsburgh, Pennsylvania.
9. Park, J. H., K. H. Lee, T. Y. Kim, and S. Y. Lee. 2007. Metabolic engineering of Escherichia coli for the production of L-valine based on transcriptome analysis and in silico gene knockout simulation. Proc. Natl. Acad. Sci. USA, 104:7797–802.
10. Park, S. J., S. Y. Lee, J. Cho, T. Y. Kim, J. W. Lee, J. H. Park, and M. J. Han. 2005. Global physiological understanding and metabolic engineering of microorganisms based on omics studies. Appl. Microbiol. Biotechnol., 68:567–79.
11. Pfleger, B. F., D. J. Pitera, J. D. Newman, V. J. J. Martin, and J. D. Keasling. 2007. Microbial sensors for small molecules: development of a mevalonate biosensor. Metab. Eng., 9:30–8.

13 Genome-Wide Technologies: DNA Microarrays, Phenotypic Microarrays, and Proteomics

Seh Hee Jang Korea Advanced Institute of Science and Technology

Mee-Jung Han Korea Advanced Institute of Science and Technology

Sang Yup Lee Korea Advanced Institute of Science and Technology

Jong Hyun Choi Korea Advanced Institute of Science and Technology

Xiao Xia Xia Korea Advanced Institute of Science and Technology

13.1 Introduction
13.2 DNA Microarray
Fundamentals of DNA Microarray • Applications of DNA Microarray • Reliability and Reproducibility Issues
13.3 Phenotypic Microarray
Fundamentals of Phenotypic Microarray • Applications of Phenotypic Microarray
13.4 Proteomics
Fundamentals of Proteomics • Protein Mapping • Quantitative Protein Profiling
13.5 Combined Genome-Wide Analysis
13.6 Conclusions and Future Prospects
Acknowledgments
References

13.1 Introduction

Since the completion of the first genome sequence of a microorganism [1], hundreds of genomes have been sequenced and archived at the National Center for Biotechnology Information (NCBI) and in many other databases. Organisms are complex systems and their genomes are immense, so powerful technologies are being developed to meet the demand for analyzing thousands of genes together with their products and functions. Compared with traditional methodologies, which typically addressed one target per experiment, recently developed omics technologies, including transcriptomics, proteomics, metabolomics, and physiomics, allow us to generate large amounts of data. Access to these omics data provides a foundation for in-depth understanding of living organisms.

Genome-wide technologies are based on two-dimensional experimental methods, which were initially developed by O’Farrell et al. [2] for proteomics, by Bochner et al. [3] for phenotypic microarrays, and by Fodor et al. [4] for DNA microarrays. Thanks to these methodologies, experiments are no longer limited to a one-by-one format and can be performed on hundreds to tens of thousands of targets and conditions simultaneously. The effort to miniaturize the experimental scale without loss of validity and reproducibility is also remarkable and has contributed to the widespread use of these technologies. These technologies have now been applied successfully in biological research and are providing new information on global cellular physiology and regulation (Figure 13.1). Together with computational analyses, these high-throughput omics technologies gave birth to systems biology [5]. In recent years, systems biotechnology [6], which allows the development of improved strains and bioprocesses through systems-level analytical approaches, has also appeared. In this chapter, we review the advances in DNA microarray, phenotypic microarray, and proteomics, together with their applications. We also describe the importance of combined analysis of these omics data within a systems biology framework for successful metabolic engineering.

[Figure 13.1 panel labels: DNA, mRNA, transcription, translation, polymerase, ribosome, protein, protein folding and modification, membrane, metabolites; DNA microarray, proteomics, phenotypic microarray.]

Figure 13.1 Genome-wide technologies. The DNA microarray is used for transcriptome profiling, which allows the simultaneous monitoring of relative mRNA abundance in multiple samples. Proteome profiling is conducted by two-dimensional gel electrophoresis and mass spectrometry. Phenotypic microarrays are used for the simultaneous testing of a large number of cellular phenotypes.

13.2 DNA Microarray

In the past decade, the DNA microarray has attracted great interest among biological researchers because it makes it possible to monitor mRNA expression levels on a genome-wide scale on a single slide and, at the same time, to obtain new information on gene interactions and regulation. The microarray was originally used as a screening method in the 1980s, when complete genome sequences were not available: bacterial cosmid libraries were screened for colonies that hybridized with probe DNAs. In the 1990s, DNA microarrays and mRNA expression profiling began to be applied to in-depth studies of organisms whose genome sequences had been completely determined. Using classical methods to assay gene expression, scientists were able to examine only a relatively small number of genes at a time. Although those traditional methods remain important for studying biological systems, they do not offer the complete picture that is essential for uncovering new biological phenomena at the whole-cell level. The DNA microarray, by contrast, surveys the whole genome at once, which has made it increasingly popular in modern bioscience and biotechnology.

The primary objective of using DNA microarrays is the functional genomic study of living organisms [7,8]. The entire set of open reading frames (ORFs) of an organism can be arrayed on a slide and their mRNA expression profiles obtained in a single experiment. Because of these characteristics, DNA microarray technology allows the comparative analysis of gene expression levels in different cell types and genotypes and/or under varying environmental conditions. Additionally, computational analysis of microarray data allows the classification of genes according to their mRNA expression patterns, providing information on the regulatory mechanisms in the cell.


13.2.1 Fundamentals of DNA Microarray

The basic concept of the DNA microarray is the precise positioning of DNA probes at high density on a solid support so that it acts as a DNA/RNA detector. DNA microarrays can be categorized by the solid support used (glass or filter), the type of DNA immobilized on the array (cDNA, oligonucleotide, or genomic fragment), and the manufacturing method (inkjet printing, spotting, or mask-based in situ synthesis). Currently, the two main types of DNA microarrays are spotted glass slide (or filter) arrays, which are relatively simple to prepare, and in situ synthesized oligonucleotide arrays (e.g., Affymetrix GeneChip), which require special equipment [9–11] (Figure 13.2).

In the case of the spotted microarray, PCR-amplified cDNAs or presynthesized oligonucleotides are deposited onto a polylysine- or aminosilane-coated glass slide [12]. A high-speed robot is used to spot the cDNAs or oligonucleotides on the slide. The slides are then hybridized with cDNAs that have been labeled separately with two different fluorescent dyes (e.g., Cy3 and Cy5) during the reverse transcription of mRNAs from two different types of cells or from cells under two different conditions. The relative intensities of the two fluorescent dyes in a spot represent the relative mRNA expression levels of the gene under the two conditions being compared. Although the fold changes obtained from cDNA and oligonucleotide microarrays are not identical, the general trends are similar when the same genes are compared using the two types of microarray. One advantage of the oligonucleotide microarray is that homologies among the thousands to tens of thousands of genes can be minimized during probe design. For this reason, the oligonucleotide microarray is useful for monitoring global gene expression levels and for detecting mutations and single nucleotide polymorphisms.
The cDNA microarrays, however, have potential cross-hybridization problems due to sequence homologies among the genes. Nevertheless, they remain an important tool for targeted gene expression profiling. The comparative nature of gene expression measurements with spotted arrays creates a problem when comparing data obtained from different experiments and by other research groups. This problem can be alleviated by using universal references adopted by the microarray community [13].

DNA oligonucleotides can also be synthesized in situ on the DNA chip using photolabile protecting groups and photolithographic masks [9–11]. The in situ synthesized oligonucleotide microarray contains a much larger number of oligomers because the array manufacturing process adopts a semiconductor micro-fabrication process. Commercially available arrays are manufactured at a density of over 1.3 million unique features per array (http://www.affymetrix.com/technology/manufacturing/index.affx). Another advantage of this type of microarray is that the shape and size of the spots are uniform, free of the irregularities caused by mechanical deposition. However, there is a probe design issue in that the probes synthesized directly on the substrate contain a number of nucleotide chains that differ from the designed sequences [14].

[Figure 13.2 panel labels: (a) sample RNA, reference RNA, reverse transcription and labeling, cDNA or oligonucleotide printing, hybridization, excitation and emission, analysis (image processing, normalization, identification, clustering and others), mRNA level up/down; (b) light, mask, chip.]

Figure 13.2 Two main types of DNA microarrays: spotted arrays (a) and in situ synthesized oligonucleotide arrays (b). (a) A high-speed robot is used to spot cDNA or oligonucleotide onto a polylysine- or aminosilane-coated slide, and this slide is hybridized with fluorescent dye-labeled cDNA. (b) A DNA chip can be synthesized with photolabile protecting groups and photolithographic masks, in a process similar to micro-fabrication.

Microarray data analysis assumes that the fluorescence intensity measured for each spot is proportional to the abundance of the corresponding mRNA in the total mRNA extracted from the sample. The first step of the analysis is data preprocessing, which increases confidence in the results rather than improving the quality of the data themselves. For this purpose, spot filtering and normalization are performed. Spot filtering flags doubtful or uninformative spots, whereas data normalization identifies and removes the effects of systematic variation in the measured fluorescence intensities, that is, variation other than differential expression. Because manufacturing and hybridizing microarrays are complicated processes, a certain amount of systematic variation exists in the data. Normalization is therefore a very important step in the analysis of all microarray data, as it can prevent systematic bias. For total intensity normalization, a normalization factor can be calculated from either the total integrated intensity (in one-color hybridization) or the total average fold difference between the Cy3 and Cy5 channels (in two-color hybridization). The normalization factor can then be used to adjust the scale or fold change for every spot on the chip. The normalization factor can also be calculated from housekeeping genes to adjust for experimental variability in the samples, since housekeeping genes are assumed to have rather constant profiles throughout the experiments.
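The preprocessing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the intensity threshold, and the four-spot data set are hypothetical, and the normalization factor is taken here as the ratio of the summed Cy5 and Cy3 intensities, which is one simple way to realize total intensity normalization for a two-color array.

```python
import math

def flag_weak_spots(cy3, cy5, min_intensity):
    """Return indices of spots whose signal is below the threshold in both channels."""
    return [i for i, (g, r) in enumerate(zip(cy3, cy5))
            if g < min_intensity and r < min_intensity]

def total_intensity_factor(cy3, cy5):
    """Factor that rescales Cy3 so that both channels have the same total intensity."""
    return sum(cy5) / sum(cy3)

def normalized_log2_ratios(cy3, cy5):
    """log2(Cy5/Cy3) for every spot after total intensity normalization."""
    factor = total_intensity_factor(cy3, cy5)
    return [math.log2(r / (g * factor)) for g, r in zip(cy3, cy5)]

# Hypothetical background-corrected intensities for four spots (assumed data).
cy3 = [100.0, 100.0, 100.0, 100.0]  # reference channel
cy5 = [100.0, 100.0, 200.0, 400.0]  # sample channel

factor = total_intensity_factor(cy3, cy5)
ratios = normalized_log2_ratios(cy3, cy5)
weak = flag_weak_spots(cy3, cy5, min_intensity=50.0)
```

In this toy data set the Cy5 channel is twice as bright overall, so after rescaling the first two spots appear as decreased and only the last spot as increased. The sketch relies on the assumption that most genes are not differentially expressed, which a real experiment would need to justify.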
It should be noted that normalization can lead to over-enhancement of the information in the arrays [15].

To extract useful information from expression profiles, computational tools that cluster and display the data are necessary. There are many ways to analyze gene expression data; hierarchical clustering [16] and self-organizing map (SOM) clustering [17] have been widely used. Hierarchical clustering is relatively simple, and the results are easily visualized. Pairwise distances are calculated for all genes on the basis of their expression patterns, and the genes with the closest expression patterns are merged into a cluster. The distances between these small clusters are then calculated to produce a new cluster, and the process is repeated until only one cluster is left. The advantage of this approach is that it forms a hierarchy of clusters, enabling small groups of coexpressed genes to be identified. SOM clustering assigns genes to a series of groups on the basis of similarities in expression pattern. Random vectors are constructed for each group, and each gene is assigned to the closest vector. With these clustering algorithms, distinct patterns of gene expression are identified and genes are grouped by the similarity of their expression profiles, for example, TCA/glyoxylate cycle-related genes, amino acid synthesis/degradation-related genes, and nucleotide synthesis/degradation-related genes. Notably, temporal analysis showed that groups of genes with similar functions are coregulated [18]. Thus, expression profiling with DNA microarrays can reveal putative co-functional families of genes. Unfortunately, the clustering methods have one crucial disadvantage: they are not robust to missing values. These algorithms can function only when the data sets are complete. Therefore, missing values must be estimated before these clustering methods can be used.
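The merging procedure of hierarchical clustering can be illustrated with a minimal pure-Python sketch. The choices made here are assumptions for illustration rather than the method of any cited study: distance is taken as one minus the Pearson correlation, clusters are compared by average linkage, and the gene names and profiles are invented. Profiles must be complete and non-constant, which mirrors the missing-value limitation noted above.

```python
import math
from itertools import combinations

def pearson_distance(x, y):
    """1 - Pearson correlation; assumes complete, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def hierarchical_clustering(profiles):
    """Agglomerative clustering with average linkage.

    profiles maps a gene name to its list of expression values.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = {name: [name] for name in profiles}  # label -> member genes
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Average linkage: mean distance over all cross-cluster gene pairs.
            pairs = [(g, h) for g in clusters[a] for h in clusters[b]]
            d = sum(pearson_distance(profiles[g], profiles[h])
                    for g, h in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters["(" + a + "," + b + ")"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges

# Hypothetical time-course profiles: two rising genes and two falling genes.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.0, 2.0],
}
merges = hierarchical_clustering(profiles)
```

The two coexpressed pairs are merged first, and the final merge joins the rising and falling groups at the largest distance, which is the hierarchy that a dendrogram would display.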
Alter et al. [19] suggested the singular value decomposition (SVD) method for genome-wide data processing and modeling. SVD provides a mathematical framework for data processing and modeling in which biological meaning can be found. Troyanskaya et al. [20] successfully applied SVD to missing value estimation. Principal component analysis (PCA) is a useful analytic method for identifying the subset of genes responsible for observed transcriptional differences and the distinct pattern underlying those differences [21]. PCA can visualize multidimensional data by projecting them onto a lower-dimensional space, so the transcriptional fingerprints underlying phenotypic variation can be easily visualized. In addition, evaluation of the principal components can suggest the underlying factors responsible for the phenotypic variations.
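As a rough illustration of how PCA projects expression data onto a lower-dimensional space, the following sketch extracts the first principal component by power iteration on the covariance matrix. It is written in pure Python for self-containment; the data set (five samples, three genes) and the function names are hypothetical, and a practical analysis would normally rely on an established linear algebra library instead.

```python
import math

def center_columns(rows):
    """Subtract each gene's mean (column mean) from every sample (row)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance_matrix(centered):
    """Sample covariance between genes, computed from centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_principal_component(cov, iterations=200):
    """Leading eigenvector by power iteration.

    Starts from the all-ones direction; this fails if that start is exactly
    orthogonal to the leading eigenvector or if the covariance is all zeros.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, component):
    """Score of each sample along the given component."""
    return [sum(a * b for a, b in zip(row, component)) for row in centered]

# Hypothetical data: rows are samples, columns are genes. Genes 1 and 2 rise
# across the samples while gene 3 falls.
samples = [[1.0, 1.0, 5.0], [2.0, 2.0, 4.0], [3.0, 3.0, 3.0],
           [4.0, 4.0, 2.0], [5.0, 5.0, 1.0]]
centered = center_columns(samples)
pc1 = first_principal_component(covariance_matrix(centered))
scores = project(centered, pc1)
```

The loadings in `pc1` show which genes drive the difference between samples (the third gene carries the opposite sign to the first two), and `scores` orders the samples along that single axis. The sign of a principal component is arbitrary, so only relative signs are meaningful.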

13.2.2 Applications of DNA Microarray

DNA microarrays have been used for gene expression profiling in a variety of organisms, including bacteria, yeast, plants, and tissues/cell lines [15]. The primary goals of these experiments were to identify new genes involved in a pathway of interest, to monitor the genes differentially expressed under the conditions being compared, and to identify gene regulatory circuits and the genes involved in them by examining those showing similar expression patterns. Since a DNA microarray permits a genome-wide survey in a single assay, it can be considered a hypothesis-generating rather than a hypothesis-driven experimental technique. Therefore, the expression profiles generated from microarray experiments can be used as a starting point to identify candidate genes for further studies. In addition to providing a broad survey of gene expression levels, transcriptional profiling can reveal genes showing particular expression patterns, which can be used to generate a hypothesis on gene function to be validated by further experiments. DNA microarrays have also been successfully used for monitoring transcript levels [22], single nucleotide polymorphisms [23], and genomic variations among different strains [24]. Additionally, DNA microarrays have been applied to investigate physiological changes in tissue/cell lines [25–30]. The patterns produced by mRNA expression profiling can also be used to study the properties of various cellular pathways, including regulatory mechanisms [31,32]. Since the mRNA expression profiles obtained by DNA microarray experiments reflect the transcriptional response of a cell to a particular stimulus, the pattern of gene expression can be regarded as the fingerprint of a cell for that stimulus. Golub et al. [33] successfully classified tumors by their expression profiles. van’t Veer et al.
[34] showed that gene expression profiles of 117 primary breast tumors could be used to predict the clinical outcome of node-negative breast cancer patients. The DNA microarray is also an efficient tool for new drug discovery through pharmacogenomics, which utilizes global gene expression databases of molecular cell responses to drug exposure [35]. Traditional drug discovery has been performed by identifying a target molecule (typically a protein) in a biological pathway and then developing an inhibitory compound against the target. However, large-scale systematic approaches to drug discovery are now possible by comparing the expression of thousands of genes between normal and diseased states and identifying multiple potential drug targets, which allows promising new therapeutics to be selected for further testing [36]. Also, the chemosensitivity of cells can be predicted by analyzing their microarray profiles [37]. Furthermore, drug target validation can be performed with the help of a database of deletion mutants [38].

From the metabolic engineering point of view, the DNA microarray has proved to be an efficient tool for suggesting novel targets to be engineered for strain improvement and bioprocess development [39,40]. By comparing transcriptome profiles between different strains or between samples obtained under different conditions, potential target genes or regulatory mechanisms can be identified in order to engineer the local metabolic pathways and improve the performance of microorganisms. Some examples are described below. A DNA microarray was used to identify an additional glucose PTS (ptcBAC) in Lactococcus lactis [41]. This additional glucose PTS gene and the glk and ptnABCD genes were disrupted to completely abolish glucose metabolism in the strain. After introduction of the lactose metabolic genes, the deletion strain could only ferment the galactose moiety of lactose. Therefore, the resulting strain could be used for the in situ production of glucose.
Transcriptome analysis was also used to examine metabolic changes in recombinant Escherichia coli cells producing an insulin-like growth factor I (IGF-I) fusion protein in high cell density culture [40]. By comparing the transcriptome data before and after induction, two genes that were down-regulated after induction were selected for amplification. This increased IGF-I production several fold. An inverse metabolic engineering strategy using DNA microarrays was applied to Saccharomyces cerevisiae strains [42]. By comparing a mutant strain generated by chemical mutagenesis with its parental strain (a metabolically engineered xylose-utilizing S. cerevisiae strain), a number of genes involved in
the xylose metabolism were found to have altered expression in the mutant strain. One of the genes with altered expression, encoding a transcriptional regulator, was subsequently manipulated in the reference strain. As a result, the manipulated strain showed physiological characteristics similar to those of the mutant strain. Another inverse metabolic engineering example is the identification of the factors contributing to ethanol tolerance [43]. The transcriptome profile of a strain with increased ethanol tolerance was compared with that of its parental strain under several different growth conditions. Several genes and factors influencing ethanol tolerance were identified: an increase in glycine metabolism, loss of function of a regulatory protein, and increased metabolism of serine and pyruvate. DNA microarrays were also used to select marker genes for monitoring fermentation processes. By performing a detailed gene expression analysis of Bacillus subtilis fed-batch fermentation processes with different ratios of casamino acids and ammonia, a few genes were identified for fermentation monitoring, for example, acoA and glnA as indicators of glucose and nitrogen limitation, respectively [44]. Another application of the DNA microarray is the integration of transcriptome data into the simulation of stoichiometric metabolic models to obtain improved flux predictions. The key idea here is that regulatory information can be obtained from transcriptome data, which gives additional constraints on the metabolic fluxes in the model. Akesson et al. [45] combined gene expression data with the in silico model of S. cerevisiae to achieve improved predictions of its metabolic behavior.
So far, the on-off type regulatory mechanisms (i.e., setting the flux to zero if the corresponding gene is not expressed) have been implemented during the simulation of the genome-scale metabolic model, but it is expected that better algorithms that are capable of incorporating more complicated regulatory mechanisms will be developed in the future.
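The on-off rule described above (setting the flux to zero if the corresponding gene is not expressed) amounts to tightening the flux bounds of a stoichiometric model before it is solved. The sketch below shows only that constraint step, not the flux simulation itself. The reaction names, gene names, expression values, the one-gene-per-reaction mapping, and the expression threshold are all hypothetical assumptions made for illustration.

```python
def expressed_genes(expression, threshold):
    """Return the set of genes whose signal exceeds an assumed threshold."""
    return {gene for gene, level in expression.items() if level > threshold}


def apply_on_off_regulation(flux_bounds, gene_for_reaction, expressed):
    """Fix the flux to zero for every reaction whose gene is not expressed.

    flux_bounds maps reaction -> (lower, upper); gene_for_reaction maps
    reaction -> gene (a simplification: one gene per reaction).
    Reactions without an assigned gene keep their original bounds.
    """
    constrained = {}
    for reaction, (lower, upper) in flux_bounds.items():
        gene = gene_for_reaction.get(reaction)
        if gene is not None and gene not in expressed:
            constrained[reaction] = (0.0, 0.0)
        else:
            constrained[reaction] = (lower, upper)
    return constrained


# Hypothetical model fragment and transcriptome readout.
flux_bounds = {"R_uptake": (0.0, 10.0), "R_branch1": (0.0, 1000.0),
               "R_branch2": (-1000.0, 1000.0), "R_transport": (0.0, 1000.0)}
gene_for_reaction = {"R_uptake": "g1", "R_branch1": "g2", "R_branch2": "g3"}
expression = {"g1": 850.0, "g2": 12.0, "g3": 430.0}

on = expressed_genes(expression, threshold=100.0)
print(apply_on_off_regulation(flux_bounds, gene_for_reaction, on))
```

The constrained bounds would then be passed to whatever solver is used for the flux simulation. More complicated regulatory mechanisms, as anticipated in the text, would replace the binary rule with graded or condition-dependent bounds.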

13.2.3 Reliability and Reproducibility Issues

The DNA microarray has proven to be a powerful technique for high-throughput, comprehensive analysis of thousands to tens of thousands of genes in parallel. However, microarrays have their own requirements in terms of the amount of RNA needed, data acquisition, and normalization techniques. These requirements, together with the differences in types and composition of probes, deposition technologies, and labeling and hybridization protocols, prevent direct and reliable comparison of microarray data. To address this issue, Minimal Information About a Microarray Experiment (MIAME), a set of common guidelines for designing and communicating microarray experiments, has been proposed by the Microarray Gene Expression Data Society (http://www.mged.org/miame). DNA microarrays focus on identifying genes differentially expressed under different experimental conditions and classifying the expression patterns by grouping genes. However, the expression levels of different genes cannot be compared with each other because hybridization efficiencies differ among genes with different lengths, DNA sequences, and mRNA stabilities. One way to solve this problem is to use a reference containing genomic DNA [46] or labeled oligonucleotides of known abundance that are complementary to every probe in the array [47]. The lack of linearity and proportionality between mRNA/cDNA concentration and signal intensity at high concentrations is another reason why microarrays are only semi-quantitative. To estimate changes in mRNA abundance, Northern hybridization and RT-PCR can be used as complementary experimental techniques [48,49]. It should be emphasized that accurate measurement of absolute transcript levels by microarrays is not yet reliable. Although the ratios can be estimated reasonably well, it is necessary to increase the sensitivity and accuracy of the technology to extract more valuable information from the expression data.

13.3 Phenotypic Microarray

The genomics, transcriptomics, and proteomics technologies allow us to perform global analyses of the important macromolecules in cells. Thanks to these technologies, the information flow from DNA to RNA and then to protein can be depicted. However, the information flow does not end with proteins in the cell. One more important information category is the metabolome, which is not within the scope of this chapter; readers can refer to several excellent review articles [50–52]. The information flow can proceed further down to determine cellular phenotypes. As with other technologies, traditional phenotype studies could only examine a few components at a time. Although many excellent methods have been developed to understand the phenotypes of an organism, it has been almost impossible to survey all possible components in different combinations. Therefore, a novel technique was needed to give a global view of cellular properties by detecting phenotypic properties of living organisms. The primary goal of the phenotype microarray is to give quantitative measurements of thousands of cellular phenotypes all at once [53].

13.3.1 Fundamentals of Phenotypic Microarray

Biolog (Hayward, CA) has developed a high-throughput phenotypic microarray (PM) to assay a thousand metabolic and chemical sensitivity phenotypes in a single run within 2 days [53,54]. In PM, the bacterial cells are inoculated into twenty 96-well plates: ten plates to assay utilization of various substrates and the remaining ten plates to examine sensitivities to chemicals. Because each well contains tetrazolium dye, cell growth can be detected by measuring the reduction of the dye (blue color in the reduced state). One of the key ideas of PM is that cell respiration is coupled to a large number and a wide range of cellular phenotypes. Cells must take up nutrients to survive and then catabolize and/or convert them to produce essential molecular components. By polymerization, these components turn into macromolecules that form cellular components and structures. Therefore, during growth, an actual physical flow of electrons exists in the cell. The electron flow starts from the carbon source, proceeds to NADH and then to the electron transport chain of the cell, and finally reaches the tetrazolium dye [53]. Because the dye turns blue in the reduced state via an irreversible reaction, cell growth can be monitored indirectly. The intensity of the color, which is proportional to cell growth, can be monitored during the incubation and analyzed to yield quantitative results.

For the analysis of the results, the kinetic data from color formation are plotted against time for each well. A different color is assigned to each plot, red for the reference strain and green for the target strain, which is similar to DNA microarray analysis. The data for the reference and target strains are compared by overlaying these plots. Depending on the growth states of the reference and target strains, the overlaid plots show one of three colors (yellow, red, or green), which is again similar to the results of DNA microarray experiments.
One obvious example of using the PM is the comparison of phenotypes between the mutant and wild-type strains. During the comparison of the overlaying data, red and green color can be assigned to wild-type and mutant strains, respectively. When equivalent growth is observed for both strains, the overlaid plot will show yellow. However, if either the wild-type or the mutant shows better growth, then the plot will be either red or green, respectively.
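The red/green/yellow overlay logic described above can be sketched as a per-well comparison of the two kinetic curves. The sketch below assumes that growth is summarized by the area under the color-formation curve and that two strains count as equivalent when their areas differ by no more than 10%; both choices, as well as the readings, are illustrative assumptions and not the vendor's algorithm.

```python
def area_under_curve(times, values):
    """Trapezoidal area under a color-intensity curve."""
    area = 0.0
    for i in range(1, len(times)):
        area += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2.0
    return area


def overlay_color(times, reference, target, tolerance=0.1):
    """Classify one well: yellow (equivalent), red (reference grows better)
    or green (target grows better). The tolerance is an assumed cutoff."""
    ref_area = area_under_curve(times, reference)
    tgt_area = area_under_curve(times, target)
    larger = max(ref_area, tgt_area)
    if larger == 0 or abs(ref_area - tgt_area) <= tolerance * larger:
        return "yellow"
    return "red" if ref_area > tgt_area else "green"


# Hypothetical readings (arbitrary units) at 0, 12, 24, 36 and 48 h.
hours = [0, 12, 24, 36, 48]
wild_type = [0, 20, 80, 150, 160]
mutant = [0, 5, 10, 15, 20]

print(overlay_color(hours, wild_type, mutant))     # wild type grows better
print(overlay_color(hours, wild_type, wild_type))  # equivalent growth
```

Applying this classification to every well of the plates gives a compact list of the substrates or chemicals for which the mutant phenotype differs from that of the wild type.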

13.3.2 Applications of Phenotypic Microarray

During the initial studies with PM [55], 39 American Type Culture Collection (ATCC) reference taxa and 45 Gram-negative isolates from water samples were successfully identified within 4–24 h. Bochner [53] reported that PM can be used to assay nearly 700 phenotypes of E. coli. Additionally, PM could be used to directly assay the effects of genetic changes in the cell [54]. To explore the extent of phenotypic variations among closely related bacterial strains, PM was used to analyze a diverse collection of E. coli O157:H7 isolates and other Gram-negative enteric pathogens [56]. The function of a gene can be studied with a mutant or knock-out strain by monitoring the change in cell phenotypes. In general, these mutants or knock-out strains are expected to show one or more altered phenotypic properties when the mutated or deleted gene has a role under certain conditions. Zhou et al. [57] showed the phenotypic changes in E. coli with a large set of well-defined deletion
mutants. During this study, several new phenotypes were identified by PM. Using PM experiments, Ulrich et al. [58] revealed that a cell density-responsive mechanism in Burkholderia thailandensis both positively and negatively affects the metabolism of numerous substrates. PM has also been applied to profiling carbon source utilization in Hypocrea jecorina [59], drug discovery in yeast [60], phenotypic profiling of Staphylococcus aureus mutant strains [61], and global physiological analysis of E. coli under limited carbon and energy sources [62]. Recently, a large-scale spotted cell microarray was developed by Narayanaswamy et al. [63], who applied it to define the responses of yeast cells to a mating pheromone. They created cell chips by spotting 4,848 yeast deletion mutant strains onto coated glass slides using a microarray robot, and identified the genes contributing to certain cellular characteristics.

As mentioned above, PM is based on the detection of bacterial growth and respiration. It is therefore limited in detecting phenotypic changes such as morphological alterations and motility changes. Nonetheless, PM helps not only in studying microbial diversity, but also in bridging the gap between phenotypic knowledge and genome sequences. In particular, from the metabolic engineering point of view, PM can be used in conjunction with metabolomics so that the two complement each other. The metabolome represents the entire collection of intracellular and extracellular metabolites under a particular condition. However, there are far fewer metabolites than genes in a cell. For example, S. cerevisiae, which contains approximately 6,000 genes, has only 560 low-molecular-weight metabolites [64]. Thus, the metabolome represents combined information originating from the metabolism, potentially giving further insight into the function of genes [65].
However, due to the nature of combined information, it is difficult to relate changes in metabolite concentration to specific genetic changes. Since the PM provides direct observation of phenotype caused by genetic change, combined use of metabolomics and PM will be able to alleviate the drawbacks of integrative information and to give information on new targets for metabolic engineering.

13.4 Proteomics

Proteomics is any large-scale, protein-based systematic analysis of the proteome or a defined subproteome from a cell, tissue, or entire organism, usually by biochemical methods. The term “proteome” was first introduced in the mid-1990s by Wilkins and Williams to indicate the entire “PROTEin complement expressed by a genOME” of a cell, tissue, or entire organism [66]. In contrast to conventional biochemical studies that focus on a single protein or simple macromolecular complexes, proteomics takes a broader, more comprehensive and systematic approach to the investigation of biological systems. Proteome analysis provides information about changes in protein levels, changes in protein synthesis and degradation rates, post-translational modifications, and protein interactions. Furthermore, proteomics provides the capacity to discover many novel targets that may be even more important than the already known proteins and genes under a given condition. This leads to the development of superior production systems, correlates metabolic pathways and molecular mechanisms for cell survival and production of specific bioproducts, and enables a systems view of the organism under study. Since it is not constrained by prior knowledge, proteomics is again a discovery-based rather than a hypothesis-driven approach.

13.4.1 Fundamentals of Proteomics

Two-dimensional gel electrophoresis (2-DE) coupled with mass spectrometry (MS) has been widely used for proteome analysis. 2-DE resolves complex protein mixtures first by isoelectric point and then by size in a gel matrix (Figure 13.3). It has dominated proteome profile analysis for more than 30 years [2]. However, the early 2-D gel studies were extremely limited in their analytical scope, since efficient technologies to identify and further characterize the separated proteins did not exist at that time. With the rapid advances in various technologies, including the availability of large-scale genome sequences and protein databases, the development of database search engines capable of exploiting these databases, and the introduction of high-sensitivity, easy-to-use MS techniques in the early 1990s, it became possible to identify and examine the proteins resolved by 2-DE and to study proteins on a large scale. Recently, in order to reduce complexity and increase sensitivity in detecting low abundance proteins, proteomics researchers have become increasingly aware of gel-independent technologies combined with subcellular fractionation by n-dimensional chromatography. The technological and methodological advances in proteome research, covering gel-based and gel-independent approaches and predictive proteomics, including 2-DE, MS, and computational tools, have been reviewed by Han and Lee [67].

Figure 13.3 Two representative subproteomes of E. coli: (a) whole cellular proteins and (b) proteins secreted into the culture medium. [2-D gel images: horizontal axis pI 3 to 10, vertical axis Mw 200 to 10 kDa; the OmpF spot is labeled.] Protein samples were separated on a pH 3–10 IPG strip followed by 12% (w/v) SDS-PAGE. The gels were stained with a general protein stain, e.g., silver.

Proteome studies have focused mainly on three categories: (i) identification of the protein components of a sample of interest without quantitative analysis (protein mapping); (ii) quantitative comparison of protein levels in two or more samples (quantitative protein profiling); and (iii) analysis of protein interactions, including binary interactions and isolation of macromolecular complexes (protein interaction). Here, the first two categories, which are more relevant to metabolic engineering, are reviewed.

13.4.2 Protein Mapping

Prior to quantitative comparison, the first task of proteome analysis is to detect and identify as many protein components as possible in a sample, thereby establishing a reference 2-D gel map. The goal is to define the total protein complement of a cell, or a subproteome of an organ such as liver, lung, or kidney, or of a cellular compartment such as mitochondria, ribosomes, or membranes. Recently, subcellular proteome techniques have become popular due to the additional benefits of reduced sample complexity, the ability to identify additional unique proteins, the localization of newly discovered proteins to specific organelles, and, in some cases, functional validation [68]. Twenty-nine proteome maps, including 18 derived from human, six derived from mouse, and the other five from species such as E. coli (several 2-D gel maps with different narrow pH ranges), Arabidopsis thaliana, Dictyostelium discoideum, S. aureus, and S. cerevisiae, are available in the SWISS-2DPAGE database (http://kr.expasy.org/ch2d/). The E. coli 3.5–10 SWISS-2DPAGE map shows 40% of the E. coli
proteome [69], among which 231 proteins have been identified by techniques such as gel comparison, microsequencing, N-terminal sequencing, and amino acid composition analysis. The use of narrow range pH gradients (pH 4–5, 4.5–5.5, 5–6, 5.5–6.7, 6–9 and 6–11) was shown to potentially display proteins existing at low levels (down to a few protein molecules per cell), resulting in the discrimination of >70% of the entire E. coli proteome [70]. The number of displayed proteins was higher than that identified by gel-independent approaches, but not all of the proteins could be identified. Recently, E. coli subcellular proteomics based on 2-DE was used to assign various proteins to the cytosol, periplasm, inner membrane, or outer membrane by biochemical fractionation; this method was used to assemble the largest proteome database to date [71]. Analysis of 2,160 spots revealed 575 unique ORFs, including 151 hypothetical ORFs, 76 proteins of completely unknown function, and 222 proteins without localization assignments in the Swiss-Prot database.

Several recent studies have focused on the analysis of extracellular proteins from bacteria used in the bioprocess industry, such as E. coli W3110 [72], Mannheimia succiniciproducens [73], and Bacillus sp. [74], and from pathogens such as Helicobacter pylori [75], Edwardsiella tarda [76], S. aureus [77], and Streptococcus pneumoniae [78]. The secretory proteomes of pathogenic and nonpathogenic strains may provide physiological information applicable to the study of both pathogens and strains used for the production of recombinant proteins. For example, an extracellular protein can be used as a fusion partner to secrete a target protein into the culture medium, which simplifies protein purification in downstream processes. As shown in Figure 13.3, one of the highly abundant extracellular proteins in E. coli BL21 (DE3), OmpF, was successfully used for secretory production of recombinant human β-endorphin into the culture medium [77].
However, the utility of the 2-DE approach is limited by its ability to reliably monitor proteins that are present in very low abundance in a sample, as well as proteins that are very hydrophobic or very acidic or basic [80]. Low copy number proteins might represent key regulatory molecules within cells or signaling molecules in tissues and organs. This has driven the development of advanced proteomic methodologies and technologies to detect low abundance proteins, for instance sequential extractions with increasingly stronger solubilization solutions, subcellular fractionation, selective removal of the most abundant protein components, preparative isoelectric focusing separations, and chromatographic fractionation of sample mixtures [81]. Hydrophobic proteins that are present in membranes might have key roles in communicating extracellular information to the inside of cells. Molloy et al. [82] introduced a new isolation method based on sequential extractions with increasing concentrations of sodium carbonate for analyzing E. coli outer membrane proteins (OMPs). This led to the successful identification of 21 of the 26 predicted integral OMPs. The largest database of E. coli membrane proteins constructed to date is that reported by Fountoulakis and Gasser [83], who identified 394 different gene products using the same method described by Molloy et al. [82]. Notably, these studies demonstrate that membrane proteins, which are commonly absent from 2-D gel maps, are amenable to 2-DE separation using specific techniques. Based on proteome profiling of these OMPs, highly abundant proteins such as E. coli OmpA [84] and FadL [85], and Pseudomonas aeruginosa OprF [86], have been widely used as anchoring motifs for cell surface display in microorganisms. In pathogenic organisms, membrane proteins are targets for preventing the transfer of regulatory signals or toxic products into host cells.
The preferred analytical method for protein composition is multidimensional chromatography coupled with tandem MS. Gevaert et al. [87] identified 800 E. coli proteins from sorted methionine-containing peptides using a combination of technologies consisting of combined fractional diagonal chromatography (COFRADIC), LC-MS/MS, and MALDI-TOF-MS. More than 1,100 E. coli proteins (about a quarter of those encoded in the E. coli genome) were identified by high performance liquid chromatography (HPLC)-MS/MS analysis [88]. Perhaps the most popular of these techniques to date is the multidimensional protein identification technology, often referred to as MudPIT [89]. In this method, mixtures of trypsin-digested peptides are loaded onto a biphasic microcapillary column containing a strong cation exchange resin upstream of a reverse-phase resin, which is directly coupled to an MS/MS instrument. Peptides
are displaced from the strong cation exchange resin using a salt step gradient, and subsequently bind to the reverse-phase resin. Elution from the reverse-phase resin is accomplished using an acetonitrile gradient, and the peptides are analyzed online by MS/MS. Repeated rounds of step and gradient elutions can result in analysis and identification of a large number of peptides in a single run. Thus, the gel-independent approaches have clear superiority over 2-DE methods in detection and identification of multiple protein components in a proteome. However, because of the complexity of any given proteome and the separation limits of 1-D or 2-D LC, it is still required to reduce the complexity prior to protein separation and characterization.

13.4.3 Quantitative Protein Profiling

The most common type of proteomic study is the quantitative comparison of protein profiles of samples obtained from two or more experimental conditions, such as normal versus diseased cells or tissues, or responses to environmental stimuli or stresses. Up- and down-regulation of specific proteins in response to a number of chemical and physical stresses, such as heat, oxidative agents, and hyperosmotic shock, can be monitored. These responses are thought to act as protective mechanisms leading to the elimination of the stress agent and/or repair of cellular damage. The cellular responses, as reflected by the proteome, can differ widely according to the stress imposed. Thus, comparative proteome profiling under various genotypic and environmental conditions can reveal new regulatory circuits and the relative abundance of proteins at the system-wide level.

Most protein profiling data to date have been obtained using 2-DE, which is still a core technology for quantitative comparisons of proteins from two or more closely related experimental samples. Recently, to overcome shortcomings of the conventional 2-DE method, 2-D difference gel electrophoresis (DIGE), introduced by Ünlü et al. [90], was used in various applications in quantitative proteomics, including the analysis of bacteria [91], yeast [92], mouse brain [93], cat visual cortex [94], human breast cancer cells [95], and human colon cancer [96]. The basis of the technique is the use of two or three mass- and charge-matched N-hydroxysuccinimidyl ester derivatives of the fluorescent cyanine dyes Cy2, Cy3, and Cy5, which possess distinct excitation and emission spectra. Protein samples for comparative analysis are labeled with different fluorescent dyes, mixed together, co-separated, and visualized on a single 2-D gel using excitation and emission wavelengths for image acquisition that are specific for each fluorescent tag.
This technique avoids the complications of gel-to-gel variation seen in conventional 2-D gel experiments, reduces the number of gels that need to be run, and enables detection of quantitative differences in incompletely resolved spots. These fluorescent dyes also have a wide dynamic range and are more sensitive than most other detection methods. However, similar to conventional 2-DE, even a large 2-D DIGE gel can typically resolve only about 1,500 to 2,000 spots or fewer and detect only a small portion of complex proteomes.

As an alternative to 2-DE, multiple gel-independent approaches were developed for large-scale and high-throughput quantitative comparison. In these methods, two samples may be labeled with stable isotopes prior to sample separation, either by metabolic incorporation or through chemical derivatization. In this way, proteins derived from the different samples (e.g., normal versus abnormal or untreated versus treated samples) can be directly separated, identified, and quantified using nLC-MS/MS. An attractive method for quantitative comparison of two proteomes is the isotope-coded affinity tag (ICAT) method [97,98]. The ICAT reagent has a protein-reactive group, a biotin tag, and an ethylene glycol linker connecting the two functional groups, which can be synthesized with hydrogen (light ICAT) or deuterium (heavy ICAT). For comparison, one sample is reacted with the light reagent and the second sample is reacted with the heavy reagent under identical labeling conditions. After trypsin digestion, the extremely complex tryptic peptide mixture is simplified by affinity purification of the cysteine-containing derivatized peptides on an avidin affinity resin. The eluted peptides are then analyzed using LC-MS/MS or 2-D LC-MS/MS. The ratios of MS signals from the light and heavy ICAT-labeled forms of the same peptide are compared to determine the relative abundance of the parent protein in the
respective samples, and MS/MS is used to identify the proteins. Alternative isotopic labeling methods include ¹⁶O or ¹⁸O incorporation from H₂¹⁶O or H₂¹⁸O, respectively, at the carboxyl terminus of peptides during proteolytic cleavage by trypsin [99,100], and stable-isotope metabolic protein labeling, in which cells are cultured in isotopically defined media [101,102]. Recently, a multiplexed protein quantification strategy that provides relative and absolute measurements of proteins in complex mixtures was developed by Ross et al. [103]. The multiplex strategy simultaneously determines the relative levels of proteins in multiple states (e.g., several experimental controls or time-course studies), with up to four samples in parallel. A multiplexed set of isobaric reagents that yield amine-derivatized peptides (iTRAQ reagents; Applied Biosystems, CA) was used for labeling at the N-termini and lysine side chains of peptides in a digest mixture. The derivatized peptides are indistinguishable in MS, but exhibit intense low-mass MS/MS signature ions that support quantification. Absolute quantification of targeted proteins can also be achieved using synthetic peptides tagged with one of the members of the multiplex reagent set.

These advances in proteomics technologies have led to the generation of unprecedentedly large amounts of proteome data, which are used for global examination of cellular metabolism in fundamental as well as applied research. Many proteomic studies revealed changes in proteome profiles in response to various stresses, such as changes in pH, oxygen, cell density, temperature, organic solvents, or nutrient starvation [67]. These studies resulted in the identification of various stress-induced proteins. Proteomic analysis has also been used to directly monitor cellular changes occurring during the production of heterologous proteins in microorganisms and to develop efficient strains for the enhanced production of bioproducts and biodegradable polymers.
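The ratio-based quantification used by the isotope-labeling methods above (for example, comparing the MS signals of the light and heavy ICAT-labeled forms of the same peptide) can be sketched as follows. The sketch assumes that peptide pairs have already been matched and assigned to a parent protein, and it summarizes each protein by the median of its peptide log ratios, which is one common but not the only choice. The protein names and intensities are hypothetical.

```python
from math import log2
from statistics import median


def protein_ratios(peptide_pairs):
    """Estimate the heavy/light abundance ratio of each protein.

    peptide_pairs is a list of (protein, light_intensity, heavy_intensity),
    one entry per peptide detected in both labeled forms.
    The per-protein ratio is the median of the peptide log2 ratios,
    converted back to a linear ratio.
    """
    log_ratios = {}
    for protein, light, heavy in peptide_pairs:
        if light <= 0 or heavy <= 0:
            continue  # a ratio needs a signal in both channels
        log_ratios.setdefault(protein, []).append(log2(heavy / light))
    return {protein: 2 ** median(values)
            for protein, values in log_ratios.items()}


# Hypothetical peptide intensities: light = control, heavy = treated sample.
pairs = [
    ("proteinA", 100.0, 200.0),
    ("proteinA", 50.0, 110.0),
    ("proteinA", 80.0, 150.0),
    ("proteinB", 300.0, 100.0),
    ("proteinB", 90.0, 30.0),
    ("proteinC", 0.0, 40.0),  # not quantifiable: no light signal
]
for protein, ratio in protein_ratios(pairs).items():
    print(protein, round(ratio, 2))
```

A ratio above 1 indicates higher abundance in the heavy-labeled sample and a ratio below 1 indicates higher abundance in the light-labeled sample. Working on the log scale treats up- and down-regulation symmetrically.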
Initial strain development efforts relied on “trial-and-error” approaches, in which various genetic modifications are tried repeatedly until a desired objective is achieved. However, since bioproducts are formed by coordinated enzyme reactions acting through metabolic pathways, it is essential to understand the metabolism and regulation that occur during cell growth and product formation. Recently, these investigations have been streamlined by new high-throughput analytical, molecular biological, and mathematical tools, which have been combined to facilitate the development of “custom-made” production systems in microorganisms. In this context, proteome analysis enables estimation of whole protein (enzyme) expression levels, facilitating the construction and validation of metabolic pathways that researchers can use to elucidate which molecules supply the energy and the building blocks or precursors (e.g., amino acids and other metabolic intermediates) necessary for cell function and product formation.

For example, Han et al. [104] used protein profiling of recombinant E. coli during the overproduction of human leptin to identify a target gene, which led to a successful metabolic engineering strategy that increased the productivity of leptin and other serine-rich proteins through coexpression of the cysK gene. More recently, the physiological changes of recombinant E. coli during secretory production of a recombinant humanized antibody fragment were monitored by 2-DE [105]. Twenty-five protein spots were differentially expressed in the control and production fermentations at 72 h, while 19 other protein spots were present only in the control or only in the production fermentation at this time. The synthesis of the stress protein phage shock protein A (PspA) was strongly correlated with the synthesis of the recombinant product. Coexpression of the pspA gene with a recombinant antibody fragment in E. coli significantly improved the yield of the secreted biopharmaceutical. Thus, it appears that proteomics can be used effectively to identify candidates for successful metabolic engineering toward enhanced bioproduct formation.

In addition, proteomic studies analyzing the composition of inclusion bodies (IBs) have been carried out to improve the quality (or uniformity) of the desired product and the downstream processing of recombinant proteins, such as purification and refolding. Indeed, proteomic studies have led to enhanced production of recombinant proteins, as both IBs and secretory proteins, and to improved industrial processes. For example, two small heat shock proteins (sHsps), IbpA and IbpB, were first identified by conventional biochemical techniques as the major proteins associated with the IBs of recombinant proteins produced in E. coli [106]. IbpA and IbpB were recently found to facilitate the production of recombinant proteins in E. coli and to play important roles in protecting recombinant proteins from degradation by cytoplasmic proteases [107]. Amplification of the ibpA and/or ibpB genes enhanced production of recombinant proteins as IBs, whereas ibpAB gene knock-out enhanced the secretory production of recombinant proteins in soluble form. More recently, LeThanh et al. [108] reported results similar to those of Han et al. [107]: α-glucosidase production was enhanced at elevated IbpA and IbpB levels and reduced in an ibpAB-negative mutant strain, in a temperature-dependent manner. It was also shown that IbpA and IbpB protect IBs of α-glucosidase from degradation in a temperature-dependent manner. These findings suggest that manipulation of ibpAB gene expression may prove to be a valuable new technique for fine-tuning the production of recombinant proteins in E. coli. In addition, these results demonstrate the effectiveness of proteome profiling in the development of production strains suitable for industrial applications.

The use of sHsps has recently been extended to significantly enhance the performance of 2-DE [109]. Proteolytic degradation is one of the critical problems in 2-DE. Loss of protein spots in 2-D gels due to residual protease activity is commonly observed when immobilized pH gradient gels are used for isoelectric focusing. Three sHsps, IbpA and IbpB from E. coli and Hsp26 from S. cerevisiae, were found to protect proteins in vitro from proteolytic degradation. Addition of sHsps during 2-DE of human serum or of whole-cell extracts of bacteria (E. coli, M. succiniciproducens), the plant A. thaliana, and human kidney cells allowed detection of up to 50% more protein spots than were obtained with currently available protease inhibitors. This may change the way proteome profiling is carried out by enabling the detection of many more protein spots that could not previously be seen.
Taken together, these findings and other reports continue to emphasize the fact that proteomics is likely to become increasingly important not only in pure biological research but also in various biotechnological applications.
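The kind of gel-to-gel comparison described above for the control and production fermentations (spots differentially expressed in both gels versus spots present in only one) can be sketched as follows. This is an illustrative sketch only, not the analysis used in the cited studies: the spot identifiers, normalized spot volumes, the two-fold cutoff, and the function name `classify_spots` are all assumptions.

```python
# Hypothetical normalized spot volumes from two 2-D gels.
# None means that no spot was detected at that position.
control = {"s1": 1.0, "s2": 4.0, "s3": None, "s4": 2.0, "s5": 3.0}
production = {"s1": 1.1, "s2": 1.0, "s3": 2.5, "s4": 5.0, "s5": None}


def classify_spots(a, b, fold=2.0):
    """Sort spots into differential, unchanged, or present in one gel only.

    Detected volumes are assumed to be strictly positive.
    """
    result = {"differential": [], "unchanged": [], "only_a": [], "only_b": []}
    for spot in sorted(set(a) | set(b)):
        va, vb = a.get(spot), b.get(spot)
        if va is None and vb is None:
            continue  # not detected in either gel
        if va is None:
            result["only_b"].append(spot)
        elif vb is None:
            result["only_a"].append(spot)
        elif max(va, vb) / min(va, vb) >= fold:
            result["differential"].append(spot)
        else:
            result["unchanged"].append(spot)
    return result


classes = classify_spots(control, production)
print(classes)
```

In practice, replicate gels and a statistical test would be used rather than a single fold-change cutoff; the fixed threshold here only keeps the example short.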

13.5 Combined Genome-Wide Analysis

Genome-wide technologies allow large-scale analysis of biological systems at the transcript, protein, metabolite, and phenotype levels, yielding a wealth of information that is useful in discovery-based science. Ideally, the information obtained from each of these technologies should be integrated to establish a deeper understanding of the relationship between genotype and any particular phenotype. Several successful examples of strain engineering for enhanced production of recombinant proteins or metabolites, based on an understanding of cellular physiology and metabolism gained from combined transcriptome, proteome, metabolome, or fluxome approaches, have been reported (Table 13.1).

Table 13.1 Representative Examples for Enhanced Production of a Target through the Combined Genome-Wide Analysis

Technologies | Target | Description | Reference
DNA microarrays and 2-DE | L-threonine | Comparative analysis of a parent E. coli strain and its amino acid-overproducing mutant strain | [110]
DNA microarrays, 2-DE, and/or in silico simulation | α-hemolysin | Comparative analysis of a parent E. coli strain and its active α-hemolysin-overproducing mutant strain | [111]
DNA microarrays and metabolic flux analysis | Lovastatin | Comparative analysis of wild-type A. terreus and its lovastatin-overproducing strain | [112]
DNA microarrays and metabolic flux analysis | L-lysine | Comparative analysis of L-lysine-producing C. glutamicum | [113]

High cell density cultivation (HCDC) is often practiced in industry because it can increase the productivity of desired products, with the additional advantages of reduced reactor volume and reduced waste. Cell density is an important factor affecting microbial physiology because it influences the availability of various nutrients and possible cell-to-cell signaling processes. Cells during HCDC often experience undesirable conditions, including fluctuating concentrations of medium components such as glucose and oxygen, or accumulation of byproducts such as acetic acid and lactic acid, which are not beneficial for cell growth. These unfavorable conditions are known to induce stress responses in microbial cells, change the cellular protein composition, and consequently decrease specific productivity as cell density increases. To find the specific reasons for this phenomenon, Yoon et al. [39] carried out a combined transcriptome and proteome analysis during the HCDC of E. coli. The expression of most amino acid biosynthesis genes was found to be down-regulated as cell density increased. This finding suggests that amino acid biosynthetic pathways and the reactions within them should be examined carefully, as some of them may be rate-controlling steps in the production of recombinant proteins.

Another integrated analysis of transcriptome and proteome profiles was carried out for E. coli W3110 and its L-threonine-overproducing mutant strain [110]. This study found that the expression of genes involved in the glyoxylate shunt, the tricarboxylic acid (TCA) cycle, and amino acid biosynthesis was significantly up-regulated, whereas ribosomal protein genes were down-regulated. In addition, mutations in the thrA and ilvA genes were suggested to be important for the overproduction of L-threonine. This combined analysis provided valuable information regarding the regulatory mechanism of L-threonine production and the physiological changes in the mutant strain.

Recently, a combined analysis of proteome, transcriptome, metabolome, and/or in silico simulation of a genome-scale metabolic network was used to engineer an E. coli strain [111]. The E. coli mutant strain obtained by random mutagenesis, which secretes four-fold more active α-hemolysin (HlyA) than its parent strain, was characterized using both high-density microarrays for mRNA profiling and a proteomic strategy for protein expression.
The relative mRNA and protein expression levels of tRNA synthetases, including AsnS, AspS, LysS, PheT, and TrpS, were decreased in the mutant compared to the parent. This combined examination of the mRNA and protein expression profiles showed that down-regulation of the tRNA synthetases in the mutant slowed the general translation rate and, more specifically, slowed the rate of HlyA synthesis. Improved secretion of α-hemolysin at a low synthesis rate is attributable to a balance between translation and secretion. The use of rare codons in the hlyA gene has been shown to reduce its rate of translation, because the number of available aminoacyl tRNAs is limited. A variant of the hlyA gene, altered at five bases but encoding the same amino acid sequence, was designed using a mathematical model of prokaryotic translation. In this way, the rate of translation could be artificially slowed, leading to further improved secretory production of α-hemolysin. A combined analysis of transcriptome and metabolome was used to develop an Aspergillus strain overproducing lovastatin, a cholesterol-lowering drug [112]. The work began by generating a library of strains expressing genes thought to be involved in lovastatin synthesis or known to broadly affect secondary metabolite production in the parental strain. These strains were characterized by metabolome and transcriptome profiling, followed by a statistical association analysis to extract potential key parameters affecting the production of lovastatin and (+)-geodin. Using this approach, target genes were identified and manipulated to improve lovastatin production by more than 50%. In another study, Krömer et al. [113] performed combined transcriptome, metabolome, and fluxome analysis of L-lysine-producing Corynebacterium glutamicum at different stages of batch culture.
A decrease in glucose uptake rate resulted in a shift of cellular activities from growth to L-lysine production, redirecting the metabolic fluxes from the TCA cycle toward anaplerotic carboxylation and lysine biosynthesis. During this shift, the intracellular metabolite pools exhibited transient dynamics, including an increase of L-lysine up to 40 mM before its excretion to the medium. The expression levels of most genes involved in L-lysine biosynthesis remained constant whereas the metabolic fluxes showed marked changes, suggesting that metabolic fluxes are strongly regulated at the metabolic level. These are good examples of integrating omics studies and computational analysis toward the development of improved strains. As recently reviewed by Lee et al. [6], this systems approach to strain development will become more and more popular, as it allows identification of new targets for metabolic engineering in a more systematic manner (Figure 13.4).
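The combined transcriptome and proteome comparisons discussed in this section rest on finding genes whose mRNA and protein levels change in the same direction between two strains. A minimal sketch of that idea is given below. It is not the method of any cited study: the expression values are invented, the gene names geneX and geneY are placeholders, the one-log2-unit (two-fold) threshold is an assumption, and the function names are hypothetical.

```python
import math

# Hypothetical (parent, mutant) expression values in arbitrary units.
mrna = {
    "asnS": (100.0, 40.0),
    "lysS": (80.0, 30.0),
    "geneX": (50.0, 55.0),
    "geneY": (20.0, 60.0),
}
protein = {
    "asnS": (10.0, 4.0),
    "lysS": (8.0, 7.5),
    "geneX": (5.0, 5.2),
    "geneY": (2.0, 7.0),
}


def log2_fold_change(parent, mutant):
    """Log2 ratio of mutant to parent; both values must be positive."""
    return math.log2(mutant / parent)


def concordant_changes(mrna, protein, threshold=1.0):
    """Return genes changed in the same direction at both levels."""
    hits = {}
    for gene in sorted(set(mrna) & set(protein)):
        m = log2_fold_change(*mrna[gene])
        p = log2_fold_change(*protein[gene])
        if m <= -threshold and p <= -threshold:
            hits[gene] = "down"
        elif m >= threshold and p >= threshold:
            hits[gene] = "up"
    return hits


hits = concordant_changes(mrna, protein)
print(hits)
```

In this toy data set, lysS is excluded because its transcript level drops while its protein level barely changes, which illustrates why a single omics layer can be misleading on its own.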


[Figure 13.4 diagram labels: Genomics (genome sequencing; ORF annotation and prediction); Transcriptomics (transcription level; regulation); Proteomics (protein level; enzyme activity); Rational design (reconstruction of metabolic pathway); Metabolic engineering.]

Figure 13.4 Metabolic engineering with combined omics analyses. Because each omics technology offers a different perspective on the cell, integrated use of omics technologies is needed to understand whole cellular physiology and mechanisms and to generate valuable information that can be used for efficient strain development.

13.6 Conclusions and Future Prospects

Genome-wide technologies are becoming not only popular but also important research and screening tools for studying differentially expressed genes and proteins and final phenotypes. Their ability to monitor hundreds to tens of thousands of biological characteristics simultaneously is unsurpassed. However, current technologies still have limitations, and these have become more apparent. Although the existence and trend of changes in biological characteristics can be reliably detected within the appropriate sensitivity range, accurate measurement of absolute expression levels and reliable detection of low-abundance genes, proteins, or phenotypes remain difficult. Because of these limitations, the tremendous volumes of data generated by genome-wide technologies contain large numbers of false positives and/or false negatives. For this reason, more sensitive and accurate instruments will need to be continuously developed.

Many cutting-edge biological and biotechnological studies are currently driven by the high-throughput acquisition and examination of omics data supported by systems biological and bioinformatic analyses [6]. Because no single technology is sufficient to understand cellular physiology and regulatory mechanisms as a whole, combined analysis will become increasingly important. This type of integrated analysis will lead to an understanding of cellular physiology and metabolism at the systems level, and will pave the way toward more efficient metabolic engineering.

Acknowledgments

Our work described in this chapter was supported by the Korean Systems Biology Research Grant (M10309020000-03B5002-00000) of the Ministry of Science and Technology. Further support from the LG Chem Chair Professorship, the IBM SUR program, Microsoft, KOSEF through the Center for Ultramicrochemical Process Systems, and the Brain Korea 21 project is appreciated.

References

1. Fleischmann, R.D. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269, 496, 1995.
2. O’Farrell, P.H. High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem., 250, 4007, 1975.
3. Bochner, B.R. Sleuthing out bacterial identities. Nature, 339, 157, 1989.
4. Fodor, S.A. et al. Multiplexed biochemical assays with biological chips. Nature, 364, 555, 1993.
5. Kitano, H. Systems biology: a brief overview. Science, 295, 1662, 2002.
6. Lee, S.Y., Lee, D.Y., and Kim, T.Y. Systems biotechnology for strain improvement. Trends Biotechnol., 23, 349, 2005.
7. Brown, P.O. and Botstein, D. Exploring the new world of the genome with DNA microarrays. Nat. Genet., 21, 33, 1999.
8. Lockhart, D.J. and Winzeler, E.A. Genomics, gene expression and DNA arrays. Nature, 405, 827, 2000.
9. Singh-Gasson, S. et al. Maskless fabrication of light-directed oligonucleotide microarrays using a digital micromirror array. Nat. Biotechnol., 17, 974, 1999.
10. Lipshutz, R.J. et al. High density synthetic oligonucleotide arrays. Nat. Genet., 21, 20, 1999.
11. Hughes, T.R. et al. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat. Biotechnol., 19, 342, 2001.
12. Schena, M. Microarrays: biotechnology’s discovery platform for functional genomics. Trends Biotechnol., 16, 301, 1998.
13. Foy, C.A. and Anderson, M.T. The development of microarray standards. Anal. Bioanal. Chem., 381, 87, 2005.
14. Draghici, S. et al. Reliability and reproducibility issues in DNA microarray measurements. Trends Genetics, 22, 101, 2006.
15. Andrew, J.H. et al. Options available-from start to finish-for obtaining data from DNA microarrays II. Nat. Genet. Suppl., 32, 481, 2002.
16. Eisen, M.B. et al. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 95, 14863, 1998.
17. Tamayo, P. et al. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. USA, 96, 2907, 1999.
18. DeRisi, J.L., Iyer, V.R., and Brown, P.O. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 278, 680, 1997.
19. Alter, O., Brown, P., and Botstein, D. Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl. Acad. Sci. USA, 97, 10101, 2000.
20. Troyanskaya, O. et al. Missing value estimation methods for DNA microarrays. Bioinformatics, 17, 520, 2001.
21. Raychaudhuri, S., Stuart, J.M., and Altman, R.B. Principal components analysis to summarize microarray experiments: application to sporulation time series. Pac. Symp. Biocomput., 455, 2000.


22. Dharmadi, Y. and Gonzalez, R. DNA microarrays: experimental issues, data analysis, and application to bacterial systems. Biotechnol. Prog., 20, 1309, 2004.
23. Syvanen, A.C. Toward genome-wide SNP genotyping. Nat. Genet. Suppl., S5, 2005.
24. Edwards-Ingram, L.C. et al. Comparative genomic hybridization provides new insights into the molecular taxonomy of the Saccharomyces sensu stricto complex. Genome Res., 14, 1043, 2004.
25. Karaca, G. et al. Herpesvirus of turkeys: microarray analysis of host gene responses to infection. Virology, 318, 102, 2004.
26. Fu, M. et al. Egr-1 target genes in human endothelial cells identified by microarray analysis. Gene, 315, 33, 2003.
27. Sandler, N.G. et al. Global gene expression profiles during acute pathogen-induced pulmonary inflammation reveal divergent roles for Th1 and Th2 responses in tissue repair. J. Immunol., 171, 3655, 2003.
28. Miki, R. et al. Delineating developmental and metabolic pathways in vivo by expression profiling using the RIKEN set of 18,816 full-length enriched mouse cDNA arrays. Proc. Natl. Acad. Sci. USA, 98, 2199, 2001.
29. Ahrendt, S.A. et al. Rapid p53 sequence analysis in primary lung cancer using an oligonucleotide probe array. Proc. Natl. Acad. Sci. USA, 96, 7382, 1999.
30. Lock, C. et al. Gene-microarray analysis of multiple sclerosis lesions yields new targets validated in autoimmune encephalomyelitis. Nature Med., 8, 500, 2002.
31. Tummala, S.B. et al. Transcriptional analysis of product concentration driven changes in cellular programs of recombinant Clostridium acetobutylicum strains. Biotechnol. Bioeng., 84, 842, 2003.
32. Roth, F.P. et al. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat. Biotechnol., 16, 939, 1998.
33. Golub, T.R. et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531, 1999.
34. van’t Veer, L.J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530, 2002.
35. Weinstein, J.N. Pharmacogenomics-teaching old drugs new tricks. N. Engl. J. Med., 343, 1408, 2000.
36. Scherf, U. et al. A gene expression database for the molecular pharmacology of cancer. Nat. Genet., 24, 236, 2000.
37. Staunton, J.E. et al. Chemosensitivity prediction by transcriptional profiling. Proc. Natl. Acad. Sci. USA, 98, 10787, 2001.
38. Lennon, G.G. High-throughput gene expression analysis for drug discovery. Drug Discov. Today, 5, 59, 2000.
39. Yoon, S.H. et al. Combined transcriptome and proteome analysis of Escherichia coli during high cell density culture. Biotechnol. Bioeng., 81, 753, 2003.
40. Choi, J.H. et al. Enhanced production of insulin-like growth factor I fusion protein in Escherichia coli by the coexpression of the down-regulated genes identified by transcriptome profiling. Appl. Environ. Microbiol., 69, 4737, 2003.
41. Pool, W.A. et al. Natural sweetening of food products by engineering Lactococcus lactis for glucose production. Metab. Eng., 8, 456, 2006.
42. Wahlbom, F. et al. Molecular analysis of a Saccharomyces cerevisiae mutant with improved ability to utilise xylose shows enhanced expression of proteins involved in transport, initial xylose metabolism and the pentose phosphate pathway. Appl. Environ. Microbiol., 69, 740, 2003.
43. Gonzalez, R. Gene array-based identification of changes that contribute to ethanol tolerance in ethanologenic Escherichia coli: comparison of KO11 (parent) to LY01 (resistant mutant). Biotechnol. Prog., 19, 612, 2003.
44. Jurgen, B. et al. Global expression profiling of Bacillus subtilis cells during industrial-close fed-batch fermentations with different nitrogen sources. Biotechnol. Bioeng., 92, 277, 2005.


45. Akesson, M., Forster, J., and Nielsen, J. Integration of gene expression data into genome-scale metabolic models. Metab. Eng., 6, 285, 2004.
46. Wei, Y. et al. High-density microarray-mediated gene expression profiling of Escherichia coli. J. Bacteriol., 183, 545, 2001.
47. Dudley, A.M. et al. Measuring absolute expression with microarrays with a calibrated reference sample and an extended signal intensity range. Proc. Natl. Acad. Sci. USA, 99, 7554, 2002.
48. Helmann, J.D. et al. Global transcriptional response of Bacillus subtilis to heat shock. J. Bacteriol., 183, 7318, 2001.
49. Khil, P.P. and Camerini-Otero, R.D. Over 1000 genes are involved in the DNA damage response of Escherichia coli. Mol. Microbiol., 44, 89, 2002.
50. Glassbrook, N., Beecher, C., and Ryals, J. Metabolic profiling on the right path. Nat. Biotechnol., 18, 1142, 2000.
51. Dunn, W.B., Bailey, N.J., and Johnson, H.E. Measuring the metabolome: current analytical technologies. Analyst, 130, 606, 2005.
52. Last, R.L., Jones, A.D., and Shachar-Hill, Y. Towards the plant metabolome and beyond. Nat. Rev. Mol. Cell Biol., 8, 167, 2007.
53. Bochner, B.R., Gadzinski, P., and Panomitros, E. Phenotype microarrays for high-throughput phenotypic testing and assay for gene function. Genome Res., 11, 1246, 2001.
54. Bochner, B.R. New technologies to assess genotype-phenotype relationships. Nature Rev. Genetics, 4, 309, 2003.
55. Klingler, J.M. et al. Evaluation of the biolog automated microbial identification system. Appl. Environ. Microbiol., 58, 2089, 1992.
56. Mukherjee, A. et al. Exploring genotypic and phenotypic diversity of microbes using microarray approaches. Toxicol. Mech. Methods, 16, 121, 2006.
57. Zhou, L. et al. Phenotypic microarray analysis of Escherichia coli K-12 mutants with deletions of all two-component systems. J. Bacteriol., 185, 4956, 2003.
58. Ulrich, R.L. et al. Mutational analysis and biochemical characterization of the Burkholderia thailandensis DW503 quorum-sensing network. J. Bacteriol., 186, 4350, 2004.
59. Druzhinina, I. et al. Global carbon utilization profiles of wild type, mutant and transformant strains of Hypocrea jecorina. Appl. Environ. Microbiol., 72, 2126, 2006.
60. Outeiro, T.F. and Giorgini, F. Yeast as a drug discovery platform in Huntington’s and Parkinson’s diseases. Biotechnol. J., 1, 1, 2006.
61. von Eiff, C. et al. Phenotype microarray profiling of Staphylococcus aureus menD and hemB mutants with the small-colony-variant phenotype. J. Bacteriol., 188, 687, 2006.
62. Ihssen, J. and Egli, T. Global physiological analysis of carbon- and energy-limited growing Escherichia coli confirms a high degree of catabolic flexibility and preparedness for mixed substrate utilization. Appl. Environ. Microbiol., 71, 1568, 2005.
63. Narayanaswamy, R. et al. Systematic profiling of cellular phenotypes with spotted cell microarrays reveals mating-pheromone response genes. Genome Biol., 7, R6, 2006.
64. Bro, C. and Nielsen, J. Impact of “ome” analyses on inverse metabolic engineering. Metab. Eng., 6, 204, 2004.
65. Förster, J., Gombert, A.K., and Nielsen, J. A functional genomics approach using metabolomics and in silico pathway analysis. Biotechnol. Bioeng., 79, 703, 2002.
66. Wilkins, M.R. et al. Progress with proteome projects: why all proteins expressed by a genome should be identified and how to do it. Biotechnol. Genet. Eng. Rev., 13, 19, 1996.
67. Han, M.J. and Lee, S.Y. The Escherichia coli proteome: past, present, and future prospects. Microbiol. Mol. Biol. Rev., 70, 362, 2006.
68. Taylor, S.W., Fahy, E., and Ghosh, S.S. Global organellar proteomics. Trends Biotechnol., 21, 82, 2003.
69. Tonella, L. et al. ‘98 Escherichia coli SWISS-2DPAGE database update. Electrophoresis, 19, 1960, 1998.


70. Tonella, L. et al. New perspectives in the Escherichia coli proteome investigation. Proteomics, 1, 409, 2001.
71. Lopez-Campistrous, A. et al. Localization, annotation and comparison of the Escherichia coli K-12 proteome under two states of growth. Mol. Cell. Proteomics, 4, 1205, 2005.
72. Nandakumar, M.P., Cheung, A., and Marten, M.R. Proteomic analysis of extracellular proteins from Escherichia coli W3110. J. Proteome Res., 5, 1155, 2006.
73. Lee, J.W. et al. The proteome of Mannheimia succiniciproducens, a capnophilic rumen bacterium. Proteomics, 6, 3550, 2006.
74. Tjalsma, H. et al. Signal peptide-dependent protein transport in Bacillus subtilis: a genome-based survey of the secretome. Microbiol. Mol. Biol. Rev., 64, 515, 2000.
75. Bumann, D. et al. Proteome analysis of secreted proteins of the gastric pathogen Helicobacter pylori. Infect. Immun., 70, 3396, 2002.
76. Tan, Y.P. et al. Comparative proteomic analysis of extracellular proteins of Edwardsiella tarda. Infect. Immun., 70, 6475, 2002.
77. Ziebandt, A.K. et al. Extracellular proteins of Staphylococcus aureus and the role of SarA and sigma B. Proteomics, 1, 480, 2001.
78. Len, A.C. et al. Cellular and extracellular proteome analysis of Streptococcus mutans grown in a chemostat. Proteomics, 3, 627, 2003.
79. Jeong, K.J. and Lee, S.Y. Excretion of human beta-endorphin into culture medium by using outer membrane protein F as a fusion partner in recombinant Escherichia coli. Appl. Environ. Microbiol., 68, 4979, 2002.
80. Harry, J.L. et al. Proteomics: capacity versus utility. Electrophoresis, 21, 1071, 2000.
81. Zuo, X., Lee, K., and Speicher, D.W. Electrophoretic prefractionation for comprehensive analysis of proteomes. In Proteome Analysis: Interpreting the Genome, Speicher, D.W., Ed. Elsevier Science, New York, 2004, 93.
82. Molloy, M.P. et al. Proteomic analysis of the Escherichia coli outer membrane. Eur. J. Biochem., 267, 2871, 2000.
83. Fountoulakis, M. and Gasser, R. Proteomic analysis of the cell envelope fraction of Escherichia coli. Amino Acids, 24, 19, 2003.
84. Lee, S.Y., Choi, J.H., and Xu, Z. Microbial cell-surface display. Trends Biotechnol., 21, 45, 2003.
85. Lee, S.H. et al. Display of bacterial lipase on the Escherichia coli cell surface by using FadL as an anchoring motif and use of the enzyme in enantioselective biocatalysis. Appl. Environ. Microbiol., 70, 5074, 2004.
86. Lee, S.H. et al. Display of lipase on the cell surface of Escherichia coli using OprF as an anchor and its application to enantioselective resolution in organic solvent. Biotechnol. Bioeng., 90, 223, 2005.
87. Gevaert, K. et al. Chromatographic isolation of methionine-containing peptides for gel-free proteome analysis: identification of more than 800 Escherichia coli proteins. Mol. Cell. Proteomics, 1, 896, 2002.
88. Corbin, R.W. et al. Toward a protein profile of Escherichia coli: comparison to its transcription profile. Proc. Natl. Acad. Sci. USA, 100, 9232, 2003.
89. Washburn, M.P., Wolters, D., and Yates, J.R. III. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol., 19, 242, 2001.
90. Ünlü, M., Morgan, M.E., and Minden, J.S. Difference gel electrophoresis: a single gel method for detecting changes in protein extracts. Electrophoresis, 18, 2071, 1997.
91. Yan, J.X. et al. Fluorescence two-dimensional difference gel electrophoresis and mass spectrometry based proteomic analysis of Escherichia coli. Proteomics, 2, 1682, 2002.
92. Hu, Y. et al. Proteome analysis of Saccharomyces cerevisiae under metal stress by two-dimensional differential gel electrophoresis. Electrophoresis, 24, 1458, 2003.
93. Skynner, H.A. et al. Alterations of stress related proteins in genetically altered mice revealed by two-dimensional differential in-gel electrophoresis analysis. Proteomics, 2, 1018, 2002.


94. Van den, B.G. et al. Fluorescent two-dimensional difference gel electrophoresis and mass spectrometry identify age-related protein expression differences for the primary visual cortex of kitten and adult cat. J. Neurochem., 85, 193, 2003.
95. Gharbi, S. et al. Evaluation of two-dimensional differential gel electrophoresis for proteomic expression analysis of a model breast cancer cell system. Mol. Cell. Proteomics, 1, 91, 2002.
96. Friedman, D.B. et al. Proteome analysis of human colon cancer by two-dimensional difference gel electrophoresis and mass spectrometry. Proteomics, 4, 793, 2004.
97. Gygi, S.P. et al. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol., 17, 994, 1999.
98. Zhou, H. et al. Quantitative proteome analysis by solid-phase isotope tagging and mass spectrometry. Nat. Biotechnol., 20, 512, 2002.
99. Mirgorodskaya, O.A. et al. Quantitation of peptides and proteins by matrix-assisted laser desorption/ionization mass spectrometry using 18O-labeled internal standards. Rapid Commun. Mass Spectrom., 14, 1226, 2000.
100. Yao, X. et al. Proteolytic 18O labeling for comparative proteomics: model studies with two serotypes of adenovirus. Anal. Chem., 73, 2836, 2001.
101. Oda, Y. et al. Accurate quantitation of protein expression and site-specific phosphorylation. Proc. Natl. Acad. Sci. USA, 96, 6591, 1999.
102. Washburn, M.P. et al. Analysis of quantitative proteomic data generated via multidimensional protein identification technology. Anal. Chem., 74, 1650, 2002.
103. Ross, P.L. et al. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell. Proteomics, 3, 1154, 2004.
104. Han, M.J. et al. Engineering Escherichia coli for increased productivity of serine-rich proteins based on proteome profiling. Appl. Environ. Microbiol., 69, 5772, 2003.
105. Aldor, I.S. et al. Proteomic profiling of recombinant Escherichia coli in high-cell-density fermentations for improved production of an antibody fragment biopharmaceutical. Appl. Environ. Microbiol., 71, 1717, 2005.
106. Allen, S.P. et al. Two novel heat shock genes encoding proteins produced in response to heterologous protein expression in Escherichia coli. J. Bacteriol., 174, 6938, 1992.
107. Han, M.J. et al. Roles and applications of small heat shock proteins in the production of recombinant proteins in Escherichia coli. Biotechnol. Bioeng., 88, 426, 2004.
108. LeThanh, H., Neubauer, P., and Hoffmann, F. The small heat-shock proteins IbpA and IbpB reduce the stress load of recombinant Escherichia coli and delay degradation of inclusion bodies. Microb. Cell Fact., 4, 6, 2005.
109. Han, M.J., Lee, J.W., and Lee, S.Y. Enhanced proteome profiling by inhibiting proteolysis with small heat shock proteins. J. Proteome Res., 4, 2429, 2005.
110. Lee, J.H. et al. Global analyses of transcriptomes and proteomes of a parent strain and an L-threonine-overproducing mutant strain. J. Bacteriol., 185, 5442, 2003.
111. Lee, P.S. and Lee, K.H. Engineering HlyA hypersecretion in Escherichia coli based on proteomic and microarray analyses. Biotechnol. Bioeng., 89, 195, 2005.
112. Askenazi, M. et al. Integrating transcriptional and metabolite profiles to direct the engineering of lovastatin-producing fungal strains. Nat. Biotechnol., 21, 150, 2003.
113. Krömer, J.O. et al. In-depth profiling of lysine-producing Corynebacterium glutamicum by combined analysis of the transcriptome, metabolome, and fluxome. J. Bacteriol., 186, 1769, 2004.

14 Monitoring and Measuring the Metabolome

Maria Rowena N. Monton, Keio University
Tomoyoshi Soga, Keio University

14.1 Introduction
14.2 Mass Spectrometry
Direct Injection Mass Spectrometry (DIMS) • Gas Chromatography–Mass Spectrometry • Liquid Chromatography–Mass Spectrometry • Capillary Electrophoresis–Mass Spectrometry
14.3 Nuclear Magnetic Resonance Spectroscopy
References

14.1 Introduction

The “metabolome,” the complete set of small (less than 1000 Da) molecules (metabolites) present in cells in a particular physiological or developmental state,1 closely reflects cellular activities at a functional level.2 Its complexity and dynamism pose significant analytical challenges, and metabolomics, the qualitative and quantitative analysis of the metabolome, is an extremely demanding science. Whereas DNA, RNA, and proteins are largely coherent chemically, since they are formed from a limited pool of monomers (four nucleotide bases or 20 amino acids) strung together, metabolites are much more diverse chemically and physically because of the number of possible atomic combinations,3 as well as spatial orientations. Metabolites range from ionic inorganic species to hydrophilic carbohydrates, volatile alcohols and ketones, amino and nonamino organic acids, hydrophobic lipids, and complex natural products.4 Their dynamic concentration range is very wide, spanning an estimated nine orders of magnitude (from pmol to mmol).3 Moreover, metabolites participate in a complex network of reactions and are subject to rapid enzymatic turnover, thus requiring extreme care in sampling.5

Such intricacies render analysis of the complete metabolome practically impossible. Metabolomic approaches therefore study subsets of the metabolome, chosen according to the intended area of application, and require technologies that are robust, rapid, comprehensive, sensitive, and accurate. These approaches include:2–7

• Metabolite/metabolic profiling: qualitative and quantitative analysis of metabolites common to a specific pathway or chemical class.
• Targeted analysis: quantitative analysis of one or a group of selected metabolites.
• Metabolic fingerprinting: comprehensive analysis of samples for the purpose of classification based on the observed metabolite pattern.
• Metabolic footprinting: global analysis of intracellular metabolites secreted into the spent growth medium.

The current arsenal of analytical technologies for metabolomics includes mass spectrometry (MS), nuclear magnetic resonance (NMR) spectroscopy, and separation platforms based on chromatography (gas chromatography (GC) and liquid chromatography (LC)) and electrophoresis (capillary electrophoresis (CE)). To a limited extent, molecular spectroscopic methods (UV, Raman, IR) are also used. Some stand-alone techniques are employed, but strategic combinations of high-performance separation techniques with sophisticated detection systems are generally regarded as the best methods. Upstream separation boosts detector performance, and confidence in metabolite identification is reinforced by the use of two (or more) parameters in tandem (e.g., retention time/migration time plus accurate molecular mass). These combinations may be performed off-line, as distinct techniques carried out in succession, or on-line, as integrated steps comprising a single analytical procedure. The off-line mode permits independent optimization of analytical conditions and circumvents technical difficulties in coupling different systems. However, the on-line mode is used more extensively because of its higher throughput and minimal sample loss due to handling.

14.2 Mass Spectrometry

MS is the most important tool for probing the metabolome: it is unmatched in sensitivity and nearly universal in application. It can provide molecular weight accuracy on the order of ±0.01%, as well as structural information.8 A number of ionization methods are available; however, electron ionization (EI), chemical ionization (CI), electrospray ionization (ESI), and matrix-assisted laser desorption/ionization (MALDI) are the most common in metabolomics. The first two are used almost exclusively with GC–MS, while the last two, which are compatible with nonvolatile, thermolabile compounds, have wider applications. ESI and MALDI (Table 14.1) belong to the so-called “soft” ionization techniques, which cause little or no fragmentation of the molecular ion.8 They can be operated in both positive and negative ion modes, thus potentially expanding coverage.

Several designs of mass analyzers are available to match specific needs. Table 14.2 compares the performance of four commonly used ones (quadrupole, ion trap, time-of-flight (TOF), and Fourier transform-ion cyclotron resonance (FT-ICR)).8 Some hybrid mass spectrometers (i.e., combinations of the basic types, e.g., quadrupole–TOF) have also been developed, enabling very high mass accuracy, sensitivity, resolution, and scan speed. When it is necessary to characterize an unknown metabolite structurally, or to provide unequivocal identification by direct interpretation of its mass spectrum or by comparison with putative fragments, tandem MS (MS/MS) may be used, and fragmentation may be accomplished via collision-induced dissociation (CID).

Table 14.1 Comparison of ESI and MALDI

                                        ESI          MALDI
Sensitivity                             High         High
Tolerance for complex mixtures          Low          High
Tolerance for salts                     Low          Medium
Integration with upstream separation    Adaptable    Essentially off-line

Table 14.2 Performance of Commonly Employed Mass Analyzers

                    Quadrupole    Ion Trap    TOF          FT-ICR
Mass accuracy       Medium        Medium      High         Very high
Mass range          Medium        Medium      Very high    High
Resolution          Low           Low         High         Very high
Relative cost       Low           Low         Medium       High
MS/MS capability    Yes (a)       Yes         Yes (b)      Yes

(a) With multiple quadrupoles.
(b) With quadrupole-TOF or reflectron-TOF instrument.
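The ±0.01% accuracy quoted above is more often expressed in parts per million (ppm). The following minimal sketch shows the conversion and the m/z window that such a tolerance implies. The function names and the example value of m/z 500 are illustrative assumptions, not taken from the chapter.

```python
def percent_to_ppm(percent: float) -> float:
    """Convert a relative accuracy in percent to parts per million."""
    return percent * 1e4


def mass_error_ppm(measured_mz: float, theoretical_mz: float) -> float:
    """Signed relative error of a measured m/z value, in ppm."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def tolerance_window(theoretical_mz: float, tolerance_ppm: float) -> tuple[float, float]:
    """Lowest and highest m/z values that still fall inside the tolerance."""
    delta = theoretical_mz * tolerance_ppm * 1e-6
    return theoretical_mz - delta, theoretical_mz + delta


if __name__ == "__main__":
    tol = percent_to_ppm(0.01)  # the 0.01 % figure quoted in the text
    low, high = tolerance_window(500.0, tol)  # m/z 500 is an arbitrary example
    print(f"{tol:.0f} ppm corresponds to {low:.3f}-{high:.3f} at m/z 500")
```

On this reading, 0.01% corresponds to 100 ppm, or ±0.05 units at m/z 500. The "high" and "very high" mass accuracy entries in Table 14.2 would correspond to considerably narrower windows.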

14.2.1 Direct Injection Mass Spectrometry (DIMS)

In DIMS, samples, frequently without derivatization, are introduced directly into the mass spectrometer by infusion using a syringe pump or an automated flow injection system (in ESI), or by deposition onto the target plate (in MALDI). The total run time per sample can be as short as 30 s,9 enabling hundreds of samples to be processed per day. Hence, DIMS is particularly suitable for high-throughput screening. It has been widely exploited in microbial and plant metabolomics for chemotaxonomy,10–12 and in clinical diagnostics for targeted analyses of key metabolites related to metabolic disorders.13,14 With ESI or MALDI as ion sources, the presence of molecular ions simplifies the interpretation of mass spectra, and the use of instruments with high mass accuracy and high mass resolution (e.g., FT–ICR–MS) permits assignment of empirical formulae to peaks. Structural isomers, however, cannot be resolved by mass alone, so a prior separation step is necessary to distinguish them. For quantitation, DIMS should be used with caution because of matrix effects, i.e., changes in the MS response of a chemical species due to the presence of other species in the mixture. Such effects can be minimized, and the accuracy of results improved, by executing a clean-up step prior to MS analysis15 or by incorporating an internal standard.16
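The assignment of empirical formulae from accurate mass, mentioned above, can be illustrated with a brute-force search. The sketch below is an assumption-laden example rather than the procedure used in the cited studies: it considers only C, H, N, and O, uses arbitrary upper limits on atom counts, and applies no chemical plausibility filters (valence rules, isotope patterns), which a practical workflow would need.

```python
from itertools import product

# Monoisotopic masses (Da) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def formula_mass(c: int, h: int, n: int, o: int) -> float:
    """Monoisotopic mass of the neutral formula CcHhNnOo."""
    return c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]


def formula_string(c: int, h: int, n: int, o: int) -> str:
    """Compact formula text; elements with a zero count are omitted."""
    parts = []
    for symbol, count in (("C", c), ("H", h), ("N", n), ("O", o)):
        if count:
            parts.append(symbol if count == 1 else f"{symbol}{count}")
    return "".join(parts)


def candidate_formulas(neutral_mass, tol_ppm, max_c=15, max_h=40, max_n=5, max_o=10):
    """List CHNO formulas within tol_ppm of neutral_mass, best match first.

    The search limits (max_c, max_h, max_n, max_o) are illustrative defaults.
    """
    tol = neutral_mass * tol_ppm * 1e-6
    hits = []
    for c, n, o in product(range(max_c + 1), range(max_n + 1), range(max_o + 1)):
        # For fixed C, N, and O only one hydrogen count can match, because
        # the tolerance is far smaller than the mass of a hydrogen atom.
        remainder = neutral_mass - formula_mass(c, 0, n, o)
        h = round(remainder / MONO["H"])
        if not 0 <= h <= max_h:
            continue
        error = abs(formula_mass(c, h, n, o) - neutral_mass)
        if error <= tol:
            hits.append((error, formula_string(c, h, n, o)))
    return [name for _, name in sorted(hits)]


if __name__ == "__main__":
    # 180.06339 Da is the monoisotopic mass of a hexose, C6H12O6.
    print(candidate_formulas(180.06339, tol_ppm=5.0))
```

The candidate list grows as the tolerance widens, which is why high mass accuracy matters for formula assignment. A match on C6H12O6 still cannot distinguish glucose from fructose, which is the isomer limitation noted in the text.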

14.2.2 Gas Chromatography–Mass Spectrometry

GC and MS are inherently compatible techniques, since both require the sample to be in the gaseous phase; hence, GC–MS technology matured early and is considered a standard tool for the analysis of volatile compounds and of those that can be derivatized to yield volatile, thermostable products. In this hyphenation of techniques, GC confers high resolution while MS provides high selectivity and sensitivity. Efficiencies on the order of 10^5 plates can be obtained, enabling identification of hundreds of compounds at the pmol level in a single chromatographic run. Volatile metabolites can be collected by headspace solid-phase microextraction (SPME),17 headspace sorptive extraction,18 or solvent extraction,19 then injected directly into the gas chromatograph. Comprehensive studies, however, generally require a preceding derivatization step, since many metabolites contain polar functional groups that are thermally labile at the temperatures required for their separation, or are not volatile at all.20 An overview of the derivatization methods used in metabolomics is presented in Table 14.3.

GC columns can be of either the packed or the open tubular (capillary) type. Because of their superior resolution, however, the majority of GC columns today are of the latter type. Fast, efficient separations can be obtained using fused silica capillaries (0.1–0.7 mm in internal diameter) as short as 10 m, but longer capillaries are required for highly complex mixtures. The stationary phase can be coated onto the wall of the capillary as a liquid film (wall-coated open tubular (WCOT)), deposited as a layer of porous polymer (porous layer open tubular (PLOT)), or adsorbed onto a support material lining the wall (support-coated open tubular (SCOT)). WCOT columns are broadly used because they afford higher efficiencies and better stability. The film can vary in thickness from 0.1 to 3 µm, with thicker films required for low-boiling compounds.
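To put the figure of 10^5 plates quoted above into perspective, the plate number can be estimated from a single peak using the standard Gaussian-peak expressions N = 8 ln 2 (tR/w½)² and N = 16 (tR/wb)². The sketch below applies them; the 600 s retention time is a hypothetical value chosen only for illustration.

```python
import math


def plates_from_half_height(retention_time: float, width_half_height: float) -> float:
    """Plate number from the peak width at half height (same time units)."""
    return 8.0 * math.log(2.0) * (retention_time / width_half_height) ** 2


def plates_from_base_width(retention_time: float, base_width: float) -> float:
    """Plate number from the peak width at base (same time units)."""
    return 16.0 * (retention_time / base_width) ** 2


def half_height_width_for(retention_time: float, plates: float) -> float:
    """Peak width at half height implied by a given plate number."""
    return retention_time * math.sqrt(8.0 * math.log(2.0) / plates)


if __name__ == "__main__":
    t_r = 600.0  # hypothetical retention time, in seconds
    width = half_height_width_for(t_r, 1e5)
    print(f"A peak at {t_r:.0f} s needs a half-height width of {width:.2f} s for 1e5 plates")
```

Under these assumptions, a peak eluting at 10 min must be only about 4.5 s wide at half height to reach 10^5 plates, which gives a feel for why capillary columns dominate.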
Polysiloxane-based columns (e.g., DB-1, DB-5, VF-1ms) work well for nonpolar compounds and those of intermediate polarity, while polyethylene glycol (PEG)-based ones (e.g., DB-WAX) are better for highly polar compounds. The low flows associated with capillary columns permit their direct insertion into the MS ion source, frequently EI. Eluted compounds can be identified by comparing their retention indices (RI) against those of known standards analyzed under similar conditions, or by their mass spectra, since EI induces fragmentation of the molecular ions. GC–MS has been used extensively for both metabolite profiling and targeted analyses.17–22

Table 14.3 Derivatization Methods in GC–MS Applied for Metabolomics

Type of reaction: Silylation
  Type of reagent: TMS, HMDS, TBS, MTBSTFA, BSTFA, MDBSTFA, QSM, MSHFBA, TSIM, MSTFA, TMCS, TMSDEA, BSA, etc.
  Typical reaction conditions: Reaction in pyridine under anhydrous conditions and heating
  Classes of metabolites derivatized: Sugars and derivatives, amino acids, organic acids, terpenoids, fatty acids, flavonoids, amides, and phytohormones

Type of reaction: Alkylation/esterification
  Classes of metabolites derivatized: Amino acids and derivatives, mono- and polycarboxylic acids, keto acids, hydroxyl acids, fatty acids, aliphatic and alicyclic amines, and amine-alcohols
  Type of reagent: Chloroformates
    Typical reaction conditions: Reaction in pyridine in aqueous solution, room temperature, with or without a second reactant
  Type of reagent: Diazomethane
    Typical reaction conditions: Nonpolar solvent in the absence of water
  Type of reagent: Acidic esterification
    Typical reaction conditions: H2SO4, Na2SO4, methanol/ethanol

Source: Villas-Bôas, S.G. et al. Mass Spectrom. Rev., 24, 613, 2005. With permission.

Multidimensional separations operating on orthogonal mechanisms provide greater resolving power for extremely complex mixtures than one-dimensional systems, though this comes at the expense of longer analysis times. A significant development in GC in the last decade is comprehensive two-dimensional (2D) GC (GC × GC). In such a configuration, the sample is first separated in a long, nonpolar column; small fractions of the effluent are then trapped and focused using a modulator, and sequentially released into a second short, polar (or shape-selective) column for further separation. Separation in the second column is extremely fast (1–10 s), thus requiring a rapid detection system. Among currently available MS detectors, only TOF instruments are compatible because of their high scan speeds (more than 50 spectra per second).23 Because the peak capacity (i.e., the number of peaks that can potentially be resolved in a column) is significantly increased (it equals the product of the peak capacities of the individual columns), the number of components that can be separated is also increased; thus GC × GC is ideal for very complex samples. In addition, components coeluting in the first column can possibly be resolved in the second, because the latter is driven by an independent separation principle. GC × GC data are presented in a 2D space (an example is shown in Figure 14.1), from which a chemical map of compound properties can be derived, making the technique particularly attractive for sample fingerprinting.24 It is broadly used for petrochemical and environmental analyses and is now increasingly deployed for metabolomic approaches, principally in plants.25,26
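The retention-index comparison mentioned earlier in this section can be sketched as follows. The chapter does not say which index definition is used. This example assumes the linear (temperature-programmed) retention index, which interpolates an analyte's retention time between the n-alkanes that bracket it. All retention times below are hypothetical.

```python
from bisect import bisect_right


def linear_retention_index(retention_time, alkanes):
    """Linear retention index of an analyte.

    alkanes: (carbon_number, retention_time) pairs for n-alkane standards
    run under the same conditions as the analyte.
    """
    if len(alkanes) < 2:
        raise ValueError("at least two alkane standards are required")
    ladder = sorted(alkanes, key=lambda pair: pair[1])
    times = [rt for _, rt in ladder]
    if not times[0] <= retention_time <= times[-1]:
        raise ValueError("retention time lies outside the alkane ladder")
    # Index of the first standard eluting after the analyte, clamped so that
    # an analyte coeluting with the last standard still has a bracket.
    upper = min(bisect_right(times, retention_time), len(ladder) - 1)
    (n_lo, t_lo), (n_hi, t_hi) = ladder[upper - 1], ladder[upper]
    fraction = (retention_time - t_lo) / (t_hi - t_lo)
    return 100.0 * (n_lo + (n_hi - n_lo) * fraction)


if __name__ == "__main__":
    # Hypothetical retention times (min) for decane, undecane, and dodecane.
    ladder = [(10, 8.0), (11, 10.0), (12, 12.5)]
    print(linear_retention_index(9.0, ladder))  # halfway between C10 and C11
```

A match would then be accepted when the computed index agrees with that of a standard within a window chosen by the analyst, ideally together with a matching EI mass spectrum.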

Figure 14.1 Expanded windows showing a possible fingerprint region for GC × GC space analysis of the headspace volatiles from (a) Agrostis stolonifera; (b) Pennisetum clandestinum; (c) Trifolium repens; and (d) Eucalyptus leucoxylon. Axes: D1 retention time (min) versus D2 retention time (s). (Reprinted from Perera, R.M.M., Marriott, P.J., and Galbally, I.E. Analyst, 127, 2002, 1601. With permission of The Royal Society of Chemistry.)

14.2.3 Liquid Chromatography–Mass Spectrometry

To date, LC is the dominant liquid-phase separation technique. It can simultaneously accommodate analytes with widely varying properties, such as ionic/neutral, hydrophilic/hydrophobic, and acid/base, and it is this ruggedness that has contributed to its ease of use and utility.27 Similar to the trend in GC, the downscaling of the chromatographic column has been a major step toward improved separation performance, higher sensitivity, lower sample volume requirement, and lower solvent consumption. LC, as it is practised today, is largely on the micro (0.5–1 mm) or capillary (0.1–0.5 mm) scale. The low flow rates in these small-internal-diameter columns, in combination with the introduction of ESI and its variants (microESI, nanoESI), which permit direct analysis of compounds in the aqueous or hydroorganic eluate, have made LC–MS routine.

Several separation modes (e.g., ion-exchange, size-exclusion, affinity) are available, but the most common by far is reversed-phase LC (RPLC), in which the sample components are eluted in the order of increasing hydrophobicity. The mobile phases used are MS-compatible, which accounts for the popularity of RPLC–MS couplings. High separation efficiencies can be obtained, and can be further improved by operating in the gradient elution mode (i.e., the organic component of the mobile phase is changed stepwise or continuously), enabling resolution of components that would otherwise coelute. ESI, the usual interface, is susceptible to ion suppression effects; hence, good resolution is necessary for reliable peak assignment and quantitation. RPLC–MS has been used successfully in many metabolomic applications, including metabolite profiling in plants28,29 and microorganisms,30 and targeted analysis in the clinical setting.31 In most cases, the use of MS/MS expanded the capability for definitive metabolite identification.

RPLC columns are typically packed with spherical silica particles (3–5 µm) whose surfaces have been modified with alkyl chains varying in length between C4 and C18 to promote analyte retention. Chromatographic efficiency can be increased by decreasing the size of these particles or by using a longer column, but the pressure limits of most commercial instruments (10,000 psi) have largely precluded this. Recent innovations in pump systems, however, have enabled operation at elevated pressures and led to the development of ultrahigh-pressure or ultra performance LC (UPLC). Particles smaller than 2 µm can be packed in long capillaries using packing pressures as high as 60,000 psi, and run at optimum flow rates with pressures on the order of 20,000 psi. These columns can generate as many as 300,000 plates, and analysis times can be significantly shortened.32 A comparison of results using conventional LC–MS and UPLC–MS under similar analytical conditions is shown in Figure 14.2. UPLC–MS is used increasingly in metabolomic applications33–35 where sample complexity requires a technology of high resolving power.

Another option to improve efficiency is to use continuous media, such as monoliths, instead of particle-packed columns. A monolithic column is made of a single piece of a porous solid with small-sized skeletons and relatively large through-pores.36 Its high permeability allows enhanced mass transport and high flow rates at reasonable column back pressures for efficient (100,000 plates), fast operations. Interest in monolithic columns has picked up considerably in recent years for proteomic37 and genomic38 studies, and the same interest can be expected in metabolomic research.

For highly polar compounds that are not retained well, and therefore not separated, on RPLC columns, hydrophilic interaction chromatography (HILIC)39 may be used. HILIC is similar to traditional normal phase LC (NPLC) in that the components of a sample elute in the order of increasing hydrophilicity. However, in lieu of the highly hydrophobic solvents (e.g., hexane) used in NPLC, HILIC uses low-aqueous, high-organic (ca. 80%) mobile phases, thus eliminating the problem associated with the insolubility of polar analytes in a nonpolar mobile phase. Moreover, such mobile phases are compatible with MS, and coupling can be implemented with ease. The superiority of HILIC over traditional RPLC has been demonstrated in profiling oligosaccharides and sugar nucleotides,40 and some highly polar urinary constituents.41
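The trade-off between particle size and pressure described above can be made concrete with idealized scaling rules for packed columns. These rules are textbook approximations, not taken from the chapter: the plate number is assumed to scale as L/dp, and the pressure drop as L/dp² at fixed linear velocity, or L/dp³ when each column is run at its own optimum velocity (which itself scales as 1/dp). Real columns deviate from these rules.

```python
def relative_plates(dp_ref_um: float, dp_new_um: float, length_ratio: float = 1.0) -> float:
    """Plate number of the new column relative to the reference (N ~ L / dp)."""
    return length_ratio * dp_ref_um / dp_new_um


def relative_pressure(dp_ref_um: float, dp_new_um: float,
                      length_ratio: float = 1.0,
                      at_optimum_velocity: bool = True) -> float:
    """Pressure drop of the new column relative to the reference.

    At fixed linear velocity the pressure scales as L / dp**2; at the
    optimum velocity, which rises as 1 / dp, it scales as L / dp**3.
    """
    exponent = 3 if at_optimum_velocity else 2
    return length_ratio * (dp_ref_um / dp_new_um) ** exponent


if __name__ == "__main__":
    # Hypothetical change from 5 µm to 1.5 µm particles at constant length.
    print(f"plates:   x{relative_plates(5.0, 1.5):.1f}")
    print(f"pressure: x{relative_pressure(5.0, 1.5):.0f} at optimum velocity")
```

Under these assumptions, moving from 5 µm to 1.5 µm particles at constant column length raises the plate number about 3.3-fold but the required pressure about 37-fold. This is consistent with the text's point that conventional pressure limits precluded small particles until pumps improved.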

Figure 14.2 3D maps for (a) HPLC–MS and (b) UPLC–MS of white male mouse urine (AM collection), showing retention time, m/z, and intensity. (Reprinted with permission from Wilson, I.D. J. Proteome Res., 4, 2005, 591. Copyright 2005 American Chemical Society.)

14.2.4 Capillary Electrophoresis–Mass Spectrometry

CE is a group of separation techniques that can be used for both small and large molecules. It is ideal for charged or chargeable species, but its utility can be expanded to include even neutrals when they are incorporated into charged carriers. References to CE generally mean its most popular form, capillary zone electrophoresis (CZE). However, several other modes are available, including micellar electrokinetic chromatography (MEKC), capillary gel electrophoresis (CGE), capillary sieving electrophoresis (CSE), capillary isotachophoresis (CITP), and capillary isoelectric focusing (CIEF). CSE is an important tool in DNA sequencing, while CIEF is a mainstay in many proteomic laboratories.

Compared to GC and LC, CE is still an emerging technology, whose development began in the early 1980s. It can be regarded as the capillary analog of traditional slab gel electrophoresis, much as column chromatography is the analog of planar chromatography. Using mostly aqueous solutions, analytes can be separated according to their mass-to-charge ratios inside narrow-bore capillaries with internal diameters that are typically less than 100 µm for efficient dissipation of Joule heat. The use of fused silica capillaries enables application of high electric fields, resulting in short analysis times. Separation efficiencies in CE are generally higher than in LC, which can be attributed in part to the plug-like profile of electroosmotic flow (EOF), the bulk flow of solution inside the capillary; hence, analyte zone dispersion is minimal. Under normal conditions (i.e., anode at the inlet and cathode at the outlet), the magnitude of the EOF is higher than the electrophoretic mobilities of most analytes, so cations, neutrals, and anions will migrate, in the given order, in the same direction and be analyzed simultaneously. In addition, CE can be automated easily and running costs are low.

Despite its many advantages, widespread acceptance of CE in the research community has been slow in coming, and this can be traced back to the capillary itself. To maintain performance, sample loading is limited to ca. 1% of the total capillary volume (a few nL at most, compared with the µL range for LC). While this translates to a minimal sample requirement, it also means that the concentration sensitivity is low. This drawback is aggravated because most commercial CE instruments are fitted with absorbance-based detectors, for which the optical path length, limited by the capillary's internal diameter, is short.
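The migration order and the loading limit described above follow from simple relations, sketched below under stated assumptions. The apparent mobility is taken as the sum of the electrophoretic and electroosmotic mobilities, and the migration time as t = Ld·Lt / (µapp·V). All numerical values (mobilities, voltage, capillary dimensions) are hypothetical and chosen only for illustration.

```python
import math


def migration_time_s(mu_ep: float, mu_eo: float, voltage_v: float,
                     total_length_cm: float, length_to_detector_cm: float) -> float:
    """Migration time in seconds; mobilities are in cm^2 V^-1 s^-1.

    mu_ep is positive for cations and negative for anions. With the anode
    at the inlet, a positive mu_eo carries the bulk solution to the outlet.
    """
    mu_app = mu_ep + mu_eo
    if mu_app <= 0:
        return math.inf  # the analyte never reaches the detector
    field = voltage_v / total_length_cm  # electric field, V/cm
    return length_to_detector_cm / (mu_app * field)


def capillary_volume_nl(inner_diameter_um: float, length_cm: float) -> float:
    """Internal volume of a cylindrical capillary, in nanoliters."""
    radius_cm = inner_diameter_um * 1e-4 / 2.0
    volume_ml = math.pi * radius_cm ** 2 * length_cm  # 1 cm^3 equals 1 mL
    return volume_ml * 1e6  # 1 mL equals 1e6 nL


if __name__ == "__main__":
    mu_eo = 6e-4  # hypothetical EOF mobility
    for label, mu_ep in (("cation", 3e-4), ("neutral", 0.0), ("anion", -3e-4)):
        t = migration_time_s(mu_ep, mu_eo, 30_000.0, 100.0, 100.0)
        print(f"{label:8s} {t:7.1f} s")
    volume = capillary_volume_nl(50.0, 100.0)
    print(f"capillary volume {volume:.0f} nL, 1% load {0.01 * volume:.1f} nL")
```

With these illustrative numbers the cation, neutral, and anion arrive in that order, as the text describes. A 50 µm × 100 cm capillary holds about 2 µL, so a 1% load is roughly 20 nL.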
The concentration limits of detection in CE (µM level) are acknowledged to be worse by two orders of magnitude compared to chromatographic systems, but the use of sample enrichment techniques and more sensitive detectors has helped close the gap. The resolving power of CE has been demonstrated for diverse classes of analytes, ranging from small inorganic anions to large proteins, and even whole organisms. Compared to the more established GC and LC methods, few CE applications in metabolomics have been reported, many of them with on-capillary detectors, such as UV or laser-induced fluorescence (LIF). However, with the availability of commercial interfaces for combining CE with MS, CE–MS-based techniques have grown steadily, particularly for polar metabolites that are not separated well by chromatographic means. The most robust approach for CE–MS coupling is via a sheath liquid interface, in which a hydroorganic solution (sheath liquid) flows between a grounded stainless steel capillary and a coaxial separation capillary. It serves to complete the CE electrical circuit and helps provide a stable electrospray. CE–MS can be configured in a number of ways, with a given configuration optimized for the separation and detection of a group of compounds. (1) Amino acids, amines, nucleosides, and other positively chargeable compounds are separated as their cations using a highly acidic separation solution in a bare fused silica capillary, and then detected by ESI–MS as their protonated molecular ions, [M + H]+.42 (2) The metabolites of key pathways for cellular energy production, such as glycolysis, the tricarboxylic acid (TCA) cycle, and the pentose phosphate cycle, and other negatively chargeable compounds are separated as their anions under alkaline conditions. A capillary coated with a cationic polymer is employed to reverse the EOF (from cathode to anode) so that it is in the same direction as the electrophoretic migration of the anions, and analysis time can be shortened.
The separated anions are subsequently detected by MS as their deprotonated molecular ions, [M − H]−.43 (3) Multivalent anions such as nucleotides and CoA derivatives are separated on a noncharged polymer-coated capillary under slightly alkaline conditions, with a pressure-assist system to counteract the EOF and deliver a constant flow of solution toward the mass spectrometer, where they are detected as their deprotonated molecular ions. These metabolites cannot be analyzed together with other anions under (2) because of significant adsorption on the positively charged wall.44 Schematic diagrams of these CE–MS conditions are shown in Figure 14.3. High separation efficiencies can be obtained in each of them, even permitting separation of isomers (Figure 14.4). By combining data from these conditions, thousands of peaks can be separated, identified, and quantified, and the strategy can be used for obtaining global metabolite profiles. Its usefulness has been previously demonstrated for investigating how changes in metabolites are implicated in Bacillus subtilis sporulation,45 for determination of the major metabolites in rice leaves,46 and for biomarker discovery.47

Figure 14.3 Schematic diagrams of the various CE–MS configurations: (a) for cationic metabolites; (b) for anionic metabolites; and (c) for nucleotides and CoA compounds. (Reprinted with permission from Soga, T. et al. J. Proteome Res., 2, 2003, 488. Copyright 2003 American Chemical Society.)
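The protonated and deprotonated ions mentioned for the three CE–MS configurations have m/z values that follow directly from the neutral monoisotopic mass. The sketch below assumes ions formed only by gain or loss of protons (no adducts). The compositions used are standard formulas for the named metabolites, and the helper names are illustrative.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
        "O": 15.9949146, "P": 30.9737620, "S": 31.9720707}
PROTON = 1.00727647  # mass of a proton, Da


def monoisotopic_mass(composition: dict) -> float:
    """Neutral monoisotopic mass from an element-to-count mapping."""
    return sum(MONO[element] * count for element, count in composition.items())


def ion_mz(neutral_mass: float, charge: int) -> float:
    """m/z of [M + zH]z+ for charge > 0, or [M - |z|H]|z|- for charge < 0."""
    if charge == 0:
        raise ValueError("charge must be nonzero")
    return (neutral_mass + charge * PROTON) / abs(charge)


if __name__ == "__main__":
    lactic_acid = {"C": 3, "H": 6, "O": 3}
    alanine = {"C": 3, "H": 7, "N": 1, "O": 2}
    fructose_bisphosphate = {"C": 6, "H": 14, "O": 12, "P": 2}
    print(f"lactate [M-H]-   {ion_mz(monoisotopic_mass(lactic_acid), -1):.4f}")
    print(f"alanine [M+H]+   {ion_mz(monoisotopic_mass(alanine), +1):.4f}")
    print(f"F1,6P   [M-H]-   {ion_mz(monoisotopic_mass(fructose_bisphosphate), -1):.4f}")
    print(f"F1,6P   [M-2H]2- {ion_mz(monoisotopic_mass(fructose_bisphosphate), -2):.4f}")
```

The singly deprotonated values for lactate and fructose 1,6-bisphosphate round to m/z 89 and 339, which appear among the ion traces listed for Figure 14.4.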

Figure 14.4 CE–MS selected ion electropherograms for a standard mixture of 20 metabolites of glycolysis and the TCA cycle, plotted against time (min). Labeled peaks: glyoxylate, glycolate, pyruvate, lactate, fumarate, succinate, malate, 2-oxoglutarate, PEP, DHAP, glycerol3P, cis-aconitate, 3PG, citrate, isocitrate, G1P, G6P, F6P, 2,3DPG, and F1,6P. (Reprinted with permission from Soga, T. et al. J. Proteome Res., 2, 2003, 488. Copyright 2003 American Chemical Society.)

14.3 Nuclear Magnetic Resonance Spectroscopy

NMR spectroscopy is a nondestructive analytical method. It can use one of several high-abundance isotopes with nonzero nuclear spin (e.g., 1H, 13C, 14N, 19F, 31P); thus it is applicable to most biologically important molecules. For metabolomics, however, the preferred nucleus is 1H. Biofluids, cells, and even intact tissues can be analyzed directly without extensive sample preparation. Thus, NMR is an attractive platform for high-throughput fingerprinting and profiling studies. Its major application is in the clinical setting, where it is used to track changes in metabolite levels in response to stress (e.g., a disease or a drug), giving rise to a related field of study termed metabonomics.48 Unlike MS, which cannot detect compounds that are not effectively ionized in the ionization source and is susceptible to matrix effects, NMR is compound-independent and can uniformly detect all compounds having NMR-measurable nuclei.49 Its principal limitations, however, are its poor sensitivity and high sample requirement (high-µL range).

LC has been placed upstream of NMR to enhance its resolving capability for complex mixtures (e.g., components with spectral overlaps). The coupling is best operated in the stopped-flow mode, in which the chromatographic run is briefly stopped once the peak of interest reaches the NMR probe and is resumed after the spectrum has been recorded. Though chromatographic performance is compromised to some extent, a sensitivity gain can be achieved because the time window for NMR data acquisition is wide enough.

A strategy that exploits the combination of two powerful detectors is LC–NMR–MS. It allows simultaneous acquisition of NMR and MS data in a single LC run, and enables comprehensive analysis of a complex matrix through real-time comparison and complementation of NMR and MS data.49 The detectors may be coupled to the LC system serially, i.e., the LC eluent is first analyzed by NMR and then directed to the mass spectrometer for a second analysis. Alternatively, the LC eluent may be split for parallel detection by NMR and MS. LC–NMR–MS has been demonstrated to be particularly useful for resolving ambiguities that cannot be handled by LC–MS or LC–NMR alone. It is not yet used for routine analysis; however, several applications in the analysis of drug metabolites50 and in natural product screening51,52 have already been reported.

References 1. Oliver, S.G. et al. Systematic functional analysis of the yeast genome. Trends Biotechnol., 16, 373, 1998. 2. Goodacre, R. et al. Metabolomics by numbers: acquiring and understanding global metabolite data. Trends Biotechnol., 22, 245, 2004. 3. Dunn, W.B. and Ellis, D.I. Metabolomics: current analytical platforms and methodologies. Trends Anal. Chem., 24, 285, 2005. 4. Villas-Bôas, S.G. et al. Mass spectrometry in metabolome analysis. Mass Spectrom. Rev., 24, 613, 2005. 5. Birkemeyer, C. et al. Metabolome analysis: the potential of in vivo labeling with stable isotopes for metabolite profiling. Trends Biotechnol., 23, 28, 2005. 6. Nielsen, J. and Oliver, S. The next wave in metabolome analysis. Trends Biotechnol., 23, 544, 2005.

14-10

Application of Emerging Technologies to Metabolic Engineering

7. Dunn, W.B., Bailey, N.J.C., and Johnson, H.E. Measuring the metabolome: current analytical technologies. Analyst, 130, 606, 2005. 8. Suizdak, G. Mass Spectrometry for Biotechnology. Academic Press, San Diego, 1996, xi. 9. Goodacre, R. et al. Chemometric discrimination of unfractionated plant extracts analyzed by electrospray mass spectrometry. Phytochemistry, 62, 2003, 859. 10. Kaderbhai, N.N. et al. Functional genomics via metabolic footprinting: monitoring metabolite section by Escherichia coli tryptophan metabolism mutants using FT-IR and direct injection electrospray mass spectrometry. Comp. Func. Genom., 4, 376, 2003. 11. Smedsgaard, J. and Frisvad, J.C. Using direct electrospray mass spectrometry in taxonomy and secondary metabolite profiling of crude fungal extracts. J. Microbiol. Methods, 24, 1996, 5. 12. Dunn, W.B., Overy, S., and Quick, W.P. Evaluation of automated electrospray-TOF mass spectrometryfor metabolic fingerprinting of the plant metabolome. Metabolomics, 1, 2005, 137. 13. Rashed, M.S. et al. Diagnosis of inborn errors of metabolism from blood spots by acylcarnitines and amino acids profiling using automated electrospray tandem mass spectrometry. Pediatr. Res., 38, 1995, 324. 14. Bonafe, L. Evaluation of urinary acylglycines by electrospray tandem mass spectrometry in mitochondrial energy metabolism defects and organic acidurias. Mol. Genet. Metab., 69, 2000, 302. 15. Liu, C. et al. On-line microdialysis sample cleanup for electrospray ionization mass spectrometry of nucleic acid samples. Anal. Chem., 68, 1996, 3295. 16. Wittmann, C. and Heinzle, E. MALDI-TOF MS for quantification of substrates and products in cultivations of Corynebacterium glutamicum. Biotechnol. Bioeng., 72, 2001, 642. 17. Deng, C., Zhang, X., and Li, N. Investigation of volatile biomarkers in lung cancer blood using solidphase microextraction and capillary gas chromatography–mass spectrometry. J. Chromatogr. B, 808, 2004, 269. 18. 
Demyttenaere, J.C.R., Moriña, R.M., and Sandra, P. Monitoring and fast detection of mycotoxinproducing fungi based on headspace solid-phase microextraction and headspace sorptive extraction of the volatile metabolites. J. Chromatogr. A, 985, 2003, 127. 19. Southwell, I.A. et al. Differential metabolism of 1,8–cineole in insects. J. Chem. Ecol., 29, 2003, 83. 20. Koek, M.M. et al. Microbial metabolomics with gas chromatography/mass spectrometry. Anal. Chem., 78, 1272, 2006. 21. Fiehn., O. et al. Metabolite profiling for plant functional genomics. Nat. Biotechnol., 18, 2000, 1157. 22. Birkemeyer, C., Kolasa, A., and Kopka, J. Comprehensive chemical derivatization for gas chromatography–mass spectrometry-based multi-targeted profiling of the major phytohormones. J. Chromatogr. A, 993, 2003, 89. 23. Dallüge, J., Beens, J., and Brinkman, U.A.Th. Comprehensive two-dimensional gas chromatography: a powerful and versatile analytical tool. J. Chromatogr. A, 1000, 2003, 69. 24. Marriott, P. and Shellie, R. Principles and applications of comprehensive two-dimensional gas chromatography. Trends Anal. Chem., 21, 2002, 573. 25. Özel, M.Z. Analysis of volatile components from Ziziphora taurica subsp. taurica by steam distillation, superheated-water extraction, and direct thermal desorption with GC × GC–TOFMS. Anal. Bioanal. Chem., 382, 2005, 115. 26. Perera, R.M.M., Marriott, P.J., and Galbally, I.E. Headspace solid-phase microextraction— comprehensive two-dimensional gas chromatography of wound induced plant volatile organic compound emissions. Analyst, 127, 2002, 1601. 27. Shen, Y. and Smith, R.D. Proteomics based on high-efficiency capillary separations. Electrophoresis, 23, 2002, 3106.


28. Witters, E. et al. Analysis of cyclic nucleotides and cytokinins in minute plant samples using phase-system switching capillary electrospray-liquid chromatography-tandem mass spectrometry. Phytochem. Anal., 10, 1999, 143. 29. Prinsen, E. Micro and capillary liquid chromatography–tandem mass spectrometry: a new dimension in phytohorhome research. J. Chromatogr. A, 826, 1998, 25. 30. Buchholz, A., Takors, R., and Wandrey, C. Quantification of intracellular metabolites in Escherichia coli K12 using liquid chromatographic-electrospray ionization tandem mass spectrometric techniques. J. Chromatogr. A, 295, 2001, 129. 31. Ito, T. et al. Rapid screening of high-risk patients for disorders of purine and pyrimidine metabolism using HPLC-electrospray tandem mass spectrometry of liquid urine or urine-soaked filter paper strips. Clin. Chem., 46, 2000, 445. 32. MacNair, J.E., Lewis, K.C., and Jorgenson, J.W. Ultrahigh-pressure reversed-phase liquid chromatography in packed capillary columns. Anal. Chem., 69, 1997, 983. 33. Shen, Y. et al. Automated 20 kpsi RPLC-MS and MS/MS with chromatographic peak capacities of 1000–1500 and capabilities in proteomics and metabolomics. Anal. Chem., 77, 2005, 3090. 34. Wilson, I.D. High resolution “ultra performance” liquid chromatography coupled to oa-TOF mass spectrometryas a tool for differential metabolic pathway profiling in functional genomic studies. J. Proteome Res., 4, 2005, 591. 35. Giri, S. et al. Metabolomic approach to the metabolism of the areca nut alkaloids arecoline and arecaidine in the mouse. Chem. Res. Toxicol., 19, 2006, 818. 36. Ishizuka, N. et al. Performance of a monolithic silica column in a capillary under pressure-driven and electrodriven conditions. Anal. Chem., 72, 2000, 1275. 37. Batycka, M. Ultra-fast tandem mass spectrometry scanning combined with monolithic column liquid chromatography increases throughput in proteomic analysis. Rapid Commun. Mass Spectrom., 20, 2006, 2074. 38. Walcher, W. et al. 
Monolithic capillary columns for liquid chromatography–electrospray ionization mass spectrometry in proteomic and genomic research. J. Chromatogr. B, 782, 2002, 111. 39. Alpert, A. et al. Hydrophilic-interaction chromatography of complex carbohydrates. J. Chromatogr. A, 676, 1994, 191. 40. Tolstikov, V.V. and Fiehn, O. Analysis of highly polar compounds of plant origin: combination of hydrophilic interaction chromatography and electrospray ion trap mass spectrometry. Anal. Biochem., 301, 2002, 298. 41. Idborg, H. et al. Metabolic fingerprinting of rat urine by LC/MS: Part 1. Analysis by hydrophilic interaction liquid chromatography–electrospray ionization mass spectrometry. J. Chromatogr. B, 828, 2005, 9. 42. Soga, T. and Heiger, D.N. Amino acid analysis by capillary electrophoresis electrospray ionization mass spectrometry. Anal. Chem., 72, 2000, 1236. 43. Soga, T. et al. Simultaneous determination of anionic intermediates for Bacillus subtilis metabolic pathways by capillary electrophoresis electrospray ionization mass spectrometry. Anal. Chem., 74, 2002, 2233. 44. Soga, T. et al. Pressure-assisted capillary electrophoresis electrospray ionization mass spectrometry for anlaysis of multivalent anions. Anal. Chem., 74, 2002, 6224. 45. Soga, T. et al. Quantitative metabolome analysis using capillary electrophoresis mass spectrometry. J. Proteome Res., 2, 2003, 488. 46. Sato, S. et al. Simultaneous determination of the main metabolites in rice and leaves using capillary electrophoresis mass spectrometry and capillary electrophoresis diode array detection. Plant J., 40, 2000, 151. 47. Soga, T. et al. Differential metabolomics reveals ophthalmic acid as an oxidative stress biomarker indicating hepatic glutathione consumption. J. Biol. Chem., 21, 2006, 16768.


48. Lindon, J.C., Holmes, E., and Nicholson, J.K. So what’s the deal with metabonomics? Anal. Chem., 75, 2003, 384A. 49. Yang, Z. Online hyphenated liquid chromatography-nuclear magnetic resonance spectroscopymass spectrometry for drug metabolite and nature product analysis. J. Pharm. Biomed. Anal., 40, 2006, 516. 50. Silva Elipe, M.V., Huskey, S.W., and Zhu, B. Application of LC–NMR for the study of the volatile metabolite of MK-0869, a substance P receptor antagonist. J. Pharm. Biomed. Anal. 30, 2003, 1431. 51. Xiao, H.B. Capillary liquid chromatography–microcoil 1H nuclear magnetic resonance spectroscopy and liquid chromatography–ion trap mass spectrometry for on-line structure elucidation of isoflavones in Radix astragali. J. Chromatogr. A, 1067, 2005, 135. 52. Sumarah, M.W., Miller, J.D., and Blackwell, B.A. Isolation and metabolite production by Penicillium roqueforti, P. paneum and P. crustosum isolated in Canada. Mycopathologia, 159, 2005, 571.

IV Future Prospects in Metabolic Engineering

Bernhard Ø. Palsson, University of California

Sang Yup Lee, Korea Advanced Institute of Science and Technology

15 Systems Biology, Genome-Scale Models, and Metabolic Engineering
Sang Yup Lee, Hyun Uk Kim, Hongseok Yun, Seung Bum Sohn, Jin Sik Kim, Bernhard Ø. Palsson, Markus J. Herrgård, and Vasiliy A. Portnoy
Introduction • Systems Biology • Genome-Scale Models • Metabolic Engineering Based on High-Throughput Technologies • Metabolic Engineering Based on Genome-Scale Models • Conclusion

16 Cell-Free Systems for Metabolic Engineering
Kara A. Calhoun and James R. Swartz
Introduction • Advantages and Challenges of Cell-Free Systems for Metabolic Engineering • Examples • Summary

17 In Silico Models for Metabolic Systems Engineering
Kumar Selvarajoo, Satya Nanda Vel Arjunan, and Masaru Tomita
Introduction • Metabolic Systems Engineering • Simulation Tools: E-Cell for Metabolic Systems Engineering • Dynamic in Silico Simulation • Practical Applications • Future Prospects

Metabolic engineering is now an established discipline, as witnessed by the material assembled in this handbook and by a journal that is now more than 10 years old. With the impending change from chemical to biological feedstocks over the coming decades, this field is taking on significant socio-economic dimensions. Due to the unpredictability of the creative process, it is always difficult to forecast the development of scientific and engineering fields. However, it is important to try to look into the future and assess some of the impending changes.

Section IV contains only three chapters and is thus in no way a comprehensive assessment of potential future developments in metabolic engineering. The section focuses on what might be thought of as systems and engineering approaches that are on the horizon, and does not cover those in molecular biology, genomics, genetic manipulation systems, and novel bioprocessing schemas. Chapters 15–17 cover the following topics.

• The availability of annotated genomic sequences for many microorganisms, including those of metabolic engineering importance, has led to the reconstruction of genome-scale metabolic maps for particular organisms. Such network reconstructions have found many applications, and the metabolic engineering uses are covered in Chapter 15.

• Even though it is not yet possible to build a whole cellular system for performing desired metabolic transformations, it is possible to break cells up and reconstitute particular functions in vitro. Chapter 16 describes the progress that has been made with engineering metabolic transformations in what are called cell-free systems. Such systems offer a number of novel options since the constraints of a fully functional organism are alleviated.

• Full kinetic models of metabolism have historically been difficult to achieve due to the lack of reliable kinetic and thermodynamic information. These limitations are now being challenged. Chapter 17 covers this traditional but challenging topic, which is now being attempted at the cell or genome scale.

As the use of genome-scale models (GEMs) in developing metabolic engineering strategies is becoming increasingly popular, it is tempting to speculate a bit about their future development and use. A well-curated reconstruction represents a biochemical, genetic, and genomic (BiGG) knowledge base that forms the basis for computational interrogation and subsequent experimentation. Further development of BiGG knowledge bases and their use is expected to occur. We select three areas here for further discussion: (i) scope and content of network reconstructions, (ii) computational BiGG query tools (i.e., modeling and computational tools), and (iii) the use of adaptive evolution as a design tool.

(i) Scope. The scope of reconstructions is bound to grow, representing more and more BiGG knowledge in the structured format of a GEM. Growth in scope in the near term is likely to involve the transcriptional and translational machineries. Such an extension will enable a range of studies including the direct inclusion of proteomic data, fine graining of growth requirements, and the explicit consideration of secreted protein products. Another expected expansion in scope is the reconstruction of the genome-scale transcriptional regulatory network (TRN). Such reconstruction at the genome scale is now enabled by new experimental technologies, such as ChIP-chip. A reconstructed TRN will allow better computational predictions of the context-specific uses of the bacterial genome.

(ii) Computational methods. More in silico computational methods are likely to be developed. We now know how to represent BiGG data in a mathematical format (either a stoichiometric format or in the form of causal relationships). We also now know how to convert a BiGG knowledge base into a GEM and then perform computational inquiries. Computational query tools for GEMs will thus continue to be developed. New advances will likely include modularization methods, the use of fluxomic data, and eventually kinetics. As the scope and content of the reconstruction grow, the need to modularize its content becomes more pressing. Current computational limitations force the reduction of a network for the analysis of isotopomer data, and a rational way to carry out such reduction is needed. Finally, although detailed kinetic models of microbial functions may currently be mostly of academic interest, we will most likely be able to construct them based on advances with metabolomic and fluxomic data, and on the developments that are occurring with the incorporation of thermodynamic information into GEMs. Such large-scale kinetic models are likely to differ from those resulting from traditional approaches for construction of kinetic models. Large-scale estimation/determination and validation of the parameters will be an additional challenge.


(iii) Adaptive evolution as a design tool. The ability of GEMs to predict distal causation has the potential to impact bioprocess design in a fundamental way. Most strain designs for bioprocessing rely on intricate and complex manipulations of a wild-type organism to force it to produce a compound of interest. Normally such strains are highly unstable and, if left unsupported, may lose productivity over time. This feature necessitates batch processing with stringent QC/QA procedures for cell line banking and the production of an inoculum from the frozen stock. The predictive ability of GEMs has led to the development of insightful methods to calculate deletion/addition sets of genes that couple production objectives to selection pressure (i.e., growth rate). In other words, strains can potentially be designed that self-optimize under given growth conditions. Initial experimental studies along these lines show some promise. If successful, continued development and application of this approach has the potential to lead to continuous bioprocessing for products that can be coupled to appropriate selection pressure. With the development of algorithms that can growth-couple metabolic designs comes the tantalizing possibility of using adaptive evolution as a design tool.

These systems-level approaches can be combined with synthetic biology, thus paving a way toward an era of genome-scale synthetic biology. The cell-free system described in Chapter 16 is not yet close to a genome-scale artificial biosystem, but it is one of the important modules to be used for reconstructing designer biosystems in the coming years. We hope that the reader will enjoy reading about some of the future possibilities that are on the horizon in the dynamic and growing field of metabolic engineering!

15 Systems Biology, Genome-Scale Models, and Metabolic Engineering

Sang Yup Lee, Korea Advanced Institute of Science and Technology
Hyun Uk Kim, Korea Advanced Institute of Science and Technology
Hongseok Yun, Korea Advanced Institute of Science and Technology
Seung Bum Sohn, Korea Advanced Institute of Science and Technology
Jin Sik Kim, Korea Advanced Institute of Science and Technology
Bernhard Ø. Palsson, University of California
Markus J. Herrgård, University of California
Vasiliy A. Portnoy, University of California

15.1 Introduction
15.2 Systems Biology
15.3 Genome-Scale Models
15.4 Metabolic Engineering Based on High-Throughput Technologies
Utilizing Single High-Throughput Technologies • Metabolic Engineering Based on Integration of High-Throughput Data Sets
15.5 Metabolic Engineering Based on Genome-Scale Models
Metabolic Engineering Aided by Flux Balance Analysis (FBA) • Development of in Silico Methods for Metabolic Engineering Applications • Integration of Genome-Scale Models with Heterogeneous Data
15.6 Conclusion
References

15.1 Introduction

One of the key features of metabolic engineering is the modification of metabolic networks through addition, deletion, and/or alteration of metabolic pathways with the purpose of improving production of specific metabolites. These manipulations are often achieved by using recombinant DNA technologies and chemical engineering methods [1,2]. However, due to the complexity of metabolic networks and their regulation, modifications of metabolic pathways can have unpredictable consequences that may hamper achieving the original engineering goals. Prior to the advent of systemic approaches, metabolic engineering relied on intuition and biochemical knowledge for the selection of metabolic pathways for manipulation. Results of this approach were often unexpected, and the resulting strains required extensive fine-tuning to yield viable production strains. Implementation of complicated metabolic engineering designs involves genetic modifications that are associated with significant phenotypic changes of the organism. Such changes can result in slower growth rates and production of unnecessary and potentially toxic by-products, among other complications [3]. Due to these issues, classical metabolic engineering approaches are often time consuming, labor intensive, and ineffective from an economic standpoint.

In recent years, metabolic engineering has shifted its paradigm toward the implementation of systemic approaches that rely extensively on large-scale screening and experimentation, and on computational analysis of metabolic and regulatory networks (Figure 15.1). Presently, there is significant interest among scientists and engineers in studying cells and microorganisms in the context of systems biology.

[Figure 15.1: schematic linking metabolic engineering (purposeful modification of the metabolic network by recombinant DNA/biomolecular technologies and chemical engineering methodologies; trial and error) with systems biology (high-throughput analysis/X-omics technologies covering the genome, transcriptome, proteome, and metabolome; genome-scale models/in silico simulation with flux balance analysis, MOMA, ROOM, and OptKnock), leading to strain improvement using systematic targeting and designing (identifying mutation points, identifying gene targets, finding optimal conditions).]

Figure 15.1 Metabolic engineering based on systems approaches, high-throughput techniques, and genome-scale models. These approaches will contribute to more efficient metabolic engineering.

High-throughput technologies, including genome sequencing, have allowed the building of comprehensive databases representing our knowledge of metabolic networks in specific organisms, such as KEGG (Kyoto Encyclopedia of Genes and Genomes), SEED, and SGD (Saccharomyces Genome Database) [4–6]. These types of databases contain biochemical, molecular, and genomic information that can be used to enable more systematic and efficient metabolic engineering. Reconstructions of in silico genome-scale stoichiometric models of metabolic networks have also appeared thanks to the influx of high-throughput data. This chapter is devoted to future prospects of metabolic engineering based on systems biology and genome-scale models, with illustrations of successful case studies.

15.2 Systems Biology

Biologists have traditionally used a bottom-up approach for systems analysis: the system is broken down into its elementary components and, based on their properties, systemic behavior is inferred. Although understanding the behavior and role of the individual components in a particular biological system is critical, studying each component in isolation cannot give the full picture of how the system works. In order to fully understand the behavior of the system, the various components need to be studied simultaneously in an integrative fashion [7]. Systems biology seeks to integrate existing knowledge of the biology of a particular system with quantitative high-throughput experiments in order to elucidate how different subsystems affect each other and function as a whole [8,9].

With the advances made in high-throughput techniques, information on the molecular characteristics of cells is being generated at an increasing rate [10]. As a result, methods capable of extracting valuable information from noisy large-scale data sets are necessary. Moreover, methods that are able to link information extracted from high-throughput data sets to cellular phenotypes must also be developed. For example, genetic data may identify specific alleles that increase susceptibility to certain diseases, but the data do not reveal the biological mechanisms that cause the increased susceptibility. Only by combining the genetic information with knowledge of metabolic, regulatory, and signaling network structures can one determine how specific genetic variants cause the observed phenotypic consequences (i.e., disease susceptibility).

Large amounts of information characterizing microorganisms that are commonly used in metabolic engineering applications are currently available in the form of high-throughput data sets, including transcriptomic, proteomic, metabolomic, and phenotypic data. While each of these data sets allows the study of a particular facet of the overall microbial physiology, the data sets must be analyzed together in order to maximize the value extracted from the data. Systems biology seeks to achieve this aim by generating comprehensive models of biological networks that can be used as a framework for data integration in order to facilitate scientific discovery and hypothesis generation. Mechanisms such as alternative metabolic pathways, feedback effects in transcriptional regulation, and signaling cross-talk can be represented and interrogated with such models. The ability to systematically account for these complex systems-level mechanisms can significantly improve our ability to engineer bacteria to produce desired bioproducts.

15.3 Genome-Scale Models

The reconstruction of genome-scale models is a fundamental first step toward in silico analysis of the metabolic physiology of microbial organisms. The reconstructed metabolic networks allow topological characterization of the network [11,12], identification of essential genes [13] and of gene deletion targets for improved by-product production [14,15], and prediction of growth phenotypes under various conditions [16,17]. Furthermore, in silico modeling facilitates the analysis of various types of high-throughput data sets, such as gene and protein expression profiling data, as well as the visualization of these data sets within the functional context of the model [18–20]. As mentioned earlier, the reconstruction of genome-scale models has been made easier by the accumulation of various high-throughput data sets and the development of comprehensive databases.

The first genome-scale metabolic model of E. coli [17,21] has been followed by a number of models for other organisms, including Haemophilus influenzae [22,23], Helicobacter pylori [24–26], Saccharomyces cerevisiae [27,28], Mannheimia succiniciproducens [16], Staphylococcus aureus [29,30], Mus musculus [31], and Homo sapiens [32]. These models have served as a framework for intensive in-depth research into each organism's metabolic physiology [17,26]. One of the most promising aspects of these in silico metabolic models is their ability to accurately predict the organism's phenotype based on its metabolic network alone, without relying on knowledge of regulatory mechanisms. Successful efforts have been directed toward systematic expansion of metabolic networks [17,26] and toward developing new methods to improve in silico phenotypic predictions made by genome-scale models [33,34].

The rapid development of metabolic network reconstruction and in silico simulation methods has led to the recognition of the value of genome-scale metabolic models in aiding the metabolic engineering process. The complex interactions that exist between genes in even relatively simple microbial cells complicate the identification of the correct genes to manipulate in order to bring out the desired phenotype. However, genome-scale metabolic models can make this identification process significantly easier and faster and thus allow potential genetic manipulations to be prioritized rapidly. Despite the fact that the genotype-phenotype relationship is rather complex and nonlinear and is complicated by regulatory mechanisms, several successful case studies aimed at identifying genetic manipulations based on genome-scale metabolic models have been reported [15,35–37].
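As a minimal sketch of what a stoichiometric reconstruction looks like in practice, the following Python fragment builds a stoichiometric matrix for a made-up three-metabolite network, counts how many reactions each metabolite takes part in (a very simple topological descriptor), and checks whether a flux vector satisfies the steady-state mass balance. The network, the reaction names, and the helper functions are assumptions introduced purely for illustration; they are not taken from any published reconstruction.

```python
def build_stoichiometry(reactions):
    """Build a stoichiometric matrix (rows: metabolites, columns: reactions).

    `reactions` maps a reaction name to {metabolite: coefficient};
    negative coefficients are consumed, positive ones are produced.
    """
    metabolites = sorted({m for stoich in reactions.values() for m in stoich})
    names = list(reactions)
    S = [[reactions[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, S


def connectivity(metabolites, S):
    """Number of reactions in which each metabolite participates."""
    return {m: sum(1 for c in row if c != 0) for m, row in zip(metabolites, S)}


def is_steady_state(S, v):
    """True if every metabolite is produced and consumed at equal rates."""
    return all(sum(c * x for c, x in zip(row, v)) == 0 for row in S)


# Hypothetical toy network: A is taken up and split into two secreted products.
TOY = {
    "uptake": {"A": 1},
    "A_to_B": {"A": -1, "B": 1},
    "A_to_C": {"A": -1, "C": 1},
    "B_out": {"B": -1},
    "C_out": {"C": -1},
}

metabolites, names, S = build_stoichiometry(TOY)
degree = connectivity(metabolites, S)
print(degree)
print(is_steady_state(S, [10, 6, 4, 6, 4]))
```

Genome-scale reconstructions follow the same bookkeeping, only with thousands of reactions and with each reaction linked to the genes that encode its enzymes.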

15.4 Metabolic Engineering Based on High-Throughput Technologies

High-throughput technologies are experimental methods that produce genome-scale data on any of the molecular components (genes, proteins, transcripts, and metabolites) or their interactions in an organism. These technologies increasingly play a critical role in metabolic engineering, as they allow researchers to design strategies that take into account complex interactions between the target metabolic pathways and all other cellular processes. Herein we provide a brief introduction to high-throughput methods, focusing on their use in metabolic engineering applications, followed by a discussion of how integration of multiple data types can further facilitate metabolic engineering.

15.4.1 Utilizing Single High-Throughput Technologies

High-throughput genome sequencing has allowed the gene repertoire of both prokaryotic and eukaryotic organisms to be determined [38–40]. The sequencing of a novel microbial genome can now be done routinely, and multiple databases storing annotated complete genome sequences exist [41,42]. Genomics has been widely used for the identification of novel biochemical activities present in a genome and the discovery of new metabolic pathways. With the increasing number of fully sequenced organisms, comparative genomics has become an extremely useful tool for research, discovery, and metabolic engineering. Similarly, targeted sequencing of production strains created by mutagenesis, followed by comparison with nonmutated wild-type strain sequences, can be used to identify specific beneficial mutations. The comparative genomics approach has been demonstrated for Corynebacterium glutamicum for the purpose of lysine overproduction [43]. Here, specific regions of the genome of an overproduction strain created through multiple rounds of mutagenesis were compared with the wild-type strain to identify specific mutations that increase production of lysine. This approach can also help in identifying target genes for further manipulation for metabolic engineering purposes. Metabolic engineering has traditionally involved insertion or deletion of genes to obtain a strain that can be used for bioprocessing applications, and with the help of a fully sequenced genome, identifying target genes for such purposes is made much easier.

Transcriptomics, which makes use of high-density DNA microarrays, allows for the parallel study of the relative abundance of mRNAs under different conditions and in different strains [44]. Transcriptome profiles can give insight into the metabolic and regulatory state of the cells as well as explain physiological changes in the cell, thereby providing information about active pathways in specific conditions [45]. By comparative analysis of the transcriptome profiles, genes and potential mechanisms responsible for physiological behaviors such as alcohol tolerance or changes in growth rate can be identified [46]. Transcriptomics was successfully used to improve the production of human insulin-like growth factor I fusion protein (IGF-If) in E. coli [47]. Here, the profile of gene expression during the production of IGF-If was obtained, and genes that were down-regulated compared to the wild-type strain were selected for overexpression. An additional problem was caused by slow growth rates during the high cell density culture (HCDC) conditions required for practical applications. By studying the transcriptome profiles, it was possible to eliminate this problem and increase production of IGF-If through overexpression of a small number of genes.

Proteomics allows researchers to identify and quantify the levels of proteins present in an organism in a given condition [45]. This analysis is usually done by isolating the expressed proteins using 2D electrophoresis or other separation technologies and identifying them using mass spectrometry [48,49]. Although proteome analysis has not yet been as widely utilized in metabolic engineering as genomic or transcriptomic methods, the information from proteome analysis can provide researchers with additional insights into the activity of metabolic pathways in engineered strains. An example where proteome analysis was used for strain enhancement is the recombinant E. coli strain for overproducing the human hormone leptin [50]. When the initial strain designed for the overproduction of leptin was studied, researchers discovered that the levels of proteins responsive to heat shock increased and the levels of the enzymes in the amino acid biosynthetic pathways decreased. Furthermore, it was observed that the enzymes in the serine family amino acid biosynthetic pathways were expressed at lower levels than those of other amino acid biosynthesis pathways. The overexpression of a key enzyme in the serine biosynthetic pathway was found to increase leptin production, indicating that this pathway was a bottleneck for producing leptin.

Metabolomics is devoted to the identification and quantification of the concentrations of metabolites in a system. Techniques involving mass spectrometry, NMR spectroscopy, and various other automated tools [51] are used to create a profile of a subset of cellular metabolites. Due to the wide range of metabolite structures and compositions, no single technique is capable of detecting and quantifying all the metabolites present in a cell. Therefore, a mixture of techniques, such as liquid chromatography-mass spectrometry (LC-MS), needs to be employed, depending on the chemical and physical properties of the metabolites of interest. In spite of the limited success in employing metabolomics due to technical difficulties, its potential in a wide range of applications, including metabolic engineering, has led to significant efforts to improve and standardize experimental methods [52,53].

15.4.2 Metabolic Engineering Based on Integration of High-Throughput Data Sets

Typically, the information obtained from a single high-throughput data type is not enough to fully characterize the behavior of an organism, because gene expression is related to enzymatic activities and metabolic fluxes in a nonlinear way. For example, protein levels and their activity are not directly proportional, and the same can be said about the relationship between gene and protein expression [54–59]. Therefore, a combination of different types of high-throughput data is necessary for complete understanding of a biological system within a particular physiological state [55,60–62].

Successful integration of multiple high-throughput data types was demonstrated in the case of HCDC of E. coli utilized for the production of various bioproducts [60]. It was observed that during HCDC, the specific production rate (g product/gDW/h) decreased as the cell density increased. To understand why this phenomenon occurred, an integrated transcriptome and proteome analysis was conducted [60]. The transcriptome and proteome were both measured as a function of time during various phases of HCDC. The genes encoding TCA cycle enzymes as well as genes for NADH dehydrogenase and ATPase were up-regulated during the exponential phase and down-regulated upon entering the stationary phase, indicating a significant reduction in aerobic respiration at high cell densities [60]. Moreover, chaperone genes were found to be up-regulated, suggesting that high cell density induces cellular stress. Surprisingly, a significant reduction in the expression of genes involved in amino acid biosynthesis was observed as cell density increased. Decreased availability of amino acids may then explain the decrease in the specific productivity during HCDC. With further study, the regulation of protein production as a function of cell density may be elucidated and strategies for increasing productivity can be developed.
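One simple way to picture such an integrated analysis is to compare, gene by gene, whether the transcript and the protein both fall between two culture phases. The sketch below does this with log2 fold changes. The gene names, the measurement values, and the one-log2-unit cutoff are all invented for illustration and do not come from the study cited above; a real analysis would also need replicates and statistical testing.

```python
from math import log2

# Hypothetical measurements: gene -> (exponential phase, high cell density).
# Names and numbers are invented for illustration only.
TRANSCRIPT = {"geneA": (8.0, 2.0), "geneB": (4.0, 4.0), "geneC": (6.0, 1.5)}
PROTEIN = {"geneA": (10.0, 2.5), "geneB": (3.0, 3.0), "geneC": (5.0, 5.0)}


def log2_fold_change(levels):
    """log2 of late/early abundance; negative values mean down-regulation."""
    early, late = levels
    return log2(late / early)


def classify(transcript, protein, cutoff=1.0):
    """Split shared genes by whether both layers drop by at least `cutoff`.

    Returns (concordant, discordant): genes down-regulated in both layers,
    and genes down-regulated in exactly one layer.
    """
    concordant, discordant = [], []
    for gene in sorted(transcript.keys() & protein.keys()):
        t_down = log2_fold_change(transcript[gene]) <= -cutoff
        p_down = log2_fold_change(protein[gene]) <= -cutoff
        if t_down and p_down:
            concordant.append(gene)
        elif t_down != p_down:
            discordant.append(gene)
    return concordant, discordant


concordant, discordant = classify(TRANSCRIPT, PROTEIN)
print("down in both layers:", concordant)
print("down in one layer only:", discordant)
```

Genes that drop in both layers are the more convincing candidates for follow-up, while discordant genes illustrate why a single data type can mislead.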

15.5 Metabolic Engineering Based on Genome-Scale Models

One of the biggest challenges in efforts to engineer overproduction of desired by-products is identifying the genes that must be manipulated to successfully generate the desired phenotype. Choosing the most productive genetic manipulation strategies requires understanding how the altered metabolic pathways will function in the context of the whole system. The high-throughput data-based approaches described above typically allow potential bottlenecks to be identified in already existing engineered strains, but they are less useful when used in a prospective fashion to select genes to overexpress or delete. This prospective design phase of metabolic engineering is where genome-scale models of metabolic networks have shown great promise. Constraint-based modeling techniques such as flux balance analysis (FBA) can be applied to genome-scale metabolic models to gain better insight into the interplay between metabolic pathways within the in vivo system and to rapidly evaluate potential engineering strategies [16,63,64]. The increasing availability of high-quality genome-scale metabolic models, together with the development of novel constraint-based modeling methods, has led metabolic engineers to increasingly apply such models in engineering microorganisms to overproduce commercially desirable metabolic products [15,35–37].
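For orientation, FBA is commonly written as a linear program over the steady-state flux vector. The notation below is an assumption introduced here for illustration, since the chapter does not define symbols: $S$ is the stoichiometric matrix (metabolites by reactions), $v$ the vector of reaction fluxes, $c$ a vector of objective weights (often selecting the biomass reaction), and $v^{\mathrm{lb}}$, $v^{\mathrm{ub}}$ the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}} .
\end{aligned}
```

In this formulation, a gene deletion is usually represented by setting both bounds of the affected reactions to zero, and an uptake limit by an upper bound on the corresponding exchange flux.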

15.5.1 Metabolic Engineering Aided by Flux Balance Analysis (FBA) Results obtained from genome-scale in silico model analysis frequently suggest metabolic engineering strategies that differ from those derived from simple inspection of the target pathways. For instance, in silico analysis may suggest overexpression of genes in pathways that were not considered as initial targets for engineering. This usually occurs due to the cofactor induced high level of interconnectivity between metabolic pathways [35]. Also, intracellular flux distributions obtained from simulating genome-scale models may suggest limiting factors for successful metabolic engineering, such as an inadequate supply of reducing agents or cofactors, or a pathway bottleneck [35,37]. So far, metabolic engineering with in silico experiments has mainly focused on the overproduction of bioproducts. Lee et al. [15] provide a typical example in which researchers identified the optimal combination of gene knockout targets that would improve succinic acid-production capability of an organism that does not naturally produce succinic acid in sufficient quantities (Figure 15.2). Researchers first used comparative genomics to identify genes that are present in E. coli, but are missing in the natural succinic acid producer M. succiniciproducens. These candidate gene targets for gene deletion were further investigated using combinatorial in silico knockout simulations with FBA of a genome-scale E. coli metabolic model. In silico analysis allowed identifying a set of multiple gene deletions that was predicted to result in succinic acid overproduction. The suggested genetic modifications were implemented in E. coli and fermentation data showed that the genetically modified strain had significantly increased succinic acid production. 
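The combinatorial knockout screen described above can be sketched on a toy network. What follows is a minimal illustration and not the model or code used by Lee et al. [15]: the five-reaction network, its stoichiometry, the reaction names (uptake, biomass, lac, eth, suc), and the flux bounds are all hypothetical, and SciPy's linprog is assumed as the LP solver. FBA is posed as a linear program that maximizes the biomass flux subject to the steady-state mass balance S v = 0 and flux bounds, and each knockout is simulated by fixing the corresponding flux to zero.

```python
from itertools import combinations

import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network, for illustration only.
# Columns: uptake, biomass, lac, eth, suc
# Rows: G (carbon intermediate), N (reducing equivalents)
REACTIONS = ["uptake", "biomass", "lac", "eth", "suc"]
S = np.array([
    [1.0, -1.0, -1.0, -1.0, -1.0],   # G: supplied by uptake, consumed by all other reactions
    [0.0,  1.0, -2.0, -2.0, -1.0],   # N: made with biomass, reoxidized by lac/eth/suc
])
UPPER = [10.0, 1000.0, 1000.0, 1000.0, 1000.0]   # assumed upper flux bounds


def fba(knockouts=()):
    """Maximize biomass flux subject to S v = 0 and 0 <= v <= ub."""
    bounds = [(0.0, 0.0) if name in knockouts else (0.0, ub)
              for name, ub in zip(REACTIONS, UPPER)]
    c = np.zeros(len(REACTIONS))
    c[REACTIONS.index("biomass")] = -1.0   # linprog minimizes, so negate the objective
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        return None
    return dict(zip(REACTIONS, res.x))


def screen(candidates, max_size=2):
    """Simulate every knockout combination up to max_size deletions."""
    results = []
    for k in range(max_size + 1):
        for combo in combinations(candidates, k):
            flux = fba(combo)
            if flux is not None:
                results.append((combo, flux["biomass"], flux["suc"]))
    return results


if __name__ == "__main__":
    for combo, growth, product in screen(["lac", "eth"]):
        print(combo, round(growth, 3), round(product, 3))
```

In this toy case, deleting either competing route alone leaves the predicted succinate flux at zero, whereas the double deletion redirects flux to succinate at the cost of a lower growth rate. In larger networks the product flux at the growth optimum is often not unique, so a real screen would also need to examine the range of product fluxes compatible with optimal growth.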
The ability to quickly simulate the outcome of a particular genetic modification and monitor its secretion or production profile in silico is one of the powerful tools that systems biology provides to researchers. To facilitate the use of computational tools for metabolic engineering applications, additional in silico methods based on genome-scale models are being developed. The OptKnock method was developed for the systematic identification of gene knockout targets that couple the production of a desirable by-product to cellular growth [65]. The idea behind the approach is that the resulting in vivo gene deletion strain could then be subjected to evolutionary engineering (i.e., continuous adaptation), and increased product formation would arise as a by-product of increasing growth rate. OptKnock relies on bi-level mixed-integer optimization: an outer problem searches for the combination of gene deletions that maximizes the production rate of a bioproduct, while an inner problem assumes that the cell maximizes its biomass formation rate. This framework was applied to predict optimal gene deletions that allow growth-coupled overproduction of succinate, lactate, and 1,3-propanediol [65], as well as amino acids [66], and the computational results were found to be in good agreement with experimental data collected from the literature. The gene deletion strategies suggested by OptKnock for lactate production were also implemented in vivo and, as predicted, laboratory evolution of the deletion strains resulted in increased lactate production [67]. It is likely that in silico experiments will play a key role in metabolic engineering applications beyond overproduction of metabolites that are native to the host organism. These applications include the production of new biologics that are not native to the host organism, broadening the substrate utilization range of an organism, designing novel biodegradation pathways, and modification of general cellular

[Figure 15.2 image not reproduced. Panel (a): central metabolic pathways of M. succiniciproducens; panel (b): central metabolic pathways of E. coli. Plots: succinic acid production (mmol/g DCW/h) versus biomass formation (h–1) for single and multiple gene deletions; bar charts of succinic acid (mM) and succinic acid ratio for each strain. The succinic acid ratio is calculated as succinic acid/(succinic acid + lactic acid + formic acid + acetic acid + ethanol).]

Figure 15.2 Metabolic engineering of E. coli for improved succinic acid production, based on comparative genome analysis between M. succiniciproducens (a) and E. coli (b), and combinatorial in silico simulation. Genes present only in E. coli were first identified as gene knockout targets. Subsequently, all possible combinations of those genes were simulated so as to find the mutant strain that produces the maximal production rate of succinic acid. In this study, the mutant strain in which the ptsG and pykFA genes were knocked out showed the highest value among the candidates. W3110 is the wild-type strain and the mutant strains are named according to the corresponding gene(s) disrupted; H for sdhA, O for mqo, E for aceBA, G for ptsG, and FA for pykFA.

properties (e.g., stress tolerance) for the facilitation of bioprocess applications. For instance, in the case of production of non-native biologics by a host organism, the corresponding novel metabolic pathways can be easily added in silico to the reconstructed metabolic network of the wild-type organism. The properties of the organism expressing a heterologous pathway can then, at least in principle, be investigated in silico using standard constraint-based modeling methods. Whether such models that include heterologous pathways can be successfully validated and employed for the metabolic engineering of the target organism remains to be seen. In addition to applications to microbial metabolic engineering, the use of genome-scale models has expanded to the field of biomedical engineering, where drug discovery can be aided by genome-scale models [68,69]. With the increasing number of fully sequenced organisms, the spectrum of possible applications of metabolic engineering has broadened significantly [70]. The availability of genome sequences also facilitates building genome-scale metabolic models for less well-characterized organisms. Recently, the genome-scale metabolic network of an archaeal species, Methanosarcina barkeri, was reconstructed and thoroughly studied for its methanogenic properties [71]. This work will facilitate the metabolic engineering of archaea, including applications such as bioprocessing of toxic wastes and the generation of renewable energy. The reconstruction of genome-scale metabolic networks for mammalian cells has also recently been achieved, including the completion of initial genome-scale metabolic models of mouse [31] and human [32]. Development of highly predictive genome-scale models of specific mammalian cell types would allow these models to be used for additional metabolic engineering applications, such as improved production of antibodies [31].
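Adding a heterologous pathway in silico amounts to appending one column per new reaction to the stoichiometric matrix and then re-running a constraint-based analysis. The sketch below illustrates this under stated assumptions: the three-reaction native network, the single lumped heterologous reaction, the bounds, and the requirement that growth stay at or above 50% of its wild-type optimum are all hypothetical choices made for the example, and SciPy's linprog is assumed as the LP solver.

```python
import numpy as np
from scipy.optimize import linprog


def max_flux(S, bounds, target):
    """Maximize the flux through column `target` subject to S v = 0 and bounds."""
    c = np.zeros(S.shape[1])
    c[target] = -1.0                      # linprog minimizes, so negate
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    return -res.fun if res.success else None


def add_reaction(S, column):
    """Return a copy of S with one extra reaction (column) appended."""
    return np.hstack([S, np.asarray(column, dtype=float).reshape(-1, 1)])


# Hypothetical native network. Rows: metabolites A, B.
# Columns: uptake (-> A), conversion (A -> B), biomass (B ->).
S_native = np.array([
    [1.0, -1.0,  0.0],
    [0.0,  1.0, -1.0],
])
bounds_native = [(0.0, 10.0), (0.0, 1000.0), (0.0, 1000.0)]

# Wild-type growth optimum, used below as a reference.
mu_max = max_flux(S_native, bounds_native, target=2)

# Hypothetical heterologous reaction: B -> secreted non-native product.
S_ext = add_reaction(S_native, [0.0, -1.0])

# Assumed design constraint: keep growth at or above half of the wild-type optimum.
bounds_ext = [(0.0, 10.0), (0.0, 1000.0), (0.5 * mu_max, 1000.0), (0.0, 1000.0)]
p_max = max_flux(S_ext, bounds_ext, target=3)

if __name__ == "__main__":
    print("wild-type maximal growth flux:", mu_max)
    print("maximal product flux with pathway added:", p_max)
```

The calculation only states what the stoichiometry permits. As noted above, whether such predictions hold in vivo for a strain expressing a heterologous pathway still has to be validated experimentally.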

15.5.2 Development of in Silico Methods for Metabolic Engineering Applications

Although FBA and methods based on it, such as OptKnock, have proven to be very useful in the analysis of metabolic networks, these methods have a number of shortcomings. The fluxes obtained from FBA are those that support maximal growth of the cell (or the maximization of some other cellular objective such as ATP production) and thus do not necessarily reflect its true intracellular fluxes. The flux distributions obtained from FBA are typically also not unique, as there may be multiple alternative pathways that the cell can utilize even in the optimal growth state. These issues become even more obvious in simulations of gene deletion or overexpression strain phenotypes, as metabolism is not expected to operate optimally in such strains. For this reason, a number of new algorithms for metabolic network analysis have been developed that relax the optimality assumption. One such method is the minimization of metabolic adjustment (MOMA) algorithm for simulating the phenotype of a gene knockout mutant, which is based on the hypothesis that mutants generated in the laboratory have not undergone enough adaptation to achieve the optimal growth phenotype predicted by FBA [33]. MOMA employs quadratic programming to search for the point in the altered solution space of a mutant strain that is closest to the optimal point in the solution space of the wild-type strain. In this way, MOMA tries to minimize the flux redistribution of the mutant and thereby more realistically capture the phenotypic characteristics of the mutant. MOMA was employed in the work of Alper et al., in which researchers identified a number of gene deletion targets that result in increased biosynthesis of lycopene in E. coli [14]. When implemented in vivo, these deletions contributed to increasing production of lycopene by 40% compared to the lycopene-producing parental strain. Shlomi et al. [34] reported another algorithm, called regulatory on/off minimization (ROOM), for the search of realistic flux distributions of the gene knockout mutant. They specifically focus on the metabolic steady state of the cell after a gene knockout is introduced. Previous experiments showed that global regulatory changes due to the knockout mutations eventually converge to a steady state that is close to the metabolic state of the wild type [33]. Based on these findings, the researchers attempted to minimize the total number of significant flux changes after gene knockout compared to the wild-type flux distribution. Thorough investigation of the three algorithms for deletion strain phenotype prediction (FBA, MOMA, and ROOM) shows that MOMA more accurately describes the initial transient growth rates after gene knockout, whereas ROOM and FBA provide more accurate predictions of the growth of the gene knockout mutant at the final optimal metabolic steady state. In terms of the flux distribution at the final metabolic steady state, ROOM is shown to be superior to FBA and MOMA. Although there has been no report yet on the utilization of ROOM in metabolic engineering applications, it will contribute to efficient metabolic engineering through more accurate analysis of intracellular flux distributions.
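The difference between FBA and MOMA predictions for a knockout can be made concrete on a small example. The sketch below is illustrative only and is not the implementation of Segre et al. [33]: the four-reaction network, its stoichiometry, the bounds, and the wild-type reference vector V_WT (the growth-optimal FBA solution of this toy network) are hypothetical, and SciPy's general-purpose SLSQP routine is assumed in place of a dedicated quadratic programming solver. MOMA is posed as minimizing the squared Euclidean distance between the mutant flux vector and the wild-type flux vector, subject to the mutant's steady-state and bound constraints.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy network, for illustration only.
# Columns: uptake, biomass, lac, suc
# Rows: G (carbon intermediate), N (reducing equivalents)
REACTIONS = ["uptake", "biomass", "lac", "suc"]
S = np.array([
    [1.0, -1.0, -1.0, -1.0],   # G
    [0.0,  1.0, -2.0, -1.0],   # N
])
UPPER = np.array([10.0, 1000.0, 1000.0, 1000.0])   # assumed upper flux bounds

# Wild-type reference: the growth-optimal FBA solution of this toy network.
V_WT = np.array([10.0, 20.0 / 3.0, 10.0 / 3.0, 0.0])


def moma(knockouts):
    """Find the feasible mutant flux vector closest to V_WT (least squares)."""
    # Knocked-out reactions are dropped from the problem; their flux is zero.
    keep = [i for i, name in enumerate(REACTIONS) if name not in knockouts]
    S_k, ub_k, ref_k = S[:, keep], UPPER[keep], V_WT[keep]
    res = minimize(
        fun=lambda x: float(np.sum((x - ref_k) ** 2)),
        x0=np.zeros(len(keep)),
        jac=lambda x: 2.0 * (x - ref_k),
        bounds=list(zip(np.zeros(len(keep)), ub_k)),
        constraints=[{"type": "eq",
                      "fun": lambda x: S_k @ x,     # steady state: S v = 0
                      "jac": lambda x: S_k}],
        method="SLSQP",
    )
    if not res.success:
        raise RuntimeError(res.message)
    flux = np.zeros(len(REACTIONS))
    flux[keep] = res.x
    return dict(zip(REACTIONS, flux))


if __name__ == "__main__":
    mutant = moma(["lac"])
    print({name: round(float(v), 3) for name, v in mutant.items()})
```

For this network, an FBA simulation of the same lac deletion would predict a growth flux of 5, reached only by using the full uptake capacity. The MOMA solution stays closer to the wild-type flux pattern and therefore predicts lower, suboptimal growth, which is the behavior the method is intended to capture for unevolved mutants.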

15.5.3 Integration of Genome-Scale Models with Heterogeneous Data It has been demonstrated that genome-scale metabolic models are able to predict growth phenotypes for gene deletion strains with 65–80% accuracy depending on the organism and growth condition [63,64]. However, it has been observed that the accuracy of the prediction decreases and observed growth characteristics deviate from the experimental data in cases when multiple gene knockouts are required [14,35]. Aside from the incompleteness of the metabolic network, this discrepancy is often thought to be caused by the regulatory effects that the genome-scale stoichiometric metabolic model fails to capture. Therefore, approaches have been developed to incorporate regulatory information in silico into the metabolic network modeling process to improve prediction accuracy. Covert et al. incorporated data on transcriptional regulation into the metabolic model of E. coli in the form of a Boolean model of the known transcriptional regulatory network in E. coli [72]. In this regulatory network, genes can have only two states, either expressed or not expressed. Consequently, the reactions associated with particular genes that are inactivated under certain culture conditions can be constrained to zero flux and the metabolic model can be used to make predictions of metabolic phenotypes in the presence of regulatory constraints. Covert et al. demonstrated that the prediction capability of the genome-scale metabolic model was improved when it was combined with the regulatory network model. However, the predictive power of combined metabolic/regulatory networks is limited by our incomplete understanding of transcriptional regulatory network structures even in well-characterized organisms such as E. coli. For this reason, it is crucial that the network structures are continuously updated based on both high-throughput data and targeted experimentation. 
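The way Boolean regulatory rules translate into flux constraints can be sketched in a few lines. This is a simplified illustration of the general idea and not the model of Covert et al. [72]: the gene names, the rules, the gene-to-reaction mapping, and the default bounds are all hypothetical. Each gene is either expressed or not expressed in a given environment, and every reaction that depends on a non-expressed gene has its flux bounds set to zero before the metabolic model is solved.

```python
# Hypothetical Boolean rules: each maps an environment to True (expressed)
# or False (not expressed).
RULES = {
    "resp_gene": lambda env: env["oxygen"],
    "ferm_gene": lambda env: not env["oxygen"],
    "lac_gene": lambda env: env["lactose"] and not env["glucose"],
}

# Hypothetical gene-to-reaction associations.
GENE_TO_REACTIONS = {
    "resp_gene": ["R_respiration"],
    "ferm_gene": ["R_fermentation"],
    "lac_gene": ["R_lactose_uptake"],
}

# Assumed default (unregulated) flux bounds, as (lower, upper).
DEFAULT_BOUNDS = {
    "R_respiration": (0.0, 1000.0),
    "R_fermentation": (0.0, 1000.0),
    "R_lactose_uptake": (0.0, 10.0),
    "R_glucose_uptake": (0.0, 10.0),
}


def gene_states(env):
    """Evaluate every Boolean rule for the given environment."""
    return {gene: bool(rule(env)) for gene, rule in RULES.items()}


def regulated_bounds(env):
    """Constrain reactions of non-expressed genes to zero flux."""
    bounds = dict(DEFAULT_BOUNDS)
    for gene, expressed in gene_states(env).items():
        if not expressed:
            for reaction in GENE_TO_REACTIONS[gene]:
                bounds[reaction] = (0.0, 0.0)
    return bounds


if __name__ == "__main__":
    aerobic_mixed_sugars = {"oxygen": True, "glucose": True, "lactose": True}
    for reaction, bound in regulated_bounds(aerobic_mixed_sugars).items():
        print(reaction, bound)
```

The resulting bounds would then be passed to a constraint-based solver in place of the unregulated ones. The quality of the prediction depends directly on how complete and correct the rule set is, which is the limitation noted above.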
Another example in which the combination of high-throughput data with in silico model predictions resulted in improved metabolic engineering strategies has been demonstrated by Alper et al. [73]. Alper et al. generated a transposon mutagenesis library of E. coli in vivo and screened the library to identify deletion strains with enhanced lycopene production. This screening approach allowed the discovery of additional gene knockout targets that may be strongly affected by the regulatory mechanism in the parental lycopene-producing strain and thus would not have been identified based on the metabolic model (described in Section 15.5.2). Implementing in vivo the additional knockouts that had not been predicted by the metabolic model resulted in a further increase in lycopene production. Finally, Park et al. [74] reported the metabolic engineering of E. coli for the production of L-valine, in which metabolic and regulatory information from the literature and transcriptome profiling were synergistically combined with an in silico model to construct a superior strain. In this study, they first removed feedback inhibition and attenuation and knocked out genes, all of which hamper the biosynthesis of L-valine, based on the published metabolic and regulatory information. With this strain as a base strain, they then identified several genes to be amplified from the comparative transcriptome analysis of the base and wild-type strains. The identified genes include a global regulator affecting the biosynthesis of L-valine and an L-valine exporter. Lastly, they performed further gene deletions identified by simulating the genome-scale in silico model of E. coli in order to maximally improve the L-valine production capability of the strain. Very similar to the work of Alper et al., this work demonstrates how heterogeneous data that are complementary to each other (i.e., transcriptome profiles and simulation of the in silico model) can be systematically exploited, leading to successful metabolic engineering applications. The three studies above clearly demonstrate the usefulness of integrating genome-scale models with heterogeneous and high-throughput data for metabolic engineering purposes. It is expected that metabolic engineering will incorporate both in silico approaches and high-throughput experimentation as powerful and sophisticated tools for strain design.

15.6 Conclusion

Systems biology can be considered the bridge that connects the different aspects of the biological function of a specific system to address various biological problems [75]. Systems biology methodologies can also be used in metabolic engineering, enabling a systemic approach to engineering, for example, the overproduction of a metabolite. Modern systems biology includes a number of steps that lead to a systemic description of, for example, a microbial cell [76]. Generally, the first step in the systemic approach is the acquisition of large-scale quantitative and qualitative data and the identification of the components of the biological system. Genome-scale data such as genomic sequences and transcript or protein profiles can then be used to construct genome-scale models of metabolic as well as other types of networks. Once these models are built and verified experimentally, they can be used for various applications including metabolic engineering. High-throughput technologies and genome-scale metabolic models will serve as powerful tools for metabolic engineering [77–79] (Figure 15.3). The scope and depth of these tools are growing rapidly to respond to the needs created by novel metabolic engineering challenges. This section has provided examples of metabolic engineering studies that utilize high-throughput data [43,47,49,60] as well as hypothesis-driven metabolic engineering studies using genome-scale models [35,37,73]. These types of methods are now firmly established as a part of the metabolic engineer’s toolbox, where they complement both classical metabolic engineering approaches and other emerging technologies such as those developed in the growing field of synthetic biology. In the near future, systems biology-enabled metabolic engineering based on high-throughput technologies and genome-scale models will give rise to advancements in various scientific fields, such as medical research, allowing for the discovery of novel drug targets for specific diseases [77,79].

[Figure 15.3 image not reproduced. Diagram titled "Metabolic engineering based on systematical approaches", relating example targets (succinic acid, lactic acid, 1,3-propanediol, L-lysine, amino acids, insulin-like growth factor I fusion protein, ethanol, drug targeting, lycopene, serine-rich proteins, poly(3-hydroxybutyrate)) to X-omics technologies and genome-scale models.]

Figure 15.3 Various demonstrative studies of metabolic engineering based on systematic approaches including studies on the use of either high throughput technologies (X-omics studies) or genome-scale models or both.

References

1. Lee, S.Y., Choi, H.S., and Kim, T.Y. Metabolic engineering. News Inform. Chem. Eng., 22, 436, 2004.
2. Lee, S.Y. and Papoutsakis, E.T. Metabolic Engineering. Marcel Dekker, Inc., New York, 1999.
3. Lee, S.J., Song, H., and Lee, S.Y. Genome-based metabolic engineering of Mannheimia succiniciproducens for succinic acid production. Appl. Environ. Microbiol., 72, 1939, 2006.
4. Ogata, H. et al. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res., 27, 29, 1999.
5. Overbeek, R. et al. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res., 33, 5691, 2005.
6. Cherry, J.M. et al. SGD: Saccharomyces Genome Database. Nucleic Acids Res., 26, 73, 1998.
7. Kitano, H. Systems biology: a brief overview. Science, 295, 1662, 2002.
8. Witkamp, R.F. Genomics and systems biology – how relevant are the developments to veterinary pharmacology, toxicology and therapeutics? J. Vet. Pharmacol. Ther., 28, 235, 2005.
9. Smid, E.J. et al. Functional ingredient production: application of global metabolic models. Curr. Opin. Biotechnol., 16, 190, 2005.
10. Bansal, A.K. Bioinformatics in microbial biotechnology – a mini review. Microb. Cell Fact., 4, 2005.
11. Patil, K.R. and Nielsen, J. Uncovering transcriptional regulation of metabolism by using metabolic network topology. Proc. Natl. Acad. Sci. USA, 102, 2685, 2005.
12. Stelling, J. et al. Metabolic network structure determines key aspects of functionality and regulation. Nature, 420, 190, 2002.
13. Borodina, I., Krabben, P., and Nielsen, J. Genome-scale analysis of Streptomyces coelicolor A3(2) metabolism. Genome Res., 15, 820, 2005.
14. Alper, H. et al. Identifying gene targets for the metabolic engineering of lycopene biosynthesis in Escherichia coli. Metab. Eng., 7, 155, 2005.
15. Lee, S.J. et al. Metabolic engineering of Escherichia coli for enhanced production of succinic acid, based on genome comparison and in silico gene knockout simulation. Appl. Environ. Microbiol., 71, 7880, 2005.
16. Hong, S.H. et al. The genome sequence of the capnophilic rumen bacterium Mannheimia succiniciproducens. Nat. Biotechnol., 22, 1275, 2004.
17. Reed, J.L. et al. An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol., 4, R54, 2003.
18. Osterman, A. and Overbeek, R. Missing genes in metabolic pathways: a comparative genomics approach. Curr. Opin. Chem. Biol., 7, 238, 2003.
19. Green, M.L. and Karp, P.D. A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC Bioinformatics, 5, 39, 2004.
20. Francke, C., Siezen, R.J., and Teusink, B. Reconstructing the metabolic network of a bacterium from its genome. Trends Microbiol., 13, 550, 2005.
21. Edwards, J.S. and Palsson, B.O. The Escherichia coli MG1655 in silico metabolic genotype: its definition, characteristics, and capabilities. Proc. Natl. Acad. Sci. USA, 97, 5528, 2000.
22. Edwards, J.S. and Palsson, B.O. Systems properties of the Haemophilus influenzae Rd metabolic genotype. J. Biol. Chem., 274, 17410, 1999.

23. Schilling, C.H. and Palsson, B.O. Assessment of the metabolic capabilities of Haemophilus influenzae Rd through a genome-scale pathway analysis. J. Theor. Biol., 203, 249, 2000.
24. Paley, S.M. and Karp, P.D. Evaluation of computational metabolic-pathway predictions for Helicobacter pylori. Bioinformatics, 18, 715, 2002.
25. Schilling, C.H. et al. Genome-scale metabolic model of Helicobacter pylori 26695. J. Bacteriol., 184, 4582, 2002.
26. Thiele, I. et al. An expanded metabolic reconstruction of Helicobacter pylori (iIT341 GSM/GPR): an in silico genome-scale characterization of single and double deletion mutants. J. Bacteriol., 187, 5818, 2005.
27. Förster, J. et al. Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network. Genome Res., 13, 244, 2003.
28. Duarte, N.C., Herrgard, M.J., and Palsson, B.O. Reconstruction and validation of Saccharomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic model. Genome Res., 14, 1298, 2004.
29. Becker, S.A. and Palsson, B.O. Genome-scale reconstruction of the metabolic network in Staphylococcus aureus N315: an initial draft to the two-dimensional annotation. BMC Microbiol., 5, 8, 2005.
30. Heinemann, M. et al. In silico genome-scale reconstruction and validation of the Staphylococcus aureus metabolic network. Biotechnol. Bioeng., 92, 850, 2005.
31. Sheikh, K., Förster, J., and Nielsen, L.K. Modeling hybridoma cell metabolism using a generic genome-scale metabolic model of Mus musculus. Biotechnol. Prog., 21, 112, 2005.
32. Duarte, N.C. et al. Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc. Natl. Acad. Sci. USA, 104, 1777, 2007.
33. Segre, D., Vitkup, D., and Church, G.M. Analysis of optimality in natural and perturbed metabolic networks. Proc. Natl. Acad. Sci. USA, 99, 15112, 2002.
34. Shlomi, T., Berkman, O., and Ruppin, E. Regulatory on/off minimization of metabolic flux changes after genetic perturbations. Proc. Natl. Acad. Sci. USA, 102, 7695, 2005.
35. Bro, C. et al. In silico aided metabolic engineering of Saccharomyces cerevisiae for improved bioethanol production. Metab. Eng., 8, 102, 2006.
36. Hong, S.H. et al. In silico prediction and validation of the importance of the Entner-Doudoroff pathway in poly(3-hydroxybutyrate) production by metabolically engineered Escherichia coli. Biotechnol. Bioeng., 83, 854, 2003.
37. Lee, S.Y., Hong, S.H., and Moon, S.Y. In silico metabolic pathway analysis and design: succinic acid production by metabolically engineered Escherichia coli as an example. Genome Informatics, 13, 214, 2002.
38. Burge, C. and Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol., 268, 78, 1997.
39. Salzberg, S.L. et al. Microbial gene identification using interpolated Markov models. Nucleic Acids Res., 26, 544, 1998.
40. Salzberg, S.L. et al. Interpolated Markov models for eukaryotic gene finding. Genomics, 59, 24, 1999.
41. Mount, D.W. Bioinformatics, Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 2001.
42. Fraser-Liggett, C.M. Insights on biology and evolution from microbial genome sequencing. Genome Res., 15, 1603, 2005.
43. Ohnishi, J.S. et al. A novel methodology employing Corynebacterium glutamicum genome information to generate a new L-lysine-producing mutant. Appl. Microbiol. Biotechnol., 58, 217, 2002.
44. Mills, J.C. et al. DNA microarrays and beyond: completing the journey from tissue to cell. Nat. Cell Biol., 3, 943, 2001.

45. Joyce, A.R. and Palsson, B.O. The model organism as a system: integrating ‘omics’ data sets. Nat. Rev. Mol. Cell Bio., 7, 198, 2006.
46. Gonzalez, R. et al. Global gene expression differences associated with changes in glycolytic flux and growth rate in Escherichia coli during the fermentation of glucose and xylose. Biotechnol. Prog., 18, 6, 2002.
47. Choi, J.H. et al. Enhanced production of insulin-like growth factor I fusion protein in Escherichia coli by coexpression of the down-regulated genes identified by transcriptome profiling. Appl. Environ. Microbiol., 69, 4737, 2003.
48. Patterson, S.D. and Aebersold, R.H. Proteomics: the first decade and beyond. Nat. Genet., 33, 311, 2003.
49. Han, M.J., Yoon, S.S., and Lee, S.Y. Proteome analysis of metabolically engineered Escherichia coli producing poly(3-hydroxybutyrate). J. Bacteriol., 183, 301, 2001.
50. Han, M.J. et al. Engineering Escherichia coli for increased production of serine-rich proteins based on proteome profiling. Appl. Environ. Microbiol., 69, 5772, 2003.
51. Dunn, W.B., Bailey, N.J., and Johnson, H.E. Measuring the metabolome: current analytical technologies. Analyst, 130, 606, 2005.
52. Rochfort, S. Metabolomics reviewed: a new “omics” platform technology for systems biology and implications for natural products research. J. Nat. Prod., 68, 1813, 2005.
53. Sauer, U. High-throughput phenomics: experimental methods for mapping fluxomes. Curr. Opin. Biotechnol., 15, 58, 2004.
54. Anderson, L. and Seilhamer, J. A comparison of selected mRNA and protein abundances in human liver. Electrophoresis, 18, 533, 1997.
55. Corbin, R.W. et al. Toward a protein profile of Escherichia coli: comparison to its transcription profile. Proc. Natl. Acad. Sci. USA, 100, 9232, 2003.
56. Glanemann, C. et al. Disparity between changes in mRNA abundance and enzyme activity in Corynebacterium glutamicum: implications for DNA microarray analysis. Appl. Microbiol. Biotechnol., 61, 61, 2003.
57. Griffin, T.J. et al. Complementary profiling of gene expression at the transcriptome and proteome levels in Saccharomyces cerevisiae. Mol. Cell Proteomics, 1, 323, 2002.
58. Lee, P.S. et al. Insights into the relation between mRNA and protein expression patterns: II. experimental observations in Escherichia coli. Biotechnol. Bioeng., 84, 834, 2003.
59. Mehra, A., Lee, K.H., and Hatzimanikatis, V. Insights into the relation between mRNA and protein expression patterns I. theoretical considerations. Biotechnol. Bioeng., 84, 822, 2003.
60. Yoon, S.H. et al. Combined transcriptome and proteome analysis of Escherichia coli during the high cell density culture. Biotechnol. Bioeng., 81, 753, 2003.
61. Jurgen, B. et al. Proteome and transcriptome based analysis of Bacillus subtilis cells overproducing an insoluble heterologous protein. Appl. Microbiol. Biotechnol., 55, 326, 2001.
62. Lee, J.H. et al. Global analyses of transcriptomes and proteomes of a parent strain and an L-threonine overproducing mutant strain. J. Bacteriol., 185, 5442, 2003.
63. Edwards, J.S., Ibarra, R.U., and Palsson, B.O. In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat. Biotechnol., 19, 125, 2001.
64. Famili, I. et al. Saccharomyces cerevisiae phenotypes can be predicted using constraint-based analysis of a genome-scale reconstructed metabolic network. Proc. Natl. Acad. Sci. USA, 100, 13134, 2003.
65. Burgard, A.P., Pharkya, P., and Maranas, C.D. OptKnock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol. Bioeng., 84, 647, 2003.
66. Pharkya, P., Burgard, A.P., and Maranas, C.D. Exploring the overproduction of amino acids using the bilevel optimization framework OptKnock. Biotechnol. Bioeng., 84, 887, 2003.
67. Fong, S.S. et al. In silico design and adaptive evolution of Escherichia coli for production of lactic acid. Biotechnol. Bioeng., 91, 643, 2005.

68. Raman, K., Rajagopalan, P., and Chandra, N. Flux balance analysis of mycolic acid pathway: targets for anti-tubercular drugs. PLoS Comput. Biol., 1, e46, 2005.
69. Yeh, I. et al. Computational analysis of Plasmodium falciparum metabolism: organizing genomic information to facilitate drug discovery. Genome Res., 14, 917, 2004.
70. Reed, J.L. et al. Towards multidimensional genome annotation. Nat. Rev. Genet., 7, 130, 2006.
71. Feist, A.M. et al. Modeling methanogenesis with a genome-scale metabolic reconstruction of Methanosarcina barkeri. Mol. Syst. Biol., 2, msb4100046-E1, 2006.
72. Covert, M.W. et al. Integrating high-throughput and computational data elucidates bacterial networks. Nature, 429, 92, 2004.
73. Alper, H., Miyaoku, K., and Stephanopoulos, G. Construction of lycopene-overproducing E. coli strains by combining systematic and combinatorial gene knockout targets. Nat. Biotechnol., 23, 612, 2005.
74. Park, J.H. et al. Metabolic engineering of Escherichia coli for the production of L-valine based on combined transcriptome analysis and in silico gene knockout simulation. Proc. Natl. Acad. Sci. USA, 104, 7797, 2007.
75. Tadmor, B. and Tidor, B. Interdisciplinary research and education at the biology-engineering-computer science interface: a perspective. Drug Discov. Today, 10, 1183, 2005.
76. Bork, P. and Serrano, L. Towards cellular systems in 4D. Cell, 121, 507, 2005.
77. Lee, S.Y., Lee, D.-Y., and Kim, T.Y. Systems biotechnology for strain improvement. Trends Biotechnol., 23, 349, 2005.
78. Borodina, I. and Nielsen, J. From genomes to in silico cells via metabolic networks. Curr. Opin. Biotechnol., 16, 350, 2005.
79. Aderem, A. Systems biology: its practice and challenges. Cell, 121, 511, 2005.

16 Cell-Free Systems for Metabolic Engineering

Kara A. Calhoun, Stanford University
James R. Swartz, Stanford University

16.1 Introduction
What Is Cell-Free Biology?
16.2 Advantages and Challenges of Cell-Free Systems for Metabolic Engineering
16.3 Examples
Engineering a Cell Extract that Maintains Stable Amino Acid Concentrations • Activating Complex Metabolism in Cell-Free Reactions • Engineering Cell-Free Systems to Produce Disulfide-Bonded Proteins
16.4 Summary
References

16.1 Introduction Metabolic engineering is traditionally practiced with whole cell systems. However, the engineering of metabolism in a cell-free environment is also possible (Michel-Reydellet et al., 2004). In fact, in vitro reactions have several advantages over in vivo applications such as the ability to sample easily and to control reaction conditions more precisely. Better understanding of the cellular extract used in cell-free reactions has allowed complex, interrelated metabolic processes to be activated in a single test tube outside of the cell (Jewett and Swartz, 2004a). In this chapter, we highlight some of the advantages of cell-free biology for metabolic engineering and describe improvements made to cell-free reactions by stabilizing amino acids, activating complex metabolism, and engineering cell-free extracts that produce disulfide-bonded proteins.

16.1.1 What Is Cell-Free Biology?

We define cell-free biology as the reproduction, study, and exploitation of complex biological processes without the use of intact cells (Jewett, 2005). Historically, cell-free protein synthesis has been the main type of cell-free biology investigated. In fact, cell-free systems were first used to help decipher the genetic code over 45 years ago (Nirenberg and Matthaei, 1961). These early systems were studied using a “black box” approach: a crude cell extract provided the catalytic components necessary for protein translation, and a DNA template, amino acids, and an energy source were added to it. Recently, a more systematic investigation of the cell extract and cell-free metabolism has been performed. Better understanding of cell-free protein synthesis has allowed for protein production of over 500 µg/mL in a batch reaction (Jewett and Swartz, 2004a).

A cell extract is one of the main components of a cell-free reaction, and it typically supplies nearly all of the required catalysts. Our lab has focused on using E. coli extracts, but extracts from eukaryotic


organisms have also been described (Anderson et al., 1983; Balkow et al., 1975). Methods for extract preparation are described in detail elsewhere (Swartz et al., 2004). Briefly, to prepare the extract, cells are harvested during mid-exponential phase and lysed with high-pressure homogenization. The lysate is centrifuged three times at 30,000 × g to remove cellular debris, chromosomal DNA, and any remaining intact cells. A “run-off” reaction is performed to allow dissociation of ribosomes from DNA, and the extract is dialyzed to remove small molecules. The extract provides the source of ribosomes and other soluble proteins necessary for the combined transcription-translation reaction.

Since protein synthesis is an energy-intensive process, an energy source must be added to the reaction. Traditionally, this compound contained a high-energy phosphate bond, as in phosphoenolpyruvate (PEP) or creatine phosphate (CP), to allow regeneration of ATP during the reaction. However, such compounds are very costly and typically constitute the majority of reagent cost (Calhoun and Swartz, 2005a). Improvements to cell-free reactions have been reported that allow the use of less expensive energy sources, such as glutamate and glucose. These systems are described in more detail below. In addition to the extract and energy source, cell-free reactions include amino acids, nucleotides, cofactors, RNA polymerase, and the DNA template containing the gene for the protein to be synthesized.

16.2 Advantages and Challenges of Cell-Free Systems for Metabolic Engineering

Cell-free biology has several advantages over in vivo systems. For metabolic engineering purposes, the lack of a cell wall is one of the greatest advantages of cell-free systems. Sampling the reaction environment is much easier than for in vivo systems. In fact, HPLC methods have been described that allow for measurements of nearly 40 organic acids, nucleotides, and amino acids that are present during a cell-free reaction (Calhoun and Swartz, 2005a; Michel-Reydellet et al., 2004). In this way, we gain much more information about the protein synthesis environment than is possible from whole cell systems.

The lack of a cell wall also allows for direct control of the reaction environment. When substrates are limiting or an unwanted enzymatic reaction is present, the problem can often be addressed by direct addition of the limiting substrate (Kim et al., 1996) or addition of the appropriate reaction inhibitor (Kim and Swartz, 2000a). Furthermore, the conditions can be adjusted to promote protein folding by adding chaperonins or by altering the redox environment to allow disulfide-bond formation (Kim and Swartz, 2004; Yin and Swartz, 2004). Without a cell wall, transport issues are not a problem, so cell-free systems are especially amenable to producing proteins with unnatural amino acid incorporation (Kigawa et al., 1995).

Besides the benefits of removing the cell wall, cell-free systems also have advantages over in vivo systems in that all of the cellular resources are directed toward a single goal. This goal is usually production of a single protein, but engineering a cell-free reaction for small molecule bioconversions is also possible. The cellular resources are used most efficiently to produce the desired product when cell maintenance and viability are not a concern.
Finally, cell-free reactions are particularly suited to certain applications such as high-throughput proteomics and personalized medicine (Yang et al., 2005). Linear DNA templates, such as PCR products, can be used directly in cell-free reactions (Michel-Reydellet et al., 2005). This allows multiplexed reactions for proteomic applications and quick turnaround for personalized medicine. In spite of the advantages of cell-free systems, their use is still not widespread mainly because of challenges with cost, scale-up, short reaction duration, and low product yields. The high cost of reagents, especially of the energy source and nucleotides, has limited cell-free reactions to laboratory scale. However, these challenges are being addressed by engineering the reaction environment to activate more complex metabolic processes in vitro so that less expensive energy sources can be used (Calhoun and Swartz, 2005a; Jewett and Swartz, 2004a). To address the short reaction duration and low product yields, careful analysis of major classes of substrates has been performed so that substrate limitations


can be identified and alleviated. Until recently, cell-free systems were regarded as somewhat of a “black box.” As with in vivo systems, the challenge remains to fully understand the complex metabolism that occurs during the in vitro protein synthesis reaction. This chapter provides examples that demonstrate how some of these metabolic challenges are being addressed. The first example explains how the cell-free system was engineered to alleviate amino acid limitations. Next, alterations to the cell-free environment are described that created conditions where complex metabolism, such as glycolysis and oxidative phosphorylation, could be activated. The last example describes how the cell-free extract was engineered to promote the formation of proteins that require disulfide bonds.

16.3 Examples

16.3.1 Engineering a Cell Extract that Maintains Stable Amino Acid Concentrations

Amino acids are critical substrates for cell-free protein synthesis reactions. However, the crude cell extract used for typical cell-free reactions may contain enzymes that cause unwanted depletion of critical substrates, such as the energy source and the amino acids. Recent advances in cell-free protein synthesis have allowed over 500 µg/mL of protein production in a 3-hour batch reaction (Jewett et al., 2002). This increase in protein concentration can be partially attributed to alleviation of the substrate limitations. Limitations in amino acids were first identified by Kim et al. (1996), who determined that more protein was synthesized when twice the concentration of amino acids was used. This led to development of the PANOx system, where the standard concentration of amino acids was increased from 0.5 mM to 2.0 mM (Kim and Swartz, 2001). In addition, it was shown that batch feeding of amino acids could prolong protein synthesis and increase protein concentrations (Jewett and Swartz, 2004b; Kim and Swartz, 2000b; Sitaraman et al., 2004). These results suggest that one of the factors limiting prolonged protein production in cell-free protein synthesis is the stability of amino acids.

To address substrate limitations, more complex reactor configurations such as continuous or semicontinuous reactors have been successfully used (Kim and Choi, 1996; Spirin et al., 1988). One advantage of these reactors is the ability to maintain substrate supply from a support solution while simultaneously removing inhibitory reaction byproducts. However, batch reactions are much less expensive and easier to conduct at larger scale. We have developed a metabolic engineering strategy for cell-free systems that addresses the issue of amino acid stability in a batch format (Michel-Reydellet et al., 2004).
Table 16.1 Summary of Enzymatic Activities Responsible for Amino Acid Depletion during Cell-Free Reactions

Amino Acid | Enzyme (gene) | Reaction
Tryptophan | Tryptophanase (tnaA) | L-tryptophan + H2O → indole + pyruvate + NH3; L-serine → pyruvate + NH3; L-cysteine + H2O → pyruvate + NH3 + H2S
Arginine | Arginine decarboxylase (speA) | Arginine → agmatine + CO2
Serine | Serine deaminase (sdaA, sdaB) | L-serine → pyruvate + NH3
Cysteine | Glutamate-cysteine ligase (gshA) | L-cysteine + L-glutamate + ATP → γ-glutamylcysteine + ADP + Pi

Source: Reproduced from Calhoun, K.A., and Swartz, J.R., J. Biotechnol., 123(2), 193–203, 2006. With permission.

First, we identified four amino acids that were depleted during the cell-free protein synthesis reaction: arginine, tryptophan, serine, and cysteine. Next, we determined the specific enzymatic activities most likely responsible for the amino acid instabilities by literature examination or experiment. These enzymatic activities include arginine decarboxylase, tryptophanase, serine deaminase, and glutamate-cysteine ligase (see Table 16.1). Finally, we deleted the genes encoding those enzymes (speA, tnaA, sdaA, sdaB, and gshA, respectively) from the original E. coli strain used to make the cell extract for cell-free


reactions. The resulting strain, an A19 derivative named KC6, has the genotype A19 ΔendA ΔtonA ΔspeA ΔtnaA ΔsdaA ΔsdaB ΔgshA met+ (Calhoun and Swartz, 2006). By combining genetic modifications, we have produced a cell extract that stabilizes the four limiting amino acids (Figure 16.1) and maintains all amino acid concentrations over 1 mM in a 3-hour batch, cell-free reaction (data not shown). This system removes one of the major limitations of cell-free reactions, amino acid depletion.

Protein synthesis in cell-free reactions using extract from the modified cell strain KC6 was compared to results from reactions using extract from the control strain NMR1 (genotype: A19 ΔendA met+). When producing the bacterial protein chloramphenicol acetyl transferase (CAT) under typical conditions, the use of extract from strain KC6 increased yields by approximately 30%, from 530 ± 70 µg/mL to 670 ± 70 µg/mL in a 3-hour reaction (Figure 16.2). The benefit of using extract from strain KC6 is greater when the initial amino acid concentration is decreased to 0.5 mM. In this case, extract from KC6 increases protein yields by approximately 250%, from 150 ± 20 µg/mL to 520 ± 100 µg/mL. Prior to

[Figure 16.1: four panels plotting amino acid concentration (mM) against time (0–180 min); image not reproduced.]

Figure 16.1 Concentrations of four amino acids during cell-free protein synthesis reactions using extract from modified strain KC6 (dashed line, triangles) and control strain NMR1 (solid line, squares). Results are the average of n = 3 reactions. (a) Tryptophan (b) Arginine (c) Serine (d) Cysteine. (Reproduced from Calhoun, K.A., and Swartz, J.R., J. Biotechnol., 123(2), 193–203, 2006. With permission.)

[Figure 16.2: bar chart of CAT produced (µg/mL) at initial amino acid concentrations of 2.0 mM and 0.5 mM; image not reproduced.]

Figure 16.2 Protein synthesis results for CAT protein in cell-free reactions using extract from control strain NMR1 (white) or from modified strain KC6 (black). The reactions were run with two different initial amino acid concentrations. Results are the average of n = 9 reactions. (Reproduced from Calhoun, K.A., and Swartz, J.R., J. Biotechnol., 123(2), 193–203, 2006. With permission.)


identification of amino acid depletion as a limitation in cell-free reactions, most cell-free reactions were performed with this lower initial amino acid concentration. With the modified cell extract, we can achieve protein concentrations similar to those from the control extract with only one-fourth the amino acid reagents. This reduces reagent cost and provides a more homeostatic environment for protein synthesis.

In this example, a cell-free system provides a convenient platform to engineer the catalytic machinery for protein synthesis. Applying genetic modifications to the cell extract used in cell-free reactions has advantages over traditional in vivo reactions, such as the ability to directly measure metabolic precursors and to maintain precise control over reaction conditions. Cell-free reactions avoid issues associated with membrane transport or with variability in catalytic composition as living cells respond to changing environmental conditions. This work on amino acid metabolism in cell-free reactions illustrates that when the appropriate metabolic targets are identified, beneficial changes enhance the performance of cell-free protein synthesis. Furthermore, this work suggests that the appropriate cell-free platform can provide a powerful tool for studying complex microbial physiology, i.e., systems biology.
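As a rough check on the CAT yields quoted in this section, the relative gains can be recomputed from the reported means. The short Python sketch below uses only the mean values given in the text (the ± ranges are ignored); the dictionary layout and the helper names `percent_increase` and `yield_per_mM` are illustrative assumptions, not part of the original work.

```python
# Mean CAT yields (µg/mL) reported in the text for 3-hour batch reactions.
# Keys: (extract strain, initial concentration of each amino acid in mM).
cat_yield = {
    ("NMR1", 2.0): 530.0,
    ("KC6", 2.0): 670.0,
    ("NMR1", 0.5): 150.0,
    ("KC6", 0.5): 520.0,
}


def percent_increase(control, modified):
    """Relative gain of the modified extract over the control, in percent."""
    return 100.0 * (modified - control) / control


def yield_per_mM(strain, aa_mM):
    """Hypothetical figure of merit: µg/mL of CAT per mM of amino acid supplied."""
    return cat_yield[(strain, aa_mM)] / aa_mM


for aa_mM in (2.0, 0.5):
    gain = percent_increase(cat_yield[("NMR1", aa_mM)], cat_yield[("KC6", aa_mM)])
    print(f"{aa_mM} mM amino acids: KC6 gain over NMR1 = {gain:.0f}%")

# KC6 at 0.5 mM versus NMR1 at 2.0 mM: similar yield from one-fourth the amino acids.
print(cat_yield[("KC6", 0.5)], "vs", cat_yield[("NMR1", 2.0)], "µg/mL")
```

On these means, the KC6 extract at 0.5 mM amino acids (520 µg/mL) comes close to the control extract at 2.0 mM (530 µg/mL), which is the basis for the one-fourth reagent comparison above.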

16.3.2 Activating Complex Metabolism in Cell-Free Reactions

In addition to amino acid supply, supplying energy for cell-free protein synthesis reactions is another challenge to the success of these systems. Short reaction duration is often attributed to an unstable energy source. Traditional cell-free reactions use a compound with a high-energy phosphate bond, such as PEP (Kim et al., 1996), CP (Spirin et al., 1988), or acetyl phosphate (AcP) (Ryabova et al., 1995), to generate the ATP required to drive transcription and translation. However, these compounds are expensive and can cause inhibitory levels of inorganic phosphate to accumulate in the reaction (Kim and Swartz, 1999). Recent work has led to an appreciation that complex metabolism can be activated during cell-free reactions. We have now engineered systems to generate ATP from less expensive energy sources while prolonging the protein synthesis reaction. These energy sources use complex metabolic pathways such as glycolysis and oxidative phosphorylation (Calhoun and Swartz, 2005b; Jewett and Swartz, 2004a). In this section, we describe the development of three energy-generating systems for cell-free reactions: PANOxSP, cytomim, and glucose.

Traditional batch cell-free reactions contain 30–50 mM of PEP, CP, or AcP to directly phosphorylate ADP to ATP. Although these compounds should be able to provide enough energy to make 1 mg/mL of protein, most reported yields are well below this value. Most likely, the major cause of the low yields is instability of the energy sources as they are degraded by nonspecific phosphatases present in the cell extract (Shen et al., 1998). Not only do these reactions reduce the productivity of the energy source, but they also generate high concentrations of inorganic phosphate in the cell-free reaction. Inorganic phosphate is known to be inhibitory to protein synthesis at concentrations above 30 mM because it complexes with the essential cation Mg2+ (Kim and Swartz, 2001).
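The claim that 30–50 mM of a phosphorylated energy source should suffice for about 1 mg/mL of protein can be checked with a back-of-the-envelope estimate. The Python sketch below is illustrative only and rests on assumptions that are not stated in the chapter: an average amino acid residue mass of 110 g/mol, four ATP equivalents per peptide bond (two for aminoacylation, two GTP for elongation), no accounting for transcription or side reactions, and one inorganic phosphate eventually released per molecule of energy source. Only the 30 mM inhibition threshold is taken from the text.

```python
AVG_RESIDUE_MASS_G_PER_MOL = 110.0  # assumption: average amino acid residue mass
ATP_EQUIV_PER_RESIDUE = 4.0         # assumption: 2 (aminoacylation) + 2 GTP (elongation)
PI_INHIBITORY_MM = 30.0             # from the text: phosphate inhibits above 30 mM


def atp_demand_mM(protein_mg_per_mL):
    """ATP equivalents (mM) needed for translation alone at a given protein titer."""
    # mg/mL equals g/L; dividing by g/mol gives mol/L, and x1000 converts to mM.
    residues_mM = protein_mg_per_mL / AVG_RESIDUE_MASS_G_PER_MOL * 1000.0
    return residues_mM * ATP_EQUIV_PER_RESIDUE


def phosphate_released_mM(energy_source_mM):
    """Assumption: every PEP, CP, or AcP molecule ends up as one inorganic phosphate."""
    return energy_source_mM


print(f"ATP needed for 1 mg/mL protein: {atp_demand_mM(1.0):.1f} mM")
for source_mM in (30.0, 40.0, 50.0):
    pi_mM = phosphate_released_mM(source_mM)
    print(f"{source_mM:.0f} mM energy source -> {pi_mM:.0f} mM phosphate, "
          f"above inhibitory level: {pi_mM > PI_INHIBITORY_MM}")
```

Under these assumptions the translation demand falls within the 30–50 mM range, and full consumption of the energy source also brings inorganic phosphate to or past the inhibitory level, which is consistent with the two problems described above.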
To address this instability, reduction of phosphatase activity has been attempted with some success. Kim and Choi suggest that the addition of phosphate to the growth media reduces phosphatase levels in an E. coli system (Kim and Choi, 2000). The wheat-germ system has been improved through immunoprecipitation of phosphatases (Shen et al., 1998). Another method for increasing the supply of the energy source is to use different reactor configurations, such as continuous-flow or continuous-exchange (semi-continuous) reactors (Spirin et al., 1988). These designs allow continued supply of the secondary energy source while removing inhibitory byproducts such as inorganic phosphate. Finally, the PURE system, which uses all purified components to catalyze protein synthesis, has no phosphatase activity (Shimizu et al., 2001). Overall, the use of compounds with high-energy phosphate bonds is a simple, yet expensive, way to drive in vitro transcription/translation.


Because traditional energy sources are extremely expensive, the use of alternative compounds has been investigated. Most of these alternative energy sources require multi-step enzymatic reactions to generate ATP. For instance, one system, named PANOxSP (PEP, Amino Acids, NAD, Oxalic Acid, Spermidine, and Putrescine), still uses PEP as the main source of energy, but increases the efficiency of ATP generation through the addition of the cofactors NAD and CoA to activate the additional ATP-generating reactions catalyzed by phosphotransacetylase and acetate kinase (Jewett and Swartz, 2004a; Kim and Swartz, 2001). In this way, up to 1.5 moles of ATP can be obtained from each mole of PEP instead of just 1 mole ATP per mole PEP. The use of oxalic acid in this system is beneficial because it inhibits the activity of PEP synthase (Kim and Swartz, 2000a), an enzyme that wastes ATP by requiring two high-energy bonds to convert pyruvate to PEP. By activating multi-step pathways through cofactor addition and by inhibiting wasteful reactions, the PANOxSP system has advantages over traditional energy supply schemes in cost, efficiency, and stability.

The ability to activate multi-step reactions through addition of cofactors or enzymes suggested the possibility of engineering entire energy-generating pathways in a cell-free environment. The basic hypothesis is that replicating intracellular conditions will activate complex intracellular functions. This concept led to the development of the cytomim system (Jewett and Swartz, 2004a). Some of the specific changes include replacing PEG, a high molecular weight compound used for nucleic acid stability, with the natural polyamines putrescine and spermidine. In addition, a nonphosphorylated energy source, pyruvate, replaced PEP, thus avoiding phosphate accumulation during the course of the reaction. This new reaction maintains pH homeostasis, so the buffer can be removed. In addition, acetate salts were replaced with glutamate salts.
Together, these changes resulted in protein yields similar to the PANOxSP reactions, except with a much less expensive energy source. Interestingly, the high protein yields are well above what was expected through ATP-generation by conversion of pyruvate to acetate alone. Another energy-generating pathway, oxidative phosphorylation, is responsible for the additional energy. The necessary catalytic components for oxidative phosphorylation are present in the cell-free reaction from inverted membrane vesicles created during the cell extract preparation process. Verification of oxidative phosphorylation was accomplished in several ways. First, protein synthesis yields were shown to be oxygen-dependent, both in batch (Figure 16.3a) and in stirred tank reactors (Figure 16.3b). Oxidative phosphorylation inhibitors also significantly reduce protein yields for the cytomim system (Figure 16.3c) but not for the PEP-based system (Figure 16.3d). In addition to oxidative phosphorylation, cell-free protein synthesis reactions have also been engineered to use glycolysis as an energy-generation pathway (Calhoun and Swartz, 2005b). Glucose is the preferred carbon and energy source of many organisms and is also one of the least expensive and most desirable commercial substrates for industrial biotechnology applications. In order for cell-free protein synthesis to effectively compete with conventional in vivo approaches to protein production, it would be highly advantageous to develop a system where glucose can be used as the energy source. To use glucose for cell-free protein synthesis, the cytomim system had to be adapted slightly. For example, the conversion of glucose to acetate and lactate results in pH instability, so the use of a buffer or other appropriate pH control is necessary. In addition, without the use of a phosphorylated energy source, inorganic phosphate limits the glucose-based reactions. 
Phosphate is necessary for the initial reaction steps in the glycolytic pathway. When 90 mM Bis-Tris buffer and 10 mM phosphate are added to cell-free reactions with glucose as the energy source, significant protein yields are possible (~550 µg/mL) (Calhoun and Swartz, 2005b). These yields are above those seen using glutamate metabolism and oxidative phosphorylation alone (Figure 16.4).

To further characterize the cell-free reactions using glucose as an energy source, ATP concentrations were measured. In fact, the ATP levels for the glucose reactions were the highest of the various energy sources tested (Figure 16.5). This is consistent with the higher theoretical yield of ATP per mole of glucose (2 mol ATP/mol glucose) compared with PEP in the PANOx system (1.5 mol ATP/mol PEP). Along with ATP concentrations, glucose metabolism was verified by using radiolabeled glucose as a substrate

[Figure 16.3: four panels; image not reproduced. (a) CAT yield (µg/mL) for Pyr/O2, Pyr/Ar, O2, and Ar conditions. (b) CAT yield (µg/mL), dissolved oxygen (µM), and ATP (µM) against time (0–120 min), with the point marked "Oxygen feed interrupted." (c) CAT yield (µg/mL) with and without 75 µM HQNO, 1 mM TTA, or 2.5 mM DNP under the conditions of panel (a). (d) CAT yields (µg/mL) in the traditional PANOx system using PEP as an energy substrate: control, 75 µM HQNO, 1 mM TTA, 2.5 mM DNP.]

Figure 16.3 Oxygen-dependent energy production in the cytomim system is caused by oxidative phosphorylation. (a) 20-µL cell-free batch reactions were carried out for 5 hours. CAT production yields from the cytomim system with (Pyr) or without 33 mM pyruvate in the presence (O2) or absence of oxygen (Ar) are shown. Error bars represent the standard deviation for n = 8 experiments. (b) 2-mL stirred tank cell-free reaction using the glutamate-phosphate-NMP cytomim system. Total CAT yield, ATP concentration, and dissolved oxygen concentration are plotted versus time. After 40 minutes, the oxygen feed was turned off. This resulted in complete consumption of available oxygen, reduction of ATP concentration, and termination of protein synthesis. (c) 20-µL cell-free batch reactions were carried out for 5 hours. CAT production yields from the cytomim system with (Pyr) or without 33 mM pyruvate in the presence (O2) or absence of oxygen (Ar) are shown. The reduction in protein synthesis after addition of either 75 µM 2-heptyl-4-hydroxyquinoline-N-oxide (HQNO, an inhibitor of electron transport), 1 mM thenoyltrifluoroacetone (TTA, an inhibitor of electron transport), or 2.5 mM 2,4-dinitrophenol (DNP, an uncoupling agent) indicates that oxygen-dependent protein synthesis relies on energy derived from oxidative phosphorylation. Oxygen-independent CAT expression is unaffected. Error bars represent the standard deviation for at least n = 6 experiments. (d) 20-µL cell-free batch reactions using the conventional PEP as an energy substrate (in other words, not the cytomim system) were carried out for 5 hours. Consistent with previous results, inhibitors of oxidative phosphorylation do not affect protein biosynthesis in this case (conducted in the presence of oxygen). Error bars represent the standard deviation for n = 6 experiments.

in cell-free reactions and subsequent monitoring of radioactive accumulation in byproducts. The majority of the radioactivity from uniformly labeled glucose accumulated in lactate and acetate, the anaerobic byproducts of glucose metabolism (Figure 16.6). The engineering of conditions that allow for complex metabolism such as oxidative phosphorylation and glycolysis in a cell-free environment is important not only for inexpensive ATP generation during protein synthesis, but also as an example of how complex biological systems can be understood and exploited through cell-free biology.

[Figure 16.4: bar chart of CAT produced (µg/mL) by energy source (glucose, glutamate); image not reproduced.]

Figure 16.4 CAT protein production with (white) or without (black) an additional 10 mM phosphate in cell-free reactions using glucose as an energy source. These results are compared with a reaction without a secondary energy source that is producing protein through metabolism of glutamate present in reaction salts. Results are the average of n ≥ 9 reactions performed on at least three different days. Error bars represent one standard deviation. (Reproduced from Calhoun, K.A., and Swartz, J.R., Biotechnol. Bioeng. 90(5), 606–13, 2005b. With permission.)

[Figure 16.5: plot of ATP (mM) against time (0–180 min); image not reproduced.]

Figure 16.5 ATP concentration during cell-free reactions with various energy sources: PEP (solid diamonds), glucose (open triangles), glutamate (solid circles). The ATP concentration was measured with a firefly luciferase assay. All reactions contain 10 mM additional phosphate. Results are the average of n = 3 reactions. Error bars represent one standard deviation. (Reproduced from Calhoun, K.A., and Swartz, J.R., Biotechnol. Bioeng. 90(5), 606–13, 2005b. With permission.)

[Figure 16.6: stacked bar chart of relative radioactivity accumulation against reaction time (0 min, 15 min, 45 min, 1 hr, 2 hr, 3 hr); image not reproduced.]

Figure 16.6 Relative accumulation of radioactivity in metabolites during cell-free protein synthesis reactions. Uniformly labeled 14C-glucose was used at the start of the reaction. Samples at various timepoints were applied to an HPLC system to separate organic acids. Fractions were collected and counted for radioactivity on a scintillation counter. Graphical representation of radioactivity accumulation: glucose (black), pyruvate (horizontal lines), lactate (white), acetate (vertical lines), other (patterned). Results are the average of samples from three experiments. (Reproduced from Calhoun, K.A., and Swartz, J.R., Biotechnol. Bioeng. 90(5), 606–13, 2005b. With permission.)

Table 16.2 Reaction Components for Various Cell-Free Protein Synthesis Reactions (Final Concentration)

Solution | Description | PANOxSP | Cytomim | Glucose
1 | Salt solution | 1× | 1× | 1×
  | Magnesium glutamate (mM) | 20 | 8 | 8
  | Ammonium glutamate (mM) | 10 | 10 | 10
  | Potassium glutamate (mM) | 175 | 130 | 130
2 | Master mix | 1× | 1× | 1×
  | ATP (mM) | 1.2 | 1.2 | 1.2
  | GTP (mM) | 0.85 | 0.85 | 0.85
  | CTP (mM) | 0.85 | 0.85 | 0.85
  | UTP (mM) | 0.85 | 0.85 | 0.85
  | Folinic acid (µg/mL) | 34 | 34 | 34
  | tRNA (µg/mL) | 170 | 170 | 170
3 | Bis-Tris (mM) | 0 | 0 | 50
4 | 20 Amino acid mix (mM) | 2 | 2 | 2
5 | NAD (mM) | 0.33 | 0.33 | 0.33
6 | Coenzyme A (mM) | 0.27 | 0.27 | 0.27
7 | Putrescine (mM) | 1 | 1 | 1
8 | Spermidine (mM) | 1.5 | 1.5 | 1.5
9 | PEP (mM) | 33 | 0 | 0
10 | Glucose (mM) | 0 | 0 | 33
11 | Potassium phosphate (mM) | 0 | 0 | 10
12 | Sodium oxalate (mM) | 2.7 | 2.7 | 0
13 | 14C-Leucine (µM) | 5 | 5 | 5
14 | T7 RNA polymerase (mg/mL) | 0.1 | 0.1 | 0.1
15 | pK7CAT (µg/mL) | 13 | 13 | 13
16 | E. coli S30 extract (vol) | 24% | 24% | 24%

Table 16.2 lists the reaction conditions for cell-free reactions using the PANOxSP, cytomim, and glucose systems. The total protein yields of CAT that can be expected from these systems are 700 ± 105 µg/mL, 710 ± 50 µg/mL, and 550 ± 60 µg/mL, respectively.
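The energy-source concentrations in Table 16.2 can be combined with the theoretical ATP yields quoted earlier in this section (1.5 mol ATP/mol PEP for PANOxSP, 2 mol ATP/mol glucose) to compare the maximum ATP each substrate could supply. The Python sketch below is only an illustration under those stated figures; the `systems` dictionary and the function `theoretical_atp_mM` are hypothetical names. The cytomim system is left out because the text gives no stoichiometric ATP yield for glutamate metabolism with oxidative phosphorylation.

```python
# Energy-source concentration (Table 16.2), theoretical ATP yield per mole of
# substrate (as quoted in the text), and mean CAT yield reported for each system.
systems = {
    "PANOxSP": {"source": "PEP", "conc_mM": 33.0, "atp_per_mol": 1.5, "cat_ug_per_mL": 700.0},
    "Glucose": {"source": "glucose", "conc_mM": 33.0, "atp_per_mol": 2.0, "cat_ug_per_mL": 550.0},
}


def theoretical_atp_mM(system):
    """Upper bound on ATP (mM) if every substrate molecule is fully converted."""
    return system["conc_mM"] * system["atp_per_mol"]


for name, system in systems.items():
    atp_mM = theoretical_atp_mM(system)
    # Protein obtained per mM of theoretically available ATP (illustrative ratio).
    ratio = system["cat_ug_per_mL"] / atp_mM
    print(f"{name}: {system['conc_mM']:.0f} mM {system['source']} -> "
          f"up to {atp_mM:.1f} mM ATP; {ratio:.1f} µg/mL CAT per mM ATP")
```

These are upper bounds only: they ignore substrate lost to side reactions, and the measured protein yields do not scale with the theoretical ATP supply.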

16.3.3 Engineering Cell-Free Systems to Produce Disulfide-Bonded Proteins

Disulfide bonds are required for many industrially relevant mammalian proteins, and disulfide bond formation can be induced by the addition of glutathione buffers (Kim and Swartz, 2004). However, the E. coli extract used for our cell-free reactions quickly reduces all glutathione because of two active reduction pathways catalyzed by glutathione reductase (Gor) and thioredoxin reductase (TrxB) (Knapp et al., 2007). An early solution to the problem of disulfide bond formation in cell-free protein synthesis (CFPS) involved a chemical approach whereby 1 mM iodoacetamide (IAM) was added to the extract. IAM derivatizes active site cysteines, thereby inactivating TrxB and Gor (along with other potentially important enzymes). Cell-free production of several proteins that require disulfide bonds for activity was reported using IAM-treated extract. Examples are murine granulocyte macrophage colony stimulating factor (mGM-CSF)

[Figure 16.7: bar chart of mGM-CSF (µg/mL) for KC6 and KGK10 extracts pretreated with different IAM concentrations; image not reproduced.]

Figure 16.7 Cell-free production of mammalian mGM-CSF in reactions fueled with glucose. KC6 and KGK10 extracts were pretreated with the indicated concentration of IAM. The total (black bars) and active (white bars) yields are indicated. The data are an average of n = 6 experiments, with error bars of ± one standard deviation. (Reproduced from Knapp, K.G., Goerke, A.R., and Swartz, J.R., Biotechnol. Bioeng., 97(4), 901–908, 2007. With permission.)

and plasminogen activator (Yang et al., 2005; Yin and Swartz, 2004). However, these reports all utilized the PANOx system of CFPS with the expensive energy source PEP. The use of IAM-treated extract with glucose as the energy source was not possible because the IAM treatment derivatized all sulfhydryl groups nonselectively. One of the key enzymes in glycolysis, glyceraldehyde-3-phosphate dehydrogenase, requires a free thiol in its active site (Polgar, 1975). The IAM treatment likely renders this enzyme inactive and destroys the ability to use glycolysis for energy generation in the cell-free environment.

Because of the substantial cost benefits of using glucose as an energy source, a novel metabolic engineering approach was used to reduce the activities of Gor and TrxB. First, the source E. coli strain, KC6, was genetically modified to delete the gor gene and inactivate the Gor pathway, creating strain KGK10. This strain requires 20-fold less IAM to stabilize oxidized glutathione in the cell-free reaction. A second genetic deletion in the trxB gene was not feasible, since an E. coli gor/trxB double mutant has been shown to give rise to an ahpC mutation that converts the enzyme AhpC from a peroxiredoxin to a disulfide reductase (Ritz et al., 2001). The ahpC mutation promotes more rapid growth and stimulates disulfide bond reduction.

The open environment of the cell-free reaction allowed for a different approach to TrxB inactivation. A sequence encoding a high-affinity purification tag (hemagglutinin, HA) was added to the trxB gene in the chromosome of strain KGK10. After extract preparation, the extract was passed over an affinity column. The TrxB protein was retained on the column, creating an extract devoid of known reduction pathways. The resulting KGK10-TrxB extract still required some minimal IAM pretreatment to completely avoid disulfide reduction, suggesting that other unidentified cytoplasmic reduction pathways exist (Knapp and Swartz, 2007).
Nevertheless, the less intensive IAM treatment (50 µM vs. 1 mM) allowed glucose to be used to fuel a production system that promotes oxidative protein folding. Extract from the KGK10 strain was able to produce almost 200 µg/mL of active mGM-CSF protein using glucose as an energy source (Figure 16.7). This is more than double the yield obtained with the KC6 strain cell extract, and the new process provides a substantial cost benefit because of the less expensive energy source.

16.4 Summary

Our recent attempts to engineer cell-free metabolism have focused on activating and integrating the complex, interrelated pathways that are needed for protein synthesis and oxidative folding. We have described three metabolic engineering examples in this chapter. First, we used genetic modifications

Cell-Free Systems for Metabolic Engineering

16-11

of the source strain used to make cell extract that alleviates amino acid substrate limitations. Through an understanding of the metabolism causing amino acid instability, protein synthesis yields have been increased. Second, we describe the construction of a new reaction environment that allows for activation of complex metabolic processes: glycolysis and oxidative phosphorylation. Energy generation in these systems is much less expensive than traditional cell-free reactions. Finally, through a novel engineering approach we have altered the reduction pathways in the E. coli extract so that disulfide-bonded proteins can be produced in a cell-free environment. As one of the most complex metabolic processes, there is much to learn from the interrelated systems that are active during cell-free protein synthesis. Until recently, it had not been widely appreciated that central metabolism could be activated and that in vitro systems could be carefully analyzed and easily controlled. These characteristics make cell-free biology an interesting and attractive system for continued metabolic engineering research.

References

Anderson C.W., Straus J.W., and Dudock B.S. 1983. Preparation of a cell-free protein-synthesizing system from wheat germ. Methods Enzymol., 101:635–44.
Balkow K., Hunt T., and Jackson R.J. 1975. Control of protein synthesis in reticulocyte lysates: the effect of nucleotide triphosphates on formation of the translational repressor. Biochem. Biophys. Res. Commun., 67(1):366–75.
Calhoun K.A. and Swartz J.R. 2005a. An economical method for cell-free protein synthesis using glucose and nucleoside monophosphates. Biotechnol. Prog., 21(4):1146–53.
Calhoun K.A. and Swartz J.R. 2005b. Energizing cell-free protein synthesis with glucose metabolism. Biotechnol. Bioeng., 90(5):606–13.
Calhoun K.A. and Swartz J.R. 2006. Total amino acid stabilization during cell-free protein synthesis reactions. J. Biotechnol., 123(2):193–203.
Jewett M.C. 2005. The Impact of Cytoplasmic Mimicry on Cell-free Biology. Doctor of Philosophy. Stanford: Stanford University.
Jewett M.C. and Swartz J.R. 2004a. Mimicking the Escherichia coli cytoplasmic environment activates long-lived and efficient cell-free protein synthesis. Biotechnol. Bioeng., 86(1):19–26.
Jewett M.C. and Swartz J.R. 2004b. Substrate replenishment extends protein synthesis with an in vitro translation system designed to mimic the cytoplasm. Biotechnol. Bioeng., 87(4):465–72.
Jewett M.C., Voloshin A., and Swartz J. 2002. Prokaryotic systems for in vitro expression. In: Weiner M, Lu Q, editors. Gene Cloning and Expression Technologies. Westborough, MA: Eaton Publishing. 391–411.
Kigawa T., Muto Y., and Yokoyama S. 1995. Cell-free synthesis and amino acid-selective stable isotope labeling of proteins for NMR analysis. J. Biomol. NMR, 6(2):129–34.
Kim D.-M. and Swartz J.R. 2000a. Oxalate improves protein synthesis by enhancing ATP supply in a cell-free system derived from Escherichia coli. Biotechnol. Lett., 22(19):1537–42.
Kim D.M. and Choi C.Y. 1996. A semicontinuous prokaryotic coupled transcription/translation system using a dialysis membrane. Biotechnol. Prog., 12(5):645–49.
Kim D.M., Kigawa T., Choi C.Y., and Yokoyama S. 1996. A highly efficient cell-free protein synthesis system from Escherichia coli. Eur. J. Biochem., 239(3):881–86.
Kim D.M. and Swartz J.R. 1999. Prolonging cell-free protein synthesis with a novel ATP regeneration system. Biotechnol. Bioeng., 66(3):180–88.
Kim D.M. and Swartz J.R. 2000b. Prolonging cell-free protein synthesis by selective reagent additions. Biotechnol. Prog., 16(3):385–90.
Kim D.M. and Swartz J.R. 2001. Regeneration of adenosine triphosphate from glycolytic intermediates for cell-free protein synthesis. Biotechnol. Bioeng., 74(4):309–16.
Kim D.M. and Swartz J.R. 2004. Efficient production of a bioactive, multiple disulfide-bonded protein using modified extracts of Escherichia coli. Biotechnol. Bioeng., 85(2):122–29.
Kim R.G. and Choi C.Y. 2000. Expression-independent consumption of substrates in cell-free expression system from Escherichia coli. J. Biotechnol., 84(1):27–32.
Knapp K.G., Goerke A.R., and Swartz J.R. 2007. Cell-free synthesis of proteins that require disulfide bonds using glucose as an energy source. Biotechnol. Bioeng., 97(4):901–908.
Knapp K.G. and Swartz J.R. 2007. Evidence for an additional disulfide reduction pathway in Escherichia coli. J. Biosci. Bioeng., 103(4):373–76.
Michel-Reydellet N., Calhoun K., and Swartz J. 2004. Amino acid stabilization for cell-free protein synthesis by modification of the Escherichia coli genome. Metab. Eng., 6(3):197–203.
Michel-Reydellet N., Woodrow K., and Swartz J. 2005. Increasing PCR fragment stability and protein yields in a cell-free system with genetically modified Escherichia coli extracts. J. Mol. Microbiol. Biotechnol., 9(1):26–34.
Nirenberg M. and Matthaei J. 1961. The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. PNAS, 47:1588–1602.
Polgar L. 1975. Ion-pair formation as a source of enhanced reactivity of the essential thiol group of D-glyceraldehyde-3-phosphate dehydrogenase. Eur. J. Biochem., 51(1):63–71.
Ritz D., Lim J., Reynolds C., Poole L., and Beckwith J. 2001. Conversion of a peroxiredoxin into a disulfide reductase by a triplet repeat expansion. Science, 294(5540):158–60.
Ryabova L.A., Vinokurov L.M., Shekhovtsova E.A., Alakhov Y.B., and Spirin A.S. 1995. Acetyl phosphate as an energy source for bacterial cell-free translation systems. Anal. Biochem., 226(1):184–6.
Shen X.-C., Yao S.-L., Terada S., Nagamune T., and Suzuki E. 1998. Protein productivity of cell-free translation was improved by removing phosphatase from wheat germ extract with immunoprecipitation. Biochem. Eng. J., 2:23–28.
Shimizu Y., Inoue A., Tomari Y., Suzuki T., Yokogawa T., Nishikawa K., and Ueda T. 2001. Cell-free translation reconstituted with purified components. Nat. Biotechnol., 19(8):751–55.
Sitaraman K., Esposito D., Klarmann G., Le Grice S.F., Hartley J.L., and Chatterjee D.K. 2004. A novel cell-free protein synthesis system. J. Biotechnol., 110(3):257–63.
Spirin A.S., Baranov V.I., Ryabova L.A., Ovodov S.Y., and Alakhov Y.B. 1988. A continuous cell-free translation system capable of producing polypeptides in high yield. Science, 242(4882):1162–64.
Swartz J., Jewett M.C., and Woodrow K. 2004. Cell-free protein synthesis with prokaryotic coupled transcription-translation. In: Balbas P, Lorence A, editors. Recombinant Gene Expression: Reviews and Protocols. Second Edition. Totowa, NJ: Humana Press. 169–82.
Yang J., Kanter G., Voloshin A., Michel-Reydellet N., Velkeen H., Levy R., and Swartz J.R. 2005. Rapid expression of vaccine proteins for B-cell lymphoma in a cell-free system. Biotechnol. Bioeng., 89(5):503–11.
Yin G. and Swartz J.R. 2004. Enhancing multiple disulfide bonded protein folding in a cell-free system. Biotechnol. Bioeng., 86(2):188–95.

17 In Silico Models for Metabolic Systems Engineering

Kumar Selvarajoo, Keio University
Satya Nanda Vel Arjunan, Keio University
Masaru Tomita, Keio University

17.1 Introduction
17.2 Metabolic Systems Engineering
In Silico Methods • Stochastic Spatiotemporal Dynamics
17.3 Simulation Tools: E-Cell for Metabolic Systems Engineering
17.4 Dynamic in Silico Simulation
Theoretical Illustration • Stochastic Spatiotemporal Simulations
17.5 Practical Applications
Budding Yeast Metabolism • Innate Immune Signaling
17.6 Future Prospects
Acknowledgment
References

17.1 Introduction

Biological systems often display remarkable behaviors that are not easily anticipated or comprehended. One broad example is the ability of cellular systems to maintain phenotypic stability under vast and diverse conditions [1–4]. Also, cellular systems are in constant evolution, involving numerous tightly controlled molecular interactions to achieve specific goals: for example, metabolism to balance the cell's energy requirements and immune response signalling to tackle invading pathogens. Therefore, the properties of cellular systems cannot be understood if we treat biological entities in isolation; rather, we have to consider them as an integrated system. Relying only on traditional wet-bench biological techniques to study cellular behavior is therefore insufficient, and detailed investigation of molecular interactions is necessary to understand biological properties, especially time-evolving ones such as morphology, growth, metabolism, and disease progression. It is easy to imagine that cellular systems are complex and that the many cellular processes occur at random. However, we now realise that cellular interactions are structurally organized and can be interpreted in physical terms. Recent studies have revealed that large-scale biological networks are organized in a scale-free manner and that their construction shows a high degree of modularity [5–7]. It has also been proposed that at the elementary level the network consists of the building blocks of life, the network motifs, that these are connected into modular groups, and that the modular groups are hierarchically arranged [8–11]. The overall network structure of a complex system is thus built to ensure stability, or robustness to perturbations, and to display emergent properties such as phenotypic oscillations that act as biological switches [4,12,13].

We can accept the notion that system dynamics and network construction have a close relationship. When we consider network-to-network communication, e.g., the interactions between intracellular signalling and transcriptional phenotype, or between protein expression and metabolic network behavior, we know that an understanding of all these interactions is vital in explaining the holistic behavior of cells. Moreover, cellular interactions are not static and are constantly evolving. In order to interpret network properties such as feedback control/regulation and oscillatory behavior, it is therefore important to temporally quantify the relevant biological entities, such as gene expression or metabolite concentration. Only through the analysis of such time-variant interactions can we understand dynamic cellular behavior. As dynamic cellular phenotypes cannot be comprehended by visual inspection or by simple statistical or linear approaches, the development of appropriate complex network theory is essential. Over the last few years, there has been active development of systemic methodologies to decipher dynamic cellular behavior. This development led to the creation of a new interdisciplinary field, called systems biology, inviting scientists across various fields to actively participate in joint research. Although interdisciplinary research involving biology has existed for a long time in a rather ad hoc manner, only in the last 5–6 years have we witnessed a globally concerted effort [14,15]. The goal of systems biology is to generate, integrate, and analyze biological data, both in time and space, (i) to understand molecular circuit design in detail and (ii) to predict the response of cellular systems to various extracellular and intracellular perturbations. A typical cellular system consists of hundreds of thousands of molecular interactions, and to consider them in their entirety, though desirable, is an overwhelming and impossible task.
Therefore, to reduce such complexity, it is appropriate to modularize cellular systems into layers of biological interest: for example, gene regulation pathways for the determination of gene-to-gene interactions, signal transduction cascades for the understanding of extracellular signal propagation into the nucleus, and metabolic pathways for calculating the redistribution of fluxes in response to a given concentration perturbation. Although this kind of modularization predates the evaluation of systemic approaches, the detailed molecular machinery that governs the interactions between biological components within each layer and between layers cannot be understood without the introduction of theoretical concepts (Figure 17.1).

Figure 17.1 Schematic depicting the levels or layers of interactions found in a biological system. (Layers shown: metabolites, proteins, RNA, DNA.)

Systems biology methods are based upon formalized theories, in most cases utilizing physico-chemical laws. They are intended to provide better insights into the underlying molecular circuitry that controls complex biological systems. Another advantage of systems biology is the possible reduction of the time and cost associated with traditional biological research, as the ability to perform optimized experiments in silico increasingly becomes a reality [16–18]. In the field of metabolic engineering, the desire has been to manipulate the cellular system so as to optimize or improve cellular properties, leading to an increased industrial output, for example, optimal production of ethanol for beer brewing. However, using simple intuition or linear approaches to manipulate the metabolic pathways involving the desired substance often leads to failure [19,20]. This is not surprising, as we now know that biological systems have highly nonlinear regulatory properties and hence targeting just one step in a network may not yield a beneficial outcome [11]. It is thus essential to consider the complexity of cellular systems in a systemic manner, and this is only possible if we use mathematical and computational approaches to supplement the ongoing wet-bench experimental research. In this chapter, we briefly introduce the concept of metabolic systems engineering (Section 17.2). We mention some of the popularly used theoretical approaches and introduce our own computational platform, the E-Cell system (Section 17.3), which can be used for metabolic engineering studies. In Section 17.4 we work through simple theoretical examples to show the utility of dynamic analysis of metabolic networks. In Section 17.5, we provide some practical examples of dynamic models that could benefit the metabolic engineering community. We end the chapter by mentioning some of the future trends and requirements for the field.

17.2 Metabolic Systems Engineering

Metabolic engineering is aimed at improving the biological properties of a cell by exploiting its metabolic network design. In the past, selective breeding of better yielding strains was used for industrial and medical gain; the production of penicillin by Penicillium chrysogenum is a good example. The process involved several iterations between selecting new strains and mutating them. As the success of this field started to expand, it attracted scientists from various disciplines: biochemists, chemical engineers, analytical chemists, microbiologists, and physiologists. This resulted in the development of better methodologies, such as recombinant DNA technology, which introduces purposeful intermediary pathways or genetic changes that usually result in better yielding strains [21,22]. The concept of metabolic engineering can be broken down into steps. The initial step is the selection of appropriate metabolic pathways that involve the cellular substance whose production is to be increased, for example, glycolysis and related pathways for the production of alcohol. As mentioned in the introductory section, since it is daunting to evaluate pathways involving thousands of reactions, metabolic engineers usually narrow their interest to specific pathways or networks of manageable size, usually tens of reactions. The next step is the identification of the most effective target within this framework that could be modified for improved specific transport, increased product formation, or optimized conversion of substrate. This then requires the development or utilization of methods and tools to achieve the intended result, for example, reducing the concentration of an inhibitory enzyme by using a PCR-based gene deletion strategy [23,24]. However, there often exists unknown or unexpected metabolic regulation, so that the modification does not lead to the required result or does not yield the intended production volume.
Schaaff et al. investigated the production rate of ethanol by overexpressing eight glycolytic and fermentative pathway enzymes of Saccharomyces cerevisiae by placing their genes on multicopy vectors [20]. By doing so, they increased specific enzyme activities between 3.7 and 13.9-fold in logarithmically growing cultures. Surprisingly, at that time, the increases in the activities of the different glycolytic enzymes did not affect the rate of ethanol production significantly as compared with wild type. This experiment was perhaps one of the early experiments that demonstrated that living cells are robust

to diverse perturbation. Metabolic engineering thus cannot rely on recombinant DNA techniques alone for success; it also requires input from systems biology. It is therefore important to view this field as metabolic systems engineering. In metabolic systems engineering, we assess the prospects of utilizing computational models to further refine and optimize the current metabolic engineering design. The aim is to initially develop a dynamic computational or in silico model, using existing knowledge of the metabolic system of interest, including linear and nonlinear regulatory features. The model simulates metabolic fluxes, the amount of product that accumulates in a cell or is exported from it, and the strength and direction of the enzyme activities that participate in the system under various perturbation conditions. Basically, the initial model (with system parameters) is built based on what we currently know about the system. The next step is to perform computational analysis to determine the optimal target reaction (Figure 17.2). We can perturb the model (e.g., in silico enzymatic inhibition) at multiple steps or at specific known key regulatory steps and, by analyzing the simulation results, determine the combination of perturbations that yields the most beneficial in silico result. We then perform genetic changes to verify whether the

Figure 17.2 In silico analysis and optimization of metabolic networks for better product yield. Although a metabolic system consists of an entire cell or organism, some degree of isolation is necessary in order to analyze it. Step 1: Model Abstraction. This involves selecting pathways of manageable size in which the component(s) that exert strongest control over metabolic flux is(are) present. Step 2: Model Construction. Computational model (reference model) of the selected pathways is developed based upon existing knowledge, and system parameters are usually chosen to match experimental results of control (e.g., wild type). Step 3: Model Optimization. The reference model is first perturbed (e.g., mutation) at selected locations (single or multiple) using intuition for optimal production of required substance(s) in silico. The model simulation is iterated until a satisfactory intended in silico outcome is obtained. Step 4: Experimental Testing. The simulation result is used to design new experiments that would produce the intended result. Usually, there will be a need to fine tune the computational model with wet-bench experiments (Step 5). Step 5: Model Iteration. The in silico models are tuned until satisfactory experimental system behavior, due to perturbations, is obtained. This is a cyclic process. Step 6: Industrial Production. Once desired optimized result is obtained, the experiments are scaled up for industrial volume output. (From Selvarajoo, K., FEBS Lett. 580, 1457, 2006. With permission.)

[Figure panels: schematic cell with mitochondrion; cytoplasm network selected; computational representation (metabolites A to F, enzymes x1 to x7); computational optimization; testing in silico prediction with wet-bench experiments; industrial production of compound E. Panel note: Compound E is required to be increased. Overexpressing enzyme x5 alone does not produce significant E. Overexpressing x4, x5 and inhibiting x6 results in better yield.]

desired outcome is actually produced. In many cases the model simulations may not agree satisfactorily with the experimental findings, as we may not yet know other key regulatory features of the system. In such cases, we have to iterate between in silico prediction and experimental findings (model refinement) until the desired outcome is achieved [25] (Figure 17.2). In the following section, we introduce some of the theoretical methods popularly used to model metabolic systems.
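The perturb-and-compare loop of Steps 2 and 3 in Figure 17.2 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the branched toy pathway (a fixed pool A feeding B, which is converted either to the desired product E or to a competing product F), the Michaelis–Menten rate laws, the parameter values, and the names `simulate` and `scenarios`. It is not the model of Figure 17.2, and it uses a simple forward-Euler integrator.

```python
def simulate(e1=1.0, e2=1.0, e3=1.0, a=10.0, km=1.0, t_end=50.0, dt=0.01):
    """Forward-Euler integration of a hypothetical pathway A -> B -> E, B -> F.

    e1, e2, e3 are relative enzyme levels (1.0 = reference strain).
    Returns the amount of product E accumulated by time t_end.
    """
    b = e_prod = f_prod = 0.0
    for _ in range(int(round(t_end / dt))):
        v1 = e1 * a / (km + a)   # supply of B from the fixed pool A
        v2 = e2 * b / (km + b)   # B -> E (desired product)
        v3 = e3 * b / (km + b)   # B -> F (competing branch)
        b += dt * (v1 - v2 - v3)
        e_prod += dt * v2
        f_prod += dt * v3
    return e_prod


# In silico perturbations, expressed as changes to the reference enzyme levels.
scenarios = {
    "reference": {},
    "overexpress e2": {"e2": 2.0},
    "overexpress e2, inhibit e3": {"e2": 2.0, "e3": 0.1},
}
results = {name: simulate(**changes) for name, changes in scenarios.items()}
best = max(results, key=results.get)

for name, amount in results.items():
    print(f"{name}: E = {amount:.2f}")
print("best scenario:", best)
```

In this toy case the combined perturbation gives the largest yield because the competing branch no longer drains B; in a real study the ranking would have to be checked against wet-bench data, as Steps 4 and 5 describe.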

17.2.1 In Silico Methods

There are a number of computational and mathematical approaches popularly used to model metabolic network behavior. These include kinetic methodologies such as enzyme kinetics [26] and metabolic control analysis (MCA) [27], stoichiometric approaches such as metabolic flux analysis (MFA) [28] and flux balance analysis (FBA) [29], and power-law formalisms such as biochemical systems theory (BST) [30]. In kinetic methodologies, mathematical models of metabolic networks are created with the aid of detailed enzyme kinetic equations. The successful simulation of such models requires the system parameters (usually rate constants) to be known a priori. For example, to create a metabolic network model, ordinary differential equations (usually composed of Michaelis–Menten type equations) are set up to describe the fluxes for each metabolite. These are then integrated to obtain the metabolite concentrations over time. As most metabolic reactions are complex, involving cofactors or other substrate regulation, the resultant ordinary differential equations are usually complex and not solvable using analytic approaches. Numerical schemes are often introduced to overcome this difficulty [31]. The main problem with kinetic methodologies, however, is the determination of the system parameters, for which available data are highly limited. Therefore, simplifying assumptions are generally used to make model simulation feasible [32]. This often results in poor prediction of cellular response and requires improvement through iteration with experimental work (Figure 17.2). Kinetic methodologies are often used to determine key regulatory steps of metabolic pathways. In MCA, the philosophy of modelling metabolic reactions is different. It is not intended to be used to discover a single rate-limiting or key regulatory step. Rather, its use focuses on discovering the collective control of a series of interconnected reactions.
Hence, it introduces the concept of control, that is, a measure of the effect one reaction has on all the interconnected metabolic pathway reactions. It thus defines and incorporates terms like flux and flux control in traditional enzyme kinetics. Instead of assuming the existence of a unique rate-limiting step, it assumes that there is a definite amount of flux control and that this is spread quantitatively among the component enzymes [27]. That is, MCA proposes that the regulation of a cell requires coordinated changes in the activity of multiple enzymes, and it analyzes how the control of fluxes and intermediate concentrations in a metabolic pathway is distributed among the different enzymes that constitute the pathway. The applications of MCA have resulted in notable successes in metabolic engineering, usually involving detailed flux calculations with rationalized strain improvements [22]. Stoichiometric methodologies such as MFA and FBA are used when detailed kinetic information on metabolic interactions is not available. These models are therefore not usually of a kinetic nature; rather, they rely upon mass-action constraints to mathematically represent the direction for metabolic modulation. Therefore, they are mostly suited for steady-state analysis of a biological system under a given perturbation. In FBA, metabolic fluxes are represented using stoichiometry and assembled into matrices. This usually results in a greater number of metabolic fluxes than the number of mass balances, implying a plurality of feasible flux distributions. Metabolically meaningful objective functions, for example optimal growth rate, are introduced and chosen to explore the best use of the metabolic network within a given metabolic genotype [33]. In genetic perturbation studies, such as knockouts or overexpressions, flux profiles are determined by the use of an optimizing function, the minimization of metabolic adjustment (MOMA) [34].
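The MCA idea that flux control is shared among enzymes, rather than held by one rate-limiting step, can be checked numerically on a very small pathway. The sketch below is a hypothetical illustration: the two-step pathway S <-> X -> P, its linear rate laws, the parameter values and the function names are all assumptions chosen so that the steady state has a closed form. It estimates each flux control coefficient, C_i = (dJ/de_i)(e_i/J), by central finite differences.

```python
def steady_state_flux(e1, e2, s=1.0, k1=2.0, k_minus1=1.0, k2=3.0):
    """Steady-state flux J of a hypothetical pathway S <-> X -> P.

    Assumed rate laws: v1 = e1 * (k1*s - k_minus1*x) and v2 = e2 * k2 * x.
    Setting v1 = v2 gives the steady-state intermediate concentration x.
    """
    x = e1 * k1 * s / (e1 * k_minus1 + e2 * k2)
    return e2 * k2 * x


def flux_control_coefficient(which, e1=1.0, e2=1.0, rel_step=1e-6):
    """Scaled sensitivity of J to enzyme level e_i, by central differences."""
    levels = [e1, e2]
    up, down = levels.copy(), levels.copy()
    up[which] *= 1 + rel_step
    down[which] *= 1 - rel_step
    d_flux = steady_state_flux(*up) - steady_state_flux(*down)
    d_level = levels[which] * 2 * rel_step
    return (d_flux / d_level) * (levels[which] / steady_state_flux(*levels))


c1 = flux_control_coefficient(0)
c2 = flux_control_coefficient(1)
print(f"C1 = {c1:.3f}, C2 = {c2:.3f}, sum = {c1 + c2:.3f}")
```

With these assumed parameters neither enzyme holds all of the control (the first step carries three quarters of it and the second one quarter), and the two coefficients sum to one, which is the summation property of flux control coefficients.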
Stephanopoulos’s group used MOMA as an additional constraint to study heterologous expression of lycopene in E. coli using stoichiometric modelling [22,35]. They performed both single and multiple in silico gene knockouts to optimize the production of lycopene. Their simulation trends were subsequently verified through experiments [36]. Another theoretical method that has been used widely in modelling biochemical networks, but has not gained much popularity within the metabolic engineering community, is BST. BST is the original work of Savageau [37] and is aimed at characterizing integrated biological systems that cannot be represented to a large extent by linear systems. BST, hence, is a mathematical representation of nonlinear biological systems. Its essence is to describe reaction rates by general power-law expressions:

\[
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n+m} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n+m} X_j^{h_{ij}} \tag{17.1}
\]

where X1, …, Xn are dependent variables (dynamic concentrations of internal metabolites), Xn+1, …, Xn+m are external variables (fixed concentrations of external metabolites), gij and hij are kinetic orders, which may be noninteger and nonpositive, and αi and βi are rate constants. In logarithmic coordinates, Equation 17.1 can be interpreted as a linearization of nonlinear kinetics, and as such BST claims to be a better approximation of reaction kinetics than linear expressions. BST suggests that all reactions that generate a metabolite Xi are combined into a single reaction with net rate vi, and all reactions that consume the same metabolite are similarly combined into another reaction with net rate v−i. The rate of each of these combined reactions is approximated by a power-law expression, and from mass balances around all the metabolites a system of differential equations can be written that can be studied in detail for its control characteristics [30].
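As a rough sketch of how Equation 17.1 can be evaluated and integrated, the code below implements the power-law right-hand side for n dependent and m external variables and steps it forward with a fixed-step fourth-order Runge–Kutta scheme. The function names, the one-metabolite example and all parameter values are hypothetical; they are chosen only so that the steady state can be verified by hand (2·X2 = X1^0.5 with X2 = 1 gives X1 = 4).

```python
from math import prod


def s_system_rhs(x_dep, x_ext, alpha, beta, g, h):
    """Right-hand side of Equation 17.1 for the n dependent variables."""
    x = list(x_dep) + list(x_ext)  # X_1..X_n followed by X_{n+1}..X_{n+m}
    rhs = []
    for i in range(len(x_dep)):
        gain = alpha[i] * prod(xj ** g[i][j] for j, xj in enumerate(x))
        loss = beta[i] * prod(xj ** h[i][j] for j, xj in enumerate(x))
        rhs.append(gain - loss)
    return rhs


def integrate(x_dep, x_ext, alpha, beta, g, h, t_end, dt=0.01):
    """Classical fourth-order Runge-Kutta integration with a fixed step."""
    x = list(x_dep)

    def f(y):
        return s_system_rhs(y, x_ext, alpha, beta, g, h)

    for _ in range(int(round(t_end / dt))):
        k1 = f(x)
        k2 = f([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
        k3 = f([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
        k4 = f([xi + dt * k for xi, k in zip(x, k3)])
        x = [xi + dt * (a + 2 * b + 2 * c + d) / 6
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


# Hypothetical example: X1 is produced from a fixed external pool X2.
alpha, beta = [2.0], [1.0]
g = [[0.0, 1.0]]  # production depends on X2 only (kinetic order 1)
h = [[0.5, 0.0]]  # consumption depends on X1 with kinetic order 0.5
x_final = integrate([1.0], [1.0], alpha, beta, g, h, t_end=50.0)
print(f"X1 after t = 50: {x_final[0]:.4f}")
```

The noninteger kinetic order 0.5 is the kind of term that a linear rate law cannot express, which is the motivation the text gives for the power-law form.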

17.2.2 Stochastic Spatiotemporal Dynamics

The methodologies discussed so far are concerned only with static or temporal variations in molecular concentration, neglecting the fact that molecular concentration can also vary in space. In vivo systems often consist of well-defined intracellular compartments such as the mitochondrion, nucleus, and Golgi apparatus. Intracellular molecules can be localized within these cellular structures through membrane anchoring or sequestration. Glucokinase, for example, is sequestered with glucokinase regulatory protein and predominantly remains in the nucleus of hepatocytes prior to glucose intake [38]. In addition, molecular accessibility and mobility in different regions of the cellular environment are subject to cytosol viscosity, dynamic subcellular structure, and intracellular molecular crowding [39,40]. Such heterogeneity in molecular distribution can strongly influence the reaction kinetics of interacting molecules. In studies of saponin-skinned cardiac fibres, for example, the Km value for ADP to ATP conversion by mitochondria in situ was found to be an order of magnitude higher than in isolated mitochondria because of diffusion-limited reactions in vivo [41]. As such, to accurately determine the reaction kinetics of molecular species that are known to interact with several intracellular compartments, adding spatial considerations to metabolic reaction modelling is an important future direction. The observation that certain biological networks are inherently stochastic by nature has led to the discovery of selective phenotypic switching behavior of cellular systems [42]. Stochastic effects are usually observed when molecules are present in low copy numbers, for example mRNA levels. This condition is usually not true for many metabolic systems and hence this approach has often been neglected.
However, we know that certain metabolites or enzymes within a metabolic framework can be in low concentrations, for example 1,3-BPG in glycolysis [43]. It will be interesting to observe in silico how such specific low concentration spots could affect the propagation of downstream metabolism if we consider stochasticity aspects.
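To give a feel for why low copy numbers matter, the following minimal sketch applies Gillespie's stochastic simulation algorithm to a single species that is produced at a constant rate and degraded in proportion to its copy number. It is a well-mixed toy model with hypothetical rate constants and function names; it has no spatial resolution and does not represent 1,3-BPG or any specific pathway.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, seed=0):
    """Exact stochastic simulation of 0 -> X (rate k_prod), X -> 0 (rate k_deg * n)."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod                 # zero-order production propensity
        a_deg = k_deg * n               # first-order degradation propensity
        a_total = a_prod + a_deg
        t += rng.expovariate(a_total)   # waiting time to the next event
        if t >= t_end:
            break
        if rng.random() * a_total < a_prod:
            n += 1                      # production event
        else:
            n -= 1                      # degradation event
        times.append(t)
        counts.append(n)
    return times, counts


def time_average(times, counts, t_end):
    """Time-weighted mean copy number over [0, t_end]."""
    total = 0.0
    for i, n in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += n * (t_next - times[i])
    return total / t_end


# Hypothetical parameters giving a deterministic steady state of 10 molecules.
times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, n0=0, t_end=500.0)
mean_n = time_average(times, counts, 500.0)
print(f"time-averaged copy number: {mean_n:.2f}, range: {min(counts)}..{max(counts)}")
```

A deterministic rate equation for the same system would settle at exactly k_prod/k_deg = 10 molecules, whereas the stochastic trajectory keeps fluctuating around that value; at such low copy numbers the fluctuations are a sizeable fraction of the mean.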

There are several simulation approaches that consider both stochastic and spatiotemporal aspects of cellular systems. Takahashi et al. and Lemerle et al. provide comprehensive up-to-date reviews of these approaches [44,45]. More recently, Tolle and Le Novere argued that, among the many approaches, particle-based simulation with individual molecule resolution can best reproduce in vivo phenomena such as substrate channelling and colocalization of molecules [46,47]. In Section 17.4, we introduce the concept of space and noise in metabolic networks through theoretical examples using our newly developed particle-based simulation approach with single molecule resolution [48]. We demonstrate that the simultaneous coupling of space and noise can significantly alter the phenotypic outcomes of metabolic pathways.

17.3 Simulation Tools: E-Cell for Metabolic Systems Engineering

As systems biology approaches become increasingly appreciated and adopted, there is a need for the development of systemic tools to perform theoretical analysis of biological processes. As most mathematical models developed to represent biological processes involve a large number of reactions or interactions, often with highly nonlinear equations and multiple parameters, it is daunting to solve them without proper computational tools. Furthermore, these computational tools must be available in a form that is easy to use and analyze for biologists, who are usually not well versed in detailed programming. For example, model construction, parameter selection or estimation, comparison of simulation results with experimental findings, and model modification should all be possible without programming skills or knowledge of the detailed background architecture of the computational tools. In this light, there have been numerous efforts across the globe to develop user-friendly computational simulation platforms. As of today, there are more than 90 such tools freely available (http://sbml.org/index.psp), and one of the earliest and pioneering computational tools developed is the E-Cell simulation platform [49]. The E-Cell simulation system was initiated in 1995 at our institute with the aim of performing simulation and analysis of an organism’s entire metabolic reaction kinetics. This ongoing effort also incorporates several other methodologies apart from reaction kinetics, such as MCA, FBA and S-systems, into one simulation platform (Table 17.1). As a consequence, the E-Cell system can be used to model several biological processes besides metabolic pathways, such as membrane transport, transcription, translation, DNA replication, and signal transduction [50].
We know that certain cellular process like metabolic reactions can be treated as deterministic processes while others like gene regulation networks are usually considered stochastic events [51,52]. Using modern simulation tools, like E-Cell, we can selectively use deterministic approaches to model protein interactions at the receptor and cytoplasm and stochastic processes in the nucleus for gene expression output for a more accurate representation of the entire signalling process. The E-Cell platform also provides the user with freedom to combine and test various methodologies into one model simulation. For example, Nakayama et al. developed the hybrid dynamic and static simulation (HDSS) method to combine the simulation technique of kinetic with stoichiometric methods (Figure 17.3). They claim that such hybrid methods could optimize the benefits of both methods to yield faster and improved simulation results with lesser reliance on detailed kinetic parameters which are often difficult to obtain [53]. Apart from model creation and simulation, the E-Cell system also features intelligent built-in optimization processes such as genetic algorithm and genetic programming for the determination of system parameter values (e.g., rate constants) and the selection of system mechanism type (e.g., type of enzyme regulation), respectively when dynamic experimental information (e.g., temporal metabolite concentration profiles) are available [54,55]. In the next section, using E-Cell, we show the development of simple models, to demonstrate the utility of dynamic computational models that can be used to interpret biological network properties.

Table 17.1 Core Features of E-Cell System

Type of Feature | Description
Modelling capabilities | Stochastic and deterministic events
Model types | Enzyme kinetics; Metabolic control analysis; Flux balance analysis; Biochemical systems theory (S-systems); Spatial simulation*; Hybrid dynamic/static simulation*
Algorithms | Gillespie-Gibson (stochastic); Explicit/implicit Tau-leap (stochastic); Langevin method (stochastic); Radau5/Dormand-Prince adaptive; Dormand-Prince 4(5)7M explicit; Fehlberg 2(3) explicit; Euler explicit; Radau5 implicit
Model optimization schemes | Genetic algorithm; Genetic programming*
Computing optimization schemes (parallel computing) | Distributed computing
User interface | Real time user intervention and visualization; Python scripting for automation of simulation; Graphical Model Editing
Platform | Linux and Microsoft Windows XP
Source code | Object Oriented C++/Python with GPL license
File types | SBML and EML

*Under implementation.

Figure 17.3 Combining multiple simulation algorithms is the idea behind the hybrid dynamic/static simulation (HDSS) process. Metabolic fluxes between metabolites A and K can be simulated using two methods: kinetic reaction analysis for metabolites A to G (thick lines, cytoplasm) and stoichiometric flux determination by flux balance analysis for metabolites G to K (thin lines, nucleus). This method calculates the metabolic fluxes for all reactions without the necessity to obtain all kinetic parameters. (From Yugi, K., Nakayama, Y., Kinoshita, A., and Tomita, M., Theor. Biol. Med. Model., 2, 42, 2005. With permission.)

17.4 Dynamic in Silico Simulation

17.4.1 Theoretical Illustration

In this section, we introduce basic ideas to illustrate the usefulness of studying dynamic models to understand the regulatory behavior of biological networks. The intention is to show how analyzing system dynamics may change the way we view or understand biological phenotype. For example, how do we understand feedback regulation controlling the flux distribution of a metabolic system? To illustrate, let us consider a simple theoretical metabolic pathway system, consisting of five metabolites, with a negative feedback mechanism. Figure 17.4a shows that an increasing concentration of metabolite D negatively controls the flux through metabolite C.

Figure 17.4 (See color insert following page 13-20.) (a) A simple schematic of a negative feedback system in a metabolic pathway (A to B to C to D to E, with D inhibiting the step into C). (b) The temporal simulation profile of metabolite concentrations of the hypothetical system depicted in Figure 17.4a and Table 17.2a. (c) The temporal simulation profile of metabolite concentrations of the hypothetical system depicted in Figure 17.4a and Table 17.2b (without negative feedback mechanism). In (b) and (c) all metabolites are initially at steady-state levels and at t = 0 s, the concentration of A is increased instantaneously (perturbed) by 266 molecules or 0.44 mM. The x-axis represents time in seconds and the y-axis represents the number of metabolite molecules. (Using an assumed volume of 1e–18 l, we could convert the y-axis to metabolite concentration, if necessary.) All simulations were carried out using the E-Cell system version 3.
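The conversion between molecule counts and concentration mentioned in the caption can be sketched as follows. This is a minimal illustration, not part of the E-Cell models; the function names are hypothetical, and the only inputs taken from the text are the 266-molecule pulse and the assumed volume of 1e–18 l.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_to_millimolar(count, volume_litres):
    """Convert a molecule count in a given volume to a concentration in mM."""
    return count / (AVOGADRO * volume_litres) * 1e3


def millimolar_to_molecules(conc_mM, volume_litres):
    """Convert a concentration in mM to the (fractional) number of molecules."""
    return conc_mM * 1e-3 * AVOGADRO * volume_litres


if __name__ == "__main__":
    # Pulse used in Figure 17.4: 266 molecules in an assumed volume of 1e-18 l.
    print(f"{molecules_to_millimolar(266, 1e-18):.2f} mM")
```

With these inputs the pulse corresponds to roughly 0.44 mM, in agreement with the caption.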

Table 17.2 Initial Concentration, Transient Concentration, Kinetic Reaction Formulae and the Parameter Values: (a) for Linear Pathway with Feedback Mechanism; (b) without Feedback Mechanism, as Depicted in Figure 17.4

Metabolite, S | S0 Steady-State Conc. (mM) | S1 Kinetic Formulae | Parameter Values (1/s) | S Trans. Conc. at t = 253 s (mM)

(a)
A | 0.21 | $dA_1/dt = -k_1 A_1$ | k1 = 0.01 | 0.24
B | 0.21 | $dB_1/dt = -\dfrac{k_2 B_1}{q(D - D_0 + 1)} + k_1 A_1$ | k2 = 0.01, q = 1 (1/mM) | 0.52
C | 0.46 | $dC_1/dt = -k_3 C_1 + \dfrac{k_2 B_1}{q(D - D_0 + 1)}$ | k3 = 0.0045 | 0.51
D | 0.26 | $dD_1/dt = -k_4 D_1 + k_3 C_1$ | k4 = 0.008 | 0.28
E | 0.19 | $dE_1/dt = -k_5 E_1 + k_4 D_1$ | k5 = 0.011 | 0.20

(b)
A | 0.21 | $dA_1/dt = -k_1 A_1$ | k1 = 0.01 | 0.24
B | 0.21 | $dB_1/dt = -k_2 B_1 + k_1 A_1$ | k2 = 0.01 | 0.30
C | 0.46 | $dC_1/dt = -k_3 C_1 + k_2 B_1$ | k3 = 0.0045 | 0.65
D | 0.26 | $dD_1/dt = -k_4 D_1 + k_3 C_1$ | k4 = 0.008 | 0.33
E | 0.19 | $dE_1/dt = -k_5 E_1 + k_4 D_1$ | k5 = 0.011 | 0.22

The model parameters, k values, were selected so that the metabolites reach designated (hypothetical) steady-state levels with a constant source of 0.0023 mM/s given to metabolite A. Once the steady-state levels were reached, we reset the simulation time to zero and pulsed metabolite A by 266 molecules. So, a typical metabolite, S, is represented by: $S = S_0 + \int_0^{\infty} (dS_1/dt)\,dt$.

We performed two types of simulations with this system, one in which the feedback regulation is “switched on” and the other with the feedback regulation “switched off”. Table 17.2 shows the simulation details such as kinetic formulae, parameter values and end simulation results for the various metabolites shown in Figure 17.4a. (The actual theoretical models can be downloaded from http://e-cell.org/community/models.) When we compare the simulations of the two cases, we notice that the flux from metabolite C to E is transiently reduced by the negative feedback mechanism. However, at larger simulation times, the differences between the two cases cease for all metabolites (Figure 17.4b and c). That is to say, the steady-state levels are similar in the presence or absence of negative feedback regulation in such a metabolic pathway. This simple analysis of metabolic phenotype suggests that steady-state conditions alone are insufficient for the discovery of novel regulatory network features of metabolism.

We next extend this illustration to a slightly modified scenario that results in a profound difference between the steady-state levels in the presence or absence of the negative feedback mechanism. Figure 17.5a includes an additional reaction for metabolite B, which converts it to metabolite F. In this renewed scenario, the flux from metabolite C to E is noticeably reduced under negative feedback control (Figure 17.5b and c and Table 17.3). In silico dynamic models can, therefore, allow us to predict metabolic network behavior under various types of conditions. Such models should be increasingly used as part of metabolic engineering design.
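The linear pathway of Table 17.2 can be integrated outside E-Cell with a few lines of code. The following is a minimal sketch, not the authors' E-Cell model: it integrates the deviations A1 to E1 from steady state with a hand-written fourth-order Runge-Kutta step, uses the rate constants of Table 17.2, and assumes that all concentrations in the feedback divisor are expressed in mM (so that D - D0 equals the deviation D1). The function names, the step size and the pulse of 0.44 mM on A are choices made for this illustration.

```python
import math

# Rate constants from Table 17.2 (1/s); q is in 1/mM.
K1, K2, K3, K4, K5, Q = 0.01, 0.01, 0.0045, 0.008, 0.011, 1.0


def derivatives(state, feedback):
    """Right-hand sides for the deviations (A1..E1) from steady state, in mM/s."""
    a1, b1, c1, d1, e1 = state
    # D - D0 equals the deviation D1 (assumed in mM); no feedback means divisor 1.
    divisor = Q * (d1 + 1.0) if feedback else 1.0
    flux_b_to_c = K2 * b1 / divisor
    return (
        -K1 * a1,
        K1 * a1 - flux_b_to_c,
        flux_b_to_c - K3 * c1,
        K3 * c1 - K4 * d1,
        K4 * d1 - K5 * e1,
    )


def rk4_step(state, dt, feedback):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def advance(slope, h):
        return tuple(x + h * s for x, s in zip(state, slope))

    s1 = derivatives(state, feedback)
    s2 = derivatives(advance(s1, dt / 2), feedback)
    s3 = derivatives(advance(s2, dt / 2), feedback)
    s4 = derivatives(advance(s3, dt), feedback)
    return tuple(
        x + dt / 6 * (p + 2 * q + 2 * r + s)
        for x, p, q, r, s in zip(state, s1, s2, s3, s4)
    )


def simulate(feedback, t_end=253.0, dt=0.1, pulse_mM=0.44):
    """Return the deviations (A1..E1) at t_end after pulsing A by pulse_mM."""
    state = (pulse_mM, 0.0, 0.0, 0.0, 0.0)
    for _ in range(round(t_end / dt)):
        state = rk4_step(state, dt, feedback)
    return state


if __name__ == "__main__":
    steady = (0.21, 0.21, 0.46, 0.26, 0.19)  # S0 column of Table 17.2
    for feedback in (True, False):
        totals = [s0 + s1 for s0, s1 in zip(steady, simulate(feedback))]
        print("feedback" if feedback else "no feedback", totals)
```

Without feedback, the deviations at t = 253 s follow the analytical solutions of the linear chain (about 0.035 mM for A and 0.089 mM for B), which is close to the 0.24 and 0.30 mM listed in Table 17.2b. In both cases all deviations decay to zero at long times, which is the point made in the text: the two pathways share the same steady state. The size of the transient difference in the feedback run depends on the units assumed for D in the divisor, so this sketch should not be expected to reproduce the values of Table 17.2a exactly.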

17.4.2 Stochastic Spatiotemporal Simulations

The movement of biochemical molecules within cells, especially intercompartmental exchange, could be an important aspect of biological or metabolic regulation that is often left out because of a lack of experimental information or theoretical expertise. One such example is the translocation of pyruvate, in mammalian cells, from the cytoplasm to the mitochondrion.

Figure 17.5 (See color insert following page 13-20.) (a) Metabolite B having an additional reaction that converts it to metabolite F. The temporal simulation profile of metabolite concentrations, (b) without and (c) with negative feedback mechanism, for the hypothetical system depicted in Figure 17.5a and Table 17.3. Initially all metabolites remain at steady-state condition and at t = 0 s, the concentration of A is increased instantaneously (perturbed) by 266 molecules or 0.44 mM (volume of cell is assumed to be 1e–18 l). The x-axis represents time in seconds and the y-axis represents the number of metabolite molecules. All simulations were carried out using the E-Cell system version 3.

Table 17.3 Initial Concentration, Final Concentration, Kinetic Reaction Formulae and the Parameter Values: (a) for Branching Pathway with Feedback Mechanism and (b) without the Feedback Mechanism, as Depicted in Figure 17.5

Metabolite, S | S0 Steady-State Conc. (mM) | S1 Kinetic Formulae | Parameter Values (1/s) | S Quasi-Steady-State Conc. at t = 300 s (mM)

(a)
A | 0.65 | $dA_1/dt = -k_1 A_1$ | k1 = 0.1 | 0.21
B | 0.06 | $dB_1/dt = -\dfrac{k_2 B_1}{q(D - D_0 + 1)} - k_6 B_1 + k_1 A_1$ | k2 = 0.7, k6 = 0.06, q = 1 (1/mM) | 0.06
C | 0.26 | $dC_1/dt = -k_3 C_1 + \dfrac{k_2 B_1}{q(D - D_0 + 1)}$ | k3 = 0.05 | 0.26
D | 0.14 | $dD_1/dt = -k_4 D_1 + k_3 C_1$ | k4 = 0.06 | 0.14
E | 0.11 | $dE_1/dt = -k_5 E_1 + k_4 D_1$ | k5 = 0.0001 | 0.30
F | 0.08 | $dF_1/dt = k_6 B_1$ | | 0.32

(b)
A | 0.65 | $dA_1/dt = -k_1 A_1$ | k1 = 0.1 | 0.21
B | 0.06 | $dB_1/dt = -k_2 B_1 - k_6 B_1 + k_1 A_1$ | k2 = 0.7, k6 = 0.06 | 0.06
C | 0.26 | $dC_1/dt = -k_3 C_1 + k_2 B_1$ | k3 = 0.05 | 0.26
D | 0.14 | $dD_1/dt = -k_4 D_1 + k_3 C_1$ | k4 = 0.06 | 0.14
E | 0.11 | $dE_1/dt = -k_5 E_1 + k_4 D_1$ | k5 = 0.0001 | 0.50
F | 0.08 | $dF_1/dt = k_6 B_1$ | | 0.12

A typical metabolite, S, is represented by: $S = S_0 + \int_0^{\infty} (dS_1/dt)\,dt$.
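A quick consistency check on Table 17.3b can be sketched as follows. Without feedback, both exits from B are first-order, so the pulse splits between the main branch (towards C, D and E) and the side branch (towards F) in the fixed ratio k2 : k6. The function name is hypothetical; the rate constants and the 0.44 mM pulse are taken from the table and the figure caption.

```python
K2, K6 = 0.7, 0.06   # rate constants out of B from Table 17.3 (1/s)
PULSE_MM = 0.44      # pulse given to metabolite A (mM)


def branch_fractions(k_main, k_side):
    """Fractions of a pulse leaving a pool through two first-order exits."""
    total = k_main + k_side
    return k_main / total, k_side / total


main_fraction, side_fraction = branch_fractions(K2, K6)
extra_f_mM = side_fraction * PULSE_MM      # expected long-time gain of F
extra_main_mM = main_fraction * PULSE_MM   # expected gain passed on towards E

if __name__ == "__main__":
    print(f"to F: {extra_f_mM:.3f} mM, towards E: {extra_main_mM:.3f} mM")
```

About 8% of the pulse (roughly 0.035 mM) is expected to end up in F, in line with the rise of F from 0.08 to 0.12 mM in Table 17.3b. With feedback (Table 17.3a) the main branch is slowed while D is elevated, so a larger share of the pulse is diverted to F, which is why the end levels of E and F differ between the two cases.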

To demonstrate computationally how intercompartmental diffusion and localization of biochemical molecules can affect the overall reaction kinetics of a metabolic network, we developed two three-dimensional in silico models of a single cell consisting of a few reactions. In the first model, the cell consists of only a single compartment, the cytoplasm, and all the metabolites are free to diffuse and react anywhere within this compartment (Figure 17.6a). When metabolite A is pulse perturbed, the concentrations of metabolites C and D reach steady-state levels of 0.74 mM and 0.75 mM, respectively, at around t = 750 s (Figure 17.6c). The details of the model are shown in Table 17.4. In the second model, we introduced another compartment, like the mitochondrion, and localized the enzyme of one of the reactions, E2, within this compartment (Figure 17.6b). This means that enzyme E2 resides exclusively in the mitochondrion and cannot travel outside the compartment. Under this renewed situation, with metabolite A perturbed in the same way, the steady-state levels of C and D reach 0.64 mM and 0.83 mM, respectively, and the time to reach steady state is t = 1500 s (Figure 17.6d and e). These simulations reveal that, even for a simple situation, considering spatial effects produces significant changes in the time to reach steady-state conditions. In addition, the steady-state levels of C and D also differ perceptibly. Theoretically, the delay and the changes in the steady-state levels are caused by (i) the intercompartmental diffusion of metabolites, together with the absence of enzymes E1 and E3 in the mitochondrion compartment, and (ii) the confinement of enzyme E2 to the mitochondrion: metabolite B, which diffuses by Brownian motion, cannot be catabolized by E2 as frequently as it could in the noncompartmental situation. Our example, even though minimal, demonstrates the utility of incorporating spatiotemporal effects into in silico models that represent multiple reactions across multiple intracellular compartments. (Similar results can also be shown for noncompartmental localization of molecules at different regions of the cell.) Though the present usage of spatial simulation is limited by the general lack of quantitative experimental data, the advent of fluorescence correlation spectroscopy, immunoelectron microscopy and other related technologies may change the situation in the future [56,57].

Figure 17.6 A schematic of a four-reactant metabolic pathway developed using the spatial simulation algorithm (Box 17.1). (a) A hypothetical cell with only one compartment, the cytoplasm. (b) A hypothetical cell with two compartments, cytoplasm and mitochondrion. Reactions A to B to D take place within the cytoplasm and reaction B to C occurs in the mitochondrion. (c) The dynamic in silico simulations of the various reactant concentrations in the cytoplasm for the model represented in (a). (d) The dynamic in silico simulations of the various reactant concentrations in the cytoplasm, (e) in the mitochondrion and (f) both combined (overall), obtained using the model represented in (b). The x-axis represents time in seconds and the y-axis represents the number of metabolite molecules. The volume of the cell used is 2e–18 liter. All simulations were carried out in E-Cell system version 3.

Box 17.1 Spatiotemporal Stochastic Simulation Algorithm

We developed a novel spatial simulation algorithm, which describes the three-dimensional cell space and components such as intracellular compartments, and performed our simulations using the Monte Carlo technique [48]. The three-dimensional space, corresponding to the simulated cell volume, is discretized into a lattice of spheres arranged in hexagonal close-packing (Figure 17.B1). A molecule can occupy a single sphere in the lattice and diffuse, based on its diffusion probability, to one of its 12 adjacent spheres in a time step (Figure 17.B2). The destination sphere is selected randomly from the 12 adjacent spheres. After a large number of time steps, the diffusion of each molecule converges to Brownian motion. The time step interval is determined from the diameter of the sphere, i.e., the molecule’s displacement in a time step. The diameter of the sphere, in turn, is determined from the diffusion coefficient of the fastest moving molecular species, such that its diffusion probability is unity in a time step. The diffusion probability of each slower species is computed from its diffusion coefficient and the time step interval. During diffusion, if the destination sphere is occupied by another molecule and that molecule is a reaction partner, both molecules can form a complex probabilistically based on their reaction rate (Figure 17.B3). If the molecule in the destination sphere is not a reaction partner, the diffusing molecule stays at its currently occupied sphere. On the other hand, if the molecule is a complex, it can also dissociate, based on its dissociation probability, into two separate molecules, with one occupying the current sphere and the other occupying a randomly selected free neighboring sphere (Figure 17.B4).
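The diffusion step described in Box 17.1 can be sketched in a few lines. This is a simplified illustration and not the algorithm of Ref. [48]: for brevity it uses a face-centred cubic lattice, which also gives every site 12 nearest neighbours, instead of hexagonal close-packing; it omits reactions and compartment boundaries; and the per-species move probabilities are passed in directly rather than derived from diffusion coefficients. All names are hypothetical.

```python
import random

# 12 nearest-neighbour offsets of a face-centred cubic lattice
# (a stand-in for the hexagonal close-packed lattice of Box 17.1).
NEIGHBOURS = [
    (dx, dy, dz)
    for dx in (-1, 0, 1)
    for dy in (-1, 0, 1)
    for dz in (-1, 0, 1)
    if abs(dx) + abs(dy) + abs(dz) == 2
]


def diffusion_step(positions, move_prob, rng):
    """Move each molecule to a random neighbouring site with its own probability.

    positions maps a molecule name to its (x, y, z) lattice site and is updated
    in place. A molecule stays put if the chosen destination is occupied.
    """
    occupied = set(positions.values())
    for name in list(positions):
        if rng.random() >= move_prob[name]:
            continue  # this molecule does not attempt a move in this time step
        x, y, z = positions[name]
        dx, dy, dz = rng.choice(NEIGHBOURS)
        target = (x + dx, y + dy, z + dz)
        if target in occupied:
            continue  # destination taken: stay at the current site
        occupied.remove((x, y, z))
        occupied.add(target)
        positions[name] = target


if __name__ == "__main__":
    rng = random.Random(1)
    molecules = {"fast": (0, 0, 0), "slow": (10, 10, 0)}
    probabilities = {"fast": 1.0, "slow": 0.25}  # fastest species moves every step
    for _ in range(1000):
        diffusion_step(molecules, probabilities, rng)
    print(molecules)
```

In the full algorithm, the occupied-destination branch is where a reaction partner would be tested for complex formation; here it only enforces that two molecules never share a site.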

Figure 17.B1 Lattice of spheres in hexagonal close-packing.

Figure 17.B2 Diffusion of a molecule to one of its 12 adjacent spheres.

Figure 17.B3 Complex formation, A + B → C, between t = n and t = n + ∆t.

Figure 17.B4 Complex dissociation, C → A + B, between t = n and t = n + ∆t.

Table 17.4 Model Details for in Silico Cell with (a) One Compartment, Cytoplasm (b) with Two Compartments, Cytoplasm and Mitochondrion; (c) Represents the System Parameters and Reactions

(a) Cytoplasm Compartment (Total Vol.: 2.0e–18 l)

Metabolite/Enzyme | Initial Value (t = 0 s) mM | Steady-State Value (t = 750 s) mM
A | 1.49 | 0
B | 0 | 0
C | 0 | 0.74
D | 0 | 0.75
E1 | 0.06 | 0.06
E2 | 0.03 | 0.03
E3 | 0.03 | 0.03

(b) Cytoplasm Compartment (Total Vol.: 1.85e–18 l)

Metabolite/Enzyme | Initial Value (t = 0 s) mM | Steady-State Value (t = 1500 s) mM
A | 1.34 | 0
B | 0 | 0
C | 0 | 0.33
D | 0 | 0.42
E1 | 0.06 | 0.06
E2 | 0.03 | 0.03
E3 | 0 | 0

Mitochondrion Compartment (Total Vol.: 1.5e–19 l)

Metabolite/Enzyme | Initial Value (t = 0 s) mM | Steady-State Value (t = 1500 s) mM
A | 0.15 | 0
B | 0 | 0
C | 0 | 0.31
D | 0 | 0.41
E1 | 0 | 0
E2 | 0 | 0
E3 | 0.03 | 0.03

(c)

System Parameters (Reaction Probability) | System Reactions
p1 = 0.004 | A + E1 → B + E1
p2 = 0.006 | B + E2 → C + E2
p3 = 0.006 | B + E3 → D + E3

17.5 Practical Applications

17.5.1 Budding Yeast Metabolism

Metabolomics is an emerging science that aims at the temporal quantification of metabolites in cellular systems [58]. Although the field still faces many challenges in producing accurate high-throughput metabolic profiles, there have been recent successes in the quantification of smaller networks. For example, Theobald et al. and Visser et al. have temporally quantified the primary energy metabolites and adenine nucleotides of Saccharomyces cerevisiae in pulse perturbation experiments [59,60]. The generation of such in vivo “snapshots” of metabolism is indispensable, as it allows one to check in silico predictions over a period of time rather than just at steady-state conditions. In this section, we discuss an example of how dynamic models can be used to decipher key regulatory steps of metabolic pathways.

Figure 17.7a through c, adapted from Theobald et al., shows a section of the glycolytic phenotype of Saccharomyces cerevisiae [59]. By simple visual inspection or static analysis, we are unable to understand the mechanism underlying the dynamic changes in the glycolytic phenotype. For example, in Saccharomyces cerevisiae glycolysis we expect a glucose pulse to be predominantly metabolized into pyruvate, lactate, ethanol, and glycerol (end products). However, this prediction from stoichiometry does not allow us to comprehend the results shown in Figure 17.7b and c. We would not expect, for instance, phosphoenolpyruvate (PEP) levels to fall after a glucose pulse. Also, from stoichiometry, we would expect glycerol levels to rise significantly for the amount of glucose pulse given (Figure 17.7a).

Figure 17.7 (a) The glycolytic pathway of Saccharomyces cerevisiae. (Adapted from Theobald, U. et al., Biotechnol. Bioeng., 55, 305, 1997. Reprinted with permission of John Wiley & Sons, Inc.) (b) The temporal changes in the levels of glucose-6-phosphate (G6P), fructose-6-phosphate (F6P) (A), fructose 1,6-bisphosphate (FBP) (B), PEP (C), Glyceraldehyde phosphate (GAP)/3-phosphoglycerate (3PG) (D) from steady-state conditions after a glucose pulse perturbation. (c) The changes in the levels of glycerol from steady-state conditions after a glucose pulse perturbation. Concentrations are plotted in mmol/l against time in seconds. We represent GAP as G3P throughout the text. (a) and (b) have the same time scale.

The likely answer to these discrepancies is that we do not yet understand the regulation of the glycolytic pathway well enough to use it for industrial optimization. There still exist novel regulatory features that require elucidation. The best way to approach the problem is to analyze the dynamic glycolytic phenotype of Saccharomyces cerevisiae using systemic in silico methods. Recently, Selvarajoo and Tsuchiya analyzed the result shown in Figure 17.7 using a novel dynamic network analysis method [61]. They suggest that an unassuming location in glycolysis, the reaction catalyzed by aldolase, may have become saturated, thus causing the lower than expected production of glycerol. To verify this result, they performed an additional test using traditional mass-action kinetic analysis with pulse perturbation and obtained a similar result (Figure 17.8). Although their prediction that aldolase might be a novel key regulator of glycolysis has not been validated with subsequent wet experiments, there is hope that in silico models could be utilized to decipher previously undiscovered key regulatory steps. This may turn out to benefit the metabolic engineering field, for example by targeting the suggested novel steps to increase or decrease the production of substances of interest or concern.
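The bottleneck argument behind the mass-action analysis can be illustrated with a deliberately reduced model. The sketch below is not the network of Figure 17.8: it assumes a hypothetical two-step first-order chain, F6P → FBP → GLY, in which FBP is formed with rate constant k2 and consumed with rate constant k4, and it uses only the values k2 = 0.05 and k4 = 0.0004 quoted in the figure caption. The closed-form solution for a unit pulse on F6P shows how the pulse piles up in FBP when k2 is much larger than k4.

```python
import math


def two_step_chain(t, k_in, k_out, pulse=1.0):
    """Concentrations (F6P, FBP, GLY) at time t for F6P -> FBP -> GLY.

    Both steps are first order; k_in forms FBP and k_out consumes it.
    The closed form below requires k_in != k_out.
    """
    f6p = pulse * math.exp(-k_in * t)
    fbp = pulse * k_in / (k_out - k_in) * (
        math.exp(-k_in * t) - math.exp(-k_out * t)
    )
    gly = pulse - f6p - fbp  # mass balance: the pulse is conserved in the chain
    return f6p, fbp, gly


if __name__ == "__main__":
    # Rate constants quoted in the caption of Figure 17.8 (assumed units: 1/s).
    for t in (25, 75, 175):
        f6p, fbp, gly = two_step_chain(t, k_in=0.05, k_out=0.0004)
        print(f"t = {t:3d} s  F6P = {f6p:.3f}  FBP = {fbp:.3f}  GLY = {gly:.3f}")
```

Under these assumptions, after 175 s almost all of the pulse has left F6P, about 94% of it is held in FBP, and only about 6% has reached GLY, which is the qualitative signature of a rate-limiting step downstream of FBP.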

Figure 17.8 Mass-action kinetics analysis for the local network involving fructose-6-phosphate (F6P), fructose 1,6-bisphosphate (FBP) and glycerol (GLY). The panels show the reaction scheme with rate constants k1 to k5 and the concentrations (mM) of F6P, FBP and GLY versus time (s). A unit pulse perturbation is given to F6P. The aim was to determine the various rate constants of the model by making a close fit to the experimental data, and then to compare their values to infer the presence or absence of any rate-limiting phenomenon. In order to fit the model to the experimental data, the value of k2 has to be much larger than that of k4 and k5 (k2 = 0.05, k4 = 0.0004, k5 = 0.0002). This mathematically indicates that the reaction downstream of FBP causes a bottleneck, which suggests that the enzyme responsible, aldolase, is a key regulatory enzyme for glucose pulse experiments. (Solid lines indicate simulation results and dotted points indicate experimental results.) (From Selvarajoo, K. and Tsuchiya, M. Systematic determination of biological network topology: Non-integral connectivity method (NICM). Humana Press, Totowa, New Jersey, 449–471, 2007; Tolle, D. and Le Novère, N. Curr. Bioinform., 1, 315, 2006; Theobald, U., Mailinger, W., Baltes, M., Rizzi, M., and Reuss, M. Biotechnol. Bioeng., 55, 305, 1997. With permission.)

17.5.2 Innate Immune Signaling

In innate immunity, the Toll-like receptors (TLRs) play a central role in combating invading pathogens by the induction of proinflammatory chemokines and cytokines [62]. The activation of TLRs is self-limiting, but in certain cases aberration of the signalling mechanism leads to inflammation, which eventually results in chronic diseases such as asthma, rheumatoid arthritis and multiple sclerosis. Although understanding proinflammatory signalling is key to resolving inflammation, little is known about the regulatory role of the various intracellular signalling molecules. One example is IκB kinase (IKK) α. Recently, Lawrence et al. suggested that IKKα limits the activation of NF-κB in macrophages and therefore could be one of the candidate targets for downregulating inflammation [63]. Lawrence et al. performed various time-course experiments. Although they report very interesting results, they did not perform a systemic analysis, which might have influenced their final conclusion. For example, they relatively quantified the mRNA levels of several chemokines and cytokines at various time points for both wild type and IKKα mutant macrophages under lipopolysaccharide (LPS) stimulus (Figure 4a of Ref. [63]). Many of the mRNA levels show distinct features. For example, Bfl-1, A20 and GADD45β have similar response profiles, while KC and MIP-2 share a different profile. By grouping similar mRNA response profiles together and using systemic approaches, we could possibly determine the connectivity of genes, or gene networks, which may eventually help in better understanding the signalling network [64]. To understand the TLR4 signaling mechanism in a more systemic manner, we developed a computational model of the TLR4 signaling pathway using a deterministic approach [65,66]. The in silico model was designed to simulate, for both wild type and myeloid differentiation primary-response protein 88 (MyD88) knockout macrophages, all known protein interactions in the cytoplasm and the mRNA levels of two genes encoding the proteins IP-10 and TNF-α [65].
We proposed, through our systemic model, that the kink observed in the temporal profile of TNF-α mRNA in the IKKα wild type may not be an experimental artefact, but rather reflects the behavior of the TLR4 signaling network: the kink is a consequence of the superposition of two signals, one coming from the MyD88-dependent pathway and the other, delayed, from the MyD88-independent pathway (Figure 17.9). We subsequently also performed an in silico IKKα mutant simulation, which resulted in increased mRNA expression of TNF-α, in accordance with experimental findings (data not shown). However, in contrast to the concept that IKKα is a negative regulator of NF-κB, we observed that the IKKα mutation causes a bottleneck in the signaling upstream of the IKK complex, resulting in more flux through the alternative pathway of JNK and p38, which in turn increases the expression of TNF-α. Although this prediction is preliminary and requires further investigation, this alternative explanation of the role of IKKα shows how systemic approaches could elucidate nonintuitive behavior of biological networks. The resultant in silico hypothesis should then be tested at the wet bench.

Figure 17.9 Schematic of TLR4 signaling pathways. Upon LPS binding to TLR4, NF-κB is activated through two pathways, the MyD88-dependent pathway (early-phase NF-κB activation) and the MyD88-independent pathway (late-phase NF-κB activation). The MyD88-dependent pathway consists of MyD88, IRAK1 and 4, TRAF6, TAB/TAK and IKK complexes. The MyD88-independent pathway is less understood; hence, at this stage, the only universally accepted mechanism is that TRIF activates NF-κB. (From Miggin, S.M. and O’Neill, L.A., J. Leukoc. Biol., 80, 226, 2006. With permission.)

17.6 Future Prospects

Although the metabolic engineering field is constantly advancing with the introduction of modern combinatorial tools that explore cellular behavior, the quantitative optimization of biochemical products has yet to take huge strides forward. This is partly because our knowledge of biological network behavior is still very limited. One way to overcome this difficulty is to use the knowledge gained from studying network architectures found in nonbiological fields and apply such insights to biological interactions. Barabasi et al. studied the “wiring” of the world-wide-web and of social networks and found that complex networks are often organized in a scale-free manner [5,6]. The protein interaction network of Saccharomyces cerevisiae was later shown to possess similar characteristics [67]. Such observations, which suggest that certain (dominant) elements in a network are much more highly connected than others, could pave the way for the discovery of key “hubs” of biological (or metabolic) networks. The “hubs” could then be targeted, say, by drugs to halt key steps of disease progression. Therefore, uncovering the design principles of biological network construction is very important.

Another challenge lies with wet-bench research. Even though today we are presented with a deluge of biological information gathered from high throughput experimental sources such as microarray and mass spectrometry, the raw data generated are usually not in a form that can easily be used for in silico model analysis. Often there are difficulties in removing experimental artefacts such as noise (especially for low concentration species) and in the accurate deconvolution of spectral peaks (mass spectrometry). In addition, the reproducibility of high throughput data is also a major challenge. Nevertheless, the quantitation of metabolic phenotype is gradually improving. The slow but steady progress of systems biology will eventually not only advance metabolic engineering but also revolutionize industrial bioprocess output.

Acknowledgment

We thank Masa Tsuchiya from our institute, Koichi Matsuo of Keio Medical School, Lam Kong Peng of the Institute of Molecular and Cellular Biology, Singapore and Ravichandran Ramasamy of Columbia University, for reviewing the chapter.

References 1. Alon, U., Surette, M.G., Barkai, N., and Leibler, S. Robustness in bacterial chemotaxis. Nature, 397, 168, 1999. 2. Little, J.W., Shepley, D.P., and Wert, D.W. Robustness of a gene regulatory circuit. EMBO. J., 18, 4299, 1999. 3. Eldar, A., Dorfman, R., Weiss, D., Ashe, H., Shilo, B.Z., and Barkai, N. Robustness of the BMP morphogen gradient in Drosophila embryonic patterning. Nature, 419, 304, 2002. 4. Kitano, H. Biological robustness. Nat. Rev. Genet., 5, 826, 2004. 5. Barabási, A.L. and Albert, R. Emergence of scaling in random networks, Science. 286, 509, 1999. 6. Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., and Barabási, A.L. The large-scale organization of metabolic networks. Nature, 407, 651, 2000. 7. Hartwell, L.H., Hopfield, J.J., Leibler, S., and Murray, A.W. From molecular to modular cell biology. Nature, 402, C47, 1999. 8. Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., and Alon, U., Network motifs: Simple building blocks of complex networks. Science, 298, 824, 2002. 9. Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N., and Barabasi, A.L. Hierarchical organization of modularity in metabolic networks. Science, 297, 1551, 2002. 10. Spirin, V. and Mirny, L.A. Protein complexes and functional modules in molecular networks. Proc. Natl. Acad. Sci. USA., 100, 12123, 2003. 11. Barabasi, A.L. and Oltvai, Z.N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet., 5, 101, 2004. 12. Elowitz, M.B. and Leibler, S. A synthetic oscillatory network of transcriptional regulators. Nature, 403, 335, 2000. 13. Judd, E.M., Laub, M.T., and McAdams, H.H. Toggles and oscillators: new genetic circuit designs. Bioessays, 22, 507, 2000. 14. Kitano, H. Systems biology: a brief overview. Science, 295, 1662, 2002. 15. Kitano, H. Computational systems biology. Nature, 420, 206, 2002. 16. Gilchrist, M., Thorsson, V., Li, B., Rust, A.G., Korb, M., Kennedy, K., Hai, T., Bolouri, H., and Aderem, A. 
Systems biology approaches identify ATF3 as a negative regulator of Toll-like receptor 4. Nature, 441, 173, 2006. 17. Janes, K.A., Albeck, J.G., Gaudet, S., Sorger, P.K., Lauffenburger, D.A., and Yaffe, M.B. A systems model of signaling identifies a molecular basis set for cytokine-induced apoptosis. Science, 310, 1646, 2005. 18. Klipp, E., Nordlander, B., Krüger, R., Gennemark, P., and Hohmann, S. Integrative model of the response of yeast to osmotic shock. Nat. Biotechnol., 23, 975, 2005. 19. Thomas, S., Mooney, P.J.F., Burrell, M.M., and Fell, D.A. Metabolic control analysis of glycolysis in tuber tissue of potato (Solanum tuberosum): explanation for the low control coefficient of phosphofructokinase over respiratory flux. Biochem. J., 322, 119, 1997. 20. Schaaff, I., Heinisch, J., and Zimmermann, F.K. Overproduction of glycolytic enzymes in yeast. Yeast, 5, 285, 1989. 21. Tao, H., Gonzalez, R., Martinez, A., Rodriguez, M., Ingram, L.O., Preston, J.F., and Shanmugam, K.T. Engineering a homo-ethanol pathway in Escherichia coli: increased glycolytic flux and expression levels of glycolytic genes during xylose fermentation. J. Bacteriol., 183, 2979, 2001. 22. Stephanopoulos, G., Alper, H., and Moxley, J. Exploiting biological complexity for strain improvement through systems biology. Nat. Biotechnol., 22, 1261, 2004. 23. Baudin, A., Ozier-Kalogeropoulos, O., Denouel, A., Lacroute, F., and Cullin, C. A simple and efficient method for direct gene deletion in Saccharomyces cerevisiae. Nucleic Acids Res., 21, 3329, 1993. 24. Wach, A., Brachat, A., Pohlmann, R., and Philippsen, P.R. New heterologous modules for classical or PCR-based gene disruptions in Saccharomyces cerevisiae. Yeast 10, 1793, 1994.


25. Sweetlove, L.J., Last, R.L., and Fernie, A.R. Predictive metabolic engineering: a goal for systems biology. Plant Physiol., 132, 420, 2003. 26. Cornish-Bowden, A. Enzyme Kinetics. IRL Press, Oxford, U.K., 1988. 27. Fell, D.A. Understanding the Control of Metabolism. Portland Press, London, U.K., 1996. 28. Stephanopoulos, G.N., Aristidou, A.A., and Nielsen, J. Metabolic Engineering. Academic Press, London, U.K., 1998. 29. Edwards, J.S., Ibarra, R.U. and Palsson, B.O. In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat. Biotechnol., 19, 125, 2001. 30. Voit, E.O. Computational Analysis of Biochemical Systems: A Practical Guide for Biochemists and Molecular Biologists. Cambridge University Press, U.K., 2000. 31. Garrido-del, S.C., Garcia-Canovas, F., Havesteen, B.H., and Castellanos, R.V. Kinetic analysis of enzyme reactions with slow-binding inhibition. Biosystems, 51, 169, 1999. 32. Bailey, J.E. Complex biology with no parameters. Nat. Biotechnol., 19, 503, 2001. 33. Fong, S.S. and Palsson, B.Ø. Metabolic gene-deletion strains of Escherichia coli evolve to computationally predicted growth phenotypes. Nat. Gen., 36, 1056, 2004. 34. Segre, D., Vitkup, D., and Church, G.M. Analysis of optimality in natural and perturbed metabolic networks. Proc. Natl. Acad. Sci. USA., 99, 15112, 2002. 35. Alper, H., Jin, Y.S., Moxley, J.F., and Stephanopoulos, G. Identifying gene targets for the metabolic engineering of lycopene biosynthesis in Escherichia coli. Metab. Eng., 7, 155, 2005. 36. Alper, H., Miyaoku, K., and Stephanopoulos, G. Characterization of lycopene-overproducing E. coli strains in high cell density fermentations. Appl Microbiol Biotechnol., 72(5), 968–974, 2006. 37. Savageau, M.A. Biochemical systems analysis. II. The steady-state solutions for an n-pool system using a power-law approximation. J. Theor. Biol., 25, 370, 1969. 38. 
Farrelly, D., Brown, K.S., Tieman, A., Ren, J., Lira, S.A., Hagan, D., Gregg, R., Mookhtiar, K.A., and Hariharan, N., Mice mutant for glucokinase regulatory protein exhibit decreased liver glucokinase: a sequestration mechanism in metabolic regulation. Proc. Natl. Acad. Sci. USA., 96, 14511, 1999. 39. Gitai, Z. The new bacterial cell biology: moving parts and subcellular architecture. Cell, 120, 577, 2005. 40. Hall, D. and Minton, A.P. Macromolecular crowding: qualitative and semiquantitative successes, quantitative challenges. Biochim. Biophys. Acta. 1649, 127, 2003. 41. Kongas, O., Wagner, M.J., ter Veld, F., Nicolay, K., van Beek, J.H., and Krab, K. The mitochondrial outer membrane is not a major diffusion barrier for ADP in mouse heart skinned fibre bundles. Pflügers Arch. Eur. J. Physiol., 447, 840, 2004. 42. Tian, T., and Burrage, K. Stochastic models for regulatory networks of the genetic toggle switch. Proc. Natl. Acad. Sci. USA., 103, 8372, 2006. 43. Mulquiney, P.J. and Kuchel, P.W. Model of 2,3-bisphosphoglycerate metabolism in the human erythrocyte based on detailed enzyme kinetic equations: equations and parameter refinement. Biochem J., 342, 581, 1999. 44. Takahashi, K., Arjunan, S.N.V., and Tomita, M. Space in systems biology of signaling pathways— towards intracellular molecular crowding in silico. FEBS Lett., 579, 1783, 2005. 45. Lemerle, C., Di Ventura, B., and Serrano, L. Space as the final frontier in stochastic simulations of biological systems. FEBS Lett., 579, 1789, 2005. 46. Tolle, D. and Le Novère, N. Particle-based stochastic simulation in systems biology. Curr. Bioinform., 1, 315, 2006. 47. Perham, R.N. Swinging arms and swinging domains in multifunctional enzymes: catalytic machines for multistep reactions. Ann. Rev. Biochem. 69, 961, 2000. 48. Arjunan, S.N.V. and Tomita, M. 
A 3D pole-to-pole oscillation model of MinD spiral formation on growing Escherichia coli membrane at the 7th International Conference on Systems Biology, Yokohama, Japan, Oct 9–13, 2006.


49. Tomita, M., Hashimoto, K., Takahashi, K., Shimizu, T.S., Matsuzaki, Y., Miyoshi, F., Saito, K., Tanida, S., Yugi, K., Venter, J.C., and Hutchison, C.A. 3rd. E-CELL: software environment for whole-cell simulation. Bioinformatics, 15, 72, 1999. 50. Ishii, N., Robert, M., Nakayama, Y., Kanai, A., and Tomita, M. Toward large-scale modeling of the microbial cell for computer simulation. J. Biotechnol. 113, 281, 2004. 51. Takahashi, K., Kaizu, K., Hu, B., and Tomita, M. A multi-algorithm, multi-timescale method for cell simulation. Bioinformatics, 20, 538, 2004. 52. Elowitz, M.B., Levine, A.J., Siggia E.D., and Swain P.S. Stochastic gene expression in a single cell. Science, 297, 1183, 2002. 53. Yugi, K., Nakayama, Y., Kinoshita, A., and Tomita, M. Hybrid dynamic/static method for large-scale simulation of metabolism. Theor. Biol. Med. Model., 2, 42, 2005. 54. Kikuchi, S., Tominaga, D., Arita, M., Takahashi, K., and Tomita M. Dynamic modeling of genetic networks using genetic algorithm and S-system. Bioinformatics, 19, 643, 2003. 55. Sugimoto, M., Kikuchi, S., and Tomita, M. Reverse engineering of biochemical equations from timecourse data by means of genetic programming. Biosystems 80, 155, 2005. 56. Lenne, P.F., Wawrezinieck, L., Conchonaud. F., Wurtz, O., Boned, A., Guo, X.J., Rigneault, H., He, H.T., and Marguet, D. Dynamic molecular confinement in the plasma membrane by microdomains and the cytoskeleton meshwork. EMBO. J., 25(14), 3245–3256, 2006. 57. Mayhew, T.M., Griffiths, G., and Lucocq, J.M. Applications of an efficient method for comparing immunogold labelling patterns in the same sets of compartments in different groups of cells. Histochem. Cell Biol., 122, 171, 2004. 58. Weckwerth, W. Metabolomics in systems biology. Ann. Rev. Plant. Biol., 54, 669, 2003. 59. Theobald, U., Mailinger, W., Baltes, M., Rizzi, M., and Reuss, M. In vivo analysis of metabolic dynamics in Saccharomyces cerevisiae: I. Experimental observations. Biotechnol. 
Bioeng., 55, 305, 1997. 60. Visser, D., van Zuylen, G.A., van Dam, J.C., Eman, M.R., Proll, A., Ras, C., Wu, L., van Gulik, W.M., and Heijnen, J.J. Analysis of in vivo kinetics of glycolysis in aerobic Saccharomyces cerevisiae by application of glucose and ethanol pulses. Biotechnol. Bioeng., 88, 157, 2004. 61. Selvarajoo, K., and Tsuchiya, M. Systematic determination of biological network topology: Nonintegral connectivity method (NICM). In Introdution to System Biology. Choi S., Editor. Humana Press, Totowa, New Jersey, 449–471, 2007. 62. Miggin, S.M. and O’Neill, L.A. New insights into the regulation of TLR signaling. J. Leukoc. Biol., 80, 226, 2006. 63. Lawrence, T., Bebien, M., Liu, G.Y., Nizet, V., and Karin, M. IKKalpha limits macrophage NF-kappaB activation and contributes to the resolution of inflammation. Nature, 434, 1138, 2005. 64. Laub, M.T., McAdams, H.H., Feldblyum, T., Fraser, C.M., and Shapiro, L. Global analysis of the genetic network controlling a bacterial cell cycle. Science, 290, 2144, 2000. 65. Selvarajoo, K. Discovering differential activation machinery of the Toll-like receptor 4 signaling pathways in MyD88 knockouts. FEBS Lett. 580, 1457, 2006. 66. Selvarajoo, K. Decoding the signalling mechanism of Toll-like receptor 4 signaling pathways in wildtypes and knockouts. In E-Cell System: Basic Concepts and Applications. Dhar P.K., Arjunan S.N.V., and Tomita M., Editors. (Guest Edited Sankar Ghosh) Kluwer Academic Publishers–Landes Bioscience, Austin, Texas, 2007. (http://eurekah.com/chapter/3584) 67. Jeong, H., Mason, S.P., Barabasi, A.L., and Oltvai, Z.N. Lethality and centrality in protein networks. Nature, 411, 41, 2001.

V Tools for Experimentally Determining Flux through Pathways

Ralf Takors Evonik Degussa GmbH–Health & Nutrition

18 GC–MS for Metabolic Flux Analysis Christoph Wittmann
Introduction • GC–MS Instrumentation • Labeling Analysis • Metabolic Flux Studies Using GC–MS • Outlook

19 Tools for Measuring Intermediate and Product Formation Marco Oldiges
Introduction • Analytical Methods for Metabolomics • Biochemical Engineering Aspects of Metabolomics • Isotopically Nonstationary 13C Metabolic Flux Analysis • Outlook

20 Use of AEX-HPLC-ESI-MS for 13C-Labeling Based Metabolic Flux Analysis in Saccharomyces cerevisiae and Penicillium chrysogenum Wouter A. van Winden, Roelco J. Kleijn, Walter M. van Gulik, and Joseph J. Heijnen
Introduction • Methods • Results and Discussion • Conclusions

Since its foundation in the early 1990s (Bailey, 1991), metabolic engineering has repeatedly demonstrated its ability to provide a quantitative understanding of cell metabolism, and thus a sound basis for the target-oriented optimization of new production strains (Nielsen, 1998; 2001). Among all the tools developed for metabolic engineering, metabolic flux analysis (MFA) has played a dominant role (Wiechert, 2001) and can perhaps be regarded as the core of metabolic engineering (Stephanopoulos, 1999), even today. The (historical) development of MFA has already been outlined in excellent contributions (e.g., Stephanopoulos et al., 1998; Wiechert, 2001; 2002).

While early approaches were based solely on the analysis of the stoichiometric reaction matrix, using measured substrate consumption and product formation rates together with, e.g., assumed P/O ratios, subsequent approaches made use of 13C-labeling patterns originating from the consumption of, for instance, 13C-labeled glucose. The analysis of 13C patterns has typically focused on proteinogenic amino acids or secreted products. For analysis, NMR as well as GC–MS approaches were used. MFA based on 13C labeling patterns turned out to be superior to the previous approaches because (i) no additional model assumptions such as the P/O ratio were necessary; (ii) parallel, bidirectional, and/or cyclic fluxes could be identified; and (iii) the global fit of unknown flux parameters based on broad 13C-labeling information offers the possibility of detailed statistical analysis, e.g., for estimating flux accuracies.

Considering the successful MFA applications and technology developments so far, one might assume that the development of this tool is already complete and at its optimum. However, this is not true. The contributions of this section clearly indicate that the potential of 13C MFA is still great, offering new chances and challenges for novel analytical approaches, numerical methods, and experimental applications. In the first chapter (Chapter 18), the advantage of MS-based 13C analysis is outlined by Wittmann, focusing on GC–MS approaches for the analysis of labeled amino acids. In the following chapters, the idea of MS-based labeling analysis is developed further by using LC–MS approaches.
Here, not only amino acids but also metabolites of central metabolism are analyzed with respect to their 13C labeling patterns. Examples are given for Saccharomyces cerevisiae and Penicillium chrysogenum (Chapter 20, van Winden et al.) and for Escherichia coli (Chapter 19, Oldiges). Using 13C labeling information from the metabolome instead of the biomass hydrolysate offers a great advantage: because stationary labeling is reached faster in intracellular pools than in the amino acids of the biomass, the labeling period of the culture can be reduced significantly by metabolome analysis (Iwatani et al., 2007). Recent advances even indicate that metabolome-based 13C analysis does not necessarily require stationary labeling information, but can also be realized with nonstationary labeling information (Noeh and Wiechert, 2006; Noeh et al., 2007). This can further reduce the labeling period to less than 1 minute.

This ongoing development opens a new door for MFA applications. Based on the new techniques, it will be possible to analyze transient metabolic states. Production-driven questions like the following could then be studied in detail: How does cellular metabolism change during phases of different carbon supply in a fed-batch process? What are the carbon flux changes immediately after, e.g., induction of plasmid-encoded product formation? How does increasing product inhibition affect the flux distribution in central metabolism? Without a doubt, this list of potential nonstationary MFA applications can easily be extended. Hence, the ongoing development of tools for MFA will expand the range of MFA applications and further increase the importance of MFA for metabolic engineering. The basic technologies behind these current developments are presented in this section.
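The early, purely stoichiometric form of MFA mentioned above can be illustrated with a minimal sketch. The four-reaction network, the reaction names, and the measured rates below are hypothetical and chosen only so that the balance can be checked by hand; real networks are far larger and often require additional assumptions such as P/O ratios.

```python
import numpy as np

# Hypothetical toy network (not from the text):
#   v1: substrate uptake -> A      v2: A -> B
#   v3: A -> secreted product P    v4: B -> secreted product Q
# Rows are the intracellular metabolites A and B, columns are v1..v4.
S = np.array([
    [1.0, -1.0, -1.0,  0.0],   # A: formed by v1, consumed by v2 and v3
    [0.0,  1.0,  0.0, -1.0],   # B: formed by v2, consumed by v4
])

# Measured rates (hypothetical values), keyed by reaction column index.
measured = {0: 10.0, 2: 4.0}   # v1 (uptake) and v3 (product secretion)


def solve_unknown_fluxes(S, measured):
    """Solve the steady-state balance S @ v = 0 for the unmeasured fluxes."""
    n_reactions = S.shape[1]
    m_idx = sorted(measured)
    u_idx = [j for j in range(n_reactions) if j not in measured]
    v_m = np.array([measured[j] for j in m_idx])
    # S_u @ v_u = -S_m @ v_m, solved in the least-squares sense
    rhs = -S[:, m_idx] @ v_m
    v_u, *_ = np.linalg.lstsq(S[:, u_idx], rhs, rcond=None)
    v = np.zeros(n_reactions)
    v[m_idx] = v_m
    v[u_idx] = v_u
    return v


fluxes = solve_unknown_fluxes(S, measured)
print(dict(zip(["v1", "v2", "v3", "v4"], fluxes.round(3))))
```

With only extracellular rates, such a balance cannot resolve parallel, bidirectional, or cyclic fluxes, which is the gap that 13C labeling data fill.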

References
Bailey J. E. 1991. Towards a science of metabolic engineering. Science, 252, 692–696.
Iwatani S., Van Dien S., Shimbo K., Kubota K., Kageyama N., Iwahata D., Miyano H., Hirayama K., Usuda Y., Shimizu K., and Matsui K. 2007. Determination of metabolic flux changes during fed-batch cultivation from measurements of intracellular amino acids by LC-MS/MS. J. Biotechnol., 128, 93–111.
Nielsen J. 1998. Metabolic engineering: Techniques for analysis of targets for genetic manipulations. Biotechnol. Bioeng., 58(2), 125–132.
Nielsen J. 2001. Metabolic engineering. Appl. Microbiol. Biotechnol., 55, 263–283.
Noeh K. and Wiechert W. 2006. Experimental design and principles for isotopically instationary 13C labelling experiments. Biotechnol. Bioeng., 94(2), 234–251.
Noeh K., Grönke K., Luo B., Takors R., Oldiges M., and Wiechert W. 2007. Metabolic flux analysis at ultra short time scale: Isotopically non stationary 13C labeling experiments. J. Biotechnol., 129(2), 249–267.
Stephanopoulos G., Aristidou A. A., and Nielsen J. 1998. Metabolic Engineering: Principles and Methodologies. Academic Press, New York, ISBN 0-12-666260-6.
Stephanopoulos G. 1999. Metabolic fluxes and metabolic engineering. Metab. Eng., 1, 1–11.
Wiechert W. 2001. 13C metabolic flux analysis. Metab. Eng., 3, 195–206.
Wiechert W. 2002. Modeling and simulation: Tools for metabolic engineering. J. Biotechnol., 94, 37–63.

18 GC–MS for Metabolic Flux Analysis

Christoph Wittmann Technische Universität Braunschweig

18.1 Introduction
18.2 GC–MS Instrumentation
18.3 Labeling Analysis
18.4 Metabolic Flux Studies Using GC–MS
18.5 Outlook
References

18.1 Introduction

Mass spectrometry (MS) based metabolic flux approaches, mainly utilizing 13C-labeled tracer substrates, have emerged as a key technology in metabolic physiology and biotechnology [1–5]. MS plays a central role here by providing accurate 13C labeling data for metabolites formed during cultivation of the studied organism on the tracer substrate. These labeling data depend sensitively on the intracellular pathway fluxes and are used to estimate them. The flux calculation from the labeling data is performed either by a global fit of the unknown flux parameters [6–10] or by the calculation of local flux ratios in the network [11,12].

Among the different MS and NMR approaches available, GC–MS has turned out to be the most popular technique for labeling measurement in metabolic flux studies. This is due to several advantageous characteristics of GC–MS, such as high robustness, versatility, precision, and sensitivity [5], combined with the enormous separation capacity of the GC for the often complex biological mixtures [13]. Moreover, GC–MS provides labeling data with rich information content, so that important fine structures of the metabolic network, e.g., fluxes through parallel, bidirectional, or cyclic pathways, can be resolved. Combined with various simple one-pot protocols for sample derivatization, GC–MS allows the labeling analysis of a variety of metabolites [14] and is applicable to flux studies in many biological systems and cultivation conditions. This is underlined by recent examples from bacteria [10,11,15], yeasts [6,16,17], fungi [18], mammalian cells [19,20], and intact tissues [21,22].

The most prominent flux approach is based on GC–MS labeling analysis of amino acids, which generally have a high abundance in the studied cells [23] and provide an extensive set of labeling information. Given the known precursor-amino acid relationships, the labeling patterns of the precursor metabolites can easily be deduced from the labeling patterns of the amino acids [24]. The analysis of amino acids from cell protein hydrolysates is a straightforward approach to analyzing fluxes during steady state in chemostat culture or under conditions of balanced growth in batch culture [23,25]. Moreover, fluxes in non-growing cells or fluxes under dynamic conditions, e.g., in different phases of a production process, are also accessible via amino acid labeling measurement, by considering secreted compounds [9] or free intracellular pools that continuously adapt to the actual flux distribution [8].
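The idea of a local flux ratio can be sketched as follows. This is a simplified illustration, not the published ratio methods cited above: it assumes that a metabolite is formed by exactly two converging pathways whose products have known, distinct mass isotopomer distributions, and all numbers are hypothetical.

```python
import numpy as np


def flux_ratio(mid_product, mid_path_a, mid_path_b):
    """Estimate the fraction f of a metabolite pool formed via pathway A.

    Assumes product = f * A + (1 - f) * B for the mass isotopomer
    distributions (MIDs) and solves for f by least squares.
    """
    p = np.asarray(mid_product, dtype=float)
    a = np.asarray(mid_path_a, dtype=float)
    b = np.asarray(mid_path_b, dtype=float)
    d = a - b
    if not np.any(d):
        raise ValueError("pathway MIDs are identical; ratio is not identifiable")
    f = float(d @ (p - b) / (d @ d))
    # Clip to the physically meaningful range [0, 1]
    return min(1.0, max(0.0, f))


# Hypothetical MIDs (fractions of m+0 and m+1) for a one-carbon fragment
mid_a = [0.1, 0.9]          # product as it would look if made only via pathway A
mid_b = [0.9, 0.1]          # product as it would look if made only via pathway B
mid_measured = [0.66, 0.34]

f_a = flux_ratio(mid_measured, mid_a, mid_b)
print(f"fraction via pathway A: {f_a:.2f}")
```

A global fit follows the same principle but adjusts all free fluxes of the network simultaneously until the simulated labeling patterns match every measured one.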


18.2 GC–MS Instrumentation

GC–MS instruments are composed of a GC part for separation of the analytes of interest and an MS part, where ions are generated from the sample molecules and subsequently separated according to their mass-to-charge (m/z) ratio. Both parts are connected within the instrument via a heated transfer line. Recommended for flux analysis is a GC with a thin-film capillary column with an inner diameter of about 250 µm and a length of 30–60 m. The setup of the most commonly used type, the wall coated open tubular (WCOT) column, is shown in Figure 18.1a. Poly-dimethylsiloxane (DB-1) and poly-diphenyldimethylsiloxane (DB-5) are frequently used general-purpose stationary phases. They form the “liquid” film at the inner wall of the column (Figure 18.1b). Such capillary columns exhibit low column bleed and are thermally stable up to 320°C, which is suitable for elution of most derivatives.

Concerning the MS part, single-quadrupole mass spectrometers with electron impact (EI) ionization are most convenient for labeling analysis in flux studies. Such instruments are very robust in use and relatively inexpensive. EI ionization is based on the interaction of the sample stream with a crossing electron beam, which causes the loss of electrons from the sample molecules (Figure 18.2a). Usually a single electron is released, resulting in singly charged ions. Due to the high energy input, the molecular ion formed during ionization typically fragments into smaller ions. This leads to mass spectra with characteristic relative abundances of the formed ions, which provide a fingerprint of the analyte. Chemical ionization (CI) as an alternative ionization method plays a minor role in flux analysis [2,26–28].

The generated ions are separated in the quadrupole, which is composed of four parallel cylindrical rods (each about 25 cm long) (Figure 18.2b). The mass separation is based on the motion of the ions in an oscillating electric field created by voltage variation between the rods. At a certain voltage between the rods, only ions of a distinct mass-to-charge ratio can pass, whereas ions of different mass-to-charge ratios are subjected to oscillations that cause them to collide with the rods. Variation of the voltage between the rods creates an oscillating radio-frequency field. The quadrupole thus works as a mass filter. It allows a fast scan over the whole mass range of the instrument within a few seconds. The mass range of a single-quadrupole mass separator is typically between m/z 50 and 850. The sensitivity can be enhanced up to several thousandfold by selected ion monitoring (SIM), whereby only selected masses are measured sequentially, at intervals of 0.1–2 seconds each.

Other mass separators, such as time-of-flight (TOF) mass separators or ion traps, are typically not used for flux analysis. TOF instruments are much more expensive and do not provide substantial advantages for labeling analysis. A drawback of ion traps is the comparatively low accuracy of the labeling measurement [9,29], which results in a significantly higher uncertainty of the determined fluxes.

Figure 18.1 Cross section through a wall coated open tubular (WCOT) column with a fused silica capillary (a) and chemical structures of the most often used polysiloxane phases in GC–MS analysis (b). Labels in panel (a): mobile phase with carrier gas; “liquid” stationary phase as film on inner tube wall; fused silica; polyimide coating. Structures in panel (b): poly-dimethylsiloxane; poly-diphenyldimethylsiloxane.

Figure 18.2 Schematic view of electron impact ionization (a) and mass separation by a quadrupole (b). Labels in panel (a): cathode; electron beam (e–); anode; Uacc; from GC column; to mass separation. Labels in panel (b): quadrupole rods (ca. 25 cm length); to detector.

An advantage of ion traps, however, is that they can be used as multistage MSn mass spectrometers, which can provide more detailed information on the labeling pattern and is useful in selected cases [30]. Similarly, triple-quadrupole instruments can be operated as multistage MS.
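To make the selected ion monitoring (SIM) mode described above more concrete, the following sketch builds the list of m/z values to monitor for an ion cluster and estimates the time needed for one pass over all selected ions. The function names, the fixed per-ion dwell time, and the example fragment (nominal m/z 390 with three analyte carbons) are illustrative assumptions, not instrument settings taken from the text.

```python
def sim_ions(base_mz, n_carbons, extra=0):
    """Return the m/z values of an ion cluster to monitor in SIM mode.

    base_mz   nominal m/z of the unlabeled (m+0) fragment
    n_carbons number of analyte carbons in the fragment, each of which
              may carry one 13C label (mass shift of +1)
    extra     optional additional masses above m+n (an assumption, e.g. to
              cover natural heavy isotopes in the derivatization residue)
    """
    return [base_mz + shift for shift in range(n_carbons + extra + 1)]


def cycle_time(ion_lists, dwell_s):
    """Time for one pass over all selected ions at a fixed dwell time per ion."""
    return sum(len(ions) for ions in ion_lists) * dwell_s


# Illustrative fragment: nominal m/z 390 containing three analyte carbons
cluster = sim_ions(390, 3)
print(cluster)                              # m+0 ... m+3
print(cycle_time([cluster], dwell_s=0.1))   # seconds per pass (hypothetical dwell time)
```

Restricting the measurement to a few clusters per elution window is what buys the sensitivity gain: the detector spends its time only on the masses that carry labeling information.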

18.3 Labeling Analysis

The major compounds analyzed by GC–MS for flux analysis are amino acids [2,9,10,26], organic acids [2,26,31], sugars [27,32,33], lipids, and fatty acids [34]. Typically, the analytes are derivatized into volatile and thermally stable derivatives prior to labeling analysis. For this purpose, a number of simple one-pot derivatization reactions involving silylation, alkylation, or acetylation are available [35–40]. For the analysis of amino acids and related compounds, silylation with N-methyl-N-t-butyldimethylsilyl-trifluoroacetamide (MBDSTFA) into t-butyl-dimethylsilyl (TBDMS) derivatives has proven useful. Labeling analysis of amino acids in flux studies involving TBDMS derivatization has been applied successfully to protein hydrolysates [11,12,16,41,42] as well as to culture supernatants [7,9,10,43] and even cell extracts [8,44]. Alternatively, alkylation with activated methyl groups such as N,N-dimethylformamide-dimethylacetal, forming N-dimethyl-aminomethylene amino acid methyl esters, is also a useful derivatization for amino acid labeling analysis [23]. A total ion current (TIC) spectrum of a mixture containing various TBDMS-derivatized amino acids and related metabolites underlines the high resolution capacity of the GC. In the given case it allows separation of all analytes of interest within about 40 minutes (Figure 18.3).

Figure 18.3 Total ion current (TIC) spectrum of a sample containing TBDMS-derivatized amino acids and related metabolites. The separation of the 28 compounds in total is performed on an HP5-MS column (60 m, 250 µm inner diameter, Hewlett-Packard, Avondale, PA). Glutamine and tryptophan each form two different products, depending on whether the nitrogen in the side chain is derivatized. Axes: abundance versus time (min). Labeled peaks (elution order not recoverable here): alanine, glycine, α-aminobutyrate, β-alanine, valine, leucine, isoleucine, proline, 5-oxoproline, O-acetyl-homoserine, methionine, serine, threonine, homoserine, phenylalanine, aspartate, cysteine, glutamate, asparagine, ornithine, homocysteine, lysine, glutamine3, glutamine4, arginine, histidine, tyrosine, tryptophan2, tryptophan3, cystathionine.

The extraction of the labeling patterns from the individual eluting analytes is a critical step of metabolic flux analysis. It has to be ensured that the labeling patterns are not affected by the sampling, the sample pretreatment, or the GC–MS analysis itself. Potential problems to be considered carefully are isotopic discrimination effects during the GC separation [25], incomplete resolution of adjacent ions [25], nonlinearity of the detector [45,46], and isobaric interference with background noise or with coeluting compounds [44]. To avoid background interference, the mass of the formed derivative must be sufficiently high (usually above 175 apparent mass units) [47]. The consistency check of GC–MS mass isotopomer distributions is facilitated by efficient, recently developed software tools [48–50]. If all these points are properly addressed, the relative abundance of mass isotopomers can be measured with relative errors in the range of 0.1–0.5% [5].

Figure 18.4 displays the mass spectrum of TBDMS-derivatized serine, obtained by averaging the measurement signal over the elution time window of the compound. One should notice the fragment ion at m/z 390. This [M-57]+ ion results from the loss of a t-butyl group from the derivatization residue and contains all three carbons of the amino acid [51]. The analysis of the mass isotopomer distributions of these [M-57]+ amino acid fragments from a protein hydrolysate is sufficient to determine the fluxes through an extensive set of intracellular pathways in the central metabolism of different microorganisms [6,41].

Figure 18.4 Mass spectrum of TBDMS3-serine derived by electron impact ionization. The mass of the molecular ion, which itself is not detected, is 447. Valuable ion clusters for labeling analysis in metabolic flux studies are marked. Axes: abundance versus m/z. Marked ion clusters: [M-15]+ at m/z 432, [M-57]+ at m/z 390, [M-85]+ at m/z 362, [M-R]+ at m/z 302, and [M-159]+ at m/z 288. Further labeled peaks: m/z 174, 202, 230, 258, and 332.

In case more labeling information is required, additional fragments containing other parts of the carbon skeleton of the analyte can be taken into account. Combining the mass isotopomer distributions from these different fragments allows partial or even complete resolution of the positional isotopomer pools of a compound [23,52]. A number of such additional fragments are available from GC–MS analysis of TBDMS-derivatized compounds. In the mass spectrum in Figure 18.4, the ion clusters [M-85]+ and [M-159]+ are formed via the loss of the carbon C1 from serine. The ion cluster at m/z 302 is due to the release of the amino acid side chain and contains carbon atoms C1 and C2 of serine. Corresponding valuable fragments also result for most other compounds, which follow the same fragmentation pathways (Figure 18.5). Alternatively, labeling data from parallel tracer studies on different tracer substrates, representing complementary labeling information, can be combined for the same purpose [7,9,10]. A combination of both, parallel tracer studies and extended fragment analysis, has also been applied for flux analysis [11,16,42,53].

Figure 18.5 Chemical structure of a TBDMS-derivatized amino acid with the most prominent fragments yielded in GC–MS analysis with electron impact ionization. Cleavage at the denoted positions yields the following fragments: (a) [M-15]+ and a methyl group; (b) [M-57]+ and a t-butyl group; (c) [M-159]+ and C(O)O-TBDMS; (d) the fragment at m/z 302 (the doubly silylated C1–C2 fragment) and the side chain with possibly further TBDMS groups.
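How fragment data translate into labeling information can be sketched in a few lines. The example below converts the integrated intensities of an ion cluster into relative mass isotopomer fractions and then combines the [M-57]+ fragment (carbons C1–C3 of serine) with the [M-85]+ fragment (carbons C2–C3) to obtain the 13C enrichment at C1 by difference. The intensities are hypothetical, and the sketch assumes they have already been corrected for natural isotopes in the derivatization residue, a step that is not shown.

```python
def mass_isotopomer_fractions(intensities):
    """Normalize the intensities of an ion cluster (m+0, m+1, ...) to fractions."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("ion cluster has no signal")
    return [value / total for value in intensities]


def mean_labeled_carbons(fractions):
    """Average number of 13C atoms per fragment molecule."""
    return sum(shift * fraction for shift, fraction in enumerate(fractions))


# Hypothetical, already corrected intensities for TBDMS-serine
m57 = mass_isotopomer_fractions([500.0, 300.0, 150.0, 50.0])  # [M-57]+, carbons C1-C3
m85 = mass_isotopomer_fractions([600.0, 300.0, 100.0])        # [M-85]+, carbons C2-C3

# The fragments differ only in C1, so the difference in mean 13C content
# is the fractional 13C enrichment at position C1.
c1_enrichment = mean_labeled_carbons(m57) - mean_labeled_carbons(m85)
print(f"13C enrichment at C1: {c1_enrichment:.2f}")
```

The same difference logic extends to the m/z 302 cluster (C1–C2), which would separate the contributions of C2 and C3 in this illustrative case.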

18.4 Metabolic Flux Studies Using GC–MS Originally developed and applied mainly in the biomedical area 2,3 GC–MS based flux approaches have been extensively extended and optimized through recent years. State-of-art metabolic flux analysis is carried out by a comprehensive approach combining tracer studies on labeled substrates with GC–MS labeling analysis of metabolites formed in the cultivation with computational flux calculation based on modeling frameworks with a systematic and general approach for the quantitative description of the transfer of labeled 13C atoms in metabolic networks.4,5 An overview on metabolic flux studies using GC–MS is given in Table 18.1. Fluxes in bacteria, fungi, yeasts, mammalian cells, tissues, organs, and even humans and animals have been studied so far illustrating the enormous application potential of GC–MS for flux analysis. GC–MS based metabolic flux analysis has been applied to determine carbon fluxes throughout the entire central carbon metabolism involving, e.g., glycolysis, pentose phosphate pathway (PPP), Entner-Doudoroff pathway, gluconeogenesis, TCA cycle, glyoxylate shunt, carboxylation and decarboxylation reactions interconverting C3 and C 4 metabolites between glycolysis and TCA cycle, various anabolic pathways, or even transport reactions between different cellular compartments. Among the best studied organisms with respect to pathway fluxes is C. glutamicum, an industrially important soil bacterium for industrial production of amino acids. Flux studies applied to this organism have substantially contributed to our current understanding on functioning and regulation of metabolic networks. 
Prominent examples are comparative flux studies on different carbon sources,7,10 in chemostat culture under substrate limitation, 30 of different lysine producing mutants9,41,43 and in different phases of a production process.8 The quantitative understanding of the central metabolism gained by these flux studies has proven a valuable basis for rational strain improvement.41 As example, the outcome of such a flux study is displayed in Figure 18.6 for a lysine producing strain of C. glutamicum.7 In this particular case, fluxes were determined for lysine producing cells via the labeling pattern of secreted lysine, alanine and trehalose. To obtain a sufficient degree of information for flux calculation the labeling patterns were determined from two parallel experiments with [1-13C] labeled substrate and with an equimolar mixture of [13C6] labeled and nonlabeled substrate. Each flux shown in Figure 18.6 reflects the in vivo activity of the corresponding reaction, whereby all values are normalized to

18-6

Tools for Experimentally Determining Flux through Pathways Table 18.1 Selection of Prominent Metabolic Flux Studies in Different Biological Systems Based on Isotopic Tracer Experiments with GC–MS Analysis of Labeling Patterns Biological System

Scope of the Study and References

Bacteria
  C. glutamicum
    Fluxes on different substrates in batch culture [7,10]
    Fluxes in chemostat culture under substrate limitation [30]
    Comparison of fluxes in different lysine-producing mutants [9,41,43]
    Fluxes in different phases of a lysine production process [8]
    Fluxes at miniaturized scale (200 µL tracer cultivation) [43]
  E. coli
    Flux response to loss of key metabolic enzymes [57]
    Function of soluble and membrane-bound transhydrogenase [12]
    Fluxes during growth on different substrates [15]
    Effect of single gene deletion on intracellular fluxes [58–63]
  B. subtilis
    Fluxes during low and high riboflavin production [50]
    Flux response to gene deletions [42]
  B. clausii
    Fluxes on minimal and semi-rich medium [18]
  Other species
    Flux comparison in different species in batch culture [64]

Fungi
  P. chrysogenum
    Fluxes in an adipoyl-7-ADCA-producing strain [18]
  A. nidulans
    Influence of CreA on fluxes during growth on glucose and xylose [65]
  A. niger
    Fluxes in a glucoamylase-producing strain [66]

Yeasts
  S. cerevisiae
    Influence of environmental factors on TCA cycle flux [16]
    Comparison of different mutants [18]
    Fluxes at different growth rates in chemostat [6]
  Pichia anomala
    Oxygen- and glucose-dependent flux regulation [67]
  Other species
    Flux comparison in different species [16,53]

Mammalian cells
  Human cells
    Comparison of fluxes in lymphoblasts and fibroblasts [19]
  Adenocarcinoma cells
    Influence of butyrate on cell differentiation [68]
  Rat hepatocytes
    Forward and reverse fluxes in the glucose hepatic network [20]
  Rat hepatocytes
    Flux through gluconeogenesis [69]
  Rat renal tubules
    TCA cycle flux [31]
  Human hepatoma cells
    Study of the pentose phosphate pathway [70]

Tissues and organs
  Perfused rat liver
    Anaplerosis, cataplerosis, and TCA cycle [2,26]
  Perfused rat heart
    Anaplerosis, cataplerosis, and TCA cycle [21,71,72]

Humans and animals
  Rat
    In vivo quantification of DNA synthesis in rats [73–75]
  Rat
    Gluconeogenesis and TCA cycle flux [76]
  Human
    Quantification of protein turnover [77]
  Human
    Quantification of lipogenesis [78]
  Rat and human
    Quantification of gluconeogenesis [79]

Figure 18.6 In vivo carbon flux distribution in the central metabolism of Corynebacterium glutamicum during lysine production on glucose. The fluxes are estimated from the best fit to the experimental results from 13C tracer experiments with GC–MS labeling measurement. All fluxes are expressed as a molar percentage of the mean specific glucose uptake rate, which is set to 100%. (From Kiefer, P., Heinzle, E., Zelder, O., and Wittmann, C., Appl. Environ. Microbiol., 70 (1), 229–39, 2004. With permission.)

the specific glucose uptake rate, which is set to 100. As an example, the high flux through the pentose phosphate pathway (PPP) in the given case is linked to the high demand for NADPH. NADPH is required as a cofactor for the formation of lysine and is supplied by the two initial reactions of the pentose phosphate pathway. Comparative flux analysis of mutants with different lysine production yields visualizes the close coupling of PPP flux and lysine pathway flux [9]. The precision of the estimated fluxes benefits directly from the labeling patterns, which GC–MS measures precisely. This allows even slightly different phenotypes to be distinguished and is a great advantage of GC–MS flux approaches over, e.g., studies using NMR for the labeling measurement, which result in up to tenfold larger errors for the determined fluxes [9].
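
The normalization to the glucose uptake rate and the NADPH argument can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names and the absolute flux values are hypothetical and are not taken from the cited study. The NADPH bookkeeping assumes the common textbook stoichiometry of 2 NADPH formed per glucose 6-phosphate entering the oxidative pentose phosphate pathway and 4 NADPH consumed per lysine formed, and it ignores other sources and sinks such as isocitrate dehydrogenase and biomass synthesis.

```python
def normalize_fluxes(abs_fluxes, uptake_key="glucose_uptake"):
    """Express absolute fluxes as molar percentages of the substrate uptake flux."""
    uptake = abs_fluxes[uptake_key]
    if uptake <= 0:
        raise ValueError("substrate uptake flux must be positive")
    return {name: 100.0 * value / uptake for name, value in abs_fluxes.items()}


def nadph_balance(rel_fluxes, ppp_key="oxidative_ppp", lysine_key="lysine"):
    """Rough NADPH bookkeeping under the assumed stoichiometry.

    Assumption: 2 NADPH per unit of oxidative PPP flux (supply) and
    4 NADPH per unit of lysine flux (demand); all other reactions are ignored.
    """
    supply = 2.0 * rel_fluxes[ppp_key]
    demand = 4.0 * rel_fluxes[lysine_key]
    return supply, demand


# Hypothetical absolute fluxes in mmol per g cell dry weight per hour
# (illustrative values only, not measured data).
absolute = {"glucose_uptake": 4.0, "oxidative_ppp": 2.5, "lysine": 1.0}

relative = normalize_fluxes(absolute)      # glucose uptake becomes 100
supply, demand = nadph_balance(relative)   # both in the same relative units

print(relative)
print("NADPH supply from PPP:", supply, "NADPH demand for lysine:", demand)
```

Because all fluxes share the same reference, strains with different absolute uptake rates can be compared directly, which is what makes the coupling between PPP flux and lysine flux visible across mutants.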

18.5 Outlook

Owing to their high robustness, sensitivity, and versatility, GC–MS-based approaches will play a central role as a routine technology for the future analysis of metabolic fluxes in various biological systems. The use of stable isotopes in combination with GC–MS for metabolic flux analysis will become established as an important method in current fields such as functional genomics, systems biology, and pharmaceutical research for drug development [54,55]. Compatibility with miniaturized cultivation tools will allow its application to broad sets of mutants or cultivation conditions [12,43,56].

References
1. Christensen, B. and Nielsen, J. Metabolic network analysis. A powerful tool in metabolic engineering. Adv. Biochem. Eng. Biotechnol., 66, 209–31, 2000.
2. Chatham, J. C., Bouchard, B., and Des Rosiers, C. A comparison between NMR and GCMS 13C-isotopomer analysis in cardiac metabolism. Mol. Cell. Biochem., 249 (1–2), 105–12, 2003.
3. Kelleher, J. K. Flux estimation using isotopic tracers: common ground for metabolic physiology and metabolic engineering. Metab. Eng., 3 (2), 100–10, 2001.
4. Wiechert, W. 13C metabolic flux analysis. Metab. Eng., 3 (3), 195–206, 2001.
5. Wittmann, C. Metabolic flux analysis using mass spectrometry. Adv. Biochem. Eng. Biotechnol., 74, 39–64, 2002.
6. Frick, O. and Wittmann, C. Characterization of the metabolic shift between oxidative and fermentative growth in Saccharomyces cerevisiae by comparative 13C flux analysis. Microb. Cell Fact., 4, 30, 2005.
7. Kiefer, P., Heinzle, E., Zelder, O., and Wittmann, C. Comparative metabolic flux analysis of lysine-producing Corynebacterium glutamicum cultured on glucose or fructose. Appl. Environ. Microbiol., 70 (1), 229–39, 2004.
8. Kromer, J. O., Sorgenfrei, O., Klopprogge, K., Heinzle, E., and Wittmann, C. In-depth profiling of lysine-producing Corynebacterium glutamicum by combined analysis of the transcriptome, metabolome, and fluxome. J. Bacteriol., 186 (6), 1769–84, 2004.
9. Wittmann, C. and Heinzle, E. Genealogy profiling through strain improvement by using metabolic network analysis: metabolic flux genealogy of several generations of lysine-producing corynebacteria. Appl. Environ. Microbiol., 68 (12), 5843–59, 2002.
10. Wittmann, C., Kiefer, P., and Zelder, O. Metabolic fluxes in Corynebacterium glutamicum during lysine production with sucrose as carbon source. Appl. Environ. Microbiol., 70 (12), 7277–87, 2004.
11. Fischer, E. and Sauer, U. Metabolic flux profiling of Escherichia coli mutants in central carbon metabolism using GC–MS. Eur. J. Biochem., 270 (5), 880–91, 2003.
12. Sauer, U., Canonaco, F., Heri, S., Perrenoud, A., and Fischer, E. The soluble and membrane-bound transhydrogenases UdhA and PntAB have divergent functions in NADPH metabolism of Escherichia coli. J. Biol. Chem., 279 (8), 6613–19, 2004.
13. Villas-Boas, S. G., Mas, S., Akesson, M., Smedsgaard, J., and Nielsen, J. Mass spectrometry in metabolome analysis. Mass Spectrom. Rev., 24 (5), 613–46, 2005.
14. Villas-Boas, S. G., Hojer-Pedersen, J., Akesson, M., Smedsgaard, J., and Nielsen, J. Global metabolite analysis of yeast: evaluation of sample preparation methods. Yeast, 22 (14), 1155–69, 2005.
15. Zhao, J. and Shimizu, K. Metabolic flux analysis of Escherichia coli K12 grown on 13C-labeled acetate and glucose using GC–MS and powerful flux calculation method. J. Biotechnol., 101 (2), 101–17, 2003.
16. Blank, L. M. and Sauer, U. TCA cycle activity in Saccharomyces cerevisiae is a function of the environmentally determined specific growth and glucose uptake rates. Microbiology, 150 (4), 1085–93, 2004.

17. von Stockar, U., Valentinotti, S., Marison, I., Cannizzaro, C., and Herwig, C. Know-how and know-why in biochemical engineering. Biotechnol. Adv., 21 (5), 417–30, 2003.
18. Christensen, B., Christiansen, T., Gombert, A. K., Thykaer, J., and Nielsen, J. Simple and robust method for estimation of the split between the oxidative pentose phosphate pathway and the Embden-Meyerhof-Parnas pathway in microorganisms. Biotechnol. Bioeng., 74 (6), 517–23, 2001.
19. Lin, Y. Y., Cheng, W. B., and Wright, C. E. Glucose metabolism in mammalian cells as determined by mass isotopomer analysis. Anal. Biochem., 209 (2), 267–73, 1993.
20. Steegborn, C., Clausen, T., Sondermann, P., Jacob, U., Worbs, M., Marinkovic, S., Huber, R., and Wahl, M. C. Kinetics and inhibition of recombinant human cystathionine gamma-lyase. Toward the rational control of transsulfuration. J. Biol. Chem., 274 (18), 12675–84, 1999.
21. Comte, B., Vincent, G., Bouchard, B., Jette, M., Cordeau, S., and Rosiers, C. D. A 13C mass isotopomer study of anaplerotic pyruvate carboxylation in perfused rat hearts. J. Biol. Chem., 272 (42), 26125–31, 1997.
22. Khairallah, M., Labarthe, F., Bouchard, B., Danialou, G., Petrof, B. J., and Des Rosiers, C. Profiling substrate fluxes in the isolated working mouse heart using 13C-labeled substrates: focusing on the origin and fate of pyruvate and citrate carbons. Am. J. Physiol. Heart Circ. Physiol., 286 (4), H1461–70, 2004.
23. Christensen, B. and Nielsen, J. Isotopomer analysis using GC–MS. Metab. Eng., 1 (4), 282–90, 1999.
24. Szyperski, T. Biosynthetically directed fractional 13C-labeling of proteinogenic amino acids. An efficient analytical tool to investigate intermediary metabolism. Eur. J. Biochem., 232 (2), 433–48, 1995.
25. Dauner, M. and Sauer, U. GC–MS analysis of amino acids rapidly provides rich information for isotopomer balancing. Biotechnol. Prog., 16 (4), 642–49, 2000.
26. Di Donato, L., Des Rosiers, C., Montgomery, J. A., David, F., Garneau, M., and Brunengraber, H. Rates of gluconeogenesis and citric acid cycle in perfused livers, assessed from the mass spectrometric assay of the 13C labeling pattern of glutamate. J. Biol. Chem., 268 (6), 4170–80, 1993.
27. Hellerstein, M. K., Neese, R. A., Linfoot, P., Christiansen, M., Turner, S., and Letscher, A. Hepatic gluconeogenic fluxes and glycogen turnover during fasting in humans. A stable isotope study. J. Clin. Invest., 100 (5), 1305–19, 1997.
28. Hellerstein, M. K., Neese, R. A., Schwarz, J. M., Turner, S., Faix, D., and Wu, K. Altered fluxes responsible for reduced hepatic glucose production and gluconeogenesis by exogenous glucose in rats. Am. J. Physiol., 272 (1), E163–72, 1997.
29. Klapa, M. I., Aon, J. C., and Stephanopoulos, G. Ion-trap mass spectrometry used in combination with gas chromatography for high-resolution metabolic flux determination. Biotechniques, 34 (4), 832–36, 838, 840 passim, 2003.
30. Klapa, M. I., Aon, J. C., and Stephanopoulos, G. Systematic quantification of complex metabolic flux networks using stable isotopes and mass spectrometry. Eur. J. Biochem., 270 (17), 3525–42, 2003.
31. Nissim, I., Nissim, I., and Yudkoff, M. Carbon flux through tricarboxylic acid cycle in rat renal tubules. Biochim. Biophys. Acta, 1033 (2), 194–200, 1990.
32. Katz, J., Wals, P. A., and Lee, W. N. Determination of pathways of glycogen synthesis and the dilution of the three-carbon pool with [U-13C]glucose. Proc. Natl. Acad. Sci. USA, 88 (6), 2103–7, 1991.
33. Kelleher, J. K. Estimating gluconeogenesis with [U-13C]glucose: molecular condensation requires a molecular approach. Am. J. Physiol., 277 (3), E395–400, 1999.
34. Hellerstein, M. K., Schwarz, J. M., and Neese, R. A. Regulation of hepatic de novo lipogenesis in humans. Annu. Rev. Nutr., 16, 523–57, 1996.
35. Fox, A. Carbohydrate profiling of bacteria by gas chromatography-mass spectrometry and their trace detection in complex matrices by gas chromatography-tandem mass spectrometry. J. Chromatogr. A, 843 (1–2), 287–300, 1999.

36. Halket, J. M., Waterman, D., Przyborowska, A. M., Patel, R. K., Fraser, P. D., and Bramley, P. M. Chemical derivatization and mass spectral libraries in metabolic profiling by GC/MS and LC/MS/MS. J. Exp. Bot., 56 (410), 219–43, 2005.
37. Halket, J. M. and Zaikin, V. G. Derivatization in mass spectrometry—1. Silylation. Eur. J. Mass Spectrom., 9 (1), 1–21, 2003.
38. Segura, J., Ventura, R., and Jurado, C. Derivatization procedures for gas chromatographic-mass spectrometric determination of xenobiotics in biological samples, with special attention to drugs of abuse and doping agents. J. Chromatogr. B Biomed. Sci. Appl., 713 (1), 61–90, 1998.
39. Toyo’oka, T. Use of derivatization to improve the chromatographic properties and detection selectivity of physiologically important carboxylic acids. J. Chromatogr. B Biomed. Appl., 671 (1–2), 91–112, 1995.
40. Wasels, R. and Belleville, F. Gas chromatographic-mass spectrometric procedures used for the identification and determination of morphine, codeine and 6-monoacetylmorphine. J. Chromatogr. A, 674 (1–2), 225–34, 1994.
41. Becker, J., Klopprogge, C., Zelder, O., Heinzle, E., and Wittmann, C. Amplified expression of fructose 1,6-bisphosphatase in Corynebacterium glutamicum increases in vivo flux through the pentose phosphate pathway and lysine production on different carbon sources. Appl. Environ. Microbiol., 71 (12), 8587–96, 2005.
42. Alper, H., Fischer, C., Nevoigt, E., and Stephanopoulos, G. Tuning genetic control through promoter engineering. Proc. Natl. Acad. Sci. USA, 102 (36), 12678–83, 2005.
43. Wittmann, C., Kim, H. M., and Heinzle, E. Metabolic network analysis of lysine producing Corynebacterium glutamicum at a miniaturized scale. Biotechnol. Bioeng., 87 (1), 1–6, 2004.
44. Wittmann, C., Hans, M., and Heinzle, E. In vivo analysis of intracellular amino acid labelings by GC/MS. Anal. Biochem., 307 (2), 379–82, 2002.
45. Patterson, B. W. and Wolfe, R. R. Concentration dependence of methyl palmitate isotope ratios by electron impact ionization gas chromatography/mass spectrometry. Biol. Mass Spectrom., 22 (8), 481–86, 1993.
46. Vogt, J. A., Wachter, U., and Georgieff, M. Non-linearity in the quadrupole detector system: implications for the determination of the 13C mass distribution of an ion fragment. J. Mass Spectrom., 38 (2), 222–30, 2003.
47. Thompson, G. N., Pacy, P. J., Ford, G. C., and Halliday, D. Practical considerations in the use of stable isotope labeled compounds as tracers in clinical studies. Biomed. Environ. Mass Spectrom., 18 (5), 321–27, 1989.
48. Talwar, P., Wittmann, C., Lengauer, T., and Heinzle, E. Software tool for automated processing of 13C labeling data from mass spectrometric spectra. Biotechniques, 35 (6), 1214–15, 2003.
49. Wahl, S. A., Dauner, M., and Wiechert, W. New tools for mass isotopomer data evaluation in (13)C flux analysis: mass isotope correction, data consistency checking, and precursor relationships. Biotechnol. Bioeng., 85 (3), 259–68, 2004.
50. Dolzan, M., Johansson, K., Roig-Zamboni, V., Campanacci, V., Tegoni, M., Schneider, G., and Cambillau, C. Crystal structure and reactivity of YbdL from Escherichia coli identify a methionine aminotransferase function. FEBS Lett., 571 (1–3), 141–46, 2004.
51. Kitson, F. G., Larsen, B., and McEwen, C. N. Gas Chromatography and Mass Spectrometry: A Practical Guide. Academic Press, San Diego, 1996.
52. Wittmann, C. and Heinzle, E. Mass spectrometry for metabolic flux analysis. Biotechnol. Bioeng., 62 (6), 739–50, 1999.
53. Fiaux, J., Cakar, Z. P., Sonderegger, M., Wuthrich, K., Szyperski, T., and Sauer, U. Metabolic-flux profiling of the yeasts Saccharomyces cerevisiae and Pichia stipitis. Eukaryot. Cell, 2 (1), 170–80, 2003.
54. Hellerstein, M. K. New stable isotope-mass spectrometric techniques for measuring fluxes through intact metabolic pathways in mammalian systems: introduction of moving pictures into functional genomics and biochemical phenotyping. Metab. Eng., 6 (1), 85–100, 2004.

55. Hellerstein, M. K. In vivo measurement of fluxes through metabolic pathways: the missing link in functional genomics and pharmaceutical research. Annu. Rev. Nutr., 23, 379–402, 2003.
56. Sauer, U. High-throughput phenomics: experimental methods for mapping fluxomes. Curr. Opin. Biotechnol., 15 (1), 58–63, 2004.
57. Fong, S. S. and Palsson, B. O. Metabolic gene-deletion strains of Escherichia coli evolve to computationally predicted growth phenotypes. Nat. Genet., 36 (10), 1056–58, 2004.
58. Li, M., Ho, P. Y., Yao, S., and Shimizu, K. Effect of lpdA gene knockout on the metabolism in Escherichia coli based on enzyme activities, intracellular metabolite concentrations and metabolic flux analysis by 13C-labeling experiments. J. Biotechnol., 122 (2), 254–66, 2006.
59. Zhu, T., Phalakornkule, C., Ghosh, S., Grossmann, I. E., Koepsel, R. R., Ataai, M. M., and Domach, M. M. A metabolic network analysis & NMR experiment design tool with user interface-driven model construction for depth-first search analysis. Metab. Eng., 5 (2), 74–85, 2003.
60. Shimizu, K. Metabolic flux analysis based on 13C-labeling experiments and integration of the information with gene and protein expression patterns. Adv. Biochem. Eng. Biotechnol., 91, 1–49, 2004.
61. Peng, L., Arauzo-Bravo, M. J., and Shimizu, K. Metabolic flux analysis for a ppc mutant Escherichia coli based on 13C-labelling experiments together with enzyme activity assays and intracellular metabolite measurements. FEMS Microbiol. Lett., 235 (1), 17–23, 2004.
62. Xu, H., Zhang, Y., Guo, X., Ren, S., Staempfli, A. A., Chiao, J., Jiang, W., and Zhao, G. Isoleucine biosynthesis in Leptospira interrogans serotype lai strain 56601 proceeds via a threonine-independent pathway. J. Bacteriol., 186 (16), 5400–9, 2004.
63. Al Zaid Siddiquee, K., Arauzo-Bravo, M. J., and Shimizu, K. Metabolic flux analysis of pykF gene knockout Escherichia coli based on 13C-labeling experiments together with measurements of enzyme activities and intracellular metabolite concentrations. Appl. Microbiol. Biotechnol., 63 (4), 407–17, 2004.
64. Fuhrer, T., Fischer, E., and Sauer, U. Experimental identification and quantification of glucose metabolism in seven bacterial species. J. Bacteriol., 187 (5), 1581–90, 2005.
65. David, H., Krogh, A. M., Roca, C., Akesson, M., and Nielsen, J. CreA influences the metabolic fluxes of Aspergillus nidulans during growth on glucose and xylose. Microbiology, 151 (7), 2209–21, 2005.
66. Schmidt, K., Norregaard, L. C., Pedersen, B., Meissner, A., Duus, J. O., Nielsen, J. O., and Villadsen, J. Quantification of intracellular metabolic fluxes from fractional enrichment and 13C-13C coupling constraints on the isotopomer distribution in labeled biomass components. Metab. Eng., 1 (2), 166–79, 1999.
67. Fredlund, E., Blank, L. M., Schnurer, J., Sauer, U., and Passoth, V. Oxygen- and glucose-dependent regulation of central carbon metabolism in Pichia anomala. Appl. Environ. Microbiol., 70 (10), 5905–11, 2004.
68. Boren, J., Lee, W. N., Bassilian, S., Centelles, J. J., Lim, S., Ahmed, S., Boros, L. G., and Cascante, M. The stable isotope-based dynamic metabolic profile of butyrate-induced HT29 cell differentiation. J. Biol. Chem., 278 (31), 28395–402, 2003.
69. Arnoldi, L., Valsecchi, G., Magni, F., Monti, L. D., Piatti, P. M., Costa, S., and Kienle, M. G. Gluconeogenesis in isolated rat hepatocytes evaluated by gas chromatography/mass spectrometry using deuterated water. J. Mass Spectrom., 33 (5), 444–52, 1998.
70. Lee, W. N., Boros, L. G., Puigjaner, J., Bassilian, S., Lim, S., and Cascante, M. Mass isotopomer study of the nonoxidative pathways of the pentose cycle with [1,2-13C2]glucose. Am. J. Physiol., 274 (5), E843–51, 1998.
71. Poirier, M., Vincent, G., Reszko, A. E., Bouchard, B., Kelleher, J. K., Brunengraber, H., and Des Rosiers, C. Probing the link between citrate and malonyl-CoA in perfused rat hearts. Am. J. Physiol. Heart Circ. Physiol., 283 (4), H1379–86, 2002.
72. Vincent, G., Comte, B., Poirier, M., and Rosiers, C. D. Citrate release by perfused rat hearts: a window on mitochondrial cataplerosis. Am. J. Physiol. Endocrinol. Metab., 278 (5), E846–56, 2000.

73. Lee, S. Y., Lee, D. Y., and Kim, T. Y. Systems biotechnology for strain improvement. Trends Biotechnol., 23 (7), 349–58, 2005.
74. Collins, M. L., Eng, S., Hoh, R., and Hellerstein, M. K. Measurement of mitochondrial DNA synthesis in vivo using a stable isotope-mass spectrometric technique. J. Appl. Physiol., 94 (6), 2203–11, 2003.
75. Macallan, D. C., Fullerton, C. A., Neese, R. A., Haddock, K., Park, S. S., and Hellerstein, M. K. Measurement of cell proliferation by labeling of DNA with stable isotope-labeled glucose: studies in vitro, in animals, and in humans. Proc. Natl. Acad. Sci. USA, 95 (2), 708–13, 1998.
76. Katz, J., Wals, P., and Lee, W. N. Isotopomer studies of gluconeogenesis and the Krebs cycle with 13C-labeled lactate. J. Biol. Chem., 268 (34), 25509–21, 1993.
77. Busch, R., Kim, Y. K., Neese, R. A., Schade-Serin, V., Collins, M., Awada, M., Gardner, J. L., Beysen, C., Marino, M. E., Misell, L. M., and Hellerstein, M. K. Measurement of protein turnover rates by heavy water labeling of nonessential amino acids. Biochim. Biophys. Acta, 1760 (5), 730–44, 2006.
78. Siler, S. Q., Neese, R. A., and Hellerstein, M. K. De novo lipogenesis, lipid kinetics, and whole-body lipid balances in humans after acute alcohol consumption. Am. J. Clin. Nutr., 70 (5), 928–36, 1999.
79. Saadatian, M., Peroni, O., Diraison, F., and Beylot, M. In vivo measurement of gluconeogenesis in animals and humans with deuterated water: a simplified method. Diabetes Metab., 26 (3), 202–9, 2000.

19 Tools for Measuring Intermediate and Product Formation

Marco Oldiges Forschungszentrum Jülich GmbH

19.1 Introduction
19.2 Analytical Methods for Metabolomics
19.3 Biochemical Engineering Aspects of Metabolomics
19.4 Isotopically Nonstationary 13C Metabolic Flux Analysis
19.5 Outlook
Acknowledgment
References

19.1 Introduction

The measurement of the concentration of intermediates and products of metabolic pathways is part of the metabolomics field. Metabolomics deals with the identification and the qualitative and quantitative measurement of the metabolites acting in the biochemical network. In analogy to the genome, which is used as a synonym for the entirety of all genetic information, the metabolome represents the entirety of the metabolites within a biological system. The word genome is attributed to the botanist Hans Winkler, who formed it in 1920 as a portmanteau of the words gene and chromosome (Winkler, 1920). In the biological context the suffix -ome has been developed to “direct attention to holistic abstractions, an eventual goal, of which only a few parts may be initially in hand” (Lederberg and McCray, 2001). It has found its way into scientific colloquial language and is now an essential part of many buzzwords in the bio(techno)logical field.

The total number of metabolites depends on the biological system investigated. For microorganisms, several hundred metabolites have been described or postulated. The E. coli database EcoCyc contains 1170 metabolite entries (Keseler et al., 2005), and the metabolome of S. cerevisiae is estimated at around 600 metabolites (Forster et al., 2003); the major metabolites in E. coli (Nobeli et al., 2003) and S. cerevisiae (Forster et al., 2003) have a molecular weight below 300 g/mol. For the plant kingdom, more than 200,000 metabolites or phytochemicals, i.e., primary but mainly secondary metabolites, are expected (Mungur et al., 2005). Cell compartmentation in eukaryotes further increases complexity because metabolite levels differ depending on their localization in the organelles, whereas prokaryotes are assumed to have a much simpler inner structure (bag of enzymes).
Metabolomics can be expected to push forward the areas of functional genomics (Oliver et al., 1998; Fiehn et al., 2000; Raamsdonk et al., 2001; Fiehn, 2002; Allen et al., 2003; Kell et al., 2005), systems biology (Kell, 2004; Kell et al., 2005; Wendisch et al., 2006), and metabolic engineering (Nielsen, 2001; Raab et al., 2005). The term metabolomics is widely accepted in the scientific community to denote the scientific area that focuses on the analysis and interpretation of metabolite levels in a biological sample, e.g., by target analysis, profiling, fingerprinting, or footprinting. These neologisms of the postgenomic era are generally accepted in the context of metabolomics, but are sometimes used with different meanings. One obvious reason for this is the very rapid development of the metabolomics area in recent years, which is documented by the number of corresponding literature hits (Figure 19.1). These show almost exponential growth starting in the late 1990s, with an estimated mean doubling time of one year. Hence, the scientific terms belonging to metabolomics activities are still being finalized and have yet to find their appropriate definitions.

Compared to genome, transcriptome, and proteome, the term metabolome was coined much later, although studies focusing on the metabolome had, of course, been performed before that point. In 1998, Oliver et al. mentioned metabolome analysis in the context of functional genomics (Oliver et al., 1998), and Tweeddale et al. discussed the term with respect to the phenotype investigation of E. coli (Tweeddale et al., 1998). Fiehn (2002) gave the first more detailed definition of metabolomics and its subsections. According to Fiehn (2002), metabolomics is multifunctional, making use of different analytical approaches depending on the topic of the study. It is subdivided into (i) target analysis, aiming at quantitative analysis of substrate and/or product metabolites of a target protein; (ii) metabolic profiling, focusing on quantitative analysis of a set of predefined metabolites belonging to a class of compounds or members of particular pathways or a linked group of metabolites; and (iii) metabolomics, striving for an unbiased overview of whole-cell metabolic patterns. For a more rapid analysis, metabolic fingerprinting (Fiehn, 2001) can be used, which reduces the analytical effort to the analysis of those metabolites with biochemical relevance.
In target analysis, the bulk of metabolome information is usually ignored because of the narrow focus on specific compounds. The benefit is that selective sample preparation steps can be applied and optimized to improve the data quality, leading to highly sensitive and precise pool quantification. In contrast, following Fiehn's unbiased metabolomics approach, sample preparation must not be compound specific, to prevent any exclusion of potentially interesting metabolite signals. This is very difficult or almost impossible to achieve, so the choice of the sample processing protocol will influence the final outcome. The basic requirement is the reproducibility of the protocol to allow direct comparison between samples, although the protocol potentially discriminates between chemical properties of the compounds. Hence, metabolomics provides a comprehensive overview of the biological system; however, quantitative information about, for instance, intracellular pool sizes is usually not accessible or of insufficient accuracy.

Figure 19.1 Course of the literature hits for the search term “metabolom*” in the electronic ISI Web of Knowledge database (number of publications versus year of publication, 1997–2007).
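
The doubling time quoted for the literature hits can be estimated from yearly counts by a log-linear fit: exponential growth N(t) = N0 · 2^((t − t0)/Td) gives a straight line with slope 1/Td when log2 N is plotted against t. The following is a minimal sketch of that calculation. The function name is hypothetical, and the counts are invented so that they double every year; they are not the values shown in Figure 19.1.

```python
import math


def doubling_time(years, counts):
    """Estimate the doubling time (in years) from yearly counts.

    Fits log2(count) against year by ordinary least squares and
    returns the reciprocal of the slope.
    """
    if len(years) != len(counts) or len(years) < 2:
        raise ValueError("need at least two (year, count) pairs")
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be positive to take logarithms")

    logs = [math.log2(c) for c in counts]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    if sxx == 0:
        raise ValueError("years must not all be identical")
    slope = sxy / sxx  # doublings per year
    if slope <= 0:
        raise ValueError("counts do not grow; doubling time is undefined")
    return 1.0 / slope


# Hypothetical publication counts (illustrative only).
years = [2000, 2001, 2002, 2003, 2004]
counts = [5, 10, 20, 40, 80]
print(doubling_time(years, counts))
```

With real counts the fit will not be exact, and the quality of the straight-line fit indicates how well the "almost exponential" description holds over the chosen period.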

Metabolic profiling aims at the quantification of preselected metabolic pathways or groups of metabolites in a metabolic segment and promises to be an effective method for investigating microbial metabolism in a quantitative manner. Knowledge of the changes in intracellular metabolite concentrations offers direct access to the identification of enzyme kinetics, especially the in vivo kinetics of the underlying metabolic network (Rizzi et al., 1997; Theobald et al., 1997; Buchholz et al., 2002; Chassagnole et al., 2002; Visser et al., 2002; Wiechert, 2002; Wiechert and Takors, 2003), or may elucidate limiting or inhibiting biosynthetic steps, which can be used for iterative strain optimization (Oldiges et al., 2004; Magnus et al., 2006). Allen et al. (2003) presented another variant of high-throughput classification, called metabolic footprinting, which differs from the aforementioned methods by analyzing extracellular metabolites instead of intracellular pools. Footprinting analysis can have some advantages over fingerprinting analysis. Most notably, no time-consuming quenching and extraction steps are required when metabolites in the extracellular medium of bacterial cultures are analyzed. However, the full pattern of intracellular metabolome information (e.g., phosphorylated compounds and other highly charged metabolites) is unlikely to be present outside the cell, which limits this approach compared to metabolic fingerprinting. Nielsen and Oliver (2005) introduced the terms endo- and exo-metabolome to distinguish between intra- and extracellular metabolites. Villas-Boas et al. (2005a) proposed a different definition, arguing that metabolomics in the sense of Fiehn (2002) “is still an impossible task owing to the complexity of the metabolome” and should be taken as the scientific keyword instead of an analytical subsection.
In addition, they suggest that metabolomics be divided only into target analysis and metabolic profiling, the latter containing fingerprinting and footprinting. Although this is a reasonable definition, the aspect of compound identification and quantification is not sufficiently addressed. The basic difference between metabolic profiling and foot-/fingerprinting lies in the quantification of the metabolites that the metabolic profiling approach requires. In contrast, footprint and fingerprint data do not require this quantitative information; in these fully unbiased approaches, semiquantitative data of identified or even unknown metabolite peaks are processed to gain more insight. Following this view, the difference between target analysis and metabolic profiling is more gradual and is mainly determined by the number of metabolites and the higher data quality that can be achieved when analyzing only specific targets. Based on these arguments, Table 19.1 and Figure 19.2 summarize the different metabolomics approaches based on Fiehn (2002), partly modified according to Allen et al. (2003), Nielsen and Oliver (2005), and Villas-Boas et al. (2005a). All these approaches are usually applied to detect differences in metabolite concentrations after genetic modification or changes in the cell's environment, which have a direct influence on the metabolite levels. Since the metabolites are downstream of all genome and proteome regulatory structures, they provide valuable information about the regulatory or catalytic properties of a gene product. It has become evident that the metabolome plays an important role in the control of cellular functions and metabolic network operation (Fell, 2001; Raamsdonk et al., 2001; Even et al., 2003; Fiehn and Weckwerth, 2003).
This includes the application of metabolomics for functional genomics (i.e., genotype–phenotype correlation), fermentation monitoring, detection of metabolic dysfunctions and disease control, but also microbial strain identification, elucidation of regulatory networks and their hierarchy, and finally the directed improvement of biocatalysts (i.e., enzymes or whole cells) in metabolic engineering for white biotechnology applications (Wang et al., 2006). Metabolomics is now gaining much attention, which is also visible in the formation of the Metabolomics Society (www.metabolomicssociety.org) and the launch of the journal Metabolomics (Goodacre, 2005). For readers seeking recent reviews of the metabolomics field, the following surveys might be useful (Fiehn, 2002; Fiehn and Weckwerth, 2003; Sumner et al., 2003; Kell, 2004; Brown et al., 2005; Dunn et al., 2005; Rochfort, 2005; Villas-Boas et al., 2005a; Hall, 2006; Hollywood et al., 2006; Wang et al., 2006; Dettmer et al., 2007), especially an excellent review of microbial metabolomics (Mashego et al., 2007).

Table 19.1 Definitions Used in Metabolomics Studies

Term: Definition
Metabolite: Bio-active small molecule involved in the biochemical network
Metabolome: Set of all metabolites present in a biological system (e.g., cell)
Exo-metabolome: Metabolites present in the extracellular surrounding of the cell (e.g., culture supernatant)
Endo-metabolome: Metabolites present inside the cell (sometimes separated by compartmentation)
Metabolic quenching: Immediate stop of all metabolic activity (preferably during the sample generation procedure)
Metabolomics: Quantification^a of the metabolome (almost impossible to achieve so far) and keyword for this scientific branch
Target analysis: Quantitative^a analysis of substrate and/or product metabolites of a target protein
Metabolic profiling: Quantitative^a analysis of a set of predefined metabolites belonging to a class of compounds or members of particular pathways or a linked group of metabolites (e.g., sugars, sugar phosphates, lipids, organic acids)
Metabolic fingerprinting: Semiquantitative^b analysis of the endo-metabolome
Metabolic footprinting: Semiquantitative^b analysis of the exo-metabolome

a Quantification requires unambiguous qualitative identification of the metabolites and correction for analytical inaccuracies (e.g., due to matrix effects) and results in an absolute concentration value.
b Semiquantitative data originate from peak areas or heights and are error-prone due to sample or matrix effects, but are suitable for direct comparison of the relative changes in metabolite content between similar samples.
Sources: Initially based on Fiehn (Plant Molecular Biology, 48(1–2), 155–71, 2002) and adapted according to Allen et al. (Nature Biotechnology, 21(6), 692–96, 2003); Nielsen and Oliver (Trends in Biotechnology, 23(11), 544–46, 2005); and Villas-Boas et al. (Mass Spectrometry Reviews, 24(5), 613–46, 2005a).
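
The difference between the quantitative and semiquantitative data described in the table footnotes can be sketched as follows. This is a simplified illustration, not a validated protocol: all function names and numbers are hypothetical, and the calibration is a plain external standard curve that does not include the correction for matrix effects that true quantification requires.

```python
def relative_change(area_sample, area_reference):
    """Semiquantitative comparison: ratio of peak areas of the same
    metabolite in two similar samples (no absolute concentration)."""
    if area_reference <= 0:
        raise ValueError("reference peak area must be positive")
    return area_sample / area_reference


def fit_calibration(concentrations, areas):
    """Least-squares line area = slope * concentration + intercept,
    fitted to standards of known concentration."""
    n = len(concentrations)
    if n != len(areas) or n < 2:
        raise ValueError("need at least two calibration standards")
    mean_c = sum(concentrations) / n
    mean_a = sum(areas) / n
    scc = sum((c - mean_c) ** 2 for c in concentrations)
    sca = sum((c - mean_c) * (a - mean_a) for c, a in zip(concentrations, areas))
    if scc == 0:
        raise ValueError("standards must span more than one concentration")
    slope = sca / scc
    intercept = mean_a - slope * mean_c
    return slope, intercept


def quantify(area, slope, intercept):
    """Quantitative result: convert a peak area into a concentration
    using the calibration line."""
    if slope == 0:
        raise ValueError("calibration slope must be nonzero")
    return (area - intercept) / slope


# Hypothetical calibration standards (concentration in µM, peak area in
# arbitrary units); illustrative values only.
std_conc = [1.0, 2.0, 4.0, 8.0]
std_area = [110.0, 210.0, 410.0, 810.0]
slope, intercept = fit_calibration(std_conc, std_area)

print(relative_change(300.0, 150.0))       # fold change between two samples
print(quantify(510.0, slope, intercept))   # concentration in µM
```

The fold change needs only two comparable peak areas, whereas the concentration value depends on identified standards and a calibration, which is why fingerprinting and footprinting scale to many unknown peaks while target analysis and profiling do not.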

[Figure 19.2: schematic of the metabolic pathway classes of a cell, annotated with endo-metabolome, exo-metabolome, target analysis, metabolic profiling, metabolic fingerprinting, and metabolic footprinting.]

Figure 19.2 Classification of different approaches of metabolomics investigations with respect to comprehension and metabolite localization.

19.2 Analytical Methods for Metabolomics

In the past, a variety of analytical techniques have been adopted and used for metabolite analysis. Among them are enzymatic assays (Bergmeyer, 1984), thin layer chromatography (TLC) (Tweeddale et al., 1998), gas chromatography (GC) (Womersley, 1981), nuclear magnetic resonance (NMR) (Ogino, 1982; De Graaf et al., 1999; Neves et al., 1999; Teleman et al., 1999), high pressure liquid chromatography (HPLC) (Bhattacharya et al., 1995; Smits et al., 1998; Meyer et al., 1999) and especially mass spectrometry (MS) by direct infusion (Castrillo et al., 2003) or coupled to chromatographic techniques, like GC (Harvey and Horning, 1973; Katona et al., 1999; Roessner et al., 2000; Fuzfai et al., 2004; Koek et al., 2006; Kopka, 2006), CE (Soga et al., 2002; Soga et al., 2003; Edwards et al., 2006; Ramautar et al., 2006) and LC (Feurle et al., 1998; Buchholz et al., 2001).

Tools for Measuring Intermediate and Product Formation


In principle, almost every analytical technique or detection method can be employed for the different parts of metabolomics, but in practice there is a clear trend toward highly selective and sensitive multifunctional methods that allow the reliable detection of a large spectrum of compounds while requiring only small sample volumes. The metabolome constitutes only a minor part of the cell dry weight in bacteria (e.g., 3–5% for enteric bacteria; Neidhardt and Umbarger, 1996), and 85% of the metabolites of E. coli are reported to have a molecular weight of less than 500 g/mol (Nobeli et al., 2003). Hence, the sample volume is limited and dilution steps are unfavorable because of the typically low metabolite concentrations in the cell, which can be down to the pM range (Dunn et al., 2005). In contrast to proteins and DNA/RNA, the metabolite solution cannot be concentrated by ultrafiltration because of the low molecular weight of the metabolites. The detector should also allow a specific determination of the compounds of interest and should reduce interferences from the complex sample matrix to a minimum. The different analytical technologies for measuring the metabolome have been summarized by Dunn et al. (2005); Dettmer et al. (2007) and Villas-Boas et al. (2005a) focused on MS techniques, and Brown et al. (2005) reviewed Fourier transform–ion cyclotron resonance–MS (FT–ICR–MS) based metabolome analysis. Research of recent years shows that MS serves as an essential tool for metabolome analysis, because it resolves metabolites by mass, which is orthogonal to chromatographic separation. Coupled to a chromatographic setup, e.g., LC, GC, or CE, this provides a very powerful analytical system. Robust and effective MS interfaces for GC coupling are long established, whereas the development of atmospheric pressure based interfaces for LC/CE-MS coupling made substantial progress during the late 1990s and the following years.
Because metabolites are typically polar to ionic, electrospray ionization (ESI) (Fenn et al., 1989; Tarr et al., 2000) is a very important technique for metabolome analysis. The further development of ESI interfaces toward higher efficiency and sensitivity, but also robustness against matrix effects from dirty biological samples, was a cornerstone for LC-MS based metabolomics. Even more resolution or sensitivity is achieved with tandem MS detectors or two-dimensional chromatography, as in LC-MS/MS (Tolstikov and Fiehn, 2002; van Dam et al., 2002; Dalluge et al., 2004; Monique Piraud, 2005; Bajad et al., 2006; Coulier et al., 2006; Luo et al., 2007) or GCxGC-MS (Marriott and Shellie, 2002; Shellie et al., 2005; Welthagen et al., 2005) applications. Owing to their higher resolution, chromatography-coupled MS-based detection systems have shown superior performance for metabolomics. An outstanding advantage of MS is its ability to distinguish between isotopes. This feature is widely used for stationary 13C metabolic flux analysis to determine the 13C-labeling in proteinogenic amino acids after long-term feeding of a 13C-labeled substrate (e.g., 13C-glucose) (Christensen and Nielsen, 1999; Dauner and Sauer, 2000). With the development of more advanced LC-MS/MS methods (van Dam et al., 2002; Luo et al., 2007) it is now possible to determine labeling information of intracellular central metabolism metabolites (Mashego et al., 2004; van Winden et al., 2005; Iwatani et al., 2007; Noh et al., 2007). Together with the absolute metabolite concentration, which is determined in parallel, this opens up the opportunity for isotopically nonstationary 13C metabolic flux analysis (Noh et al., 2006), enabling rapid 13C metabolic flux analysis, as reported for the first time using E. coli as a model system (Noh et al., 2007).
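As a minimal sketch of how MS isotope discrimination turns into labeling data, the following Python fragment normalizes raw peak intensities of the mass isotopomers M+0 to M+n into fractions and computes an average 13C enrichment. The function names and the intensity values are hypothetical and chosen for illustration only; no correction for naturally occurring isotopes is applied, which a real evaluation would normally include.

```python
def mass_isotopomer_fractions(intensities):
    """Normalize raw intensities of M+0 ... M+n so that the fractions sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [value / total for value in intensities]


def fractional_enrichment(fractions):
    """Average 13C enrichment: sum(k * f_k) / n for a molecule with n carbon atoms."""
    n_carbons = len(fractions) - 1
    return sum(k * f for k, f in enumerate(fractions)) / n_carbons


# Hypothetical intensities (arbitrary units) for a six-carbon metabolite, M+0 ... M+6.
raw_intensities = [200.0, 50.0, 0.0, 0.0, 0.0, 150.0, 600.0]
fractions = mass_isotopomer_fractions(raw_intensities)
enrichment = fractional_enrichment(fractions)
print(fractions, round(enrichment, 4))
```

The same normalization applies to any labeled element; only the mass shift per incorporated isotope changes.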
MS-based isotope discrimination is not limited to carbon; in principle, all other biologically relevant isotopes can be employed (e.g., 2H, 15N, 18O). An essential but still very time-consuming task is the data processing after the LC-, GC-, or CE-MS analysis, i.e., peak assignment, integration, and manual rework of the measured MS chromatograms. Especially in multicomponent analysis, shifting retention times due to variances in the metabolome content or matrix background, incorrect peak identification, or improperly integrated compound peaks require a lot of manual work and control after the measurement. This data postprocessing is time consuming, but it is a necessary prerequisite to maintain sufficient data quality, especially when quantitative metabolite concentrations are the final goal. Although MS devices are usually supplied with data processing and analysis software, the functionality of these software packages for chromatographic MS raw data analysis is often limited. This leads to the unfavorable situation that the


data postprocessing can take several times longer than the pure measurement time of the analytical hardware. Hence, the manually guided postprocessing of the raw data can be a major bottleneck in the total metabolome analysis. Although first steps have been made to tackle this challenge with software tools (Stein, 1999; Duran et al., 2003; Katz et al., 2004; Baran et al., 2006; Broeckling et al., 2006; Katajamaa et al., 2006; van den Berg et al., 2006), the major breakthrough concerning routine and robust application still seems to lie in the future. With respect to data quality, different quantification strategies can be employed, although unambiguous qualitative peak identification is required to obtain quantitative or semiquantitative data for a compound. This is typically only possible by use of a reference standard of the respective compound. Because reference standards are sometimes unavailable, a reasonable compromise has to be found for such analytes. For semiquantitative data, simply the measured peak areas or heights (both in arbitrary units) are taken. These values are error-prone due to sample or matrix effects, but are suitable for direct comparison of the relative changes in metabolite content between similar samples. This negative analytical interference, usually called matrix effect, is often a result of ion suppression on the response of metabolites in complex biological fluids (Annesley, 2003). Every compound that coelutes with the analyte to be quantified, even if it is not detected during the analysis, can be a source of variation in the MS signal of the analyte being measured (Monique Piraud, 2005). This drawback can be overcome by the adoption of internal standards (IS), known from pharmaceutical analytics (Rashed, 2001; Zimmer, 2003).
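The effect of an internal standard on a semiquantitative comparison can be sketched as follows. The example assumes that the IS coelutes with the analyte and is suppressed to the same degree; the function names and peak areas are hypothetical.

```python
def response_ratio(analyte_area, is_area):
    """Peak area of the analyte divided by the peak area of the internal standard."""
    if is_area <= 0:
        raise ValueError("internal standard area must be positive")
    return analyte_area / is_area


def relative_change(reference, sample):
    """Fold change of the analyte between two samples after IS normalization.

    Each argument is a tuple (analyte_area, is_area) in arbitrary units.
    """
    return response_ratio(*sample) / response_ratio(*reference)


# Hypothetical areas: the same amount of IS was spiked into both samples, but the
# second sample matrix suppresses the signal (IS area drops from 500 to 300).
reference = (1000.0, 500.0)
mutant = (900.0, 300.0)

raw_fold = mutant[0] / reference[0]                # 0.9, suggests a decrease
corrected_fold = relative_change(reference, mutant)  # 1.5, an increase after correction
print(raw_fold, corrected_fold)
```

In this constructed case the raw peak areas point in the wrong direction, which is the kind of matrix-induced error the paragraph above describes.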
The IS used corrects not only for matrix effects like ion suppression but, depending on the time point of addition, also for losses during sample preparation and variations of the overall analytical procedure. For truly quantitative information about pool sizes in metabolome analysis, reference standards are indispensable, and external standard (ES), IS, standard-addition (SA) (Bader, 1980) or isotope-dilution (ID) (Wu et al., 2005) calibration methods can be used. ES calibration is only appropriate if no matrix interference is observed, e.g., at sufficient sample dilution. As mentioned before, IS, SA and ID additionally correct for matrix effects and can provide a valid metabolite concentration (in mM) of the sample in the presence of the whole dirty sample matrix. Recent publications demonstrated the benefits of ID with stable isotope labeling of the analytes for metabolome quantification in S. cerevisiae and Salmonella enterica (Lafaye et al., 2005; Wu et al., 2005; Lu et al., 2006). An elegant route to such stable isotope labeled material for ID application is microbial cultivation with labeled substrates (e.g., 13C6 glucose), resulting in labeled metabolites after cell extraction (Wu et al., 2005).
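Standard addition can be illustrated with a short calculation. The sketch below assumes a linear detector response and neglects the dilution caused by spiking; the function names and the data are hypothetical. The signal is regressed against the added concentration, and the unknown sample concentration is the intercept divided by the slope.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def standard_addition(added_conc, signal):
    """Estimate the original analyte concentration from spiked samples.

    signal = slope * (c0 + added), so c0 = intercept / slope.
    """
    slope, intercept = fit_line(added_conc, signal)
    return intercept / slope


# Hypothetical data: spikes in mM and peak areas in arbitrary units.
added = [0.0, 0.5, 1.0, 2.0]
areas = [400.0, 900.0, 1400.0, 2400.0]
c0 = standard_addition(added, areas)
print(round(c0, 3))
```

Because every calibration point is measured in the sample matrix itself, the slope already contains the matrix effect, which is why SA does not need a matrix-free calibration.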

19.3 Biochemical Engineering Aspects of Metabolomics

The metabolite concentrations are a direct mirror of the physiological state of the cell and respond rapidly to changes in the cellular environment, e.g., changes in inhibitor, oxygen, or substrate concentration. Roels (1983) ordered the different cellular response levels with respect to time and argued that metabolites respond on a subsecond time scale, which is several orders of magnitude faster than any expected change at the protein or even mRNA level (Figure 19.3). Indeed, estimations of metabolic fluxes, which are measures of the reaction rates between metabolite pools in the metabolic network, lead to the conclusion that the metabolic turnover rates, especially for central metabolism and cofactors, are very high; e.g., the estimated turnover rates in S. cerevisiae are about 1 mM/s for cytosolic glucose and glucose 6-phosphate and 1.5 mM/s for ATP (Dekoning and Vandam, 1992; Theobald et al., 1993). With pool sizes measured in the same range (Wu et al., 2005), a subsecond response of the pool size is plausible. This rapid metabolic response was visualized for an E. coli strain in a stimulus–response experiment (SRE, reviewed in Oldiges and Takors, 2005) using 13C-glucose as stimulating reagent (Noh et al., 2007). Figure 19.4 displays the 13C-labeling course of the intracellular glucose-6-phosphate (G6P) pool in E. coli after a rapid switch to 13C-labeled substrate. A very fast response and an isotopic steady state

[Figure 19.3: relaxation-time axis (seconds, 10^-6 to 10^6) with the labels mass action law, allosteric control, enzyme concentration, m-RNA control, selection with a population of one or more species, and evolutionary changes.]

Figure 19.3 Adaptional mechanisms in biological systems with respect to their relaxation time. (From Roels, J.A., Energetics and Kinetics in Biotechnology. Amsterdam, New York, Oxford, Elsevier, Biomedical Press, 1983. With permission.)

[Figure 19.4: 13C mass isotopomer fractions (+0, +1, +5, +6) of glucose 6-phosphate plotted against time (0 to 16 s).]

Figure 19.4 Rapid increase of 13C mass isotopomer fractions in the intracellular glucose-6-phosphate pool in E. coli after addition of a labelled glucose mixture. (From Noh, K., Gronke, K., and B. Luo, et al. Elsevier B.V., Journal of Biotechnology, 129(2): 249–267, 2007. With permission.)

in less than 1 s are observed, resolved by a concerted rapid sampling/metabolic quenching approach and LC-MS/MS technology. To conserve a defined metabolic pattern within the biochemical network, it is of utmost importance to stop all metabolic activity immediately, either by direct chemical disruption of the cell during sampling or by an efficient temperature drop that freezes the metabolic state, e.g., by spraying the sample into a cold (typically -50°C) aqueous/organic solution. This quenching of all metabolic activity and an overview of the different methods applied for quenching, but also for cell extraction, are presented in Wang et al. (2006), Villas-Boas et al. (2005a, 2005b) and Mashego et al. (2007). Recently, the use of heat exchangers was described for sampling of microbial (Schaub et al., 2006) and also mammalian cells (Wiendahl et al., 2007). For microbial systems, the quenching step using cold temperature seems critical with respect to metabolite leakage during this procedure. In fact, taking the amino acids as indicator, Wittmann et al. (2004) made the first systematic investigation of the cold shock phenomenon for C. glutamicum. Whereas eukaryotic cells like S. cerevisiae (Dekoning and Vandam, 1992; Visser et al., 2002; Villas-Boas et al., 2005b), P. chrysogenum (Nasution et al., 2006) and A. niger (Ruijter and Visser, 1996) seem to be more or less resistant to this leakage effect, prokaryotic systems like


L. lactis (Jensen et al., 1999), E. coli (Taymaz et al., 2006) and C. glutamicum (Wittmann et al., 2004; Magnus et al., 2006) seem to be sensitive to leakage. Significant amounts of intracellular metabolites were found in the extracellular medium after cold quenching, and metabolite data from the author's lab confirm this severe effect for E. coli and C. glutamicum (unpublished results). It can be speculated that this effect is present in all prokaryotic systems, and it is definitely a challenge for all endo-metabolome studies. To quantify and correct for the leakage effect, the metabolite concentration needs to be measured in the quenched cell extract, in the quenching and culture supernatants, and in a total cell extract combining extracellular and intracellular metabolite amounts, as also recommended by Mashego et al. (2007). Apart from an appropriate sampling procedure, the physiological state of the cells at the sampling time point is crucial. Differences in the metabolite pattern are expected if the culture conditions are not properly defined and controlled. Hence, from a biochemical engineering point of view, a bioreactor is the best-suited device for metabolome investigations, since many relevant process parameters (e.g., substrate supply, oxygen concentration, pH, temperature) are monitored or under process control. For functional genomics investigations, simpler cultivation systems, e.g., the standard shake flask, might be sufficient if the relevant differences in the metabolic pattern of mutant strains are much larger than the variation originating from the cultivation. Thus, controlled culture conditions achieved with the aid of biochemical engineering can sharpen the view for minor influences besides the highly visible effects in functional genomic studies.
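One possible way to combine the four measurements listed above into a leakage estimate is sketched below. The balance itself is an assumption made for illustration and is not taken from the cited work: all amounts are expressed per liter of original broth after correcting for dilution, the intracellular amount is taken as total extract minus culture supernatant, and the leaked amount as quenching supernatant minus culture supernatant. All names and numbers are hypothetical.

```python
def leakage_balance(total_extract, culture_supernatant,
                    quenched_cell_extract, quenching_supernatant):
    """Estimate metabolite leakage during cold quenching.

    All arguments are amounts of one metabolite in umol per liter of original
    broth, already corrected for dilution by the quenching solution.
    """
    intracellular = total_extract - culture_supernatant
    leaked = quenching_supernatant - culture_supernatant
    if intracellular <= 0:
        raise ValueError("no intracellular pool detectable")
    return {
        "intracellular": intracellular,
        "leaked": leaked,
        "leak_fraction": leaked / intracellular,
        # Closure of the balance: cells after quenching plus quenching
        # supernatant should add up to the total extract.
        "recovery": (quenched_cell_extract + quenching_supernatant) / total_extract,
    }


# Hypothetical measurements for a single metabolite.
result = leakage_balance(total_extract=10.0, culture_supernatant=2.0,
                         quenched_cell_extract=5.0, quenching_supernatant=5.0)
print(result)
```

A recovery far from 1 would indicate degradation or analytical error rather than leakage, so it is worth checking before the leak fraction is interpreted.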
Independent of the cultivation system used, it is necessary to include a metabolic quenching step in all metabolomics investigations to stop all metabolic activity as fast as possible, because the network response is expected to occur in less than 1 s (Figure 19.3). Consequently, leaving out this step leads to unpredictable changes in the metabolome, resulting in unreliable data for a mutant and a reference strain and, even worse, possibly in wrong gene-function correlations. To allow a better comparison, evaluation and interpretation of published metabolome data, the Metabolomics Standards Initiative (MSI) was launched as a counterpart to the MIAME standardization initiative for microarray data (Brazma et al., 2001) (MIAME update 2005: http://www.mged.org/Workgroups/MIAME/MIAMEchecklist_Jan2005.pdf). MSI aims at the identification and development of a minimal set of reporting standards and a best practice guideline for reporting on biological samples in metabolomics experiments (Report of MSI Subgroup: In vitro biology/microbiology: http://msi-workgroups.sourceforge.net/bio-metadata/reporting/invitro/doc.pdf). This definition includes detailed metabolomics-specific information about, e.g., cultivation, sampling/quenching, sample preparation, storage, analytics, data handling and other critical parameters in metabolomics experiments. A suitable and widely used setup to obtain in vivo kinetic network information based on metabolome measurements is the stimulus–response experiment (reviewed in Oldiges and Takors, 2005), in which a culture is rapidly shifted away from its steady state by a sudden change in the cellular environment. This is typically realized by a step increase of the substrate concentration (e.g., glucose), resulting in a dynamic intracellular response of the metabolite concentrations. To monitor these dynamics, a series of metabolome samples is taken before and after the step increase. First pioneering work for S.
cerevisiae was reported by Theobald et al. (1997), who manually took 15–20 samples per minute. Later, automated sampling devices with exact documentation of the sampling times were introduced (Schaefer et al., 1999; Buziol et al., 2002; Visser et al., 2002; Schaub et al., 2006). Next-generation sampling devices aiming at further downscaling, experimental flexibility, and mobility of the setup have recently been described (Mashego et al., 2006; Noh et al., 2007). Owing to their smaller scale, they make the frequent use of cost-intensive isotope-labeled substrates (e.g., 13C, 2H, 18O, 15N) affordable. Based on data sets from metabolome measurements, kinetic mathematical models (Wiechert, 2002; Heijnen, 2005) have been described for the central metabolism of S. cerevisiae (Rizzi et al., 1997; Vaseghi et al., 1999), E. coli (Chassagnole et al., 2002; Degenring et al., 2004; Visser et al., 2004) and glycolysis in L.


lactis (Voit et al., 2006), but also for biosynthetic pathways leading to shikimate in Streptococcus pneumoniae (Noble et al., 2006), L-phenylalanine (Wahl et al., 2006) and L-threonine (Chassagnole et al., 2001) in E. coli, L-valine/L-leucine (Magnus et al., 2006) and L-lysine (Yang et al., 1999) in C. glutamicum, and penicillin in P. chrysogenum (Pissara et al., 1996). To obtain reliable data from metabolomics, the biochemical engineering aspect is important. Especially when the data are used for mathematical modeling, attention must be paid to the experimental and sampling conditions under which they were generated. For kinetic in vivo studies, stimulus–response experiments are a valuable and very information-rich experimental setup. Data and models from those experiments provide insight into the function and kinetics of the metabolic network in vivo and facilitate the identification of metabolic engineering targets for strain and process optimization (Oldiges et al., 2004; Magnus et al., 2006).
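The idea of a stimulus–response experiment can be illustrated with a deliberately simple toy model, which is not one of the published kinetic models cited in this section. A single intracellular pool x is assumed to be fed by Michaelis-Menten uptake of the substrate and drained by a first-order reaction; the substrate concentration is stepped up at t = 0 and the pool is "sampled" at fixed intervals. All parameter values and names are hypothetical.

```python
def simulate_pulse(s_before, s_after, vmax, km, k_out,
                   t_end=10.0, dt=0.001, sample_every=0.5):
    """Explicit Euler simulation of dx/dt = vmax*s/(km+s) - k_out*x after a substrate step.

    Returns a list of (time in s, pool concentration in mM) samples.
    """
    def uptake(s):
        return vmax * s / (km + s)

    x = uptake(s_before) / k_out      # steady state before the pulse
    v_in = uptake(s_after)            # constant influx after the pulse
    steps = round(t_end / dt)
    stride = round(sample_every / dt)
    samples = [(0.0, x)]
    for n in range(1, steps + 1):
        x += dt * (v_in - k_out * x)
        if n % stride == 0:
            samples.append((n * dt, x))
    return samples


# Hypothetical parameters: substrate step from 0.01 mM to 1 mM,
# vmax = 1 mM/s, km = 0.1 mM, k_out = 1 per second.
samples = simulate_pulse(0.01, 1.0, vmax=1.0, km=0.1, k_out=1.0)
for t, x in samples[:5]:
    print(f"{t:4.1f} s  {x:.3f} mM")
```

With a rate constant of 1 per second the pool covers most of its transition within the first few seconds, which illustrates why a manual sampling rate of 15–20 samples per minute resolves only part of such a response.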

19.4 Isotopically Nonstationary 13C Metabolic Flux Analysis

To calculate intracellular reaction rates, 13C metabolic flux analysis (Wiechert, 2001; Fischer et al., 2004; Sauer, 2006) is usually used. It requires information about the stationary 13C-labeling state of proteinogenic amino acids, obtained after hydrolysis of the cellular protein. These data are typically acquired via GC-MS techniques (Christensen and Nielsen, 1999; Dauner and Sauer, 2000), which have replaced NMR-based 13C measurement (Marx et al., 1996). Use of the stationary 13C information from intracellular metabolites has also been described, i.e., for amino acids (Iwatani et al., 2007) and central metabolism (van Winden et al., 2005). Recently, the approach of isotopically nonstationary 13C metabolic flux analysis (INST 13C MFA) has been theoretically elaborated (Noh et al., 2006) and was successfully conducted by making use of the 13C labeling dynamics of the central metabolic intermediates at a subsecond to second time scale after application of 13C-labeled substrate (Noh et al., 2007). With the development of advanced LC-MS/MS methods (van Dam et al., 2002; Luo et al., 2007) it is now possible to determine labeling information of intracellular central metabolism intermediates together with the absolute intracellular metabolite concentrations. Both pieces of information are strictly required for the successful application of the nonstationary approach to 13C metabolic flux analysis. The key to accessing the labeling dynamics was the development of a new rapid sampling device specially designed for and connected to a small-scale bioreactor, in this particular case the sensor reactor setup (El Massaoudi et al., 2003). In such an INST 13C MFA experiment, 13C-glucose was pulsed to a residual saturating concentration of naturally labeled glucose in an E. coli fed-batch fermentation process.
Using rapid sampling at a subsecond to second time scale immediately after addition of labeled glucose, the 13C labeling dynamics in the intracellular metabolites are monitored via LC-MS/MS analysis. Figure 19.5 shows the determined 13C-labeling data of some metabolites from glycolysis and the pentose phosphate pathway. The labeling dynamics in the pools of G6P, fructose 1,6-bisphosphate (FBP), phosphoenolpyruvate (PEP), lumped pentose 5-phosphates (P5P) and sedoheptulose 7-phosphate (S7P) are very fast, while pyruvate showed an unexpectedly slow behavior. For G6P, which shows the fastest response, an isotopic steady state is reached within the first second. The rapid sampling device together with metabolic quenching proves to be a valid method to give a representative picture of the fast labeling dynamics in this experimental setup. Together with the measured data from the tricarboxylic acid cycle (TCA) metabolites, it was possible to calculate the complete metabolic flux map of the central metabolic pathways on the basis of an INST 13C MFA experiment (Figure 19.6). In the end, only 11 metabolome samples taken within the first 16 seconds of the isotopically nonstationary experiment were sufficient to show that this approach allows 13C MFA at an ultrashort time scale. In the past, 13C MFA was always dependent on growing biological systems that incorporated 13C information into the proteins of the biomass. INST 13C MFA is no longer dependent on label incorporation into the proteins, and it allows very short experimental time frames (in the range

[Figure 19.5: six panels (G6P, FBP, P5P, S7P, PEP, Pyr), each showing mass isotopomer fractions plotted against time (0 to 16 s).]

Figure 19.5 13C labeling dynamics (represented by the mass isotopomer fractions) of some representative intracellular metabolites of glycolysis and pentose phosphate pathway together with standard deviation (symbols) and fitted labeling dynamics after INST 13C MFA (line). (From Noh, K., Gronke, K., and B. Luo, et al. Elsevier B.V., Journal of Biotechnology, 129(2): 249–267, 2007. With permission.)

of seconds or minutes). Hence, this method can be expected to make substantial contributions in the near future to the application of metabolic flux analysis to nongrowing cells (e.g., in growth-decoupled production), short transient biological changes, unstable recombinant cells, or slow-growing cells (e.g., mammalian cell cultures).
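Why both the pool size and the labeling time course are needed can be seen from the simplest possible case, sketched here under strong assumptions: a single well-mixed pool of size c that is fed directly by the labeled substrate at flux v and is at metabolic steady state. Its labeled fraction then follows x(t) = x_in (1 - exp(-t/tau)) with the turnover time tau = c/v. A real INST 13C MFA solves coupled isotopomer balances for the whole network; the function names and numbers below are hypothetical.

```python
import math


def turnover_time(pool_mM, flux_mM_per_s):
    """Turnover time tau = pool size / flux, in seconds."""
    return pool_mM / flux_mM_per_s


def labeled_fraction(t, pool_mM, flux_mM_per_s, input_fraction=1.0):
    """Labeled fraction of a single pool t seconds after the switch to labeled substrate."""
    tau = turnover_time(pool_mM, flux_mM_per_s)
    return input_fraction * (1.0 - math.exp(-t / tau))


def time_to_reach(fraction, pool_mM, flux_mM_per_s):
    """Time needed to reach a given fraction (0 <= fraction < 1) of the final labeling."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -turnover_time(pool_mM, flux_mM_per_s) * math.log(1.0 - fraction)


# Hypothetical pool of 0.3 mM turned over at 1 mM/s.
tau = turnover_time(0.3, 1.0)
t95 = time_to_reach(0.95, 0.3, 1.0)
print(f"tau = {tau:.2f} s, 95% labeling after {t95:.2f} s")
```

The labeling curve alone fixes only the ratio c/v, so the flux can be extracted from it only when the absolute pool size is measured in parallel.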

[Figure 19.6: flux map with net and exchange fluxes for glucose uptake, the Embden-Meyerhof-Parnas pathway (EMP), the pentose phosphate pathway (PPP), anaplerosis (ANA), the tricarboxylic acid cycle (TCA), and the glyoxylate shunt (GS).]

Figure 19.6 Network model of the central metabolism of E. coli. Calculated fluxes based on INST 13C MFA are given in % of glucose uptake flux. (From Noh, K., Gronke, K., and B. Luo, et al. Elsevier B.V., Journal of Biotechnology, 2007. With permission.)

19.5 Outlook

With the emergence of systems biology, metabolome analysis is broadening its place in the family of omics technologies, not only for qualitative identification and relative changes in metabolite levels for functional genomics, but also for obtaining reliable absolute metabolite levels that give access to in vivo kinetic data for a broad range of biological systems. Future activities will focus on the further improvement of analytical methods to enable a more robust, efficient and reliable analysis of complex multicomponent metabolome samples. Hence, an increase in analytical resolution, accuracy and speed of analysis is highly desirable. The application of comprehensive two-dimensional GC (GCxGC) coupled to MS adds a new dimension to the field of GC-based metabolome analysis (Marriott and Shellie, 2002; Shellie, 2005). Compared to one-dimensional GC, the resolution of two-dimensional GC is significantly higher and, most notably, the sensitivity increases.


In terms of LC separation of complex matrices, ultra performance liquid chromatography (UPLC), a new design in LC, seems to be very promising with regard to increased resolution, sensitivity and speed of analysis. Churchwell et al. (2005) directly compared UPLC-MS with HPLC-MS and found a general improvement in sensitivity (up to ten-fold) and total analysis time (five-fold), as well as a separation of diastereomers that was not possible using HPLC. A drawback of UPLC-MS coupling is the requirement for high data acquisition rates of the mass detectors, e.g., time of flight or quadrupole instruments with fast scan rates, to ensure a sufficient number of data points across narrow LC peaks. Another promising development is FT–ICR–MS and its application to metabolome analysis (Breitling et al., 2006), reviewed in Brown et al. (2005). A very high mass resolution combined with relatively fast data acquisition rates characterizes this MS technique. In a proof of concept study using strawberry fruit based material and tobacco flower extracts, FT–ICR–MS was applied for the metabolic screening of differentially expressed metabolites (Aharoni et al., 2002). The separation of the complex mixtures was achieved by the extremely high mass resolution. Identification of the different substances was performed by elucidating the elemental composition based on the accurate mass determination. No chromatographic separation or derivatization was needed. There is an undeniable trend toward fusion and hyphenation of available analytical techniques to achieve maximum output from existing technologies. Promising for metabolomics are combinations of stationary phases, chromatographic techniques, different scan modes, information-dependent data acquisition (Staack, 2005) and hardware combinations, such as quadrupole/linear ion trap (Q-Trap), quadrupole/time of flight (Q-TOF) and linear ion trap/FT–ICR developments.
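The principle of assigning an elemental composition from an accurate mass can be sketched in a few lines. The example works with neutral monoisotopic masses and ignores ionization adducts and the electron mass; the candidate list, the measured value and the tolerance are hypothetical, and the function names are illustrative only.

```python
# Monoisotopic masses of the most abundant isotopes, in u.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.9949146196,
    "P": 30.97376163,
    "S": 31.97207100,
}


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def ppm_error(measured, theoretical):
    """Relative mass deviation in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def match_candidates(measured, candidates, tol_ppm):
    """Names of all candidate formulas whose mass lies within tol_ppm of the measured mass."""
    return [name for name, formula in candidates.items()
            if abs(ppm_error(measured, monoisotopic_mass(formula))) <= tol_ppm]


# Hypothetical candidate list and measured neutral mass.
candidates = {
    "glucose C6H12O6": {"C": 6, "H": 12, "O": 6},
    "glucose 6-phosphate C6H13O9P": {"C": 6, "H": 13, "O": 9, "P": 1},
}
hits = match_candidates(180.0635, candidates, tol_ppm=5.0)
print(hits)
```

Note that isomers such as glucose and fructose share one elemental composition and therefore cannot be distinguished by accurate mass alone.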
The only recent design applicable to metabolome analysis, although not entirely new either, is UPLC in the field of chromatography. There is also a clear and necessary trend to improve the comparability of metabolome data sets in the literature. The MSI provides a suggestion for minimal reporting standards and a best practice guideline for reporting on biological samples in metabolomics experiments. In 2005, a call for contributions to an MS database of all available MS data of reference standards was published (Kopka et al., 2005); first data collections had already been published earlier (Wagner et al., 2003; Schauer et al., 2005). All aim at a comprehensive collection of biologically related compounds in plants, microorganisms or other biological systems. In combination with more sophisticated and reliable automated raw data postprocessing, this will advance high-throughput metabolome applications and also improve low-throughput ones.

Acknowledgment

The author is indebted to Prof. Dr. C. Wandrey for ongoing support and fruitful discussions.

References

Aharoni, A., C. H. Ric de Vos, et al. 2002. Nontargeted metabolome analysis by use of Fourier transform ion cyclotron mass spectrometry. Omics, 6(3): 217–34.
Allen, J., H. M. Davey, et al. 2003. High-throughput classification of yeast mutants for functional genomics using metabolic footprinting. Nature Biotechnology, 21(6): 692–96.
Annesley, T. M. 2003. Ion suppression in mass spectrometry. Clinical Chemistry, 49(7): 1041–44.
Bader, M. 1980. A systematic-approach to standard addition methods in instrumental analysis. Journal of Chemical Education, 57(10): 703–706.
Bajad, S. U., W. Y. Lu, et al. 2006. Separation and quantitation of water soluble cellular metabolites by hydrophilic interaction chromatography-tandem mass spectrometry. Journal of Chromatography A, 1125(1): 76–88.
Baran, R., H. Kochi, et al. 2006. MathDAMP: a package for differential analysis of metabolite profiles. BMC Bioinformatics, 7: 530.
Bergmeyer, H. 1984. Methods of Enzymatic Analysis. Weinheim, Verlag Chemie.


Bhattacharya, M., L. Fuhrman, et al. 1995. Single-run separation and detection of multiple metabolic intermediates by anion-exchange high-performance liquid-chromatography and application to cell pool extracts prepared from Escherichia coli. Analytical Biochemistry, 232(1): 98–106.
Brazma, A., P. Hingamp, et al. 2001. Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nature Genetics, 29(4): 365–71.
Breitling, R., A. R. Pitt, et al. 2006. Precision mapping of the metabolome. Trends in Biotechnology, 24(12): 543–48.
Broeckling, C. D., I. R. Reddy, et al. 2006. MET-IDEA: Data extraction tool for mass spectrometry-based metabolomics. Analytical Chemistry, 78(13): 4334–41.
Brown, S. C., G. Kruppa, et al. 2005. Metabolomics applications of FT-ICR mass spectrometry. Mass Spectrometry Reviews, 24(2): 223–31.
Buchholz, A., J. Hurlebaus, et al. 2002. Metabolomics: quantification of intracellular metabolite dynamics. Biomolecular Engineering, 19(1): 5–15.
Buchholz, A., R. Takors, et al. 2001. Quantification of intracellular metabolites in Escherichia coli K12 using liquid chromatographic-electrospray ionization tandem mass spectrometric techniques. Analytical Biochemistry, 295(2): 129–37.
Buziol, S., I. Bashir, et al. 2002. New bioreactor-coupled rapid stopped-flow sampling technique for measurements of metabolite dynamics on a subsecond time scale. Biotechnology and Bioengineering, 80(6): 632–36.
Wiendahl, C., J. J. Brandner, et al. 2007. A microstructure heat exchanger for quenching the metabolism of mammalian cells. Chemical Engineering & Technology, 30(3): 322–28.
Castrillo, J. I., A. Hayes, et al. 2003. An optimized protocol for metabolome analysis in yeast using direct infusion electrospray mass spectrometry. Phytochemistry, 62(6): 929–37.
Chassagnole, C., D. A. Fell, et al. 2001. Control of the threonine-synthesis pathway in Escherichia coli: a theoretical and experimental approach. Biochemical Journal, 356: 433–44.
Chassagnole, C., N. Noisommit-Rizzi, et al. 2002. Dynamic modeling of the central carbon metabolism of Escherichia coli. Biotechnology and Bioengineering, 79(1): 53–73. Christensen, B. and J. Nielsen. 1999. Isotopomer analysis using GC-MS. Metabolic Engineering, 1: 282–90. Churchwell, M. I., N. C. Twaddle, et al. 2005. Improving LC-MS sensitivity through increases in chromatographic performance: comparisons of UPLC-ES/MS/MS to HPLC-ES/MS/MS. Journal of Chromatography B, 825(2): 134–43. Coulier, L., R. Bas, et al. 2006. Simultaneous quantitative analysis of metabolites using ion-pair liquid chromatography–electrospray ionization mass spectrometry. Analytical Chemistry, 78(18): 6573–82. Dalluge, J. J., S. Smith, et al. 2004. Potential of fermentation profiling via rapid measurement of amino acid metabolism by liquid chromatography-tandem mass spectrometry. Journal of Chromatography A 23rd International Symposium on the Separation of Proteins, Peptides and Polynucleotides, 1043(1): 3–7. Dauner, M. and U. Sauer. 2000. GC-MS analysis of amino acids rapidly provides rich information for isotopomer balancing. Biotechnology Progress, 16(4): 642–49. De Graaf, A. A., K. Striegel, et al. 1999. Metabolic state of Zymomonas mobilis in glucose-, fructose-, and xylose-fed continuous cultures as analysed by C-13- and P-31-NMR spectroscopy. Archives of Microbiology, 171(6): 371–85. Degenring, D., C. Froemel, et al. 2004. Sensitivity analysis for the reduction of complex metabolism models. Journal of Process Control, 14(7): 729–45. Dekoning, W. and K. Vandam. 1992. A method for the determination of changes of glycolytic metabolites in yeast on a subsecond time scale using extraction at neutral pH. Analytical Biochemistry, 204(1): 118–23. Dettmer, K., P. A. Aronov, et al. 2007. Mass spectrometry-based metabolomics. Mass Spectrometry Reviews, 26(1): 51–78.

19-14

Tools for Experimentally Determining Flux through Pathways

Dunn, W. B., N. J. C. Bailey, et al. 2005. Measuring the metabolome: current analytical technologies. Analyst, 130(5): 606–25. Duran, A. L., J. Yang, et al. 2003. Metabolomics spectral formatting, alignment and conversion tools (MSFACTs). Bioinformatics, 19(17): 2283–93. Edwards, J. L., C. N. Chisolm, et al. 2006. Negative mode sheathless capillary electrophoresis electrospray ionization-mass spectrometry for metabolite analysis of prokaryotes. Journal of Chromatography A, 1106(1–2): 80–88. El Massaoudi, M., J. Spelthahn, et al. 2003. Production process monitoring by serial mapping of microbial carbon flux distributions using a novel sensor reactor approach: I—Sensor reactor system. Metabolic Engineering, 5(2): 86–95. Even, S., N. D. Lindley, et al. 2003. Transcriptional, translational and metabolic regulation of glycolysis in Lactococcus lactis subsp. cremoris MG 1363 grown in continuous acidic cultures. Microbiology-SGM, 149: 1935–44. Fell, D. A. 2001. Beyond genomics. Trends in Genetics, 17(12): 680–82. Fenn, J. B., M. Mann, et al. 1989. Electrospray ionization for mass-spectrometry of large biomolecules. Science, 246(4926): 64–71. Feurle, J., H. Jomaa, et al. 1998. Analysis of phosphorylated carbohydrates by high-performance liquid chromatography electrospray ionization tandem mass spectrometry utilising a beta-cyclodextrin bonded stationary phase. Journal of Chromatography A, 803(1–2): 111–19. Fiehn, O. 2001. Combining genomics, metabolome analysis, and biochemical modelling to understand metabolic networks. Comparative and Functional Genomics, 2(3): 155–68. Fiehn, O. 2002. Metabolomics—the link between genotypes and phenotypes. Plant Molecular Biology, 48(1–2): 155–71. Fiehn, O., J. Kopka, et al. 2000. Metabolite profiling for plant functional genomics. Nature Biotechnology, 18(11): 1157–61. Fiehn, O. and W. Weckwerth. 2003. Deciphering metabolic networks. European Journal of Biochemistry, 270(4): 579–88. Fischer, E., N. Zamboni, et al. 2004. 
High-throughput metabolic flux analysis based on gas chromatography-mass spectrometry derived C-13 constraints. Analytical Biochemistry, 325(2): 308–16. Forster, J., I. Famili, et al. 2003. Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network. Genome Research, 13(2): 244–53. Fuzfai, Z., Z. F. Katona, et al. 2004. Simultaneous identification and quantification of the sugar, sugar alcohol, and carboxylic acid contents of sour cherry, apple, and ber fruits, as their trimethylsilyl derivatives, by gas chromatography-mass spectrometry. Journal of Agricultural and Food Chemistry, 52(25): 7444–52. Goodacre, R. 2005. Metabolomics—the way forward. Metabolomics, 1(1): 1–2. Hall, R. D. 2006. Plant metabolomics: from holistic hope, to hype, to hot topic. New Phytologist, 169(3): 453–68. Harvey, D. J. and M. G. Horning. 1973. Characterization of trimethylsilyl derivatives of sugar phosphates and related compounds by gas-chromatography and gas-chromatography mass-spectrometry. Journal of Chromatography, 76(1): 51–62. Heijnen, J. J. 2005. Approximative kinetic formats used in metabolic network modeling. Biotechnology and Bioengineering, 91(5): 534–45. Hollywood, K., D. R. Brison, et al. 2006. Metabolomics: current technologies and future trends. Proteomics, 6(17): 4716–23. Iwatani, S., S. Van Dien, et al. 2007. Determination of metabolic flux changes during fed-batch cultivation from measurements of intracellular amino acids by LC-MS/MS. Journal of Biotechnology, 128(1): 93–111. Jensen, N. B. S., K. V. Jokumsen, et al. 1999. Determination of the phosphorylated sugars of the EmbdenMeyerhoff-Parnas pathway in Lactococcus lactis using a fast sampling technique and solid phase extraction. Biotechnology and Bioengineering, 63(3): 356–62.

Tools for Measuring Intermediate and Product Formation

19-15

Katajamaa, M., J. Miettinen, et al. 2006. MZmine: toolbox for processing and visualization of mass spectrometry based molecular profile data. Bioinformatics, 22(5): 634–36. Katona, Z. F., P. Sass, et al. 1999. Simultaneous determination of sugars, sugar alcohols, acids and amino acids in apricots by gas chromatography-mass spectrometry. Journal of Chromatography A, 847(1–2): 91–102. Katz, J. E., D. S. Dumlao, et al. 2004. A new technique (COMSPARI) to facilitate the identification of minor compounds in complex mixtures by GC/MS and LC/MS: Tools for the visualization of matched datasets. Journal of the American Society for Mass Spectrometry, 15(4): 580–84. Kell, D. B. 2004. Metabolomics and systems biology: making sense of the soup. Current Opinion in Microbiology, 7(3): 296–307. Kell, D. B., M. Brown, et al. 2005. Metabolic footprinting and systems biology: the medium is the message. Nature Reviews Microbiology, 3(7): 557–65. Keseler, I. M., J. Collado-Vides, et al. 2005. EcoCyc: a comprehensive database resource for Escherichia coli. Nucleic Acids Research, 33: D334–D337. Koek, M. M., B. Muilwijk, et al. 2006. Microbial metabolomics with gas chromatography/mass spectrometry. Analytical Chemistry, 78(4): 1272–81. Kopka, J. 2006. Current challenges and developments in GC-MS based metabolite profiling technology. Journal of Biotechnology, 124(1): 312–22. Kopka, J., N. Schauer, et al. 2005. [email protected]: the Golm Metabolome Database. Bioinformatics, 21(8): 1635–38. Lafaye, A., J. Labarre, et al. 2005. Liquid chromatography-mass spectrometry and 15N metabolic labeling for quantitative metabolic profiling. Analytical Chemistry, 77(7): 2026–33. Lederberg, J. and A. T. McCray. 2001. Ome sweet omics—A genealogical treasury of words. Scientist, 15(7): 8. Lu, W., E. Kimball, et al. 2006. A high-performance liquid chromatography-tandem mass spectrometry method for quantitation of nitrogen-containing intracellular metabolites. 
Journal of the American Society for Mass Spectrometry, 17(1): 37–50. Luo, B., K. Groenke, et al. 2007. Simultaneous determination of multiple intracellular metabolites in glycolysis, pentose phosphate pathway and tricarboxylic acid cycle by liquid chromatography-mass spectrometry. Journal of Chromatography A, 1147(2): 153–164. Magnus, J. B., D. Hollwedel, et al. 2006. Monitoring and modeling of the reaction dynamics in the valine/ leucine synthesis pathway in Corynebacterium glutamicum. Biotechnology Progress, 22(4): 1071–83. Marriott, P. and R. Shellie. 2002. Principles and applications of comprehensive two-dimensional gas chromatography. Trac-Trends in Analytical Chemistry, 21(9–10): 573–83. Marx, A., A. A. deGraaf, et al. 1996. Determination of the fluxes in the central metabolism of Corynebacterium glutamicum by nuclear magnetic resonance spectroscopy combined with metabolite balancing. Biotechnology and Bioengineering, 49(2): 111–29. Mashego, M. R., K. Rumbold, et al. 2007. Microbial metabolomics: past, present and future methodologies. Biotechnology Letters, 29(1): 1–16. Mashego, M. R., W. M. van Gulik, et al. 2006. In vivo kinetics with rapid perturbation experiments in Saccharomyces cerevisiae using a second-generation BioScope. Metabolic Engineering, 8(4): 370–83. Mashego, M. R., L. Wu, et al. 2004. MIRACLE: mass isotopomer ratio analysis of U-C-13-labeled extracts. A new method for accurate quantification of changes in concentrations of intracellular metabolites. Biotechnology and Bioengineering, 85(6): 620–28. Meyer, S., N. Noisommit-Rizzi, et al. 1999. Optimized analysis of intracellular adenosine and guanosine phosphates in Escherichia coli. Analytical Biochemistry, 271(1): 43–52. Mungur, R., A. D. M. Glass, et al. 2005. Metabolite fingerprinting in transgenic Nicotiana tabacum altered by the Escherichia coli glutamate dehydrogenase gene. Journal of Biomedicine and Biotechnology, 2: 198–214. Nasution, U., W. M. Van Gulik, et al. 2006. 
Measurement of intracellular metabolites of primary metabolism and adenine nucleotides in chemostat cultivated Penicillium chrysogenum. Biotechnology and Bioengineering, 94(1): 159–66.

19-16

Tools for Experimentally Determining Flux through Pathways

Neidhardt, F. C. and H. E. Umbarger. 1996. Chemical composition of Escherichia coli. In Escherichia coli and Salmonella typhimurium—Cellular and Molecular Biology. F. C. Neidhardt, Editor. Washington DC, ASM Press. Neves, A. R., A. Ramos, et al. 1999. In vivo nuclear magnetic resonance studies of glycolytic kinetics in Lactococcus lactis. Biotechnology and Bioengineering, 64(2): 200–12. Nielsen, J. 2001. Metabolic engineering. Applied Microbiology and Biotechnology, 55(3): 263–83. Nielsen, J. and S. Oliver. 2005. The next wave in metabolome analysis. Trends in Biotechnology, 23(11): 544–46. Nobeli, I., H. Ponstingl, et al. 2003. A structure-based anatomy of the E. coli metabolome. Journal of Molecular Biology, 334(4): 697–719. Noble, M., Y. Sinha, et al. 2006. The kinetic model of the shikimate pathway as a tool to optimize enzyme assays for high-throughput screening. Biotechnology and Bioengineering, 95(4): 560–73. Noh, K., K. Gronke, B. Luo, et al. 2007. Metabolic flux analysis at ultra short time scale: isotopically nonC-13 stationary labeling experiments. Journal of Biotechnology, 129(2): 249–267. Noh, K., A. Wahl, et al. 2006. Computational tools for isotopically instationary C-13 labeling experiments under metabolic steady state conditions. Metabolic Engineering, 8(6): 554–77. Ogino, T., C. Garner, J. L. Markley, and K. M. Herrmann. 1982. Biosynthesis of aromatic compounds: 13C NMR spectroscopy of whole Escherichia coli cells. Proc. Natl. Acad. Sci., USA. 79: 5828–5832. Oldiges, M., M. Kunze, et al. 2004. Stimulation, monitoring, and analysis of pathway dynamics by metabolic profiling in the aromatic amino acid pathway. Biotechnology Progress, 20(6): 1623–33. Oldiges, M. and R. Takors. 2005. Applying metabolic profiling techniques for stimulus-response experiments: Chances and pitfalls. Technology Transfer in Biotechnology: From Lab to Industry to Production, 92: 173–96. Oliver, S. G., M. K. Winson, et al. 1998. 
Systematic functional analysis of the yeast genome. Trends Biotechnol., 16(9): 373–78. Piraud, M., C. Vianey-Saban, K. Petritis, C. Elfakir, J. P. Steghens, D. Bouchu. 2005. Ion-pairing reversedphase liquid chromatography/electrospray ionization mass spectrometric analysis of 76 underivatized amino acids of biological interest: a new tool for the diagnosis of inherited disorders of amino acid metabolism. Rapid Communications in Mass Spectrometry, 19(12): 1587–1602. Pissara, P. D., J. Nielsen, et al. 1996. Pathway kinetics and metabolic control analysis of a high-yielding strain of Penicillium chrysogenum during fed batch cultivations. Biotechnology and Bioengineering, 51(2): 168–76. Raab, R. M., K. Tyo, et al. 2005. Metabolic engineering. Biotechnology for the Future, 100: 1–17. Raamsdonk, L. M., B. Teusink, et al. 2001. A functional genomics strategy that uses metabolome data to reveal the phenotype of silent mutations. Nature Biotechnology, 19(1): 45–50. Ramautar, R., A. Demirci, et al. 2006. Capillary electrophoresis in metabolomics. Trac-Trends in Analytical Chemistry, 25(5): 455–66. Rashed, M. S. 2001. Clinical applications of tandem mass spectrometry: ten years of diagnosis and screening for inherited metabolic diseases. Journal of Chromatography B-Analytical Technologies in the Biomedical and Life Sciences, 758(1): 27–48. Rizzi, M., M. Baltes, et al. 1997. In vivo analysis of metabolic dynamics in Saccharomyces cerevisiae. 2. Mathematical model. Biotechnology and Bioengineering, 55(4): 592–608. Rochfort, S. 2005. Metabolomics reviewed: A new “Omics” platform technology for systems biology and implications for natural products research. Journal of Natural Products, 68(12): 1813–20. Roels, J. A. 1983. Energetics and Kinetics in Biotechnology. Amsterdam, New York, Oxford, Elsevier, Biomedical Press. Roessner, U., C. Wagner, et al. 2000. Simultaneous analysis of metabolites in potato tuber by gas chromatography-mass spectrometry. Plant Journal, 23(1): 131–42. 
Staack, R. F., E. Varesio, G. Hopfgartner. 2005. The combination of liquid chromatography/tandem mass spectrometry and chip-based infusion for improved screening and characterization of drug metabolites. Rapid Communications in Mass Spectrometry, 19(5): 618–26.

Tools for Measuring Intermediate and Product Formation

19-17

Ruijter, G. J. G. and J. Visser. 1996. Determination of intermediary metabolites in Aspergillus niger. Journal of Microbiological Methods, 25(3): 295–302. Sauer, U. 2006. Metabolic networks in motion: C-13-based flux analysis. Molecular Systems Biology. 2: 62. Schaefer, U., W. Boos, et al. 1999. Automated sampling device for monitoring intracellular metabolite dynamics. Analytical Biochemistry, 270(1): 88–96. Schaub, J., C. Schiesling, et al. 2006. Integrated sampling procedure for metabolome analysis. Biotechnology Progress, 22(5): 1434–42. Schauer, N., D. Steinhauser, et al. 2005. GC-MS libraries for the rapid identification of metabolites in complex biological samples. FEBS Letters, 579(6): 1332–37. Shellie, R. A. 2005. Comprehensive two-dimensional gas chromatography-mass spectrometry and its use in high-resolution metabolomics. Australian Journal of Chemistry, 58(8): 619–619. Shellie, R. A., W. Welthagen, et al. 2005. Statistical methods for comparing comprehensive two-dimensional gas chromatography-time-of-flight mass spectrometry results: Metabolomic analysis of mouse tissue extracts. Journal of Chromatography A, 1086(1–2): 83–90. Smits, H. P., A. Cohen, et al. 1998. Cleanup and analysis of sugar phosphates in biological extracts by using solid-phase extraction and anion-exchange chromatography with pulsed amperometric detection. Analytical Biochemistry, 261(1): 36–42. Soga, T., Y. Ohashi, et al. 2003. Quantitative metabolome analysis using capillary electrophoresis mass spectrometry. Journal of Proteome Research, 2(5): 488–94. Soga, T., Y. Ueno, et al. 2002. Simultaneous determination of anionic intermediates for Bacillus subtilis metabolic pathways by capillary electrophoresis electrospray ionization mass spectrometry. Analytical Chemistry, 74(10): 2233–39. Stein, S. E. 1999. An integrated method for spectrum extraction and compound identification from gas chromatography/mass spectrometry data. 
Journal of the American Society for Mass Spectrometry, 10(8): 770–81. Sumner, L. W., P. Mendes, et al. 2003. Plant metabolomics: large-scale phytochemistry in the functional genomics era. Phytochemistry, 62(6): 817–36. Tarr, M. A., J. Zhu, et al. 2000. Atmospheric pressure ionization mass spectrometry. In Encyclopedia of Analytical Chemistry. R. A. Meyers, Editor. Volume 13. Chichester, John Wiley & Sons. 11597–630. Taymaz, H., M. R. Mashego, et al. 2006. Evaluation of experimental protocols for metabolome analysis in Escherichia coli K-12 MG1655 (poster contribution). Metabolic Engineering VI Conference. Nordwijkerhout, The Netherlands. Teleman, A., P. Richard, et al. 1999. Identification and quantitation of phosphorus metabolites in yeast neutral pH extracts by nuclear magnetic resonance spectroscopy. Analytical Biochemistry, 272(1): 71–79. Theobald, U., W. Mailinger, et al. 1997. In vivo analysis of metabolic dynamics in Saccharomyces cerevisiae. 1. Experimental observations. Biotechnology and Bioengineering, 55(2): 305–16. Theobald, U., W. Mailinger, et al. 1993. In vivo analysis of glucose-induced fast changes in yeast adeninenucleotide pool applying a rapid sampling technique. Analytical Biochemistry, 214(1): 31–37. Tolstikov, V. V. and O. Fiehn. 2002. Analysis of highly polar compounds of plant origin: combination of hydrophilic interaction chromatography and electrospray ion trap mass spectrometry. Analytical Biochemistry, 301(2): 298–307. Tweeddale, H., L. Notley-McRobb, et al. 1998. Effect of slow growth on metabolism of Escherichia coli, as revealed by global metabolite pool (“metabolome”) analysis. Journal of Bacteriology, 180(19): 5109–16. van Dam, J. C., M. R. Eman, et al. 2002. Analysis of glycolytic intermediates in Saccharomyces cerevisiae using anion exchange chromatography and electrospray ionization with tandem mass spectrometric detection. Analytica Chimica Acta, 460(2): 209–18. van den Berg, R. A., H. C. J. Hoefsloot, et al. 2006. 
Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics, 7: 142. van Winden, W. A., J. C. van Dam, et al. 2005. Metabolic-flux analysis of Saccharomyces cerevisiae CEN. PK113-7D based on mass isotopomer measurements of C-13-labeled primary metabolites. FEMS Yeast Research, 5(6–7): 559–68.

19-18

Tools for Experimentally Determining Flux through Pathways

Vaseghi, S., A. Baumeister, et al. 1999. In vivo dynamics of the pentose phosphate pathway in Saccharomyces cerevisiae. Metabolic Engineering, 1: 128–40. Villas-Boas, S. G., S. Mas, et al. (2005a). Mass spectrometry in metabolome analysis. Mass Spectrometry Reviews, 24(5): 613–46. Villas-Boas, S. G., J. Hojer-Pedersen, et al. 2005b. Global metabolite analysis of yeast: evaluation of sample preparation methods. Yeast, 22(14): 1155–69. Visser, D., J. W. Schmid, et al. 2004. Optimal re-design of primary metabolism in Escherichia coli using linlog kinetics. Metabolic Engineering, 6(4): 378–90. Visser, D., G. A. van Zuylen, et al. 2002. Rapid sampling for analysis of in vivo kinetics using the BioScope: A system for continuous-pulse experiments. Biotechnology and Bioengineering, 79(6): 674–81. Voit, E. O., J. Almeida, et al. 2006. Regulation of glycolysis in Lactococcus lactis: an unfinished systems biological case study. IEE Proceedings Systems Biology, 153(4): 286–98. Wagner, C., M. Sefkow, et al. 2003. Construction and application of a mass spectral and retention time index database generated from plant GC/EI-TOF-MS metabolite profiles. Phytochemistry, 62(6): 887-900. Wahl, S. A., M. D. Haunschild, et al. 2006. Unravelling the regulatory structure of biochemical networks using stimulus response experiments and large-scale model selection. IEE Proceedings Systems Biology, 153(4): 275–85. Wang, Q. Z., C. Y. Wu, et al. 2006. Integrating metabolomics into a systems biology framework to exploit metabolic complexity: strategies and applications in microorganisms. Applied Microbiology and Biotechnology, 70(2): 151–61. Welthagen, W., R. A. Shellie, et al. 2005. Comprehensive two dimensional gas chromatography – time of flight mass spectrometry, GCxGC-TOF for high resolution metabolomics: Biomarker discovery on spleen tissue extracts of obese NZO compared to lean C57BL/6 mice. Metabolomics, 1(1): 65–73. Wendisch, V. F., M. Bott, et al. 2006. 
Emerging Corynebacterium glutamicum systems biology. Journal of Biotechnology, 124(1): 74–92. Wiechert, W. 2001. C-13 metabolic flux analysis. Metabolic Engineering, 3(3): 195–206. Wiechert, W. 2002. Modeling and simulation: tools for metabolic engineering. Journal of Biotechnology, 94(1): 37–63. Wiechert, W. and R. Takors. 2004. Validation of metabolic models: concepts, tools and problems. In Metabolic Engineering in a Post Genomic Era. H. V. Westerhoff and B. N. Kholodenko, Editors, Horizon Bioscience, Wymondham, England. Winkler, H. 1920. Verbreitung and Ursache der Parthenogenesis in Pflanzen-und Tierreiche Jena, Fischer Verlag. Wittmann, C., J. O. Kromer, et al. 2004. Impact of the cold shock phenomenon on quantification of intracellular metabolites in bacteria. Analytical Biochemistry, 327(1): 135–39. Womersley, C. 1981. A micromethod for the extraction and quantitative-analysis of free carbohydrates in nematode tissue. Analytical Biochemistry, 112(1): 182–89. Wu, L., M. R. Mashego, et al. 2005. Quantitative analysis of the microbial metabolome by isotope dilution mass spectrometry using uniformly 13C-labeled cell extracts as internal standards. Analytical Biochemistry, 336(2): 164–71. Yang, C., Q. Hua, et al. 1999. Development of a kinetic model for L-lysine biosynthesis in Corynebacterium glutamicum and its application to metabolic control analysis. Journal of Bioscience and Bioengineering, 88(4): 393–403. Zimmer, D. 2003. Introduction to quantitative liquid chromatography-tandem mass spectrometry (LC-MS-MS). Chromatographia, 57: S325–32.

20 Use of AEX-HPLC-ESI-MS for 13C-Labeling Based Metabolic Flux Analysis in Saccharomyces cerevisiae and Penicillium chrysogenum

Wouter A. van Winden, Delft University of Technology
Roelco J. Kleijn, Delft University of Technology
Walter M. van Gulik, Delft University of Technology
Joseph J. Heijnen, Delft University of Technology

20.1 Introduction
Metabolic Flux Analysis • 13C-Tracer Experiments

20.2 Methods
Experimental • Flux Analysis by Whole Isotopomer Modeling • Local Node Flux Analysis

20.3 Results and Discussion
Analysis of Fluxes in Glycolysis and Pentose Phosphate Pathway of Saccharomyces cerevisiae by Whole Isotopomer Modeling • Determination of the Pentose Phosphate Pathway Split Ratio in Penicillium chrysogenum by Local Node Flux Analysis

20.4 Conclusions

References

20.1 Introduction

20.1.1 Metabolic Flux Analysis

Metabolic reaction rates are an important phenotypic characteristic of microbial cells. These rates (from here on also termed “fluxes”) are a function of the environment and the genetic code of the cell. Metabolic flux analysis aims to determine how fluxes change in response to changes in the environment (e.g., different carbon substrates) or to gene deletions or insertions. Metabolic fluxes can only be calculated if the stoichiometry of the biochemical pathways in which the fluxes are to be determined is known. This stoichiometric information is typically organized in a systematic linear algebraic format to give mass balances of all intracellular metabolites:

\[
\frac{dX(t)}{dt} = S \cdot v(t) \qquad (20.1)
\]


where X is the vector containing all intracellular metabolite concentrations, S is the time-invariant stoichiometry matrix, and v is the vector containing all reaction rates. Metabolic fluxes are mostly determined for cells growing in a so-called (pseudo-)steady state, in which the left-hand term of Equation 20.1 is insignificant compared to the fluxes v. In that case, the mass balances become simple linear relations between the fluxes:

\[
0 = S \cdot v(t) \qquad (20.2)
\]

which can be solved if sufficient substrate uptake and product excretion rates are experimentally determined and if S has full rank. The latter is not always the case. Causes of unobservable intracellular fluxes include: (i) parallel reaction pathways, (ii) bidirectional reactions, and (iii) cyclic reaction pathways. A well-known example of parallel reaction pathways is glycolysis and the pentose phosphate pathway (PPP), shown in Figure 20.1. Both routes convert glucose 6-phosphate to glyceraldehyde 3-phosphate. One could argue that these two routes are not strictly parallel, since the PPP produces carbon dioxide as a side product and glycolysis does not. Moreover, the PPP reduces the cofactor NADP+ to NADPH, whereas glycolysis reduces the cofactor NAD+ to NADH. As the production of CO2 in the PPP comes at the expense of the flux through glycolysis, flux through the PPP leads to less production of NADH via glycolysis. Therefore, mass balances of CO2, NADH, or NADPH could help to resolve the fluxes through both pathways, provided that all remaining sinks and sources of CO2, NADH, or NADPH are accurately known. This is usually not the case: all three metabolites are involved in many intracellular reactions, some of which are hard to quantify (such as anaplerotic carboxylation reactions), and for many of which the exact specificity of the enzymes for NADH or NADPH as a cofactor is not firmly established (especially if the cell contains isozymes with different cofactor specificities). In practice, therefore, the mass balances of these three metabolites can only be used to determine the flux partitioning between glycolysis and the PPP at the risk of large inaccuracies.

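[Figure 20.1: pathway diagram with metabolite nodes g6p, f6p, and g3p, and by-product and cofactor labels CO2, NADP+/NADPH, and NAD+/NADH]

Figure 20.1 A lumped representation of the glycolytic and pentose phosphate pathways. f6p = fructose 6-phosphate; g3p = glyceraldehyde 3-phosphate; g6p = glucose 6-phosphate.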
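The observability problem described above can be made concrete with a small numerical sketch. The network below is a hypothetical toy example, not the stoichiometry used in this chapter: a substrate is taken up into pool A, converted to pool B by two parallel routes, and B is excreted. The function name, the flux values, and the by-product C are all assumptions chosen for illustration. The columns of S are split into measured and unknown fluxes, so that Equation 20.2 becomes S_u v_u = -S_m v_m, and the rank of S_u shows whether the unknown fluxes are uniquely determined.

```python
import numpy as np

def solve_unknown_fluxes(S, measured, unknown, v_measured):
    """Solve 0 = S v for the unknown fluxes, given the measured ones.

    Splitting the columns of S gives S_u v_u = -S_m v_m. Returns the rank
    of S_u and a least-squares solution (the minimum-norm one if S_u is
    rank deficient, in which case v_u is not uniquely determined).
    """
    S = np.asarray(S, dtype=float)
    S_m, S_u = S[:, measured], S[:, unknown]
    rank = int(np.linalg.matrix_rank(S_u))
    rhs = -S_m @ np.asarray(v_measured, dtype=float)
    v_u, *_ = np.linalg.lstsq(S_u, rhs, rcond=None)
    return rank, v_u

# Hypothetical toy network: uptake -> A, two parallel routes A -> B, B -> excretion.
# Columns: v_in, v1, v2, v_out. Rows: A, B.
S_parallel = [[1, -1, -1,  0],
              [0,  1,  1, -1]]
rank_a, v_a = solve_unknown_fluxes(S_parallel, [0, 3], [1, 2], [10.0, 10.0])

# Same network, but route 2 also releases a by-product C that is excreted
# and measured (extra column v_c, extra row C).
S_byproduct = [[1, -1, -1,  0,  0],
               [0,  1,  1, -1,  0],
               [0,  0,  1,  0, -1]]
rank_b, v_b = solve_unknown_fluxes(S_byproduct, [0, 3, 4], [1, 2], [10.0, 10.0, 3.0])

print(rank_a, v_a.sum())  # rank 1: only the sum v1 + v2 is determined
print(rank_b, v_b)        # rank 2: v1 and v2 are individually determined
```

In the second case the by-product balance plays the role that a CO2 balance could play for the PPP. As argued above, such a balance is only as reliable as the knowledge of all other sources and sinks of that metabolite, which this toy network simply assumes to be absent.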


20.1.2 13C-Tracer Experiments

Fluxes through parallel pathways cannot be resolved from conventional mass balances, because the pathways have identical overall stoichiometry with respect to the metabolites for which complete mass balances can be made. If the carbon atoms in the common substrate of the parallel pathways end up at different positions in the common end product, depending on the route followed, this difference can be used to resolve the fluxes. To do so, the transitions of individual carbon atoms through the reaction pathways have to be traced. This is done by labeling individual carbon atoms with the radioactive isotope 14C or the stable isotope 13C. Mass balances of metabolic intermediates are then refined to mass balances of the fractions of metabolites with a specific 13C- or 14C-labeling pattern (the so-called “isotopomers”). Conventional flux analysis requires only the reaction stoichiometry as biochemical information and the measured net substrate uptake and product excretion rates as experimental data. Labeling-based metabolic flux analysis additionally requires information on the carbon transitions in all reactions and measurements of the label distributions within metabolites.

Although 14C has the large advantage that it can be detected very sensitively, its radioactivity is a major experimental inconvenience. During the past decade, 13C has become by far the most commonly used metabolic tracer isotope, as its stability allows its use in normal experimental set-ups. Moreover, the available detection techniques for 13C have the potential to yield more information than those for 14C. A molecule with N carbon atoms has 2^N isotopomers. For 14C, scintillation counting can only determine the fractional labeling of the N carbon positions of the molecule, which yields N independent data points on the 2^N isotopomer fractions. For 13C, mass spectrometry (MS), possibly combined with fragmentation of the molecules, or different applications of nuclear magnetic resonance (NMR) spectroscopy can yield many more independent data points on the isotopomer fractions, sometimes enough to determine all 2^N isotopomer fractions. Examples of the isotopomer information that can be obtained using MS and NMR are illustrated in Figure 20.2.

The mathematical aspects of setting up and solving isotopomer balances are described in Zupke and Stephanopoulos [1], Schmidt et al. [2], Wiechert et al. [3], and Möllney et al. [4] and will not be discussed in detail in this chapter. The focus of this contribution is the description of the background and application of a new, LC-MS based analytical method for measuring isotopomer distributions of metabolic intermediates. The power of this analytical method is illustrated by two alternative theoretical and experimental strategies that aim to determine the flux partitioning between glycolysis and the PPP in the microorganisms Saccharomyces cerevisiae and Penicillium chrysogenum.
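The counting argument above can be illustrated with a short sketch for a three-carbon compound. The isotopomer fractions used here are hypothetical values chosen for illustration, and the function names are not taken from any flux analysis package. The code enumerates the 2^N labeling patterns and then derives two kinds of lumped measurements from them: the N positional enrichments (the type of information that positional fractional labeling provides) and the N + 1 mass isotopomer fractions that an MS measurement of the intact molecule groups together.

```python
from itertools import product

def isotopomers(n_carbons):
    """All labeling patterns of an n-carbon molecule (0 = 12C, 1 = 13C)."""
    return list(product((0, 1), repeat=n_carbons))

def positional_enrichment(fractions, n_carbons):
    """Fraction of molecules labeled at each carbon position (N values)."""
    return [sum(f for iso, f in fractions.items() if iso[i] == 1)
            for i in range(n_carbons)]

def mass_distribution(fractions, n_carbons):
    """Fractions grouped by number of 13C atoms, m+0 ... m+N (as seen by MS)."""
    dist = [0.0] * (n_carbons + 1)
    for iso, f in fractions.items():
        dist[sum(iso)] += f
    return dist

N = 3
# Hypothetical isotopomer fractions; patterns not listed have fraction zero.
fractions = {(0, 0, 0): 0.5, (1, 0, 0): 0.2, (1, 1, 0): 0.1, (1, 1, 1): 0.2}

n_isotopomers = len(isotopomers(N))
enrichment = positional_enrichment(fractions, N)
mass_dist = mass_distribution(fractions, N)
print(n_isotopomers, enrichment, mass_dist)
```

Both lumped data sets are far smaller than the full set of isotopomer fractions, which is why fragment spectra or NMR fine structures are needed when more of the isotopomer distribution has to be resolved.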

20.2 Methods

20.2.1 Experimental

To study metabolic fluxes in a microorganism, the organism has to be cultivated under defined and well-controlled conditions. A chemostat cultivation system is highly suitable for this, as it allows control of the growth rate (and of all other rates that are stoichiometrically coupled to the growth rate) via the substrate feeding rate. In a chemostat labeling experiment, the culture is initially fed with unlabeled medium until steady state has been reached. At that moment, the unlabeled medium is replaced by a chemically identical medium in which the carbon substrate has been replaced by 13C-labeled substrate. In this way the metabolic steady state is not perturbed, but the 12C atoms in the system are replaced by 13C atoms until a new isotopic steady state has been established, i.e., until the isotopomer fractions no longer change.

After the medium switch, the 13C-labeled substrate molecules that are taken up by the cell will initially replace the 12C of the free intracellular metabolite pools. These pools typically have turnover times (pool size divided by the metabolic throughput) on the order of seconds to minutes. When the metabolic intermediates gradually become 13C-labeled, the macromolecular cell components (protein, lipid, polysaccharide, RNA, and DNA) that are synthesized from these precursors will also start accumulating 13C. These macromolecules may undergo turnover (depolymerization and repolymerization), but their monomeric building blocks are generally considered to be synthesized unidirectionally from their precursors in central carbon metabolism. Therefore, replacement of 12C by 13C in these macromolecules takes place mostly by formation of new biomass and washout of old biomass from the chemostat, which takes many hours, depending on the reciprocal growth rate.

Figure 20.2 Four examples of MS and NMR data on the isotopomer distribution of a three-carbon compound (a metabolite). The circles denote carbon atoms: open circles 12C, closed circles 13C. (a) 1H-NMR spectrum of the central carbon atom of the metabolite. Protons bound to a 12C atom give a singlet peak; protons bound to a 13C atom give two doublet peaks. (b) 13C-NMR spectrum of the central carbon atom of the metabolite. Isotopomers containing a central 12C give no signal. A central 13C bound to two 12C gives a singlet peak, a central 13C bound to one 12C and one 13C gives two doublet peaks, and a central 13C bound to two 13C gives four double doublet peaks. (c) Mass spectrum of the metabolite. The unlabeled isotopomer gives a peak at the lowest m/z. Groups of isotopomers with an increasing number of 13C atoms give peaks at increasing m/z. (d) Mass spectrum of a fragment of the metabolite consisting of its last two carbon atoms. The unlabeled fragment isotopomer gives a peak at the lowest m/z. Groups of fragment isotopomers with an increasing number of 13C atoms give peaks at increasing m/z.

Almost all 13C-labeling studies published thus far present data on isotopomer distributions of the macromolecular cell component protein, obtained with GC-MS or NMR. The reason for this is that protein constitutes roughly half of the dry matter in biomass and therefore gives large measurement signals for a given sampled amount of biomass. Moreover, proteins consist of amino acids, which are synthesized from a large variety of precursors originating from glycolysis, the PPP, and the TCA cycle [5]. Therefore, of all macromolecular cell constituents, proteins contain the most (indirect) information on the isotopomer distributions of the intermediates of central carbon metabolism.
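The difference in time scales between free metabolite pools and biomass can be estimated with a simple first-order sketch. It rests on strong assumptions that are not made in the text: a single well-mixed pool of constant size that is fed with fully labeled material from the moment of the medium switch, so that the labeled fraction follows 1 - exp(-t/tau), with tau the turnover time. For biomass, tau is taken as the reciprocal of the dilution rate, ignoring macromolecule turnover and the delay before the precursors themselves are labeled. All numerical values are hypothetical.

```python
import math

def labeled_fraction(t, tau):
    """13C fraction of a well-mixed pool at time t after the switch to labeled feed."""
    return 1.0 - math.exp(-t / tau)

def time_to_fraction(f, tau):
    """Time needed for the pool to reach labeled fraction f (0 <= f < 1)."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must satisfy 0 <= f < 1")
    return -tau * math.log(1.0 - f)

# Hypothetical numbers, for illustration only.
tau_metabolite_s = 60.0                      # metabolite pool turnover time, seconds
dilution_rate_per_h = 0.1                    # chemostat dilution rate, 1/h
tau_biomass_h = 1.0 / dilution_rate_per_h    # biomass washout time constant, hours

t95_metabolite_s = time_to_fraction(0.95, tau_metabolite_s)
t95_biomass_h = time_to_fraction(0.95, tau_biomass_h)
print(f"{t95_metabolite_s:.0f} s, {t95_biomass_h:.0f} h")
```

Under these assumptions, reaching 95% labeling takes about three time constants: a few minutes for a metabolite pool with a one-minute turnover time, against roughly thirty hours for biomass at a dilution rate of 0.1 per hour. Pools further downstream in a real network label more slowly than this single-pool estimate suggests.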


Use of AEX-HPLC-ESI-MS for 13C-Labeling Based Metabolic Flux Analysis

Because of the slow attainment of isotopic steady state of cell proteins, chemostat cultures are fed with 13C-labeled medium during multiple retention times before sampling. For the same reason, this type of 13C-labeling based metabolic flux analysis is in principle suitable only for experimental systems in which a metabolic steady state is maintained for a long time; it cannot be used to monitor metabolic transients. A few studies have recently been published in which the 13C-labeling distribution of central metabolic intermediates is directly assessed by means of LC-MS.6,7 This direct measurement of the 13C-labeling of the intermediates of central carbon metabolism removes the need to make biosynthetic assumptions on the metabolic origins of the amino acids in protein hydrolysate that are analyzed by GC-MS or NMR, and thereby removes a potential source of errors. For example, depending on the organism under study and the experimental conditions, the labeling distribution of the amino acid glycine may alternatively stem from serine, threonine, glyoxylate, or from CO2 plus methylenetetrahydrofolate, but no 13C-labeling study published thus far has considered all these options. Because their turnover times are much shorter than that of cell protein, pools of central metabolic intermediates attain isotopic steady state within minutes, and their analysis therefore in principle allows (i) shorter 13C-labeled feed supply when analyzing steady state fluxes in a chemostat system and (ii) the analysis of fluxes in non-steady-state conditions with time constants of the metabolic changes in the order of tens of minutes to hours. The high turnover rates of the free intracellular metabolites impose an important constraint on the experimental procedure: biomass that is sampled for the analysis of the labeling distribution of intermediates has to be instantaneously quenched (i.e., metabolism has to be stopped).
This requires rapid sampling hardware and quenching and processing protocols that allow the extraction of intracellular metabolites under conditions where cell metabolism remains inactivated. Figure 20.3 summarizes the

[Figure 20.3 flowchart, box labels: chemostat fed with 13C-labeled medium and air; fermentation data (uptake, growth, and production rates); sample rapidly 1 ml broth; mix with 5 ml 60% v/v methanol/water in cryostat (–40°C); centrifuge twice (–20°C, 2,000g, 5 min); waste; boil in 5 ml 75% v/v ethanol/water (>95°C), 3 min; dry under vacuum in rapidvap (110 min for yeast, 45 min for penicillium); resuspend pellet in 0.5 ml Milli-Q H2O, centrifuge (3,000g, 5 min), and freeze supernatant at –80°C; analyze isotopomer distribution of metabolites with LC-MS; estimate metabolic fluxes based on fermentation data and isotopomer distributions of metabolites.]

Figure 20.3 Summary of entire 13C-labeling experiment including the fermentation, the sampling, and sample processing protocol, LC-MS analysis of isotopomer distributions of intracellular metabolites and the data analysis.


Tools for Experimentally Determining Flux through Pathways

procedure. The analytical platform used in the studies presented in this chapter is HPLC coupled to a triple quadrupole mass spectrometer. The employed HPLC method is anion exchange (AEX), which chromatographically separates intracellular sugar phosphates and organic acids. The interface with the MS is electrospray ionization (ESI) operated in negative mode. In order to maximize the sensitivity, the mass spectrometer is operated in single ion recording (SIR) mode. For further details see Van Winden et al.6 and Kleijn et al.7 and references therein.

20.2.2 Flux Analysis by Whole Isotopomer Modeling

Any type of measurement of 13C-label distributions (be it NMR, LC-MS, or GC-MS) in any type of cell component (be it excreted products, intracellular metabolites, or cell protein) contains information on the metabolic flux distribution. However, with the exception of fractional enrichments (see, e.g., Marx et al.8), the relations between the metabolic fluxes and the measurement data are non-linear. As a consequence, in all except very simple metabolic networks the fluxes cannot be expressed as an explicit function of the data. 13C-labeling experiments typically yield much data. Often, these data are (partly) redundant, i.e., they overlap in terms of information content. An obvious example is the measurement of the 13C-labeling distributions of the amino acids phenylalanine and tyrosine in protein hydrolysate. Both these amino acids are formed from the same precursors of central carbon metabolism: two molecules of phosphoenol pyruvate and a molecule of erythrose 4-phosphate. In less trivial cases, molecules of which the labeling distribution is measured only share fragments of common precursors. One would like to include all available data, even if they are redundant, to improve the statistics of the estimated fluxes. The above two challenges (non-linearity and redundancy) are solved by whole isotopomer modeling. In this approach, a flux set is chosen based on which all isotopomers of all metabolites in the studied metabolic network are simulated. As the isotopomer distributions contain the maximal amount of information on 13C-labeling distributions in the network, the output of any kind of (partly redundant) measurement can be derived from these. The difference between these simulated data and the actual measurements is the objective function of an estimation procedure that searches for the flux set that minimizes the objective function.
Examples of this approach have been reported by Van Winden et al.,6 Schmidt et al.,9 Petersen et al.,10 and Dauner et al.11 The whole isotopomer modeling approach has three important disadvantages:

• The network of which the isotopomer distributions are to be simulated must be complete. Any metabolite that is converted to a component of the studied network influences the isotopomer distribution of that component (and of reaction products thereof) and therefore has to be included in the studied network. Genome-wide isotopomer models have not been published to date. Therefore, all current state-of-the-art models are, by necessity, to some extent simplified or incomplete. The omitted components and reactions may cause inaccuracies in the estimated fluxes.

• As a result of the above, network models of which the isotopomers are to be simulated often contain tens of metabolites and will contain hundreds when genome-scale models become the standard. As explained above, a metabolite with N carbon atoms has 2^N isotopomers. Consequently, whole isotopomer models typically contain many hundreds to thousands of isotopomer fractions (i.e., state variables) that have to be recalculated for hundreds of parameter sets during the flux estimation. This is computationally demanding.

• Since all simulated data are simultaneously fitted to the measurements and are interrelated via complex dependencies on the metabolic fluxes, a single systematic measurement error or a model error propagates to the estimated fluxes (far) beyond the part of the network model where the error itself is located.
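The combinatorial growth mentioned in the second point can be made concrete with a short sketch. The function names `isotopomers` and `idv_to_mdv` are illustrative and are not taken from any published flux analysis package; the code only enumerates labeling patterns and lumps them by the number of 13C atoms, which is the kind of measurement mapping a whole isotopomer model has to evaluate for every simulated pool.

```python
from itertools import product


def isotopomers(n_carbons):
    """Return all labeling patterns as tuples of 0 (12C) and 1 (13C)."""
    return list(product((0, 1), repeat=n_carbons))


def idv_to_mdv(idv, n_carbons):
    """Lump an isotopomer distribution vector into a mass distribution vector.

    idv is ordered like isotopomers(n_carbons); entry k of the result is the
    summed fraction of all isotopomers that carry exactly k 13C atoms.
    """
    patterns = isotopomers(n_carbons)
    if len(idv) != len(patterns):
        raise ValueError("IDV length must equal 2**n_carbons")
    mdv = [0.0] * (n_carbons + 1)
    for pattern, fraction in zip(patterns, idv):
        mdv[sum(pattern)] += fraction
    return mdv


# The number of state variables per metabolite grows as 2**N.
for n in (2, 3, 6, 7):
    print(n, "carbons:", len(isotopomers(n)), "isotopomers")

# Two-carbon example with the IDV ordered as (00, 01, 10, 11);
# the fractions are made-up illustration values.
print(idv_to_mdv([0.5, 0.2, 0.1, 0.2], 2))
```

A hexose phosphate alone already contributes 64 isotopomer fractions, so a network with a few tens of such pools reaches the "hundreds to thousands" of state variables mentioned above.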


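20.2.3 Local Node Flux Analysis

A whole isotopomer model is the complete set of balances of each isotopomer fraction of each individual metabolite pool in the studied network. While the previously described method relies on solving all these balances simultaneously, in special cases the 13C-labeling balances of individual metabolite pools can be solved in isolation. This is the case when all labeling fractions in the given balance are measured. In such a case, a ratio of multiple fluxes flowing into a single metabolite pool can be calculated. A simple example is given below:

\[
A \xrightarrow{\;v_1\;} C \xleftarrow{\;v_2\;} B, \qquad C \xrightarrow{\;v_3\;}
\]

\[
v_1 \cdot \mathrm{IDV}_A + v_2 \cdot \mathrm{IDV}_B = v_3 \cdot \mathrm{IDV}_C
\;\overset{\text{given metabolic steady state}}{=}\;
(v_1 + v_2) \cdot \mathrm{IDV}_C
\;\Leftrightarrow\;
\frac{v_1}{v_2} = -\,\frac{\mathrm{IDV}_B - \mathrm{IDV}_C}{\mathrm{IDV}_A - \mathrm{IDV}_C}
\tag{20.3}
\]

where IDV denotes “isotopomer distribution vector” and the quotient of vectors is taken element by element. In practice, no experiment yields complete IDVs. However, some experiments yield linear transformations of IDVs, whereby the linear transformations of Equation 20.3 can be solved. Two examples are given below for

\[
\mathrm{IDV}_i = \begin{pmatrix} i_{00} \\ i_{01} \\ i_{10} \\ i_{11} \end{pmatrix},
\qquad \text{where } i = A, B, C = \text{two carbon compound.}
\]

(I) Mass spectroscopic measurements:

\[
\mathrm{MDV}_i = \begin{pmatrix} f_{i,0} \\ f_{i,1} \\ f_{i,2} \end{pmatrix} = T_{MS} \cdot \mathrm{IDV}_i,
\qquad \text{where } T_{MS} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\]

\[
\frac{v_1}{v_2}
= -\,\frac{T_{MS} \cdot \mathrm{IDV}_B - T_{MS} \cdot \mathrm{IDV}_C}{T_{MS} \cdot \mathrm{IDV}_A - T_{MS} \cdot \mathrm{IDV}_C}
= -\,\frac{\mathrm{MDV}_B - \mathrm{MDV}_C}{\mathrm{MDV}_A - \mathrm{MDV}_C}
\tag{20.4}
\]

where MDV denotes “mass distribution vector” and f_{i,x} is the fraction of metabolite i with an m/z of the unlabeled metabolite plus x.

(II) Fractional enrichment measurements:

\[
\mathrm{FEV}_i = \begin{pmatrix} f_{i,1} \\ f_{i,2} \end{pmatrix} = T_{FE} \cdot \mathrm{IDV}_i,
\qquad \text{where } T_{FE} = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix}
\]

\[
\frac{v_1}{v_2}
= -\,\frac{T_{FE} \cdot \mathrm{IDV}_B - T_{FE} \cdot \mathrm{IDV}_C}{T_{FE} \cdot \mathrm{IDV}_A - T_{FE} \cdot \mathrm{IDV}_C}
= -\,\frac{\mathrm{FEV}_B - \mathrm{FEV}_C}{\mathrm{FEV}_A - \mathrm{FEV}_C}
\tag{20.5}
\]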


where FEV denotes “fractional enrichment vector” and fi,x is the fractional enrichment of carbon position x of metabolite i. An example of case I is given in Fischer and Sauer12 and an example of case II is given in Petersen et al.10 13C-NMR multiplet data can be used for the same purpose, but this requires a prior conversion of the NMR relative intensities to fractions of intact substrate fragments in the studied metabolites, the explanation of which is beyond the scope of this contribution. Details on this method are given in a publication by Szyperski.13 Szyperski’s method for local node flux analysis has found application in a flux analysis platform called “Metafor.”14 The local node flux analysis does not suffer from any of the disadvantages of the whole isotopomer modeling approach that were listed at the end of the previous section. This makes the local node approach the preferred option when one is interested only in the flux ratio at a specific metabolic node, especially if the remaining network model is incomplete. On the other hand, if one is interested in the general flux distribution in a larger network, the local node analysis has the disadvantage that it does not allow the reconciliation of partially overlapping data. Often, only a small part of the available 13C-labeling information is used in the local node flux analysis. Moreover, when not all 13C-labeling data needed to calculate the balance (e.g., the MDVs in Equation 20.4 or the FEVs in Equation 20.5) are available, the missing labeling data are often inferred from those of metabolites upstream or downstream of the node. Such inference frequently relies on assumptions, which can invalidate the outcome of the local node flux analysis. As an example, Petersen et al.10 compared flux ratios obtained by whole isotopomer modeling with those calculated by local node analysis.
The estimates of the fraction of oxaloacetate stemming from phosphoenol pyruvate yielded by both methods differed by a factor of 2.6. This was explained by the fact that, in contrast to the whole isotopomer modeling approach, in the local node analysis it had to be assumed that no oxaloacetate was reversibly converted to fumarate, a symmetrical molecule of which the labeling of the first and second positions is fully scrambled with the fourth and third positions.10
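The balance of Equation 20.4 can be turned into a small calculation. The sketch below is a minimal illustration, not the procedure used in the cited studies. It assumes that the three mass distribution vectors of the node A → C ← B are available and, since noisy data give a slightly different ratio for every mass fraction, it combines all elements by least squares (an assumption made here). The function name `flux_ratio` and the example numbers are hypothetical.

```python
def flux_ratio(mdv_a, mdv_b, mdv_c):
    """Estimate v1/v2 for the convergent node A -> C <- B from MDVs.

    Writes MDV_C = phi * MDV_A + (1 - phi) * MDV_B with
    phi = v1 / (v1 + v2) and solves for phi by least squares
    over all mass fractions, then returns phi / (1 - phi).
    """
    d = [a - b for a, b in zip(mdv_a, mdv_b)]  # MDV_A - MDV_B
    y = [c - b for c, b in zip(mdv_c, mdv_b)]  # MDV_C - MDV_B
    denom = sum(x * x for x in d)
    if denom == 0.0:
        raise ValueError("A and B are identically labeled: ratio not identifiable")
    phi = sum(x * z for x, z in zip(d, y)) / denom
    if phi >= 1.0:
        raise ValueError("no detectable contribution of B to C")
    return phi / (1.0 - phi)


# Hypothetical two-carbon example: A unlabeled, B fully labeled,
# C formed from three parts A and one part B.
print(flux_ratio([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.75, 0.0, 0.25]))
```

The explicit check on the denominator reflects the requirement stated above: the ratio can only be resolved when the two inflowing pools differ in their labeling.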

20.3 Results and Discussion

20.3.1 Analysis of Fluxes in Glycolysis and Pentose Phosphate Pathway of Saccharomyces cerevisiae by Whole Isotopomer Modeling

All glucose taken up from the medium by the microorganisms S. cerevisiae and P. chrysogenum is processed either via glycolysis or via the PPP (Figure 20.1). As discussed in the introduction, both pathways yield glyceraldehyde 3-phosphate but differ in the produced cofactors: whereas glycolysis yields more ATP and NADH (which is reoxidized to NAD+ via oxidative phosphorylation to form more ATP), the PPP is an important source of NADPH (which is reoxidized to NADP+ in many anabolic processes). Therefore, the partitioning of the fluxes over these two pathways is an important phenotypic characteristic. The partitioning of the fluxes over glycolysis and the PPP in S. cerevisiae grown in an aerobic, carbon-limited chemostat culture at a growth rate of 0.1 hr⁻¹ was determined by means of the whole isotopomer modeling approach, in which the fluxes were estimated by fitting the simulated mass isotopomer distributions of ten intermediates of the two pathways to mass isotopomer distributions that were measured by means of LC-MS.6 The simulated metabolic network model included an extended version of the traditional representation of the PPP, as proposed by Van Winden et al.15 The applied substrate labeling was a combination of 100% 1-13C1 labeled glucose and 100% u-13C2 labeled ethanol. As explained in Section 20.2, when one directly assesses the labeling state of the intermediates of the central carbon metabolic pathways, their high turnover rate allows for relatively short 13C-labeled feed supply. Whereas in typical labeling studies labeled feed is supplied for three to four times the reciprocal growth rate (i.e., 30–40 hr in this case) to ensure isotopic steady state of all cell components, in this study the labeled feed supply was limited to only 1 hr.
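The labeling times quoted above can be related to biomass washout with a back-of-the-envelope calculation. The sketch assumes an ideally mixed chemostat at steady state and no turnover of macromolecules, so that pre-existing unlabeled biomass disappears by first-order washout at the dilution rate; the function name is illustrative.

```python
import math


def unlabeled_biomass_fraction(dilution_rate_per_hr, labeling_time_hr):
    """Fraction of pre-existing (unlabeled) biomass left in an ideal chemostat.

    Assumes first-order washout: fraction = exp(-D * t).
    """
    return math.exp(-dilution_rate_per_hr * labeling_time_hr)


D = 0.1  # dilution rate in 1/hr, as in the S. cerevisiae culture
for t in (1, 30, 40):
    frac = unlabeled_biomass_fraction(D, t)
    print(f"{t:>2} hr of labeled feed: {frac:.3f} of the old biomass remains")
```

Under these assumptions roughly 5% of the old biomass is left after 30 hr and about 2% after 40 hr, whereas about 90% is still present after 1 hr. This is why a 1 hr labeling period is only usable when the labeling of fast-turnover intermediates, rather than of protein, is measured.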


The measured and fitted mass isotopomer distributions are given in Table 20.1. It can be seen that the measured mass isotopomer distributions are well fitted by the model. Given the standard deviations of the measurements, the fit is statistically accepted. The metabolic fluxes that yielded the minimal deviation between the measured and simulated mass isotopomer distributions are shown in Figure 20.4. It can be seen that the whole isotopomer modeling approach yields all fluxes in the metabolic network,

Table 20.1 Simulated and Measured Mass Isotopomer Fractions of Intermediates of Glycolysis and Pentose Phosphate Pathway in S. cerevisiae

Measured Compound    Mass Fraction    Measured          Fitted
G6P (M = 259)        M + 0            0.192 ± 0.015     0.193
                     M + 1            0.697 ± 0.013     0.696
                     M + 2            0.092 ± 0.010     0.092
                     M + 3            0.020 ± 0.002     0.017
G1P (M = 259)        M + 0            0.332 ± 0.027     0.313
                     M + 1            0.610 ± 0.021     0.592
                     M + 2            0.059 ± 0.016     0.080
F6P (M = 259)        M + 0            0.202 ± 0.033     0.197
                     M + 1            0.684 ± 0.038     0.689
                     M + 2            0.092 ± 0.012     0.095
                     M + 3            0.021 ± 0.001     0.017
                     M + 4            0.001 ± 0.001     0.002
F16BP (M = 339)      M + 0            0.350 ± 0.027     0.351
                     M + 1            0.496 ± 0.026     0.498
                     M + 2            0.127 ± 0.015     0.130
                     M + 3            0.023 ± 0.002     0.018
                     M + 4            0.004 ± 0.001     0.003
2/3PG (M = 185)      M + 0            0.628 ± 0.016     0.632
                     M + 1            0.343 ± 0.016     0.345
                     M + 2            0.022 ± 0.001     0.018
                     M + 3            0.007 ± 0.000     0.005
PEP (M = 167)        M + 0            0.631 ± 0.020     0.634
                     M + 1            0.338 ± 0.021     0.345
                     M + 2            0.024 ± 0.002     0.017
                     M + 3            0.007 ± 0.001     0.004
6PG (M = 275)        M + 0            0.195 ± 0.006     0.192
                     M + 1            0.698 ± 0.006     0.695
                     M + 2            0.096 ± 0.004     0.093
                     M + 3            0.011 ± 0.006     0.018
P5P (M = 229)        M + 0            0.758 ± 0.012     0.756
                     M + 1            0.202 ± 0.012     0.202
                     M + 2            0.037 ± 0.004     0.038
                     M + 3            0.003 ± 0.003     0.004
E4P (M = 199)        M + 0            0.866 ± 0.005     0.862
                     M + 1            0.123 ± 0.005     0.119
                     M + 2            0.012 ± 0.003     0.017
S7P (M = 289)        M + 0            0.483 ± 0.013     0.480
                     M + 1            0.395 ± 0.014     0.398
                     M + 2            0.106 ± 0.007     0.101
                     M + 3            0.016 ± 0.001     0.019

Measured data were determined from a sample taken after 1 hr supply of 13C-labeled feed; averages and standard deviations of five repeated injections.


[Figure 20.4: flux map of glycolysis and the PPP with nodes glc, g1p, storage oligo/polysaccharide, g6p, f6p, fbp, g3p, p5p, e4p, and s7p, ending toward lower glycolysis/TCA cycle; fluxes are given relative to a glucose uptake of 100.]

Figure 20.4 The metabolic fluxes of S. cerevisiae growing in a carbon-limited, aerobic chemostat culture at D = 0.1 hr⁻¹ as determined by fitting the mass isotopomer fractions measured by LC-MS in biomass sampled 60 minutes after the shift to 13C-labeled medium. Values outside brackets represent net fluxes; the direction of the positive net flux is defined by the solid arrowhead. The values within brackets represent exchange fluxes. Double-headed arrows indicate reversible fluxes; the solid arrowhead defines the direction of net flux. The sub-network on the right-hand side forms part of the network on the left-hand side and is shown separately for visual clarity. e4p = erythrose 4-phosphate; fbp = fructose 1,6-bisphosphate; f6p = fructose 6-phosphate; glc = glucose; g1p = glucose 1-phosphate; g3p = glyceraldehyde 3-phosphate; g6p = glucose 6-phosphate; p5p = pentose 5-phosphate; s7p = sedoheptulose 7-phosphate. (Reprinted with permission from Van Winden, W.A. et al., FEMS Yeast Res., 5, 559, 2005.)

including the reversibility of bidirectional reactions. The available data on the majority of the intermediates of the two studied pathways do allow precise quantification of these reversibilities, whereas GC-MS or NMR data on the 13C-labeling of proteinogenic amino acids and storage carbohydrates (e.g., trehalose) would not have yielded any information on the labeling of the intermediates fructose 6-phosphate, fructose 1,6-bisphosphate, 6-phosphogluconate, and sedoheptulose 7-phosphate, since none of these serves as a precursor of a biosynthetic pathway. The LC-MS data/whole isotopomer modeling based estimation of the metabolic fluxes of S. cerevisiae illustrates one of the disadvantages of the whole isotopomer modeling approach, namely that all fluxes in the network can be affected by a local error or uncertainty. In this case, the uncertainty was related to the short supply of 13C-labeled feed to the culture. When comparing the mass isotopomer fractions of glucose 1-phosphate (g1p) measured in a sample taken after 1 hr of 13C-labeled feed supply (given in Table 20.1) with those measured in a sample taken after 0.67 hr of 13C-labeled feed supply (not shown), it was observed that the g1p pool had not yet reached isotopic steady state after 1 hr of labeling. This was possibly due to the slow turnover of a large intracellular pool of unlabeled glycogen. The possible inflow of unlabeled g1p stemming from glycogen was taken into account as an extra degree of freedom in the flux fit of which the results are shown in Figure 20.4. When analyzing the 95% confidence interval of the determined PPP split ratio of 24%, it was found that this interval (10–45%, note: different from the interval reported in Ref. 6, recalculated as proposed in Ref. 7) was considerably enlarged by the free influx of unlabeled hexose into the g1p pool. This can be understood by realizing that the oxidative branch of the PPP specifically splits off the 13C-labeled carbon of 1-13C1-glucose as 13CO2. 
Thus, the PPP decreases


the amount of 13C-label in the system, the effect of which could in this case be mimicked by the inflow of unlabeled hexose into the system via the g1p pool.
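How closely the fitted fractions in Table 20.1 follow the measurements can be checked per metabolite. The chapter does not state which test statistic was used to accept the fit, so the sketch below assumes a variance-weighted sum of squared residuals, a common choice for this kind of comparison, and ignores the fact that the mass fractions of one metabolite are not independent. The function name is illustrative; the numbers are the G6P and G1P rows of Table 20.1.

```python
def weighted_ssr(measured, std_dev, fitted):
    """Sum of squared residuals, each scaled by its measurement standard deviation."""
    return sum(((m - f) / s) ** 2 for m, s, f in zip(measured, std_dev, fitted))


# (measured, standard deviation, fitted) taken from Table 20.1
g6p = ([0.192, 0.697, 0.092, 0.020],
       [0.015, 0.013, 0.010, 0.002],
       [0.193, 0.696, 0.092, 0.017])
g1p = ([0.332, 0.610, 0.059],
       [0.027, 0.021, 0.016],
       [0.313, 0.592, 0.080])

for name, rows in (("G6P", g6p), ("G1P", g1p)):
    print(name, round(weighted_ssr(*rows), 2))
```

Every individual residual in these two rows is below two standard deviations, which agrees with the statement that the fit is accepted; rows with a reported standard deviation of 0.000 would need a minimum error to be imposed before they could be included.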

20.3.2 Determination of the Pentose Phosphate Pathway Split Ratio in Penicillium chrysogenum by Local Node Flux Analysis

In P. chrysogenum the PPP split ratio is an especially relevant physiological parameter, since this fungus has found large-scale industrial application for the production of its secondary metabolite penicillin, which creates an important demand for NADPH in addition to the NADPH demand for biomass synthesis. The PPP plays an essential role in meeting this demand. Kleijn et al.7 applied the same whole isotopomer modeling approach as discussed in the previous section to determine the PPP split ratio in an aerobic glucose-limited chemostat culture of P. chrysogenum, growing at 0.02 hr⁻¹ and producing 0.48 mmol penicillin G/Cmol biomass/hr. Based on the above observation that the fitted turnover rate of unlabeled storage carbohydrate into the g1p pool caused a large 95% confidence interval of the estimated split ratio of S. cerevisiae, Kleijn et al. ensured complete labeling of all cell components by supplying 13C-labeled feed (60% 1-13C1-glucose, 20% u-13C6-glucose and 100% u-13C2-acetate) during 150 hr (i.e., three residence times) before sampling for 13C-analysis. By fitting the measured mass isotopomer distributions of glycolytic and PPP intermediates (not shown here) they estimated the PPP split ratio to be 51% with a 95% confidence interval of 40–64%. In order to find out whether the metabolism-wide data reconciliation of the whole isotopomer modeling approach comes at the expense of the accuracy of flux ratios determined at a specific metabolic node, Kleijn et al. also determined the PPP split ratio in the same culture of P. chrysogenum by a local node flux analysis that focused on the distribution of fluxes around the glucose 6-phosphate (g6p) node.
Referring to Equation 20.3 it should be stressed that local node analysis is excellently suited for analyzing fluxes around convergent nodes (i.e., metabolite pools that have more than one influx) but not around divergent flux nodes (having multiple effluxes)16. Unfortunately, the PPP split ratio represents the partitioning of the two fluxes leaving the divergent g6p node (Figure 20.5). That is why existing local node analysis approaches such as Metafor never directly determine the split ratio, but instead give flux parameters that represent an upper bound of the flux through the PPP, such as the fraction of phosphoenol pyruvate that is formed by at least one action of the transketolase enzyme in the nonoxidative branch of the PPP.14
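The reason why a divergent node carries no information on its split ratio follows directly from the labeling balance. As a short sketch in the notation of Equation 20.3 (the symbols v_a and v_b for the two effluxes are introduced here for illustration only), consider a pool C with a single influx v_1 from A and two effluxes:

```latex
\[
v_1 \cdot \mathrm{IDV}_A = (v_a + v_b) \cdot \mathrm{IDV}_C ,
\qquad v_1 = v_a + v_b
\quad\Longrightarrow\quad
\mathrm{IDV}_C = \mathrm{IDV}_A \quad \text{for every value of } v_a / v_b .
\]
```

Both effluxes withdraw molecules with the same labeling as the pool itself, so the labeling of C is independent of how the flux is divided. Only a second influx with a different labeling, as in Equation 20.3, makes a flux ratio observable at the node.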

[Figure 20.5 scheme: glc →(v1) g6p; g6p →(v2) 6pg → PPP; gln →(v3) 6pg; g6p → glycolysis; PPP →(vx) g6p.]

Figure 20.5 The divergent g6p node has an inflow from glc and outflows toward glycolysis and PPP. The linear 6pg node becomes a convergent node by adding an additional inflow of gln (within dotted circle). glc = glucose; gln = gluconate; g6p = glucose 6-phosphate; 6pg = 6-phosphogluconate.


For this reason, Kleijn et al. co-fed glucose and a small amount (5% on Cmol basis) of gluconate to the mentioned culture of P. chrysogenum to create an additional influx into the 6-phosphogluconate (6pg) pool, which changed this linear node to a convergent node (Figure 20.5). By feeding a combination of unlabeled glucose and u-13C6-gluconate, the 13C-labeling of 6pg became a function of the inflow of “unlabeled” molecules with flux v2 and the inflow of u-13C-labeled molecules with flux v3. Note that due to the possible recycling of PPP intermediates to g6p (indicated by flux vx in Figure 20.5), some of the 13C-label entering the metabolism via 6pg also reaches the g6p pool; hence the molecules entering the 6pg pool with flux v2 are not strictly unlabeled. This recycling is taken into account when measuring the mass isotopomer distributions of the fed labeled gluconate and of the intracellular metabolites g6p and 6pg with LC-MS and applying Equation 20.4 to estimate the ratio of the fluxes v2 and v3 in Figure 20.5. The measured mass isotopomer distributions of the three metabolites are given in Table 20.2. From the measured uptake rates of glucose (v1) and gluconate (v3), the PPP split ratio (v2/v1) was determined to be 51%, in perfect agreement with the whole isotopomer modeling based flux balancing. As expected, the local node analysis increased the accuracy of the determined split ratio; in this study the 95% confidence interval was found to be 48–56%. The local node flux analysis thus indeed proved to be a more accurate method to determine the PPP split ratio. The presented method requires the measurement of the 13C-labeling of metabolites immediately surrounding the specific node, for which LC-MS analysis of intracellular metabolites is excellently suited, as it does not limit the available 13C-labeling information to that of metabolites that serve as precursors of storage carbohydrates and protein.
The method is generally applicable to other metabolic nodes. It

Table 20.2 Measured Mass Isotopomer Fractions of Intermediates around the 6pg Node in P. chrysogenum

Measured Compound    Mass Fraction    Measured
G6P (M = 259)        M + 0            0.866 ± 0.003
                     M + 1            0.094 ± 0.002
                     M + 2            0.017 ± 0.002
                     M + 3            0.015 ± 0.000
                     M + 4            0.007 ± 0.000
                     M + 5            0.001 ± 0.000
                     M + 6            0.001 ± 0.000
Gln (M = 195)        M + 0            0.000 ± 0.000
                     M + 1            0.000 ± 0.000
                     M + 2            0.000 ± 0.000
                     M + 3            0.000 ± 0.000
                     M + 4            0.002 ± 0.000
                     M + 5            0.059 ± 0.001
                     M + 6            0.939 ± 0.001
6PG (M = 275)        M + 0            0.786 ± 0.003
                     M + 1            0.082 ± 0.003
                     M + 2            0.015 ± 0.001
                     M + 3            0.011 ± 0.000
                     M + 4            0.016 ± 0.003
                     M + 5            0.008 ± 0.001
                     M + 6            0.081 ± 0.003

Data were determined in a sample taken after 150 hr (three residence times) supply of 13C-labeled gluconate tracer; averages and standard deviations of five repeated injections of two independent samples.


should however be kept in mind that if a new inflow of 13C-tracer is created for the purpose of analyzing the fluxes, the following preconditions have to be met:

• The carbon substrate and the co-fed 13C-tracer must be simultaneously consumed by the microorganism under investigation.

• The uptake rate of the co-fed tracer must be accurately measurable.

• The co-feeding must be small enough not to disturb the metabolism and thus create an artefactual flux analysis.

In their study, Kleijn et al.7 created a new inflow of gluconate as a tracer. In agreement with the former two preconditions, they observed and quantified the simultaneous uptake of glucose and gluconate. Moreover, they compared macroscopic physiological parameters (production rates of biomass and penicillin) and intracellular metabolite concentrations both in the absence and in the presence of gluconate to verify that the metabolism was not affected.
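The gluconate tracer calculation can be retraced with the mass isotopomer fractions of Table 20.2. The sketch below is a simplified reading of Equation 20.4 applied to the 6pg node, not the published procedure: it estimates the fraction of 6pg that originates from gluconate by least squares over all seven mass fractions, and then scales the result by the uptake ratio v3/v1. The measured uptake rates are not reported in this chapter, so `uptake_ratio` below is a placeholder value chosen only to make the example run; both function names are illustrative.

```python
def tracer_inflow_fraction(mdv_g6p, mdv_gln, mdv_6pg):
    """Least-squares fraction f = v3 / (v2 + v3) of 6pg formed from gluconate.

    Balance around the 6pg pool:
        MDV_6pg = (1 - f) * MDV_g6p + f * MDV_gln
    """
    d = [t - g for t, g in zip(mdv_gln, mdv_g6p)]   # MDV_gln - MDV_g6p
    y = [p - g for p, g in zip(mdv_6pg, mdv_g6p)]   # MDV_6pg - MDV_g6p
    return sum(a * b for a, b in zip(d, y)) / sum(a * a for a in d)


def ppp_split_ratio(f, uptake_ratio_v3_over_v1):
    """Split ratio v2/v1 from f and the gluconate/glucose uptake ratio v3/v1."""
    v2_over_v3 = (1.0 - f) / f
    return v2_over_v3 * uptake_ratio_v3_over_v1


# Mass isotopomer fractions M+0 ... M+6 from Table 20.2
g6p = [0.866, 0.094, 0.017, 0.015, 0.007, 0.001, 0.001]
gln = [0.000, 0.000, 0.000, 0.000, 0.002, 0.059, 0.939]
pg6 = [0.786, 0.082, 0.015, 0.011, 0.016, 0.008, 0.081]

f = tracer_inflow_fraction(g6p, gln, pg6)
print("fraction of 6pg from gluconate:", round(f, 3))

uptake_ratio = 0.05  # hypothetical v3/v1, NOT a value reported in the chapter
print("split ratio for this assumed uptake ratio:", round(ppp_split_ratio(f, uptake_ratio), 2))
```

Because the measured labeling of g6p is used on the right-hand side, label that is recycled from the PPP into g6p is accounted for automatically, as noted in the text above.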

20.4 Conclusions

The 13C-labeling technology has become an established tool to complement the metabolic flux balancing approach for analysis of intracellular metabolic fluxes. The recently presented method of using AEX-HPLC-ESI-MS for measuring mass isotopomer distributions of metabolic intermediates has been successfully applied to analyze fluxes in the glycolysis and pentose phosphate pathway of the microorganisms S. cerevisiae and P. chrysogenum. This analytical method yields 13C-labeling information on more metabolites than the conventional NMR and GC-MS methods, which only indirectly yield 13C-labeling information on those metabolites that serve as precursors for the biosynthesis of the storage carbohydrates and proteins of which the labeling is actually measured. The direct measurement of the labeling patterns of metabolic intermediates also allows much shorter 13C-labeled feed supply to the studied microorganisms and in principle allows the analysis of fluxes in dynamic metabolic systems with time constants in the order of tens of minutes to hours. However, when applying a short labeled feed supply to S. cerevisiae it was found that turnover of large pools of intracellular storage compounds may complicate time-resolved flux analysis. The measured mass isotopomer fractions can be used for metabolic flux analysis in two alternative approaches: whole isotopomer modeling based flux analysis and local node flux analysis. In this chapter the former method was applied to both S. cerevisiae and P. chrysogenum and the latter approach to P. chrysogenum. It was found that the whole isotopomer modeling approach yields information on fluxes in a network of arbitrary size, including exchange fluxes of bi-directional reactions, and allows the reconciliation of (partly) overlapping 13C-labeling information. The drawback of this approach is the sensitivity of many fluxes to localized measurement or model errors and uncertainties.
This point was illustrated with the example of an unknown influx of unlabeled storage carbohydrate that negatively affected the accuracy of the PPP flux estimate for S. cerevisiae. The local node flux analysis was shown to yield a more accurate estimation of the partitioning of the fluxes between the glycolysis and PPP. This approach is, however, node specific and can only be applied to convergent nodes, unless a linear or divergent node can be made convergent by creating an additional influx. The latter was done for the presented case of P. chrysogenum. The strict physiological preconditions under which this approach is to be applied, were indeed met in the presented case.

References

1. Zupke, C. and Stephanopoulos, G. Modelling of isotope distributions and intracellular fluxes in metabolic networks using atom mapping matrices. Biotechnol. Progr., 10, 489, 1994.
2. Schmidt, K. et al. Modelling isotopomer distributions in biochemical networks using isotopomer mapping matrices. Biotechnol. Bioeng., 55, 831, 1997.
3. Wiechert, W. et al. Bidirectional reaction steps in metabolic networks: III. Explicit solution and analysis of isotopomer labelling systems. Biotechnol. Bioeng., 66, 69, 1999.
4. Möllney, M. et al. Bidirectional reaction steps in metabolic networks: IV. Optimal design of isotopomer labeling experiments. Biotechnol. Bioeng., 66, 86, 1999.
5. Maaheimo, H. et al. Central carbon metabolism of Saccharomyces cerevisiae explored by biosynthetic fractional 13C labelling of common amino acids. Eur. J. Biochem., 268, 2464, 2001.
6. Van Winden, W.A. et al. Metabolic flux analysis of Saccharomyces cerevisiae CEN.PK113-7D based on mass isotopomer measurements of 13C-labeled primary metabolites. FEMS Yeast Res., 5, 559, 2005.
7. Kleijn, R.J. et al. 13C-Labeled gluconate tracing as a direct and accurate method for determining the pentose phosphate pathway split ratio in Penicillium chrysogenum. Appl. Env. Microbiol., 72, 4743, 2006.
8. Marx, A. et al. Response of the central metabolism of Corynebacterium glutamicum to different flux burdens. Biotech. Bioeng., 56, 168, 1997.
9. Schmidt, K. et al. Quantification of intracellular metabolic fluxes from fractional enrichment and 13C-13C coupling constraints on the isotopomer distribution in labeled biomass components. Metabol. Eng., 1, 166, 1999.
10. Petersen, S. et al. In vivo quantification of parallel and bidirectional fluxes in the anaplerosis of Corynebacterium glutamicum. J. Biol. Chem., 275, 35932, 2000.
11. Dauner, M. et al. Metabolic flux analysis with a comprehensive isotopomer model in Bacillus subtilis. Biotechnol. Bioeng., 76, 144, 2001.
12. Fischer, E. and Sauer, U. Metabolic flux profiling of Escherichia coli mutants in central carbon metabolism using GC-MS. Eur. J. Biochem., 270, 880, 2003.
13. Szyperski, T. Biosynthetically directed fractional 13C-labelling of proteinogenic amino acids. Eur. J. Biochem., 232, 433, 1995.
14. Sauer, U. et al. Metabolic flux ratio analysis of genetic and environmental modulations of Escherichia coli central carbon metabolism. J. Bact., 181, 6679, 1999.
15. Van Winden, W.A. et al. Possible pitfalls of flux calculations based on 13C-labeling. Metabol. Eng., 3, 151, 2001.
16. Van Winden, W.A. et al. A priori analysis of metabolic flux identifiability from 13C-labeling data. Biotechnol. Bioeng., 74, 505, 2001.

Future Applications of Metabolic Engineering

VI

Brian F. Pfleger University of Wisconsin

21 Energy and Cofactor Issues in Fermentation and Oxyfunctionalization Processes. Bruno Bühler, Lars M. Blank, Birgitta E. Ebert, Katja Bühler, and Andreas Schmid
Introduction • Microbial Energetics and Biotechnological Applications • Biological Energy Issues in Fermentation Process Examples • Biological Energy Issues in Whole-Cell Oxyfunctionalization Process Examples • Conclusion

22 Microbial Biosynthesis of Fine Chemicals: An Emerging Technology. Zachary L. Fowler and Mattheos Koffas
Introduction • Flavonoids: A Natural Medicine • Isoprenoids • Specialized Fine Chemicals from Microorganism Biosynthesis

23 Applications of Metabolic Engineering for Natural Drug Discovery. Yi Tang, Suzanne Ma, and Wladyslaw A. Wojcicki
Introduction • Antimicrobial Drugs • Anticancer Drugs • Cholesterol Lowering Statins

24 Metabolic Engineering for Alternative Fuels. Yandi Dharmadi and Ramon Gonzalez
Introduction • Ethanol Production Process and Metabolic Engineering Opportunities • Feedstock Engineering • Cellulase Engineering • Biocatalyst Engineering • Consolidated Bioprocessing • Emerging Biofuel Platforms • Conclusions and Future Outlook


In the past two decades, metabolic engineering has emerged as a promising mechanism for enhancing biological production of small molecules. Living organisms have long been a source of food, fuel, therapeutics, and other valuable chemicals. Early breeding techniques enabled domestication of wild plants and animals and allowed human populations to flourish. Today, novel genetic methods based on recombinant DNA technology promise to accelerate the construction and selection of highly productive organisms. In the coming decades, scientists will use these methods to engineer a sustainable existence for generations to come. Three areas of particular concern with regard to long-term sustainability are the health of the human population, the generation of energy, and the production of chemicals. Metabolic engineering has begun to address each and will continue to provide solutions in the future.

One of the greatest ongoing challenges faced by society is the fight against human disease. Eradication of infectious diseases in the developed world (e.g., polio, tuberculosis) is one of society’s greatest accomplishments. The use of vaccines and antibiotics to fight infections has lengthened the human lifespan and enabled individuals to lead fuller, more productive lives [11]. Despite these triumphs, new therapeutics are needed to confront the limitations of today’s medicines and the challenges of tomorrow’s diseases. For example, infectious disease is very much a part of daily life in the developing world, yet economics often prevent delivery of medicine to the patients who need it most. Reducing the cost of therapeutics by developing new drug production methods can save or enhance the quality of many lives. Outside the developing world, the spread of antibiotic resistance (e.g., methicillin-resistant Staphylococcus aureus, MRSA) is once again making the fight against infectious disease a significant scientific problem [6,8].
For these reasons and others, the search for new treatments against infectious and systemic disease will continue to be a top priority for researchers around the world. Metabolic engineering will play a large role in the success of these endeavors. Metabolic engineering has already impacted several areas of medicine. Chapter 23 illustrates how metabolic engineering is well suited to develop production schemes for biological molecules whose chemical syntheses are too costly or whose supplies are too limited. A prominent example is the production of Taxol® (paclitaxel), a potent β-tubulin inhibitor naturally produced in the bark of the Pacific Yew (Taxus brevifolia). Insufficient yields from limited native sources (1 mg/750 kg bark) prompted metabolic engineering efforts aimed at complete Taxol biosynthesis. Even when source material is plentiful, yields of natural products can be low and require expensive purifications that increase production costs. For example, work to engineer Escherichia coli and Saccharomyces cerevisiae for the production of artemisinin, an antimalarial terpene, was driven by a desire to reduce production costs associated with extracting the compound from crops of Artemisia annua [21,32]. A major challenge to producing known therapeutics, like Taxol and artemisinin, is identifying essential biosynthetic genes from native organisms. This challenge is being addressed by genome sequencing projects as well as genome annotation and characterization efforts. In addition to producing natural products, metabolic engineering is playing a role in the search for new therapeutics. Most antibiotics are classified as polyketides, nonribosomal peptides, or hybrids of the two. The modular nature of assembly used to generate these compounds has become the basis of combinatorial biosynthesis strategies. 
Strains of bacteria expressing engineered megasynthases, multidomain enzymes that function as chemical assembly lines, are used to generate libraries of novel scaffolds [22,28]. These libraries are further diversified via coexpression of tailoring enzymes, which add biologically important functional handles (e.g., sugars, prenyl groups, acyl chains, halogens, hydroxyl groups) to core scaffolds [28]. High-throughput screens, developed with or without metabolic engineering, are used to identify strong candidates for further chemical or biological modification. Once identified, sufficient amounts of lead compounds can be produced for clinical evaluation by engineering production in tractable hosts such as E. coli or Streptomyces, as has been accomplished for erythromycin precursors [15,17,19]. A challenge to implementing these strategies lies in determining the underlying principles that govern megasynthase structure and assembly. Comprehension of how neighboring modules interact and pass compounds to subsequent stages of the assembly line will permit the construction of a megasynthase for any desired compound of this class. Development of these techniques will facilitate the discovery process and enhance production of therapeutics to fight tomorrow’s diseases.

Metabolic engineering is also being utilized by synthetic biologists to develop new drug delivery strategies. Synthetic biology [3] focuses on designing new biological systems from the assembly of biological parts. For example, many bacteria express proteins that confer the ability to adhere to and invade mammalian tissues. Work by Anderson et al. demonstrated the ability to engineer a bacterium to invade mammalian cells in response to specific physiological conditions (cell density, hypoxia, presence of arabinose) by integrating several biological components [2]. Future metabolic engineering efforts could combine the ability to selectively invade a diseased cell with the ability to produce a potent therapeutic. The end product would be an organism capable of seeking out, invading, and killing a diseased cell through the action of a locally produced therapeutic. This example demonstrates how synthetic biology can expand the scope of metabolic engineering beyond industrial and specialty chemical production.

Recently, sustainable energy production has become a major concern for scientists, government officials, and the public. In the past four years (Jan 2004–Jan 2008), the price of oil has nearly tripled, reaching all-time highs that top $100 per barrel [33]. For more than a decade, U.S. petroleum imports have exceeded domestic production, and now imports surpass production by more than two to one [33]. Simultaneously, concerns regarding fossil fuel supply and global climate change have strengthened the call for improved alternative energy sources. Metabolic engineering is already playing a role in addressing these concerns through the production of biofuels, chiefly ethanol.
The production of ethanol from sugar cane in Brazil has become economically viable and now serves as a model for reducing dependence on foreign oil [18]. Similar efforts are underway in the U.S. and across the world using crops such as corn and beets. However, research has shifted towards cellulosic feedstocks [12] such as corn stover, poplar, and switchgrass because corn-based ethanol has already had an impact on food prices (corn prices per bushel rose by more than 150% between 2005 and 2008) [31]. Chapter 24 describes the progress made to date and the remaining challenges in bringing cellulosic ethanol to market.

Ethanol, however, is not the only fuel microorganisms can be engineered to produce and is not necessarily the ideal fuel for long-term sustainable energy generation. Production of longer chain alcohols (e.g., butanol), which have higher energy yields and lower water solubility than ethanol, has been demonstrated in engineered solventogenic bacteria (Clostridia sp.) [10] and E. coli [5]. Biodiesel, a mixture of acyl-esterified fatty acids, is currently the second leading biofuel behind ethanol because of its compatibility with existing engines and its improved efficiency relative to gasoline in internal combustion engines. Biodiesel, which is typically made by transesterifying fats or oils with methanol, has been produced by a metabolically engineered strain of E. coli [13]. In this process, ethanol generated via fermentation is esterified with exogenous fatty acids by a heterologously expressed acyltransferase from Acinetobacter baylyi ADP1 [13]. While current strains are not capable of producing biodiesel on an industrial scale, future metabolic engineering efforts may open the door to commercial production of cellulosic biodiesel. Other microbially produced hydrocarbons including methane (methanogens), isoprene (Bacillus subtilis), C25-C31 olefins (microalgae), and C30-C37 terpenes (microalgae) have also attracted attention as biofuel candidates [7,26].
Metabolic engineers can develop production strategies for these hydrocarbons by either engineering native producers or heterologously expressing biosynthetic pathways in tractable hosts [24].

Sustainable production of liquid biofuels, whether alcohol or hydrocarbon, will require the use of cellulosic biomass as a feedstock. The challenge to this process is a complicated combination of discovering, expressing, and secreting enzymes capable of accessing the web of cellulose and hemicellulose fibers and efficiently converting them into fermentable sugars. The conversion of sugars to fuels is a better understood process, but is still limited to those fuels whose biosyntheses are currently known. Continued study of fuel producing organisms, through genome sequencing and biochemical characterization, will expand the list of potential fuels until the challenges of breaking down cellulosic biomass are solved.

Metabolic engineers are also investigating pathways in photosynthetic bacteria in order to bypass the complex issues surrounding biomass generation and breakdown. Hydrogen is a potential gasoline alternative because it can be used to power fuel cells [9]. This approach to fuel production is environmentally attractive because energy will be produced indirectly from sunlight, without the emission of greenhouse gases. Photoheterotrophic bacteria that can utilize hydrogen as a terminal point in electron transfer are the likely targets of metabolic engineering efforts to enhance production [16]. In addition to producing hydrogen, some phototrophic bacteria fix carbon dioxide to generate larger organic molecules that could be biosynthetically converted to biofuels. While ideal, engineering photosynthetic bacteria is not commonplace, and new metabolic engineering tools are required for these organisms. Alternatively, genes essential for photosynthesis and carbon fixation could be engineered into a more genetically tractable organism such as E. coli [27]. While promising, either choice will require significant research and engineering investments before commercial processes can be developed.

In addition to the more apparent energy concerns, the rising price of oil is impacting the cost of other petrochemicals generated from fossil fuels. As a result, metabolic engineering is beginning to play a larger role in producing specialty chemicals for use in everyday products. This role will increase as oil prices continue to rise and the demand for sustainable production grows.

Sustainable materials have been a focus of metabolic engineering efforts in which both natural biopolymers and traditional plastic monomers are produced from renewable resources. For instance, production of polyhydroxyalkanoates (PHAs), natural bioplastics used for energy storage, has been engineered in many species including bacteria and plants [1,20,23]. The use of PHAs as structural materials in consumer goods has been considered for several decades because the mechanical properties of PHAs are comparable to or better than those of existing plastics.
Additionally, PHAs are biodegradable, making them an attractive, environmentally friendly option. Despite these benefits, attempts to commercialize production in the 1980s–1990s (Biopol®) failed because of high costs relative to petrochemical plastics [4]. Because of increasing oil prices and advances in metabolic engineering, PHAs are being revisited as a potential sustainable material. Telles, a joint venture between Metabolix and Archer Daniels Midland Company, is building a plant with the capacity to produce 110 million pounds of PHA resin (Mirel®) annually from corn [34].

Unlike PHAs, most plastics used today are not derived from renewable resources. Ongoing research is developing methods for sustainable biological production of plastic monomers and other specialty chemicals through biorefining. A biorefinery is analogous to a petroleum refinery in that inputs are processed into a family of useful compounds, except that biomass rather than petroleum serves as the input [14]. In 2004, the Pacific Northwest National Laboratory and the National Renewable Energy Laboratory distributed a list of the top value added chemicals derived from biomass [29]. The list was populated with natural metabolites (diacids, amino acids, reduced sugars) that could be readily converted into other valuable commodities. In order for biorefineries to become reality, microorganisms need to be metabolically engineered to produce each of these central compounds in high yields from renewable feedstocks. This has been performed in a few cases and is best exemplified by 1,3-propanediol, a component of commercial polyesters. In the early 1990s, E. coli was metabolically engineered by the addition of the dha regulon from Klebsiella pneumoniae to produce 1,3-propanediol from glycerol [25]. DuPont has since teamed with Tate & Lyle to commercialize the process in order to generate bioplastics from renewable sugars [30].
Similar metabolic engineering strategies are underway to generate additional plastic monomers and further diversify the list of renewable chemical precursors. Chapter 22 expands beyond the ideas listed here by describing work to produce other classes of specialty chemicals through metabolic engineering of microorganisms.

Viable production strategies for any of the mentioned targets will depend on new techniques to balance metabolite production and cell growth. Most organisms have evolved to survive in a dynamic environment, and when challenged with the overproduction of a particular compound, most organisms will evolve to alleviate associated stresses. If, however, production is seamlessly woven into the metabolism of an organism and stresses are minimized, then a stable and efficient process can be developed. This is the central idea behind metabolic engineering. Chapter 21 describes the challenges in regulating a cell’s energy state, cofactor supply, and redox balance. Overcoming these challenges is crucial for all metabolic engineering efforts and can only be addressed by global metabolic studies where the impacts of all cellular enzymes and metabolites are considered. Finally, as the fields of systems biology (from the top down) and synthetic biology (from the bottom up) clarify how metabolic parts interact, metabolic engineers will be able to construct increasingly sophisticated microorganisms to address the many sustainability challenges facing our world.

References
1. Aldor, I.S. and Keasling, J.D. 2003. Process design for microbial plastic factories: metabolic engineering of polyhydroxyalkanoates. Curr. Opin. Biotechnol., 14: 475–83.
2. Anderson, J., Clarke, E., Arkin, A., and Voigt, C. 2006. Environmentally controlled invasion of cancer cells by engineered bacteria. J. Mol. Biol., 355: 619–27.
3. Andrianantoandro, E., Basu, S., Karig, D., and Weiss, R. 2006. Synthetic biology: new engineering rules for an emerging discipline. Mol. Syst. Biol., 2: 2006.0028.
4. Asrar, J. and Gruys, K.J. 2002. Biodegradable polymer (Biopol®). In Biopolymers, Y Doi, A Steinbüchel (eds), pp. 53–90. Weinheim, Germany: Wiley-VCH.
5. Atsumi, S., Hanai, T., and Liao, J. 2008. Non-fermentative pathways for synthesis of branched-chain higher alcohols as biofuels. Nature, 451: 86–9.
6. Bancroft, E. 2007. Antimicrobial resistance: it’s not just for hospitals. JAMA, 298: 1803–4.
7. Banerjee, A., Sharma, R., Chisti, Y., and Banerjee, U. 2002. Botryococcus braunii: a renewable source of hydrocarbons and other chemicals. Crit. Rev. Biotechnol., 22: 245–79.
8. Camargo, I. and Gilmore, M. 2008. Staphylococcus aureus—Probing for host weakness? J. Bacteriol., 190: 2253–6.
9. Cho, Y., Donohue, T., Tejedor, I., Anderson, M., McMahon, K., and Noguera, D. 2008. Development of a solar-powered microbial fuel cell. J. Appl. Microbiol., 104: 640–50.
10. Ezeji, T., Qureshi, N., and Blaschek, H. 2007. Bioproduction of butanol from biomass: from genes to bioreactors. Curr. Opin. Biotechnol., 18: 220–7.
11. Fernandes, P. 2006. Antibacterial discovery and development--the failure of success? Nat. Biotechnol., 24: 1497–503.
12. Houghton, J., Weatherwax, S., and Ferrell, J. 2006. Breaking the Biological Barriers to Cellulosic Ethanol: A Joint Research Agenda. DOE/SC-0095. Washington, DC: Department of Energy.
13. Kalscheuer, R., Stöveken, T., Luftmann, H., Malkus, U., Reichelt, R., and Steinbüchel, A. 2006. Neutral lipid biosynthesis in engineered Escherichia coli: jojoba oil-like wax esters and fatty acid butyl esters. Appl. Environ. Microbiol., 72: 1373–9.
14. Kamm, B. and Kamm, M. 2007. Biorefineries—multi product processes. Adv. Biochem. Eng. Biotechnol., 105: 175–204.
15. Kennedy, J., Murli, S., and Kealey, J. 2003. 6-Deoxyerythronolide B analogue production in Escherichia coli through metabolic pathway engineering. Biochemistry, 42: 14342–8.
16. Koku, H., Eroglu, I., Gunduz, U., Yucel, M., and Turker, L. 2002. Aspects of the metabolism of hydrogen production by Rhodobacter sphaeroides. Int. J. Hydrogen Energy, 27: 1315–29.
17. Murli, S., Kennedy, J., Dayem, L., Carney, J., and Kealey, J. 2003. Metabolic engineering of Escherichia coli for improved 6-deoxyerythronolide B production. J. Ind. Microbiol. Biotechnol., 30: 500–9.
18. Nass, L.L., Pereira, P.A.A., and Ellis, D. 2007. Biofuels in Brazil: An overview. Crop Science, 47: 2228–37.
19. Pfeifer, B. and Khosla, C. 2001. Biosynthesis of polyketides in heterologous hosts. Microbiol. Mol. Biol. Rev., 65: 106–18.
20. Rehm, B.H. 2003. Polyester synthases: natural catalysts for plastics. Biochem. J., 376: 15–33.
21. Ro, D., Paradise, E., Ouellet, M., Fisher, K., Newman, K., et al. 2006. Production of the antimalarial drug precursor artemisinic acid in engineered yeast. Nature, 440: 940–3.
22. Sherman, D. 2005. The Lego-ization of polyketide biosynthesis. Nat. Biotechnol., 23: 1083–4.
23. Stubbe, J., Tian, J., He, A., Sinskey, A.J., Lawrence, A.G., and Liu, P. 2005. Nontemplate-dependent polymerization processes: polyhydroxyalkanoate synthases as a paradigm. Annu. Rev. Biochem., 74: 433–80.
24. Tollefson, J. 2008. Energy: not your father’s biofuels. Nature, 451: 880–3.
25. Tong, I.T., Liao, H.H., and Cameron, D.C. 1991. 1,3-Propanediol production by Escherichia coli expressing genes from the Klebsiella pneumoniae dha regulon. Appl. Environ. Microbiol., 57: 3541–6.
26. Wagner, W., Helmig, D., and Fall, R. 2000. Isoprene biosynthesis in Bacillus subtilis via the methylerythritol phosphate pathway. J. Nat. Prod., 63: 37–40.
27. Walter, J., Greenfield, D., Bustamante, C., and Liphardt, J. 2007. Light-powering Escherichia coli with proteorhodopsin. Proc. Natl. Acad. Sci. USA, 104: 2408–12.
28. Weissman, K. and Leadlay, P. 2005. Combinatorial biosynthesis of reduced polyketides. Nat. Rev. Microbiol., 3: 925–36.
29. Werpy, T. and Petersen, G. 2004. Top Value Added Chemicals from Biomass. Springfield, VA: National Technical Information Service. Available: www1.eere.energy.gov/biomass/pdfs/35523.pdf.
30. www2.dupont.com/Renewably_Sourced_Materials/en_US/susterra.html.
31. www.agriculture.state.ia.us/historic.html.
32. www.artemisininproject.org.
33. www.eia.doe.gov.
34. www.metabolix.com.

21 Energy and Cofactor Issues in Fermentation and Oxyfunctionalization Processes

Bruno Bühler TU Dortmund

Lars M. Blank TU Dortmund and ISAS-Institute for Analytical Sciences

Birgitta E. Ebert TU Dortmund

Katja Bühler TU Dortmund

Andreas Schmid TU Dortmund and ISAS-Institute for Analytical Sciences

21.1 Introduction
21.2 Microbial Energetics and Biotechnological Applications
Biocatalytic Reactions and Energy Metabolism • Energy Aspects of Growth and Biocatalysis • Stress Metabolism during Biocatalysis • Energy Aspects of Recombinant Enzyme Overproduction
21.3 Biological Energy Issues in Fermentation Process Examples
Primary Metabolites • Secondary Metabolites
21.4 Biological Energy Issues in Whole-Cell Oxyfunctionalization Process Examples
21.5 Conclusion
References

21.1 Introduction

Biocatalysis utilizes the catalytic potential of enzymes, either in the isolated state or in whole cells, to produce fine or bulk chemicals for the pharmaceutical and chemical industries. It is a strongly expanding field, especially as the number of available enzymes catalyzing industrially relevant reactions and the knowledge about the metabolic pathways involved in biocatalytic processes are rapidly increasing. Enzymes catalyze a huge variety of reactions in order to synthesize the building blocks and to generate the energy necessary for growth and proliferation of living organisms. Two different synthesis strategies can be distinguished in biocatalysis:

(i) Fermentations make use of the (modified) metabolic network of microorganisms, and the product of choice is derived from the growth substrate (e.g., glucose). Strain development often includes complex metabolic pathway engineering, since the natural objective of the host differs from industrial process objectives such as high product yields, high productivities, and high final product concentrations. Selected examples for fermentations are listed in Table 21.1 and discussed in the text.

(ii) In biotransformations, the substrate to be converted into the desired product is different from the carbon and energy source of whole-cell biocatalysts. Alternatively, isolated enzymes instead of whole cells can be used to catalyze biotransformations. In this chapter, however, we focus on whole-cell bioconversions employing oxygenases for oxyfunctionalization reactions, which depend on the cellular metabolism for redox cofactor regeneration. Oxygenases, serving as an example for cofactor-dependent

Glycerol

Clostridium butyricum

Z. mobilis

1,3-Propanediol

Xylose

Glucose

Corn fibre

Saccharomyces strain 1400 S. cerevisiae

E. coli

Glucose

M. succino producens

Pyruvate

Glucose

E. coli

Glucose

Glucose

E. coli

C. glutamicum

3 glucose → 5 ethanol + 5 ATP

Xylose

E. coli

L-Lysine

glucose → 2 ethanol + 2 ATP

Glucose

E. coli

Glucose/ Xylose Xylose

glucose → 2 lactic acid + 2 ATP

Glucose

glycerol + NADH → 1,3-propanediol

glucose → 2 pyruvate + 2 ATP + 2 NADH

glucose + 2 NH3 + 4 NADPH → L-lysine + 2 NADH

3 xylose → 5 ethanol + 3 ATP

glucose → 2 ethanol + ATP

glucose + 2 CO2 + 4 NADH → 2 succinate

glucose → succinate + 2 CO2 + 4 NADH + NAD(P)H + 2 ATP

glucose + 2 CO2 + 4 NADH → 2 succinate

3 xylose →5 lactic acid + 5 ATP

glucose → 2 lactic acid + 2 ATP

glucose → 2 lactic acid + 2 ATP

glucose → 2 lactic acid + 2 ATP

Glucose

Glucose

glucose → 2 lactic acid + 2 ATP

Net Synthesis Equationb

Glucose

Substrate

Enterococcus faecalis S. cerevisiae

Lactococcus casei L. lactis

Catalyst

Z. mobilis

Ethanol

Succinate

Lactic acid

Primary metabolites

Product

Table 21.1 Examples for Fermentation Processesa

0.45

0.72

0.65

30

84 79

0.38 1.78 6f (0.96)f,g

2.4f (2)f,g

1.62

1.44 0.281

0.86f 0

1.67f

11

62

8.5

21

52.4

n.r. 58.3

99.2

63.3

73.2

82.3

f

1.65

1.37

1.67f 2f

1.96

2f

1.16

1.29 0.94

1.5

1.48

1.86

1.63

144

210

120

Final Titer (g L-1)

1f

1f

1.67f

2f

1.71f

1.71f 1f

1.72f 12f 1.72f

1.71f

1.67f

2f

2f

1.92

1.94

2f 2f

2

Yobservede (mol mol-1)

2f

Ytheoreticald (mol mol–1)

1.72f

1.67f

2f

2f

2f

2f

2f

YATP/Sc (mol mol-1)

Chemostat process

Continuous process Repetitive fed-batch process

Batch process

Batch process

Anaerobic batch process Batch process

Fed-batch process

Anaerobic process Aerobic fed-batch process

Two-stage process

Batch process

Microaerobic batch process Batch process

Batch process

Batch process, resting cells Fed-batch process

Comments

189, 190

129 188

127

122

187

118

186

43

185 109

110

184

184

99

183

182

181

Ref.


Glucose

Glucose

Glycerol

S. erythraea

B. subtilis

B. subtilis

Riboflavin

6 deoxyerythronolide (6dEB)

Erythromycin A

Glucose

L. lactis

Glucose

Glucose

E. coli

E. coli

Glucose

E. coli

Propionate

Glucose

E. coli

E. coli

Secondary metabolites

L-Alanine

L-Phenylalanine

Glucose

E. coli

6 glycerol + CO2 + 4 NH3 + 17 ATP + 3 NADPH → riboflavin + 2 formate + 2 NADH

7 glucose + 8 NH3 + 26 ATP + 2 NADPH → 2 riboflavin + 4 formate + 4 CO2 + 4 NADH

7 glucose + 4 NH3 + 8 ATP → riboflavin + 7 pyruvate + 2 formate + 2 CO2 + 3 NADPH + 9 NADH

10 glucose + NH3 + 8 H2O + 9 ATP → erythromycin + 23 CO2 + 28 NADH

7 glucose → 6dEB + NADPH + 28 NADH + 14 CO2

7 propionate + 7 ATP + 6 NADPH → 6dEB

glucose → 2 L-alanine + 2 NADH + 2 ATP

7 glucose + 8 ATP + 4 NH3 → 4 L-phenylalanine + 6 CO2 + 8 NADH

7 glucose + 2 NH3 → 2 L-phenylalanine + 7 pyruvate + 3 ATP + 3 CO2 + 11 NADH

glucose + 2 ATP + 4 NADH → 1,3propanediol

glucose + 2 ATP + 4 NADH → 1,3propanediol

0.15

0.27

0

0

0.16

n.r.

0.21

0.11

2f

0.6f

0.3f

1.2

1.45

10.8

n.r.

0

0

2f

0f

12.4f

0

0

n.r.

0.105

0.105

n.r.

n.r.

0.014

1.52

0.36

0.23

n.r.

1.21

n.r.

14

14

4

n.r.

0.095

12.5

0.45

0.44

n.r.

135

Glucose uptake via glucose permease

Glucose uptake via PTS

Fed-batch process

Fed-batch process, production during stationary phase

Glucose uptake via galactose permease, resting cells Batch process, resting cells

Glucose uptake via PTS, resting cells

Anaerobic process

Aerobic process

(continued)

150

147, 150

147, 150

139, 195

41

41, 42

194

193

193

191

191, 192


Table 21.1 Examples for Fermentation Processes (Continued)

Product | Catalyst | Substrate | Net Synthesis Equation (b) | YATP/S (c) (mol mol-1) | Ytheoretical (d) (mol mol-1) | Yobserved (e) (mol mol-1) | Final Titer (g L-1) | Comments | Ref.
(R)-3-Hydroxybutyric acid | E. coli | Glucose | glucose + NADPH → 3-hydroxybutyric acid + 4 NADH + 2 ATP | 2 | 0.66 | 0.130 | 141.6 | Anaerobic process | 196
Xanthan | Xanthomonas campestris | Glucose | 23 glucose + 43 ATP → 4 [. . .] 4 CO2 + 18 NAD(P)H | n.r. | 0.16 | 0.15 | 30 | Batch process | 197, 198

a See text for further details concerning the individual examples listed. n.r. not reported.
b Net synthesis equations were derived assuming exclusive use of the Embden–Meyerhof–Parnas pathway (or Entner–Doudoroff pathway where appropriate) for the breakdown of glucose to pyruvate.
c YATP/S specifies the maximal ATP yield in mol ATP (mol substrate)-1 on a given substrate for the target biosynthetic pathway.
d Maximal theoretical product yield in mol product (mol substrate)-1 on the given substrate with closed energy and redox balance.
e Observed product yield in mol product (mol substrate)-1.
f Theoretical yields were not given in literature and thus calculated using stoichiometric models.
g CO2 assimilation excluded.
h Xanthan repeating unit.
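The yield columns of Table 21.1 can be compared with a short calculation. The following Python sketch is illustrative only: it takes the theoretical molar yield of 2 mol lactic acid per mol glucose from the net equation "glucose → 2 lactic acid + 2 ATP", while the observed yield of 1.8 mol mol-1 is a hypothetical value chosen for the example, and the function names and molar masses are assumptions introduced here rather than part of the chapter.

```python
# Illustrative sketch: relate an observed molar yield to the theoretical maximum
# and convert a molar yield into a mass yield. All names are hypothetical.

GLUCOSE_G_PER_MOL = 180.16      # approximate molar mass of glucose (assumption)
LACTIC_ACID_G_PER_MOL = 90.08   # approximate molar mass of lactic acid (assumption)


def fraction_of_theoretical(y_observed, y_theoretical):
    """Return the observed yield as a fraction of the theoretical maximum."""
    if y_theoretical <= 0:
        raise ValueError("theoretical yield must be positive")
    return y_observed / y_theoretical


def molar_to_mass_yield(y_molar, product_g_per_mol, substrate_g_per_mol):
    """Convert mol product per mol substrate into g product per g substrate."""
    return y_molar * product_g_per_mol / substrate_g_per_mol


# Theoretical yield from the net equation: glucose -> 2 lactic acid + 2 ATP.
Y_THEORETICAL = 2.0
# Hypothetical observed yield, used only to illustrate the calculation.
Y_OBSERVED_EXAMPLE = 1.8

efficiency = fraction_of_theoretical(Y_OBSERVED_EXAMPLE, Y_THEORETICAL)
mass_yield_max = molar_to_mass_yield(
    Y_THEORETICAL, LACTIC_ACID_G_PER_MOL, GLUCOSE_G_PER_MOL
)

print(f"fraction of theoretical yield: {efficiency:.2f}")
print(f"maximal mass yield: {mass_yield_max:.2f} g lactic acid per g glucose")
```

Expressing observed yields as a fraction of the theoretical maximum makes processes on different substrates and products easier to compare than the raw molar values alone.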


enzymes with high industrial potential, catalyze the highly selective introduction of molecular oxygen into their substrate under mild conditions. The cofactor dependency of these enzymes, together with their often limited stability in the isolated state, argues for in vivo applications. Selected examples are listed in Table 21.2 and discussed in the text.

In whole-cell processes, the biocatalytic reactions of interest may strongly interact with the metabolism of the microbial host, especially if cofactor-dependent enzymes are involved. This applies first and foremost to biotransformation reactions, which do not contribute to carbon and energy metabolism. In contrast, classical fermentation processes directly make use of the natural, mostly anaerobic carbon catabolism. However, for the production of more complex molecules such as amino acids and secondary metabolites, fermentative processes typically require comprehensive metabolic pathway engineering to increase relevant rates. This may change the energy efficiency of the overall metabolism of the host strain and thus have an impact on biocatalyst efficiency. In conclusion, it is important for both whole-cell biotransformations and fermentations to consider the total energy metabolism, as it may directly influence the biocatalytically relevant reactions.

In this chapter we focus on the interrelationship between biocatalysis and the energy metabolism of biocatalytically active cells. A comprehensive overview of biochemical processes resulting in energy generation, consumption, and dissipation is given, and different examples for fermentations and biotransformations are presented and discussed from an energy perspective.

21.2 Microbial Energetics and Biotechnological Applications The role of energy in biological systems (bioenergetics) is to drive biosynthesis for growth and reproduction. Living cells and organisms must perform work to stay alive, to grow, and to reproduce themselves. The ability to harness energy from a variety of metabolic pathways and to channel it into biological energy is a fundamental property of all living organisms. Thus, biological thermodynamics (bioenergetics) concerns itself with the study of internal biochemical dynamics such as ATP hydrolysis, protein stability, DNA binding, membrane diffusion, enzyme kinetics, and other such essential energy controlled pathways. In biotechnological applications of microorganisms, carbohydrates or organic acids typically serve as both carbon and energy source for microbial growth. In industrial fermentations, these nutrients additionally serve as precursors for target products such as organic alcohols and acids, amino acids, antibiotics, or proteins. Furthermore, cofactor-dependent whole-cell biotransformations including redox biocatalysis require an efficient regeneration of cofactors such as NAD(P)H and ATP, which in turn interferes with energy metabolism. Thus, the central carbon metabolism and the coupled generation of biological energy in the form of ATP is of major interest in biotechnological applications of microorganisms. The details of microbial metabolism including energy source degradation, ATP formation, monomer synthesis, polymerization of biological macromolecules, and cell duplication are well understood. Less information is available on quantitative thermodynamics and kinetics of microbial growth. Typically, microbial growth is described by Monod kinetics and by yield and maintenance coefficients.1–4 In 1960, Bauchop and Elsden5 correlated biomass production with ATP availability and defined YATP as the yield of cell dry weight per mol of ATP. 
Despite considerable variation, a value of 10.5 g mol–1 for YATP was treated as a biological constant in microbiology textbooks.6 In the 1970s, Stouthamer7 calculated the amount of ATP needed for biomass formation based on a cell composition that was typical for Escherichia coli. These theoretical calculations indicated that YATP should be three-fold higher than the value derived experimentally by Bauchop and Elsden. Even the introduction of YATP/MAX,8 which was corrected for maintenance energy, could not give values as high as 32 g mol–1 for experimental data. This indicated that maintenance energy alone cannot explain the difference between theoretical and experimental growth efficiencies. Here, the high variability of experimental growth efficiencies had to be considered as well, including the strong dependence of YATP/MAX on growth rate and type of carbon and energy source, the large variation of YATP/MAX values among different organisms, and the low YATP/MAX values when growth is limited by nutrients other than the energy source.
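The gap between the theoretical and the experimental YATP can be made concrete with a Pirt-type maintenance relation, 1/YATP = 1/YATP/MAX + mATP/µ. The following is a minimal sketch under stated assumptions: the relation is the standard textbook form and is assumed here rather than taken from the cited references, and the function names and growth rates are hypothetical. It asks how large a maintenance coefficient mATP would have to be to pull the theoretical 32 g mol–1 down to the observed 10.5 g mol–1.

```python
# Illustrative sketch only: assumed Pirt-type relation, hypothetical growth rates.

Y_ATP_THEORETICAL = 32.0  # g CDW per mol ATP, theoretical value quoted in the text
Y_ATP_OBSERVED = 10.5     # g CDW per mol ATP, experimental value quoted in the text


def observed_y_atp(mu, y_atp_max, m_atp):
    """Observed Y_ATP (g/mol) at specific growth rate mu (1/h) for a
    maintenance coefficient m_atp (mol ATP per g CDW per h)."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    return 1.0 / (1.0 / y_atp_max + m_atp / mu)


def required_maintenance(mu, y_obs, y_atp_max):
    """Maintenance coefficient that would reconcile y_obs with y_atp_max."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    return mu * (1.0 / y_obs - 1.0 / y_atp_max)


if __name__ == "__main__":
    for mu in (0.1, 0.3, 0.6):  # hypothetical growth rates in 1/h
        m = required_maintenance(mu, Y_ATP_OBSERVED, Y_ATP_THEORETICAL)
        print(f"mu = {mu:.1f} 1/h -> m_ATP = {m * 1000:.1f} mmol ATP (g CDW)-1 h-1")
```

In this sketch the required coefficient scales linearly with µ, so a single constant maintenance term cannot hold the observed yield at one value across growth rates. This is consistent with the conclusion above that maintenance energy alone does not explain the discrepancy.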

Table 21.2 Examples for Whole-Cell Oxyfunctionalization Processes a

# | Substrate | Product
1 | L-Proline | Hydroxyproline
2 | Tryptophan or indole | Indigo
3 | Alkanes | Alkanols
4 | Octane | Octanoic acid
5 | Cyclohexanone | Caprolactone
6 | Ketone e | Lactone
7 | Styrene | (S)-Styrene oxide
8 | Toluene | Toluene cis-glycol
9 | Toluene | 3-Methylcatechol
10 | Pseudocumene | Dimethylbenzaldehyde
11 | Simvastatin | Hydroxymethylsimvastatin

The remaining columns of the table (NAD(P)H Balance; Catalyst; Specific Activity [U (g CDW)–1]; Final Titer b (g L–1); Ref.) cannot be aligned to individual rows in this copy. The catalysts listed comprise growing and resting recombinant E. coli (including continuous and fed-batch cultures), growing recombinant P. putida, a growing mutant of P. putida F1, resting P. putida UV4, and suspended whole cells of a Nocardia sp. The final titers listed range from 0.27 g Laq–1 to 57 g Laq–1.

a See text for further details concerning the individual examples listed. n.r., not reported.
b Final product concentration. For two phase processes the respective phase is indicated: aq, aqueous phase; org, organic phase.
c 2-Oxoglutarate serves as a cosubstrate for proline 4-hydroxylase and is converted to succinate and CO2 at the expense of the formation of 1 NADH and 1 ATP in the TCA cycle.
d NADH balance depends on the rate with which the oxygenase contributes to alcohol oxidation.
e Conversion of bicyclo[3.2.0]oct-6-en-2-one to (–)-(1S, 5R)-2-oxabicyclo[3.3.0]oct-6-en-one and (–)-(1R, 5S)-3-oxabicyclo[3.3.0]oct-6-en-one.

21-6 Future Applications of Metabolic Engineering

Energy and Cofactor Issues in Fermentation and Oxyfunctionalization Processes

21-7

Mathematically, the introduction of maintenance terms, which varied with growth rate and carbon source availability, provided a realistic method of describing the variation of maintenance. 3,9 However, this model did not address the biological mechanisms responsible for variable maintenance. Further studies indicated that ATP production does not necessarily determine the growth rate of microorganisms and that anabolism is incompletely coupled to catabolism.10,11 Russell and Cook6 summarized various mechanisms of energy loss including overflow metabolism, metabolic shifts, uncoupling, and futile cycles. During overflow metabolism, microorganisms excrete or leak partially oxidized metabolic intermediates, cell wall components, and protein into culture media. Acetic acid production by E. coli is a prominent example and was shown to decrease as a response to NADH drain by a recombinant NADH oxidase leading to more efficient glucose catabolism.12 When an energy source is in excess, microorganisms were shown to shift catabolism to an energetically less efficient scheme such as the methylglyoxal pathway.13 The variable efficiency of coupling NADH and FADH2 oxidation at the cytoplasmic membrane to O2 reduction and metabolic energy generation can be ascribed to the fact that not all membrane-bound redox enzymes are necessarily involved in energy conservation and that microorganisms can have multiple strategies of electron flow and coupling. Thus, it is inherently difficult to estimate YATP values of aerobes. 
While futile cycles of antagonistic enzymes involved in ATP-dependent phosphorylation and fortuitous dephosphorylation of metabolites (e.g., phosphofructokinase and fructose-1,6-diphosphatase) were considered to play no major role in ATP-spilling due to their tight regulation, experimental evidence indicates that microorganisms can dissipate energy in futile cycles of ions through cytoplasmic membranes.6,14 Some ion cycles only occur under specific nutrient limitations (e.g., potassium and ammonia), but proton cycles can operate whenever there is an imbalance of catabolic and anabolic rates. Such energy spilling could significantly influence the efficiency of microbial energy metabolism and explain the discrepancy between experimentally observed and theoretical growth yields. As indicated by low YATP/MAX values, the energetic efficiency of microbial growth is significantly reduced, when growth is limited by nutrients other than energy. This was confirmed in an assessment of B. subtilis metabolism and energetics via quantitative metabolic flux analysis during continuous cultivation.14 Furthermore, an ATP balance indicated that, under glucose limitation, fast-growing B. subtilis dissipated more energy for processes such as ion leakage than slow-growing cells.14 In contrast, studies based on quantitative physiology and metabolic flux balancing by means of a stoichiometric model indicated that slow glucose-limited growth of E. coli is less energy efficient (limited by cell-carbon supply) than fast growth (limited by energy-carbon supply).15 However, quantitative flux analysis during continuous E. 
coli cultures could not confirm these flux balancing results and demonstrated nonlinear dependency of intracellular fluxes on growth rate, emphasizing the highly dynamic behavior of carbon metabolism, which is controlled by yet largely unknown regulatory mechanisms.16,17 Possible reasons for energy spilling include the compensation for imbalances between anabolism and catabolism, coping with limitations by nutrients other than energy (e.g., potassium or ammonia), the transformation of energy sources to compounds unusable or toxic for competing organisms, rigidity and robustness of metabolic networks, and anticipation of and disposition for potentially changing environmental conditions.6,10,18,19 Thus, such extra energy consumption adds to the enormous flexibility of microbial metabolism, which is able to adapt to various physiological objectives. The economic objectives of the application of microbial cells differ from the “natural” objectives of cells in that energy metabolism should be optimally exploited for the desired reactions and not necessarily for optimal growth and competition. However, the flexibility and robustness of microbial metabolism augurs well for the efficient implementation of biocatalytic reactions interfering with energy metabolism. Processes important for the bioenergetics of biocatalytically active microbial cells include the reactions of interest, biomass formation (growth), stress metabolism, and overexpression of genes of interest (Table 21.3). Figure 21.1 summarizes the main energy generating pathways during oxidative glucose catabolism, including substrate level- and oxidative phosphorylation, and the main energy consuming reactions including growth, maintenance, and factors potentially influencing the energy metabolism of biocatalytically active cells.


Table 21.3 Intracellular Processes Interfering with Host Energy Metabolism during Biocatalytic Oxyfunctionalization

Type of Process | Description | Relevant Conditions
Reactions of interest | Metabolite consumption, e.g., ATP, NAD(P)H | Energy dependent reactions (e.g., cofactor regeneration necessary)
Growth | Catalyst production | Instable enzymes, instable metabolism
Stress metabolism | Processes coping with adverse effects occurring during biocatalysis | Substrate, product, and/or enzyme toxicity, product extrusion, cofactor drain, enzyme overexpression
Overexpression | Increase of enzyme concentration | Use of recombinant cells

[Figure 21.1 labels: glucose is oxidized to 6 CO2 via EMP/TCA (4 ATP, 2 FADH2, 10 NAD(P)H), ED/TCA (3 ATP, 2 FADH2, 10 NAD(P)H), or PPP/TCA (2 ATP, 1 FADH2, 11 NAD(P)H). Oxidation of NADH to NAD+ with ½ O2 translocates ≤10 H+, oxidation of FADH2 to FAD with ½ O2 translocates ≤6 H+, and ATP synthesis from ADP consumes ~3 H+. ATP is consumed for growth and maintenance. Energy dissipation arises from substrate and product toxicity, enzyme toxicity, solvent stress, energy dependent biotransformation reactions, and uncoupling phenomena.]

Figure 21.1 Biological energy generation and consumption in a biocatalytically active cell. Energy generation via the main pathways of oxidative glucose catabolism is shown as an example, and energy dependent glucose import via the phosphotransferase system is assumed. Per molecule of glucose entering the PPP, one molecule of glyceraldehyde 3-phosphate is assumed to enter glycolysis and the TCA cycle, whereas fructose 6-phosphate is assumed to be cycled. Block arrows indicate energy consumption. Biocatalysis related effects leading to energy dissipation are indicated by the arrows on the right hand side. ED = Entner–Doudoroff pathway; TCA = tricarboxylic acid cycle; PPP = pentose phosphate pathway; EMP = Embden–Meyerhof–Parnas pathway.
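The stoichiometries shown in Figure 21.1 can be turned into an upper bound on the ATP yield per glucose for each catabolic route. The sketch below is illustrative and rests on several assumptions that are not statements of the chapter: the assignment of the three yield sets to EMP/TCA, ED/TCA, and PPP/TCA follows standard biochemistry, all NAD(P)H is treated as if it were respired like NADH, and the upper-bound proton stoichiometries (≤10 H+ per NADH, ≤6 H+ per FADH2, ~3 H+ per ATP) are applied without any energy dissipation. The names used are hypothetical.

```python
# Illustrative upper-bound estimate; pathway assignment and NADPH treatment are assumptions.

H_PER_NADH = 10   # upper bound of protons translocated per NAD(P)H oxidized
H_PER_FADH2 = 6   # upper bound of protons translocated per FADH2 oxidized
H_PER_ATP = 3     # approximate protons consumed per ATP synthesized

# Yields per glucose oxidized to 6 CO2, as read from Figure 21.1.
YIELDS = {
    "EMP/TCA": {"atp": 4, "fadh2": 2, "nad_p_h": 10},
    "ED/TCA": {"atp": 3, "fadh2": 2, "nad_p_h": 10},
    "PPP/TCA": {"atp": 2, "fadh2": 1, "nad_p_h": 11},
}


def max_atp_per_glucose(y):
    """Substrate-level ATP plus the upper bound from oxidative phosphorylation."""
    protons = y["nad_p_h"] * H_PER_NADH + y["fadh2"] * H_PER_FADH2
    return y["atp"] + protons / H_PER_ATP


if __name__ == "__main__":
    for name, y in YIELDS.items():
        print(f"{name}: at most {max_atp_per_glucose(y):.1f} mol ATP per mol glucose")
```

Because the proton stoichiometries are upper limits, real yields are lower, and the energy dissipation processes listed in the figure reduce them further.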

21.2.1 Biocatalytic Reactions and Energy Metabolism The reactions exploited during biotransformation and fermentation processes may involve multiple substrates and products and may be energy- and/or cofactor-dependent. Fermentation processes thereby often exploit native energy generating catabolic pathways for the production of oxidized and/or reduced metabolites as they occur during anaerobic growth or as a result of aerobic overflow metabolism (e.g., ethanol, acetic acid, and lactic acid). In this case, the energy metabolism follows its natural scheme as long as fluxes are not altered by, e.g., manipulating enzyme titers. However, flux manipulations are required, when metabolites naturally present at low concentrations such as amino acids or antibiotics are the target. Furthermore, the synthesis of amino acids and secondary metabolites often consumes biological energy. In such cases, the energy required for growth and maintenance often limits the process efficiency in terms of yield as well as productivity. The same has to be considered, when additional reactions are introduced into an organism to transform naturally occurring metabolites into desired products. If such additional reactions include energy consuming steps, the picture again changes and an additional level of interaction


with the energy metabolism of the host has to be taken into account. For this latter case, only very few examples exist, of which two are listed in Table 21.2 (#1 and #2) and discussed below. The impact of such additional reactions can be compared with the impact of energy-dependent biotransformation reactions on the host metabolism (as detailed in the next paragraph) with the difference that the substrate of the biotransformation is not fed but synthesized by the host metabolism. It has to be considered that the metabolic burden of both substrate synthesis and energy dependent substrate conversion may, depending on the catalytic rates and the catabolic capacity of the organism, affect productivities and yields.

Efforts toward industrial biotransformations based on cofactor-dependent enzymes such as oxygenases, which often are unstable and/or membrane associated, mainly have focused on whole-cell biocatalysts, in which the cofactors are regenerated by the cell metabolism.20–24 Redox cofactor-dependent intracellular enzyme catalysis is expected to interfere with the energy metabolism of the microbial host, since the same cofactors are involved in both redox biocatalysis and energy metabolism. This raises the question of whether whole-cell redox biocatalysis is limited by the cofactor regeneration capacity of the cell metabolism. NADH and NADPH, which can be converted into one another by transhydrogenases,25,26 are the cofactors most often used by oxygenases. Estimates of the NAD(P)H regeneration capacity of microbial cells based on physiological data of aerobically growing cells vary between 126 and 2218 U (g CDW)–1 (µmol per minute per gram of cell dry weight) depending on the strain. For instance, E.
coli showed a value of 575 U (g CDW)–1.27,28 Granted that the capacity for glucose catabolism is the same for growing and nongrowing cells, the glucose that is metabolized for biomass formation would become available for cofactor regeneration in nongrowing cells. Using this assumption, a potential NAD(P)H regeneration rate of 1125 U (g CDW)–1 was estimated for nongrowing E. coli, whereas this value amounted to 3150 U (g CDW)–1 for Paracoccus versutus, showing the highest specific rate of glucose catabolism among the strains evaluated.27,29 However, uncoupling of cofactor consumption from product formation and other undesired side reactions such as overoxidation and oxidation at multiple sites may increase the biooxidation related cofactor demand.30 In the case of oxygenases, uncoupling not only leads to a loss of reducing equivalents, but also to an increased oxygen demand and the production of toxic oxygen species such as hydrogen peroxide.31 Another factor which reduces cofactor availability is the interference of enzymatic background activity of the host with biocatalysis. Such an interference was observed when recombinant E. coli containing xylene monooxygenase (XMO) were used for the successive oxidation of toluenes and xylenes to the corresponding alcohols, aldehydes, and acids.32–34 In these applications, the oxidation of alcohols to the corresponding aldehydes catalyzed by the monooxygenase was counteracted by the reduction of aldehydes to alcohols catalyzed by dehydrogenases from the E. coli host. This “futile” cycle not only reduced the aldehyde formation rate but also acted as a sink for reduced cofactors, since both reactions, alcohol oxygenation and aldehyde reduction, use stoichiometric amounts of NADH.32,34,35 On the other hand, the involvement of both oxygenases and dehydrogenases for the catalysis of successive oxidations uncouples biocatalysis from cofactor regeneration by the host metabolism. 
The NADH consumption by an oxygenase catalyzed reaction thereby is compensated by NADH generation in a dehydrogenase catalyzed oxidation. This strategy was followed in several synthetic applications (see below) including, e.g., the production of 3-methylcatechol from toluene by genetically engineered Pseudomonas putida strains.36,37
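The cofactor budget discussed in this section can be sketched as a simple calculation. The example below is a rough illustration, not a model from the cited studies: it takes the NAD(P)H regeneration capacities quoted above for growing and nongrowing E. coli and estimates the oxygenase product formation rate they could support. The coupling efficiency, the background drain, and the function name are hypothetical parameters introduced only for this sketch.

```python
# Illustrative cofactor budget; coupling and background_drain values are hypothetical.

REGEN_GROWING_E_COLI = 575.0      # U (g CDW)-1, estimate quoted in the text
REGEN_NONGROWING_E_COLI = 1125.0  # U (g CDW)-1, estimate quoted in the text


def supported_activity(regen_capacity, coupling=1.0,
                       nadh_per_turnover=1.0, background_drain=0.0):
    """Product formation rate (U per g CDW) that a given NAD(P)H
    regeneration capacity could sustain.

    coupling: fraction of oxygenase NAD(P)H consumption that ends up in product
    nadh_per_turnover: NAD(P)H consumed per oxygenase turnover
    background_drain: NAD(P)H consumed by host side reactions (U per g CDW)
    """
    if not 0.0 < coupling <= 1.0:
        raise ValueError("coupling must be in (0, 1]")
    if nadh_per_turnover <= 0:
        raise ValueError("nadh_per_turnover must be positive")
    available = max(regen_capacity - background_drain, 0.0)
    return available * coupling / nadh_per_turnover


if __name__ == "__main__":
    for label, capacity in (("growing", REGEN_GROWING_E_COLI),
                            ("nongrowing", REGEN_NONGROWING_E_COLI)):
        for coupling in (1.0, 0.5):  # hypothetical coupling efficiencies
            rate = supported_activity(capacity, coupling=coupling)
            print(f"{label} E. coli, coupling {coupling:.1f}: {rate:.0f} U (g CDW)-1")
```

A futile alcohol/aldehyde cycle of the kind described for XMO-containing E. coli would enter this sketch as a nonzero `background_drain`, because each aldehyde reduced back to the alcohol consumes NADH without yielding product.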

21.2.2 Energy Aspects of Growth and Biocatalysis In general, cells employed in biocatalytic applications have to be produced at a certain stage of the process. Thus, cell growth is always an issue in whole-cell bioprocesses. Separation of growth and production phases and the use of resting cells is not only an interesting strategy from a technical point of view, allowing the separate optimization of the two phases,21,38 but also with respect to bioenergetics and fermentation yields.39–41 Uncoupling of energy demands for biotransformation from demands for cell growth may increase the amount of cofactors available for redox biocatalysis. In fermentation processes, reducing the amount of carbon and energy source needed for growth and maintenance is expected


to increase fermentation yields. However, many fermentative processes as well as biotransformations involving, e.g., oxygenases are only productive when growing cells are used as biocatalysts, and resting cells or cells reaching the stationary phase after batch or fed-batch growth were found to steadily lose their activities despite the supply of an energy source.42–49 Possible reasons include decreasing intracellular enzyme levels due to changes in the regulation of gene expression,50 protein stability,51 or cell physiology (membrane stability).52,53 Furthermore, decreasing activities may be a consequence of reduced ATP and NAD(P)H regeneration rates, when the cells adapt their metabolic activity to the actual growth stage. Thus, the use of growing cells with high metabolic activities may guarantee the maintenance of high levels of the enzymes of interest and/or high cofactor regeneration rates. However, the presence of additional cofactor-consuming enzymes will have an impact on the energy metabolism of both growing and resting cells. Due to the enormous flexibility of microbial metabolism as depicted above,6,10,18,54 a certain extent of additional cofactor consuming activity might also cause an increase of the glucose uptake rate in order to meet increased demands.55 Here, metabolic engineering may contribute to a better understanding and enable the manipulation of metabolic fluxes in order to channel energy to biocatalysis instead of growth.56–59

21.2.3 Stress Metabolism during Biocatalysis When living cells are applied for biocatalytic reactions, maintenance has to be considered as well. Maintenance includes processes compensating for the instability of essential macromolecules, reflects inter alia the flexibility to adapt to changing environmental conditions, and may be increased by adverse effects occurring during biocatalysis. Such additional maintenance can also be termed stress metabolism (Table 21.3). Factors leading to an increased maintenance metabolism include, beside the above mentioned uncoupling of redox catalysis and unspecific side reactions, the overexpression of genes of interest (discussed in the next paragraph) and adverse effects exerted by substrates and products of the biocatalytic reactions. Organic substrates and products may be toxic to living cells, especially in the case of oxidative biotransformations, 30,60–63 with membrane intercalation and disintegration being the most prominent mechanisms of toxicity.64 Thus, in the presence of toxic solvents, uncoupling of the proton motive force 65 and cofactor/metabolite loss due to membrane permeabilization are expected to increase maintenance requirements and may reduce the amount of energy available for biocatalysis. A recent study on stereospecific styrene epoxidation catalyzed by recombinant E. coli containing the styrene monooxygenase from Pseudomonas sp. strain VLB120 suggests that a product toxicity induced cofactor limitation affected the styrene oxide productivity under process conditions (see below).66 Gram-negative bacteria such as E. 
coli and particularly Pseudomonas strains have been reported to possess various mechanisms of solvent tolerance including adaptive alterations of the membrane fatty acid- and phospholipid headgroup composition, energy-dependent efflux pumps exporting toxic compounds to the external medium, and formation of vesicles loaded with toxic compounds.67–69 Such mechanisms again are expected to increase maintenance energy requirements, for which proteome analyses of solvent tolerant P. putida S12 and transcriptome analyses of solvent-sensitive P. putida KT2440 gave clear indications. In the presence of toluene, P. putida S12 showed a reduced growth yield, upregulation of NAD(P)H generating systems, and downregulation of the major proton-driven system, the ATP synthase,70 whereas downregulation of energy-dependent systems, e.g., motility functions, was found for P. putida KT2440.71 In general, the presence of toxic substrates, products, or extraction phases may have a considerable impact on the energy metabolism of living cells and thus on the efficiency of wholecell biocatalysis. Approaches to overcome toxicity-related limitations include, beside the use of solvent tolerant strains, which so far showed rather low productivities, in situ product removal and regulated substrate addition.30,72–75


21.2.4 Energy Aspects of Recombinant Enzyme Overproduction The strong overexpression of heterologous genes often leads to the inhibition of cell growth, reduced growth yields, reduced ribosome concentrations, loss of culturability, and induction of different stress reactions76–79 such as the heat shock response80,81 or the unfolded protein response.82–84 Thereby, the synthesis of stress response proteins can considerably increase the energy demand of the microbial host.85 Furthermore, recombinant gene overexpression directly interferes with cell metabolism via consumption of precursors leading to a metabolite imbalance and/or an increased demand of carbon and energy.86,87 Furthermore, Sanden et al.86 showed that the rate of glucose-limited growth at the time of induction influences heterologous protein production. As a consequence of heterologous protein production, microorganisms generally seem to readjust their metabolic activities according to their energetic requirements and, if necessary, at the cost of their biosynthetic capabilities. In the case of intracellular oxidoreductase overproduction, additional aspects have to be considered.63 As mentioned above, uncoupling, which also can occur in the absence of productive biocatalysis, may lead to cofactor drain and the formation of reactive oxygen species and thereby interfere with oxidative carbon metabolism, thus causing additional metabolic burden. This especially holds for heterologous overexpression, since the molecular environment in recombinant hosts may differ from the wildtype strain, which might complicate the stable and functional expression of oxidoreductases in recombinants and thus promote uncoupling. Potential critical factors in this regard include the ratio of multiple enzyme components, the incorporation of prosthetic groups, and the interactions of membrane-associated components with the host membrane. 
Furthermore, high-level expression of membrane bound proteins may destabilize the membrane and thus affect energy metabolism and cell growth. Overexpression of the alkane monooxygenase genes of P. putida GPo1 in P. putida and E. coli altered the membrane lipid composition and reduced the growth rate as well as the stability of the heterologous genes.88–91 Considering all these aspects of oxidoreductase overexpression, it becomes clear that high level expression can be detrimental and that the expression level should not necessarily be maximized but optimized for maximal performance of the whole-cell biocatalyst. Nevertheless, genetic engineering to optimize gene expression, e.g., via the fine-tuned overexpression of small heat shock proteins,92,93 may help improve biocatalytic processes based on recombinant whole cells.

21.3 Biological Energy Issues in Fermentation Process Examples Extensive literature exists about the coherence of product yield and host energy metabolism in fermentation processes. Table 21.1 lists selected examples for the production of primary and secondary metabolites and some of them are discussed below in the context of general and product specific cellular energy issues.

21.3.1 Primary Metabolites In the first example, we discuss microbial energy metabolism during the production of optically pure L(+)-lactic acid from glucose. Industrial lactic acid production processes to date utilize anaerobic fed-batch cultures of lactic acid bacteria (LAB) such as Lactobacillus lactis. Lactic acid carbons have the same oxidation state as glucose carbons. Thus, reducing equivalents are conserved in such anaerobic bioconversions, leading to high carbon yields that can be close to the theoretical yield of 2 mol lactic acid (mol glucose)–1 and are readily achievable using wildtype strains.94 In order to achieve high productivities and yields, lactic acid fermentations with LAB are typically carried out at neutral pH. However, as lactic acid possesses a pKa of 3.86, it accumulates in the dissociated form at neutral pH and CaCO3 is added during the process to neutralize the reaction broth. During downstream processing, the broth has to be acidified again, yielding half a mol of gypsum per mol of


lactic acid, which complicates product recovery and therefore increases overall production costs.95 The simplest strategy to overcome this problem would be to shift the pH from neutral to acidic values below 3.8, allowing the production of free lactic acid. Unfortunately, most LAB do not tolerate pH values significantly below 5. Therefore, pH tolerant hosts such as fungi of the genus Rhizopus or recombinant Kluyveromyces lactis and Saccharomyces cerevisiae strains have been discussed as new hosts for the production of free lactic acid.96–101 Recent reports, however, point out that the pH tolerance of such strains is low during lactic acid production,97,102 which can be explained by energy limitation. In LAB, the low energy yield of 2 mol ATP during the conversion of one mol of glucose to 2 mol of lactic acid can sustain biomass synthesis and maintenance requirements, but is not sufficient to establish a proton gradient over the membrane. However, as detailed below, such a gradient would be required for organic acid extrusion at low pH. This energy limitation can be considered a fundamental barrier for anaerobic acidogenic fermentation processes such as lactic acid and succinic acid production at low pH. Thus, anaerobically growing yeast strains are not expected to be better hosts for weak acid production, despite their high tolerance for low pH. As mentioned above, the energy cost for lactic acid export over the cytoplasmic membrane is believed to be the main cause for low or lacking lactic acid productivity at pH values below 3.8. For this transport, three driving forces can be distinguished: (i) the gradient between intracellular and extracellular lactic acid concentrations, (ii) the pH gradient over the membrane, and (iii) ATP hydrolysis energy in the case of active transport by ABC transporters. 
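The role of the pKa of 3.86 and the gypsum burden mentioned above can be illustrated numerically. The sketch below applies the Henderson–Hasselbalch relation to estimate the fraction of free (undissociated) lactic acid at a given pH and converts the stated half mol of gypsum per mol of lactic acid into a mass ratio. The molar masses are standard values supplied here as an assumption (gypsum taken as the dihydrate CaSO4·2H2O), and the function names are hypothetical.

```python
# Illustrative sketch; molar masses and the dihydrate form of gypsum are assumptions.

PKA_LACTIC_ACID = 3.86        # value given in the text
M_LACTIC_ACID = 90.08         # g/mol, C3H6O3
M_GYPSUM = 172.17             # g/mol, CaSO4 * 2 H2O
GYPSUM_PER_LACTIC_ACID = 0.5  # mol gypsum per mol lactic acid, from the text


def undissociated_fraction(ph, pka=PKA_LACTIC_ACID):
    """Fraction of the acid present as free (undissociated) acid at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def gypsum_per_kg_lactic_acid():
    """Mass of gypsum (kg) formed per kg of lactic acid recovered."""
    mol_lactic_acid = 1000.0 / M_LACTIC_ACID
    return GYPSUM_PER_LACTIC_ACID * mol_lactic_acid * M_GYPSUM / 1000.0


if __name__ == "__main__":
    for ph in (7.0, 5.0, 3.8, 3.0):
        print(f"pH {ph:.1f}: {undissociated_fraction(ph) * 100:.2f} % free acid")
    print(f"gypsum: {gypsum_per_kg_lactic_acid():.2f} kg per kg lactic acid")
```

At pH values near the pKa roughly half of the product is present as free acid, which is why a shift to pH below 3.8 is attractive for downstream processing even though it raises the energetic cost of export.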
In principle, four export mechanisms can be distinguished.97,102 At high internal and low external lactic acid concentrations, product extrusion results in net proton extrusion. This lactic acid proton symport was shown to increase the energy yield of LAB on glucose.103 Under these conditions, LAB lower ATP hydrolysis-driven proton pumping via the F1F0-ATPase.104 At low lactic acid gradients, the dissociated acid including the free proton or the undissociated acid is transported by passive or facilitated diffusion, resulting in an energy neutral lactic acid export. However, at high extracellular lactic acid concentrations as present in industrial processes, only energy dependent lactic acid export occurs, either via secondary transport (uniport) of the anion or via primary transport of the undissociated acid by ABC transporters.102 Uniport is driven by the membrane potential and results in intracellular acidification necessitating ATPase-driven proton export. Primary transport via ABC transporters, the most energy intensive export mechanism, consumes 1 mol of ATP per mol of lactic acid exported and allows pumping against large lactic acid gradients at low extracellular pH. In this case, lactic acid fermentation becomes energy neutral (1 ATP synthesized in glycolysis and 1 ATP consumed for export per lactic acid molecule produced), which prevents other energy dependent reactions such as cell growth and maintenance. This energy limitation might be overcome by conducting the bioconversion in an aerated system that potentially can lead to an 18 times higher ATP yield on glucose as compared to the anaerobic process. In contrast to yeast, LAB generally do not possess a functional electron transport chain for oxidative phosphorylation. 
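The ATP bookkeeping behind this argument can be written out as a short calculation. The sketch below is a simplified illustration under stated assumptions: it uses only the figures given above (2 ATP and 2 lactic acid per glucose fermented, 1 ATP per lactic acid exported by ABC transporters, and an 18 times higher ATP yield for aerobic glucose oxidation), treats the export cost as a free parameter, and ignores growth and maintenance. The `oxidized_fraction` parameter and the function names are hypothetical.

```python
# Illustrative ATP balance; oxidized_fraction is a hypothetical process parameter.

ATP_PER_GLUCOSE_FERMENTED = 2.0  # mol ATP per mol glucose, from the text
LACTATE_PER_GLUCOSE = 2.0        # mol lactic acid per mol glucose fermented
ATP_PER_GLUCOSE_OXIDIZED = 18.0 * ATP_PER_GLUCOSE_FERMENTED  # "18 times higher"


def lactate_yield(oxidized_fraction=0.0):
    """Lactic acid yield (mol per mol glucose) when a fraction of the
    glucose is oxidized to CO2 instead of being fermented."""
    if not 0.0 <= oxidized_fraction <= 1.0:
        raise ValueError("oxidized_fraction must be in [0, 1]")
    return LACTATE_PER_GLUCOSE * (1.0 - oxidized_fraction)


def net_atp_per_glucose(export_cost, oxidized_fraction=0.0):
    """Net ATP (mol per mol glucose) left for growth and maintenance.

    export_cost: mol ATP spent per mol lactic acid exported
    """
    fermented = 1.0 - oxidized_fraction
    gained = (fermented * ATP_PER_GLUCOSE_FERMENTED
              + oxidized_fraction * ATP_PER_GLUCOSE_OXIDIZED)
    spent = lactate_yield(oxidized_fraction) * export_cost
    return gained - spent


if __name__ == "__main__":
    print("energy neutral export:", net_atp_per_glucose(0.0))
    print("ABC transporter export:", net_atp_per_glucose(1.0))
    for f in (0.05, 0.10):  # hypothetical oxidized fractions
        print(f"ABC export, {f:.0%} oxidized: ATP {net_atp_per_glucose(1.0, f):.1f}, "
              f"lactic acid yield {lactate_yield(f):.2f}")
```

The sketch makes the trade-off explicit: every unit of glucose diverted to oxidation restores ATP for the cell but lowers the lactic acid yield proportionally.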
However, complementing a heme auxotrophy allowed the reconstitution of functional cytochrome in Lactococcus lactis.105 This cytochrome could rescue a F1F0-ATPase negative mutant, indicating that a proton gradient could be established without ATP hydrolysis. However, the pH sensitivity of this aerobically growing strain was not investigated. In any case, additional ATP can only be produced by glucose oxidation to CO2, resulting in decreased lactic acid yields. Well balanced microaerobic processing at low pH including partial glucose oxidation might allow both efficient lactic acid fermentation and a simple down stream processing but most likely is difficult to achieve in an industrial setting. For the production of lactic acid and weak acids in general, not only the stoichiometry of the biochemical pathway of the host, but also the energy cost of product export has to be considered. Thus, the industrially interesting fermentation parameters low pH and lactic acid titers above 100 g L–1 might not be combinable, due to energy dissipation during lactic acid export. The second example in Table 21.1 is the production of succinic acid, which was described to be an important building block for a wide variety of high value-added commodity chemicals.106 Microbes


produce succinic acid either as the final product of the fumarate reductase catalyzed reaction employing fumarate as terminal electron acceptor under anaerobic conditions, as a by-product of the isocitrate lyase catalyzed conversion of isocitrate to glyoxylate in the glyoxylate shunt, or as intermediate in the TCA cycle via succinyl-CoA oxidation catalyzed by succinyl-CoA ligase. Succinic acid is the major natural fermentation product of, e.g., the gram-negative anaerobes Anaerobiospirillum succiniciproducens and Mannheimia succiniciproducens. M. succiniciproducens recently was engineered to minimize by-product formation.43 Alternatively, E. coli strains were engineered for aerobic or anaerobic succinic acid production. Aerobic conditions allow high biomass yields and high growth rates, but enable a maximal theoretical succinic acid yield of only 1 mol (mol glucose)–1, which is lower than the 1.71 mol (mol glucose)–1 reached under anaerobic conditions. A recombinant E. coli that could reach the theoretical aerobic yield was constructed by deleting the major acetic acid producing reactions, the succinate dehydrogenase, and the phosphotransferase (PTS)-based glucose uptake system to increase phosphoenolpyruvate (PEP) availability for anaplerosis. In addition, the anaplerotic PEP carboxylase was overexpressed and the glyoxylate shunt deregulated to enhance carbon flow toward succinic acid.107 The high yield reached was somewhat unexpected as the oxidative branch of the TCA cycle has an unfavorable stoichiometry for succinic acid production, since two CO2 are released for one succinic acid formed. 
However, a strain carrying, beside the above mentioned modifications, a deletion in the isocitrate dehydrogenase produced not only succinic acid but also significant amounts of pyruvate, citrate, and isocitrate, which lowered the product yield.108 In an aerobic fed-batch culture, the engineered strain with both aerobic succinic acid producing pathways (glyoxylate shunt and oxidative TCA cycle branch) allowed a final titer of up to 58 g succinic acid L -1, a yield of 0.94 mol (mol glucose) -1, and a productivity of 0.63 mmol (g CDW) -1 h -1 (Table 21.1).109 Further yield improvements can only be achieved by anaerobic succinic acid production. To overcome the limitations of the anaerobic process such as slow carbon uptake and slow succinic acid production, a dual-phase production system can be used with fast biomass formation in an aerobic phase followed by an anaerobic succinic acid production phase.110,111 E. coli was engineered for anaerobic succinic acid production by deleting lactate dehydrogenase and pyruvate-formate-lyase to avoid by-product formation and by deleting the PTS glucose uptake system to increase the flux toward succinic acid. In addition, the pyruvate carboxylase was overexpressed. Without this enzyme activity, increased acetic acid production was observed. This observation was explained by an energy limitation. The increased flux toward succinic acid via PEP carboxylase circumvents the ATP yielding reaction from PEP to pyruvate catalyzed by pyruvate kinase. To compensate this lack of ATP, carbon flux is channeled to the acetate kinase catalyzed reaction generating one ATP per molecule acetic acid produced. Via acceleration of glucose catabolism, the extra pyruvate carboxylase activity thus allowed the generation of additional ATP per time without compromising the flux into the TCA cycle and toward succinic acid. 
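The maximal anaerobic yield of 1.71 mol (mol glucose)–1 quoted in this section is consistent with a simple electron (degree of reduction) balance. The sketch below is a plausibility check only, under the assumptions that all electrons of glucose end up in succinic acid, that CO2 is freely available for carboxylation, and that no biomass or by-products are formed. It is not necessarily how the cited work derived the value, and the function names are hypothetical.

```python
# Illustrative electron balance; assumes no biomass or by-product formation.

GLUCOSE = (6, 12, 6)        # C, H, O atoms in C6H12O6
SUCCINIC_ACID = (4, 6, 4)   # C, H, O atoms in C4H6O4


def available_electrons(c, h, o):
    """Electrons available on full oxidation to CO2 and H2O."""
    return 4 * c + h - 2 * o


def max_yield_by_electron_balance(substrate, product):
    """Upper bound on product yield (mol per mol substrate) if every
    substrate electron is recovered in the product."""
    return available_electrons(*substrate) / available_electrons(*product)


def co2_fixed_per_glucose(yield_mol_per_mol):
    """Net CO2 (mol per mol glucose) that must be fixed to close the carbon balance."""
    return SUCCINIC_ACID[0] * yield_mol_per_mol - GLUCOSE[0]


if __name__ == "__main__":
    y_max = max_yield_by_electron_balance(GLUCOSE, SUCCINIC_ACID)
    print(f"electron-limited yield: {y_max:.2f} mol succinic acid per mol glucose")
    print(f"net CO2 fixation required: {co2_fixed_per_glucose(y_max):.2f} mol per mol glucose")
    print(f"aerobic fed-batch yield reached: {0.94 / 1.0:.0%} of the aerobic maximum")
```

The positive CO2 term reflects that the electron-limited yield requires net carboxylation, which fits the emphasis placed above on anaplerotic PEP and pyruvate carboxylase activity.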
The maximal yield of 1.71 mol (mol glucose)–1 is only possible, when the fumarate reductase pathway and the glyoxylate shunt are active.111 As the glyoxylate lyase is only expressed under aerobic conditions, a dual-phase fermentation strategy was chosen allowing glyoxylate lyase expression in the aerobic phase and efficient succinic acid production by this enzyme in the anaerobic phase. This strategy enabled impressive final titers of up to 99 g L -1, a yield of 1.5 mol (mol glucose)–1, and a succinic acid production rate of 1.1 mmol (g CDW) -1 h -1.110 The mode of production, e.g., aerobic or anaerobic fermentation, has not only a significant impact on the product yield, but also influences the productivity and overall economics of a process. The presented coupled, dual-phase process takes advantage of the two cultivation modes, high growth rate during the aerobic phase and high product yield during the anaerobic phase. The bulk chemical bioethanol is produced in processes based on the yeast Saccharomyces cerevisiae using crops such as sugar cane, corn, or sugar beet as starting material. However, the low market price of 0.5 to 0.8 € L -1 (2006 in Europe) and the competition with fossil fuels push the industry toward the use of cheaper substrates such as lignocellulose.112 Processing and utilization of lignocellulose differs

in many aspects from crop-based ethanol production. Efficient catabolism of the various hexoses and pentoses found in lignocellulose is one critical aspect. With lignocellulose sugars as substrates, none of the known ethanol producing microbes combines the needed properties of high ethanol yield, high final ethanol titer (high ethanol tolerance), and high ethanol production rate. To overcome such limitations, the ethanol producers S. cerevisiae and Zymomonas mobilis were engineered for pentose (e.g., xylose, arabinose) utilization. Alternatively, the Z. mobilis ethanol pathway was introduced into E. coli that can grow on a wide variety of carbohydrates including pentoses. Below, we discuss biological energy aspects of the former strategy. The most intensely studied strategy for xylose utilization by S. cerevisiae is to introduce the xylose catabolic pathway from xylose utilizing yeasts like Pichia stipitis.113–117 In this pathway, xylose catabolism is initiated by NADPH-dependent xylose reduction to xylitol. A NAD-dependent xylitol dehydrogenase then converts xylitol into xylulose, which subsequently enters the pentose phosphate pathway of S. cerevisiae and is converted to ethanol. Due to the redox cofactor imbalance introduced by this pathway (1 NADPH consumed and 1 NADH produced) and the lack of transhydrogenases in yeasts, xylose can only be metabolized under aerobic conditions. Three scenarios to balance the redox cofactors and produce ethanol with this xylose catabolic pathway were described:118 (i) NADH formed in the xylitol dehydrogenase reaction is utilized by the electron transport chain during aerobic growth. However, aeration leads to higher process costs and is difficult to optimize, as too much respiratory NADH consumption would lower the yield of ethanol, (ii) Excess NADH is balanced with the production of glycerol as found during anaerobic growth of S. cerevisiae on hexoses. 
During growth on xylose, however, the NADH excess from xylitol dehydrogenation is so large that most of the carbon would have to be converted into glycerol, with the consequence of insufficient ethanol and ATP yields. Thus, this scenario is not feasible without an additional external energy source. (iii) Yeasts expressing a xylose reductase with dual cofactor specificity, accepting both NADH and NADPH, produce xylitol to balance excess NADH. The successful application of this strategy was reported by Sonderegger et al.,114 who performed extensive evolutionary engineering of a recombinant xylose-metabolizing S. cerevisiae strain and showed for the first time anaerobic ethanol formation from xylose in this system. However, due to the production of xylitol, this strategy results in a reduced ethanol yield. In order to reduce xylitol accumulation, a mutated xylose reductase with an increased Km for NADPH was employed, which indeed resulted in increased ethanol and reduced xylitol yields.119 In an alternative approach, the redox cofactor issue was overcome by the overexpression of a xylose isomerase from the anaerobic fungus Piromyces sp. E2 in S. cerevisiae.120 This enzyme catalyzes the redox cofactor-independent conversion of xylose into xylulose. A selected mutant of this recombinant strain was able to anaerobically produce ethanol from xylose with a yield of 1.37 mol mol–1, which is 82% of the theoretical ethanol yield of 1.67 mol mol–1.118 Furthermore, the nonoxidative PP pathway was introduced into this strain, which had no significant effect on the ethanol yield but increased the growth rate (to 0.09 h–1) and thus the ethanol productivity threefold.121 In summary, the overall redox balance is a critical factor for anaerobic ethanol production, especially when organisms without transhydrogenase activities are used. As a bacterial alternative, recombinant Z.
mobilis strains were constructed that carry the xylose utilization pathway from E. coli on a plasmid including xylA (xylose isomerase), xylB (xylulokinase), tal (transaldolase), and tktA (transketolase).122 Although this pathway does not generate a redox imbalance, it was successfully expressed only in bacteria and not in yeast. On xylose, the recombinant Z. mobilis strain reached a yield of 1.4 mol ethanol (mol xylose) -1, which is 86% of the theoretical ethanol yield. On a glucose/xylose mixture, a yield of 1.7 mol ethanol (mol sugars) -1 was reached. The two sugars were taken up simultaneously during ethanol production, although at different rates. Glucose exhaustion was followed by a diauxic shift with slower ethanol production rates on pure xylose. The same strain converted acid-pretreated corn fibers to ethanol at similar yields and a volumetric productivity of 1.04 g L -1 h -1.123 The deviation from the theoretical yield on xylose was attributed to energetic limitations at the low xylose uptake rates. Indeed, in vivo 31P-NMR studies indicated that NTP and UDP-sugar concentrations were lower during growth on xylose as compared to glucose.124 Since the central carbon

metabolism of Z. mobilis is based on the Entner–Doudoroff pathway, only 1 mol ATP is formed per mol glucose or xylose catabolized. However, due to high glucose uptake rates of up to 60 mmol (g CDW)–1 h–1 and low biomass yields of 2–5%, Z. mobilis cells produce at least 50% more ATP per unit time than anaerobically growing S. cerevisiae cells.125 This and the fact that the glucose uptake rate had little influence on the growth rate126 indicate an energy excess rather than an energy limitation during growth of Z. mobilis on glucose. The most likely candidate for ATP dissipation is the F0F1-ATPase. Inhibition of the ATPase activity by dicyclohexylcarbodiimide (DCCD) increased the growth yield of Z. mobilis.125 Thus, the glucose catabolism of Z. mobilis seems to be optimized for high substrate turnover rates rather than energetic efficiency. On the other hand, the lower xylose conversion rates observed might indeed bring Z. mobilis into a state of starvation, as less ATP is accessible per unit time. Typically, biocatalytic processes for bulk chemical production are only economically feasible when they are based on cheap carbon sources. The described example shows that the use of a different carbon substrate may change not only the product yield but also the rate of production. Thus, an optimal microbial host has a large spectrum of carbon and energy substrates and is able to simultaneously and efficiently utilize all carbon sources present in the industrial production medium.

As an example of an amino acid process, we will now discuss L-lysine production. Although Corynebacterium glutamicum strains naturally secrete a range of amino acids, the production rates of the wildtype organisms are too low for industrial applications. Therefore, the industrially employed strains underwent several rounds of mutagenesis and selection to overcome product inhibitions and to increase the catalytic rates in the amino acid synthesis pathways.
A yield of 0.28 mol lysine (mol glucose)-1,127 a volumetric lysine productivity of up to 3 g L–1 h–1,128 and final titers of more than 100 g L -1 have been reported.129 Regarding lysine production by C. glutamicum, an energy aspect comes into play, which has not been discussed in the previously presented examples—energy dissipating futile cycles. C. glutamicum harbors the unusual number of five enzymes catalyzing reactions between the C3 metabolites pyruvate and PEP and the C4 metabolites malate and oxaloacetate. Two enzymes have a CO2 fixing, energy dependent anaplerotic function (PEP carboxylase, pyruvate carboxylase), while three enzymes are involved in gluconeogenesis (malic enzyme, PEP carboxykinase, oxaloacetate decarboxylase). The combination of one of the anaplerotic reactions with one of the gluconeogenic reactions results in the dissipation of one ATP per catalytic cycle. The metabolic network structure of C. glutamicum and its operation were intensively studied using, besides other techniques, 13C-tracer based flux analysis.130–134 This method provided indications that a futile cycle around the pyruvate node is active during lysine production.135,136 During growth on a mixture of glucose and lactate, the pyruvate carboxylase was the major active anaplerotic enzyme (more than 90% of the flux to oxaloacetate). Interestingly, the demand for anaplerotic formation of oxaloacetate was estimated to be three times lower than the measured flux. 
The surplus of oxaloacetate was transformed to PEP by the PEP carboxykinase, thus closing one of the possible futile cycles.137 A mutation in the pyruvate carboxylase that most likely caused a higher flux through the corresponding reaction increased lysine production by increasing the flux toward lysine.128 On the other hand, the deletion of the PEP carboxykinase improved lysine production during growth on glucose (from 0.13 to 0.27 mmol (g CDW) -1 h -1) by eliminating the energy dissipating futile cycle described above.130 During pathway engineering, not only the net fluxes have to be considered but also possible futile cycles that can lower the yield of the envisaged product.
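The ATP cost of such a futile cycle can be made explicit by summing reaction stoichiometries. The following Python sketch does this for one possible pairing, pyruvate carboxylase with oxaloacetate decarboxylase. The stoichiometries are simplified textbook forms (protons, water, and the CO2/bicarbonate distinction are ignored), and the names `net_reaction`, `PYRUVATE_CARBOXYLASE`, and `OXALOACETATE_DECARBOXYLASE` are illustrative choices, not taken from the cited studies.

```python
from collections import Counter

# Simplified stoichiometries (negative = consumed, positive = produced).
# Assumption: protons, water and the CO2/bicarbonate distinction are ignored.
PYRUVATE_CARBOXYLASE = {"pyruvate": -1, "CO2": -1, "ATP": -1,
                        "oxaloacetate": 1, "ADP": 1, "Pi": 1}
OXALOACETATE_DECARBOXYLASE = {"oxaloacetate": -1, "pyruvate": 1, "CO2": 1}


def net_reaction(*reactions):
    """Sum reaction stoichiometries and drop metabolites that cancel out."""
    total = Counter()
    for reaction in reactions:
        for metabolite, coefficient in reaction.items():
            total[metabolite] += coefficient
    return {m: c for m, c in total.items() if c != 0}


cycle = net_reaction(PYRUVATE_CARBOXYLASE, OXALOACETATE_DECARBOXYLASE)
print(cycle)  # {'ATP': -1, 'ADP': 1, 'Pi': 1}
```

All carbon metabolites cancel, and the only net effect is the hydrolysis of one ATP per turn, which is why flux through such a cycle lowers the energy available for product formation.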

21.3.2 Secondary Metabolites

Secondary metabolites are generally not directly involved in growth metabolism, but rather needed to facilitate survival in a specific ecological niche or under adverse conditions. Although secondary metabolites are a very heterogeneous and poorly defined group of molecules, they often have complex synthesis routes in common with high demands for energy and redox cofactors. Here, we discuss biological energy aspects in the biocatalytic syntheses of riboflavin, also known as vitamin B2, and the polyketide 6-deoxyerythronolide B (see also Table 21.1).

Polyketides constitute a large family of medically important natural compounds produced by microbes and plants. Since the chemical synthesis routes of most polyketides are complex and costly, the production of polyketides by microbial organisms using cheap carbon sources such as glucose is often considered. The biochemical synthesis pathways rely on polyketide synthases, large proteins with distinct modular functionality. These enzymes are selective for starter and extender units and determine the chain length of the molecule and the degree of reduction of every repeating unit.138 The natural product titer accumulated by wildtype microbes, however, is rarely economically feasible and traditionally is improved by extensive mutagenesis and screening programs. In the case of the polyketide erythromycin, a potent antibiotic produced on a several 1000 tons per year scale, random mutagenesis and screening yielded Saccharopolyspora erythraea strains that accumulate a final product titer of at least 4 g L -1 at a rate of about 6 μmol (g CDW) -1 h -1 during a 220 h fed-batch fermentation.139 These values, however, are low when compared to the up to 15-fold higher titers reached for other secondary metabolites such as penicillins.140 Consequently, there is a strong interest in rational strain improvement strategies using alternative hosts such as E. coli or S. cerevisiae,42,141–143 which in contrast to the original hosts (mainly Actinomyces sp.) can easily be genetically modified and used in large scale fermentations. Although industrial production of erythromycin or its nonglycosylated precursor 6-deoxyerythronolide B (6dEB) is performed in fed-batch processes,139 resting cell approaches also have been considered, as no competition for energy and carbon precursors exists under these conditions.41 Based on a study of Pfeiffer et al.,42 Gonzalez-Lergier et al.41 estimated the theoretical 6dEB yield of resting recombinant E. coli cells. 
In this system, the precursor propionyl-CoA synthesized from propionate is converted by the propionyl-CoA carboxylase from Streptomyces coelicolor to the extender unit methylmalonyl-CoA (Figure 21.2).41,42,141 The erythromycin polyketide synthase from S. erythraea then catalyzes the final reactions to 6dEB. The net synthesis equation of this newly introduced pathway is 7 propionate + 7 ATP + 6 NADPH → 6dEB + 7 ADP + 7 Pi + 6 NADP+ + H2O (Figure 21.2), highlighting the high energy and redox cofactor demand of polyketide synthesis. Assuming an additional energy and redox cofactor source, a theoretical molar yield of 0.14 mol 6dEB (mol propionate)–1 was calculated.41 However, under the reported conditions, only propionate was supplied as carbon and energy source. Based on metabolic flux analysis, a theoretical maximum yield of 0.11 mol mol–1 was estimated with the assumption that

[Figure 21.2 scheme; recoverable labels: 7 propionate, 7 CoA, 7 ATP, 7 ADP + 7 Pi, propionyl-CoA; 6 propionyl-CoA, 6 CO2, 6 (2S)-methylmalonyl-CoA; propionyl-CoA, 6 NADPH, 6 NADP+, 6 CO2, 7 CoA, H2O, 6-deoxyerythronolide B.]

Figure 21.2 The engineered 6-deoxyerythronolide B (6dEB) pathway in E. coli with propionate as carbon and energy source. Propionyl-CoA is the substrate for the synthesis of the extender unit methylmalonyl-CoA and is used by the polyketide synthase as starter unit.

resting cells consume no energy for maintenance and propionate transport. In this estimate, the TCA cycle and oxidative phosphorylation were assumed to provide the required energy and redox cofactors by means of propionate oxidation to CO2. The yield dropped to zero (no 6dEB production) assuming a maintenance demand of about 6 mmol ATP (g CDW)–1 h–1, as determined for E. coli growing in a chemostat on glucose.144 However, the experimentally observed yield42 of 0.014 mol mol–1 suggested that the true energy demand for maintenance under the applied conditions was 0.158 mmol ATP (g CDW)–1 h–1. Assuming such a nonzero maintenance value, the molar yield of 6dEB is a function of the propionate uptake rate, for which a value of 0.65 mmol (g CDW)–1 h–1 was estimated to satisfy the maintenance demands only. Higher uptake rates were proposed to be necessary to allow 6dEB production. Thus, metabolic engineering strategies would focus on maximizing propionate uptake and/or minimizing maintenance demands. In order to enable the use of a cheaper carbon source such as glucose, the metabolic network structure was altered, including changes in the 6dEB synthesis pathway. With glucose as the substrate, (2S)-methylmalonyl-CoA is not synthesized from propionyl-CoA, but from the TCA cycle intermediate succinyl-CoA. The theoretical yield per three-carbon unit of glucose was calculated to be 0.105 mol mol–1, slightly lower than the yield on propionate (0.11 mol mol–1).41 However, glucose allows a higher molar yield when maintenance is taken into account. This results from co-utilization of glycolysis, pentose phosphate pathway, and TCA cycle for energy generation and precursor synthesis. Nevertheless, as discussed for propionate, the yield of 6dEB increases with increasing glucose uptake rates.
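The dependence of the 6dEB yield on the propionate uptake rate can be illustrated with a small Python sketch. It is not the flux model of Gonzalez-Lergier et al.; it assumes, purely for illustration, that all propionate taken up beyond the maintenance requirement is converted at the maintenance-free yield. The function and constant names are hypothetical, the numbers 7, 0.11, and 0.65 are those quoted above, and the uptake rates in the loop are arbitrary example values rather than measurements.

```python
# Values quoted in the text (yields in mol 6dEB per mol propionate,
# uptake rates in mmol (g CDW)^-1 h^-1).
PROPIONATE_PER_6DEB = 7        # from the net synthesis equation
Y_MAX_PROPIONATE_ONLY = 0.11   # flux-analysis estimate without maintenance
Q_MAINTENANCE = 0.65           # uptake quoted as covering maintenance only


def stoichiometric_yield(propionate_per_product=PROPIONATE_PER_6DEB):
    """Upper bound when energy and redox cofactors come from another source."""
    return 1.0 / propionate_per_product


def yield_with_maintenance(q_uptake, y_max=Y_MAX_PROPIONATE_ONLY,
                           q_maintenance=Q_MAINTENANCE):
    """Assumed linear model: substrate beyond maintenance is converted at y_max."""
    if q_uptake <= q_maintenance:
        return 0.0
    return y_max * (1.0 - q_maintenance / q_uptake)


print(round(stoichiometric_yield(), 2))  # 0.14
for q in (0.65, 1.3, 2.6, 5.2):  # illustrative uptake rates, not measured values
    print(q, yield_with_maintenance(q))
```

Under this assumption the yield is zero at the maintenance-only uptake rate and approaches 0.11 mol mol–1 only at uptake rates far above it, which mirrors the qualitative conclusion that propionate uptake should be maximized or maintenance minimized.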
The vitamin riboflavin is industrially produced in fed-batch fermentations with the natural overproducer strains Ashbya gossypii and Candida famata or recombinant strains of Bacillus subtilis.145 Here, we concentrate on the well-described and characterized B. subtilis system, which was reported to produce final riboflavin titers of about 14 g L–1 in fed-batch fermentations.146,147 However, it can be assumed that titers above 20 g L–1 are reached in the industrial fermentation process. The net synthesis equation of riboflavin is 3.5 glucose + 13 ATP + NADPH + 4 NH3 → riboflavin + 2 NADH + 2 formate + 2 CO2, emphasizing the high demand for carbon and energy resources. Consequently, the maximal theoretical yield on glucose, calculated in the absence of growth, is rather low at 0.267 mol mol–1 (Table 21.1). During glucose-limited continuous cultivation, wildtype B. subtilis exhibited a P/O ratio (mol ATP synthesized per mol NADH consumed and O-atom reduced) of about 1.33, an ATP demand for maintenance (mATP) of about 9 mmol (g CDW)–1 h–1, and a YATP/max of around 9.5 g mol–1, which is equivalent to an ATP demand of 105 mmol per gram of biomass.148 Using metabolic flux analysis, a 20% higher riboflavin yield was estimated in this strain by increasing the P/O ratio from 1.33 to 1.5. However, this estimation relies on the assumption that additional ATP is solely used for riboflavin production, although improved ATP production in engineered strains is also likely to increase biomass formation.149 To account for unknown variations of the bioenergetic parameters P/O and YATP with process conditions, Sauer et al.150 introduced a separate ATP dissipating flux into their metabolic flux model. This flux was defined as a substrate-specific amount of ATP that is consumed in processes other than biomass or product formation.
A P/O ratio of 2 was assumed to calculate maximal ATP formation rates and ATP fluxes to biomass and product formation were calculated based on observed yields. The ATP dissipating flux was estimated to be 9.5 mol ATP (mol glucose)-1 in glucose limited chemostat cultivations, in which the energetic efficiency of B. subtilis was considered to be close to maximal.150 Simulations for non-growing cells predicted a maximal riboflavin yield of 0.16 mol (mol glucose) -1 and an ATP dissipating flux of 10.8 mol ATP (mol glucose) -1. Since the latter value is close to the energetic efficiency reached in glucose-limited chemostat cultivations, the maximal riboflavin yield on glucose was considered to be both energetically and stoichiometrically limited. Several strategies were proposed to overcome the potential energy limitation:150 (i) One strategy consisted in the use of a highly reduced substrate such as glycerol. The theoretical riboflavin yield on glycerol was estimated to be 0.295 mol (2 mol glycerol) -1. Using an experimentally determined ATP dissipating factor of 8.9 mol ATP (2 mol glycerol) -1 reduced the maximal yield to 0.217 mol (2 mol) -1, which indicates energy limitation, but still is an improvement when compared to glucose as the substrate, (ii) The use of a bacterial strain with a higher energetic efficiency

was proposed. To calculate the impact of higher energy efficiency, the estimation was repeated with a substrate leading to lower stoichiometric limitations, such as sucrose. The biochemical reaction network of Bacillus licheniformis would allow a maximal yield on sucrose of 0.219 as compared to 0.178 mol mol–1 in B. subtilis. Thus, for energetically limited processes the B. licheniformis strain might be a superior host, (iii) Replacing the PTS-based glucose uptake system by a glucose permease and a hexokinase might overcome stoichiometric limitations. Simulations predicted a Ymax of 0.267 mol (mol glucose) -1. Following the second approach, the energetic efficiency of a riboflavin hyperproducer was improved by altering its electron transport chain.151 The knockout of cytochrome bd oxidase (encoded by cydBC) reduced the maintenance requirements by about 40% as estimated from chemostat cultivations at different dilution rates. The most likely explanation for this improvement is the translocation of two protons per transported electron via the alternative cytochrome aa3 oxidase (encoded by qoxA-D) instead of only one proton via the bd oxidase. However, the critical dilution rate, above which overflow metabolism was observed, was lower for the cyd mutant as compared to the original strain. Possible explanations include a reduced oxygen scavenging capacity of the aa3 oxidase at high respiration rates (the aa3 oxidase has a 100-fold lower affinity for oxygen than the bd oxidase) and/or overflow metabolism triggered by increased ATP levels. However, industrial fed-batch processes are conducted with slowly growing cells and for such processes the cyd mutant might be favorable as, in comparison to the wildtype, the glucose demand for maintenance is reduced from 0.67 to 0.39 mmol (g CDW) -1 h -1. Indeed, in experimental fed-batch fermentations, the riboflavin yield increased from 0.013 to 0.015 mol (mol glucose) -1, whereas the biomass yield was constant. 
When several copies of the riboflavin pathway operon were introduced into wildtype and mutant strains the productivity of the wildtype strain was higher, but the cyd mutant exhibited a higher riboflavin yield (0.020 and 0.015 mol (mol glucose) -1 for the cyd mutant and the wildtype strain, respectively). Thus, it appears that increasing the efficiency of energy coupling not only reduced maintenance requirements, but, as suggested previously,150 also improved the yield of riboflavin production.151
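Several of the bioenergetic figures quoted for B. subtilis in this section are related by simple arithmetic, which the following Python sketch reproduces. The helper names `atp_demand_per_gram` and `relative_change` are illustrative; the input numbers are the ones given in the text.

```python
def atp_demand_per_gram(y_atp_max_g_per_mol):
    """Convert a maximal ATP yield (g biomass per mol ATP) to mmol ATP per g biomass."""
    return 1000.0 / y_atp_max_g_per_mol


def relative_change(before, after):
    """Fractional change from `before` to `after` (negative means a reduction)."""
    return (after - before) / before


# Figures quoted in the text for B. subtilis.
print(round(atp_demand_per_gram(9.5)))             # 105 mmol ATP per g biomass
print(round(relative_change(0.67, 0.39) * 100))    # -42 (% glucose demand for maintenance)
print(round(relative_change(0.013, 0.015) * 100))  # 15 (% riboflavin yield, fed-batch)
```

The reduction in glucose demand for maintenance computed this way (about 42%) is in line with the roughly 40% lower maintenance requirement estimated from the chemostat cultivations of the cyd mutant.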

21.4 Biological Energy Issues in Whole-Cell Oxyfunctionalization Process Examples

As examples of whole-cell redox cofactor-dependent biotransformation reactions, we look in detail at a selection of oxyfunctionalization reactions catalyzed by oxygenases. The following section gives an overview of various oxidative biotransformations (listed in Table 21.2) and discusses the relevance of host energy metabolism where possible. The first two examples represent processes that may be run either as a fermentation process, yielding the product of choice from the growth substrate, or as a biotransformation process requiring an additional substrate. These processes are impressive examples of successful metabolic pathway engineering including the use of recombinant oxygenases for the modification of metabolites produced by the heterologous host. Thus, the recombinant enzymes are directly coupled to host metabolism. The production of trans-4-hydroxy-L-proline starts either from L-proline or directly from glucose (Table 21.2 #1; Figure 21.3). For the hydroxylation of L-proline to 4-hydroxy-L-proline, a proline 4-hydroxylase gene from Dactylosporangium sp. RH1 was expressed in E. coli.152 In the biotransformation approach, 2-oxoglutarate, the cosubstrate of proline 4-hydroxylase, was efficiently supplied from glucose via the cell metabolism, while proline was added as biotransformation substrate, yielding succinate, CO2, and trans-4-hydroxy-L-proline. The introduced shortcut in the TCA cycle between 2-oxoglutarate and succinate results in a loss of one NADH and one ATP, which would otherwise be generated by 2-oxoglutarate dehydrogenase and succinyl-CoA ligase, respectively. Interestingly, it was reported that cells expressing proline 4-hydroxylase showed an increased growth rate in the presence of L-proline compared to cells lacking this enzyme.
It was suggested that this was due to an increased flux from 2-oxoglutarate directly to succinate, bypassing the reaction catalyzed by the 2-oxoglutarate dehydrogenase, of which the expression is repressed at high NADH levels. A similar effect was observed in a

[Figure 21.3 scheme; recoverable labels: proline-4-hydroxylase; 2-oxoglutarate; succinate + CO2.]

Figure 21.3 Hydroxylation of L-proline to trans-4-hydroxy-L-proline catalyzed by proline-4-hydroxylase in recombinant E. coli.

different study by Vemuri et al.,153 who introduced a NADH oxidase activity into E. coli and thereby reduced the redox ratio resulting in an increase in TCA cycle activity. The production of trans-4-hydroxy-L-proline from glucose was achieved by introducing additional copies of two genes involved in L-proline synthesis into E. coli.154 These recombinant enzymes were not controlled by feedback inhibition, the regulation mechanism which tightly controls the intracellular concentration of L-proline in the wildtype system. Consequently, the rate of proline synthesis strongly increased, which made the external addition of this amino acid obsolete. What effect this had on the growth of the host was not reported, but it can be speculated that the overproduction of one specific amino acid leads to a metabolite imbalance, which has to be settled by the metabolism of the host. A far more complex metabolic engineering effort allowed the production of indigo from glucose (Table 21.2 #2; Figure 21.4).155 In 1983, Ensley et al.156 reported the formation of indigo from tryptophane by recombinant E. coli expressing the naphthalene dioxygenase genes of P. putida PpG7. Although there was some product formed directly from glucose, acceptable yields were achieved only after adding extra tryptophane or indole. Comprehensive engineering of the production strain finally allowed the production of up to 18 g L -1 high quality indigo from glucose.155,157 Although, the central carbon metabolism of the host strain was significantly altered, it seems that the withdrawal of NADH for indole oxygenation and the alteration of the respective amino acid synthesis pathways do not interfere much with the general energy metabolism of the host. No problems concerning cell growth or process stability have been reported so far. The next two examples listed in Table 21.2 (Table 21.2, #’s 3 and 4; Figure 21.5) show oxidations of different alkanes to corresponding alkanols or alkanoic acids. 
These reactions are based on the alkane degradation pathway of P. putida GPo1 (formerly known as P. oleovorans GPo1 = TF4-1L = ATCC 29347), which is responsible for the conversion of linear alkanes to the corresponding alkanols, alkanals, and carboxylic acids.158,159 The initial reaction, the hydroxylation of the alkane to the corresponding alkanol, is catalyzed by the AMO, a nonheme diiron enzyme. This reaction was exploited for the production of n-alkanols by cloning the corresponding genes into a P. putida

[Figure 21.4 scheme; recoverable labels: tryptophanase; naphthalene dioxygenase; NADH, NAD+; spontaneous reactions.]

Figure 21.4 Synthesis of indigo from tryptophane by recombinant E. coli. Tryptophane is converted to indole by the tryptophanase of the E. coli host. Indole then is hydroxylated by the recombinant naphthalene dioxygenase yielding indoxyl, which reacts spontaneously to indigo in the presence of molecular oxygen.

[Figure 21.5 scheme; recoverable labels: alkane monooxygenase; alkanol dehydrogenase; alkanal dehydrogenase; NADH, NAD+; n = 1-8.]

Figure 21.5 Hydroxylation of n-alkanes to the corresponding alkanols and alkanoic acids, catalyzed by recombinant P. putida and E. coli harboring enzymes of the upper alkane degradation pathway of P. putida GPo1.

strain unable to further oxidize the alcohol, which thus accumulated in the fermentation broth.160 A two-liquid phase system was applied, in which the organic phase consisting of the respective alkane served as substrate for the biotransformation reaction. These experiments showed a negative correlation between the hydroxylation activity of AMO and the maximal growth rate of the host organism. C10 –C12 alkanes were barely converted by AMO. The use of these biotransformation substrates resulted in a two-fold higher growth rate as compared to experiments involving the fast hydroxylation of C7–C9 alkanes. The authors argued that the decreased growth rate reflected a NADH shortage in the host brought about by the increased rate of hydroxylation consuming one NADH per product molecule formed. In addition, AMO also catalyzes the oxidation of alkanols to the corresponding alkanals at the expense of another NADH molecule.161,162 Indications for this reaction have also been found in the two-phase process of Bosetti et al.160 Cells employed in bioprocesses based on two-phase systems with low logP compounds have to cope with cofactor loss due to membrane permeabilization and increased maintenance metabolism. Nevertheless, the specific activities reached in the process with recombinant growing P. putida for the production of octanol have been fairly high (60 U [g CDW]-1) during growth, but rapidly decreased when the cells reached the stationary phase. It was suggested that this decrease in activity was due to a breakdown of the intracellular NADH supply, because the carbon and energy source was depleted.160 A resting cell approach was not reported for this system so far. In a similar process, E. 
coli containing AMO in combination with the alcohol and aldehyde dehydrogenases of the alkane degradation pathway was used to convert octane to octanoic acid in a three step reaction.44,52,53,163 Octanoic acid at concentrations above 40 mM as well as induction of gene overexpression inhibited growth of the recombinant host.164 Therefore, the cells were grown to high cell densities before induction and biotransformation. Yeast extract played an essential role in the heterologous gene expression, which indicates that the recombinant protein showed an amino acid composition atypical for E. coli leading to a metabolite imbalance. E. coli cells were found to withstand the solvent stress of a bulk octane phase as long as they were growing. This indicates some maintenance mechanism, which is coupled to growth metabolism. It seems unlikely that cofactor regeneration limited the biotransformation, as the specific enzyme activity was rather low and the overall reaction even produces one NADH per product molecule formed, assuming the alcohol oxidation is catalyzed by the alkanol dehydrogenase and not by AMO. The conversion of cyclohexanone to ε-caprolactone by cyclohexanone monooxygenase (CHMO) from Acinetobacter sp. is an interesting example for a Baeyer–Villiger type biooxidation (Table 21.2 #5; Figure 21.6). This NADPH-dependent enzyme was used in recombinant E. coli BL21 cells in a growing as well as in a resting state.38,165 With growing cells, productivities of 0.047 g L -1 h -1 were reached with a specific activity of 42 U (g CDW) -1. The rather low volumetric productivities were due to the low cell densities of approximately 0.4 (g CDW) L -1 under biotransformation conditions. By using resting cells,

[Figure 21.6 scheme; recoverable labels: cyclohexanone monooxygenase; NADPH, NADP+.]

Figure 21.6 Conversion of cyclohexanone to ε-caprolactone catalyzed by cyclohexanone monooxygenase in recombinant E. coli.

[Figure 21.7 scheme; recoverable labels: cyclohexanone monooxygenase; NADPH, NADP+.]

Figure 21.7 Conversion of bicyclo[3.2.0]oct-6-en-2-one to (–)-(1S,5R)-2-oxabicyclo[3.3.0]oct-6-en-3-one and (–)-(1R,5S)-3-oxabicyclo[3.3.0]oct-6-en-3-one catalyzed by cyclohexanone monooxygenase in recombinant E. coli.

the volumetric productivity was increased 16-fold to 0.79 g L–1 h–1, but the specific activity was lower at 18 U (g CDW)–1. The final product titer reached 7.89 g L–1. Protein instability and mass transfer over the cell membrane were identified as the key limiting factors. The intracellular NADPH concentration turned out to be well above the Km value of CHMO. From these findings, one can conclude that NADPH regeneration in resting cells is fast enough to buffer NADPH consumption by the biotransformation reaction at specific rates around 20 U (g CDW)–1. This conclusion is supported by the much higher activities reached for the conversion of bicyclo[3.2.0]oct-6-en-2-one to (–)-(1S,5R)-2-oxabicyclo[3.3.0]oct-6-en-3-one and (–)-(1R,5S)-3-oxabicyclo[3.3.0]oct-6-en-3-one catalyzed by the same enzyme in resting recombinant E. coli TOP10 (Table 21.2 #6; Figure 21.7).166,167 During 2.5 h of reaction, a specific activity of 55 U (g CDW)–1 was reached on a 1.5 L as well as on a 55 L scale.166 Substrate and product inhibition were identified as the main limiting factors, whereas energy issues such as cofactor availability seemed not to be limiting. Some energy aspects have also been discussed for the enantioselective epoxidation of styrene to (S)-styrene oxide (ee > 99%), for which a productivity of 8.4 g Laq–1 h–1 was reached. This reaction is catalyzed by the flavin-dependent styrene monooxygenase (SMO) of Pseudomonas sp. strain VLB120 (Table 21.2 #7; Figure 21.8).168,169 This two-component enzyme system consumes one NADH per product molecule formed. For production, growing recombinant E. coli were used in an optimized two-liquid phase system.170,171 In this process, styrene oxide was shown to cause acetic acid formation, membrane permeabilization, and cell lysis.66 As a response to styrene oxide accumulation, the specific activity of the recombinant cells as well as specific CO2 evolution, O2 uptake, and glucose uptake rates decreased.
Since CO2 evolution during aerobic glucose catabolism is directly proportional to the generation of reducing equivalents in the form of NAD(P)H and FADH2,85 the simultaneous decrease of glucose uptake rates, O

[Reaction scheme: styrene monooxygenase, NADH → NAD+]

Figure 21.8 Enantiospecific epoxidation of styrene to (S)-styrene oxide by recombinant E. coli expressing the styrene monooxygenase genes from Pseudomonas sp. strain VLB120.


[Reaction scheme: toluene dioxygenase, NADH → NAD+]

Figure 21.9 Conversion of toluene to toluene cis-glycol catalyzed by toluene dioxygenase-containing P. putida UV4.

specific CO2 evolution rates, and specific SMO activities suggested that decreasing cofactor regeneration rates may have reduced biotransformation performance. Limitations caused by mass transfer and intrinsic biocatalyst activity could be excluded. The 50% lower specific activity of growing cells as compared to resting cells might also be explained by NADH shortage in growing cells. Metabolite and cofactor loss over permeabilized membranes, enzyme deactivation, and regulatory phenomena have been discussed as possible reasons for the decrease in cofactor regeneration rates caused by product toxicity.66

Toluene dioxygenase was used to produce toluene cis-glycol from toluene at productivities of 9.5 g Laq⁻¹ h⁻¹ with cis-dihydrodiol dehydrogenase-deficient P. putida UV4 at the expense of one NADH per product molecule formed (Table 21.2 #8; Figure 21.9).172,173 Non-growing cells were used in a two-phase system to protect the organisms from the toxic substrate. The nontoxic, rather polar product accumulated in the aqueous phase. The high activity of 120 U (g CDW)⁻¹ for this NADH-consuming reaction suggests efficient NADH regeneration by the cell metabolism. A detailed analysis of potential limiting factors has not been made.

The same enzyme in combination with cis-dihydrodiol dehydrogenase was used to produce 3-methylcatechol in a two-step reaction in the presence of a second octanol phase (Table 21.2 #9; Figure 21.10). The corresponding genes were overexpressed in a mutant of the natural host P. putida F1 (refs. 36, 174) or introduced into solvent-tolerant P. putida S12 (ref. 37). Productivities ranged from 0.067 g Laq⁻¹ h⁻¹ with recombinant P. putida S12 to 0.11 g Laq⁻¹ h⁻¹ with the mutant of P. putida F1 in a process using octanol as a second liquid phase. 
Using the latter strain in an aqueous one-phase system resulted in a similar productivity (0.1 g L⁻¹ h⁻¹) but in a lower overall product concentration (1.24 as compared to 3.10 g Ltot⁻¹ in the two-liquid phase system). Alternatively, the two phases were separated by a membrane to reduce octanol toxicity.175 The overall reaction is NADH neutral, since NADH is consumed during oxygenation and produced in the dehydrogenation step. This and the rather low activities suggest that the reaction itself has no significant impact on the energy metabolism of the host. The reaction conditions, e.g., the presence of octanol as a second liquid phase, more likely interfered with cell metabolism by triggering stress responses as discussed above.

Our next example is the multistep oxygenation of xylenes to the corresponding alcohols, aldehydes, and acids catalyzed by xylene monooxygenase (XMO) of P. putida mt-2 (Table 21.2 #10).32 Recombinant E. coli containing this enzyme showed complex inhibition kinetics. The favorable partitioning behavior of the reactants in a two-liquid phase system allowed the exploitation of these kinetics for the production of 3,4-dimethylbenzaldehyde from pseudocumene (Figure 21.11).47,176 As the activity of resting cells of recombinant E. coli was rather unstable, growing cells were applied during fed-batch cultivation. The influence of

[Reaction scheme: toluene dioxygenase (NADH → NAD+), followed by cis-dihydrodiol dehydrogenase (NAD+ → NADH)]

Figure 21.10 Synthesis of 3-methylcatechol from toluene by the action of toluene dioxygenase and cis-dihydrodiol dehydrogenase in P. putida F1.
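The NADH bookkeeping behind Figures 21.9 and 21.10 can be written out as a small tally. The following is a minimal sketch that uses only the per-step stoichiometries stated in the text; the dictionary layout and the function name `net_nadh` are illustrative choices, not part of the cited studies.

```python
# Net NADH balance per product molecule for two routes described in the text.
# Stoichiometries come from the text; the data layout is purely illustrative.

ROUTES = {
    # toluene -> toluene cis-glycol (toluene dioxygenase only)
    "toluene cis-glycol": [("toluene dioxygenase", -1)],
    # toluene -> toluene cis-glycol -> 3-methylcatechol
    "3-methylcatechol": [
        ("toluene dioxygenase", -1),            # NADH consumed
        ("cis-dihydrodiol dehydrogenase", +1),  # NADH regenerated
    ],
}


def net_nadh(steps):
    """Sum the NADH change over all steps (negative means net consumption)."""
    return sum(delta for _enzyme, delta in steps)


for product, steps in ROUTES.items():
    print(f"{product}: net NADH per product = {net_nadh(steps):+d}")
```

A net value of zero for the 3-methylcatechol route is what the text calls NADH neutral, while the cis-glycol route draws one NADH per product molecule from host metabolism.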


[Reaction scheme: two successive xylene monooxygenase steps, each NADH → NAD+]

Figure 21.11 Production of 3,4-dimethylbenzaldehyde from pseudocumene with recombinant E. coli expressing xylene monooxygenase as the biocatalyst.

NADH availability on the reaction rates was further analyzed, inter alia, on the basis of a mathematical process model for this two-step oxygenation, which consumes two NADH molecules per product molecule formed.177 Surprisingly, a pH shift from 7.1 to 7.4 significantly reduced growth rates and doubled specific biotransformation rates but had no effect on single-step bioconversion rates in short-term experiments based on resting cells.46,177 This observation and process simulation provided evidence for a pH-influenced competition for NADH between XMO and the respiratory chain, with a consequential impact on bioconversion and cell growth. For the simulation of such differential NADH limitation, a pH-dependent feedback inhibition of the NADH-consuming bioconversions was introduced as a modeling tool, which allowed good simulations of biotransformation experiments performed at varying pH, scale, and initial substrate concentration. A change in the pH influences the proton gradient and thus the proton motive force over the cytoplasmic membrane,178,179 where the enzyme systems competing for NADH, namely XMO and the enzymes involved in oxidative phosphorylation, are located. Therefore, a rise in the pH might promote the flow of NADH to XMO at the expense of the flow to the electron transport chain. This might cause additional stress for the cells and interfere with cell growth. In fact, this was reflected by lower biomass yields and growth rates at higher biotransformation rates.

The hydroxylation of simvastatin to its 6-β-hydroxymethyl derivative (Table 21.2 #11; Figure 21.12) was catalyzed by wildtype Nocardia sp. and consumes one NADH per product molecule formed.180 The biotransformation was carried out with suspended cells. From the available data, it is difficult to discuss this reaction with respect to energy metabolism. 
As the volumetric productivity (7.3 × 10⁻³ g L⁻¹ h⁻¹) is very low, it can be assumed that the specific activity of this biocatalyst is low and thus probably not limited by NADH supply. The only limitation actually discussed for this reaction is inhibition at high substrate concentrations. Considering the bulkiness of the substrate, mass transfer limitations in the transport of the substrate from the fermentation broth into the cell may also be an issue. Fortunately, processes yielding high-value compounds are often economically viable at low productivities and product concentrations.
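To put the rates quoted in this section on a common scale, the sketch below converts specific activities into an hourly cofactor demand and back-calculates an average specific activity from a volumetric productivity. It assumes the usual definition 1 U = 1 µmol of product per minute. The molar mass of the hydroxylated simvastatin (simvastatin plus one oxygen atom) and the biomass concentration of 10 g CDW L⁻¹ are illustrative assumptions, not values from the cited work.

```python
# Back-of-envelope conversions between the rate units used in this section.
# Assumption: 1 U = 1 micromole of product formed per minute.


def cofactor_demand_mmol_per_g_h(activity_u_per_g, cofactor_per_product=1):
    """Convert U (g CDW)^-1 into mmol cofactor (g CDW)^-1 h^-1."""
    umol_per_g_min = activity_u_per_g * cofactor_per_product
    return umol_per_g_min * 60 / 1000  # min -> h, umol -> mmol


def specific_activity_u_per_g(productivity_g_per_l_h, mw_g_per_mol, biomass_g_per_l):
    """Average specific activity implied by a volumetric productivity."""
    umol_per_l_min = productivity_g_per_l_h / mw_g_per_mol * 1e6 / 60
    return umol_per_l_min / biomass_g_per_l


# Specific activities quoted in the text, one NAD(P)H per product molecule.
for label, activity in [
    ("CHMO, resting cells", 20),
    ("CHMO, bicyclic ketone", 55),
    ("toluene dioxygenase", 120),
]:
    demand = cofactor_demand_mmol_per_g_h(activity)
    print(f"{label}: {demand:.1f} mmol (g CDW)^-1 h^-1")

# Simvastatin hydroxylation: productivity is from the text; the molar mass
# and the biomass concentration are assumed values for illustration only.
MW_HYDROXY_SIMVASTATIN = 434.6  # g/mol, assumed (simvastatin + one O)
ASSUMED_BIOMASS = 10.0          # g CDW per liter, hypothetical
estimate = specific_activity_u_per_g(7.3e-3, MW_HYDROXY_SIMVASTATIN, ASSUMED_BIOMASS)
print(f"simvastatin hydroxylation: {estimate:.3f} U (g CDW)^-1")
```

Under these assumptions, activities of 20, 55, and 120 U (g CDW)⁻¹ correspond to about 1.2, 3.3, and 7.2 mmol NAD(P)H (g CDW)⁻¹ h⁻¹. The simvastatin process comes out near 0.03 U (g CDW)⁻¹, roughly three orders of magnitude lower, which is consistent with the argument that NADH supply is unlikely to limit it. A different assumed biomass concentration scales the estimate proportionally.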

[Reaction scheme: Nocardia sp., NADH → NAD+]

Figure 21.12 Hydroxylation of simvastatin to its 6-β-hydroxymethyl derivative by wildtype Nocardia sp.
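The point made earlier in this section for XMO, that a change in external pH alters the proton gradient and thus the proton motive force, can be illustrated with the standard chemiosmotic relation pmf = Δψ − (2.303 RT/F)(pH_in − pH_out). The sketch below is an illustration only: the membrane potential of −140 mV, the cytoplasmic pH of 7.6, and the temperature of 30 °C are hypothetical placeholder values that are held constant, and none of them is taken from the cited studies.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1
F = 96485.33212  # Faraday constant, C mol^-1


def z_mv(temp_k):
    """2.303*R*T/F in millivolts: the pmf contribution per pH unit."""
    return math.log(10) * R * temp_k / F * 1000.0


def proton_motive_force_mv(delta_psi_mv, ph_in, ph_out, temp_k=303.15):
    """pmf = delta_psi - Z * (pH_in - pH_out), inside relative to outside."""
    return delta_psi_mv - z_mv(temp_k) * (ph_in - ph_out)


# Hypothetical values: membrane potential and cytoplasmic pH held constant.
DELTA_PSI_MV = -140.0
PH_IN = 7.6

for ph_out in (7.1, 7.4):
    pmf = proton_motive_force_mv(DELTA_PSI_MV, PH_IN, ph_out)
    print(f"external pH {ph_out}: pmf = {pmf:.1f} mV")
```

With these placeholder values, raising the external pH from 7.1 to 7.4 lowers the magnitude of the pmf from about 170 mV to about 152 mV. This holds only if the cell does not compensate by adjusting Δψ or its cytoplasmic pH, so the numbers indicate the direction of the effect rather than its actual size.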


21.5 Conclusion

In fermentation process design, different and sometimes contradictory criteria that couple the energy metabolism of the cellular host to product synthesis have to be considered. Here, we highlighted the influence of anaerobic and aerobic production schemes on process performance and the dependence of the energetic cost of product export on physical parameters such as pH. For pathway engineering, not only the net synthesis equation has to be considered, but also the operation of the entire metabolic network under industrial conditions, including the production of side products, the overall redox balance, and the occurrence of futile cycles. Superior production hosts can be generated by following general metabolic engineering strategies such as increasing the substrate uptake rate and optimizing the energetic efficiency of the host organism.

The interactions of biotransformation reactions with host energy metabolism are so far only poorly understood. The few examples for which such interactions have been considered indicate a limitation in cofactor availability, especially in systems in which the specific bioconversion activity is high and/or the biocatalyst, the cell, is stressed by toxic substrates, products, or organic solvents. Recent studies based on flux balance and quantitative metabolic flux analyses give evidence for NADH limitation during oxygenase catalysis, especially in growing cells, and highlight the high energy demand for solvent tolerance.199,200,201 Integrating and generating knowledge about microbial energy metabolism during the design of recombinant, biocatalytically active microbes, and importantly also during the design of the entire industrial biocatalytic process, will accelerate the industrial implementation of competitive fermentation and biotransformation processes.

References

1. Monod, J. Recherches sur la croissance des cultures bactériennes. Hermann et Cie, Paris, 1942. 2. Marr, A.G., Clark, D.J., and Nilson, E.H. Maintenance requirement of Escherichia coli. Ann. N. Y. Acad. Sci., 102, 536, 1963. 3. Pirt, S.J. Maintenance energy: a general model for energy-limited and energy-sufficient growth. Arch. Microbiol., 133, 300, 1982. 4. Pirt, S.J. Maintenance energy of bacteria in growing cultures. P. Roy. Soc. Lond. B Bio., 163, 224, 1965. 5. Bauchop, T. and Elsden, S.R. The growth of microorganisms in relation to their energy supply. J. Gen. Microbiol., 23, 457, 1960. 6. Russell, J.B. and Cook, G.M. Energetics of bacterial growth: Balance of anabolic and catabolic reactions. Microbiol. Rev., 59, 48, 1995. 7. Stouthamer, A.H. A theoretical study on the amount of ATP required for synthesis of microbial cell material. Antonie van Leeuwenhoek, 39, 545, 1973. 8. Stouthamer, A.H. and Bettenhausen, C. Utilization of energy for growth and maintenance in continuous and batch cultures of microorganisms. Biochim. Biophys. Acta, 301, 53, 1973. 9. Neijssel, O.M. and Tempest, D.W. Bioenergetic aspects of aerobic growth of Klebsiella aerogenes NCTC 418 in carbon-limited and carbon-sufficient chemostat culture. Arch. Microbiol., 107, 215, 1976. 10. Marr, A.G. Growth rate of Escherichia coli. Microbiol. Rev., 55, 316, 1991. 11. Westerhoff, H.V., Hellingwerf, K.J., and van Dam, K. Thermodynamic efficiency of microbial growth is low but optimal for maximal growth rate. Proc. Natl. Acad. Sci. USA, 80, 305, 1983. 12. Vemuri, G.N. et al. Overflow metabolism in Escherichia coli during steady-state growth: Transcriptional regulation and effect of the redox ratio. Appl. Environ. Microbiol., 72, 3653, 2006. 13. Weber, J., Kayser, A., and Rinas, U. Metabolic flux analysis of Escherichia coli in glucose-limited continuous culture. II. Dynamic response to famine and feast, activation of the methylglyoxal pathway and oscillatory behaviour. Microbiology, 151, 707, 2005.


14. Dauner, M., Storni, T., and Sauer, U. Bacillus subtilis metabolism and energetics in carbon-limited and excess-carbon chemostat culture. J. Bacteriol., 183, 7308, 2001. 15. Kayser, A. et al. Metabolic flux analysis of Escherichia coli in glucose-limited continuous culture. I. Growth-rate-dependent metabolic efficiency at steady state. Microbiology, 151, 693, 2005. 16. Nanchen, A., Schicker, A., and Sauer, U. Nonlinear dependency of intracellular fluxes on growth rate in miniaturized continuous cultures of Escherichia coli. Appl. Environ. Microbiol., 72, 1164, 2006. 17. Perrenoud, A. and Sauer, U. Impact of global transcriptional regulation by ArcA, ArcB, Cra, Crp, Cya, Fnr, and Mlc on glucose catabolism in Escherichia coli. J. Bacteriol., 187, 3171, 2005. 18. Fischer, E. and Sauer, U. Large-scale in vivo flux analysis shows rigidity and suboptimal performance of Bacillus subtilis metabolism. Nat. Genet., 37, 636, 2005. 19. Blank, L.M., Kuepfer, L., and Sauer, U. Large-scale 13C-flux analysis reveals mechanistic principles of metabolic network robustness to null mutations in yeast. Genome Biol., 6, R49, 2005. 20. Schmid, A. et al. Industrial biocatalysis today and tomorrow. Nature, 409, 258, 2001. 21. Woodley, J.M. Choice of biocatalyst form for scalable processes. Biochem. Soc. T., 34, 301, 2006. 22. Ishige, T., Honda, K., and Shimizu, S. Whole organism biocatalysis. Curr. Opin. Chem. Biol., 9, 174, 2005. 23. Buckland, B.C., Robinson, D.K., and Chartrain, M. Biocatalysis for pharmaceuticals—status and prospects for a key technology. Metab. Eng., 2, 42, 2000. 24. Liese, A., Seelbach, K., and Wandrey, C. Industrial Biotransformations. Wiley-VCH, Weinheim, Germany, 2000. 25. Sauer, U. et al. The soluble and membrane-bound transhydrogenases UdhA and PntAB have divergent functions in NADPH metabolism of Escherichia coli. J. Biol. Chem., 279, 6613, 2004. 26. Jackson, J.B. Proton translocation by transhydrogenase. FEBS Lett., 545, 18, 2003. 27. 
Meyer, D., Bühler, B., and Schmid, A. Process and catalyst design objectives for specific redox biocatalysis. Adv. Appl. Microbiol., 59, 53, 2006. 28. Duetz, W.A., van Beilen, J.B., and Witholt, B. Using proteins in their natural environment: potential and limitations of microbial whole-cell hydroxylations in applied biocatalysis. Curr. Opin. Biotechnol., 12, 419, 2001. 29. Fuhrer, T., Fischer, E., and Sauer, U. Experimental identification and quantification of glucose metabolism in seven bacterial species. J. Bacteriol., 187, 1581, 2005. 30. Bühler, B. and Schmid, A. Process implementation aspects for biocatalytic hydrocarbon oxyfunctionalization. J. Biotechnol., 113, 183, 2004. 31. Lee, K. Benzene-induced uncoupling of naphthalene dioxygenase activity and enzyme inactivation by production of hydrogen peroxide. J. Bacteriol., 181, 2719, 1999. 32. Bühler, B. et al. Xylene monooxygenase catalyzes the multistep oxygenation of toluene and pseudocumene to corresponding alcohols, aldehydes, and acids in Escherichia coli JM101. J. Biol. Chem., 275, 10085, 2000. 33. Maruyama, T., Iida, H., and Kakidani, H. Oxidation of both termini of p- and m-xylene by Escherichia coli transformed with xylene monooxygenase gene. J. Mol. Catal. B: Enzym., 21, 211, 2003. 34. Meyer, D., Witholt, B., and Schmid, A. Suitability of recombinant Escherichia coli and Pseudomonas putida strains for selective biotransformation of m-nitrotoluene by xylene monooxygenase. Appl. Environ. Microbiol., 71, 6624, 2005. 35. Harayama, S., Kok, M., and Neidle, E.L. Functional and evolutionary relationships among diverse oxygenases. Annu. Rev. Microbiol., 46, 565, 1992. 36. Hüsken, L.E. et al. High-rate 3-methylcatechol production in Pseudomonas putida strains by means of a novel expression system. Appl. Microbiol. Biotechnol., 55, 571, 2001. 37. Wery, J., Mendes da Silva, D.I., and de Bont, J.A.M. A genetically modified solvent-tolerant bacterium for optimized production of a toxic fine chemical. Appl. 
Microbiol. Biotechnol., 54, 180, 2000. 38. Walton, A.Z. and Stewart, J.D. An efficient enzymatic Baeyer-Villiger oxidation by engineered Escherichia coli cells under non-growing conditions. Biotechnol. Prog., 18, 262, 2002.


39. Abdelkafi, S. et al. Bioconversion of ferulic acid to vanillic acid by Halomonas elongata isolated from table-olive fermentation. FEMS Microbiol. Lett., 262, 115, 2006. 40. Allouche, N. et al. Use of whole cells of Pseudomonas aeruginosa for synthesis of the antioxidant hydroxytyrosol via conversion of tyrosol. Appl. Environ. Microbiol., 70, 2105, 2004. 41. Gonzalez-Lergier, J., Broadbelt, L.J., and Hatzimanikatis, V. Analysis of the maximum theoretical yield for the synthesis of erythromycin precursors in Escherichia coli. Biotechnol. Bioeng., 95, 638, 2006. 42. Pfeifer, B. et al. Process and metabolic strategies for improved production of Escherichia coli-derived 6-deoxyerythronolide B. Appl. Environ. Microbiol., 68, 3287, 2002. 43. Lee, S.J., Song, H., and Lee, S.Y. Genome-based metabolic engineering of Mannheimia succiniciproducens for succinic acid production. Appl. Environ. Microbiol., 72, 1939, 2006. 44. Wubbolts, M.G., Favre-Bulle, O., and Witholt, B. Biosynthesis of synthons in two-liquid-phase media. Biotechnol. Bioeng., 52, 301, 1996. 45. Hüsken, L.E. et al. Optimisation of microbial 3-methylcatechol production as affected by culture conditions. Biocatal. Biotransf., 20, 57, 2002. 46. Bühler, B. et al. Chemical biotechnology for the specific oxyfunctionalization of hydrocarbons on a technical scale. Biotechnol. Bioeng., 82, 833, 2003. 47. Bühler, B. et al. Use of the two-liquid phase concept to exploit kinetically controlled multistep biocatalysis. Biotechnol. Bioeng., 81, 683, 2003. 48. Phumathon, P. and Stephens, G.M. Production of toluene cis-glycol using recombinant Escherichia coli strains in glucose-limited fed batch culture. Enzyme Microb. Technol., 25, 810, 1999. 49. Prichanont, S., Leak, D.J., and Stuckey, D.C. Alkene monooxygenase-catalyzed whole cell epoxidation in a two-liquid phase system. Enzyme Microb. Technol., 22, 471, 1998. 50. Ishihama, A. Adaptation of gene expression in stationary phase bacteria. Curr. Opin. Genet. 
Dev., 7, 582, 1997. 51. Gottesman, S. Proteolysis in bacterial regulatory circuits. Annu. Rev. Cell Dev. Biol., 19, 565, 2003. 52. Favre-Bulle, O. et al. Bioconversion of n-octane to octanoic acid by a recombinant Escherichia coli cultured in a two-liquid phase bioreactor. Bio/Technol., 9, 367, 1991. 53. Favre-Bulle, O. et al. Continuous bioconversion of n-octane to octanoic-acid by recombinant Escherichia-coli (alk+) growing in a 2-liquid-phase chemostat. Biotechnol. Bioeng., 41, 263, 1993. 54. Kuepfer, L., Sauer, U., and Blank, L.M. Metabolic functions of duplicate genes in Saccharomyces cerevisiae. Genome Res., 15, 1421, 2005. 55. Koebmann, B.J. et al. The glycolytic flux in Escherichia coli is controlled by the demand for ATP. J. Bacteriol., 184, 3909, 2002. 56. Sonderegger, M., Schumperli, M., and Sauer, U. Selection of quiescent Escherichia coli with high metabolic activity. Metab. Eng., 7, 4, 2005. 57. Lee, J.N., Shin, H.D., and Lee, Y.H. Metabolic engineering of pentose phosphate pathway in Ralstonia eutropha for enhanced biosynthesis of poly-β-hydroxybutyrate. Biotechnol. Prog., 19, 1444, 2003. 58. Wandrey, C. Biochemical reaction engineering for redox reactions. Chem. Rec., 4, 254, 2004. 59. Poulsen, B.R. et al. Increased NADPH concentration obtained by metabolic engineering of the pentose phosphate pathway in Aspergillus niger. FEBS J., 272, 1313, 2005. 60. Alphand, V. et al. Towards large-scale synthetic applications of Baeyer-Villiger monooxygenases. Trends Biotechnol., 21, 318, 2003. 61. Leon, R. et al. Whole-cell biocatalysis in organic media. Enzyme Microb. Technol., 23, 483, 1998. 62. Salter, G.J. and Kell, D.B. Solvent selection for whole-cell biotransformations in organic media. Crit. Rev. Biotechnol., 15, 139, 1995. 63. van Beilen, J.B. et al. Practical issues in the application of oxygenases. Trends Biotechnol., 21, 170, 2003. 64. Sikkema, J., de Bont, J.A., and Poolman, B. Mechanisms of membrane toxicity of hydrocarbons. Microbiol. 
Rev., 59, 201, 1995. 65. Sikkema, J., de Bont, J.A., and Poolman, B. Interactions of cyclic hydrocarbons with biological membranes. J. Biol. Chem., 269, 8022, 1994.


66. Park, J.B. et al. The efficiency of recombinant Escherichia coli as biocatalyst for stereospecific epoxidation. Biotechnol. Bioeng., 95, 501, 2006. 67. de Bont, J.A.M. Solvent-tolerant bacteria in biocatalysis. Trends Biotechnol., 16, 493, 1998. 68. Isken, S. and de Bont, J.A. Bacteria tolerant to organic solvents. Extremophiles, 2, 229, 1998. 69. Ramos, J.L. et al. Mechanisms of solvent tolerance in gram-negative bacteria. Annu. Rev. Microbiol., 56, 743, 2002. 70. Volkers, R.J.M. et al. Chemostat-based proteomic analysis of toluene-affected Pseudomonas putida S12. Environ. Microbiol., 8, 1674, 2006. 71. Dominguez-Cuevas, P. et al. Transcriptional tradeoff between metabolic and stress-response programs in Pseudomonas putida KT2440 cells exposed to toluene. J. Biol. Chem., 281, 11981, 2006. 72. Carragher, J.M. et al. The use of oxygen uptake rate measurements to control the supply of toxic substrate: toluene hydroxylation by Pseudomonas putida UV4. Enzyme Microb. Technol., 28, 183, 2001. 73. Hack, C.J. et al. Design of a control system for biotransformation of toxic substrates: toluene hydroxylation by Pseudomonas putida UV4. Enzyme Microb. Technol., 26, 530, 2000. 74. Lye, G.J. and Woodley, J.M. Application of in situ product-removal techniques to biocatalytic processes. Trends Biotechnol., 17, 395, 1999. 75. Stark, D. and von Stockar, U. In situ product removal (ISPR) in whole cell biotechnology during the last twenty years. Adv. Biochem. Eng. Biotechnol., 80, 149, 2003. 76. Dong, H.J., Nilsson, L., and Kurland, C.G. Gratuitous overexpression of genes in Escherichia coli leads to growth inhibition and ribosome destruction. J. Bacteriol., 177, 1497, 1995. 77. Gill, R.T., Valdes, J.J., and Bentley, W.E. A comparative study of global stress gene regulation in response to overexpression of recombinant proteins in Escherichia coli. Metab. Eng., 2, 178, 2000. 78. Rinas, U. 
Synthesis rates of cellular proteins involved in translation and protein folding are strongly altered in response to overproduction of basic fibroblast growth factor by recombinant Escherichia coli. Biotechnol. Prog., 12, 196, 1996. 79. Andersson, L. et al. Impact of plasmid presence and induction on cellular responses in fed batch cultures of Escherichia coli. J. Biotechnol., 46, 255, 1996. 80. Hoffmann, F., Weber, J., and Rinas, U. Metabolic adaptation of Escherichia coli during temperature-induced recombinant protein production: 1. Readjustment of metabolic enzyme synthesis. Biotechnol. Bioeng., 80, 313, 2002. 81. Jürgen, B. et al. Monitoring of genes that respond to overproduction of an insoluble recombinant protein in Escherichia coli glucose-limited fed-batch fermentations. Biotechnol. Bioeng., 70, 217, 2000. 82. Kauffman, K.J. et al. Decreased protein expression and intermittent recoveries in BiP levels result from cellular stress during heterologous protein expression in Saccharomyces cerevisiae. Biotechnol. Prog., 18, 942, 2002. 83. Mori, K. et al. A 22 bp cis-acting element is necessary and sufficient for the induction of the yeast KAR2 (BiP) gene by unfolded proteins. EMBO J., 11, 2583, 1992. 84. Mattanovich, D. et al. Stress in recombinant protein producing yeasts. J. Biotechnol., 113, 121, 2004. 85. Hoffmann, F. and Rinas, U. On-line estimation of the metabolic burden resulting from the synthesis of plasmid-encoded and heat-shock proteins by monitoring respiratory energy generation. Biotechnol. Bioeng., 76, 333, 2001. 86. Sanden, A.M. et al. Limiting factors in Escherichia coli fed-batch production of recombinant proteins. Biotechnol. Bioeng., 81, 158, 2003. 87. Weber, J., Hoffmann, F., and Rinas, U. Metabolic adaptation of Escherichia coli during temperature-induced recombinant protein production: 2. Redirection of metabolic fluxes. Biotechnol. Bioeng., 80, 320, 2002. 88. Chen, Q., Janssen, D.B., and Witholt, B. 
Physiological changes and alk gene instability in Pseudomonas oleovorans during induction and expression of alk genes. J. Bacteriol., 178, 5508, 1996.


89. Chen, Q., Janssen, D.B., and Witholt, B. Growth on octane alters the membrane lipid fatty acids of Pseudomonas oleovorans due to the induction of alkB and synthesis of octanol. J. Bacteriol., 177, 6894, 1995. 90. Nieboer, M., Kingma, J., and Witholt, B. The alkane oxidation system of Pseudomonas oleovorans: induction of the alk genes in Escherichia coli W3110 (pGEc47) affects membrane biogenesis and results in overexpression of alkane hydroxylase in a distinct cytoplasmic membrane subfraction. Mol. Microbiol., 8, 1039, 1993. 91. Nieboer, M., Vis, A.J., and Witholt, B. Overproduction of a foreign membrane protein in Escherichia coli stimulates and depends on phospholipid synthesis. Eur. J. Biochem., 241, 691, 1996. 92. Han, M.J. et al. Engineering Escherichia coli for increased productivity of serine-rich proteins based on proteome profiling. Appl. Environ. Microbiol., 69, 5772, 2003. 93. Han, M.J. et al. Roles and applications of small heat shock proteins in the production of recombinant proteins in Escherichia coli. Biotechnol. Bioeng., 88, 426, 2004. 94. Wee, Y.J. et al. Biotechnological production of L(+)-lactic acid from wood hydrolyzate by batch fermentation of Enterococcus faecalis. Biotechnol. Lett., 26, 71, 2004. 95. Datta, R. and Henry, M. Lactic acid: recent advances in products, processes and technologies—a review. J. Chem. Tech. Biotechnol., 81, 1119, 2006. 96. Bianchi, M.M. et al. Efficient homolactic fermentation by Kluyveromyces lactis strains defective in pyruvate utilization and transformed with the heterologous LDH gene. Appl. Environ. Microbiol., 67, 5621, 2001. 97. van Maris, A.J. et al. Homofermentative lactate production cannot sustain anaerobic growth of engineered Saccharomyces cerevisiae: possible consequence of energy-dependent lactate export. Appl. Environ. Microbiol., 70, 2898, 2004. 98. Branduardi, P. et al. 
Lactate production yield from engineered yeasts is dependent from the host background, the lactate dehydrogenase source and the lactate export. Microb. Cell Fact., 5, 2006. 99. Ishida, N. et al. Metabolic engineering of Saccharomyces cerevisiae for efficient production of pure L-(+)-lactic acid. Appl. Biochem. Biotechnol., 129–132, 795, 2006. 100. Liu, T. et al. Scale-up of L-lactic acid production by mutant strain Rhizopus sp. MK-96-1196 from 0.003 m3 to 5 m3 in airlift bioreactors. J. Biosci. Bioeng., 101, 9, 2006. 101. Valli, M. et al. Improvement of lactic acid production in Saccharomyces cerevisiae by cell sorting for high intracellular pH. Appl. Environ. Microbiol., 72, 5492, 2006. 102. van Maris, A.J.A. et al. Microbial export of lactic and 3-hydroxypropanoic acid: implications for industrial fermentation processes. Metab. Eng., 6, 245, 2004. 103. Otto, R. et al. Increase of molar growth yield of Streptococcus cremoris for lactose as a consequence of lactate consumption by Pseudomonas stutzeri in mixed culture. FEMS Microbiol. Lett., 9, 85, 1980. 104. Konings, W.N. Microbial transport: Adaptations to natural environments. Antonie van Leeuwenhoek, 90, 325, 2006. 105. Blank, L.M. et al. Hemin reconstitutes proton extrusion in an H+-ATPase-negative mutant of Lactococcus lactis. J. Bacteriol., 183, 6707, 2001. 106. Werpy, T. and Petersen, G. Top value added chemicals from biomass: Vol. i – results of screening for potential candidates from sugars and synthesis gas. Report of the U.S. Department of Energy (DOE), 2002. 107. Lin, H., Bennett, G.N., and San, K.Y. Metabolic engineering of aerobic succinate production systems in Escherichia coli to improve process productivity and achieve the maximum theoretical succinate yield. Metab. Eng., 7, 116, 2005. 108. Lin, H., Bennett, G.N., and San, K.Y. Genetic reconstruction of the aerobic central metabolism in Escherichia coli for the absolute aerobic production of succinate. Biotechnol. Bioeng., 89, 148, 2005. 109. 
Lin, H., Bennett, G.N., and San, K.Y. Fed-batch culture of a metabolically engineered Escherichia coli strain designed for high-level succinate production and yield under aerobic conditions. Biotechnol. Bioeng., 90, 775, 2005.


110. Vemuri, G.N., Eiteman, M.A., and Altman, E. Succinate production in dual-phase Escherichia coli fermentations depends on the time of transition from aerobic to anaerobic conditions. J. Ind. Microbiol. Biotechnol., 28, 325, 2002. 111. Vemuri, G.N., Eiteman, M.A., and Altman, E. Effects of growth mode and pyruvate carboxylase on succinic acid production by metabolically engineered strains of Escherichia coli. Appl. Environ. Microbiol., 68, 1715, 2002. 112. Zaldivar, J., Nielsen, J., and Olsson, L. Fuel ethanol production from lignocellulose: a challenge for metabolic engineering and process integration. Appl. Microbiol. Biotechnol., 56, 17, 2001. 113. van Zyl, W.H. et al. Xylose utilisation by recombinant strains of Saccharomyces cerevisiae on different carbon sources. Appl. Microbiol. Biotechnol., 52, 829, 1999. 114. Sonderegger, M. and Sauer, U. Evolutionary engineering of Saccharomyces cerevisiae for anaerobic growth on xylose. Appl. Environ. Microbiol., 69, 1990, 2003. 115. Hahn-Hägerdal, B. et al. Bio-ethanol—the fuel of tomorrow from the residues of today. Trends Biotechnol., 24, 549, 2006. 116. Karhumaa, K. et al. High activity of xylose reductase and xylitol dehydrogenase improves xylose fermentation by recombinant Saccharomyces cerevisiae. Appl. Microbiol. Biotechnol., 73, 1039, 2006. 117. van Maris, A.J. et al. Alcoholic fermentation of carbon sources in biomass hydrolysates by Saccharomyces cerevisiae: current status. Antonie van Leeuwenhoek, 90, 391, 2006. 118. Kuyper, M. et al. Minimal metabolic engineering of Saccharomyces cerevisiae for efficient anaerobic xylose fermentation: a proof of principle. FEMS Yeast Res., 4, 655, 2004. 119. Jeppsson, M. et al. The expression of a Pichia stipitis xylose reductase mutant with higher Km for NADPH increases ethanol production from xylose in recombinant Saccharomyces cerevisiae. Biotechnol. Bioeng., 93, 665, 2006. 120. Kuyper, M. et al. 
High-level functional expression of a fungal xylose isomerase: the key to efficient ethanolic fermentation of xylose by Saccharomyces cerevisiae? FEMS Yeast Res., 4, 69, 2003. 121. Kuyper, M. et al. Metabolic engineering of a xylose-isomerase-expressing Saccharomyces cerevisiae strain for rapid anaerobic xylose fermentation. FEMS Yeast Res., 5, 399, 2005. 122. Zhang, M. et al. Metabolic engineering of a pentose metabolism pathway in ethanologenic Zymomonas mobilis. Science, 267, 240, 1995. 123. Bothast, R.J., Nichols, N.N., and Dien, B.S. Fermentations with new recombinant organisms. Biotechnol. Prog., 15, 867, 1999. 124. Kim, I.S., Barrow, K.D., and Rogers, P.L. Kinetic and nuclear magnetic resonance studies of xylose metabolism by recombinant Zymomonas mobilis ZM4(pZB5). Appl. Environ. Microbiol., 66, 186, 2000. 125. Kalnenieks, U. Physiology of Zymomonas mobilis: Some unanswered questions. Adv. Microb. Physiol., 51, 73, 2006. 126. Snoep, J.L. et al. Protein burden in Zymomonas mobilis—negative flux and growth-control due to overproduction of glycolytic enzymes. Microbiology, 141, 2329, 1995. 127. Kiefer, P. et al. Comparative metabolic flux analysis of lysine-producing Corynebacterium glutamicum cultured on glucose or fructose. Appl. Environ. Microbiol., 70, 229, 2004. 128. Ohnishi, J. et al. A novel methodology employing Corynebacterium glutamicum genome information to generate a new L-lysine-producing mutant. Appl. Microbiol. Biotechnol., 58, 217, 2002. 129. Hirao, T. et al. L-Lysine production in continuous culture of an L-lysine hyperproducing mutant of Corynebacterium glutamicum. Appl. Microbiol. Biotechnol., 32, 269, 1989. 130. Petersen, S. et al. Metabolic consequences of altered phosphoenolpyruvate carboxykinase activity in Corynebacterium glutamicum reveal anaplerotic regulation mechanisms in vivo. Metab. Eng., 3, 344, 2001. 131. Wittmann, C. and Heinzle, E. 
Genealogy profiling through strain improvement by using metabolic network analysis: metabolic flux genealogy of several generations of lysine-producing corynebacteria. Appl. Environ. Microbiol., 68, 5843, 2002.


132. Klapa, M.I., Aon, J.C., and Stephanopoulos, G. Systematic quantification of complex metabolic flux networks using stable isotopes and mass spectrometry. Eur. J. Biochem., 270, 3525, 2003. 133. Koffas, M. and Stephanopoulos, G. Strain improvement by metabolic engineering: lysine production as a case study for systems biology. Curr. Opin. Biotechnol., 16, 361, 2005. 134. Wendisch, V.F. et al. Emerging Corynebacterium glutamicum systems biology. J. Biotechnol., 124, 74, 2006. 135. Sonntag, K. et al. C-13 NMR studies of the fluxes in the central metabolism of Corynebacterium glutamicum during growth and overproduction of amino acids in batch cultures. Appl. Microbiol. Biotechnol., 44, 489, 1995. 136. Marx, A. et al. Determination of the fluxes in the central metabolism of Corynebacterium glutamicum by nuclear magnetic resonance spectroscopy combined with metabolite balancing. Biotechnol. Bioeng., 49, 111, 1996. 137. Petersen, S. et al. In vivo quantification of parallel and bidirectional fluxes in the anaplerosis of Corynebacterium glutamicum. J. Biol. Chem., 275, 35932, 2000. 138. Menzella, H.G. et al. Redesign, synthesis and functional expression of the 6-deoxyerythronolide B polyketide synthase gene cluster. J. Ind. Microbiol. Biotechnol., 33, 22, 2006. 139. Minas, W. et al. Improved erythromycin production in a genetically engineered industrial strain of Saccharopolyspora erythraea. Biotechnol. Prog., 14, 561, 1998. 140. Elander, R.P. Industrial production of beta-lactam antibiotics. Appl. Microbiol. Biotechnol., 61, 385, 2003. 141. Pfeifer, B.A. et al. Biosynthesis of complex polyketides in a metabolically engineered strain of E. coli. Science, 291, 1790, 2001. 142. Peiru, S. et al. Production of the potent antibacterial polyketide erythromycin C in Escherichia coli. Appl. Environ. Microbiol., 71, 2539, 2005. 143. Mutka, S.C. et al. Metabolic pathway engineering for complex polyketide biosynthesis in Saccharomyces cerevisiae. FEMS Yeast Res., 6, 40, 2006. 
144. Varma, A. and Palsson, B.O. Stoichiometric flux balance models quantitatively predict growth and metabolic by-product secretion in wild-type Escherichia coli W3110. Appl. Environ. Microbiol., 60, 3724, 1994. 145. Stahmann, K.P., Revuelta, J.L., and Seulberger, H. Three biotechnical processes using Ashbya gossypii, Candida famata, or Bacillus subtilis compete with chemical riboflavin production. Appl. Microbiol. Biotechnol., 53, 509, 2000. 146. Humbelin, M. et al. GTP cyclohydrolase II and 3,4-dihydroxy-2-butanone 4-phosphate synthase are rate-limiting enzymes in riboflavin synthesis of an industrial Bacillus subtilis strain used for riboflavin production. J. Ind. Microbiol. Biotechnol., 22, 1, 1999. 147. Perkins, J.B. et al. Genetic engineering of Bacillus subtilis for the commercial production of riboflavin. J. Ind. Microbiol. Biotechnol., 22, 8, 1999. 148. Sauer, U. and Bailey, J.E. Estimation of P-to-O ratio in Bacillus subtilis and its influence on maximum riboflavin yield. Biotechnol. Bioeng., 64, 750, 1999. 149. Chong, B.F. and Nielsen, L.K. Amplifying the cellular reduction potential of Streptococcus zooepidemicus. J. Biotechnol., 100, 33, 2003. 150. Sauer, U., Cameron, D.C., and Bailey, J.E. Metabolic capacity of Bacillus subtilis for the production of purine nucleosides, riboflavin, and folic acid. Biotechnol. Bioeng., 59, 227, 1998. 151. Zamboni, N. et al. Reducing maintenance metabolism by metabolic engineering of respiration improves riboflavin production by Bacillus subtilis. Metab. Eng., 5, 49, 2003. 152. Shibasaki, T., Mori, H., and Ozaki, A. Enzymatic production of trans-4-hydroxy-L-proline by regioand stereospecific hydroxylation of L-proline. Biosci. Biotech. Biochem., 64, 746, 2000. 153. Vemuri, G.N., Eiteman, M.A., and Altman, E. Increased recombinant protein production in Escherichia coli strains with overexpressed water-forming NADH oxidase and a deleted ArcA regulatory protein. Biotechnol. Bioeng., 94, 538, 2006.

Energy and Cofactor Issues in Fermentation and Oxyfunctionalization Processes

21-31

154. Shibasaki, T. et al. Construction of a novel hydroxyproline-producing recombinant Escherichia coli by introducing a proline 4-hydroxylase gene. J. Biosci. Bioeng., 90, 522, 2000. 155. Berry, A. et al. Application of metabolic engineering to improve both the production and use of biotech indigo. J. Ind. Microbiol. Biotechnol., 28, 127, 2002. 156. Ensley, B. et al. Expression of naphthalene oxidation genes in Escherichia coli results in the biosynthesis of indigo. Science, 222, 167, 1983. 157. Murdock, D. et al. Construction of metabolic operons catalyzing the de novo biosynthesis of indigo in Escherichia coli. Bio/Technology, 11, 381, 1993. 158. van Beilen, J.B. et al. Analysis of Pseudomonas putida alkane-degradation gene clusters and flanking insertion sequences: evolution and regulation of the alk genes. Microbiology, 147, 1621, 2001. 159. Witholt, B. et al. Bioconversions of aliphatic compounds by Pseudomonas oleovorans in multiphase bioreactors: background and economic potential. Trends Biotechnol., 8, 46, 1990. 160. Bosetti, A. et al. Production of primary aliphatic alcohols with a recombinant Pseudomonas strain, encoding the alkane hydroxylase system. Enzyme Microb. Technol., 14, 702, 1992. 161. Katopodis, A.G. et al. Mechanistic studies on non-heme iron monooxygenase catalysis: Epoxidation, aldehyde formation, and demethylation by the ω-hydroxylation system of Pseudomonas oleovorans. J. Am. Chem. Soc., 106, 7928, 1984. 162. May, S. and Katopodis, A.G. Oxygenation of alcohol and sulphide substrates by a prototypical non-haem iron monooxygenase: catalysis and biotechnological potential. Enzyme Microb. Technol., 8, 17, 1986. 163. Rothen, S.A. et al. Biotransformation of octane by E. coli HB101[pGEc47] on defined medium: Octanoate production and product inhibition. Biotechnol. Bioeng., 58, 356, 1998. 164. Favre-Bulle, O. and Witholt, B. 
Biooxidation of n-octane by a recombinant Escherichia coli in a twoliquid-phase system: effect of medium components on cell growth and alkane oxidation activity. Enzyme Microb. Technol., 14, 931, 1992. 165. Walton, A.Z. and Stewart, J.D. Understanding and improving NADPH-dependent reactions by nongrowing Escherichia coli cells. Biotechnol. Prog., 20, 403, 2004. 166. Doig, S.D. et al. Reactor operation and scale-up of whole cell Baeyer-Villiger catalyzed lactone synthesis. Biotechnol. Prog., 18, 1039, 2002. 167. Hilker, I. et al. Microbial Transformations 59: First kilogram scale asymmetric microbial BaeyerVilliger oxidation with optimized productivity using a resin-based in situ SFPR strategy. Biotechnol. Bioeng., 92, 702, 2005. 168. Panke, S. et al. Towards a biocatalyst for (S)-styrene oxide production: characterization of the styrene degradation pathway of Pseudomonas sp. strain VLB120. Appl. Environ. Microbiol., 64, 2032, 1998. 169. Otto, K. et al. Biochemical characterization of StyAB from Pseudomonas sp. strain VLB120 as a twocomponent flavin-diffusible monooxygenase. J. Bacteriol., 186, 5292, 2004. 170. Panke, S. et al. Production of enantiopure styrene oxide by recombinant Escherichia coli synthesizing a two-component styrene monooxygenase. Biotechnol. Bioeng., 69, 91, 2000. 171. Panke, S. et al. Pilot-scale production of (S)-styrene oxide from styrene by recombinant Escherichia coli synthesizing styrene monooxygenase. Biotechnol. Bioeng., 80, 33, 2002. 172. Collins, A.M., Woodley, J.M., and Liddell, J.M. Determination of reactor operation for the microbial hydroxylation of toluene in a 2-liquid phase process. J. Ind. Microbiol., 14, 382, 1995. 173. Lilly, M.D. and Woodley, J.M. A structured approach to design and operation of biotransformation processes. J. Ind. Microbiol., 17, 24, 1996. 174. Hüsken, L.E. et al. Integrated bioproduction and extraction of 3-methylcatechol. J. Biotechnol., 88, 11, 2001. 175. Hüsken, L.E. et al. 
Membrane-facilitated bioproduction of 3-methylcatechol in an octanol/water two-phase system. J. Biotechnol., 96, 281, 2002. 176. Bühler, B. et al. Characterization and application of xylene monooxygenase for multistep biocatalysis. Appl. Environ. Microbiol., 68, 560, 2002.

21-32

Future Applications of Metabolic Engineering

177. Bühler, B. et al. Analysis of two-liquid-phase multistep biooxidation based on a process model: Indications for biological energy shortage. Org. Proc. Res. Dev., 10, 628, 2006. 178. Zilberstein, D. et al. Escherichia coli intracellular pH, membrane potential, and cell growth. J. Bacteriol., 158, 246, 1984. 179. Slonczewski, J.L. et al. pH homeostasis in Escherichia coli: measurement by 31P nuclear magnetic resonance of methylphosphonate and phosphate. Proc. Natl. Acad. Sci. USA, 78, 6271, 1981. 180. Gbewonyo, K., Buckland, B.C., and Lilly, M.D. Development of a large-scale continuous substrate feed process for the biotransformation of simvastatin by Nocardia sp. Biotechnol. Bioeng., 37, 1101, 1991. 181. Hujanen, M. et al. Optimisation of media and cultivation conditions for L(+)(S)-lactic acid production by Lactobacillus casei NRRL B-441. Appl. Microbiol. Biotechnol., 56, 126, 2001. 182. Bai, D.M. et al. Fed-batch fermentation of Lactobacillus lactis for hyper-production of L-lactic acid. Biotechnol. Lett., 25, 1833, 2003. 183. Yun, J.S., Wee, Y.J., and Ryu, H.W. Production of optically pure L(+)-lactic acid from various carbohydrates by batch fermentation of Enterococcus faecalis RKY1. Enzyme Microb. Technol., 33, 416, 2003. 184. Dien, B.S., Nichols, N.N., and Bothast, R.J. Recombinant Escherichia coli engineered for production of L-lactic acid from hexose and pentose sugars. J. Ind. Microbiol. Biotechnol., 27, 259, 2001. 185. Wang, Q. et al. Genome-scale in silico aided metabolic analysis and flux comparisons of Escherichia coli to improve succinate production. Appl. Microbiol. Biotechnol., 73, 887, 2006. 186. Moniruzzaman, M. et al. Fermentation of corn fibre sugars by an engineered xylose utilizing Saccharomyces yeast strain. World J. Microbiol. Biotechnol., 13, 341, 1997. 187. Joachimsthal, E., Haggett, K.D., and Rogers, P.L. Evaluation of recombinant strains of Zymomonas mobilis for ethanol production from glucose/xylose media. Appl. Biochem. 
Biotechnol., 77-9, 147, 1999. 188. Zelic, B. et al. Process strategies to enhance pyruvate production with recombinant Escherichia coli: from repetitive fed-batch to in situ product recovery with fully integrated electrodialysis. Biotechnol. Bioeng., 85, 638, 2004. 189. Zeng, A.P. Pathway and kinetic analysis of 1,3-propanediol production from glycerol fermentation by Clostridium butyricum. Bioprocess Eng., 14, 169, 1996. 190. Gonzalez-Pajuelo, M., Andrade, J.C., and Vasconcelos, I. Production of 1,3-propanediol by Clostridium butyricum VPI 3266 in continuous cultures with high yield and productivity. J. Ind. Microbiol. Biotechnol., 32, 391, 2005. 191. Cameron, D.C. et al. Metabolic engineering of propanediol pathways. Biotechnol. Prog., 14, 116, 1998. 192. Nakamura, C.E. and Whited, G.M. Metabolic engineering for the microbial production of 1,3-propanediol. Curr. Opin. Biotechnol., 14, 454, 2003. 193. Baez-Viveros, J.L. et al. Metabolic engineering and protein directed evolution increase the yield of L-phenylalanine synthesized from glucose in Escherichia coli. Biotechnol. Bioeng., 87, 516, 2004. 194. Hols, P. et al. Conversion of Lactococcus lactis from homolactic to homoalanine fermentation through metabolic engineering. Nat. Biotechnol., 17, 588, 1999. 195. Stouthamer, A.H. and van Verseveld, H.W. Microbial energetics should be considered in manipu lating metabolism for biotechnological purposes. Trends Biotechnol., 5, 149, 1987. 196. Wlaschin, A.P. et al. The fractional contributions of elementary modes to the metabolism of Escherichia coli and their estimation from reaction entropies. Metab. Eng., 8, 338, 2006. 197. Linton, J.D. Metabolite production and growth efficiency. Antonie van Leeuwenhoek, 60, 293, 1991. 198. Garcia-Ochoa, F. et al. Xanthan gum: production, recovery, and properties. Biotechnol. Adv., 18, 549, 2000. 199. Bühler, B. et al. NADH availability limits asymmetric biocatalytic epoxidation in growing recombinant Escherichia coli. Appl. 
Environ. Microbiol., 74, 1436, 2008. 200. Blank, L.M. et al. Metabolic capacity estimation of Escherichia coli as platform for redox biocatalysis: Constraint based modeling and experimental verification. Biotechnol. Bioeng., 100, 1050, 2008. 201. Blank, L.M. et al. Metabolic response of Pseudomonas putida during redox biocatalysis in the presence of a second octanol phase. FEBS J., 275, 5173.

22 Microbial Biosynthesis of Fine Chemicals: An Emerging Technology

Zachary L. Fowler
State University of New York at Buffalo

Mattheos Koffas
State University of New York at Buffalo

22.1 Introduction
22.2 Flavonoids: A Natural Medicine
Initiating the Biosynthesis of Flavonoids • Flavonoid Production in Plant Cell Cultures • Pathway Expression in Recombinant Hosts
22.3 Isoprenoids
Microbial Biosynthesis of Carotenoids • Coenzyme Q10: The Ubiquitous Quinone • Terpenoids
22.4 Specialized Fine Chemicals from Microorganism Biosynthesis
Polyketides • Microbial Synthesis of Chain Molecules • Pigments, Flavor, and Fragrance • Strategies and Trends • New Approaches on Old Methods • Engineering the Genetic Machinery
References

22.1 Introduction

The emergence of recombinant DNA technologies has provided the experimental means not only to explore cellular function but also to alter it toward a specific goal, such as exclusively synthesizing and extracting a protein or catalyzing the synthesis of an important chemical product. In that respect, we are currently experiencing the emergence of biotechnology as the technology of choice for the chemical and materials industries. In this context, we will discuss some fine chemicals currently produced through recombinant microbial biosynthesis, a method typically followed by purification, and possibly additional chemical processing, to create high-quality, low-energy processes. Fine chemicals are defined as organic molecules whose production, typically in low amounts, has a designed goal in mind. Chemical synthesis of these molecules usually requires lengthy development times and typically gives low yields because of their often-complicated chemical structures. As an alternative, many fine chemicals are readily suited for microbial biosynthesis since they occur as end points or intermediates of natural metabolic networks. For example, the cancer drug taxol, one of a few major isoprenoids we will discuss, was widely developed through chemical synthesis yet is now being explored for microbial biosynthesis to increase productivity and reduce cost and energy.

It is important to note that recombinant DNA technology, one of the enabling technologies for metabolic engineering, did not initiate the biotechnology era. Initially, the development of chemical synthesis in microorganisms was highly reliant on experimentation via bioprospecting and random genetic alterations followed by assessment of the resulting physiology (trial-and-error methods). More recently, a paradigm shift has occurred toward the more rational and systematic approach of microbial
biosynthesis (Bailey, 1991; Stephanopoulos and Vallino, 1991), which takes advantage of recent developments in molecular biology techniques, analytical methods, and improved mathematical models. This approach to altering cellular physiology is commonly known as metabolic engineering. The field of metabolic engineering has been expanding since the early 1980s, though the term was not actually coined until 1991 by Jay Bailey (Bailey, 1991). While this is not a chapter on metabolic engineering, these methods have been and will continue to be highly influential in the development of recombinant microorganism production platforms for fine chemicals.

One of the earliest accomplishments of recombinant DNA technology was the engineering of Escherichia coli for the synthesis of human insulin (Goeddel et al., 1979; Wetzel et al., 1981): the A and B subunits of insulin were cloned separately into an E. coli expression plasmid, and the native human protein was generated after the two subunits were chemically joined in vitro. In other early work, David Hopwood synthesized novel isochromanequinone antibiotics in Streptomyces coelicolor A3(2), initiating the era of fine chemical production from recombinant microorganisms (Hopwood et al., 1985).

Recombinant DNA techniques that have been implemented include:

(1) The addition or deletion of genes to increase the metabolic carbon flux to desired end products. Overexpression can involve single or multiple gene insertions or even the insertion of a whole genome (thereby forming a hybrid genome). Deletions are performed via direct selection or the probabilistic prediction of gene targets that, upon removal, result in carbon flow redistribution toward desired metabolic networks. For example, Causey et al. recently engineered efficient ethanol production in recombinant E. coli by developing deletion mutant strains with minimal energy requirements and lowered biosynthesis levels of acetate and other fermentative byproducts (Causey et al., 2004).

(2) Site-specific or random point mutagenesis, directed evolution, or other genetic alteration mechanisms that are used to evolve targeted enzymes, metabolic pathways, or entire genomes.

(3) The development of new approaches to alter cellular physiology toward a specific goal by incorporating artificial circuit mimicry, cell-to-cell communication, or synthetic regulatory processes.

In this chapter, we will investigate a number of the fine chemicals being produced through the combination of these techniques, with a particular emphasis on the metabolic engineering of two genetically tractable hosts, namely E. coli and Saccharomyces cerevisiae.
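The redistribution of carbon flow that follows a gene deletion can be pictured with a deliberately simple toy model. The sketch below is not taken from the studies cited above. It assumes a single branch point whose incoming carbon flux is shared among competing branches in proportion to hypothetical branch capacities, and it shows how removing competing branches redirects that flux toward the desired product. The function name `split_flux`, the branch names, and all numbers are illustrative assumptions; quantitative predictions would require a stoichiometric model of the whole network.

```python
def split_flux(uptake, capacities, deleted=()):
    """Share an uptake flux among branches in proportion to their capacities.

    `capacities` maps branch name -> relative capacity (hypothetical values).
    Branches listed in `deleted` are treated as knocked out and carry no flux.
    """
    active = {name: cap for name, cap in capacities.items() if name not in deleted}
    total = sum(active.values())
    if total == 0:
        raise ValueError("no active branch left to carry the flux")
    return {name: uptake * cap / total for name, cap in active.items()}


# Hypothetical fermentative branch point with three competing sinks.
branches = {"ethanol": 2.0, "acetate": 1.0, "lactate": 1.0}

# Parent strain: all three branches compete for the same carbon.
parent = split_flux(10.0, branches)

# Deletion mutant: the acetate and lactate branches are removed.
mutant = split_flux(10.0, branches, deleted={"acetate", "lactate"})

print("parent ethanol flux:", parent["ethanol"])
print("mutant ethanol flux:", mutant["ethanol"])
```

In this toy setting the ethanol branch carries half of the uptake flux in the parent strain and all of it once the competing branches are deleted. Real cells also respond through regulation and energy balance, which a proportional split does not capture.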

22.2 Flavonoids: A Natural Medicine

Flavonoids are plant secondary metabolites comprising over 8,000 variants of a basic 15-carbon phenylpropanoid core, which is diversified through a variety of alkylation, oxidation, and glycosylation reactions. Flavonoids, being active estrogenic, antioxidant, antiviral, antibacterial, antiobesity, and anticancer molecules, are of major interest for personal health applications (Forkmann and Martens, 2001). As such, they have generated extensive interest from pharmaceutical companies keen on their nutraceutical properties and on their use as possible precursors for market pharmaceuticals. They are currently being used as dietary supplements and intensively investigated as treatments for many chronic human pathological conditions, including cancer and diabetes (Allister et al., 2005; Caltagirone et al., 2000; Hou et al., 2004; McDougall and Stewart, 2005; Popiolkiewicz et al., 2005; Potter et al., 1998; Pouget et al., 2001; Zava and Duwe, 1997). For example, anthocyanins (a class of colored glycosylated flavonoids), while of interest as possible replacements for banned or artificial dyes with real or perceived adverse effects, have come under increased attention as general antioxidants that may play a role in the reduction of various diseases (Hannum, 2004; Nakajima et al., 2001), such as obesity (Greenwald, 2004).

Because glycosylation remains a challenging step for conventional organic chemistry, as do acylations at specific positions, biochemical approaches for the large-scale production of these important molecules remain the only alternative to current methods reliant on plant extraction. This is especially true since the health-promoting effects so prevalent in these compounds have been the driving force toward the elucidation of their biosynthetic pathways, with significant advances in the recent past.


22.2.1 Initiating the Biosynthesis of Flavonoids

Depending on their chemistry, flavonoid molecules can be classified into five major classes, namely flavones, flavonols, isoflavones, flavanols, and anthocyanins, all of which are derived from the common flavanone precursors (Winkel-Shirley, 2001). Flavanones are synthesized from phenylalanine through a five-step enzymatic process (Figure 22.1). Phenylalanine is first converted into cinnamic acid by the enzyme phenylalanine ammonia lyase (PAL) and then hydroxylated into p-coumaric acid by the enzyme cinnamate 4-hydroxylase (C4H). Next, p-coumaric acid is converted into a coumaroyl coenzyme A (CoA) ester by 4-coumaroyl:CoA ligase (4CL). Following the ligation reaction, chalcone synthase (CHS), which catalyzes the first committed step in flavonoid biosynthesis, performs the sequential decarboxylative condensation of three acetate units from malonyl-CoA to 4-coumaroyl-CoA. This results in a linear phenylpropanoid tetraketide that forms 4,2′,4′,6′-tetrahydroxychalcone via intramolecular cyclization and aromatization (Austin and Noel, 2003). The formation of flavanones from chalcones then occurs through an isomerization performed by the enzyme chalcone isomerase (CHI). Various biosynthetic enzymes downstream of this upper pathway are responsible for catalyzing the conversion of flavanones into the plethora of flavonoid molecules, with reactions involving hydroxylation, reduction, oxidation, glycosylation, methylation, and acylation (Figure 22.1). For more details on the pathway's biochemistry, see Chemler et al. (2006), Leonard et al. (2005), and Yan et al. (2005a). All of the flavonoid biosynthetic enzymes were originally thought to be plant derived; however, reports have recently appeared demonstrating the presence of polyketide synthases in microorganisms that are also able to perform flavanone biosynthesis (Ueda et al., 1995).
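The five enzymatic steps described above can be written down as an ordered list of substrate, enzyme, and product. The following is a minimal sketch for bookkeeping purposes only; the class name `Step`, the list `UPPER_PATHWAY`, and the helper functions are illustrative names, not part of any published tool. It also checks the carbon arithmetic of the CHS step, assuming the nine-carbon 4-coumaroyl unit and a two-carbon contribution from each malonyl-CoA after decarboxylation, which together account for the 15-carbon flavonoid core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    substrate: str
    enzyme: str
    product: str


# Upper flavonoid pathway, in the order given in the text.
UPPER_PATHWAY = [
    Step("phenylalanine", "PAL", "cinnamic acid"),
    Step("cinnamic acid", "C4H", "p-coumaric acid"),
    Step("p-coumaric acid", "4CL", "4-coumaroyl-CoA"),
    Step("4-coumaroyl-CoA", "CHS", "4,2',4',6'-tetrahydroxychalcone"),
    Step("4,2',4',6'-tetrahydroxychalcone", "CHI", "flavanone (naringenin)"),
]


def trace(pathway, start):
    """Follow the pathway from `start`, checking that each step connects."""
    route = [start]
    current = start
    for step in pathway:
        if step.substrate != current:
            raise ValueError(f"{step.enzyme} does not accept {current}")
        current = step.product
        route.append(current)
    return route


# Carbon bookkeeping for the CHS condensation.
COUMAROYL_CARBONS = 9          # C6 ring plus C3 side chain
CARBONS_PER_MALONYL_UNIT = 2   # malonyl (C3) loses one carbon as CO2
MALONYL_COA_PER_CHALCONE = 3


def chalcone_carbons():
    """Carbon count of the chalcone skeleton built by CHS."""
    return COUMAROYL_CARBONS + MALONYL_COA_PER_CHALCONE * CARBONS_PER_MALONYL_UNIT


route = trace(UPPER_PATHWAY, "phenylalanine")
print(" -> ".join(route))
print("chalcone carbons:", chalcone_carbons())
```

Representing the pathway as data makes it straightforward to check that a set of cloned genes forms a connected route from the fed substrate to the intended product.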

22.2.2 Flavonoid Production in Plant Cell Cultures

Several attempts have been made to produce and extract flavonoids from plants with the purpose of utilizing them as nutraceuticals and natural colorants. Although colorant agents have been used in food technology for a very long time, this industry has not been comprehensively documented and tends to remain secretive. Size estimations of color agent markets in both volume and value are therefore difficult to ascertain (Marz, 1996); nonetheless, the most recent available data estimate revenues for the overall European polyphenols market in 2008 at $144 million (Frost & Sullivan, 2003). Leading this market expansion are red fruit anthocyanins and green tea flavonoids, followed by grape and olive polyphenols (NutraUSAingredients.com, 2004), with all compounds currently derived from plant extracts.

Additionally, bioreactor-based systems for mass production of flavonoids have been described for a few species (Kobayashi et al., 1993; Zhong et al., 1991), but to date economic feasibility has not been established, partly because of engineering challenges in large-scale cultivation of plant cultures. One challenge is that plant cells tend to form aggregates that influence culture productivity (Hanagata et al., 1993), since cells within aggregates are not adequately exposed to the light needed for flavonoid biosynthesis by plant tissue. For example, formation of PAL, a key enzyme in the biosynthetic pathway, is promoted primarily by UV wavelengths, particularly those of the UV-B region (Wellmann, 1975). Other enzymes in the pathway, particularly those of the anthocyanin biosynthetic branch, appear to be regulated in part by UV and in part by phytochrome-activating wavelengths of 700–800 nm (Meyer et al., 2002).
In that respect, irradiance becomes a limiting factor for productivity not only when light is excluded from cells at the interior of an aggregate (Hall and Yeoman, 1986), but also in dense cell cultures, at reduced cell dosing, or when the vessel wall composition selectively restricts certain wavelengths (Smith and Spomer, 1995).

Expression of flavonoid biosynthetic genes in transgenic plants has also been investigated for production of these ubiquitous secondary metabolites. In a recent study, CHI from Saussurea medusa was transformed into Nicotiana tabacum plants, a nonleguminous species, and resulted in an up to fivefold increase in total flavonoid production when compared to wild-type plants (Li et al., 2006). Isoflavonoids, a flavonoid subgroup predominantly synthesized in leguminous plants by isoflavone synthase (IFS) from flavanone, play key roles in plant physiology, acting as signal molecules in plant–bacteria interactions, specifically plant nodulation (for reviews, see Cornwell et al., 2004 and Dixon, 2004). Because leguminous plants are not widely consumed, engineering of isoflavonoid biosynthesis in more commonly utilized crop plants has been investigated to increase the availability of these health-promoting metabolites in human diets. Tobacco was again used as a case study, in which soybean IFS1 was cloned into a binary vector containing the cauliflower mosaic virus 35S promoter. The microorganism Agrobacterium tumefaciens was transformed with the IFS-carrying plasmid for the subsequent transformation of tobacco plants (Yu et al., 2000). Even though the transgenic tobacco plants successfully synthesized isoflavonoids, the amounts were low because of competing pathways leading to the biosynthesis of another flavonoid class, the anthocyanins. This diversion of metabolic flux, as well as issues inherent to plant cell suspension cultures (Hellwig et al., 2004), are some of the reasons why no such system has yet been established for the commercial, large-scale production of flavonoids despite more than 10 years of effort in this field.

Figure 22.1 (See color insert following page 13-20.) Metabolic network for biosynthesis of the variety of flavonoids is shown with relevant genes for each reaction highlighted. The structure of each class is shown with the R groups (i.e., H, OH) dependent on the substrate fed to the pathway.
[Figure not reproduced. Compounds shown: phenylalanine, cinnamic acid, p-coumaric acid, caffeic acid, acid-CoA complex, malonyl-CoA (×3), chalcones, flavanones, flavones, flavonols, isoflavones, flavan-4-ols, anthocyanidins, and anthocyanin 3-O-glucosides.]
Enzyme abbreviations: PAL, phenylalanine ammonia lyase; C4H, cinnamate 4-hydroxylase; 4CL, 4-coumaroyl:CoA ligase; CHS, chalcone synthase; CHI, chalcone isomerase; DFR, dihydroflavonol 4-reductase; ANS, anthocyanidin synthase; IFS, isoflavone synthase; 3GT, 3-O-glucosyltransferase; FSI, flavone synthase I; FHT, flavanone 3β-hydroxylase; FLS, flavonol synthase.
Example compounds by class: flavanones: (2S)-pinocembrin, (2S)-naringenin, (2S)-eriodictyol; flavones: apigenin, luteolin, chrysin; flavonols: kaempferol, quercetin, myricetin; anthocyanidins: pelargonidin, cyanidin, delphinidin; anthocyanin 3-O-glucosides: pelargonidin 3-O-glucoside, cyanidin 3-O-glucoside, delphinidin 3-O-glucoside.

22.2.3 Pathway Expression in Recombinant Hosts

The important pharmacological properties of flavonoids and the limited availability of purified forms from plants have inspired the production of these secondary metabolites in recombinant hosts. E. coli and S. cerevisiae have been engineered to harbor flavonoid biosynthetic pathways and synthesize flavanones (Hwang et al., 2003; Kaneko et al., 2003; Miyahisa et al., 2005b; Yan et al., 2005b), flavones (Leonard et al., 2005), flavonols (Leonard et al., 2006; Miyahisa et al., 2005a), and anthocyanins (Yan et al., 2005a). In all cases, the biosynthetic genes were introduced episomally into E. coli or S. cerevisiae using co-replicable plasmids.

22.2.3.1 Recombinant E. coli for Flavonoid Biosynthesis

Among the numerous natural pigments found in plants, anthocyanins are the largest water-soluble group and can be found in most flower petals, leaves, and fruit skins. Anthocyanins are derived from the flavanones naringenin and eriodictyol through four consecutive synthesis steps (Figure 22.1). The enzyme flavanone 3β-hydroxylase (FHT) first converts flavanones to dihydroflavonols, which are then reduced by dihydroflavonol 4-reductase (DFR) to form the leucoanthocyanidins. Further processing by the enzymes anthocyanidin synthase (ANS) and UDP-glucose:flavonoid 3-O-glucosyltransferase (3GT) forms the first stable anthocyanins, such as the B-ring monohydroxylated pelargonidin 3-O-glucoside (with an orange/red color) and the dihydroxylated cyanidin 3-O-glucoside (with a red color). It is important to note that anthocyanin aglycones (known as anthocyanidins) are unstable metabolites; a sugar side chain is therefore essential to form a stable, soluble product. In the study by Yan et al., the four-step metabolic pathway was constructed in E. coli using biosynthetic genes derived from heterologous origins.
After PCR was used to place the genes under trc promoters and bacterial ribosomal binding sites, the genes were cloned into the low-copy-number vector pK184 for transformation. The constructed recombinant pathway enabled the conversion of naringenin and eriodictyol into the corresponding glycosylated anthocyanins, albeit at low production levels (Yan et al., 2005a). This was the first demonstration of production of plant-specific anthocyanins by a microorganism, opening the way for optimization and further synthesis of other natural and nonnatural anthocyanins via enzyme and pathway engineering (Yan et al., 2008; Leonard et al., 2008). Flavonols were also formed as side products, owing in part to an alternate reaction of ANS, which was assumed to be the cause of the low production levels (Yan et al., 2005a).

One of the biggest challenges, however, for engineering the flavonoid (and many other phytochemicals') biosynthetic pathway in E. coli is the functional expression of plant cytochrome P450 monooxygenases. This is a class of enzymes catalyzing regiospecific and stereospecific oxidation of nonactivated hydrocarbons at moderate temperatures. As such, they are heavily involved in the functionalization of natural products in general and flavonoids in particular. Two important requirements of P450s are their attachment to the eukaryotic cell's endoplasmic reticulum (ER) membrane and a P450
reductase for transporting electrons from the NADPH donor to the heme core of the P450 complex. These requirements are the reasons why functional expression of these enzymes in E. coli is a challenging task: as a prokaryote, E. coli lacks an ER, and it also lacks a P450 redox partner protein. Efforts to engineer "soluble" versions of the P450 monooxygenases have generally resulted in enzymes with low solubility and the formation of inclusion bodies, especially at high expression levels, which results in extensive cell lysis. Such was the case for the generation of an active flavonoid 3′,5′-hydroxylase (F3′5′H) derived from Catharanthus roseus in E. coli for the biosynthesis of hydroxylated flavonols, such as quercetin and myricetin (Leonard et al., 2006). Specifically, the four N-terminal codons were removed and the fifth codon was replaced with ATG. Moreover, the second codon of the shortened F3′5′H, which encodes leucine, was changed to encode alanine. To compensate for the lack of a P450 reductase in E. coli, a shortened P450 reductase, also derived from C. roseus, was fused to the modified F3′5′H through a short linker sequence with no propensity to form secondary structures, thereby reducing interference with the folding of the two proteins. When the engineered chimera was expressed in E. coli together with a grafted flavanone biosynthetic pathway, small amounts of quercetin could be recovered from the culture media, but the tri-hydroxylated myricetin could only be produced when the recombinant strain was supplemented with the monohydroxylated flavanone naringenin. Since cell lysis was evident in the culture, it was clear that several other parameters, including the use of weaker promoters and lower-copy-number plasmids, must also be addressed to optimize biocatalysis. Overall, key issues remain unaddressed for optimal flavonoid biosynthesis in E. coli, including further optimization of the functional expression of P450 hydroxylases and increases to the intracellular pool of malonyl-CoA.

22.2.3.2 Flavonoid Biosynthesis in Recombinant Yeast

As a eukaryote, S. cerevisiae offers the advantage of supporting the functional expression of ER membrane-bound P450 monooxygenases. This feature, together with the expectation of better expression of the plant-derived flavonoid biosynthetic enzymes, makes yeast an attractive alternative to E. coli as a production platform for flavonoid molecules. The first two enzymes of the phenylpropanoid pathway, namely PAL and C4H, were first functionally expressed in S. cerevisiae by Ro and Douglas, together with a P450 reductase (Ro and Douglas, 2004); this work allowed the investigation of the formation of "metabolons," or complexes, between these two enzymes. In another study, the synthesis of the flavonoid naringenin from cinnamic acid was achieved by a recombinant S. cerevisiae strain expressing 4CL, CHS, CHI, and the P450 cinnamate 4-hydroxylase (C4H), which performs the hydroxylation of cinnamic acid into p-coumaric acid (Yan et al., 2005b).

An extension of this work involved the biosynthesis of flavones in yeast. Derived from flavanones, flavones such as apigenin and luteolin are plant flavonoids with potent medicinal properties (Caltagirone et al., 2000). They are synthesized by two distinct enzymes: the soluble flavone synthase I (FSI), which is found only among plants that belong to the Apiaceae family (such as parsley), and the membrane-bound flavone synthase II (FSII), which functions together with the yeast P450 reductase. Metabolically engineered S. cerevisiae overexpressing FSI together with a flavanone biosynthetic pathway gave 50% higher production than the FSII-expressing recombinant strain (Leonard et al., 2006); further optimization involved the use of alternative carbon sources, such as acetate.
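The N-terminal modification of F3′5′H described in Section 22.2.3.1 amounts to a short sequence-editing recipe: drop the first four codons, turn the fifth codon into the new ATG start, and change the second codon of the shortened gene from leucine to alanine. The sketch below illustrates that recipe under stated assumptions. The coding sequence is a made-up placeholder, not the C. roseus gene, and the choice of GCG as the alanine codon is an assumption, since the text does not state which codon was used. The function names are likewise illustrative.

```python
LEUCINE_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}
ALANINE_CODONS = {"GCT", "GCC", "GCA", "GCG"}


def split_codons(cds):
    """Split a coding sequence into a list of three-nucleotide codons."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def modify_n_terminus(cds, n_removed=4, alanine_codon="GCG"):
    """Remove N-terminal codons, restore an ATG start, and swap Leu for Ala."""
    if alanine_codon not in ALANINE_CODONS:
        raise ValueError("replacement codon must encode alanine")
    codons = split_codons(cds.upper())
    if len(codons) < n_removed + 2:
        raise ValueError("sequence too short for the requested edit")
    shortened = codons[n_removed:]      # drop the first four codons
    shortened[0] = "ATG"                # former fifth codon becomes the start
    if shortened[1] not in LEUCINE_CODONS:
        raise ValueError("expected a leucine codon at position 2")
    shortened[1] = alanine_codon        # leucine -> alanine
    return "".join(shortened)


# Hypothetical placeholder sequence: ATG GCT TCT ACC GGT CTG AAA GAT TAA
HYPOTHETICAL_CDS = "ATGGCTTCTACCGGTCTGAAAGATTAA"
modified = modify_n_terminus(HYPOTHETICAL_CDS)
print(modified)
```

The explicit check for a leucine codon guards against applying the edit to a sequence whose reading frame or N-terminus differs from what the recipe assumes.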

22.3 Isoprenoids The diverse group of molecules known as isoprenoids includes carotenoids, terpenes, sterols, polyprenyl alcohols, ubiquinone, and even prenylated proteins. Many isoprenoids currently have significant biotechnological value for their roles as natural food colorants (carotenoids), antioxidants, natural aromas, and flavors (terpenes), and for their antiparasitic and anticancer properties (Haynes and Krishna, 2004; Lee and Schmidt-Dannert, 2002). Their name originates from the five-carbon molecule isoprene, a molecule


Microbial Biosynthesis of Fine Chemicals: An Emerging Technology

ubiquitous in all plant species from which most isoprenoids are derived (Barkovich and Liao, 2001). Isoprene is used to synthesize the common precursor molecule isopentenyl diphosphate (IDP) by two major metabolic pathways: the mevalonic acid pathway or the deoxyxylulose 5-phosphate (DXP) pathway.

22.3.1 Microbial Biosynthesis of Carotenoids Carotenoids, a subfamily of isoprenoids, are synthesized from the general terpenoid pathway through the action of geranylgeranyl diphosphate (GGDP) synthase (encoded by crtE) and phytoene synthase (crtB), which produce the carotenoid phytoene (Figure 22.2). Subsequent desaturation by phytoene desaturase (crtI) and further enzymatic modifications generate a repertoire of carotenoid molecules responsible for the yellow, orange, and red pigments naturally synthesized in bacteria, algae, and fungi. Carotenoids typically have a 40-carbon (C40) backbone formed from the condensation of eight isoprene units; however, the existence of 30-, 45-, and 50-carbon backbones has also been shown (Chemler et al., 2006; Tao et al., 2005b; Tobias and Arnold, 2006). In efforts to enable biochemical production of these fine chemicals, the enzymes responsible for carotenoid synthesis have been successfully expressed in noncarotenogenic microbes such as E. coli and yeast, with the resulting recombinant strains currently under further optimization. Typically, synthesis of carotenoids begins with head-to-tail condensations of IDP to form the twenty-carbon GGDP molecule by ispA and crtE. This is followed by the joining of two GGDP molecules, this time via a head-to-head condensation, by crtB. Further desaturation by phytoene desaturase results in the synthesis of the first colored carotenoid, lycopene, a compound of high nutritional importance present in high natural concentrations in tomatoes. Lycopene is made up of 11 trans double bonds

[Figure 22.2 pathway diagram; labels recovered from the image: dimethylallyl diphosphate (DMADP), isopentenyl diphosphate (IDP), geranyl diphosphate (GDP), farnesyl diphosphate (FDP), geranylgeranyl diphosphate (GGDP), phytoene, lycopene, β,β-carotene, C30 carotenoids, C50 carotenoids; enzyme labels idi, ispA, crtE, crtB, crtI, crtY, crtM, crtN, crtEb, crtYe/Yf.]

Figure 22.2 The biosynthetic pathway for production of carotenoids. The molecular structure of each molecule is shown to illustrate the lengthening of the carbon chain through addition of IDP or GGDP. Enzymes included are idi, isopentenyl diphosphate isomerase; ispA, farnesyl diphosphate synthase; crtE, geranylgeranyl diphosphate synthase; crtB, phytoene synthase; crtI, phytoene desaturase; crtY, lycopene cyclase.


Future Applications of Metabolic Engineering

in the carbon chain that result in a highly active molecule. This accessibility of double bonds allows for its easy modification by lycopene cyclase (crtY) to form the cyclic molecule β,β-carotene. Lycopene and β,β-carotene can be processed by a number of enzymes to yield the multitude of linear and cyclic compounds making up the carotenoid family. The C50 compounds are formed through the extension of the C40 backbone by prenyltransferase-like enzymes, while C30 compounds are made from condensations of two C15 farnesyl diphosphate (FDP) molecules.

In yeast, ergosterol (provitamin D2) is the principal isoprenoid molecule, as it is an essential part of the yeast membrane, and is derived from FDP. Redirection of the flux away from ergosterol and into GGDP and subsequent carotenoids was achieved through the introduction of a plasmid containing the Erwinia uredovora crtE, crtB, and crtI genes under the control of various S. cerevisiae promoters. The engineered recombinant yeast strain produced lycopene at up to 113 μg/g dry weight, whereas a similarly engineered strain harboring a plasmid containing the additional E. uredovora crtY gene produced β-carotene (103 μg/g dry weight) (Yamano et al., 1994).

The availability of genetic elements, such as transposons, and of transformation techniques that permit higher-throughput genetic manipulation of E. coli has allowed the application of stochastic methods for the generation of lycopene-overproducing strains (Alper et al., 2005a). More specifically, the genes encoding 1-deoxy-D-xylulose 5-phosphate synthase (dxs), FDP synthase (ispA), and isopentenyl diphosphate (IDP) isomerase (idi) were first grafted into the E. coli chromosome while the crtEBI operon was expressed episomally. Of these two inserted gene sets, the first enabled the conversion of the glycolytic metabolites pyruvate and glyceraldehyde 3-phosphate into the required precursor isopentenyl diphosphate, which is then converted to lycopene by the enzymes encoded within the crtEBI operon. Introduction of random gene disruptions by transposon-based mutagenesis (Alexeyev and Shokolenko, 1995) resulted in the identification of three genes whose deletion improved lycopene biosynthesis (Alper et al., 2005b; Alper et al., 2005c). The genes identified as beneficial deletions are rssB, a gene controlling macromolecule degradation, and yjfP and yjiD, which encode two hypothetical proteins. To further improve lycopene production, the authors combined the single mutations found from the transposon library screening with a set of gene deletions predicted to increase lycopene production on the basis of flux balance analysis and minimization of metabolic adjustment simulations (Segre et al., 2002; Varma et al., 1993). After identification, combinatorial deletions were performed that resulted in strains producing lycopene at over 10 mg/g dry cell weight.

Carotenoid biosynthesis in E. coli has also been achieved and optimized using directed protein evolution to develop novel compounds with unique properties. Notably, the generation of novel acyclic carotenoids was achieved by shuffling crtI genes derived from Erwinia herbicola and Erwinia uredovora. After an opening round of shuffling with the crtI genes, the resulting library was introduced into E. coli harboring wild-type crtE and crtB for subsequent selection of clones that confer carotenoid colorations. One clone with yellow coloration (I25) and one with pink (I14) were isolated. Sequence analysis of the mutated crtI gene isolated from I14 revealed two amino acid mutations and a replacement of the 39 N-terminal amino acids of crtI from E. uredovora with those of E. herbicola. Sequence analysis of crtI isolated from I25 showed two amino acid changes from the original sequence. To extend the approach to the breeding of novel cyclic carotenoids, crtY from E. uredovora and E. herbicola were introduced separately into the carotenoid pathway containing crtI from clone I14. When the wild-type crtI was used in the recombinant E. coli expressing the carotenoid pathway, a bright yellow-orange coloration was produced. However, replacing the wild-type crtI with the desaturase from I14 resulted in bright yellow coloration. A library of crtY variants was also created by shuffling the crtY genes from the two origins and was introduced into the E. coli carrying the I14 desaturase pathway. Out of 4,500 clones screened, 25 colonies with different colorations were selected. Sequencing of the cyclase isolated from a bright red clone revealed two amino acid changes within the E. uredovora cyclase, without a recombination event having occurred. One reaction product extracted from this colony was identified as torulene, a compound not native to the recombinant metabolic pathway introduced. A similar approach has recently been presented in which random chromosomal mutations in E. coli were used to alter carotenoid production and generate novel carotenoids (Tao et al., 2005a).
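The backbone sizes quoted in this section follow from simple counting of five-carbon units. The short Python sketch below is illustrative only: it assumes that every IDP or DMADP unit contributes exactly five backbone carbons and that no carbon is lost on condensation, and the function names are invented for this example.

```python
# Carbon counts for the isoprenoid backbones named in this section.
# Assumption (for illustration): each C5 unit (IDP or DMADP) contributes
# five backbone carbons and none are lost when units are condensed.

C5_UNIT = 5  # carbons in one IDP or DMADP unit


def prenyl_chain_carbons(n_units: int) -> int:
    """Carbons in a linear prenyl diphosphate built from n C5 units."""
    return n_units * C5_UNIT


def dimer_carbons(monomer_carbons: int) -> int:
    """Carbons in a backbone formed by joining two identical chains."""
    return 2 * monomer_carbons


gdp = prenyl_chain_carbons(2)    # geranyl diphosphate
fdp = prenyl_chain_carbons(3)    # farnesyl diphosphate
ggdp = prenyl_chain_carbons(4)   # geranylgeranyl diphosphate

c30_backbone = dimer_carbons(fdp)    # two FDP molecules -> C30 carotenoids
c40_backbone = dimer_carbons(ggdp)   # two GGDP molecules -> phytoene

print(gdp, fdp, ggdp, c30_backbone, c40_backbone)
```

The same counting shows why the C30 and C40 families branch at different points of the pathway: the first is built from FDP, the second from GGDP.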


22.3.2 Coenzyme Q10: The Ubiquitous Quinone Coenzyme Q10 (CoQ10) is widely regarded as one of the most important lipophilic antioxidants, able to prevent the generation of free radicals as well as oxidative modifications of proteins, lipids, and DNA. Many human pathological conditions are associated with reduced levels of CoQ10, including cardiac disorders, neurodegenerative diseases, and cancer, all of which are usually treated with dietary CoQ10 supplements. The molecule has two major components, the quinone ring and the long isoprenoid tail, each with a distinct function. The quinone component allows CoQ10 to transfer electrons, while the isoprenoid side chain holds it within the mitochondrial or cytoplasmic membrane. To satisfy demand for this important fine chemical, CoQ10 has been produced by conventional chemical synthesis (Negishi et al., 2002), semichemical synthesis (Lipshutz et al., 2002), and more recently from native and recombinant microbial strains (Park et al., 2005; Yoshida et al., 1998). While the metabolic pathways for the synthesis of CoQ10 in eukaryotes and prokaryotes differ, both require common steps, namely assembly of the quinonoid ring and generation of the decaprenyl diphosphate tail. Synthesis of the quinonoid nucleus is accomplished by the shikimate pathway via chorismate and p-hydroxybenzoate in bacteria or, in higher eukaryotes, from tyrosine that is supplied through dietary means owing to the absence of the shikimate pathway. Uniquely, S. cerevisiae can synthesize CoQ10 from either chorismate or tyrosine. The isoprenoid tail, most commonly decaprenyl diphosphate (DPDP), is synthesized through consecutive condensations of IDP to form FDP, followed by further elongation of FDP by DPDP synthase. In the final steps, the head and tail groups are joined by prenylation using 4-hydroxybenzoate polyprenyltransferase (HBPT), a membrane-bound protein known to have wide substrate specificity (Melzer and Heide, 1994).
Additional modifications, such as decarboxylations, hydroxylations, and methylations, follow to complete the synthesis of CoQ10. A more detailed review of the complete synthesis can be found in Choi et al. (2005). Table 22.1 lists the major enzymes for the biochemical synthesis and their corresponding genes in yeast and E. coli.

An early study of bacterial production identified three efficient CoQ10-producing species, A. tumefaciens, Rhodobacter sphaeroides, and Paracoccus denitrificans, and led to two mutant overproducer strains, one of A. tumefaciens and one of R. sphaeroides (Yoshida et al., 1998). Before additional mutations were introduced, an original set of 34 strains across the three species was grown in flask fermentation cultures. Interestingly, a number of the parental strains achieved high production levels, but many of the cultures were excluded from further study because the high viscosity of the fermentation media made extraction difficult. Following selection, two parent strains belonging to the A. tumefaciens species were subjected to random mutagenesis by chemical treatment with N-methyl-N′-nitro-N-nitrosoguanidine (NTG). After 90 hr fermentations, CoQ10 production in mutant strains ranged from 60 mg/L up to 110 mg/L, compared with at most 50 mg/L for wild-type A. tumefaciens. Higher concentrations in mutant strains were attributed to the introduced genetic alterations, although mutation sites were never identified (Yoshida et al., 1998).

Recombinant DNA technology has also been applied to the biosynthesis of CoQ10 through multiple gene insertions. Park and colleagues performed batch and fed-batch fermentations of E. coli BL21 strains episomally expressing DPDP synthase (ddsA) from Gluconobacter suboxydans (Park et al., 2005).
The ddsA gene was introduced to the cells by two different coreplicable plasmids: (1) a high-copy number plasmid (pUC19) resulting in strain pYCDdsA, and (2) a low-copy number plasmid (pACYC184) to yield the pACDdsA strain. The low-copy pACDdsA strain consistently outperformed the high-copy pYCDdsA strain to achieve CoQ10 production levels of 0.97 mg/L with 103 g dry cell weight/L in batch cultures and a final concentration of approximately 25.5 mg/L under fed-batch conditions. While titers are still low relative to chemical synthesis, these studies illustrate the limited effort needed to achieve competitive titers through microbial production of this complex isoprenoid.
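The CoQ10 titers quoted in this section are easier to compare as fold changes. The sketch below uses only the numbers given in the text; the variable names are illustrative, and the comparison ignores differences in fermentation time and culture conditions between the studies.

```python
# CoQ10 titers quoted in this section, in mg per litre of culture.
wild_type_max = 50.0           # wild-type A. tumefaciens, "at most"
mutant_range = (60.0, 110.0)   # NTG mutants after 90 hr fermentations
ecoli_batch = 0.97             # E. coli pACDdsA strain, batch culture
ecoli_fed_batch = 25.5         # E. coli pACDdsA strain, fed-batch (approximate)


def fold_change(new: float, reference: float) -> float:
    """Ratio of a titer to a reference titer."""
    return new / reference


# Mutants versus wild-type A. tumefaciens (low and high end of the range).
mutant_fold = tuple(round(fold_change(t, wild_type_max), 2) for t in mutant_range)

# Fed-batch versus batch for the recombinant E. coli strain.
fed_batch_fold = round(fold_change(ecoli_fed_batch, ecoli_batch), 1)

# Best mutant A. tumefaciens titer versus recombinant E. coli fed-batch titer.
gap_to_mutant = round(fold_change(mutant_range[1], ecoli_fed_batch), 1)

print(mutant_fold, fed_batch_fold, gap_to_mutant)
```

On these figures, moving from batch to fed-batch culture gave a far larger relative gain for the recombinant E. coli strain than random mutagenesis gave for A. tumefaciens, although the absolute titer of the mutant strains remained several-fold higher.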


Table 22.1 Genes Involved in the Biosynthesis of Coenzyme Q10

Enzyme                                      Escherichia coli    Saccharomyces cerevisiae
Chorismate lyase                            ubiC                -
Decarboxylase                               ubiD/ubiX           -
Polyprenyl diphosphate synthase             ispB                COQ1
p-Hydroxybenzoate polyprenyl transferase    ubiA                COQ2
O-Methyltransferase                         ubiG                COQ3
Unknown                                     -                   COQ4*
C-Methyltransferase                         ubiE                COQ5
Mono-oxygenase                              ubiH                COQ6
Mono-oxygenase                              ubiF                COQ7
Mono-oxygenase                              ubiB                COQ8*

*The function of COQ4 and COQ8 is unknown however they are believed to act in methylation and hydroxylation, respectively. Sources: Modified from Jonassen, T., Proft, M., Randez-Gil, F., Schultz, J. R., Marbois, B. N., Entian, K. D., and Clarke, C. F., J. Biol. Chem. 273, 3351–3357, 1998; Meganathan, R., FEMS Microbiol. Lett., 203, 131–139, 2001; Okada, K., Suzuki, K., Kamiya, Y. et al., Biochim. Biophys. Acta, 1302, 217–223, 1996; Okada, K., Minehira, M., Zhu, X., Suzuki, K., Nakagawa, T., Matsuda, H., and Kawamukai, M., J. Bacteriol., 179, 3058–3060, 1997; Poon, W. W., Barkovich, R. J., Hsu, A. Y., Frankel, A., Lee, P. T., Shepherd, J. N., Myles, D. C., and Clarke, C. F., J. Biol. Chem., 274, 21665–21672, 1999; Suzuki, K., Ueda, M., Yuasa, M., Nakagawa, T., Kawamukai, M., and Matsuda, H. Biosci. Biotechnol. Biochem., 58, 1814–1819, 1994.
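For readers who want to work with Table 22.1 programmatically, the gene correspondences can be held in a small lookup structure. The sketch below is one possible encoding and is not taken from the cited sources: the row alignment follows the table as printed, and the names CoqStep, TABLE_22_1, and yeast_counterpart are hypothetical.

```python
from typing import NamedTuple, Optional


class CoqStep(NamedTuple):
    """One row of Table 22.1; None marks a cell that is empty in the table."""
    enzyme: str
    ecoli: Optional[str]
    yeast: Optional[str]


TABLE_22_1 = [
    CoqStep("Chorismate lyase", "ubiC", None),
    CoqStep("Decarboxylase", "ubiD/ubiX", None),
    CoqStep("Polyprenyl diphosphate synthase", "ispB", "COQ1"),
    CoqStep("p-Hydroxybenzoate polyprenyl transferase", "ubiA", "COQ2"),
    CoqStep("O-Methyltransferase", "ubiG", "COQ3"),
    CoqStep("Unknown", None, "COQ4"),
    CoqStep("C-Methyltransferase", "ubiE", "COQ5"),
    CoqStep("Mono-oxygenase", "ubiH", "COQ6"),
    CoqStep("Mono-oxygenase", "ubiF", "COQ7"),
    CoqStep("Mono-oxygenase", "ubiB", "COQ8"),
]


def yeast_counterpart(ecoli_gene: str) -> Optional[str]:
    """Return the yeast gene listed on the same row as an E. coli gene."""
    for step in TABLE_22_1:
        # A cell such as "ubiD/ubiX" lists two genes for one step.
        if step.ecoli and ecoli_gene in step.ecoli.split("/"):
            return step.yeast
    return None


print(yeast_counterpart("ubiA"))
print(yeast_counterpart("ubiC"))
```

A row-based structure is used rather than a plain dictionary because some steps have no entry for one of the two organisms, and one E. coli cell lists two genes.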

22.3.3 Terpenoids Terpenoids represent a diverse class of natural, high-value chemicals comprising more than 30,000 different structures. Commercial applications for terpenoids include flavor and fragrance additives, essential oil constituents, and an expanding role in pharmaceuticals; commercial production of terpenoids is generally achieved through chemical synthesis or plant extraction. Conventional chemical synthesis uses a series of isoprene condensations and a cyclization to form terpenoids, although production quantities of critical terpenoids are generally obtained by extraction from a variety of plant tissues (Watts et al., 2005). Both conventional chemical synthesis and extraction are expensive and low-yielding processes; therefore an opportunity exists for the engineering of terpenoid biosynthesis in recombinant organisms. This is especially beneficial since, in microbial production, efficient enzymatic cyclization reactions can be carried out by a variety of terpene cyclases, thus offering more variation in terpenoid conformation (Kim et al., 2006; Picaud et al., 2006). As with several classes of flavonoids, many terpenoids require the use of membrane-bound cytochrome P450 monooxygenases, a challenging hurdle to overcome in prokaryotes. Among the fine chemicals being investigated for their pharmaceutical potential, two terpenoids, artemisinin and taxol, are especially important since current demand exceeds production capabilities (Hezari et al., 1995).

22.3.3.1 Artemisinin Artemisinin is a sesquiterpene found in sweet wormwood (Artemisia annua) that is derived from amorphadiene, a cyclization product of FDP. Malaria-causing Plasmodium strains have begun to develop resistance to traditional antimalarial compounds, such as chloroquine, cycloguanil, and sulfadoxine. As such, artemisinin is becoming increasingly important as an alternative treatment for this deadly disease (Liu et al., 2006; Rathod et al., 1997).
Current extraction methods for artemisinin are inefficient and result in inadequate production levels that cannot accommodate the growing global demand for inexpensive antimalarial drugs. This is especially true among countries of the developing world, where malaria infections are the most frequent.


Considering these factors, recombinant strains are beginning to show potential as microbial factories capable of producing artemisinin. In one of the first attempts, an engineered E. coli strain was developed that was capable of synthesizing the precursor amorphadiene (Martin et al., 2003). In this study, mevalonate kinase, phosphomevalonate kinase, and mevalonate pyrophosphate decarboxylase from S. cerevisiae, together with IDP isomerase and farnesyl pyrophosphate synthase from E. coli, were first assembled under the control of the lac promoter to form a synthetic operon cloned onto a coreplicable plasmid. The synthetic operon was coexpressed with a codon-modified amorphadiene synthase (Martin et al., 2003), resulting in a recombinant strain able to produce up to 24 mg/L of amorphadiene. Further analysis revealed this to be an underestimate of production levels, because losses from stripping to the air during fermentation were not considered significant at the time but are now known to be prevalent for isoprenoid products. To rectify this issue, a two-phase partitioning bioreactor (TPPB) strategy was implemented that separated the hydrophobic product, amorphadiene, from the fermentation broth. By doing so, the previously engineered strain showed product titers improved by up to 20-fold, generating approximately 500 mg/L of amorphadiene (Newman et al., 2006). More recently, efforts have focused on engineering S. cerevisiae to improve the production yields of amorphadiene by increasing precursor availability and to produce artemisinic acid, a later-stage precursor to artemisinin. In a first step, the carbon flux leading to competing biosynthetic pathways was reduced in an effort to increase FDP production within the host. In the end, several genes responsible for FDP synthesis were upregulated, and at the same time the gene responsible for FDP conversion to sterols, squalene synthase, was downregulated (Ro et al., 2006).
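The effect of the two-phase bioreactor can be checked against the earlier titer with one line of arithmetic. The sketch below simply applies the reported 20-fold factor to the 24 mg/L baseline and converts units; the variable names are illustrative, and the "up to" qualifiers in the text mean the result is an upper estimate.

```python
# Amorphadiene titer before and after product capture in a two-phase
# partitioning bioreactor (TPPB), using the figures quoted in the text.
baseline_mg_per_l = 24.0   # titer reported without capturing stripped product
fold_improvement = 20.0    # "up to 20-fold" improvement with the TPPB

tppb_mg_per_l = baseline_mg_per_l * fold_improvement
tppb_g_per_l = tppb_mg_per_l / 1000.0   # 1 g = 1000 mg

print(f"{tppb_mg_per_l:.0f} mg/L = {tppb_g_per_l:.2f} g/L")
```

A 20-fold gain over 24 mg/L therefore corresponds to a titer of roughly half a gram per litre.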
The resulting recombinant yeast produced 153 mg/L of amorphadiene after introduction of amorphadiene synthase from A. annua (Ro et al., 2006). As with flavonols, the synthesis of artemisinic acid from amorphadiene requires functional expression of membrane-bound cytochrome P450 monooxygenases. After isolation of the genes encoding the enzymes that oxidize amorphadiene in A. annua, the cloning and expression of the gene responsible for hydroxylation and oxidation (CYP71AV1), together with a cytochrome P450 oxidoreductase (CPR) as a partner protein, led to the biosynthesis of artemisinic acid in recombinant yeast. As a result of the engineered mevalonate pathway and introduction of CYP71AV1 and CPR, titers of up to 100 mg/L of artemisinic acid were obtained (Ro et al., 2006).

22.3.3.2 Taxol In the early 1970s, extensive characterization of extracts from the bark of a number of trees of the Northwestern United States, including the Pacific yew (Taxus brevifolia), was undertaken in an effort to identify new natural products with potential therapeutic properties. With advances in chemical characterization, a highly influential terpenoid, named taxol, was eventually identified. Today this important chemical is increasingly used in cancer chemotherapy (Foa et al., 1994). Yet extraction from T. brevifolia is highly inefficient (yielding only 1 mg of taxadiene from 750 kg of dry Pacific yew bark), and this inefficiency has led to the depletion of natural resources, driving the cost of taxol higher. The elucidation of the biosynthetic pathway to taxadiene, a precursor of taxol, opened the door to approaches for production through chemical synthesis, but these processes remain cumbersome and sometimes require as many as 25 steps (Kingston, 1991; Shuler, 1994). In recent years, taxadiene production in microorganisms has been explored in an effort to lower costs and provide a simple mode of extraction.
Recombinant microorganisms provide an environmentally friendly and competitive approach for the possible large-scale production of taxadiene from its diphosphate precursor, IDP, through a three-step reaction pathway. First, IDP is isomerized to form dimethylallyl diphosphate (DMADP) by IDP isomerase. In the second step, GGDP is formed from the condensation of three molecules of IDP with one molecule of DMADP by the enzyme GGDP synthase. These two steps, shown in Figure 22.3, are universal steps in the biosynthesis of various isoprenoids, such as carotenoids. Finally, taxadiene synthase catalyzes the cyclization of GGDP to form taxadiene (Lin et al., 1996). In a first attempt to achieve substantial biosynthesis of taxadiene in E. coli, IDP isomerase, GGDP synthase, and a truncated taxadiene synthase were


[Figure 22.3 pathway diagram; labels recovered from the image: carotenoid pathway, DMADP, IDP, GGDP, taxadiene, taxa-5α-diol, taxadien-5α-yl acetate, taxadiene-5α,10β-diol monoacetate, taxol; enzyme labels GGDPS, TS, THY5a, TAT, THY10b.]

Figure 22.3 The major metabolites of taxol biosynthesis discussed in the text. The enzymatic steps are catalyzed by TS, taxadiene synthase; THY5a, cytochrome P450 taxadiene 5α-hydroxylase; TAT, taxadienol 5α-O-acetyl transferase; THY10b, taxoid 10β-hydroxylase. The final taxol molecule is also shown.

overexpressed, together with the deoxyxylulose-5-phosphate (DXP) synthase from E. coli, to increase the availability of IDP (Huang et al., 2001). Truncation of taxadiene synthase was used to improve the solubility of the enzyme in the host cell. Heterologous genes were cloned separately into multiple coreplicable expression plasmids, with the expression of each gene driven by the strong T7 phage promoter. The recombinant E. coli strain achieved production levels of up to 1.3 mg taxadiene/L in batch fermentations, a significant improvement over plant extraction.

In a first attempt toward the complete synthesis of taxol, the tractable host S. cerevisiae was employed, as it provides the ability (unlike the prokaryote E. coli) to functionally express the P450 enzymes that are widely utilized in the taxol biosynthetic pathway (DeJong et al., 2006). In addition to GGDP synthase (GGDPS) and taxadiene synthase (TS), the three steps following taxadiene formation were also introduced into the recombinant yeast to produce taxadiene-5α,10β-diol monoacetate. The three additional enzymes are the cytochrome P450 taxadiene 5α-hydroxylase (THY5a), taxadienol 5α-O-acetyl transferase (TAT), and taxoid 10β-hydroxylase (THY10b). Fermentations of up to three days using the recombinant strains yielded 1.0 mg/L of the taxadiene intermediate, but only trace amounts (<25 μg/L) of the diol product. This indicated that GGDPS and TS cooperate well but also highlighted the poor functional expression of the rest of the biosynthetic pathway. To resolve this problem, coordinated overexpression of the P450 oxygenases and P450 reductases has been suggested as a way to increase total pathway activity, thereby increasing production of the diol end product (DeJong et al., 2006; Jennewein et al., 2005).
PCR differential display has identified a number of the genes actively expressed during in vivo synthesis, which will aid in the development of new recombinant strains to realize the formation of taxol in a production host.
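The extraction and fermentation figures quoted in this section can be put on a common footing with simple arithmetic. The sketch below is a back-of-the-envelope comparison only: it assumes both yields scale linearly, ignores recovery losses, and uses an arbitrary one-gram target that does not come from the text. The function names are illustrative.

```python
# Taxadiene supply: plant extraction versus E. coli batch fermentation,
# using the two yields quoted in this section.
bark_kg_per_mg = 750.0 / 1.0   # kg dry Pacific yew bark per mg of product
ecoli_mg_per_l = 1.3           # mg taxadiene per litre of batch culture


def bark_needed_kg(target_mg: float) -> float:
    """Dry bark required for a target mass, assuming linear scaling."""
    return target_mg * bark_kg_per_mg


def culture_needed_l(target_mg: float) -> float:
    """Culture volume required for a target mass, assuming linear scaling."""
    return target_mg / ecoli_mg_per_l


target_mg = 1000.0  # one gram; an arbitrary target chosen for illustration
print(f"bark:    {bark_needed_kg(target_mg):,.0f} kg")
print(f"culture: {culture_needed_l(target_mg):,.1f} L")
```

Under these assumptions, less than one litre of batch culture matches the product obtained from 750 kg of bark, which is the sense in which the fermentation titer is an improvement over plant extraction.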


22.4 Specialized Fine Chemicals from Microorganism Biosynthesis The following section details a number of efforts undertaken to synthesize some of the more industrially specialized chemicals through microbial biosynthesis. These specialty chemicals have high market value in the food industry, pharmaceuticals, cosmetics and other niche areas of consumer goods, particularly where “natural” products are an important concern among consumers. Many of the following chemicals to be discussed have been fully realized through microbial production while others are still in the development phases.

22.4.1 Polyketides Polyketides form a large class of natural products of interest for both pharmaceutical and agricultural applications. The synthesis of these complex molecules originates from simple building blocks such as acetyl-CoA, malonyl-CoA, methylmalonyl-CoA, and propionyl-CoA and is carried out by polyketide synthases (PKSs). In traditional chemical polyketide synthesis, rapid polymerization that forms unwanted products and the difficulty of site-directed synthesis can be troublesome to overcome. Microbial biosynthesis can overcome these bottlenecks to improve yields of the desired stereochemistries, yet issues such as natural microbial resistance remain to be solved. Since many microorganisms actively express unique PKSs responsible for the folding and posttranslational modifications needed, development of an efficient production platform requires simple overexpression in the host organism. To engineer an E. coli polyketide overproducer, PKS expression is accomplished primarily by enzyme mutations and plasmid transformations. In addition to expression, increasing the availability of the small precursor metabolites is a priority, particularly in high cell density conditions where basic building blocks are depleted rapidly. Compounding the challenge, the issues of PKS expression and precursor availability must be resolved independently and simultaneously so that a well-synchronized metabolic system can be developed (Pfeifer et al., 2001). Recently, the generation of microbial production strains for a number of different polyketide antibiotics has been achieved using a variety of engineering and biochemical techniques. Erythromycin is a potent antibiotic synthesized by the soil bacterium Saccharopolyspora erythraea, in which the macrolide antibiotic core is synthesized by large modular PKSs. By incorporating genes into the E.
coli chromosome for the co-expression of an array of PKSs, the synthesis of 6-deoxyerythronolide B (6dEB), the core molecule, was achieved. 6dEB is formed from one propionyl-CoA unit and subsequent elongation with six (2S)-methylmalonyl-CoA units by the enzyme deoxyerythronolide B synthase (DEBS). DEBS is a three-subunit enzyme (α2β2γ2) comprising two sets of 28 distinct active sites, seven of which are modified after translation by pantetheinylation. Because the complex chemical structure hampers the total chemical synthesis of the antibiotic, any large-scale production relies on fermentation technologies for part of the synthesis steps. Expression of DEBS genes has been achieved in recombinant Streptomyces coelicolor (Kao et al., 1994), but owing to the challenges in developing a scalable fermentation process for actinomycetes, researchers have been seeking to create a production platform in recombinant E. coli. For that reason, the three subunits of DEBS were cloned individually into the E. coli expression vector pET21c. To facilitate pantetheinylation of the recombinant DEBS and synthesis of propionyl-CoA, a phosphopantetheinyl transferase gene (sfp) and propionyl-CoA synthase (prpE) were also inserted into E. coli by integration into the prpRBCD operon within the E. coli genome (Pfeifer et al., 2001). Disruption of the prp operon, which is responsible for propionate metabolism, was intended to allow optimum conversion of exogenously supplemented propionate into propionyl-CoA by the prpE gene product. Furthermore, the two-subunit propionyl-CoA carboxylase (pcc) and the biotin ligase (birA) were also introduced into the recombinant strain on a coreplicable plasmid. Introduction of the carboxylase gene allowed the conversion of propionate into (2S)-methylmalonyl-CoA, which served


as an extender unit of the recombinant DEBS. Fermentation of the highly engineered recombinant strain in propionate-supplemented media yielded 0.1 mmol of 6dEB per gram of cellular protein per day, which is superior to wild-type S. erythraea and comparable to a modified strain used for industrial 6dEB production (Pfeifer et al., 2001). More recently, the DEBS genes have been inserted directly into the chromosome of E. coli, although only minimal production was found (Wang and Pfeifer, 2008).

An interesting study illustrated the capacity of microbial production for site-directed biosynthesis by producing a modified form of erythromycin through a combination of two widely used aspects of metabolic engineering: alteration of genetic elements and heterologous gene introduction. First, the methylmalonate-specific acyltransferase domain of the erythromycin PKS responsible for formation of the methyl side chain at C-6 was replaced with an ethylmalonate-specific acyltransferase used for niddamycin biosynthesis. This produced an erythromycin-like product, but only after further expression of a gene encoding crotonyl-CoA reductase was the recombinant strain able to produce the desired 6-ethylerythromycin product (Stassi et al., 1998).

The commercial production of tylosin, a complex polyketide antibiotic, was recently the subject of a study in which the concept of DNA shuffling was adapted to shuffle the DNA of an entire genome, that of the bacterium Streptomyces fradiae (Zhang et al., 2002). To generate a new tylosin overproducer, the wild-type S. fradiae strain was subjected to one round of random chromosomal mutagenesis through exposure of the cells to a nitrosoguanidine mutagen. Upon screening of 22,000 individual mutants, 11 strains producing more tylosin than the wild type were isolated. To generate a genome-shuffled library, protoplasts of the 11 strains were mixed in equal proportion and recursively fused.
One thousand clones were screened from the first round of genome shuffling, and seven superior strains were identified and used as the parental strains for the next shuffling cycle. Similarly, another 1,000 new colonies were screened, and seven strains with further improved tylosin production were isolated. Analysis of two of the isolated overproducer strains showed tylosin titers nine-fold higher than those of wild-type S. fradiae. It is compelling that the development of a similar overproducer strain using various mutagens took 20 years and required 1 million assays, while application of the genome shuffling method produced an overproducer strain in the course of 1 year with only 24,000 assays.

In some cases, the application of a few point mutations within the enzymatic coding sequence will result in radical changes in the enzyme’s catalytic properties, as has been the case for mutations of the plant type III PKSs. These key enzymes are responsible for the biosynthesis of structurally diverse, valuable natural products found in plants (extensively reviewed in Austin and Noel, 2003) and include such enzymes as benzalacetone synthase (BAS), which catalyzes a condensation reaction of 4-coumaroyl-CoA with one malonyl-CoA to form benzalacetone. This molecule is the major precursor of the anti-inflammatory lindleyin found in rhubarb, the chemicals gingerol and curcumin found in ginger plants, and the characteristic chemical conferring the raspberry aroma, raspberry ketone. With the availability of amino acid sequences for various plant type III PKSs, for example, chalcone synthase (CHS), the resveratrol-producing enzyme stilbene synthase (STS), 2-pyrone synthase (2-PS), and acridone synthase (ACS), bioinformatic studies have highlighted critical differences within the catalytic region of these enzymes.
After sequence alignment, it was shown that the conserved amino acid residue Phe-215, thought to be crucial to the catalytic activity of CHS, is not present in BAS. Replacing Leu-215 and its adjacent Ile-214 in BAS with Phe-215 and Leu-214, respectively, conferred CHS activity on an enzyme that naturally exhibits BAS activity. The mutated BAS acquired chalcone-forming properties: the chalcone naringenin, along with other byproducts, was generated upon incubation with the appropriate substrates (Abe et al., 2003). These bioinformatic endeavors are vital to efforts to optimize microbial production cell lines for polyketides, as well as for the array of other fine chemicals that microorganisms can produce.
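The screening effort quoted for the tylosin genome shuffling example earlier in this section can be checked with a few lines of arithmetic. The following is a minimal sketch using only the numbers given in the text; the variable names are illustrative and not taken from the cited study.

```python
# Assay counts quoted in the text for the development of a tylosin overproducer.
mutagenesis_screen = 22_000        # mutants screened after nitrosoguanidine treatment
shuffling_rounds = [1_000, 1_000]  # clones screened in each genome shuffling round

genome_shuffling_assays = mutagenesis_screen + sum(shuffling_rounds)

classical_assays = 1_000_000       # assays quoted for the classical mutagenesis route
classical_years = 20
shuffling_years = 1

assay_ratio = classical_assays / genome_shuffling_assays
time_ratio = classical_years / shuffling_years

print(f"Genome shuffling assays: {genome_shuffling_assays}")
print(f"Fold fewer assays than classical mutagenesis: {assay_ratio:.1f}")
print(f"Fold less development time: {time_ratio:.0f}")
```

The total of 24,000 assays in the text is therefore the sum of the initial mutant screen and the two shuffling rounds.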

22.4.2 Microbial Synthesis of Chain Molecules
The microbial production of polyunsaturated fatty acids (PUFAs) is beginning to reestablish itself as a cost-effective production method, especially since microbial oils, or single cell oils (SCOs), were


found to be high in PUFA content. PUFAs do not occur to any appreciable extent in agriculturally produced plant oils and until now could only be obtained from marine animal sources (Ratledge, 2004). Clinical evidence has shown PUFAs to be highly important in protecting infants from cardiovascular disease and beneficial to the development of brain and retinal functions, improving memory and eyesight (Damsgaard et al., 2006; von Schacky and Dyerberg, 2001). Their prominence in mother’s milk and absence from cow’s milk and infant formulas has only served to reinforce the importance of these molecules for human development. Today a number of different strains are being used to produce some of the more important fatty acids.

PUFAs are long chains of 16 or more carbon atoms containing two or more double bonds along the chain. Though the amounts of PUFAs needed differ among species, they are required by all for normal cellular functioning, as they are responsible for membrane fluidity and act as signaling molecules in some species. In general, most fatty acids necessary for cell survival are synthesized by cellular metabolic pathways. However, some natural fatty acids, such as linoleic acid (18:2ω6) and other ω-3, ω-6, and ω-9 fatty acids, cannot be synthesized by mammals and are essential parts of the diet for normal growth and development (Chemler et al., 2006). The formation of PUFAs requires the expression of a number of chain-lengthening elongases as well as a variety of desaturases that introduce double bonds (e.g., Δ5-, Δ6-, and Δ12-desaturases). Although all living organisms must synthesize some lipids for membranes, few microorganisms are able to accumulate lipids to more than 20% of the cell mass. Some yeasts (along with a few species of algae and fungi) naturally synthesize large amounts of lipids, making yeast an ideal host for microbial production of PUFAs (Figure 22.4).
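The C:D shorthand used above (for example 18:2 for linoleic acid) and the roles of elongases and desaturases can be illustrated with a small model. This is only a sketch: it assumes that each elongase step adds two carbons and each desaturase step adds one double bond, and it ignores double-bond positions. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FattyAcid:
    """Fatty acid in C:D shorthand (carbons : double bonds), e.g. 18:2."""
    carbons: int
    double_bonds: int

    @classmethod
    def parse(cls, shorthand: str) -> "FattyAcid":
        carbons, double_bonds = shorthand.split(":")
        return cls(int(carbons), int(double_bonds))

    def __str__(self) -> str:
        return f"{self.carbons}:{self.double_bonds}"

    def is_pufa(self) -> bool:
        # Definition used in the text: 16 or more carbons, two or more double bonds.
        return self.carbons >= 16 and self.double_bonds >= 2


def elongate(fa: FattyAcid) -> FattyAcid:
    # Assumption: one elongase step extends the chain by two carbons.
    return FattyAcid(fa.carbons + 2, fa.double_bonds)


def desaturate(fa: FattyAcid) -> FattyAcid:
    # One desaturase step introduces one additional double bond.
    return FattyAcid(fa.carbons, fa.double_bonds + 1)


STEPS = {"elongase": elongate, "Δ5": desaturate, "Δ6": desaturate}


def run_pathway(start: FattyAcid, steps: list) -> FattyAcid:
    product = start
    for step in steps:
        product = STEPS[step](product)
    return product


# ω-6 route: linoleic acid -> γ-linolenic acid -> eicosatrienoic acid -> arachidonic acid
linoleic = FattyAcid.parse("18:2")
arachidonic = run_pathway(linoleic, ["Δ6", "elongase", "Δ5"])
print(arachidonic, arachidonic.is_pufa())
```

Applying a Δ6 desaturation, one elongation, and a Δ5 desaturation to linoleic acid (18:2) gives the 20:4 shorthand of arachidonic acid.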
Engineering the synthesis of PUFAs in yeast has progressed in recent years through the functional expression of enzymes from a variety of exogenous sources. For example, accumulation of γ-linolenic acid (18:3ω6) was achieved by expressing the Δ6-desaturase from Mucor rouxii in yeast. After sequencing and cloning, the encoding gene was transformed into the strain on vector pYES2, downstream of a GAL1 promoter. Fermentations yielded γ-linolenic acid at approximately 7% of total fatty acids when the cultures were fed the precursor linoleic acid (18:2ω6) (Laoteng et al., 2000). A similar study introduced the Δ6-desaturase and Δ12-desaturase from Mortierella alpina, and coexpression in S. cerevisiae gave γ-linolenic acid (18:3ω6) accumulation as high as 8% of total fatty acid content (Huang et al., 1999).

While this is only the beginning of engineering PUFA synthesis in microorganisms, further efforts have uncovered PKSs used in Shewanella bacteria that are similar to the fatty acid synthase (FAS) used by E. coli. Eight PKS domains were found to lead to PUFA biosynthesis, thus identifying the mechanism used for long chain fatty acid synthesis in Thraustochytrids, in which the fatty acid chain remains unsaturated as it continues to be lengthened (Metz et al., 2001). This is unlike conventional eukaryotic synthesis, in which the chain is completely saturated while growing, and it may provide easier access to longer chain PUFAs through plasmid transformation of recombinant hosts.

Wax esters are long chain carbon molecules with lengths from 38 to 44 carbons, composed mainly of 20:1 fatty acids and 20:1 and 22:1 fatty alcohols. They are primarily utilized as lubricants but also in medicine, cosmetics, and the food industry. A recent study used recombinant E. coli to produce wax esters similar to those found in the jojoba plant, from which plant wax is extracted at a high cost.
Only with the coexpression of the plant acyl-CoA reductase Acr1 from an ampicillin-selective plasmid and the A. baylyi ADP1 wax ester synthase/acyl-CoA:diacylglycerol acyltransferase (WS/DGAT) from a kanamycin-selective plasmid was the synthesis of a mixture of wax esters accomplished, predominantly containing palmityl oleate (C34:1) (Kalscheuer et al., 2006). Bacterial WS/DGAT was shown in an earlier study to have a highly unspecific acyltransferase activity and to accept a broad range of alcohols as substrates, from long chain fatty alcohols to short chain alcohols such as ethanol (Kalscheuer et al., 2004).
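The chain lengths quoted for wax esters follow from adding the fatty alcohol and fatty acid moieties. The sketch below assumes that the ester shorthand is simply the sum of the carbons and double bonds of the two parts; the function name is illustrative.

```python
def wax_ester(alcohol, acid):
    """Combine (carbons, double_bonds) of a fatty alcohol and a fatty acid.

    Assumption: the ester shorthand is the sum of both parts, since
    esterification joins the two chains without losing carbon atoms.
    """
    return (alcohol[0] + acid[0], alcohol[1] + acid[1])


def shorthand(ester):
    return f"C{ester[0]}:{ester[1]}"


# Palmityl oleate: 16:0 alcohol esterified with an 18:1 acid.
palmityl_oleate = wax_ester((16, 0), (18, 1))
print(shorthand(palmityl_oleate))

# Combinations of the 20:1 acid with the 20:1 and 22:1 alcohols named in the text.
for alcohol in [(20, 1), (22, 1)]:
    ester = wax_ester(alcohol, (20, 1))
    print(shorthand(ester), 38 <= ester[0] <= 44)
```

Under this assumption, palmityl oleate comes out as C34:1, and the 20:1/22:1 combinations fall inside the 38 to 44 carbon range given for plant-type wax esters.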

[Figure 22.4: pathway diagram. Palmitic acid (16:0), the product of fatty acid synthase, is elongated to stearic acid (18:0). n-9 fatty acids: stearic acid (18:0), Δ9, oleic acid (18:1 n-9), elongase, eicosenoic acid (20:1 n-9), elongase, erucic acid (22:1 n-9), elongase, nervonic acid (24:1 n-9). ω-6 fatty acids: Δ12, linoleic acid (18:2 ω-6), Δ6, γ-linolenic acid (18:3 ω-6), elongase, eicosatrienoic acid (20:3 ω-6), Δ5, arachidonic acid (20:4 ω-6), elongase, adrenic acid (22:4 ω-6), Δ4, docosapentaenoic acid (22:5 ω-6). ω-3 fatty acids: α-linolenic acid (18:3 ω-3), Δ6, stearidonic acid (18:4 ω-3), elongase, eicosatetraenoic acid (20:4 ω-3), Δ5, eicosapentaenoic acid (20:5 ω-3), elongase, docosapentaenoic acid (22:5 ω-3), Δ4, docosahexaenoic acid (22:6 ω-3).]

Figure 22.4 The formation of the ω-3, ω-6, and ω-9 families of polyunsaturated fatty acids, indicating where elongation and desaturation occur in the generation of clinically important fatty acids such as docosahexaenoic acid and arachidonic acid.

22.4.3 Pigments, Flavor, and Fragrance
While many previously mentioned compounds can be classified as color or flavor compounds, the following are typically utilized for their properties of exhibiting color and/or imparting flavor and fragrance. The following sections cover molecules derived from precursors of one or more molecular classes that form large, complex molecules posing difficult challenges for chemical synthesis. As such, interest in microbial biosynthesis and biocatalysis has grown, as it provides an ample means of production for many of the critical flavors, colors, and scents used in food, perfumes, and cosmetics. Additionally, a number of these compounds have been shown to have medicinal benefits either in their natural state or in a modified form of the natural product.

22.4.3.1 Flavors
The development of flavors for food products, especially cheese, wine, and fermented sausages, as well as of aromas, has recently been subject to a tremendous “back-to-nature” demand. This phenomenon is ever present in today’s food markets, where consumers’ preference for “naturally” flavored products over synthetic (chemically synthesized) flavors is growing (Demyttenaere and van Ruth, 2001). Recent legislation in Europe and the United States has defined “natural” products as those synthesized by native enzymes, and, with consumer interest in natural-labeled products high, this has resulted in a push toward the development of flavors by microbial synthesis. A large amount of development has occurred in the starter cultures used to provide the enzymes for dairy products, most notably cheeses. The important organisms used in these cultures, including Lactococcus lactis, Lactobacillus, Streptococcus, Propionibacterium, and other lactic acid bacteria, impart cheeses with their characteristic flavors due to


the organisms’ ability to synthesize the required peptides and amino acids making up the volatile aroma compounds needed. Lactic acid bacteria are efficient synthesizers of the branched-chain amino acids (Leu, Ile, Val) used for malty and fruity flavors, the aromatic amino acids (Phe, Tyr, Trp) responsible for floral and chemical flavors, and the sulfur-containing amino acids (Met, Cys) that create the cabbage, meat, and garlic flavors. Much attention has also been paid to the development of high-throughput screening methods for analyzing the products of flavor-forming organisms’ fermentations, since these products are critical in the formation of various flavors (Smit et al., 2004, 2005). In addition to cheeses, bioflavors are becoming increasingly important as additives in beer and other carbonated beverages, as alternatives to chemically synthesized analogs that are coming under increased scrutiny due to possible negative health and environmental effects (Vanderhaegen et al., 2003).

22.4.3.1.1 Precursor Biosynthesis Is a Vital Step in Flavor Production
Biosynthesis of secondary products creates a large drain on microbial metabolism, particularly in production of the critical precursor metabolites needed for flavor compounds. As such, the development of strains unaffected by such metabolic demands has garnered a lot of interest. While the elucidation of the complex pathways leading to the formation of the many flavor compounds has just started, the catabolism of small-molecule precursors and of the important character-imparting amino acids has been known for some time (Ardo, 2006). The biosynthesis of all aromatic amino acids begins with the shikimate pathway, encoded by the aro gene cluster, in which glycolytic and pentose phosphate pathway precursors react to form the branching molecule chorismate via shikimate.
In the first step, erythrose 4-phosphate and phosphoenolpyruvate are converted to 3-dehydroshikimate by 2-dehydro-3-deoxyphosphoheptonate aldolase (aroFGH), 3-dehydroquinate synthase (aroB), and 3-dehydroquinate dehydratase (aroD). Shikimate (shikimic acid) is then formed using NAD or NADP by shikimate dehydrogenase, whose two isoforms are encoded by aroE and ydiB. The remainder of the gene cluster then converts shikimate to chorismate, using ATP, through shikimate kinase I or II (aroK or aroL), 5-enolpyruvylshikimate-3-phosphate synthase (aroA), and chorismate synthase (aroC). Chorismate can then branch into phenylalanine and tyrosine synthesis or into tryptophan synthesis (Figure 22.5).

For the final three steps of phenylalanine biosynthesis, chorismate in E. coli is acted on by a bifunctional enzyme encoded by pheA, in which the chorismate mutase activity resides in the N-terminal domain and the prephenate dehydratase activity in the C-terminal domain. This bifunctional enzyme generates phenylpyruvate via prephenate, the branching molecule for tyrosine synthesis. Phenylpyruvate is then converted to phenylalanine, using glutamate, by an aminotransferase (aspC). Tyrosine, on the other hand, is synthesized from prephenate through the action of NAD-dependent prephenate dehydrogenase, encoded by tyrA, to form p-hydroxyphenylpyruvate. A transaminase encoded by tyrB then makes the final conversion, with glutamate, to form L-tyrosine. L-tryptophan biosynthesis begins back at chorismate, using glutamine to generate indole through a series of reactions controlled by enzymes of the trp operon. The enzymes encoded by trpA, trpC, trpD, and trpE are sometimes found as bifunctional enzymes, such as the trpDE complex in E. coli. Indole is finally converted to L-tryptophan by tryptophan synthase (trpB).

Of particular importance is L-phenylalanine, an essential amino acid, as it is of wide interest as a feed source for a number of aromatic compounds, such as raspberry ketone (see below).
It is predominantly produced as the starting chemical for the low-calorie sweetener aspartame, manufactured by the Nutrasweet process (Bongaerts et al., 2001). The major microorganisms used to synthesize phenylalanine include strains of E. coli, C. glutamicum, and Brevibacterium flavum, B. lactofermentum, and B. linens (Boyaval et al., 1983; Ito et al., 1990; Wu et al., 2003). Engineering of C. glutamicum increased phenylalanine production through the introduction of feedback-resistant variants of 3-deoxy-D-arabino-heptulosonate 7-phosphate (DAHP) synthase, chorismate mutase, and prephenate dehydratase, three important enzymes along the pathway for aromatic amino acid synthesis. The genes were all cloned onto one coreplicable vector and transformed into the bacterium for expression. Transformants were isolated and screened for altered carbon flows, and one such strain produced up to 26 g/L of phenylalanine (Ikeda and Katsumata, 1992).
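The gene assignments described above for the aromatic amino acid pathway can be collected into a small lookup structure, which makes it easy to list the genes on the route to each amino acid. This is a minimal sketch that follows the text only; the intermediates between chorismate and indole are collapsed into a single step, and the dictionary layout and function names are illustrative assumptions.

```python
# Each metabolite maps to a list of (product, genes) steps, following the text.
PATHWAY = {
    "PEP + E4P": [("3-dehydroshikimate", ["aroFGH", "aroB", "aroD"])],
    "3-dehydroshikimate": [("shikimate", ["aroE/ydiB"])],
    "shikimate": [("chorismate", ["aroK/aroL", "aroA", "aroC"])],
    "chorismate": [
        ("prephenate", ["pheA"]),
        # Collapsed step: the text names trpE, trpD, trpC and trpA for this branch.
        ("indole", ["trpE", "trpD", "trpC", "trpA"]),
    ],
    "prephenate": [
        ("phenylpyruvate", ["pheA"]),
        ("4-hydroxyphenylpyruvate", ["tyrA"]),
    ],
    "phenylpyruvate": [("L-phenylalanine", ["aspC"])],
    "4-hydroxyphenylpyruvate": [("L-tyrosine", ["tyrB"])],
    "indole": [("L-tryptophan", ["trpB"])],
}


def route(start, target):
    """Depth-first search for the steps leading from start to target."""
    if start == target:
        return []
    for product, genes in PATHWAY.get(start, []):
        rest = route(product, target)
        if rest is not None:
            return [(start, product, genes)] + rest
    return None


def genes_on_route(start, target):
    steps = route(start, target)
    if steps is None:
        raise ValueError(f"no route from {start} to {target}")
    return [gene for _, _, genes in steps for gene in genes]


for amino_acid in ["L-phenylalanine", "L-tyrosine", "L-tryptophan"]:
    print(amino_acid, genes_on_route("chorismate", amino_acid))
```

Starting the search at "PEP + E4P" instead of "chorismate" prepends the shared shikimate pathway genes to each route.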

[Figure 22.5: pathway diagram. Phosphoenolpyruvate and D-erythrose 4-phosphate enter the shikimate pathway (aroFGH, aroB, aroD) to give 3-dehydroshikimate, which is converted to shikimate (aroE) and then to chorismate (aroL, aroA, aroC). From chorismate, pheA leads to prephenate, which is converted to phenylalanine (pheA, aspC) or to tyrosine (tyrA, tyrB). The tryptophan branch runs from chorismate through anthranilate to indole (gene labels trpD, trpGD, trpF, trpC) and on to tryptophan (trpB).]

Figure 22.5 Biosynthesis of the aromatic amino acids via the shikimate pathway. Enzymes encoded by the identified genes are described in the text. The N-terminal domain of the pheA product catalyzes the first reaction, from chorismate to prephenate.

Various groups have engineered E. coli for phenylalanine biosynthesis using an array of heterologous genes, with various levels of success (Backman et al., 1990; Konstantinov et al., 1991; Miller et al., 1987). Miller and colleagues inserted a wild-type DAHP synthase (aroF) along with a feedback-resistant chorismate mutase/prephenate dehydratase (pheA) to perform the phenylalanine synthesis (Miller et al., 1987). It was later reported that when the strain was combined with an optimized fermentation process, titers of 50 g/L were obtained in 36 hr fermentations, with production levels of 0.23 g of L-phenylalanine per gram of glucose fed (Backman et al., 1990). In a similar study, feedback-resistant forms of DAHP synthase and chorismate mutase/prephenate dehydratase were cloned onto a temperature-controlled expression vector, enabling production levels of up to 16.9 g/L, with additional process development eventually resulting in titers of 46 g/L at a productivity of 0.85 g/L/hr (Konstantinov et al., 1991; Takagi et al., 1996).

Efforts to increase tyrosine production in microorganisms have been just as extensive, using some of the same biochemical techniques and strains. Increasing the central carbon flow toward aromatics has been a common approach for all the aromatic amino acids and has been shown to lead to increased levels of tyrosine production in C. glutamicum (Bongaerts et al., 2001). Additionally, overexpression of the homologous transketolase of the pentose phosphate pathway gave C. glutamicum strains 10–50% increases in product titers (Ikeda, 2006).

The development of tryptophan overproducers has been driven by increased market interest in tryptophan itself, as well as by the possibility of a route from indole to indigo, an industrial blue dye. Alterations of the aromatic pathway, increases in precursor supply, and changes to the regulation of pathway enzymes have all been attempted to increase production (Bongaerts et al., 2001).
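The phenylalanine fermentation figures quoted above can be related to one another with simple arithmetic. The sketch below assumes that titer, yield, and productivity are batch averages; the derived quantities are not reported in the text and are only as good as that assumption.

```python
def productivity(titer_g_per_l, hours):
    """Average volumetric productivity in g/L/hr."""
    return titer_g_per_l / hours


def glucose_required(titer_g_per_l, yield_g_per_g):
    """Glucose fed per litre of broth (g/L) implied by the titer and the yield."""
    return titer_g_per_l / yield_g_per_g


def duration(titer_g_per_l, productivity_g_per_l_hr):
    """Fermentation time (hr) implied by the titer and the average productivity."""
    return titer_g_per_l / productivity_g_per_l_hr


# Backman et al. (1990) figures from the text: 50 g/L in 36 hr at 0.23 g/g glucose.
backman_productivity = productivity(50, 36)
backman_glucose = glucose_required(50, 0.23)

# Second process from the text: 46 g/L at 0.85 g/L/hr.
second_process_hours = duration(46, 0.85)

print(f"Average productivity: {backman_productivity:.2f} g/L/hr")
print(f"Glucose fed: {backman_glucose:.0f} g per litre of broth")
print(f"Implied fermentation time: {second_process_hours:.0f} hr")
```

On these assumptions the 50 g/L process averages about 1.4 g/L/hr, compared with the 0.85 g/L/hr reported for the 46 g/L process.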
A simple hypothesis for generating tryptophan overproducers is to delete pheA and tyrA, which lead to phenylalanine and tyrosine production, respectively. However, this has been shown to cause auxotrophic shock and actually lead to reduced production levels. It was later shown that simply overexpressing anthranilate synthase (trpE), the first step in tryptophan synthesis from chorismate, left only small carbon fluxes diverted away from tryptophan. This is a result of anthranilate synthase’s higher affinity for chorismate (Bongaerts et al., 2001; Ikeda, 2006).

An important aspect of precursor production is to have at hand an efficient extraction method after synthesis. One study developed a method for the continuous extraction of precursors in a bioreactor by investigating the production of ester precursors (isobutyl acetate, ethyl acetate, and propyl acetate) and the terpenoid precursors citronellol and geraniol (Bluemke and Schrader, 2001). Since aroma compounds tend to have strong inhibitory effects on microorganism growth even at low levels, an integrated bioprocess (IBP) was designed using pervaporation to separate the aroma products during


continuous fermentation. Using the IBP, product mass increased by 50% to 413%, depending on the component, in cultivations of Ceratocystis moniliformis. The IBP was able to produce increased amounts of aroma compounds in highly concentrated aroma mixtures free from the culture broth, thus making further purification easier (Bluemke and Schrader, 2001).

22.4.3.1.2 Vanillin: A Major Flavor
The use of higher fungi, particularly white-rot basidiomycetes, has been explored for the biosynthesis of attractive odors via de novo synthesis in native metabolic channels. Living on dead or live timber, these fungi are able to completely degrade lignin, a polymer of substituted alcohols, and metabolize the resulting monomers into aromatic compounds of interest (Lomascolo et al., 1999). One of the most important compounds from these organisms is vanillin (3-methoxy-4-hydroxybenzaldehyde), a compound widely used in food preparation and fragrances. While production of vanillin through chemical synthesis (~US$15/kg) is possible, vanillin is considerably more profitable when it comes from a “natural” production mechanism. Vanillin synthesis in white-rot fungi begins with the metabolism of lignin to form the precursor ferulic acid. This is then converted into vanillic acid by degradation of the propenoic acid side chain, and the vanillic acid is either further reduced to form vanillyl alcohol via vanillin by aryl aldehyde dehydrogenase or oxidized to form methoxyhydroquinone (Lomascolo et al., 1999). This process for vanillin production yielded only 64 mg/L after a seven day culture (with a 27.5% molar yield) using the fungus Pycnoporus cinnabarinus with ferulic acid as substrate, and has been patented (Lesage-Meessen et al., 1996; Sun, 2005). To improve this, a two-step, patented process was also developed that uses Aspergillus niger to transform ferulic acid to vanillic acid at an 88% molar yield, followed by P. cinnabarinus in the second step for the synthesis of vanillin (Lesage-Meessen et al., 1996; Sun, 2003).

22.4.3.1.3 Raspberry Ketone and Raspberry Alcohol
The raspberry flavor widely used in soft drinks and other food products is the result of a combination of a number of different chemical components, with raspberry ketone (4-(4-hydroxyphenyl)-butan-2-one) the most prominent effector. Apart from the obvious native plant source, raspberries, the ketone and its precursor alcohol (raspberry alcohol or betuligenol) have been identified in other berries as well as in grapes, apples, and peaches. The enzymatic pathway for synthesis of raspberry ketone involves the β-glucosidase-catalyzed hydrolysis of the native betuloside to form betuligenol, which is finally converted to the ketone by a microbial alcohol dehydrogenase. The enzymatic production process has been patented (Dumont et al., 1996; Falconnier, 1999).

Alternative raspberry ketone production platforms have been developed that use cellular suspensions, such as plant cell cultures of Rubus idaeus (raspberry). As in flavonoid synthesis, formation of raspberry ketone begins with the enzyme phenylalanine ammonia lyase converting phenylalanine to cinnamic acid, which then gains a hydroxyl group through the action of C4H to form p-coumaric acid. After the generation of p-coumaroyl-CoA by 4CL, the pathway’s first committed enzyme, BAS, condenses one molecule of malonyl-CoA with p-coumaroyl-CoA to form benzalacetone. Raspberry ketone is finally synthesized by reducing benzalacetone with NADPH (Pedapudi et al., 2000). Cell suspensions of Rubus idaeus are able to produce the precursor and end metabolite in concentrations of 10–50 μM (Pedapudi et al., 2000). While microbial production of raspberry ketone is still in its infancy, higher fungi have been used for production of the valuable metabolite.
Specifically, the basidiomycete Nidula niveo-tomentosa was used for de novo synthesis of raspberry ketone in submerged cell cultures (Böker et al., 2001). Several approaches have also been described for improving N. niveo-tomentosa’s productivity. More specifically, growing cells in soy peptone media with glucose as a substrate and yeast extract as added nutrients resulted in raspberry ketone and raspberry alcohol production of 43.5–119 mg/L in combined titers over a range of cultivation periods (Böker et al., 2001). While other recent attempts have included the use of “green” biocatalytic oxidation (Kosjek et al., 2003), further elucidation of required enzymes involved in select synthesis steps could lead to the cloning and microorganism transformation, thus creating recombinant over producing strains for industrial production.
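The plant cell culture figures above are given in μM while the fungal titers are given in mg/L, so a unit conversion helps when comparing them. The sketch below computes the molar mass of raspberry ketone from its molecular formula, C10H12O2, using standard atomic masses. Treating the whole plant-culture figure as raspberry ketone is an assumption, since the text reports precursor and end metabolite together, and the fungal range likewise combines ketone and alcohol.

```python
# Standard atomic masses (g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

# Raspberry ketone, 4-(4-hydroxyphenyl)-butan-2-one: C10H12O2.
RASPBERRY_KETONE = {"C": 10, "H": 12, "O": 2}


def molar_mass(formula):
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def micromolar_to_mg_per_l(concentration_um, molar_mass_g_per_mol):
    # μmol/L x g/mol gives μg/L; divide by 1000 for mg/L.
    return concentration_um * molar_mass_g_per_mol / 1000


mw = molar_mass(RASPBERRY_KETONE)
plant_low = micromolar_to_mg_per_l(10, mw)
plant_high = micromolar_to_mg_per_l(50, mw)

print(f"Molar mass: {mw:.2f} g/mol")
print(f"Plant cell culture: {plant_low:.1f} to {plant_high:.1f} mg/L")
print("Fungal culture (from the text): 43.5 to 119 mg/L")
```

On this rough basis the 10–50 μM plant cell culture range corresponds to single-digit mg/L values, below the combined fungal titers quoted above.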


22.4.3.2 Pigments
With many synthetic dyes being linked to adverse health and environmental effects, natural pigments have come under increased demand for use in a broad range of applications, such as food products, cosmetics, and animal feed, among others. Many of these compounds also have alternative uses as pharmaceuticals and nutraceuticals or as starting materials for their development. The high-value pigment astaxanthin, which produces pink–orange hues, is one such case: it is widely used as a pigmentation source for salmon and trout but, as a carotenoid, has been found to hold nutritional benefits as well (Guerin et al., 2003). In a study using the microalga Haematococcus pluvialis, which naturally produces the pigment, chemostat cultures were performed at constant light irradiance and dilution rates with varying nitrate concentrations in the feed medium. Productivity levels as high as 5.6 mg/L/day were seen in optimal conditions, where cells were growing and actively dividing in response to the limitation of nitrate availability (Del Rio et al., 2005).

Lee and Kim isolated and characterized the gene cluster responsible for astaxanthin biosynthesis in the marine bacterium Paracoccus haeundaensis, consisting of crtW, crtZ, crtY, crtI, crtB, and crtE (see Figure 22.2). The cluster was PCR amplified, cloned into the expression vector pCR-XL-TOPO, and transformed into E. coli BL21(DE3) Codon Plus cells. The inserted gene cluster was used to synthesize β-carotene using crtE, crtB, crtI, and crtY; β-carotene is then acted on by β-carotene ketolase (crtW) and β-carotene hydroxylase (crtZ) to form astaxanthin. Recombinant E. coli expressing the astaxanthin biosynthesis genes were found to produce 400 mg/g of dry cell weight, representing about 70% of the total carotenoids produced in the strain (Lee and Kim, 2006).
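The division of labor within the crt gene cluster can be written down as an ordered list of steps. In the sketch below, the gene order and the enzyme names for crtW and crtZ come from the text, whereas the intermediates and the enzyme names for crtE, crtB, crtI, and crtY are the standard textbook assignments for bacterial carotenoid clusters rather than details given in this chapter. The data layout and function name are illustrative.

```python
# (gene, enzyme, substrate, product); the last two steps are collapsed to their end product.
ASTAXANTHIN_PATHWAY = [
    ("crtE", "GGPP synthase", "farnesyl diphosphate", "geranylgeranyl diphosphate"),
    ("crtB", "phytoene synthase", "geranylgeranyl diphosphate", "phytoene"),
    ("crtI", "phytoene desaturase", "phytoene", "lycopene"),
    ("crtY", "lycopene cyclase", "lycopene", "β-carotene"),
    ("crtW", "β-carotene ketolase", "β-carotene", "astaxanthin"),
    ("crtZ", "β-carotene hydroxylase", "β-carotene", "astaxanthin"),
]


def genes_up_to(product):
    """Genes needed, in order, to reach the first occurrence of a given product."""
    genes = []
    for gene, _, _, step_product in ASTAXANTHIN_PATHWAY:
        genes.append(gene)
        if step_product == product:
            return genes
    raise ValueError(f"{product} is not produced by this pathway")


def genes_acting_on(substrate):
    """Genes whose enzymes take the given substrate."""
    return [gene for gene, _, step_substrate, _ in ASTAXANTHIN_PATHWAY
            if step_substrate == substrate]


print("β-carotene requires:", genes_up_to("β-carotene"))
print("Acting on β-carotene:", genes_acting_on("β-carotene"))
```

A structure like this also makes it easy to see which genes a host already producing β-carotene would still need.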
Naphthoquinones, generally obtained from plants through extraction, are colored substances of red hues derived from phenylpropanoid and isoprenoid precursors. The chemicals have been used in diverse cultures as colorants for cosmetics, fabrics, and foods (Ballantine, 1969), and for medicinal applications, including as antitumor, anti-inflammatory, and antimicrobial agents (Papageorgiou, 1978). Production of naphthoquinones by the pathogenic fungus Cordyceps unilateralis BCC 1869 was investigated in shake flask cultures in which the cultivation conditions, including temperature, initial pH of the medium, and aeration, were optimized to improve the yield of total naphthoquinones. The highest yield of naphthoquinones (3 g/L) was obtained from a 28 day culture grown in potato dextrose broth with an initial pH of 7.0, at 28°C, with shaking-induced aeration at 200 rpm. An extraction process for isolating the targeted naphthoquinone, 3,5,8-trihydroxy-6-methoxy-2-(5-oxohexa-1,3-dienyl)-1,4-naphthoquinone (3,5,8-TMON), from a culture of C. unilateralis was also developed, resulting in a yield of 1.2 g/L of 3,5,8-TMON, or 40% of total naphthoquinones. The stability of 3,5,8-TMON was very high, even upon exposure to strong sunlight (70,000 lx), temperatures up to 200°C, and acid and alkali solutions at concentrations of 0.1 M (Unagul et al., 2005).

In an example of reverse engineering, pigment genes were found in the gram-negative bacterium Ralstonia eutropha, which was not known to synthesize pigments. During a study of the 2-methylcitric acid cycle of R. eutropha in which its genome was cloned into E. coli, an open reading frame was discovered encoding a biosynthetic enzyme for indigoids. Blue pigments of plant and bacterial origins, indigoids have been used as dyes and pharmaceuticals. A genomic library of R. eutropha was created and inserted into pHC79, a vector capable of carrying large DNA fragments. Upon transfection of E. coli, blue transformants were identified after spreading on selective agar plates. Because E. coli does not normally produce blue pigments, the pigment synthesis in the recombinant cells must derive from the action of foreign enzymes from R. eutropha. Further isolation and subcloning experiments identified an open reading frame of 1251 base pairs responsible for the blue color formation, and further characterization showed the sequence to encode a dehydrogenase with sequence similarity to known proteins (Drewlo et al., 2001).
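Two of the figures quoted in this section can be checked with short calculations: the share of 3,5,8-TMON in the total naphthoquinone yield, and the number of codons in the 1251 base pair open reading frame. The sketch below assumes that the 1251 bp figure includes the stop codon, which the text does not state.

```python
# Naphthoquinone yields quoted in the text (g/L).
total_naphthoquinones = 3.0
tmon_yield = 1.2
tmon_fraction = tmon_yield / total_naphthoquinones

# Open reading frame identified in R. eutropha.
orf_length_bp = 1251
codons, remainder = divmod(orf_length_bp, 3)

# Assumption: the quoted length includes the stop codon, which encodes no residue.
encoded_residues = codons - 1

print(f"3,5,8-TMON share of total naphthoquinones: {tmon_fraction:.0%}")
print(f"Codons: {codons} (remainder {remainder})")
print(f"Encoded residues if the stop codon is included: {encoded_residues}")
```

The ORF length is a multiple of three, as expected for a complete reading frame.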

22.5 Strategies and Trends
The previously presented examples of metabolic engineering of microorganisms clearly demonstrate that biocatalysis using unicellular organisms offers significant advantages for the production of fine


chemicals. Many of the metabolic engineering, biochemical, and bioinformatics techniques used for the generation of recombinant strains have been utilized with success; however, the quest for improved yields and the need for exploration of even larger metabolic and phenotypic spaces require the development of new experimental and computational tools that will eventually permit the construction of a phenotypic optimum. In this final section, we conclude with a look forward to a few of the novel strategies that have recently been developed for microbial biosynthesis.

22.5.1 New Approaches on Old Methods
Researchers have been able to insert fragments of DNA for some time through a number of different methods. The engineering of cellular function is commonly achieved by adopting foreign pathways or overexpressing native enzymes, which requires a priori knowledge of the genetic framework in both the recombinant and native strains. Although this genetic insertion strategy has proven useful, it is limited by the availability of genome sequences and gene function information. Moreover, isolating, selecting, and culturing the enormous array of unknown organisms for the discovery of novel biosynthetic pathways is cumbersome and requires more knowledge of the physiology of the organisms. Therefore, relying on the availability of characterized pathways is a bottleneck to progress in novel enzyme or product discovery and recombinant synthesis. Where the relevant genes are not known or data is unavailable, it has been possible to insert whole or random fragments of genomes, followed by the creative selection of desirable traits, with reverse engineering used to characterize the unknown pathways to an end product. This method was used to express, in E. coli, pigments not previously associated with the bacterium R. eutropha (as discussed above).

It has been well established that both eukaryotic and prokaryotic cells use small, non-coding RNA sequences that can bind to RNA transcripts and prevent their subsequent translation. Such RNA molecules act as regulators in various signal transduction mechanisms (Gottesman, 2005) and in plasmid replication and copy number control (Lacatena and Cesareni, 1981; Lacatena and Cesareni, 1983; Tomizawa and Itoh, 1981). Since the production of fine chemicals in microorganisms generally relies on cellular phenotypes under non-native cellular conditions, RNA silencing or interference (RNAi) using artificial antisense RNA could be a powerful tool for introducing different genetic perturbations.
Such an approach has been followed in the commercial production of two solvents, acetone and butanol, using the gram-positive bacterium Clostridium acetobutylicum. To improve solvent production, antisense RNA was employed to reduce the expression of enzymes in a competing pathway leading to butyrate formation. Two enzymes mediate butyrate synthesis, phosphotransbutyrylase (PTB) and butyrate kinase (BK): PTB converts butyryl-CoA into butyryl phosphate, and BK catalyzes the subsequent reaction to produce butyrate. To reduce the expression of BK, a synthetic oligonucleotide fragment was developed containing only ten codons of the original BK gene and its native putative ribosome binding site. To start and end transcription of the antisense BK, the PTB promoter sequence and a rho-dependent termination sequence were also included in the construct (Desai and Papoutsakis, 1999). Repeating this approach, an antisense construct for PTB was built from a 567-bp PTB fragment, its putative ribosome binding site, and the adc terminator. The C. acetobutylicum strain expressing antisense butyrate kinase exhibited up to 90% lower BK synthesis and reached 50% and 35% higher final concentrations of acetone and butanol, respectively. The strain expressing the PTB antisense synthesized 70% less PTB; however, its acetone and butanol concentrations were 96% and 75% lower, respectively, than those of the native strain (Desai and Papoutsakis, 1999).
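The principle behind the antisense constructs described above is that the antisense transcript is the reverse complement of the target mRNA region, so that the two strands can base-pair. The following is a minimal sketch of that relationship. The example sequence is hypothetical (a short stretch resembling a ribosome binding site followed by a start codon) and is not the C. acetobutylicum BK or PTB sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def reverse_complement(dna):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def transcribe(coding_dna):
    """mRNA with the same sequence as the coding strand (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def antisense_rna(coding_dna):
    """Antisense RNA able to base-pair with the mRNA of the given coding DNA."""
    return reverse_complement(coding_dna).replace("T", "U")


def fully_paired(mrna, antisense):
    """True if the antisense RNA pairs with the mRNA along its whole length."""
    if len(mrna) != len(antisense):
        return False
    # Strands pair antiparallel, so compare the mRNA with the reversed antisense.
    return all((m, a) in RNA_PAIRS for m, a in zip(mrna, reversed(antisense)))


# Hypothetical target region: ribosome-binding-site-like stretch plus first codons.
target_dna = "AGGAGGTTTAAAATGTATAGATTA"
mrna = transcribe(target_dna)
antisense = antisense_rna(target_dna)

print("mRNA:     ", mrna)
print("antisense:", antisense)
print("fully paired:", fully_paired(mrna, antisense))
```

In a real construct, the fragment to be inverted, the promoter, and the terminator would all be design choices, as in the BK and PTB examples above.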

22.5.2 Engineering the Genetic Machinery
In an effort to engineer optimal strains as high-value chemical production platforms, directed cell evolution, gene shuffling, in vitro recombination, and custom designed proteins are some of the few tools


available to adjust the metabolic mechanisms of microbial biosynthesis. The strategies of directed evolution are inspired by the routes of evolution in nature, in which selective pressures lead to the accumulation of beneficial genetic mutations that confer metabolic fitness and thus the survival of the fittest cellular phenotypes (Koffas and Cardayre, 2005). The availability of structural studies is normally not a prerequisite for successful laboratory enzyme evolution, because directed evolution employs stochastic methods to generate mutant libraries. Recently, however, deterministic methods that incorporate structural information have been described.

DNA shuffling is based on an iterative process of random mutation generation and selection for improvement of phenotypes toward a desired goal (Stemmer, 1994). In these methods, random point mutations are generated in the parental DNA sequences in order to introduce sequence diversity. The mutations are then redistributed by reassembling randomly selected fragments of the parental DNA pool, after which the assembled genetic library can be inserted into a tractable host, such as E. coli. Through screening with high-throughput techniques suited to the desired phenotype, clones with improved phenotypes are identified and used to isolate the evolved genes responsible for improved fitness, which then serve as the parental sequences for the next generation of evolved strains. Some strategies utilize a pool of homologous parental sequences derived from different species as a template for the recombination effort, in order to increase sequence diversity and thus the protein sequence space explored (Crameri et al., 1998). Gene shuffling requires high sequence homology within the parental gene pool for successful reassembly of the fragmented DNA.
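The iterative cycle described above (mutate, recombine, screen, reuse the best clones as parents) can be mimicked with a toy simulation. This is only a sketch under loud assumptions: sequences are short strings of equal length, recombination is modeled as random crossovers between aligned parents, and the "screen" is a hypothetical score counting matches to an arbitrary target sequence. None of the names or parameters come from the cited studies.

```python
import random


def point_mutate(seq, rate, rng, alphabet="ACGT"):
    """Replace each base with a random base at the given per-base rate."""
    return "".join(rng.choice(alphabet) if rng.random() < rate else base
                   for base in seq)


def shuffle_once(parents, n_crossovers, rng):
    """Build one chimera by stitching aligned segments from random parents."""
    length = len(parents[0])
    points = sorted(rng.sample(range(1, length), n_crossovers))
    child, start = [], 0
    for end in points + [length]:
        child.append(rng.choice(parents)[start:end])
        start = end
    return "".join(child)


def evolve(parent, score, rounds, library_size, keep, rng):
    """Iterate mutation, shuffling and screening, keeping the best clones."""
    parents = [parent]
    for _ in range(rounds):
        mutants = [point_mutate(p, 0.05, rng)
                   for p in parents
                   for _ in range(library_size // len(parents))]
        library = [shuffle_once(mutants, 3, rng) for _ in range(library_size)]
        parents = sorted(library, key=score, reverse=True)[:keep]
    return parents


# Hypothetical screen: number of positions matching an arbitrary target.
TARGET = "ACGTACGTACGTACGTACGT"


def score(seq):
    return sum(a == b for a, b in zip(seq, TARGET))


rng = random.Random(0)
best = evolve("A" * len(TARGET), score, rounds=5, library_size=50, keep=5, rng=rng)
print(best[0], score(best[0]))
```

Real shuffling recombines fragments where they anneal, which is why the parental sequences need high homology; the aligned-crossover model here sidesteps that requirement for simplicity.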
Exploration of novel crossover points in regions of low identity, which could accelerate in vitro enzyme evolution, is limited because crossovers typically occur in DNA regions of high sequence identity, and this restricts the method's impact (Bogarad and Deem, 1999). In response to this limitation, a method termed incremental truncation for the creation of hybrid enzymes (ITCHY) (Ostermeier et al., 1999b) was developed that allows such exploration by generating all possible fusions between two non-homologous genes (Ostermeier et al., 1999a). As a case study, ITCHY was used to create a glycinamide ribonucleotide (GAR) formyltransferase from E. coli and human DNA sequences, which share only 50% DNA sequence identity but contain similar secondary structures and proper active sites. Unidirectional enzymatic truncations were performed on the 5′-terminal portion of the E. coli sequence and the 3′-terminal portion of the human sequence to generate the ITCHY library. A full-length DNA library of E. coli and human fragments was assembled by joining the truncated ends of two randomly selected sequences, and the fusions were cloned into co-replicable plasmids and transformed into an E. coli strain deficient in GAR transformylase activity. Growing the transformed strain in the absence of purines identified the functional hybrid enzymes. Sequencing of these hybrids showed that the ITCHY library exhibited a wider distribution of crossover points and thus scanned a larger protein sequence space than a library generated by standard DNA shuffling (Ostermeier et al., 1999a). Novel enzymatic activities have also been discovered by blindly recombining homologous and nonhomologous sequences to produce chimeric proteins; however, the probability of isolating complete mutants with highly improved functions is low, and it is limited further by the need for good screening systems.
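The idea of "all possible fusions" between a 5′ fragment of one gene and a 3′ fragment of another can be made concrete with a short enumeration. The sketch below is an in silico illustration only, not the experimental protocol: the function name, the toy sequences, and the optional reading-frame filter are assumptions introduced here.

```python
def itchy_library(gene_a, gene_b, in_frame_only=False):
    """Enumerate fusions of a 5' fragment of gene_a with a 3' fragment of gene_b.

    Each entry is (i, j, fusion), where gene_a[:i] is kept from the first gene
    and gene_b[j:] is kept from the second. No sequence homology is required.
    """
    library = []
    for i in range(1, len(gene_a) + 1):
        for j in range(len(gene_b)):
            # The reading frame of gene_b is preserved only when i and j
            # are congruent modulo 3.
            if in_frame_only and (i - j) % 3 != 0:
                continue
            library.append((i, j, gene_a[:i] + gene_b[j:]))
    return library

# Toy 9-nucleotide "genes" (hypothetical sequences, for illustration only).
all_fusions = itchy_library("ATGAAACCC", "GGGTTTTAA")
in_frame = itchy_library("ATGAAACCC", "GGGTTTTAA", in_frame_only=True)
```

Because the crossover may fall anywhere in either gene, the fusions in this enumeration vary in length, and only a fraction of them keep the second gene in frame, which is one reason a functional selection such as growth without purines is needed to find active hybrids.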
In a more effective approach to protein evolution, native proteins are evolved based on their structure by swapping domains of structural similarity (Ostermeier and Benkovic, 2000; Ranganathan et al., 1999; Riechmann and Winter, 2000). Correctly identifying interchangeable modules (freely swappable sequences) and the locations of safe crossover points are the two essential objectives for successful structure-based recombination of DNA. To this end, a measure of the interactions between amino acid residues, and of the level of disruption that results from replacing a subset of amino acids, was implemented in a computational algorithm called SCHEMA (Voigt et al., 2002). Amino acid positions corresponding to a minimum level of disruption are used to identify potential crossover points for swapping protein modules. The information obtained from SCHEMA was used to construct structurally similar hybrid proteins derived from the β-lactamases TEM-1 and PSE-4, two proteins that share only 40% amino acid sequence identity. Domains of the two proteins were interchanged and the resulting hybrid sequences were inserted into E. coli. To select for functional, active hybrid β-lactamases, cultures were grown under different antibiotic concentrations, which yielded several hybrid proteins with distinct modular combinations that conferred increased antibiotic resistance on the host (Voigt et al., 2002).

While designer proteins have yet to be applied to fine chemical production, a recent construct allowed E. coli to secrete a black compound in response to a light stimulus, thus functioning as a bacterial photograph. Phytochromes found in plants and some bacteria are two-component systems that consist of photoreceptor and response-regulator domains. The photoreceptor domain of the phytochrome Cph1 from Synechocystis was fused with the EnvZ histidine kinase domain from E. coli. The bacterial photograph was created by introducing the chimera into E. coli carrying a chromosomal insertion of the lacZ reporter gene under the control of the OmpR-dependent ompC promoter, together with two phycocyanobilin biosynthesis genes from Synechocystis, ho1 and pcyA. In the absence of light, the phosphorylated histidine kinase acted as an activator of lacZ transcription. In the presence of light, the phycocyanobilin response blocked phosphorylation of the histidine kinase, so lacZ expression was switched off, producing a contrasting replica of the image on a lawn of E. coli (Levskaya et al., 2005) and marking a first step in bacterial imaging.
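Returning to the SCHEMA disruption measure described earlier in this section, the following sketch shows one way such a score could be computed. It is a simplified illustration, not the published implementation of Voigt et al. (2002): each residue is represented by a single coordinate, the 4.5 Å contact cutoff is an assumed parameter, and the function names and the single-crossover scan are hypothetical. A contact is counted as disrupted when the pair of amino acids found in the hybrid does not occur at those two positions in any parent.

```python
import math
from itertools import combinations

def contact_pairs(coords, cutoff=4.5):
    """Residue index pairs whose representative coordinates lie within the cutoff."""
    return [(i, j) for i, j in combinations(range(len(coords)), 2)
            if math.dist(coords[i], coords[j]) <= cutoff]

def hybrid_sequence(parents, assignment):
    """Build a hybrid in which position k is taken from parent assignment[k]."""
    return "".join(parents[p][k] for k, p in enumerate(assignment))

def disruption(parents, assignment, contacts):
    """Count contacts whose amino acid pair is not present in any parent."""
    hybrid = hybrid_sequence(parents, assignment)
    broken = 0
    for i, j in contacts:
        pair = (hybrid[i], hybrid[j])
        if not any((p[i], p[j]) == pair for p in parents):
            broken += 1
    return broken

def best_single_crossover(parents, contacts):
    """Scan every single crossover between parent 0 and parent 1; return the least disruptive."""
    n = len(parents[0])
    scores = {x: disruption(parents, [0] * x + [1] * (n - x), contacts)
              for x in range(1, n)}
    return min(scores, key=scores.get), scores
```

With aligned parent sequences and a contact list derived from one parent structure, the crossover positions with the lowest scores are the candidate module boundaries; the real algorithm works with full atomic coordinates and multiple crossovers.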

References

Abe, I., Sano, Y., Takahashi, Y., and Noguchi, H. 2003. Site-directed mutagenesis of benzalacetone synthase. The role of the Phe215 in plant type III polyketide synthases. J. Biol. Chem., 278, 25218–25226. Alexeyev, M. F. and Shokolenko, I. N. 1995. Mini-Tn10 transposon derivatives for insertion mutagenesis and gene delivery into the chromosome of gram-negative bacteria. Gene, 160, 59–62. Allister, E. M., Borradaile, N. M., Edwards, J. Y., and Huff, M. W. 2005. Inhibition of microsomal triglyceride transfer protein expression and apolipoprotein B100 secretion by the citrus flavonoid naringenin and by insulin involves activation of the mitogen-activated protein kinase pathway in hepatocytes. Diabetes, 54, 1676–1683. Alper, H., Fischer, C., Nevoigt, E., and Stephanopoulos, G. 2005a. Tuning genetic control through promoter engineering. Proc. Natl. Acad. Sci. USA, 102, 12678–12683. Alper, H., Jin, Y. S., Moxley, J. F., and Stephanopoulos, G. 2005b. Identifying gene targets for the metabolic engineering of lycopene biosynthesis in Escherichia coli. Metab. Eng., 7, 155–164. Alper, H., Miyaoku, K., and Stephanopoulos, G. 2005c. Construction of lycopene-overproducing E. coli strains by combining systematic and combinatorial gene knockout targets. Nat. Biotechnol., 23, 612–616. Ardo, Y. 2006. Flavour formation by amino acid catabolism. Biotechnol. Adv., 24, 238–242. Austin, M. B. and Noel, J. P. 2003. The chalcone synthase superfamily of type III polyketide synthases. Nat. Prod. Rep., 20, 79–110. Backman, K., O’Connor, M. J., Maruya, A., and other authors 1990. Genetic engineering of metabolic pathways applied to the production of phenylalanine. Ann. N. Y. Acad. Sci., 589, 16–24. Bailey, J. E. 1991. Toward a science of metabolic engineering. Science, 252, 1668–1675. Ballantine, J. A. 1969. The isolation of two esters of the naphthaquinone alcohol, shikonin, from the shrub Jatropha glandulifera. Phytochemistry, 8, 1587–1590. Barkovich, R. and Liao, J. C. 2001. Metabolic engineering of isoprenoids. Metab. Eng., 3, 27–39. Bluemke, W. and Schrader, J. 2001. Integrated bioprocess for enhanced production of natural flavors and fragrances by Ceratocystis moniliformis. Biomol. Eng., 17, 137–142. Bogarad, L. D. and Deem, M. W. 1999. A hierarchical approach to protein molecular evolution. Proc. Natl. Acad. Sci. USA, 96, 2591–2595. Böker, A., Fischer, M., and Berger, R. G. 2001. Raspberry Ketone from Submerged Cultured Cells of the Basidiomycete Nidula niveo-tomentosa. Biotechnol. Prog., 17, 568–572.

Bongaerts, J., Kramer, M., Muller, U., Raeven, L., and Wubbolts, M. 2001. Metabolic engineering for microbial production of aromatic amino acids and derived compounds. Metab. Eng., 3, 289–300. Boyaval, P., Moreira, E., and Desmazeaud, M. J. 1983. Transport of aromatic amino acids by Brevibacterium linens. J. Bacteriol., 155, 1123–1129. Caltagirone, S., Rossi, C., Poggi, A., Ranelletti, F. O., Natali, P. G., Brunetti, M., Aiello, F. B., and Piantelli, M. 2000. Flavonoids apigenin and quercetin inhibit melanoma growth and metastatic potential. Int. J. Cancer, 87, 595–600. Causey, T. B., Shanmugam, K. T., Yomano, L. P., and Ingram, L. O. 2004. Engineering Escherichia coli for efficient conversion of glucose to pyruvate. Proc. Natl. Acad. Sci. USA, 101, 2235–2240. Chemler, J. A., Yan, Y., and Koffas, M. A. 2006. Biosynthesis of isoprenoids, polyunsaturated fatty acids and flavonoids in Saccharomyces cerevisiae. Microb. Cell Fact, 5, 20. Choi, J. H., Ryu, Y. W., and Seo, J. H. 2005. Biotechnological production and applications of coenzyme Q10. Appl. Microbiol. Biotechnol., 68, 9–15. Cornwell, T., Cohick, W., and Raskin, I. 2004. Dietary phytoestrogens and health. Phytochemistry, 65, 995–1016. Crameri, A., Raillard, S. A., Bermudez, E., and Stemmer, W. P. 1998. DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature, 391, 288–291. Damsgaard, C. T., Schack-Nielsen, L., Michaelsen, K. F., Fruekilde, M. B., Hels, O., and Lauritzen, L. 2006. Fish oil affects blood pressure and the plasma lipid profile in healthy Danish infants. J. Nutr., 136, 94–99. DeJong, J. M., Liu, Y. L., Bollon, A. P., Long, R. M., Jennewein, S., Williams, D., and Croteau, R. B. 2006. Genetic engineering of Taxol biosynthetic genes in Saccharomyces cerevisiae. Biotechnol. Bioeng., 93, 212–224. Del Rio, E., Acien, F. G., Garcia-Malea, M. C., Rivas, J., Molina-Grima, E., and Guerrero, M. G. 2005. 
Efficient one-step production of astaxanthin by the microalga Haematococcus pluvialis in continuous culture. Biotechnol. Bioeng., 91, 808–815. Demyttenaere, J. and van Ruth, S. 2001. Natural flavours. Overviews and applications of analytical methods and microbial production. Biomol. Eng., 17, 119. Desai, R. P. and Papoutsakis, E. T. 1999. Antisense RNA strategies for metabolic engineering of Clostridium acetobutylicum. Appl. Environ. Microbiol., 65, 936–945. Dixon, R. A. 2004. Phytoestrogens. Annu. Rev. Plant Biol., 55, 225–261. Drewlo, S., Bramer, C. O., Madkour, M., Mayer, F., and Steinbuchel, A. 2001. Cloning and expression of a Ralstonia eutropha HF39 gene mediating indigo formation in Escherichia coli. Appl. Environ. Microbiol., 67, 1964–1969. Dumont, B., Hugueny, P., and Belin, J.-M., EP A1707072, 1996, Preparation of raspberry ketone by bioconversion. Falconnier, B., WO 9949069, 1999. Raspberry ketone bioconversion. 1999-03-19. Foa, R., Norton, L., and Seidman, A. D. 1994. Taxol (paclitaxel): a novel anti-microtubule agent with remarkable anti-neoplastic activity. Int. J. Clin. Lab. Res., 24, 6–14. Forkmann, G. and Martens, S. 2001. Metabolic engineering and applications of flavonoids. Curr. Opin. Biotechnol., 12, 155–160. Frost & Sullivan. 2003. The European Polyphenols Markets. Online www.frost.com. Goeddel, D. V., Kleid, D. G., Bolivar, F., and other authors 1979. Expression in Escherichia coli of chemically synthesized genes for human insulin. Proc. Natl. Acad. Sci. USA, 76, 106–110. Gottesman, S. 2005. Micros for microbes: non-coding regulatory RNAs in bacteria. Trends Genet., 21, 399–404. Greenwald, P. 2004. Clinical trials in cancer prevention: current results and perspectives for the future. J. Nutr., 134, 3507S–3512S. Guerin, M., Huntley, M. E., and Olaizola, M. 2003. Haematococcus astaxanthin: applications for human health and nutrition. Trends Biotechnol., 21, 210–216.

Hall, R. D. and Yeoman, M. M. 1986. Temporal and spatial heterogeneity in the accumulation of anthocyanins in cell-cultures of Catharanthus roseus (L) Don, G. J. Exp. Bot., 37, 48–60. Hanagata, N., Ito, A., Uehara, H., Asari, F., Takeuchi, T., and Karube, I. 1993. Behavior of cell aggregate of Carthamus tinctorius L cultured cells and correlation with red pigment formation. J. Biotechnol., 30, 259–269. Hannum, S. M. 2004. Potential impact of strawberries on human health: a review of the science. Crit. Rev. Food Sci. Nutr., 44, 1–17. Haynes, R. K. and Krishna, S. 2004. Artemisinins: activities and actions. Microbes Infect., 6, 1339–1346. Hellwig, S., Drossard, J., Twyman, R. M., and Fischer, R. 2004. Plant cell cultures for the production of recombinant proteins. Nat. Biotechnol., 22, 1415–1422. Hezari, M., Lewis, N. G., and Croteau, R. 1995. Purification and characterization of taxa-4(5),11(12)diene synthase from Pacific Yew (Taxus brevifolia) that catalyzes the first committed step of taxol biosynthesis. Arch. Biochem. Biophy., 322, 437–444. Hopwood, D. A., Malpartida, F., Kieser, H. M., Ikeda, H., Duncan, J., Fujii, I., Rudd, B. A., Floss, H. G., and Omura, S. 1985. Production of “hybrid” antibiotics by genetic engineering. Nature 314, 642–644. Hou, D. X., Fujii, M., Terahara, N., and Yoshimoto, M. 2004. Molecular mechanisms behind the chemopreventive effects of anthocyanidins. J. Biomed. Biotechnol., 5, 321–325. Huang, Q., Roessner, C. A., Croteau, R., and Scott, A. I. 2001. Engineering Escherichia coli for the synthesis of taxadiene, a key intermediate in the biosynthesis of taxol. Bioorg. Med. Chem., 9, 2237–2242. Huang, Y. S., Chaudhary, S., Thurmond, J. M., Bobik, E. G., Jr., Yuan, L., Chan, G. M., Kirchner, S. J., Mukerji, P., and Knutzon, D. S. 1999. Cloning of delta12- and delta6-desaturases from Mortierella alpina and recombinant production of gamma-linolenic acid in Saccharomyces cerevisiae. Lipids, 34, 649–659. Hwang, E. 
I., Kaneko, M., Ohnishi, Y., and Horinouchi, S. 2003. Production of plant-specific flavanones by Escherichia coli containing an artificial gene cluster. Appl. Environ. Microbiol., 69, 2699–2706. Ikeda, M. and Katsumata, R. 1992. Metabolic engineering to produce tyrosine or phenylalanine in a tryptophan producing Corynebacterium glutamicum strain. Appl. Environ. Microbiol., 58, 781–785. Ikeda, M. 2006. Towards bacterial strains overproducing L-tryptophan and other aromatics by metabolic engineering. Appl. Microbiol. Biotechnol., 69, 615–626. Ito, H., Sakurai, S., Tanaka, T., Sato, K., and Enei, H. 1990. Genetic breeding of L-tyrosine producer from Brevibacterium lactofermentum. Agric. Biol. Chem., 54, 699–705. Jennewein, S., Park, H., DeJong, J. M., Long, R. M., Bollon, A. P., and Croteau, R. B. 2005. Coexpression in yeast of Taxus cytochrome p450 reductase with cytochrorne P450 oxygenases involved in taxol biosynthesis. Biotechnol. Bioeng., 89, 588–598. Jonassen, T., Proft, M., Randez-Gil, F., Schultz, J. R., Marbois, B. N., Entian, K. D., and Clarke, C. F. 1998. Yeast Clk-1 homologue (Coq7/Cat5) is a mitochondrial protein in coenzyme Q synthesis. J. Biol. Chem., 273, 3351–3357. Kalscheuer, R., Luftmann, H., and Steinbuchel, A. 2004. Synthesis of novel lipids in Saccharomyces cerevisiae by heterologous expression of an unspecific bacterial acyltransferase. Appl. Environ. Microbiol., 70, 7119-7125. Kalscheuer, R., Stoveken, T., Luftmann, H., Malkus, U., Reichelt, R., and Steinbuchel, A. 2006. Neutral lipid biosynthesis in engineered Escherichia coli: jojoba oil-like wax esters and fatty acid butyl esters. Appl. Environ. Microbiol., 72, 1373–1379. Kaneko, M., Hwang, E. I., Ohnishi, Y., and Horinouchi, S. 2003. Heterologous production of flavanones in Escherichia coli: potential for combinatorial biosynthesis of flavonoids in bacteria. J. Ind. Microbiol. Biotechnol., 30, 456–461. Kao, C. M., Katz, L., and Khosla, C. 1994. 
Engineered biosynthesis of a complete macrolactone in a heterologous host. Science, 265, 509–512.

Kim, S. H., Heo, K., Chang, Y. J., Park, S. H., Rhee, S. K., and Kim, S. U. 2006. Cyclization mechanism of amorpha-4,11-diene synthase, a key enzyme in artemisinin biosynthesis. J. Nat. Prod., 69, 758–762. Kingston, D. G. 1991. The chemistry of taxol. Pharmacol. Ther., 52, 1–34. Kobayashi, Y., Akita, M., Sakamoto, K., Liu, H. F., Shigeoka, T., Koyano, T., Kawamura, M., and Furuya, T. 1993. Large scale production of Anthocyanin by Aralia cordata cell suspension cultures. Appl. Microbiol. Biotechnol., 40, 215–218. Koffas, M. and Cardayre, S. D. 2005. Evolutionary metabolic engineering. Metab. Eng., 7, 1–3. Konstantinov, K. B., Nishio, N., Seki, T., and Yoshida, T. 1991. Physiologically motivated strategies for control of the fed-batch cultivation of recombinant Escherichia coli for phenylalanine production. J. Ferment. Bioeng., 71, 350–355. Kosjek, B., Stampfer, W., van Deursen, R., Faber, K., and Kroutil, W. 2003. Efficient production of raspberry ketone via “green” biocatalytic oxidation. Tetrahedron, 59, 9517–9521. Lacatena, R. M. and Cesareni, G. 1981. Base pairing of RNA I with its complementary sequence in the primer precursor inhibits ColE1 replication. Nature, 294, 623–626. Lacatena, R. M. and Cesareni, G. 1983. Interaction between RNA1 and the primer precursor in the regulation of Co1E1 replication. J. Mol. Biol., 170, 635–650. Laoteng, K., Mannontarat, R., Tanticharoen, M., and Cheevadhanarak, S. 2000. delta(6)-desaturase of Mucor rouxii with high similarity to plant delta(6)-desaturase and its heterologous expression in Saccharomyces cerevisiae. Biochem. Biophys. Res. Commun., 279, 17–22. Lee, J. H. and Kim, Y. T. 2006. Cloning and characterization of the astaxanthin biosynthesis gene cluster from the marine bacterium Paracoccus haeundaensis. Gene, 370, 86–95. Lee, P. C. and Schmidt-Dannert, C. 2002. Metabolic engineering towards biotechnological production of carotenoids in microorganisms. Appl. Microbiol. Biotechnol., 60, 1–11. 
Leonard, E., Yan, Y., Lim, K. H., and Koffas, M. A. 2005. Investigation of two distinct flavone synthases for plant-specific flavone biosynthesis in Saccharomyces cerevisiae. Appl. Environ. Microbiol., 71, 8241–8248. Leonard, E., Yan, Y., and Koffas, M. A. 2006. Functional expression of a P450 flavonoid hydroxylase for the biosynthesis of plant-specific hydroxylated flavonols in Escherichia coli. Metab. Eng., 8, 172–181. Leonard, E., Yan, Y., Fowler, Z. L., Li, Z., Lim, C. G., Lim, K. H., and Koffas, M. A. 2008. Strain improvement of recombinant Escherichia coli for efficient production of plant flavonoids. Mol. Pharm., 5, 257–265. Lesage-Meessen, L., Delattre, M., Haon, M., and Asther, M., WO 9608576, 1996. Obtaining vanillic acid and vanillin by bioconversion by an association of filamentous microorganisms. 1995-09-13. Levskaya, A., Chevalier, A. A., Tabor, J. J., and other authors 2005. Synthetic biology: engineering Escherichia coli to see light. Nature, 438, 441–442. Li, F., Jin, Z., Qu, W., Zhao, D., and Ma, F. 2006. Cloning of a cDNA encoding the Saussurea medusa chalcone isomerase and its expression in transgenic tobacco. Plant Physiol. Biochem., 44, 455–461. Lin, X., Hezari, M., Koepp, A. E., Floss, H. G., and Croteau, R. 1996. Mechanism of taxadiene synthase, a diterpene cyclase that catalyzes the first step of taxol biosynthesis in Pacific yew. Biochemistry, 35, 2968–2977. Lipshutz, B. H., Mollard, P., Pfeiffer, S. S., and Chrisman, W. 2002. A short, highly efficient synthesis of coenzyme Q10. J. Am. Chem. Soc., 124, 14282–14283. Liu, C., Zhao, Y. and Wang, Y. 2006. Artemisinin: current state and perspectives for biotechnological production of an antimalarial drug. Appl. Microbiol. Biotechnol., 72, 11–20. Lomascolo, A., Stentelaire, C., Asther, M., and Lesage-Meessen, L. 1999. Basidiomycetes as new biotechnological tools to generate natural aromatic flavours for the food industry. Trends Biotechnol., 17, 282–289.

Martin, V. J. J., Pitera, D. J., Withers, S. T., Newman, J. D., and Keasling, J. D. 2003. Engineering a mevalonate pathway in Escherichia coli for production of terpenoids. Nature Biotechnol., 21, 796–802. Marz, U. 1996. Natural Food Colors: Emphasizing European Technology and Markets. Business Communications Company, Inc., Norwalk, CT, USA. McDougall, G. J. and Stewart, D. 2005. The inhibitory effects of berry polyphenols on digestive enzymes. Biofactors, 23, 189–195. Meganathan, R. 2001. Ubiquinone biosynthesis in microorganisms. FEMS Microbiol. Lett., 203, 131–139. Melzer, M. and Heide, L. 1994. Characterization of polyprenyldiphosphate: 4-hydroxybenzoate polyprenyltransferase from Escherichia coli. Biochim. Biophys. Acta, 1212, 93–102. Metz, J. G., Roessler, P., Facciotti, D., and other authors 2001. Production of polyunsaturated fatty acids by polyketide synthases in both prokaryotes and eukaryotes. Science, 293, 290–293. Meyer, J. E., Pepin, M. F., and Smith, M. A. L. 2002. Anthocyanin production from Vaccinium pahalae: limitations of the physical micro environment. J. Biotechnol., 93, 45–57. Miller, J. E., Backman, K. C., O’Connor, M. J., and Hatch, R. T. 1987. Production of phenylalanine and organic-acids by phosphoenolpyruvate carboxylase-deficient mutants of Escherichia coli. J. Ind. Microbiol., 2, 143–149. Miyahisa, I., Funa, N., Ohnishi, Y., Martens, S., Moriguchi, T., and Horinouchi, S. 2005a. Combinatorial biosynthesis of flavones and flavonols in Escherichia coli. Appl. Microbiol. Biotechnol., 71, 53–58. Miyahisa, I., Kaneko, M., Funa, N., Kawasaki, H., Kojima, H., Ohnishi, Y., and Horinouchi, S. 2005b. Efficient production of (2S)-flavanones by Escherichia coli containing an artificial biosynthetic gene cluster. Appl. Microbiol. Biotechnol., 68, 498–504. Nakajima, J., Tanaka, Y., Yamazaki, M., and Saito, K. 2001. Reaction mechanism from leucoanthocyanidin to anthocyanidin 3-glucoside, a key reaction for coloring in anthocyanin biosynthesis. J. Biol. 
Chem., 276, 25797–25803. Negishi, E., Liou, S. Y., Xu, C., and Huo, S. 2002. A novel, highly selective, and general methodology for the synthesis of 1,5-diene-containing oligoisoprenoids of all possible geometrical combinations exemplified by an iterative and convergent synthesis of coenzyme Q10. Org. Lett., 4, 261–264. Newman, J. D., Marshall, J., Chang, M., Nowroozi, F., Paradise, E., Pitera, D., Newman, K. L., and Keasling, J. D. 2006. High-level production of amorpha-4,11-diene in a two-phase partitioning bioreactor of metabolically engineered Escherichia coli. Biotechnol. Bioeng., 95, 684–691. NutraUSAingredients.com 2004.Tomatoes bred to contain extra antioxidant. Okada, K., Suzuki, K., Kamiya, Y., and other authors 1996. Polyprenyl diphosphate synthase essentially defines the length of the side chain of ubiquinone. Biochim. Biophys. Acta, 1302, 217–223. Okada, K., Minehira, M., Zhu, X., Suzuki, K., Nakagawa, T., Matsuda, H., and Kawamukai, M. 1997. The ispB gene encoding octaprenyl diphosphate synthase is essential for growth of Escherichia coli. J. Bacteriol., 179, 3058–3060. Ostermeier, M., Nixon, A. E., Shim, J. H., and Benkovic, S. J. 1999a. Combinatorial protein engineering by incremental truncation. Proc. Natl. Acad. Sci. USA, 96, 3562–3567. Ostermeier, M., Shim, J. H., and Benkovic, S. J. 1999b. A combinatorial approach to hybrid enzymes independent of DNA homology. Nature Biotechnol., 17, 1205–1209. Ostermeier, M. and Benkovic, S. J. 2000. Evolution of protein function by domain swapping. Adv. Protein Chem., 55, 29–77. Papageorgiou, V. P. 1978. Wound-healing properties of naphthaquinone pigments from Alkanna Tinctoria. Experientia, 34, 1499–1501. Park, Y. C., Kim, S. J., Choi, J. H., Lee, W. H., Park, K. M., Kawamukai, M., Ryu, Y. W., and Seo, J. H. 2005. Batch and fed-batch production of coenzyme Q10 in recombinant Escherichia coli containing the decaprenyl diphosphate synthase gene from Gluconobacter suboxydans. Appl. Microbiol. 
Biotechnol., 67, 192–196. Pedapudi, S., Chin, C. K., and Pedersen, H. 2000. Production and elicitation of benzalacetone and the raspberry ketone in cell suspension cultures of Rubus idaeus. Biotechnol. Prog., 16, 346–349.

Pfeifer, B. A., Admiraal, S. J., Gramajo, H., Cane, D. E., and Khosla, C. 2001. Biosynthesis of complex polyketides in a metabolically engineered strain of E. coli. Science, 291, 1790–1792. Picaud, S., Mercke, P., He, X., Sterner, O., Brodelius, M., Cane, D. E., and Brodelius, P. E. 2006. Amorpha4,11-diene synthase: mechanism and stereochemistry of the enzymatic cyclization of farnesyl diphosphate. Arch. Biochem. Biophys., 448, 150–155. Poon, W. W., Barkovich, R. J., Hsu, A. Y., Frankel, A., Lee, P. T., Shepherd, J. N., Myles, D. C., and Clarke, C. F. 1999. Yeast and rat Coq3 and Escherichia coli UbiG polypeptides catalyze both O-methyltransferase steps in coenzyme Q biosynthesis. J. Biol. Chem., 274, 21665–21672. Popiolkiewicz, J., Polkowski, K., Skierski, J. S., and Mazurek, A. P. 2005. In vitro toxicity evaluation in the development of new anticancer drugs - genistein glycosides. Cancer Lett., 229, 67–75. Potter, S. M., Baum, J. A., Teng, H. Y., Stillman, R. J., Shay, N. F., and Erdman, J. W. 1998. Soy protein and isoflavones: their effects on blood lipids and bone density in postmenopausal women. Am. J. Clin. Nutr., 68, 1375S–1379S. Pouget, C., Lauthier, F., Simon, A., Fagnere, C., Basly, J. P., Delage, C., and Chulia, A. J. 2001. Flavonoids: Structural requirements for antiproliferative activity on breast cancer cells. Bioorg. Med. Chem. Lett., 11, 3095–3097. Ranganathan, A., Timoney, M., Bycroft, M., and other authors 1999. Knowledge-based design of bimodular and trimodular polyketide synthases based on domain and module swaps: a route to simple statin analogues. Chem. Biol., 6, 731–741. Rathod, P. K., McErlean, T., and Lee, P. C. 1997. Variations in frequencies of drug resistance in Plasmodium falciparum. Proc. Natl. Acad. Sci. USA, 94, 9389–9393. Ratledge, C. 2004. Fatty acid biosynthesis in microorganisms being used for Single Cell Oil production. Biochimie, 86, 807–815. Riechmann, L. and Winter, G. 2000. 
Novel folded protein domains generated by combinatorial shuffling of polypeptide segments. Proc. Natl. Acad. Sci. USA, 97, 10068–10073. Ro, D. K. and Douglas, C. J. 2004. Reconstitution of the entry point of plant phenylpropanoid metabolism in yeast (Saccharomyces cerevisiae): implications for control of metabolic flux into the phenylpropanoid pathway. J. Biol. Chem., 279, 2600–2607. Ro, D. K., Paradise, E. M., Ouellet, M., and other authors 2006. Production of the antimalarial drug precursor artemisinic acid in engineered yeast. Nature, 440, 940–943. Segre, D., Vitkup, D., and Church, G. M. 2002. Analysis of optimality in natural and perturbed metabolic networks. Proc. Natl. Acad. Sci. USA, 99, 15112–15117. Shuler, M. L. 1994. Bioreactor engineering as an enabling technology to tap biodiversity. The case of taxol. Ann. N. Y. Acad. Sci., 745, 455–461. Smit, B. A., Engels, W. J. M., Bruinsma, J., Hylckama Vlieg, J. E. T., Wouters, J. T. M., and Smit, G. 2004. Development of a high throughput screening method to test flavour-forming capabilities of anaerobic micro-organisms. J. Appl. Microbiol., 97, 306–313. Smit, G., Smit, B. A., and Engels, W. J. 2005. Flavour formation by lactic acid bacteria and biochemical flavour profiling of cheese products. FEMS Microbiol. Rev., 29, 591–610. Smith, M. A. L. and Spomer, L. A. 1995. Vessels, gels, liquid media and support systems. In Automation and Environmental Control in Plant Tissue Culture, pp. 371–404. Edited by J. Aitken-Christie, T. Kozai and M. A. L. Smith. Dordrecht, The Netherlands: Kluwer Academic Publishers. Stassi, D. L., Kakavas, S. J., Reynolds, K. A., and other authors 1998. Ethyl-substituted erythromycin derivatives produced by directed metabolic engineering. Proc. Natl. Acad. Sci. USA, 95, 7305–7309. Stemmer, W. P. 1994. DNA shuffling by random fragmentation and reassembly: in vitro recombination for molecular evolution. Proc. Natl. Acad. Sci. USA, 91, 10747–10751. Stephanopoulos, G. and Vallino, J. J. 1991. 
Network rigidity and metabolic engineering in metabolite overproduction. Science, 252, 1675–1681. Sun, Z., CN 1421523, 2003, Aspergillus niger and biotransformation method for producing vanillic acid and vanillin by using it. 2002-07-22.

Sun, Z. 2005. Manufacture of vanillin by Pycnoporus cinnabarinus biotransformation. 7 pp. Application: CN: Southern Yangtze University, Peoples Republic of China. Suzuki, K., Ueda, M., Yuasa, M., Nakagawa, T., Kawamukai, M., and Matsuda, H. 1994. Evidence that Escherichia coli ubiA product is a functional homolog of yeast COQ2, and the regulation of ubiA gene expression. Biosci. Biotechnol. Biochem., 58, 1814–1819. Takagi, M., Nishio, Y., Oh, G., and Yoshida, T. 1996. Control of L-phenylalanine production by dual feeding of glucose and L-tyrosine. Biotechnol. Bioeng., 52, 653–660. Tao, L., Jackson, R. E., and Cheng, Q. 2005a. Directed evolution of copy number of a broad host range plasmid for metabolic engineering. Metab. Eng., 7, 10–17. Tao, L., Schenzle, A., Odom, J. M., and Cheng, Q. 2005b. Novel carotenoid oxidase involved in biosynthesis of 4,4’-diapolycopene dialdehyde. Appl. Environ. Microbiol., 71, 3294–3301. Tobias, A. V. and Arnold, F. H. 2006. Biosynthesis of novel carotenoid families based on unnatural carbon backbones: A model for diversification of natural product pathways. Biochim. Biophys. Acta, 1761, 235–246. Tomizawa, J. and Itoh, T. 1981. Plasmid ColE1 incompatibility determined by interaction of RNA I with primer transcript. Proc. Natl. Acad. Sci. USA, 78, 6096–6100. Ueda, K., Kim, K. M., Beppu, T., and Horinouchi, S. 1995. Overexpression of a gene cluster encoding a chalcone synthase-like protein confers redbrown pigment production in Streptomyces griseus. J. Antibiot. (Tokyo), 48, 638–646. Unagul, P., Wongsa, P., Kittakoop, P., Intamas, S., Srikitikulchai, P., and Tanticharoen, M. 2005. Production of red pigments by the insect pathogenic fungus Cordyceps unilateralis BCC 1869. J. Ind. Microbiol. Biotechnol., 32, 135–140. Vanderhaegen, B., Neven, H., Coghe, S., Verstrepen, K. J., Derdelinckx, G., and Verachtert, H. 2003. Bioflavoring and beer refermentation. Appl. Microbiol. Biotechnol., 62, 140–150. Varma, A., Boesch, B. W., and Palsson, B. O. 
1993. Stoichiometric interpretation of Escherichia coli glucose catabolism under various oxygenation rates. Appl. Environ. Microbiol., 59, 2465–2473. Voigt, C. A., Martinez, C., Wang, Z. G., Mayo, S. L., and Arnold, F. H. 2002. Protein building blocks preserved by recombination. Nat. Struct. Biol., 9, 553–558. von Schacky, C. and Dyerberg, J. 2001. omega 3 fatty acids. From eskimos to clinical cardiology—what took us so long? World Rev. Nutr. Diet, 88, 90–99. Wang, Y. and Pfeifer, B.A. 2008. 6-deoxyerythronolide B production through chromosomal localization of the deoxyerythronolide B synthase genes in E. coli. Metab. Eng., 10, 33–38. Watts, K. T., Mijts, B. N., and Schmidt-Dannert, C. 2005. Current and emerging approaches for natural product biosynthesis in microbial cells. Adv. Syn. Catalysis, 347, 927–940. Wellmann, E. 1975. UV dose-dependent induction of enzymes related to flavonoid biosynthesis in cellsuspension cultures of parsley. FEBS Lett., 51, 105–107. Wetzel, R., Kleid, D. G., Crea, R., and other authors 1981. Expression in Escherichia coli of a chemically synthesized gene for a mini-C analog of human proinsulin. Gene, 16, 63–71. Winkel-Shirley, B. 2001. Flavonoid biosynthesis. A colorful model for genetics, biochemistry, cell biology, and biotechnology. Plant Physiol., 126, 485–493. Wu, Y. Q., Jiang, P. H., Fan, C. S., Wang, J. G., Shang, L., and Huang, W. D. 2003. Co-expression of five genes in E coli for L-phenylalanine in Brevibacterium flavum. World J. Gastroenterol., 9, 342–346. Yamano, S., Ishii, T., Nakagawa, M., Ikenaga, H., and Misawa, N. 1994. Metabolic engineering for production of beta-carotene and lycopene in Saccharomyces cerevisiae. Biosci. Biotechnol. Biochem., 58, 1112–1114. Yan, Y., Chemler, J., Huang, L., Martens, S., and Koffas, M. A. 2005a. Metabolic engineering of anthocyanin biosynthesis in Escherichia coli. Appl. Environ. Microbiol., 71, 3617–3623. Yan, Y., Kohli, A. and Koffas, M. A. 2005b. 
Biosynthesis of natural flavanones in Saccharomyces cerevisiae. Appl. Environ. Microbiol., 71, 5610–5613.

Yan, Y., Li, Z, and Koffas, M. A. 2008. High-yield anthocyanin biosynthesis in engineered Escherichia coli. Biotechnol. Bioeng., 100, 120–140. Yoshida, H., Kotani, Y., Ochiai, K., and Araki, K. 1998. Production of ubiquinone-10 using bacteria. J. Gen. Appl. Microbiol., 44, 19–26. Yu, O., Jung, W., Shi, J., Croes, R. A., Fader, G. M., McGonigle, B., and Odell, J. T. 2000. Production of the isoflavones genistein and daidzein in non-legume dicot and monocot tissues. Plant Physiol., 124, 781–794. Zava, D. T. and Duwe, G. 1997. Estrogenic and antiproliferative properties of genistein and other flavonoids in human breast cancer cells in vitro. Nutr. Cancer—Inter. J., 27, 31–40. Zhang, Y. X., Perry, K., Vinci, V. A., Powell, K., Stemmer, W. P., and del Cardayre, S. B. 2002. Genome shuffling leads to rapid phenotypic improvement in bacteria. Nature, 415, 644–646. Zhong, J. J., Seki, T., Kinoshita, S., and Yoshida, T. 1991. Effect of light irradiation on anthocyanin production by suspended culture of Perilla frutescens. Biotechnol. Bioeng., 38, 653–658.

23 Applications of Metabolic Engineering for Natural Drug Discovery

Yi Tang, University of California
Suzanne Ma, University of California
Wladyslaw A. Wojcicki, University of California

23.1 Introduction
23.2 Antimicrobial Drugs
    Penicillin • Erythromycin • Artemisinin
23.3 Anticancer Drugs
    Daunorubicin (DNR) and Doxorubicin (DXR) • Paclitaxel • Epothilone
23.4 Cholesterol Lowering Statins
    Lovastatin • Pravastatin and Compactin
References

23.1 Introduction

Nature manufactures a wide assortment of compounds that are highly diverse in structure and function. Many of these compounds are therapeutically valuable and play important roles in maintaining and enhancing human health. A large percentage of them are produced as secondary metabolites by their native hosts. There are over 20,000 biologically active secondary microbial metabolites, and over 16,000 have antibiotic activity.1 The famous Streptomyces genus alone produces approximately 8,000 known bioactive small molecules. Considering all natural products (1,000,000+), over a quarter of a million display some bioactivity, with 30,000 known to possess antibiotic activity.1 Table 23.1 is a partial list of natural drugs on the market today along with their respective natural hosts and therapeutic value. More natural products (or compounds derived from natural precursors) are still completing clinical trials, and an excellent review on therapeutics derived from natural products has been published by Mark Butler.2

Metabolic engineering can be defined as the purposeful alteration of an organism’s genetic code to redirect its metabolic flux. Metabolic engineering strategies are important in drug discovery and drug production for a number of reasons: (1) to elucidate a specific biosynthetic pathway; (2) to optimize the yield of a natural compound (or a key intermediate); and (3) to biosynthesize analogs of natural compounds that are synthetically inaccessible.

Increasing the yield and titer of a natural product drug is an imperative objective. Metabolic engineering of natural products addresses one recurring problem: insufficient supply of the bioactive product relative to market demand. The impractical and destructive means of isolating some of these important drugs necessitate a better approach. For example, it takes 750 kg of dry bark from the Pacific Western yew tree to extract 1 mg of taxadiene, the first important precursor during paclitaxel biosynthesis.3 In the case of artemisinin, a single treatment requires 12 mg/kg for both adults and children.4 However, at best, the yield of artemisinin from the Artemisia annua L. plant is only about 0.5% (w/w).5 This means that one treatment of malaria with artemisinin requires 2.4 g of plant per kg of body mass.
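The plant-mass figure quoted for artemisinin follows directly from the dose and the extraction yield. The short sketch below reproduces that arithmetic. The function name and the 60 kg body mass are illustrative assumptions that do not come from the chapter; only the 12 mg/kg dose and the 0.5% (w/w) yield are taken from the text.

```python
def plant_mass_per_kg_body_mass(dose_mg_per_kg: float, yield_w_w: float) -> float:
    """Return grams of plant material needed per kg of body mass.

    dose_mg_per_kg: drug required per kg of body mass (mg/kg).
    yield_w_w: mass fraction of drug recovered from the plant (0.005 = 0.5% w/w).
    """
    if not 0.0 < yield_w_w <= 1.0:
        raise ValueError("yield_w_w must be a mass fraction in (0, 1]")
    drug_g_per_kg = dose_mg_per_kg / 1000.0   # mg -> g
    return drug_g_per_kg / yield_w_w          # g plant per kg body mass


# Values quoted in the text: 12 mg/kg per treatment, 0.5% (w/w) yield.
per_kg = plant_mass_per_kg_body_mass(12.0, 0.005)
print(f"{per_kg:.1f} g plant per kg body mass")

# Hypothetical 60 kg patient (assumed here for illustration only).
print(f"{per_kg * 60:.0f} g plant for one treatment of a 60 kg patient")
```

Because the requirement is inversely proportional to the yield, even a modest increase in the artemisinin content of the producing host reduces the biomass needed per treatment by the same factor.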

Table 23.1 Representative Natural and Semisynthetic (ss) Drugs

| Compound | Natural host | Year of discovery | Commercial name | Indications |
|---|---|---|---|---|
| Amphotericin B | Streptomyces nodosus | 1955 | Fungilin, Fungizone | Antifungal |
| Artemisinin | Artemisia annua L. | 340 AD | Artemisinin | Severe malaria (including cerebral) |
| Benzylpenicillin | Penicillium chrysogenum | 1929 | Penicillin G | Pneumonia, meningitis, scarlet fever and other Streptococcal infections, syphilis, anthrax |
| Bleomycin | Streptomyces verticillus | 1966 | Blenoxane | Ovarian and testicular cancers |
| Cephalosporin | Cephalosporium acremonium | 1948 | Cefazolin, Cefoxitin, Cefotetan, Ceftriazone, Cefoxime, Ceftriaxone, Cefotaxime, Ceftazidime, Cefepime | Skin and soft tissue infections, respiratory tract infections, gonorrhea, Lyme disease, meningitis |
| Chlortetracycline | Streptomyces aureofaciens | 1948 | Aureomycin | Rickettsiae, mycoplasma, chlamydia |
| Cyclosporine | Beauveria nivea | 1976 | Neoral | Immunosuppressant, atopic dermatitis |
| Dactinomycin | Streptomyces sp. | 1940 | Cosmegen | Rhabdomyosarcoma, Wilms’ tumor, Kaposi’s sarcoma, soft tissue carcinomas |
| Daunorubicin | Streptomyces peucetius var. caesius | 1964 | Daunomycin | Acute myelogenous/lymphocytic leukemia |
| Doxorubicin | Streptomyces peucetius var. caesius | 1969 | Adriamycin, Rubex | Breast cancer, small cell lung carcinoma, Hodgkin’s disease, non-Hodgkin’s lymphomas, osteogenic, Ewing’s, and soft tissue sarcoma |
| Erythromycin | Saccharopolyspora erythrea | 1952 | Ilotycin, A/T/S, Emgel, Ery-Tab, EES | Bronchitis, diphtheria, Legionnaires’ disease, pertussis, pneumonia, rheumatic fever, STDs, ear, intestinal, lung, urinary tract, and skin infections |
| Etoposide (ss) | Podophyllum peltatum | 1988 | Vepesid, Toposar, Etopophos | Testicular cancer, small cell lung carcinoma |
| Irinotecan (ss) | Camptotheca acuminata | 1980s | Camptosar | Colorectal cancer |
| Lovastatin | Aspergillus terreus | 1980 | Mevacor | Hypercholesterolemia |
| Mitomycin | Streptomyces caespitosus | 1958 | Mutamycin | Carcinomas of colon, stomach, cervix, breast, bladder, head, neck, and lung |
| Paclitaxel | Taxus brevifolia | 1971 | Taxol | Cancers of the ovaries, breast, lung, head and neck |
| Pentostatin | Streptomyces antibioticus | Late 1970s | Nipent | Hairy cell leukemia, promyelocytic leukemia, cutaneous T-cell lymphoma, non-Hodgkin’s lymphoma, Langerhans-cell histiocytosis |
| Phenoxymethyl Penicillin | Penicillium chrysogenum | 1929 | Penicillin V | Anthrax, bronchitis, acute otitis media, sinusitis, skin and soft tissue infections |
| Pravastatin (ss) | Actinomadura sp. | 1976 | Pravachol | Hypercholesterolemia |
| Rapamycin/Sirolimus | Streptomyces hygroscopicus | 1975 | Rapamune | Organ transplant rejection |
| Streptozocin | Streptomyces achromogenes | Late 1950s to 1960s | Zanosar | Pancreatic islet cell carcinoma, malignant carcinoid tumor cells |
| Teniposide | Podophyllum peltatum | 1975 | Vumon | Childhood leukemia |
| Topotecan (ss) | Camptotheca acuminata | 1980s | Hycamtin | Ovarian and small cell lung cancer |
| Vancomycin | Streptomyces orientalis | 1955 | Vancocin | Broad spectrum antibiotic |
| Vincristine/Vinorelbine/Vinblastine | Catharanthus roseus | 1958 | Oncovin, Vincasar/Navelbine/Velban | Active toward various cancers and Hodgkin’s disease |

Random or rational modifications are widely used to improve the titer of secondary metabolites produced by their natural hosts. One major disadvantage of using the natural host is that it can be slow growing, low yielding, limited in quantity, and difficult to manipulate genetically. As a result, many researchers seek heterologous hosts for drug production. Genetically well-characterized hosts such as Escherichia coli and Saccharomyces cerevisiae, which have favorable growth and culturing properties, are attractive as alternative biosynthetic hosts. The primary metabolisms of these organisms have been intensively studied; hence abundant information about their metabolome is available. However, hurdles arise when introducing complex foreign proteins into an alternative host, and problems such as poor expression, lack of coordinated gene expression, low protein solubility, and low activity are commonly encountered.

An equally important goal of metabolic engineering for drug production is the engineered biosynthesis of novel “unnatural” natural products that display enhanced potency and stability and attenuated side effects. Subtle structural modifications to a natural product may improve several pharmacological properties, such as acid stability and tissue penetration upon human administration. With more and more infectious microorganisms and tumors acquiring resistance toward existing medications, many of the front-line antibiotics and anticancer drugs are becoming ineffective. Therefore, obtaining new bioactive compounds that can combat the modes of resistance is essential for improving human health. While organic synthesis is a powerful method of generating structural diversity, many of the compounds are structurally complex and chemically labile, thus precluding efficient synthetic transformations. Furthermore, a multistep chemical synthesis is difficult and impractical to scale up. Therefore, for the foreseeable future, fermentation will remain the preferred method to obtain many of the natural product drugs and their analogs.

One important method to derivatize natural product drugs is the synergistic combination of chemical and biological synthesis. The combined approach can be performed in two ways. The first is classical semisynthesis, in which fermentation is used to afford the drug, or an important intermediate, followed by chemical modification. This has been employed to produce important therapeutics, such as simvastatin (Zocor®) and the recently approved tetracycline analog, tigecycline. An alternative approach is precursor-directed biosynthesis, in which a metabolic precursor is chemically synthesized and supplied exogenously to the producing organism. The precursor is then utilized by the biosynthetic pathway (natural or engineered) to yield a variant of the drug. Precursor-directed feeding not only employs an organism’s own machinery to do most of the work, but also removes the restriction of production to metabolically available precursors. These methods have been used extensively in the engineered biosynthesis of polyketide compounds.

This review will focus on recent approaches in metabolic engineering of biosynthetic pathways to increase drug production, afford key intermediates, and create bioactive analogs of natural products. The drugs covered include commonly prescribed therapeutics for infectious disease, cancer, and hypercholesterolemia.

23.2 Antimicrobial Drugs

23.2.1 Penicillin

No single natural product has changed the history of medicine more than the antibiotic penicillin. Penicillin has become synonymous with the first line of defense against bacterial infections. Penicillin was first isolated from Penicillium notatum in 1889 by Ernest Duchesne, and then again by Clodomiro Picado Twight in 1923. However, the inception of penicillin as a commanding antibacterial did not occur until the 1940s, when Howard Florey and Ernst Chain created a powder form of the antibiotic. The first strain used for the isolation of penicillin produced only 1 part of penicillin per million parts of culture medium. This highly inefficient producing host was unable to meet the demand for penicillin.

The first attempts at increased penicillin production encompassed searching for a strain that naturally produced the antibiotic in higher titers (Table 23.2). Florey and Chain discovered the strain Penicillium chrysogenum, which naturally produces ~200 times more penicillin than P. notatum. P. chrysogenum is still the strain used in the industrial production of penicillin as well as for research purposes. Aspergillus nidulans is often used as a model system for penicillin research. The biosynthetic pathway of penicillin in both P. chrysogenum and A. nidulans is shown in Figure 23.1. Starting from the monomeric amino acid precursors L-aminoadipic acid, L-cysteine, and L-valine, three enzymes are required to biosynthesize the antibiotic.20

Table 23.2 Selected Metabolic Engineering Techniques Utilized to Enhance Penicillin Biosynthesis

| Metabolic Engineering Technique | References |
|---|---|
| Overexpression of penicillin biosynthetic genes | 6–10 |
| Placement of penicillin structural genes under strong promoters | 11,12 |
| Genome-wide analysis of metabolism to increase penicillin production | 13,14 |
| Elimination of competing metabolic pathways | 15,16 |
| Expression of a heterologous bacterial protein in fungus to increase biosynthetic precursor pool | 17,18 |
| Biosynthetic pathway reconstitution in a heterologous host | 19 |

23.2.1.1 Overexpression of the Penicillin Biosynthetic Genes

P. chrysogenum BW1890 is a high-producing strain isolated via classic strain improvement studies. Smith and coworkers determined that the improved strain contains between 8 and 16 copies of the biosynthetic genes, which lead to a 32- to 64-fold increase in penicillin G production.6 Newbert and coworkers subsequently elucidated that the multiple copies are tandemly arranged instead of randomly dispersed throughout the genome.9 These findings suggested that chromatid misalignment and recombination may be the mechanism for yielding the high-producing strain. Veenstra and coworkers isolated two genes, the pcbC gene and the penDE gene, which in P. chrysogenum code for IPNS and ACYT, respectively.8 Amplification of the two genes in P. chrysogenum Wis54-1255, which contains a single copy of each of the biosynthetic genes, led to an observed increase of up to 40% in penicillin production. This study also identified the acyltransferase-catalyzed reaction as the rate-limiting step of the pathway in P. chrysogenum. In a related study with A. nidulans, MacCabe elucidated the three structural genes involved in penicillin biosynthesis, acvA, ipnA, and acyA, which code for the ACVS, IPNS, and ACYT biosynthetic enzymes, respectively.7 Coordinated expression of these three biosynthetic genes occurs under penicillin production conditions.
Without synchronized expression of the three genes, no penicillin is produced. These observations lead to the conclusion that transcription of the three genes is controlled by a common regulatory system.

Placing the biosynthetic genes under the regulation of strong promoters is an effective method of overexpressing the enzymes. Kennedy replaced the natural acvA promoter regulating the ACVS gene with the strong alcohol dehydrogenase promoter, alcAp, in A. nidulans.11 Fusion of the lacZ reporter gene with the alcAp promoter allowed visual identification of the expression levels of the ACVS gene. Fermentation with this new strain of A. nidulans resulted in a 100-fold increase in the expression level of ACVS and a 30-fold increase in penicillin yield. Placing the IPNS and ACYT genes under the strong alcA promoter led to a ten-fold rise in each of the two genes’ transcriptional levels, which resulted in a 40-fold increase in IPNS activity and an eight-fold increase in ACYT activity.12 However, only a modest increase in penicillin biosynthesis was observed, suggesting that the rate-limiting step in penicillin biosynthesis in A. nidulans is catalyzed by ACVS. This is in contrast to the metabolic studies in P. chrysogenum, in which the acyltransfer reaction is the bottleneck.

Figure 23.1 The biosynthesis of penicillin in P. chrysogenum and in A. nidulans. Synthesis of the ACV tripeptide is catalyzed by ACVS, a nonribosomal peptide synthetase (NRPS). Cyclization of the lactam ring is catalyzed by IPNS. The final acyltransferase reaction, in which the phenylacetyl acyl unit is appended to yield penicillin G, is catalyzed by ACYT.
[Scheme: L-aminoadipic acid + L-cysteine + L-valine → (aminoadipyl-cysteinyl-valine synthetase, ACVS) → ACV tripeptide → (isopenicillin N synthetase, IPNS) → isopenicillin N → (acyltransferase, ACYT; phenylacetyl-CoA in, CoA out) → penicillin G]

23.2.1.2 Genome-Wide Analysis of Metabolism to Increase Penicillin Production

A different approach toward increasing penicillin production via metabolic engineering is to optimize the entire metabolic network (including primary metabolism) in P. chrysogenum instead of concentrating solely on the structural genes. Van Gulik deciphered the relationship between intracellular metabolic flux and penicillin production and determined where bottlenecks in overall metabolism might occur.13 Experimental design could then be tailored to remove identified metabolic obstacles from penicillin production. Results from the model supported their experimental analysis, which showed limitations residing in the supply/regeneration of cofactors such as NADPH. For that reason, increasing the pool of NADPH during penicillin production could increase product yield.

By systematic analysis of genes expressed in P. chrysogenum in conjunction with the suppression subtractive hybridization cloning strategy, Castillo identified genes that are differentially expressed in glucose or lactose media.14 In particular, they examined 95 clones grown with glucose as the carbon source, and 72 clones grown with lactose as the carbon source. Results indicate that when glucose is used as the carbon source, expression patterns correspond to robust cell respiration; in contrast, when lactose is used as the carbon source, genes involved in secondary metabolism are upregulated. The effect of carbon source may explain the observation that metabolism shifts toward respiration instead of generation of secondary metabolites when glucose is utilized.

23.2.1.3 Increasing the Levels of Metabolic Precursors

To further improve penicillin production in P. chrysogenum and A. nidulans, many experiments were done to eliminate competing metabolic pathways. In P. chrysogenum, the penicillin biosynthesis pathway and the lysine biosynthesis pathway share several common steps. L-Aminoadipic acid is the branching intermediate where the biosynthetic routes for the two pathways diverge. Casquiero and coworkers hypothesized that penicillin biosynthesis can be increased through elimination of the lysine pathway.15 Starting with the Wis54-1255 strain of P. chrysogenum, lys2-disruption mutants were constructed via two homologous recombination events to effect replacement of the lys2 gene with a pyrG selection marker. The disruption mutants showed penicillin levels that were two-fold higher than those of the parental strain, supporting the hypothesis that the lysine pathway competes with the penicillin pathway for precursors. Bañuelos further engineered the ∆lys2 P. chrysogenum strain by overexpressing the lys1 gene.16 The lys1 gene encodes homocitrate synthase, which catalyzes the first step in the lysine and L-aminoadipic acid pathway, prior to the divergent step. Additional copies of the lys1 gene increased homocitrate synthase levels, but did not result in a significant increase in penicillin production. Therefore, the reaction catalyzed by the lys1 gene product is not rate-limiting in L-aminoadipic acid synthesis.

Studies in A. nidulans to increase the levels of other relevant biosynthetic precursors have also resulted in an increase in penicillin yield. Phenylacetyl-CoA is the substrate of the acyltransferase in the last step of penicillin assembly. Mingot discovered that disruption of the phacA gene, which encodes a cytochrome P450 enzyme involved in the catabolism of phenylacetate, increases penicillin production three- to five-fold.18 Complementary to increasing the amount of phenylacetate by eliminating a catabolic pathway is the overexpression of a heterologous phenylacetyl-CoA biosynthetic pathway to increase the precursor level. Miñambres introduced the gene encoding the phenylacetyl-CoA ligase (pcl) from Pseudomonas putida U into P. chrysogenum.17 In P. putida U, the pcl gene product is the first enzyme involved in the aerobic catabolism of phenylacetic acid and converts it to phenylacetyl-CoA. All P. chrysogenum transformants containing the foreign gene showed a modest 1.8- to 2.2-fold increase in penicillin production as compared to the parental strain.
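The branch-point competition discussed in this section can be illustrated with a deliberately idealized mass balance. The sketch below assumes (and these assumptions are not from the chapter) that the supply of L-aminoadipic acid is fixed, that a fraction f of it is drawn into the lysine branch, and that penicillin output is proportional to what remains. Real strains are also subject to regulation and other limitations, so this is only a way of reading the reported fold changes, not a model of P. chrysogenum metabolism. The function names are hypothetical.

```python
def fold_increase_after_branch_removal(lysine_fraction: float) -> float:
    """Fold change in penicillin flux when the competing branch is removed.

    Assumes a fixed precursor supply split between two branches:
    a fraction f to lysine and (1 - f) to penicillin. Removing the lysine
    branch sends everything to penicillin, i.e. a 1 / (1 - f) fold change.
    """
    if not 0.0 <= lysine_fraction < 1.0:
        raise ValueError("lysine_fraction must be in [0, 1)")
    return 1.0 / (1.0 - lysine_fraction)


def implied_lysine_fraction(fold_increase: float) -> float:
    """Invert the relation: the branch fraction implied by an observed fold change."""
    if fold_increase < 1.0:
        raise ValueError("fold_increase must be >= 1")
    return 1.0 - 1.0 / fold_increase


# The two-fold increase reported for the lys2 disruption mutants would, under
# these idealized assumptions only, correspond to half of the precursor having
# been diverted to lysine in the parental strain.
print(implied_lysine_fraction(2.0))
```

Under the same assumptions, increasing the precursor supply upstream of the branch point raises both branches proportionally, which is one way to think about why upstream and branch-point interventions are treated as separate strategies.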
23.2.1.4 Biosynthetic Pathway Reconstitution in a Heterologous Host

Thus far, metabolic engineering of the penicillin biosynthetic pathway has mostly been performed within the natural hosts. Lutz transferred the penDE gene encoding the acyltransferase from P. chrysogenum to the yeast Hansenula polymorpha under the control of the methanol-inducible H. polymorpha alcohol oxidase promoter to study the roles of peroxisomes in penicillin synthesis.19 They were able to show functional expression of the gene and correct processing of the protein product in H. polymorpha. This is an initial step toward the introduction of a new metabolic pathway in H. polymorpha, which could lead to complete penicillin biosynthesis in a heterologous host.


23.2.2 Erythromycin

The macrolide antibiotic erythromycin was discovered in 1952 by McGuire and coworkers after analyzing the metabolic products of a Philippine soil sample, which included an organism named Streptomyces erythreus at the time and later cataloged as Saccharopolyspora erythrea.4 Macrolides are macrocyclic compounds with 14-, 15-, or 16-membered lactone rings. Many macrolide antibiotics are bacteriostatic (arresting microbial growth) as opposed to bactericidal (killing the microbe). Erythromycin and its derivatives prevent bacterial proliferation by penetrating the cell wall and inhibiting protein synthesis through reversible binding to the 50S ribosomal subunit of sensitive organisms. In vitro experiments show activity against gram-positive and most gram-negative bacterial species, with the exception of aerobic enteric gram-negative bacilli.4 Important species that are targeted by erythromycin include Neisseria gonorrhoeae, Haemophilus influenzae, Mycoplasma pneumoniae, and Neisseria meningitidis. Semisynthetic derivatives of erythromycin, such as clarithromycin and azithromycin, are also used as antimicrobial drugs.4

Erythromycin belongs to the polyketide family of natural products. Most polyketides are biosynthesized by soil-borne actinomycetes as secondary metabolites. Polyketides are formed by the successive condensation and extension of short chain carboxylic acid units, controlled by coordinated modules of active sites that make up the polyketide synthases (PKS). The minimal PKS, which is necessary for chain elongation, contains the ketosynthase (KS), acyltransferase (AT), and acyl carrier protein (ACP). Other catalytic domains may be present in a PKS, such as a ketoreductase (KR), dehydratase (DH), and enoylreductase (ER), to further modify the polyketide. Comprehensive reviews of the biochemistry of PKS can be found in a number of references.39–41 Erythromycin is derived from the polyketide aglycon 6-deoxyerythronolide B (6-dEB). 6-dEB is biosynthesized from one propionate starter unit and six methylmalonyl-CoA units by the 6-deoxyerythronolide B synthase (DEBS) (see Figure 23.2). DEBS is the most extensively characterized macrolide PKS to date, and a variety of metabolic engineering techniques have been applied to the biosynthesis of erythromycin and its analogs. This section will highlight some of the recent advances in this subject (Table 23.3).

23.2.2.1 Strain Improvement and Engineered Biosynthesis Using S. erythrea

Kao’s group at Stanford used DNA microarray experiments to decipher the genetic basis of an S. erythrea overproducing strain.42 The reverse engineering studies revealed that the overproducing strain, which arose through classic strain improvement, expressed the entire 56 kb erythromycin gene cluster for a significantly longer time than the wild type strain, hence supporting a more sustained production of the macrolide. These findings are consistent with the observations made by Rodriguez and coworkers.35 They discovered that industrially important overproducing S. erythrea strains contained mutations not in the erythromycin PKS genes, but in non-PKS genes on the genome. Specifically, overproducing S. erythrea strains contained mutations in regulatory elements that contributed to high expression of the PKS genes. Therefore, the erythromycin overproducer strain of S. erythrea can potentially be engineered to overexpress other heterologous biosynthetic pathways and produce heterologous metabolites at high levels. The authors established a powerful technique for transferring large segments of DNA (e.g., pathway gene clusters) into the S. erythrea genome using φC31-based integrating vectors.35

Homologous recombination in S. erythrea has been successfully used in the construction of engineered PKS modules toward synthesis of different erythromycin analogs.
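The modular logic of DEBS described above lends itself to a small data model. The sketch below is illustrative only: the `Module` class, the helper functions, and the carbon bookkeeping are assumptions introduced here, while the domain order of each module follows the labels in Figure 23.2. Each extender unit is counted by the carbons it leaves in the product after decarboxylative condensation (three for methylmalonyl, including the methyl branch).

```python
from dataclasses import dataclass
from typing import Tuple

# Carbons each acyl unit contributes to the final aglycon (CoA and the CO2
# lost on condensation are not counted). Illustrative bookkeeping only.
BACKBONE_CARBONS = {"propionyl": 3, "methylmalonyl": 3}

MINIMAL_PKS = {"KS", "AT", "ACP"}       # domains required for chain elongation
REDUCTIVE = ("KR", "DH", "ER")          # optional beta-carbon processing domains


@dataclass(frozen=True)
class Module:
    name: str
    domains: Tuple[str, ...]
    unit: str                            # acyl unit selected by the AT domain


# Domain composition as labeled in Figure 23.2.
DEBS = (
    Module("Loading",  ("AT", "ACP"),                         "propionyl"),
    Module("Module 1", ("KS", "AT", "KR", "ACP"),             "methylmalonyl"),
    Module("Module 2", ("KS", "AT", "KR", "ACP"),             "methylmalonyl"),
    Module("Module 3", ("KS", "AT", "ACP"),                   "methylmalonyl"),
    Module("Module 4", ("KS", "AT", "DH", "ER", "KR", "ACP"), "methylmalonyl"),
    Module("Module 5", ("KS", "AT", "KR", "ACP"),             "methylmalonyl"),
    Module("Module 6", ("KS", "AT", "KR", "ACP", "TE"),       "methylmalonyl"),
)


def extension_modules(pks):
    """Modules that contain the minimal KS/AT/ACP set and can elongate the chain."""
    return [m for m in pks if MINIMAL_PKS.issubset(m.domains)]


def reductive_domains(module):
    """Optional KR/DH/ER domains present in a module, in the order listed."""
    return [d for d in module.domains if d in REDUCTIVE]


def backbone_carbons(pks):
    """Total carbons delivered to the aglycon by the starter and extender units."""
    return sum(BACKBONE_CARBONS[m.unit] for m in pks)


if __name__ == "__main__":
    for m in DEBS:
        print(m.name, "-".join(m.domains), "reductive:", reductive_domains(m))
    print("extension modules:", len(extension_modules(DEBS)))
    print("aglycon carbons:", backbone_carbons(DEBS))
```

One starter plus six extender units gives 3 + 6 × 3 = 21 carbons, which matches the carbon count of 6-dEB (C21H38O6). Representing the synthase this way also makes clear why swapping a single domain or module is a natural unit of engineering.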
Katz and coworkers replaced the methylmalonyl-specific DEBS AT2 domain with an ethylmalonyl-specific domain found in the niddamycin PKS cluster.43 The resulting host was able to produce the expected 6-desmethyl-6-ethylerythromycin A in addition to erythromycin A when precursor molecules such as diethyl ethylmalonate were included in the growth media. Similar experiments have also been performed with malonyl-CoA-specific acyltransferase domains inserted into various DEBS modules in place of the cognate methylmalonyl-CoA-specific domains.28 When the AT4 domain was engineered to accept malonyl-CoA in S. erythrea, the hydroxyl group normally present at C6 was attached to C7 to produce the analog 7-hydroxy-6-demethyl-6-deoxyerythromycin D.22

Figure 23.2 The polyketide synthase (PKS) assembly line, encoded by eryAI-III producing 6-deoxyerythronolide B (6-dEB) and the remainder of the biosynthetic pathway leading to erythromycin.
[Scheme: Loading (AT-ACP); Module 1 (KS-AT-KR-ACP); Module 2 (KS-AT-KR-ACP); Module 3 (KS-AT-ACP); Module 4 (KS-AT-DH-ER-KR-ACP); Module 5 (KS-AT-KR-ACP); Module 6 (KS-AT-KR-ACP-TE) → 6-deoxyerythronolide B → (eryF, eryB, eryC) → erythromycin D. Erythromycin D → (eryK, [O]) → erythromycin C → (eryG) → erythromycin A; alternatively, erythromycin D → (eryG) → erythromycin B → (eryK, [O]) → erythromycin A.]

Table 23.3 Selected Metabolic Engineering Techniques Utilized to Enhance Erythromycin Biosynthesis

| Metabolic Engineering Technique | References |
|---|---|
| 6-dEB/erythromycin analogs via precursor-directed feeding in natural host | 21–26 |
| 6-dEB/erythromycin analogs via precursor-directed feeding in heterologous hosts | 26–33 |
| Heterologous and natural host strain engineering | 21–27,31,34–36 |
| DEBS reconstitution in heterologous host | 26–30,32,34–37 |
| Optimization of fermentation yields | 24,25,27,33,34 |
| Other: creating hybridized pathways, full erythromycin reconstitution | 37,38 |

The primer units of macrolides have been modified using genetic approaches as well. Swapping loading modules with alternative substrate specificities can lead to incorporation of new starter units. Leadlay and coworkers replaced the loading didomain of DEBS1 with the broadly specific loading domain from the avermectin PKS found in Streptomyces avermitilis.44 In addition to propionate, the hybrid PKS inserted a large assortment of α-branched starter units into 6-dEB. Downstream tailoring enzymes were able to transform the unnatural aglycons into the corresponding erythromycin A analogs.

While continuing to be an important industrial organism for large-scale fermentative production of erythromycin, S. erythrea has also been engineered for the bioconversion of 6-dEB analogs into erythromycin analogs. As will be discussed later, a large number of 6-dEB analogs have been generated from the rational and combinatorial manipulation of the DEBS PKS in heterologous hosts. These foreign hosts, however, do not produce the bioactive erythromycins, which require regiospecific oxidation and attachment of deoxysugars to the 6-dEB aglycon. These tailoring pathways reside in the wild type S. erythrea strain. Carreras and coworkers constructed a mutated strain of S. erythrea that cannot perform de novo 6-dEB synthesis due to inactivation of the PKS assembly line. The mutant strain is able to biologically convert 6-dEB analogs to the corresponding erythromycin analogs.24 The same group optimized the fermentation conditions for the mutant strain to produce 15-methyl erythromycin A from the 15-methyl-6-dEB aglycon on a large scale.24 A seven-fold increase in bioconversion was detected when F1 medium was chosen over R5 medium.
Optimization of pH and dissolved oxygen content further increased the yield and stability of the desired product.

There are two competing pathways present in S. erythrea that can transform erythromycin D into the desirable erythromycin A, both requiring the actions of EryK and EryG (see Figure 23.2). EryK displays a 1000-fold preference (kcat/Km) for erythromycin D over erythromycin B.45 Therefore, it is more desirable to direct the pathway toward hydroxylation of erythromycin D, followed by methylation of erythromycin C. Desai and coworkers at Kosan Biosciences overexpressed eryK to direct turnover of erythromycin D to erythromycin C and increased the yield of erythromycin A.23 The best reported yield using the engineered strain was 3.5 g/L after increasing both the precursor feed rate and dissolved oxygen content during fermentation.

Megalomicin, produced by Micromonospora megalomicea, is a natural analog of erythromycin that contains the megosamine sugar moiety attached to the C6 hydroxyl group and displays diverse antibiotic, antiparasitic, and antiviral properties. Both the megalomicin and erythromycin PKS synthesize the common aglycon 6-dEB. Volchegursky and colleagues demonstrated biosynthesis of megalomicin in S. erythrea by expressing the megosamine biosynthetic pathway.38 The same techniques used to generate erythromycin analogs can therefore be utilized in S. erythrea to create a plethora of megalomicin analogs with improved pharmacological properties.

23.2.2.2 Heterologous Biosynthesis

Engineered biosynthesis of 6-dEB in heterologous hosts has played an important role in elucidating the modular properties of DEBS PKS and other macrolide biosynthetic pathways.40 Establishing heterologous platforms in S. coelicolor 46 and E. coli 47 has enabled genetic engineering approaches toward combinatorial biosynthesis of 6-dEB analogs. S. coelicolor was the first heterologous host, developed by Chaitan Khosla’s group.46 Kao and coworkers cloned the three DEBS megasynthases into a SCP2*-based S. coelicolor vector in a multicistronic fashion, all under the control of the native ActI promoter. Transformation into the S. coelicolor strain CH999 resulted in the biosynthesis of the 6-dEB aglycon at ~20 mg/L. The host/vector approach was versatile and amenable to recombinant DNA manipulations. In the next five years, hybrid DEBS PKSs consisting of shuffled domains and rearranged modules were constructed using this approach and afforded numerous 6-dEB analogs.48 Recently, Desai and coworkers subjected S. coelicolor to multiple rounds of random mutagenesis and screened for improved 6-dEB production.27 Over 3 g/L of 6-dEB was generated from the top mutant strain using this classic strain improvement approach.

E. coli is intrinsically a clean host without any PKS genes, thus providing an excellent opportunity for production of polyketide metabolites. Furthermore, developing fermentation processes for E. coli is considerably easier and more economical than for most natural biological sources. Pfeifer and coworkers succeeded in the biosynthesis of 6-dEB in E. coli.47 The engineered strain BAP1, which is derived from the commercial BL21(DE3) E. coli strain, contains several important modifications. The genes encoding the Bacillus subtilis phosphopantetheinyl transferase (sfp) and the E. coli propionyl-CoA ligase (prpE) were placed under the control of a T7 promoter and integrated into the prp operon on the E. coli chromosome. The genetic manipulation not only introduced essential genes for PKS function and precursor biosynthesis, but also inactivated the propionate catabolic genes prpRBCD at the same time. As a result, the BAP1 strain accumulated propionyl-CoA upon feeding of propionate. The metabolically engineered strain, when transformed with plasmid-borne copies of the DEBS PKS genes, was able to synthesize 6-dEB at a titer of ~60 mg/L under high cell-density fermentation conditions.

Murli and coworkers inspected three independent pathways for improved accumulation of methylmalonyl-CoA, the building block of 6-dEB, in E. coli and concluded that the S. coelicolor propionyl-CoA carboxylase (PCC) pathway was the best.36 Genes encoding the two subunits (accA1 and pccB) of PCC from S. coelicolor were placed under the control of the T7 promoter and integrated into the ygfG (methylmalonyl-CoA decarboxylase) locus of E. coli to yield the strain K207-3. Lau and coworkers adjusted several process variables to optimize polyketide production using the K207-3 strain.34 Using a 5-L bioreactor, up to 1.1 g/L of 6-dEB was produced in a high cell density fed-batch process. Doubling the phosphate concentration and minimizing the amount of extracellular ammonia produced helped to maintain a high cell density, thereby increasing the total yield of 6-dEB.

Kennedy and coworkers furnished E. coli with the ability to synthesize butyryl-CoA toward the biosynthesis of 15-methyl-6-dEB.31 The DEBS loading module has the ability to accept butyryl-CoA, but butyryl-CoA is present in E. coli at low levels. E. coli utilizes the ato pathway for butyryl-CoA synthesis when butyrate is supplied exogenously. Overexpression of the ato pathway and the DEBS PKS in engineered strain K214-037, along with supplementation of 5 mM butyrate, resulted in the biosynthesis of 1 mg/L of the 15-methyl-6-dEB analog.

Gramajo and coworkers further engineered the E. coli strain to produce the fully glycosylated erythromycin C.37 This is an important advance in demonstrating the full utility of E. coli as a heterologous host for producing bioactive natural products. Seventeen heterologous genes were reconstituted, including the mycarose and desosamine biosynthetic pathways, eryK, and the self-resistance rRNA methyltransferase ermE. Overexpression of ErmE is necessary because E. coli is sensitive to the fully decorated erythromycin. Coexpression of the above genes, along with the DEBS PKS, led to the efficient biosynthesis of erythromycin C (0.4 mg/L) and D (0.5 mg/L).37

23.2.2.3 Precursor Directed Biosynthesis

Precursor directed biosynthesis is a synergistic approach between synthesis and biosynthesis to produce 6-dEB analogs. This method was first developed by Jacobsen for S. coelicolor,26 and has subsequently been expanded to S. erythrea and E. coli.
Precursor directed erythromycin biosynthesis utilizes a mutant DEBS PKS that contains an inactivating mutation in the KS of module 1 (KS1, Figure 23.2).26 This mutation (C729 to A729) blocks the first condensation step between propionate and methylmalonate, so the mutant PKS is unable to synthesize 6-dEB.49 The remaining assembly line is unaltered and remains functional. Biosynthesis of 6-dEB can be restored by an exogenous supply of the natural diketide, which is accepted by the KS domain of module 2 (KS2), thereby bypassing the need for KS1. KS2 of the DEBS PKS accepts a variety of diketide compounds (supplied as thioesters, such as the cell-permeable N-acetylcysteamine thioesters, acyl-SNAC) differing from
the natural diketide intermediate. Downstream catalytic domains are then able to incorporate the synthetic precursors and further elaborate them into 6-dEB derivatives. The 6-dEB analogs can then be converted into bioactive erythromycin derivatives using S. erythraea. Using this approach, Jacobsen and Khosla were able to synthesize a large array of erythromycin analogs that were inaccessible through semisynthesis alone.26 The same group generated novel 12-desmethyl-12-ethyl-6-dEB in S. coelicolor using the KS1 mutant and supplying the appropriate diketide precursor.50 The unnatural macrolide is a suitable substrate for the post-PKS tailoring enzymes expressed in the engineered strain S. erythraea A34 and is transformed into the corresponding erythromycin C analog.

Desai and coworkers used a similar two-step fermentation process to yield fluorinated erythromycin compounds.23 Precursor directed biosynthesis with the racemic precursor (2R*,3S*)-5-fluoro-3-hydroxy-2-methylpentanoate N-propionylcysteamine thioester afforded a C15 fluorinated 6-dEB analog in the engineered S. coelicolor B9 strain. The fluorinated aglycon was then fed to S. erythraea cultures to yield fully derivatized, fluorinated erythromycin analogs. The fluorine is an orthogonal reactive handle for additional semisynthesis. The fluorinated derivative also binds the ribosome at two sites, rather than at just one, thereby increasing antibiotic potency.

Precursor directed biosynthesis combined with semisynthesis yielded erythromycin analogs with benzyl-containing side chains that have substantially improved antibiotic activity.21 The benzylamide version of erythromycin has comparable, and in some cases higher, antibiotic activity than erythromycin A when tested against infections with S. aureus, S. pneumoniae, and H. influenzae.

Using a mutant S.
erythraea strain that contains the KS1 null mutation, Frykman and coworkers produced C13-substituted erythromycin analogs using precursor directed biosynthesis.25 The highest yield reported was 25 mg/L of 15-methyl-erythromycin in this mutated overproducing strain, which normally affords 6.7 g/L of erythromycin. The authors noted a linear dependence of erythromycin analog titer on the amount of diketide precursor, until a plateau is reached upon precursor saturation. The overproducing strain apparently consumes precursors at a much higher rate than the wild type strain, yet the polyketide production rate is lower, likely due to the rapid degradation of the thioester by an endogenous thioesterase.

Similar diketide degradation problems were observed by Leaf and coworkers in S. coelicolor.33 Their study confirmed that precursor degradation in S. coelicolor was due to activity outside of the DEBS PKS. Desai and colleagues addressed precursor degradation during a 15-methyl-6-dEB producing S. coelicolor fermentation by adding a hydrophobic adsorbent resin (XAD-16HP).27 The resin-bound precursors were continuously released over time to supply the host with intact precursors, hence minimizing undesirable degradation. Additionally, the resin captured the secreted 6-dEB analog and prevented product degradation during prolonged fermentation. The final reported yield of 15-methyl-6-dEB was 1.3 g/L.

An alternative to diketide feeding of DEBS KS1 null mutants is monoketide feeding of a mutant DEBS with an inactive loading domain.29 Monoketide thioesters are less expensive than diketides and can still present a wide spectrum of structural diversity. The mutant DEBS PKS is incapable of priming the starter unit, but retains function in the remaining domains, and was tested in both E. coli and S. coelicolor for substrate acceptance and analog production. When fed with 3.8 mM of butyryl-SNAC, titers of up to 21 mg/L of 15-methyl-6-dEB were extracted from S. coelicolor.
Other monoketide acyl-SNAC thioesters were fed to an engineered E. coli strain, producing detectable amounts of ethyl, chloromethyl, bromomethyl, methylthio, and methoxy substituted 6-dEB analogs at the C15 position. Modifications to carbon 14 were similarly introduced, and the new products 14-methylthio-6-dEB, 14-nor-6-dEB, and 14-methoxy-6-dEB were isolated. The authors also found methyl thioglycolate to be a substantially cheaper acyl carrier than N-acetylcysteamine (NAC). In another example, the analog 6-deoxy-13-cyclopropyl-erythromycin B was formed in a 6-deoxyerythromycin producing S. erythraea mutant when cyclopropane carboxylic acid was supplemented in the growth medium.51
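The dose-response behavior reported by Frykman and coworkers in this section (analog titer rising linearly with the amount of diketide precursor until a plateau is reached at saturation) can be sketched as a piecewise-linear function. The slope and saturation values below are hypothetical placeholders chosen only for illustration; they are not taken from the cited study, and the sketch ignores the thioester degradation discussed above.

```python
def analog_titer(precursor_mM, slope, saturation_mM):
    """Piecewise-linear model: titer rises linearly with precursor, then plateaus.

    precursor_mM  -- diketide precursor fed to the culture (mM)
    slope         -- titer gained per mM of precursor (mg/L per mM), hypothetical
    saturation_mM -- precursor level above which titer stops rising, hypothetical
    """
    if precursor_mM < 0:
        raise ValueError("precursor concentration must be non-negative")
    # Below saturation the response is linear; above it the titer is capped.
    return slope * min(precursor_mM, saturation_mM)


# Hypothetical illustration values, not measurements from the cited work.
SLOPE = 5.0        # mg/L of analog per mM of precursor
SATURATION = 4.0   # mM

for dose in (0.0, 1.0, 2.0, 4.0, 8.0):
    titer = analog_titer(dose, SLOPE, SATURATION)
    print(f"{dose:4.1f} mM precursor -> {titer:5.1f} mg/L analog")
```

In such a model, feeding precursor beyond the saturation point adds cost without adding product, which is one way to read the interest in cheaper monoketide precursors and in resin-based slow release.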


23.2.3 Artemisinin

Epidemiologically, malaria remains the most common infectious disease worldwide. Each year, an estimated 300–500 million clinical cases of malaria arise.64 Malaria is caused by four species of Plasmodium and is transmitted to humans via the bites of mosquitoes of the genus Anopheles. Of the four malarial species (Plasmodium falciparum, Plasmodium vivax, Plasmodium ovale, and Plasmodium malariae), P. falciparum and P. vivax have shown resistance to antimalarial therapeutics. P. vivax has shown resistance to chloroquine and/or primaquine.65,66 P. falciparum has developed resistance to nearly all antimalarials presently in use, making this species the most deadly, with fatality occurring within a matter of hours after infection. With the advent of resistance to classical drug treatments, new antimalarial drugs have been investigated, with artemisinin emerging as a potent treatment for severe malaria (Table 23.4). Artemisinin has shown rapid parasite clearance and faster fever reduction times than its classical counterpart, the quinines.

Artemisinin is a sesquiterpenoid isolated from the plant Artemisia annua L. The concentration of artemisinin in A. annua is low, in the range of 0.01–0.8% on a dry weight basis.67 It is also possible to chemically synthesize artemisinin, but the process is not economically feasible on a large scale due to low yields and the complexity of the synthetic transformations.68 Therefore, metabolic engineering could afford an economical, industrial-scale biological route to artemisinin for widespread clinical use. In the currently proposed artemisinin biosynthetic pathway, farnesyl diphosphate (FDP) is first synthesized from isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP), which are derived from the mevalonate isoprenoid pathway.
Subsequently, biosynthesis proceeds through the cyclization of FDP by amorphadiene synthase (ADS) to yield the amorpha-4,11-diene precursor (Figure 23.3). It is known that artemisinic acid is the next intermediate. However, contradictory theories have been proposed as to the steps that lead from artemisinic acid to the final product, artemisinin.5,52,54 Bharel proposed pathway 2 in Figure 23.3 by chemically synthesizing the artemisinin precursor arteannuin B and demonstrating its enzymatic conversion to dihydroarteannuin B.53 Wallaart formulated pathway 1 shown in Figure 23.3.52 They isolated dihydroartemisinic acid and showed its subsequent photochemical conversion to dihydroartemisinic acid hydroperoxide. This demonstrates that pathway 1 can proceed in a nonenzymatic fashion. Additional insights into the artemisinin biosynthetic pathway outlined in Figure 23.3 will enable a more comprehensive metabolic engineering effort.

Table 23.4 Selected Metabolic Engineering Techniques Utilized to Enhance Artemisinin Biosynthesis

Metabolic Engineering Technique                               References
Studying intermediates for rational engineering approaches   53,54
Limited pathway reconstitution in E. coli                     55–57
Placement of biosynthetic genes under strong promoters        58,59
Optimization of fermentation yields                           60,61
Bioconversion of artemisinin for novel analogs                62–64

[Figure 23.3: chemical structures omitted. Compounds shown: IPP, GPP, FDP, amorpha-4,11-diene (formed by amorpha-4,11-diene synthase), artemisinic acid, dihydroartemisinic acid, dihydroartemisinic acid hydroperoxide, arteannuin B, dihydroarteannuin B, and artemisinin. The two routes from artemisinic acid are labeled 1 and 2.]

Figure 23.3 Proposed biosynthetic pathway of artemisinin in Artemisia annua L. The conversion from artemisinic acid to artemisinin has not yet been elucidated. It is anticipated to proceed through either pathway 1 or pathway 2.

23.2.3.1 Limited Pathway Reconstitution

One possible strategy for increasing artemisinin production is to increase the amount of a rate-limiting precursor such as amorpha-4,11-diene. Among the precursors found in artemisinin biosynthesis in A. annua, amorpha-4,11-diene is present in very small amounts compared to other precursors in the pathway.55 It is therefore postulated that the conversion of FDP to amorpha-4,11-diene is the rate-limiting step. Upregulation of the enzyme responsible for this conversion, amorpha-4,11-diene synthase (ADS), would increase product yield. Mercke isolated the ADS gene from A. annua, cloned it into E. coli, and elucidated the reaction mechanism for this key enzyme.55 Wallaart demonstrated the functional expression of recombinant ADS in a plant.54 They utilized a plant that does not produce amorphadiene, Nicotiana tabacum L., for transformation to ensure a clean background free of amorpha-4,11-diene. The ADS gene was placed into a plant expression cassette for transformation into the leaf discs of N. tabacum. The isolation of amorpha-4,11-diene from the N. tabacum plant proved that amorpha-4,11-diene synthase could be overexpressed in its active form in heterologous plants. Expression of the active enzyme resulted in amorpha-4,11-diene levels ranging from 0.2 to 1.7 ng per gram fresh weight of leaf tissue.

Martin and coworkers from Keasling’s lab focused on reconstruction of the artemisinin pathway in E. coli for artemisinin production.56 E. coli employs the 2-C-methyl-D-erythritol 4-phosphate/deoxy-xylulose phosphate (DXP) pathway for the synthesis of the 5-carbon isoprenoid precursors IPP and DMAPP, and the levels of these building blocks in the native host are insufficient to meet both cellular requirements and overproduction of a foreign terpenoid. To supply abundant amounts of IPP and DMAPP, they engineered into E. coli the mevalonate isoprenoid pathway from S. cerevisiae. This group also constructed a synthetic codon-optimized ADS gene for high expression in E. coli. The heterologous mevalonate pathway resulted in high amorphadiene production (~120 mg/L) in E. coli when the culture was supplemented with 0.8% glycerol. This important advance lays the groundwork for complete heterologous reconstitution of artemisinin biosynthesis and will provide an affordable route to mass-produce this important therapeutic.
Alternatively, the microbial host can be engineered to produce artemisinic acid, which can be converted to artemisinin in high yields.
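The rate-limiting-step argument in this section can be illustrated with a deliberately simple toy model in which the steady-state flux through a linear pathway equals the smallest step capacity. Everything numeric below is a hypothetical assumption (arbitrary units, an invented 10-fold overexpression factor), and real pathway kinetics are more complex than a minimum over capacities; the sketch only shows why relieving one bottleneck exposes the next.

```python
def limiting_step(capacities):
    """Return (step name, flux) for a linear pathway.

    In this toy model the steady-state flux equals the smallest step capacity.
    """
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


# Hypothetical capacities in arbitrary flux units (not measured values).
capacities = {
    "IPP/DMAPP supply": 10.0,
    "FDP synthase": 8.0,
    "ADS": 2.0,
}

before = limiting_step(capacities)

# Simulate ADS overexpression as a hypothetical 10-fold capacity increase.
boosted = dict(capacities, ADS=capacities["ADS"] * 10)
after = limiting_step(boosted)

print("before ADS overexpression:", before)
print("after ADS overexpression: ", after)
```

In this sketch, raising ADS capacity increases flux only until an upstream step becomes limiting, which is consistent with the text pairing ADS expression with an increased IPP and DMAPP supply from the heterologous mevalonate pathway.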


23.2.3.2 Placing Biosynthetic Genes under Strong Promoters

In concurrence with pathway reconstruction in a heterologous host, metabolic engineering has also been performed in the natural host A. annua to increase the copy number of the artemisinin biosynthetic genes. Chen et al. placed the FDP synthase gene under the CaMV 35S promoter.57 Subsequently, the recombinant gene was introduced into A. annua using the workhorse organism Agrobacterium tumefaciens. The transgenic plant accumulated two to three times more artemisinin in plant extracts than the untransformed plant.

Instead of directly affecting the artemisinin biosynthetic pathway, Sa and coworkers sought to determine how cytokinin production influenced plant growth. Cytokinins are a group of plant hormones that affect plant development. Serendipitously, they discovered that overexpression of the ipt gene using the CaMV 35S promoter resulted in a 30–70% increase in artemisinin production in A. annua.58 The ipt gene from T-DNA encodes the enzyme responsible for the rate-limiting step in cytokinin biosynthesis. Sa and coworkers concluded that metabolic engineering outside the direct biosynthetic pathway could be advantageous for artemisinin production in the natural host. Increasing chlorophyll content, and thus plant metabolism, produced a corresponding increase in artemisinin production.58

23.2.3.3 Optimization of Fermentation Yields

The ultimate goal of metabolic engineering of natural products is large-scale commercial production. Currently, artemisinin is harvested from the plant, and this process is extremely inefficient due to the aforementioned low yield. Several studies have optimized the cultivation and fermentation of A. annua. Initially, Singh studied different strains of A. annua from different parts of the world and determined how leaf yield, artemisinin content, and artemisinin yield varied at different stages of plant growth.59 Their research demonstrated that maximum artemisinin yield occurs at 50% flowering. Souret and coworkers studied two types of bioreactors, a mist reactor and a bubble column reactor, to determine the optimal conditions for artemisinin production in fast-growing hairy roots.60

23.2.3.4 Bioconversion of Artemisinin for Novel Analogs

Although artemisinin is a potent treatment against malaria, its usefulness is restricted by its low solubility in water, as well as its toxicity. To combat these drawbacks, Zhan and coworkers performed microbial transformation of artemisinin with Mucor polymorphosporus and Aspergillus niger in hopes of creating novel antimalarial compounds.61 Biotransformation by the two species resulted in the isolation of three new compounds. However, the antimalarial effectiveness of the three new compounds has yet to be characterized. De Medeiros demonstrated that four new artemisinin analogs can be obtained in up to 45% overall yield during the bioconversion of 10-deoxoartemisinin by Mucor ramannianus.62 One of the four compounds possesses a hydroxyl group at the C7 position, which quantitative structure-activity relationship (QSAR) studies have shown to increase antimalarial activity and water solubility. Similarly, by using Cunninghamella elegans for the biological conversion of artemisinin, Parshikov accomplished a 78.6% conversion of artemisinin to 7β-hydroxyartemisinin.63 7β-Hydroxyartemisinin is an important compound for further derivatization at the 7β position of artemisinin to yield analogs with increased antimalarial potency and more favorable pharmacological properties.
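As a worked example of the bioconversion figure quoted above, the sketch below converts a 78.6% conversion into grams of 7β-hydroxyartemisinin per gram of artemisinin. It assumes that the 78.6% is a molar conversion (the text does not say whether it is molar or by mass) and uses the molecular formulas C15H22O5 for artemisinin and C15H22O6 for the hydroxylated product with standard atomic masses; the function names are illustrative only.

```python
# Standard atomic masses in g/mol.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

ARTEMISININ = {"C": 15, "H": 22, "O": 5}
HYDROXYARTEMISININ = {"C": 15, "H": 22, "O": 6}  # one oxygen added at C7

# 78.6% conversion from the text, ASSUMED here to be on a molar basis.
MOLAR_CONVERSION = 0.786


def molar_mass(formula):
    """Molar mass (g/mol) of a compound given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def product_mass(substrate_g, conversion=MOLAR_CONVERSION):
    """Grams of 7beta-hydroxyartemisinin formed from substrate_g of artemisinin."""
    moles_substrate = substrate_g / molar_mass(ARTEMISININ)
    return moles_substrate * conversion * molar_mass(HYDROXYARTEMISININ)


print(f"artemisinin:            {molar_mass(ARTEMISININ):.2f} g/mol")
print(f"7b-hydroxyartemisinin:  {molar_mass(HYDROXYARTEMISININ):.2f} g/mol")
print(f"product per g substrate: {product_mass(1.0):.3f} g")
```

Because the product carries one additional oxygen atom, the mass of product per gram of substrate is slightly higher than the molar conversion alone would suggest.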

23.3 Anticancer Drugs

23.3.1 Daunorubicin (DNR) and Doxorubicin (DXR)

Daunorubicin (DNR, daunomycin) and doxorubicin (DXR, adriamycin) are among the most important antitumor compounds on the market today for treating solid tumors (Table 23.5). The two compounds are classified as anthracycline antibiotics, named after their tetracyclic aglycons. The rubicins intercalate into DNA and interfere with functions such as DNA and RNA synthesis, thereby conferring mutagenic and deleterious effects that result in tumor cell apoptosis. Although the two compounds differ by a single hydroxyl group at C14 (Figure 23.4), they are appreciably different in terms of therapeutic value. DNR is active against acute leukemias, whereas DXR is effective toward sarcomas, Hodgkin’s disease and non-Hodgkin’s lymphomas, acute leukemias, neuroblastoma, and cancers of the breast, genitourinary tract, thyroid, lung, and stomach.4 Unfortunately, both drugs have severe cardiomyopathic side effects that limit their effectiveness and maximum dosage. Therefore, there have been intense efforts to produce derivatives of these compounds that exhibit low cardiac toxicity without compromising high antitumor activity.

Table 23.5 Selected Metabolic Engineering Techniques Utilized to Enhance Doxorubicin/Daunorubicin Biosynthesis

Metabolic Engineering Technique                           References
DNR overproduction in engineered natural strain           70,71
DXR overproduction in engineered natural strain           72,73
Daunosamine sugar reconstitution in heterologous host     74
Biosynthesis of epirubicin from DNR/DXR host strain       75

Isolated in 1969 from the high producing strain Streptomyces peucetius subsp. caesius ATCC 27952, DXR was produced at an annual rate of 225 kg by the turn of the century.75 The overproducing strain was derived by mutagenesis of wild type S. peucetius, and differs from the parent culture in vegetative color and aerial mycelia.76 S. peucetius subsp. caesius is the only known strain that naturally produces DXR, but several other strains produce DNR, the immediate precursor of DXR (DXR is formed by oxidation of DNR at C14).

Both DNR and DXR belong to the polyketide family of natural products and originate from the successive condensation of one propionate starter unit and nine malonyl-CoA extender units (Figure 23.4). The carbon skeleton of DXR and DNR is assembled by a bacterial type II polyketide synthase. Unlike type I PKSs such as DEBS, in which the catalytic domains are arranged in an assembly line fashion, the catalytic domains of type II PKSs are expressed as separate proteins. The minimal PKS components (KS, chain length factor (CLF), and ACP) repeatedly associate and dissociate to catalyze the chain elongation steps.77 The products of the PKS genes dpsABCDEFGY assemble the carbon skeleton of the first intermediate, 12-deoxyaklanonic acid. Cyclization of the fourth ring yields ε-rhodomycinone (RHO). Glycosylation of RHO with the monosaccharide daunosamine affords rhodomycin D (RHO-D).
The nitrogen-containing daunosamine sugar is made by a set of dedicated enzymes encoded in the pathway. For details of the biosynthetic pathway, see the review on DNR and DXR biosynthesis by Hutchinson.78

[Figure 23.4: chemical structures omitted. Compounds and gene labels shown: dpsABCDEFGY, 12-deoxyaklanonic acid, aklanonic acid (R=H), aklanonic acid methyl ester (R=CH3), dnrCG, dnrDEF, ε-rhodomycinone (RHO), TDP-L-daunosamine, rhodomycin D (RHO-D), dnrPK, doxA, daunorubicin (DNR), doxorubicin (DXR), dnrU, (13S)-13-dihydrodaunorubicin, dnrHX, and baumycin-like glycosides.]

Figure 23.4 Proposed biosynthetic pathway for daunorubicin and doxorubicin.

The biosynthetic yields of both DNR and DXR have been significantly enhanced by genetic engineering of the parental strains. The focus of metabolic engineering efforts has been on the natural producers, since these organisms are genetically amenable and suitable as industrial microorganisms.79 Much of the work on the subject has been pioneered by Hutchinson’s group, including the initial characterization of the DXR pathway gene cluster.80 An excellent review of DXR genetic engineering has been published by Hutchinson and Colombo.81

Overexpression of the DNR biosynthetic genes in S. peucetius has led to increased biosynthesis of DNR and selected intermediates.70 The dnrO gene encodes a DNA-binding protein that plays a crucial role in the negative regulation of the dnrN pseudo response regulator gene.82 Sequence comparison shows that DnrO is homologous to other transcription repressors, including TetR from E. coli. The regulation cascade continues with DnrN activating transcription of the dnrI gene. DnrI is a Streptomyces Antibiotic Regulatory Protein (SARP) that specifically activates the transcription of the DNR biosynthetic genes. Homologs of DnrI are found in many polyketide gene clusters, including the well-studied ActII-ORF4 activator present in the actinorhodin pathway.83 It was first shown by Stutzman-Engwall and coworkers that increasing the copy number of dnrI, and thereby the expression of DnrI, can result in elevated levels of DNR production.70 When plasmid-borne dnrI was introduced into the wild type host ATCC 29050, a 3- to 40-fold increase in antibiotic production was observed, depending upon the copy number of the plasmid. Inactivation of dnrI eliminated DNR and RHO production. These results show that overexpression of a SARP can lead to a direct increase in the yield of the target compound.

Both DNR and DXR can be hyperglycosylated to yield baumycin-like glycosides, byproducts of a competing pathway that lower the yield of the two target compounds. The tandem glycosyltransferases that perform the hyperglycosylation have been identified as DnrH and DnrX. Inactivation of the dnrH gene afforded a mutant S. peucetius strain with 8.5-fold higher DNR production.69 Further work to inactivate dnrX resulted in a three-fold increase of DXR along with the loss of two unknown acid-sensitive compounds.71 A different competing pathway is catalyzed by the enzyme DnrU, a KR that acts on the C13 carbonyl of DNR to afford the undesired byproduct (13S)-13-dihydrodaunorubicin.72 As expected, DXR yield increased in the double mutant dnrX– dnrU– (94 µg/mL) and in the triple mutant dnrX– dnrU– dnrH– (3.4- to 3.8-fold more DXR than the double mutant).72

An alternative approach toward DNR overproduction is to overcome bottlenecks, or rate-limiting steps, in the biosynthetic pathway through overexpression of a specific enzyme. The rate-limiting step in TDP-daunosamine biosynthesis is likely catalyzed by the enzyme encoded by dnmT. Inactivation of dnmT leads to accumulation of RHO (Figure 23.4).69 Overexpression of DnmT in the wild type DNR producing strain S. peucetius, as well as in dnrH mutant strains, resulted in a major decrease in RHO accumulation and a corresponding increase in DNR production.69 DoxA is a cytochrome P450 hydroxylase that catalyzes three oxidation steps in the DNR/DXR biosynthetic pathway, including the terminal oxidation that converts DNR directly to DXR. The DXR titer was increased by a factor of 2.3 upon combined overexpression of doxA and dnrV.
The role of DnrV in this study could not be determined, only that it possibly acts cooperatively during DoxA catalysis.

The sugar components are important for the biological activities of polyketides. The mono-, di-, and oligosaccharides are generally involved in molecular recognition of the cellular target and are therefore candidates for combinatorial biosynthesis. The daunosamine sugar attached to the aglycon of DXR and DNR is essential for the antitumor properties of the compounds.78 Removing the daunosamine from either anthracycline completely inactivates these compounds.78 Modifications of deoxysugars attached to natural product aglycons may give rise to analogs with novel bioactivities. To reconstitute the TDP-daunosamine biosynthetic pathway in a heterologous host, Olano cloned the genes dnmJLMVUZTQS on a shuttle plasmid and introduced the vector into S. lividans. Biosynthesis of daunosamine was confirmed through the bioconversion of exogenously supplied RHO into RHO-D.73 Complete reconstitution of the deoxysugar pathway will open the door to combinatorial glyco-randomization84 approaches for generating DXR and DNR analogs.

Epirubicin is a semisynthetic analog of DXR with the added advantages of higher antitumor activity and lower cardiotoxicity. It is also the starting compound for the chemosynthesis of 4-iodoxorubicin, an anthracycline derivative that may be helpful in treating prion-associated degenerative disorders.85 Epirubicin has a 4-epidaunosamine sugar moiety in place of daunosamine. Through construction of a hybrid deoxysugar pathway in S. peucetius, Madduri and coworkers were successful in biosynthesizing epirubicin. The dnmV gene, which encodes a TDP-4-ketohexulose reductase, was inactivated in the host strain. Additionally, genes encoding 4-ketohexose reductases from the avermectin and erythromycin pathways, which reduce the 4-keto functionality with the opposite stereospecificity, were introduced into the DNR producer.
The hybrid sugar biosynthetic pathways correctly generated the desired 4-epidaunosamine, and the downstream DNR enzymes were able to utilize the unnatural sugar and synthesize the targeted 4′-epirubicin.74

The starter unit specificity of the DNR pathway has been explored for the production of DNR variants that contain alternative primer units. It is proposed that the gene product of dpsC confers starter unit fidelity upon the DNR PKS in Streptomyces sp. C5.86 When dpsC is inactivated, the PKS behaves rather promiscuously and selects both acetyl-CoA and propionyl-CoA as starter units. A double mutant in dpsC and dpsD accumulates feudomycin D and feudomycinone C, both synthesized from acetate starter units.87 Heterologous expression of the genes dpsABCDEFG and dauGI in S. lividans TK24 produced exclusively aklanonic acid, whereas eliminating dpsCD produced both aklanonic acid and desmethylaklanonic acid in a 2:3 ratio, further highlighting the importance of DpsC in maintaining starter unit fidelity.86 In a different strategy, rational recombination of the pradimicin minimal PKS and the R1128 initiation module afforded aklanonic acid-like compounds that are primed by medium-length alkyl functional groups in S. coelicolor.88
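The regulatory cascade described earlier in this section (DnrO repressing dnrN, DnrN activating dnrI, and DnrI activating the DNR biosynthetic genes) can be summarized as a signed path. The sketch below is an illustrative simplification rather than a kinetic model: each gene and its product are collapsed into one node label, and multiplying edge signs along the path is an assumption of this sketch, not a method taken from the cited studies.

```python
# Signed regulatory edges as described in the text:
# -1 = repression, +1 = activation.
EDGES = {
    ("DnrO", "DnrN"): -1,
    ("DnrN", "DnrI"): +1,
    ("DnrI", "DNR biosynthetic genes"): +1,
}


def net_effect(path, edges=EDGES):
    """Multiply edge signs along a path.

    Returns +1 for net activation of the last node by the first, -1 for net repression.
    """
    sign = 1
    for source, target in zip(path, path[1:]):
        sign *= edges[(source, target)]
    return sign


full_cascade = ["DnrO", "DnrN", "DnrI", "DNR biosynthetic genes"]

print("DnrO on DNR genes:", net_effect(full_cascade))
print("DnrN on DNR genes:", net_effect(full_cascade[1:]))
print("DnrI on DNR genes:", net_effect(full_cascade[2:]))
```

Read this way, raising DnrI levels acts through a purely activating edge, which matches the reported increase in DNR production when dnrI copy number was raised, while DnrO sits at the top of the cascade with a net negative sign.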

23.3.2 Paclitaxel

Paclitaxel (commercial name Taxol®) was first isolated in 1971 by Wani and coworkers from the bark of the Western yew tree found in the Pacific Northwest region of the United States.93 It is useful in fighting cancers of the breast, ovaries, and colon, as well as small cell lung cancer (Table 23.6).4 Paclitaxel inhibits mitosis by binding specifically to the β-tubulin subunit of microtubules, antagonizing microtubule disassembly.4 Microtubules are crucial in preparing the spindle apparatus and play a key role in cytoskeletal development. Aggregation of microtubules into bundles and the abnormalities that follow end in cell cycle arrest at mitosis. The antitumor potency of this drug stems in part from the N-benzoyl-3-phenylisoserine side chain linked to carbon-13 of the taxane ring.4

Like all isoprenoids, paclitaxel is built from the five-carbon precursors IPP and DMAPP. The diterpene scaffold is derived from geranylgeranyl diphosphate (GGPP), which is assembled from one molecule of DMAPP and three molecules of IPP by GGPP synthase. In the first committed step toward Taxol biosynthesis, taxadiene synthase (TXS) cyclizes GGPP into the key intermediate, taxadiene. As can be seen from Figure 23.5, multiple hydroxylation reactions catalyzed by cytochrome P450 oxygenases94 and acylation reactions catalyzed by acyltransferases (ATs) are required to afford the final product. At this point, the Taxol biosynthetic pathway has not been completely elucidated.
Croteau’s group has cloned, expressed, and assigned putative roles for several key enzymes in the Taxol biosynthetic pathway, including TXS,95 taxadien-5α-ol-O-acetyltransferase (TAT),96 taxadien-5α-yl acetate 10β-hydroxylase (THY10b),97 10-deacetylbaccatin III-10β-O-acyltransferase,98 taxane 2α-benzoyltransferase,99 taxoid 2α-hydroxylase,94 and taxoid 7β-hydroxylase.100 Complete chemical synthesis of Taxol has been accomplished,101,102 but the multi-step process is not commercially feasible. Semisynthesis from the key intermediate baccatin III is the current commercial approach for manufacturing Taxol. The major bottleneck of Taxol/baccatin III biosynthesis, especially for industrial development, lies in the low quantities extracted from its natural plant producer, T. brevifolia, which is genetically intractable. Therefore, most of the metabolic engineering efforts with paclitaxel focus on reconstituting the pathway in a heterologous host. Ideally, the heterologous host is capable of expressing and correctly folding all the biosynthetic enzymes, produces large amounts of the compound, and affords at least one of the valuable precursors (such as baccatin III) suitable for further semisynthesis to the final product.
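The isoprenoid arithmetic behind these precursors is simple enough to check directly: each prenyl diphosphate is built from one DMAPP starter plus some number of IPP units, all of which are five-carbon building blocks. The short sketch below applies that rule to GPP, FDP, and GGPP; the function and dictionary names are illustrative only, and only the GGPP stoichiometry (one DMAPP plus three IPP) is stated explicitly in the text above.

```python
C5_UNIT = 5  # carbons in one IPP or DMAPP building block


def prenyl_diphosphate_carbons(n_ipp):
    """Carbons in a linear prenyl diphosphate made from one DMAPP plus n_ipp IPP units."""
    if n_ipp < 1:
        raise ValueError("at least one IPP unit is required")
    return C5_UNIT * (1 + n_ipp)


# Precursor name -> number of IPP units condensed onto the DMAPP starter.
PRECURSORS = {"GPP": 1, "FDP": 2, "GGPP": 3}

for name, n_ipp in PRECURSORS.items():
    carbons = prenyl_diphosphate_carbons(n_ipp)
    print(f"{name}: 1 DMAPP + {n_ipp} IPP = C{carbons}")
```

The C15 result for FDP corresponds to the sesquiterpenoid skeleton of artemisinin discussed earlier, and the C20 result for GGPP to the diterpene scaffold of paclitaxel.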

Table 23.6 Selected Metabolic Engineering Techniques Utilized to Enhance Paclitaxel Biosynthesis

Metabolic Engineering Technique              References
Limited pathway reconstitution in E. coli    90,91
Limited pathway reconstitution in yeast      92
Limited pathway reconstitution in plant      93

[Figure 23.5: chemical structures omitted. Compounds and enzyme labels shown: isopentenyl diphosphate, dimethylallyl diphosphate, GGPPS, geranylgeranyl diphosphate, TXS, taxa-4(5),11(12)-diene (taxadiene), THY5a, TAT, THY10b, taxadiene-5α-acetoxy-10β-ol, hydroxylases and acyltransferases, 10-deacetylbaccatin III, acyltransferase, baccatin III, and paclitaxel.]

Figure 23.5 Proposed biosynthetic pathway for paclitaxel.

23.3.2.1 Limited Pathway Reconstitution in E. coli

Huang and coworkers engineered E. coli to produce taxadiene (0.5 mg/L) by overexpressing recombinant genes encoding IPP isomerase, GGPP synthase, and TXS.90 IPP isomerase and GGPP synthase were constructed as a soluble fusion protein joined by a 21-residue linker. The yield of taxadiene from this host increased to 1.3 mg/L upon overexpression of DXP synthase, which is a key enzyme in the endogenous IPP biosynthetic pathway. TXS contains an amino-terminal signal peptide that targets the protein toward the plastids of its native plant host. The signal peptide is cleaved upon import, and the resulting TXS is referred to as “pseudo-mature.” In the above study, the first 78 amino acids were removed from the TXS protein to improve its solubility in E. coli. In a separate publication, Huang reports a creative solution for improving TXS solubility in E. coli.89 TXS was expressed as a thioredoxin fusion protein at low temperature (20°C), and was shown to constitute 20% of total cell protein (70% of the insoluble TXS protein shifted from the pellet to the soluble fraction). After purification, 26 mg/L of active fusion protein was isolated from the BL21(DE3) E. coli strain. Not only were they able to create a soluble fusion protein in E. coli, but Huang and coworkers also found that the steady-state kinetic parameters of the native and fusion TXS were similar.

23.3.2.2 Limited Pathway Reconstitution in Yeast

An immediate goal in the reconstitution of the Taxol pathway is the biosynthesis of baccatin III in a high producing strain, as this late stage intermediate is the starting compound for the semisynthesis currently employed commercially. Ten genes of the 15-step transformation from IPP/DMAPP to baccatin III have been cloned and characterized.
Eight of these genes (encoding diterpene cyclase, prenyltransferase, CYP450 oxygenases, and ATs) have been expressed and purified from episomal vectors in Saccharomyces cerevisae and shown to be functional through in vitro assays by Dejong and coworkers.91 In a pathway reconstitution study by the above authors, genes encoding five of the reconstituted enzymes were introduced on three separate yeast vectors in an effort to generate the intermediate taxadiene 5α-acetoxy-10β-ol. The five enzymes are: GGPP synthase, TXS, CYP450 taxadiene 5α-hydroxylase (converts taxadiene to taxadiene-5α-ol), taxadiene-5α-ol-O-acetyl transferase (yields taxadiene-5α-ylacetate), and CYP450 taxoid 10β-hydroxylase (yields taxadiene-5α-acetoxy-10 β-ol). Unfortunately, neither taxadiene-5α-acetoxy-10 β-ol nor taxadiene-5α-yl-acetate can be detected from the fermentation media, while taxadiene-5α-ol was present only in small amounts (25 µg/L). The authors observed that taxadiene production in the host was measured at 0.7 mg/L in YPG media and 1.0 mg/L in selective media, suggesting the activities of the downstream enzymes are not reconstituted properly. 23.3.2.3 Limited Pathway Reconstitution in Plant Besumbes and coworkers took the first step in engineering taxoid biosynthesis in angiosperms, utilizing Arabidopsis as a model host.92 The authors tackled solubility issues with the plastid-localized TXS by expressing a recombinant TXS from T. baccata lacking the putative plastid targeting peptide region (first 60 residues). Vectors containing the constitutive and inducible promoters for TXS expression were delivered to A. tumefaciens and transformed into Arabidopsis via floral dipping. Constitutive expression of the full-length TXS in A. thaliana led to taxadiene accumulation at the expense of retarded growth of transgenic plants. 
Results suggest that the constitutive production of active TXS alters the balance of the GGPP pool, as the essential endogenous plastid isoprenoids are produced at lower yields. However, induced expression in the transgenic plant resulted in a more efficient recruitment of GGPP for taxadiene production, as implied by a 30-fold increase in the compound. The best reported yield was 600 ng of taxadiene per gram of dry leaf weight.
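The microbial titers quoted in Sections 23.3.2.1 and 23.3.2.2 can be put side by side with a few lines of arithmetic. The following is only a sketch: the dictionary labels are illustrative names chosen here, and the plant value (600 ng per gram of dry leaf weight) is deliberately left out because it is reported on a per-mass rather than per-volume basis and is not directly comparable.

```python
# Taxadiene titers quoted in the text, in mg/L.
# The labels are illustrative; they are not strain names from the cited studies.
titers_mg_per_l = {
    "E. coli (IPP isomerase + GGPP synthase + TXS)": 0.5,
    "E. coli (+ DXP synthase overexpression)": 1.3,
    "S. cerevisiae (YPG medium)": 0.7,
    "S. cerevisiae (selective medium)": 1.0,
}


def fold_change(new, old):
    """Return new/old, i.e. the fold improvement of one titer over another."""
    return new / old


# Effect of overexpressing DXP synthase in the E. coli host.
dxs_gain = fold_change(
    titers_mg_per_l["E. coli (+ DXP synthase overexpression)"],
    titers_mg_per_l["E. coli (IPP isomerase + GGPP synthase + TXS)"],
)

for label, titer in titers_mg_per_l.items():
    print(f"{label:50s} {titer:4.1f} mg/L")
print(f"DXP synthase overexpression: {dxs_gain:.1f}-fold increase")
```

On these numbers, relieving the precursor-supply limitation gives a 2.6-fold gain, which still leaves all four microbial titers at or near 1 mg/L.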

23.3.3 Epothilone

The recently discovered antitumor drug epothilone stabilizes microtubules by a mechanism of action similar to that of paclitaxel, disturbing tubulin formation and causing cell cycle arrest during mitosis.107 Epothilone is viewed as a potential successor to Taxol for treating cancer patients for two important reasons: (1) epothilones are much more water soluble than Taxol (an advantage for drug delivery) and hence do not require the stabilizing agents commonly used in the formulation of Taxol; and (2) epothilones are active against Taxol-resistant tumors, which can arise through prolonged treatment with Taxol. Several analogs of epothilone are undergoing clinical trials as potential therapeutics for breast and prostate cancer.2 Due to their medicinal potential, intense metabolic engineering efforts are underway to improve production levels and to facilitate combinatorial biosynthesis of epothilone (Table 23.7).

Epothilone is synthesized naturally in Sorangium cellulosum, a slow-growing myxobacterium with a 16-hour doubling time. Epothilone is produced in S. cellulosum at a titer of 20 mg/L.114 The epothilone biosynthetic pathway genes, belonging to the type I PKS family, have been cloned and sequenced (Figure 23.6).107 The gene cluster spans 56 kb and encodes seven proteins (EpoA–EpoF and EpoK). The biosynthetic assembly line is arranged as follows: an acetate-specific loading domain (EpoA), a cysteine-specific NRPS module (EpoB), eight PKS modules (EpoC–EpoF) that have mixed specificity toward malonyl- or methylmalonyl-CoA, and a cytochrome P450 epoxidase encoded by epoK. Epothilones C and D are the products of the mixed polyketide-NRPS assembly line, and each can be oxidized by EpoK to yield epothilones A and B, respectively. Epothilone B shows the highest activity against tumor cell lines, but is not a drug candidate due to its adverse side effects. Epothilone D exhibits the best therapeutic index of the four, yet is produced in the lowest amounts in the natural host. Hence, a considerable amount of research effort is targeted at increasing the production of epothilone D.

23.3.3.1 Novel Epothilone Analogs through Bioconversion

Novel epothilone analogs displaying a variety of hydroxylation patterns were generated through biotransformation by feeding epothilone D to a culture of Amycolata autotrophica.110 The various analogs extracted contain one, two, or three more hydroxyl groups than epothilone D. In most cases, the additional hydroxyl groups decrease cytotoxic activity when tested against common tumor cell lines (breast, lung, and glioma). The analogs 11-hydroxy-, 14-hydroxy-, and 21-hydroxyepothilone D, however, demonstrate activity comparable to that of the parent compound.

23.3.3.2 Reconstitution in Heterologous Hosts

Myxococcus xanthus has been employed as a heterologous strain for potential epothilone overproduction. Because this organism is also a myxobacterium, regulation of secondary metabolite production and the expression and folding of the biosynthetic enzymes may proceed under more “friendly” conditions. Through a series of homologous recombination events, a 65.4-kb DNA segment from S. cellulosum encompassing the complete pathway was introduced into the chromosome of M. xanthus strain DZ1 to afford the engineered strain K111-32.104 Epothilones A and B were isolated at yields of 12–17 µg/L. Upon EpoK inactivation to yield strain K111-40, epothilone D production was observed, with a five-fold reduction in yield. Lau and coworkers optimized epothilone D yields to 23 mg/L in a fed-batch process

Table 23.7 Selected Metabolic Engineering Techniques Utilized to Enhance Epothilone Biosynthesis

Metabolic Engineering Technique                                      References
Epothilone C and D production reconstituted in heterologous host     104–107
Epothilone A and B production reconstituted in heterologous host     105,108
Fermentation techniques to increase yield of epothilones             107,109,110
Novel bioactive epothilones through bioconversion                    111
Novel bioactive epothilones reconstituted in heterologous host       109,112–114

Figure 23.6 Epothilone assembly by PKS encoding genes epoA (loading), epoB (NRPS), epoC (module 2), epoD (modules 3–6), epoE (modules 7–8), and epoF (module 9) plus epoxidase gene epoK. Products shown: epothilones C (R=H) and D (R=CH3), which EpoK converts to epothilones A (R=H) and B (R=CH3).
Domain labels in the figure: C A PCP (NRPS); KSy AT ER ACP; KS AT DH KR ACP; KS AT KR ACP | KS AT KR ACP | KS AT DH ER KR ACP | KS AT DH ER KR ACP; KS AT KR ACP | KS AT MT ACP; KS AT KR ACP TE.
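The gene-to-module layout given in the Figure 23.6 caption can be written down as a small lookup table, which makes it easy to check statements such as "eight PKS modules (EpoC–EpoF)" or to find which gene carries a given module. This is a minimal sketch: the dictionary keys and field names (`role`, `pks_modules`) are illustrative choices made here, and only the mapping itself comes from the caption and the surrounding text.

```python
# Gene -> module layout transcribed from the Figure 23.6 caption.
# Field names are illustrative; only the mapping comes from the text.
EPO_GENES = {
    "epoA": {"role": "loading (acetate-specific)", "pks_modules": []},
    "epoB": {"role": "NRPS (cysteine-specific)", "pks_modules": []},
    "epoC": {"role": "PKS", "pks_modules": [2]},
    "epoD": {"role": "PKS", "pks_modules": [3, 4, 5, 6]},
    "epoE": {"role": "PKS", "pks_modules": [7, 8]},
    "epoF": {"role": "PKS", "pks_modules": [9]},
    "epoK": {"role": "cytochrome P450 epoxidase", "pks_modules": []},
}


def pks_module_count(genes):
    """Total number of PKS modules across all genes."""
    return sum(len(info["pks_modules"]) for info in genes.values())


def gene_for_module(genes, module):
    """Return the gene that carries a given PKS module number, or None."""
    for name, info in genes.items():
        if module in info["pks_modules"]:
            return name
    return None


print("PKS modules:", pks_module_count(EPO_GENES))
print("Module 5 is carried by:", gene_for_module(EPO_GENES, 5))
print("Module 6 is carried by:", gene_for_module(EPO_GENES, 6))
```

Modules 5 and 6, whose ER and KR domains are the targets of the analog-generating mutations discussed in this section, both sit on the tetramodular epoD gene product.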


by incorporating an adsorber resin and utilizing methyl oleate as an alternate carbon source.106 The congener ratio between epothilones A/B and C/D can be controlled through the dissolved oxygen level in the fermentation culture.108 By maintaining an excess of dissolved oxygen (50% of saturation) in cultures of modified M. xanthus K111-32, epothilones A and B are produced as the major products. As expected, epothilones C and D are the primary biosynthetic products in the absence of dissolved oxygen, which is required by the epoxidase EpoK.

During fermentation of the M. xanthus strain K111-40, a novel epothilone analog, 10,11-didehydroepothilone D, was isolated at approximately 5% of the levels of epothilone C and epothilone D. This analog showed comparable activity against multidrug-resistant breast and T-cell leukemia tumors. The newly introduced double bond is likely due to inactivity of the enoyl reductase (ER) domain in module 5 (ER5) of the epothilone PKS. A rational approach was thus devised to produce 10,11-didehydroepothilone D deliberately.112 The NADPH-binding site of ER5 was altered by site-directed mutagenesis, and the mutant gene was reintegrated into the genome of K111-40 to yield an engineered epothilone assembly line that produces the desired analog. Using similar rational approaches, additional epothilone analogs have been engineered in the M. xanthus host. Inactivation of the ketoreductase in module 6 (KR6) resulted in low amounts (100–200 ng/mL) of 9-oxoepothilone D and isomeric 8-epi-9-oxoepothilone D as the main epothilone products.111

The entire epothilone gene cluster has been reconstituted in the model actinomycete S. coelicolor using a pRM5-derived shuttle vector.107 Both epothilone A and epothilone B were synthesized in the heterologous host. Upon epoK deactivation, epothilones C and D accumulated as expected. Unoptimized yields of the various epothilones were recorded at 50–100 µg/L.

Recently, E. coli has been engineered as a host for epothilone biosynthesis. Using precursor-directed biosynthesis, Boddy et al. were able to reconstitute the late steps of epothilone biosynthesis in E. coli.105 A synthetic pentaketide NAC thioester intermediate was presented to the last three modules of the epo PKS. Incorporation of the pentaketide substrate resulted in biosynthesis of epothilone C at 0.7 mg/L. Boddy observed a time-dependent depletion of the precursor, likely due to hydrolysis of the thioester; a second feeding of the thioester precursor was therefore delivered after 12 hours of fermentation, increasing the titer to 1 mg/L, an amount comparable to the natural producer. Supplementing unnatural pentaketides modified at the starter position led to the engineered biosynthesis of epothilone analogs.

Perhaps the most comprehensive effort reported thus far is the complete reconstitution of epothilone C and D biosynthesis in E. coli strain K207-3 by introducing synthetically redesigned epoABCDEF genes.103 The redesign was necessary to optimize codon usage and to ease cloning of the genes into E. coli. The tetramodular enzyme EpoD, whose gene spans almost 22 kb, was divided into two bimodular polypeptides. Communication between the artificially separated modules was restored with intermodular linkers found among type I PKSs.115,116 Chaperone proteins, including GroEL, GroES, and trigger factor (TF), were coexpressed with the epo proteins to assist in the folding of the heterologous megasynthases. For example, the chaperones significantly improved the soluble expression of EpoA, which was found mostly in inclusion bodies in their absence. High levels of soluble epo proteins were recovered when expression was performed at low temperature (20°C) under the arabinose-inducible PBAD promoter, which was shown to be superior to the lac and T7 promoters. Thiazole diketide-SNAC feeding to E. coli produced epothilone C at 10 µg/L, but complete biosynthesis produced titers of less than 1 µg/L. Although final epothilone titers are low, further metabolic engineering efforts, along with process optimization, could push this approach toward higher product titers.
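The epothilone titers in this section are quoted in three different units (mg/L, µg/L, and ng/mL), which makes the hosts hard to compare at a glance. The sketch below normalizes the figures quoted in the text to µg/L. The entry labels are informal descriptions written for this example, and where the text gives a range only the upper bound is used.

```python
# Conversion factors to ug/L. Note that 1 ng/mL equals 1 ug/L.
UNIT_TO_UG_PER_L = {"mg/L": 1000.0, "ug/L": 1.0, "ng/mL": 1.0}


def to_ug_per_l(value, unit):
    """Convert a titer to micrograms per liter."""
    return value * UNIT_TO_UG_PER_L[unit]


# (informal label, value, unit) taken from the figures quoted in this section.
reported = [
    ("S. cellulosum, natural producer", 20, "mg/L"),
    ("M. xanthus, fed-batch epothilone D", 23, "mg/L"),
    ("M. xanthus KR6 mutant, 9-oxo analogs (upper bound)", 200, "ng/mL"),
    ("S. coelicolor, unoptimized (upper bound)", 100, "ug/L"),
    ("E. coli, pentaketide feeding with second feed", 1, "mg/L"),
    ("E. coli, thiazole diketide-SNAC feeding", 10, "ug/L"),
]

# Print from highest to lowest titer on a common scale.
for label, value, unit in sorted(
    reported, key=lambda row: to_ug_per_l(row[1], row[2]), reverse=True
):
    print(f"{label:52s} {to_ug_per_l(value, unit):>8.0f} ug/L")
```

On a common scale the reported values span more than three orders of magnitude, from the optimized M. xanthus fed-batch process down to the fully reconstituted E. coli pathway.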

23.4 Cholesterol Lowering Statins

Hypercholesterolemia is the primary risk factor for coronary heart disease. Currently, about 34.5 million adults in the United States suffer from elevated levels of blood cholesterol.134 Globally,


Table 23.8 Selected Metabolic Engineering Techniques Utilized to Enhance Type I Statin Biosynthesis

Metabolic Engineering Technique                              References
Elimination of competing metabolic pathways                  118,119
Strain improvement for lovastatin                            120
Microarray technology to determine transcription profiles    121
Optimization of fermentation yields                          122–125
Strain improvement for ML-236B (compactin)                   126–128
Bioconversion of ML-236B (compactin) to pravastatin          129–134

cardiovascular diseases caused by high blood cholesterol account for 29% of all deaths.135 Inhibiting 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase, the enzyme that catalyzes the committed step in cholesterol biosynthesis, effectively lowers serum cholesterol, which in turn reduces plaque accumulation in the arteries of the heart and the risk of coronary heart disease. The cholesterol lowering statins are the drug therapy of choice for treating hypercholesterolemia. Statins contain an HMG-like moiety that mimics the natural substrate and binds to the active site of HMG-CoA reductase. Upon binding of a statin, the enzymatic binding pocket rearranges to accommodate the hydrophobic bulk of the statin molecule and anchors the statin in the active site.136 Currently, the two top selling drugs in the world are statins: atorvastatin (Lipitor®) and simvastatin (Zocor®).137

The Food and Drug Administration in the United States has approved six statins. Statins can be separated into two classes based on their origin. Type II statins are completely synthetic and include fluvastatin (Lescol®), cerivastatin (Baycol®), and atorvastatin (Lipitor®). Type I statins are either natural products or are derived from natural products. These include lovastatin (Mevacor®), pravastatin (Pravachol®), and simvastatin (Zocor®). Compactin (ML-236B), a natural product, was the first statin found to inhibit cholesterol biosynthesis.128 Lovastatin was the first statin approved by the Food and Drug Administration for cholesterol lowering therapy. With the current demand for statins, continued improvement of commercial production is a necessity (Table 23.8). This section will focus on the metabolic engineering of type I statins.

In their natural hosts, the statins compactin (Penicillium citrinum) and lovastatin (Aspergillus terreus) are synthesized by PKSs. As seen in Figure 23.7, lovastatin is the C6-methylated form of compactin. Compactin and lovastatin synthesis differ at the step of pentaketide formation. During lovastatin biosynthesis, the transformation from the tetraketide to the pentaketide involves a C6-methylation step carried out by a methyltransferase domain. The methyltransferase reaction is not employed during the analogous step of compactin biosynthesis. The additional methyl group creates additional drug-protein interactions that stabilize lovastatin in HMG-CoA reductase.136
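The substrate mimicry described in this section, in which the HMG-like moiety occupies the active site of HMG-CoA reductase, is the textbook picture of competitive inhibition. As a hedged illustration only, assuming simple Michaelis–Menten kinetics and a purely competitive statin (the symbols are generic and are not taken from this chapter or its references), the reaction rate in the presence of inhibitor can be written as:

```latex
v = \frac{V_{\max}\,[S]}{K_m\left(1 + \dfrac{[I]}{K_i}\right) + [S]}
```

Here $[S]$ is the HMG-CoA concentration, $[I]$ the statin concentration, $K_m$ the Michaelis constant for HMG-CoA, and $K_i$ the inhibition constant of the statin. Under this assumption the statin raises the apparent $K_m$ by the factor $1 + [I]/K_i$ without changing $V_{\max}$, so a tighter-binding statin (smaller $K_i$), such as one stabilized by the additional C6-methyl contacts mentioned above, would suppress the rate more strongly at the same dose.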

23.4.1 Lovastatin

23.4.1.1 Strain Improvement for Lovastatin

Hutchinson and Kennedy discovered the roles of several important enzymes in the lovastatin biosynthetic pathway, including the megasynthases: the lovastatin nonaketide synthase (lovB, LNKS) and the lovastatin diketide synthase (lovF, LDKS).119 Also found in the lov gene cluster are the lovE and lovH genes that encode proteins with the binuclear zinc finger motif characteristic of eukaryotic transcription factors. Increasing the copy number of lovE resulted in a seven- to ten-fold increase of lovastatin production from the parent strain as a result of upregulation of the lovastatin biosynthetic genes. The lvrA gene codes for self-resistance to lovastatin in A. terreus, and increases in lovastatin production will require the overexpression or an increase in copy number of this gene. With the elucidation of the

Figure 23.7 Biosynthetic pathway for compactin in Penicillium citrinum and for lovastatin in Aspergillus terreus. NKS is the nonaketide synthase that synthesizes dihydromonacolin L with an external enoyl reductase. DKS is the diketide synthase that codes for the 2-methylbutyrate attached to the transesterase for transfer of the side arm to Monacolin J.
Intermediates labeled in the figure: diketide, triketide, tetraketide, pentaketide, hexaketide, heptaketide, octaketide, nonaketide; 4a,5-dihydro ML-236C (R=H) / dihydromonacolin L (R=CH3); ML-236C (R=H) / monacolin L (R=CH3); ML-236A (R=H) / monacolin J (R=CH3); ML-236B (R=H) / lovastatin (R=CH3).
Reaction labels in the figure: NKS KR/DH, ER, MT (lovastatin only), Diels-Alder cyclization, PKS release, O2, -H2O, DKS, transesterase.


complete gene cluster, rational metabolic engineering can be performed to increase lovastatin production in A. terreus through strain improvement or in a heterologous host for engineered biosynthesis.

Removal of cometabolites presents an opportunity to improve product yield in several ways: (1) elimination of cometabolites could shift precursor resources toward the product of interest, (2) cometabolites may be toxic during fermentation, reducing the yield of the target compound, and (3) cometabolites may be difficult to isolate and remove during downstream processing, leading to decreased product purity. An analysis of metabolites in A. terreus fermentation showed that eliminating the cometabolite sulochrin improved lovastatin production. Vinci and coworkers first addressed this challenge by classical mutagenesis and screening of a hyperproducing strain of A. terreus.117 Their research led to the isolation of two mutant strains, AH6 and CB4. Strain AH6 produced lovastatin at a level equivalent to the wild-type strain, with the additional benefit of no detectable production of sulochrin. In pilot 250-gallon fermentation experiments, strain CB4 yielded a 20% increase in lovastatin production and an 83% decrease in sulochrin production compared to the parental strain. Couch rationally engineered the disruption of sulochrin biosynthesis in A. terreus.118 Sulochrin is synthesized by the emodin anthrone PKS. Disruption of this PKS by homologous recombination eliminated the unwanted cometabolite during lovastatin biosynthesis.

DNA microarray technology enables metabolic engineering on a genome-wide scale by monitoring changes in genome transcription levels caused by mutations or perturbations within the natural host. By critical analysis of such transcript levels in A. terreus, one could identify the points where rational metabolic engineering could lead to increased production of lovastatin. Askenazi and coworkers utilized metabolic profiling to gain insight into the genetic and physiological controls governing lovastatin biosynthesis.120

23.4.1.2 Optimization of Fermentation Yields

Buckland demonstrated with shake flask fermentation of A. terreus that an additional shot of glucose to the culture medium on the fifth day of fermentation resulted in 25% more lovastatin than in controls.121 During lovastatin fermentation the pH decreases on the third day; adding a buffer to the culture medium to control pH increased lovastatin production by 40%.121 Finally, total replacement of glucose with glycerol gave 30% more lovastatin.121 Complementing the fermentation experiments performed by Buckland, Novak likewise reported a 30% increase in lovastatin production upon total replacement of glucose with glycerol as the carbon source in fed-batch fermentation.122 Production levels of lovastatin were increased five-fold by controlling the process parameters of pH, carbon source, and producer strain reisolation.123 In addition, Merck & Co. optimized culture homogeneity with an agitator design that not only increased lovastatin production, but also reduced the operating power cost by 66%.

Even with all the adjustable process parameters taken together, optimizing fermentation conditions can increase production only by a limited amount because, as López showed, lovastatin inhibits its own synthesis.124 López performed the essential inhibition experiments by introducing specified amounts of lovastatin into identical batch cultures of A. terreus. The results indicate that lovastatin inhibits its own synthesis through a feedback regulatory mechanism. These experimental results present another avenue where metabolic engineering can increase the yield of the desired product. Abolition of the feedback regulatory mechanism would remove the barrier of product inhibition during fermentation, leading to further strain improvement.
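Why product inhibition caps what process optimization can achieve can be illustrated with a toy model. The sketch below is not López's model and uses no measured values: it assumes, purely for illustration, that the volumetric production rate falls hyperbolically with the accumulated titer, dP/dt = q_max / (1 + P/K_i), and the parameters `q_max` (mg/L/h) and `k_i` (mg/L) are hypothetical.

```python
def simulate_titer(hours, q_max, k_i=None, dt=0.1):
    """Euler-integrate dP/dt = q_max / (1 + P/k_i) from P = 0.

    hours : length of the production phase (h)
    q_max : uninhibited volumetric production rate (hypothetical, mg/L/h)
    k_i   : titer at which the rate is halved (hypothetical, mg/L);
            None switches product inhibition off
    """
    titer = 0.0
    for _ in range(int(round(hours / dt))):
        rate = q_max if k_i is None else q_max / (1.0 + titer / k_i)
        titer += rate * dt
    return titer


# Hypothetical numbers chosen only to show the shape of the effect.
uninhibited = simulate_titer(hours=200, q_max=1.0)
inhibited = simulate_titer(hours=200, q_max=1.0, k_i=50.0)
boosted = simulate_titer(hours=200, q_max=2.0, k_i=50.0)

print(f"no inhibition:              {uninhibited:6.1f} mg/L")
print(f"with inhibition:            {inhibited:6.1f} mg/L")
print(f"with inhibition, 2x q_max:  {boosted:6.1f} mg/L")
```

In this toy setting, doubling the uninhibited rate (a stand-in for better medium or agitation) yields well under twice the final titer as long as the inhibition term remains, whereas removing the inhibition term recovers the full linear accumulation. That is the qualitative argument for targeting the feedback mechanism itself.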

23.4.2 Pravastatin and Compactin

While compactin is not as effective as lovastatin in lowering blood serum levels of LDL-cholesterol, it is a valuable precursor for the microbial synthesis of pravastatin, a highly effective statin. Figure 23.8 summarizes the different species able to perform microbial hydroxylation of the 6β-position of compactin to form pravastatin. Since compactin is the immediate precursor to pravastatin, increased production of pravastatin will require increased production of compactin.

Figure 23.8 Microbial hydroxylation of compactin (sodium ML-236B carboxylate) to pravastatin via whole cell transformation. Organisms shown: Mucor hiemalis, Amycolata autotrophica, Actinomadura sp., and Streptomyces carbophilus.

The fungus P. citrinum naturally produces compactin.138 Metabolic engineering of compactin production has centered on strain engineering of this natural host. Before techniques such as overexpression of compactin structural genes can be applied, a transformation system must first be developed. Nara devised a transformation system for P. citrinum by constructing a shuttle vector with a hygromycin B phosphotransferase gene (hpt) from E. coli fused to the 3-phosphoglycerate kinase (pgk) promoter and terminator regions from Aspergillus nidulans.125 The transformation rate under optimal conditions was 194 transformants per µg circular DNA per 4 × 10^5 viable protoplasts. This dominant selection strategy allows for the facile screening of vector insertion.

Abe improved the P. citrinum strain No. 41520 by introducing two copies of each of the seven compactin biosynthesis genes, mlcA-mlcE, into the natural host using the vector developed by Nara et al.125 Abe accomplished this task with cosmid-recombination genetic manipulation, culminating in a 12% increase in compactin production from strains possessing two sets of the seven biosynthetic genes.127 Abe and coworkers also identified the pathway-specific regulatory gene mlcR in the compactin biosynthetic cluster.126,138 Increasing the copy number of mlcR by transformation correlated with an increase in compactin transcription and resulted in a 20–30% increase in compactin production compared with the parent strain. To further increase the level of mlcR transcription, the mlcR gene was fused to the constitutively active promoter and terminator sequences of the A. nidulans 3-phosphoglycerate kinase (pgkA) gene.
This experiment resulted in a 10–15% increase in compactin production.

The biotransformation of compactin into pravastatin has also been extensively studied. As seen in Figure 23.8, whole cell transformation is implemented by microbial hydroxylation of the 6β-position. A report by Serizawa summarizes some of the species initially utilized to perform the hydroxylation of compactin to pravastatin.129 Mucor hiemalis was first identified as the most efficacious at the hydroxylation reaction. However, M. hiemalis proved inadequate for use in commercial fermentations because this host did not tolerate high concentrations of compactin. Amycolata autotrophica is also competent in the hydroxylation reaction, but it produces dihydroxylated by-products, which complicate downstream processing. Commercially, Streptomyces carbophilus possesses the ideal combination of hydroxylating activity with limited by-product generation. S. carbophilus can produce 340 mg/L pravastatin from 750 mg/L compactin in batch cultures, and with intermittent feeding of compactin the output increases to 1000 mg/L pravastatin from 2000 mg/L of compactin fed. Conversion occurred at a rate of 10 mg/L/hour, increasing to 15 mg/L/hour upon utilization of a continuous feeding process.130 Matsuoka isolated and studied the CytP450sca monooxygenase, which is responsible for the hydroxylation of compactin in S. carbophilus.131 Watanabe investigated the induction mechanism of CytP450sca in S. carbophilus by creating an assortment of mutations and deletions in the cytP450sca-2


open reading frame.132 Deletion analysis led to the conclusion that CytP450sca is regulated by a negative repression system that is inducible by compactin. In the presence of compactin, the DNA-bound repressor changes conformation and dissociates from the palindromic sequence, allowing RNA polymerase to bind and transcribe the cytP450sca-2 gene, ultimately leading to expression of CytP450sca. This information may make possible the constitutive expression of the cytP450sca-2 gene by way of an artificial inducer or a constitutive promoter, increasing the efficiency of bioconversion.

Peng reported a different hydroxylation system in Actinomadura sp. strain 2966, with a conversion rate of 65–78% of compactin.133 In addition, the system gave higher pravastatin conversion with intermittent addition of compactin. In Actinomadura sp. the hydroxylation reaction does not require induction with compactin, as evidenced by the immediate conversion of compactin to pravastatin upon addition of compactin to Actinomadura sp. cultures.128 Additionally, the constitutive hydroxylase in Actinomadura sp. is stimulated by ATP and ascorbic acid, and unlike the CytP450sca monooxygenase system, there is no inactivation by carbon dioxide.

Considering that statins are the top two selling drugs worldwide, there remains a plethora of areas where metabolic engineering can enhance the production of type I statins. For example, no biological methods are available for the production of simvastatin, which is currently semisynthesized from lovastatin. Heterologous reconstitution of the natural statin biosynthetic pathway in a more robust host could greatly reduce fermentation times. Furthermore, rational and combinatorial manipulation of the biosynthetic genes, such as the PKS, may be utilized to generate novel statins with increased therapeutic value.
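The S. carbophilus bioconversion figures quoted earlier in this section can be compared on a common basis with simple arithmetic. The sketch below computes mass-basis yields (mg pravastatin per mg compactin fed), not molar yields, and the time estimate assumes a constant conversion rate over the whole run, which is a simplification introduced here rather than a claim from the cited work.

```python
def mass_yield(product_mg_per_l, substrate_mg_per_l):
    """Mass of product obtained per mass of substrate fed (dimensionless)."""
    return product_mg_per_l / substrate_mg_per_l


def hours_to_convert(substrate_mg_per_l, rate_mg_per_l_per_h):
    """Time to process a substrate load, assuming a constant conversion rate."""
    return substrate_mg_per_l / rate_mg_per_l_per_h


batch_yield = mass_yield(340, 750)        # batch culture
fed_yield = mass_yield(1000, 2000)        # intermittent compactin feeding

# Conversion rates quoted in the text: 10 mg/L/h, rising to 15 mg/L/h
# with continuous feeding.
t_slow = hours_to_convert(2000, 10)
t_fast = hours_to_convert(2000, 15)

print(f"batch mass yield:        {batch_yield:.0%}")
print(f"intermittent-feed yield: {fed_yield:.0%}")
print(f"time for 2000 mg/L at 10 mg/L/h: {t_slow:.0f} h")
print(f"time for 2000 mg/L at 15 mg/L/h: {t_fast:.0f} h")
```

On a mass basis the intermittent feed improves the yield only modestly (roughly 45% to 50%), so most of the gain in final pravastatin titer comes from the cells tolerating a larger total compactin load.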

References

1. Berdy, J. Bioactive microbial metabolites. J. Antibiot. (Tokyo), 58 (1), 1–26, 2005.
2. Butler, M. S. Natural products to drugs: natural product derived compounds in clinical trials. Nat. Prod. Rep., 22 (2), 162–95, 2005.
3. Koepp, A. E., Hezari, M., Zajicek, J., Vogel, B. S., LaFever, R. E., Lewis, N. G., and Croteau, R. Cyclization of geranylgeranyl diphosphate to taxa-4(5),11(12)-diene is the committed step of taxol biosynthesis in Pacific yew. J. Biol. Chem., 270 (15), 8686–90, 1995.
4. Hardman, J. G., Limbird, L. E., and Gilman, A. G. The Pharmacological Basis of Therapeutics, 10th ed. McGraw-Hill, New York, 2001, pp. 2148.
5. Abdin, M. Z., Israr, M., Rehman, R. U., and Jain, S. K. Artemisinin, a novel antimalarial drug: biochemical and molecular approaches for enhanced production. Planta Med., 69 (4), 289–99, 2003.
6. Smith, D. J., Bull, J. H., Edwards, J., and Turner, G. Amplification of the isopenicillin N synthetase gene in a strain of Penicillium chrysogenum producing high levels of penicillin. Mol. Gen. Genet., 216 (2–3), 492–7, 1989.
7. MacCabe, A. P., Riach, M. B., and Kinghorn, J. R. Identification and expression of the ACV synthetase gene. J. Biotechnol., 17 (1), 91–7, 1991.
8. Veenstra, A. E., van Solingen, P., Bovenberg, R. A., and van der Voort, L. H. Strain improvement of Penicillium chrysogenum by recombinant DNA techniques. J. Biotechnol., 17 (1), 81–90, 1991.
9. Newbert, R. W., Barton, B., Greaves, P., Harper, J., and Turner, G. Analysis of a commercially improved Penicillium chrysogenum strain series: involvement of recombinogenic regions in amplification and deletion of the penicillin biosynthesis gene cluster. J. Ind. Microbiol. Biotechnol., 19 (1), 18–27, 1997.
10. Theilgaard, H., van Den Berg, M., Mulder, C., Bovenberg, R., and Nielsen, J. Quantitative analysis of Penicillium chrysogenum Wis54-1255 transformants overexpressing the penicillin biosynthetic genes. Biotechnol. Bioeng., 72 (4), 379–88, 2001.
11. Kennedy, J. and Turner, G. delta-(L-alpha-aminoadipyl)-L-cysteinyl-D-valine synthetase is a rate limiting enzyme for penicillin production in Aspergillus nidulans. Mol. Gen. Genet., 253 (1–2), 189–97, 1996.


12. Fernandez-Canon, J. M. and Penalva, M. A. Overexpression of two penicillin structural genes in Aspergillus nidulans. Mol. Gen. Genet., 246 (1), 110–18, 1995.
13. van Gulik, W. M., de Laat, W. T., Vinke, J. L., and Heijnen, J. J. Application of metabolic flux analysis for the identification of metabolic bottlenecks in the biosynthesis of penicillin-G. Biotechnol. Bioeng., 68 (6), 602–18, 2000.
14. Castillo, N. I., Fierro, F., Gutierrez, S., and Martin, J. F. Genome-wide analysis of differentially expressed genes from Penicillium chrysogenum grown with a repressing or a non-repressing carbon source. Curr. Genet., 49 (2), 85–96, 2006.
15. Casqueiro, J., Gutierrez, S., Banuelos, O., Hijarrubia, M. J., and Martin, J. F. Gene targeting in Penicillium chrysogenum: disruption of the lys2 gene leads to penicillin overproduction. J. Bacteriol., 181 (4), 1181–88, 1999.
16. Banuelos, O., Casqueiro, J., Gutierrez, S., and Martin, J. F. Overexpression of the lys1 gene in Penicillium chrysogenum: homocitrate synthase levels, alpha-aminoadipic acid pool and penicillin production. Appl. Microbiol. Biotechnol., 54 (1), 69–77, 2000.
17. Minambres, B., Martinez-Blanco, H., Olivera, E. R., Garcia, B., Diez, B., Barredo, J. L., Moreno, M. A., Schleissner, C., Salto, F., and Luengo, J. M. Molecular cloning and expression in different microbes of the DNA encoding Pseudomonas putida U phenylacetyl-CoA ligase. Use of this gene to improve the rate of benzylpenicillin biosynthesis in Penicillium chrysogenum. J. Biol. Chem., 271 (52), 33531–38, 1996.
18. Mingot, J. M., Penalva, M. A., and Fernandez-Canon, J. M. Disruption of phacA, an Aspergillus nidulans gene encoding a novel cytochrome P450 monooxygenase catalyzing phenylacetate 2-hydroxylation, results in penicillin overproduction. J. Biol. Chem., 274 (21), 14545–50, 1999.
19. Lutz, M. V., Bovenberg, R. A., van der Klei, I. J., and Veenhuis, M. Synthesis of Penicillium chrysogenum acetyl-CoA:isopenicillin N acyltransferase in Hansenula polymorpha: first step towards the introduction of a new metabolic pathway. FEMS Yeast Res., 5 (11), 1063–67, 2005.
20. MacCabe, A. P., Riach, M. B., Unkles, S. E., and Kinghorn, J. R. The Aspergillus nidulans npeA locus consists of three contiguous genes required for penicillin biosynthesis. EMBO J., 9 (1), 279–87, 1990.
21. Shaw, S. J., Abbanat, D., Ashley, G. W., Bush, K., Foleno, B., Macielag, M., Zhang, D., and Myles, D. C. 15-amido erythromycins: synthesis and in vitro activity of a new class of macrolide antibiotics. J. Antibiot. (Tokyo), 58 (3), 167–77, 2005.
22. Starks, C. M., Rodriguez, E., Carney, J. R., Desai, R. P., Carreras, C., McDaniel, R., Hutchinson, R., Galazzo, J. L., and Licari, P. J. Isolation and characterization of 7-hydroxy-6-demethyl-6-deoxyerythromycin D, a new erythromycin analogue, from engineered Saccharopolyspora erythrea. J. Antibiot. (Tokyo), 57 (1), 64–67, 2004.
23. Desai, R. P., Rodriguez, E., Galazzo, J. L., and Licari, P. Improved bioconversion of 15-fluoro-6-deoxyerythronolide B to 15-fluoro-erythromycin A by overexpression of the eryK Gene in Saccharopolyspora erythrea. Biotechnol. Prog., 20 (6), 1660–65, 2004.
24. Carreras, C., Frykman, S., Ou, S., Cadapan, L., Zavala, S., Woo, E., Leaf, T., Carney, J., Burlingame, M., Patel, S., Ashley, G., and Licari, P. Saccharopolyspora erythrea-catalyzed bioconversion of 6-deoxyerythronolide B analogs for production of novel erythromycins. J. Biotechnol., 92 (3), 217–28, 2002.
25. Frykman, S., Leaf, T., Carreras, C., and Licari, P. Precursor-directed production of erythromycin analogs by Saccharopolyspora erythrea. Biotechnol. Bioeng., 76 (4), 303–10, 2001.
26. Jacobsen, J. R., Hutchinson, C. R., Cane, D. E., and Khosla, C. Precursor-directed biosynthesis of erythromycin analogs by an engineered polyketide synthase. Science, 277 (5324), 367–69, 1997.
27. Desai, R. P., Leaf, T., Hu, Z., Hutchinson, C. R., Hong, A., Byng, G., Galazzo, J., and Licari, P. Combining classical, genetic, and process strategies for improved precursor-directed production of 6-deoxyerythronolide B analogues. Biotechnol. Prog., 20 (1), 38–43, 2004.

Applications of Metabolic Engineering for Natural Drug Discovery

23-31

28. Reeves, C. D., Murli, S., Ashley, G. W., Piagentini, M., Hutchinson, C. R., and McDaniel, R. Alteration of the substrate specificity of a modular polyketide synthase acyltransferase domain through sitespecific mutations. Biochemistry, 40 (51), 15464–70, 2001. 29. Murli, S., MacMillan, K. S., Hu, Z., Ashley, G. W., Dong, S. D., Kealey, J. T., Reeves, C. D., and Kennedy, J. Chemobiosynthesis of novel 6-deoxyerythronolide B analogues by mutation of the loading module of 6-deoxyerythronolide B synthase 1. Appl. Environ. Microbiol., 71 (8), 4503–9, 2005. 30. Kinoshita, K., Williard, P. G., Khosla, C., and Cane, D. E. Precursor-directed biosynthesis of 16-membered macrolides by the erythromycin polyketide synthase. J. Am. Chem. Soc., 123 (11), 2495–502, 2001. 31. Kennedy, J., Murli, S., and Kealey, J. T. 6-Deoxyerythronolide B analogue production in Escherichia coli through metabolic pathway engineering. Biochemistry, 42 (48), 14342–48, 2003. 32. Kinoshita, K., Pfeifer, B. A., Khosla, C., and Cane, D. E. Precursor-directed polyketide biosynthesis in Escherichia coli. Bioorg. Med. Chem. Lett., 13 (21), 3701–4, 2003. 33. Leaf, T., Cadapan, L., Carreras, C., Regentin, R., Ou, S., Woo, E., Ashley, G., and Licari, P. Precursordirected biosynthesis of 6-deoxyerythronolide B analogs in Streptomyces coelicolor: understanding precursor effects. Biotechnol. Prog., 16 (4), 553–56, 2000. 34. Lau, J., Tran, C., Licari, P., and Galazzo, J. Development of a high cell-density fed-batch bioprocess for the heterologous production of 6-deoxyerythronolide B in Escherichia coli. J. Biotechnol., 110 (1), 95–103, 2004. 35. Rodriguez, E., Hu, Z., Ou, S., Volchegursky, Y., Hutchinson, C. R., and McDaniel, R. Rapid engineering of polyketide overproduction by gene transfer to industrially optimized strains. J. Ind. Microbiol. Biotechnol., 30 (8), 480–88, 2003. 36. Murli, S., Kennedy, J., Dayem, L. C., Carney, J. R., and Kealey, J. T. 
Metabolic engineering of Escherichia coli for improved 6-deoxyerythronolide B production. J. Ind. Microbiol. Biotechnol., 30 (8), 500–9, 2003. 37. Peiru, S., Menzella, H. G., Rodriguez, E., Carney, J., and Gramajo, H. Production of the potent antibacterial polyketide erythromycin C in Escherichia coli. Appl. Environ. Microbiol., 71 (5), 2539–47, 2005. 38. Volchegursky, Y., Hu, Z., Katz, L., and McDaniel, R. Biosynthesis of the anti-parasitic agent megalomicin: transformation of erythromycin to megalomicin in Saccharopolyspora erythrea. Mol. Microbiol., 37 (4), 752–62, 2000. 39. McDaniel, R., Welch, M., and Hutchinson, C. R. Genetic approaches to polyketide antibiotics. 1. Chem. Rev., 105 (2), 543–58, 2005. 40. Khosla, C., Gokhale, R. S., Jacobsen, J. R., and Cane, D. E. Tolerance and specificity of polyketide synthases. Annu. Rev. Biochem., 68, 219–53, 1999. 41. Weissman, K. J. and Leadlay, P. F. Combinatorial biosynthesis of reduced polyketides. Nat. Rev. Microbiol., 3 (12), 925–36, 2005. 42. Lum, A. M., Huang, J., Hutchinson, C. R., and Kao, C. M. Reverse engineering of industrial pharmaceutical-producing actinomycete strains using DNA microarrays. Metab. Eng., 6 (3), 186–96, 2004. 43. Stassi, D. L., Kakavas, S. J., Reynolds, K. A., Gunawardana, G., Swanson, S., Zeidner, D., Jackson, M., Liu, H., Buko, A., and Katz, L. Ethyl-substituted erythromycin derivatives produced by directed metabolic engineering. Proc. Natl. Acad. Sci. USA, 95 (13), 7305–9, 1998. 44. Marsden, A. F., Wilkinson, B., Cortes, J., Dunster, N. J., Staunton, J., and Leadlay, P. F. Engineering broader specificity into an antibiotic-producing polyketide synthase. Science, 279 (5348), 199–202, 1998. 45. Lambalot, R. H., Cane, D. E., Aparicio, J. J., and Katz, L. Overproduction and characterization of the erythromycin C-12 hydroxylase, EryK. Biochemistry, 34 (6), 1858–66, 1995. 46. Kao, C. M., Katz, L., and Khosla, C. Engineered biosynthesis of a complete macrolactone in a heterologous host. 
Science 265 (5171), 509–12, 1994.

23-32

Future Applications of Metabolic Engineering

47. Pfeifer, B. A., Admiraal, S. J., Gramajo, H., Cane, D. E., and Khosla, C. Biosynthesis of complex polyketides in a metabolically engineered strain of E. coli. Science, 291 (5509), 1790–92, 2001. 48. McDaniel, R., Thamchaipenet, A., Gustafsson, C., Fu, H., Betlach, M., and Ashley, G. Multiple genetic modifications of the erythromycin polyketide synthase to produce a library of novel “unnatural” natural products. Proc. Natl. Acad. Sci. USA, 96 (5), 1846–51, 1999. 49. Kao, C. M., Pieper, R., Cane, D. E., and Khosla, C. Evidence for two catalytically independent clusters of active sites in a functional modular polyketide synthase. Biochemistry, 35 (38), 12363–68, 1996. 50. Jacobsen, J. R., Keatinge-Clay, A. T., Cane, D. E., and Khosla, C. Precursor-directed biosynthesis of 12-ethyl erythromycin. Bioorg. Med. Chem., 6 (8), 1171–77, 1998. 51. Brown, M. S., Dirlam, J. P., McArthur, H. A., McCormick, E. L., Morse, B. K., Murphy, P. A., O’Connell, T. N., Pacey, M., Rescek, D. M., Ruddock, J., and Wax, R. G. Production of 6-deoxy-13cyclopropyl-erythromycin B by Saccharopolyspora erythrea NRRL 18643. J. Antibiot. (Tokyo), 52 (8), 742–47, 1999. 52. Wallaart, T. E., van Uden, W., Lubberink, H. G. M., Woerdenbag, H. J., Pras, N., and Quax, W. J. Isolation and identification of dihydroartemisinic acid from Artemisia annua and its possible role in the biosynthesis of artemisinin. J. Nat. Prod., 62 (3), 430–33, 1999. 53. Bharel, S., Gulati, A., Abdin, M. Z., Srivastava, P. S., Vishwakarma, R. A., and Jain, S. K. Enzymatic synthesis of artemisinin from natural and synthetic precursors. J. Nat. Prod., 61 (5), 633–36, 1998. 54. Wallaart, T. E., Bouwmeester, H. J., Hille, J., Poppinga, L., and Maijers, N. C. A. Amorpha-4,11-diene synthase: cloning and functional expression of a key enzyme in the biosynthetic pathway of the novel antimalarial drug artemisinin. Planta 212 (3), 460–65, 2001. 55. Mercke, P., Bengtsson, M., Bouwmeester, H. J., Posthumus, M. A., and Brodelius, P. E. 
Molecular cloning, expression, and characterization of amorpha-4,11-diene synthase, a key enzyme of artemisinin biosynthesis in Artemisia annua L. Arch. Biochem. Biophy., 381 (2), 173–80, 2000. 56. Martin, V. J. J., Pitera, D. J., Withers, S. T., Newman, J. D., and Keasling, J. D. Engineering a mevalonate pathway in Escherichia coli for production of terpenoids. Nat. Biotechnol., 21 (7), 796–802, 2003. 57. Chen, D. H., Ye, H. C., and Li, G. F. Expression of a chimeric farnesyl diphosphate synthase gene in Artemisia annua L. transgenic plants via Agrobacterium tumefaciens-mediated transformation. Plant Sci., 155 (2), 179–85, 2000. 58. Sa, G., Mi, M., He-chun, Y., Ben-ye, L., Guo-feng, L., and Kang, C. Effects of ipt gene expression on the physiological and chemical characteristics of Artemisia annua L. Plant Sci., 160 (4), 691–98, 2001. 59. Singh, A. Vishwakarma, R. A, Husain, A, Evalutation of Artemisia annua strains for higher Artemisinin production. Planta Medica, 54 (5), 475–76, 1988. 60. Souret, F. F., Kim, Y., Wysiouzil, B. E., Wobbe, K. K., and Weathers, P. J. Scale-up of Artemisia annua L. hairy root cultures produces complex patterns of terpenoid gene expression. Biotechnol. Bioeng., 83 (6), 653–67, 2003. 61. Zhan, J. X., Zhang, Y. X., Guo, H. Z., Han, J., Ning, L. L., and Guo, D. A. Microbial metabolism of artemisinin by Mucor polymorphosporus and Aspergillus niger. J. Nat. Prod., 65 (11), 1693–95, 2002. WHO/CDS/CSR/DRS/2001.4 62. de Medeiros, S. F., Avery, M. A., Avery, B., Leite, S. G. F., Freitas, A. C. C., and Williamson, J. S. Biotransformation of 10-deoxoartemisinin to its 7 beta-hydroxy derivative by Mucor ramannianus. Biotechnol. Lett., 24 (11), 937–41, 2002. 63. Parshikov, I. A., Muraleedharan, K. M., Avery, M. A., and Williamson, J. S. Transformation of artemisinin by Cunninghamella elegans. Appl. Microbiol. Biotechnol., 64 (6), 782–86, 2004. 64. Bloland, P. B. Drug resistance in malaria. 2001. WHO/CDS/CSR/DRS/2001.4 65. Murphy, G. 
S., Basri, H., Purnomo, Andersen, E. M., Bangs, M. J., Mount, D. L., Gorden, J., Lal, A. A., Purwokusumo, A. R., Harjosuwarno, S., et al. Vivax malaria resistant to treatment and prophylaxis with chloroquine. Lancet, 341 (8837), 96–100, 1993.

Applications of Metabolic Engineering for Natural Drug Discovery

23-33

66. Looareesuwan, S., Buchachart, K., Wilairatana, P., Chalermrut, K., Rattanapong, Y., Amradee, S., Siripiphat, S., Chullawichit, S., Thimasan, K., Ittiverakul, M., Triampon, A., and Walsh, D. S. Primaquine-tolerant vivax malaria in Thailand. Ann. Trop. Med. Parasitol., 91 (8), 939–43, 1997. 67. van Agtmael, M. A., Eggelte, T. A., and van Boxtel, C. J. Artemisinin drugs in the treatment of malaria: from medicinal herb to registered medication. Trends Pharmacol. Sci., 20 (5), 199–205, 1999. 68. Avery, M. A., Chong, W. K. M., and Jenningswhite, C. Stereoselective total synthesis of (+)-Artemisinin, the antimalarial constituent of Artemisia annua L. J. Am. Chem. Soc., 114 (3), 974–79, 1992. 69. Scotti, C. and Hutchinson, C. R. Enhanced antibiotic production by manipulation of the Streptomyces peucetius dnrH and dnmT genes involved in doxorubicin (adriamycin) biosynthesis. J. Bacteriol., 178 (24), 7316–21, 1996. 70. Stutzman-Engwall, K. J., Otten, S. L., and Hutchinson, C. R. Regulation of secondary metabolism in Streptomyces spp. and overproduction of daunorubicin in Streptomyces peucetius. J. Bacteriol., 174 (1), 144–54, 1992. 71. Lomovskaya, N., Doi-Katayama, Y., Filippini, S., Nastro, C., Fonstein, L., Gallo, M., Colombo, A. L., and Hutchinson, C. R. The Streptomyces peucetius dpsY and dnrX genes govern early and late steps of daunorubicin and doxorubicin biosynthesis. J. Bacteriol., 180 (9), 2379–86, 1998. 72. Lomovskaya, N., Otten, S. L., Doi-Katayama, Y., Fonstein, L., Liu, X. C., Takatsu, T., Inventi-Solari, A., Filippini, S., Torti, F., Colombo, A. L., and Hutchinson, C. R. Doxorubicin overproduction in Streptomyces peucetius: cloning and characterization of the dnrU ketoreductase and dnrV genes and the doxA cytochrome P-450 hydroxylase gene. J. Bacteriol., 181 (1), 305–18, 1999. 73. Olano, C., Lomovskaya, N., Fonstein, L., Roll, J. T., and Hutchinson, C. R. 
A two-plasmid system for the glycosylation of polyketide antibiotics: bioconversion of epsilon-rhodomycinone to rhodomycin D. Chem. Biol., 6 (12), 845–55, 1999. 74. Madduri, K., Kennedy, J., Rivola, G., Inventi-Solari, A., Filippini, S., Zanuso, G., Colombo, A. L., Gewain, K. M., Occi, J. L., MacNeil, D. J., and Hutchinson, C. R. Production of the antitumor drug epirubicin (4’-epidoxorubicin) and its precursor by a genetically engineered strain of Streptomyces peucetius. Nat. Biotechnol., 16 (1), 69–74, 1998. 75. Arcamone, F., Animati, F., Capranico, G., Lombardi, P., Pratesi, G., Manzini, S., Supino, R., and Zunino, F. New developments in antitumor anthracyclines. Pharmacol. Ther., 76 (1–3), 117–24, 1997. 76. Arcamone, F., Cassinelli, G., Fantini, G., Grein, A., Orezzi, P., Pol, C., and Spalla, C. Adriamycin, 14-hydroxydaunomycin, a new antitumor antibiotic from S. peucetius var. caesius. Biotechnol. Bioeng., 11 (6), 1101–10, 1969. 77. Tang, Y., Lee, T. S., and Khosla, C. Engineered biosynthesis of regioselectively modified aromatic polyketides using bimodular polyketide synthases. PLoS Biol., 2 (2), E31, 2004. 78. Hutchinson, C. R. Biosynthetic studies of Daunorubicin and Tetracenomycin C. Chem. Rev., 97 (7), 2525–36, 1997. 79. Hutchinson, C. R., Borell, C. W., Otten, S. L., Stutzman-Engwall, K. J., and Wang, Y. G. Drug discovery and development through the genetic engineering of antibiotic-producing microorganisms. J. Med. Chem., 32 (5), 929–37, 1989. 80. Grimm, A., Madduri, K., Ali, A., and Hutchinson, C. R. Characterization of the Streptomyces peucetius ATCC 29050 genes encoding doxorubicin polyketide synthase. Gene, 151 (1–2), 1–10, 1994. 81. Hutchinson, C. R., and Colombo, A. L. Genetic engineering of doxorubicin production in Streptomyces peucetius: a review. J. Ind. Microbiol. Biotechnol., 23 (1), 647–52, 1999. 82. Otten, S. L., Olano, C., and Hutchinson, C. R. 
The dnrO gene encodes a DNA-binding protein that regulates daunorubicin production in Streptomyces peucetius by controlling expression of the dnrN pseudo response regulator gene. Microbiology, 146 (6), 1457–68, 2000. 83. Bibb, M. J. Regulation of secondary metabolism in streptomycetes. Curr. Opin. Microbiol., 8 (2), 208–15, 2005.

23-34

Future Applications of Metabolic Engineering

84. Langenhan, J. M., Griffith, B. R., and Thorson, J. S. Neoglycorandomization and chemoenzymatic glycorandomization: two complementary tools for natural product diversification. J. Nat. Prod., 68 (11), 1696–711, 2005. 85. Tagliavini, F., McArthur, R. A., Canciani, B., Giaccone, G., Porro, M., Bugiani, M., Lievens, P. M., Bugiani, O., Peri, E., Dall’Ara, P., Rocchi, M., Poli, G., Forloni, G., Bandiera, T., Varasi, M., Suarato, A., Cassutti, P., Cervini, M. A., Lansen, J., Salmona, M., and Post, C. Effectiveness of anthracycline against experimental prion disease in Syrian hamsters. Science, 276 (5315), 1119–22, 1997. 86. Rajgarhia, V. B., Priestley, N. D., and Strohl, W. R. The product of dpsC confers starter unit fidelity upon the daunorubicin polyketide synthase of Streptomyces sp. strain C5. Metab. Eng., 3 (1), 49–63, 2001. 87. Hoshino, T., and Fujiwara, A. Microbial conversion of anthracycline antibiotics. II. Characterization of the microbial conversion products of auramycinone by Streptomyces coeruleorubidus ATCC 31276. J. Antibiot. (Tokyo), 36 (11), 1463–67, 1983. 88. Lee, T. S., Khosla, C., and Tang, Y. Engineered biosynthesis of aklanonic acid analogues. J. Am. Chem. Soc., 127 (35), 12254–62, 2005. 89. Huang, K. X., Huang, Q. L., Wildung, M. R., Croteau, R., and Scott, A. I. Overproduction, in Escherichia coli, of soluble taxadiene synthase, a key enzyme in the Taxol biosynthetic pathway. Protein Expr. Purif., 13 (1), 90–96, 1998. 90. Huang, Q., Roessner, C. A., Croteau, R., and Scott, A. I. Engineering Escherichia coli for the synthesis of taxadiene, a key intermediate in the biosynthesis of taxol. Bioorg. Med. Chem., 9 (9), 2237–42, 2001. 91. Dejong, J. M., Liu, Y., Bollon, A. P., Long, R. M., Jennewein, S., Williams, D., and Croteau, R. B. Genetic engineering of taxol biosynthetic genes in Saccharomyces cerevisiae. Biotechnol. Bioeng., 93 (2), 212–24, 2006. 92. Besumbes, O., Sauret-Gueto, S., Phillips, M. 
A., Imperial, S., Rodriguez-Concepcion, M., and Boronat, A. Metabolic engineering of isoprenoid biosynthesis in Arabidopsis for the production of taxadiene, the first committed precursor of Taxol. Biotechnol. Bioeng., 88 (2), 168–75, 2004. 93. Wani, M. C., Taylor, H. L., Wall, M. E., Coggon, P., and McPhail, A. T. Plant antitumor agents. VI. The isolation and structure of taxol, a novel antileukemic and antitumor agent from Taxus brevifolia. J. Am. Chem. Soc., 93 (9), 2325–27, 1971. 94. Chau, M., and Croteau, R. Molecular cloning and characterization of a cytochrome P450 taxoid 2alpha-hydroxylase involved in Taxol biosynthesis. Arch. Biochem. Biophys., 427 (1), 48–57, 2004. 95. Wildung, M. R., and Croteau, R. A cDNA clone for taxadiene synthase, the diterpene cyclase that catalyzes the committed step of taxol biosynthesis. J. Biol. Chem., 271 (16), 9201–4, 1996. 96. Walker, K., Schoendorf, A., and Croteau, R. Molecular cloning of a taxa-4(20),11(12)-dien-5alphaol-O-acetyl transferase cDNA from Taxus and functional expression in Escherichia coli. Arch. Biochem. Biophys., 374 (2), 371–80, 2000. 97. Schoendorf, A., Rithner, C. D., Williams, R. M., and Croteau, R. B. Molecular cloning of a cytochrome P450 taxane 10 beta-hydroxylase cDNA from Taxus and functional expression in yeast. Proc. Natl. Acad. Sci. USA, 98 (4), 1501–6, 2001. 98. Walker, K., and Croteau, R. Molecular cloning of a 10-deacetylbaccatin III-10-O-acetyl transferase cDNA from Taxus and functional expression in Escherichia coli. Proc. Natl. Acad. Sci. USA, 97 (2), 583–87, 2000. 99. Walker, K., and Croteau, R. Taxol biosynthesis: molecular cloning of a benzoyl-CoA:taxane 2alphaO-benzoyltransferase cDNA from taxus and functional expression in Escherichia coli. Proc. Natl. Acad. Sci. USA, 97 (25), 13591–96, 2000. 100. Chau, M., Jennewein, S., Walker, K., and Croteau, R. Taxol biosynthesis: Molecular cloning and characterization of a cytochrome P450 taxoid 7 beta-hydroxylase. Chem. 
Biol., 11 (5), 663–72, 2004.

Applications of Metabolic Engineering for Natural Drug Discovery

23-35

101. Nicolaou, K. C., Yang, Z., Liu, J. J., Ueno, H., Nantermet, P. G., Guy, R. K., Claiborne, C. F., Renaud, J., Couladouros, E. A., Paulvannan, K., et al. Total synthesis of taxol. Nature, 367 (6464), 630–34, 1994. 102. Holton, R. A., Kim, H. B., Somoza, C., Liang, F., Biediger, R. J., Boatman, P. D., Shindo, M., Smith, C. C., Kim, S., Nadizadeh, H., Suzuki, Y., Tao, C., Vu, P., Tang, S., Zhang, P., Murthi, K. K., Gentile, L. N., and Liu, J. H. First total synthesis of taxol. J. Am. Chem. Soc., 116, 1599–1600, 1994. 103. Mutka, S. C., Carney, J. R., Liu, Y., and Kennedy, J. Heterologous production of Epothilone C and D in Escherichia coli. Biochemistry, 45 (4), 1321–30, 2006. 104. Julien, B., and Shah, S. Heterologous expression of epothilone biosynthetic genes in Myxococcus xanthus. Antimicrob. Agents Chemother., 46 (9), 2772–78, 2002. 105. Boddy, C. N., Hotta, K., Tse, M. L., Watts, R. E., and Khosla, C. Precursor-directed biosynthesis of epothilone in Escherichia coli. J. Am. Chem. Soc., 126 (24), 7436–37, 2004. 106. Lau, J., Frykman, S., Regentin, R., Ou, S., Tsuruta, H., and Licari, P. Optimizing the heterologous production of epothilone D in Myxococcus Xanthus. Biotechnol. Bioeng., 78 (3), 280–88, 2002. 107. Tang, L., Shah, S., Chung, L., Carney, J., Katz, L., Khosla, C., and Julien, B. Cloning and heterologous expression of the epothilone gene cluster. Science 287 (5453), 640–42, 2000. 108. Frykman, S. A., Tsuruta, H., Starks, C. M., Regentin, R., Carney, J. R., and Licari, P. J. Control of secondary metabolite congener distributions via modulation of the dissolved oxygen tension. Biotechnol. Prog., 18 (5), 913–20, 2002. 109. Regentin, R., Frykman, S., Lau, J., Tsuruta, H., and Licari, P. Nutrient regulation of epothilone biosynthesis in heterologous and native production strains. Appl. Microbiol. Biotechnol., 61 (5–6), 451– 55, 2003. 110. Tang, L., Qiu, R. G., Li, Y., and Katz, L. 
Generation of novel epothilone analogs with cytotoxic activity by biotransformation. J. Antibiot. (Tokyo), 56 (1), 16–23, 2003. 111. Tang, L., Chung, L., Carney, J. R., Starks, C. M., Licari, P., and Katz, L. Generation of new epothilones by genetic engineering of a polyketide synthase in Myxococcus Xanthus. J. Antibiot. (Tokyo). 58 (3), 178–84, 2005. 112. Arslanian, R. L., Tang, L., Blough, S., Ma, W., Qiu, R. G., Katz, L., and Carney, J. R. A new cytotoxic epothilone from modified polyketide synthases heterologously expressed in Myxococcus xanthus. J. Nat. Prod., 65 (7), 1061–64, 2002. 113. Starks, C. M., Zhou, Y., Liu, F., and Licari, P. J. Isolation and characterization of new epothilone analogues from recombinant Myxococcus xanthus fermentations. J. Nat. Prod., 66 (10), 1313–17, 2003. 114. Gerth, K., Bedorf, N., Hofle, G., Irschik, H., and Reichenbach, H. Epothilons A and B: antifungal and cytotoxic compounds from Sorangium cellulosum (Myxobacteria). Production, physicochemical and biological properties. J. Antibiot. (Tokyo), 49 (6), 560–63, 1996. 115. Watanabe, K., Rude, M. A., Walsh, C. T., and Khosla, C. Engineered biosynthesis of an ansamycin polyketide precursor in Escherichia coli. Proc. Natl. Acad. Sci. USA, 100 (17), 9774–78, 2003. 116. Gokhale, R. S., Tsuji, S. Y., Cane, D. E., and Khosla, C. Dissecting and exploiting intermodular communication in polyketide synthases. Science, 284 (5413), 482–85, 1999. 117. Vinci, V. A., Hoerner, T. D., Coffman, A. D., Schimmel, T. G., Dabora, R. L., Kirpekar, A. C., Ruby, C. L., and Stieber, R. W. Mutants of a Lovastatin-hyperproducing Aspergillus terreus deficient in the production of Sulochrin. J. Ind. Microbiol., 8 (2), 113–20, 1991. 118. Couch, R. D., and Gaucher, G. M. Rational elimination of Aspergillus terreus sulochrin production. J. Biotechnol., 108 (2), 171–78, 2004. 119. Hutchinson, C. R., Kennedy, J., Park, C., Kendrew, S., Auclair, K., and Vederas, J. 
Aspects of the biosynthesis of non-aromatic fungal polyketides by iterative polyketide synthases. Antonie Van Leeuwenhoek, 78 (3–4), 287–95, 2000.

23-36

Future Applications of Metabolic Engineering

120. Askenazi, M., Driggers, E. M., Holtzman, D. A., Norman, T. C., Iverson, S., Zimmer, D. P., Boers, M. E., Blomquist, P. R., Martinez, E. J., Monreal, A. W., Feibelman, T. P., Mayorga, M. E., Maxon, M. E., Sykes, K., Tobin, J. V., Cordero, E., Salama, S. R., Trueheart, J., Royer, J. C., and Madden, K. T. Integrating transcriptional and metabolite profiles to direct the engineering of lovastatin-producing fungal strains. Nat. Biotechnol., 21 (2), 150–56, 2003. 121. Buckland, B., Kodzo, G., Hallada, T., Kaplan, L., and Masurekar, P. Production of lovastatin, an inhibitor of cholesterol accumulation in humans. Novel Microb. Prod. Med. Agric., 161–69, 1989. 122. Novak, N., Gerdin, S., and Berovic, M. Increased lovastatin formation by Aspergillus terreus using repeated fed-batch process. Biotechnol. Lett., 19 (10), 947–48, 1997. 123. Manzoni, M., and Rollini, M. Biosynthesis and biotechnological production of statins by filamentous fungi and application of these cholesterol-lowering drugs. Appl. Microbiol. Biotechnol., 58 (5), 555–64, 2002. 124. Lopez, J. L. C., Porcel, E. M. R., Ferron, M. A. V., Perez, J. A. S., Sevilla, J. M. F., and Chisti, Y., Lovastatin inhibits its own synthesis in Aspergillus terreus. J. Ind. Microbiol. Biotechnol., 31 (1), 48–50, 2004. 125. Nara, F., Watanabe, I., and Serizawa, N. Development of a transformation system for the filamentous, ML-236B (compactin)-producing fungus Penicillium citrinum. Curr. Genet., 23 (1), 28–32, 1993. 126. Abe, Y., Ono, C., Hosobuchi, M., and Yoshikawa, H. Functional analysis of mlcR, a regulatory gene for ML-236B (compactin) biosynthesis in Penicillium citrinum. Mol. Genet. Genomics, 268 (3), 352– 61, 2002. 127. Abe, Y., Suzuki, T., Mizuno, T., Ono, C., Iwamoto, K., Hosobuchi, M., and Yoshikawa, H. Effect of increased dosage of the ML-236B (compactin) biosynthetic gene cluster on ML-236B production in Penicillium citrinum. Mol. Genet. Genomics, 268 (1), 130–7, 2002. 128. Peng, Y. L., and Demain, A. L.. 
Bioconversion of compactin to pravastatin by Actinomadura sp ATCC 55678. J. Mol. Catalysis B-Enzymatic, 10 (1–3), 151–56, 2000. 129. Serizawa, N., Hosobuchi, M., and Yoshikawa, H. Biochemical and Fermentation Technological Approaches to Production of Pravastatin. A HMG-CoA Reductase Inhibitor. Marcel Dekker, Inc., Wilmington, NC, 1997. 130. Park, J. W., Lee, J. K., Kwon, T. J., Yi, D. H., Kim, Y. J., Moon, S. H., Suh, H. H., Kang, S. M., and Park, Y. I. Bioconversion of compactin into pravastatin by Streptomyces sp. Biotechnol. Lett., 25 (21), 1827–31, 2003. 131. Matsuoka, T., Miyakoshi, S., Tanzawa, K., Nakahara, K., Hosobuchi, M., and Serizawa, N. Purification and characterization of cytochrome P-450sca from Streptomyces carbophilus. ML-236B (compactin) induces a cytochrome P-450sca in Streptomyces carbophilus that hydroxylates ML-236B to pravastatin sodium (CS-514), a tissue-selective inhibitor of 3-hydroxy-3-methylglutarylcoenzyme-A reductase. Eur. J. Biochem., 184 (3), 707–13, 1989. 132. Watanabe, I., and Serizawa, N. Molecular approaches for production of pravastatin, a HMG-CoA reductase inhibitor: transcriptional regulation of the cytochrome p450sca gene from Streptomyces carbophilus by ML-236B sodium salt and phenobarbital. Gene, 210 (1), 109–16, 1998. 133. Peng, Y., Yashphe, J., and Demain, A. L. Biotransformation of compactin to pravastatin by Actinomadura sp. 2966. J. Antibiot. (Tokyo), 50 (12), 1032–35, 1997. 134. Lloyd-Jones, D., Adams, R., Carnethon, M., De Simone, G., Ferguson, T. B., Flegal, K., Ford, E., Furie, K., Go, A., Greelund, K., Jaase, N., Hailpern, S., Ho, M., Howard, V., Kissela, B., Kittner, S., Lackland, D., Lisabeth, L., Marelli, A., McDermott, M., Meigs, J., Mozaffarian, D., Nichol, G., O’Donnel, C., Roger, V., Rosamond, W., Sacco, R., Sorlie, P., Stafford, R., Steinberger, J., Thom, T., Wasserthiel-Smoller, S., Wong, N., Wylie-Rosset, J., Hong, Y. 
Heat Disease and Stroke Statistics 2009 Update: A Report From the American Heart Association Statistics Committee and Stroke Statistics Subcommittee. Journal of the American Heart Association 199(3), 480–487, 2009.

Applications of Metabolic Engineering for Natural Drug Discovery

23-37

135. Istvan, E. S. and Deisenhofer, J., Structural mechanism for statin inhibition of HMG-CoA reductase, Science 292 (5519), 1160–64, 2001. 136. Herper, M. The World’s Best-Selling Drugs. 2004. 137. Endo, A., Kuroda, M., and Tsujita, Y. ML-236A, ML-236B, and ML-236C, new inhibitors of cholesterogenesis produced by Penicillium citrinium. J. Antibiot. (Tokyo), 29 (12), 1346–48, 1976. 138. Abe, Y., Suzuki, T., Ono, C., Iwamoto, K., Hosobuchi, M., and Yoshikawa, H. Molecular cloning and characterization of an ML-236B (compactin) biosynthetic gene cluster in Penicillium citrinum. Mol. Genet. Genomics, 267 (5), 636–46, 2002.

24 Metabolic Engineering for Alternative Fuels

Yandi Dharmadi, Rice University
Ramon Gonzalez, Rice University

24.1 Introduction
24.2 Ethanol Production Process and Metabolic Engineering Opportunities
24.3 Feedstock Engineering
24.4 Cellulase Engineering
24.5 Biocatalyst Engineering
Saccharomyces cerevisiae • Zymomonas mobilis • Escherichia coli • Klebsiella oxytoca
24.6 Consolidated Bioprocessing
Native Cellulolytic Strategy • Recombinant Cellulolytic Strategy
24.7 Emerging Biofuel Platforms
24.8 Conclusions and Future Outlook
References

24.1 Introduction

Due to finite natural reserves, worldwide production of crude oil is expected to peak in the near future and then decline steadily [1]. Liquid fuel shortages can be mitigated by conversion technologies (e.g., liquefied coal, liquefied natural gas) and alternative sources (e.g., shale oil, oil sands, gas hydrates). However, reliance on these fossil resources may be short-lived. In fact, natural gas production has declined in the past few years, resulting in sustained high prices [1]. The U.S. consumes 25% of worldwide oil production, with imports accounting for 60% of its supply [2,3]. In light of rising oil prices, national energy security has become a real issue for oil-importing countries. Furthermore, heightened awareness of global warming has highlighted environmental concerns about greenhouse gas emissions. These issues underscore the need for a sustainable alternative to fossil fuel, hence the increase in policy support and monetary investment in biofuels [4].

Currently ethanol accounts for 99% of U.S. biofuel production (90% worldwide), with biodiesel making up the rest [5,6]. Robust growth in U.S. ethanol production (4 billion gallons in 2005) is expected to meet the goal set by the Energy Policy Act of 2005, which requires 7.5 billion gallons of renewable fuel (6%) in the gasoline blend by 2012 [3,7]. Similarly, the European Union Biofuels Use Directive of 2003 set a goal of 5.75% biofuel for transportation by 2010 [8]. Ethanol is clean burning and has a higher oxygen content than gasoline. A gasoline blend with 85% ethanol (E85 fuel) has an octane rating of 105 and is commercially available [9].

Bioethanol holds great promise as a viable alternative to gasoline, but harnessing its full potential calls for major advances in current production technologies. Today ethanol is mostly produced via fermentation, using sugar-rich plant feedstocks such as sugar cane juice (Brazil) or corn starch (U.S.). Critical assessment of the bioethanol enterprise naturally raises several considerations, all pointing to the use of cellulosic/plant biomass as the "next generation" feedstock.

1. Economics. The U.S. ethanol industry is supported by agricultural subsidies and tax credits, as the overall process economics is not yet competitive with the petroleum industry. Raw material cost for corn feedstock accounts for more than half the cost of ethanol [10]. Crop residues such as corn stover or bagasse (spent sugar cane) are readily available sources of lignocellulose, which can be converted to sugars for ethanol fermentation. Utilization of these cheap biomass feedstocks will give a significant boost to bioethanol profitability.
2. Growth. U.S. ethanol production doubled over the 2000–2005 period [11]. However, this rapid growth may not be sustainable because of limits on land availability and supporting infrastructure; for example, the animal feed industry may not be able to keep up in absorbing excess distiller's grains. The solution will be to utilize biomass feedstock, which is ten times more abundant than corn grain feedstock [12]. It is estimated that agricultural and forest resources will be able to provide over 1 billion tons of biomass each year for large-scale biofuel production, enough to account for at least 30% of transportation fuel [12].
3. Energy. A useful metric to gauge fuel production efficiency is the fossil energy ratio (FER), defined as the fuel energy delivered per unit of fossil energy input used in producing the fuel. Corn ethanol is only marginally better than gasoline (FER = 1.36 vs. 0.81), but cellulosic ethanol is projected to be a real "energy gainer" (FER > 10) [13].
4. Environment. For the same amount of energy delivered, cellulosic ethanol production will reduce CO2 emissions by an order of magnitude compared to gasoline, because the CO2 formed by fuel combustion is sequestered back in biomass formation [5]. By utilizing abundant biomass feedstock, deforestation to accommodate crops will also be unnecessary.
5. Ethics. Food vs. fuel: is converting food crops to fuel justified when starvation is still a problem in parts of the world? Use of biomass feedstock sidesteps this question altogether, as food crops and land use will be preserved. Cellulosic ethanol will deliver food and fuel.
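The FER definition in item 3 can be made concrete with a short calculation. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, and the only data used are the FER values quoted above, with 10 taken as the lower bound stated for cellulosic ethanol. It inverts FER to show how much fossil energy is consumed per unit of fuel energy delivered.

```python
def fossil_energy_ratio(fuel_energy_out, fossil_energy_in):
    """Fuel energy delivered per unit of fossil energy used to produce the fuel."""
    if fossil_energy_in <= 0:
        raise ValueError("fossil_energy_in must be positive")
    return fuel_energy_out / fossil_energy_in


def fossil_input_per_unit_fuel(fer):
    """Fossil energy consumed to deliver one unit of fuel energy (reciprocal of FER)."""
    return 1.0 / fer


# FER values quoted in the text; 10.0 is the stated lower bound for
# cellulosic ethanol, so its fossil input is an upper bound.
QUOTED_FER = {"gasoline": 0.81, "corn ethanol": 1.36, "cellulosic ethanol": 10.0}

for fuel, fer in QUOTED_FER.items():
    print(f"{fuel}: {fossil_input_per_unit_fuel(fer):.2f} MJ fossil per MJ fuel")
```

On these figures, each megajoule of gasoline costs about 1.23 MJ of fossil energy, corn ethanol about 0.74 MJ, and cellulosic ethanol at most 0.10 MJ, which is why the text calls corn ethanol only marginally better than gasoline.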

It is clear that the future of bioethanol lies in cellulosic biomass. However, technological challenges exist in the conversion of biomass to amenable substrates, and in the utilization of those substrates in ethanol fermentation. Plant biomass is lignocellulosic material, consisting of a complex matrix of cellulose (∼45%), hemicellulose (∼30%), and lignin (∼25%) [14]. Both starch and cellulose are glucose polymers, but unlike the α(1→4) glycosidic bonds in starch, the β(1→4) bonds in cellulose make it crystalline and thus more biologically inert. Pretreatment is therefore needed to make the polymer accessible, but chemical pretreatment of lignocellulosic biomass often yields degradation products that inhibit subsequent fermentation. Hemicellulose is an amorphous, branched polymer of pentoses (xylose and arabinose) and hexoses (galactose, mannose). Overall, lignocellulose contains 16–26% pentoses [15]. However, Saccharomyces cerevisiae (the conventional biocatalyst for ethanol fermentation) is unable to utilize pentoses. At the same time, native pentose-fermenting biocatalysts such as Escherichia coli have poor ethanol yields or cannot tolerate high ethanol concentrations. These issues are central to breaking the "biological barrier" to tapping the tremendous potential of bioethanol, and need to be addressed by cutting-edge technology. There is intensive research in this area, with metabolic engineering taking center stage in biocatalyst development. This chapter will focus on the role of metabolic engineering as the enabling technology for cellulosic bioethanol. We will therefore defer discussion of emerging biofuel platforms such as syngas fermentation [16] and biofuel molecules other than ethanol [17] until the end.
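To see why pentose utilization matters, the composition figures above can be combined with fermentation stoichiometry. The sketch below is an illustration under stated assumptions, not a result from the chapter: it treats the quoted 16–26% pentose content as pentosan polymer, ignores hemicellulosic hexoses, and assumes complete hydrolysis and the theoretical maximum ethanol yields (glucose to 2 ethanol + 2 CO2; 3 pentose to 5 ethanol + 5 CO2). All function names are hypothetical.

```python
# Molar masses in g/mol.
M_ETHANOL = 46.07
M_HEXOSE = 180.16   # C6H12O6
M_PENTOSE = 150.13  # C5H10O5
M_WATER = 18.015

# Theoretical mass yields of ethanol on sugar (about 0.51 g/g in both cases).
Y_HEXOSE = 2 * M_ETHANOL / M_HEXOSE          # C6H12O6 -> 2 C2H5OH + 2 CO2
Y_PENTOSE = 5 * M_ETHANOL / (3 * M_PENTOSE)  # 3 C5H10O5 -> 5 C2H5OH + 5 CO2


def sugar_from_polymer(polymer_mass, m_sugar):
    """Free sugar from complete hydrolysis; each residue gains one water."""
    return polymer_mass * m_sugar / (m_sugar - M_WATER)


def max_ethanol(cellulose, pentosan, ferment_pentoses=True):
    """Upper-bound ethanol mass per unit dry biomass, given mass fractions."""
    ethanol = sugar_from_polymer(cellulose, M_HEXOSE) * Y_HEXOSE
    if ferment_pentoses:
        ethanol += sugar_from_polymer(pentosan, M_PENTOSE) * Y_PENTOSE
    return ethanol


def share_lost_without_pentoses(cellulose, pentosan):
    """Fraction of potential ethanol forgone by a hexose-only biocatalyst."""
    full = max_ethanol(cellulose, pentosan, ferment_pentoses=True)
    hexose_only = max_ethanol(cellulose, pentosan, ferment_pentoses=False)
    return 1.0 - hexose_only / full


# Cellulose fraction and pentose range as quoted in the text.
for pentosan in (0.16, 0.26):
    lost = share_lost_without_pentoses(0.45, pentosan)
    print(f"pentosan {pentosan:.0%}: {lost:.0%} of potential ethanol forgone")
```

Under these simplifications, a biocatalyst limited to hexoses would forgo roughly a quarter to a little over a third of the ethanol theoretically available from the feedstock, which is the motivation for engineering pentose utilization.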

Metabolic Engineering for Alternative Fuels

24.2 Ethanol Production Process and Metabolic Engineering Opportunities

Figure 24.1 shows a generalized process flow of ethanol production from biomass. Size reduction of bulky biomass (e.g., poplar wood, corn stover) is necessary for ease of processing in subsequent steps, and to some extent for increasing the accessible surface area. This is adequately accomplished by milling and passing through a 0.25-in screen, followed by washing with water and drying [18,19]. Thermochemical pretreatment serves to alter the lignocellulose structure, making it more accessible to subsequent enzymatic hydrolysis, and partially releasing the lignin, glucan, or xylan oligomers from the matrix. Pretreatment is a critical step in the overall process, and probably one of the most expensive [20]. A successful pretreatment will keep the size reduction requirement to a minimum and lower enzyme loading in the hydrolysis step. Depending on the method used, pretreatment also determines the kind of degradation products formed and the release pattern of lignin, hexoses, and pentoses, all of which affect the subsequent hydrolysis and fermentation steps. Pretreatment agents commonly used include hot water (>200°C), high pressure steam, dilute sulfuric acid, ammonia, and lime [19]. Generally, water or acidic pretreatment releases hemicellulosic sugars, while basic pretreatment releases lignin from the biomass. Comparison of these pretreatment methods on corn stover, followed by enzymatic hydrolysis, shows that dilute sulfuric acid pretreatment gives the highest overall monomeric glucose and xylose yield (>90%) [21]. Degradation products formed in pretreatment (e.g., furfural, phenols, acetic acid) are fermentation inhibitors, which can be removed by steam stripping, evaporation, treatment with laccase (phenol oxidase), or lime [22].
A dedicated enzyme production system, typically an aerobic fungal cell culture (or external source), uses a fraction (3–5%) of pretreated biomass stream as substrate and provides cellulases for hydrolysis [23]. The enzymes are added to pretreated biomass to hydrolyze soluble glucan and xylan oligomers, as well as cellulose and hemicellulose still intact in the solid matrix. Commercial cellulase preparations usually have

[Figure 24.1 flow diagram. Main line: Biomass → Size reduction → Pretreatment and detoxification → Enzymatic hydrolysis → Fermentation → Product recovery → Ethanol and coproducts. Enzyme production supplies cellulases/hemicellulases to enzymatic hydrolysis; lignin and sugars are the streams leaving hydrolysis. The labels SSF and CBP mark the steps merged in each integrated process.]

Figure 24.1 (See color insert following page 13-20.) Ethanol production from biomass. Saccharifying enzymes and biocatalyst work in tandem in simultaneous saccharification and fermentation (SSF). In consolidated bioprocessing (CBP), the enzymes are produced directly by the ethanologenic biocatalyst.

Future Applications of Metabolic Engineering

enough xylanase activity for efficient release of xylose monomers [21]. An optional solid/liquid separation produces lignin-rich solid residues that can be burned to provide energy, while monomeric hexoses and pentoses in the sugar stream are used as carbon source in anaerobic fermentation by the ethanologenic biocatalyst, with ammonium sulfate, urea, or corn steep liquor (CSL) added as nitrogen source. Finally, product recovery by distillation and adsorption produces ethanol and other coproducts (e.g., CO2, xylitol). The scheme described above is referred to as separate hydrolysis and fermentation (SHF). A more integrated process is simultaneous saccharification and fermentation (SSF), in which both enzymes and biocatalyst are added to pretreated biomass in the same vessel. This in effect creates a fed-batch mode fermentation, where sugar substrates are fermented as soon as they are released in solution. In addition to cutting capital cost by merging two unit operations, this strategy relieves inhibition of the enzymes by hydrolysis end products (glucose, cellobiose), avoids osmotic stress to the biocatalyst due to high sugar loading, and allows for up to 40% higher ethanol yield [24]. Some challenges to SSF include incompatible optimal pH and temperatures for hydrolysis and fermentation, inhibition of biocatalysts by impurities in the enzyme preparation, and inhibition of enzymes by fermentation products [24]. Compared to SSF, consolidated bioprocessing (CBP) will be a step further in terms of process integration in ethanol production. In CBP, the biocatalyst itself produces cellulolytic enzymes, thereby eliminating the need for a separate enzyme production system. Currently, no biocatalyst is able to adequately carry out the multi-functional demands of CBP [25]. However, cost savings due to merged unit operations and yield loss minimization will give CBP a significant economic edge.
It is projected that switching from SSF to CBP will amount to an overall 18% reduction of ethanol production cost (20 ¢/gal gasoline equivalent), a great improvement in light of low profit margins in a price-competitive energy market [25]. It is not difficult to conceive of engineering challenges in the process description above, and yet it is easy to recognize the opportunities for metabolic engineering contributions. Yield, titer, and productivity make the economic bottom line, but it is genes, pathways, fluxes, and regulation that make the underlying mechanisms. Metabolic engineering has played a key role in biocatalyst development, and as a result, large-scale ethanol production from biomass is becoming a reality. Since 2004, Iogen Corp (Ontario, Canada) has produced ethanol from wheat straw with 1 million gal/year capacity [26]. In the following sections we will discuss metabolic engineering efforts on the biocatalyst, cellulase production, and the plant feedstock itself.

24.3 Feedstock Engineering

Energy crops are plants specifically grown for biomass feedstock in biofuel production. As such, they should be sustainable, have high biomass yield per acre, and require few agricultural inputs (e.g., water, fertilizer, herbicides). Perennial plants such as switchgrass and poplar are excellent energy crops because they are fast growing and drought resistant. Moreover, perennial species have very efficient nutrient utilization, as carbon and mineral nutrients are retained by rhizomes/roots that stay underground upon harvest [13]. Further improvements on energy crops are desirable, particularly with respect to ease of deconstruction of lignocellulosic biomass. This involves modification of the plant cell wall so that it contains less lignin, or altered lignin composition that facilitates removal. A 45% decrease of lignin content in healthy, transgenic aspen was achieved by downregulation of 4CL, a coenzyme A ligase of hydroxycinnamic acids (intermediates in lignin biosynthesis) [27]. Control of the syringyl:guaiacyl (S:G) lignin monomer ratio in Arabidopsis can be achieved by overexpression or knockout of ferulate-5-hydroxylase (F5H), a cytochrome P450-dependent monooxygenase catalyzing the hydroxylation of syringyl lignin precursors (ferulic acid, coniferaldehyde and coniferyl alcohol) [28]. Overexpression of Arabidopsis F5H in poplar also increases the syringyl monomer content, resulting in higher pulping efficiency but otherwise normal, healthy trees [29]. In a similar work on transgenic aspen, downregulation of 4CL and upregulation of F5H resulted in 52% less lignin, 64% higher S:G ratio, and 30% higher cellulose content [30].

A novel upstream strategy for feedstock deconstruction is the expression of cellulase in the plant biomass itself. A successful implementation of this strategy would mean no separate enzyme production system is needed, hence a process integration similar to that of CBP. The 1,4-β-endoglucanase E1 of Acidothermus cellulolyticus was expressed in rice and maize leaves to 4.9% and 2% total soluble protein, respectively, with the enzyme targeted to the apoplast [31]. Compartmentalization of the enzyme outside of the cytoplasm avoids interference with cytosolic metabolism and blocks access to cellulosic cell wall, ensuring healthy plants. Unfortunately, the enzyme did not survive harsh pretreatment condition (ammonia fiber explosion method), resulting in loss of two-thirds of activity [32]. However, modest success was achieved when crude enzyme extract (prepared separately) was used to hydrolyze pretreated biomass, with 30 and 22% glucose recovery from rice and maize cellulose, respectively [31].

24.4 Cellulase Engineering

There are three kinds of enzymatic activities working in concert during cellulose hydrolysis: cellobiohydrolases processively cleave cellobiose units (glucose dimer) from both reducing and nonreducing ends of the cellulose chain, endoglucanases create more attack sites for cellobiohydrolases by randomly cutting the cellulose chain into smaller pieces, and β-glucosidases hydrolyze cellobiose into glucose. Cellulose degrading organisms such as the filamentous brown rot fungi Trichoderma reesei and Aspergillus niger have an arsenal of cellulases at their disposal; the T. reesei genome encodes at least two cellobiohydrolases, two β-glucosidases, and five endoglucanases [23]. Although filamentous fungi naturally produce high levels of extracellular cellulases in aerobic culture, they have low specificity to crystalline cellulose. This was a major hurdle in cellulosic ethanol production, as the high enzyme load requirement translated to high operating cost. By contrast, thermophilic anaerobic bacteria such as Clostridium thermocellum produce cellulases that are highly active on crystalline cellulose, but the enzyme secretion level is low due to energy limitation in anaerobic culture. Almost all (95%) of C. thermocellum endoglucanase activity actually resides in the cellulase complex called the cellulosome [33]. Cellulosomes are catalytically very efficient because several enzymatic activities in effect are present in the same active site, i.e., the substrate need not dissociate in order to proceed to the next enzymatic reaction [34]. Advanced molecular cloning technologies have enabled the engineering of chimeric or “designer” cellulosomes from free cellulases [35,36]. This exciting development could prove to be revolutionary in terms of biomass feedstock deconstruction technology. Engineering efforts have been focused on improving the activity of T. reesei cellulases using a mutagenesis approach.
By using mutagens nitrosoguanidine, ethyl methane sulfonate, and nitrous acid on T. reesei spores followed by screening on selective media plates, Durand et al. obtained a strain with catabolite (glucose) derepression, constitutive β-glucosidase expression, as well as increased and constitutive cellulase expression, with overall four-fold increase in cellulase productivity compared to the parent strain [37]. A targeted improvement of wild-type cellobiohydrolase I of T. reesei (Cel7A) was reported; the method used error-prone PCR, DNA shuffling, as well as site-directed and site-saturation mutagenesis, followed by expression and thermal stability/activity screening in S. cerevisiae [3]. Through enzyme engineering and fermentation process optimization, a major breakthrough was accomplished in 2004. The National Renewable Energy Laboratory, Genencor International, and Novozymes Biotech achieved more than 20-fold cost reduction of cellulase, down to 10–25 ¢/gallon ethanol produced [38]. Work is under way to bring the cost down to as low as amylases for corn ethanol production (1–2 ¢/gallon) [13].

24.5 Biocatalyst Engineering

S. cerevisiae naturally produces ethanol with high yield, titer, and productivity [13], making it an excellent biocatalyst for ethanol production from corn starch or sugar cane juice. However, the sugar stream from biomass feedstock is different; it contains pentoses (not metabolizable by wild-type S. cerevisiae) and fermentation inhibitors. This adds two more requirements in the list of biocatalyst properties,

namely broad substrate utilization and tolerance to inhibitors, including ethanol itself. S. cerevisiae tolerates up to 21% (w/v) ethanol [39]. Metabolic engineering strategies in biocatalyst development aptly start from what nature has provided. The first option is to engineer pentose metabolic pathways lacking in native ethanologens (e.g., S. cerevisiae, Zymomonas mobilis), while the complementary approach is to engineer a homoethanol pathway in native pentose fermenters (e.g., E. coli, Klebsiella oxytoca). In many cases, heterologous gene expression and evolutionary selection techniques have been instrumental in producing desired phenotypes, while further insights into the genetic basis of mutations can be obtained through metabolic engineering tools such as flux analysis and transcriptome profiling. The following sections will describe metabolic engineering efforts on S. cerevisiae, Z. mobilis, E. coli, and K. oxytoca as the most promising biocatalysts for ethanol production. Maximum theoretical ethanol yields are calculated by balancing hypothetical net reactions involving carbon source, water, ethanol, and carbon dioxide. For glucose and xylose, the reactions are as follows:

$$\mathrm{C_6H_{12}O_6 \rightarrow 2\,C_2H_6O + 2\,CO_2}$$

$$\mathrm{C_5H_{10}O_5 \rightarrow \tfrac{5}{3}\,C_2H_6O + \tfrac{5}{3}\,CO_2}$$

Conversion of molar yields to mass yields is done by factoring in the molecular weights of glucose (180.16), xylose (150.13), and ethanol (46.07). In both cases the ethanol yields are 0.511 g/g sugar. For cellobiose and higher molecular weight oligomers, the balance accounts for water in hydrolysis:

$$\mathrm{C_{6n}H_{10n+2}O_{5n+1} + (n-1)\,H_2O \rightarrow 2n\,C_2H_6O + 2n\,CO_2}$$

where n is the number of glucose units. Using the molecular weight of water (18.016), the mass yield is calculated as:

$$\text{yield} = \frac{(46.07)(2n)}{180.16\,n - 18.016\,(n-1)} = \frac{92.14\,n}{162.144\,n + 18.016}$$

Thus the maximum theoretical ethanol yields are 0.538 g/g for cellobiose (n=2), and 0.568 g/g for cellulose (n → ∞).
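The yield arithmetic above can be checked numerically. The following is a minimal sketch in Python that uses only the molecular weights quoted in the text; the function names are illustrative and do not come from the chapter.

```python
# Molecular weights (g/mol) as quoted in the text.
MW_GLUCOSE = 180.16
MW_XYLOSE = 150.13
MW_ETHANOL = 46.07
MW_WATER = 18.016


def monomer_yield(mol_ethanol_per_mol_sugar, mw_sugar):
    """Mass yield (g ethanol / g sugar) from a molar yield."""
    return mol_ethanol_per_mol_sugar * MW_ETHANOL / mw_sugar


def glucan_yield(n):
    """Mass yield for a glucose oligomer of n units (n = 2 is cellobiose).

    The oligomer weighs n glucose units minus (n - 1) waters lost on
    forming the glycosidic bonds, and gives 2n ethanol on full conversion.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    mw_oligomer = MW_GLUCOSE * n - MW_WATER * (n - 1)
    return 2 * n * MW_ETHANOL / mw_oligomer


# Limit as n grows without bound: one anhydroglucose unit per 2 ethanol.
CELLULOSE_LIMIT = 2 * MW_ETHANOL / (MW_GLUCOSE - MW_WATER)

print(f"glucose:    {monomer_yield(2, MW_GLUCOSE):.3f} g/g")
print(f"xylose:     {monomer_yield(5 / 3, MW_XYLOSE):.3f} g/g")
print(f"cellobiose: {glucan_yield(2):.3f} g/g")
print(f"cellulose:  {CELLULOSE_LIMIT:.3f} g/g")
```

The polymer yields exceed 0.511 g/g because the water taken up in hydrolysis adds to the mass of fermentable sugar without being counted in the substrate mass.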

24.5.1 Saccharomyces cerevisiae

Some yeasts such as Pichia stipitis, Candida shehatae, and Pachysolen tannophilus do ferment xylose, albeit with poor ethanol yields and low ethanol tolerance [40]. Xylose enters the metabolism as xylulose, which is phosphorylated by xylulokinase (XK) to xylulose-5-P, and proceeds to the pentose phosphate pathway and glycolysis (Figure 24.2). There are two dissimilation pathways between xylose and xylulose: in yeasts, xylose is first reduced to xylitol by xylose reductase (XR), and xylitol is oxidized to xylulose by xylitol dehydrogenase (XDH); in bacteria, xylose isomerase (XI) converts xylose to xylulose directly. These pathways represent two metabolic engineering approaches for heterologous gene expression in S. cerevisiae. Expression of P. stipitis genes XYL1 and XYL2 encoding XR (Xyl1p) and XDH (Xyl2p) in S. cerevisiae allows for oxidative growth on xylose, but with xylitol accumulation and thus low ethanol yield [41]. This is because the P. stipitis XR prefers the redox cofactor NADPH to NADH [42], but the XDH is specific to NAD only [43], resulting in cofactor imbalance (i.e., NADH accumulation and NADPH depletion). Various approaches to alleviate this cofactor imbalance were reported, including controlling the XR:XDH expression ratio to a low value [44], mutation to reduce the affinity of Xyl1p to NADPH [45] and Xyl2p to NAD [46], and shifting the cofactor specificity of Xyl2p from NAD to NADP by

[Figure 24.2 pathway diagram. Yeasts: D-xylose → xylitol (xylose reductase; NADPH or NADH) → D-xylulose (xylitol dehydrogenase; NAD). Bacteria: D-xylose → D-xylulose (xylose isomerase); L-arabinose → L-ribulose (L-arabinose isomerase) → L-ribulose-5-P (L-ribulokinase; ATP) → D-xylulose-5-P (L-ribulosephosphate 4-epimerase). Fungi: L-arabinose → L-arabinitol (aldose reductase) → L-xylulose (L-arabinitol 4-dehydrogenase) → xylitol (L-xylulose reductase). Common steps: D-xylulose → D-xylulose-5-P (xylulokinase; ATP) → pentose phosphate pathway → D-glyceraldehyde-3-P → glycolysis → pyruvate → ethanol (PDC, ADH).]

Figure 24.2 (See color insert following page 13-20.) Upstream pathways for xylose and arabinose in bacteria, fungi, and yeasts.

introducing a zinc binding site [47] or NADP-recognition sequence [46]. The ammonium assimilation pathway mediated by two glutamate dehydrogenases was altered by deleting GDH1 (NADPH-dependent) and overexpressing GDH2 (NADH-dependent), resulting in 44% reduction in xylitol accumulation and 16% increase in ethanol yield [48]. Through 13C labeling flux analysis, it was determined that the modification had shifted the cofactor preference of Xyl1p from NADPH to NADH [49]. Pathway bottlenecks downstream of xylulose have been reported as well. Flux into the pentose phosphate pathway is limited by XK activity [50]. This is addressed by overexpression and chromosomal integration of endogenous S. cerevisiae XK (XKS1) [51,52]. However, high level expression of XK could result in ATP depletion, and thus severe impairment of xylose uptake [53]. Optimal growth and xylose fermentation at moderate XK expression can be achieved by using tunable vectors with varying promoter strength and copy numbers [54]. Low capacity of the pentose phosphate pathway, as evident in the accumulation of sedoheptulose-7-P [55], was addressed by overexpression of the endogenous transketolase (TKL1) and transaldolase (TAL1) [56]. A research group at Purdue University has developed an industrial S. cerevisiae strain based on the XYL1-XYL2-XKS1 expression (strain 424A(LNH-ST), derivative of ATCC 4214) [57]. This strain has been licensed to Iogen Corp (Ontario, Canada) for their bioethanol production. Strain TMB3001 is another recombinant S. cerevisiae expressing XYL1-XYL2-XKS1. This strain is able to grow anaerobically on glucose and xylose, but growth on xylose as sole carbon source requires oxygen [52]. Directed evolution by progressively lowering oxygen and glucose levels resulted in a mutant (C1) that can grow anaerobically on xylose [58]. DNA microarray and flux analyses were used in characterization of the C1 mutant vs. parent strain and the effect of external electron acceptor (acetoin).
The results suggested that ATP production is the principal limitation in anaerobic growth on xylose, although redox limitation is inextricably related [59].
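The cofactor imbalance of the XR-XDH route described earlier in this section can be made concrete with simple bookkeeping per mol of xylose converted to xylulose. The sketch below is illustrative only: the parameter `nadph_fraction` (the share of XR turnover that uses NADPH rather than NADH) is a hypothetical quantity introduced here, not a value reported in the chapter.

```python
def xr_xdh_cofactor_balance(nadph_fraction):
    """Net cofactor change per mol xylose converted to xylulose via XR and XDH.

    XR reduces xylose to xylitol, oxidizing one NADPH or one NADH.
    XDH oxidizes xylitol to xylulose and is NAD-specific, so it always
    forms one NADH. Negative values mean net consumption.
    """
    if not 0.0 <= nadph_fraction <= 1.0:
        raise ValueError("nadph_fraction must lie between 0 and 1")
    nadph = -nadph_fraction                   # consumed by XR
    nadh = -(1.0 - nadph_fraction) + 1.0      # consumed by XR, formed by XDH
    return {"NADPH": nadph, "NADH": nadh}


# Strictly NADPH-dependent XR: one NADPH lost and one NADH gained per xylose.
print(xr_xdh_cofactor_balance(1.0))
# XR using NADH only: the two steps cancel, as in the cofactor-free XI route.
print(xr_xdh_cofactor_balance(0.0))
```

Under this bookkeeping, any NADPH use by XR leaves an equal NADH surplus, which is why strategies that shift XR toward NADH, or XDH toward NADP, reduce xylitol accumulation.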

The other metabolic engineering route to xylulose is through XI (Figure 24.2). Expression of XI from anaerobic rumen fungus Piromyces sp. E2 (XylA) in S. cerevisiae allowed slow anaerobic growth on xylose (µ=0.005 h⁻¹) [60]. Evolutionary selection under anaerobic conditions yielded a faster growing strain (µ=0.03 h⁻¹) with high ethanol yield [61], which was further modified by overexpression of nonoxidative pentose phosphate pathway genes and deletion of aldose reductase (GRE3) to minimize xylitol production [62]. The resulting strain RB217 had very high growth rate (µ=0.09 h⁻¹) and xylose consumption (1.1 g/g biomass/h), the highest known to date. Although in mixed glucose-xylose culture sugar consumption was diauxic with slower xylose consumption rate, further evolutionary adaptation under xylose limiting conditions resulted in strain RB218 with 50% faster xylose consumption (0.9 g/g biomass/h) [63]. In fungi, L-arabinose is converted to D-xylulose-5-P through L-arabinitol, L-xylulose, xylitol, and D-xylulose as intermediates (Figure 24.2). Expression of fungal L-arabinitol 4-dehydrogenase (lad1) and L-xylulose reductase (lxr1) along with XYL1, XYL2, and XKS1 in S. cerevisiae allowed for fermentative growth on L-arabinose, albeit a very slow one [64]. In bacteria, conversion of L-arabinose to D-xylulose-5-P (through L-ribulose and L-ribulose-5-P intermediates) is mediated by the gene products of araBAD (L-arabinose isomerase, L-ribulokinase, L-ribulose-5-P-4-epimerase). Overexpression of Bacillus subtilis araA, E. coli araB and araD, along with endogenous GAL2 (galactose permease) resulted in growth of S. cerevisiae on L-arabinose. Evolutionary adaptation yielded a faster growing strain (doubling time 8 h) with ethanol productivity 0.08 g/g biomass/h [65].
Enzymatic and DNA microarray analyses showed that beneficial mutations had occurred, namely increased transaldolase expression, and reduced activity and affinity of L-ribulokinase (AraB) to ribulose (thus tighter control on ATP utilization) [65]. This is analogous to optimal XK expression at moderate level, as discussed earlier. During fermentation, excess NADH is reoxidized to NAD+ by glycerol-3-P dehydrogenase (GPD1 and GPD2) in the conversion of dihydroxyacetone-P to glycerol-3-P, resulting in glycerol as a byproduct [66]. Deletion of NADPH-dependent GDH1, along with overexpression of NADH-dependent GLN1 (glutamine synthetase) and GLT1 (glutamate synthase) was successful in redirecting NADH to the ammonium assimilation pathway, resulting in 38% decrease in glycerol production and 10% increase in ethanol yield [67]. In silico genome-scale metabolic network reconstruction was utilized to formulate optimization strategies for decreased glycerol and improved ethanol yields [68]. The best strategy calls for heterologous expression of nonphosphorylating, NADP+-dependent glyceraldehyde-3-P dehydrogenase (GAPN) from Streptococcus mutans, which converts glyceraldehyde-3-P directly to 3-phosphoglycerate (bypassing the 1,3-bisphosphoglycerate intermediate in the endogenous pathway). GAPN expression resulted in 40% decrease in glycerol yield and 3% increase in ethanol yield in anaerobic glucose fermentation, and 25% improvement in ethanol yield in anaerobic cofermentation of glucose and xylose [68]. Laccase detoxifies phenolic compounds by promoting free radical polymerization to high molecular weight products [69]. Expression of laccase from white-rot fungus Trametes versicolor in S. cerevisiae, along with overexpression of native Sso2p (a native membrane-bound protein) for improved secretion, restored growth on glucose in the presence of 1.25 mM coniferyl aldehyde. This strain fermented dilute-acid pretreated spruce hydrolysate with ethanol yield of 0.44 g/g sugar [69].
Resistance of S. cerevisiae to aromatic carboxylic acids is attributed to phenylacrylic acid decarboxylase (Pad1p), which catalyzes conversion to the vinyl form [70]. When challenged with ferulic acid, cinnamic acid, and spruce hydrolysate, a strain overexpressing Pad1p exhibited faster growth and ethanol production. Improvements in ethanol productivity (29%) and uptake rates of glucose (25%) and mannose (45%) were observed in fermentation of spruce hydrolysate [70].
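This section quotes both specific growth rates (µ, in h⁻¹) and a doubling time (8 h). For exponential growth the two are related by t_d = ln 2 / µ, so the figures can be put on a common footing. A minimal sketch, with illustrative function names:

```python
import math


def doubling_time(mu_per_h):
    """Doubling time (h) for a specific growth rate mu (1/h), exponential growth."""
    if mu_per_h <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / mu_per_h


def growth_rate(doubling_time_h):
    """Specific growth rate (1/h) for a given doubling time (h)."""
    if doubling_time_h <= 0:
        raise ValueError("doubling time must be positive")
    return math.log(2) / doubling_time_h


# Growth rates quoted in the text for the XI-expressing strains.
for mu in (0.005, 0.03, 0.09):
    print(f"mu = {mu} 1/h -> doubling time {doubling_time(mu):.1f} h")

# Doubling time quoted for the evolved arabinose-utilizing strain.
print(f"doubling time 8 h -> mu = {growth_rate(8):.3f} 1/h")
```

On this basis the initial XI strain doubles roughly every 139 h, and the 8 h doubling time of the arabinose strain corresponds to µ of about 0.087 h⁻¹, close to the 0.09 h⁻¹ quoted for strain RB217 on xylose.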

24.5.2 Zymomonas mobilis

Z. mobilis is a unique microorganism that metabolizes glucose exclusively through the Entner–Doudoroff (ED) pathway anaerobically [71]. The ED pathway yields 1 mol ATP per mol glucose, half of that in the Embden–Meyerhof–Parnas (EMP) pathway. In Z. mobilis this low energy yield is compensated by

sustained high glycolytic flux and constitutive expression of fermentative enzymes (50% of total protein) [72]. Z. mobilis exhibits a homoethanol pathway through a very efficient pyruvate decarboxylase (PDC) and alcohol dehydrogenase (ADH) system [15]. In fact, it is superior to yeasts in terms of ethanol yield and productivity, because of lower biomass yield due to limited ATP supply [73]. To introduce xylose metabolism, E. coli genes xylA, xylB, tktA, and talB (xylose isomerase, xylulokinase, transketolase, and transaldolase) were expressed in Z. mobilis CP4 (pZB5), allowing for growth on xylose with 86% ethanol yield [74]. Similarly, arabinose metabolism was introduced by expression of E. coli genes araBAD (L-arabinose isomerase, L-ribulokinase, L-ribulose-5-P-4-epimerase), as well as tktA and talB in Z. mobilis ATCC39676 (pZB206), resulting in growth on arabinose with 98% ethanol yield [75]. In both cases, xylose or arabinose is first converted to xylulose-5-P, then proceeds into the pentose phosphate pathway to yield glyceraldehyde-3-P, which is an intermediate in EMP pathway (Figure 24.2). Since all sugars are transported by facilitated diffusion through glucose permease, Z. mobilis exhibits simultaneous consumption of sugar mixtures. However, arabinose uptake is much slower because the glucose permease Glf has low affinity to arabinose [76]. Based on the previous work, all seven genes for xylose and arabinose metabolism were expressed via plasmid in strain 206C(pZB301) and chromosomal integration in strain AX101. Fermentation with 40 g/L glucose, 40 g/L xylose, and 20 g/L arabinose in both strains resulted in identical ethanol yield (84%). Also observed was incomplete arabinose consumption (25% residual) and minimal formation of byproducts xylitol, acetic acid and lactic acid [77]. Although Z. mobilis can tolerate up to 12% (w/v) ethanol [78], it is sensitive to acetic acid. 
For example, strain AX101 lost 50% of ethanol productivity when subjected to 2.5 g/L acetic acid [79]. A directed evolution strategy to increase acetic acid tolerance was successfully applied to xylose fermenting Z. mobilis 39767(pZBL4). The strain was grown on glucose and xylose medium and subjected to increasing concentrations of dilute acid-pretreated hydrolysate of yellow poplar wood, which at the final 50% (v/v) contained 7.5 g/L acetic acid. The adapted strain delivered 94–96% ethanol yield in the presence of 4–10 g/L acetic acid [80]. Also, expression of basic proton binding peptide from E. coli cbpA gene in Z. mobilis CP4(pJB99-2) conferred tolerance to acetic acid (pH 3.5) and HCl (pH 3.0) [81].
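Yields in this section are given as a percentage of the theoretical maximum, which can be converted to an expected ethanol titer. The sketch below applies this to the mixed-sugar fermentation described above (40 g/L glucose, 40 g/L xylose, 20 g/L arabinose, 84% yield, 25% of the arabinose left over). Two assumptions are made here that the text does not state: arabinose is taken to have the same 0.511 g/g theoretical yield as xylose (both are C5H10O5 pentoses), and the basis of the reported yield (sugar supplied or sugar consumed) is unknown, so both are shown.

```python
THEORETICAL_YIELD = 0.511  # g ethanol / g sugar (glucose, xylose; arabinose assumed equal)


def expected_titer(sugar_g_per_l, fraction_of_theoretical):
    """Ethanol titer (g/L) from sugar converted at a given fraction of theoretical yield."""
    return sugar_g_per_l * fraction_of_theoretical * THEORETICAL_YIELD


supplied = 40 + 40 + 20                 # g/L total sugar in the medium
consumed = supplied - 0.25 * 20         # 25% of the arabinose remains

print(f"basis: sugar supplied -> {expected_titer(supplied, 0.84):.1f} g/L ethanol")
print(f"basis: sugar consumed -> {expected_titer(consumed, 0.84):.1f} g/L ethanol")
```

The two bases differ by only about 2 g/L in this case because arabinose is the minor sugar in the mixture.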

24.5.3 Escherichia coli

Wild-type E. coli has an excellent range of substrate utilization, including all of the lignocellulosic sugars (glucose, xylose, arabinose, mannose, galactose) [66]. E. coli also grows well under anaerobic or aerobic conditions, and can sustain high glycolytic flux. However, ethanol yield is poor because under fermentative conditions E. coli also produces lactic acid, acetic acid, formic acid, and succinic acid [82]. Homoethanol fermentation in E. coli is hindered by redox imbalance. The pathway to ethanol starts with pyruvate, which is cleaved into acetyl-CoA and formic acid by pyruvate formate lyase (PFL) (Figure 24.3). Reduction of acetyl-CoA to ethanol proceeds in two steps through acetaldehyde as intermediate; the multienzyme protein AdhE plays the role of acetaldehyde dehydrogenase and alcohol dehydrogenase, each requiring one stoichiometric NADH [83]. Thus on a triose basis the pathway from pyruvate to ethanol consumes two NADH, while glycolysis to pyruvate only provides one NADH (in conversion of glyceraldehyde-3-P to 1,3-bisphosphoglycerate). Therefore, ethanol production is balanced by other more oxidized products such as acetic acid (no NADH). To circumvent the redox limitation of the endogenous ethanol pathway, the PDC and ADH enzymes from Z. mobilis were expressed in E. coli, via a plasmid bearing an artificial pet (production of ethanol) operon containing the pdc and adhB genes [84]. The transformation conferred a homoethanol pathway, with ethanol accounting for 95% of fermentation products. This success is attributed to the higher affinity of PDC to pyruvate (Km = 0.4 mM) compared to the competing native enzymes PFL (Km = 2.0 mM) and LDH (Km = 7.2 mM), as well as the higher affinity of ADH to NADH (Km = 12 mM) compared to the native enzyme AdhE (Km = 50 mM) [15]. Also, redox balance is possible in the heterologous pathway

[Figure 24.3 pathway diagram. 1/2 glucose → phosphoenolpyruvate (NAD → NADH) → pyruvate (pykF, pykA). From pyruvate: lactate (ldhA; NADH → NAD); acetyl-CoA and formate (pflB), with formate → CO2 + H2 (fdhF, hycBCDEFG); acetaldehyde (pdc) → ethanol (adhB; NADH → NAD). From acetyl-CoA: acetaldehyde → ethanol (adhE; NADH → NAD at each step); acetyl-P (pta) → acetate (ackA); citrate (gltA) → α-ketoglutarate → glutamate family amino acids. From phosphoenolpyruvate: oxaloacetate (ppc) → fumarate (NADH → NAD) → succinate (frdABCD; menaquinol → menaquinone-8).]

Figure 24.3 Fermentative pathways in E. coli, with redox cofactor utilization. Dashed lines signify multi-step pathways. Heterologous PDC-ADH pathway from Z. mobilis (pet operon) is shaded.

because cleavage of pyruvate to acetaldehyde and CO2 by PDC is nonoxidative, thus only one NADH is required in the reduction of acetaldehyde to ethanol by ADH. The pet operon was integrated into the pfl locus of E. coli B (ATCC 11303) to take advantage of its strong, constitutive native promoters. However, the recombinants had low PDC expression, and therefore low ethanol yield [85]. Selection on high chloramphenicol (600 µg/ml) or aldehyde indicator plates resulted in analogous mutants with PDC expression comparable to that in the plasmid-bearing strain and Z. mobilis. Further deletion of fumarate reductase (∆frdABCD) reduced succinic acid production by 95%; the resulting strain KO11 exceeds 100% theoretical ethanol yield when grown on glucose or xylose in rich medium [85]. Compared to the parent strain (E. coli B), KO11 exhibits higher maximum growth rate (30% higher) and glycolytic flux (50% higher). This is attributed to higher expression of the xylose catabolic genes, which came to light through microarray analysis [86]. Directed evolution of KO11 by increasing ethanol concentration from 35 to 50 g/L resulted in strain LY01, which fermented xylose to 60 g/L ethanol titer with 85% yield [87]. Microarray analysis revealed increased glycine metabolism and betaine synthesis in LY01 compared to KO11, therefore linking ethanol tolerance to osmotic stress (glycine and betaine are protective osmolytes) [88]. Addition of glycine or

betaine was shown to increase ethanol tolerance in KO11. LY01 also exhibits tolerance to furfural and 5-hydroxymethylfurfural but not to aromatic aldehydes, suggesting development of resistance to a limited class of relatively hydrophilic inhibitors (like ethanol) [89]. Although toxicity is correlated with hydrophobicity, unlike ethanol, aromatic aldehydes were shown to cause no damage to the integrity of the cell membrane. Although E. coli B grows well in minimal medium, derivative strains bearing the pet operon require complex nutrient supplementation to support growth and ethanol production. Switching from LB to minimal medium resulted in 85% decrease in growth and xylose fermentation [90]. LB at 15 g/L can be substituted by 5% CSL with equivalent results, but lowering the CSL level to 1% resulted in 50% decrease of ethanol yield [91]. Systematic studies have ruled out macronutrient and energy limitation; instead, the problem is caused by unbalanced partitioning of pyruvate between biosynthetic and fermentative pathways [92]. Cleavage of pyruvate to acetaldehyde and CO2 by PDC bypasses acetyl-CoA altogether, therefore limiting availability of acetyl-CoA for the anaplerotic reaction leading to α-ketoglutarate, which is the precursor for glutamate family amino acids. The committed step in the α-ketoglutarate pathway is the formation of citrate from oxaloacetate and acetyl-CoA by citrate synthase (GltA), which is inhibited by NADH (Figure 24.3). Expression of NADH-insensitive citrate synthase (CitZ) from Bacillus subtilis in KO11 resulted in 75% increase in growth and ethanol production on 10% xylose medium supplemented with 1% CSL [92]. Addition of pyruvate, acetaldehyde, and acetate also stimulates growth and ethanol production by increasing the intracellular acetyl-CoA pool, through dynamic export and import of extracellular acetate [93]. The same positive effect was achieved by blocking the acetyl-CoA-utilizing acetate pathway through inactivation of acetate kinase (ackA) [93].
The phosphoenolpyruvate:carbohydrate phosphotransferase system (PTS) uses phosphoenolpyruvate as an ATP equivalent in active sugar transport [94]. The glucose PTS protein IIAglc (crr) also exerts regulatory control on the intracellular level of cAMP, which is an allosteric effector required in expression of catabolic enzymes for other sugars. Thus the PTS is responsible for the “glucose effect,” i.e., glucose represses the utilization of less preferred carbon sources (e.g., xylose, arabinose), resulting in diauxic growth and sequential consumption of sugar mixtures. Disruption of the PTS relieves glucose repression on xylose and arabinose, and hence allows simultaneous sugar uptake [95,96]. Although this could be a useful phenotype from the productivity standpoint, benchmark batch fermentation showed that the PTS-Glc+ strain (spontaneous glucose revertant of the PTS strain) [97] did not achieve complete sugar consumption faster than the wild-type with glucose-xylose and glucose-arabinose mixtures, although a marginal improvement was observed in fermentation with a mixture of all three sugars [95]. Also, disruption of the glucose PTS transporter IIBCglc (ptsG) in ethanologenic E. coli bearing the pet operon did not improve ethanol yield, and in fact lowered productivity in fermentation of sugar mixtures [96]. However, selective pressure of fosfomycin (an antibiotic analogous to phosphoenolpyruvate) on KO11 resulted in a mutant with 20% higher ethanol yield when grown on xylose [98].
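The redox argument made earlier in this section for the native and heterologous ethanol routes can be written as NADH bookkeeping per pyruvate (triose basis). The sketch below is a deliberate simplification: it considers only ethanol and acetate as products and ignores lactate, succinate, and biosynthesis. The dictionary keys and function names are illustrative.

```python
# NADH formed per pyruvate in glycolysis
# (glyceraldehyde-3-P -> 1,3-bisphosphoglycerate), as stated in the text.
NADH_FORMED_PER_PYRUVATE = 1

# NADH consumed per pyruvate converted to each product, as stated in the text.
NADH_CONSUMED = {
    "ethanol_via_PFL_AdhE": 2,  # acetyl-CoA -> acetaldehyde -> ethanol
    "ethanol_via_PDC_ADH": 1,   # acetaldehyde -> ethanol only
    "acetate": 0,
}


def net_nadh(product):
    """Net NADH per pyruvate routed to a product (positive = surplus)."""
    return NADH_FORMED_PER_PYRUVATE - NADH_CONSUMED[product]


def max_ethanol_fraction(ethanol_route, sink="acetate"):
    """Largest fraction of pyruvate that can go to ethanol with NADH balanced,
    the remainder going to the sink product.

    Solves x * net_e + (1 - x) * net_s = 0 for x.
    """
    net_e = net_nadh(ethanol_route)
    net_s = net_nadh(sink)
    if net_e >= 0:
        return 1.0          # route is balanced on its own
    if net_s <= 0:
        return 0.0          # sink cannot supply the missing NADH
    return net_s / (net_s - net_e)


print(max_ethanol_fraction("ethanol_via_PFL_AdhE"))  # native route
print(max_ethanol_fraction("ethanol_via_PDC_ADH"))   # pet operon route
```

Under these assumptions the native route can send at most half of the pyruvate to ethanol, with the other half going to acetate, whereas the PDC-ADH route is balanced by itself and permits homoethanol fermentation.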

24.5.4 Klebsiella oxytoca

Like E. coli, K. oxytoca is a gram-negative, enteric bacterium that is part of the normal intestinal flora, but it is also an opportunistic pathogen causing multiresistant infections in hospital settings [99]. In its natural habitat, K. oxytoca grows in paper and pulp waste streams and around other wood sources [73]. K. oxytoca naturally produces β-glucosidase and possesses special transport systems enabling it to grow on oligosaccharides, including cellobiose and cellotriose. From a processing standpoint this is an attractive feature because β-glucosidase need not be added in enzymatic hydrolysis or SSF, and the absence of glucose in the fermentation broth minimizes the risk of contamination by other microorganisms [73]. Development of K. oxytoca as a biocatalyst for ethanol fermentation mirrors the work done with E. coli. Expression of the pet operon in K. oxytoca strain M5A1 conferred a near-homoethanol pathway: fermentation on rich medium with glucose or xylose resulted in a maximum of 98% of theoretical ethanol yield, with ethanol accounting for 90% of fermentation products. Unlike in S. cerevisiae or E. coli, ethanol productivity on xylose is equivalent to that on glucose (>2.0 g/L/h), and in fact almost twice as high as that in E. coli [100].

Integration of the pet operon into the pfl locus, followed by selection on plates containing a high level of chloramphenicol (600 µg/ml), resulted in strain P2, which fermented 10% glucose or 10% cellobiose with comparable ethanol yield (96% theoretical) and productivity (>1.5 g/L/h). Benchmark SSF of crystalline cellulose with commercial cellulases (including β-glucosidase) showed consistently superior ethanol production by strain P2 compared to E. coli KO11 [101]. Although K. oxytoca can grow at pH as low as 5.0 and temperatures as high as 35°C, fungal cellulases are usually optimal at more extreme conditions; e.g., T. reesei cellulases have optimal activity at pH 4.5 and 55°C [73]. In order to better simulate industrial SSF conditions, strain P2 was grown on glucose in CSL-based medium at pH 5.2. Compared to growth at pH 6.8, the low pH condition resulted in significant accumulation (14% carbon yield) of 2,3-butanediol and acetoin. Deletion of the budAB operon (encoding α-acetolactate decarboxylase and α-acetolactate synthase) eliminated the unwanted coproducts and increased ethanol yield from 83% to 93% [102].

24.6 Consolidated Bioprocessing

A biocatalyst for CBP carries out both functions of SSF: saccharification and fermentation. To combine these functions into a single agent, two complementary strategies have been pursued. The native cellulolytic strategy seeks to engineer a homoethanol pathway into cellulase/hemicellulase-producing organisms, while the recombinant cellulolytic strategy seeks to confer saccharifying traits on biocatalysts that already possess excellent ethanologenic function [25]. Some successful examples are given in the following sections.

24.6.1 Native Cellulolytic Strategy

The genus Erwinia represents plant pathogenic bacteria, which attack plant tissue by secreting hydrolases and lyases that are effective in solubilizing lignocellulose. Erwinia carotovora SR38 and Erwinia chrysanthemi EC16 were transformed with the pet operon, allowing them to efficiently produce ethanol in glucose and cellobiose fermentation. Excellent ethanol yield (0.5 g/g cellobiose) and productivity (1.5 g/L/h) were observed in both strains [103]. The cellulolytic anaerobe Clostridium cellulolyticum is not adapted to growth in rich medium, which leads to glycolytic overflow and accumulation of inhibitory compounds, and thus impaired growth [104]. Expression of the pet operon in C. cellulolyticum channels more excess pyruvate into acetate (93% increase) and ethanol (53% increase), and less into lactate (48% decrease). A prolonged growth phase was also observed, resulting in a 150% increase in cellulose consumption and a 180% increase in cell density [104].
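Yields in this chapter are quoted either in g ethanol per g sugar or as a percentage of the theoretical maximum, and the two are easy to convert. The sketch below assumes the standard fermentation stoichiometries (2 ethanol per glucose, 5 ethanol per 3 xylose, 4 ethanol per cellobiose after hydrolysis) and rounded molecular weights; the function names are illustrative and do not come from the chapter.

```python
# Rounded molecular weights in g/mol.
MW = {"glucose": 180.16, "xylose": 150.13, "cellobiose": 342.30, "ethanol": 46.07}

# Maximum mol ethanol per mol sugar under the assumed stoichiometries:
#   glucose    -> 2 ethanol + 2 CO2
#   3 xylose   -> 5 ethanol + 5 CO2
#   cellobiose + H2O -> 2 glucose -> 4 ethanol + 4 CO2
MAX_ETHANOL_PER_SUGAR = {"glucose": 2.0, "xylose": 5.0 / 3.0, "cellobiose": 4.0}


def theoretical_yield(sugar):
    """Maximum ethanol yield in g ethanol per g sugar."""
    return MAX_ETHANOL_PER_SUGAR[sugar] * MW["ethanol"] / MW[sugar]


def percent_of_theoretical(observed_g_per_g, sugar):
    """Express an observed yield (g/g) as a percentage of the theoretical maximum."""
    return 100.0 * observed_g_per_g / theoretical_yield(sugar)


for sugar in ("glucose", "xylose", "cellobiose"):
    print(f"{sugar}: {theoretical_yield(sugar):.3f} g/g")

# The 0.5 g/g cellobiose reported for the recombinant Erwinia strains [103]:
print(f"{percent_of_theoretical(0.5, 'cellobiose'):.1f}% of theoretical")
```

Because hydrolysis of cellobiose adds one water molecule, the maximum yield per gram of cellobiose (about 0.54 g/g) is slightly higher than for glucose or xylose (about 0.51 g/g). Under these assumptions, 0.5 g/g cellobiose corresponds to roughly 93% of theoretical.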

24.6.2 Recombinant Cellulolytic Strategy

Endoglucanase genes celY and celZ from E. chrysanthemi were integrated into K. oxytoca strain P2, along with plasmid expression of the E. chrysanthemi accessory transporter encoded by the out gene [105]. The resulting strain SZ21 produces 20,000 units/L of endoglucanase activity extracellularly. SZ21 self-sufficiently grows on amorphous cellulose, producing ethanol at 58–76% theoretical yield [106]. However, growth on crystalline cellulose requires addition of exogenous cellulases [105]. A novel S. cerevisiae strain was constructed using fusion proteins of heterologous cellulases with α-agglutinin. The resulting strain codisplays endoglucanase II and cellobiohydrolase II from T. reesei and β-glucosidase 1 from Aspergillus aculeatus on the cell surface, with α-agglutinin as anchor. This strain self-sufficiently grows on amorphous cellulose with 88.5% theoretical ethanol yield [107].

24.7 Emerging Biofuel Platforms

While previous sections in this chapter have focused on the successes and challenges of metabolic engineering for the production of ethanol from cellulosic sugars, this section highlights progress on pathway engineering for the production of other biofuel molecules and the use of alternative carbon sources.

Biodiesel has been proposed as a secure, renewable, and environmentally safe alternative to petroleum-derived diesel. Although it is currently produced by transesterification of vegetable oils or animal fats, biological production of biodiesel by an engineered strain of E. coli has been reported [108]. Overproduction of free fatty acids in E. coli was also demonstrated [109].

Higher-chain alcohols have been proposed as an alternative to ethanol due to their higher energy density and lower hygroscopicity. Among them, butanol has received significant attention, and several metabolic engineering strategies have been used to improve its production in Clostridia and other microorganisms [110]. E. coli was recently engineered for the production of branched-chain higher alcohols to be used as biofuels, including isobutanol, 3-methyl-1-butanol, 1-butanol, 1-propanol, 2-methyl-1-butanol, and 2-phenylethanol [111]. Derivatives of isoprenoid pathways are another source of biofuels, with the potential to replace both gasoline and diesel [112].

Although the production of biofuels via microbial fermentation has been largely based on the use of carbohydrates as carbon and energy sources, an alternative approach is the fermentation of syngas (CO, CO2, H2) produced by the gasification of biomass [16]. Compounds generated in large amounts as by-products in the emerging biofuels industry, such as glycerol, can also be used as feedstocks in the production of biofuels. Given the high degree of reduction per carbon in glycerol, fuels can be produced at yields higher than those obtained from common sugars [113]. Recent reports demonstrate the conversion of glycerol to ethanol and other biofuels [114,115].
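The phrase "degree of reduction per carbon" can be made concrete with a short calculation. The sketch below uses the common convention of counting available electrons as +4 per carbon, +1 per hydrogen, and -2 per oxygen, divided by the number of carbons; this convention and the function name are assumptions added here for illustration and are not taken from the chapter.

```python
def degree_of_reduction(c, h, o):
    """Available electrons per carbon atom for a compound C_c H_h O_o.

    Assumed convention: C = +4, H = +1, O = -2, so that CO2 scores 0.
    """
    return (4 * c + h - 2 * o) / c


# (carbon, hydrogen, oxygen) counts for feedstocks and fuels named in the text.
COMPOUNDS = {
    "glycerol (C3H8O3)": (3, 8, 3),
    "glucose (C6H12O6)": (6, 12, 6),
    "xylose (C5H10O5)": (5, 10, 5),
    "ethanol (C2H6O)": (2, 6, 1),
    "1-butanol (C4H10O)": (4, 10, 1),
}

for name, formula in COMPOUNDS.items():
    print(f"{name}: {degree_of_reduction(*formula):.2f}")
```

Under this convention glycerol comes out at about 4.67 electrons per carbon, against 4.0 for glucose and xylose, while ethanol and 1-butanol both sit at 6.0. Glycerol therefore starts closer to the reduction level of the fuel, which is the sense in which it is described as more reduced than common sugars.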

24.8 Conclusions and Future Outlook

Metabolic engineering has been indispensable as the enabling technology for cellulosic ethanol production, as is evident from the improvements in biocatalysts, saccharifying enzymes, and biomass feedstocks described in previous sections. Classical metabolic engineering and strain improvement approaches have made most of this progress possible. Newer tools for engineering complex phenotypes, based on whole-cell analysis and engineering approaches, hold great promise for developing advanced biofuels. These include global gene and protein expression profiling in combination with metabolite profiling and flux analysis [116,117], in silico metabolic reconstruction and analysis [118], metagenomic approaches [119], whole genome shuffling [120,121], global transcriptional machinery engineering [122], and evolutionary engineering [123].

References

1. Hirsch, R.L., Bezdek, R., and Wendling, R. Peaking of world oil production: impacts, mitigation, & risk management. 2005. http://www.netl.doe.gov/publications/others/pdf/Oil_Peaking_NETL.pdf
2. Mielenz, J.R. Bioenergy for ethanol and beyond. Curr. Opin. Biotechnol. 2006, 17, 303–304.
3. Gray, K.A., Zhao, L., and Emptage, M. Bioethanol. Curr. Opin. Chem. Biol. 2006, 10, 141–146.
4. Schubert, C. Can biofuels finally take center stage? Nat. Biotechnol. 2006, 24, 777–784.
5. Farrell, A.E., Plevin, R.J., Turner, B.T., Jones, A.D., O’Hare, M., and Kammen, D.M. Ethanol can contribute to energy and environmental goals. Science. 2006, 311, 506–508.
6. Worldwatch Institute. Biofuels for transportation: global potential and implications for sustainable agriculture and energy in the 21st century. 2006. http://www.worldwatch.org/pubs/biofuels
7. Energy Policy Act. 2005. http://www.ferc.gov/legal/maj-ord-reg/fed-sta/ene-pol-act.asp
8. European Union Biofuels Use Directive. 2003. http://ec.europa.eu/energy/res/legislation/doc/biofuels/en_final.pdf
9. National Ethanol Vehicle Coalition. http://www.e85fuel.com
10. Shapouri, H. and Gallagher, P. USDA’s 2002 ethanol cost-of-production survey. 2005. http://www.ncga.com/ethanol/pdfs/031506USDACostOfProduction.pdf
11. Herrera, S. Bonkers about biofuels. Nat. Biotechnol. 2006, 24, 755–760.

12. Perlack, R.D., Wright, L.L., Turhollow, A.F., Graham, R.L., Stokes, B.J., and Erbach, D.C. Biomass as feedstock for bioenergy and bioproducts industry: the technical feasibility of a billion-ton annual supply. 2005. http://feedstockreview.ornl.gov/pdf/billion_ton_vision.pdf
13. U.S. Department of Energy Office of Science. Breaking the barriers to cellulosic ethanol: a joint research agenda. http://www.doegenomestolife.org/biofuels/
14. Wiselogel, A., Tyson, J., and Johnsson, D. 1996. Biomass feedstock resources and composition. In Handbook on bioethanol: production and utilization. Wyman, C.E. (ed.). Taylor and Francis: Washington, D.C., 105–118.
15. Aristidou, A. and Penttilä, M. Metabolic engineering applications to renewable resource utilization. Curr. Opin. Biotechnol. 2000, 11, 187–198.
16. Henstra, A.M., Sipma, J., Rinzema, A., and Stams, A.J.M. Microbiology of synthesis gas fermentation for biofuel production. Curr. Opin. Biotechnol. 2007, 18, 200–206.
17. Keasling, J.D. and Chou, H. Metabolic engineering delivers next-generation biofuels. Nat. Biotechnol. 2008, 26, 298–299.
18. Pan, X.J., Gilkes, N., Kadla, J., Pye, K., Saka, S., Gregg, D., Ehara, K., Xie, D., Lam, D., and Saddler, J. Bioconversion of hybrid poplar to ethanol and co-products using an organosolv fractionation process: optimization of process yields. Biotechnol. Bioeng. 2006, 94, 851–861.
19. Wyman, C.E., Dale, B.E., Elander, R.T., Holtzapple, M., Ladisch, M.R., and Lee, Y.Y. Coordinated development of leading biomass pretreatment technologies. Bioresour. Technol. 2005, 96, 1959–1966.
20. Mosier, N., Wyman, C., Dale, B., Elander, R., Lee, Y.Y., Holtzapple, M., and Ladisch, M. Features of promising technologies for pretreatment of lignocellulosic biomass. Bioresour. Technol. 2005, 96, 673–686.
21. Wyman, C.E., Dale, B.E., Elander, R.T., Holtzapple, M., Ladisch, M.R., and Lee, Y.Y. Comparative sugar recovery data from laboratory scale application of leading pretreatment technologies to corn stover. Bioresour. Technol. 2005, 96, 2026–2032.
22. Klinke, H.B., Thomsen, A.B., and Ahring, B.K. Inhibition of ethanol-producing yeast and bacteria by degradation products produced during pre-treatment of biomass. Appl. Microbiol. Biotechnol. 2004, 66, 10–26.
23. Lynd, L.R., Weimer, P.J., van Zyl, W.H., and Pretorius, I.S. Microbial cellulose utilization: fundamentals and biotechnology. Microbiol. Mol. Biol. Rev. 2002, 66, 506–577.
24. Lin, Y. and Tanaka, S. Ethanol fermentation from biomass resources: current state and prospects. Appl. Microbiol. Biotechnol. 2006, 69, 627–642.
25. Lynd, L.R., van Zyl, W.H., McBride, J.E., and Laser, M. Consolidated bioprocessing of cellulosic biomass: an update. Curr. Opin. Biotechnol. 2005, 16, 577–583.
26. Brown, S.F. Biorefinery breakthrough. Fortune. February 6, 2006, p. 88.
27. Dean, J.F.D. 2005. Synthesis of lignin in transgenic and mutant plants. In Biotechnology of biopolymers, from synthesis to patents. Steinbuchel, A., Doi, Y., and Weinheim, D.E. (eds). Wiley-VCH Verlag: Weinheim, Germany, 4–26.
28. Meyer, K., Shirley, A.M., Cusumano, J.C., Bell-Lelong, D.A., and Chapple, C. Lignin monomer composition is determined by the expression of a cytochrome P450-dependent monooxygenase in Arabidopsis. Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 6619–6623.
29. Franke, R., McMichael, C.M., Meyer, K., Shirley, A.M., Cusumano, J.C., and Chapple, C. Modified lignin in tobacco and poplar plants overexpressing the Arabidopsis gene encoding ferulate 5-hydroxylase. Plant J. 2000, 22, 223–234.
30. Li, L.G., Zhou, Y.H., Cheng, X.F., Sun, J.Y., Marita, J.M., Ralph, J., and Chiang, V.L. Combinatorial modification of multiple lignin traits in trees through multigene cotransformation. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 4939–4944.
31. Sticklen, M. Plant genetic engineering to improve biomass characteristics for biofuels. Curr. Opin. Biotechnol. 2006, 17, 315–319.

32. Teymouri, F., Alizadeh, H., Laureano-Perez, L., Dale, B.E., and Sticklen, M. Effects of ammonia fiber explosion treatment on activity of endoglucanase from Acidothermus cellulolyticus in transgenic plants. Appl. Biochem. Biotechnol. 2004, 116, 1183–1192.
33. Demain, A.L., Newcomb, M., and Wu, J.H.D. Cellulase, clostridia, and ethanol. Microbiol. Mol. Biol. Rev. 2005, 29, 124–154.
34. Bayer, E.A., Belaich, J.P., Shoham, Y., and Lamed, R. The cellulosomes: multienzyme machines for degradation of plant cell wall polysaccharides. Annu. Rev. Microbiol. 2004, 58, 521–554.
35. Fierobe, H-P., Mingardon, F., Mechaly, A., Bélaïch, A., Rincon, M., Pagès, S., Lamed, R., Tardif, C., Bélaïch, J-P., and Bayer, E.A. Action of designer cellulosomes on homogeneous versus complex substrates: controlled incorporation of three distinct enzymes into a defined trifunctional scaffoldin. J. Biol. Chem. 2005, 16, 16325–16334.
36. Caspi, J., Irwin, D., Lamed, R., Shoham, Y., Fierobe, H-P., Wilson, D., and Bayer, E. Thermobifida fusca family-6 cellulases as potential designer cellulosome components. Biocatal. Biotransform. 2006, 10, 3–12.
37. Durand, H., Clanet, M., and Tiraby, G. Genetic improvement of Trichoderma reesei for large scale cellulase production. Enzyme Microb. Technol. 1988, 10, 341–346.
38. Greer, D. Spinning straw into fuel. Biocycle. 2005, 46, 61–65.
39. Walker, G. 1998. Yeast growth. In Yeast: physiology and biotechnology. Walker, G. (ed.). Wiley: New York, 101–202.
40. Jeffries, T.W. and Jin, Y.-S. Metabolic engineering for improved fermentation of pentoses by yeasts. Appl. Microbiol. Biotechnol. 2004, 63, 495–509.
41. Jeffries, T.W. Engineering yeasts for xylose metabolism. Curr. Opin. Biotechnol. 2006, 17, 320–326.
42. Verduyn, C., Van Kleef, R., Frank, J., Schreuder, H., Van Dijken, J.P., and Scheffers, W.A. Properties of the NAD(P)H-dependent xylose reductase from the xylose-fermenting yeast Pichia stipitis. Biochem. J. 1985, 226, 669–677.
43. Kötter, P., Amore, R., Hollenberg, C.P., and Ciriacy, M. Isolation and characterization of the Pichia stipitis xylitol dehydrogenase gene, XYL2, and construction of a xylose-utilizing Saccharomyces cerevisiae transformant. Curr. Genet. 1990, 18, 493.
44. Walfridsson, M., Anderlund, M., Bao, X., and Hahn-Hägerdal, B. Expression of different levels of enzymes from the Pichia stipitis XYL1 and XYL2 genes in Saccharomyces cerevisiae and its effects on product formation during xylose utilisation. Appl. Microbiol. Biotechnol. 1997, 48, 218–224.
45. Jeppsson, M., Bengtsson, O., Franke, K., Lee, H., Hahn-Hägerdal, B., and Gorwa-Grauslund, M.F. The expression of a Pichia stipitis xylose reductase mutant with higher KM for NADPH increases ethanol production from xylose in recombinant Saccharomyces cerevisiae. Biotechnol. Bioeng. 2005, 93, 665–673.
46. Metzger, M.H. and Hollenberg, C.P. Amino acid substitutions in the yeast Pichia stipitis xylitol dehydrogenase coenzyme-binding domain affect the coenzyme specificity. Eur. J. Biochem. 1995, 228, 50–54.
47. Watanabe, S., Kodaki, T., and Makino, K. Complete reversal of coenzyme specificity of xylitol dehydrogenase and increase of thermostability by the introduction of structural zinc. J. Biol. Chem. 2005, 280, 10340–10349.
48. Roca, C., Nielsen, J., and Olsson, L. Metabolic engineering of ammonium assimilation in xylose-fermenting Saccharomyces cerevisiae improves ethanol production. Appl. Environ. Microbiol. 2003, 69, 4732–4736.
49. Grotkjaer, T., Christakopoulos, P., Nielsen, J., and Olsson, L. Comparative metabolic network analysis of two xylose fermenting recombinant Saccharomyces cerevisiae strains. Metab. Eng. 2005, 7, 437–444.
50. Chang, S.F. and Ho, N.W. Cloning the yeast xylulokinase gene for the improvement of xylose fermentation. Appl. Biochem. Biotechnol. 1988, 17, 313–318.
51. Ho, N.W., Chen, Z., and Brainard, A.P. Genetically engineered Saccharomyces yeast capable of effective co-fermentation of glucose and xylose. Appl. Environ. Microbiol. 1998, 64, 1852–1859.

52. Eliasson, A., Christensson, C., Wahlbom, C.F., and Hahn-Hägerdal, B. Anaerobic xylose fermentation by recombinant Saccharomyces cerevisiae carrying XYL1, XYL2, and XKS1 in mineral medium chemostat cultures. Appl. Environ. Microbiol. 2000, 66, 3381–3386.
53. Johansson, B., Christensson, C., Hobley, T., and Hahn-Hägerdal, B. Xylulokinase overexpression in two strains of Saccharomyces cerevisiae also expressing xylose reductase and xylitol dehydrogenase and its effect on fermentation of xylose and lignocellulosic hydrolysate. Appl. Environ. Microbiol. 2001, 67, 4249–4255.
54. Jin, Y.S., Ni, H., Laplaza, J.M., and Jeffries, T.W. Optimal growth and ethanol production from xylose by recombinant Saccharomyces cerevisiae require moderate D-xylulokinase activity. Appl. Environ. Microbiol. 2003, 69, 495–503.
55. Senac, T. and Hahn-Hägerdal, B. Intermediary metabolite concentrations in xylulose- and glucose-fermenting Saccharomyces cerevisiae cells. Appl. Environ. Microbiol. 1989, 56, 120–126.
56. Walfridsson, M., Hallborn, J., Penttilä, M., Keranen, S., and Hahn-Hägerdal, B. Xylose-metabolizing Saccharomyces cerevisiae strains overexpressing the TKL1 and TAL1 genes encoding the pentose phosphate pathway enzymes transketolase and transaldolase. Appl. Environ. Microbiol. 1995, 61, 4184–4190.
57. Ho, N.W.Y., Chen, Z., Brainard, A.P., and Sedlak, M. 2000. Genetically engineered Saccharomyces yeasts for conversion of cellulosic biomass to environmentally friendly transportation fuel ethanol. In ACS Symposium Series 767. American Chemical Society: New York, 142–159.
58. Sonderegger, M. and Sauer, U. Evolutionary engineering of Saccharomyces cerevisiae for anaerobic growth on xylose. Appl. Environ. Microbiol. 2003, 69, 1990–1998.
59. Sonderegger, M., Jeppsson, M., Hahn-Hägerdal, B., and Sauer, U. Molecular basis for anaerobic growth of Saccharomyces cerevisiae on xylose, investigated by global gene expression and metabolic flux analysis. Appl. Environ. Microbiol. 2004, 70, 2307–2317.
60. Kuyper, M., Harhangi, H.R., Stave, A.K., Winkler, A.A., Jetten, M.S., de Laat, W.T., den Ridder, J.J., Op den Camp, H.J., van Dijken, J.P., and Pronk, J.T. High-level functional expression of a fungal xylose isomerase: the key to efficient ethanolic fermentation of xylose by Saccharomyces cerevisiae? FEMS Yeast Res. 2003, 4, 69–78.
61. Kuyper, M., Winkler, A.A., van Dijken, J.P., and Pronk, J.T. Minimal metabolic engineering of Saccharomyces cerevisiae for efficient anaerobic xylose fermentation: a proof of principle. FEMS Yeast Res. 2004, 4, 655–664.
62. Kuyper, M., Hartog, M.M., Toirkens, M.J., Almering, M.J., Winkler, A.A., van Dijken, J.P., and Pronk, J.T. Metabolic engineering of a xylose-isomerase-expressing Saccharomyces cerevisiae strain for rapid anaerobic xylose fermentation. FEMS Yeast Res. 2005, 5, 399–409.
63. Kuyper, M., Toirkens, M.J., Diderich, J.A., Winkler, A.A., van Dijken, J.P., and Pronk, J.T. Evolutionary engineering of mixed-sugar utilization by a xylose-fermenting Saccharomyces cerevisiae strain. FEMS Yeast Res. 2005, 5, 925–934.
64. Richard, P., Verho, R., Putkonen, M., Londesborough, J., and Penttilä, M. Production of ethanol from L-arabinose by Saccharomyces cerevisiae containing a fungal L-arabinose pathway. FEMS Yeast Res. 2003, 3, 185–189.
65. Becker, J. and Boles, E. A modified Saccharomyces cerevisiae strain that consumes L-arabinose and produces ethanol. Appl. Environ. Microbiol. 2003, 69, 4144–4150.
66. Zaldivar, J., Nielsen, J., and Olsson, L. Fuel ethanol production from lignocellulose: a challenge for metabolic engineering and process integration. Appl. Microbiol. Biotechnol. 2001, 56, 17–34.
67. Nissen, T.L., Kielland-Brandt, M.C., Nielsen, J., and Villadsen, J. Optimization of ethanol production in Saccharomyces cerevisiae by metabolic engineering of the ammonium assimilation. Metab. Eng. 2000, 2, 69–77.
68. Bro, C., Regenberg, B., Forster, J., and Nielsen, J. In silico aided metabolic engineering of Saccharomyces cerevisiae for improved bioethanol production. Metab. Eng. 2005, 8, 102–111.

69. Larsson, S., Cassland, P., and Jönsson, L.J. Development of a Saccharomyces cerevisiae strain with enhanced resistance to phenolic fermentation inhibitors in lignocellulose hydrolysates by heterologous expression of laccase. Appl. Environ. Microbiol. 2001, 67, 1163–1170.
70. Larsson, S., Nilvebrant, N.O., and Jönsson, L.J. Effect of overexpression of Saccharomyces cerevisiae Pad1p on the resistance to phenylacrylic acids and lignocellulose hydrolysates under aerobic and oxygen-limited conditions. Appl. Microbiol. Biotechnol. 2001, 57, 167–174.
71. Panesar, P.S., Marwaha, S.S., and Kennedy, J.F. Zymomonas mobilis: an alternative ethanol producer. J. Chem. Technol. Biotechnol. 2006, 81, 623–635.
72. Sprenger, G.A. Carbohydrate metabolism in Zymomonas mobilis: a catabolic highway with some scenic routes. FEMS Microbiol. Lett. 1996, 145, 301–307.
73. Dien, B.S., Cotta, M.A., and Jeffries, T.W. Bacteria engineered for fuel ethanol production: current status. Appl. Microbiol. Biotechnol. 2003, 63, 258–266.
74. Zhang, M., Eddy, C., Deanda, K., Finkelstein, M., and Picataggio, S. Metabolic engineering of a pentose metabolism pathway in ethanologenic Zymomonas mobilis. Science. 1995, 267, 240–243.
75. Deanda, K., Zhang, M., Eddy, C., and Picataggio, S. Development of an arabinose-fermenting Zymomonas mobilis strain by metabolic pathway engineering. Appl. Environ. Microbiol. 1996, 62, 4465–4470.
76. Parker, C., Barnell, W.O., Snoep, J.L., Ingram, L.O., and Conway, T. Characterization of the Zymomonas mobilis glucose facilitator gene-product (Glf) in recombinant Escherichia coli: examination of transport mechanism, kinetics and the role of glucokinase in glucose transport. Mol. Microbiol. 1995, 15, 795–802.
77. Mohagheghi, A., Evans, K., Chou, Y.C., and Zhang, M. Cofermentation of glucose, xylose, and arabinose by genomic DNA-integrated xylose/arabinose fermenting strain of Zymomonas mobilis AX101. Appl. Biochem. Biotechnol. 2002, 98, 885–898.
78. Rogers, P.L., Lee, K.J., and Tribe, D.E. Kinetics of alcohol production by Zymomonas mobilis at high sugar concentrations. Biotechnol. Lett. 1996, 1, 165–170.
79. Lawford, H.G. and Rousseau, J.D. Performance testing of Zymomonas mobilis metabolically engineered for cofermentation of glucose, xylose, and arabinose. Appl. Biochem. Biotechnol. 2002, 98, 429–448.
80. Lawford, H.G., Rousseau, J.D., Mohagheghi, A., and McMillan, J.D. Fermentation performance characteristics of a prehydrolyzate-adapted xylose-fermenting recombinant Zymomonas in batch and continuous fermentations. Appl. Biochem. Biotechnol. 1999, 77, 191–204.
81. Baumler, D.J., Hung, K.F., Bose, J.L., Vykhodets, B.M., Cheng, C.M., Jeong, K.C., and Kaspar, C.W. Enhancement of acid tolerance in Zymomonas mobilis by a proton-buffering peptide. Appl. Biochem. Biotechnol. 2006, 134, 15–26.
82. Böck, A. and Sawers, G. 1996. Fermentation. In Escherichia coli and Salmonella: cellular and molecular biology, 2nd edn. Neidhardt, F.C., Curtiss III, R., Lin, E.C.C., Low, K.B., Magasanik, B., Reznikoff, W.S., Riley, M., Schaechter, M., and Umbarger, H.E. (eds). American Society for Microbiology: Washington, D.C., 262–282.
83. Kessler, D., Leibrecht, I., and Knappe, J. Pyruvate-formate-lyase-deactivase and acetyl-CoA reductase activities of Escherichia coli reside on a polymeric protein particle encoded by adhE. FEBS Lett. 1991, 281, 59–63.
84. Ingram, L.O., Conway, T.T., Clark, D.P., Sewell, G.W., and Preston, J.F. Genetic engineering of ethanol production in Escherichia coli. Appl. Environ. Microbiol. 1987, 53, 2420–2425.
85. Ohta, K., Beall, D.S., Mejia, J.P., Shanmugam, K.T., and Ingram, L.O. Genetic improvement of Escherichia coli for ethanol production: chromosomal integration of Zymomonas mobilis genes encoding pyruvate decarboxylase and alcohol dehydrogenase II. Appl. Environ. Microbiol. 1991, 57, 893–900.

86. Tao, H., Gonzalez, R., Martinez, A., Rodriguez, M., Ingram, L.O., Preston, J.F., and Shanmugam, K.T. Engineering a homo-ethanol pathway in Escherichia coli: increased glycolytic flux and levels of expression of glycolytic genes during xylose fermentation. J. Bacteriol. 2001, 183, 2979–2988.
87. Yomano, L.P., York, S.W., and Ingram, L.O. Isolation and characterization of ethanol-tolerant mutants of Escherichia coli KO11 for fuel ethanol production. J. Ind. Microbiol. Biotechnol. 1998, 20, 132–138.
88. Gonzalez, R., Tao, H., Purvis, J.E., York, S.W., Shanmugam, K.T., and Ingram, L.O. Gene array-based identification of changes that contribute to ethanol tolerance in ethanologenic Escherichia coli: comparison of KO11 (parent) to LY01 (resistant mutant). Biotechnol. Prog. 2003, 19, 612–623.
89. Zaldivar, J., Martinez, A., and Ingram, L.O. Effect of selected aldehydes on the growth and fermentation of ethanologenic Escherichia coli. Biotechnol. Bioeng. 1999, 65, 24–33.
90. Lawford, H.G. and Rousseau, J.D. Studies on nutrient requirements and cost-effective supplements for ethanol production by recombinant Escherichia coli. Appl. Biochem. Biotechnol. 1996, 57–58, 307–326.
91. Martinez, A., York, S.W., Yomano, L.P., Pineda, V.L., Davis, F.C., Shelton, J.C., and Ingram, L.O. Biosynthetic burden and plasmid burden limit expression of chromosomally integrated heterologous genes (pdc, adhB) in Escherichia coli. Biotechnol. Prog. 1999, 15, 891–897.
92. Underwood, S.A., Buszko, M.L., Shanmugam, K.T., and Ingram, L.O. Flux through citrate synthase limits the growth of ethanologenic Escherichia coli KO11 during xylose fermentation. Appl. Environ. Microbiol. 2002, 68, 1071–1081.
93. Underwood, S.A., Zhou, S., Causey, T.B., Yomano, L.P., Shanmugam, K.T., and Ingram, L.O. Genetic changes to optimize carbon partitioning between ethanol and biosynthesis in ethanologenic Escherichia coli. Appl. Environ. Microbiol. 2002, 68, 6263–6272.
94. Postma, P.W., Lengeler, J.W., and Jacobson, G.R. Phosphoenolpyruvate:carbohydrate phosphotransferase systems. In Escherichia coli and Salmonella: cellular and molecular biology, 2nd edn. Neidhardt, F.C., Curtiss III, R., Lin, E.C.C., Low, K.B., Magasanik, B., Reznikoff, W.S., Riley, M., Schaechter, M., and Umbarger, H.E. (eds). American Society for Microbiology: Washington, D.C., 1149–1174.
95. Hernández-Montalvo, V., Valle, F., Bolivar, F., and Gosset, G. Characterization of sugar mixtures utilization by an Escherichia coli mutant devoid of the phosphotransferase system. Appl. Microbiol. Biotechnol. 2001, 57, 186–191.
96. Nichols, N.N., Dien, B.S., and Bothast, R.J. Use of catabolite repression mutants for fermentation of sugar mixtures to ethanol. Appl. Microbiol. Biotechnol. 2001, 56, 120–125.
97. Flores, N., Xiao, J., Berry, A., Bolivar, F.F., and Valle, F. Pathway engineering for the production of aromatic compounds in Escherichia coli. Nature. 1996, 4, 620–623.
98. Lindsay, S.E., Bothast, R.J., and Ingram, L.O. Improved strains of recombinant Escherichia coli for ethanol production from sugar mixtures. Appl. Microbiol. Biotechnol. 1995, 43, 70–75.
99. Podschun, R. and Ullmann, U. Klebsiella spp. as nosocomial pathogens: epidemiology, taxonomy, typing methods, and pathogenicity factors. Clin. Microbiol. Rev. 1998, 11, 589–603.
100. Ohta, K., Beall, D.S., Mejia, J.P., Shanmugam, K.T., and Ingram, L.O. Metabolic engineering of Klebsiella oxytoca M5A1 for ethanol production from xylose and glucose. Appl. Environ. Microbiol. 1991, 57, 2810–2815.
101. Wood, B.E. and Ingram, L.O. Ethanol-production from cellobiose, amorphous cellulose, and crystalline cellulose by recombinant Klebsiella oxytoca containing chromosomally integrated Zymomonas mobilis genes for ethanol-production and plasmids expressing thermostable cellulase genes from Clostridium thermocellum. Appl. Environ. Microbiol. 1992, 58, 2103–2110.
102. Wood, B.E., Yomano, L.P., York, S.W., and Ingram, L.O. Development of industrial-medium-required elimination of the 2,3-butanediol fermentation pathway to maintain ethanol yield in an ethanologenic strain of Klebsiella oxytoca. Biotechnol. Prog. 2005, 21, 1366–1372.
103. Beall, D.S. and Ingram, L.O. Genetic engineering of soft-rot bacteria for ethanol production from lignocellulose. J. Indust. Microbiol. 1993, 11, 151–155.

104. Guedon, E., Desvaux, M., and Petitdemange, H. Improvement of cellulolytic properties of Clostridium cellulolyticum by metabolic engineering. Appl. Environ. Microbiol. 2002, 68, 53–58.
105. Zhou, S., Davis, F.C., and Ingram, L.O. Gene integration and expression and extracellular secretion of Erwinia chrysanthemi endoglucanase CelY (celY) and CelZ (celZ) in ethanologenic Klebsiella oxytoca P2. Appl. Environ. Microbiol. 2001, 67, 6–14.
106. Zhou, S.F. and Ingram, L.O. Simultaneous saccharification and fermentation of amorphous cellulose to ethanol by recombinant Klebsiella oxytoca SZ21 without supplemental cellulase. Biotechnol. Lett. 2001, 23, 1455–1462.
107. Fujita, Y., Ito, J., Ueda, M., Fukuda, H., and Kondo, A. Synergistic saccharification, and direct fermentation to ethanol, of amorphous cellulose by use of an engineered yeast strain codisplaying three types of cellulolytic enzyme. Appl. Environ. Microbiol. 2004, 70, 1207–1212.
108. Kalscheuer, R., Stölting, T., and Steinbüchel, A. Microdiesel: Escherichia coli engineered for fuel production. Microbiology. 2006, 152, 2529–2536.
109. Lu, X.F., Vora, H., and Khosla, C. Overproduction of free fatty acids in E. coli: implications for biodiesel production. Metab. Eng. 2008, 10, 333–339.
110. Lee, S.Y., Park, J.H., Jang, S.H., Nielsen, L.K., Kim, J., and Jung, K.S. Fermentative butanol production by clostridia. Biotechnol. Bioeng. 2008, 101, 209–228.
111. Atsumi, S., Hanai, T., and Liao, J.C. Non-fermentative pathways for synthesis of branched-chain higher alcohols as fuels. Nature. 2008, 451, 86–89.
112. Withers, S.T., Gottlieb, S.S., Lieu, B., Newman, J.D., and Keasling, J.D. Identification of isopentenol biosynthetic genes from Bacillus subtilis by a screening method based on isoprenoid precursor toxicity. Appl. Environ. Microbiol. 2007, 73, 6277–6283.
113. Yazdani, S.S. and Gonzalez, R. Anaerobic fermentation of glycerol: a path to economic viability for the biofuels industry. Curr. Opin. Biotechnol. 2007, 18, 213–219.
114. Dharmadi, Y., Murarka, A., and Gonzalez, R. Anaerobic fermentation of glycerol by Escherichia coli: a new platform for metabolic engineering. Biotechnol. Bioeng. 2006, 94, 821–829.
115. Yazdani, S.S. and Gonzalez, R. Engineering Escherichia coli for the efficient conversion of glycerol to ethanol and co-products. Metab. Eng. 2008, 10, 340–351.
116. Mukhopadhyay, A., Redding, A.M., Rutherford, B.J., and Keasling, J.D. Importance of systems biology in engineering microbes for biofuel production. Curr. Opin. Biotechnol. 2008, 19, 228–234.
117. Park, J.H., Lee, S.Y., Kim, T.Y., and Kim, H.U. Application of systems biology for bioprocess development. Trends Biotechnol. 2008, 26, 404–412.
118. Feist, A.M. and Palsson, B.O. The growing scope of applications of genome-scale metabolic reconstructions using Escherichia coli. Nat. Biotechnol. 2008, 26, 659–667.
119. Steele, H.L., Jaeger, K.E., Daniel, R., and Streit, W.R. Advances in recovery of novel biocatalysts from metagenomes. J. Mol. Microbiol. Biotechnol. 2009, 16, 25–37.
120. Shi, D.J., Wang, C.L., and Wang, K.M. Genome shuffling to improve thermotolerance, ethanol tolerance and ethanol productivity of Saccharomyces cerevisiae. J. Ind. Microbiol. Biotechnol. 2009, 36, 139–147.
121. Petri, R. and Schmidt-Dannert, C. Dealing with complexity: evolutionary engineering and genome shuffling. Curr. Opin. Biotechnol. 2004, 15, 298–304.
122. Alper, H., Moxley, J., Nevoigt, E., Fink, G.R., and Stephanopoulos, G. Engineering yeast transcription machinery for improved ethanol tolerance and production. Science. 2006, 314, 1565–1568.
123. Chatterjee, R. and Yuan, L. Directed evolution of metabolic pathways. Trends Biotechnol. 2006, 24, 28–38.

Index A ABC transporters, ATP hydrolysis energy of active transport, 21-12 Acetobacter spp. construction of BioBrick versions of chromosome sequences and selectable markers, 3-6 integration of foreign DNA, 3-6 recombination efficiency, 3-5 to 3-6 study of acetate tolerance in, 1-9 transcription regulation and translation, 3-2 Acetyl-CoA flow in glyoxylate pathway and succinate production, 4-4 Activators and interaction, 3-2 Adenine riboswitch, 3-10; see also Riboswitches Adenosylcobalamin (Coenzyme B12) riboswitch, 8-14 Advanced RNA-based control systems, 8-26 to 8-27 Aerobic succinic acid producing pathways, 21-13 Agrobacterium radiobacter epoxide hydrolase by epPCR and DNA shuffling, 2-24 Agrobacterium tumefaciens with IFS plasmid for transfection of tobacco plants, 22-5 Amino acid substitutions by recombination process, 5-2 Amorphadiene production, 12-5 AmpR Cre-expressing plasmid, 7-14 Anaerobic succinic acid production, 21-13 Anaerobiospirillum succiniciproducens gram-negative anaerobes and succinic acid production, 21-13 Anthocyanins from flavanones naringenin and eriodictyol, 22-5 Anticancer drugs DNR and DXR, 23-15 to 23-19 epothilone, 23-21 heterologous strain, Myxococcus xanthus, 23-22 to 23-24 novel epothilone analogs, 23-22

paclitaxel, 23-21 biosynthetic pathway for, 23-20 metabolic engineering techniques, 23-19 Antimicrobial drugs artemisinin fermentation yields, optimization of, 23-15 limited pathway reconstitution, 23-13 to 23-14 novel analogs, bioconversion of, 23-15 placing biosynthetic genes under, 23-15 erythromycin heterologous biosynthesis, 23-10 to 23-11 precursor directed biosynthesis, 23-11 to 23-12 S. erythraea strain improvement and, 23-8, 23-10 penicillin, 23-4 biosynthetic genes, overexpression of, 23-5 to 23-6 genome-wide analysis of, 23-6 to 23-7 heterologous host, biosynthetic pathway reconstitution in, 23-7 metabolic engineering techniques for biosynthesis, 23-5 metabolic precursors, 23-7 Antisense RNA molecules, 8-6 to 8-7 araBAD promoter expression system, 12-4 on medium-copy number plasmids and production of lycopene in E. coli, 4-5 Arabinose-inducible araBAD promoter (pKLJ12), 6-4 Arabinose upstream pathways, 24-7 Arachidonic acid, elongation and desaturation, 22-15 Aromatic amino acids biosynthesis with shikimate pathway, 22-17 to 22-18 AROM multifunctional enzyme, 11-7 Artemisinin in Artemisia annua, 22-8 biosynthesis genes for, 23-15 pathway in Artemisia annua L., 23-14 fermentation yields, optimization of, 23-15 limited pathway reconstitution, 23-13 to 23-14


metabolic engineering techniques, 23-13 novel analogs, bioconversion of, 23-15 Artificial chromosome, use in metabolic engineering, 6-10 large DNA fragments, maintenance of, 6-12 to 6-13 for novel metabolic control systems, 6-11 Ashbya gossypii vitamin riboflavin production, 21-17 Aspartate aminotransferase (AATase), 2-24 Aspergillus niger heme-containing peroxidases production, 10-6 Assembly of designed oligonucleotides (ADO), 2-11 ATP concentration ATP dissipating flux and, 21-17 during cell-free reactions, 16-8 pool, 11-3

B Bacillus cereus Phospholipase C (PLCBc), 2-23 Bacillus licheniformis catalytic efficiency, 2-23 sucrose production, 21-17 Bacillus subtilis glucosamine-6-phosphate (GlcN6P) synthesis in, 3-11 vitamin riboflavin production, 21-17 Bacterial artificial chromosomes (BACs), 6-12, 7-5 library, 6-13 for metagenomes, 6-12 profile metagenomes, used for, 6-12 Binding protein, 10-3 to 10-4 Biocatalyst engineering, 24-5 to 24-6 Biocatalytic oxyfunctionalization fermentation processes, 21-1 to 21-4 biological energy issues, 21-11 to 21-18 intracellular processes interfering with, 21-8 microbial energetics and biotechnological applications, 21-5 to 21-11 whole-cell oxyfunctionalization process biological energy issues in, 21-18 to 21-23 examples for, 21-6 Biodiesel production, 24-13 Bioreactor-based systems for production of flavonoids, 22-3 Bloom’s model, 5-4 Budding yeast metabolism, 17-16

C Calnexin (CNX), 10-6 Calnexin (CNX)-calreticulin (CRT) chaperone cycle, 10-5

Calvin cycle, 11-5 Candida antarctica, catalytic efficiency, 2-23 Candida famata vitamin riboflavin production, 21-17 Capillary electrophoresis-mass spectrometry, 14-6 to 14-8 γ-Carboxylation system, 10-13 heterologous proteins production, 10-16 in vitro preparation, 10-15 VKD-proteins and, 10-14 VKORC1 gene, 10-15 β-Carotene production, 6-6, 22-8 Carotenoid biosynthesis of, 4-7 C30 carotenoid synthase from Staphylococcus aureus, 4-8 enzymes, 4-8 Cartesian co-localization microtiter plates, 2-18 to 2-19 protein chips, 2-19 CAT protein production, 16-8 Cell-free biology, 16-1 to 16-2 Cell-free reaction; see also Cell-free systems, for metabolic engineering ATP concentration during, 16-8 CAT protein production, 16-8 reaction components for reactions, 16-9 relative accumulation of radioactivity in metabolites during, 16-8 Cell-free systems, for metabolic engineering, 16-2 to 16-3 cell-free reactions, activating complex metabolism in, 16-5 to 16-7 disulfide-bonded proteins and, 16-9 to 16-10 engineering, cell extract, 16-3 to 16-5 Cell growth and strain improvement strategies, 4-9 Cellular properties, improvement of, 12-9 global regulatory functions, 12-10 substrate and product toxicity, elimination of, 12-10 substrate range, extension of, 12-10 Cellulose hydrolysis, 24-5 C15 farnesyl diphosphate molecules and carotenoids, 4-7 Chalcones, flavanones from, 22-3 Chaperones and folding catalysts binding protein, 10-3 to 10-5 calnexin (CNX), 10-5 to 10-6 PDI, 10-2 to 10-3 signal peptide of polypeptide, 10-6 Chemoautotrophs, 11-5 Chloramphenicol and cell growth, 2-22 chloramphenicol acetyl transferase (CAT), 16-4 sensitivity and cat-sacB cassette, 7-10


Chlorpyrifos toxin and DNA shuffling, 2-22 Cholesterol lowering statins, 23-24 to 23-25 lovastatin fermentation yields, optimization of, 23-27 strain improvement, 23-25, 23-27 pravastatin and compactin, 23-27 to 23-29 Chromosomal engineering, 6-6 to 6-7 manipulations Red/ET and Cre/Flp site-specific recombination (SSR), 7-13 Cloning vectors characteristics, 6-2 to 6-3 Clostridium thermocellum, cellulases production, 24-5 Codon Coli 2 table, 9-3 negative effect of, 9-4 to 9-5 optimization by probability score, 9-6 to 9-7 using CAI = 1 algorithm, 9-6 usage in different hosts, 9-3 to 9-4 space, 9-4 Coenzyme Q10 (CoQ10) biosynthesis genes involved in, 22-10 recombinant DNA technology for, 22-9 terpenoids, 22-10 to 22-11 artemisinin, 22-10 taxol, 22-11 to 22-12 ColE1-type high-copy plasmids, 6-11 ColE1-type plasmids (ColE1, pMB1), 6-2 ColE1-type pUC vectors, 6-4 Combinatorial consensus mutagenesis (CCM), 2-17, 5-7 to 5-8; see also Directed mutagenesis Combined codon usage tables, 9-3 Combined genome-wide analysis, 13-13 to 13-15 Comparative genome analysis, 12-7 to 12-8 Compartmentalization in vitro compartmentalization (IVC), 2-20 to 2-21 in vivo, 2-20 Constitutive/inducible promoter, 6-2 Cordyceps unilateralis BCC 1869, pathogenic fungus, 22-20 Corynebacterium glutamicum, lysine production, 21-15 Counter-selection schemes, 7-10 to 7-11 Cre expression, 7-14 Cross-feeding adaptations, 1-7 13C-tracer experiments, 20-3; see also Metabolic flux analysis 13C-tracer based flux analysis, 21-15

D Daunorubicin (DNR), 23-15 baumycin-like glycosides, hyperglycosylated, 23-18 biosynthetic pathway for, 23-17 metabolic engineering techniques, 23-16 Degenerate homoduplex recombination (DHR), 2-13 Degenerate oligonucleotide gene shuffling (DOGS), 2-10 Dehydrogenase induction and growth rate, 4-9 Deoxyerythronolide B synthase (DEBS), 22-13 to 22-14 Deoxy-xylulose-P synthase (dxs) in carotenoid biosynthesis, 4-5 to 4-6 Dicyclohexylcarbodiimide (DCCD) by ATPase activity inhibition, 21-15 Directed evolution, 2-1, 2-3 assays, tools, 2-17 Cartesian co-localization, 2-18 to 2-19 compartmentalization, 2-20 to 2-21 mutant libraries generation, tools for, 2-3 gene recombination, 2-7 to 2-15 random mutagenesis, 2-4 to 2-7 semirational approaches, 2-15 to 2-17 patent and licensing issues, 2-26 scheme of, 2-2 successes and applications of catalytic activity, 2-23 to 2-24 enzyme stability, 2-24 to 2-25 natural evolution, 2-25 to 2-26 proteins with therapeutic value, 2-25 target picking, 2-2 Directed mutagenesis combinatorial consensus mutagenesis (CCM), 5-7 to 5-8 structure-guided consensus mutagenesis, 5-9 to 5-10 structure-guided saturation mutagenesis, 5-9 DNA microarray, 13-2 applications of, 13-5 to 13-6 fundamentals of, 13-3 to 13-4 reliability and reproducibility, 13-6 DNA-protein tagging, 2-22 DNaseI fragmented genes with random nucleotides, 2-7 DNA sequence features, during design codon context, 9-9 downstream region, 9-8 to 9-9 GC content, 9-7 mRNA secondary structure, 9-8 sequence motifs, 9-7 to 9-8 wild-type sequence, 9-9

DNA shuffling technology Agrobacterium radiobacter, 2-24 DNaseI-based, 2-10 on E. coli aspartate aminotransferase (AATase) mutant with TATase, 2-24 NExT DNA Shuffling, 2-10 patents for, 2-26 random nontemplated insertions with, 2-7, 2-9 and toluene ortho-monooxygenase activity, 4-6 DNA tagging, 2-17 to 2-18 Docosahexaenoic acid, elongation and desaturation, 22-15 Doxorubicin (DXR), 23-15 baumycin-like glycosides, hyperglycosylated, 23-18 biosynthetic pathway for, 23-17 metabolic engineering techniques, 23-16 Drug cassettes for λ red recombineering, 7-7 Dual-phase production system for succinic acid production, 21-13 Dynamic in silico simulation, 17-9 practical applications budding yeast metabolism, 17-16 to 17-18 innate immune signaling, 17-18 to 17-19 spatiotemporal stochastic simulation algorithm, 17-14 to 17-15 stochastic spatiotemporal simulations, 17-11 to 17-16 theoretical illustration, 17-9 to 17-11

E Electrospray ionization (ESI), 2-18 Embden-Meyerhof-Parnas pathway, 21-8 Engineering multifunctional enzymes, 11-7 to 11-9 Enhancers activators bind sites, 3-2 Entner-Doudoroff pathway, 21-8 Enzyme-linked immunosorbent assay (ELISA), 2-18 Enzyme-to-enzyme channeling, 11-2 Calvin cycle enzyme interactions, 11-3 Epothilone, 23-21 analogs through bioconversion, 23-22 assembly by PKS encoding genes, 23-23 heterologous strain, Myxococcus xanthus, 23-22 to 23-24 metabolic engineering techniques, 23-22 synthesized naturally in Sorangium cellulosum, 23-22 ER-associated degradation (ERAD) pathway, 10-5 Ergosterol biosynthetic pathway in S. cerevisiae, 4-3 Error-prone polymerase chain reaction (epPCR), 2-5 to 2-6 Pseudomonas aeruginosa, 2-24 and saturation mutagenesis methods, 4-4 to 4-5

Error-prone rolling circle amplification (epRCA), 2-6 Erwinia herbicola for lycopene synthesis in E. coli, 6-9 Erythromycin; see also Antimicrobial drugs biosynthesis heterologous, 23-10 to 23-11 polyketide synthase (PKS) assembly line and, 23-9 precursor directed, 23-11 to 23-12 metabolic engineering techniques, 23-9 Saccharopolyspora erythraea strain improvement and, 23-8, 23-10 synthesis from, 22-13 Escherichia coli acetate tolerance in, 1-9 biopolymer poly-3-hydroxybutyrate (PHB) production in, 6-6 carotenoid biosynthesis in, 22-8 Catharanthus roseus for biosynthesis of hydroxylated flavonols, 22-6 cyclohexanone to ε-caprolactone, 21-17 6-deoxyerythronolide B (6dEB) pathway in, 21-16 to 21-17 DNA insertion in, 7-20 to 7-21 E. coli MutSLH MMR system, 7-9 engineering for synthesis of human insulin, 22-2 fermentative pathways in, 24-10 glutathione degraded chlorinated ethenes overexpression, 4-6 heteroduplex recombination, 2-11 indigo formation from tryptophan by, 21-19 λ Red + Gam (RecET + λ Gam) recombination system, 7-2 markerless gene replacement using Red/ET recombineering in, 7-15 in metabolic engineering, 24-9 to 24-11 mevalonate-independent pathway for production of IPP in, 6-9 pet operon into pfl locus of, 24-10 phenylalanine biosynthesis, chorismate in, 22-17 to 22-18 plant-specific anthocyanin pathway in, 4-3 polyketide antibiotics, production in, 4-3 polyphosphate (polyP) in, 6-8 protein expression of phosphotriesterase and growth rate, 4-6 recA-mediated homologous recombination in recBC sbcA E. coli mutant, 2-11 recombinant carotenoid pathways, 4-5 recombinant for flavonoid biosynthesis, 22-5 to 22-6 Red/ET recombination system, 3-3 rpoH gene, translational thermoregulation in expression, 3-8 sacB, expression of, 7-10


site-specific recombination systems for drug marker eviction in, 7-11 studies of homologous recombination pathways in, 7-1 to 7-2 succinic acid production, 21-13 terpenoid production in, 4-4 TPP synthesis and coenzyme-B12 transport in, 3-10 transcription regulation and translation, 3-2 xylene monooxygenase (XMO) in, 21-19 Z. mobilis ethanol pathway in, 21-14 Ethanol directed evolution of KO11 by, 24-10 to 24-11 production from biomass, 24-3 to 24-4 Evolutionary engineering, 1-1 concurrent selection of objectives, 1-10 evolutionary design of enzymes uses in vitro evolution methods, 4-4 microbial evolution, 1-3 evolution process, 1-4 genotype changes, 1-5 selection criteria, 1-11 step and evolution, comparison of, 1-2 to 1-3 strains production, 1-2 substrate utilization, improvement of, 1-9 to 1-10 stress tolerance, 1-9 whole-cell metabolic engineering design, 1-2 Evolution systems, in vitro, 1-7 to 1-8 Exon shuffling, 2-13 Expression vectors, 12-5

F FamClash algorithm, 5-14 to 5-15; see also Recombineering technology FamClash computational methods, 2-17 Fed-batch processes erythromycin production, 21-16 Feedstock engineering energy crops, 24-4 to 24-5 Fermentation process, biological energy issues in primary metabolites, 21-11 to 21-15 secondary metabolites, 21-15 to 21-18 Fitness landscape, 1-4; see also Microbial evolution FLAG tag sequences in Cam/Kan drug cassette, 7-16 Flavin mononucleotide (FMN) riboswitch, 8-14 Flavonoids, 22-2 biosynthesis of, 22-3 metabolic network for, 22-4 plant cell cultures, production in, 22-3, 22-5 recombinant hosts E. coli, 22-5 to 22-6 yeast biosynthesis, 22-6

FLIRT system, 7-20 Flp and Cre site-specific recombination (SSR) systems, 7-11 Flp recombinase system, 7-16 Fluorescence activated cell sorting (FACS) system, 3-14, 8-24 Flux analysis by whole isotopomer modeling, 20-6 Fluxes in glycolysis, analysis, 20-8 Fluxome analysis, 12-9 Food products flavors development of, 22-15 to 22-16 precursor biosynthesis, 22-17 to 22-19 raspberry ketone and raspberry alcohol, 22-19 vanillin, 22-19 pigments, 22-20 Frame shuffling, 2-7 Furin cleavage, 10-16

G β-Galactosidase activity, 6-11 Gam protein and recBC nuclease, 3-5 Gas chromatography, 14-3 GC-MS technology, 14-3 to 14-4 Gene discovery, 12-2 expression, 3-11 gorging technique, 3-3 GeneAmp XL PCR kit, 3-4 Gene recombination technique, 2-7 homology-dependent, 2-8 and independent, 2-11 to 2-14 oligonucleotide and primer-dependent reassembly, 2-10 to 2-11 primerless fragment-reassembly, 2-9 to 2-10 in vivo techniques, 2-11 Gene site saturation mutagenesis (GSSM), 2-17 Genome reduction, 7-18 Red/ET recombineering and RecA-mediated dsDNA break repair, 7-19 Genome-scale models, 15-3 to 15-4 Genome shuffling and pesticide pentachlorophenol (PCP) by Sphingobium chlorophenolicum, 4-9 to 4-10 Genotype changes single nucleotide polymorphisms (SNPs), 1-5 evolutionary phenotype changes, 1-6 to 1-7 genome rearrangements, 1-5 to 1-6 mutator cells, 1-6 glk and ptnABCD genes, 13-5 glmS riboswitch, 3-11; see also Riboswitches Glucosamine-6-Phosphate (GlcN6P) riboswitch, 8-15 Glutamate synthesis in E. coli study for growth rate and biomass efficiency, 1-6

Glutathione in biotransformation of chlorinated ethenes, 4-6 Glycine riboswitch, 8-15 Glycoengineering, 10-6 to 10-8 in insects CMP-N-acetylneuraminic acid, biochemical pathway for, 10-11 CMP-SAS and SAS, 10-10 glycosylation processing pathway, 10-10 mammalian cells, 10-8 antibody with glycans, 10-9 GnT-III and mannosidase II (ManII), 10-10 N-glycosylation, 10-9 recombinant human erythropoietin (rHuEPO), 10-9 transferrin and, 10-9 plants, 10-11 to 10-12 yeast, 10-12 to 10-13 gmk gene overexpression, 6-10 Green cat cassette, chloramphenicol resistance gene and green fluorescent protein, 7-11 Guanine riboswitch, 3-10; see also Riboswitches Guiding directed evolution models, 5-3

H Haematococcus pluvialis pigment study, 22-20 Heterologous metabolic pathway, 12-5 High throughput screening method for synthetic riboswitches creating libraries, 3-14, 12-2 individual assay for activity, 3-16 sample screening protocol, 3-15 Homology-independent gene recombination, 2-11, 2-16 multiple-parent nonhomologous gene recombination, 2-13 to 2-15 two-parent nonhomologous gene recombination, 2-12 to 2-13 Host metabolic network, 4-2 to 4-4 Human Genome Project, 6-12 Hydrophilic interaction chromatography (HILIC), 14-6 Hypermutating B cells as tool for generating mutations in vivo, 2-4

I Improving expression by modifying gene, 9-5 to 9-6 by modifying host, 9-4 to 9-5 Incremental truncation for the creation of hybrid enzymes (ITCHY), 2-12 Insertion and deletion (indel) mutagenesis, 2-6 to 2-7; see also Gene recombination

λ Integrase family of site-specific recombinases, 7-11 IPTG-dependent promoter IPTG-inducible lac promoter system, 12-4 IPTG-inducible tac promoter (pKLJ03), 6-4 and wild type rpsL overexpression, 7-11 Isoprenoids, 22-6 carotenoids microbial biosynthesis of, 22-7 to 22-8 coenzyme Q10 (CoQ10), 22-9 genes involved in, 22-10 terpenoids, 22-10 to 22-11 Isotopically nonstationary 13C metabolic flux analysis, 19-9 to 19-11 Isotopic tracer experiments with GC-MS analysis of labeling patterns, 18-6

K Kanamycin antibiotics, 3-11 kcat of multienzyme complex, 11-7 Klebsiella oxytoca as biocatalyst for ethanol fermentation, 24-11 to 24-12 in metabolic engineering, 24-11 K43R mutant protein, 7-10 Kyoto Encyclopedia of Genes and Genomes (KEGG), 15-2

L Labeling analysis, 18-3 to 18-5 lac- and PBAD-type promoters, 12-4 Laccase detoxification by phenolic compounds, 24-8 lac promoter, 3-2 randomization of, 3-4 Lac repressor-peptide fusion binding with lacO DNA sequences on plasmid for plasmid display, 2-22 Lactic acid fermentations with LAB, 21-11 Lactococcus lactis heme auxotrophy with reconstitution of functional cytochrome in, 21-12 lacZα gene expression, 6-3 lacZ and gfp expression, 6-11 lacZ reporter fusion with alcAp promoter, identification of expression levels of ACVS gene, 23-5 Large insertions, 7-19 to 7-21 Laser-induced fluorescence (LIF), 14-7 Ligand binding domain (LBD), 2-24 Linear DNA substrates and RecBCD dsDNA exonuclease, 7-2

Liquid chromatography-mass spectrometry, 14-4 to 14-6 Local node flux analysis, 20-7 to 20-8 pentose phosphate pathway split ratio, 20-11 to 20-13 Lovastatin; see also Cholesterol lowering statins fermentation yields, optimization of, 23-27 strain improvement, 23-25, 23-27 Low-copy plasmids, in metabolic engineering application in, 6-5 arabinose-inducible araBAD promoter, 6-9 to 6-10 case studies, overexpression of, 6-7 dxs from mini-F plasmid, 6-9 to 6-10 guaBA from pSC101-derived plasmid, 6-10 observed with use of, 6-9 ppk from mini-F plasmid, 6-8 comparison with high copy, 6-9 expression level and, 6-5 to 6-7 metabolic pathways in, 6-8 representative of, 6-3 tac promoter use in, 6-9 Lower copy vectors, 3-3 Lpp-OmpA (lipoprotein-outer membrane protein A) fusion, 2-21 Lycopene production, 22-8 in E. coli, 6-9 by overexpression, 6-9 to 6-10 and in silico model, 4-8 Lysine riboswitch, 8-14 to 8-15

M Macrophage colony-stimulating factor (M-CSF), 10-4 Mammalian VKD propeptides, 10-15 Mannheimia succiniciproducens gram-negative anaerobes and succinic acid production, 21-13 Matrix-assisted laser desorption-ionization (MALDI), 2-18 Maximum efficiency (MAX) randomization, 2-17 Megalomicin production by Micromonospora megalomicea, 23-10 Metabolic burden effect in engineered metabolic systems, 6-4 Metabolic channeling in primary metabolism, 11-3 to 11-5 in secondary metabolism, 11-5 to 11-7 Metabolic engineering, 15-2, 23-1 based on high-throughput technologies, 15-4 utilizing single high-throughput technologies, 15-4 to 15-5 biocatalyst for, 24-5 to 24-6 biofuel platforms, 24-12 to 24-13 cell-free systems for, 16-2 to 16-3

cellulose hydrolysis, 24-5 cellulosic bioethanol, 24-1 to 24-2 development of in silico methods for, 15-8 to 15-9 minimization of metabolic adjustment (MOMA) algorithm, 15-8, 15-9 regulatory on/off minimization (ROOM), 15-8, 15-9 Escherichia coli, 24-9 to 24-11 ethanol production from biomass, 24-3 to 24-4 feedstock engineering energy crops, 24-4 to 24-5 next generation feedstock, 24-2 genome-scale models, 15-5 to 15-6 flux balance analysis (FBA), 15-6, 15-8 integration with heterogeneous data, 15-9 to 15-10 integration of high-throughput data sets, 15-5 Klebsiella oxytoca, 24-11 to 24-12 native cellulolytic strategy, 24-12 recombinant cellulolytic strategy, 24-12 Saccharomyces cerevisiae, 24-6 to 24-8 tools for translation control, 9-9 to 9-10 Zymomonas mobilis, 24-8 to 24-9 Metabolic flux analysis, 20-1 to 20-2 local node flux analysis, 20-7 to 20-8 metabolic flux studies, using GC-MS, 18-5 in vivo carbon flux distribution in central metabolism, 18-7 by whole isotopomer modeling, 20-6 Metabolic pathways evolutionary strategies for screen large libraries, 4-10 to 4-11 regulation, 12-3 to 12-4 Metabolic systems engineering, 17-3 to 17-5; see also Metabolic engineering E-cell for, 17-7 core features of E-Cell System, 17-8 in silico methods, 17-5 to 17-6 stochastic spatiotemporal dynamics, 17-6 to 17-7 Metabolome analysis, 12-9 Metabolome Standardization Initiative (MSI), 19-8 Metabolomics analytical methods for, 19-4 to 19-6 biochemical engineering aspects of, 19-6 to 19-9 studies, 19-4 Mevalonate production optimization, in E. coli, 12-6 Mg2+ -dependent thermostable polymerase, 2-5 Micellar electrokinetic chromatography (MEKC), 14-6 Microbial biosynthesis, 22-20 to 22-21 genetic machinery, engineering, 22-21 to 22-23 new approaches, 22-21

Microbial energetics and biotechnological applications, 21-5, 21-7 biocatalytic reactions and energy metabolism, 21-8 to 21-9 biological energy generation and consumption in, 21-8 growth and biocatalysis, energy aspects of, 21-9 to 21-10 recombinant enzyme overproduction, energy aspects of, 21-11 stress metabolism during, 21-10 Microbial evolution, 1-3 evolutionary process, 1-4 genotype changes genome rearrangements, 1-5 to 1-6 mutator cells, 1-6 single nucleotide polymorphisms (SNP), 1-5 phenotype changes, 1-6 to 1-7 Microbial hosts expressing heterologous biosynthetic pathways, optimization strategies for small-molecule production, 4-5 Microtiter plates, 2-18 to 2-19 Mini-F-derived vector, 6-9 Mini-F plasmid, 6-4 Minimization of metabolic adjustment (MOMA) algorithm, 15-8 to 15-9 Molecular pathway breeding, 4-7 Monod kinetics and microbial growth, 21-5 mRNA synthesis rates, 6-4 Mucor rouxii in yeast, γ-linolenic acid accumulation, 22-15 Muller’s ratchet concept, 1-7 Multi-copy plasmids, 3-3 Multienzyme systems in primary metabolism, 11-4 Multifunctional enzyme systems, 11-7 Multiple genes in operons regulation, 12-4 to 12-5 Multiple-parent nonhomologous gene recombination degenerate homoduplex recombination (DHR), 2-13 nonhomologous random recombination (NRR), 2-13 to 2-14 Multiple sequence alignments (MSA), 2-3 Mutagenesis techniques, 2-3 gene recombination, 2-7 to 2-8 homology-dependent, 2-8 to 2-14 homology-dependent, 2-14 to 2-15 homology-independent, 2-16 random mutagenesis, 2-4 error-prone polymerase chain reaction (epPCR), 2-5 to 2-6 insertion and deletion (indel) mutagenesis, 2-6 to 2-7 sequence saturation mutagenesis (SeSaM), 2-6 summary of, 2-8

semirational approaches, 2-15 targeted and guided randomization, 2-17 Myricetin from Escherichia coli, 22-6

N NADPH-dependent xylose reduction to xylitol, 21-14 Naphthoquinones from plants, 22-20 Naringenin production, 4-3 Natural and engineered riboswitches, 8-11 to 8-23 composition and conformational dynamics of, 8-11 to 8-12 ligand-controlled gene regulation by, 8-12 synthetic riboswitches, controlling gene expression levels, 8-16 aptamer, attachment of, 8-17 to 8-19 aptamer insertion, 8-16 to 8-17 construction framework for, 8-21 to 8-23 linker between, 8-19 to 8-21 targets and implementation in metabolic networks, 8-12 to 8-14 coenzyme B12, 8-14 FMN, 8-14 GlcN6P, 8-15 glycine, 8-15 guanine and adenine, 8-15 lysine, 8-14 to 8-15 SAM, 8-14 SAM-coenzyme B12, 8-15 to 8-16 TPP, 8-14 Natural drugs, 23-2 to 23-3 Natural evolution, 2-3, 2-25 to 2-26 Necator americanus secretory protein (Na-ASP1), 10-3 Neurospora, multifunctional enzyme, 11-4 Neutral mutations, 2-3 Nidula niveo-tomentosa used for de novo synthesis of raspberry ketone, 22-19 Nonhomologous random recombination (NRR), 2-13 to 2-14 NOR gate behavior, 8-12 Nucleotide exchange and excision technology (NExT DNA Shuffling), 2-10 Nutrasweet process by sweetener aspartame, 22-17

O Oligonucleotide and primer-dependent reassembly, 2-10 to 2-11; see also Gene recombination Oligosaccharyltransferase (OST) complex, 10-6 to 10-7 Open reading frame (ORF), 9-1 Operators, repressors elements outside promoter, 3-2

Optimal pattern of tiling for combinatorial libraries (OPTCOMB), 5-21 to 5-22; see also Optimizing chimeric libraries Optimizing chimeric libraries, 5-16 disruptive nature of recombination calibrating, 5-18 to 5-19 estimating chimeric library diversity, 5-23 optimal pattern of tiling for combinatorial libraries (OPTCOMB), 5-21 to 5-22 practical considerations for library synthesis, 5-23 recombination as shortest path problem (RASPP), 5-19 to 5-21 tunable parameters during library construction, 5-17 to 5-18 OptKnock design algorithm, 1-10 to 1-11 Organophosphates as pesticides and chemical warfare agents, 4-6 Organophosphorus hydrolase (OPH), 2-23 Oxidative glucose catabolism for energy generation, 21-8 Oxygen dependent energy production, in cytomim system, 16-7

P Paclitaxel; see also Anticancer drugs biosynthetic pathway for, 23-20 limited pathway reconstitution in E. coli and yeast, 23-21 in plants, 23-21 metabolic engineering techniques, 23-19 Papaver somniferum gene-silencing, 8-24 Paracoccus haeundaensis astaxanthin biosynthesis in, 22-20 pcbC gene and penDE gene in P. chrysogenum in penicillin production, 23-5 PCR mutagenesis, 7-18 and PCR-mediated gene, 7-2 Penicillin, 23-4; see also Antimicrobial drugs biosynthesis of genes overexpression of, 23-5 to 23-6 in P. chrysogenum and in A. nidulans, 23-6 genome-wide analysis of, 23-6 to 23-7 heterologous host, biosynthetic pathway reconstitution in, 23-7 metabolic engineering techniques for biosynthesis, 23-5 metabolic precursors, 23-7 Pentapeptide scanning mutagenesis, 2-6 Pentose phosphate pathway (PPP), 21-8 analysis of fluxes in, 20-8 to 20-11 Pfu DNA polymerases, 2-10 Phenotypic microarray, 13-6 applications of, 13-7 to 13-8 fundamentals of, 13-7

Phenylalanine and in flavonoid biosynthesis, 22-3 Phenylpropanoid metabolism in plants, 11-6 Phosphoenolpyruvate:carbohydrate phosphotransferase system (PTS) in active sugar transport, 24-11 Phosphoenolpyruvate (PEP)-availability for anaplerosis, 21-13 α-Phosphothionate nucleotide, alkaline labile analog, 2-6 Phosphotransferase (PTS)-based glucose uptake system, 21-13 Phosphotriesterases and hydrolytic detoxification, 4-6 Physical linkage, 2-20 cell-surface display, 2-21 to 2-22 phage display, 2-21 plasmid display, 2-22 ribosome and mRNA display, 2-22 Pichia stipitis xylose utilization, 21-14 Plant cell cultures, flavonoid production in, 22-3 to 22-5 Plasmid instability in engineered metabolic systems, 6-4 Plasmid pCL1920 and pKLJ12, 6-3 Plasmid replication, energy resources, 4-8 pUC-type Plasmids, 3-3 Polyaromatic pathway, in microorganisms, 11-4 Polyketides Saccharopolyspora erythrea, 22-13 tylosin, production of, 22-14 Polypeptide inheritance in libraries, 5-18 Polyphosphate (polyP) molecules production by enzyme polyphosphate kinase (PPK), 6-8 Polyunsaturated fatty acids (PUFAs) microbial production, 22-14 to 22-15 ω-3, ω-6, and ω-9 families, formation of, 22-16 Porphyrin production, 4-3 Primerless fragment-reassembly, 2-9 to 2-10; see also Gene recombination Promoter engineering, 7-18 library, 6-6 promoters, 12-4 Promoters, RNA polymerases transcription gene at sites, 3-2 chromosomal integration Red/ET recombination, 3-5 to 3-8 libraries, 3-3 mutagenesis and cloning randomization by whole plasmid PCR, 3-3 to 3-5 mutant promoters, 3-3 Protein engineering, 12-2 to 12-3 expression levels optimization, 4-4 to 4-5 mapping, 13-9 to 13-11

protein disulfide isomerase (PDI) catalytic oxidation and isomerase functions, 10-2 to 10-3 synthesis, energy resources, 4-8 Proteomics fundamentals of, 13-8 to 13-9 protein mapping, 13-9 to 13-11 and proteome analysis, 12-8 quantitative protein profiling, 13-11 to 13-13 proV gene of Salmonella strain LT2 and Red recombineering, 7-16 to 7-17 pSC101-based low-copy plasmid, 6-3 Pseudomonas putida alkane degradation pathway of, 21-19 to 21-20 Red/ET recombineering in, 7-21 to 7-22 S12 and bioproduction of phenol from glucose, 4-9 Pseudomonas stutzeri half-life of NAD(P)H oxidizing phosphite dehydrogenase improvement, 2-24 pUC18 and pUC19 plasmids, 6-2 Purine-responsive riboswitches, 8-15

Q Quercetin from Escherichia coli, 22-6

R Ralstonia eutropha and reverse engineering, 22-20 Random chimeragenesis on transient templates (RACHITT), 2-10 Random insertional-deletional strand exchange (RAISE), 2-7 Random insertion/deletion (RID) technique, 2-7 Random mutagenesis, 5-3 caveats, 5-6 to 5-7 of chromosomal gene by λ Red/ET recombineering technology, 7-17 parental proteins, 5-4 to 5-5 protocol identification, 5-5 to 5-6 Random point mutagenesis techniques, 2-5 Random priming in vitro recombination (RPR), 2-10 RecA-dependent pathway, 7-4 RecBCD dsDNA exonuclease, 7-2 recE and recT genes in prophage, recombination, 3-5 RecE and RecT rac prophage, 7-3 RecF pathways, 7-1, 7-3 Recombinant DNA technology, 6-1, 12-1 Recombinant protein production, 6-4 Recombination as shortest path problem (RASPP), 5-19 to 5-21; see also Optimizing chimeric libraries

Recombineering technology, 7-5 development of, 7-5 electrocompetent/recombinogenic E. coli cells, 7-7 electroporation and plating, 7-7 to 7-8 FamClash algorithm, 5-14 to 5-15 λ Red, source of, 7-5 to 7-6 recombinants, selection and verification of, 7-8 residue clash maps (RCM), 5-13 Schema algorithm, 5-10 to 5-13 with single-stranded DNA oligos, 7-9 to 7-10 statistical coupling analysis (SCA), 5-15 to 5-16 substrates preparation of, 7-6 to 7-7 red and gam from Plac promoter, 7-6 Red/ET recombination system, 3-3, 7-3 to 7-4 Acinetobacter baylyi sp. ADP1, natural transformation, 3-5 to 3-6, 3-8 primer design for overlap PCR, 3-6 assembly of integration constructs, 3-7 and long PCRs, 3-7 to 3-8 Red/ET-expressing cells in, 7-10 Red/ET recombineering, 7-2, 7-12, 7-15, 7-22 application in metabolic engineering, 7-21 to 7-22 for combinatorial biosynthesis, 7-22 PCR product generation by primers 3KO and 5KO, 7-2, 7-11 RecA-dependent pathway, 7-4 and recombineering, 7-3 rpsL gene, 7-10 to 7-11 λ Red operon over-production and recombination rates in strain, 3-5 Red-promoted PCR-mediated recombineering, 7-4 λ Red recombineering system, drug cassettes, 7-7 Regulatory on/off minimization (ROOM), algorithm, 15-8, 15-9 Repressors DNA-binding proteins, transcription control, 3-2 Residue clash maps (RCM), 5-13; see also Recombineering technology R factor-derived plasmid pSC101, 6-2 Riboswitches, 8-11 to 8-14 natural riboswitches, 3-9 RNA cleavage and splicing, regulation of, 3-11 transcription, regulation of, 3-10 translation initiation, regulation of, 3-10 synthetic riboswitches, 3-11 to 3-12 aptamer selection and, 3-12 to 3-13 design considerations for creation, 3-13 high throughput screening method, 3-14 to 3-15


R6K origin, trans-acting Π protein, 7-20 RNA-based regulatory systems ligand dependent natural riboswitches, 3-9 to 3-11 synthetic riboswitches, 3-11 to 3-12 ligand independent ribosome binding site (RBS), 3-8 to 3-9 trans-acting riboregulators, 3-9 transcript stability and secondary structure, 3-9 RNA elements, 8-3 based control systems, 8-23 cellular processes, role in, 8-2 control elements in metabolic network engineering, 8-23 to 8-25 and engineering, 8-3, 8-25 gene expression regulation and antisense mechanisms, 8-6 to 8-7 RNAi pathway, 8-7 to 8-9 internal ribosome entry sites (IRESes), 8-4 processing and degradation, 8-5 to 8-6 regulatory elements, 8-3 ribozymes and, 8-5, 8-6 RNA-dependent RNA polymerases (RdRPs), 8-8 RNAi pathway in, 8-7 to 8-8 RNAi-resistant genes in, 9-9 and RNA-RNA interactions, 8-10 RNase activity, 8-4 to 8-5 RNase III family of endoribonucleases, 8-4 to 8-5 self-/cis-cleaving hammerhead ribozyme, 8-6 sensory elements and molecular ligands, 8-10 to 8-11 nucleic acids binding and, 8-10 as thermosensors, 8-9 sequence-/structure- specific recognition, 8-4 Shine-Dalgarno (SD) sequence, 8-3 translation initiation, 8-3 to 8-4 tRNA overexpression strategy, 9-5 rpsL-tetA counter-selection marker, 7-17 λ Red-promoted and SSR-mediated modification of bacterial chromosome duplications and inversions, 7-15 to 7-16 gene/operon replacement, 7-12 genome reduction, 7-18 to 7-19 insertions, 7-15 large deletions, 7-12, 7-14 promoter engineering, 7-18 random mutagenesis, 7-17 reporter fusions, 7-16 to 7-17 seamless deletions, 7-14 to 7-15 SSR-mediated excision of drug markers, 7-12 RuBisCO activity, 11-5

S Saccharomyces cerevisiae chemical bioethanol production, 21-13 to 21-14 expression of P. stipitis genes XYL1 and XYL2, 24-6 expression of XI from anaerobic rumen fungus Piromyces sp. E2 (XylA) in, 24-8 flavonoid biosynthesis in, 22-6 and Flp recombinase, 7-11 in metabolic engineering, 24-6 to 24-8 strain TMB3001 expression in, 24-7 and in vitro and in vivo recombination events, 2-11 xylose isomerase overexpression from anaerobic fungus Piromyces sp. E2 in, 21-14 xylose utilization by, 21-14 Saccharomyces Genome Database (SGD), 15-2 Saccharopolyspora erythraea, polyketide erythromycin, 21-16 S-Adenosylmethionine (SAM) riboswitch, 8-14 sbcA mutations and over-expression of recE and recT genes in prophage, 3-5 SceI-expressing plasmid, 7-14 SceI restriction enzyme in vivo expression and dsDNA break, 7-15 Schema algorithm, 5-10 to 5-13; see also Recombineering technology SCHEMA computational methods, 2-17 SCOPE computational methods, 2-17 Screening technologies for metabolites, 4-10 to 4-11 Segregational instabilities, 6-5 SELEX methods, 3-12, 8-10 to 8-11, 8-25; see also Riboswitches Self-/cis-cleaving hammerhead ribozyme, 8-6 Self-organizing map (SOM) clustering, 13-4 Semisynthetic (ss) drugs, 23-2 Sequence homology-independent protein recombination (SHIPREC), 2-12 Sequence-independent site-directed chimeragenesis (SISDC), 2-17 Sequence saturation mutagenesis (SeSaM), 2-6 Shine-Dalgarno sequences, 9-8 Short hairpin RNAs (shRNAs), 8-9 Signal peptidase, 10-6 Silent phenotypes, 1-7 Simvastatin hydroxylation to 6-α-hydroxymethyl derivative catalyzed by wildtype Nocardia spp., 21-23 Single library directed graph representation, 5-20 Single-stranded DNA annealing (SSA) pathway, 7-4 Site-specific recombination systems for drug marker eviction in E. coli, 7-11

Site-specific recombination systems (SSRs), 7-16 S-RNase polymorphic gene, 2-7 SSO-mediated recombineering, 7-9 to 7-10 SSO-mediated DNA repair, 7-5, 7-9 SSR-mediated excision of drug markers, 7-12 Staggered extension process (StEP), 2-10 Statistical coupling analysis (SCA), 5-15 to 5-16; see also Recombineering technology Streptomyces coelicolor cosmid clones, 7-21 erythromycin production, 21-16 isochromanequinone antibiotics synthesis, 22-2 Streptomyces fradiae DNA shuffling, 22-14 Streptomyces griseus study, 1-1 Structure-guided consensus mutagenesis, 5-9 to 5-10; see also Directed mutagenesis Structure-guided saturation mutagenesis, 5-9; see also Directed mutagenesis Styrene to (S)-styrene epoxide flavin dependent styrene monooxygenase (SMO) of Pseudomonas sp. strain VLB120, catalyzed by, 21-21 Succinic acid production, 21-12 to 21-13 based on comparative genome analysis, 15-7 SucR CamS colonies, 7-10 Sugar pathways and overexpression of cognate glycosyltransferases, 4-3 to 4-4 Synechocystis carboxysomes, 11-5 Synthetic evolution, 2-1 synthetic genes, 9-2 and synthetic riboswitches, 8-16 Synthetic shuffling, 2-10 Systems biology, 15-2 to 15-3

T tac and trc promoter, 3-2 Taq DNA polymerase, 2-9 Terminal deoxynucleotidyl transferase (TdT), treatment with, 2-6 Terpenoids artemisinin, 22-10 taxol, 22-11 to 22-12 Tetracycline resistant (pSC101) and kanamycin-resistant (pSC102) DNA fragments, 6-1 Tetrahydrobiopterin (BH4) de novo biosynthesis by overexpression of guaBA from pSC101-derived plasmid, 6-10 Theophylline aptamer, 3-11 Thermotoga neapolitana xylose isomerase stability, 2-24

Thiamine pyrophosphate (TPP) riboswitch, 8-14 synthesis, 3-10 THIO-ITCHY technique, 2-12 Thrombopoietin (TPO), 10-6 TLR4 signaling pathways, 17-20 Tn7-based system, 7-20 Tn4430 transposable elements, 2-6 Tobramycin antibiotics, 3-11 Toll-like receptors (TLRs), 17-18 Toluene conversion 3-methylcatechol synthesis, action of toluene dioxygenase and cis-dihydrodiol dehydrogenase in P. putida F1, 21-22 toluene cis-glycol, toluene dioxygenase containing P. putida UV4 catalyzed by, 21-22 toluene dioxygenase, 2-22 T5 promoter, 6-6 Transcriptome analysis, 12-8 Transposable elements for indel mutagenesis, 2-6 Transposon kits commercial tool for generating mutations in vivo, 2-4 Tricarboxylic acid cycle (TCA), 21-8 succinyl-CoA oxidation by succinyl-CoA ligase, 21-13 Tricarboxylic acid cycle (TCA) enzymes, 11-2 True random point mutagenesis technique, 2-6 Tryptophan synthase metabolon, 11-7 Tryptophan synthesis, 22-17 to 22-18 Two-parent nonhomologous gene recombination, 2-13 SCRATCHY, scheme of, 2-12 Tylosin commercial production, 22-14 Tyrosine aminotransferase (TATase), 2-24

V Vanillin synthesis in white-rot fungi, 22-19 Vector pSC101, 6-2 VKD γ-carboxylation system, 10-14

W Wax esters from Jojoba plant, 22-15 Whole-cell oxyfunctionalization process, biological energy issues in cyclohexanone to ε-caprolactone, conversion, 2-20 to 2-21 Dactylosporangium spp., 2-18 3,4-dimethylbenzaldehyde from pseudocumene, production of, 2-22 to 2-23


L-proline to trans-4-hydroxy-L-proline, hydroxylation of, 2-19 n-alkanes, hydroxylation of, 2-20 P. putida, 2-19 Pseudomonas spp., styrene monooxygenase (SMO), 2-21 simvastatin, hydroxylation of, 2-23 toluene dioxygenase, 2-22 Whole genome evolutionary strategies for screening large libraries, 4-10 to 4-11 shuffling approaches, 4-9

X Xylose repressor translation, 3-11 upstream pathways, 24-7

Y YATP biological constant, growth rate and type of carbon and energy source, 21-5 Yeast artificial chromosomes (YACs), 6-12 Yeast isoprenoid precursor pathway, 4-4 Yersinia pseudotuberculosis pKOBEG-sacB use for recombineering, 7-6 Y-Ligation-based block shuffling (YLBS), 2-14

Z zupT iron transporter, iron and heme concentrations, 4-4 Zymomonas mobilis ATPase activity inhibition by DCCD, 21-15 in metabolic engineering, 24-8 to 24-9 for pentose utilization, 21-14

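[Figure 6.1 artwork not reproduced: plasmid maps. (a) pCL1920, 4549 bp; labeled features: repA, Plac, lacZα, Spc/StrR; multiple cloning site HindIII, PstI, SalI, XbaI, BamHI, SmaI, KpnI, SacI. (b) pKLJ12, 12396 bp; labeled features: araC, PBAD, rrnB terminator, AmpR, mini-F (f5) fragment; restriction sites NheI, EcoRI, SalI, SphI.]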

Figure 6.1 Representative low-copy plasmids. (a) Plasmid pCL1920, derived from the R-factor replicon pSC101. (b) Plasmid pKLJ12, derived from the mini-F replicon pML31 (f5 fragment of the F factor plasmid). In both diagrams, the restriction sites that constitute the multiple cloning site are shown. (From Cohen, S. N. and Chang, A. C. Y. Proc. Nat. Acad. Sci. USA 70 (5), 1293–1297, 1973a; Jones, K. L. and Keasling, J. D. Biotechnol. Bioeng. 59, 659–665, 1998. With permission.)

[Figure 6.2 artwork not reproduced: reaction schemes and chemical structures. (a) polyPn + ATP to polyPn+1 + ADP (PPK); polyPn+1 to polyPn + Pi (PPX). (b) Glucose, pyruvate + G3P, DXP (dxs), IPP (C5) and DMAPP (C5) (idi), C10, C15, C20, lycopene. (c) GTP to BH4 via GCHI, PTPS, and SPR. (d) IMP, XMP (guaB), GMP (guaA), GDP (gmk), GTP.]

Figure 6.2 Metabolic pathways in which the use of low-copy plasmids improved production. (a) One-step pathway for the production of polyphosphates from ATP. PPK = polyphosphate kinase, PPX = polyphosphatase. (b) Abbreviated pathway for lycopene formation. dxs = DXP synthase, idi = IPP isomerase; G3P = glyceraldehyde-3-phosphate, IPP = isopentenyl diphosphate, DMAPP = dimethylallyl diphosphate; C10, C15, C20 symbolize 10-, 15-, and 20-carbon chain length intermediates. (c) Tetrahydrobiopterin (BH4) biosynthetic pathway from GTP (guanosine 5′-triphosphate). GCHI = GTP cyclohydrolase I; PTPS = 6-pyruvoyl-tetrahydropterin synthase; SPR = sepiapterin reductase. (d) Abbreviated pathway for GTP synthesis from IMP. guaB = IMP dehydrogenase; guaA = GMP synthase; gmk = GMP kinase. IMP = inosine 5′-monophosphate; XMP = xanthosine 5′-monophosphate; GMP = guanosine 5′-monophosphate; GDP = guanosine 5′-diphosphate; GTP = guanosine 5′-triphosphate.

[Figure 8.11 artwork not reproduced. Labeled elements: sensor domain (aptamer); regulatory domain (sTRSV ribozyme); direct coupling; stem III; stem loops I-II interactions retained; competing strand; integration of the aptamer-coupled ribozyme by insertion into the 3′ UTR of a target gene (GFP) through stem III. Panel A (ON switch platform): aptamer unbound, ribozyme active conformation, suppressing gene expression; aptamer bound to ligand, ribozyme inactive conformation, allowing gene expression. Panel B: aptamer unbound, ribozyme inactive conformation, allowing gene expression; aptamer bound to ligand, ribozyme active conformation, suppressing gene expression.]

Figure 8.11 General compositional framework and design strategy for engineering universal, ligand-controlled cis-acting hammerhead ribozyme-based regulatory systems. The color scheme is as follows: catalytic core, purple; loop sequences, blue; aptamer sequence, brown; competing strand, green; switching strand, red; spacer sequences, orange; cleavage site, brown arrow. Modular strategies for coupling the aptamer and regulatory domains and systematic integration of the coupled control molecule comprising these domains into a target mRNA are shown. An aptamer is directly attached to the ribozyme through one of its loops without replacing any part of the ribozyme, thereby maintaining loop I-II interactions required for in vivo functionality. Spacer sequences are included on both ends of the control molecule to insulate it from non-specific interactions with the surrounding sequences. A competing strand, whose sequence is similar to that of the switching strand, is integrated into the aptamer-coupled ribozyme, which enables the control molecule to adopt two primary conformations through the strand displacement mechanism, as the competing strand displaces the switching strand. Through this mechanism, (A) an ON switch, in which ligand binding stabilizes the conformation with the disrupted catalytic core, and (B) an OFF switch, in which ligand binding stabilizes the conformation with the restored catalytic core, are constructed. (Adapted from Win, M.N. and Smolke, C.D. Proc. Natl. Acad. Sci. USA, 104, 14283–14288, 2007.)

[Figure 8.12 artwork not reproduced. Labeled elements: aptamer I; aptamer II; sensor domain replacement; ON switch I, active ribozyme; ON switch II, active ribozyme; ON switch II, inactive ribozyme.]

Figure 8.12 Modular design strategies for the construction of new ribozyme switches comprising aptamer domains responsive to diverse ligands. The color scheme corresponds to that used in Figure 8.11. An ON switch platform is used for illustration where aptamer I (left dashed box) is directly replaced with aptamer II (right dashed box) to construct an ON switch II. (Adapted from Win, M.N. and Smolke, C.D. Proc. Natl. Acad. Sci. USA, 104, 14283–14288, 2007.)


Figure 11.4 Influencing metabolic pathways via the addition of small molecule effectors. (a) Substrate and product molecule resulting from the presence or absence of (b) small molecule effectors that influence (c) active and (d) inactive enzymes within (e, f) metabolic pathways engineered for function that is coupled to ligand-binding events.

[Figure 11.5 artwork not reproduced: panels (a) to (d); panel (d) is labeled with periplasm, cytoplasm, NAD+, NADH, ADP + Pi, and ATP.]

Figure 11.5 Proposed model for engineering bacterial multienzyme systems between sequential reactions. One approach to the engineering of dynamic metabolic channels is through the use of a library of generic interacting domains (a) to directionally tether specific recombinant enzymes to one another (b, c) in order to produce any metabolic product of interest. Efficiently directing cellular substrates into metabolons will lead to the creation of engineered metabolic machines in bacteria (d) that mimic, and even rival, those found in higher organisms such as plants.

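[Figure 17.4 artwork not reproduced: panel (a) shows the pathway A, B, C, D, E with a negative feedback (–) marker; panels (b) and (c) are simulation plots.]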

Figure 17.4 (a) A simple schematic of a negative feedback system in metabolic pathways. (b) The temporal simulation profile of metabolite concentrations of a hypothetical system as depicted in Figure 17.4 and Table 17.2a. (c) The temporal simulation profile of metabolite concentrations of a hypothetical system as depicted in Figure 17.4 and Table 17.2b (without negative feedback mechanism). In (b) and (c) all metabolites are initially at steady-state levels and at t = 0 s, the concentration of A is increased instantaneously (perturbed) by 266 molecules or 0.44 mM. The x-axis represents time in seconds and the y-axis represents the number of molecules of each metabolite. (Using an assumed volume of 1e–18 l, we could convert the y-axis to metabolite concentration, if necessary.) All simulations were carried out using the E-Cell system version 3.
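The qualitative effect of negative feedback on a perturbed linear pathway can be illustrated with a small stand-alone sketch. This is not the E-Cell model behind the figure: the rate constant `k`, the inhibition constant `ki`, the choice of E as the inhibitor of the first step, and the closed system with no inflow or outflow are all assumptions made for illustration, since the kinetic parameters of Table 17.2 are not reproduced here.

```python
def simulate(feedback, pulse=266.0, steps=20000, dt=0.001):
    """Euler-integrate the chain A -> B -> C -> D -> E in molecule counts.

    Every kinetic detail below is a hypothetical choice, not taken
    from the source tables.
    """
    k = 1.0    # assumed first-order rate constant (1/s) for each step
    ki = 50.0  # assumed inhibition constant (molecules)
    x = [pulse, 0.0, 0.0, 0.0, 0.0]  # A, B, C, D, E

    for _ in range(steps):
        # Assumed feedback: the end product E slows the step A -> B.
        inhibition = 1.0 / (1.0 + x[4] / ki) if feedback else 1.0
        flux = [k * x[0] * inhibition, k * x[1], k * x[2], k * x[3]]

        x[0] -= flux[0] * dt
        for i in range(1, 4):
            x[i] += (flux[i - 1] - flux[i]) * dt
        x[4] += flux[3] * dt
    return x


with_fb = simulate(feedback=True)
without_fb = simulate(feedback=False)
print("A remaining after 20 s, with feedback:   ", with_fb[0])
print("A remaining after 20 s, without feedback:", without_fb[0])
```

Under these assumptions the pulse of A is consumed more slowly when feedback is active, while the total number of molecules is conserved in both runs.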

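[Figure 17.5 artwork not reproduced: panel (a) shows the pathway A, B, C, D, E with a negative feedback (–) marker and an additional metabolite F; panels (b) and (c) are simulation plots.]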

Figure 17.5 (a) Metabolite B having an additional reaction that converts it to metabolite F. The temporal simulation profile of metabolite concentrations, (b) without and (c) with negative feedback mechanism, of a hypothetical system depicted in Figure 17.6 and Table 17.3. Initially all metabolites are at steady state, and at t = 0 s the concentration of A is increased instantaneously (perturbed) by 266 molecules or 0.44 mM (volume of cell is assumed to be 1e–18 l). The x-axis represents time in seconds and the y-axis represents the number of molecules of each metabolite. All simulations were carried out using the E-Cell system version 3.
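The equivalence between 266 molecules and 0.44 mM follows from the assumed cell volume of 1e–18 l and Avogadro's number. A minimal sketch of the conversion (the function name is illustrative):

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecules_to_millimolar(n_molecules, volume_litres):
    """Convert a molecule count in a given volume to a concentration in mM."""
    moles = n_molecules / AVOGADRO
    return moles / volume_litres * 1e3


print(round(molecules_to_millimolar(266, 1e-18), 2))  # prints 0.44
```

The same function can be applied to every point on the y-axis to replot the simulation profiles as concentrations.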

[Figure 22.1 artwork not reproduced: chemical structures and reaction arrows. Labeled compounds and classes: phenylalanine, cinnamic acid, p-coumaric acid, caffeic acid, acid-CoA complex, malonyl-CoA (x3), chalcones, flavanones, flavones, flavonols, isoflavones, flavan-4-ols, anthocyanidins, anthocyanin 3-O-glucosides.]

Enzyme abbreviations: PAL, phenylalanine ammonia-lyase; C4H, cinnamate 4-hydroxylase; 4CL, 4-coumarate:CoA ligase; CHS, chalcone synthase; CHI, chalcone isomerase; DFR, dihydroflavonol 4-reductase; ANS, anthocyanidin synthase; IFS, isoflavone synthase; 3GT, 3-O-glucosyltransferase; FSI, flavone synthase; FHT, flavanone 3β-hydroxylase; FLS, flavonol synthase.

Example compounds listed in the figure by flavonoid class, for the substituent patterns R1 = H, R2 = H; R1 = OH, R2 = H; and R1 = OH, R2 = OH:
Flavanones: (2S)-pinocembrin, (2S)-naringenin, (2S)-eriodictyol
Flavones: apigenin, luteolin, chrysin
Flavonols: kaempferol, quercetin, myricetin
Anthocyanidins: pelargonidin, cyanidin, delphinidin
Anthocyanin 3-O-glucosides: pelargonidin 3-O-glucoside, cyanidin 3-O-glucoside, delphinidin 3-O-glucoside

Figure 22.1 Metabolic network for biosynthesis of the variety of flavonoids is shown with relevant genes for each reaction highlighted. The structure of each class is shown with the R groups (i.e., H, OH) dependent on the substrate fed to the pathway.

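[Figure 24.1 artwork not reproduced: process flow diagram. Labeled blocks: biomass; size reduction; pretreatment and detoxification; enzyme production (cellulases/hemicellulases); enzymatic hydrolysis; sugars; fermentation; product recovery; ethanol; lignin; coproducts. Brackets mark the steps combined in SSF and in CBP.]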

Figure 24.1 Ethanol production from biomass. Saccharifying enzymes and biocatalyst work in tandem in simultaneous saccharification and fermentation (SSF). In consolidated bioprocessing (CBP), the enzymes are produced directly by the ethanologenic biocatalyst.

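[Figure 24.2 artwork not reproduced: pathway diagram. Xylose: in yeasts, D-xylose to xylitol (xylose reductase) to D-xylulose (xylitol dehydrogenase); in bacteria, D-xylose to D-xylulose (xylose isomerase); D-xylulose to D-xylulose-5-P (xylulokinase, ATP to ADP). Arabinose: in fungi, L-arabinose to L-arabinitol (aldose reductase) to L-xylulose (L-arabinitol 4-dehydrogenase) to xylitol (L-xylulose reductase); in bacteria, L-arabinose to L-ribulose (L-arabinose isomerase) to L-ribulose-5-P (L-ribulokinase, ATP to ADP) to D-xylulose-5-P (L-ribulosephosphate 4-epimerase). D-xylulose-5-P enters the pentose phosphate pathway, leading to D-glyceraldehyde-3-P, glycolysis, pyruvate, and ethanol (PDC, ADH). The redox steps are labeled with NAD(P)H/NAD(P) cofactors.]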

Figure 24.2 Upstream pathways for xylose and arabinose in bacteria, fungi, and yeasts.
