Enzyme Functionality: Design, Engineering, and Screening

edited by
Allan Svendsen

Novozymes A/S, Bagsværd, Denmark

MARCEL DEKKER, INC.

NEW YORK • BASEL

Although great care has been taken to provide accurate and current information, neither the author(s) nor the publisher, nor anyone else associated with this publication, shall be liable for any loss, damage, or liability directly or indirectly caused or alleged to be caused by this book. The material contained herein is not intended to provide specific advice or recommendations for any specific situation.

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.

ISBN: 0-8247-4709-7

This book is printed on acid-free paper.

Headquarters
Marcel Dekker, Inc., 270 Madison Avenue, New York, NY 10016, U.S.A.
tel: 212-696-9000; fax: 212-685-4540

Distribution and Customer Service
Marcel Dekker, Inc., Cimarron Road, Monticello, New York 12701, U.S.A.
tel: 800-228-1160; fax: 845-796-1772

Eastern Hemisphere Distribution
Marcel Dekker AG, Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland
tel: 41-61-260-6300; fax: 41-61-260-6333

World Wide Web
http://www.dekker.com

The publisher offers discounts on this book when ordered in bulk quantities. For more information, write to Special Sales/Professional Marketing at the headquarters address above.

Copyright © 2004 by Marcel Dekker, Inc. All Rights Reserved.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.

Current printing (last digit): 10 9 8 7 6 5 4 3 2 1

PRINTED IN THE UNITED STATES OF AMERICA

Preface

This book focuses on understanding enzyme functionality and protein design in order to create altered characteristics. Aspects of knowledge and understanding within the areas of enzyme function and engineering are discussed, as is the subsequent screening of new enzyme variants. The book presents some background concepts required for analyzing enzymes and proteins, in order to increase our understanding of their function. The following chapters are on selected topics that have been part of my daily work regarding protein design, which includes structural knowledge to guide random mutagenesis to selected areas of the protein, as well as methods to create and select variant proteins with the desired characteristics.

I hope that the many stories from the many specialists who contributed to this volume will generate novel ideas for further development in the field of protein engineering. The diverse backgrounds of the contributors, from theoretical to purely experimental science and from the biological to the biophysical sciences, are intended to meld into a novel understanding of enzyme functionality and of variant enzyme design and production.

In order to engineer enzyme functionality, some basic understanding is needed of the enzyme function and the specific enzyme characteristics, both in general and under the conditions in which the enzymes are intended to be used. To understand the functionality of an enzyme we also need to understand the substrate, as the substrate is often much larger and more complicated than the enzyme. This makes the interaction of the enzyme with the substrate much more complicated than the interactions of a small substrate with a bigger enzyme that are usually discussed in textbooks. While the issue of the substrate is not the subject of a separate chapter, the problem is touched on by G. H. Peters in Chapter 6.

The increasing insight into sequence families and the structural relationship among different functionalities is breaking up the old enzyme classification system (the EC classes). There is discussion of the possibilities of combining the knowledge of these relationships with the techniques developed within protein engineering, design, and directed evolution, for development of new functionalities in common scaffolds (the so-called “promiscuity area”).

Screening is emphasized as the most important issue in protein engineering. We can to a certain extent direct our design to limit the overwhelming number of possibilities, and we can do it in the DNA, but we cannot easily sample what we expect to find because our assays are not always good enough: “you get what you select for.”

This book is divided into parts on enzyme design, enzyme diversity generation, and screening. Within these areas the book focuses on ideas, function, and results rather than on precise method description. Many methods are mentioned, but a separate reference may be consulted to obtain more detailed information. Each Part begins with a chapter that provides a general overview or a discussion of ideas and concepts, and continues with chapters that present scientific studies, but in the sense of presenting the writers' ideas and experiences of the subject.

Part I, on design, covers protein engineering concepts, library design methods, and computer design methods. The chapters focus on how to design or suggest directions of mutational work, and discuss computational methods. The simulation data establish a new level of complexity and allow us to extrapolate our ideas further into a more complex understanding.
Parts IIA and IIB, on engineering, cover site-directed methods including combinations and redesign, and evolutionary methods and phage display, and provide examples of how to make the variants. Part III, on screening, covers chemical-based assays, fluorescence-based assays, and in vivo assays, and discusses the important area of selecting the correct variants within the large variability that it is possible to create. Also in Part III, two issues of assaying the characteristics and expression of the variant enzymes are briefly discussed. All scientific studies present research containing examples of the issues covered in that section.

Part I: Enzyme Design

Chapter 1 introduces enzyme engineering concepts, mainly using phytases as examples. Chapter 2 discusses the classification system for enzymes, including the EC system and a new classification system so far used mainly for carbohydrate-degrading enzymes. The new classification puts different activity types together in one family, basically connected by a similarity in the 3D structure. Chapter 3 examines the variability in variant 3D structures based on X-ray crystallography, and hence begins discussing “subtle” changes in these structures. This chapter gives examples from subtilisin proteases and T4 lysozyme. Chapter 4 discusses a predictive computational tool for lipases, using enantioselectivity as the example. Chapter 5 discusses a computational prediction tool called “COMBINE,” which has implications for the guidance of protein engineering activities. The enzyme is haloalkane dehalogenase. Chapter 6 thoroughly describes molecular simulation, using lipases and PTPase as examples. Chapter 7 discusses some of the issues within theoretical electrostatics analysis of enzymes, mainly focusing on some titration characteristics with examples from xylanase, lysozyme, and alcohol dehydrogenase. Chapter 8 presents the theoretical numbers of combinations of variant sequences and the necessary limitation of the diversity in experimental settings.

Part IIA: Enzyme Diversity Generation: Site-Directed and Redesign

Chapter 9 describes the activity of a variant enzyme using bacterial alpha-amylases as the main example, and includes electrostatics calculations. Chapter 10 discusses the understanding of catalysis of chitinase, including electrostatics, molecular dynamics, and variants as tools. Chapter 11 discusses the mutational development of a phosphotriesterase enzyme; changes in specific activities and stereospecificity are also discussed. Chapter 12 explores the engineering of glucose dehydrogenases and their potential use in biosensors. Chapter 13 provides a thorough review of one well-researched concept for stabilization of proteins, the so-called “proline rule,” using oligo-1,6-glucosidase as an example. Chapter 14 discusses changing the DNA-acting enzymes themselves for better functionality, using the homing endonucleases as examples.

Part IIB: Enzyme Diversity Generation: Evolutionary Methods

Chapter 15 discusses evolutionary methods, mentioning pathway engineering and genome shuffling.
Chapter 16 gives examples of random mutagenesis methods and describes the error-prone PCR methods and results. Chapter 17 discusses phage display of enzymes, including examples of glutathione transferase, beta-lactamase, subtilisins, lipases, PenG, and metalloenzymes, as well as suicide substrates. Chapter 18 reviews in vivo directed evolution in yeast, with a lipase as an example. Chapter 19 discusses shuffling of the catechol 2,3-dioxygenase and Chapter 20 the shuffling and chimeras of glutathione transferases. Chapter 21 discusses chimeras of β-glucosidases.

Part III: Screening

Chapter 22 reviews screening methods. Chapter 23 focuses on methods for screening for thermostability and examples of thermostable variants. Chapter 24 discusses screening methods and high-throughput screening (HTS), including fluorescence methods and digital imaging, as well as library design and combinatorial algorithms. Chapter 25 describes bottlenecks in screening setup, pricing, and HTS, covering primary and secondary screening. Chapter 26 discusses HTS of variants of Pseudomonas lipases changing the enantiomeric ratio. Chapter 27 discusses cell surface display and fluorescence-activated cell sorting (FACS). Chapter 28 gives an example of the importance of expression, which is well known but not discussed at length elsewhere; this is a very important issue for utilization of the variant enzymes, and also for understanding the screening results. Chapter 29 provides examples of protein modifications and assays.

Although subtilisins and amylases have been the most famous engineered enzymes, subtilisins are discussed only in Chapters 3 and 17. Some of the earliest work on protein engineering of enzymes was on alpha-amylases. Amylases are addressed in Chapters 2, 9, and 10. Lipases are the most represented enzyme type discussed in this book, in Chapters 4, 6, 16–18, 22, 26, and 28, of which five discuss Pseudomonas sp. lipases, and three discuss enantiomer selectivity. Several chapters are within my own present field of molecular modeling and computational biochemistry (Chapters 4–7) and theoretical considerations of DNA and mutant combinations (Chapter 8), as well as on library design (Chapter 24). Several other enzymes are discussed: phytase (Chapter 1), triesterases (Chapter 11), and glucose dehydrogenases (Chapter 12), among others.

A large number of engineering methods are mentioned, for example, phage display (Chapter 17), in vivo recombination in yeast (Chapter 18), cell surface display (Chapter 27), screening assays in petri dishes (Chapters 18, 23, and 25), and assays in solution and in microtiter plates (Chapters 23–26 and 29). Assays for activity, specificity, and stability together with expression are very important parts of enzyme engineering. Details of reaction mechanisms are given in Chapters 10 and 11, and are also touched on in other chapters.

Around 1980, researchers started to change protein sequences on purpose.
This led to the “old protein engineering cycle” (1983–1990), based on the understanding of the protein structure and its relationship to function. The protein engineering loop, “structure–theory–design–mutation–purification–analysis,” was applied; the mutation was often based on some protein concept, and each variant was taken all the way through to analysis before a new one was designed. Around 1990–1994 the “medium cycle” arose, still largely based on the rational method, but with increases in speed (“make many and test”) and still needing pure samples and feedback from the results. Later, “the evolutionary period” (1994–2002?) introduced random methods and variant libraries with many combinations; high-throughput screening started, testing the conformational “space.” Increases in computer speed opened up possibilities for a new understanding, and electrostatics and molecular dynamics simulations were becoming an integrated part of the design process.

At present, we may want some understanding and structural information as well as the more random methodologies in combination. The uncertainty (the lowering of possibilities) has directed work, restricted the use of structural information and more sophisticated screening methods, and made it important to combine library design and screening. Terms like “directed random evolution” (doped oligonucleotide methods in addition to directed evolution), which had begun to be used in 1993–1994, have become more common. In the future, we probably will have to think “out of the protein” about the external interactions, the electric field, the dynamical behavior, the water structure, the surroundings, and so on.

In the beginning of protein engineering, the mutants were followed from design (structure-based) to test, often before doing the next mutant, and this led to our belief today in high throughput. In Chapter 8, G. L. Moore and C. D. Maranas make it clear that a limitation is necessary (and possible). A drawback in directed evolution has been the fact that the variant enzyme is judged in a nonpurified condition. Also, the error-prone PCR method is found not to be a random choice but rather a much-directed choice, which, when analyzed, covers only a certain number of possibilities (see Chapter 16). An interesting possibility today is the changing of the polymerases and DNA-acting enzymes for alternative mutagenesis reactions (see Chapter 14).

My experience tells me that to make a final choice of the variant enzymes, the variant enzyme has to be purified in order to secure the correct characteristics of the protein. Chapter 29, the last chapter, focuses on protein stability measurements and modified enzymes, in this case chemically modified enzymes. This chapter, together with Chapter 28 on expression of variant enzymes, emphasizes the importance of testing and expressing reasonable amounts, and making measurements of the final purified protein.
The contributors to this book come from all over the world, young scientists and older ones, and present different opinions on which method is the best for obtaining a certain characteristic of an enzyme. As a structural chemist, I personally prefer a structurally derived background for the design, whereas others prefer more random methods. The reasoning behind the choices and each contributor's personal experiences are discussed.

Enzyme Functionality is mainly for scientific professionals, Ph.D. candidates, and post-doctoral students in the field of enzymes and the many related special areas such as molecular dynamics simulation, electrostatics, and genetic methods for variant engineering, as well as the always very important area of screening assay development. The chapters are based mainly on the scientists' own experiences and are written at a high scientific level, with thorough discussions of ideas and methods and a wide range of references to original articles.

I would like to mention that the second chapter of this book is dedicated to Martin Schülein, a Novozymes researcher, colleague, and friend, who died much too early in his always-energetic work on carbohydrate-degrading enzymes. The chapter authors have written based on their own experience and expertise, and present a wide variety of ideas. I hope this diverse range of ideas gives readers inspiration for their own choice of design, and that they can contribute to strengthening the important discussion in this strongly developing field of protein engineering.

Allan Svendsen

Contents

Preface
Contributors

Part I: Enzyme Design

1. Concepts for Protein Engineering
Martin Lehmann

2. Sequence Families and Modular Organization of Carbohydrate-Active Enzymes
Bernard Henrissat, Pedro M. Coutinho, Emeline Deleury, and Gideon Davies

3. Analyzing Three-Dimensional Structures of Variant Enzymes
Richard Bott

4. Quantitative Modeling of Lipase Enantioselectivity
Jürgen Pleiss

5. Rational Redesign of Haloalkane Dehalogenases Guided by Comparative Binding Energy Analysis
Jiří Damborský, Jan Kmuníček, Tomáš Jedlička, Santos Luengo, Federico Gago, Angel R. Ortiz, and Rebecca C. Wade

6. Computer Simulations: A Tool for Investigating the Function of Complex Biological Macromolecules
Günther H. Peters

7. Calculations of Ionization Equilibria in Proteins
Andrey Karshikoff

8. Modeling and Optimization of Directed Evolution Protocols
Gregory L. Moore and Costas D. Maranas

Part IIA: Enzyme Diversity Generation: Site-Directed and Redesign

9. Rational Redesign of Enzymes
Jens Erik Nielsen

10. Details in the Reaction Mechanism of Chitinases
Vincent G. H. Eijsink, Gustav Kolstad, Sigrid Gåseidnes, Bjørnar Synstad, Martin G. Peter, Jens Erik Nielsen, David Komander, Douglas Houston, and Daan M. F. van Aalten

11. Kinetic Evolution to the Catalytic Core of the Bacterial Phosphotriesterase
Frank M. Raushel

12. Protein Engineering of PQQ Glucose Dehydrogenase
Satoshi Igarashi and Koji Sode

13. The Proline Rule: A Concept for Engineering Protein Stability
Yuzuru Suzuki

14. Homing Endonucleases: Tools and Targets for Protein Engineering
Alfred Pingoud, Ann-Josée Noël, Vera Pingoud, Shawn Steuer, and Wolfgang Wende

Part IIB: Enzyme Diversity Generation: Evolutionary Methods

15. Evolutionary Methods for Protein Engineering
Huimin Zhao and Wenjuan Zha

16. Directed Evolution by Random Mutagenesis: A Critical Evaluation
Thorsten Eggert, Manfred T. Reetz, and Karl-Erich Jaeger

17. Enzyme Engineering by Phage Display
Patrice Soumillion, Daniel Legendre, and Jacques Fastrez

18. In Vivo Gene Shuffling in Yeast: A Fast and Easy Method for Directed Evolution of Enzymes
Jens Sigurd Okkels

19. Effective DNA Shuffling Methods for Enzyme Evolution
Osamu Kagami, Sang-Ho Baik, and Shigeaki Harayama

20. Exploring the Functional Space of Combinatorial Mutant Libraries for the Directed Evolution of Novel Enzyme Activities
Bengt Mannervik, Lars O. Hansson, and William G. Bardsley

21. Modifying the Character of an Enzyme by Producing Chimeric Enzymes: Chimeric β-Glucosidases as an Illustration
Kiyoshi Hayashi, Bong Jo Kim, Kshamata Goyal, Satya Singh, Jong-Deog Kim, Yeon-Kye Kim, Satoru Nirasawa, and Motomitsu Kitaoka

Part III: Screening

22. Assay Systems for Screening or Selection of Biocatalysts
Uwe T. Bornscheuer

23. Screening of Enzyme Variants for Thermostability
Shigenori Kanaya

24. Combinatorial Mutagenesis Algorithms, Digital Imaging Spectroscopy, and Solid-Phase Assays for Directed Evolution
Simon Delagrave, Edward J. Bylina, William J. Coleman, Steven J. Robles, Mary M. Yang, Christin L. McConnell, and Douglas C. Youvan

25. Screen Automation and Robotics
Michael H. Lamsa, Nils Buchberg Jensen, and Steen Krogsgaard

26. Screening for Enantioselective Enzymes
Manfred T. Reetz

27. Enzyme Engineering by Microbial Cell Surface Display
Thorsten M. Adams and Harald Kolmar

28. Overexpression and Secretion of Biocatalysts in Pseudomonas
Frank Rosenau and Karl-Erich Jaeger

29. Analysis of Catalytic and Structural Stability of Native and Covalently Modified Enzymes
P. V. Sundaram and S. Srimathi

Index

Contributors

Thorsten M. Adams Abteilung für Molekulare Genetik und Präparative Molekularbiologie, Institut für Mikrobiologie und Genetik, Georg-August-Universität Göttingen, Göttingen, Germany

Sang-Ho Baik Kamaishi Laboratories, Marine Biotechnology Institute Co., Ltd., Kamaishi, Japan

William G. Bardsley Department of Biochemistry, Uppsala University, Uppsala, Sweden

Uwe T. Bornscheuer Institute of Chemistry and Biochemistry, Department of Technical Chemistry and Biotechnology, Ernst-Moritz-Arndt-University Greifswald, Greifswald, Germany

Richard Bott Genencor International, Palo Alto, California, U.S.A.

Edward J. Bylina, Ph.D. KAIROS Scientific Inc., San Diego, California, U.S.A.

William J. Coleman, Ph.D. KAIROS Scientific Inc., San Diego, California, U.S.A.

Pedro M. Coutinho, Ph.D.* Centre for Biological and Chemical Engineering, Instituto Superior Técnico, Lisbon, Portugal

Jiří Damborský, Ph.D. National Centre for Biomolecular Research, Masaryk University, Brno, Czech Republic

Gideon J. Davies York Structural Biology Laboratory, Department of Chemistry, University of York, York, England

Simon Delagrave, B.Sc., Ph.D. BioTech Studio, Newark, Delaware, U.S.A.

Emeline Deleury Architecture et Fonction des Macromolécules Biologiques, Centre National de la Recherche Scientifique (CNRS), Universités d'Aix-Marseille I and II, Marseille, France

Thorsten Eggert, Ph.D. Institut für Molekulare Enzymtechnologie, Heinrich-Heine Universität Düsseldorf, Forschungszentrum Jülich, Jülich, Germany

Vincent G. H. Eijsink, Ph.D. Department of Chemistry and Biotechnology, Agricultural University of Norway, Ås, Norway

Jacques Fastrez Laboratoire de Biochimie Physique et des Biopolymères, Institut des Sciences de la Vie, Université Catholique de Louvain, Louvain-la-Neuve, Belgium

Federico Gago, Ph.D. Department of Pharmacology, University of Alcala, Madrid, Spain

Sigrid Gåseidnes, M.Sc. Department of Chemistry and Biotechnology, Agricultural University of Norway, Ås, Norway

Kshamata Goyal Enzyme Laboratory, National Food Research Institute, Tsukuba, Japan

Lars O. Hansson Department of Biochemistry, Uppsala University, Uppsala, Sweden

* Current affiliation: Architecture et Fonction des Macromolécules Biologiques, Centre National de la Recherche Scientifique (CNRS), Universités d'Aix-Marseille I and II, Marseille, France.

Shigeaki Harayama National Institute of Technology and Evaluation, Tokyo, Japan

Kiyoshi Hayashi Enzyme Laboratory, National Food Research Institute, Tsukuba, Japan

Bernard Henrissat, D.Sc. Architecture et Fonction des Macromolécules Biologiques, Centre National de la Recherche Scientifique (CNRS), Universités d'Aix-Marseille I and II, Marseille, France

Douglas Houston Division of Molecular Microbiology and Biological Chemistry, Wellcome Trust Biocentre, University of Dundee, Dundee, Scotland

Satoshi Igarashi Department of Biotechnology, Tokyo University of Agriculture and Technology, Tokyo, Japan

Karl-Erich Jaeger, Ph.D. Institut für Molekulare Enzymtechnologie, Heinrich-Heine Universität Düsseldorf, Forschungszentrum Jülich, Jülich, Germany

Tomáš Jedlička, M.Sc. National Centre for Biomolecular Research, Masaryk University, Brno, Czech Republic

Nils Buchberg Jensen, M.Sc. (Chem. Eng.) Laboratory Technology, Novo Nordisk Engineering A/S, Bagsværd, Denmark

Osamu Kagami, Ph.D. Kamaishi Laboratories, Marine Biotechnology Institute Co., Ltd., Kamaishi, Japan

Shigenori Kanaya, Ph.D. Department of Material and Life Science, Graduate School of Engineering, Osaka University, Osaka, Japan

Andrey Karshikoff, Ph.D. Department of Biosciences at Novum, Karolinska Institutet, Huddinge, Sweden

Bong Jo Kim Enzyme Laboratory, National Food Research Institute, Tsukuba, Japan

Jong-Deog Kim Enzyme Laboratory, National Food Research Institute, Tsukuba, Japan

Yeon-Kye Kim Enzyme Laboratory, National Food Research Institute, Tsukuba, Japan

Motomitsu Kitaoka Enzyme Laboratory, National Food Research Institute, Tsukuba, Japan

Jan Kmuníček, M.Sc. National Centre for Biomolecular Research, Masaryk University, Brno, Czech Republic

Harald Kolmar Abteilung für Molekulare Genetik und Präparative Molekularbiologie, Institut für Mikrobiologie und Genetik, Georg-August-Universität Göttingen, Göttingen, Germany

Gustav Kolstad, M.Sc. Department of Chemistry and Biotechnology, Agricultural University of Norway, Ås, Norway

David Komander Division of Molecular Microbiology and Biological Chemistry, Wellcome Trust Biocentre, University of Dundee, Dundee, Scotland

Steen Krogsgaard, Ph.D. Strain Development, Molecular Biotechnology, Novozymes A/S, Bagsværd, Denmark

Michael H. Lamsa, B.S. HTS-Core Robotics, Novozymes Biotech, Inc., Davis, California, U.S.A.

Daniel Legendre Laboratoire de Biochimie Physique et des Biopolymères, Institut des Sciences de la Vie, Université Catholique de Louvain, Louvain-la-Neuve, Belgium

Martin Lehmann Biotechnology Research Department, Roche Vitamins AG, Basel, Switzerland

Santos Luengo Department of Pharmacology, University of Alcala, Madrid, Spain

Bengt Mannervik Department of Biochemistry, Uppsala University, Uppsala, Sweden

Costas D. Maranas, Ph.D. Department of Chemical Engineering, Pennsylvania State University, University Park, Pennsylvania, U.S.A.

Christin L. McConnell Department of Earth and Planetary Sciences, Environmental Science and Public Policy Program, Harvard University, Cambridge, Massachusetts, U.S.A.

Gregory L. Moore, B.S. Department of Chemical Engineering, Pennsylvania State University, University Park, Pennsylvania, U.S.A.

Jens Erik Nielsen, Ph.D.* Howard Hughes Medical Institute and Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, California, U.S.A.

Satoru Nirasawa Enzyme Laboratory, National Food Research Institute, Tsukuba, Japan

Ann-Josée Noël Institute for Biochemistry, Justus-Liebig-Universität, Giessen, Germany

Jens Sigurd Okkels, Ph.D.† Novozymes A/S, Bagsværd, Denmark

Angel R. Ortiz, Ph.D.‡ Department of Physiology and Biophysics, Mount Sinai School of Medicine, New York, New York, U.S.A.

Martin G. Peter, Dr. Institute of Chemistry, University of Potsdam, Golm, Germany

Günther H. Peters, Ph.D. Department of Chemistry, MEMPHYS-Center for Biomembrane Physics, Technical University of Denmark, Lyngby, Denmark

Alfred Pingoud Institute for Biochemistry, Justus-Liebig-Universität, Giessen, Germany

Vera Pingoud Institute for Biochemistry, Justus-Liebig-Universität, Giessen, Germany

Jürgen Pleiss, Ph.D. Institute of Technical Biochemistry, University of Stuttgart, Stuttgart, Germany

* Current affiliation: Department of Biochemistry, University College Dublin, Dublin, Ireland
† Current affiliation: Molecular Biology, Maxygen Aps, Horsholm, Denmark
‡ Current affiliation: Centro de Biologia Molecular, Universidad Autonoma de Madrid, Madrid, Spain

Frank M. Raushel, Ph.D. Department of Chemistry, Texas A&M University, College Station, Texas, U.S.A.

Manfred T. Reetz Max-Planck-Institut für Kohlenforschung, Mülheim an der Ruhr, Germany

Steven J. Robles, Ph.D. KAIROS Scientific Inc., San Diego, California, U.S.A.

Frank Rosenau, Ph.D. Institut für Molekulare Enzymtechnologie, Heinrich-Heine Universität Düsseldorf, Forschungszentrum Jülich, Jülich, Germany

Satya Singh Enzyme Laboratory, National Food Research Institute, Tsukuba, Japan

Koji Sode Department of Biotechnology, Tokyo University of Agriculture and Technology, Tokyo, Japan

Patrice Soumillion Laboratoire de Biochimie Physique et des Biopolymères, Institut des Sciences de la Vie, Université Catholique de Louvain, Louvain-la-Neuve, Belgium

S. Srimathi Centre for Protein Engineering and Biomedical Research, The Voluntary Health Services, Madras, India

Shawn Steuer Institute for Biochemistry, Justus-Liebig-Universität, Giessen, Germany

P. V. Sundaram Centre for Protein Engineering and Biomedical Research, The Voluntary Health Services, Madras, India

Yuzuru Suzuki, Ph.D. Department of Applied Biochemistry, Kyoto Prefectural University, Kyoto, Japan

Allan Svendsen, Ph.D. Protein Design, Novozymes A/S, Bagsværd, Denmark

Bjørnar Synstad, Ph.D. Department of Chemistry and Biotechnology, Agricultural University of Norway, Ås, Norway

Daan M. F. van Aalten, Ph.D. Division of Molecular Microbiology and Biological Chemistry, Wellcome Trust Biocentre, University of Dundee, Dundee, Scotland

Rebecca C. Wade, D.Phil. EML Research, Heidelberg, Germany

Wolfgang Wende Institute for Biochemistry, Justus-Liebig-Universität, Giessen, Germany

Mary M. Yang, Ph.D. KAIROS Scientific Inc., San Diego, California, U.S.A.

Douglas C. Youvan, Ph.D.* KAIROS Scientific Inc., San Diego, California, U.S.A.

Wenjuan Zha, B.S. Department of Chemical and Biomolecular Engineering, Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, U.S.A.

Huimin Zhao, Ph.D. Department of Chemical and Biomolecular Engineering, Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, U.S.A.

* Current affiliation: Foundation for the Biological Manhattan Project, Frontenac, Kansas, U.S.A.

1 Concepts for Protein Engineering

Martin Lehmann
Roche Vitamins AG, Basel, Switzerland

1 INTRODUCTION

Increasingly, proteins and especially enzymes come into focus as therapeutic and commercial targets, not only as isolated products but also as targets for metabolic engineering inside a cell. However, most of the time, enzymes as isolated from nature do not fulfil the specific demands they have to meet in their industrial or medical application, or in a re-engineered cell tailored for special tasks such as overproduction of natural compounds. To improve proteins and enzymes for given purposes, protein-engineering concepts have emerged over the last decades.

Protein engineering describes the alteration or improvement of certain properties of a protein by changing its building blocks and, therefore, its structure and properties. This can be done by chemically modifying the amino acid residues of a protein or by altering its primary structure, which has become possible now that modern DNA manipulation tools are available. Typical chemical modifications are the crosslinking of amino acids at the protein surface or the derivatization of particular, reactive amino acid residues in the active center of enzymes, such as the catalytically active serines in serine proteases. An advantage of chemical modification is that changes of the 3-D structure remain localized and are therefore much more predictable. However, the desired effects are generally weak and are not achievable in each case and for every protein. In addition, the protein must be extracted from the cells and sometimes even purified, so the approach is not applicable to proteins that should be active intracellularly. Finally, chemical modification adds costs to a possible product.

For these reasons, it is much more promising nowadays to directly change the properties of a protein by altering its primary structure. In order to make the most efficient use of rational protein engineering, it is still desirable to understand better the links between the primary structure of a protein and its 3-D structure on the one hand, and between its 3-D structure and its properties on the other hand (Table 1).

Table 1 Protein Engineering Concepts

Stability
  Homology approaches: Comparison with thermostable homologues (Ref. 31), amino acid residue exchange between homologous proteins of same stability (Refs. 2,3), consensus concept (Refs. 4,9,11–13)
  Reducing entropy of the unfolded state: Proline rule (Ref. 26), additional disulfide bonds (Ref. 23)
  Increasing strength and number of interactions: H-bonds (Ref. 20), salt bridges, hydrophobic interactions (Refs. 21,22), stabilization of secondary structure elements (Refs. 33,36)
  Protein design algorithms (PDA) (Ref. 40)

Activity
  Comparison of homologous enzymes: Detection of amino acid residues critical for catalysis, transfer of residues or introduction of new amino acid residues at those positions (Refs. 14,15)
  Alteration of the electrostatic and steric environment around amino acid residues crucial for catalysis to influence substrate specificity, specific activity, and enantioselectivity: 3-D structure alone or in combination with docking algorithms further analyzing catalysis of the enzyme (Refs. 15,37–39)
  pKa shifts of a titratable group by alteration of the local environment to shift the pH activity profile (Ref. 37)

Concepts for Protein Engineering

2 HOMOLOGY APPROACHES

Homology-based approaches use the information inherent in a collection of homologous wild-type enzymes. The amino acid sequences and the enzyme properties of a series of homologous wild-type enzymes are collected and compared in order to relate differences at the amino acid sequence level to differing enzyme properties. This helps to identify amino acids that have a modulating effect on catalysis or stability.

2.1 Stability

One strategy for identifying thermostabilizing mutations is to compare homologous proteins that differ markedly in their thermostability and to identify the amino acid residues responsible for the difference. In one example, Perl et al. (1) compared the cold shock proteins (67 amino acids long) from the thermophile Bacillus caldolyticus and the mesophile Bacillus subtilis, which differ from each other in only 12 amino acid residues but show a remarkable difference in stability. Site-directed mutagenesis of all 12 differing residues in the B. caldolyticus background revealed that only two are responsible for the lower thermostability of the B. subtilis protein. However, the lower the sequence homology of two proteins, the more difficult it is to identify the residues that account for a difference in stability.

Serrano et al. (2) showed that it is also possible to use the information from highly homologous enzymes of similar stability to create a new enzyme with improved stability. The work-intensive strategy is to determine, for every nonconserved position of two or more homologous proteins, which of the occurring amino acid residues is most stabilizing. As in the cold shock protein discussed above, the majority of the differing amino acids will show no or only a slight difference in their impact on overall protein stability. However, combining the "most stabilizing" residues in one protein results in a mutant that, surprisingly, is more thermostable than any of the parents used for the design. Using the sequence information of barnase and binase, Serrano et al. constructed a new microbial RNase that was 3.3 kcal/mol more stable than barnase. In a similar approach in which structural information was also taken into account, Jiang et al. (3) increased the stability of the WW domain by 2.5 kcal/mol and increased the Tm by 28°C.
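The position-by-position strategy described for barnase and binase can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy sequences, and the stability values are invented for the example and are not taken from the cited studies. The sketch further assumes that the two sequences are aligned without gaps and that the contributions of individual positions are additive.

```python
from typing import Dict, List, Tuple


def differing_positions(seq_a: str, seq_b: str) -> List[int]:
    """Return 1-based positions at which two gap-free aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def combine_most_stabilizing(
    seq_a: str,
    seq_b: str,
    ddg_b_in_a: Dict[int, float],
) -> Tuple[str, float]:
    """Build a hybrid carrying the more stabilizing residue at each differing position.

    ddg_b_in_a[pos] is the measured stability change (kcal/mol, positive means
    stabilizing) when the residue of seq_b is introduced at position pos of seq_a.
    Returns the hybrid sequence and the summed gain, assuming additivity.
    """
    hybrid = list(seq_a)
    total_gain = 0.0
    for pos in differing_positions(seq_a, seq_b):
        gain = ddg_b_in_a.get(pos, 0.0)  # untested positions keep the residue of seq_a
        if gain > 0.0:
            hybrid[pos - 1] = seq_b[pos - 1]
            total_gain += gain
    return "".join(hybrid), total_gain


# Hypothetical toy data (not barnase or binase).
parent_a = "MKTAYIAKQR"
parent_b = "MRTAYLAKER"
measured = {2: -0.3, 6: 0.8, 9: 0.5}

hybrid, gain = combine_most_stabilizing(parent_a, parent_b, measured)
print(differing_positions(parent_a, parent_b))
print(hybrid, round(gain, 2))
```

The additivity assumption is the weak point of such a calculation: as discussed later in this chapter, a single replacement can affect the structure well beyond its own site, so the summed value is at best a first estimate.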
The central prerequisite of this approach is a homology among the proteins used that is high enough that a given amino acid replacement can be expected to have the same effect in all proteins compared. Steipe et al. (4) went one step further. They compared the variable VL domains of different immunoglobulins and proposed that an amino acid that occurs more frequently at a given position of a sequence alignment is more stabilizing than an amino acid occurring less frequently. Using this idea, they predicted 10 individual stabilizing mutations, of which six indeed had a stabilizing effect. Experiments on the VL and VH domains (5,6,7) and the design of a highly stable functional GroEL minichaperone (8) further supported this hypothesis. This approach greatly reduces the number of single mutations that have to be tested to achieve a stabilized mutant.

A further step was the idea to take the entire sequence alignment of a group of homologous proteins, calculate the consensus sequence of this alignment, and generate a synthetic gene coding for the consensus sequence obtained. This approach was tested on a subset of fungal phytases. The resulting consensus phytase, which is based on 13 wild-type sequences, was, surprisingly, 15–26°C more thermostable than all of its parents (9). It differed in at least 80 amino acid residues from any of its parents. Incorporation of additional wild-type sequences in the alignment yielded an improved consensus phytase that was 7.4°C more thermostable than the first one (10). Examination of the effects on protein stability of most of the newly introduced residues revealed that 10 were stabilizing, 8 had no pronounced effect, and 10 were destabilizing; 4 were not tested. Back mutation of the most destabilizing mutations and introduction of another stabilizing mutation increased the melting temperature by another 5°C to 90°C (11,12; see also Fig. 1). Although the 3-D structure of one of the wild-type phytases was known, none of this information went into the design process. The same consensus approach was used to generate a more stable consensus ankyrin repeat protein (13).

2.2 Activity

The information inherent in a sequence alignment of homologous proteins can also be used for activity engineering when the compared enzymes differ in their catalytic properties. The crucial part of such an attempt is to correlate the differences in amino acid sequence with the difference in a catalytic property. If the number of amino acid differences is manageable, each residue can be tested separately. When the number of differences is too high, however, additional information is usually required to determine the critical ones. A known 3-D structure of one or more of the homologous proteins is of great help in further reducing the number of amino acid residues that might be responsible for the difference in enzymatic properties.

In this direction, much work has been done on the engineering of fungal phytases. The wild-type phytases from Aspergillus niger NRRL 3135 and A. niger T213 display a threefold difference in specific activity, although they have only 12 amino acid differences distributed over the entire protein. Among the 12 divergent positions, three are located in or close to the substrate binding site. Testing these three positions by site-directed mutagenesis revealed that a single amino acid difference is responsible for the threefold difference in specific activity (14). Position 27 of the same enzyme, which is located in the active site cleft, was also found to have a profound effect on the specific activity. Usually, glutamine is found at this position, except in two phytases from Aspergillus terreus strains, which have a leucine. Remarkably, the latter two phytases have a much higher specific activity with phytic acid (up to 196 U/mg). Exchanging the glutamine residue for leucine in A. fumigatus phytase increased the specific activity with phytic acid from 26.5 to 92.1 U/mg at pH 5.0. However, this amino acid exchange had a negative effect on protein stability. Therefore a series of additional amino acids was tested at position 27 for their effects on the enzymatic properties. Threonine was the most favorable of the tested amino acids and had a positive effect on the specific activity without negatively affecting protein stability (15). Effects comparable to those in A. fumigatus phytase were observed in consensus phytase (Fig. 2). Threonine has not yet been found at this position in a wild-type phytase amino acid sequence.

In conclusion, comparison of the sequences and properties of homologous enzymes is quite useful for detecting amino acid residues that influence catalysis. Amino acid substitutions at the identified critical residues should not be restricted to amino acids occurring in homologous wild-type sequences; rather, saturation mutagenesis of those residues should be envisaged, as the example above shows.

Figure 1 "Evolution" of consensus phytases. (From Ref. 11. For further details see Refs. 9 and 10.)

Figure 2 pH-dependent activity profile of consensus phytase and two mutants in which glutamine at position 50 was replaced by leucine or threonine (Refs. 9,10). This critical position was identified in a homology approach under the additional use of an available 3-D structure.

3 STRUCTURE-BASED APPROACHES

The attempt to explain from the structure of a protein alone how it works, how it folds, and how it maintains its structure is valid but very challenging, and at the moment it aims too high. There are too many interactions that govern the folding of a protein or the catalysis of a chemical reaction. However, a 3-D structure can help generate ideas on how to improve selected properties of a protein, either alone or in combination with additional information and with general concepts of protein folding and enzyme catalysis. Such additional information includes the biochemical and biophysical characterization of the protein, structures of homologous proteins, and the structure of the same protein in complex with an inhibitor, a substrate, or a product of the enzyme reaction. First of all, amino acid residues have to be identified together with their function in the activity or stability of the enzyme. A time-consuming but straightforward approach that uses structural information for picking the protein positions of interest is called alanine scanning (16). Here a series of amino acid residues is changed to alanine and the effect on enzyme activity or stability is analyzed. For improvement, saturation mutagenesis can then be applied to the identified critical positions.

3.1 Stability

The first question that arises when a protein engineer is confronted with the task of stabilizing an enzyme is: under which conditions, why, and how is the enzyme inactivated? There are many ways to inactivate an enzyme and, accordingly, a number of strategies can be chosen to avoid inactivation. Generally, enzymes are sensitive to high temperature, extreme pH values, oxidative stress, proteases, deamidation of glutamine and asparagine residues, and, if they depend on a metal ion for stability or activity, chelating agents, to mention only the most important factors that come into play when an enzyme is used in an industrial environment.

The amino acids most susceptible to oxidation are the sulfur-containing amino acids cysteine and methionine. Replacing cysteines and methionines on the surface of the protein, where they are particularly susceptible to oxidation, can greatly reduce oxidative inactivation (17). Similarly, surface-exposed glutamine and asparagine residues can be replaced to avoid deamidation. If a protein is susceptible to proteases, it can be made more resistant by carefully altering the sequence around the preferred cleavage site(s) (18). Engineering a protein to withstand extreme pH can be a more challenging task involving more than one or two residues. Here, important factors are phenomena such as the spontaneous formation of peptide succinimides at Asp–Gly and Asn–Gly sequences, the burial of ionized groups, and the repulsive electrostatic forces caused by the large net charge that many proteins acquire at extreme pH values. If a problematic site is known, one can attempt to replace the residue concerned. However, it can be quite difficult to identify the responsible residues, in particular if more than one or two are involved. Still, the most promising way to preserve stability against most of the destabilizing factors is to increase the general stability of the protein molecule.
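A first inventory of the chemically sensitive sites named above can be made from the sequence alone. The following Python sketch is a hypothetical helper, not a published tool: the pattern names, the function name, and the toy sequence are assumptions made for illustration. It only lists candidate positions; whether a residue is actually surface-exposed, and therefore at risk, has to be judged from a 3-D structure.

```python
import re
from typing import Dict, List

# Sequence patterns named in the text as potential chemical liabilities.
LIABILITY_PATTERNS: Dict[str, str] = {
    "oxidation (Cys/Met)": r"[CM]",
    "deamidation (Asn/Gln)": r"[NQ]",
    "succinimide formation (Asn-Gly, Asp-Gly)": r"[ND](?=G)",
}


def find_liabilities(sequence: str) -> Dict[str, List[int]]:
    """Map each liability class to the 1-based positions where it occurs."""
    seq = sequence.upper()
    return {
        name: [m.start() + 1 for m in re.finditer(pattern, seq)]
        for name, pattern in LIABILITY_PATTERNS.items()
    }


# Hypothetical toy sequence, used only to show the output format.
hits = find_liabilities("MASNGKDGCQ")
for name, positions in hits.items():
    print(name, positions)
```

The lookahead `(?=G)` reports the position of the Asn or Asp residue itself rather than that of the following glycine, which is the residue one would normally consider replacing.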
A large number of weak interactions (hydrophobic interactions, salt bridges, H-bonds) of the amino acid side chains, together with the peptide backbone, keep a protein in its active 3-D state. Additionally, disulfide bonds and the incorporation of prolines help to stabilize the native state by reducing the entropy of the unfolded state. Therefore every mutation that increases the number and strength of weak interactions in the native state, or reduces the entropy of the unfolded state, should increase the stability of a protein. Several concepts have been developed to achieve exactly that. Researchers have attempted to fill cavities in a protein (19), which should increase the number and strength of mostly hydrophobic interactions. They have engineered new salt bridges or H-bonds (20), improved hydrophobic interactions at the surface or in the protein core (21,22), engineered new disulfide bridges (23), and introduced additional prolines (24) or fewer glycines (25) into the amino acid sequence.

For every amino acid replacement in a protein, it has to be considered that the effect of the replacement can reach far beyond the localized area it was planned for. Sometimes this leads to a reorganization of parts of the structure, or of the entire structure, accompanied by a much stronger and often negative effect on stability than the desired localized improvement would bring. As it is not possible at the moment to predict these far-reaching effects of an amino acid replacement on the protein structure, the results of rational protein engineering are sometimes quite unexpected.

Nevertheless, there have been some impressive examples using the strategies described above. Suzuki et al. (26) proposed that the proline content of a protein is correlated with its stability, because prolines have the unique property of decreasing the entropy of the unfolded state of a protein. This strategy was successfully applied to different proteins such as the bacteriophage T4 lysozyme (27), an oligo-1,6-glucosidase of Bacillus cereus, the neutral protease of Bacillus stearothermophilus (24), and the glucoamylase of Aspergillus awamori (28). The results show that the positions chosen for proline introduction have to be selected carefully; otherwise, such replacements can have strong negative effects on stability. The introduction of new disulfide bonds has also led to impressive results (23). Again, the success rate is rather low because of the unpredictable long-range effects of the amino acid replacements.
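The entropy argument behind the proline rule and behind additional disulfide bonds can be written out explicitly. The following is a textbook two-state sketch rather than a derivation given in the cited work. The symbols are introduced here for illustration only: N and U denote the native and unfolded states, and the mutation is assumed to leave the unfolding enthalpy and the entropy of the native state unchanged.

```latex
\begin{align}
\Delta G_{\mathrm{unf}} &= G_U - G_N
  = \Delta H_{\mathrm{unf}} - T\,(S_U - S_N) \\
\Delta\Delta G_{\mathrm{unf}}
  &= \Delta G_{\mathrm{unf}}^{\mathrm{mut}} - \Delta G_{\mathrm{unf}}^{\mathrm{wt}}
  \approx -T\,\bigl(S_U^{\mathrm{mut}} - S_U^{\mathrm{wt}}\bigr)
\end{align}
```

Under these assumptions, a replacement that lowers the entropy of the unfolded state gives a positive change in the free energy of unfolding, that is, a more stable protein. The caveat stated in the text applies here as well: if the replacement also perturbs the native state, the neglected terms can dominate.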
An article about protein stability would not be complete without mentioning one of the most impressive examples of thermostabilization of a protein. Eijsink's group worked for several years on the stabilization of the thermolysin-like protease from B. stearothermophilus (29). They finally succeeded in constructing a hyperstable mutant that was still active at 100°C. The temperature optimum of the eightfold mutant was 21°C higher than that of the wild-type enzyme, while the mutant maintained normal activity at 37°C. Five amino acid substitutions were derived from the homologue thermolysin, which is 3°C more thermostable (30). The three remaining, rationally designed mutations included an additional proline and a new disulfide bridge. These impressive results indicate that combining different strategies results in larger jumps in thermostabilization.

There have also been attempts to increase the number of interactions by changing the quaternary structure from a monomer to a multimer, because it is often observed that enzymes occurring as a monomer in a mesophilic organism are found as a multimer in a thermophilic organism (31). However, while engineering of active monomers from a multimer has already been achieved by replacing amino acids that are critical for multimerization and by subsequently stabilizing the monomeric structure (32), it is difficult to favor multimerization of a protein that is found as a monomer in its wild-type form.

It is also possible to improve the stability of secondary structure elements such as α-helices. α-Helices have a net positive charge at their N-terminal end and a net negative charge at their C-terminal end. An additional stabilizing interaction is generated when an oppositely charged amino acid residue is located at the right distance from the ends of the α-helix. This feature is called helix capping (33,34). When an α-helix shows no capping or only suboptimal capping, the responsible amino acid can be replaced by a better one (35). Besides this, it has been found that not every amino acid occurs with the same probability in an α-helix or a β-sheet. By replacing amino acids that have a low propensity for occurring in such secondary structure elements with amino acids of higher propensity, the stability of a protein can be improved (36).

Furthermore, it has been very popular to compare the structure of a mesophilic enzyme with that of its homologous thermophilic counterpart(s) to find out which ways nature most commonly uses to evolve highly stable proteins. However, no general rules have been observed so far that can be applied to a larger number of proteins. Most proteins have found their own way of stabilization, usually a combination of the approaches mentioned above. Because of the considerable number of promising approaches, some more rational and others making use of more "irrational" methods such as directed evolution, protein stabilization is increasingly seen as a solvable task that is done routinely in biochemical laboratories.

3.2 Activity

Compared to stability engineering, engineering of the catalytic properties of an enzyme is still more of an endeavor. The pH-activity profile, the specific activity, the substrate specificity, and the enantioselectivity are typical properties a protein engineer is interested in changing. Improving any of those properties in a rational way requires a high-resolution 3-D structure (preferably of the enzyme complexed with its substrate, its product, or a substrate analog), a good idea of how the enzyme catalyzes the reaction and, as a result, the identity of the responsible amino acid residues in the active center together with their function during catalysis. Computer programs are already able to model a substrate into an active site cleft to generate ideas about the amino acid residues that are important for substrate binding, for stabilization of the transition state, and for catalysis of the reaction itself. With all this information collected, one can try to develop educated ideas about how the activity profile of an enzyme can be changed.

The pH-activity profile, for example, can be changed by altering the pKa value of either a nucleophile or a proton donor of a reaction. This, in turn, is possible by altering the electrostatic field around a titratable group of the enzyme, which is influenced by the local hydrogen bonding network, by solvent accessibility, and by neighboring charged groups. Mutations introduced to influence one of those factors should have an impact on the pH-activity profile of the enzyme. It is thought that introducing a negative charge in the surroundings of a titratable group shifts the pKa value of a proton donor upward, whereas introducing a positive charge should shift it downward. Even this rather simple model is sometimes contradicted by experiment. Wind et al. (40) inserted an additional positive or negative charge in the active site of cyclodextrin glycosyltransferase. Contrary to expectations, both mutants showed a downward shift of the pH-activity profile.

Knowing or guessing the amino acid residues that interact with the substrate during catalysis enables one to specifically increase or decrease the fit of a substrate into the active site and to attempt to favor one substrate over another by altering the electrostatic and steric environment inside the active center. Among other things, this can lead to an increase in the specific activity of an enzyme. Sometimes, facilitating the release of the product increases the specific activity; however, this is accompanied by a higher Km value most of the time (15). Steric and electrostatic interactions between the substrate and the enzyme are also the engineering targets for improving the enantioselectivity of an enzyme. A successful example was described by Rotticci et al. (41), who were able to double the enantioselectivity of Candida antarctica lipase B toward halohydrins using a modeling algorithm that predicted the structural changes in the active center of possible mutants. Only those mutants were constructed that displayed better interactions with the substrate in energy contour maps. One single-point mutation doubled the enantioselectivity, while another mutation abolished the enantioselectivity toward the target substrate. However, as already mentioned for rational stability engineering, every amino acid substitution can cause the entire protein structure to adapt to the new amino acid residue. This can perturb or even reverse a predicted effect on the enzyme.

4 OUTLOOK

Every amino acid substitution has a more or less pronounced effect on the entire structure of a protein, which means that most of the time its effect reaches far beyond the actual site of mutation. It is therefore very difficult to predict accurately the effect a mutation has on an enzyme property of interest. An accurate prediction has to take into account the interactions of the peptide backbone and all amino acid residues with each other and with the solvent. This huge number of interactions can only be handled by complex computer programs. Protein design algorithms are, however, improving rapidly. Special programs are already able to predict stabilizing point mutations with an accuracy of 1 kcal/mol (42,43). These force field calculations and similar algorithms are also able to predict the interactions between a substrate and an active site, which makes them attractive for activity engineering. However, as long as these programs cannot meet the accuracy required, directed evolution (44), an approach based on mutagenesis and screening, remains the most successful approach, in particular for stability but also for activity engineering, at least as long as the target is to obtain an improved mutant rather than an explanation of how proteins fold or how an enzyme catalyzes a chemical reaction.
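To put an accuracy of 1 kcal/mol into perspective, a free energy difference can be converted into the factor by which a two-state unfolding equilibrium constant changes, using the standard relation K = exp(−ΔG/RT). The short Python sketch below is an illustration only: the function name and the choice of 25°C are assumptions, and real proteins do not necessarily unfold in a simple two-state manner.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def unfolding_constant_ratio(ddg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Factor by which a two-state unfolding equilibrium constant changes for a given ddG."""
    return math.exp(ddg_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


# 1.0 is the quoted prediction accuracy; 2.5 and 3.3 are stability gains cited in Section 2.1.
for ddg in (1.0, 2.5, 3.3):
    print(f"{ddg:.1f} kcal/mol -> factor {unfolding_constant_ratio(ddg):.0f}")
```

At 25°C, RT is about 0.59 kcal/mol, so an uncertainty of 1 kcal/mol corresponds to roughly a fivefold uncertainty in the equilibrium constant. This is of the same order as the effect of many single stabilizing mutations.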

ACKNOWLEDGMENT

The author wishes to thank Dr. M. Wyss for his critical and fruitful comments and discussions regarding the manuscript.

REFERENCES

1. D Perl, U Müller, U Heinemann, FX Schmid. Two exposed amino acid residues confer thermostability on a cold shock protein. Nat Struct Biol 7:380–383, 2000.
2. L Serrano, AG Day, AR Fersht. Step-wise mutation of barnase to binase. J Mol Biol 233:305–312, 1993.
3. X Jiang, J Kowalski, JW Kelly. Increasing protein stability using a rational approach combining sequence homology and structural alignment: Stabilizing the WW domain. Protein Sci 10:1454–1465, 2001.
4. B Steipe, B Schiller, A Plückthun, S Steinbacher. Sequence statistics reliably predict stabilizing mutations in a protein domain. J Mol Biol 240:188–192, 1994.
5. E Ohage, B Steipe. Intrabody construction and expression: I. The critical role of VL domain stability. J Mol Biol 291:1119–1128, 1999.
6. S Ewert, A Honegger, A Plückthun. Structure-based improvement of the biophysical properties of immunoglobulin VH domains with a generalizable approach. Biochemistry 42:1517–1528, 2003.
7. A Knappik, L Ge, A Honegger, P Pack, M Fischer, G Wellnhofer, A Hoess, J Wölle, A Plückthun, B Virnekäs. Fully synthetic human combinatorial antibody libraries (HuCAL) based on modular consensus frameworks and CDRs randomized with trinucleotides. J Mol Biol 296:57–86, 2000.
8. Q Wang, AM Buckle, NW Foster, CM Johnson, AR Fersht. Design of highly stable functional GroEL minichaperones. Protein Sci 8:2186–2193, 1999.
9. M Lehmann, D Kostrewa, M Wyss, R Brugger, A D'Arcy, L Pasamontes, APGM van Loon. From DNA sequence to improved functionality: using protein sequence comparisons to rapidly design a thermostable consensus phytase. Protein Eng 13:49–57, 2000.
10. M Lehmann, C Loch, A Middendorf, D Studer, SF Lassen, L Pasamontes, APGM van Loon, M Wyss. The consensus concept for thermostability engineering of proteins: further proof of concept. Protein Eng 15:403–411, 2002.
11. M Lehmann, L Pasamontes, SF Lassen, M Wyss. The consensus concept for thermostability engineering of proteins. Biochim Biophys Acta 1543:408–415, 2000.
12. M Lehmann, M Wyss. Engineering proteins for thermostability: the use of sequence alignments versus rational design and directed evolution. Curr Opin Biotechnol 12:371–375, 2001.
13. A Kohl, HK Binz, MT Stumpp, A Plückthun, MG Grütter. Designed to be stable: crystal structure of a consensus ankyrin repeat protein. Proc Natl Acad Sci USA 100:1700–1705, 2003.
14. A Tomschy, M Wyss, D Kostrewa, K Vogel, M Tessier, S Höfer, H Bürgin, A Kronenberger, R Rémy, APGM van Loon, L Pasamontes. Active site residue 297 of Aspergillus niger phytase critically affects the catalytic properties. FEBS Lett 472:169–172, 2000.
15. A Tomschy, M Tessier, M Wyss, R Brugger, C Broger, L Schnoebelen, APGM van Loon, L Pasamontes. Optimization of the catalytic properties of Aspergillus fumigatus phytase based on the three-dimensional structure. Protein Sci 9:1304–1311, 2000.
16. C Cunningham, JA Wells. High-resolution epitope mapping of hGH-receptor interactions by alanine-scanning mutagenesis. Science 244:1081–1085, 1989.
17. TV Borchert, SF Lassen, A Svendsen, HB Frantzen. Oxidation stable amylases for detergents. In: SB Petersen, B Svensson, S Petersen, eds. Carbohydrate Bioengineering. Amsterdam: Elsevier Science, 1995, pp 175–179.
18. M Wyss, L Pasamontes, A Friedlein, R Rémy, M Tessier, A Kronenberger, A Middendorf, M Lehmann, L Schnoebelen, U Röthlisberger, E Kusznir, G Wahl, F Müller, H-W Lahm, K Vogel, APGM van Loon. Biophysical characterization of fungal phytases (myo-inositol hexakisphosphate phosphohydrolases): molecular size, glycosylation pattern, and engineering of proteolytic resistance. Appl Environ Microbiol 65:367–373, 1999.
19. M Karpusas, WA Baase, M Matsumura, BW Matthews. Hydrophobic packing in T4 lysozyme probed by cavity-filling mutants. Proc Natl Acad Sci USA 86:8237–8241, 1989.
20. VGH Eijsink, G Vriend, JR van der Zee, B van den Burg, G Venema. Increasing the thermostability of the neutral proteinase of Bacillus stearothermophilus by improvement of internal hydrogen-bonding. Biochem J 285:625–628, 1992.
21. B van den Burg, BW Dijkstra, G Vriend, B van der Vinne, G Venema, VGH Eijsink. Protein stabilization by hydrophobic interactions at the surface. Eur J Biochem 220:981–985, 1994.
22. K Ishikawa, H Nakamura, K Morikawa, S Kanaya. Stabilisation of Escherichia coli ribonuclease HI by cavity-filling mutations within a hydrophobic core. Biochemistry 32:6171–6178, 1993.
23. M Matsumura, G Signor, BW Matthews. Substantial increase of protein stability by multiple disulfide bonds. Nature 342:291–293, 1989.
24. K Watanabe, T Masuda, H Ohashi, H Mihara, Y Suzuki. Multiple proline substitutions cumulatively thermostabilize Bacillus cereus ATCC 7064 oligo-1,6-glucosidase. Irrefragable proof supporting the proline rule. Eur J Biochem 226:277–283, 1994.
25. I Margarit, S Campagnoli, F Frigerio, G Grandi, V De Filippis, A Fontana. Cumulative stabilizing effects of glycine to alanine substitutions in Bacillus subtilis neutral protease. Protein Eng 5:543–550, 1992.
26. Y Suzuki, K Hatagaki, H Oda. A hyperthermostable pullulanase produced by an extreme thermophile, Bacillus flavocaldarius KP 1228, and evidence for the proline theory of increasing protein thermostability. Appl Microbiol Biotechnol 34:707–714, 1991.
27. BW Matthews, H Nicholson, WJ Becktel. Enhanced protein thermostability from site-directed mutations that decrease the entropy of unfolding. Proc Natl Acad Sci USA 84:6663–6667, 1987.
28. MJ Allen, PM Coutinho, CF Ford. Stabilization of Aspergillus awamori glucoamylase by proline substitution and combining stabilizing mutations. Protein Eng 11:783–788, 1998.
29. B van den Burg, G Vriend, OR Veltman, G Venema, VGH Eijsink. Engineering an enzyme to resist boiling. Proc Natl Acad Sci USA 95:2056–2060, 1998.
30. VGH Eijsink, OR Veltman, W Aukema, G Vriend, G Venema. Structural determinants of the stability of thermolysin-like proteinases. Nat Struct Biol 2:374–379, 1995.
31. B Dalhus, M Saarinen, UH Sauer, P Eklund, K Johansson, A Karlsson, S Ramaswamy, A Bjork, B Synstad, K Naterstad, R Sirevag, H Eklund. Structural basis for thermophilic protein stability: structures of thermophilic and mesophilic malate dehydrogenase. J Mol Biol 318:707–721, 2002.
32. G Saab-Rincón, VR Juárez, J Osuna, F Sánchez, X Soberón. Different strategies to recover the activity of monomeric triosephosphate isomerase by directed evolution. Protein Eng 14:149–155, 2001.
33. H Nicholson, WJ Becktel, BW Matthews. Enhanced protein thermostability from designed mutations that interact with α-helix dipoles. Nature 336:651–656, 1988.
34. L Serrano, AR Fersht. Capping and α-helix stability. Nature 342:296–299, 1989.
35. S Walter, B Hubner, U Hahn, FX Schmid. Destabilization of a protein helix by electrostatic interactions. J Mol Biol 252:133–143, 1995.
36. X-J Zang, WA Baase, BW Matthews. Multiple alanine replacements within α-helix. Protein Sci 1:761–776, 1992.
37. G Jones, P Willett, RC Glen, AR Leach, R Taylor. Development and validation of a genetic algorithm for flexible docking. J Mol Biol 267:727–748, 1997.
38. DS Goodsell, GM Morris, AJ Olson. Automated docking of flexible ligands: applications of AutoDock. J Mol Recognit 9:1–5, 1996.
39. M Rarey, B Kramer, T Lengauer, G Klebe. A fast flexible docking method using an incremental construction algorithm. J Mol Biol 261:470–489, 1996.
40. RD Wind, JC Uitdehaag, RM Buitelaar, BW Dijkstra, L Dijkhuizen. Engineering of cyclodextrin product specificity and pH optima of the thermostable cyclodextrin glycosyltransferase from Thermoanaerobacterium thermosulfurigenes EM1. J Biol Chem 273:5771–5779, 1998.
41. D Rotticci, JC Rotticci-Mulder, S Denman, T Norin, K Hult. Improved enantioselectivity of a lipase by rational protein engineering. ChemBioChem 2:766–770, 2001.
42. BI Dahiyat. In silico design for protein stabilization. Curr Opin Biotechnol 10:387–390, 1999.
43. SM Malakauskas, SL Mayo. Design, structure and stability of a hyperthermophilic protein variant. Nat Struct Biol 5:470–475, 1998.
44. JC Moore, FH Arnold. Directed evolution of a para-nitrobenzyl esterase for aqueous-organic solvents. Nat Biotechnol 14:458–467, 1996.

2 Sequence Families and Modular Organization of Carbohydrate-Active Enzymes*

Bernard Henrissat and Emeline Deleury
Centre National de la Recherche Scientifique (CNRS), Universités d'Aix-Marseille I and II, Marseille, France

Pedro M. Coutinho†
Instituto Superior Técnico, Lisbon, Portugal

Gideon J. Davies
University of York, York, England

* Dedicated to the memory of Martin Schülein.
† Current affiliation: Centre National de la Recherche Scientifique (CNRS), Universités d'Aix-Marseille I and II, Marseille, France

1 INTRODUCTION

Carbohydrates, in the form of oligo- and polysaccharides, are universally found in nature. These compounds are elaborated from simple sugars by glycosyltransferases (GTs) and are degraded by glycoside hydrolases (GHs) and polysaccharide lyases (PLs). In this work, these biosynthetic and degradative enzymes are referred to as "carbohydrate-active enzymes." Large amounts of polysaccharide-based compounds are biosynthesized each year on Earth (for cellulose alone this amounts to over 10^9 t/year), mostly from photosynthesis. Because of this abundance, carbohydrate-based materials have long found applications as raw materials. Today, they are used in various industries, such as food, feed, paper, detergent, and textile, where there is large scope for applying degradative enzymes to improve the properties of carbohydrate-based materials or to degrade them into simple and fermentable sugars. In other words, carbohydrate-active enzymes are key enzymes for the clean processing of abundant and useful renewable resources.

Because of the many possible isomers of a simple monosaccharide, there is an enormous chemical diversity of structures in oligo- and polysaccharides (1). In addition, carbohydrate structures are sometimes "decorated" by noncarbohydrate substituents such as various esters. Finally, chemically simple homopolysaccharides, such as cellulose, chitin, or starch, display extensive physical diversity and heterogeneity depending on the source, owing to differences in aggregation state or crystallinity. The physical state of the substrate has severe consequences: some enzymes, for example, are able to degrade only the noncrystalline part, while others can act on the crystalline part of the substrate (2). This immense chemical and physical diversity requires a corresponding level of diversity in enzymes for the selective biosynthesis and biodegradation of carbohydrate structures. The diversity of these enzymes has long been a source of problems in their classification.

2 THE CLASSIFICATION OF CARBOHYDRATE-ACTIVE ENZYMES

Several criteria can be envisioned for the classification of carbohydrate-active enzymes. The simplest form of classification is based on their substrate specificities. Such a classification is the basis of the recommendations of the International Union of Biochemistry and Molecular Biology (IUBMB) (3) and is expressed in the EC number for a given enzyme. O-Glycoside hydrolases are given the code EC 3.2.1.x, where the last number represents the substrate specificity. For example, β-glucosidase is EC 3.2.1.21, while β-galactosidase is EC 3.2.1.23. The advantage of this system is its simplicity, which has led to its widespread usage.

The intrinsic problem with a classification based on substrate (or product) specificity is that it does not appropriately accommodate enzymes that act on several substrates. This is particularly relevant to glycosidases, which work on highly complex polysaccharides and which frequently display broad, overlapping specificities. For instance, endoglucanases, while typically considered to be cellulases, are also active to various degrees on xylan, xyloglucan, β-glucan, and various artificial substrates. Recently, there have been many structural results on glycoside hydrolases, but the classification based on substrate specificity fails to reflect the three-dimensional structural features of these enzymes, which are now becoming apparent. In another example, the IUBMB classification assigns cyclodextrin glucanotransferases (EC 2.4.1.19) and starch branching enzymes (EC 2.4.1.18) to the transferase class and does not reflect the clear structural, evolutionary, and mechanistic relationship of these enzymes with a large family of starch-hydrolysing enzymes (4). Similarly, myrosinase, an enzyme hydrolysing a particular series of S-glucosides, is classified as EC 3.2.3.1 (thio-glucosidase) and yet has a sequence, molecular mechanism, and 3-D structure strikingly similar to those of an O-β-glucosidase (EC 3.2.1.21) (5,6). Conversely, many structurally unrelated enzymes such as endoglucanases display a similar substrate specificity and hence an identical IUBMB classification (for a review, see Ref. 7). Finally, with the flood of sequence data originating from molecular biology and recently from genome sequencing projects, it is now common to discover open reading frames which show similarities to known glycoside hydrolase sequences, but without knowledge of (or the means to readily determine) the substrate specificity.

To circumvent the problems with the EC classification, we proposed a novel system in 1991 for the glycosidases (8). This system is based on a direct relationship between sequence and folding similarities (9). Consequently, a classification solely based on amino acid sequence similarities was proposed.
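
The limitation described above can be made concrete with a small sketch. The Python fragment below parses the EC numbers quoted in the text and shows that enzymes the text describes as related fall under different EC headings. The function names and the two group labels are illustrative assumptions, not part of any published classification, and the EC number for α-amylase (EC 3.2.1.1) is added here as an example of a starch-hydrolysing enzyme; it is not quoted in the text.

```python
from collections import defaultdict
from typing import NamedTuple


class ECNumber(NamedTuple):
    enzyme_class: int
    subclass: int
    sub_subclass: int
    serial: int


def parse_ec(text: str) -> ECNumber:
    """Parse a string such as 'EC 3.2.1.21' into its four numeric fields."""
    fields = text.replace("EC", "").strip().split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four fields in {text!r}")
    return ECNumber(*(int(f) for f in fields))


def is_o_glycoside_hydrolase(ec: ECNumber) -> bool:
    """O-Glycoside hydrolases carry the code EC 3.2.1.x."""
    return ec[:3] == (3, 2, 1)


# (enzyme, EC number, relatedness label). The labels are hypothetical
# shorthand for the relationships described in the text.
ENTRIES = [
    ("beta-glucosidase", "EC 3.2.1.21", "beta-glucosidase-like"),
    ("myrosinase", "EC 3.2.3.1", "beta-glucosidase-like"),
    ("alpha-amylase", "EC 3.2.1.1", "starch-hydrolase-like"),  # EC number not quoted in the text
    ("cyclodextrin glucanotransferase", "EC 2.4.1.19", "starch-hydrolase-like"),
    ("starch branching enzyme", "EC 2.4.1.18", "starch-hydrolase-like"),
]


def ec_headings_by_group(entries):
    """Collect the first three EC fields seen within each group of related enzymes."""
    groups = defaultdict(set)
    for _name, ec_text, group in entries:
        groups[group].add(parse_ec(ec_text)[:3])
    return dict(groups)


HEADINGS = ec_headings_by_group(ENTRIES)
```

In this toy data set each group of related enzymes spans two EC headings, which is the mismatch between substrate-based and structure-based groupings that motivated the sequence-based system.
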
It was anticipated that the system would prove useful with the fast-growing number of glycosidase genes being sequenced and with the increasing number of 3-D structures being solved. The basic principle behind this new classification system is simple: regardless of activity and substrate specificity, sequences which display similarity are grouped in the same family, while sequences displaying no apparent similarity are assigned to different families (8). The 300 sequences of glycosidases and related enzymes available in 1991 were observed to form 35 families (8).

The earliest feature that appeared from the sequence-based families is that many were “polyspecific,” i.e., they contained enzymes of different substrate specificity (e.g., containing several EC numbers). The family that groups the largest number of EC numbers is family GH13 (also known as the α-amylase superfamily), which contains almost 30 enzyme specificities (for a review, see Ref. 10). A number of other families contain more than two distinct EC numbers, such as family GH1 (grouping eight EC numbers), family GH16 (seven EC numbers), families GH3 and GH32 (six EC numbers), etc. Several of the “monospecific” families (i.e., containing only one EC number) could turn out to be “polyspecific” when all members are characterized at the biochemical level. The existence of a number of polyspecific families indicates: (1) that the acquisition of new specificities by glycosidases is a common evolutionary event, (2) that the substrate specificity of glycosidases could be engineered for application purposes, and (3) that the substrate (or product) specificity of a glycosidase is defined by details of the 3-D structure, not by the global fold.

Several consequences emerged from the new classification system. First, because sequence similarity is a strong indication of folding similarities, members of a given family were predicted to share the same fold. This would facilitate the homology modeling of other family members when the 3-D structure of one member is determined. This may also help the structural biologist to choose a target for structural investigation because there is more potential for discovery in the resolution of a potentially new structure than of a structure that could be predicted. Second, because the catalytic apparatus is expected to be conserved within each family, an important outcome of the new classification system is the ability to locate the potential active site residues within a family based on the identification of appropriate invariant residues or based on the prior experimental identification of a catalytic residue. Equally important was the observation of Gebler et al. that the molecular mechanism is strictly conserved within a given glycosidase family (11). Fig. 1 shows the two primary mechanisms of glycosidases. With no exception to date, all the active members of a given family operate by using the same mechanism.
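
The grouping principle and the family-wide conservation of mechanism described above can be sketched in a few lines of Python. The sketch makes two loudly stated assumptions: that pairwise similarity has already been decided by some external comparison method, and that families are formed by simply chaining those similarities together (single linkage). Neither is claimed to be the procedure actually used to build the families; the sequence identifiers are hypothetical.

```python
def build_families(sequence_ids, similar_pairs):
    """Group sequences so that any chain of pairwise similarities ends up in one family."""
    parent = {s: s for s in sequence_ids}

    def find(x):
        # Follow parent links to the family representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    families = {}
    for s in sequence_ids:
        families.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in families.values())


def propagate_mechanism(families, known_mechanisms):
    """Extend an experimentally determined mechanism to every member of the same family."""
    predicted = {}
    for members in families:
        observed = {known_mechanisms[m] for m in members if m in known_mechanisms}
        if len(observed) > 1:
            raise ValueError(f"conflicting mechanisms within one family: {sorted(observed)}")
        label = observed.pop() if observed else "unknown"
        for m in members:
            predicted[m] = label
    return predicted


# Hypothetical identifiers and similarity calls, for illustration only.
IDS = ["seqA", "seqB", "seqC", "seqD", "seqE"]
PAIRS = [("seqA", "seqB"), ("seqB", "seqC"), ("seqD", "seqE")]
FAMILIES = build_families(IDS, PAIRS)
PREDICTED = propagate_mechanism(FAMILIES, {"seqA": "retaining"})
```

Here a single characterized member (seqA) is enough to predict a retaining mechanism for seqB and seqC, while the second family remains "unknown" until one of its members is studied.
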
It is worth mentioning here that some family members lost their catalytic machinery and assumed new roles (amino acid transporters, lectins, signaling sensors, inhibitors, etc.), pointing to the versatility of the glycosidase scaffolds for the development of new functionalities. The number of glycosidase families (35 in 1991), being necessarily a mere consequence of the size of the initial sequence sample, was predicted to increase as more sequences became available (8). The exponential growth in sequences in public databases indeed led to a steady growth in the number of GH families, with 300 sequences and 35 families in 1991 (8), 480 sequences and 45 families in 1993 (12), 2800 sequences in 74 families in April 1999 (13), and 6500 sequences in 87 families as of May 2002.

Figure 1 The two canonical mechanisms of glycosidases (shown here for the hydrolysis of an equatorial glycosidic bond): (a) the retaining mechanism, (b) the inverting mechanism.

Given the success of the new classification system of glycosidases, which eventually became a standard of description of this category of enzymes, a similar strategy was applied to other carbohydrate-active enzymes, such as the glycosyltransferases (14). Again, most features of the glycosidase classification were found. In particular, several families were found to contain enzymes of varying donor and acceptor specificity. For instance, families GT1 and GT2 each contain eight characterized enzyme specificities. Because the biochemical characterization of glycosyltransferases is notoriously difficult, only a very small number of them are characterized, and it is more than likely that each of these families actually contains dozens of other enzyme specificities. Here the power of the sequence-based classification is maximal, considering that some glycosyltransferase families such as family GT2 contain over 1200 members, of which fewer than 10% have been characterized. More recently, the sequence-based classification was extended to the polysaccharide lyases and the carbohydrate esterases. In theory, other carbohydrate-active enzymes, such as the epimerases, could be subjected to a similar classification system.
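
The growth figures quoted above lend themselves to a short worked calculation. The sketch below, in Python, uses only the numbers given in the text; the variable and function names are illustrative. It shows that the number of sequences grew far faster than the number of families, so that the average family became much larger.

```python
# (year, number of sequences, number of GH families), as quoted in the text
GH_GROWTH = [
    (1991, 300, 35),
    (1993, 480, 45),
    (1999, 2800, 74),
    (2002, 6500, 87),
]


def sequences_per_family(records):
    """Average number of sequences per family for each census year."""
    return {year: round(n_seq / n_fam, 1) for year, n_seq, n_fam in records}


def fold_increase(records):
    """Fold increase in sequences and in families between the first and last census."""
    (_, seq0, fam0), (_, seq1, fam1) = records[0], records[-1]
    return round(seq1 / seq0, 1), round(fam1 / fam0, 1)


PER_FAMILY = sequences_per_family(GH_GROWTH)
SEQ_FOLD, FAM_FOLD = fold_increase(GH_GROWTH)

# Family GT2: "over 1200 members, of which fewer than 10% have been characterized".
# Taking 1200 as a round figure, 10% corresponds to 120 characterized members.
GT2_TEN_PERCENT = 1200 * 10 // 100
```

Between 1991 and 2002 the sequence count rose about 21.7-fold while the family count rose only about 2.5-fold, and the average family grew from roughly 8.6 to roughly 74.7 sequences.
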

3

THE CARBOHYDRATE-ACTIVE ENZYMES SERVER

The main issues in maintaining carbohydrate-active family classifications are as follows: (1) how to make them available to the scientific community, (2) how to disclose the new families and new family members, and (3) how to keep up with the ever-increasing number of sequences. The World Wide Web is obviously the best medium for the distribution of family classifications. For carbohydrate-active enzymes, the first milestone was achieved in 1996 with the availability of the GH classification on ExPASy (http://www.expasy.ch/cgi-bin/lists?glycosid.txt) (15). However, this useful document suffered from containing annotated SwissProt entries only, thereby missing a large number of entries already available in GenBank as well as the information on the three-dimensional structures in the Protein Data Bank (PDB). Other drawbacks were as follows: (1) the irregular updates, (2) the impossibility of performing family-by-family browsing, and (3) the unavailability of family classifications of other carbohydrate-active enzymes such as glycosyltransferases.

To overcome some of these problems, we created the Carbohydrate Active enZYmes server (CAZy, http://afmb.cnrs-mrs.fr/CAZY/index.html) to provide access to the classifications of GHs, GTs, and PLs in families based on sequence similarities (13). CAZy grants access to the various families of carbohydrate-active enzymes. Each family is annotated with information regarding all the enzyme activities that have been characterized and with the known catalytic and structural features. This summary is followed by a list of proteins and open reading frames (ORFs) belonging to the family with links to sequence and structural information available in public databases. Links to complementary relevant resources available on the Internet are also provided. An example is shown in Fig. 2.

Because information on the repertoire of carbohydrate-active enzymes present in a given organism can provide interesting insights on its carbohydrate metabolism (16), a new feature was recently added where the user can access CAZy via organism (for organisms whose genome has been completely sequenced; see example in Fig. 3). Based on a relational database, CAZy provides curated nonredundant sequence and structural information on carbohydrate-active enzyme families to the academic and commercial research communities. As of May 2002, the CAZy database contained over 12,500 proteins and ORFs belonging to more than 3200 organisms. The proteins are arranged in 200 families and cover 180 EC numbers (note that many enzyme specificities are not covered by the EC numbers). The CAZy web site features 280 HTML pages with almost 49,000 external links. The carbohydrate-active enzyme content of 53 complete genomes is available. Over half a million pages have been downloaded from CAZy externally since its launch in September 1998. The server is regularly updated, generally at least once a month. During this period, the number of single entries covered by CAZy increased fourfold!
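
The family pages described above (a header summarizing what is known about the family, followed by a list of members with links to other databases) suggest a simple record structure. The Python sketch below is only an illustration of that layout: the class names, field names, and member entries are hypothetical, and nothing here is claimed to reflect the actual schema of the CAZy relational database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Member:
    """One protein or ORF listed on a family page (all fields hypothetical)."""
    name: str
    organism: str
    ec_number: Optional[str] = None  # None for uncharacterized ORFs
    pdb_ids: List[str] = field(default_factory=list)


@dataclass
class FamilyRecord:
    """Header information for a family plus its member list."""
    family: str
    mechanism: str
    members: List[Member] = field(default_factory=list)

    def known_activities(self) -> List[str]:
        """Distinct EC numbers among the characterized members."""
        return sorted({m.ec_number for m in self.members if m.ec_number})

    def structural_coverage(self) -> float:
        """Fraction of members with at least one 3-D structure."""
        if not self.members:
            return 0.0
        solved = sum(1 for m in self.members if m.pdb_ids)
        return solved / len(self.members)


# Placeholder members; names, organisms and structure identifiers are invented.
EXAMPLE = FamilyRecord(
    family="GH13",
    mechanism="retaining",
    members=[
        Member("hypothetical protein 1", "organism 1", "2.4.1.19", ["placeholder_1", "placeholder_2"]),
        Member("hypothetical protein 2", "organism 2", "2.4.1.18"),
        Member("hypothetical ORF 3", "organism 3"),
    ],
)
```

Applied to the statistics reported for family GH13 in Fig. 2 (942 members, 34 of them with a solved 3-D structure), the same ratio gives a structural coverage of about 3.6%.
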


Figure 2 Example of a CAZy page: family GH13 of the glycosidases. The header is a summary of what is known in this family. The “known activities” field shows that no fewer than 19 different enzyme activities have been experimentally identified in this family. Other fields indicate, for example, that the molecular mechanism leads to overall retention of the anomeric configuration and that the catalytic residues have been identified. The “statistics” field shows, among other data, that there were 942 members in the family (as of 13 May 2002) and that 34 have had their 3-D structure solved (resulting in a total of 119 PDB files). After the header, a listing (as complete as possible) of the proteins and ORFs assigned to this family is given with links to protein, nucleotide, enzyme classification, and structure databases.

4

THE PREDICTIVE POWER OF THE NEW CLASSIFICATION SYSTEM

As stated earlier, the mechanism, catalytic residues, and fold are conserved within each family. In consequence, it is now only necessary to determine the stereochemical outcome or to identify the catalytic apparatus of one family member, as this information can be readily extended to all members of the family, and those missing the catalytic machinery can be identified (and corrected if sequencing errors are the cause (17)). The molecular mechanism is now known for 64 of the 87 families of glycosidases (Fig. 4).

Figure 4 Molecular mechanism in the families of glycosidases (May 2002). The families which operate with a mechanism leading to overall retention of the anomeric configuration are indicated in black on a gray background. The families which act with an inverting mechanism are indicated in white on a black background. Those families for which the mechanism remains to be established are presented in gray on a white background.

Figure 3 Access to carbohydrate-active enzymes by completely sequenced organism. (Top) The 62 organisms available from CAZy as of 13 May 2002. (Bottom) Results page for Thermotoga maritima.

When the classification was introduced in 1991, only a handful of families had a structural representative. As expected from the relationship between sequence and structure, the recent accumulation of structural data for glycosidases (over 1600 PDB entries are listed in CAZy as of May 2002) confirmed that enzymes belonging to the same family indeed had a similar fold. More unexpected (and exciting) was the astonishing number of different folds displayed by glycosidases, from all-α to all-β, (β/α)8-, (α/α)6-, and (α/α)7-barrels, jelly-rolls, β-propellers, β-barrels, β-helix, etc. (7,18,19), a diversity largely exceeding the level known for esterases or peptidases, for example. Fig. 5 shows that about half of the glycosidase families have at least one structural representative (May 2002). It is possible that other folds are yet to be discovered in the unresolved families.

Figure 5 3-D structures in the families of glycosidases (May 2002). The families for which at least one 3-D structure has been deposited in the Protein Data Bank appear in white on a black background; those for which crystallization notes have been published appear in black on a pale gray background. Those for which there is no 3-D structural data available are shown in gray on a white background.

The clans of glycosidases (7,15,18) group together families sharing a common ancestor. When two proteins have related sequences, their 3-D structures are related. However, the converse is not true because 3-D structures are better conserved than sequences. Sometimes, one can predict that several families will fold similarly, usually by increasing the sensitivity of sequence comparison methods (see, e.g., Ref. 20). Because this implies detecting relatedness at the borderline of significance, structure determination is clearly the method of choice to unambiguously establish that some of the sequence-based families are related. By analogy to the proteinase work (21) and to avoid the confusion associated with the term “superfamily,” it was proposed that these groupings of related structures be referred to as “clans” (7,15,18). Thus far, 12 such clans have been described (an updated list of these clans can be found in the CAZy server). The largest of these, glycoside hydrolase clan GH-A, is composed of families 1, 2, 5, 10, 17, 26, 35, 39, 42, 51, 53, 59, 72, 79, and 86.

What are the characteristics of a clan? Besides a common fold, the clans group families of enzymes sharing an identical catalytic machinery (identical residues on equivalent secondary structure elements) and hence an identical molecular mechanism. There is residual sequence similarity, sometimes detectable but too low to produce a global alignment (only two residues are invariant in clan GH-A, for instance). Finally, there is a topological resemblance of the substrates (orientation of the glycosidic bond; see Fig. 6). These features make the clans different from the “folding/structural superfamilies,” which group together proteins sharing the same fold and whose common origin is hard to demonstrate (22).

Figure 6 Topological resemblance of the substrates hydrolyzed by glycosidase clan GH-A members. β-D- and α-L-hexosides and pentosides all share an identical reactive center (traced in black) with an equatorial glycosidic bond. An identical catalytic machinery can cleave these apparently dissimilar substrates.

A useful classification must have predictive power. For instance, membership of a folding superfamily does not necessarily predict the details of the substrate, the mechanism, the catalytic amino acids, or the possibility of side reactions. The power of the sequence-based classification stems from the fact that family or clan membership predicts the enzyme structure and the configuration of both substrate and product. By contrast, folding superfamilies sometimes group enzymes operating with different mechanisms, or sometimes even enzymes performing totally unrelated chemical reactions (22).

5

CARBOHYDRATE-ACTIVE ENZYMES IN THE ERA OF GENOMICS

A total of 85 organisms have had their genome fully sequenced (May 2002) and sequences of over 350 genomes are currently in preparation (Genome Online Database; http://wit.integratedgenomics.com/GOLD/). The predictive power of the sequence-based families of carbohydrate-active enzymes provides an efficient tool for the competent annotation (e.g., prediction of the function, fold, and mechanism) of ORFs found during genome sequencing. With the availability of a number of completely sequenced genomes, one can also search and make a census of all carbohydrate-active enzymes contained in a genome. As soon as this is performed for several genomes, the complement of carbohydrate-active enzymes within different genomes can be compared. However, before entering into these considerations, we must examine one particular feature of crucial importance for the genomic analysis of carbohydrate-active enzymes: the modularity of these proteins.

Figure 7 Schematic structure of selected proteins containing a carbohydrate-binding module of family CBM2. GHX, module belonging to glycosidase family X; PLX, module belonging to polysaccharide lyase family X; CEX, module belonging to carbohydrate esterase family X; CBMX, module belonging to carbohydrate-binding module family X; FN3, modules distantly related to eukaryotic fibronectin type III modules; X, modules of unknown function with homologues in the databases; unlabeled gray boxes represent regions not yet assigned; TM, membrane-spanning region. (a) Endo-1,4-glucanase (Acidothermus cellulolyticus); (b) endo-1,4-glucanase (Streptomyces lividans); (c) cellulase Cel6B (Cellulomonas fimi); (d) cellulase Cel6A (C. fimi); (e) cellulase Cel9A (C. fimi); (f) ORF Slr0897 (Synechocystis sp.); (g) xylanase Xyn10A (C. fimi); (h) xylanase Xyn10A (Pseudomonas cellulosa); (i) xylanase B (S. lividans); (j) endo-1,4-glucanase (S. lividans); (k) chitinase (Bacillus thuringiensis); (l) chitinase C (Streptomyces coelicolor); (m) cellulase Cel45A (P. cellulosa); (n) cellulase Cel48A (C. fimi); (o) cellulase Cel48A (Thermobifida fusca); (p) arabinofuranosidase C (P. cellulosa); (q) ORF SC5C7.30c (S. coelicolor); (r) ORF Rv1987 (Mycobacterium tuberculosis); (s) pectate lyase Pel10A (P. cellulosa); (t) rhamnogalacturonan lyase Rgl11A (P. cellulosa); (u) esterase D (P. cellulosa); (v) acetyl xylan esterase STX-III (Streptomyces thermoviolaceus); (w) xylanase Xyn11A (C. fimi); (x) chitin-binding protein celS2 (Streptomyces viridosporus).

Many carbohydrate-active enzymes are modular, consisting of one or more catalytic domains carrying one or several noncatalytic domains. The noncatalytic modules often have a function in carbohydrate binding (23), but sometimes their function is to promote protein–protein interaction, such as the dockerin modules implicated in the assembly of the multisubunit cellulosomes (24). In many cases, additional modules have been inferred by sequence analysis only, and their function remains to be studied and described (23). The noncatalytic modules also form distinct families, and those whose function has been shown to be carbohydrate binding can now also be readily accessed through the CAZy server (http://afmb.cnrs-mrs.fr/CAZY/CBM.html).

A particular feature of noncatalytic modules (whether carbohydrate-binding or not) is that they can be attached to many different types of catalytic domains (Figs. 7, 8, and 9). It is essential to understand and dissect the modularity of any given ORF prior to annotation, classification, or exploitation. Failure to appreciate this modularity is the cause of many incorrect genome annotations, such as the labeling of certain Arabidopsis ORFs as “β-1,3-glucanase-like” when they are merely small noncatalytic modules that most likely bind β-1,3-glucans (25). It is also interesting to note that protein modularity is not restricted to glycoside hydrolases, and that a number of modular glycosyltransferases, carbohydrate esterases, and polysaccharide lyases have been identified (Figs. 7, 8, and 9).

In summary, two major problems associated with carbohydrate-active enzymes must be appropriately dealt with to avoid erroneous genome annotations: (1) the modularity of carbohydrate-active enzymes and (2) the polyspecificity of the families.
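
The need to dissect modularity before annotating can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that an ORF has already been split into modules and that its architecture is written as a hyphen-separated string using the labels of Figs. 7, 8, and 9 (GH, GT, PL, CE, CBM, plus other labels such as X, TM, or DOC1). The string format, the function names, and the wording of the returned annotations are all hypothetical.

```python
import re
from typing import List

CATALYTIC_CLASSES = {"GH", "GT", "PL", "CE"}
MODULE_PATTERN = re.compile(r"^(GH|GT|PL|CE|CBM)(\d+)$")


def split_modules(architecture: str) -> List[str]:
    """Split a hyphen-separated architecture string into module labels."""
    return [m.strip() for m in architecture.split("-") if m.strip()]


def classify(module: str) -> str:
    """Return 'catalytic', 'binding' or 'other' for one module label."""
    match = MODULE_PATTERN.match(module)
    if match is None:
        return "other"
    return "catalytic" if match.group(1) in CATALYTIC_CLASSES else "binding"


def annotate(architecture: str) -> str:
    """Annotate an ORF from its catalytic modules only, never from a binding module alone."""
    modules = split_modules(architecture)
    catalytic = [m for m in modules if classify(m) == "catalytic"]
    binding = [m for m in modules if classify(m) == "binding"]
    if catalytic:
        return "member of " + " and ".join(f"family {m}" for m in catalytic)
    if binding:
        return "non-catalytic protein with " + ", ".join(binding)
    return "unassigned"
```

With these rules, a hypothetical architecture such as "GH10-CBM2" is annotated from its catalytic module, whereas "X-CBM2" is reported as a non-catalytic protein rather than being mislabeled as an enzyme on the strength of its binding module.
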
Failure to take these aspects into consideration can lead to:

Wrong assignment. This frequently happens in the case of modular proteins (see above).

Overprediction. For instance, only a very few residues sometimes switch the specificity of a glycosyltransferase. In an extreme case, it has been shown that a mutation of a single residue could change an α-1,3-GalNAc transferase into an α-1,3-galactosyltransferase (26). When an ORF is distantly related to a largely polyspecific family, or to a family where only a very few members have been characterized, the precise substrate specificity cannot be reliably predicted. Conversely, a good fit between an ORF and many members of a large monospecific family (where many members have been characterized) allows a more confident prediction of the specificity. Whenever possible, additional features should be examined, for instance, the presence of the catalytic residues. In a small but significant number of cases, members of a glycosidase family appear to lack the catalytic residues identified in other members. Aside from the always possible sequencing error, other reasons exist, such as the evolution of noncatalytic proteins from enzymes by loss of the catalytic machinery (see, for instance, Ref. 25).

Underprediction. Because of the problems cited above, annotators sometimes assume the other extreme view, and it is not infrequent to find uninformative annotations such as “putative sugar hydrolase.” Here the inherent characteristics of the sequence-based families could improve annotation because, e.g., one could predict whether this is a noncatalytic protein or an enzyme, and if so, whether it operates with retention or inversion of the anomeric configuration or whether it would hydrolyze an axial or an equatorial glycosidic bond. Again, if the ORF to annotate is strongly related to a large monospecific and well-characterized family, then a precise annotation becomes possible. In doubtful cases, an annotation such as “member of glycosidase family GH5” (for example) would be much more informative than “putative sugar hydrolase.”

Figure 8 Schematic structure of selected proteins containing a carbohydrate-binding module of family CBM13. GHX, module belonging to glycosidase family X; GTX, module belonging to glycosyltransferase family X; PLX, module belonging to polysaccharide lyase family X; CEX, module belonging to carbohydrate esterase family X; CBMX, module belonging to carbohydrate-binding module family X; unlabeled gray boxes represent regions not yet assigned; RIP, ribosome inactivating protein; KIN, kinase; S2A and M12A, peptidases belonging to families S2A and M12A. (a) Cinnamomin (Cinnamomum camphora); (b) polypeptide GalNAc transferase T1 (human); (c) ORF Rv1419 (Mycobacterium tuberculosis); (d) β-1,3-glucanase I (Oerskovia xanthineolytica); (e) β-1,3-glucanase II (O. xanthineolytica); (f) ORF 6 (Polyangium cellulosum); (g) pectate lyase B (Pseudoalteromonas haloplanktis); (h) ORF CAP0120 (Clostridium acetobutylicum); (i) ORF CAC0706 (C. acetobutylicum); (j) ORF CAP0071 (C. acetobutylicum); (k) α-galactosidase (Aspergillus niger); (l) xylanase (Streptomyces olivaceoviridis); (m) ORF SCD69.08 (Streptomyces coelicolor); (n) chitinase Chi35 (Streptomyces thermoviolaceus); (o) arabinofuranosidase B (S. coelicolor); (p) serine protease (Rarobacter faecitabidus); (q) protease (Chryseobacterium meningosepticum).

Figure 9 Schematic structure of selected proteins containing a bacterial dockerin module. DOC1, bacterial dockerin module; COH, bacterial cohesin module; GHX, module belonging to glycosidase family X; PLX, module belonging to polysaccharide lyase family X; CEX, module belonging to carbohydrate esterase family X; CBMX, module belonging to carbohydrate-binding module family X; X, modules of unknown function with homologues in the databases; unlabeled gray boxes represent regions not yet assigned. (a) ORF CAC0912 (Clostridium acetobutylicum); (b) mannanase ManK (Clostridium cellulolyticum); (c) endo-1,4-glucanase C (C. cellulolyticum); (d) scaffoldin CipV (Acetivibrio cellulolyticus); (e) cellulase CelE (C. cellulolyticum); (f) xylanase C (Clostridium thermocellum); (g) xylanase Y/feruloyl esterase (C. thermocellum); (h) xylanase A (C. thermocellum); (i) lichenase B (C. thermocellum); (j) xylanase/lichenase D (Ruminococcus flavefaciens); (k) chitinase (C. thermocellum); (l) β-mannanase Man26B (C. thermocellum); (m) cellulase/mannanase Cel26A-Cel5E (C. thermocellum); (n) α-galactosidase (Clostridium josuii); (o) cellulase CelF (C. cellulolyticum); (p) cellulase CelJ (C. thermocellum); (q) ORF CAC0919 (C. acetobutylicum); (r) pectate lyase A (Clostridium cellulovorans); (s) ORF Y-P (C. cellulolyticum); (t) xylanase B (R. flavefaciens).

In our efforts to update and maintain the CAZy server (where the family assignments are based on catalytic modules), we started to examine the carbohydrate-active enzyme content of genomes. Some global results are given as follows.

5.1

Eukaryotes

As of January 2002, five eukaryotic genomes were available: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, and human. Of these organisms, Arabidopsis has, by far, the largest number of carbohydrate-active enzymes with 386 GHs and 411 GTs (16). The total (almost 800) exceeds 3% of the coding regions of the genome. In comparison, S. cerevisiae, D. melanogaster, and C. elegans have only ~90, ~230, and ~320 ORFs, respectively, coding for proteins related to glycosidases and glycosyltransferases. The human genome, with about 350 of these proteins, is not much more impressive than that of the nematode C. elegans.

5.2

Archaea

Twelve archaeal genomes were available at the time of writing. One surprising finding was that the genomes of three of them (Aeropyrum pernix, Archaeoglobus fulgidus, and Methanobacterium thermoautotrophicum) appear to completely lack glycosidases (27). This puzzling observation suggests any of the following possibilities: (1) that the metabolism of these organisms does not involve the degradation of glycosides, (2) that these organisms have developed an alternative chemistry to perform this reaction, (3) that these organisms have glycosidases which are so different from the known ones that they have not been identified, or (4) perhaps that these organisms rely on other organisms for the hydrolysis of glycosidic bonds. The last three possibilities are unlikely because these three Archaea do not grow on sugars as a carbon source. The nine other Archaea examined do have glycosidases, but these were clearly acquired from hyperthermophilic bacteria by horizontal transfer. Therefore, it is tempting to speculate that early Archaea developed before the emergence of metabolic pathways involving the degradation of glycosidic bonds.

5.3

Bacteria

Regardless of the size of their genomes, all free-living bacteria have about 1–2% of their coding regions dedicated to glycosidases and glycosyltransferases. The only outlier is Thermotoga maritima, whose genome contains about 3% glycosidases and glycosyltransferases. It is interesting to note that a large number of the glycosidases of T. maritima are involved in plant cell wall degradation. Bacteria which only grow as parasites/pathogens of eukaryotic cells have a much reduced content in glycosidases (Helicobacter pylori, Mycobacterium leprae, Neisseria meningitidis) or sometimes show no glycosidase at all (for instance, Campylobacter jejuni), illustrative of the loss of complete metabolic pathways in parasitic organisms. The discovery potential of genomic research in the search for enzymes is considerable and one may even find useful enzymes from organisms which apparently do not express the desired activity. A striking example is the presence of the complete operon to make cellulose in the genomes of Escherichia coli and Salmonella typhimurium, bacteria that are notorious for their inability to biosynthesize cellulose. It is only recently that researchers have found that these bacteria can indeed synthesize cellulose under appropriate conditions (28).

6

CONCLUSION

The increasing insight provided by the sequence families of carbohydrate-active enzymes is breaking up the traditional EC class system. Because the sequence-based system allows the inference of structural and mechanistic relationships between enzymes of differing substrate specificity, it paves the way for protein engineering, directed evolution, and the development of new functionalities on common and stable scaffolds. It is clear that other enzyme systems can benefit from a similar approach, and an excellent example is provided by the proteolytic enzymes. A catalogue and a structure-based classification of these enzymes are readily available from the MEROPS database (29). In the genomic era, such structure-based classification systems provide the best possible tools for the appropriate annotation of genome data.

ACKNOWLEDGMENTS

The authors are particularly grateful to Amos Bairoch (Switzerland), James A. Campbell (Australia), and R. Antony Warren (Canada) for many useful discussions throughout the years.

REFERENCES

1. RA Laine. A calculation of all possible oligosaccharide isomers both branched and linear yields 1.05 × 10^12 structures for a reducing hexasaccharide: the Isomer Barrier to development of single-method saccharide sequencing or synthesis systems. Glycobiology 4:759–767, 1994.
2. TT Teeri. Crystalline cellulose degradation: new insight into the function of cellobiohydrolases. Trends Biotechnol 15:160–167, 1997.
3. Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the Nomenclature and Classification of Enzymes. San Diego, CA: Academic Press, 1992.
4. HM Jespersen, EA MacGregor, B Henrissat, MR Sierks, B Svensson. Starch- and glycogen-debranching and branching enzymes: prediction of structural features of the catalytic (beta/alpha)8-barrel domain and evolutionary relationship to other amylolytic enzymes. J Protein Chem 12:791–805, 1993.
5. WP Burmeister, S Cottaz, H Driguez, R Iori, S Palmieri, B Henrissat. The crystal structures of Sinapis alba myrosinase and a covalent glycosyl-enzyme intermediate provide insights into the substrate recognition and active-site machinery of an S-glycosidase. Structure 5:663–675, 1997.
6. WP Burmeister, S Cottaz, P Rollin, A Vasella, B Henrissat. High resolution X-ray crystallography shows that ascorbate is a cofactor for myrosinase and substitutes for the function of the catalytic base. J Biol Chem 275:39385–39393, 2000.
7. B Henrissat, G Davies. Structural and sequence-based classification of glycoside hydrolases. Curr Opin Struct Biol 7:637–644, 1997.
8. B Henrissat. A classification of glycosyl hydrolases based on amino acid sequence similarities. Biochem J 280:309–316, 1991.
9. C Chothia, AM Lesk. The relation between the divergence of sequence and the structure in proteins. EMBO J 5:823–826, 1986.
10. EA MacGregor, S Janecek, B Svensson. Relationship of sequence and structure to specificity in the α-amylase family of enzymes. Biochim Biophys Acta 1546:1–20, 2001.
11. J Gebler, NR Gilkes, M Claeyssens, DB Wilson, P Béguin, WW Wakarchuk, DG Kilburn, RC Miller Jr, RA Warren, SG Withers. Stereoselective hydrolysis catalyzed by related β-1,4-glucanases and β-1,4-xylanases. J Biol Chem 267:12559–12561, 1992.
12. B Henrissat, A Bairoch. New families in the classification of glycosyl hydrolases based on amino acid sequence similarities. Biochem J 293:781–788, 1993.
13. PM Coutinho, B Henrissat. Carbohydrate-active enzymes: an integrated database approach. In: H Gilbert, G Davies, B Henrissat, B Svensson, eds. Recent Advances in Carbohydrate Bioengineering. Cambridge: The Royal Society of Chemistry, 1999, pp 3–12.
14. JA Campbell, GJ Davies, V Bulone, B Henrissat. A classification of nucleotide-diphospho-sugar glycosyltransferases based on amino acid sequence similarities. Biochem J 326:929–939, 1997.
15. B Henrissat, A Bairoch. Updating the sequence-based classification of glycosyl hydrolases. Biochem J 316:695–696, 1996.
16. B Henrissat, PM Coutinho, GJ Davies. A census of carbohydrate-active enzymes in the genome of Arabidopsis thaliana. Plant Mol Biol 47:55–72, 2001.
17. B Henrissat, PM Coutinho, PJ Reilly. Reading-frame shift in Saccharomyces glucoamylases restores catalytic base, extends sequence and improves alignment with other glucoamylases. Protein Eng 7:1281–1282, 1994.
18. G Davies, B Henrissat. Structures and mechanisms of glycosyl hydrolases. Structure 3:853–859, 1995.
19. Y Bourne, B Henrissat. Glycoside hydrolases and glycosyltransferases: families and functional modules. Curr Opin Struct Biol 11:593–600, 2001.
20. B Henrissat, I Callebaut, S Fabrega, P Lehn, JP Mornon, G Davies. Conserved catalytic machinery and the prediction of a common fold for several families of glycosyl hydrolases. Proc Natl Acad Sci USA 92:7090–7094, 1995.
21. ND Rawlings, AJ Barrett. Classification of peptidases. Methods Enzymol 244:1–15, 1994.
22. N Nagano, CT Porter, JM Thornton. The (β/α)8 glycosidases: sequence and structure analyses suggest distant evolutionary relationships. Protein Eng 14:845–855, 2001.
23. AB Boraston, BW McLean, JM Kormos, M Alam, NR Gilkes, CA Haynes, P Tomme, DG Kilburn, RAJ Warren. Carbohydrate-binding modules: diversity of structure and function. In: HJ Gilbert, GJ Davies, B Henrissat, B Svensson, eds. Recent Advances in Carbohydrate Bioengineering. Cambridge: The Royal Society of Chemistry, 1999, pp 202–211.
24. EA Bayer, H Chanzy, R Lamed, Y Shoham. Cellulose, cellulases and cellulosomes. Curr Opin Struct Biol 8:548–557, 1998.
25. B Henrissat, GJ Davies. Glycoside hydrolases and glycosyltransferases: families, modules and implications for genomics. Plant Physiol 124:1515–1519, 2000.
26. NO Seto, CA Compston, SV Evans, DR Bundle, SA Narang, MM Palcic. Donor substrate specificity of recombinant human blood group A, B and hybrid A/B glycosyltransferases expressed in Escherichia coli. Eur J Biochem 259:770–775, 1999.
27. PM Coutinho, B Henrissat. Life with no sugars? J Mol Microbiol Biotechnol 1:307–308, 1999.
28. X Zogaj, M Nimtz, M Rohde, W Bokranz, U Römling. The multicellular morphotypes of Salmonella typhimurium and Escherichia coli produce cellulose as the second component of the extracellular matrix. Mol Microbiol 39:1452–1463, 2001.
29. ND Rawlings, E O'Brien, AJ Barrett. MEROPS: the protease database. Nucleic Acids Res 30:343–346, 2002.

3

Analyzing Three-Dimensional Structures of Variant Enzymes

Richard Bott
Genencor International, Palo Alto, California, U.S.A.

1

INTRODUCTION

Proteins provide the diverse functional capabilities that sustain life. It is assumed that every protein that is actively expressed within a cell has at least one functional and/or structural purpose that increases the ability of the organism to survive and reproduce. At the molecular level, proteins are polymers of 20 naturally occurring amino acids arranged in different sequences. The specific sequence of a protein is determined by the nucleotide sequence of the gene encoding that protein in the genome of the organism. Proteins consist of one or more polypeptide chains that reproducibly fold into a specific tertiary structure. In the tertiary structure, amino acid side chains that are widely separated in the linear sequence are brought into close proximity. In functionally related proteins having a common ancestor and a similar function, the tertiary structure brings together the side chains of amino acids in highly conserved spatial juxtapositions (e.g., the catalytic triad found in serine proteases).

Proteins having such a conserved structural motif often share patterns of conserved amino acid sequence throughout their structures. The conservation can range from nearly complete identity (>95%) to something much more restricted, limited to the regions forming the catalytic site and the substrate- and/or cofactor-binding sites, which may represent as little as 20% of the overall sequence. These shared patterns of conservation in functionally related proteins seem to diverge in a manner that mirrors the presumed evolutionary relationships of the parent organisms. Serine proteases of the trypsin-like family from mammals are more closely and more extensively conserved among themselves than they are with the related serine proteases from bacteria. Although these structurally and functionally homologous proteins share a common mechanism of action, they have diverse specificity and stability profiles, which result from differences in the amino acid sequences encoded by their DNA. These differences presumably adapt each protein to function in a particular organism, each of which lives in its own unique environment. Because the genetic information necessary to maintain the organism encodes the amino acid sequence, the amino acid sequence must in turn determine the overall tertiary folding of the protein and thereby its function; it follows that modifying the DNA sequence encoding a particular protein will alter its structure, and hence its function. The development of recombinant DNA technology allowed the genetic sequence to be manipulated so as to alter the coding for one or more amino acids in a site-specific manner. This made it possible to alter the amino acid sequence of a protein with the aim of probing the functional roles of particular amino acid side chains, or of engineering the protein toward a commercial function. This technique has been highly fruitful for both pursuits.
The selective replacement of a purportedly functionally important residue, accompanied by a loss of function, is regarded as the definitive proof of the role of a particular side chain. There are now numerous examples of engineered enzymes, such as proteases, amylases, cellulases, and lipases, where one to a few amino acid changes have resulted in enzymes that are superior to naturally occurring enzymes. Such engineered proteins have been used in commercial applications. There are also numerous cases of protein engineering where the substituted side chains did not produce the desired effect. Indeed, it is very often the case that site-specific substitutions are neither beneficial nor deleterious, resulting in little or no change in the properties being measured for a particular protein. Such changes are regarded as neutral substitutions. Now that the entire genomes of several species are known, it is also possible to follow evolutionary drift in the sequences of related proteins performing a similar function. The class of trypsin-like serine proteases provides a well-documented case study. The three-dimensional structure of serine proteinases of the trypsin class is conserved even in the presence of deletions or
insertions of large segments of amino acid sequence. Structural studies have shown that, even in regions where these proteins share a common overall tertiary fold, the sequences can diverge by more than 50%. From this observation, it became clear that the overall tertiary fold is much more highly conserved than the amino acid sequence. Given this insight, it follows that the tertiary fold of a protein is tolerant of single amino acid substitutions, provided that these substitutions do not replace one of the critical side chains involved in the catalytic machinery, or a residue that is crucial for a specific recognition process. This would, of course, be a necessary corollary to any postulate that proteins evolved by the slow accumulation of amino acid substitutions that gradually allowed function to diverge to the point of yielding two enzymes with different functionality. Function must have been sufficiently maintained while numerous substitutions accumulated and gradually altered it through a series of compensating and synergistic changes. Natural proteins that evolved by this process would have tertiary folds that are, to a large extent, forgiving and, at the same time, robust. From this line of reasoning, protein engineering has an inherent potential for success. A rationale based on the best available understanding of the relationship between tertiary structure and function would provide a means to select sites that would produce immediate and, hopefully, beneficial changes toward the desired function or property (e.g., enzyme activity or stability). A process that focuses on productive, rather than random, accumulation of changes should result in accelerated evolution of a desired function in a protein. Such a process of rational or "directed" mutagenesis should be superior to the random-walk accumulation of changes over time that occurs in natural evolution.
In both cases, these changes would most likely be tolerated without a major conformational change. Thus, it is possible to model probable changes based on the native structure. Several functional characteristics have emerged as being not only desirable but also quite achievable in protein engineering. It has been demonstrated for numerous proteins that substrate specificity and overall catalytic efficiency can be altered. In the case of subtilisin, relative specificity has been altered by as much as 1000-fold, and relative catalytic efficiency, as measured by kcat/Km, by as much as 100-fold (1). The pH performance profile has been shifted readily by at least one pH unit, and the temperature for optimum performance has been raised by 10°C (1). Stability has also been manipulated by site-directed and rational mutagenesis, as exemplified by the extensive work with T4 lysozyme (2) and bacterial amylase (3). Clearly, knowledge of the structure of the starting protein is central to this strategy. X-ray crystallographic determination has been the most extensively used approach to obtain three-dimensional structures of the parent and
variant enzymes, and nuclear magnetic resonance (NMR) has also been employed to determine three-dimensional structures of proteins up to 30 kDa (4). This chapter will review some of the standard approaches used in determining variant structures, along with some of the general guiding principles that appear as recurring themes in structures of engineered proteins. It would be impossible to cover the extensive work that has been done in a number of protein systems, so this chapter will focus on a few representative examples. These illustrate how knowledge of the three-dimensional structures of native and variant enzymes has been used to elucidate the underlying principles of stability, enzyme function, and energy conversion. This chapter will also consider techniques for quantifying significant changes in structure with regard to coordinate shifts, flexibility, and conformational change. Finally, the chapter will look at emerging techniques and at the new structural insights that may be forthcoming as site-specific variants continue to be analyzed.

2

STRUCTURE DETERMINATION

There are numerous texts that cover the general principles of x-ray crystallography: one for the nonexpert wishing to understand only the underlying principles (5), as well as excellent in-depth textbooks for detailed study (6,7). It all begins with a protein crystal. Crystals are regular arrays of protein molecules that scatter x-rays from the electron clouds of individual atoms to form a coherent diffraction pattern. They are formed from solvated protein molecules, with solvent making up between 40% and 60% of the crystal. Determining the parent protein structure by x-ray crystallography requires two things: diffraction data, collected experimentally as the intensities of x-rays scattered from a crystal of the protein, and the "phases" needed to combine the observed intensities in a Fourier summation. Combining these data produces a three-dimensional image of the scattering matter (the electrons of the atoms within the protein molecule), which is expressed as an electron density map. There are several highly successful strategies for determining sufficiently accurate phases, including multiple isomorphous replacement, multiple wavelength anomalous dispersion, and molecular replacement. The last of these has been used extensively in the determination of variant structures. A model is then constructed, usually consisting of the coordinates of all nonhydrogen atoms, by fitting the expected atoms residue by residue into the electron density map. There is every reason to expect that the three-dimensional structure obtained from x-ray crystallography is a good representation of the active protein structure in solution. The structures of proteins determined in
different crystal forms are in close agreement (8), and there is general agreement between the structures determined by x-ray crystallography and those determined independently by NMR (4). The general predictions made on the basis of these x-ray structures, particularly the identification of active site residues, have subsequently been confirmed by site-directed mutagenesis.

3

THREE-DIMENSIONAL STRUCTURE OF PROTEINS

In solution, the amino acid sequence of a protein is presumed to dictate the reproducible folding of the molecule into a stable three-dimensional structure. Within these structures, there are recognizable substructures, or features of secondary structure, such as loops, helices, and sheets. The geometry of these features, expressed as torsion angles of the main chain atoms, is largely a consequence of the stereochemical restraints imposed by the interacting atoms of the peptide bond, which links the amino acids in the polypeptide chain, with the Cα atom and its substituents (9). The peptide bond is planar and rigid, so that only two bonds are free to rotate for each peptide residue. Rotation about these bonds is limited to specific ranges, which give rise to periodic structures such as the α helix. These steric constraints are used as a criterion for evaluating the quality of protein structures by plotting the two torsion angles for each residue to give the Ramachandran plot (9). At the time this chapter was written, there were more than 18,000 structures deposited in the Protein Data Bank (10). The vast majority of torsion angles in these proteins conform to the ranges recognized for the helical, sheet, and turn segments from which the overall fold is built. The absence of a side chain loosens the restraints, so that the only outliers in a Ramachandran plot are usually glycine residues. The distribution of side chains in the three-dimensional structure of soluble proteins is described by the "oil-drop" model. From the first protein structures of myoglobin (11), hemoglobin (12), lysozyme (13), and ribonuclease (14), it was noticed that hydrophobic side chains were found in the interior of the protein, whereas hydrophilic side chains were found on the surface. Within the interior, these side chains were found to be in a close-packed arrangement, leaving only a few cavities that were occupied by solvent.
Occasionally, hydrophilic residues were found in the interior; however, they were usually observed as pairs of oppositely charged side chains. Being situated in a shielded environment, they would interact with each other more strongly than in an aqueous environment. Such structures are termed salt bridges and are considered to have a stabilizing influence on the overall fold of the enzyme. In general, enzymes tend to be roughly spherical objects with relatively smooth surfaces. Any crevices are usually filled with ordered solvent. In enzymes, there is usually a surface feature where the reaction occurs, which
includes the side chains responsible for catalytic activity. The active site is surrounded by residues that create a unique binding surface for the substrate molecule. These features are often the sites with the highest probability of altering the function and/or specificity of the enzyme.
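The main chain torsion angles that are plotted in a Ramachandran plot, described earlier in this section, can be computed directly from atomic coordinates. The following Python sketch is offered only as an illustration and is not taken from the chapter's sources: the function names `dihedral` and `phi_psi` are hypothetical, NumPy is assumed to be available, and the four example points are arbitrary coordinates chosen so that the expected angle is obvious.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond for four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)              # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1          # b0 projected onto the plane normal to b1
    w = b2 - np.dot(b2, b1) * b1          # b2 projected onto the same plane
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

def phi_psi(c_prev, n, ca, c, n_next):
    """Main chain torsions of one residue from five backbone atom positions.

    phi uses C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1).
    """
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)

# Arbitrary test points (not real protein coordinates): torsion of +90 degrees.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))  # 90.0
```

Applying `phi_psi` to every residue of a model and plotting the resulting pairs gives the scatter that is compared against the allowed helical, sheet, and turn regions.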

4

DIRECTED EVOLUTION OF A PROTEIN

To perform directed evolution of a protein, a hypothesis is first formulated as to which property of the protein must change to enhance its performance. This is often based largely on a biochemical analysis of the enzymatic reaction, the activity of the enzyme toward a substrate of interest, and an analysis of the optimal conditions for use of the enzyme in a given application. The hypothesis can be refined further by comparing the performance of different enzymes in the application. These enzymes will most often come from libraries of natural isolates taken from environments that most closely match the conditions of the application for which the enzyme is intended. In most cases, the engineering goals will be to alter the substrate specificity, increase the overall catalytic efficiency under a specific set of environmental conditions such as pH and temperature, and/or alter the stability of the enzyme. These goals then lead to the selection of sites and regions to be systematically explored using recombinant DNA technology. The resulting variant enzymes are screened for improved performance and ranked. The probable structures of the variants are then evaluated with regard to the altered properties that resulted from the change. In certain instances, where a specific variant departs from the overall pattern of altered performance, it is beneficial to determine the actual structure rather than rely on the "probable" structure derived from modeling on the basis of the parent enzyme. In these instances, knowledge of the three-dimensional structure of the variant protein becomes crucial.

5

DETERMINING X-RAY STRUCTURES OF VARIANT PROTEINS

Knowledge of the native enzyme structure is of considerable advantage in determining the structure of a variant enzyme. First, the crystallization conditions that give crystals of the parent enzyme will, in most cases, give crystals of the variant that are suitable for diffraction studies. If crystals of the parent protein are available, small crystals can be used as seed crystals for the variant enzyme. The seed crystals serve as nucleation sites for crystal growth of the variant and predispose the variant to crystallize in an isomorphous form, which facilitates the comparison of the three-dimensional structures of the
parent and variant proteins. Another advantage when the variant crystallizes in an isomorphous form is that the coordinates of the parent enzyme can immediately serve as a phasing model. In this most favorable instance, the time needed to begin an analysis of the variant structure is limited to the time needed to obtain a suitably diffracting crystal and to collect the diffraction data from it. The most immediate visualization of differences between the parent and the variant protein comes from the difference electron density map. In the case of isomorphous crystals, the $|F_o^{\mathrm{variant}}| - |F_o^{\mathrm{parent}}|$ difference electron density can be examined, where $F_o$ denotes the observed structure factors of equivalent reflections from the variant and parent enzyme crystals. These maps can be regarded as essentially the result of subtracting the electron density of the parent from the electron density of the variant at each sampling point throughout the map. Where the structures of the variant and parent enzymes are unchanged, the electron densities are the same and cancel. In regions where atoms have moved, there will be positive electron density at the new positions of atoms that have shifted or have been added by a substitution in the variant, and negative density at the positions of parent atoms that have shifted or have been removed by the substitution. If atoms shifted by distances exceeding their van der Waals radii, no overlap would be expected between the old and new positions. However, shifts are rarely that large, so that, as illustrated by the example in Fig. 1, the new and old positions do overlap. There are also instances when a new crystal form is obtained. New crystal forms have been linked to substitutions involving crystal contacts (15). In these cases, it is necessary to use the techniques of molecular replacement to obtain a starting phase model from which to generate an electron density map of the variant structure.
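The isomorphous difference map described above can be illustrated with a one-dimensional toy calculation. The Python sketch below is a hedged illustration rather than a crystallographic procedure: the grid size, atom positions, weights, and the helper name `density` are all invented for the example, and NumPy's FFT stands in for the Fourier summation. The toy cell is deliberately centrosymmetric and dominated by a heavy atom at the origin, an assumption made so that the parent phases remain valid for the variant and the peaks come out cleanly.

```python
import numpy as np

N_GRID = 256   # sampling points along the toy unit cell (assumed)
SIGMA = 2.0    # width of each Gaussian "atom" in grid units (assumed)

def density(atoms):
    """Periodic 1-D electron density from (position, weight) pairs."""
    x = np.arange(N_GRID)
    rho = np.zeros(N_GRID)
    for pos, weight in atoms:
        d = (x - pos + N_GRID / 2) % N_GRID - N_GRID / 2  # wrapped distance
        rho += weight * np.exp(-0.5 * (d / SIGMA) ** 2)
    return rho

# Parent: heavy atom at the origin plus symmetric pairs at +/-30 and +/-70.
parent = density([(0, 10.0), (30, 1.0), (226, 1.0), (70, 1.0), (186, 1.0)])
# Variant: the pair at +/-70 has moved to +/-80; everything else is unchanged.
variant = density([(0, 10.0), (30, 1.0), (226, 1.0), (80, 1.0), (176, 1.0)])

# "Observed" amplitudes from each crystal; the phases come from the parent only.
amp_parent = np.abs(np.fft.fft(parent))
amp_variant = np.abs(np.fft.fft(variant))
phase_parent = np.angle(np.fft.fft(parent))

# Difference synthesis: (|Fo_variant| - |Fo_parent|) * exp(i * phase_parent)
coeffs = (amp_variant - amp_parent) * np.exp(1j * phase_parent)
diff_map = np.fft.ifft(coeffs).real

print("strongest positive peak at grid point", int(np.argmax(diff_map)))
print("strongest negative peak at grid point", int(np.argmin(diff_map)))
```

The positive peaks fall at the new atom positions and the negative peaks at the vacated ones, while the unchanged atoms cancel. In this idealized case the map reproduces the true density difference; with real data the parent phases are only approximately correct for the variant, so the features are weaker and sit on a noisier background.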
There are several highly successful program packages available, including AMoRe (16) and CNS (17), which determine the correct orientation and position of the reference molecule in an automatic or semiautomatic manner. However, in these instances, it is not possible to compare the structures of the variant and parent molecules directly as before. Instead, an $F_o - F_c$ difference electron density map is examined, where $F_o$ is the observed structure factor amplitude from the variant protein and $F_c$ is the amplitude calculated from a model of the parent enzyme, aligned and positioned to serve as a phasing model for the variant. This map is the result of subtracting the features present in the model of the parent protein from the electron density of the variant enzyme, so it includes all features not included in the model of the parent enzyme. Solvent molecules, salts, and ligands not present in the coordinate set of the parent protein will appear as positive electron density, as will differences between the structures of the parent and variant enzymes. So instead of a few difference density peaks that immediately highlight the structural differences, there may be many more features of the electron density that must be surveyed. As will be noted in specific examples below, this can often be a rewarding exercise in cases where the solvent molecules serve as indicators of structural changes such as side chain shifts, or act directly as mediators of altered function.

Figure 1 Structural perturbations arising from site-specific mutations. Superposition of native subtilisin BPN' and a variant having the Y217L substitution (dark gray). The side chain of Leu217 is seen to closely resemble the conformation of Tyr217 in the native enzyme. Residues forming the catalytic triad (Asp32, His64, and Ser221) were not altered.

In general, the differences arising from single amino acid substitutions are very subtle and usually result in limited local perturbations in the structure. This observation is reinforced by what is found in naturally occurring variants. The Protein Data Bank contains numerous examples of related proteins that share close homology in amino acid sequence and function. One of the most extensively engineered enzymes, subtilisin, is a representative case. Subtilisins belong to the S8 family of serine proteinases. The three-dimensional structures of subtilisins from several different species of Bacillus have been characterized. Three in particular (from Bacillus amyloliquefaciens, B. licheniformis, and B. lentus), which have been commercialized for use as detergent additives, have been extensively studied in several laboratories (18–20). The sequences of these enzymes differ at 87 and 103 of a possible 275 positions (Fig. 2). It should be clear from this picture that if enzymes from different species having 83–103 substitutions share a similar overall folding pattern, then one would certainly expect that variants having 1–10 substitutions would also have a similar folding pattern. The finding that only very subtle shifts occur poses one of the major challenges in analyzing the structure of site-specific changes, which is to discern real differences from random fluctuations in structure. Some attempts to address this issue are described below.

Figure 2 A comparison of main chain folding of three subtilisin enzymes. Subtilisin BPN' from B. amyloliquefaciens (gray), subtilisin Carlsberg from B. licheniformis (black), and subtilisin from B. lentus (dark gray). Although the sequences can differ at 40% of the positions, these enzymes share an identical overall tertiary folding pattern.

6

STRUCTURAL ANALYSIS OF SITE-SPECIFIC VARIANTS

It has been possible to alter the stability, substrate specificity, pH activity profile, and electrostatic interactions of many proteins. X-ray crystallography has been used to determine structures that illustrate the structural basis for the alteration of properties in a number of proteins. It would be impossible, even in a dozen chapters such as this one, to do justice to the breadth of crystallographic analysis of site-directed mutagenesis in all the protein systems developed over more than two decades. Therefore, it is not the intent of this chapter to attempt encyclopedic coverage, but rather to select examples spanning the last 20 years. We will discuss the analysis of two enzymes, T4 lysozyme and subtilisin, both of which have been extensively studied; a redox protein, cytochrome f; and a light-transducing protein, bacteriorhodopsin. T4 lysozyme has been extensively studied as a model system for understanding the structural basis of protein stability. T4 lysozyme consists of 164 amino acids that fold into two domains linked by a long central α helix. Through extensive mutagenesis, it has been possible to elucidate certain principles governing the stability of helices. In one study, the three-dimensional structures were determined for variants in which 13 of the possible 19 natural amino acid replacements (Ala, Arg, Asn, Glu, Gly, Ile, Leu, Lys, Phe, Pro, Thr, Trp, and Val) were introduced to replace a serine in the middle of an α helix (21). All amino acids were accommodated without a major distortion of the helix main chain, a pattern that is repeated at other sites in T4 lysozyme and in other proteins. Based on the conservation of main chain conformation and the helix-stabilizing hydrogen bonds, it was possible to identify the structural basis for the high helix propensity of alanine as well as the low helix propensity of glycine and proline.
Alanine was proposed to provide an energetic compromise, gaining hydrophobic stabilization without incurring the entropy cost associated with the conformational restriction of the additional side chain atoms present in larger residues. Proline, while restricting conformational freedom, has the obvious enthalpic cost of losing a main chain hydrogen bond and also introduces some steric interactions, which, although not sufficient to disrupt the helix backbone, lower its helix propensity. Glycine was proposed to have low helix propensity because of the entropy cost that accompanies its additional conformational flexibility. These structures were determined from multiple crystal forms, but when the helix residues 40–49 were aligned, the root mean square (rms) deviation
for Cα atoms ranged from 0.10 to 0.14 Å for isomorphous crystal forms and from 0.19 to 0.33 Å for variants determined from nonisomorphous crystal forms. A similar variation was reported in a survey of structures of T4 lysozyme determined from 25 crystal forms (22). These crystals were grown under diverse conditions and at varying pH values, belonged to different space groups, and had one to five molecules in the asymmetric unit. Variation between equivalent Cα atoms was again reported to be in the range of 0.25–0.4 Å. It was noted that these values were well above the estimated error of 0.1–0.2 Å. This study reinforced the pattern seen for the helix substitutions above: in general, the folding pattern of T4 lysozyme was sufficiently robust to tolerate between 1 and 11 substitutions distributed over 16 sites. Several of these substitutions altered the domain-to-domain juxtaposition, resulting in an altered hinge angle. Here the determination of structures of numerous variants was required to decide whether the change in hinge angle was a consequence of substitutions at the domain interface, which in turn gave rise to different crystal forms, or whether flexibility of the hinge angle was an intrinsic property of the enzyme that the different crystal forms provided the opportunity to map. Numerous variants with substitutions far removed from the domain interface crystallized in different crystal forms, which also manifested altered hinge angles between the two domains of T4 lysozyme. Thus, the altered hinge angle was interpreted to be the result of intrinsic flexibility in the molecule. The altered crystallization patterns were attributed to substitutions at sites involved in crystal contacts. This has also been reported for subtilisin crystals (15), which are discussed below.
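The rms deviations quoted above are obtained after least-squares superposition of equivalent Cα atoms. One common way to perform such a superposition is the Kabsch algorithm; the source does not state which method the cited studies used, so the Python sketch below should be read as an illustration under that assumption. The function name `superpose_rmsd` is hypothetical, NumPy is assumed, and the test coordinates are an idealized helix generated in the code (2.3 Å radius, 1.5 Å rise, and 100° turn per residue), not data from T4 lysozyme.

```python
import numpy as np

def superpose_rmsd(mobile, reference):
    """RMSD (input units) after optimal rigid-body superposition (Kabsch)."""
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    P0 = P - P.mean(axis=0)                  # centering removes the translation
    Q0 = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P0.T @ Q0)      # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # best rotation taking P0 onto Q0
    fitted = P0 @ R.T
    return float(np.sqrt(np.mean(np.sum((fitted - Q0) ** 2, axis=1))))

# Idealized helix of ten C-alpha positions (illustrative geometry only).
i = np.arange(10)
helix = np.column_stack([2.3 * np.cos(np.radians(100.0 * i)),
                         2.3 * np.sin(np.radians(100.0 * i)),
                         1.5 * i])

# The same helix after an arbitrary rigid-body move, as in a different crystal form.
t = np.radians(40.0)
rot_z = np.array([[np.cos(t), -np.sin(t), 0.0],
                  [np.sin(t),  np.cos(t), 0.0],
                  [0.0,        0.0,       1.0]])
moved = helix @ rot_z.T + np.array([5.0, -3.0, 12.0])

# A copy with one atom displaced by 1 A to mimic a local structural shift.
perturbed = moved.copy()
perturbed[4] += np.array([1.0, 0.0, 0.0])

print(superpose_rmsd(moved, helix))      # essentially zero
print(superpose_rmsd(perturbed, helix))  # small but nonzero
```

Because the superposition removes any rigid-body difference between crystal forms, what remains reflects genuine coordinate differences plus coordinate error, which is why the reported values are compared against the estimated error.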
The substitutions gave different crystal forms, allowing hinge angle variability to be observed and leading to the conclusion that flexibility of the hinge was an intrinsic property of the enzyme structure. In these cases, individual domains showed high conservation, which allowed the hinge-bending axis of the molecule to be defined. A careful analysis showed that the motion was more than a simple opening and closing of the cleft: it combined rotation with a substantial side-to-side component akin to “the chewing action of a camel.” Such an illustrative description would not have been possible without the detailed structural analysis of numerous mutants, which in turn gave rise to numerous crystal forms of the enzyme. Our understanding of the relative importance of internal hydrogen bonds for stability has been derived, in part, from x-ray crystallography of variants of T4 lysozyme. Studies characterizing the presence or absence of internal solvent in the structures of T4 lysozyme variants have revealed differences in relative thermal stability. From the analysis of numerous variants constructed to introduce or replace internal solvent molecules, it was concluded that hydrogen bonds

are energy-neutral. By relating structures that had either lost or gained solvent molecules to the availability of hydrogen-bonding partners, it was possible to conclude that the hydrogen bonds formed in the folded state are offset by hydrogen bonds with solvent in the unfolded state. It was also possible to define the rules for creating an internal solvent site, including the requirement for three or four potential hydrogen bond donors/acceptors in proximity; otherwise, the resulting variant would be expected to be less stable. These conclusions were closely related to the structural analysis that determined which of the variants had additional solvent molecules present and how many were introduced. When a cavity that could accommodate two water molecules was created, such as when a methionine or another large residue was replaced with alanine (Met6→Ala), each of the two molecules often satisfied one hydrogen-bonding requirement of the other. These studies and related ones in other enzymes such as amylase (3) have provided examples of structurally based strategies that can be successfully employed to stabilize proteins. There are also examples of variants that manifest dramatic changes in properties, such as synergy between substitutions for stability, where the substitutions are far removed from each other. In these structures, the phenomenon of long-range interactions has been invoked to explain the consequences of these changes. However, although the phenomenon is well documented, the basis of long-range interactions and the means to evaluate them have remained elusive.

7 ENZYME SPECIFICITY AND CATALYTIC ACTIVITY

Structural studies have also played a role in the engineering of altered specificity and increased catalytic function of proteolytic enzymes. The subtilisins, proteolytic enzymes originally isolated from various Bacillus species, display broad specificity and relatively high stability to denaturants such as detergents. Therefore, they have been incorporated into detergent powders as additives to dissolve proteinaceous stains. Like surfactants, subtilisins solubilize stains, and they are cost-effective, which has led to their incorporation into surfactant formulations. One obvious strategy for improving these enzymes was to alter their specificity with the aim of targeting specific soils. Toward this end, numerous studies were undertaken to identify the specific residues that serve as determinants of specificity along the binding sites. Subtilisin can be inhibited by the product resulting from the hydrolysis of a polypeptide chain or artificial substrate. When cleaved, the artificial substrate succinyl–Ala–Ala–Pro–phenylalaninyl para-nitroanilide yields a product, succinyl–Ala–Ala–Pro–phenylalanine, that can inhibit subtilisin. We have determined the structures of the product-inhibited native enzyme, which has served as a basis

for a model of an enzyme–substrate complex (Fig. 3). Examining the interactions of the phenylalanine side chain of the substrate, we can see that it makes van der Waals contacts with the main chain of residues 126–129 and with the side chains at positions 155 and 156. It was also noticed that glycine 166 occupies a position analogous to that of the site thought to determine P1 specificity in chymotrypsin and trypsin, position 189. As a glycine residue lacks any side chain, the site is open rather than closed as in chymotrypsin and trypsin. We use the nomenclature proposed by Berger and Schechter (23) to

Figure 3 The model of binding of the synthetic substrate, succinyl–Ala–Ala–Pro–phenylalaninyl–para-nitroanilide. The model was deduced from substrate and product complexes of numerous subtilisin BPN′ variants. The location of residues forming the catalytic triad is indicated.

designate the specific subsite for binding polypeptide substrates. Modeling experiments have shown that a side chain would close this pocket and would increase the potential contacts available for interacting with the P1 side chain. This was borne out when all 19 substitutions were made and one of these (asparagine) resulted in an enzyme showing increased catalytic activity. Another substitution, a lysine for glycine, resulted in a 1000-fold increase relative to the parent enzyme for substrates having glutamic acid at the P1 position. Thus, a change at even a single position can radically alter the specificity at the P1 position. Knowledge of the three-dimensional structure, coupled with the analysis of site-specific substitutions, suggests that position 166 can accommodate many different side chains without perturbing the tertiary structure, and hence the function, of the enzyme. The subtilisins from B. amyloliquefaciens (subtilisin BPN′) and B. licheniformis (subtilisin Carlsberg), which differ at 89 of a possible 275 amino acids in their sequences, display a number of different kinetic properties. In addition to the specificity differences for negatively charged amino acids, the two enzymes display a 10-fold difference in kcat, the turnover number, for a synthetic substrate. For example, with the substrate succinyl–alanine–alanine–proline–phenylalanine–para-nitroanilide, subtilisin BPN′ has a kcat value of 50 turnovers/s, whereas subtilisin Carlsberg has a kcat of 510 turnovers/s (24). Although subtilisin Carlsberg differs from subtilisin BPN′ at 89 positions, few of these differences are near the active site of the enzyme or the substrate-binding site. The initial attempt to recruit the substrate specificity and turnover properties by replacing three of the amino acid differences that were found in the substrate-binding site was highly successful (24). The three substitutions were Glu156→Ser, Gly169→Ala, and Tyr217→Leu.
These three changes were shown to successfully recruit both the substrate specificity and the turnover rate (kcat) of subtilisin Carlsberg into subtilisin BPN′. A crystallographic analysis confirmed that these substitutions resulted in a structure that was highly conserved and that the side chains at positions 156 and 217 adopted conformations identical to those found for the same amino acid side chains in subtilisin Carlsberg. The introduction of the side chain at position 169 was also accommodated without any conformational change. The increase in turnover number was later found to result largely from a single amino acid substitution, Tyr217→Leu. Thus, with a single amino acid change, a 10-fold increase in turnover number was introduced into subtilisin BPN′. Analysis of the structure of the enzyme in complex with reaction products revealed a binding pattern identical to that seen for the native enzyme. Based on the structural data, the increase in the turnover number appeared not to be the result of altered substrate binding, but rather to be due to the removal of steric hindrance to the reactions involved in the rate-limiting acylation step. These studies showed that the net removal of only

four nonhydrogen atoms in a molecule containing 1880 atoms can have very dramatic effects. Another example of how such subtle changes can influence the performance of a variant was seen in a different subtilisin, from B. lentus. This enzyme differs from subtilisin BPN′ at 103 positions, of which six are deletions, resulting in a molecule consisting of 269 amino acid residues. Nevertheless, the subtilisin from B. lentus shares a common, highly conserved folding pattern with subtilisin BPN′. B. lentus subtilisin already has a leucine at position 217 and has an even higher turnover number for equivalent synthetic substrates than subtilisin Carlsberg or the Tyr217→Leu variant of subtilisin BPN′. An engineered variant having three substitutions, Asn76→Asp (which also occurs in subtilisin Carlsberg), Ser103→Ala, and Val104→Ile, was found to be more than twofold more effective in detergent applications than the native enzyme. Here the difference involved replacing nitrogen with oxygen, removing a hydroxyl oxygen, and adding a methyl carbon. As might by now have been expected, the three-dimensional structure showed very subtle changes resulting from the resculpting of the substrate-binding surface by two atoms and from another atomic replacement near the tight calcium site shared by all three subtilisins: subtilisin BPN′, subtilisin Carlsberg, and B. lentus subtilisin. The analysis of significant changes showed that few arise from the very subtle structural differences between the variant and the parent enzymes. However, this variant displayed a significant difference in the overall flexibility of the segment involved in the substrate-binding site, which had become more flexible (25). The method for determining this will be discussed below. The increased flexibility has been verified independently in the NMR structures of the native and variant enzymes (26).
The relaxation rates of amide nitrogens indicate increased flexibility in several regions, including one side of the substrate-binding site, as predicted by the variation in average temperature factors from the crystallographic structure. Because there are very few positional differences of any significance arising from the very subtle structural changes described above, it appears that the increase in flexibility may be a factor in the increased turnover number and the variant's increased performance. Sometimes the largest changes that arise as a consequence of site-specific substitutions do not affect the conformation at the site of substitution but rather neighboring side chains or molecules such as the solvent. Often the solvent molecules fill what would otherwise be cavities within the molecule or crevices along the surface. In some cases, solvent molecules form channels that either contribute to the function of the protein or serve as space holders, surrogates for products or substrates that must pass through a channel or enter a cavity. Two examples of the former are found in bacteriorhodopsin (27) and cytochrome b6f (28).

Bacteriorhodopsin links the photoisomerization of the all-trans retinal chromophore to the 13-cis,15-anti isomer with unidirectional proton transfer. Extensive mutagenesis studies identified a series of mutants that block this process, making it possible to link the protonation of specific side chains with particular spectroscopic states (29). A series of structures of two site-specific mutants, E204Q and D96N, both of which were found to interrupt the photocycle at roughly the same state (either early or late M state) (27), was determined in both the resting and M states, where the retinal was still photoisomerized, and compared. The mutations lie at opposite ends of the solvent channel leading to the retinal, which is covalently linked to Lys216: E204Q is in the extracellular region and D96N is in the cytoplasmic region. Comparing the structures of these variants showed nearly identical ground states but highlighted subtle changes in conformation that were the consequences of the site-specific mutations at either the cytoplasmic or extracellular face of the molecule. The structures of E204Q in the resting and M states, examined in the cytoplasmic region and compared with the consequences of the D96N mutation (30), could be used to distinguish changes in the cytoplasmic region that resulted from the difference between the resting and M states from the consequences of D96N, which would be overlaid on the difference seen for E204Q. Likewise, the structures of the D96N variant in the resting and M states were compared with those of E204Q in the extracellular region to distinguish shifts in the extracellular region due to the difference between the resting and M states, onto which the structural consequences of E204Q would be overlaid.
Thus, in this study, different site-specific mutations, each contributing subtle local perturbations, were used to filter out the consequences of the site-specific substitutions themselves and obtain an unbiased comparison of the structural changes between the resting and M states. Each substitution prevented a key protonation event necessary for the continuation of the photocycle to the next relaxation step, which involves a coordinated transfer of a proton along the channel. It was noted that in the M state of E204Q, there appeared to be a nascent solvent channel partially formed to facilitate this event. Cytochrome b6f from Chlamydomonas reinhardtii was modified to remove residues hydrogen bonding to the internal solvent channel. These mutants displayed similar phenotypes, all manifesting a decreased rate of reduction, and the organism carrying the most impaired mutant, N168F, could not grow phototrophically (28). The three-dimensional structures of the three mutants (N168F, Q158L, and N153Q) were determined and compared to the native protein (28). N168F was determined in a different crystal form (P2₁ instead of P2₁2₁2₁) and produced the highest resolution data (1.6 Å). Structural analysis showed that the N168F mutant had a loss of two of the

five internal solvent molecules, which correlated with the pronounced decrease in the reduction rate and phototrophic growth. Smaller disruptions were seen in the Q158L and N153Q mutants, which had shifts in one of the five internal solvent molecules. In summary, the effect of a site-specific mutation can also be indirect, affecting either neighboring side chains or ordered water rather than producing an immediate shift in the residue itself.

8 RELATING STRUCTURE TO FUNCTION: NEW STRATEGIES AND TECHNIQUES

In all of the instances cited above, there is very close agreement in the overall structure of the variant and parent proteins. In most instances, very subtle structural changes or an absence of structural changes have been linked to the changes in performance of the variant with respect to the parent protein. This, in turn, has raised the question: Are other subtle changes within the variant that might contribute to the altered function being missed? Intuitively, we expect that the remainder of the protein must affect the functioning of the enzyme by stabilizing, shielding, or otherwise modulating the interaction of particular amino acid residues. To explore this concern, one must be able to determine and measure what significant changes have occurred in the variant structure. The problem is that these may be subtle shifts within highly ordered structure that are smaller than insignificant shifts in the more variable parts of the structure, and they may also vary with the resolution range of the data collected. As such, they are not likely to appear in difference electron density maps, or might be dismissed as noise peaks if some density did appear. The first step would be to evaluate the empirical error between coordinate sets. A number of relatively trivial metrics have been and are being used to estimate error in protein structures. One of the most favored is to take the diagonal terms from the last cycle of refinement as a measure of the residual “error.” These terms reflect what the shift for a particular atom would be in the next cycle of refinement and, as such, are a better measure of the convergence of the refinement than of the error. Often these values appear very low, ranging below 0.1 Å for coordinates derived from an electron density map at 2.0 Å resolution. Such estimates are also restricted to internal error, measuring how well the model agrees with a single data set.
This can be contrasted with external error, which is the variation in coordinates determined from independent data sets. Put another way, would the structure be the same if it were determined a second time from an independent data set? A second approach that does address external error is to determine the rms variation between sets of coordinates, usually for the most well-ordered atoms, such as the main chain atoms or sometimes only the Cα

atoms. This gives an estimate of one standard deviation if we assume that the mean experimental error is precisely zero. However, this is not the case, nor is it a really desired outcome, because one standard deviation does not conform to the statistical 95% confidence level. Instead, it would be more useful to know the actual mean error along with the variation of the error about the mean. In this way, one could establish confidence criteria based on an analysis of variance.

9 IDENTIFYING STATISTICALLY SIGNIFICANT DIFFERENCES

Such a method (31) has been applied to variants of subtilisin. This method is empirical, relying on the distances between equivalent atoms taken as a function of the average temperature factor. When the logs of the distances between equivalent atoms are plotted against the temperature factors of those atoms, a linear distribution is found (Fig. 4). The temperature factor is a

Figure 4 A plot of the log of the distance between equivalent atoms after molecular superposition as a function of the refined crystallographic temperature factor. A linear regression fit of the mean error is drawn as a solid line, with the variance at 2σ plotted as dashed lines.

refined parameter that models the relative flexibility of individual atoms within a protein molecule. Atoms in the interior of the molecule are generally less flexible and will have lower temperature factors than atoms of residues located on the surface. Linear regression is used to determine the mean error, which is not zero, and the variance about the mean. The equations for the mean and variance are plotted in Fig. 4 as solid and dashed lines. What is obtained is a linear function for the mean error as a function of the temperature factor that can be applied to all atoms. We find that in all instances so far examined, the atoms with low temperature factors, coming from the more ordered parts of the structure, tend to have a lower overall mean error, whereas atoms with high temperature factors have higher mean errors compounded by higher variances as well. So what might be a significant difference between well-ordered atoms in the parent and variant proteins might not be significant for the more disordered segments, and these can be differentiated as a function of the crystallographic temperature factor. In practice, a residue Z-score can be computed using the equations for the mean error and variance as a function of temperature factor, and those residues having a Z-score of ≥3.0 are taken to be significantly different. In this way, differences in residues neighboring the substituted side chains have been detected. Moreover, one can also have more confidence in the identification of significant structural differences that are potentially important determinants of altered function in the variant protein. The advantage of this approach lies in variants where the substituted side chain is within the well-ordered regions of the protein structure. Significant differences detected for well-ordered segments would otherwise be overshadowed by nonsignificant variation in disordered segments and may well fall below the overall rms for all atoms in the structure.
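The regression and Z-score described above can be sketched in a few lines. This is only an illustration of the idea, not the published procedure (31): it assumes base-10 logarithms, an ordinary least-squares fit, a single residual standard deviation in log space, and per-atom rather than per-residue scores. The function names `fit_error_model` and `z_scores` are hypothetical, and all distances are assumed to be greater than zero.

```python
import numpy as np

def fit_error_model(dist, bfac):
    """Fit log10(distance) = intercept + slope * B by least squares.

    dist: distances (Å, > 0) between equivalent atoms after superposition.
    bfac: average temperature factor of each atom pair.
    Returns slope, intercept, and the residual standard deviation."""
    dist = np.asarray(dist, dtype=float)
    bfac = np.asarray(bfac, dtype=float)
    log_d = np.log10(dist)
    slope, intercept = np.polyfit(bfac, log_d, 1)
    resid = log_d - (intercept + slope * bfac)
    sigma = resid.std(ddof=2)  # two fitted parameters
    return slope, intercept, sigma

def z_scores(dist, bfac, slope, intercept, sigma):
    """Standardized deviation of each log-distance from the fitted mean error."""
    dist = np.asarray(dist, dtype=float)
    bfac = np.asarray(bfac, dtype=float)
    expected = intercept + slope * bfac
    return (np.log10(dist) - expected) / sigma
```

Atoms whose score reaches the cutoff quoted in the text (Z ≥ 3.0) would then be flagged, for example with `z_scores(...) >= 3.0`. Because the expected error rises with the temperature factor, the same displacement can be flagged in a well-ordered region and pass unremarked in a disordered one.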
Using this approach, one can systematically identify significant changes arising from the substituted side chain(s) throughout the entire molecule.

10 MEASUREMENT OF ALTERED FLEXIBILITY

The strategy employed above can easily be extended to the analysis of other structural properties such as flexibility. The central basis of the previous approach was to construct a means for doing an analysis of variance between closely related observations. In the case of Bott and Frane (31), the plot of the log of the distance versus the temperature factor resulted in a linear distribution. In this plot, all data could be subjected to a linear regression analysis to obtain a function based on all atoms that could be applied to any pair. To analyze flexibility, we realized that the main chain atoms would, by virtue of the covalent linkage and restraints of the peptide bond, have similar flexibility. Thus, by computing a rolling average, the variation of flexibility,

as measured by the crystallographic temperature factor, would be a gradually varying value, and this average could again be compared between the parent and variant proteins. This would generate a sufficiently large set of observations to construct a mean value of the error between any pair. The variance about the mean error seen for all the pairwise comparisons of rolling averages between corresponding residue sets in the parent and variant structures can also be constructed. The validation of this approach came from the comparison of the same variant and parent proteins by x-ray crystallography and NMR (26). The same segment was identified by both techniques as being more flexible in the subtilisin variant N76D/N87S/S103A/V104I (DSAI) described above relative to the parent enzyme.
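The rolling-average comparison can be illustrated as follows. This is a sketch under stated assumptions rather than the published method: each residue is represented by one main-chain temperature factor (for instance the mean over its N, Cα, C, and O atoms), the window length of five residues is arbitrary, and the differences are standardized against their own mean and standard deviation. The names `rolling_mean` and `flexibility_z` are hypothetical.

```python
import numpy as np

def rolling_mean(values, window=5):
    """Rolling average along the chain; element i covers residues i..i+window-1."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

def flexibility_z(b_parent, b_variant, window=5):
    """Standardized difference in rolling-average temperature factor
    (variant minus parent) for each window of residues."""
    diff = rolling_mean(b_variant, window) - rolling_mean(b_parent, window)
    return (diff - diff.mean()) / diff.std(ddof=1)
```

Windows with large positive scores correspond to segments that are more flexible in the variant than in the parent. In practice, the two structures would first need their temperature factors placed on a comparable scale, a step this sketch leaves out.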

11 MEASURING CONFORMATIONAL CHANGES

The initial presumption set out at the onset of this chapter was that structure is determined by juxtaposition of disparate atoms into a functionally important conformational change between a variant and the parent protein. The extension of this presumption was that an analysis of variant structure would identify conformational changes arising from the introduction of specific substitutions. In many regards, the quantitation of error and flexibility is relatively trivial compared to the quantitation of conformational change. Both involve simple two-parameter relationships: the degree of variation versus a particular temperature factor, or versus the linear position of a residue in the polypeptide chain. In the analysis of conformational change, by contrast, one looks for the simultaneous displacement of atoms in amino acid residues that are not contiguous. Although each displacement may individually fall below the threshold of statistical significance with regard to the coordinates, the conformational change as a whole is significant. Knowledge of what conformational changes occur and the relative magnitude of these changes should be considered in relating structural and functional changes. This problem is not unique to the analysis of site-specific mutations. The question of what occurs upon substrate binding to facilitate catalysis has been an equally vexing problem. One solution was put forward by Bystroff and Kraut (32) to analyze the ligand-induced changes in dihydrofolate reductase. Their strategy was to examine “distance difference” or DD plots. These were based on the distance plot developed by Ooi and Nishikawa (33), which plots on a two-dimensional grid the Cα–Cα distances between residues (residues 1–1, 1–2, 1–3, 1–4, etc. by 2–1, 2–2, 2–3, etc.) within each protein molecule. Such a plot has diagonal symmetry, with diagonal values (1–1, 2–2, 3–3, . . . , n–n) being zero.
In such a plot, residues involved in tertiary interactions will be in close proximity, and certain secondary structure features can easily be recognized as patterns of short contacts. Bystroff and Kraut recognized that if the differences in these distances were plotted instead, then such a map would

immediately identify conformational changes occurring between two crystallographic structures, such as an enzyme alone versus one complexed with a substrate or substrate analog. Using the DD plot, it was possible to identify shifts of a contiguous series of residues relative to the molecule as a whole and to differentiate them from the movement of whole domains. An extension of this strategy has been employed in the analysis of conformational changes arising from site-specific mutagenesis (34). One very attractive feature of the DD plot is that it is internally consistent and does not depend on the alignment of the variant and parent proteins, because only intramolecular distances are being compared. For a protein of 250 amino acids, for example, there will be 250 × 250 intramolecular differences, of which 250 would be trivial self-contacts. These could be used to establish a mean deviation and variance. This, in turn, could be used in an analogous manner to determine significant conformational shifts within the molecule. These shifts, taken together with the analysis of significant changes in positional coordinates and altered flexibility, provide the means of pinpointing the probable structural changes that provide the basis for the altered performance of the variant protein.
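A distance-difference matrix of the kind described above is simple to compute. The sketch below assumes two (N, 3) arrays of Cα coordinates in Å with residues in the same order. The Z-score step and its cutoff of 3.0 are borrowed from the coordinate analysis earlier in the chapter as an illustrative assumption, since the extension to mutants is cited only as unpublished work (34). The function names are hypothetical.

```python
import numpy as np

def distance_matrix(ca):
    """All pairwise Cα–Cα distances within one molecule (N x N, zero diagonal)."""
    ca = np.asarray(ca, dtype=float)
    delta = ca[:, None, :] - ca[None, :, :]
    return np.sqrt((delta ** 2).sum(axis=-1))

def distance_difference(ca_parent, ca_variant):
    """DD matrix: intramolecular distances in the variant minus the parent."""
    return distance_matrix(ca_variant) - distance_matrix(ca_parent)

def significant_pairs(dd, z_cut=3.0):
    """Residue index pairs (i < j) whose distance change is an outlier
    relative to the mean and standard deviation of all off-diagonal entries."""
    i, j = np.triu_indices(dd.shape[0], k=1)
    vals = dd[i, j]
    z = (vals - vals.mean()) / vals.std(ddof=1)
    keep = np.abs(z) >= z_cut
    return list(zip(i[keep].tolist(), j[keep].tolist()))
```

Because only intramolecular distances enter the calculation, rotating or translating either structure leaves the DD matrix unchanged, which is the alignment independence noted in the text. Each pair appears twice in the full matrix, so the sketch uses only the upper triangle for the statistics.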

12 WHAT DOES THE FUTURE HOLD?

We have seen in these limited examples a consistent theme for site-specific mutation: the changes are usually localized and often subtle. Although numerous techniques have been and continue to be developed, one of the most pressing needs is to obtain the most precise structural information possible. We have seen that there have been efforts to link the results of x-ray crystallography and NMR. NMR can provide very precise information if sufficient spectra can be acquired for the particular residue in question. Combining NMR results with crystallographic results should expand the ability to probe detailed interactions and provide additional insights into the nature of subtle changes, altered charge interactions, and molecular motion. Similar developments are occurring in x-ray crystallography, particularly in the preparation of crystals for data collection. Cryocooling has become an increasingly routine way to increase crystal lifetime in x-ray beams, allowing the collection of diffraction data from smaller crystals. In the past few years, it has also proven to be a useful tool for extending resolution in larger crystals (35). Crystals of subtilisin from B. lentus, when data are collected at room temperature, routinely diffract to resolutions between 1.8 and 1.6 Å (18–20). When cryocooled to 100 K, the same crystals have been found to diffract to a resolution of 0.78 Å (36). At 0.78 Å resolution, it is possible to differentiate the elements on the basis of electron density so that, for example, the correct orientation of histidine can be determined rather than inferred, and charge can be inferred from the presence of doubly and singly bonded carboxyl CO

bonds of aspartic and glutamic acid side chains. Hydrogen atoms can be visualized for well-ordered atoms and, by their presence or absence, can be used to identify charged atoms. These advances will no doubt bring us closer to understanding not only the consequences of particular site-specific substitutions but also the overall relationship between the structure and function of the molecules responsible for and regulating metabolism.

REFERENCES

1. JA Wells, DA Estell. Subtilisin – an enzyme designed to be engineered. Trends Biochem Sci 13:291–297, 1988.
2. BW Matthews. Studies on protein stability with T4 lysozyme. Advances in Protein Chemistry on “Protein Stability”. New York: Academic Press, 1995, pp 249–278.
3. A Shaw, R Bott, AG Day. Protein engineering of alpha-amylase for low pH performance. Curr Opin Biotechnol 10:349–352, 1999.
4. JR Martin, FAA Mulder, Y Karimi-Nejad, J van der Zwan, M Mariani, D Schipper, R Boelens. The solution structure of serine protease PB92 from Bacillus alcalophilus presents a rigid fold and flexible substrate-binding site. Structure 5:521–532, 1997.
5. G Rhodes. Crystallography Made Crystal Clear: A Guide for Users of Macromolecular Structure. San Diego, CA: Academic Press, 1993.
6. TL Blundell, LN Johnson. Protein Crystallography. New York: Academic Press, 1976.
7. D Blow. Outline of Crystallography for Biologists. Oxford: Oxford University Press, 2002.
8. T Gallagher, J Oliver, R Bott, C Betzel, GL Gilliland. Subtilisin BPN′ at 1.6 Å resolution: analysis of discrete disorder and comparison of crystal forms. Acta Crystallogr, D Biol Crystallogr 52:1125–1135, 1996.
9. GN Ramachandran, C Ramakrishnan, V Sasisekharan. Stereochemistry of polypeptide chain conformations. J Mol Biol 7:95–99, 1963.
10. HM Berman, J Westbrook, Z Feng, G Gilliland, TN Bhat, H Weissig, IN Shindyalov, PE Bourne. The protein data bank. Nucleic Acids Res 28:235–242, 2000.
11. CL Nobbs, HC Watson, JC Kendrew. Structure of deoxymyoglobin: a crystallographic study. Nature 209:339–341, 1966.
12. MF Perutz. X-ray analysis of hemoglobin. Science 140:863–869, 1963.
13. CCF Blake, DF Koenig, GA Mair, ACT North, DC Phillips, VR Sarma. Structure of hen egg-white lysozyme: a three-dimensional Fourier synthesis at 2.0 Å resolution. Nature 206:757, 1965.
14. HW Wyckoff, KD Hardman, NM Allewell, T Inagami, LN Johnson, FM Richards. The structure of ribonuclease-S at 3.5 Å resolution. J Biol Chem 242:3984–3988, 1967.

15. JL Dauberman, G Ganshaw, C Simpson, TP Graycar, S McGinnis, R Bott. Packing selection of Bacillus lentus subtilisin and a site-specific variant. Acta Crystallogr, D Biol Crystallogr 40:650–656, 1994.
16. J Navaza. AmoRe: an automated package for molecular replacement. Acta Crystallogr A 50:157–163, 1994.
17. AT Brunger, PD Adams, GM Clore, WL DeLano, P Gros, RW Grosse-Kunstleve, J-S Jiang, J Kuszewski, M Nilges, NS Pannu, RJ Read, LM Rice, T Simonson, GL Warren. Crystallography and NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr, D Biol Crystallogr 54:905–921, 1998.
18. C Betzel, S Klupsch, G Papendorf, S Hastrup, S Branner, KS Wilson. Crystal structure of the alkaline protease savinase from Bacillus lentus at 1.4 Å resolution. J Mol Biol 223:427–445, 1992.
19. JM van der Laan, AV Teplyakov, H Kelders, KH Kalk, O Misset, LJSM Mulleners, BW Dijkstra. Crystal structure of the high alkaline serine protease PB92 from Bacillus alcalophilus. Protein Eng 5:405–411, 1992.
20. R Bott, J Dauberman, R Caldwell, C Mitchinson, L Wilson, B Schmidt, C Simpson, S Power, R Lad, IH Sagar, T Graycar, D Estell. Using structural comparison as a guide in protein engineering. Ann NY Acad Sci 672:10–19, 1992.
21. M Blaber, X Zhang, BW Matthews. Structural basis of amino acid α helix propensity. Science 260:1637–1640, 1993.
22. X Zhang, JA Wozniak, BW Matthews. Protein flexibility and adaptability seen in 25 crystal forms of T4 lysozyme. J Mol Biol 250:527–552, 1995.
23. A Berger, I Schechter. Mapping the active site of papain with the aid of peptide substrate and inhibitors. Philos Trans R Soc Lond, B 257:249–264, 1970.
24. JA Wells, BC Cunningham, TP Graycar, DA Estell. Recruitment of substrate-specificity properties from one enzyme into a related one by protein engineering. Proc Natl Acad Sci USA 84:5167–5171, 1987.
25. T Graycar, M Knapp, G Ganshaw, J Dauberman, R Bott. Engineered Bacillus lentus subtilisin having altered flexibility. J Mol Biol 292:97–109, 1999.
26. FA Mulder, D Schipper, R Bott, R Boelens. Altered flexibility in the substrate-binding site of related native and engineered high-alkaline Bacillus subtilisins. J Mol Biol 292:111–123, 1999.
27. H Luecke, B Schobert, J-P Cartailler, H-T Richter, A Rosengarth, R Needleman, JK Lanyi. Coupling photoisomerization of retinal to directional transport in bacteriorhodopsin. J Mol Biol 300:1237–1255, 2000.
28. G Sainz, CJ Carrell, MV Ponamarev, GM Soriano, WA Cramer, JL Smith. Interruption of the internal water chain of cytochrome f impairs photosynthetic function. Biochemistry 39:9164–9173, 2000.
29. LS Brown. Proton transport mechanism of bacteriorhodopsin as revealed by site-specific mutagenesis and protein sequence variability. Biochemistry (Moscow) 66:1249–1255, 2001.
30. H Luecke, B Schobert, J-P Cartailler, H-T Richter, JK Lanyi. Structural changes in bacteriorhodopsin during ion transport at 2.0 Å resolution. Science 286:255–260, 1999.


31. R Bott, J Frane. Incorporation of crystallographic temperature factors in the statistical analysis of protein tertiary structures. Protein Eng 3:649–657, 1990.
32. C Bystroff, J Kraut. Crystal structure of unliganded Escherichia coli dihydrofolate reductase. Ligand-induced conformational changes and cooperativity in binding. Biochemistry 30:2227–2239, 1991.
33. T Ooi, K Nishikawa. Conformation of Biological Macromolecules and Polymers. New York: Academic Press, 1973.
34. R Bott. Unpublished results.
35. UK Genick, SM Soltis, P Kuhn, IL Canestrelli, ED Getzoff. Structure at 0.85 Å of an early protein photocycle intermediate. Nature 392:206–209, 1998.
36. P Kuhn, M Knapp, SM Soltis, G Ganshaw, M Thoene, R Bott. The 0.78 Å structure of a serine protease: Bacillus lentus subtilisin. Biochemistry 39:13446–13452, 1998.

4 Quantitative Modeling of Lipase Enantioselectivity

Jürgen Pleiss
University of Stuttgart, Stuttgart, Germany

1 INTRODUCTION

Lipases are versatile tools in the hands of organic chemists (1,2). They are used to hydrolyze ester bonds of a variety of nonpolar substrates with high activity, regioselectivity, and stereoselectivity. Moreover, they are used to catalyze the reverse reaction in nonpolar solvents. The reaction can be optimized by changing substrate structure, solvent, additives, water activity, pressure, temperature, immobilization method, and, since recombinant lipases became available, the biocatalyst itself (3–5). Optimization of reaction conditions is still a trial-and-error process of screening a high-dimensional parameter space. The role of the sequence and structure of the biocatalyst has been studied by x-ray analysis, site-directed mutagenesis, and, more recently, random mutagenesis in directed evolution experiments. Based on these experimental studies, empirical rules to predict the fast-reacting enantiomer were derived for secondary alcohols (6), primary alcohols (7), and carboxylic acids (8). These rules are highly useful to organic chemists, but they cannot predict quantitative properties such as the E value, nor the effect of changes in the reaction conditions or in the biocatalyst itself.

Since the first x-ray structures of lipases became available (9,10), lipase–substrate interactions have been studied at the molecular level. Structure data confirmed the catalytic mechanism, which is similar to that of serine proteases, and identified the catalytic machinery: a catalytic triad (serine, histidine, and aspartic or glutamic acid) and an oxyanion hole. Soon it became evident that lipases may crystallize in two conformations, a closed form or an open form (9,11–13). These conformations differ in the position of the lid and the geometry of the oxyanion hole. In aqueous solution, the equilibrium is shifted toward the closed, inactive form, while near a hydrophobic substrate interface, the open, active form is stabilized (14). In this open form, the binding site is fully exposed. Several hydrophobic binding patches were observed: a patch that binds the acid moiety, mostly a medium- or long-chain fatty acid, and at least two more patches that bind the alcohol moiety.

Currently, the sequence information of several thousand lipases is deposited, but only 30 lipases and serine esterases have known structures. Although lipases have no global sequence similarity, they share a similar architecture, the α/β hydrolase fold (15), and they follow the same catalytic mechanism. Many lipases show enantiorecognition toward chiral alcohols or carboxylic acids. For chiral secondary alcohols, x-ray data have revealed the structural basis of enantiorecognition by Candida rugosa lipase (16), which supports an empirical rule for the prediction of enantiopreference (6). For Pseudomonas cepacia lipase, the structural basis of stereoselectivity toward triacylglycerol analogs was investigated (17). Based on these structural data, computer-aided modeling is a promising method to establish quantitative predictions of enantioselectivity that apply to a broad range of lipases and substrates.
Lipase enantioselectivity is a promising target for modeling because 1) lipases are highly active toward a broad range of substrates under a variety of reaction conditions, act as monomers, and need no cofactors; 2) many sequence and structure data on lipases are available, many of the structures in complex with a substrate-analogous inhibitor; 3) interactions between protein and substrate are expected to be dominated by shape complementarity, with hydrophobic substrates binding to a hydrophobic binding site, and induced-fit effects upon binding of an inhibitor to the open form of a lipase are limited to side-chain movements; 4) as enantioselectivity measures the ratio of kcat/Km toward the two enantiomers, only the transition state complexes have to be compared; in contrast to substrate specificity, differences in properties such as size, solubility, diffusion, or interactions of the Michaelis complexes play no role in enantioselectivity (18–20).
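Point 4 can be made concrete with a short numerical sketch. It assumes the standard transition-state relation between the E value and the difference in activation free energy of the two enantiomers, ΔΔG‡ = RT ln E, which the chapter does not write out explicitly. The function names and the default temperature of 298.15 K are illustrative assumptions, not part of the original text.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)

def ddg_from_e(e_value, temperature_k=298.15):
    """Return ddG (kJ/mol) between the two transition states for a given E.

    Assumes E = (kcat/Km)_fast / (kcat/Km)_slow and ddG = R*T*ln(E).
    """
    if e_value <= 0:
        raise ValueError("E must be positive")
    return R * temperature_k * math.log(e_value) / 1000.0

def e_from_ddg(ddg_kj_mol, temperature_k=298.15):
    """Inverse of ddg_from_e: E value for a given ddG in kJ/mol."""
    return math.exp(ddg_kj_mol * 1000.0 / (R * temperature_k))

if __name__ == "__main__":
    for e in (20, 100):
        print(f"E = {e:>3}: ddG = {ddg_from_e(e):.2f} kJ/mol")
```

Under these assumptions, E = 20 and E = 100 correspond to roughly 7.4 and 11.4 kJ/mol at room temperature, which gives a sense of how small the energy differences are that a quantitative model has to resolve.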

2 SEQUENCE AND STRUCTURE SIMILARITIES: LIPASE ENGINEERING DATABASE

2.1 Annotation of the Catalytic Machinery

The Lipase Engineering Database (http://www.led.uni-stuttgart.de) (21) was established as a repository that integrates information on sequence, structure, and function of lipases and makes it available for protein engineering studies. Currently, it includes 92 sequences from the Swiss-Prot sequence database (22). Based on sequence similarity, each sequence is assigned to one of 32 homologous families, which are grouped into 15 superfamilies. For each family, multisequence alignments have been performed. These are annotated with information on amino acids that are relevant to function (catalytic triad, oxyanion hole, lid, substrate binding site), information on structure (secondary structure, disulfide bridges), and information on the effect of mutations. Nine of the thirty-two homologous families include a member with a known 3-D structure. Fifty-two structures of twenty different lipases are superposed and consistently annotated. The [G,A,T]-x-S-x-G motif around the catalytic serine is the only sequence motif common to all lipases and esterases. Therefore the other residues of the catalytic machinery, the catalytic H–D/E pair and the residues of the oxyanion hole, can only be identified from their three-dimensional structure. For all nine superfamilies in which one member has a known structure, this assignment can be performed with high reliability (21). Thus for 91% of all sequences in the Lipase Engineering Database, the catalytic machinery is completely annotated, compared to 48% in the Swiss-Prot database (November 1999); in addition, in four homologous families, the annotation of the catalytic histidine had to be corrected.
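As a minimal sketch of how the [G,A,T]-x-S-x-G motif could be located in a sequence, the following example uses a regular expression. It is not the annotation procedure of the Lipase Engineering Database; the function name and the example fragment are hypothetical. A hit only marks a candidate catalytic serine, since, as noted above, the remaining catalytic residues have to be assigned from the three-dimensional structure.

```python
import re

# Consensus from the text: [G,A,T]-x-S-x-G, where S is the catalytic serine.
# A lookahead is used so that overlapping occurrences are also reported.
MOTIF = re.compile(r"(?=([GAT].S.G))")

def find_serine_motifs(sequence):
    """Return (motif, 1-based position of the central serine) for each hit."""
    hits = []
    for match in MOTIF.finditer(sequence.upper()):
        # match.start() is the 0-based start of the motif; the serine is its
        # third residue, so its 1-based position is start + 3.
        hits.append((match.group(1), match.start() + 3))
    return hits

if __name__ == "__main__":
    # Hypothetical fragment for illustration only, not a real lipase sequence.
    fragment = "MKLAGHSMGGAL"
    print(find_serine_motifs(fragment))
```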

2.2 Shape of the Binding Site and Chain-Length Specificity

In all lipases, the binding site is a deep, hydrophobic pocket of varying shape. Lipases can be classified into three groups: 1) lipases that bind the scissile fatty acid in a long hydrophobic crevice near the surface (lipases from filamentous fungi); 2) lipases with a deep, funnel-type binding site (pancreatic lipases, Pseudomonas lipases, and lipase B from Candida antarctica); and 3) lipases that bind the scissile fatty acid in a tunnel deep in the protein and the alcohol moiety in a flat region near the protein surface (lipases from C. rugosa and Geotrichum candidum). The shapes of the binding sites can be used to interpret the biochemical properties of the enzymes (23). While the alcohol binding site of C. antarctica lipase B is located at the wall of a deep and narrow funnel, it is readily accessible in the lipase from C. rugosa. This may explain why C. antarctica lipase B is frequently used to resolve racemic mixtures of small secondary alcohols with high enantioselectivity, while C. rugosa lipase is used for bulky secondary alcohols with ring structures (24). The shape and size of the scissile fatty acid binding site determine the chain-length profile of the lipase: C. antarctica lipase B, which prefers short- and medium-chain fatty acids, binds the scissile fatty acid at the wall of its narrow, 6-Å-long funnel, while the long-chain-specific Rhizomucor miehei lipase has a 10-Å-long hydrophobic crevice (23). In the homologous Rhizopus lipases, this crevice has been blocked by point mutations, thus shifting the specificity profile toward short-chain fatty acids (25,26).

2.3 Classification by Conserved Structural Elements

The systematic comparison of sequence and structure of all microbial lipases demonstrated that, despite their variability in sequence and structure, they can be assigned to one of two classes derived from the structure of the oxyanion hole (21): the GGGX type (with G binding to the oxyanion), which includes mostly carboxylesterases, and the GX type, which includes all other lipases that bind the oxyanion via the backbone nitrogen of a hydrophobic or hydrophilic residue (denoted X). This structure-based classification seems to have direct implications for substrate specificity: While none of the GX-type lipases accept esters of tertiary alcohols, most GGGX-type lipases hydrolyze these substrates with moderate enantioselectivity (26a).

3 A QUANTITATIVE MODEL OF ENANTIOSELECTIVITY

Secondary alcohols are industrially important optically active intermediates. Racemic resolution or enantioselective acylation catalyzed by P. cepacia lipase has been well studied because of the high enantioselectivity of the enzyme toward a broad range of substrates. However, for some substrates, enantioselectivity is low despite optimization of reaction conditions.

The catalytic machinery of P. cepacia lipase (catalytic triad S87–H286–D264 and oxyanion hole L17–Q88) is located at the bottom of the funnel-like substrate binding site. The acid and the alcohol moieties of the substrate bind to the wall of the funnel, where five structural elements can be distinguished: 1) the hydrophobic crevice, which binds the acid moiety (23); 2) the hydrophobic dent (27) (side chains of L248, L287, V266, and backbone atoms of the catalytic H286), which binds the large substituent of the alcohol moiety; 3) the entrance to the hydrophilic trench (side chains of T18, Y29, H86, L287, I290), which binds the medium-sized substituent of the alcohol moiety; 4) the hydrophilic trench itself, toward which the side chains of Y29 and L287 open and which consists of hydrophilic and hydrophobic side chains (T18, L27, Y29, H86, L287, Q292, I290, L293) and the backbone atoms of Y29; 5) two rigid structures, the oxyanion stop (backbone atoms of L17 and T18) near the oxyanion hole and the His stop (side chain of H86 and backbone of the catalytic H286) near the hydrophobic dent.

Figure 1 Surface of the substrate binding site of Pseudomonas cepacia lipase (hydrophilic and hydrophobic side chains) in complex with the fast-reacting enantiomer of a substrate in the productive binding mode; the alcohol moiety is a chiral secondary alcohol with hydrogen (light gray) and two substituents L and M at the stereo center: the large substituent (L) binds to the hydrophobic dent, the medium-sized substituent (M) near the entrance to the hydrophilic trench; the scissile fatty acid (R) binds to the hydrophobic crevice (Ref. 27).

The two enantiomers of 30 chiral secondary alcohol substrates for which experimental E values have been published (24) were manually placed in the substrate binding site of P. cepacia lipase and relaxed by molecular dynamics simulation (Fig. 1). Secondary alcohols can bind in two binding modes (27): 1) a productive binding mode, in which the distance d(HNε–Oalc) between the HNε of the catalytic H286 and the alcohol oxygen Oalc of the substrate is less than 2.5 Å, thus allowing formation of a hydrogen bond; the fast-reacting enantiomer binds optimally in this mode, while the slow-reacting enantiomer is repelled by the oxyanion stop; 2) a nonproductive binding mode, in which d(HNε–Oalc) is greater than 2.5 Å; the slow-reacting enantiomer binds optimally in this mode, while the fast-reacting enantiomer is blocked by the His stop. Thus enantiopreference can be explained by the fast- and slow-reacting enantiomers preferentially binding in the productive and the nonproductive mode, respectively. This is in accordance with x-ray data on binding of D- and L-menthol to C. rugosa lipase (16).

Both enantiomers of the 30 substrates were docked in the productive mode, and d(HNε–Oalc) was determined. For the slow-reacting enantiomer, d(HNε–Oalc) correlated best with the experimentally determined E values. Three regions were assigned (Fig. 2). For substrates with low E values (E < 20), distances d(HNε–Oalc) are smaller than 2.0 Å; such a short distance implies that the slow-reacting enantiomer also binds productively, and the resulting high activity toward it is consistent with the observed low enantioselectivity. For substrates with high E values (E > 100), distances d(HNε–Oalc) are larger than 2.2 Å. Substrates in a twilight zone between 2.0 and 2.2 Å have unpredictable enantioselectivity. This in silico assay of enantioselectivity was also successfully applied to explain the enantioselectivity of C. rugosa lipase toward secondary alcohols with two stereo centers (28) and of P. cepacia lipase toward γ- and δ-lactones (29).

Figure 2 Correlation of d(HNε–Oalc) for the slow-reacting enantiomer in a productive binding mode with experimental E values for 30 substrates (E values > 100 were displayed at E = 100); three zones are indicated. Low and high E values are separated by a twilight zone (Ref. 27).

As an alternative to analyzing the geometry of enzyme–substrate complexes, the difference in free energy has been determined by molecular modeling (30). When both enantiomers of chiral secondary alcohols are docked to Candida antarctica lipase B, the two binding modes and the enantiopreference can be predicted by comparing the potential energy of the complexes with both enantiomers (30,31). When entropy was included in the evaluation of free energy differences, the changes in enantioselectivity could also be reproduced (32,33).
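The three zones of the in silico assay described in this section can be written as a small decision function. This is only a sketch of the published thresholds (2.0 and 2.2 Å for the slow-reacting enantiomer docked in the productive mode); the function name and the example distances are hypothetical, and obtaining the distance itself still requires docking and molecular dynamics relaxation.

```python
LOW_E_MAX_DISTANCE = 2.0   # Angstrom; below this, low E (E < 20) was observed
HIGH_E_MIN_DISTANCE = 2.2  # Angstrom; above this, high E (E > 100) was observed

def classify_enantioselectivity(distance_angstrom):
    """Map the modeled His-to-alcohol-oxygen distance of the slow-reacting
    enantiomer (productive mode) to one of the three zones of Fig. 2."""
    if distance_angstrom < 0:
        raise ValueError("distance must be non-negative")
    if distance_angstrom < LOW_E_MAX_DISTANCE:
        return "low (E < 20)"
    if distance_angstrom > HIGH_E_MIN_DISTANCE:
        return "high (E > 100)"
    return "twilight zone (unpredictable)"

if __name__ == "__main__":
    # Hypothetical distances for illustration, not values from Ref. 27.
    for d in (1.9, 2.1, 2.4):
        print(f"d = {d:.1f} A -> {classify_enantioselectivity(d)}")
```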

4 MUTANTS WITH CHANGED ENANTIOSELECTIVITY

4.1 Stereoselectivity Toward Triacylglycerols and sn-2 Substituted Analogs

Triacylglycerols are the natural substrates of lipases. Lipases from filamentous fungi of the genus Rhizopus and other Mucorales were shown to hydrolyze predominantly the sn-1 and sn-3 groups (Fig. 3) and to show slightly different stereoselectivity (34). In an effort to find the structural determinants of stereoselectivity, structural analogs of triacylglycerols were investigated in which the functional sn-2 ester group was exchanged for an ether, amide, or phenyl group (35–37).

Figure 3 Left: Stereoselective hydrolysis of triradylglycerol to form sn-1,2- or sn-2,3-diradylglycerols; right: flexible (ether, benzylether, ester) and rigid (amide, phenyl) sn-2 substituents (Ref. 40).

Modifying the structure near the stereo center influenced not only the enantiomeric excess but also the stereopreference of Rhizopus lipase: While substrates with a flexible sn-2 group (ether, ester) were preferably hydrolyzed in the sn-1 position, Rhizopus lipase had sn-3 preference toward substrates with a rigid sn-2 group (amide, phenyl). In contrast, the homologous Rhizomucor miehei lipase did not show this switch in stereopreference: For all four substrates, it preferably hydrolyzed the sn-1 group, although the lipases from Rhizopus and R. miehei have a similar structure and their sequences are 56% identical. Thus stereopreference seems to depend both on the structure of the substrate and on the details of sequence and structure of the biocatalyst. An empirical rule to predict enantiopreference toward primary alcohols (7) could not be applied to explain these experimental results, because it covers only the lipase from Pseudomonas cepacia and excludes substrates with an oxygen next to the stereo center.

To explain this puzzling observation, the interactions of Rhizopus and Rhizomucor lipases with triacylglycerols and sn-2 substituted analogs were modeled (37–39), and mutants with modified stereoselectivity were designed (40). In both lipases, the scissile fatty acid binds to a hydrophobic crevice (T83, A89, I93, F95, F112, L146, P178, V206, V209, P210, F216 in Rhizopus lipase) (23). The binding site of the diacylglycerol moiety consists of two orthogonal α-helices (D204–V209 and S253–S259), the G elbow loop (39,41), and a hydrophobic patch, the hydrophobic dent (in Rhizopus lipase: I205, T252, L254, L258).

Figure 4 Triacylglycerol (ester substrate) in sn-1 (left) and sn-3 (right) orientation. Side chains of the catalytic S145 and the two mutated amino acids L258 and L254 are displayed. ΦO3–C3 describes the torsion of the bond between C3 of the glycerol backbone and an alcohol oxygen (Ref. 40).

Triacylglycerol substrates and their sn-2-substituted analogs were docked to the binding sites in two orientations (Fig. 4): in the sn-1 orientation, with the scissile sn-1 chain bound to the hydrophobic crevice, or in the sn-3 orientation, with the scissile sn-3 chain bound to the hydrophobic crevice. In both orientations, the sn-2 chain was positioned in the hydrophobic dent. The lipase–substrate complexes in both orientations of the substrates were relaxed by energy minimization and molecular dynamics simulation. The geometry of the averaged substrate structure was analyzed and correlated with the experimentally determined stereoselectivity (37,38). For both lipases and all substrates, the torsion angle ΦO3–C3 of the substrates in the sn-3 orientation was an appropriate probe of stereoselectivity: For ΦO3–C3 > 150°, both lipases preferably hydrolyzed the substrate in the sn-1 position; for ΦO3–C3 < 150°, the sn-3 position was preferred (Table 1).

Table 1 Stereoselectivity of Lipases from Rhizopus (ROL) and Rhizomucor (RML) Toward Flexible (Ether, Ester) and Rigid (Amide, Phenyl) Triradylglycerols

                      Experimental                             Model
Lipase/substrate      Preference   ee value [%] (a)   E (b)    Preference   ΦO3–C3
ROL
  Ether               sn-1         61 (±2)            4        sn-1         164°
  Ester               sn-1         19 (±5)            1        sn-1         170°
  Amide               sn-3         63 (±6)            5        sn-3         117°
  Phenyl              sn-3         77 (±3)            8        sn-3         118°
RML
  Ether               sn-1         69 (±4)            6        sn-1         163°
  Ester               sn-1         73 (±3)            7        sn-1         169°
  Amide               sn-1         56 (±2)            4        sn-1         166°
  Phenyl              sn-1         68 (±2)            6        sn-1         173°

(a) $ee = \frac{[A] - [B]}{[A] + [B]} \times 100$
(b) $E = \frac{\ln\left(1 - c\,(1 + ee_p)\right)}{\ln\left(1 - c\,(1 - ee_p)\right)}$, at conversion $c = 10\%$

Source: Ref. 39.
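The two footnote formulas of Table 1 can be checked numerically. The sketch below assumes that the product ee enters the E formula as a fraction (61% as 0.61), which is the reading under which the tabulated E values are reproduced after rounding; the function names are illustrative.

```python
import math

def enantiomeric_excess(conc_a, conc_b):
    """ee in percent from the concentrations of the two products (footnote a)."""
    return (conc_a - conc_b) / (conc_a + conc_b) * 100.0

def e_value(ee_percent, conversion=0.10):
    """E from the product ee at a given conversion (footnote b).

    ee is converted from percent to a fraction before it enters the formula.
    """
    ee = ee_percent / 100.0
    return math.log(1 - conversion * (1 + ee)) / math.log(1 - conversion * (1 - ee))

if __name__ == "__main__":
    # ee values (percent) for ROL, taken from Table 1
    for substrate, ee in [("ether", 61), ("ester", 19), ("amide", 63), ("phenyl", 77)]:
        print(f"{substrate:>6}: ee = {ee}%  ->  E = {e_value(ee):.1f}")
```

Because E is computed from a logarithmic ratio, an uncertainty of a few percent in ee translates into a noticeable uncertainty in E, which is worth keeping in mind when comparing single-digit E values.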


Comparing the two orientations, the side chain of L258 differentiated between them by its interaction with the substrate. This seemed to be the major determinant of stereoselectivity. In the sn-1 orientation, the functional group of the sn-2 chain binds deep in the His gap, a cleft between the catalytic H257 and its neighbor L258 (Fig. 5); in the sn-3 orientation, the sn-2 chain is near the entrance to the His gap. As a consequence, the more rigid and bulky the sn-2 substituent, the more unfavorable the sn-1 orientation and the more favorable the sn-3 orientation. Thus the structure of the substrate's sn-2 group and the shape of the lipase's His gap determine stereoselectivity.

4.2 Reversal of Stereopreference for Rigid Substrates

To validate the role of the His gap in stereoselectivity, its size was modified by replacing L258 with a bulky phenylalanine, a small and hydrophobic alanine, or a hydrophilic serine. Mutants with an enlarged His gap were expected to display increased sn-1 selectivity or decreased sn-3 selectivity, while a smaller His gap should shift stereoselectivity toward sn-3. The modeled torsion angles and the experimentally determined stereoselectivity followed this prediction: For the rigid amide and phenyl substrates, the mutant L258F had increased sn-3 selectivity (E = 14 and E = 22, compared to wild-type E = 5 and E = 8 for amide and phenyl substrates, respectively), while the mutants L258A and L258S were less sn-3 selective toward the amide substrate (40). The most interesting effect was observed for the bulky phenyl substrate: While the mutant L258F had higher sn-3 selectivity than the wild-type enzyme (E = 22 and E = 8, respectively), the stereopreference of the mutants L258A and L258S switched to sn-1 (E = 5 and E = 3, respectively). Thus exchanging a single residue led to a reversal of the apparent handedness of the biocatalyst. However, this effect was highly specific: It occurred only toward the phenyl substrate, not toward the amide or other substrates. The effect of the mutations on the flexible substrates (ether, ester) was less pronounced. Because L258A and L258S had similar selectivity, it is the shape of the His gap, not its physicochemical properties, that mediates stereoselectivity. For all mutants and substrates, the torsion angle ΦO3–C3 predicts stereopreference (ΦO3–C3 > 150°: sn-1 selective; ΦO3–C3 < 150°: sn-3 selective) and even the ranking of substrates and mutants by stereoselectivity (Fig. 6).

Figure 5 Model of a complex of Rhizomucor lipase and trioctanoin in sn-1 orientation; the functional ester group of the sn-2 fatty acid points toward the His gap (side chains of H257 and L258) (Ref. 41).

Figure 6 Correlation of experimentally determined E values and torsion angle ΦO3–C3 for all mutants of Rhizopus lipase and substrates; sn-1 preference: E > 1, ΦO3–C3 > 150°; sn-3 preference: E < 1, ΦO3–C3 < 150°.
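The 150° rule can be checked against the wild-type data of Table 1 with a few lines of code. This is a sketch only: the function names are hypothetical, the angles are the modeled values from Table 1, and the text does not define what happens at exactly 150°, so that case is reported as undetermined here.

```python
THRESHOLD_DEG = 150.0  # threshold for the modeled torsion angle, from the text

def predicted_preference(torsion_deg):
    """Predict stereopreference from the modeled torsion angle (sn-3 orientation)."""
    if torsion_deg > THRESHOLD_DEG:
        return "sn-1"
    if torsion_deg < THRESHOLD_DEG:
        return "sn-3"
    return "undetermined"  # the text does not cover an angle of exactly 150 degrees

# (lipase, substrate, experimental preference, modeled angle in degrees), from Table 1
TABLE_1 = [
    ("ROL", "ether", "sn-1", 164), ("ROL", "ester", "sn-1", 170),
    ("ROL", "amide", "sn-3", 117), ("ROL", "phenyl", "sn-3", 118),
    ("RML", "ether", "sn-1", 163), ("RML", "ester", "sn-1", 169),
    ("RML", "amide", "sn-1", 166), ("RML", "phenyl", "sn-1", 173),
]

def disagreements(rows):
    """Return the rows for which the rule disagrees with experiment."""
    return [row for row in rows if predicted_preference(row[3]) != row[2]]

if __name__ == "__main__":
    print("rows where rule and experiment disagree:", disagreements(TABLE_1))
```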

5 TUNING ENANTIOSELECTIVITY BY REACTION CONDITIONS

5.1 Solvent Effects

To establish a quantitative model of the enantioselectivity of Pseudomonas lipase toward secondary alcohols (cf. Section 3), experimentally determined E values were collected from the literature. The selected data had been determined under optimized reaction conditions. For most of the substrates, enantioselectivity could be increased by choosing an appropriate solvent. One of the investigated secondary alcohols (with CF3 as the medium-sized and naphthyl as the large substituent, according to Fig. 1) had low enantioselectivity (E = 22) in t-butyl methyl ether. Changing the solvent led to moderate (E = 60–70 in diethyl ether, toluene, dodecane, and hexane) and even high enantioselectivity (E > 100 in tetrahydrofuran, acetone, and benzene) (42). However, for other substrates, solvent engineering failed to improve low enantioselectivity (43). The in silico assay offers a structural interpretation of this observation (27): It assumes an upper limit of enantioselectivity that is determined by the structure of lipase and substrate and that can be probed by modeling. For large distances d(HNε–Oalc), high enantioselectivity is expected if the optimal solvent is used; for small distances, low enantioselectivity is expected. In a suboptimal solvent, enantioselectivity falls below this structure-based optimum. Thus for all substrates for which the experiment shows low enantioselectivity in a given solvent but modeling results in large distances d(HNε–Oalc), it may be worthwhile to try to increase enantioselectivity by solvent engineering. For small distances, however, the model predicts low enantioselectivity in all solvents.
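The reasoning of this section can be summarized as a simple decision helper that combines a modeled distance with a measured E value. It is a sketch under stated assumptions: the distance limits are the zone boundaries of Section 3 reused here, E = 100 is taken as the structure-based optimum for large distances, and the function name, return strings, and example numbers are hypothetical.

```python
def solvent_screening_advice(distance_angstrom, measured_e,
                             small_d=2.0, large_d=2.2, high_e=100.0):
    """Suggest whether solvent engineering is likely to raise enantioselectivity.

    distance_angstrom: modeled His-to-alcohol-oxygen distance of the
        slow-reacting enantiomer in the productive binding mode.
    measured_e: E value measured in the current solvent.
    """
    if distance_angstrom < small_d:
        # Small distance: the model predicts low E regardless of solvent.
        return "low E expected in all solvents"
    if distance_angstrom > large_d:
        # Large distance: high E should be attainable in the optimal solvent.
        if measured_e < high_e:
            return "solvent screening worthwhile"
        return "measured E already high"
    return "no prediction (twilight zone)"

if __name__ == "__main__":
    # Hypothetical combinations of modeled distance and measured E.
    for d, e in [(2.4, 22), (1.8, 5), (2.1, 30), (2.4, 150)]:
        print(f"d = {d} A, E = {e}: {solvent_screening_advice(d, e)}")
```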

5.2 Pressure Dependence

It has long been suggested that enantioselectivity could be tuned by pressure (44). Recently, it was found for a supercritical CO2 system that increasing the pressure led to a decrease in the enantioselectivity of acylation catalyzed by Candida antarctica lipase B (45). This effect was also observed for the Candida rugosa lipase-catalyzed transesterification of esters of racemic menthol in chloroform under different pressures (46). To rationalize these experimental findings, a fully solvated system of Candida rugosa lipase in chloroform at 7% water content was investigated by molecular dynamics simulations at various pressures (46). A water-filled cavity was identified that leads from the protein surface through the center of the protein toward the catalytic H449. With increasing pressure, it gradually filled with water molecules, thus increasing its volume and displacing the catalytic histidine side chain (Fig. 7). The difference ΔdNε–O = dNε–O(+) − dNε–O(−) between the distances from the H449 Nε to the menthyl oxygen for the fast- and slow-reacting enantiomers was analyzed; as the H449 side chain was displaced, ΔdNε–O decreased, which can explain the decreasing enantioselectivity (Fig. 8).

Figure 7 Pressure-induced displacement of the H449 side chain in the active site of Candida rugosa lipase. The lipase structure was averaged over the last 50 ps of the 100 bar simulation, with the coordinates of the 13 water molecules in the water channel taken from the snapshot at 250 ps. The (+)-menthyl ester was docked as a tetrahedral intermediate to the averaged structure, and the energy was minimized. In comparison, the crystal structure (1LPM) contains only 6 water molecules in the water channel (Ref. 46).

6 OUTLOOK: FROM QUALITATIVE TO QUANTITATIVE INFORMATION

For a broad range of lipases and substrates (secondary and primary alcohols and carboxylic acids), the molecular basis of enantiorecognition was modeled by flexible docking using molecular dynamics simulations. As the binding sites and the substrates are hydrophobic, enzyme–substrate interaction is predominantly steric. Upon docking, the protein side chains and the substrate change their conformation, and the resulting conformation differs for the two enantiomers. For each substrate class, a geometrical parameter that predicts the fast-reacting enantiomer could be identified. In addition, a semiquantitative correlation was observed between a geometrical parameter derived from the model and the experimentally determined enantioselectivity as measured by the E value. This correlation holds, with few exceptions, when the substrate structure is changed and also when the shape of the binding site is changed by site-directed mutagenesis. Thus the in silico assay could be applied to rank mutants by enantioselectivity toward a single substrate, and also to rank substrates.

Figure 8 Correlation between experimentally determined enantioselectivity E and the difference of distances between H449–Nε and menthyl-alcohol-O of the (+)- and (−)-enantiomer, ΔdNε–O = dNε–O(+) − dNε–O(−) [dNε–O(+), dNε–O(−): distance between H449–Nε and (+)- and (−)-menthyl-alcohol-O, respectively] (46).

Enzyme–substrate pairs can only be ranked by comparing identical reaction conditions. Changing external parameters such as hydrostatic pressure or solvent type may have dramatic effects on enantioselectivity: Increasing pressure leads to diffusion of water molecules into internal cavities of Candida rugosa lipase, thus changing the shape of the binding site and, consequently, its enantioselectivity. The effect of solvent can be estimated from the following observation: There seems to be a maximum enantioselectivity for each enzyme–substrate pair that is determined by the structure of both components and that can be predicted by modeling. Enzyme–substrate pairs with low or high enantioselectivity are predicted by small or large distances d(HNε–Oalc), respectively, in the model. This structure-limited maximum enantioselectivity can be attained by using the optimal solvent, whereas a nonoptimal solvent decreases enantioselectivity. Thus enzyme–substrate pairs with large distances d(HNε–Oalc) but low experimental enantioselectivity are expected to be optimizable by solvent engineering.

Because it has long been known that lipase selectivity may be strongly influenced by the reaction medium, several explanations of these effects have been put forward (5). Because the two enantiomers bind in different orientations, their solvent-exposed surfaces might differ. Exchanging the solvent could then change the difference in free energy of binding, ΔΔG, of the two enantiomers. Alternatively, solvent molecules might compete with the two enantiomers for the binding site; exchanging the solvent would shift the equilibrium between the two enantiomers. As a third explanation, the solvent might change the average structure and the dynamics of the lipase, which could result in a change of enantioselectivity.

Mutations in the lipase are another factor that mediates enantioselectivity. For mutations in the binding site with direct contact to the substrate, the short-range interaction with the substrate can be modeled fairly well by analyzing local geometry (40) or by evaluating activation enthalpy and entropy (32,33). However, there is growing evidence that amino acids located far from the binding site can also mediate the enantioselectivity of lipases, as shown by directed evolution experiments (47) or by chemical modification (48). How can such long-range interactions be rationalized? For other proteins, it has been shown that the specificity of a receptor or the interaction of an enzyme with reaction intermediates is linked to the dynamics of the complex (49–51). In a few cases, it has been demonstrated that mutations or chemical modifications may indeed change the dynamics of the protein (52,53). In principle, molecular dynamics simulations of solvated systems are appropriate to study these effects. With current hardware and software, simulations of medium-sized proteins are performed on the 10–100 nsec time scale (54), which is not far from the μsec to msec time scale of the slow hinge-bending motions that are suspected to play a dominant role in binding (55).

Understanding the relationship of sequence, structure, dynamics, and function will open new routes to analyzing sequence information. Today, qualitative information is derived from sequence data: assignment to a protein family or enzyme class and, in the best case, annotation of functionally relevant amino acids. However, if we want to understand the properties of enzyme mutants or apply protein engineering to optimize biochemical properties, we need methods for the quantitative prediction of how enantioselectivity depends on protein sequence, solvent effects, or substrate structure. A mutant is not simply "better than wild type": it is better toward one given substrate but may be worse toward another. Thus for a reaction to be optimized, mutants have to be tailored to match the reaction conditions and the structure of the substrate. Understanding the interplay of sequence, structure, and dynamics of an enzyme and its interaction with substrate and solvent on a quantitative level will make it possible to understand the metabolic function of enzymes, to direct engineering, and to control the quality of biocatalytic experiments.

REFERENCES

1. RD Schmid, R Verger. Lipases: interfacial enzymes with attractive applications. Angew Chem Int Ed Engl 37:1608–1633, 1998.
2. UT Bornscheuer, RJ Kazlauskas. Hydrolases in Organic Synthesis: Regio- and Stereoselective Biotransformations. Weinheim: Wiley-VCH, 1999.
3. A Svendsen. Lipase protein engineering. Biochim Biophys Acta 1543:223–238, 2000.
4. F Theil. Enhancement of selectivity and reactivity of lipases by additives. Tetrahedron 56:2905–2919, 2000.
5. P Berglund. Controlling lipase enantioselectivity for organic synthesis. Biomol Eng 18:13–22, 2001.
6. RJ Kazlauskas, ANE Weissfloch, AT Rappaport, LA Cuccia. A rule to predict which enantiomer of a secondary alcohol reacts faster in reactions catalyzed by cholesterol esterase, lipase from Pseudomonas cepacia, and lipase from Candida rugosa. J Org Chem 56:2656–2665, 1991.
7. ANE Weissfloch, RJ Kazlauskas. Enantiopreference of lipase from Pseudomonas cepacia toward primary alcohols. J Org Chem 60:6959–6969, 1995.
8. SN Ahmed, RJ Kazlauskas, AH Morinville, P Grochulski, JD Schrag, M Cygler. Enantioselectivity of Candida rugosa Lipase toward Carboxylic Acids: a Predictive Rule from Substrate Mapping and X-ray Crystallography. Biocatalysis 9:209–225, 1994.
9. L Brady, AM Brzozowski, ZS Derewenda, E Dodson, G Dodson, S Tolley, JP Turkenburg, L Christiansen, B Huge-Jensen, L Norskov. A serine protease triad forms the catalytic centre of a triacylglycerol lipase. Nature 343:767–770, 1990.
10. FK Winkler, A D'Arcy, W Hunziker. Structure of human pancreatic lipase. Nature 343:771–774, 1990.
11. U Derewenda, AM Brzozowski, DM Lawson, ZS Derewenda. Catalysis at the interface: the anatomy of a conformational change in a triglyceride lipase. Biochemistry 31:1532–1541, 1992.
12. P Grochulski, Y Li, JD Schrag, M Cygler. Two conformational states of Candida rugosa lipase. Protein Sci 3:82–91, 1994.
13. JD Schrag, Y Li, M Cygler, D Lang, T Burgdorf, HJ Hecht, R Schmid, D Schomburg, TJ Rydel, JD Oliver, LC Strickland, CM Dunaway, SB Larson, J Day, A McPherson. The open conformation of a Pseudomonas lipase. Structure 5:187–202, 1997.
14. R Verger. "Interfacial activation" of lipases: facts and artefacts. Trends Biotechnol 15:32–38, 1997.
15. DL Ollis, E Cheah, M Cygler, B Dijkstra, F Frolow, SM Franken, M Harel, SJ Remington, I Silman, J Schrag. The alpha/beta hydrolase fold. Protein Eng 5:197–211, 1992.
16. M Cygler, P Grochulski, RJ Kazlauskas, JD Schrag, F Bouthillier, B Rubin, AN Serreqi, AK Gupta. A structural basis for the chiral preferences of lipases. J Am Chem Soc 116:3180–3186, 1994.
17. DA Lang, MLM Mannesse, GH DeHaas, HM Verheij, BW Dijkstra. Structural basis of the chiral selectivity of Pseudomonas cepacia lipase. Eur J Biochem 254:333–340, 1998.
18. RS Phillips. Temperature effects on stereochemistry of enzymatic reactions. Enzyme Microb Technol 14:417–419, 1992.
19. PLA Overbeeke, SC Orrenius, JA Jongejan, JA Duine. Enthalpic and entropic contributions to lipase enantioselectivity. Chem Phys Lipids 93:81–93, 1998.
20. PLA Overbeeke, J Ottosson, K Hult, JA Jongejan, JA Duine. The temperature dependence of enzyme kinetic resolutions reveals the relative importance of enthalpy and entropy to enzyme enantioselectivity. Biocatal Biotransform 17:61–79, 1999.
21. J Pleiss, M Fischer, M Peiker, C Thiele, RD Schmid. Lipase Engineering Database: understanding and exploiting sequence–structure–function relationships. J Mol Catal, B 10:491–508, 2000.
22. A Bairoch, R Apweiler. The SWISS-PROT protein sequence data bank and its supplement TrEMBL. Nucleic Acids Res 25:31–36, 1997.
23. J Pleiss, M Fischer, RD Schmid. Anatomy of lipase binding sites: the scissile fatty acid binding site. Chem Phys Lipids 93:67–80, 1998.
24. RJ Kazlauskas, UT Bornscheuer. Biotransformations with lipases. In: H-J Rehm, G Reed, Eds. Biotechnology. Weinheim, New York: Wiley-VCH, 1998, pp 37–191.
25. RD Joerger, MJ Haas. Alteration of chain length selectivity of a Rhizopus delemar lipase through site-directed mutagenesis. Lipids 29:377–384, 1994.
26. RR Klein, G King, RA Moreau, MJ Haas. Altered acyl chain length specificity of Rhizopus delemar lipase through mutagenesis and molecular modeling. Lipids 32:123–130, 1997.
26a. E Henke, J Pleiss, UT Bornscheuer.
Activity of lipases and esterases towards tertiary alcohols: Insights into structure-function relationships. Angew Chem Int Ed 41:3211–3213, 2002. T Schulz, J Pleiss, RD Schmid. Stereoselectivity of Pseudomonas cepacia lipase toward secondary alcohols: a quantitative model. Protein Sci 9:1053–1062, 2000. T Schulz, RD Schmid, J Pleiss. Structural basis of stereoselectivity in Candida rugosa lipase catalyzed hydrolysis of secondary alcohols. J Mol Model 7:265– 270, 2001. B-Y Hwang, H Scheib, J Pleiss, B-G Kim, RD Schmid. Computer-aided molecular modeling of the enantioselectivity of Pseudomonas cepacia lipase toward g- and y-lactones. J Mol Catal B 10:223–231, 2000. F Haeﬀner, T Norin, K Hult. Molecular modeling of the enantioselectivity in lipase-catalyzed transesteriﬁcation reactions. Biophys J 74:1251–1262, 1998.


31. S Raza, L Fransson, K Hult. Enantioselectivity in Candida antarctica lipase B: a molecular dynamics study. Protein Sci 10:329–338, 2001.
32. J Ottosson, JC Rotticci-Mulder, D Rotticci, K Hult. Rational design of enantioselective enzymes requires considerations of entropy. Protein Sci 10:1769–1774, 2001.
33. D Rotticci, JC Rotticci-Mulder, S Denman, T Norin, K Hult. Improved enantioselectivity of a lipase by rational protein engineering. Chembiochem 2:766–770, 2001.
34. E Rogalska, C Cudrey, F Ferrato, R Verger. Stereoselective hydrolysis of triglycerides by animal and microbial lipases. Chirality 5:24–30, 1993.
35. P Stadler, A Kovac, L Haalck, F Spener, F Paltauf. Stereoselectivity of microbial lipases. The substitution at position sn-2 of triacylglycerol analogs influences the stereoselectivity of different microbial lipases. Eur J Biochem 227:335–343, 1995.
36. A Kovac, P Stadler, L Haalck, F Spener, F Paltauf. Hydrolysis and esterification of acylglycerols and analogs in aqueous medium catalyzed by microbial lipases. Biochim Biophys Acta 1301:57–66, 1996.
37. L Haalck, F Paltauf, J Pleiss, RD Schmid, F Spener, P Stadler. Stereoselectivity of lipase from Rhizopus oryzae towards triacylglycerols and analogs: computer aided modeling and experimental validation. Methods Enzymol: Lipases 284:353–376, 1997.
38. H-C Holzwarth, J Pleiss, RD Schmid. Computer aided modelling of Rhizopus oryzae lipase catalyzed stereoselective hydrolysis of triglycerides. J Mol Catal B 3:73–82, 1997.
39. H Scheib, J Pleiss, A Kovac, F Paltauf, RD Schmid. Stereoselectivity of Mucorales lipases toward triradylglycerols: a simple solution to a complex problem. Protein Sci 8:215–221, 1999.
40. H Scheib, J Pleiss, P Stadler, A Kovac, AP Potthoff, L Haalck, F Spener, F Paltauf, RD Schmid. Rational design of Rhizopus oryzae lipase with modified stereoselectivity toward triradylglycerols. Protein Eng 11:675–682, 1998.
41. J Pleiss, H Scheib, RD Schmid. The His gap motif in microbial lipases: a determinant of stereoselectivity toward triacylglycerols and analogs. Biochimie 82:1043–1052, 2000.
42. J Gaspar, A Guerrero. Lipase-catalysed enantioselective synthesis of naphthyl trifluoromethyl carbinols and their corresponding non-fluorinated counterparts. Tetrahedron: Asymmetry 6:231–238, 1995.
43. I Petschen, EA Malo, MP Bosch, A Guerrero. Highly enantioselective synthesis of long chain alkyl trifluoromethyl carbinols and β-thiotrifluoromethyl carbinols through lipases. Tetrahedron: Asymmetry 7:2135–2143, 1996.
44. SV Kamat, B Iwaskewycz, EJ Beckman, AJ Russell. Biocatalytic synthesis of acrylates in supercritical fluids: tuning enzyme activity by changing pressure. Proc Natl Acad Sci U S A 90:2940–2944, 1993.
45. T Matsuda, R Kanamaru, K Watanabe, T Harada, K Nakamura. Control on enantioselectivity with pressure for lipase-catalyzed esterification in supercritical carbon dioxide. Tetrahedron Lett 42:8319–8321, 2001.

46. UHM Kahlow, RD Schmid, J Pleiss. A model of the pressure dependence of the enantioselectivity of Candida rugosa lipase towards (±)-menthol. Protein Sci 10:1942–1952, 2001.
47. DX Zha, S Wilensek, M Hermes, KE Jaeger, MT Reetz. Complete reversal of enantioselectivity of an enzyme-catalyzed reaction by directed evolution. Chem Commun 2664–2665, 2001.
48. M Basri, BL Th'ng, CN Razak, AB Salleh. Effect of reductive alkylation of Candida rugosa lipase on its enantioselective esterification reaction. Ann N Y Acad Sci 864:192–197, 1998.
49. L Zidek, MV Novotny, MJ Stone. Increased protein backbone conformational entropy upon hydrophobic ligand binding. Nat Struct Biol 6:1118–1121, 1999.
50. JL Radkiewicz, CL Brooks. Protein dynamics in enzymatic catalysis: exploration of dihydrofolate reductase. J Am Chem Soc 122:225–231, 2000.
51. MJ Osborne, J Schnell, SJ Benkovic, HJ Dyson, PE Wright. Backbone dynamics in dihydrofolate reductase complexes: role of loop flexibility in the catalytic mechanism. Biochemistry 40:9846–9859, 2001.
52. RB Rose, CS Craik, RM Stroud. Domain flexibility in retroviral proteases: structural implications for drug resistant mutations. Biochemistry 37:2607–2621, 1998.
53. BF Volkman, D Lipson, DE Wemmer, D Kern. Two-state allosteric behavior in a single-domain signaling protein. Science 291:2429–2433, 2001.
54. V Daggett. Long timescale simulations. Curr Opin Struct Biol 10:160–164, 2000.
55. B Ma, M Shatsky, HJ Wolfson, R Nussinov. Multiple diverse ligands binding at a single protein site: a matter of pre-existing populations. Protein Sci 11:184–197, 2002.

5 Rational Redesign of Haloalkane Dehalogenases Guided by Comparative Binding Energy Analysis

Jiří Damborský, Jan Kmuníček, and Tomáš Jedlička Masaryk University Brno, Czech Republic

Santos Luengo and Federico Gago University of Alcala Madrid, Spain

Angel R. Ortiz Mount Sinai School of Medicine New York, New York, U.S.A.

Rebecca C. Wade EML Research Heidelberg, Germany

1

COMPARATIVE BINDING ENERGY ANALYSIS

Comparative binding energy (COMBINE) analysis is a computational method for deducing quantitative structure–activity relationships using structural data from ligand–macromolecule complexes (1). It can be applied to the formation of macromolecule–small molecule complexes and macromolecule–macromolecule complexes; in this article, these complexes will be referred to generically as macromolecule–ligand complexes. The "COMBINE" acronym refers to two aspects of the technique (2): (i) macromolecule–ligand structural data are combined with experimental binding data and (ii) empirical molecular mechanics energy calculations are combined with Partial Least-Squares Projection to Latent Structures (PLS) chemometric analysis. COMBINE analysis systematically explores the relationships between experimental binding affinities for a set of ligands and selected interaction energies with the macromolecule.

COMBINE analysis is formally similar to CoMFA (comparative molecular field analysis) (3) inasmuch as both methods deal with data matrices containing a large number of energy descriptors that are subjected to chemometric analysis. The energy descriptors differ, however: in CoMFA they are interaction fields calculated for the ligand alone, whereas in COMBINE analysis they represent residue-based ligand–receptor interactions. Compared to classical molecular mechanics calculations of binding energies, subjecting ligand–macromolecule interaction energies to statistical analysis has two advantages: the noise due to inaccuracies in the potential energy functions and molecular models can be reduced, and mechanistically important interactions can be identified. Compared to classical Quantitative Structure–Activity Relationships (QSAR) analysis, COMBINE is expected to be more predictive as it incorporates more physically relevant information about the energetics of ligand–receptor interactions (1).
To estimate the total binding energy for each ligand–macromolecule complex, $\Delta U$, a molecular mechanics force field is used to calculate the following terms: (i) the sum, $E_{\mathrm{INTER}}^{LM}$, of intermolecular interaction energies ($\Delta u_i$) between the ligand and each macromolecule residue, each of which consists of van der Waals and electrostatic contributions; (ii) the change in intramolecular energy of the ligand upon binding to the macromolecule, $\Delta E^{L}$; and (iii) the change in intramolecular energy of the macromolecule upon ligand binding, $\Delta E^{M}$. In addition, a measure of the cost in electrostatic free energy of desolvating the apposing surfaces of both interacting partners upon complex formation (4) is estimated using a continuum electrostatics method that provides two extra terms: (iv) the desolvation energy of the ligand, $E_{\mathrm{DESOLV}}^{L}$, and (v) the desolvation energy of the macromolecule, $E_{\mathrm{DESOLV}}^{M}$.

$$\Delta U = E_{\mathrm{INTER}}^{LM} + \Delta E^{L} + \Delta E^{M} + E_{\mathrm{DESOLV}}^{L} + E_{\mathrm{DESOLV}}^{M} \qquad (1)$$
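The bookkeeping behind Eq. 1 can be illustrated with a short Python sketch. It is not taken from the COMBINE software: the class and function names are hypothetical, the energies are invented numbers in arbitrary units, and a real calculation would obtain each term from a force field and a continuum electrostatics program.

```python
from dataclasses import dataclass


@dataclass
class ResidueInteraction:
    """Ligand-residue interaction energy split into its two force-field parts."""
    vdw: float  # van der Waals contribution (illustrative units)
    ele: float  # electrostatic contribution (illustrative units)


def total_binding_energy(residue_terms, d_e_ligand, d_e_macro,
                         desolv_ligand, desolv_macro):
    """Evaluate Eq. 1: per-residue intermolecular sum plus the four other terms."""
    # E_INTER^LM is the sum over residues of the vdW and electrostatic parts.
    e_inter = sum(r.vdw + r.ele for r in residue_terms)
    return e_inter + d_e_ligand + d_e_macro + desolv_ligand + desolv_macro


# Invented example values, chosen only to show the arithmetic.
residues = [ResidueInteraction(vdw=-2.0, ele=-1.0),
            ResidueInteraction(vdw=-0.5, ele=0.25)]
delta_u = total_binding_energy(residues, d_e_ligand=0.5, d_e_macro=1.0,
                               desolv_ligand=2.0, desolv_macro=1.5)
print(delta_u)
```

In COMBINE analysis the per-residue terms are kept separate as descriptors, not collapsed into a single number, so the sum shown here serves only to make the meaning of each term in Eq. 1 concrete.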

The COMBINE analysis methodology is schematized in Fig. 1. The energy descriptors obtained from the set of experimentally determined or modeled ligand–macromolecule complexes are used to construct a matrix in which the rows represent the different ligands and the columns contain the two blocks of residue-based molecular mechanics energy information (van der Waals and electrostatic) plus the additional desolvation energy terms and a last column containing the experimental binding affinities/activities. This matrix is then projected to a small number of latent variables using the PLS method (5), and the original energy terms are given weights, $w_i$, according to their importance in the model.

Figure 1 Scheme of the COMBINE analysis methodology. L1...Ln: ligands; M: macromolecule; L1M...LnM: ligand–macromolecule complexes; R1...Rn: residues of a macromolecule; EVDW: van der Waals interaction energy; EELE: electrostatic interaction energy; EDESOLV: substrate desolvation energy; Ki: log (experimental binding affinity); PLS: Partial Least Squares Projection to Latent Structures analysis.

2

APPLICATION OF COMBINE ANALYSIS IN DRUG DESIGN

COMBINE analysis was initially used for the study of protein–inhibitor complexes. Ortiz et al. (1) applied COMBINE analysis to a series of 26 inhibitors of the human synovial fluid phospholipase A2. The COMBINE model explained 92% (82% cross-validated) of the quantitative variability of binding constants and provided insight into the mechanism of phospholipase inhibition. Only 2% of the energy terms were required for explaining the differences in activity. The model indicated that the calcium ion present in the enzyme active site is important for inhibitory activity, as is the steric accommodation of the inhibitors in the binding site of the enzyme. Perez et al. (4) conducted COMBINE analysis with a set of 33 HIV-1 protease inhibitors and externally validated their models using an additional 16 inhibitors. Incorporation of electrostatic desolvation effects in the model resulted in significant improvement of its predictive ability. The model constructed for a merged set of 49 inhibitors explained 91% (81% cross-validated) of the quantitative variability of the experimental data. This study was further extended by Pastor et al. (6), who incorporated the two possible binding modes of the HIV-1 inhibitors into the COMBINE model. This was achieved by manipulation of the data matrix used to describe the interaction energies and provided a model with improved external predictive ability and simplified interpretability.

Tomic et al. (7) developed a COMBINE model for the binding specificity of transcription factors of the nuclear receptor family to DNA. They analyzed experimental data for the interaction of 20 mutant glucocorticoid receptor DNA-binding domains with 16 different response elements in a total of 320 complexes. The analysis revealed that specificity of binding of the transcription factor to DNA is largely determined by the energy cost of DNA desolvation and is tuned by intermolecular electrostatic interactions and conformational changes. Lozano et al. (8) applied COMBINE and GRID/GOLPE analyses in parallel to a series of 12 heterocyclic amines and human cytochrome P450 1A2. The resultant COMBINE model explained 90% (74% cross-validated) of the quantitative variability of the activity data and corresponded well with the GRID/GOLPE model, which explained 96% (79% cross-validated) of the quantitative variability of the activity data. The study showed that the combined use of two 3D-QSAR approaches for model construction acts as a mutual validation procedure and allows a more reliable and detailed interpretation of the results.

Cuevas et al. (9) studied 40 complexes of human neutrophil elastase with N3-substituted trifluoromethylketone-based pyridone inhibitors. The authors carried out Poisson–Boltzmann computations and derived two additional descriptors representing the electrostatic energy contributions to the partial desolvation of both the receptor and the ligands, and solvent-screened electrostatic interactions. Incorporation of these descriptors into the model improved its statistical parameters. Most recently, Wang and Wade (10) constructed a COMBINE model for two subtypes and one mutant of neuraminidase from influenza virus complexed with 43 inhibitors. The model highlighted 12 protein residues and 1 bound water molecule as particularly important for inhibitory activity and indicated the potential for using COMBINE analysis to investigate species specificity and resistant mutants.

3

APPLICATION OF COMBINE ANALYSIS IN PROTEIN ENGINEERING

A primary goal of protein engineering is to alter the physico-chemical and functional properties of proteins by modification of their structures. Protein structures can be engineered either by directed evolution approaches (11,12), which do not require any a priori knowledge of structure–function relationships, or by rational design, which is based on knowledge of these relationships. Protein structures and structure–function relationships are often so complex that it is difficult to study them without the use of computer graphics and computer modeling. COMBINE analysis quantitatively explores residue-based protein–ligand interactions and provides quantitative information about the importance of every residue in a macromolecule for the binding of different substrates. Mutagenesis of the residues with the highest importance in a COMBINE model should lead to the most significant changes in substrate specificity. The molecular models of mutant structures can be constructed in silico and the effects of substitution on substrate binding can be predicted prior to experiment using the COMBINE model. The application of COMBINE analysis to the study of structure–function relationships and engineering of haloalkane dehalogenase DhlA has been recently reported by Kmunicek et al. (13) and is further extended in this contribution.
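To make the idea of ranking residues by their importance in the model more concrete, the following Python sketch fits a one-component PLS1 model to a tiny descriptor matrix and sorts the descriptors by the magnitude of their regression coefficients. It rests on several assumptions that do not come from the original study: a single latent variable, centred but unscaled descriptors, invented energies and activities, and hypothetical column labels (`R1_vdw`, `R2_ele`, `desolv`). Published COMBINE models use several latent variables and dedicated chemometric software.

```python
import math


def pls1_one_component(rows, y):
    """Fit a one-component PLS1 model; return (weights, coefficients)."""
    n, p = len(rows), len(rows[0])
    x_mean = [sum(r[j] for r in rows) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[v - m for v, m in zip(r, x_mean)] for r in rows]  # centred descriptors
    yc = [v - y_mean for v in y]                             # centred response
    # Weight of each descriptor: its covariance with the response, normalised.
    w = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Score of each ligand on the latent variable.
    t = [sum(a * b for a, b in zip(r, w)) for r in xc]
    # Regress the response on the scores and map back to descriptor space.
    q = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)
    return w, [q * v for v in w]


# Rows are ligands; columns are energy descriptors (all values invented).
labels = ["R1_vdw", "R2_ele", "desolv"]
energies = [[-1.0, 0.2, 0.5],
            [-2.0, 0.1, 0.5],
            [-3.0, 0.3, 0.4],
            [-4.0, 0.2, 0.6]]
activity = [1.0, 2.0, 3.0, 4.0]  # e.g. a log-transformed binding constant

weights, coef = pls1_one_component(energies, activity)
ranked = sorted(zip(labels, coef), key=lambda pair: abs(pair[1]), reverse=True)
for name, value in ranked:
    print(f"{name}: {value:+.3f}")
```

In this toy data set the first descriptor tracks the activity almost perfectly, so it receives the largest coefficient; in a real model, descriptors ranked highly in this way would point to candidate residues for mutagenesis.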

4

PROTEIN ENGINEERING OF HALOALKANE DEHALOGENASES

Haloalkane dehalogenases are microbial enzymes that catalyze the cleavage of a carbon–halogen bond by a hydrolytic mechanism (Fig. 2). They require a water molecule as the only cofactor, and the reaction they catalyze is considered to be a critical step in the biological degradation of various haloalkanes (14). Haloalkanes are widely used as solvents, degreasing agents, intermediates in chemical synthesis, and pesticides. Haloalkane dehalogenases could therefore find application in bioremediation technologies and chemical syntheses (15–17). Different haloalkane dehalogenases have been isolated from various bacteria (18–25), but none of them shows sufficient activity toward some of the technologically interesting compounds, such as 1,2-dichloropropane, 2-chloropropane, 2-chlorobutane, and 1,2,3-trichloropropane, although these substances have the potential to be good substrates for haloalkane dehalogenases from the reaction mechanism standpoint.

Figure 2 Reaction scheme of hydrolytic dehalogenation catalysed by haloalkane dehalogenases. Enz: enzyme.

Site-directed mutagenesis experiments were initiated to study structure–function relationships and to redesign haloalkane dehalogenases (26–41). These studies identified some functional residues, such as the catalytic triad or pairs of transition-state and product stabilizing residues, but to our knowledge none of them provided enzymes with significantly improved activities toward target substances. Structural studies have been conducted to determine the 3-D structures of the wild type (42–51) and mutant proteins (31,35,52). The haloalkane dehalogenases are composed of two domains. The core of the main domain consists of an eight-stranded β-pleated sheet with seven parallel strands and one antiparallel strand (Fig. 3). This β-sheet is surrounded by α-helices. The cap domain lies on top of the main domain and consists of five α-helices. A buried, mainly hydrophobic cavity is located between these two domains.

Figure 3 Three-dimensional models of the haloalkane dehalogenases DhlA (A) and LinB (B), and topological arrangement of secondary elements in DhlA (C) and LinB (D). The structures were determined by protein crystallography (from Refs. 42,50). Numbering of the secondary elements respects the evolutionary changes in the cap domains (from Ref. 63). The triangles indicate the positions of the catalytic triad residues.

Three-dimensional structures provide not only a good starting point for the rational design of site-directed mutations and for the interpretation of results from mutagenesis experiments, but also essential data for computer-modeling studies. Molecular docking (51,53), quantitative structure–function relationships (54), quantum-mechanical calculations (55–60), and molecular dynamics simulations (61–63) have brought insights into the binding of substrates to the enzyme active site, the mechanism of the dehalogenation reaction, and the conformational behavior of several dehalogenase enzymes at atomic resolution. Although the haloalkane dehalogenases are currently being intensively studied and engineered, an effective catalyst for some target compounds has not been obtained yet. Another approach to improve catalytic performance is to modify the reaction conditions. Grey et al. reported construction of a thermostable haloalkane dehalogenase DhaA (64) suitable for dehalogenation at elevated temperatures.

5

COMBINE MODEL FOR THE HALOALKANE DEHALOGENASE DhlA

COMBINE analysis was conducted to identify the protein residues responsible for the differences in binding affinities of 18 chlorinated and brominated aliphatic substrates of haloalkane dehalogenase DhlA from Xanthobacter autotrophicus GJ10 (13). Experimental data for the following compounds were extracted from the literature (65): 1-chlorobutane, 1-chlorohexane, 1-bromobutane, 1-bromohexane, 1,1-dichloromethane, 1,2-dichloroethane, 1,1-dibromomethane, 1,2-dibromoethane, 1,2-dichloropropane, 1,2-dibromopropane, 2-chloroethanol, 2-bromoethanol, epichlorohydrin, epibromohydrin, 2-chloroacetonitrile, 2-bromoacetonitrile, 2-chloroacetamide, and 2-bromoacetamide. The values of apparent dissociation constants (Km) varied by three orders of magnitude.

The substrate molecules were positioned in the active site of DhlA in such a way that their C–X bonds aligned with the corresponding bond found in the experimental structure of 1,2-dichloroethane in the Michaelis–Menten complex with DhlA (43). Manually prepared enzyme–substrate complexes were energy minimized, and van der Waals and electrostatic interaction energies between the protein and the substrates were calculated and decomposed on a per-residue basis using the program AMBER 5.0 (66). The data matrix composed of these intermolecular interaction energies, together with the desolvation energies calculated using an electrostatic continuum method, was correlated with Km values using the PLS method.

A four-component model explained 91% (73% cross-validated) of the quantitative variance in Km (Fig. 4A). The first dimension mainly projected out the electrostatic term of Asp124, which contributes substantially to the energy variance but contributes little to the Km correlation. Asp124 is the nucleophile that initiates the dehalogenation reaction and is in very close contact with the electrophilic carbon of each substrate (Fig. 4B). Analysis of the second, third, and fourth principal components showed that only a few energy variables, involving only a few protein residues, are important for explaining the differences in binding among substrates (1% of the enzyme's amino acids explained 91% of the variance in Km).

Figure 4 Plots of observed vs. predicted Km values and the models of Michaelis complexes for the structure-based model of DhlA (A,B), the docking-based model of DhlA (C,D), and the docking-based model of LinB (E,F). The nucleophile (Asp) and the halide-stabilizing residue (Trp) are shown in stick representation.

These residues can be divided into two classes with respect to their interaction with the substrates. The first class is formed by residues separating chlorinated from brominated derivatives: Trp125, Trp175, and Pro223. These residues form the halogen binding site in the enzyme. Mutations affecting these residues should be primarily used to modulate the halogen specificity of the enzyme. Phe222 also contributes to the separation of chlorinated derivatives from brominated derivatives, together with Leu179 (Fig. 5). The second set of residues discriminates substrates by their interactions with the substrate alkyl chains. These are mainly Phe172, Phe222, and Phe164, with a contribution from Asp124 as well. Mutations affecting these residues can be used to tune the activity of the enzyme for different chain specificity.

Figure 5 Stereo view of the active site of haloalkane dehalogenase DhlA with bound ligands. Residues separating chlorinated from brominated derivatives are shown as dark sticks: Trp125, Trp175, Leu179, Phe222, and Pro223. Residues separating substrates according to the size and shape of their carbon chains are shown as light sticks: Phe164 and Phe172. The van der Waals surface of the protein atoms in direct contact with the halogen atom is represented by dots.

α-Helix 4 has the largest concentration of residues involved in explaining the Km differences: Phe172, Trp175, Lys176, and Leu179 (see Fig. 3 for the position of α-helix 4). This finding is in good agreement with experimental observations by Priest et al. (27), who isolated 12 in vivo mutants of DhlA with improved activity toward 1-chlorohexane; 9 of them carried modifications in α-helix 4 or its close surroundings. Priest et al. suggested that α-helix 4 is critical for the specificity of DhlA.

The applicability of the COMBINE models to predictions was validated using two mutants of DhlA for which the crystal structures had been determined (33,35). Four substrate molecules with available experimental binding constants were modeled in the active sites of the mutant proteins and their Km values were predicted using the COMBINE model. The trends in changes of binding affinity due to mutation were predicted correctly without exception (13).

The main disadvantage of the methodology described above is the need for at least one experimental structure of an enzyme–substrate complex and the assumption that all substrates bind to the active site in the same mode. There are probably many enzymes for which the structural information on the enzyme–substrate complex is missing or which bind their substrates in different orientations, e.g., broad-specificity enzymes. An additional study was therefore conducted with DhlA in which all substrate molecules were automatically and independently positioned inside the active site using a computational method. The remaining part of the COMBINE analysis procedure was the same as described above. The automated molecular docking program AutoDock 3.0 (67) was used for positioning 18 halogenated substrates into the active site of DhlA.
The docking calculations provided suitable orientations for 15 out of these 18 substrate molecules. No suitable orientations were found for the two haloacetamides and 2-bromoacetonitrile, i.e., they could not be docked with an orientation close to that necessary for catalysis. Multiple orientations were found for several substrates: 1,2-dibromopropane, halobutanes, and halohexanes. The orientations for the subsequent COMBINE analysis were selected using quantum mechanical calculations, which discriminated between binding modes on the basis of their suitability for the ensuing SN2 dehalogenation reaction. Dehalogenation reactions were simulated inside a reduced model of the active site of DhlA composed of 20 amino acids (Fig. 6) using the semi-empirical quantum mechanics program MOPAC 6.0 (68) interfaced by TRITON 2.0 (69). The selected orientations were further optimized by energy minimization and were found to be in very good agreement with the expected reaction mechanism of DhlA (Fig. 4D). Furthermore, bound substrates resembled the reactive conformation of 1,2-dichloroethane described by Lau et al. (62).

Figure 6 Three-dimensional model of the active site of haloalkane dehalogenase DhlA used for quantum mechanical calculations as displayed in the main window of the TRITON program. This program is used for the preparation of the input data for calculation of reaction coordinates, for monitoring of the progress of calculation, and for analysis of output data. The software is freely available at http://ncbr.chemi.muni.cz/triton/triton.html.

The goodness of the fit to the experimental data in the COMBINE model constructed from these selected docked orientations was comparable to that previously obtained with the structure-based model (compare Fig. 4A and C). The model explained 96% (67% cross-validated) of the quantitative variance in Km, and two outliers (dihalopropanes) had to be removed from the model. The composition of the latent variables extracted and the importance of amino acid residues for explaining Km values, however, were similar in both models, leading to a similar biochemical interpretation. Interestingly, the automatic docking-based model employed more electrostatic contributions than the structure-based alignment model. The largest difference in van der Waals interactions was noted for the residues in direct contact with the halogenated hexanes (Phe222 and Leu263) due to the different orientations and conformations adopted by these long substrates in the small active site.
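The paired statistics quoted for each model, such as 91% (73% cross-validated), distinguish the variance explained by the fitted model from the variance predicted for compounds left out of the fit. The Python sketch below shows how such a pair can be computed. It makes several assumptions: a single-descriptor least-squares line stands in for the PLS model, leave-one-out is used as the cross-validation scheme (the text does not state which scheme was applied), the cross-validated value is defined as one minus the ratio of the predictive residual sum of squares to the total sum of squares, and the data are invented.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def total_sum_of_squares(y):
    my = sum(y) / len(y)
    return sum((b - my) ** 2 for b in y)


def r_squared(x, y):
    """Fraction of variance explained when all compounds are used in the fit."""
    slope, intercept = fit_line(x, y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    return 1.0 - ss_res / total_sum_of_squares(y)


def q_squared_loo(x, y):
    """Leave-one-out cross-validated counterpart of r_squared."""
    press = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict it.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    return 1.0 - press / total_sum_of_squares(y)


# Invented descriptor and activity values for five compounds.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [1.1, 1.9, 3.2, 3.9, 5.3]
r2 = r_squared(descriptor, activity)
q2 = q_squared_loo(descriptor, activity)
print(f"explained: {r2:.3f}, cross-validated: {q2:.3f}")
```

The cross-validated value cannot exceed the fitted one for a least-squares model of this kind, so a large gap between the two, as in the docking-based DhlA model, suggests that part of the fit does not carry over to unseen compounds.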

6

COMBINE MODEL FOR THE HALOALKANE DEHALOGENASE LinB

Haloalkane dehalogenase LinB from Sphingomonas paucimobilis UT26 belongs to the same protein family as DhlA. These two proteins differ in both their structures and their catalytic properties. The catalytic triad of LinB is composed of Asp108–His272–Glu132 (38), while the catalytic triad of DhlA consists of Asp124–His289–Asp260 (45). The catalytic acid is positioned after β-strand 6 in LinB (Fig. 3D) and after β-strand 7 in DhlA (Fig. 3C). Bound substrate, transition states, and product structures are primarily stabilized by hydrogen bonds from the Trp109–Asn38 pair in LinB and the Trp125–Trp175 pair in DhlA. The active site of LinB is 2.5 times larger than the active site of DhlA and is less buried inside the protein core (70). There are at least three tunnels leading to the active site of LinB, but only one tunnel in DhlA (63). LinB shows broader substrate specificity than DhlA, i.e., it is more active toward larger and β-substituted haloalkanes, and therefore it should be more suitable for the design of efficient catalysts for the target compounds carrying a halogen in the β-position. Currently, there is no 3-D structure of a Michaelis complex for LinB. Furthermore, it is not safe to assume that all substrates bind to the large active site in a similar way. An automated docking-based methodology was therefore used for the construction of a COMBINE model for LinB. Experimental data (Km values) were determined for 25 substrates: 1-chloropropane, 1-chlorobutane, 1-chlorohexane, 1-chloroheptane, 1-chlorooctane, 1-bromopropane, 1-bromobutane, 1-bromohexane, 1-iodopropane, 1-iodohexane, 1,3-dichloropropane, 1,5-dichloropentane, 1,2-dibromoethane, 1,3-dibromopropane, 1-bromo-3-chloropropane, 1,2-dibromopropane, 2-bromo-1-chloropropane, 1-bromo-2-methylpropane, bis(2-chloroethyl)ether, chlorocyclohexane, bromocyclohexane, 4-bromobutyronitrile, 3-chloro-2-methylpropene, 3-chloro-2-(chloromethyl)-1-propene, and 2,3-dichloropropene.
A preliminary model, consisting of only one principal component, explained 91% (87% cross-validated) of the quantitative variability in Km values (Fig. 4E). Two outliers, bis(2-chloroethyl)ether and chlorocyclohexane, had to be removed from the model (Fig. 4F). Extreme kcat/Km values were repeatedly measured with these substrates. The model explained the variability in Km values resulting from the different lengths of the substrate molecules but could not deal properly with the variability originating from the different halogens. In DhlA, halogen substituents are tightly bound between two opposing tryptophans and the COMBINE model constructed for this protein could distinguish between chlorinated and brominated substrates. More research is needed to refine the model for LinB, e.g., by investigating the contribution of electrostatic desolvation energies, the effects of energy minimization on the conformation of substrates in the Michaelis complexes, or the effects of explicit inclusion of water molecules in the Michaelis complexes. In fact, active exchange of water molecules between the active site of LinB and the bulk solvent was observed in a nanosecond-scale molecular dynamics simulation (63). The lesson learned so far from the comparison of the DhlA and LinB COMBINE models is that exactly the same methodology for generating structures of the complexes cannot necessarily be applied, even to closely related proteins; the modeling protocol must be adjusted to each protein's distinguishing structural and biochemical features.

7 CONCLUSION

COMBINE analysis quantitatively explores macromolecule–ligand interactions on a residue basis and provides quantitative information about the importance of every residue in a macromolecule for binding of different substrates. COMBINE analysis identified a number of specificity-determining amino acid residues in the haloalkane dehalogenase DhlA. Trp125, Trp175, Leu179, Phe222, and Pro223 are important in distinguishing chlorinated and brominated derivatives. Mutations affecting these residues should modulate the halogen specificity of the enzyme. A second set of residues (Phe164, Phe172, and Phe222) is found to discriminate substrates by their interactions with the carbon chain. The predictive ability of the COMBINE model derived for DhlA was confirmed with two site-directed point mutants and four novel substrates. Modeling the specificity of the haloalkane dehalogenase LinB using the same methodology is slightly more difficult because of its larger active site and less specific binding of its substrates. Our current COMBINE model differentiates between molecules of different chain length but cannot properly distinguish substrates bearing a different halogen atom. To achieve this goal we are currently tailoring our modeling protocol for the LinB enzyme.

ACKNOWLEDGMENTS

This work was supported by the NATO Linkage Grant MTECH. LG. 974701 and grants from the Czech Ministry of Education J07/98:143100005 and ME551 (JD).


REFERENCES

1. AR Ortiz, MT Pisabarro, F Gago, RC Wade. Prediction of drug binding affinities by comparative binding energy analysis. J Med Chem 38:2681–2691, 1995.
2. RC Wade. Derivation of QSARs using 3D structural models of protein–ligand complexes by COMBINE analysis. In: H-D Holtje, W Sippl, eds. Rational Approaches to Drug Design: 13th European Symposium on Quantitative Structure–Activity Relationships. Barcelona: Prous Science, 2001, pp 23–28.
3. RD Cramer, DE Patterson, JD Bunce. Comparative Molecular Field Analysis (CoMFA): 1. Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 110:5959–5967, 1988.
4. C Perez, M Pastor, AR Ortiz, F Gago. Comparative binding energy analysis of HIV-1 protease inhibitors: incorporation of solvent effects and validation as a powerful tool in receptor-based drug design. J Med Chem 41:836–852, 1998.
5. S Wold, E Johansson, M Cocchi. PLS—Partial least-squares projections to latent structures. In: H Kubinyi, ed. 3D QSAR in Drug Design: Theory, Methods and Application. Leiden: ESCOM, 1993, pp 523–550.
6. M Pastor, C Perez, F Gago. Simulation of alternative binding modes in a structure-based QSAR study of HIV-1 protease inhibitors. J Mol Graph Model 15:364–371, 1997.
7. S Tomic, L Nilsson, RC Wade. Nuclear receptor-DNA binding specificity: a COMBINE and Free-Wilson QSAR analysis. J Med Chem 43:1780–1792, 2000.
8. JJ Lozano, M Pastor, G Cruciani, K Gaedt, NB Centeno, F Gago, F Sanz. 3D-QSAR methods on the basis of ligand–receptor complexes. Application of COMBINE and GRID/GOLPE methodologies to a series of CYP1A2 ligands. J Comput-Aid Mol Des 14:341–353, 2000.
9. C Cuevas, M Pastor, C Perez, F Gago. Comparative binding energy (COMBINE) analysis of human neutrophil elastase inhibition by pyridone-containing trifluoromethylketones. Comb Chem High Throughput Screen 4:627–642, 2001.
10. T Wang, RC Wade. Comparative binding energy (COMBINE) analysis of influenza neuraminidase–inhibitor complexes. J Med Chem 44:961–971, 2001.
11. Kuchner, FH Arnold. Directed evolution of enzyme catalysts. Trends Biotechnol 15:523–530, 1997.
12. FH Arnold. Design by directed evolution. Acc Chem Res 31:125–131, 1998.
13. J Kmunicek, S Luengo, F Gago, AR Ortiz, RC Wade, J Damborsky. Comparative binding energy analysis of the substrate specificity of haloalkane dehalogenase from Xanthobacter autotrophicus GJ10. Biochemistry 40:8905–8917, 2001.
14. DB Janssen, F Pries, JR Van der Ploeg. Genetics and biochemistry of dehalogenating enzymes. Ann Rev Microbiol 48:163–191, 1994.
15. DB Janssen, JP Schanstra. Engineering proteins for environmental applications. Curr Opin Biotech 5:253–259, 1994.
16. G Stucki, M Thuer. Experiences of a large-scale application of 1,2-dichloroethane degrading microorganisms for groundwater treatment. Environ Sci Technol 29:2339–2345, 1995.


17. PE Swanson. Dehalogenases applied to industrial-scale biocatalysis. Curr Opin Biotechnol 10:365–369, 1999.
18. S Keuning, DB Janssen, B Witholt. Purification and characterization of hydrolytic haloalkane dehalogenase from Xanthobacter autotrophicus GJ10. J Bacteriol 163:635–639, 1985.
19. T Yokota, T Omori, T Kodama. Purification and properties of haloalkane dehalogenase from Corynebacterium sp. strain m15-3. J Bacteriol 169:4049–4054, 1987.
20. R Scholtz, T Leisinger, F Suter, AM Cook. Characterization of 1-chlorohexane halidohydrolase, a dehalogenase of wide substrate range from an Arthrobacter sp. J Bacteriol 169:5016–5021, 1987.
21. DB Janssen, J Gerritse, J Brackman, C Kalk, D Jager, B Witholt. Purification and characterization of a bacterial dehalogenase with activity toward halogenated alkanes, alcohols and ethers. Eur J Biochem 171:67–92, 1988.
22. PJ Sallis, SJ Armfield, AT Bull, DJ Hardman. Isolation and characterization of a haloalkane halidohydrolase from Rhodococcus erythropolis Y2. J Gen Microbiol 136:115–120, 1990.
23. Y Nagata, K Miyauchi, J Damborsky, K Manova, A Ansorgova, M Takagi. Purification and characterization of haloalkane dehalogenase of a new substrate class from a γ-hexachlorocyclohexane-degrading bacterium, Sphingomonas paucimobilis UT26. Appl Environ Microbiol 63:3707–3710, 1997.
24. GJ Poelarends, M Wilkens, MJ Larkin, JD van Elsas, DB Janssen. Degradation of 1,3-dichloropropene by Pseudomonas cichorii 170. Appl Environ Microbiol 64:2931–2936, 1998.
25. A Jesenska, M Bartos, V Czernekova, I Rychlik, I Pavlik, J Damborsky. Cloning and expression of haloalkane dehalogenase gene dhmA from Mycobacterium avium N85 and preliminary characterization of DhmA. Appl Environ Microbiol 68:3724–3730, 2002.
26. F Pries, J Kingma, M Pentega, G Van Pouderoyen, CM Jeronimus-Stratingh, AP Bruins, DB Janssen. Site-directed mutagenesis and oxygen isotope incorporation studies of the nucleophilic aspartate of haloalkane dehalogenase. Biochemistry 33:1242–1247, 1994.
27. F Pries, AJ Van den Wijngaard, R Bos, M Pentenga, DB Janssen. The role of spontaneous cap domain mutations in haloalkane dehalogenase specificity and evolution. J Biol Chem 269:17490–17494, 1994.
28. F Pries, J Kingma, DB Janssen. Activation of an Asp-124→Asn mutant of haloalkane dehalogenase by hydrolytic deamidation of asparagine. FEBS Lett 358:171–174, 1995.
29. F Pries, J Kingma, GH Krooshof, CM Jeronimus-Stratingh, AP Bruins, DB Janssen. Histidine 289 is essential for hydrolysis of the alkyl-enzyme intermediate of haloalkane dehalogenase. J Biol Chem 270:10405–10411, 1995.
30. C Kennes, F Pries, GH Krooshof, E Bokma, J Kingma, DB Janssen. Replacement of tryptophan residues in haloalkane dehalogenase reduces halide binding and catalytic activity. Eur J Biochem 228:403–407, 1995.
31. JP Schanstra, IS Ridder, GJ Heimeriks, R Rink, GJ Poelarends, KH Kalk, BW Dijkstra, DB Janssen. Kinetic characterization and X-ray structure of a mutant of haloalkane dehalogenase with higher catalytic activity and modified substrate range. Biochemistry 35:13186–13195, 1996.
32. JP Schanstra, A Ridder, J Kingma, DB Janssen. Influence of mutations of Val226 on the catalytic rate of haloalkane dehalogenase. Protein Eng 10:53–61, 1997.
33. GH Krooshof, EM Kwant, J Damborsky, J Koca, DB Janssen. Repositioning the catalytic triad acid of haloalkane dehalogenase: effects on activity and kinetics. Biochemistry 36:9571–9580, 1997.
34. P Holloway, KL Knoke, JT Trevors, H Lee. Alternation of the substrate range of haloalkane dehalogenase by site-directed mutagenesis. Biotechnol Bioeng 59:520–523, 1998.
35. GH Krooshof, IS Ridder, AWJW Tepper, GJ Vos, HJ Rozeboom, KH Kalk, BW Dijkstra, DB Janssen. Kinetic analysis and X-ray structure of haloalkane dehalogenase with a modified halide-binding site. Biochemistry 37:15013–15023, 1998.
36. K Hynkova, Y Nagata, M Takagi, J Damborsky. Identification of the catalytic triad in the haloalkane dehalogenase from Sphingomonas paucimobilis UT26. FEBS Lett 446:177–181, 1999.
37. JF Schindler, PA Naranjo, DA Honaberger, C-H Chang, JR Brainard, LA Vanderberg, CJ Unkefer. Haloalkane dehalogenases: steady-state kinetics and halide inhibition. Biochemistry 38:5772–5778, 1999.
38. Y Nagata, K Hynkova, J Damborsky, M Takagi. Construction and characterization of histidine-tagged haloalkane dehalogenase (LinB) of a new substrate class from a γ-hexachlorocyclohexane-degrading bacterium, Sphingomonas paucimobilis UT26. Protein Expr Purif 17:299–304, 1999.
39. S Marvanova, Y Nagata, M Wimmerova, J Sykorova, K Hynkova, J Damborsky. Biochemical characterization of broad-specificity enzymes using multivariate experimental design and a colorimetric microplate assay: characterization of the haloalkane dehalogenase mutants. J Microbiol Methods 44:149–157, 2001.
40. Y Nagata, Z Prokop, S Marvanova, J Sykorova, M Monincova, M Tsuda, J Damborsky. Re-construction of mycobacterial dehalogenase Rv2579 by cumulative mutagenesis of haloalkane dehalogenase LinB. Appl Environ Microbiol 69:2349–2355, 2003.
41. R Chaloupkova, J Sykorova, Z Prokop, A Jesenska, M Monincova, M Pavlova, M Tsuda, Y Nagata, J Damborsky. Modification of activity and specificity of haloalkane dehalogenase from Sphingomonas paucimobilis UT26 by engineering of its entrance tunnel. Submitted.
42. SM Franken, HJ Rozeboom, KH Kalk, BW Dijkstra. Crystal structure of haloalkane dehalogenase: an enzyme to detoxify halogenated alkanes. EMBO J 10:1297–1302, 1991.
43. KHG Verschueren, F Seljee, HJ Rozeboom, KH Kalk, BW Dijkstra. Crystallographic analysis of the catalytic mechanism of haloalkane dehalogenase. Nature 363:693–698, 1993.
44. KHG Verschueren, J Kingma, HJ Rozeboom, KH Kalk, DB Janssen, BW Dijkstra. Crystallographic and fluorescence studies of the interaction of haloalkane dehalogenase with halide ions. Studies with halide compounds reveal a halide binding site in the active site. Biochemistry 32:9031–9037, 1993.
45. KHG Verschueren, SM Franken, HJ Rozeboom, KH Kalk, BW Dijkstra. Noncovalent binding of the heavy atom compound [Au(CN)2] at the halide binding site of haloalkane dehalogenase from Xanthobacter autotrophicus GJ10. FEBS Lett 323:267–270, 1993.
46. KHG Verschueren, SM Franken, HJ Rozeboom, KH Kalk, BW Dijkstra. Refined X-ray structures of haloalkane dehalogenase at pH 6.2 and pH 8.2 and implications for the reaction mechanism. J Mol Biol 232:856–872, 1993.
47. HJ Rozeboom, J Kingma, DB Janssen, BW Dijkstra. Crystallization of haloalkane dehalogenase from Xanthobacter autotrophicus GJ10. J Mol Biol 200:611–612, 1988.
48. IS Ridder, HJ Rozeboom, BW Dijkstra. Haloalkane dehalogenase from Xanthobacter autotrophicus GJ10 refined at 1.15 Å resolution. Biol Crystallogr 55:1273–1290, 1999.
49. J Newman, TS Peat, R Richard, L Kan, PE Swanson, JA Affholter, IH Holmes, JF Schindler, CJ Unkefer, TC Terwilliger. Haloalkane dehalogenase: structure of a Rhodococcus enzyme. Biochemistry 38:16105–16114, 1999.
50. J Marek, J Vevodova, I Kuta-Smatanova, Y Nagata, LA Svensson, J Newman, M Takagi, J Damborsky. Crystal structure of the haloalkane dehalogenase from Sphingomonas paucimobilis UT26. Biochemistry 39:14082–14086, 2000.
51. AJ Oakley, Z Prokop, M Bohac, J Kmunicek, T Jedlicka, M Monincova, I Kuta-Smatanova, Y Nagata, J Damborsky, MCJ Wilce. Exploring the structure and activity of haloalkane dehalogenase from Sphingomonas paucimobilis UT26: evidence for product and water mediated inhibition. Biochemistry 41:4847–4855, 2002.
52. MG Pikkemaat, IS Ridder, HJ Rozeboom, KH Kalk, BW Dijkstra, DB Janssen. Crystallographic and kinetic evidence of a collision complex formed during halide import in haloalkane dehalogenase. Biochemistry 38:12052–12061, 1999.
53. J Damborsky, M Kuty, M Nemec, J Koca. Molecular modelling to understand the mechanisms of microbial degradation—Application to hydrolytic dehalogenation with haloalkane dehalogenases. In: F Chen, G Schüürmann, eds. Quantitative Structure–Activity Relationships in Environmental Sciences—VII. Pensacola: SETAC Press, 1997, pp 5–20.
54. J Damborsky. Quantitative structure–function relationships of the single-point mutants of haloalkane dehalogenase: a multivariate approach. Quant Struct-Act Relat 16:126–135, 1997.
55. J Damborsky, M Kuty, M Nemec, J Koca. A molecular modeling study of the catalytic mechanism of haloalkane dehalogenase: 1. Quantum chemical study of the first reaction step. J Chem Inf Comput Sci 37:562–568, 1997.
56. FC Lightstone, Y-J Zheng, AH Maulitz, TC Bruice. Non-enzymatic and enzymatic hydrolysis of alkyl halides: a haloalkane dehalogenation enzyme evolved to stabilize the gas-phase transition state of an SN2 displacement reaction. Proc Natl Acad Sci USA 94:8417–8420, 1997.
57. AH Maulitz, FC Lightstone, YJ Zheng, TC Bruice. Nonenzymatic and enzymatic hydrolysis of alkyl halides: a theoretical study of the S(N)2 reactions of acetate and hydroxide ions with alkyl chlorides. Proc Natl Acad Sci USA 94:6591–6595, 1997.
58. J Damborsky, M Bohac, M Prokop, M Kuty, J Koca. Computational site-directed mutagenesis of haloalkane dehalogenase in position 172. Protein Engng 11:901–907, 1998.
59. M Kuty, J Damborsky, M Prokop, J Koca. A molecular modeling study of the catalytic mechanism of haloalkane dehalogenase: 2. Quantum chemical study of complete reaction mechanism. J Chem Inf Comput Sci 38:736–741, 1998.
60. M Prokop, J Damborsky, J Koca. TRITON: in silico construction of protein mutants and prediction of their activities. Bioinformatics 16:845–846, 2000.
61. FC Lightstone, YJ Zheng, TC Bruice. Molecular dynamics simulations of ground and transition states for the S(N)2 displacement of Cl from 1,2-dichloroethane at the active site of Xanthobacter autotrophicus haloalkane dehalogenase. J Am Chem Soc 120:5611–5621, 1998.
62. EY Lau, K Kahn, P Bash, TC Bruice. The importance of reactant positioning in enzyme catalysis: a hybrid quantum mechanics/molecular mechanics study of a haloalkane dehalogenase. Proc Natl Acad Sci USA 97:9937–9942, 2000.
63. M Otyepka, J Damborsky. Functionally relevant motions of haloalkane dehalogenases occur in the specificity-modulating cap domains. Protein Sci 11:1206–1217, 2002.
64. KA Gray, TH Richardson, K Kretz, JM Short, F Bartnek, R Knowles, L Kan, PE Swanson, DE Robertson. Rapid evolution of reversible denaturation and elevated melting temperature in a microbial haloalkane dehalogenase. Adv Synth Catal 343:607–617, 2001.
65. JP Schanstra, J Kingma, DB Janssen. Specificity and kinetics of haloalkane dehalogenase. J Biol Chem 271:14747–14753, 1996.
66. DA Case, DA Pearlman, JW Caldwell, TE Cheatham III, WS Ross, CL Simmerling, TA Darden, KM Merz, RV Stanton, AL Cheng, JJ Vincent, M Crowley, DM Ferguson, RJ Radmer, GL Seibel, UC Singh, PK Weiner, PA Kollman. AMBER 5.0. San Francisco: University of California, 1997.
67. GM Morris, DS Goodsell, RS Halliday, R Huey, WE Hart, RK Belew, AJ Olson. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem 19:1639–1662, 1998.
68. JJP Stewart. MOPAC—A semiempirical molecular-orbital program. J Comput-Aid Mol Des 4:1–45, 1990.
69. J Damborsky, M Prokop, J Koca. TRITON: graphic software for rational engineering of enzymes. Trends Biochem Sci 26:71–73, 2001.
70. J Damborsky, J Koca. Analysis of the reaction mechanism and substrate specificity of haloalkane dehalogenases by sequential and structural comparisons. Protein Eng 12:989–998, 1999.

6 Computer Simulations: A Tool for Investigating the Function of Complex Biological Macromolecules

Günther H. Peters
Technical University of Denmark, Lyngby, Denmark

1 COMPUTER SIMULATIONS

Computer simulations have become one of the principal tools in theoretical studies of physical, chemical, and biological systems, and with the fast advancement of computational resources, simulation techniques have emerged as indispensable scientific and engineering tools. In particular, two simulation techniques established in the 1960s are now commonly applied to many aspects of medicinal chemistry and biophysics, where computer analyses and simulations can augment or explain experimental observations. These techniques are the Monte Carlo algorithms (1–9) and the molecular dynamics simulation techniques (10–32).

Monte Carlo calculations represent an entirely different type of simulation than those based on molecular dynamics. The name "Monte Carlo" comes from the random-chance nature of the simulations, akin to the games of chance at Monaco's gambling resort. Monte Carlo simulations are stochastic and use random numbers to sample from a probability distribution, usually the classical Boltzmann distribution, to obtain for instance thermodynamic properties, minimum-energy structures and/or rate coefficients, or to sample conformers as part of a global conformer search algorithm. The molecular dynamics (MD) simulation technique on the other hand is a deterministic method, where the time evolution of a system is determined by Newton's equations of motion (i.e., positions, velocities, and accelerations of atoms). Hence, MD simulations not only provide information on a molecular level (i.e., in space) but also on the dynamics of a system (i.e., behavior in time). Experiments often do not provide the molecular information available from simulations. Therefore as schematically shown in Fig. 1, theoreticians, computational scientists, and experimentalists can have a synergistic interaction, leading to new insights into complex biological systems. Chemical and physical intuition and experience, of course, will always be necessary, but computers add depth and new dimensions in understanding complex biological systems.

Figure 1 Synergistic interaction between experimentalists, computational chemists, and theoreticians. [Flow diagram. Box labels: biological system; modeling approach; developing model system; experiment, computer simulations, theory and approximation; experimental results, "exact" model results, theoretical predictions; comparison/adjustment; insight on molecular level.]

2 MONTE CARLO METHOD

Numerical methods that are known as Monte Carlo (MC) methods can be broadly described as statistical simulation methods. Here the terminology statistical simulation is deﬁned in general terms including any method that utilizes sequences of random numbers to perform a simulation. The name ‘‘Monte Carlo’’ was coined by Metropolis during the Manhattan Project of World War II, because of the similarity of statistical simulations to games of chance, and because the capital of Monaco was a center for gambling. Monte Carlo methods have been used for centuries, but only in the past several decades has this technique been applied to complex problems. One of the ﬁrst MC calculations was performed by Enrico Fermi in the 1930s. He used MC in the calculation of neutron diﬀusion and later designed the Fermiac, which is a Monte Carlo mechanical device used in the calculation of criticality in nuclear reactors. It was in the 1940s when von Neumann developed a formal foundation for the MC method by establishing the mathematical basis for probability density functions, inverse cumulative distribution functions, and pseudorandom number generators. In many applications of MC, the physical process is simulated directly without the need to write down the diﬀerential equations that describe the behavior of the system. It is noteworthy that this concept is in contrast to conventional numerical discretization methods, which typically are applied to ordinary or partial diﬀerential equations that describe some underlying physical or mathematical system. The only requirement for utilizing MC is that the physical (or mathematical) system can be described by a probability density function. Once the distribution function is known, MC simulation can proceed by random sampling from that distribution. By the 1960s, the method was used in a variety of engineering ﬁelds. However, at that time, calculations were limited by the computational power available. 
Many complex problems remained intractable throughout the 1970s. With the development of high-speed supercomputers, and in particular of parallel algorithms with much higher execution rates, Monte Carlo applications have received increased attention. The Monte Carlo technique is now used routinely in many diverse fields including nuclear reactor design, traffic flow, economics, environmental problems (e.g., air pollution), and biological systems. In the latter area, simulations are carried out to understand the collective behavior of many-particle systems leading for instance to protein folding, phase separation, or spatial segregation in membranes (33–36).
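The statement that an MC simulation proceeds by random sampling from a known distribution can be made concrete with a small sketch. The Python code below draws samples from an exponential density by inverting its cumulative distribution function and then estimates the mean by averaging. The choice of distribution, the rate value, and the function names are assumptions made for illustration only.

```python
import math
import random

def sample_exponential(rate, rng):
    """Draw one sample from p(x) = rate * exp(-rate * x) by inverting its CDF."""
    u = rng.random()                   # uniform pseudorandom number in [0, 1)
    return -math.log(1.0 - u) / rate   # F^{-1}(u) for F(x) = 1 - exp(-rate * x)

def monte_carlo_mean(rate, n_samples, seed=0):
    """Estimate the mean of the distribution by averaging random samples."""
    rng = random.Random(seed)
    total = sum(sample_exponential(rate, rng) for _ in range(n_samples))
    return total / n_samples

estimate = monte_carlo_mean(rate=2.0, n_samples=100_000)
print(f"MC estimate of the mean: {estimate:.4f} (exact value: {1 / 2.0})")
```

The statistical error of such an estimate shrinks roughly as the inverse square root of the number of samples, which is why the available computing power limited early applications.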


A major advance in Monte Carlo simulations was made by Metropolis and co-workers (37), who developed a new algorithm in which the new conformation is generated from the current one by a "move" accepted with the probability, $P_{acc}$,

$$P_{acc} = 1, \qquad \Delta E < 0 \qquad (1)$$

$$P_{acc} = \min\left(1, \exp(-\Delta E / k_B T)\right), \qquad \Delta E > 0 \qquad (2)$$

which depends on the corresponding change in energy, $\Delta E$, and on the externally adjustable parameter $k_B T$, where $k_B$ is the Boltzmann constant and $T$ is the temperature (38). As the Metropolis algorithm satisfies the "detailed balance" condition and each configuration can be reached in a finite number of steps (ergodicity), the resulting Markov process (chain) will converge to the canonical distribution. That means the probability (frequency) that a particular conformation occurs will be proportional to its Boltzmann factor $\exp(-E/k_B T)$. Thermodynamic quantities can then be calculated simply by computing averages over the sampled conformations. In many practical applications, one can predict the statistical error (the "variance") in the average result, and hence estimate the number of Monte Carlo trials needed to achieve a given error and convergence of the thermodynamic quantities of interest. The MC method is also an efficient method for sampling the configurational space, which frequently is explored in protein folding problems.

Because MC is stochastic in nature and the method is based on probability distributions, there is no dynamic component involved in the simulations (i.e., time-dependent quantities such as transport coefficients cannot be calculated from MC simulations). Molecular dynamics simulations, in contrast, are deterministic and follow the time evolution of particles in phase space. In other words, MC explores the phase space by jumping from one configuration to another configuration weighted by the Boltzmann factor, while MD follows the trajectory (time evolution) of single atoms through the phase space. An obvious advantage of molecular dynamics over Monte Carlo is that MD simulations follow the classical trajectory of the system, whereas the dynamics in MC is artificial. MD is the method of choice when investigating the kinetics of a given process.
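The acceptance rule of Eqs. (1) and (2) can be sketched in a few lines of Python. The example samples a single coordinate in a harmonic potential, a toy system chosen here only because its canonical average of x² is known to equal k_BT/k. The potential, the step size, the reduced units, and the function names are all assumptions for illustration.

```python
import math
import random

def metropolis_accept(delta_e, kbt, rng):
    """Acceptance rule of Eqs. (1) and (2): downhill moves are always accepted."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kbt)

def sample_harmonic(n_steps, kbt=1.0, k=1.0, max_step=1.5, seed=1):
    """Metropolis sampling of x in the toy potential E(x) = 0.5 * k * x**2."""
    rng = random.Random(seed)

    def energy(x):
        return 0.5 * k * x * x

    x, samples = 0.0, []
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_step, max_step)   # symmetric trial move
        if metropolis_accept(energy(trial) - energy(x), kbt, rng):
            x = trial
        samples.append(x)   # a rejected move counts the current state again
    return samples

samples = sample_harmonic(200_000)
mean_x2 = sum(x * x for x in samples) / len(samples)
print(f"<x^2> from sampling: {mean_x2:.3f} (canonical value kBT/k = 1.0)")
```

Note that the sequence of accepted moves carries no physical time information, which is the point made above about the artificial dynamics of MC.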
One of the major shortcomings of MD simulations is that as the complexity of the system increases, these simulations become computationally demanding, and in many problems only a limited part of the phase space can be explored. Furthermore, if the system moves along a rough energy surface with a relatively large number of local minima, then MD simulations tend to get trapped in these energy minima and as a consequence only a limited part of the phase space is sampled. The problem of multiple minima and their detection is well known in computational chemistry. In many instances, one cannot explore all possible states of a system by performing an exhaustive search of all its degrees of freedom. Here MC simulations provide an efficient method to explore the energy landscape and have been successfully applied in many biophysical areas such as structure-based drug design, docking of molecules to a receptor, and alignment of molecules by optimization of molecular similarity indices.

3 MOLECULAR DYNAMICS METHOD

3.1 Historical Background

The molecular dynamics technique was first introduced by Alder and Wainwright in the late 1950s (32,39). The authors studied the interactions of hard spheres, resulting in many important insights into the behavior of simple liquids (39). In the early 1960s, pioneering work on the development of consistent force fields based on experimental data (such as spectroscopic data, heat of formation, structures of small molecules, quantum-mechanical information, etc.) was independently carried out by Lifson at the Weizmann Institute of Science, Scheraga at Cornell University, and Allinger at Wayne State University. These studies laid the foundation for developing force field parameters for various chemical compounds by optimizing computationally obtained results to experimental observations such as structure and energetics. In 1964, Rahman performed the first simulation using a realistic potential for liquid argon (40). In the early 1970s, Rahman and Stillinger carried out the first molecular dynamics simulation of a polar molecule (liquid water) (41,42). The first protein simulation was published in 1977 by McCammon et al., who conducted simulations of the bovine pancreatic trypsin inhibitor (BPTI) (43). Today molecular dynamics simulations are well established in the scientific community, and the technique is applied to a wide range of chemical, biophysical, and medicinal problems such as enzyme catalysis, protein–protein interactions, and protein/ligand design (44). Moreover, molecular dynamics simulation techniques are also used in experimental procedures such as x-ray crystallography and NMR structure determinations. The number of simulation techniques has greatly expanded, and techniques have been developed for particular problems, including mixed quantum mechanical–classical simulations that are applied to the study, for instance, of charge transfer in enzymatic reactions.

3.2 Classical Mechanics

Molecular dynamics simulation is a method that inherently introduces the concept of time and is based on Newton's classical mechanics. Building on the work of Galileo Galilei (1564–1642), Nicolas Copernicus (1473–1543), Tycho Brahe (1546–1601), and Johannes Kepler (1571–1630), Isaac Newton (1642–1727) formulated in 1687 the second law of motion stating that a body's acceleration, $\vec{a}$, is equal to the net force, $\vec{F}$, divided by its mass, $m$.

$$\vec{a} = \vec{F}/m \qquad (3)$$
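How Eq. (3) generates a trajectory can be illustrated by integrating it numerically in small time steps. The sketch below uses the velocity Verlet scheme, a widely used integrator that is chosen here as an assumption since the text does not name one, and applies it to a single particle on a harmonic spring in reduced units. The force law, the parameters, and the function names are illustrative only.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate a = F/m (Eq. 3) for one particle in one dimension."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory

# Toy system (assumption): harmonic spring with k = 1 and m = 1, period 2*pi.
k, mass, dt = 1.0, 1.0, 0.01
n_steps = int(2.0 * math.pi / dt)            # roughly one oscillation period
trajectory = velocity_verlet(1.0, 0.0, lambda x: -k * x, mass, dt, n_steps)

def total_energy(x, v):
    return 0.5 * mass * v * v + 0.5 * k * x * x

energies = [total_energy(x, v) for x, v in trajectory]
drift = max(abs(e - energies[0]) for e in energies)
print(f"final position: {trajectory[-1][0]:.4f}, max energy deviation: {drift:.2e}")
```

Monitoring the total energy, as done in the last lines, is a common check that the chosen time step is small enough for the fastest motion in the system.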

Throughout the text, an arrow above a variable indicates that the quantity is a vector. In the late 1800s and early 1900s a number of experimental observations indicated that Newtonian physics has its limitations because it only considers the nuclear motion of many-body systems. It became increasingly clear that electromagnetic radiation had particle-like properties in addition to its wave-like properties such as interference and diffraction. This initiated intense research in that area, and several major breakthroughs were achieved, as summarized in Table 1. Planck demonstrated in 1900 that electromagnetic radiation was emitted from and absorbed by a black body in discrete quanta, each having an energy proportional to the frequency of radiation. Einstein invoked this concept (discrete quanta) to explain the photoelectric effect in 1905. In 1924, de Broglie asserted that matter has a dual nature, i.e., that particles can behave as waves. This led to the formulation of Schrödinger's time-dependent wave equation of matter (45–47).

$$\hat{H}\psi(\vec{r}, t) = i\hbar \frac{\partial \psi(\vec{r}, t)}{\partial t} \qquad (4)$$

$\hat{H}$ is the so-called Hamiltonian operator incorporating all the relevant forces exerted on the particles of the system, and $\vec{r}$, given by $\vec{r} = r_x \hat{e}_x + r_y \hat{e}_y + r_z \hat{e}_z$, is the position of the particle in Cartesian coordinates. The solution of Eq. (4) yields discrete (quantized) values (or eigenvalues) of energy $E_n$ and for each $E_n$ its corresponding wave function. Hence each particle is represented by a wave function $\psi$(position, time) such that the quantity $\psi\psi^*$ is the probability of finding a particle at that position at that time.

Table 1 Historical Overview in Quantum Mechanics

Scientist | Year | Achievement
Max Planck | 1900 | Explained blackbody radiation by applying the concept of discrete-energy quanta in physics.
Albert Einstein | 1905 | Treated radiation as independent particles of energy, where quantum theories of both matter and radiation were needed to describe these systems.
Niels Bohr | 1913 | Demonstrated that the frequencies of atomic spectral lines are independent of the frequencies of electronic motions within the atom.
Louis de Broglie | 1924 | Established the concept of wave-particle duality for matter.
Werner Heisenberg | 1925 | Developed matrix mechanics, a consistent (but arbitrary) quantum theory emphasizing quantum rules applied to problems of atomic structure and atomic spectra.
Erwin Schrödinger | 1926 | Proved that the theories of matrix mechanics are equivalent to his own developed wave mechanics.
Copenhagen Interpretation | 1927 | Proposed by Heisenberg and Bohr, this interpretation of quantum mechanics was based on Born's statistical interpretation of the wave function and Dirac's more comprehensive theory of quantum mechanics.

However, the theory of quantum mechanics is by no means equivalent to Newton's laws. There are some major differences between classical and quantum mechanics, and these differences form a limitation on their exact application. In classical mechanics a particle can have any energy and any speed, whereas in quantum mechanics these quantities are quantized. As a consequence a particle in a quantum system can only have certain values for its energy and its speed (or momentum). These special values are called the energy or momentum eigenvalues of the quantum system. Associated with each eigenvalue is a special state called an eigenstate. The eigenvalues and eigenstates of a quantum system are the most important features for characterizing that system's behavior. There are no eigenvalues or eigenstates in classical mechanics.

Newton's laws allow one, in principle, to determine the exact location and velocity of a particle at some future time. Quantum mechanics, on the other hand, only determines the probability for a particle to be in a certain location with a certain velocity at some future time. The probabilistic nature of quantum mechanics makes it very different from classical mechanics. Quantum mechanics incorporates what is known as the "Heisenberg Uncertainty Principle". This principle states that the location and velocity of a quantum particle cannot both be known to infinite accuracy. If one can determine precisely the particle's location, then the exact velocity is uncertain and vice versa. In practice, the level of "uncertainty" is so small that it is only noticeable when dealing with matter having atomic dimensions. Quantum mechanics permits what is called "superposition of states". This means that a quantum particle can be in two different states at the same time, which is certainly not possible in classical mechanics. Quantum mechanics describes a system in terms of probabilities. It forces us to abandon the notion of precisely defined trajectories of atoms through time and space.

In classical mechanics electronic motions are not considered, and quantum effects are generally ignored. The classical description is excellent for a wide range of systems but of course fails for reactions involving electron transport such as bond formation and cleavage, or polarization. To study these kinds of problems, quantum dynamical approaches are used, which combine quantum-mechanical calculations with classical mechanics.

3.3 Characteristic Time Scales

Many complex phenomena encountered in science and technology are consequences of the collective or cooperative behavior of many interacting particles (the "many-body problem"). Liquid–solid phase transitions, nematic–isotropic transitions of liquid crystal molecules, viscoelasticity of polymer melts, self-organization in biological systems, and enzymatic reactions are only a few examples of such many-body problems. An exact treatment of these systems would require a quantum-mechanical approach. However, quantum-mechanical computations are expensive, and at the present time these calculations are not feasible for complex systems with processes occurring on relatively large time and length scales. As shown in Fig. 2, quantum-mechanical calculations involve time and length scales of $10^{-12}$ sec and $10^{-10}$ m, respectively. Systems involving larger time or length scales require other approaches such as molecular dynamics, Monte Carlo, or continuum theory (Fig. 2). As mentioned previously, classical mechanics cannot be applied when, for instance, charge transfer occurs in a process. For other processes, the question arises as to which motions can be reasonably approximated by classical mechanics. In the classical mechanical description, atoms may possess any energy, and as a consequence atoms move along continuous energy surfaces. In contrast, in the quantum-mechanical description, the energy is quantized, and the atoms can only occupy certain discrete (separated) energy levels. This "discreteness" of the energy landscape is more pronounced at temperatures where the gaps between the energy levels are much larger than the thermal energy. With increasing temperature more energy levels become thermally accessible and the atoms approach the limit of classical behavior. To determine this limit, let us consider a harmonic oscillator.
For the harmonic oscillator, the energy levels are separated by $\Delta E = hf$, where $f$ is the frequency of the harmonic vibration, and $h$ is Planck's constant. Classical behavior is approached at temperatures for which $k_BT \gg hf$,

Computer Simulations

Figure 2 Time and length scales encountered in simulations.

where $k_B$ is the Boltzmann constant and $T$ the absolute temperature. At $T$ = 300 K, $k_BT$ is 2.5 kJ/mol, and the frequency of the harmonic oscillator for which $hf = k_BT$ becomes 6.25 psec$^{-1}$, which indicates that classical mechanics is a good approximation for motions with a characteristic time scale of picoseconds or longer at room temperature. Depending on system size and the complexity of the system, molecular dynamics simulations are usually performed on a time scale of $10^{-9}$ sec (Fig. 2). This raises the question of how well the configurational space can be sampled (48,49). The many-body problem causes a correlated motion between the particles, i.e., the motion of individual particles is coupled to the motion of other particles. Each dynamic process (motion) has a characteristic time scale, amplitude, and energy range. Macromolecules in general, and proteins in particular, display a broad range of characteristic motions ranging from motions that are very fast and often localized (e.g., atomic fluctuations) to slow motions that occur on the scale of the whole molecule (e.g., motion of domains). Many of these motions have an important role in the biochemical function of proteins and might be coupled to one another. Namely, the large-scale dynamic transitions involve medium-scale motions that naturally could involve localized motions. As summarized in Table 2, these motions are in the range of picoseconds to hours;


Table 2 Characteristic Motions in Proteins

| Type of motions | Functionality examples | Time and amplitude scales |
|---|---|---|
| Local motions: atomic fluctuation; side chain motion; loop motions | Substrate recognition; ligand docking; temporal diffusion pathway | $10^{-15}$–$10^{-12}$ sec (fsec–psec); <1 Å |
| Medium scale motions: rigid-body motion (helices); loop motion; N- or C-terminal motion | Binding specificity; structural change to obtain active conformation | $10^{-9}$–$10^{-6}$ sec (nsec–μsec); 1–5 Å |
| Large scale motions: domain motions; subunit motions | Hinge bending motion; allosteric transitions | $10^{-6}$–$10^{-3}$ sec (μsec–msec); 5–10 Å |
| Global motions: helix-coil transition; folding/unfolding; subunit association | Protein functionality; hormone activation | $10^{-3}$–$10^{4}$ sec (msec–hr); >5 Å |

i.e., these motions span 20 orders of magnitude in characteristic time.

3.4 Potential Functions

Biological systems involving macromolecules are inherently complex and normally consist of so many atoms that a complete quantum mechanical description of these systems is not yet feasible. Here classical mechanics combined with empirical potential energy functions is presently the only approach to study these systems. Potential energy functions (i.e., force fields) provide a reasonably good compromise between accuracy and computational efficiency. The parameters contained in the force fields are often calibrated to experimental results and quantum mechanical calculations of small model compounds. The force fields are tested by computing physical properties that are measurable by experiment. Normally, structural data obtained from x-ray crystallography and NMR, dynamic properties obtained from spectroscopy and inelastic neutron scattering, and thermodynamic data are used in the evaluation of the accuracy of the force field (50–53). The development of a force field is an iterative process, and depending on the complexity of the system, it could require extensive optimization. Several research groups are focusing on deriving functional forms and parameters for potential energy functions, which are generally applicable to biological molecules. Among the most commonly used potential energy functions are the CHARMM (http://www.scripps.edu/brooks/charmm_docs/charmm.html), GROMOS (http://www.gromacs.org), AMBER (http://www.amber.ucsf.edu/amber/amber.html), and OPLS/AMBER force fields. These force fields are continuously being improved to be suitable for applications in both fundamental and applied research. Complete potential functions are now available for macromolecular simulations involving nucleic acids (54), proteins (55), lipids (56), and carbohydrates (57). These force fields are functions of the atomic positions, $\vec{r}$, which are usually expressed in terms of Cartesian coordinates. The total potential energy is then computed as a sum of intramolecular (or bonded) energies, $U_{\text{bonded}}$, and intermolecular (or nonbonded) energies, $U_{\text{nonbonded}}$. As schematically shown in Fig. 3, $U_{\text{nonbonded}}$ accounts for interactions between nonbonded atoms (i.e., between molecules) or atoms separated by three or more covalent bonds in the same molecule. The intramolecular term describes the bond stretching, valence angle bending, and bond rotations (torsion) in a molecule.

Figure 3 (A) Illustration of the types of interactions encountered in macromolecules (panel labels: bond stretch, valence angle bend, torsional angle, intramolecular nonbonded, intermolecular interactions). (B) Interactions included in the potential energy function for molecular dynamics simulations.

$$U_{\text{bonded}} = U_{\text{bonded-stretch}} + U_{\text{angle-bend}} + U_{\text{torsion}} + U_{\text{improper}} \tag{5}$$

$U_{\text{bonded-stretch}}$ in Eq. (5) is a harmonic potential describing the covalent bond between atom pairs, i.e., 1,2 pairs.

$$U_{\text{bonded-stretch}} = \frac{1}{2} \sum_{1,2\ \text{pairs}} K_b (r - r_0)^2 \tag{6}$$

This potential is an approximation of the energy of a bond as it is stretched from its equilibrium bond length, $r_0$ (Fig. 3B). The force constant, $K_b$, determines the strength of the bond and, like $r_0$, depends on the chemical type of the atoms connected. The equilibrium bond length and the force constant are usually inferred from high-resolution crystal structures and microwave spectroscopy data, respectively. $U_{\text{angle-bend}}$ in Eq. (5) is associated with an alteration of the bond angle $\theta$ from its equilibrium value $\theta_0$ (Fig. 3B). This function is also expressed as a harmonic potential.

$$U_{\text{angle-bend}} = \frac{1}{2} \sum_{i,j,k} K_\theta (\theta - \theta_0)^2 \tag{7}$$

Again, $\theta_0$ and the force constant $K_\theta$ depend on the chemical type of the atoms forming the angle. $U_{\text{bonded-stretch}}$ (Eq. (6)) and $U_{\text{angle-bend}}$ (Eq. (7)) describe the deviation from an ideal geometry. These potentials are effectively penalty functions, and the sum of the potentials should be close to zero in a perfectly optimized structure. The third term in Eq. (5) represents the torsion angle potential function, which takes into account the presence of steric barriers between atoms separated by three covalent bonds (i.e., 1,4 pairs). Different functional forms of the potential are employed in the literature, and one form frequently encountered is a periodic cosine function

$$U_{\text{torsion}} = \sum_{1,4\ \text{pairs}} K_\phi \left(1 - \cos(n\phi)\right) \tag{8}$$

The rotation around the middle bond is described by a dihedral angle, $\phi$ (Fig. 3B), and a coefficient of symmetry $n$ ($n$ = 1, 2, 3). The last term in Eq. (5) (the so-called improper dihedrals) also has the form of a harmonic potential and is used to maintain the chirality or planarity of chemical groups (e.g., sp2-hybridization in a carboxylate group). The potential is given by

$$U_{\text{improper}} = \frac{1}{2} \sum_{\text{improper}} K_\omega (\omega - \omega_0)^2 \tag{9}$$

where $\omega_0$ is the equilibrium angle as displayed in Fig. 3B. The force constants $K_b$, $K_\theta$, and $K_\phi$ are obtained from studies of small model compounds by using structural information (geometry) and vibrational spectra, usually monitored in the gas phase (IR and Raman spectroscopy), supplemented with ab initio quantum calculations. The nonbonded interactions consist of two components, the van der Waals (vdW) and electrostatic (elec) interaction energies (see also Fig. 3B).

$$U_{\text{nonbonded}} = U_{\text{vdW}} + U_{\text{elec}} \tag{10}$$
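The bonded terms of Eqs. (5)–(9) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the tuple layout of the input lists, and any numerical parameters passed in are hypothetical and are not taken from CHARMM, GROMOS, AMBER, or any other force field.

```python
import math


def bond_stretch_energy(r, r0, k_b):
    """Harmonic bond term of Eq. (6) for a single 1,2 pair."""
    return 0.5 * k_b * (r - r0) ** 2


def angle_bend_energy(theta, theta0, k_theta):
    """Harmonic angle term of Eq. (7) for a single i,j,k triple (radians)."""
    return 0.5 * k_theta * (theta - theta0) ** 2


def torsion_energy(phi, k_phi, n):
    """Periodic cosine term of Eq. (8) for a single 1,4 pair (radians)."""
    return k_phi * (1.0 - math.cos(n * phi))


def improper_energy(omega, omega0, k_omega):
    """Harmonic improper-dihedral term of Eq. (9) (radians)."""
    return 0.5 * k_omega * (omega - omega0) ** 2


def bonded_energy(bonds, angles, torsions, impropers):
    """Sum of Eq. (5). Each argument is a list of tuples whose entries match
    the positional arguments of the corresponding function above
    (a hypothetical layout chosen for this sketch)."""
    total = sum(bond_stretch_energy(*b) for b in bonds)
    total += sum(angle_bend_energy(*a) for a in angles)
    total += sum(torsion_energy(*t) for t in torsions)
    total += sum(improper_energy(*i) for i in impropers)
    return total
```

Because the harmonic terms act as penalty functions, a structure whose internal coordinates all sit at their reference values returns zero from `bonded_energy`, in line with the remark that the sum should be close to zero in a perfectly optimized structure.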

The functional form of Eq. (10) does not include an explicit hydrogen bond term; hydrogen bonds are frequently accounted for through an appropriate parameterization of the van der Waals and Coulomb interactions. The van der Waals interactions are described by a Lennard-Jones (LJ) 6–12 potential that includes (i) repulsive forces arising at short distances where the electron–electron interaction is strong and (ii) attractive forces (so-called dispersion forces) originating from fluctuations in the charge distribution in the electron clouds. The LJ potential given by Eq. (11) results in a minimum in the energy, where atom pairs are located at the optimal distances ($r_{ij}^{\min}$) stabilizing the structure. The minimum energy ($\varepsilon_{ij}$) and the optimal separation of atoms (approximately equal to the sum of the van der Waals radii of the atoms) depend on the chemical type of these atoms.

$$U_{LJ} = \sum_{i<j} \varepsilon_{ij} \left[ \left( \frac{r_{ij}^{\min}}{r_{ij}} \right)^{12} - 2 \left( \frac{r_{ij}^{\min}}{r_{ij}} \right)^{6} \right] \tag{11}$$

The electrostatic interaction between a pair of atoms is represented by the Coulomb potential.

$$U_{\text{elec}} = \sum_{\text{nonbonded pairs}} \frac{q_i q_j}{4\pi \varepsilon_r \varepsilon_0 r_{ij}} \tag{12}$$

$\varepsilon_r$ is the effective dielectric constant for the medium, and $r_{ij}$ is the distance between two atoms $i$ and $j$ having charges $q_i$ and $q_j$, respectively. The empirical potential functions have some limitations, and one such limitation originates from the fixed set of atom types employed when determining the parameters for the force field. Atom types are defined to describe, for instance, a particular bonding situation. For example, an aliphatic carbon atom in an sp3 bond has different properties than a carbon atom found in the His ring. Instead of representing each atom by a unique set of parameters, there is a certain amount of grouping to minimize the number of atom types. This could result in type-specific errors. The properties


of certain atoms (e.g., aliphatic carbon or hydrogen atoms) are less sensitive to their surroundings and a single set of parameters may be applicable, while other atoms such as oxygens and nitrogens are much more influenced by their neighboring atoms. These atoms require specific parameters to account for the different bonding environments. An approximation introduced to decrease the computational burden is the pair-wise additive approximation. The interaction energy between one atom and the remaining part of the system is calculated as a sum of pair-wise interactions. Hence, the simultaneous interaction between three or more atoms is not calculated and, consequently, any quantity that depends on multiple interactions (e.g., polarization effects) is poorly described, giving rise to subtle differences between calculated and experimental results. Finally, it is noteworthy that the potential energy function does not include entropic effects. Thus a minimum in the sum of the different potential energy terms might not correspond to the equilibrium configuration (i.e., to the minimum of the free energy). However, this limitation applies only to simple ("static") energy calculations, whereas in molecular dynamics simulations entropic effects are implicitly included.
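As an illustration of the pair-wise additive approximation, the nonbonded energy of Eqs. (10)–(12) can be sketched in Python as a double loop over atom pairs. This is a simplified sketch, not a force-field implementation: it assumes a single atom type (one `eps` and one `r_min` for all pairs), no exclusion of bonded neighbors, no cutoff, and a unit system of nm, kJ/mol, and elementary charges. All names are hypothetical.

```python
import math
from itertools import combinations

# 1/(4*pi*eps0) in kJ mol^-1 nm e^-2; this unit system is an assumption of the sketch.
COULOMB_PREFACTOR = 138.935458


def lj_energy(r, eps, r_min):
    """Lennard-Jones 6-12 pair energy in the form of Eq. (11)."""
    ratio6 = (r_min / r) ** 6
    return eps * (ratio6 * ratio6 - 2.0 * ratio6)


def coulomb_energy(r, q_i, q_j, eps_r=1.0):
    """Coulomb pair energy of Eq. (12) with effective dielectric constant eps_r."""
    return COULOMB_PREFACTOR * q_i * q_j / (eps_r * r)


def nonbonded_energy(positions, charges, eps, r_min, eps_r=1.0):
    """Pair-wise additive sum of Eq. (10) over all distinct atom pairs i < j."""
    total = 0.0
    for i, j in combinations(range(len(positions)), 2):
        r = math.dist(positions[i], positions[j])
        total += lj_energy(r, eps, r_min)
        total += coulomb_energy(r, charges[i], charges[j], eps_r)
    return total
```

With this functional form the LJ pair energy equals `-eps` exactly at `r = r_min`, which is the minimum described in the text. The double loop scales as the square of the number of atoms, which is the motivation for the truncation schemes discussed in the next section.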

3.5 Short-Range and Long-Range Forces

3.5.1 Truncating the Potential

Truncating the interactions introduces a discontinuity in the potential function itself and its derivatives. This corresponds to an infinite (impulsive) force acting between atoms that cross the discontinuity. Molecular dynamics simulations using soft potentials and carried out under these conditions will not, strictly speaking, conserve energy. However, the extent of the effect depends on the functional form of the potential and the chosen cutoff. To deal with this problem different methodologies have been implemented that shift either the potential or the force at the truncation point. Using the former methodology, the potential is shifted by the amount of the discontinuity, thereby bringing the energy to zero at exactly the truncation point.

$$U_S(r) = \begin{cases} U(r) - U(r_c), & r \le r_c \\ 0, & r > r_c \end{cases} \tag{13}$$

$r_c$ is the truncation (cutoff) distance. The disadvantage is that the entire potential is shifted by the same amount, $U(r_c)$, and that the forces are still discontinuous at the truncation point, making the force a Heaviside function at this point. To remedy this, a switching function (a third-order spline) can be used to smoothly switch the potential to zero. The potential is multiplied by the switching function, which has the form

$$S(r; r_{on}, r_{off}) = \begin{cases} 1, & r \le r_{on} \\ \dfrac{(r_{off} - r)^2 (r_{off} + 2r - 3r_{on})}{(r_{off} - r_{on})^3}, & r_{on} \le r \le r_{off} \\ 0, & r > r_{off} \end{cases} \tag{14}$$
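The shifted potential of Eq. (13) and the switching function of Eq. (14) can be sketched in Python as follows. The Lennard-Jones helper `lj` with reduced units (`eps = r_min = 1`) is only a hypothetical test potential for this sketch; any pair potential passed as the callable `u` would work in the same way.

```python
def lj(r, eps=1.0, r_min=1.0):
    """Lennard-Jones pair energy in reduced units (illustrative test potential)."""
    ratio6 = (r_min / r) ** 6
    return eps * (ratio6 * ratio6 - 2.0 * ratio6)


def shifted_potential(u, r, r_c):
    """Eq. (13): subtract U(r_c) inside the cutoff, zero outside."""
    if r > r_c:
        return 0.0
    return u(r) - u(r_c)


def switching_function(r, r_on, r_off):
    """Eq. (14): 1 below r_on, 0 above r_off, cubic spline in between."""
    if r <= r_on:
        return 1.0
    if r > r_off:
        return 0.0
    return (r_off - r) ** 2 * (r_off + 2.0 * r - 3.0 * r_on) / (r_off - r_on) ** 3


def switched_potential(u, r, r_on, r_off):
    """Potential multiplied by the switching function."""
    return u(r) * switching_function(r, r_on, r_off)
```

The spline equals 1 at `r_on`, 0 at `r_off`, and 1/2 at the midpoint of the interval, so the switched potential joins the untruncated potential continuously at `r_on` and vanishes at `r_off`.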

Between the distances $r_{on}$ and $r_{off}$, the potential smoothly approaches zero. The subscripts on (start) and off (end) refer to the interval used for smoothing the potential. Another approach is to shift the force to zero by introducing an additional term which is linear in distance; thus

$$U_{sf}(r) = \begin{cases} U(r) - U(r_c) - \left( \dfrac{dU}{dr} \right)_{r=r_c} (r - r_c), & r \le r_c \\ 0, & r > r_c \end{cases} \tag{15}$$

The shifted-force potential represents a larger perturbation on the overall potential. Depending on the density, the interactions neglected beyond a cutoff of about 2.5 Lennard-Jones diameters make up about 5–10% of the total energy and pressure. Sometimes these corrections can be added to the simulation averages upon completion of the simulation ("tail corrections"), but at other times it is important to include the corrections during the course of the simulation. For example, the long-range correction to the energy depends on the density, and therefore the tail correction must be included during the simulations when, for instance, these are carried out in the NPT ensemble [i.e., simulations at constant number of particles (N), constant pressure (P), and constant temperature (T)].

3.5.2 Force Calculations

The equations of motion in the Hamiltonian formulation of mechanics can be written as a set of first-order differential equations (10,16)

$$\frac{d\vec{r}_i}{dt} = \frac{\vec{p}_i}{m_i} \tag{16}$$

$$\frac{d\vec{p}_i}{dt} = \vec{F}_i \tag{17}$$

The forces needed to integrate Eqs. (16) and (17) are derived from the potential model and are the negative gradient of the potential energy. Because all the empirical potential energy functions mentioned in Section 3.4 are analytical, it is straightforward to derive an expression for the force. In the simplest case, the potential is pair-wise additive and spherically symmetric. For this situation, the force between two atoms, $i$ and $j$, acts along the separation vector $\vec{r}_{ij}$, where the magnitude of the vector is given by $|\vec{r}_{ij}| = \sqrt{x_{ij}^2 + y_{ij}^2 + z_{ij}^2}$. The force that atom $j$ exerts on atom $i$ is simply $\vec{F}_{ji} = -\nabla_{\vec{r}_i} U(r_{ij})$, and according to Newton's third law, the force that atom $i$ exerts on atom $j$ is $\vec{F}_{ij} = -\vec{F}_{ji}$.

3.5.3 Coulombic Interactions

There are numerous examples in nature indicating that the biological function of some proteins is driven by electrostatic interactions. Therefore the treatment of the electrostatic contributions becomes an important issue in simulating charged systems (58–63). An insufficient treatment of these interactions, which might introduce artificial (undesired) errors when using the truncation method, has been reported in the literature and has been frequently (re)examined. The problem in calculating the electrostatic forces lies in their nature. These forces are long-range and have to be treated in a more sophisticated way than is done for van der Waals interactions. The Coulomb potential vanishes as $1/r$, which is a much slower decay than the $1/r^6$ dispersion interaction characterized by the Lennard-Jones model potential (Eq. (11)). The task of developing a correct and efficient treatment of these long-range electrostatic interactions in macromolecular simulations has drawn significant attention, as reflected throughout the literature. Generally, two approaches are applied: continuum models (i.e., an implicit representation of the solvent) and models using explicit representations of the solvent in periodic boundary conditions. It is the latter situation where the treatment of electrostatic interactions requires some consideration. There are two widely used techniques. The first one is the Ewald summation, which is based on the full lattice sum, using special mathematical techniques to enhance the convergence. Alternatively, the reaction field technique can be used. This method uses full three-dimensional periodic boundary conditions and treats interactions with the nearby solvent molecules explicitly, whereas interactions with the distant solvent molecules are handled through a continuum description. The Ewald summation was developed in 1921 to compute the electrostatic contribution in a system with periodic boundary conditions (pbc).
In such a system, the central simulation cell is surrounded by its images in all directions infinitely. The total electrostatic energy of a system of $N$ particles in a cubic box of size $L$ with pbc is represented by

$$U_{\text{elec}} = \frac{1}{2} {\sum_{\vec{n}}}' \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{q_i q_j}{r_{ij,\vec{n}}} \tag{18}$$

where $\vec{n} = (n_1 L_x, n_2 L_y, n_3 L_z)$ is the cell basis vector and $N$ is the number of atoms. The prime symbol at the first summation indicates the exclusion of all $i = j$ interactions inside the central simulation cell. This sum is conditionally convergent, as the terms decay as $1/n$ (i.e., the result depends on the order of summation). The Ewald technique separates the electrostatic interactions into two parts: a short-range term handled in direct (real) space and a long-range, smoothly varying contribution handled approximately in reciprocal space using Fourier transforms. This splitting changes the potential energy from a slowly and conditionally convergent series to the sum of two rapidly converging series calculated in direct and reciprocal space and a constant term, $U_{\text{elec}} = U_{\text{dir}} + U_{\text{rec}} + U_{\text{self}}$. The last term, $U_{\text{self}}$, cancels the self-energy term introduced in the calculation of $U_{\text{rec}}$. Because the derivation of these potential functions is lengthy and beyond the scope of this chapter, only the final equations are given:

$$U_{\text{dir}} = \frac{1}{2} {\sum_{\vec{n}}}' \sum_{i,j=1}^{N} \frac{q_i q_j \, \mathrm{erfc}\left( \alpha \left| \vec{r}_j - \vec{r}_i + \vec{n} \right| \right)}{\left| \vec{r}_j - \vec{r}_i + \vec{n} \right|} \tag{19}$$

$$U_{\text{rec}} = \frac{1}{2\pi V} \sum_{\vec{m} \neq 0} \frac{\exp(-\pi^2 m^2 / \alpha^2)}{m^2} \left| S(\vec{m}) \right|^2 \tag{20}$$

$$U_{\text{self}} = -\frac{\alpha}{\sqrt{\pi}} \sum_{i=1}^{N} q_i^2 \tag{21}$$

where $V$ is the volume of the simulation cell with sides of length $L_x$, $L_y$, and $L_z$, $\vec{m}$ is a reciprocal-space vector, erfc is the complementary error function, and $S(\vec{m})$ is the structure factor defined as $\sum_{j=1}^{N} q_j \exp\left( 2\pi i \left[ m_1 x_j / L_x + m_2 y_j / L_y + m_3 z_j / L_z \right] \right)$. The transformation treats each point charge in the system as if it were surrounded by a Gaussian distribution (i.e., normal distribution) of charge of equal magnitude but opposite sign, producing an exponentially decaying function. The Gaussian charge distribution essentially screens the interactions between neighboring point charges, and the interactions become short-range. As a result, the sum over all charges, including their images, converges rapidly in direct space. To counteract the Gaussian distribution introduced "artificially", another Gaussian distribution of the same sign and magnitude of charge is added for each point charge. This sum is performed in reciprocal space using Fourier transforms and transformed back to direct space. The $\alpha$ parameter occurring in Eqs. (19)–(21) is inversely related to the width of the Gaussian distribution. $\alpha$ affects the accuracy and determines the relative rates of convergence of the direct and reciprocal sums, and it can be chosen to optimize computational time. When $\alpha$ is large the direct sum converges faster than the reciprocal sum. Similarly, when $\alpha$ is small the reciprocal sum converges faster than the direct sum. This follows from Eqs. (19) and (20): in the direct sum $\alpha$ multiplies the distance in the argument of erfc, whereas in the reciprocal sum $\alpha$ occurs in the denominator of the exponent. It has been shown that $\alpha$ should vary such that $\alpha = c \sqrt{\pi} \left( N / V^2 \right)^{1/6}$, where the constant $c$ determines the ratio of execution time of the real and the reciprocal term, which may vary from one platform to another. The implementation of the Ewald sum has been further developed to supplement the conventional and also the refined truncated list methods (64). Efficient computational schemes such as the Particle Mesh Ewald (PME) method (65,66), the somewhat related Particle–Particle Particle–Mesh (P3M) method (67), and the Fast Multipole Algorithm (or Method) (FMA/FMM), originally formulated nonperiodically (68,69) and subsequently generalized to periodic systems (70,71), have been used, refined, and rigorously compared in the literature (72–76). The PME method is currently the preferred method besides the conventional truncated list methods.

3.6 Algorithm for Solving Newton's Equations of Motion

All common algorithms used to numerically solve the equations of motion are based on a Taylor series expansion of the coordinates, velocities, or higher order time derivatives of the coordinates. An overview of the most commonly used algorithms is provided in Table 3A. The simplest method is an Euler formulation, but this algorithm does not produce a stable trajectory, and an NVE [i.e., constant number of particles (N), constant volume (V), and constant energy (E)] simulation suffers from severe energy drift. This is because the positions and velocities are advanced without considering information from the previous time step. This shortcoming is solved by the more advanced algorithms such as the Gear predictor–corrector algorithms, the Verlet algorithm, or the Verlet-like algorithms: Leapfrog and Velocity–Verlet (see Table 3B). The Verlet and the Verlet-like algorithms are based on a Taylor series expansion of only the coordinates and velocities, whereas the predictor–corrector algorithms use a number of time derivatives of the coordinates to advance each derivative forward in time. The predictor–corrector algorithms frequently applied in the literature differ in the number of time derivatives used in the numerical solution of the equations of motion. This number is referred to as the "value" of the algorithm. For instance, the predictor–corrector algorithm of "value 5" includes terms up to the fourth-order time derivative. As the name indicates, the predictor–corrector algorithm consists of a predictor step and a corrector step. A Taylor series expansion of the different time derivatives is used to advance each derivative forward in time from $t$ to $(t + \delta t)$, producing a series of predicted coordinates and their time derivatives. The predicted coordinates

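Table 3A Algorithm of the Different Integration Schemes

| Algorithm | Equations | Taylor series expansion |
|---|---|---|
| Euler | $\vec{r}_i(t+\delta t) = \vec{r}_i(t) + \dot{\vec{r}}_i(t)\,\delta t + \tfrac{1}{2}\ddot{\vec{r}}_i(t)\,\delta t^2$; $\dot{\vec{r}}_i(t+\delta t) = \dot{\vec{r}}_i(t) + \ddot{\vec{r}}_i(t)\,\delta t$ | $\vec{r}_i(t)$ and $\dot{\vec{r}}_i(t)$ to second and first order, respectively |
| Predictor-corrector | Predictor: $\vec{r}_i^{\,p}(t+\delta t) = \vec{r}_i(t) + \dot{\vec{r}}_i(t)\,\delta t + \tfrac{1}{2}\ddot{\vec{r}}_i(t)\,\delta t^2 + \tfrac{1}{6}\dddot{\vec{r}}_i(t)\,\delta t^3 + \ldots$; $\dot{\vec{r}}_i^{\,p}(t+\delta t) = \dot{\vec{r}}_i(t) + \ddot{\vec{r}}_i(t)\,\delta t + \tfrac{1}{2}\dddot{\vec{r}}_i(t)\,\delta t^2 + \ldots$; $\ddot{\vec{r}}_i^{\,p}(t+\delta t) = \ddot{\vec{r}}_i(t) + \dddot{\vec{r}}_i(t)\,\delta t + \ldots$; $\dddot{\vec{r}}_i^{\,p}(t+\delta t) = \dddot{\vec{r}}_i(t) + \ldots$. Corrector: $\Delta\ddot{\vec{r}}_i(t+\delta t) = \ddot{\vec{r}}_i^{\,c}(t+\delta t) - \ddot{\vec{r}}_i^{\,p}(t+\delta t)$; $\vec{r}_i^{\,c}(t+\delta t) = \vec{r}_i^{\,p}(t+\delta t) + c_0\,\Delta\ddot{\vec{r}}_i(t+\delta t)$; $\dot{\vec{r}}_i^{\,c}(t+\delta t) = \dot{\vec{r}}_i^{\,p}(t+\delta t) + c_1\,\Delta\ddot{\vec{r}}_i(t+\delta t)$; $\ddot{\vec{r}}_i^{\,c}(t+\delta t) = \ddot{\vec{r}}_i^{\,p}(t+\delta t) + c_2\,\Delta\ddot{\vec{r}}_i(t+\delta t)$; $\dddot{\vec{r}}_i^{\,c}(t+\delta t) = \dddot{\vec{r}}_i^{\,p}(t+\delta t) + c_3\,\Delta\ddot{\vec{r}}_i(t+\delta t)$ | $\vec{r}_i(t)$, $\dot{\vec{r}}_i(t)$, $\ddot{\vec{r}}_i(t)$, and higher order time derivatives. Comments: superscript "p", predicted value; superscript "c", corrected value; $c_0$, $c_1$, $c_2$, and $c_3$ are the Gear coefficients (10) |
| Verlet | $\vec{r}_i(t+\delta t) = 2\vec{r}_i(t) - \vec{r}_i(t-\delta t) + \ddot{\vec{r}}_i(t)\,\delta t^2$; $\dot{\vec{r}}_i(t) = \dfrac{1}{2\delta t}\left[\vec{r}_i(t+\delta t) - \vec{r}_i(t-\delta t)\right]$ | $\vec{r}_i(t)$ forward to $\vec{r}_i(t+\delta t)$ and backward to $\vec{r}_i(t-\delta t)$, both to fourth order |
| Leapfrog | $\vec{r}_i(t+\delta t) = \vec{r}_i(t) + \dot{\vec{r}}_i(t+\delta t/2)\,\delta t$; $\dot{\vec{r}}_i(t+\delta t/2) = \dot{\vec{r}}_i(t-\delta t/2) + \ddot{\vec{r}}_i(t)\,\delta t$; $\dot{\vec{r}}_i(t) = \tfrac{1}{2}\left[\dot{\vec{r}}_i(t+\delta t/2) + \dot{\vec{r}}_i(t-\delta t/2)\right]$ | $\vec{r}_i(t)$ forward to $\vec{r}_i(t+\delta t)$ (to third order); $\dot{\vec{r}}_i(t)$ forward to $\dot{\vec{r}}_i(t+\delta t/2)$ and backward to $\dot{\vec{r}}_i(t-\delta t/2)$ (to third order) |
| Velocity-Verlet | $\vec{r}_i(t+\delta t) = \vec{r}_i(t) + \dot{\vec{r}}_i(t)\,\delta t + \tfrac{1}{2}\ddot{\vec{r}}_i(t)\,\delta t^2$; $\dot{\vec{r}}_i(t+\delta t/2) = \dot{\vec{r}}_i(t) + \tfrac{1}{2}\ddot{\vec{r}}_i(t)\,\delta t$; $\dot{\vec{r}}_i(t+\delta t) = \dot{\vec{r}}_i(t+\delta t/2) + \tfrac{1}{2}\ddot{\vec{r}}_i(t+\delta t)\,\delta t$ | $\vec{r}_i(t)$ forward to $\vec{r}_i(t+\delta t)$ (to third order); $\dot{\vec{r}}_i$ around $t$ using $\dot{\vec{r}}_i(t+\delta t)$ and moving backward by $\delta t$, as well as $\dot{\vec{r}}_i$ around $(t+\delta t)$ using $\dot{\vec{r}}_i(t)$ and moving forward by $\delta t$ (to third order) |

Source: Refs. 10 and 16.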

Table 3B Advantages and Disadvantages of the Different Integration Schemes

| Algorithm | Properties |
|---|---|
| Euler | Simplest integration scheme. Not time-reversible. Advancing coordinates and velocities without information from the previous time step. Severe energy drift in an NVE simulation. |
| Predictor-corrector | Not time-reversible. Requires higher order time derivatives. Storage requirements due to the number of time derivatives. |
| Verlet | Time-reversible. Numerical imprecision due to the addition of relatively large and small numbers. Coordinates and velocities are determined to different orders; $\vec{r} \sim O(\delta t^4)$, $\delta\vec{r}/\delta t \sim O(\delta t^2)$. Algorithm does not involve the velocities; temperature regulation by scaling is not possible. |
| Leapfrog | Coordinates and velocities are determined to the same order; $\vec{r} \sim O(\delta t^3)$, $\delta\vec{r}/\delta t \sim O(\delta t^3)$. Coordinates and velocities are determined at different times. |
| Velocity-Verlet | Coordinates and velocities are determined to the same order; $\vec{r} \sim O(\delta t^3)$, $\delta\vec{r}/\delta t \sim O(\delta t^3)$. Coordinates and velocities are determined at the same time. |

are then used to determine the corrected velocities or accelerations (depending on the order of the equations of motion). Because the equations of motion were not involved in the predictor step, which was simply a Taylor series expansion, the corrected velocities or accelerations based on the equations of motion generally differ from the predicted values. The difference between the corrected and the predicted velocities or accelerations is used with a set of Gear coefficients to correct all of the predicted time derivatives. The Gear coefficients are chosen to optimize the stability and accuracy of the trajectories. The coefficients depend on the order of the differential equation and the number of derivatives used in the Taylor series expansion. The predictor–corrector algorithms usually store a significant number of time derivatives of the coordinates at a particular time step. The alternative Verlet and Verlet-like algorithms require much less storage, as they use information from the current and previous time steps to advance


the coordinates in time. An overview of the advantages and disadvantages of the different algorithms is given in Table 3B. The derivation of the governing equations for these integration schemes is straightforward, and a summary of the final equations is provided in Table 3A. An excellent discussion of different algorithms can be found in the literature (10,16).

3.7 Implementation of the Algorithms

3.7.1 Predictor–Corrector Algorithm

It is straightforward to implement the predictor–corrector algorithm for any of the ensembles considered. For each variable, i.e., particle coordinate, heat bath coordinate, or volume coordinate, predictor equations are formulated. The corrector coefficients (Gear coefficients) are chosen according to the order of the equations of motion and according to the chosen "value" of the predictor–corrector algorithm, i.e., the number of terms in the predictor equations (10). In a simulation of an NVE ensemble, where the accelerations depend only on the position coordinates and not on the velocities, one may use the Verlet equation directly to integrate the equations of motion (40). However, for an NVT [i.e., constant number of particles (N), constant volume (V), and constant temperature (T)] or an NPT [i.e., constant number of particles (N), constant pressure (P), and constant temperature (T)] simulation, one cannot use the Verlet algorithm because the accelerations also depend on the velocities at time $t$, which do not enter in the Verlet scheme. Instead, one has to use another Verlet-like algorithm such as the leapfrog algorithm, where the velocities are part of the integration scheme.

3.7.2 Leapfrog Algorithm

The leapfrog algorithm can be used directly when the accelerations depend only on the particle position coordinates, as in the simulation of the NVE ensemble. In situations where the accelerations also depend on the velocities at time $t$, the calculations are not straightforward because the velocities at half-times [i.e., $\dot{\vec{r}}_i(t+\delta t/2)$ and $\dot{\vec{r}}_i(t-\delta t/2)$] are involved directly in the algorithm. The problem is that the velocity $\dot{\vec{r}}_i(t+\delta t/2)$ is unknown and is needed to determine $\dot{\vec{r}}_i(t)$. Hence one may have to "customize" the equations to the particular situation encountered. Two situations may arise: the accelerations depend either linearly on the velocities or on the square of the velocities. In the former case, it is easy to solve the equation for the unknown velocity $\dot{\vec{r}}_i(t+\delta t/2)$, whereas in the latter case a quadratic equation in the unknown velocity has to be solved. Usually, this is unattractive, and as demonstrated below an iterative approach is used instead.
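For the simple NVE case, in which the accelerations depend only on the positions, the leapfrog update of Table 3A can be sketched in Python for a single one-dimensional particle. This is a minimal illustration under stated assumptions: the function name, the one-dimensional setup, and the way the first half-step velocity is bootstrapped from the initial velocity are choices made for this sketch, not part of the scheme described in the text.

```python
def leapfrog_nve(x0, v0, force, mass, dt, n_steps):
    """Integrate one particle in one dimension with the leapfrog scheme.

    Returns a list of (x(t), v(t)) pairs, where v(t) is the average of the
    two neighboring half-step velocities, as in Table 3A.
    """
    x = x0
    # Bootstrap v(t - dt/2) from v(t); an assumption needed to start the scheme.
    v_half = v0 - 0.5 * (force(x0) / mass) * dt
    trajectory = []
    for _ in range(n_steps):
        a = force(x) / mass                    # acceleration at time t
        v_next_half = v_half + a * dt          # v(t + dt/2)
        v_t = 0.5 * (v_half + v_next_half)     # on-step velocity v(t)
        trajectory.append((x, v_t))
        x = x + v_next_half * dt               # x(t + dt)
        v_half = v_next_half
    return trajectory


def harmonic_force(x, k=1.0):
    """Restoring force of a harmonic oscillator with force constant k."""
    return -k * x
```

As a usage example, `leapfrog_nve(1.0, 0.0, harmonic_force, 1.0, 0.01, 10000)` integrates a harmonic oscillator of unit mass and unit force constant. The total energy `0.5 * v**2 + 0.5 * x**2` evaluated along the returned trajectory stays close to its initial value of 0.5 rather than drifting, which is the behavior expected of a time-reversible scheme in an NVE simulation.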


NVT Ensemble. For a simulation of an NVT ensemble, the equations of motion for the particle coordinates $\vec{r}_i$ and the friction coefficient $\zeta$ are (77–80)

$$\ddot{\vec{r}}_i(t) = \frac{\vec{F}_i(t)}{m_i} - \zeta(t)\,\dot{\vec{r}}_i(t) \tag{22}$$

$$\dot{\zeta}(t) = \frac{1}{M_S} \left( \sum_{i=1}^{N} m_i \dot{\vec{r}}_i(t)^2 - 3Nk_BT \right) \tag{23}$$

where $k_B$ is the Boltzmann constant. The parameter $M_S$ is associated with the heat bath coordinate and regulates the changes in the friction coefficient $\zeta$. In the limit where $M_S$ approaches infinity, the conventional MD simulation that samples a microcanonical ensemble (NVE) is recovered. $M_S$ is proportional to the relaxation time, $\tau_s$, and is defined as $M_S = 3Nk_BT\tau_s$. A relatively large value of $\tau_s$ means that the variation of $\zeta(t)$ is relatively slow, corresponding to a slow heat flow between the system and the heat bath. The accelerations, $\ddot{\vec{r}}_i(t)$, depend linearly on the velocity, and the expression for the half-time velocities becomes:

$$\dot{\vec{r}}_i(t+\delta t/2) = \frac{\left( \vec{F}_i(t)/m_i \right) \delta t + \left[ 1 - \tfrac{1}{2}\zeta(t)\,\delta t \right] \dot{\vec{r}}_i(t-\delta t/2)}{\left[ 1 + \tfrac{1}{2}\zeta(t)\,\delta t \right]} \tag{24}$$

The position coordinates are advanced by the ordinary leapfrog algorithm (see Table 3A). The friction coefficient $\zeta(t)$ is expanded to third order in $\delta t$ like the coordinates, i.e.,

$$\zeta(t+\delta t) = \zeta(t) + \dot{\zeta}(t)\,\delta t + \frac{1}{2}\ddot{\zeta}(t)\,\delta t^2 + O(\delta t^3) \tag{25}$$

$\dot{\zeta}(t)$ is given by Eq. (23), and the acceleration of the friction variable, $\ddot{\zeta}(t)$, is found by differentiation of Eq. (23) with respect to time. $\zeta(t+\delta t)$ is then given by

$$\zeta(t+\delta t) = \zeta(t) + \frac{1}{M_S} \left( \sum_{i=1}^{N} m_i \dot{\vec{r}}_i(t)^2 - 3Nk_BT \right) \delta t + \frac{1}{M_S} \left( \sum_{i=1}^{N} m_i\, \dot{\vec{r}}_i(t) \cdot \ddot{\vec{r}}_i(t) \right) \delta t^2 \tag{26}$$

and $\dot{\vec r}_i(t)$ is calculated as the average of the half-time velocities at $(t+\delta t/2)$ and $(t-\delta t/2)$ (see Table 3A). In summary, the leapfrog algorithm for the simulation of an NVT ensemble has the following scheme: (i) Eq. (24) is used to advance the velocities, where the forces are determined from the potential energy function and the atomic positions at time t; (ii) the positions are advanced using the usual equations (Table 3A); (iii) the velocities at time t are calculated from the velocities at $(t+\delta t/2)$ and $(t-\delta t/2)$; and (iv) the results are finally used in Eq. (26) to advance the friction coefficient, $f$. In this scheme, the stored quantities are $\vec r_i(t)$, $\dot{\vec r}_i(t-\delta t/2)$, and $f(t)$.

NPT Ensemble. The equations of motion for this ensemble are (81)

$$\ddot{\vec r}_i(t) = \frac{1}{m_i}\left(\vec F_i(t) - f(t)\,\vec p_i(t)\right) + \left[\frac{\ddot V(t)}{3V(t)} - 2\left(\frac{\dot V(t)}{3V(t)}\right)^2\right]\vec r_i(t) \tag{27}$$

$$\ddot V(t) = \frac{1}{M_V}\left(P(t) - P_{ext}\right) - f(t)\,\dot V(t) \tag{28}$$

$$\dot f(t) = \frac{1}{M_S}\left[\sum_{i=1}^{N}\frac{\vec p_i(t)^2}{m_i} - (3N+1)k_BT\right] + \frac{M_V}{M_S}\,\dot V(t)^2 \tag{29}$$

with

$$\frac{\dot{\vec p}_i(t)}{m_i} = \frac{\vec F_i(t)}{m_i} - \left(f(t) + \frac{\dot V(t)}{3V(t)}\right)\frac{\vec p_i(t)}{m_i} \tag{30}$$

$$\frac{\vec p_i(t)}{m_i} = \dot{\vec r}_i(t) - \frac{\dot V(t)}{3V(t)}\,\vec r_i(t) \tag{31}$$

$$P(t) = \frac{1}{3V(t)}\sum_{i=1}^{N}\left[m_i\left(\frac{\vec p_i(t)}{m_i}\right)^2 + \vec r_i(t)\cdot\vec F_i(t)\right] \tag{32}$$
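As a consistency check, Eq. (27) can be recovered by differentiating Eq. (31) with respect to time and inserting Eq. (30). The short derivation below is a sketch under that reading of Eqs. (30) and (31), with $f$ denoting the friction coefficient; the time argument (t) is dropped for brevity.

```latex
\begin{aligned}
\dot{\vec r}_i &= \frac{\vec p_i}{m_i} + \frac{\dot V}{3V}\,\vec r_i
  && \text{(Eq. 31 rearranged)} \\
\ddot{\vec r}_i &= \frac{\dot{\vec p}_i}{m_i}
  + \left(\frac{\ddot V}{3V} - \frac{\dot V^2}{3V^2}\right)\vec r_i
  + \frac{\dot V}{3V}\,\dot{\vec r}_i \\
&= \frac{\vec F_i}{m_i} - \left(f + \frac{\dot V}{3V}\right)\frac{\vec p_i}{m_i}
  + \left(\frac{\ddot V}{3V} - \frac{\dot V^2}{3V^2}\right)\vec r_i
  + \frac{\dot V}{3V}\left(\frac{\vec p_i}{m_i} + \frac{\dot V}{3V}\,\vec r_i\right)
  && \text{(Eqs. 30, 31)} \\
&= \frac{1}{m_i}\left(\vec F_i - f\,\vec p_i\right)
  + \left[\frac{\ddot V}{3V} - 2\left(\frac{\dot V}{3V}\right)^2\right]\vec r_i
\end{aligned}
```

The terms proportional to $(\dot V/3V)\,\vec p_i/m_i$ cancel, and $-\dot V^2/(3V^2) + \dot V^2/(9V^2) = -2(\dot V/3V)^2$, which reproduces the bracket in Eq. (27).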

Two parameters are introduced in the equations of motion: $M_S$ and $M_V$ are associated with the heat bath and the barostat, respectively. $M_V$ is related to the relaxation time $\tau_V$ by $M_V = P_{ext}\tau_V^2/r_{ref}^3$. The variable $r_{ref}$ is a reference length, here arbitrarily chosen to be the atom size parameter of the Lennard–Jones potential. $P_{ext}$ is the external pressure. Both accelerations, $\ddot{\vec r}_i(t)$ and $\ddot V(t)$, depend on the squared velocities. Hence it will not be feasible to solve the coupled quadratic equations for the advanced velocities. Instead, an iterative approach has to be used. The friction coefficient $f(t+\delta t)$ is determined as for the NVT ensemble by a Taylor series expansion (Eq. (25)). $\dot f(t)$ is given by Eq. (29), and the acceleration $\ddot f(t)$ is determined by differentiation of $\dot f(t)$ (Eq. (29)) with respect to time:

$$\ddot f(t) = \frac{2}{M_S}\sum_{i=1}^{N} m_i\,\frac{\vec p_i(t)}{m_i}\cdot\frac{\dot{\vec p}_i(t)}{m_i} + \frac{2M_V}{M_S}\,\dot V(t)\,\ddot V(t) \tag{33}$$

where $\dot{\vec p}_i(t)/m_i$ and $\vec p_i(t)/m_i$ are given by Eqs. (30) and (31), respectively. The iterative determination of the advanced velocities requires an initial guess of the velocities $\dot{\vec r}_i(t)$ and $\dot V(t)$. A natural choice would be $\dot{\vec r}_i(t) = \dot{\vec r}_i(t-\delta t/2)$ and $\dot V(t) = \dot V(t-\delta t/2)$. Additionally, $\vec r_i(t)$, $V(t)$, $\dot{\vec r}_i(t-\delta t/2)$, $\dot V(t-\delta t/2)$, and $f(t)$ have to be stored. These are used to determine the accelerations $\ddot{\vec r}_i(t)$ and $\ddot V(t)$, which provide an avenue for determining the advanced velocities $\dot{\vec r}_i(t+\delta t/2)$ and $\dot V(t+\delta t/2)$, and subsequently the new values for the velocities $\dot{\vec r}_i(t)$ and $\dot V(t)$ using the half-time velocities (Table 3A). These enter a new calculation of the accelerations and the advanced velocities. This process is continued until convergence, and finally the coordinates are advanced before continuing on to the next time step. Hence both the particle velocities and the volume velocity have to be determined by iteration because the variables are coupled, preventing an analytical solution.

4 CASE STUDIES

There are many aspects of medicinal chemistry and biophysics where computer analysis and simulation can augment or explain experimental observations. Computers and sophisticated software packages have become an integral part of modern science. Hence there exists a vast amount of theoretical and computational contributions in the literature focusing on the understanding of enzyme structure and function. These studies cover a wide range of areas such as protein–lipid interactions (e.g., membrane-associated enzymes), ion transport through channels, solvent effects on enzyme function, substrate binding, catalysis (enzyme kinetics), and inhibitor design (molecular recognition). In the following we will not attempt to provide a comprehensive review of the literature; instead, we will discuss two projects performed in our research group, MEMPHYS (82), which highlight some possible applications of molecular dynamics simulations.

4.1 Lipase–Lipid Interactions: Implications for Hydrolysis

The research activities presented here include both computer simulations and experimental techniques (83). Depending on the question asked and on the complexity of the problem, simulations and experimental techniques are applied in a complementary fashion. Both approaches are aimed at elucidating important lipase–lipid interactions on an atomic level as well as at understanding how the lipid interface interferes with the performance of lipases.

4.1.1 Background

The essential role of lipases in many biological and industrial processes has stimulated interest in elucidating the molecular details determining the function of triglyceride lipases. Triglyceride lipases catalyze the hydrolysis of glycerides into free fatty acids and monoglycerides (84). Biologically, lipases are essential for transporting fats into cells for storage or for conversion into energy. Triglycerides in the diet, in lipoprotein particles, and in fat storage cells cannot be directly absorbed into cells. They must be hydrolyzed into free fatty acids and monoglycerides before they are able to cross cellular membranes. Commercially, lipases are used in various applications ranging from removal of oils or fats from fabrics to stereospecific synthesis of compounds including precursors for biologically active therapeutics, herbicides, or pesticides (85–88).

Lipases from different organisms vary greatly in size, with molecular masses ranging from 20–25 to 60–65 kDa. Although the amino acid sequences of lipases of different origin are diverse, they all share the characteristic α/β hydrolase fold (89). Three-dimensional crystal structures of several lipases covering the full range of sizes have provided the basis for understanding the activation process and catalytic mechanism (90,91). Several crystal structures revealed that the active site of lipase is covered by a helical loop (“lid”), and that the activation process involves the displacement of the active-site lid. It should be noted, however, that the lid is not a ubiquitous feature; some lipolytic enzymes have solvent-accessible active sites. In aqueous solution, the activity of lipases is very low, and only when the substrate concentration exceeds the critical micelle concentration is a sharp increase in enzyme activity observed. Hence the lipid interface triggers a conformational change in the enzyme (i.e., the displacement of the lid), by which the active site becomes accessible to the substrate. To illustrate this movement, the closed (“inactive”) and open (“active”) conformations of the Rhizomucor miehei lipase are shown in Fig. 4.

Figure 4 Secondary structures of the closed and open conformations of Rhizomucor miehei lipase are shown on the left. The top view shown on the right displays the three residues (His–Ser–Asp) forming the catalytic triad. As indicated, the active-site serine is covered by the lid in the closed conformation.

The displacement of the active-site lid results in the exposure of a hydrophobic site of the lid, thereby facilitating binding of the lipase to the interface (91). Hydrophobic interactions between residues of the lid and the lipid interface contribute to the stabilization of the open conformation. The observed conformational rearrangements correlate well with the phenomenon of interfacial activation. The mechanism for catalysis is analogous to that proposed for serine proteases (92,93), where the active-site region possesses a [Ser/His/acidic residue] triad (as shown in Fig. 4) and a neighboring oxyanion hole. The triad is involved in the catalysis, where the serine forms the nucleophilic center of the sequence G–X–S–X–G, and the His residue serves as a general acid/base. The oxyanion hole stabilizes the developing negative charge on the carbonyl oxygen of the ester group during turnover. The catalytic reaction is initiated by forming a Michaelis–Menten complex between the substrate and the enzyme. The reactive carbonyl carbon atom of the ester bond is then attacked by the oxygen of the serine side chain, leading to the formation of a tetrahedral intermediate (close to the transition state). During the formation of the intermediate, the hydrogen atom of the hydroxyl group of the serine is transferred to the histidine, thereby causing the histidine imidazole ring to become protonated (i.e., positively charged). The positive charge is stabilized by the negatively charged acidic residue in the triad. The tetrahedral intermediate, which subsequently breaks down to release alcohol and to form an acyl enzyme, is stabilized by two hydrogen bonds formed with backbone amide groups of residues belonging to the oxyanion hole. The protonated imidazole ring donates a proton to the leaving alcohol group. The acyl enzyme is then hydrolyzed by water or cleaved by a competing nucleophile; in this step one proton is transferred from the nucleophile through the imidazole group to the active-site serine residue (94).

4.1.2 Overview

As schematically shown in Fig. 5, the enzymatic reaction of lipases involves at least four processes: (1) binding to the lipid surface, (2) penetration into the lipid phase, (3) activation of the enzyme (i.e., displacement of the active-site lid), and (4) catalytic hydrolysis (including the formation of the transition state). Although the lipid interface is essential for efficient catalysis, the exact role of the lipid–water interface is not well understood. However, there is increasing evidence that the properties of the interfacial plane are important for lipase action (95,96), and it has been shown that lipases are sensitive to many external factors including surface pressure and lipid composition. In order to reveal the structural and interfacial properties of the lipid film at different surface pressures, we have studied the structure and phase transitions of 1,2-sn-dipalmitoylglycerol monolayers by applying x-ray

Figure 5 Schematic illustration of the enzymatic reaction of lipases that involves at least four processes: (1) binding to the lipid surface, (2) penetration into the lipid phase, (3) activation of the enzyme, and (4) catalytic hydrolysis.

diffraction (96,97), pressure–area (π–A) isotherms (98), and computer simulations (99,100). As displayed in Fig. 6, the interfacial quality (measured by the hydrophilicity of the interface) is dependent on the surface pressure of the film and hence may influence the activation and/or adsorption of the enzyme to the interface. Activation of the enzyme involves the displacement of the active-site lid to allow access of the substrate to the active site, which suggests that the displacement of the lid might be triggered by interactions between residues located in the lid and the head groups of the lipid interface. It is exactly this part of the activation pathway that, at the present time, is difficult to probe by experimental means but where computer simulations might provide essential information. To investigate possible activation pathways and to elucidate the effect of a hydrophobic environment (as it would be provided by a lipid surface) on the lid opening, we have applied molecular dynamics (MD) (101) and Brownian dynamics (BD) (102) techniques. Molecular dynamics simulations were performed to investigate the effect of a hydrophobic environment on the activation of lipases, whereas the BD technique was applied to study the dynamics of the activating loop. Our results, which agree well with experimental observations, suggest that the activation of lipases is enhanced in a hydrophobic environment. An example is shown in Fig. 7 for Rhizomucor miehei lipase. At a dielectric constant of 4 (corresponding approximately to a lipid

Figure 6 Calculated hydrophilicity of the lipid interface as a function of area per lipid molecule extracted from a molecular dynamics trajectory. The hydrophilicity is expressed as the diﬀerence between the accessible areas of hydrophilic and hydrophobic atoms at the lipid interface.

Figure 7 Total mean energy difference, $E_{\mathrm{active}} - E_{\mathrm{inactive}}$, as a function of dielectric constant. The energies shown are averages calculated from the energy differences obtained after opening of the active-site lid and subsequent closing of the lid, i.e., total mean energy difference $= [(E_{\mathrm{active}} - E_{\mathrm{inactive}})_{\mathrm{opening}} + (E_{\mathrm{active}} - E_{\mathrm{inactive}})_{\mathrm{closing}}]/2$.

environment), the energy gain is approximately 30 kcal/mol. Additionally, the BD simulations revealed that the active-site lid exhibits some gating motion, suggesting that the enzyme molecule may exist in a partially active form prior to the catalytic reaction, as also suggested by recent x-ray crystallographic studies (103). Our findings indicate that the conformational and interfacial properties of the lipid may have considerable influence on the enzyme's catalytic activity. Indeed, using fluorescence microscopy, surface potential, and activity measurements, we obtained a detailed understanding of the inhibitory effect of the additives, fatty acid and fatty alcohol, on the lipolytic activity of the bacterial Pseudomonas cepacia lipase, the yeast Candida rugosa lipase, and the fungal Rhizomucor miehei and Rhizopus delemar lipases (95). Measurements were performed for 1,2-didecanoylglycerol/eicosanoic acid and 1,2-didecanoylglycerol/1-octadecanol mixed monolayers. As shown in Fig. 8, small amounts of fatty acid have a significant inhibitory effect on the R. miehei lipase activity. These results indicate that lipase activity is strongly influenced by the lateral distribution of additives in the diglyceride matrix. Furthermore, the studies showed that the level of inhibition might be correlated to the isoelectric point (pI) of the enzymes (95). Initially, we concluded that repulsive charge–charge interactions between fatty acid moieties and

Figure 8 Lipase activities as a function of mole fraction of eicosanoic acid. Data are shown for R. miehei lipase (Rml) (1 unit). Subphase is 10 mM TRIS buffer with 0.1 mM EDTA; pH = 8. Error bars are calculated from at least three independent experiments.

charged residues at the enzyme surface are responsible for this inhibition. However, as shown in Fig. 9, Humicola lanuginosa lipase shows relatively high binding affinity to acidic phosphatidylglycerol liposomes. It should be noted that a direct comparison of the results is difficult to perform because the systems (monolayer vs. liposomes) and lipids (diglyceride/fatty acid vs. acidic phospholipids) are different (104,105).

With the increasing amount of information, it is also important to understand the physics and the chemistry that relate the structural fold of the protein and the structure of the binding site to the function and action of the enzyme. Substrate binding, enzymatic processes, and product release are often associated with conformational changes in the structure, and these structural changes require a certain flexibility of the protein (106–108). Several studies have used molecular dynamics simulations to study the flexibility of proteins and its relation to the biological function of the protein. In these studies, protein flexibility was extracted using principal component analysis (109–111), which allows the separation of the internal protein motion into relatively large collective motions and small thermal fluctuations (112–114). This technique provides an avenue for extracting functionally relevant motions in the protein and for understanding the physical nature of the protein energy landscape (115–118). Therefore an essential aspect of protein function is the dynamic response of the protein upon substrate binding and product release in the presence of a lipid patch. To gain further insight into the structure–function relation of lipases, we have performed

Figure 9 Quenching of the pyrene monomer ﬂuorescence as a function of lipid concentrations. Liposomes are composed of 97 mol% DMPG and 3 mol% PDA. Excitation and emission wavelengths are 290 and 340 nm, respectively.

molecular dynamics simulations of R. miehei lipase in complex with a substrate or a product molecule in the presence of a lipid patch (Fig. 10). These simulations indicate that the dynamic responses of the substrate or product molecules are dependent on the environment (119). Entry and departure of substrate molecules could be observed in the presence of the lipid patch, as shown in Fig. 10. Snapshots of the initial configuration (Fig. 10a) and of a configuration taken after 1000 psec (Fig. 10b) display the entry (solid circle) and departure (dashed circle) of a substrate molecule. The simulation with a product molecule reveals a different picture. Analysis of the hydrogen bond pattern between the product (fatty acid) and residues in the binding cleft along the trajectory revealed that two serine residues form stable hydrogen bonds with the product and hence might be involved in the mechanism of product inhibition (Fig. 11) (119).

Figure 10 Secondary structure of R. miehei lipase (Rml) complexed with a substrate molecule and in the presence of the lipid patch consisting of substrate molecules. The active-site lid is shown as a rod, whereas the atoms of the active-site serine are displayed as van der Waals spheres. The substrate molecules are displayed as sticks. Images (a) and (b) are snapshots of the simulation taken at the start and after 1 nsec. See text for more details.

Figure 11 Snapshots taken along the trajectory from the Rml-product-patch simulation showing the hydrogen bond pattern between the carboxylate group of the product molecule and residues in the binding pocket. Hydrogen bonds are shown as solid bonds. The snapshots are taken at (a) 0, (b) 600, and (c) 1940 psec.

Important questions remain regarding the exact orientation of lipase molecules at the interfacial plane and the mobility of the active-site lid. Using x-ray reflectivity measurements supplemented by molecular dynamics simulations, we have gained insight into the orientation of the lipase molecules as well as the conformation of the lipase molecules (i.e., closed or open conformation). Fig. 12 displays an example of the calculated electron density profiles (ρ(z)) across the enzyme, which were extracted from a molecular dynamics trajectory and calculated for different orientations. Clearly, the calculated profiles depend characteristically on the orientation of the enzyme. Hence this approach provides a route for determining the orientation of the lipase molecule at an air–water interface by fitting the calculated profiles to the corresponding experimental data recorded from synchrotron x-ray reflectivity measurements (120). The computational scheme developed would provide an avenue for determining the lipase orientation at the interface of different lipid monolayers (alkane, alcohols, diglyceride, etc.) and for elucidating the effect of different lipid headgroups on the orientation of lipases. We have considered an alkane/water interface to study the orientation of lipases on such a hydrophobic surface. Initial synchrotron x-ray reflectivity measurements of the alkane/water system suggested that the water molecules behave very differently when compared to bulk water (121). Therefore we first proceeded to elucidate the structure of the water at that interface, as this could have an important effect on the adsorption and orientation of lipase molecules. The measurements could not reveal the exact water structure, and we therefore performed molecular dynamics simulations on the alkane/water and alcohol/water systems. The simulations revealed that water molecules are oriented characteristically at these interfaces. Calculated electron density profiles are shown in Fig. 13, which compare well with the experimentally

Figure 12 Orientation-speciﬁc electron density proﬁles for selected orientations of the open (dashed) and closed (solid lines) conformation of Thermomyces lanuginosa lipase. The Euler angles of rotation are indicated in each panel. For example, (0,0,0) means that the active site lid is aligned with the surface normal nz. Snapshots of the closed and open conformers taken after 2 nsec of molecular dynamics simulations are shown to the right of the proﬁles and are in orientations referring to the particular electron density proﬁle. The active site lid is displayed as a solid tube.

determined profiles (122). There are settled differences in the interfacial water density when comparing the water electron density profile determined at the alkane monolayer with the profile obtained in the presence of the alcohol monolayer.

4.2 Regulation and Substrate Specificity of Protein Tyrosine Phosphatases

Protein tyrosine phosphatases (PTPs) are critical elements in the regulation of signal transduction pathways in living organisms, and their unregulated activities are related to diverse pathological events. Several kinds of diseases have been related to an increase in PTP transcription. For instance, the receptor-like phosphatase, PTPα, has been found to possess oncogenic activity. Another example is the cytosolic phosphatase, PTP1B, which might be involved in the development of diabetes. Consequently, phosphatases may

Figure 13 Normalized electron density proﬁles for n-alkane C36H74 and n-alcohol C35H71OH. The proﬁles are extracted from molecular dynamics simulations of crystalline monolayers of these amphiphilic molecules at a water surface.

represent a potential therapeutic target. To determine the biological function and to design inhibitory agents, it would be of enormous value to have a detailed structural understanding of the regulation of these enzymes and of how phosphatases distinguish the different substrates that they encounter in the cell. To elucidate the molecular mechanisms underlying substrate specificity and inhibitor selectivity, we have used a multidisciplinary approach, which combines theoretical approaches, computer simulations, and experimental techniques in a complementary fashion (123).

4.2.1 Background

The phosphorylation/dephosphorylation of tyrosine residues in proteins is one of several key molecular mechanisms by which living organisms regulate cell growth, proliferation, and differentiation (124). The phosphorylation state of proteins is remarkably dynamic, which enables cells to respond rapidly to discrete changes in environmental conditions (125). This dynamic behavior is governed by the opposing actions of protein-tyrosine kinases and protein-tyrosine phosphatases (Fig. 14), which are integrated within an elaborate signal-transducing network, an enzyme-based system that converts external environmental stimuli to internal cellular action. The defective or inappropriate operation of this network is at the root of a variety of diseases in humans and animals. Consequently, the characterization of the individual components and the delineation of the circuitry of this regulatory network have emerged as one of the most active fields in biological research. The critical roles played by these phosphatases in pathological events indicate that these signaling enzymes are suitable targets for pharmacological intervention and that this may be achievable in a selective manner. Clearly, a selective control of the biological function of PTPs is a challenging task, and an inhibitor must not only tenaciously bind to the specific target enzyme, but must do so without impeding the catalytic behavior of closely related enzymes. Many PTPs show strong substrate selectivity, and it is generally believed that the activity of PTPs is regulated by specific substrates containing the requisite structural recognition sites and/or by confining individual PTPs and protein-tyrosine kinases to specific cellular microenvironments.

Figure 14 Illustration of the interplay between protein phosphatases and protein kinases.

4.2.2 Overview

The PTPs are characterized by a common active-site sequence: the (H/V)CX$_5$R(S/T) motif (X denotes any amino acid residue). This conserved motif, which in the literature is referred to as the “P-loop,” plays an important role in the binding of the substrate and the subsequent catalytic reaction. It defines the binding site for the tyrosyl phosphate substrate and contains a nucleophilic cysteinyl residue as well as a conserved arginine residue separated by five residues. The backbone nitrogens of the P-loop and three of the arginine nitrogens coordinate the oxygens of the phosphate group. Binding of the substrate triggers the displacement of a loop (referred to as the “WPD-loop” in the literature) toward the phosphate moiety, which causes a tight binding of the tyrosyl phosphate group and brings a catalytically active aspartate (general acid/base) into position for the catalytic reaction (Fig. 15). The cysteine thiolate in the consensus sequence functions as a nucleophile and

Figure 15 Secondary structure of PTP1B complexed with the hexa-peptide DADEpYL-NH2 (pY stands for phosphorylated tyrosine) is displayed on the right. The peptide structure is shown on the left. The active-site Cys, Gln (coordinates a water molecule that participates in the hydrolysis), Asp (general acid/base), and substrate are shown in stick representation.

Figure 16 Illustration of the electrical field surrounding the negatively charged active-site cysteine.

attacks the phosphorus atom in the substrate, resulting in the formation of the thiophosphate enzyme intermediate and product release (dephosphorylated substrate). The intermediate is then broken down by the attack of a water molecule, yielding inorganic phosphate. The first PTP structure solved was PTP1B, which has initiated much interest in this field, as it has been suggested that PTP1B is a negative regulator of insulin signaling. The secondary structure of PTP1B complexed with the hexa-peptide, DADEpYL-NH2 (pY stands for phosphorylated tyrosine), is displayed in Fig. 15. The active-site Cys, the conserved Gln, the essential general acid/base (Asp), and the substrate are shown in stick representation. The active-site Cys (residue 215 in PTP1B), which facilitates the catalysis, is negatively charged at physiological pH (i.e., the pKa of the Cys residue in the protein is lower than the pKa of a free cysteine residue). It has

Figure 17 (A) Titration results for PTP1B. pKa values computed with modified backbone charges as a function of the pKa values in PTP1B. “Loop” indicates the pKa shift when the backbone charges of all residues in the loop are zeroed. “Helix” marks the pKa shift when the backbone charges of all amino acids in the central α-helix are zeroed. “Helix+loop” indicates that the backbone charges of all residues in the loop and central α-helix are zeroed. These values should be seen in relation to the pKa of 8.3 for a free cysteine amino acid residue. (B) Time evolution of the active-site cysteine's pKa in the presence or absence of the substrate. Solid line indicates the pKa value (8.3) of a free cysteine residue.

been suggested from the crystal structure that the hydrogens in the backbone of the P-loop stabilize the negative charge at the active-site Cys (see Fig. 16). To further study the origin of the low pKa value of the cysteine on an atomic level, we performed macroscopic electrostatic calculations using the so-called single-site titration method, which is based on the Poisson–Boltzmann methodology (126 and references therein). The methodology calculates the electrostatic field around the titratable residues, iteratively accounting for the influence of neighboring charges. From the electrostatic field the apparent pKa can be computed, which is shown as a function of the pKa values in PTP1B in Fig. 17A. The label “loop” indicates the pKa shift when the backbone charges of all residues in the loop are zeroed. “Helix” marks the pKa shift when the backbone charges of all amino acids in the central α-helix are zeroed. “Helix+loop” indicates that the backbone charges of all residues in the loop and in the central α-helix are zeroed. The analysis of the charges contributing to the pKa shift shows that the net influence of titratable charges is negligible. The major contribution stems from the electrostatic microdipoles created by the backbone charges of the consensus sequence (H/V)CX$_5$R(S/T). The peculiar loop structure of this motif shows that these microdipoles are directed toward the thiol sulfur atom of the active-site cysteine, thereby giving rise to a pKa shift. In a subsequent study, we could show that the pKa shift is independent of protein flexibility (Fig. 17B) and stems entirely from the architecture of the binding pocket (127).

One of the surprising observations is the preference of PTP1B for negatively charged peptide substrates. Experimentally, it has been observed that PTP1B has a high catalytic efficiency for the phosphotyrosine-containing peptide DADEpYL, whose sequence is derived from the autophosphorylation site of the epidermal growth factor receptor (EGFR988–998).
The high binding affinity is particularly surprising because the total charge of all titratable residues in PTP1B is −6. To determine the origin of the preference of PTP1B for negatively charged peptides, we performed macroscopic electrostatic calculations, again using the single-site titration method discussed above. These calculations reveal that there is a positive electrostatic field surrounding the active-site pocket, which may serve as a trap for negatively charged peptides (128) and as a possible diffusion path for substrates. In contrast to PTP1B, such a field is not observed for PTPα, suggesting that DADEpYL is a poor substrate for PTPα. Indeed, $k_{cat}/K_m$ for the hydrolysis of the hexa-peptide is four times lower than the value determined for PTP1B (Fig. 18). To further study the origin of the observed substrate specificities, we tested various peptides based on the phosphorylation sites of different receptors. In all cases, the different peptides were very efficiently hydrolyzed by PTP1B but were only poorly recognized by PTPα, suggesting that the inherent substrate specificity of these PTPs resides

Figure 18 Catalytic efficiency of the hydrolysis of the hexa-peptide Ac-DADEpYL-NH2 (pY stands for phosphorylated tyrosine) with PTP1B, PTP1B$_{\mathrm{G259Q}}$, PTPα, and PTPα$_{\mathrm{Q259G}}$. The subscripts indicate the mutation; G259Q, for instance, means that Gly259 in PTP1B has been mutated to Gln.

in their active sites (data not shown). To identify the areas in PTPs that could determine the substrate specificity, we have performed a detailed structural analysis of the variability and conservation of amino acid residues in the vicinity of the active site. Based on Cα regiovariation analyses and primary sequence alignments, we could identify regions in the binding cleft that might confer substrate specificity between PTPs. Among those, residues 47, 48, 258, and 259 (PTP1B numbering) are of particular interest (129). We were able to show that, in particular, residues 48 and 259 are involved in substrate specificity and are potential targets for inhibitor design. Position 48 is occupied by an Asp residue in PTP1B but by an Asn in PTPα. Here selectivity toward substrates or potential inhibitors is governed by electrostatic interactions (i.e., salt bridge formation) (130,131). On the other hand, the selectivity imparted by residues 258 and 259 might be due to steric hindrance. In PTP1B, residue 259 is a glycine, and we hypothesized that the lack of a side chain would allow easy access to the active site, whereas bulky residues in this position as found in PTPα (259 is a


glutamine) and other PTPs might cause steric hindrance. Thus it appeared that Gly259 and Met258 in PTP1B form the bottom of an open cleft, a gateway that leads to the active site. To elucidate the mechanism of substrate recognition at this particular site, we performed a detailed enzyme kinetic analysis of PTP1B, PTPα, and single mutants of these enzymes using, among other peptides, Ac-DADEpYL-NH2. As shown by the catalytic efficiencies in Fig. 18, replacing Gly259 in PTP1B with a Gln (PTP1B_G259Q mutant) caused steric hindrance and concomitantly restricted substrate recognition. In contrast, by replacing Gln259 in PTPα with a glycine (PTPα_Q259G mutant), we obtained an enzyme with broad substrate recognition capacity, i.e., an enzyme similar to PTP1B (132). Our studies, however, also pointed to a more complex picture of the involvement of residue 259 in substrate recognition and hydrolysis. Using the above mutational approach, we noted that bulky residues at position 259, in addition to causing the described steric hindrance, might indirectly influence the catalytic activity of PTPs. We hypothesized that this effect is mediated by interference with the rotational freedom of residue 262, a glutamine that is conserved in most PTPs and is critical for positioning a water molecule in the second step of catalysis. Thus it seems likely that bulky residues at position 259 negatively influence substrate hydrolysis both directly, by reducing substrate binding, and indirectly, by impairing hydrolysis. To evaluate the influence of residue 259 on Gln262, we are currently performing molecular dynamics simulations of PTP1B and of a PTP1B mutant in which a defined set of four residues has been introduced as a model for PTPα. As mentioned above, PTP1B has a glycine in position 259, whereas PTPα has a bulky glutamine in that position.
In particular, we are interested in monitoring the flexibility of Gln262, which during the catalytic reaction swings into the binding pocket and coordinates a water molecule that is essential for catalysis. For each enzyme, two cases are modeled: the Michaelis–Menten complex, with the substrate analogue p-nitrophenyl phosphate bound to the active site, and the cysteine–phosphor complex. Preliminary results for wild-type PTP1B and the mutant show significantly different behavior of Gln262 when the cysteine–phosphor complex is formed. Figure 19 shows the distance between Gln262 (Q262(CD)) and the phosphorus atom of the cysteine–phosphor complex (indicated by P). In PTP1B (top), Gln262 can swing freely toward the phosphate group, whereas in the mutant structure (bottom), Gln262 does not approach the phosphate moiety. This suggests that Gln262 is more flexible in the wild-type structure than in the mutant structure. Furthermore, the simulations identified several key interactions between Gln262 and surrounding residues, which provide a way to explain the difference in the experimentally observed catalytic efficiencies of the enzymes at the atomic level (133).
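The distance monitoring described above can be sketched in a few lines. This is only an illustration, not the analysis code used in the study: it assumes the trajectory has already been reduced to two lists of Cartesian coordinates (taken here to be in angstroms), one for the Gln262 CD atom and one for the phosphorus atom. The function names, the toy coordinates, and the 5 Å contact cutoff are all hypothetical.

```python
import math


def distance_series(cd_coords, p_coords):
    """Return the CD-P distance for every frame.

    cd_coords, p_coords: sequences of (x, y, z) tuples, one per frame,
    in the same length unit (assumed here to be angstroms).
    """
    if len(cd_coords) != len(p_coords):
        raise ValueError("both atoms need one coordinate per frame")
    return [math.dist(a, b) for a, b in zip(cd_coords, p_coords)]


def fraction_in_contact(distances, cutoff=5.0):
    """Fraction of frames in which the distance is below `cutoff`.

    The default cutoff is an illustrative assumption, not a value from the text.
    """
    if not distances:
        raise ValueError("empty distance series")
    return sum(d < cutoff for d in distances) / len(distances)


# Toy three-frame "trajectory" with made-up coordinates.
cd = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
p = [(3.0, 4.0, 0.0), (1.0, 6.0, 0.0), (3.0, 0.0, 8.0)]
d = distance_series(cd, p)
```

Comparing `fraction_in_contact` for the wild-type and mutant series would give one simple number for how often the side chain approaches the phosphate group; a plot of `d` against time corresponds to the kind of trace summarized in Fig. 19.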


Figure 19 Flexibility of the Gln262 side chain in PTP1B (top) and in the quadruple mutant R47V/D48N/M258C/G259Q (bottom), as extracted from molecular dynamics simulations. As an indication of the mobility of the side chain, the distance between Gln262 (Q262(CD)) and the phosphorus atom of the cysteine–phosphor complex was monitored. The right panels show schematically the interactions of Gln262 with the water molecule (top) and with the water molecule and Gln259 (bottom). See the text for more details.

ACKNOWLEDGMENTS

The author would like to acknowledge financial support from the Danish National Research Foundation via a grant to MEMPHYS (Center for Biomembrane Physics), from the Danish Cancer Research Foundation, and from the Danish Natural Science Research Council.

REFERENCES

1. GF Fishman. Monte Carlo: Concepts, Algorithms, and Applications. New York: Springer, 1996.
2. MEJ Newman, GT Barkema. Monte Carlo Methods in Statistical Physics. New York: Oxford University Press, 1999.
3. K Binder. The Monte Carlo Method in Condensed Matter Physics. Berlin: Springer, 1990.
4. K Binder, DW Hermann. Monte Carlo Simulation in Statistical Physics. Berlin: Springer, 1988.
5. K Binder. Monte Carlo Methods in Statistical Physics. Berlin: Springer, 1986.
6. K Binder. Applications of the Monte Carlo Method in Statistical Physics. Berlin: Springer, 1984.
7. OG Mouritsen. Computer Studies of Phase Transitions and Critical Phenomena. Berlin: Springer, 1984.
8. H Gould, J Tobochnik. An Introduction to Computer Simulation Methods: Applications to Physical Systems. Part 2. Reading, MA: Addison-Wesley, 1988.
9. S Jain. Monte Carlo Simulations of Disordered Systems. Singapore: World Scientific, 1992.
10. MP Allen, DJ Tildesley. Computer Simulation of Liquids. Oxford: Oxford University Press, 1989, pp 71–108.
11. C Branden, J Tooze. Introduction to Protein Structure. 2nd ed. New York: Garland Publishing Inc., 1999.
12. P Bratley, BL Fox, LE Schrage. A Guide to Simulation. New York: Springer Verlag, 1987.
13. CL Brooks III, M Karplus, BM Pettitt. A Theoretical Perspective of Dynamics, Structure, and Thermodynamics. New York: Wiley Interscience, 1988.
14. NR Cohen. Guidebook on Molecular Modeling in Drug Design. San Diego, CA: Academic Press, 1996, pp 1–26.
15. A Fersht. Structure and Mechanism in Protein Science: A Guide to Enzyme Catalysis and Protein Folding. New York: WH Freeman and Company, 1999.
16. D Frenkel, B Smit. Understanding Molecular Simulations. From Algorithms to Applications. San Diego, CA: Academic Press, 1996.
17. H Gould, J Tobochnik. An Introduction to Computer Simulation Methods: Applications to Physical Systems. Part 1. Reading, MA: Addison-Wesley, 1988.
18. JM Haile. Molecular Dynamics Simulations: Elementary Methods. New York: Wiley, 1992.
19. M Kalos, PA Whitlock. Monte Carlo Methods. New York: John Wiley and Sons, 1986.
20. AR Leach. Molecular Modelling. Principles and Applications. Essex, England: Addison-Wesley Longman, 1996.
21. DC Rapaport. The Art of Molecular Dynamics Simulation. Cambridge, England: Cambridge University Press, 1995.
22. W van Gunsteren, P Weiner, AT Wilkinson. Computer Simulation of Biomolecular Systems: Theoretical and Experimentational Applications. Leiden, The Netherlands: ESCOM, 1996.
23. JM Thijssen. Computational Physics. Cambridge: Cambridge University Press, 1999.
24. H Gould, L Spornick, J Tobochnik. Thermal and Statistical Physics Simulations: The Consortium for Upper-level Physics Software. New York: Wiley, 1995.
25. DW Hermann. Computer Simulation Methods. Berlin: Springer, 1990.
26. K Binder. Monte Carlo and Molecular Dynamics Simulations in Polymer Science. New York: Oxford University Press, 1995.
27. MP Allen, DJ Tildesley. Computer Simulation in Chemical Physics. NATO ASI Series C: Mathematical and Physical Sciences. Dordrecht: Kluwer Academic Press, 1993, Vol 397.
28. D Raabe. Computational Materials Science. Weinheim: Wiley-VCH, 1998.
29. FJ Vesley. Computational Physics: An Introduction. New York: Plenum Press, 1994.
30. FF Abraham. Computational statistical mechanics: methodology, applications, and supercomputing. Adv Phys 35:1–111, 1986.
31. P Stoltze. Simulation methods in atomic-scale materials physics. Lyngby: World Scientific, 1992.
32. BJ Alder, TE Wainwright. Phase transition for a hard sphere system. J Chem Phys 27:1208, 1957.
33. MW Maddox, ML Longo. A Monte Carlo study of peptide insertion into lipid bilayers: equilibrium conformations and insertion mechanisms. Biophys J 82:244–263, 2002.
34. H Berry. Monte Carlo simulations of enzyme reactions in two dimensions: fractal kinetics and spatial segregation. Biophys J 83:1891–1901, 2002.
35. EI Michonova-Alexova, IP Sugar. Component and state separation in DMPC/DSPC lipid bilayers: A Monte Carlo simulation study. Biophys J 83:1820–1833, 2002.
36. AFP de Araujo, TC Pochapsky. Monte Carlo simulations of protein folding using inexact potentials: how accurate must parameters be in order to preserve the essential features of the energy landscape? Fold Des 1:299–314, 1996.
37. N Metropolis, A Rosenbluth, M Rosenbluth, A Teller. Equation of state calculations by fast computing machines. J Chem Phys 21:1087–1092, 1953.
38. J Skolnick, A Kolinski. Dynamic Monte Carlo simulations of a new lattice model of globular protein folding, structure and dynamics. J Mol Biol 221:499–532, 1991.
39. BJ Alder, TE Wainwright. Studies in molecular dynamics. I. General method. J Chem Phys 31:459, 1959.
40. AJ Rahman. Correlations in the motion of atoms in liquid argon. Phys Rev A 136:405, 1964.
41. AJ Rahman, FH Stillinger. Molecular dynamics study of liquid water. J Chem Phys 55:3336–3359, 1971.
42. AJ Rahman, FH Stillinger. Improved simulation of liquid water by molecular dynamics. Chem Phys 60:1545–1557, 1974.
43. JA McCammon, BR Gelin, M Karplus. Dynamics of folded proteins. Nature 267:585–590, 1977.
44. W Wang, O Donini, CM Reyes, PA Kollman. Biomolecular simulations: Recent developments in force fields, simulations of enzyme catalysis, protein–ligand, protein–protein, and protein–nucleic acid noncovalent interactions. Ann Rev Biophys Biomol Struc 30:211–243, 2001.
45. E Schrödinger. The relation between the quantum mechanics of Heisenberg, Born and Jordan and that of Schrödinger. Ann Phys 79:734–756, 1926.
46. E Schrödinger. Quantisation as a problem of characteristic values, the perturbation theory and its application to the Stark-Effect of the H Balmer Lines. Ann Phys 80:437–490, 1926.
47. E Schrödinger. An undulatory theory of the mechanics of atoms and molecules. Phys Rev 28:1049–1070, 1926.
48. M Braxenthaler, R Unger, D Auerbach, JA Given, J Moult. Chaos in protein dynamics. Proteins 29:417–425, 1997.
49. JB Clarage, T Romo, BK Andrews, BM Pettitt. A sampling problem in molecular dynamics simulations of macromolecules. Proc Natl Acad Sci USA 92:3288–3292, 1995.
50. M Levitt, M Hirshberg, R Sharon, V Dagget. Potential energy function and parameters for simulations of the molecular dynamics of proteins and nucleic acids in solution. Comp Phys Comm 91:215–231, 1995.
51. SW Bunte, H Sun. Molecular modeling of energetic materials: the parameterization and validation of nitrate esters in the COMPASS force field. J Phys Chem B 104:2477–2489, 2000.
52. J Wang, P Cieplak, PA Kollman. How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J Comput Chem 21:1049–1074, 2000.
53. P Cieplak, J Caldwell, P Kollman. Molecular mechanical models for organic and biological systems going beyond the atom centered two body additive approximation: aqueous solution free energies of methanol and N-methyl acetamide, nucleic acid base, and amide hydrogen bonding and chloroform/water partition coefficients of the nucleic acid bases. J Comput Chem 22:1048–1057, 2001.
54. AD MacKerell Jr, J Wiórkiewicz-Kuczera, M Karplus. An all-atom empirical energy function for the simulation of nucleic acids. J Am Chem Soc 117:11946–11975, 1995.
55. AD MacKerell Jr, D Bashford, M Bellott, RL Dunbrack Jr, JD Evanseck, MJ Field, S Fischer, J Gao, H Guo, S Ha, D Joseph-McCarthy, L Kuchnir, K Kuczera, FTK Lau, C Mattos, S Michnick, T Ngo, DT Nguyen, B Prodhom, WE Reiher III, B Roux, M Schlenkrich, JC Smith, R Stote, J Straub, M Watanabe, J Wiórkiewicz-Kuczera, D Yin, M Karplus. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem 102:3586–3616, 1998.
56. M Schlenkrich, J Brickmann, AD MacKerell Jr, M Karplus. An empirical potential energy function for phospholipids: criteria for parameter optimization and applications. In: KM Merz Jr, B Roux, eds. Biological Membranes: A Molecular Perspective from Computation and Experiment. Birkhauser, 1996, pp 31–81.
57. SN Ha, A Giammona, M Field, JW Brady. A revised potential-energy surface for molecular mechanics studies of carbohydrates. Carbohydr Res 180:207–221, 1988.
58. R Garemyr, A Elofsson. Study of the electrostatics treatment in molecular dynamics simulations. Proteins 37:417–428, 1999.
59. A Warshel, ST Russel. Calculations of electrostatic interactions in biological systems and in solutions. Q Rev Biophys 17:283–422, 1984.
60. A Warshel, J Åqvist. Electrostatic energy and macromolecular function. Ann Rev Biophys Chem 20:267–298, 1991.
61. RJ Loncharich, BR Brooks. The effects of truncating long-range forces on protein dynamics. Proteins 6:32–45, 1989.
62. B Honig, A Nicholls. Classical electrostatics in biology and chemistry. Science 268:1144–1149, 1995.
63. Z Radic, PD Kirchhoff, DM Quinn, JA McCammon, P Taylor. Electrostatic influence on the kinetics of ligand binding to acetylcholinesterase. J Biol Chem 272:23265–23277, 1997.
64. PJ Steinbach, BR Brooks. New spherical-cutoff methods for long-range forces in macromolecular simulation. J Comput Chem 15:667–683, 1994.
65. T Darden, D York, L Pedersen. Particle mesh Ewald: An n log(n) method for Ewald sums in large systems. J Chem Phys 98, 1993.
66. U Essmann, L Perera, ML Berkowitch, T Darden, L Hsing, LG Pedersen. A smooth particle mesh Ewald method. J Chem Phys 103:8577–8593, 1995.
67. RW Hockney, JW Eastwood. Computer Simulation Using Particles. New York: McGraw-Hill, 1981.
68. V Rokhlin. Rapid solution of integral equations of classical potential theory. J Comput Phys 60:187–207, 1985.
69. JA Board Jr, JW Causey, JF Leathrum Jr, A Windemuth, K Schulten. Accelerated molecular dynamics simulations with the parallel fast multipole algorithm. Chem Phys Lett 198:89–94, 1992.
70. E Pollock, J Glosli. Comments on pppm, fmm and the Ewald method for large periodic coulombic systems. Comp Phys Comm 95:93–110, 1996.
71. F Figuerido, R Levy, R Zhou, B Berne. Large scale simulation of molecules in solution: combining the periodic fast multipole method with multiple time step integrators. J Chem Phys, 9835–9849, 1997.
72. C Sagui, TA Darden. Molecular dynamics simulations of biomolecules: Long-range electrostatic effects. Annu Rev Biophys Biomol Struct 28:155–179, 1999.
73. T Schlick, RD Skeel, AT Brunger, LV Kale, J Board, J Hermans, K Schulten. Algorithmic challenges in computational molecular biophysics. J Comput Phys 151:9–48, 1999.
74. DM York, TA Darden, LG Pedersen. The effect of long-range electrostatic interactions in simulations of macromolecular crystals: A comparison of the Ewald and truncated list methods. J Chem Phys 99:8345–8348, 1993.
75. DM York, W Yang, H Lee, T Darden, LG Pedersen. Toward the accurate modelling of DNA: The importance of long-range electrostatics. J Am Chem Soc 117:5001–5002, 1995.
76. T Fox, PA Kollman. The application of different solvation and electrostatic models in molecular dynamics simulations of ubiquitin: how well is the x-ray structure "maintained"? Proteins: Struct Func Gen 25:315–334, 1996.
77. S Nose. Mol Phys 52:255–268, 1984.
78. S Nose. A unified formulation of the constant temperature molecular dynamics methods. J Chem Phys 81:511–519, 1984.
79. WG Hoover. Canonical dynamics: equilibrium phase-space distributions. Phys Rev A 31:1695–1697, 1985.
80. WG Hoover. Molecular Dynamics. Berlin: Springer, 1987.
81. M Parrinello, A Rahman. Crystal structure and pair potentials: a molecular-dynamics study. Phys Rev Lett 45:1196–1199, 1980.
82. The author is a member of the MEMPHYS Group, Center for Biomembrane Physics.
83. This work has been performed in collaboration with several research groups involving Risø National Laboratory (K Kjaer, DK), Centre for Interdisciplinary Studies of Molecular Interactions (CISMI) (T Bjørnholm, DK), Max Planck Institute Berlin (G Brezesinski, H Möhwald, D), European Molecular Biology Laboratory (EMBL) (R Wade, D), University of Helsinki (PKJ Kinnunen, FIN), Novo Nordisk A/S (RP Bywater, DK) and Novozyme A/S (A Svendsen, DK).
84. P Woolley, SB Petersen. Lipases: Their Structure, Biochemistry and Applications. Cambridge: Cambridge University Press, 1994, pp 271–288.
85. AR Macrae. Lipase catalyzed interesterification of oils and fats. J Am Oil Chem Soc 60:291–294, 1983.
86. W Boland, C Frobel, M Lorentz. Esterolytic and lipolytic enzymes in organic synthesis. J Synthetic Org Chem 12:1049–1072, 1991.
87. BA van Kuiken, WD Behnke. The activation of porcine pancreatic lipase by cis-unsaturated fatty acid. BBA 214:148–160, 1994.
88. AR Macrae, RC Hammond. Present and future applications of lipases. Biotechnol Gen Eng Rev 3:193–217, 1985.
89. M Cygler, JD Schrag, JL Sussman, M Harel, I Silman, MK Gentry, BP Doctor. Relationship between sequence conservation and three-dimensional structure in a large family of esterases, lipases, and related proteins. Protein Sci 2:366–382, 1993.
90. ZS Derewenda. Structure and function of lipases. Adv Protein Chem 45:1–52, 1994.
91. GG Dodson, DM Lawson, FK Winkler. Structural and evolutionary relationships in lipase mechanism and activation. Faraday Discuss 93:95–105, 1992.
92. M Norin, F Haeffner, A Achour, T Norin, K Hult. Computer modeling of substrate binding to lipases from Rhizomucor miehei, Humicola lanuginosa, and Candida rugosa. Protein Sci 3:1493–1503, 1994.
93. AT Yagnik, JA Littlechil. Molecular modelling studies of substrate binding to the lipase from Rhizomucor miehei. Comput Aided Mol Design 11:256–264, 1997.
94. L Brady, AM Brzozowski, ZS Derewenda, E Dodson, S Tolley, JP Turkenburg, L Christiansen, B Huge-Jensen, L Nørskov, L Thim, U Menge. A serine protease triad forms the catalytic centre of a triacylglycerol lipase. Nature 343:767–770, 1990.
95. GH Peters, U Dahmen-Levison, K de Meijere, G Brezesinski, S Toxvaerd, H Möhwald, A Svendsen, PKJ Kinnunen. Influence of surface properties of mixmonolayers on lipolytic hydrolysis. Langmuir 16:2779, 2000.
96. GH Peters, S Toxvaerd, NB Larsen, T Bjørnholm, K Schaumburg, K Kjaer. Structure and Dynamics of Lipid Monolayers: Implications for enzyme catalysed lipolysis. Nat Struct Biol 2:401, 1995.
97. GH Peters, NB Larsen, T Bjørnholm, S Toxvaerd, K Schaumburg, K Kjaer. X-ray diffraction and molecular dynamics studies: structural analysis of phases in diglyceride monolayers. Phys Rev E 57:3153, 1998.
98. GH Peters, S Toxvaerd, NB Larsen, T Bjørnholm, K Schaumburg, K Kjaer. Phase transitions in di-glyceride monolayers studied by computer simulations, pressure-area isotherms and x-ray diffraction. Nuovo Cim 16:1479, 1994.
99. GH Peters, S Toxvaerd, A Svendsen, OH Olsen. Modeling of complex biological systems: I. Molecular dynamics studies of di-glyceride monolayers. J Chem Phys 100:5998, 1994.
100. GH Peters, S Toxvaerd, OH Olsen, A Svendsen. Modeling of complex biological systems: II. Effect of chainlength on the phase transitions observed in diglyceride monolayers. Langmuir 11:4072, 1995.
101. GH Peters, S Toxvaerd, O Olsen, A Svendsen. Computational studies of activation of lipases and the effect of a hydrophobic environment. Protein Eng 10:137, 1997.
102. GH Peters, OH Olsen, A Svendsen, R Wade. Theoretical investigation of the dynamics of the active site lid in Rhizomucor miehei lipase. Biophys J 71:119, 1996.
103. AM Brzozowski, H Savage, CS Verma, JP Turkenburg, DM Lawson, A Svendsen, S Patkar. Structural origins of the interfacial activation in Thermomyces (Humicola) lanuginosa lipase. Biochemistry 39:15071, 2000.
104. GH Peters, A Svendsen, H Langberg, J Vind, SA Patkar, S Toxvaerd, PKJ Kinnunen. Active serine involved in the stabilization of the active site loop in the Humicola lanuginosa lipase. Biochemistry 37:12375, 1998.
105. GH Peters, A Svendsen, H Langberg, J Vind, SA Patkar, PKJ Kinnunen. Glycosylation of Thermomyces lanuginosa lipase enhances surface binding, but does not significantly influence the catalytic activity. Colloids Surf Sci B 26:125–134, 2002.
106. GH Peters, R Bywater. Computational analysis of chain flexibility and fluctuations in Rhizomucor miehei lipase. Protein Eng 12:747, 1999.
107. GH Peters, MØ Jensen, RP Bywater. Dynamics of the substrate binding pocket in the presence of a covalently attached inhibitor. J Biomol Struct Dyn 19:1–13, 2001.
108. GH Peters. The dynamic response of a fungal lipase in the presence of charged surfactants. Colloids Surf Sci B 26:84–101, 2002.
109. A Amadei, ABM Linssen, HJC Berendsen. Essential dynamics of proteins. Proteins 17:412–425, 1993.


110. T Ichiye, M Karplus. Collective motions in proteins; a covariance analysis of atomic fluctuations in molecular dynamics and normal mode simulations. Proteins 11:205–217, 1991.
111. M Karplus, T Ichiye. Comment on a fluctuation and cross correlation analysis of protein motions observed in nanosecond molecular dynamics simulation. J Mol Biol 263:120–122, 1996.
112. HJC Berendsen, S Hayward. Collective protein dynamics in relation to function. Curr Opin Struct Biol 10:165–169, 2000.
113. BL De Groot, DMF van Aalten, A Amadei, HJC Berendsen. The consistency of large concerted motions in proteins in molecular dynamics simulations. Biophys J 71:1707–1713, 1996.
114. A Kitao, N Go. Investigating protein dynamics in collective coordinate space. Curr Opin Struct Biol 9:164–169, 1999.
115. DW Miller, DA Argard. Enzyme specificity under dynamic control: a normal mode analysis of alpha-lytic protease. J Mol Biol 286:267–278, 1999.
116. TM Frimurer, GH Peters, MD Sørensen, JJ Led, OH Olsen. Assignment of side-chain conformation using adiabatic energy mapping, free energy perturbation, and molecular dynamics simulations. Protein Sci 8:25, 1999.
117. BK Andrews, T Romo, JB Clarage, BM Pettitt, GN Phillips Jr. Characterizing global substrates of myoglobin. Structure 6:587–594, 1998.
118. LSD Caves, JD Evanseck, M Karplus. Locally accessible conformations of proteins: multiple molecular dynamics simulations of crambin. Protein Sci 7:649–666, 1998.
119. GH Peters, RP Bywater. Influence of a lipid interface on protein dynamics in a fungal lipase. Biophys J 81:3052–3065, 2001.
120. MØ Jensen, TR Jensen, K Kjaer, T Bjørnholm, OG Mouritsen, GH Peters. Orientation and conformation of a lipase at an air–water interface studied by molecular dynamics simulations. Biophys J 83:98–111, 2002.
121. TR Jensen, MØ Jensen, N Reitzel, K Balashev, GH Peters, K Kjaer, T Bjørnholm. Water in contact with extended hydrophobic surfaces: Direct evidence of weak dewetting. Phys Rev Lett 90:086101–086400, 2003.
122. MØ Jensen, OG Mouritsen, GH Peters. Interfacial water structure at an alkane and alcohol monolayer studied by molecular dynamics and x-ray scattering. Submitted.
123. This project is carried out in collaboration with NPH Møller and OH Olsen from Novo Nordisk A/S.
124. EH Fisher, H Charbonneau, NK Tonks. Protein tyrosine phosphatases. Science 253:401, 1991.
125. T Hunter. Protein kinases and phosphatases: the yin and yang of protein phosphorylation and signalling. Cell 80:225, 1995.
126. GH Peters, TM Frimurer, OH Olsen. Electrostatic evaluation of the signature motif (H/V)CX5R(S/T) in protein-tyrosine phosphatases. Biochemistry 37:5383, 1998.
127. GH Peters, TM Frimurer, JN Andersen, OH Olsen. Molecular dynamics simulations of protein-tyrosine phosphatase 1B: I. Ligand-induced changes in the protein motions. Biophys J 77:505, 1999.

128. GH Peters, TM Frimurer, JN Andersen, OH Olsen. Molecular dynamics simulations of protein-tyrosine phosphatase 1B: II. Substrate–enzyme interactions and dynamics. Biophys J 78:2191, 2000.
129. JN Andersen, OH Mortensen, GH Peters, PG Drake, LF Iversen, OH Olsen, HS Andersen, NK Tonks, NPH Møller. Structural and evolutionary relationships among protein tyrosine phosphatase domains. Mol Cell Biol 21:7117–7136, 2001.
130. LF Iversen, HS Andersen, S Branner, SB Mortensen, GH Peters, K Norris, OH Olsen, CB Jeppesen, BF Lundt, W Ripka, KB Møller, NPH Møller. Structure-based design of a low molecular weight, nonphosphorus, nonpeptide, and highly selective inhibitor of protein-tyrosine phosphatase 1B. J Biol Chem 275:10300, 2000.
131. LF Iversen, HS Andersen, KB Møller, OH Olsen, GH Peters, S Branner, SM Mortensen, TK Hansen, J Lau, Y Ge, DD Holsworth, MJ Newman, NPH Møller. Steric hindrance as basis for structure-based design of selective inhibitors of protein-tyrosine phosphatases. Biochemistry 40:14812–14820, 2001.
132. GH Peters, LF Iversen, S Branner, HS Andersen, SB Mortensen, OH Olsen, KB Møller, NPH Møller. Residue 259 is a key determinant of substrate specificity of protein-tyrosine phosphatases 1B and α. J Biol Chem 275:18201, 2000.
133. GH Peters, LF Iversen, HS Andersen, OH Olsen, NPH Møller. Molecular modelling of wild-type and mutant protein tyrosine phosphatases: residue 259 determines the flexibility of glutamine 262. Submitted.

7
Calculations of Ionization Equilibria in Proteins

Andrey Karshikoff
Karolinska Institutet, Huddinge, Sweden

Functional properties of proteins result from a delicate balance of different types of interactions. Among them, electrostatic interactions are a factor whose importance becomes evident in any pH-dependent property, such as pH regulation of enzyme activity and substrate/inhibitor binding, pH dependence of protein stability, and many others. Electrostatic interactions in proteins cannot be measured directly. That is why their correct theoretical description is of key importance for studies of structure–function relationships in proteins. There is no doubt that engineering studies are a powerful instrument for better understanding the role of electrostatic interactions in the functional properties of proteins. On the other hand, a correct understanding of electrostatic interactions and their interplay with all other interactions in proteins is needed for an adequate design of molecules with desired properties. A good example of this need is charge reversal mutations, which are expected to stimulate the binding of charged substrates. Failures to increase the binding of, say, a negatively charged substrate by mutating a negative group to a positive one have been clearly explained on the basis of theoretical calculations


(1). There are also other questions that can be answered by the synergistic efforts of experimental protein engineering and theoretical modeling. For instance, do salt bridges stabilize (2) or destabilize (3) the native structure of proteins? Increasing the thermal stability of enzymes is also an interesting objective of engineering studies. Thermophilic and hyperthermophilic organisms are good natural sources for understanding the factors that regulate the thermal stability of proteins (4). Electrostatic interactions and salt bridge formation seem to play a dominant role in the enormous thermal stability of enzymes from thermophiles, some of which are active at the temperature of boiling water or even higher (5). Again, experimental studies based on engineered proteins, together with theoretical predictions, can provide the body of knowledge needed to understand the origin of thermal stability (6). A variety of other examples can be given in which electrostatic interactions and protonation/deprotonation equilibria in proteins can be examined by combining theoretical analysis with protein engineering. Two main issues of electrostatic interactions and protonation/deprotonation equilibria in proteins will be considered below. The first is that ionizable groups in proteins may not follow the Henderson–Hasselbalch equation. Such groups show irregular ionization. The analysis of experimental data that indicate a pH dependence of a certain observable usually begins with efforts to identify the titratable group or groups whose ionization regulates or is responsible for this dependence. It will be illustrated that the formal assignment of a pK value to the midpoint of an observed dependence may be misleading if irregular ionization occurs. The second issue concerns electrostatic interactions in the denatured (unfolded) state of proteins.
Not much work has been done on the ionization properties of denatured proteins, probably because proteins in this state are considered "dead" molecules, deprived of biological activity. Often, electrostatic interactions in denatured proteins are set to zero, i.e., considered irrelevant, which is an oversimplification applicable to very few cases. It will be shown that, in order to predict the stability of a native protein as a function of pH, electrostatic interactions in the denatured state have to be taken into account.

1 PROTONATION/DEPROTONATION EQUILIBRIA IN PROTEINS

Most often, theoretical prediction of electrostatic interactions is focused on the calculation of measurable quantities, whose values can be directly correlated to electrostatic interactions. Such quantities are, for instance, the ionization equilibrium constants (or their equivalents, the pK values) of the individual titratable groups in proteins.


The degree of deprotonation, $\theta$, of a titratable amino acid side chain in solution at standard conditions is given by the Henderson–Hasselbalch equation:

$$\theta = \frac{10^{(\mathrm{pH} - \mathrm{p}K)}}{1 + 10^{(\mathrm{pH} - \mathrm{p}K)}} \tag{1}$$

where pK is the negative logarithm of the dissociation constant. The pH dependence of $\theta$ is sigmoidal, with the inflection point at $\theta = 0.5$, where pK equals pH. This simple feature of Eq. (1) is widely used to determine the pK of titratable groups in proteins, just by fitting the experimental data to Eq. (1). The standard free energy of deprotonation is related to the dissociation constant by $\Delta G^0_{p \to d} = 2.3RT\,\mathrm{p}K$. If the protonation is coupled with interactions with the environment that differ from the standard conditions, one writes

$$\Delta G_{p \to d} = \Delta G^0_{p \to d} - 2.3RT\,\mathrm{pH} + \Delta G_{\mathrm{env}} \tag{2}$$

where $\Delta G_{\mathrm{env}}$ corresponds to the free energy change due to these interactions. The straightforward interpretation of Eq. (1) holds as long as $\Delta G_{\mathrm{env}}$ is a linear function of pH. Otherwise, the ionization curve $\theta(\mathrm{pH})$ may have a nontrivial character and, in general, pK is no longer equal to pH when $\theta = 0.5$. In such cases, fitting experimental data to Eq. (1) is inappropriate. In proteins, $\Delta G_{\mathrm{env}}(\mathrm{pH})$ for the individual titratable groups can be far from linear.
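The effect of a pH-dependent environment term on the titration midpoint can be sketched numerically. The example below is a minimal illustration under stated assumptions, not a method taken from the text: it uses the standard two-state relation between the deprotonation free energy and the degree of deprotonation, 1/(1 + exp(ΔG/RT)), and it invents a simple environment term in which a neighbouring acidic group, titrating independently according to Eq. (1), opposes deprotonation by W kcal/mol when it is charged. The function names, the value of W, and the pK values are hypothetical.

```python
import math

R = 1.987e-3          # gas constant in kcal/(mol K)
LN10 = math.log(10.0)  # exact factor behind the "2.3" in 2.3RT


def theta_hh(pH, pK):
    """Degree of deprotonation from the Henderson-Hasselbalch form, Eq. (1)."""
    x = 10.0 ** (pH - pK)
    return x / (1.0 + x)


def theta_general(pH, pK, dG_env, T=298.15):
    """Degree of deprotonation with an environment term, following Eq. (2).

    dG_env: function mapping pH to a free energy in kcal/mol.
    Assumes the two-state relation theta = 1 / (1 + exp(dG / RT)).
    """
    dG = LN10 * R * T * (pK - pH) + dG_env(pH)
    return 1.0 / (1.0 + math.exp(dG / (R * T)))


def midpoint(theta, lo=0.0, hi=14.0, tol=1e-9):
    """pH at which a monotonically increasing theta(pH) crosses 0.5 (bisection)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if theta(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def neighbour_env(pH, pK_neighbour=4.0, W=1.0):
    """Hypothetical environment term: a nearby acid whose negative charge,
    present with probability theta_hh, raises the cost of deprotonation by W."""
    return W * theta_hh(pH, pK_neighbour)


pK_site = 4.0
apparent_mid = midpoint(lambda pH: theta_general(pH, pK_site, neighbour_env))
```

With these made-up parameters, `apparent_mid` lies above `pK_site`, and the curve is flatter than a Henderson–Hasselbalch curve centred on the same midpoint, so reading a pK off the midpoint or fitting Eq. (1) would misdescribe the group. Treating the neighbour as independent of the site is itself a simplification chosen to keep the sketch short.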

1.1 Factors Regulating Ionization Equilibria in Proteins

The fundamental assumption of the theory of protonation/deprotonation equilibria in proteins is that these equilibria are regulated only by the electrostatic environment created by the protein molecule and the surrounding solvent, i.e., $\Delta G_{\mathrm{env}} = \Delta G_{\mathrm{el}}$. Several factors determining $\Delta G_{\mathrm{el}}$ can be distinguished. The first arises from charge–charge interactions between the titratable groups themselves. That is, the protonation of a given group takes place in the electric field created by the other titratable groups of the molecule. The magnitude of these interactions depends on the protonation state of the interacting groups and is therefore pH-dependent. Electrostatic interaction of the titratable groups with the permanent charges of the protein is the second factor contributing to $\Delta G_{\mathrm{el}}$. The charges that can be considered permanent are the polypeptide backbone dipoles, the partial charges of the polar groups, and metal ions bound to the protein molecule. These interactions are pH-independent. The separation of the charge–charge interactions into pH-dependent and pH-independent parts is formal and aims at simplifying the theoretical formulation and analysis. The desolvation effect is the third factor that essentially influences the protonation/deprotonation equilibria in proteins. This is the part of $\Delta G_{\mathrm{el}}$ that corresponds to the energy of transfer of a titratable group, or of a model compound with an experimentally known pK value, from the solvent (standard conditions) to its location in the protein. The model compound of a given titratable side chain is the corresponding amino acid with the alpha amino and the carboxyl groups substituted by blocking groups. This energy is always positive (unfavorable) and is often called the "desolvation penalty." The ability of native proteins to adopt different conformations, in other words their conformational flexibility, is the fourth factor regulating the ionization behavior of titratable groups. This factor is not electrostatic in nature, but it has a significant influence on the other three factors, and vice versa: electrostatic interactions control the conformational flexibility by stabilizing one or another conformation under given conditions. The Bohr effect in hemoglobin is an example of a conformational change controlled by electrostatic interactions. The factors determining protonation/deprotonation equilibria in proteins are mutually dependent, and none of them can be described outside the context of the others. To simplify the theoretical analysis, the factors depending on electrostatic interactions will be considered separately from the conformational flexibility. Without loss of generality, the theoretical considerations will be made on the basis of a fixed, nonflexible structure. Then the interplay between electrostatic factors and conformational flexibility will be analyzed.

1.2 Ionization Curves of Titratable Groups in Proteins

Figure 1 A: Thermodynamic cycle used for the calculation of the pK values of titratable groups in proteins. B: Lattice representation of the protein as a dielectric material, $\varepsilon_p$, immersed in the medium of the solvent with dielectric constant $\varepsilon_s$. Dielectric constant, charge value, and ionic strength are assigned to each node of the lattice.

The change of the protonation/deprotonation equilibrium of a given titratable group can be analyzed by means of a thermodynamic cycle as shown in Fig. 1A. Each titratable group is considered as an appropriate model compound, which is transferred from solution to its place in the protein (S → P). When the group is protonated, the energy of transfer is $\Delta G_p^{S\to P}$, while for the deprotonated form of the group, the energy of transfer is $\Delta G_d^{S\to P}$. The protonation/deprotonation equilibrium of the model compound in solution is characterized by its standard free energy $\Delta G^{S}_{p\to d} = 2.3RT\,pK_{mod}$. The value of $pK_{mod}$ can be calculated or determined experimentally. According to the thermodynamic cycle, the free energy of deprotonation in the protein molecule is related to $\Delta G^{S}_{p\to d}$ as follows:

$$\Delta G^{P}_{p\to d} = \Delta G^{S}_{p\to d} + \left(\Delta G_d^{S\to P} - \Delta G_p^{S\to P}\right) \tag{3}$$

The difference between the transfer energies, $\Delta G_{sol} = \Delta G_d^{S\to P} - \Delta G_p^{S\to P}$, is the desolvation penalty responsible for the shift of the protonation/deprotonation equilibrium of the titratable group towards stabilization of its uncharged form. Taking into account the influence of the charge–charge interactions, $\Delta G_{tc}$, and the influence of the protein permanent charges, $\Delta G_{pc}$, one obtains

$$\Delta G^{P}_{p\to d} = 2.3RT\,(pK_{mod} - \mathrm{pH}) + \Delta G_{sol} + \Delta G_{pc} + \Delta G_{tc} \tag{4}$$

For a fixed protein structure, the terms $\Delta G_{sol}$ and $\Delta G_{pc}$ depend only on the permittivity of the surrounding medium and on the distribution of the protein permanent charges, i.e., they are pH-independent. Being pH-independent, these two factors can be considered as pK corrections: $\Delta pK_{sol} = \Delta G_{sol}/2.3RT$ and $\Delta pK_{pc} = \Delta G_{pc}/2.3RT$. Combining the pH-independent parts of Eq. (4), one obtains

$$\Delta G^{P}_{p\to d} = 2.3RT\,(pK_{int} - \mathrm{pH}) + \Delta G_{tc} \tag{5}$$

where $pK_{int} = pK_{mod} + \Delta pK_{sol} + \Delta pK_{pc}$. As defined by Tanford and Kirkwood (7), $pK_{int}$ (intrinsic pK) is the pK that a given group would have if all other titratable groups in the protein were in their neutral form. The term $\Delta G_{tc}$ depends on the protonation state of all other titratable groups, so that it is pH-dependent. It is convenient to express $\theta$ in terms of a statistical sum:

$$\theta = \frac{e^{-\Delta G^{P}_{p\to d}/RT}}{1 + e^{-\Delta G^{P}_{p\to d}/RT}} \tag{6}$$

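where the denominator represents the partition function of a system with two states (protonated and deprotonated). After substituting $\Delta G^{P}_{p\to d}$ from Eq. (5), the above equation can be written as:

$$\theta = \frac{e^{2.3(\mathrm{pH} - pK_{int}) - \Delta G_{tc}/RT}}{1 + e^{2.3(\mathrm{pH} - pK_{int}) - \Delta G_{tc}/RT}} = \frac{10^{(\mathrm{pH} - pK_{int}) - \Delta G_{tc}/2.3RT}}{1 + 10^{(\mathrm{pH} - pK_{int}) - \Delta G_{tc}/2.3RT}} \tag{7}$$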
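As a minimal numerical sketch of Eq. (7), the function below evaluates the degree of deprotonation for a pH-independent charge–charge term. The temperature (298.15 K), the energy unit (kcal/mol), and the function names `theta` and `pK_half` are assumptions made for illustration only; they do not come from the chapter.

```python
import math

RT = 1.987e-3 * 298.15   # kcal/mol; 298.15 K is an assumed temperature
LN10 = math.log(10.0)    # the factor written as 2.3 in the text

def theta(pH, pK_int, dG_tc=0.0):
    """Degree of deprotonation from Eq. (7); dG_tc in kcal/mol."""
    t = 10.0 ** ((pH - pK_int) - dG_tc / (LN10 * RT))
    return t / (1.0 + t)

def pK_half(pK_int, dG_tc):
    """pH of half-deprotonation when dG_tc does not depend on pH."""
    return pK_int + dG_tc / (LN10 * RT)

# Hypothetical group: pK_int = 4.0, with and without a constant dG_tc of 1.0 kcal/mol
for pH in (3.0, 4.0, 5.0, 6.0):
    print(pH, round(theta(pH, 4.0), 3), round(theta(pH, 4.0, 1.0), 3))
```

With these sign conventions, a positive constant $\Delta G_{tc}$ moves the midpoint of the curve to higher pH without changing its shape.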

The purpose of this formal derivation of Eq. (7) is to illustrate that $\theta$ may depend on pH in a more complicated manner than that predicted by Eq. (1). If $\Delta G_{tc} = 0$, Eq. (7) becomes identical to Eq. (1). If $\Delta G_{tc}$ is constant, the inflection point of $\theta(\mathrm{pH})$ is shifted by $\Delta G_{tc}/2.3RT$ pH units, while if $\Delta G_{tc}$ depends linearly on pH, the titration curve also changes its slope. In all these cases, half-protonation ($\theta = 0.5$) occurs at the pH corresponding to the inflection point of the titration curve. This point is defined as $pK_{1/2}$. In cases of cooperative ionization, $\Delta G_{tc}(\mathrm{pH})$ becomes a nonlinear function of pH and the above simple rules cannot be applied. The calculation of $\Delta G_{tc}(\mathrm{pH})$, and hence of $\theta(\mathrm{pH})$, for a given titratable group is a complex task because it depends on, and at the same time influences, the protonation/deprotonation equilibria of all other titratable groups in the protein.

1.3 Calculation of Protonation/Deprotonation Equilibria in Proteins

Usually, individual titratable groups are assumed to have two states: protonated and deprotonated. However, the deprotonated state of histidine has two tautomers, Nε2–H and Nδ1–H. Also, the hydrogen atom in the protonated form of glutamic and aspartic acids can be bound to either of the two carboxyl oxygens. In addition, titratable groups are usually involved in hydrogen bonds with the polar groups of the protein environment. Upon deprotonation (or protonation), local but important changes may occur. Hydrogen bonds may be broken, donor–acceptor partnerships can change, and hence hydrogen bond networks may be rearranged. These effects are accompanied by reorientation of the surrounding polar groups and in this way may influence the protonation/deprotonation equilibria of other titratable groups. They can be taken into account in different ways. The simplest way is to introduce alternative proton positions, say, by using stereochemical criteria only. Because the occupancy of the alternative proton positions depends on pH for both titratable and nontitratable polar groups, it is convenient to distinguish formally between pH-sensitive sites (polar groups such as threonines, water molecules participating in hydrogen bond networks, etc.) and titratable sites (Asp, Glu, His, etc.). The introduction of the alternative proton locations means that the individual sites (titratable or pH-sensitive) may have more than two states. A general theoretical approach that treats sites with multiple states was elaborated earlier by Spassov and Bashford (8) (see also Refs. 9 and 10 for review).

1.3.1 Microscopic Protonation/Deprotonation Equilibria

Instead of two states (protonated and deprotonated), a given site (titratable or pH-sensitive) can be described by a set of $n$ microstates $S_\alpha$ ($\alpha = 0, 1, \ldots, n-1$). The term microstate is introduced formally to distinguish it from the protonated or deprotonated state. Different microstates can be different rotamers or tautomers. Each microstate $\alpha$ is characterized by a certain number of titratable hydrogens, $\nu_\alpha$. Usually, $\nu_\alpha = 1$, but for histidines, $\nu_\alpha = 2$. Because there is no rule for ordering the states, the choice of the reference state is arbitrary and does not affect the final results or the validity of the derived equations. In what follows, $S_0$ will be used as the reference state. The equilibrium of the microstates within a single group is determined by the microscopic equilibrium constants, $K^{\mu}_{\alpha}$, or equivalently by $pK^{\mu}_{\alpha}$:

$$pK^{\mu}_{\alpha} = -\log K^{\mu}_{\alpha} = -\log\frac{[S_\alpha][\mathrm{H}^+]^{\Delta\nu_\alpha}}{[S_0]} = -\log\frac{p_\alpha}{p_0} + \Delta\nu_\alpha\,\mathrm{pH}, \tag{8}$$

where $p_\alpha$ is the population of state $S_\alpha$ ($\sum_\alpha p_\alpha = 1$) and $\Delta\nu_\alpha = \nu_0 - \nu_\alpha$. For transitions between states with equal proton content, $\Delta\nu_\alpha = 0$. Consider a single titratable group, for instance tyrosine, as a model compound free in solution. Assume also that its protonated form has two microstates corresponding to the two most populated orientations of the hydroxyl group. These states are experimentally indistinguishable and a single macroscopic $pK_{mod}$ is observed. The two microstates may have different populations when this titratable group is in a protein molecule. This difference may arise from electrostatic interactions with the protein environment or from participation of the hydroxyl group in hydrogen bonds. Therefore the microscopic equilibrium constants, or the $pK^{\mu}_{\alpha,mod}$ values, should be used in the thermodynamic cycle (Fig. 1) rather than the experimentally observed $pK_{mod}$. The relation between macroscopic and microscopic pK is given in detail in Ref. 11 (see also Refs. 12 and 13).

It is reasonable to assume that all microstates of the protonated species of a model compound are equally populated. For instance, each of the carboxyl oxygens is protonated with a probability of 0.5. In this case, the transition from one protonated microstate, say $S_0$, to another protonated microstate, $S_\alpha$, is characterized by $\Delta\nu_\alpha = 0$ and $[S_\alpha]/[S_0] = 1$. According to Eq. (8), $pK^{\mu}_{\alpha,mod} = 0$ for this transition. The same is valid for the deprotonated states. It should be noted that histidine tautomers are not equally populated.

The introduction of microstates requires a reconsideration of Eq. (4) and a certain adjustment of the terminology. Eq. (4) gives an expression for the free energy of the transition from the protonated (reference) state to the deprotonated state. Some groups may have more than one protonated (or deprotonated) microstate. Moreover, polar groups have only protonated states. Eq. (4) holds in all these cases; however, it must be rewritten as follows:

$$\Delta G_{i\alpha} = 2.3RT\,(pK^{\mu}_{i\alpha,mod} - \Delta\nu_{i\alpha}\,\mathrm{pH}) + \Delta G_{i\alpha,sol} + \Delta G_{i\alpha,pc} + \Delta G_{i\alpha,tc}. \tag{9}$$

Here, $\Delta G_{i\alpha}$ is the free energy of the transition from the reference state, $S_{i0}$, to state $S_{i\alpha}$ of a titratable or pH-sensitive site $i$ in the protein molecule. The multiplier $\Delta\nu_{i\alpha}$ indicates whether deprotonation occurs ($\Delta\nu_{i\alpha} = 1$) or does not occur ($\Delta\nu_{i\alpha} = 0$) during the transition $S_{i0} \to S_{i\alpha}$. Consider a transition of a polar group from $S_{i0}$ to an arbitrary state $S_{i\alpha}$. Because the microstates are equally populated, it follows from Eq. (8) that $pK^{\mu}_{i\alpha,mod} = 0$. Taking into account that $\Delta\nu_{i\alpha} = 0$ (the protonation state of a polar group does not change), Eq. (9) becomes

$$\Delta G_{i\alpha} = \Delta G_{i\alpha,sol} + \Delta G_{i\alpha,pc} + \Delta G_{i\alpha,tc}. \tag{10}$$

The transition $S_{i0} \to S_{i\alpha}$ is regulated by the changes of the desolvation energy, $\Delta G_{i\alpha,sol}$, and of the electrostatic interactions with the permanent charges, $\Delta G_{i\alpha,pc}$, and with the titratable sites, $\Delta G_{i\alpha,tc}$. $\Delta G_{i\alpha}$ depends on pH via $\Delta G_{i\alpha,tc}$.

1.3.2 Electrostatic Interactions

Each atom, $k$, of a certain titratable or pH-sensitive group, $i$, is characterized by a partial charge, $q_{i\alpha}(k)$, which depends on the chemical nature of the group. In each microstate, $S_{i\alpha}$, of this group, there is a distribution of charges $\rho_{i\alpha}$. The work needed to place all charges on the atoms comprising the group is given by the self energy of the distribution $\rho_{i\alpha}$:

$$G_{i\alpha,self} = \frac{1}{2}\sum_{k}^{m_i} \phi(\rho_{i\alpha},k)\,q_{i\alpha}(k).$$

The sum in the above expression is over all partial atomic charges, $m_i$, of site $i$. $\phi(\rho_{i\alpha},k)$ is the electrostatic potential at location $k$ created by all charges within $\rho_{i\alpha}$. By definition, the desolvation energy is the energy of transfer of a titratable or pH-sensitive site from standard conditions (model compound in solution) to its place in the protein molecule. Since only electrostatic interactions are considered, the desolvation energy is the difference between the self energies of the site at its location in the protein molecule (P) and as a model compound in solution (S):

$$\Delta G^{S\to P}_{i\alpha,sol} = \frac{1}{2}\sum_{k}\left(\phi_P(\rho_{i\alpha},k) - \phi_S(\rho_{i\alpha},k)\right)q_{i\alpha}(k)$$

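The contribution of the desolvation energy to the transition $S_{i0} \to S_{i\alpha}$ in the protein can be obtained from the thermodynamic cycle shown in Fig. 1:

$$\Delta G_{i\alpha,sol} = \frac{1}{2}\sum_{k}\left[\left(\phi_P(\rho_{i\alpha},k) - \phi_S(\rho_{i\alpha},k)\right)q_{i\alpha}(k) - \left(\phi_P(\rho_{i0},k) - \phi_S(\rho_{i0},k)\right)q_{i0}(k)\right] \tag{11}$$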

The contribution of the permanent charges, $\Delta G_{i\alpha,pc}$, is calculated as:

$$\Delta G_{i\alpha,pc} = \sum_{k\in\{pc\}}\left(\phi(\rho_{i\alpha},k) - \phi(\rho_{i0},k)\right)q_{pc}(k), \tag{12}$$

where the summation is over all permanent charges of the protein, $\{pc\}$. It is convenient to introduce a microscopic intrinsic $pK^{\mu}_{i\alpha,int}$ of the transition $S_{i0} \to S_{i\alpha}$ analogous to that used in Eq. (5):

$$pK^{\mu}_{i\alpha,int} = pK^{\mu}_{i\alpha,mod} + (\Delta G_{i\alpha,sol} + \Delta G_{i\alpha,pc})/2.3RT \tag{13}$$
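Eqs. (11) to (13) can be sketched in a few lines once the potentials are available. The snippet below is only an illustration under stated assumptions: the potentials are taken as already computed (for example, by a continuum electrostatics solver) and passed in as arrays in kcal/(mol·e), charges are in units of e, the temperature is 298.15 K, and all function and argument names are hypothetical.

```python
import numpy as np

RT = 1.987e-3 * 298.15   # kcal/mol; 298.15 K is an assumed temperature
LN10 = np.log(10.0)      # the factor written as 2.3 in the text

def desolvation(phi_P_a, phi_S_a, q_a, phi_P_0, phi_S_0, q_0):
    """Eq. (11): desolvation term for state alpha relative to the reference state 0.
    phi_P_*, phi_S_*: self potentials at the site's atoms in protein (P) and solution (S)."""
    return 0.5 * (np.dot(phi_P_a - phi_S_a, q_a) - np.dot(phi_P_0 - phi_S_0, q_0))

def permanent_charges(phi_a_at_pc, phi_0_at_pc, q_pc):
    """Eq. (12): potentials of states alpha and 0 evaluated at the permanent charges."""
    return np.dot(phi_a_at_pc - phi_0_at_pc, q_pc)

def pK_intrinsic(pK_mod, dG_sol, dG_pc):
    """Eq. (13): microscopic intrinsic pK from the two pH-independent terms."""
    return pK_mod + (dG_sol + dG_pc) / (LN10 * RT)
```

A positive sum of the two terms raises the intrinsic pK, which for an acidic group corresponds to stabilization of its neutral form.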

The value of $pK^{\mu}_{i\alpha,int}$ at given conditions (temperature and ionic strength) depends only on the protein structure (i.e., on how the $i$-th group is situated in the protein) but not on the charge–charge interactions with the titratable sites. The transitions $\{S_{i0} \to S_{i\alpha}\}$ of a given group $i$ in the protein molecule occur under the influence of the electrostatic field of all other titratable and pH-sensitive sites. This influence is accounted for by $\Delta G_{i\alpha,tc}$ [the last term on the right-hand side of Eqs. (9) and (10)]. The electrostatic interaction between site $i$ in microstate $\alpha$ and site $j \neq i$ in microstate $\beta$ is given by

$$W_{i\alpha,j\beta} = \sum_{k}^{m_i} \phi(\rho_{j\beta},k)\,q_{i\alpha}(k) \tag{14}$$

where the sum is taken over all atoms, $m_i$, of site $i$ with partial charges $q_{i\alpha}(k)$. $\phi(\rho_{j\beta},k)$ is the electrostatic potential at the location of atom $k$ created by the charge distribution, $\rho_{j\beta}$, of site $j$. The microstate $S_\beta$ of group $j$ is pH-dependent, and its population depends on the other titratable and pH-sensitive sites in the very same way as the population of $S_\alpha$ of group $i$. Thus to calculate the electrostatic interactions between sites $i$ and $j$, one needs to know the populations of the microstates $S_\alpha$ and $S_\beta$ at a given pH. In fact, according to Eq. (8), determining the equilibrium populations of the microstates of the titratable and pH-sensitive sites includes the calculation of pK values. It is therefore more convenient to recast the calculation of pK values as the determination of the microstate populations as a function of pH.

1.3.3 Populations of Microstates of Titratable and pH-Sensitive Sites

The population of the microstates of the individual titratable and pH-sensitive sites is calculated in terms of statistical physics. The solution of the problem of multiple site titration in proteins was given by Bashford and Karplus (14). Later, it was extended to the more general case including redox sites and other properties (8). Here, a modified expression for the population of the individual microstates will be given, which is more convenient for the problem considered. The probability of a certain site $i$ being in microstate $S_\alpha$ is given by the Boltzmann weighted sum

$$p_{i\alpha} = \frac{\sum_{\{\mathbf{x}\}} \delta(x_i,\alpha)\,\exp(-\Delta G(\mathbf{x})/RT)}{\sum_{\{\mathbf{x}\}} \exp(-\Delta G(\mathbf{x})/RT)}. \tag{15}$$

The sums in Eq. (15) are taken over all possible states $\{\mathbf{x}\}$ that the protein molecule can adopt. One state of the protein molecule is described by the vector $\mathbf{x} = (x_1, \ldots, x_i, \ldots, x_M)$, which contains $M$ elements, one for each site (titratable or pH-sensitive). Each element $x_i$ indicates the microstate of the $i$-th site, i.e., $x_i = 0, 1, \ldots, \alpha, \ldots, n_i - 1$ if site $i$ has $n_i$ microstates. The function $\delta(x_i,\alpha)$ is defined so that $\delta(x_i,\alpha) = 1$ if $x_i = \alpha$ and $\delta(x_i,\alpha) = 0$ if $x_i \neq \alpha$. If one considers only titratable sites with two states each, the element $x_i$ takes the value 1 or 0 depending on whether site $i$ is in the protonated or deprotonated state. In this case, $\delta(x_i,\alpha) = x_i$ and Eq. (15) becomes identical to that introduced by Bashford and Karplus (14). The energy, $\Delta G(\mathbf{x})$, of the system in state $\mathbf{x}$ is given by:

$$\Delta G(\mathbf{x}) = 2.3RT\sum_{i}\left(pK^{\mu}_{x_i,int} - \Delta\nu_{x_i}\,\mathrm{pH}\right) + \frac{1}{2}\sum_{i}\sum_{j\neq i} W_{x_i,x_j} \tag{16}$$

where the indices $i$ and $j$ enumerate all titratable and pH-sensitive sites. In the above expression, $pK^{\mu}_{x_i,int}$ and $W_{x_i,x_j}$ are defined by Eqs. (13) and (14), respectively. After substituting $\Delta G(\mathbf{x})$ from Eq. (16) into Eq. (15), one obtains the final expression for the probability of site $i$ being in state $S_\alpha$ as a function of pH. It can be shown that if the system has only two states (the vector $\mathbf{x}$ has only one element, $x = 1$ or 0), this expression reduces to Eq. (7).

1.4 Continuum Dielectric Model

The factors regulating protonation/deprotonation equilibria in proteins are defined by Eqs. (11), (12), and (14). The solution of all of these equations requires calculation of the electrostatic potential, $\phi(\rho_{i\alpha},k)$, created by a set of atomic charges, $\rho_{i\alpha}$, at the position of atom $k$. To calculate $\phi(\rho_{i\alpha},k)$, one assumes that both the three-dimensional structure of the protein of interest and the values of the partial charges are known. There are different methods for the calculation of $\phi(\rho_{i\alpha},k)$. Among them, the continuum dielectric model is probably the most frequently used approach. It is attractive because of its simplicity and the few parameters needed to perform the calculations. In this model, the protein molecule and the surrounding solvent are treated as two homogeneous media characterized by macroscopic quantities such as permittivity and charge density. The protein is represented as a rigid body with a low dielectric constant ($\varepsilon_p$ = 2 to 20) and a fixed charge distribution, $\rho_p(\mathbf{r})$, which is immersed in a high dielectric medium ($\varepsilon_s \approx 80$, assuming aqueous solution). The linearized Poisson–Boltzmann equation is solved for this system:

$$\nabla\cdot\left(\varepsilon(\mathbf{r})\nabla\phi(\mathbf{r})\right) - \kappa^2\phi(\mathbf{r}) + 4\pi\rho_p(\mathbf{r}) = 0. \tag{17}$$
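For protein-shaped dielectric boundaries, Eq. (17) is usually relaxed iteratively on a cubic grid. The sketch below is a bare-bones Jacobi iteration of the seven-point finite-difference update $\phi_k = (\sum_i \varepsilon_i\phi_i + 4\pi q_k/h)/(\sum_i \varepsilon_i + \kappa^2 h^2)$. Several choices are assumptions made only for this illustration: the dielectric value between two nodes is taken as the arithmetic mean of the nodal values, the potential is set to zero on the box faces, units are left arbitrary, and the function names are hypothetical. Production codes add proper boundary conditions, focusing, and faster solvers.

```python
import numpy as np

def neighbour_slices(shape):
    """Index tuples selecting the six neighbours of every interior node."""
    out = []
    for axis in range(3):
        for shift in (-1, 1):
            s = [slice(1, -1)] * 3
            s[axis] = slice(1 + shift, shape[axis] - 1 + shift)
            out.append(tuple(s))
    return out

def solve_lpb(eps, charge, kappa2, h, n_iter=2000):
    """Jacobi relaxation of the finite-difference form of Eq. (17).
    eps, charge, kappa2: 3-D arrays with one value per grid node; h: grid spacing."""
    phi = np.zeros(eps.shape)
    core = (slice(1, -1),) * 3                       # interior nodes
    nbrs = neighbour_slices(eps.shape)
    # assumed: dielectric constant between two nodes = mean of the nodal values
    eps_face = [0.5 * (eps[core] + eps[s]) for s in nbrs]
    denom = sum(eps_face) + kappa2[core] * h**2
    source = 4.0 * np.pi * charge[core] / h
    for _ in range(n_iter):
        numer = source + sum(e * phi[s] for e, s in zip(eps_face, nbrs))
        phi[core] = numer / denom                    # boundary nodes stay at zero
    return phi

# Toy system: a low-dielectric cube with one unit charge, surrounded by solvent
n = 13
eps = np.full((n, n, n), 80.0)
eps[4:9, 4:9, 4:9] = 4.0
charge = np.zeros((n, n, n))
charge[6, 6, 6] = 1.0
phi = solve_lpb(eps, charge, np.zeros((n, n, n)), h=1.0)
```

Jacobi iteration is chosen here only because it is the shortest scheme to write down correctly; it converges slowly on large grids.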

The ionic strength of the solution enters Eq. (17) through the Debye parameter $\kappa$. A detailed derivation of the above equation can be found in Ref. 15. The nonlinear form of the Poisson–Boltzmann equation can also be used; however, it has been shown that for physiological ionic strength, the two forms of the equation give practically equal results (16). For an arbitrary, nonanalytical shape of the dielectric boundary (the protein–solvent interface), Eq. (17) is solved numerically. The most popular and widely used routine is the finite difference method, first proposed for the calculation of electrostatic interactions in proteins by Warwicker and Watson (17). The protein is placed in a box with a three-dimensional grid forming a cubic lattice. Values of dielectric constant ($\varepsilon_p$ or $\varepsilon_s$), charge, and ionic strength are assigned to each grid point (Fig. 1B). The finite difference formula for the calculation of the potential at position $k$ is

$$\phi_k = \frac{\sum_i \varepsilon_i\phi_i + 4\pi q_k/h}{\sum_i \varepsilon_i + \kappa^2 h^2}.$$

The sums in the above expression run over the six neighboring grid points $i$ (four in the planar representation in Fig. 1B), $h$ is the grid spacing, and $q_k$ is the charge in the volume belonging to grid point $k$. As can be seen, the potential $\phi_k$ depends on the potentials at the neighboring grid points, $\phi_i$, which are also unknown. Therefore the finite difference formula is solved iteratively. A comprehensive theoretical background of the computational procedure is given in a number of works (18,19). The basic scheme of the method has been further elaborated by the introduction of the focusing technique (18) and of a multigrid technique (20,21).

Alternative models can also be used for the calculation of electrostatic interactions in proteins. The microscopic model proposed by Warshel et al. (22,23) considers the protein molecule on an atomic level, which makes it the most rigorous method. All atomic partial charges and polarizabilities are taken explicitly into account. In this way, the introduction of a dielectric constant for the protein molecule is avoided. This approach has been extended by the introduction of Langevin dipoles to account for the reaction of the surrounding solvent molecules (24).

Recently, another approach has been successfully introduced, namely, the generalized Born model (25–27). Each atom in the protein molecule is represented as a sphere with a given radius and charge. The interior of the atom is considered as a uniform dielectric material. As in the continuum dielectric model, the protein molecule is surrounded by the high dielectric medium of the solvent. The electrostatic interactions are calculated as the work needed to create a given charge distribution. Onufriev et al. (27) modified this model by introducing an additional function, which depends on the atomic radii and distances. They demonstrated that, with the exception of some deeply buried titratable sites, this modification gives practically identical results in comparison with the dielectric continuum model, but the calculations are substantially faster.

1.5 Protein Dielectric Constant

The key parameter of the continuum dielectric model is the protein dielectric constant. While the dielectric constant of the solvent can be measured, its value in the vicinity of and inside the protein molecule can only be assumed. Lamm and Pack (28) have shown that the dielectric constant at the protein–solvent interface can be reduced to a value of about 30. The dielectric constant inside proteins is usually assumed to be homogeneous with a value between 2.5 and 4 (29). Values between 10 and 20 have also been proposed (30–32). This large difference in the evaluated protein dielectric constants illustrates that the problem of its determination is far from being solved. An inhomogeneous dielectric constant has been considered as a possible solution of the problem. For instance, a high dielectric constant can be attributed to regions in proteins containing polar side chains (33). Sharp et al. (34) have proposed a calculation of the local dielectric constant based on the Clausius–Mossotti equation. Other equations (Debye, Onsager, and Kirkwood) that relate microscopic properties, such as polarizability and dipole moment, to the macroscopic dielectric constant are also known. However, they all treat homogeneous matter, whereas a protein is inhomogeneous. An attempt to treat the protein molecule as an inhomogeneous dielectric medium has also been made (35).

1.6 Computational Strategies

A direct application of the statistical mechanical calculations [Eq. (15)] is limited because the CPU time grows exponentially with the number of titratable and pH-sensitive sites in the protein molecule. Current computational facilities allow the summations in Eq. (15) to be performed in a reasonable time for about 25 to 30 sites. The use of Eq. (15) therefore easily becomes unrealistically time-consuming even for small proteins. There are methods, however, that overcome this obstacle by approximating the rigorous treatment.
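The exponential cost is easy to see in a direct implementation. The following sketch evaluates Eq. (15) by brute-force enumeration for the special case of two-state sites. It rests on several assumptions made for illustration: $x_i = 1$ denotes the deprotonated state and $x_i = 0$ the protonated reference state, interaction energies are given in units of RT and counted only when both sites are deprotonated (as for acidic groups whose reference state is neutral), and the function name `populations` is hypothetical.

```python
import itertools
import math

LN10 = math.log(10.0)   # the factor written as 2.3 in the text

def populations(pH, pK_int, W):
    """Exact degrees of deprotonation for M two-state sites (special case of Eq. 15).
    pK_int: list of M intrinsic pK values.
    W: M x M symmetric matrix of pair interactions in units of RT (diagonal ignored)."""
    M = len(pK_int)
    Z = 0.0
    theta = [0.0] * M
    for x in itertools.product((0, 1), repeat=M):      # 2**M protonation states
        # Delta G(x)/RT following Eq. (16); the reference state contributes zero
        g = sum(LN10 * (pK_int[i] - pH) for i in range(M) if x[i])
        g += 0.5 * sum(W[i][j] for i in range(M) for j in range(M)
                       if i != j and x[i] and x[j])
        w = math.exp(-g)                               # Boltzmann weight of state x
        Z += w
        for i in range(M):
            if x[i]:
                theta[i] += w
    return [t / Z for t in theta]

# Two hypothetical acidic groups repelling each other by 4 RT when both are charged
print(populations(5.0, [4.0, 4.5], [[0.0, 4.0], [4.0, 0.0]]))
```

Each additional site doubles the number of terms in the loop, which is why the rigorous sum becomes impractical beyond a few tens of sites.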

A large part of the titratable groups in proteins do not participate in cooperative titration. The pK shift of such a group caused by charge–charge interactions can be estimated from the mean electrostatic potential created by the rest of the titratable groups. The protonation state of all groups is determined iteratively via $pK_{1/2}$ calculations (36). At each iteration step, the $pK_{1/2}$ of a given group is obtained from Eq. (7), where $\Delta G_{tc}$ is calculated as a function of the average charges (degrees of deprotonation) of the other groups determined from their $pK_{1/2}$ values. These $pK_{1/2}$ values are taken from the previous iteration; at the first step, the $pK_{mod}$ values are used. This approach (mean-field approximation) is very effective, but it is inappropriate for sites participating in cooperative interactions (37). For those sites, the iterations converge slowly or do not converge at all. A combination of the mean-field approximation and statistical mechanical calculations has been proposed to reduce the complexity of the task (38). Groups that have $pK_{int}$ far from the pH region of interest can be considered fixed in the appropriate protonation state and excluded from the statistical calculations (37). This stripping facilitates the calculations (especially at extreme pH values) but often does not reduce the number of sites enough for a direct application of Eq. (15).

Monte Carlo simulation is a powerful approach that can be used for pK calculations (39). The accuracy of Monte Carlo methods depends on the length of the simulation and the specifics of the system. If the protein contains pairs or clusters of strongly interacting groups with cooperative ionization, a very long simulation is needed to achieve reliable estimates of the protonation states of those groups. This can be avoided by introducing some modifications in the standard algorithm (39,40). Computations can be sped up substantially without loss of accuracy by applying clustering methods (8,41,42). In all these methods, the strongly interacting (or closely situated) sites are grouped in clusters. The degree of deprotonation of these groups is calculated either by Boltzmann statistics or by Monte Carlo simulation over the groups included in one cluster, while the influence of the rest of the groups is accounted for by the mean-field approximation. A computational strategy that combines rigorous application of Eq. (15), Monte Carlo calculations, and a clustering technique is described in detail in Ref. 11.

2 PROTONATION/DEPROTONATION EQUILIBRIA AND CONFORMATIONAL FLEXIBILITY

All considerations made above were based on a single protein structure. The sensitivity of the protonation/deprotonation equilibria of titratable groups to conformational changes is one of the major problems in the accurate prediction of ionization equilibria in proteins. A number of approaches have been proposed to account for conformational flexibility (12,41,43,44). Because of the complexity of the task, all methods are based on approximations aimed at reducing its prohibitively large computational demands. One possible simplification is to collect an ensemble of conformations that presumably represents the conformational variety of a protein molecule in solution. Antosiewicz et al. (45) used sets of NMR structures for this purpose. An overall agreement of the calculated pK values with the experimental data was achieved. Moreover, the pK values averaged over the NMR structures were more accurate than those calculated from a single crystal structure. On the other hand, Khare et al. (46) demonstrated that in the regions where NMR and x-ray structures differ significantly, the pK values calculated on the basis of the x-ray structures are in better agreement with the experimental data. For solvent-exposed residues, however, NMR structures provide better agreement with the experimental data. These results suggest that crystal contacts are one of the main sources of discrepancy between the calculated and observed pK values in general. A disadvantage of calculations based on NMR models is that the side chain conformations are usually not a result of experimental observations, and that the assumption of equal weight for the individual models is too strong. An original method for analyzing the interplay of conformational flexibility and pK calculations has recently been proposed by Georgescu et al. (47). In this method, continuum electrostatics and molecular mechanics force field calculations are combined in a Monte Carlo sampling procedure. Another technique for collecting protein conformations is molecular dynamics (MD) simulation (43,48).
A general result of combining MD and pK calculations is an overall improvement of the theoretically predicted pK values. However, discrepancies between experimental and calculated pK values most often remain for groups buried in the protein interior. One possible reason is the relatively short time of conformational sampling (32). This assumption has been partially confirmed by 1-ns MD simulations combined with pK calculations for the structures of xylanase (49,50). It was concluded that a 500-ps simulation time can be considered a lower limit when the goal is the prediction of the ionization behavior of proteins by means of trajectory averaging. This is illustrated in the upper panel of Fig. 2, where the time evolution of the pK value of Asp121 from Bacillus circulans xylanase is shown. It must be pointed out that the pK values calculated from the x-ray structure (3.9) and after the 1-ns MD simulation (3.6) are fairly close to each other and are both close to the experimental value. The excellent agreement with the experimental results in this case is in fact a lucky hit. However, it illustrates some typical relations between the factors contributing to the protonation/deprotonation equilibria in proteins. In Fig. 2 (lower panel), the change of pK due to desolvation, $\Delta pK_{sol}$, and due to the interactions with the protein permanent charges (in this case, peptide dipoles only), $\Delta pK_{pc}$, is plotted. After 500 ps, Asp121 undergoes a transition in which the contribution of the desolvation increases, stabilizing its neutral form (increasing pK). At the same time, the energy of the interactions with the peptide dipoles tends to compensate this effect by stabilizing the charged form (reducing pK). This compensatory effect is typical and reflects two features of proteins. The first arises from the chemical nature of proteins as polypeptides (51). The second results from the fact that buried titratable groups are usually surrounded by an appropriate polar environment.

Figure 2 Xylanase from B. circulans. Upper panel: snapshot pK values of Asp121 taken every 5 ps. The time evolution of the average pK is given as a continuous line. Time is measured after 50-ps relaxation. The dashed line corresponds to the experimental pK value of 3.6 (Ref. 75). Lower panel: the time evolution of the average $\Delta pK_{sol}$ due to desolvation (snapshot values are given with solid circles) and of $\Delta pK_{pc}$ due to electrostatic interactions with the peptide dipoles (snapshot values are given with open circles).

Another typical relationship between conformational flexibility and pK values is illustrated in Fig. 3. The pK of Lys52 (upper panel) from Bacillus agaradhaerens xylanase fluctuates between two average values, around 14 and around 10.5. The occupancy of the protein conformers providing these pK values is approximately equal within the time period of 1 ns, which results in a pK of about 12, an expected value for lysines. No experimental data are available to validate this result; however, an interesting, yet speculative, conclusion can be drawn. In spite of the extreme sensitivity of the pK values to conformational flexibility, the conformers can form a limited number of sets, each providing a single average pK value. In the lower panel of Fig. 3, pK snapshots of Asp21 from two molecules (A and B) forming the crystallographic asymmetric unit of B. agaradhaerens xylanase are shown. The molecules A and B should not differ in solution, so that one expects identical final pK values.
In the case of molecule A, the protonation/deprotonation equilibrium of Asp21 is relatively stable during the time of simulation, providing a final average pK value of 3.4. As seen in Fig. 3, the pK values of Asp21 calculated for the two molecules collapsed after 500 ps to average (over the second half of the simulation time) pK values of 4.2 and 3.8 for molecules A and B, respectively. The tendency of the difference between these two values to decrease suggests that in a longer MD simulation they may converge to a single value. It must be pointed out that the above examples are illustrative and highlight the importance of conformational flexibility for protonation/deprotonation equilibria in proteins. Other results could be shown in which the prediction of pK fails even for a 1-ns MD simulation. One reason for such failures might be that the MD simulation is performed for a fixed protonation state of the protein, for instance, with all titratable groups in their charged forms. Changes in the protonation state of the protein will change the explored area of the conformational space.

Figure 3 Xylanase from B. agaradhaerens. Upper panel: snapshot pK values of Lys52, molecule A from the crystallographic asymmetric unit. Lower panel: snapshot pK values of Asp21, molecule A (solid circles) and molecule B (open circles), from the crystallographic asymmetric unit, respectively. In all cases, snapshots are taken at each 5 ps.

3 IRREGULAR TITRATION IN PROTEINS

The relation between conformational flexibility and ionization equilibria of the individual titratable groups in proteins was considered above for cases of noncooperative ionization. In regions where the protein structure is rigid enough to prohibit essential conformational changes upon a change of the ionization state of the molecule, cooperative titration of the groups belonging to this region may occur. If the deprotonation (or protonation) of two or more sites is cooperative, $\Delta G_{tc}$ becomes a relevant factor determining the deprotonation function, $\theta(\mathrm{pH})$. As will be shown below, $\theta(\mathrm{pH})$ can differ substantially from the well-known sigmoidal character given by Eq. (1). The nonsigmoidal pH dependence of $\theta$ is named irregular titration to distinguish it from the familiar $\theta(\mathrm{pH})$ that follows Eq. (1).

3.1 Conditions for Irregular Titration

The desolvation of the titratable groups causes a significant shift of their pK values. The importance of this factor was first pointed out by Warshel and Russell (22). The effect of burial of the titratable sites is manifested by stabilization of their neutral form (increasing the pK values of acidic groups and decreasing those of basic groups). Due to the desolvation energy, a group completely buried in the protein interior may shift its pK value by up to 25 pH units (23). A well-known example of such an "unusual" pK shift is that of the hen egg white lysozyme active site Glu35, which has a pK value of 6.2. As already mentioned, a buried titratable group is usually surrounded by a polar environment, so that the desolvation penalty tends to be compensated. This compensation is not necessarily complete. An example of incomplete compensation is given in Fig. 2. The interplay between the desolvation penalty and the charge–charge interactions between titratable groups is of particular interest. First, in salt bridges, these two factors have opposite effects on the protonation/deprotonation equilibria. Second, in contrast to the influence of the protein permanent charges, charge–charge interactions are pH-dependent. Thus clusters of titratable groups can be involved in strong pH-dependent charge–charge interactions, which can result in cooperative ionization behavior. Cooperative and nonsigmoidal titration was first obtained theoretically by Bashford and Gerwert (52). Another example of irregular (nonsigmoidal) titration is the ionization behavior of the lysine cluster in the constriction zone of some porins (53) and in bacteriorhodopsin (54). In addition, Alexov and Gunner (12) have shown theoretically that tautomerization leads to irregular titration. It must be noted that these results should not be considered as "theoretical exercises" without connection to experimental observations (52).


A theoretical description of this phenomenon has been given by Yang et al. (41) for the case of two acidic groups and has been extended to arbitrary pairs of groups by Koumanov et al. (11). Fig. 4 illustrates the ionization curves of two interacting groups [an acidic–acidic (aa) or a basic–basic (bb) pair]. If the groups do not interact (electrostatic interaction energies Waa = 0 and Wbb = 0), their ionization follows the Henderson–Hasselbalch equation [Eq. (1) or Eq. (7)] with pK values equal to pKint. For nonzero electrostatic interactions between the groups, Henderson–Hasselbalch titration is violated, which is manifested by the formation of a plateau due to the buffering effect of the synchronous ionization of the pair. When the groups in the pair have equal pKint values, their ionization curves coincide. With increasing Waa (Wbb), the length of the plateau increases, but its midpoint remains at the pH where the groups are half-protonated. Hence each group remains half-deprotonated over a certain pH region. The separation of the inflection points of the two sigmoidal segments (pK′ and pK″) is proportional to the energy of interaction between the groups. If the pKint values of the interacting groups differ, the plateau is shifted to a higher degree of deprotonation for the group with the lower pKint and vice versa for the other group. In such cases, the titration curves of the partners are not identical (Fig. 4, dashed lines).

Figure 4 Titration of two interacting sites (acidic–acidic or basic–basic pair). Continuous line: the pKint values of the groups are equal; in this case the titration curves coincide, so that only one line is drawn. Dashed lines: the interacting groups have different pKint values. pK′ and pK″ indicate the pH of the inflection points of the sigmoidal segments.

Consider a pair of an acidic and a basic group. The specific environment of the pair may shift the protonation/deprotonation equilibria of the groups so that the pKint of the acidic group is higher than that of the basic group. This can occur if the acidic group is deeply buried in the protein interior. In such a case, irregular titration is also observed. The titration curves of the groups have a two-step sigmoidal form similar to that shown in Fig. 4. Unlike aa and bb pairs, for an acid–base pair both the separation between the inflection points and the level of the plateau depend on ΔpKint as well as on W. The conditions for irregular titration are the following. First, pKint of the acidic group must be higher than pKint of the basic group. Second, the absolute value of the difference between ΔpKint and the pK shift due to the charge–charge interactions within the pair should not exceed 1.3 pH units. Third, the magnitude of the charge–charge interactions should correspond to a pK shift larger than 1.3 pH units. A detailed derivation of these conditions is given in Ref. 11. The conditions for irregular titration are mild. The energy involved in this effect is about 2 kcal/mol, which is less than the charge–charge interaction energy in a salt bridge. The condition related to the rigidity of the protein structure is stronger. It requires that the acidic group remain buried in the protein interior during its deprotonation (ionization). If the protein environment allows conformational changes that diminish the desolvation penalty of the acidic group upon ionization, the conditions for irregular titration can be broken. Two-step sigmoidal dependencies are often observed when the ionization properties of proteins are investigated, for instance, through the pH dependence of the NMR chemical shifts of individual titratable groups (55).
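The plateau described for an acidic–acidic pair can be reproduced with a minimal four-state statistical model, sketched below in Python. This is an illustration under stated assumptions, not the formalism of Refs. 11 or 41: the interaction energy is expressed directly in pH units (the hypothetical parameter w_pk), it is counted only when both groups are deprotonated and therefore charged, and the function name is invented for this example.

```python
def two_acid_deprotonation(ph, pk_int_1, pk_int_2, w_pk):
    """Degrees of deprotonation of two interacting acidic groups.

    The four microstates (none, first, second, both deprotonated) are
    weighted by powers of ten; the repulsion w_pk, given in pH units,
    penalizes only the doubly deprotonated state.
    """
    a1 = 10.0 ** (ph - pk_int_1)          # weight: only group 1 deprotonated
    a2 = 10.0 ** (ph - pk_int_2)          # weight: only group 2 deprotonated
    both = a1 * a2 * 10.0 ** (-w_pk)      # weight: both groups deprotonated
    z = 1.0 + a1 + a2 + both              # partition sum over the four states
    return (a1 + both) / z, (a2 + both) / z


if __name__ == "__main__":
    # Equal pKint = 4 and an interaction worth 3 pH units: a plateau at 0.5.
    for step in range(4, 21):
        ph = 0.5 * step
        h1, h2 = two_acid_deprotonation(ph, 4.0, 4.0, 3.0)
        print(f"{ph:5.1f}  {h1:.3f}  {h2:.3f}")
```

In this sketch, setting w_pk to zero recovers two independent Henderson–Hasselbalch curves. With equal pKint values both groups sit at a degree of deprotonation of one half in the middle of the plateau, and with unequal pKint values the group with the lower pKint lies above one half there while the sum of the two degrees stays at one proton released.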
Consider again the system of two interacting acidic (or basic) groups with identical pKint (Fig. 4). At the pH corresponding to the midpoint of the plateau, the total charge of the couple indicates the departure of one proton from the system. Fitting to an appropriate sum of Henderson–Hasselbalch equations would give the same result at the midpoint; however, two pK values would appear (pK′ and pK″). According to this fit, the groups should be half-protonated at pH values equal to pK′ and pK″, respectively. However, neither group is half-protonated at these pH values. Moreover, half-protonation occurs not at a single pH value but over a pH range whose width depends on the electrostatic interactions between the groups.

3.2 Irregular Titration in Enzyme Active Site: An Example

Fitting to the Henderson–Hasselbalch equation is widely used to identify the groups responsible for the pH dependence of enzymatic activity or of substrate or inhibitor binding. As just illustrated, this approach is misleading if irregular titration occurs. An interesting example that challenges the traditional interpretation of experimental results based on fitting to the Henderson–Hasselbalch equation, and that shows the effects of irregular titration, is substrate binding and proton abstraction from the alcohol substrate of Drosophila lebanonensis alcohol dehydrogenase. Studies of the pH dependence of the different steps of the enzymatic reaction have shown that a group in the active site undergoes deprotonation with a pK of 6.8 to 7.5, depending on temperature (56,57). As seen in Fig. 5, the experimental observations fit the Henderson–Hasselbalch equation excellently, revealing deprotonation of a single group with a pK of 7.3. Three groups in the active site of Drosophila alcohol dehydrogenase are suspected of having such a pK value: Tyr151, Lys155, and Ser138. The hydroxyl groups of Tyr151 and Ser138 interact via hydrogen bonds with the hydroxyl group of the Cα carbon of the substrate. Lys155 interacts with the O2′ hydroxyl group of the NAD+ ribose and is in the vicinity of Tyr151. After a profound analysis of a large number of kinetic data, it was concluded that the most likely candidate for the group with a pK of 7.3 is Ser138 (57).

Figure 5 Alcohol dehydrogenase from D. lebanonensis. pH dependence of the function 1/f2 (open circles) scanned from the publication of Winberg et al. (57). The continuous curve is obtained by fitting Winberg's data to Eq. (1) using the Origin program (Copyright 1997, Microcal Software Inc.). The pK value obtained from the fit is 7.3 (with a Hill coefficient of 1.0).

Electrostatic calculations reveal a completely different situation (58). Two residues in the active site, Tyr151 and Lys155, show irregular titration (Fig. 6). Lys155 is inaccessible to the solvent, and because of the large desolvation penalty, ΔpKsol for this residue is about 9. The effect of the polar environment, ΔGpc, is opposite, so that, in terms of Eq. (13), the pKint value of Lys155 is 6.4 on average. Tyr151 is also inaccessible to the solvent; however, its polar environment completely compensates the desolvation penalty, and the pKint value of this residue remains at about 10. The couple Tyr151–Lys155 thus satisfies the conditions for irregular titration. The strong electrostatic influence of the positive charge of Lys155 tends to stabilize the charged form of Tyr151: at acidic pH, the degree of deprotonation of Tyr151 is between 0.2 and 0.3. At high pH, the influence of Lys155 diminishes because its degree of deprotonation increases. This stabilizes the neutral (protonated) form of Tyr151 and, as a result, a reduction of h is observed at pH > 9 (Fig. 6). Such a "reversal" of h(pH) has also been obtained in other theoretical investigations (53,54).

Figure 6 Alcohol dehydrogenase from D. lebanonensis. Degree of deprotonation of Lys155 (solid line) and Tyr151 (dashed line). The curves are averages of the results for the two subunits of this enzyme.

The irregular titration of these two residues suggests a different understanding of the pH dependence of the catalytic reaction of Drosophila alcohol dehydrogenase. In Fig. 7, the net deprotonation of the active site of the enzyme is presented together with the experimental data of Winberg et al. (57). As can be seen, the theoretical curve follows the experimental points relatively well. The question arises: which interpretation of the experimental data is more reliable? The one illustrated in Fig. 5 is deduced from a fit to the Henderson–Hasselbalch equation and suggests ionization of a single group with a change of the charge from 1 to 0. However, it ignores any possible coupling of the deprotonation of this putative group with other ionization processes in the protein molecule. The electrostatic calculations (Fig. 7) suggest cooperative ionization of two groups and a change of the net charge from +1 to 0.

Figure 7 Alcohol dehydrogenase from D. lebanonensis. Total degree of deprotonation of the active site groups Tyr151 and Lys155. The experimental data (open circles) scanned from the work of Winberg et al. (Ref. 57) are superimposed for comparison.

The theoretical calculations also suggest a molecular mechanism for proton abstraction from the alcohol substrate through a proton relay chain (58). Without going into the details of this mechanism, it is worth noting that it involves the pH-dependent rotamer distribution of a pH-sensitive site (the O2′ ribose hydroxyl of NAD+). The population of the rotamers of the O2′ ribose hydroxyl as a function of pH is shown in Fig. 8. At neutral pH, all rotamers are approximately equally populated. With increasing pH, the populations of the rotamers that interact with Tyr151 and Lys155 increase and become practically equal, ensuring proton transfer from the alcohol substrate via Tyr151 to Lys155. At low pH, the proton relay chain breaks because the O2′ ribose hydroxyl adopts an orientation in which its hydrogen interacts with neither Tyr151 nor Lys155.

Figure 8 Alcohol dehydrogenase from D. lebanonensis. Rotamer population of the NAD+ ribose O2′ hydroxyl group as a function of pH: the hydroxyl group donates its hydrogen to neither Tyr151 nor Lys155 (line with dots), is a proton donor to Lys155 (solid line), or is a proton donor to Tyr151 (dashed line). The horizontal line indicates a population of 1/3.

The example of Drosophila alcohol dehydrogenase can be regarded as a caution against formal fitting of experimental results to the Henderson–Hasselbalch equation. Another caution should also be made. The electrostatic calculations have been performed for two structures only, i.e., conformational flexibility has practically been ignored. There are, however, experimental data (59) showing the invariance of the structure of the active site upon substrate binding. This can be considered as evidence that conformational flexibility is not relevant in this case. In cases for which such evidence is not available, irregular titration obtained on the basis of a single protein structure can be as misleading as the formal fitting of experimental data to the Henderson–Hasselbalch equation.

4 ELECTROSTATIC INTERACTIONS IN DENATURED PROTEINS

Electrostatic interactions in denatured proteins are often considered irrelevant for functional properties of proteins, such as enzyme catalysis. Indeed, substrate binding and the catalytic reaction take place when the enzyme is in its native state. On the other hand, the structural stability of native proteins is determined by noncovalent interactions, including electrostatic interactions, in both the native and the denatured state. In the case of pH-induced denaturation, electrostatic interactions play a prime role. The work of Oliveberg et al. (60–62) has provided an experimental evaluation of the importance of electrostatic interactions in the denatured state, showing, for instance, that the pK values of the acidic groups in unfolded barnase are on average 0.4 pH units lower than those of model compounds. Other researchers have also noted that electrostatic interactions in the denatured state influence protein stability (63).

4.1 Models

The easiest and most often used way of handling electrostatic interactions in denatured proteins is to ignore them. This null approximation can be justified only if electrostatic interactions are screened, for instance, by a denaturing agent such as GdmCl. Otherwise, the assumption of zero electrostatic interactions is inapplicable for the prediction of quantities such as the electrostatic term of the unfolding energy (63). Schaefer et al. (31) have used an extended backbone and side chain conformation for the calculation of electrostatic interactions in the denatured state. In this model, the titratable groups are characterized by maximum solvent accessibility, reflecting the fact that they are fully hydrated in the denatured state. On the other hand, the charge–charge distances are maximized in the extended conformation, which may lead to an underestimation of electrostatic interactions. A similar model has been proposed by Warwicker (64) and successfully applied to the calculation of the pH-induced denaturation of a synthetic leucine zipper (65). A hybrid approach has also been proposed to analyze the pH and ionic strength effects on the stability of sperm whale apomyoglobin (66). A model based on molecular mechanics and electrostatic calculations has been proposed by Elcock (67). The key point of that model is the artificial "swelling" of the protein molecule by an increase of the atom–atom distances corresponding to the minimum of the van der Waals interactions. Recently, a more general model of the denatured state has been proposed by Zhou (68–70). In this model, the denatured protein molecule is treated as a Gaussian chain immersed in a dielectric medium, and electrostatic interactions are calculated on the basis of the Debye–Hückel theory. An approach based on the continuum dielectric model and conceptually very close to that of Zhou (68,69) has been proposed independently (71).
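To give a feeling for the Debye–Hückel screening mentioned above, the following Python sketch evaluates the textbook screened Coulomb energy between two point charges in a uniform dielectric. It is only the pairwise building block and not the Gaussian-chain model of Refs. 68–70 or the model of Ref. 71: the uniform relative permittivity of 78.5, the temperature of 298.15 K, and all function names are assumptions chosen for illustration.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
N_A = 6.02214076e23          # Avogadro constant, 1/mol
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
K_B = 1.380649e-23           # Boltzmann constant, J/K
J_PER_KCAL = 4184.0


def kappa_per_angstrom(ionic_strength, eps_r=78.5, temperature=298.15):
    """Inverse Debye length (1/Angstrom) for an ionic strength in mol/L."""
    density = 1000.0 * ionic_strength * N_A   # mol/L -> particles per m^3
    kappa_sq = 2.0 * density * E_CHARGE ** 2 / (EPS0 * eps_r * K_B * temperature)
    return math.sqrt(kappa_sq) * 1e-10


def debye_length_angstrom(ionic_strength, eps_r=78.5, temperature=298.15):
    """Debye screening length in Angstrom (infinite at zero ionic strength)."""
    kappa = kappa_per_angstrom(ionic_strength, eps_r, temperature)
    return math.inf if kappa == 0.0 else 1.0 / kappa


def screened_coulomb_kcal(q1, q2, r_angstrom, ionic_strength,
                          eps_r=78.5, temperature=298.15):
    """Debye-Hueckel pair energy in kcal/mol for charges in units of e."""
    r_m = r_angstrom * 1e-10
    coulomb = q1 * q2 * E_CHARGE ** 2 * N_A / (4.0 * math.pi * EPS0 * eps_r * r_m)
    kappa = kappa_per_angstrom(ionic_strength, eps_r, temperature)
    return coulomb / J_PER_KCAL * math.exp(-kappa * r_angstrom)


if __name__ == "__main__":
    print(round(debye_length_angstrom(0.1), 2))
    print(round(screened_coulomb_kcal(1, -1, 10.0, 0.1), 3))
```

Under these assumptions the screening length at an ionic strength of 0.1 mol/L is a little under 10 Å, so residual charge–charge interactions between sites a few residues apart are weakened but not removed.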
In this approach, the unfolded protein molecule is represented as a material with low dielectric constant, εp between 30 and 40, immersed in the high-permittivity medium of the solvent, εs > εp. The shape of the dielectric cavity can be considered as an average over all possible conformations of a flexible chain, which results in a sphere inside which most of the protein atoms reside. The radius of this sphere can be taken as the radius of gyration (71) or the Stokes radius (70) of the unfolded protein. It is known that, because of differences in desolvation energies, charges tend to be expelled from a medium with low ε (the dielectric cavity) towards a medium with a higher dielectric constant (the solvent) (23). Since the charges of titratable groups belong to the protein moiety, and given the flexibility of the polypeptide chain, it is reasonable to assume that the titratable sites of a denatured protein in equilibrium are located on the surface of the molecule, i.e., on the surface of the dielectric cavity. The variety of conformers that an unfolded protein can adopt is reflected by different configurations of titratable sites on the surface of the dielectric cavity. As a first approximation, one can assume random distributions of the titratable sites. An additional, but very important, constraint can be introduced. Because titratable groups have fixed positions in the protein sequence, the distances between them cannot be arbitrary. For instance, the distance between two sites that are far apart along the polypeptide chain can be larger than that between two sites adjacent in the sequence. An algorithm for the generation of quasirandom distributions that takes the influence of the protein sequence into account is described in detail in Ref. 71. The strategy for pK calculations does not differ from that described for the calculation of protonation/deprotonation equilibria in native proteins. As long as the dielectric cavity is a sphere, the Poisson–Boltzmann equation [Eq. (17)] can be solved analytically (72). A variant of the analytical solution of Eq. (17) adapted for proteins has been given by Tanford and Kirkwood (7).

4.2 Protonation/Deprotonation Equilibria and ΔG(pH) of Denaturation

The stability of proteins under given conditions is determined by the difference in Gibbs free energy, ΔGu, between their folded and unfolded states. Following the concept that pH-induced changes of protein properties are predominantly due to changes of electrostatic interactions, one can derive an expression for the electrostatic term of the free energy:

$$\Delta G_u(\mathrm{pH}) = 2.3RT \int_{\mathrm{pH}_0}^{\mathrm{pH}} \left[ Q_u(\mathrm{pH}') - Q_n(\mathrm{pH}') \right] d\mathrm{pH}' + \Delta G_0 \qquad (18)$$

where Qn(pH) and Qu(pH) are the protein net charges in the native and unfolded states, respectively, and ΔG0 is the free energy at pH0. The net charge is the sum of the charges of the individual titratable groups at a given pH, which can be obtained from the degree of deprotonation: qi = −h for acidic and qi = 1 − h for basic groups. Thus, to calculate ΔGu(pH), one needs to know the protonation/deprotonation equilibria in both the native and the denatured state.

4.2.1 Protonation/Deprotonation Equilibria in Denatured Proteins

Cooperative ionization of titratable groups in the denatured state is not expected because of the flexibility of the structure. Therefore it is more convenient to consider pK values rather than h(pH). The calculated and experimental pK values of some individual groups of barnase are compared in Table 1. The agreement between theory and experiment is fairly good. As noted by Oliveberg et al. (61), the pK values in the denatured state are shifted from the standard values (pKmod), indicating nonzero electrostatic interactions. This comparison illustrates that the null approximation is not valid. An interesting result is that there are small but detectable differences between the calculated pK values for a given type of group (see Table 1). These differences reflect the influence of the protein sequence on electrostatic interactions (71). This does not seem to be an artifact of the calculations, since very similar deviations of the pK values in the denatured state of another protein have been observed experimentally (73).

Table 1 Barnase

Group      Calculated   Experiment   pKmod
Asp8       3.31
Asp12      3.20
Asp22      3.17
Asp44      3.33
Asp54      3.28
Asp75      3.09
Asp86      3.27
Asp93      3.15
Asp101     3.28
Average    3.23         3.50         4.0
Glu29      3.78
Glu60      3.64
Glu73      3.62
Average    3.68         3.70         4.4
His18      6.42         6.59         6.3
His102     6.35                      6.3

Comparison of the calculated pK values (Ref. 71) with the experimental data (Ref. 61). The experimental value for His18 is taken from Ref. 74.

Figure 9 Free energy of denaturation as a function of pH for barnase (upper panel) and for NTL9 (lower panel). Open circles: experimental data scanned from Refs. 61 and 76 for barnase and NTL9, respectively. Continuous line: ΔGu(pH) calculated with the model of Kundrotas and Karshikoff (Refs. 71,77). Dashed line: ΔGu(pH) calculated with the null model. The constants ΔG0 [Eq. (18)] are chosen so that ΔG(pH 2.1) = 0 for barnase and ΔG(pH 0) = 2 kcal/mol for NTL9.

4.2.2 Energy of Denaturation as a Function of pH

The null approximation means that the net charge in the denatured state is calculated by Eq. (1) using standard pK values. The experimental observations (61,73) show that the ionization equilibria of the titratable groups in the denatured state differ from those given by the standard values. This gives a difference in Qu(pH) and hence in ΔGu(pH). The difference in the prediction of ΔGu(pH), when calculated with the null approximation and with the model accounting for electrostatic interactions, is illustrated in Fig. 9. In the upper panel of Fig. 9, ΔGu(pH) for barnase is compared with the experimental data. The change of ΔGu in the interval from pH 1 to pH 6, both experimentally measured and theoretically predicted, is about 13 kcal/mol. Calculations based on the null approximation gave 23 kcal/mol, which is almost twice the experimental value. A similar overestimation of the free energy change is obtained for the N-terminal domain of the ribosomal protein L9 from Bacillus stearothermophilus (NTL9) when the null approximation is used: 4.4 kcal/mol vs. an experimental value of 2.4 kcal/mol (Fig. 9, lower panel). The theoretical calculations somewhat underestimate ΔGu in the neutral pH region. On the other hand, the experimentally observed reduction of the protein stability at pH > 8 is successfully predicted. In this pH region, the null approximation fails to predict either the value or the pH dependence of ΔGu. The above examples illustrate the importance of electrostatic interactions in the denatured state. Neglecting their role may lead to incorrect predictions of protein stability. This is especially important for protein engineering studies aimed at the design of proteins with enhanced stability.
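How Eq. (18) turns a difference in net charge into a free energy can be illustrated with a small numerical sketch in Python. It assumes that every group titrates independently according to a Henderson–Hasselbalch curve in both states, uses the charge convention given with Eq. (18) (−h for acids, 1 − h for bases), and integrates with the trapezoidal rule. The function names and the pK values in the usage lines are hypothetical and are not the barnase or NTL9 data; the sketch is not the model of Refs. 71 and 77.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def net_charge(ph, groups):
    """Net charge of independent groups given as (pK, 'acid' or 'base')."""
    charge = 0.0
    for pk, kind in groups:
        h = 1.0 / (1.0 + 10.0 ** (pk - ph))        # degree of deprotonation
        charge += -h if kind == "acid" else 1.0 - h
    return charge


def delta_g_unfolding(ph, ph0, native, unfolded, dg0=0.0,
                      temperature=298.15, steps=2000):
    """Eq. (18): ln(10) RT times the integral of Qu - Qn from pH0 to pH, plus dg0."""
    rt_ln10 = math.log(10.0) * R_KCAL * temperature
    width = (ph - ph0) / steps
    total = 0.0
    for i in range(steps + 1):
        x = ph0 + i * width
        weight = 0.5 if i in (0, steps) else 1.0   # trapezoidal end points
        total += weight * (net_charge(x, unfolded) - net_charge(x, native))
    return dg0 + rt_ln10 * total * width


if __name__ == "__main__":
    native = [(3.0, "acid")]     # hypothetical carboxyl, pK lowered on folding
    unfolded = [(4.0, "acid")]   # same group with a model-compound-like pK
    for ph in (1.0, 3.0, 5.0, 7.0):
        print(ph, round(delta_g_unfolding(ph, 0.0, native, unfolded), 3))
```

In this toy case a single carboxyl group whose pK is one unit lower in the native than in the unfolded state contributes, over a wide pH interval, one pH unit times 2.303RT, that is about 1.4 kcal/mol, to the stability change. Replacing the unfolded-state pK values by standard ones is what the null approximation amounts to.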

REFERENCES

1. J-K Hwang, A Warshel. Why ion reversal by protein engineering is unlikely to succeed. Nature 334:270–272, 1988.
2. AC Tissot, S Vuilleumier, AR Fersht. Importance of two buried bridges in the stability and folding pathway of barnase. Biochemistry 35:6786–6794, 1996.
3. S Pao-pin, U Sauer, H Nicholson, BW Matthews. Contributions of engineering surface salt bridges to the stability of T4 lysozyme determined by direct mutagenesis. Biochemistry 30:7142–7153, 1991.
4. F Niehaus, C Bertoldo, M Kahler, G Antranikian. Extremophiles as a source of novel enzymes for industrial application. Appl Microbiol Biotechnol 51:711–729, 1999.
5. A Karshikoff, R Ladenstein. Ion pairs and the thermotolerance of proteins from hyperthermophiles: a "traffic rule" for hot roads. Trends Biochem Sci 26:550–556, 2001.
6. JHG Lebbink, V Consalvi, R Chiaraluce, KD Berndt, R Ladenstein. Structural and thermodynamic studies on a salt bridge triad in the NADP-binding domain of glutamate dehydrogenase from Thermotoga maritima: cooperativity and electrostatic contribution to stability. Biochemistry 41:15524–15535, 2002.
7. C Tanford, JG Kirkwood. Theory of titration curves. I. General equations for impenetrable spheres. J Am Chem Soc 79:5333–5339, 1957.
8. VZ Spassov, D Bashford. Multiple-site ligand binding to flexible macromolecules: separation of global and local conformational change and an iterative mobile clustering approach. J Comput Chem 20:1091–1111, 1999.
9. GM Ullmann, EW Knapp. Electrostatic models for computing protonation and redox equilibria in proteins [review]. Eur Biophys J 28:533–551, 1999.
10. MR Gunner, E Alexov. A pragmatic approach to structure based calculation of coupled proton and electron transfer in proteins. Biochim Biophys Acta 1458:63–87, 2000.
11. A Koumanov, H Rüterjans, A Karshikoff. Continuum electrostatic analysis of irregular ionization and proton allocation in proteins. Proteins: Str Func Gen 46:85–96, 2002.
12. E Alexov, MR Gunner. Incorporating protein conformational flexibility into the calculation of pH-dependent protein properties. Biophys J 72:2075–2093, 1997.
13. EG Alexov, MR Gunner. Calculated protein and proton motion coupled to electron transfer: electron transfer from QA–QB to QB in bacterial photosynthetic reaction centers. Biochemistry 38:8253–8270, 1999.
14. D Bashford, M Karplus. pKa's of ionizable groups in proteins: atomic detail from a continuum electrostatic model. Biochemistry 29:10219–10225, 1990.
15. C Tanford. Physical chemistry of macromolecules. NY, London, Sydney: John Wiley & Sons, 1961.
16. H-X Zhou. Macromolecular electrostatic energy within the nonlinear Poisson–Boltzmann equation. J Chem Phys 100:3152–3162, 1994.
17. J Warwicker, NC Watson. Calculation of the electric field potential in the active site cleft due to alpha-helix dipoles. J Mol Biol 157:671–679, 1982.
18. I Klapper, R Hagstrom, R Fine, K Sharp, B Honig. Focusing of electric fields in the active site of Cu–Zn superoxide dismutase: effects of ionic strength and amino-acid modification. Proteins: Str Func Gen 1:47–59, 1986.
19. A Nicholls, B Honig. A rapid finite difference algorithm, utilizing successive over-relaxation to solve the Poisson–Boltzmann equation. J Comput Chem 12:435–445, 1991.
20. H Oberoi, NM Allewell. Multigrid solution of the nonlinear Poisson–Boltzmann equation and calculation of titration curves. Biophys J 65:48–55, 1993.
21. M Holst, RE Kozack, F Saied, S Subramaniam. Treatment of electrostatic effects in proteins: multigrid-based Newton iterative method for solution of the full nonlinear Poisson–Boltzmann equation. Proteins 18:231–245, 1994.


22. A Warshel, ST Russell. Calculations of electrostatic interactions in biological systems and in solutions. Q Rev Biophys 17:283–422, 1984.
23. A Warshel, ST Russell, AK Churg. Macroscopic models for studies of electrostatic interactions in proteins: limitations and applicability. Proc Natl Acad Sci USA 81:4785–4789, 1984.
24. J Florián, A Warshel. Langevin dipoles model for ab initio calculations of chemical processes in solution: parametrization and application to hydration free energy of neutral and ionic solutes and conformational analysis in aqueous solution. J Phys Chem B 101:5583–5595, 1997.
25. B Jayaram, Y Liu, DL Beveridge. A modification of the generalized Born theory for improved estimates of solvation energies and pK shifts. J Chem Phys 109:1465–1471, 1998.
26. J Srinivasan, MW Trevathan, P Beroza, DA Case. Application of a pairwise generalized Born model to proteins and nucleic acids: inclusion of salt effects. Theor Chem Acc 101:426–434, 1999.
27. A Onufriev, D Bashford, DA Case. Modification of the generalized Born model suitable for macromolecules. J Phys Chem B 104:3712–3720, 2000.
28. G Lamm, GR Pack. Calculation of dielectric constant near polyelectrolytes in solution. J Phys Chem B 101:959–965, 1997.
29. M Gilson, B Honig. The dielectric constant of a folded protein. Biopolymers 25:2097–2191, 1986.
30. J Antosiewicz, JA McCammon, MK Gilson. Prediction of pH-dependent properties of proteins. J Mol Biol 238:415–436, 1994.
31. M Schaefer, M Sommer, M Karplus. pH-dependence of protein stability: absolute electrostatic free energy difference between conformations. J Phys Chem B 101:1663–1683, 1997.
32. HWT van Vlijmen, M Schaefer, M Karplus. Improving the accuracy of protein pKa calculations: conformational averaging versus the average structure. Proteins 33:145–158, 1998.
33. G King, FS Lee, A Warshel. Microscopic simulations of macroscopic dielectric constant of solvated proteins. J Chem Phys 95:4366–4377, 1991.
34. K Sharp, A Jean-Charles, B Honig. A local dielectric constant for model solvation free energies which accounts for solute polarizability. J Phys Chem 96:3822–3828, 1992.
35. D Voges, A Karshikoff. A model for a local static dielectric constant in macromolecules. J Chem Phys 108:2219–2227, 1998.
36. C Tanford, R Roxby. Interpretation of protein titration curves. Application to lysozyme. Biochemistry 11:2192–2198, 1972.
37. D Bashford, M Karplus. Multiple-site titration curves of proteins: an analysis of exact and approximate methods for their calculation. J Phys Chem 95:9556–9561, 1991.
38. A Karshikoff. A simple algorithm for calculation of multiple site titration curves. Protein Eng 8:243–248, 1995.
39. P Beroza, MY Fredkin, MY Okamura, G Feher. Protonation of interacting residues in a protein by a Monte Carlo method: application to lysozyme and the photosynthetic reaction center of Rhodobacter sphaeroides. Proc Natl Acad Sci USA 88:5804–5808, 1991.
40. M Miteva, PA Demirev, AD Karshikoff. Multiply-protonated protein ions in the gas phase: calculation of the electrostatic interactions between charged sites. J Phys Chem B 101:9645–9650, 1997.
41. A-S Yang, MR Gunner, R Sampogna, K Sharp, B Honig. On the calculation of pKas in proteins. Proteins 15:252–265, 1993.
42. M Gilson. Multiple-site titration and molecular modelling: two rapid methods for computing energies and forces for ionizable groups in proteins. Proteins: Str Func Gen 15:266–282, 1993.
43. A-S Yang, B Honig. On the pH dependence of protein stability. J Mol Biol 231:459–474, 1993.
44. TJ You, D Bashford. Conformation and hydrogen ion titration of proteins: a continuum electrostatic model with conformational flexibility. Biophys J 69:1721–1733, 1995.
45. J Antosiewicz, JA McCammon, MK Gilson. The determination of pKas in proteins. Biochemistry 35:7819–7833, 1996.
46. D Khare, P Alexander, J Antosiewicz, P Bryan, M Gilson, J Orban. pKa measurements from nuclear magnetic resonance for B1 and B2 immunoglobin G-binding domain of protein G: comparison with calculated values for nuclear magnetic resonance and x-ray structures. Biochemistry 36:3580–3589, 1997.
47. RE Georgescu, E Alexov, MR Gunner. Combining conformational flexibility and continuum electrostatics for calculating pKas in proteins. Biophys J 83:1731–1748, 2002.
48. E Alexov. Role of the protein side-chain fluctuations on the strength of pairwise electrostatic interactions: comparing experimental with computed pKas. Proteins 50:94–103, 2003.
49. AA Gorfe, P Ferrara, A Caflisch, DN Marti, HR Bosshard, I Jelesarov. Calculation of protein ionization equilibria with conformational sampling pKa of a model leucine zipper, GCN4 and barnase. Proteins 46:41–60, 2002.
50. A Koumanov, A Karshikoff, EP Friis, TV Borchert. Conformational averaging in pK calculations. Improvement and limitations in prediction of ionization properties of proteins. J Phys Chem B 105:9339–9344, 2001.
51. VZ Spassov, R Ladenstein, A Karshikoff. Optimization of the electrostatic interactions between ionized groups and peptide dipoles in proteins. Protein Sci 6:1190–1195, 1997.
52. D Bashford, K Gerwert. Electrostatic calculations of the pKa values of ionizable groups in bacteriorhodopsin. J Mol Biol 224:473–486, 1992.
53. A Karshikoff, V Spassov, RQ Cowan, R Ladenstein, T Schirmer. Electrostatic analysis of two porin channels from E. coli. J Mol Biol 240:372–384, 1994.
54. VZ Spassov, H Luecke, K Gerwert, D Bashford. pKa calculations suggest storage of an excess proton in a hydrogen-bonded network in bacteriorhodopsin. J Mol Biol 321:203–219, 2001.
55. N Spitzner, L Frank, S Pfeiffer, A Koumanov, A Karshikoff, H Rüterjans. Ionization properties of titratable groups in ribonuclease T1. I. pKa values in the native state determined by two-dimensional heteronuclear NMR spectroscopy. Eur Biophys J 30:186–197, 2001.
56. MK Brendskag, JS McKinley-McKee, JO Winberg. Drosophila lebanonensis alcohol dehydrogenase: pH dependence of the kinetic coefficients. Biochim Biophys Acta 1431:74–86, 1999.
57. JO Winberg, MK Brendskag, I Sylte, RI Lindstad, JS McKinley-McKee. The catalytic triad in Drosophila alcohol dehydrogenase: pH, temperature and molecular modelling studies. J Mol Biol 294:601–616, 1999.
58. A Koumanov, J Benach, S Atrian, R Gonzàlez-Duarte, A Karshikoff, R Ladenstein. The catalytic mechanism of Drosophila alcohol dehydrogenase: evidence for a proton relay modulated by the coupled ionization of the active site lysine/tyrosine pair and a NAD+ ribose OH switch. Proteins 51:289–298, 2003.
59. J Benach, S Atrian, R Gonzàlez-Duarte, R Ladenstein. The catalytic reaction and inhibition mechanism of Drosophila alcohol dehydrogenase: observation of an enzyme-bound NAD-ketone adduct at 1.4 Å resolution by x-ray crystallography. J Mol Biol 289:335–355, 1999.
60. M Oliveberg, S Vuilleumier, A Fersht. Thermodynamic study of the acid denaturation of barnase and its dependence on ionic strength: evidence for residual electrostatic interactions in the acid/thermal denatured state. Biochemistry 33:8826–8832, 1994.
61. M Oliveberg, VL Arcus, AR Fersht. pKa values of carboxyl groups in the native and denatured state of barnase: the pKa values of the denatured state are on average 0.4 units lower than those of model compounds. Biochemistry 34:9424–9433, 1995.
62. Y-J Tan, M Oliveberg, B Davis, AR Fersht. Perturbed pKa-values in the denatured states of proteins. J Mol Biol 254:980–992, 1995.
63. CN Pace, RW Alston, KL Shaw. Charge–charge interactions influence the denatured state ensemble and contribute to protein stability. Protein Sci 9:1395–1398, 2000.
64. J Warwicker. Simplified methods for pK(a) and acid pH-dependent stability estimation in proteins: removing dielectric and counterion boundaries. Protein Sci 8:418–425, 1999.
65. P Phelan, AA Gorfe, I Jelesarov, DN Marti, J Warwicker, HR Bosshard. Salt bridges destabilize a leucine zipper designed for maximized ion pairing between helices. Biochemistry 41:2998–3008, 2002.
66. A-S Yang, B Honig. Structural origin of pH and ionic strength effects on protein stability. Acid denaturation of sperm whale apomyoglobin. J Mol Biol 237:602–614, 1994.
67. AH Elcock. Realistic modeling of the denatured states of proteins allows accurate calculations of the pH dependence of protein stability. J Mol Biol 294:1051–1062, 1999.
68. H-X Zhou. A Gaussian-chain model for treating residual charge–charge interactions in the unfolded state of proteins. Proc Natl Acad Sci USA 99:3569–3574, 2002.
69. H-X Zhou. Residual electrostatic effects in the unfolded state of the N-terminal domain of L9 can be attributed to nonspecific nonlocal charge–charge interactions. Biochemistry 41:6533–6538, 2002.
70. H-X Zhou. Dimensions of denatured protein chains from hydrodynamic data. J Phys Chem B 106:5769–5775, 2002.
71. PJ Kundrotas, A Karshikoff. Model for calculations of electrostatic interactions in unfolded proteins. Phys Rev E 65:11901–11909, 2002.
72. JG Kirkwood. Theory of solutions of molecules containing widely separated charges with special application to zwitterions. J Chem Phys 2:351–361, 1934.
73. M Tollinger, JD Forman-Kay, LE Kay. Measurement of side-chain carboxyl pKa values of glutamate and aspartate residues in an unfolded protein by multinuclear NMR spectroscopy. J Am Chem Soc 124:5714–5717, 2002.
74. D Sali, M Bycroft, AR Fersht. Stabilization of protein-structure by interaction of alpha-helix dipole with a charged side-chain. Nature 335:740–743, 1988.
75. M Joshi, A Hedberg, L McIntosh. Complete measurement of the pK(a) values of the carboxyl and imidazole groups in Bacillus circulans xylanase. Protein Sci 6:2667–2670, 1997.
76. DL Luisi, DP Raleigh. pH-Dependent interactions and the stability and folding kinetics of the N-terminal domain of L9. Electrostatic interactions are only weakly formed in the transition state of folding. J Mol Biol 299:1091–1100, 2000.
77. PJ Kundrotas, A Karshikoff. Modeling of denatured state for calculations of electrostatic contribution to protein stability. Protein Sci 11:1681–1686, 2002.

8
Modeling and Optimization of Directed Evolution Protocols

Gregory L. Moore and Costas D. Maranas
Pennsylvania State University, University Park, Pennsylvania, U.S.A.

1 INTRODUCTION

This chapter summarizes research activities by the Maranas group (http://fenske.che.psu.edu/faculty/cmaranas/) toward modeling the statistics of combinatorial DNA libraries generated through directed evolution methods. Directed evolution methods utilize the process of natural selection to combinatorially evolve enzymes, proteins, or even entire metabolic pathways with improved properties. These methods typically begin with the infusion of diversity into a small set of parental nucleotide sequences through DNA recombination and/or mutagenesis. The resulting combinatorial DNA library is then subjected to a high-throughput selection or screening procedure, and the most improved variants are isolated for another round of recombination or mutagenesis. The cycles of recombination/mutagenesis, screening, and isolation continue until a protein or enzyme with the desired level of improvement is found. In the last few years, remarkable success stories of directed evolution have been reported (1–3), ranging from manifold improvements in enzyme activity and thermostability (4), enhanced bioremediation (5–8), and design of vaccines (9–11), to viral vectors for gene delivery (12,13).

A key challenge in directed evolution is that only an infinitesimally small fraction of the diversity afforded by DNA sequences can be characterized regardless of the efficiency of the screening procedure employed. For example, a 500-bp gene implies $4^{500} \approx 10^{301}$ alternatives, but even the most efficient screening methods are restricted to $10^7$–$10^8$ alternatives. Therefore, it is important to know how diversity is generated and allocated in the combinatorial DNA library and which regions are the most promising. This chapter addresses the first question in the context of the DNA shuffling (14,15) and SCRATCHY (16) protocols and examines how fragmentation length, annealing temperature, sequence identity, and number of shuffled parental sequences affect the number, type, and distribution of crossovers along the length of reassembled sequences. The predictive frameworks presented here [eShuffle (17) and eSCRATCHY (16)] provide a step toward optimizing directed evolution protocols in response to an enzyme or protein design challenge. In these modeling frameworks, annealing events during reassembly are modeled as a network of reactions, and equilibrium thermodynamics is employed to quantify their conversions and selectivities. The development of the modeling frameworks was assisted by the experimental and practical expertise of Professors Stephen Benkovic and Stefan Lutz. Figures and tables are reprinted from Refs. 16 and 17 (copyright © 2001 National Academy of Sciences, USA).

2 MODELING DNA SHUFFLING

2.1 Background

DNA shuffling (14,15), along with its variants, is one of the earliest and most commonly used DNA recombination protocols. It consists of random fragmentation of parent nucleotide sequences with DNase I and subsequent fragment reassembly through primerless PCR. Library diversity is generated during reassembly when two fragments originating from different parent sequences anneal and subsequently extend. This gives rise to a crossover, the junction point in a reassembled sequence where a template switch takes place from one parent sequence to another. The key advantage of DNA shuffling is that many parent sequences can be recombined simultaneously [i.e., family DNA shuffling (18,19)], generating multiple crossovers per reassembled sequence. However, crossovers tend to aggregate in regions of high sequence identity due to the annealing-based reassembly.

2.2 Modeling of Annealing Events

During annealing, fragments compete to anneal with a growing template. This competition is quantified by utilizing equilibrium thermodynamics to infer (a) what fraction of these fragments will anneal at a given temperature, (b) how these annealing events will be distributed between those involving high or low overlap lengths, and (c) what portion of these annealing events will involve mismatches. An annealing event between fragments originating from the same parent sequence yields a homoduplex (assuming in-frame annealing), whereas the annealing of two fragments from different parents gives a heteroduplex. Mismatches at exactly the 3′ end will lead to less efficient extension and thus are not counted.

The thermodynamics of duplex formation can be analyzed using nearest-neighbor parameters that describe the enthalpic and entropic contributions of specific nucleotide pairs in the overlapping region (20–25). The change in free energy $\Delta G$ associated with an annealing event can be approximated by summing the free energy gains associated with all 2-nt matches and the free energy penalties associated with the mismatches. Additional corrections are also included for the duplex initiation free energy cost, salt concentration, and dangling end stabilization (26). Enthalpic and entropic parameters at 37°C for the contribution of pairs of matches and mismatches are summarized in a table found in the supplemental material of Ref. 17.

Given this free energy predictive capability, the extent of duplex formation can be tracked at different temperatures. Specifically, consider the reaction associated with the annealing of a fragment F with a template A, forming a duplex AF:

$$A + F \rightleftharpoons AF.$$

Assuming equilibrium, the equilibrium constant $K(T)$ links the mole fractions of the template, fragment, and duplex at different temperatures:

$$K(T) = \exp\left(-\frac{\Delta G(T)}{RT}\right) = \frac{x_{AF}}{x_A x_F}.$$

Here $x$ denotes mole fractions and the superscript 0 denotes initial values of the species in the reaction mixture, so that $x_A = x_A^0 - x_{AF}$ and $x_F = x_F^0 - x_{AF}$. Let $a(T)$ be the annealing curve defined as the fraction of templates that have annealed at temperature $T$ [$a(T) = x_{AF}/x_A^0 = 1 - x_A/x_A^0$]. Upon rearrangement, these equations can be solved for $x_F$, $x_A$, $x_{AF}$, and $a(T)$. The temperature at which half of the templates have hybridized to form duplexes [i.e., $a(T) = 1/2$] is defined as the melting temperature $T_m$. Predictions obtained with the described free energy modeling framework are in good agreement with those found by an empirical formula commonly used for hybridization experiments (27) (see Table 1). Plots of $a(T)$ vs. $T$ reveal that there is a relatively narrow temperature range, centered around $T_m$, where the majority of annealing events take place (sigmoidal curve). In general, longer overlaps imply higher melting temperatures whereas shorter overlaps, mismatches, and low guanine/cytosine (GC) content depress $T_m$.

Table 1 Comparison of Melting Temperature Predictions for Different Duplexes of Fragmented Subtilisin E Gene Between the Proposed Model and $T_m = 81.5 + 0.41(\%\,\mathrm{GC}) - 500/L + 16.6 \log[\mathrm{Na}^+]$

                                                   Melting temperature (°C)
Sequence positions   Overlap length   Percent GC   Annealing model   Howley et al. (27)
819–828              10               50           26                30
1013–1022            10               30           17                22
529–538              10               60           32                35
804–828              25               52           61                61
779–828              50               50           72                71
729–828              100              55           81                78

Data shown are for [Na+] = 0.05 M and an initial template mole fraction $x_A^0 = 2.7 \times 10^{-8}$ that corresponds to a DNA concentration of 10 mg/L, typical for DNA shuffling.
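The empirical column of Table 1 can be recomputed directly from the formula in the table title. The sketch below assumes a base-10 logarithm and uses the rounded percent GC values printed in the table, so small differences from the listed temperatures are to be expected; the function name is illustrative.

```python
import math

def empirical_tm(percent_gc, length, na_molar):
    """Tm (deg C) = 81.5 + 0.41(%GC) - 500/L + 16.6 log10[Na+]."""
    return 81.5 + 0.41 * percent_gc - 500.0 / length + 16.6 * math.log10(na_molar)

# (overlap length, percent GC, value listed in the Howley et al. column of Table 1)
rows = [(10, 50, 30), (10, 30, 22), (10, 60, 35), (25, 52, 61), (50, 50, 71), (100, 55, 78)]
for length, gc, listed in rows:
    print(length, gc, round(empirical_tm(gc, length, 0.05), 1), listed)
```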

During the annealing step of DNA shuffling, not a single fragment but many different fragments with varying lengths, overlaps, and mismatches compete for a given template:

$$A + F_{mv} \rightleftharpoons AF_{mv}.$$

Here $m$ refers to a fragment originating from parent sequence $m$, and $v$ implies an overlap length of $v$ nucleotides with the template upon annealing. After adjusting the expression for $a(T)$ to reflect the multiplicity of annealing choices and solving the resulting system of equations, the temperature-dependent selectivity

$$s_{mv}(T) = x_{AF_{mv}} \Big/ \left( \sum_{m',v'} x_{AF_{m'v'}} \right)$$

for a particular fragment and overlap choice $mv$ is estimated. The presence of multiple fragment and overlap choices "spreads" the melting curve over a wider range of temperatures, implying that annealing events occur over the entire temperature range (typically 94–55°C). The free energy differences between annealing choices and relative fragment concentrations determine which annealing choice dominates at a given temperature. For instance, at high temperatures, fragments with large overlaps that match perfectly with the template dominate all others because of the large enthalpic gains that they provide on annealing. As the temperature is lowered, the melting temperatures of fragments with progressively smaller overlaps and even one or two mismatches are reached, resulting in selectivities that are much more uniform.
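To make the competition concrete, the coupled equilibria can be reduced to a single unknown, the free template mole fraction $x_A$, because each duplex obeys $x_{AF_j} = K_j x_A x_{F_j}^0 / (1 + K_j x_A)$. The sketch below is one way to solve this at a fixed temperature and is not taken from eShuffle; the equilibrium constants are hypothetical and a single index $j$ stands in for the pair $mv$.

```python
def competitive_annealing(K, xF0, xA0):
    """Equilibrium duplex mole fractions for A + F_j <-> AF_j sharing one template pool."""
    def excess(xA):
        # template mass balance: free template + all duplexes - initial template
        return xA + sum(k * xA * f / (1.0 + k * xA) for k, f in zip(K, xF0)) - xA0
    lo, hi = 0.0, xA0                 # excess() is increasing in xA on this bracket
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    xA = 0.5 * (lo + hi)
    return [k * xA * f / (1.0 + k * xA) for k, f in zip(K, xF0)]

def selectivities(duplexes):
    """Share of each annealing choice among all duplexes formed."""
    total = sum(duplexes)
    return [d / total for d in duplexes]

K = [5e9, 5e8, 5e7]        # hypothetical equilibrium constants for three annealing choices
xF0 = [2.7e-8] * 3         # equal initial fragment mole fractions (assumption)
xA0 = 2.7e-8
dup = competitive_annealing(K, xF0, xA0)
s = selectivities(dup)
print([round(v, 3) for v in s])
```

Bisection is used here only because the mass balance is monotone in $x_A$, which guarantees a unique root without any further tuning.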

Because annealing selectivities are temperature-dependent, duplex formation must be assessed cumulatively over the entire annealing temperature range. To this end, the annealing step is modeled as a sequence of pseudoequilibrium states progressively contributing duplexes as the temperature is lowered from 94°C to 55°C. Mathematically, this implies an integration of the temperature-dependent selectivities $s_{mv}(T)$ times the annealing rate $da(T)/dT$ over the annealing temperature schedule:

$$S_{mv} = \int_{T_{\mathrm{denature}}}^{T_{\mathrm{anneal}}} s_{mv}(T)\, \frac{da(T)}{dT}\, dT.$$
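The integral over the cooling schedule can be approximated by summing $s_{mv}(T)\,\Delta a$ over small temperature slices. The sketch below takes the selectivity and annealing curve as callables; the two toy functions supplied with it (a Boltzmann-weighted selectivity, which corresponds to a dilute limit with equal fragment concentrations, and a logistic annealing curve) are illustrative assumptions and not the thermodynamic model of the chapter.

```python
import math

def cumulative_selectivity(selectivity, annealed_fraction, t_denature, t_anneal, steps=2000):
    """Approximate S_mv = integral of s_mv(T) da(T) while cooling from t_denature to t_anneal."""
    n = len(selectivity(t_denature))
    S = [0.0] * n
    t_prev, a_prev = t_denature, annealed_fraction(t_denature)
    for step in range(1, steps + 1):
        t_cur = t_denature + (t_anneal - t_denature) * step / steps
        a_cur = annealed_fraction(t_cur)
        s_mid = selectivity(0.5 * (t_prev + t_cur))    # midpoint rule on each temperature slice
        for j in range(n):
            S[j] += s_mid[j] * (a_cur - a_prev)        # s_mv(T) * da
        t_prev, a_prev = t_cur, a_cur
    return S

R = 1.987e-3                  # kcal/(mol K)
dH = [-190.0, -178.0]         # hypothetical enthalpies: perfect match, one mismatch (kcal/mol)
dS = [-0.52, -0.49]           # hypothetical entropies (kcal/(mol K))

def toy_selectivity(T):
    g = [-(h - T * s) / (R * T) for h, s in zip(dH, dS)]
    top = max(g)
    w = [math.exp(x - top) for x in g]                 # shifted exponents avoid overflow
    return [x / sum(w) for x in w]

def toy_annealed_fraction(T, tm=341.0, width=3.0):
    return 1.0 / (1.0 + math.exp((T - tm) / width))    # sigmoidal annealing curve

S = cumulative_selectivity(toy_selectivity, toy_annealed_fraction, 367.15, 328.15)
print([round(v, 3) for v in S])
```

Because the selectivities sum to one at every temperature, the cumulative values sum to the total fraction annealed over the schedule, which gives a quick sanity check on any implementation.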

Given a pool of fragments competing for a template and an annealing temperature schedule, $S_{mv}$ quantifies the overall annealing selectivities. The effect of the length of overlap and number/severity of mismatches is illustrated in Fig. 1. The first plot (Fig. 1a) addresses the case when there are no mismatches. It clearly shows that there is a strong preference toward annealing events involving the maximum overlap. However, a nonnegligible portion of annealing events involves shorter overlaps. The second plot (Fig. 1b) considers the effect of the number and type of mismatches on annealing selectivities for a given overlap length. Although the great majority of annealing events involve no mismatches, some mismatch-bearing annealing events also occur and cannot be ignored. Note that, in the present implementation, the type of a mismatch affects its selectivity whereas its distance from the 3′ end does not. Next, the individual annealing statistics are utilized to infer crossover generation in the reassembled sequences.

Figure 1 Selectivity vs. overlap lengths (a) and selectivity for different degrees, types, and locations of mismatches (b). Both charts utilize the subtilisin E gene (positions 760–784), and mismatches are evenly distributed in the overlapping region.

2.3 Fragment Reassembly

The reassembly process is modeled as a successive sequence of annealing events. Specifically, the selectivity of an annealing event is assumed to depend only on the identity of the fragment added immediately before. For clarity of presentation, only fragments of a unique length $L$ will be used in the reassembly analysis. Nevertheless, fragments with varying lengths can be incorporated in a straightforward manner as described (28,29). The key idea of the reassembly procedure is to postulate a set of recursive relations that resolves the question of what is the probability $C^x$ that a full-length reassembled sequence of $B$ nucleotides has $x$ crossovers. To this end, we define $P_{ik}^x$ denoting the probability that reassembly from position $i$ to the end $B$ of the DNA sequence will yield exactly $x$ crossovers, given that the fragment ending at position $i-1$ originated from parent sequence $k$. The selectivities $S_{mv}$, defined earlier, can then be calculated for different annealing choices. When a fragment from parent sequence $m$ anneals with a fragment from sequence $k$, either a homoduplex ($m = k$) or a heteroduplex ($m \ne k$) is formed. Homoduplex formation implies that no crossover is generated and the recursion must still track $x$ crossovers over the remainder of the reassembly. However, heteroduplex formation implies that only $x-1$ remaining crossovers must be subsequently tracked. The annealing of a fragment of length $L$ with an overlap $v$ implies the addition of $L-v$ nucleotides, extending the template to position $(i-1)+(L-v)$. This position becomes the new reassembly point, completing the recursion. Summation over all parent sequences $m$ and overlap lengths $v$ encompasses all possible reassembly pathways:

$$P_{ik}^x = \sum_{v=1}^{L-1} S_{kv}\, P_{i+L-v,k}^x + \sum_{m \ne k} \sum_{v=1}^{L-1} S_{mv}\, P_{i+L-v,m}^{x-1}, \quad \forall x > 0,\ \forall i > L,\ \text{and } \forall k.$$
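The recursion lends itself to a memoized implementation. The sketch below is illustrative only: it uses a toy, position-independent selectivity `toy_S(k, m, v)` (the extra template argument `k` and the numerical weights are assumptions), applies the end-of-sequence boundary conditions described in the text ($P_{ik}^0 = 1$ and $P_{ik}^x = 0$ for $x > 0$ once $i > B$), drops the heteroduplex term when $x = 0$, and averages over the parent of the initial fragment to obtain $C^x$.

```python
from functools import lru_cache

def crossover_distribution(L, B, conc, S, max_x):
    """Return [C^0, ..., C^max_x] for fragment length L and sequence length B.

    conc[m] is the relative concentration of parent m; S(k, m, v) is the selectivity of
    adding a fragment from parent m with overlap v to a template ending in parent k.
    """
    parents = range(len(conc))

    @lru_cache(maxsize=None)
    def P(x, i, k):
        if i > B:                                   # boundary: nothing is added past B
            return 1.0 if x == 0 else 0.0
        total = 0.0
        for v in range(1, L):
            nxt = i + L - v                         # new reassembly point
            total += S(k, k, v) * P(x, nxt, k)      # homoduplex: still x crossovers to go
            if x > 0:
                for m in parents:
                    if m != k:                      # heteroduplex: one crossover used up
                        total += S(k, m, v) * P(x - 1, nxt, m)
        return total

    return [sum(conc[m] * P(x, L + 1, m) for m in parents) for x in range(max_x + 1)]

def toy_S(k, m, v, L=5, n_parents=2, homo=0.8):
    """Hypothetical selectivity: favors long overlaps and homoduplexes, sums to 1 over (m, v)."""
    norm = sum(2.0 ** u for u in range(1, L))
    share = homo if m == k else (1.0 - homo) / (n_parents - 1)
    return share * (2.0 ** v) / norm

dist = crossover_distribution(L=5, B=30, conc=[0.5, 0.5], S=toy_S, max_x=25)
mean_crossovers = sum(x * p for x, p in enumerate(dist))
print(round(mean_crossovers, 3))
```

Since every annealing event adds at least one nucleotide, at most $B - L$ events can occur, so choosing `max_x = B - L` captures the whole distribution and the probabilities should sum to one.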

The resolution of this recursion requires boundary conditions at the start and end of the gene or gene fragment under consideration. At the onset of reassembly, the initial fragment covers the range $i = 1$ to $i = L$, implying that subsequent annealing events add nucleotides starting from position $i = L+1$. This initial fragment comes from parent $m$ with a probability equal to the relative concentration $C_m$ of parent $m$ in the reaction mixture. This implies that the probability $C^x$ that the reassembled sequence contains $x$ crossovers is the parent relative concentration averaged probability of having $x$ crossovers past position $L+1$:

$$C^x = \sum_m C_m\, P_{L+1,m}^x, \quad \forall x = 0, 1, \ldots$$

The boundary conditions for the end position $B$ ensure that no crossovers occur beyond position $i = B$:

$$P_{ik}^0 = 1, \quad \forall i > B \text{ and } \forall k$$

$$P_{ik}^x = 0, \quad \forall x > 0,\ \forall i > B,\ \text{and } \forall k.$$

Because reassembly is a bidirectional process, the reassembly algorithm is also executed in the reverse direction with the complementary DNA sequences, and the results are combined. A flowchart outlining the proposed reassembly procedure is shown in Fig. 2.

Figure 2 A flowchart of the eShuffle reassembly algorithm.

Interestingly, the original application of the reassembly algorithm overestimated the total number of crossovers, especially for shuffling sequences that share very high sequence identity. Closer inspection revealed that this was due to the formation of heteroduplexes with fragments involving perfect sequence identity with the growing template. Even though they are indeed crossovers according to the formal crossover definition, they are completely undetectable experimentally and, more importantly, they do not contribute any diversity. Therefore, the term silent crossover was proposed for them, and the reassembly algorithm was revised to exclude them. Specifically, if the annealing of a fragment from parent $m$ to a growing template ending with a fragment from parent $k$ is equivalent to the continuation of the template with nucleotides from parent $k$, no crossover is counted.

The proposed reassembly procedure allows the estimation of the fraction of the reassembled sequences containing $x = 0, 1, \ldots$ crossovers. By redefining what constitutes a desirable crossover, different types of crossovers can be assessed separately. For example, in the family DNA shuffling of sequences A, B, and C, the statistics of all six possible types of crossovers AB, BA, AC, CA, BC, and CB can be tracked independently. In addition, one could even track homoduplex extension events such as AA, BB, or CC.

Next, the statistics of the distribution of these crossovers along the reassembled sequences is examined. Specifically, the question addressed is, "what is the probability that a given position $i$ in a reassembled sequence is the site of a crossover (i.e., end point of a heteroduplex annealing event)?" This probability depends on the parent origin of the fragment ending at position $i-1$. Thus, the probability that a fragment from parent $k$ ends exactly at position $i-1$ is defined as $T_{ik}$. A recursion is then established in a similar manner as before. A fragment from parent $m$ ends at position $i-1$ if and only if it was added to a fragment from parent $k$ ending at position $i-L+v$ with an overlap $v$. The probability for this particular duplex formation event can be quantified by multiplying the selectivity $S_{mv}$ times the probability $T_{i-L+v,k}$ that the template is positioned appropriately:

$$T_{im} = \sum_k \sum_{v=1}^{L-1} T_{i-L+v,k}\, S_{mv}, \quad \forall i > L+1 \text{ and } \forall m.$$

Boundary conditions ensure that the first nucleotide added to the original fragment comes from a parent sequence $k$ with a probability proportional to its relative concentration. Furthermore, no fragment may end before position $i = L$:

$$T_{L+1,k} = C_k, \quad \forall k$$

$$T_{ik} = 0, \quad \forall i \le L \text{ and } \forall k.$$

Once the probability $T_{ik}$ that a particular type of template $k$ ends immediately before position $i$ is known, it can be multiplied by the selectivity of a crossover-generating annealing event $S_{mv}$ and summed over all possible annealing choices to infer the probability $P_i^{\mathrm{cross}}$ that position $i$ is the site of a crossover:

$$P_i^{\mathrm{cross}} = \sum_k \sum_{v=1}^{L-1} \sum_{m \ne k} T_{ik}\, S_{mv}.$$
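The forward recursion for $T_{ik}$ and the crossover density $P_i^{\mathrm{cross}}$ can be sketched in a few lines. As before, this is an illustration under stated assumptions rather than the eShuffle code: `toy_S(k, m, v)` is a hypothetical, position-independent selectivity that also takes the template parent `k`, and crossover sites are evaluated only for reassembly points $i \le B$.

```python
def crossover_density(L, B, conc, S):
    """Return (T, p_cross): T[i][k] as defined in the text and p_cross[i] for i = L+1..B."""
    n = len(conc)
    T = [[0.0] * n for _ in range(B + 1)]        # T[i][k] = 0 for i <= L (boundary condition)
    for k in range(n):
        T[L + 1][k] = conc[k]                    # initial fragment drawn by concentration
    for i in range(L + 2, B + 1):
        for m in range(n):
            T[i][m] = sum(T[i - L + v][k] * S(k, m, v)
                          for k in range(n) for v in range(1, L))
    p_cross = [0.0] * (B + 1)
    for i in range(L + 1, B + 1):
        p_cross[i] = sum(T[i][k] * S(k, m, v)
                         for k in range(n) for m in range(n) if m != k
                         for v in range(1, L))
    return T, p_cross

def toy_S(k, m, v, L=5, n_parents=2, homo=0.8):
    """Hypothetical selectivity: favors long overlaps and homoduplexes, sums to 1 over (m, v)."""
    norm = sum(2.0 ** u for u in range(1, L))
    share = homo if m == k else (1.0 - homo) / (n_parents - 1)
    return share * (2.0 ** v) / norm

L, B, conc = 5, 30, [0.5, 0.5]
T, p_cross = crossover_density(L, B, conc, toy_S)
print(round(sum(p_cross), 3))                    # average number of crossovers per sequence
```

Summing `p_cross` over all positions gives the average number of crossovers per sequence, which is the quantity used in the consistency check against the crossover number distribution.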

Again, by tailoring the definition of a crossover, the distribution of different types of crossovers (i.e., AB, BC, or AC) along the sequence can be assessed separately. A consistency check reveals that the average number of crossovers calculated based on the probabilities $P_i^{\mathrm{cross}}$ quantifying crossover density along the DNA sequence ($\sum_i P_i^{\mathrm{cross}}$) is identical to the one obtained based on the crossover number distribution calculated earlier ($\sum_x x\,C^x$). Given this versatile algorithmic framework, the statistics of any type of crossover can be quantified both in terms of variability among the reassembled sequences and along the length of the gene. Predictions obtained based on the analysis described above are next contrasted against experimental data from DNA shuffling experiments reported in the literature.

2.4 Comparisons with Experimental Results

Although directed evolution studies are being reported in the literature at an accelerating pace, only a few studies report DNA sequencing results for naive (i.e., unselected) DNA libraries. Partial DNA sequencing results allowing for the estimation of the number of crossovers in a small subset of the reassembled sequences are found for the following five studies. Computer simulation of DNA shuffling of these systems provides the basis for the comparisons. Every effort was made to ensure that the fragment length, annealing temperature, and salt and DNA concentrations matched the ones in the experimental study. When no information was provided, default values from the original DNA shuffling protocol (14,15) were adopted.

The first system considered is composed of two 465-bp IL-1β genes (human and murine) (15) with a sequence identity of only 75%. An extremely low annealing temperature of 25°C was used to boost the generation of crossovers. Nine colonies were sequenced for a total of 17 crossovers, implying an average of 1.9 per sequence. Simulation results are in close agreement with the experiment, predicting an average of 1.5 crossovers.

The next system involved the family DNA shuffling of four class C cephalosporinase genes, 1.2 kb in length with pairwise sequence identities ranging from 58% to 82% (18). It was reported that neither of the two active clones sequenced contained any fragments from the Yersinia enterocolitica gene (third gene). The question is whether this occurred because fragments originating from this gene had a detrimental effect on activity, or simply because pieces from this gene were disproportionately underrepresented in the naive library due to the lack of sufficiently long stretches of near-perfect sequence identity with the other three genes. The average sequence identities of each one of the four genes against the remaining three are 70%, 70%, 65%, and 59%, respectively. Simulation results predict that 36% of the naive sequences contain at least one crossover. The fraction of crossover-bearing sequences containing at least one piece from each one of the four genes is 85%, 95%, 7%, and 19%, respectively. This indicates that Y. enterocolitica (the third one) is, by far, the least represented even though it is not the one with the lowest sequence identity. This suggests a possible explanation for the absence of any piece of Y. enterocolitica in the most active clones.

The next system studied involved two genes for glycinamide ribonucleotide transformylase, Escherichia coli (purN) and human (hGART) (30), with a very low sequence identity of 49%. Here the following staggered portions of the two genes were shuffled (E. coli positions 1–434 and human positions 164–611), implying that crossovers could only be formed in the 271-bp shared region (47% sequence identity). This arrangement requires that all reassembled genes of full length start with the E. coli gene and end with the human gene, yielding an odd number of crossovers. In the experimental study, only single-crossover clones were observed among the 10 sequenced clones. This is consistent with the simulation prediction that the ratio of the number of reassembled sequences with three or more crossovers to the number of sequences with a single crossover is less than $10^{-9}$.

A system with a relatively high sequence identity is analyzed next. It involves the DNA shuffling of two biphenyl oxygenases sharing a sequence identity of 87% (31). For this system, an average of 3.3 crossovers per sequence is observed experimentally (six sequenced clones), whereas the simulation suggests a slightly smaller average of 2.8.

The last study is the only one where the simulation results deviated from the experimentally observed crossover averages. It involved the DNA shuffling of a 1.3-kb gene for wild-type (wt) subtilisin E and that of a clone (1E2A) differing by only 10 point mutations (32). Slightly larger fragments in the range of 20–50 bases were used in place of the default fragment length range of 10–50 bases. One would expect that a large average number of crossovers would be generated in this system because only 10 point mutations are present, implying a sequence identity of 99.2%. However, this is not observed experimentally, as only an average of 1.9 crossovers per sequence is reported (32). The simulation results, on the other hand, are consistent with the intuitive expectation, predicting an average of 3.6 crossovers per reassembled sequence. The randomly chosen sequences may not have been representative of the entire DNA library. For instance, recombinations between mutations at positions 520 and 732 in clone 1E2A must be occurring independently because the stretch of perfect identity is much wider than even the maximum fragment size. However, a crossover occurs in only 10% of the reported sequences instead of the 50% frequency expected for independent reassembly. With the exception of this last example, simulation predictions are in good agreement with the published experimental results without adjustable model parameters.

2.5 Subtilase Case Study

Subtilases are serine proteases (33) extensively engineered with directed evolution experiments (19,34,35). A set of 12 subtilases including subtilisins E, BPN′, Carlsberg, 147, ALP I, PB92, and Sendai; serine proteases C and D; proteinases K and R; and thermitase are next considered to highlight the effect of fragmentation length, annealing temperature, sequence identity, and number of shuffled sequences on the number, type, and distribution of crossovers. We chose to mirror recent subtilase-directed evolution experiments (19) by analyzing the shuffling of only a 500-bp subgenomic region. The average pairwise sequence identity is 58%, ranging from 44% to 90%.

First, a high-sequence identity (80%) pair (subtilisin E and subtilisin BPN′) is considered. As shown in Fig. 3a, for a fragmentation length of L = 50 bases, 44% of the reassembled sequences involve no crossovers, 37% involve one crossover, 15% involve two crossovers, and diminishing percentages involve more than two crossovers. As the fragment length is reduced, a nonlinear increase of crossovers is observed. This nonlinear increase in the average number of crossovers as a function of L is more clearly depicted in Fig. 3b. Interestingly, the same plot (dashed line) reveals a dramatic increase of silent crossovers for very small fragment lengths (i.e., L ≤ 20). Fig. 4 illustrates the distribution of crossovers superimposed against the sequence identity along the sequence. It shows that crossovers are preferentially aggregated in regions of near-perfect sequence identity, forming a characteristic double peak. The double peak implies that annealing events make full use of the available sequence identity, giving rise to two distinct peaks at the two flanking positions of the sequence identity stretch. Larger fragments afford a wider range of overlaps, flattening the two peaks, whereas smaller fragments are capable of generating crossovers in relatively narrow regions of high sequence identity.

However, in DNA shuffling, not a single fragmentation length L is employed but rather a distribution of fragment sizes, typically in the range of 10–50 bases, with a size distribution described by an exponentially decaying function (28,29). When a range of fragment sizes is employed for the above example, computational results reveal that the crossover statistics are almost identical with the case of utilizing a single "effective" fragment size which, for the 10- to 50-base range, is 25 bases.

Next, the effect of annealing temperature on crossover generation is studied. It was found that annealing temperature affects the crossover statistics through two underlying mechanisms (see Fig. 5). Specifically, for medium to large fragments, lower annealing temperatures imply that the melting temperatures of more annealing choices containing mismatches (i.e., heteroduplexes) are encountered, yielding more crossovers upon extension. However, for very small fragments at high temperatures, the entropic contribution to the free energy of annealing dominates, blurring the distinction between homoduplexes and heteroduplexes and causing a sharp increase in the total number of crossovers. Clearly, as in the case of fragment length, the annealing temperature cannot be arbitrarily reduced because at some point, fragments cease to exhibit strong affinity for annealing in-frame, and out-of-frame additions start to overwhelm the reassembly process.

Figure 3 (a) Crossover number distribution for DNA shuffling of subtilisin E and subtilisin BPN′ for L = 15, 25, and 50 bases. (b) Average number of crossovers per sequence for the same system plotted vs. fragment length in bases. The dotted line includes silent crossovers.

Figure 4 Probability of generating a crossover along the length of the sequence for the (subtilisin E and subtilisin BPN′) system for L = 15, 25, and 50 bases along the subregions 485–979. Black columns in the bottom strip chart denote identical nucleotides for both sequences, and white lines denote mismatches.

The limits of DNA shuffling are explored by choosing the low-sequence identity pair (serine protease D and proteinase K) that has a 46% sequence identity. As expected, very few crossovers are predicted (see Table 2), with only a single narrow region at the end of the sequence coinciding with a short stretch of high sequence identity. Subsequently, the high-sequence identity pair (subtilisin E and subtilisin BPN′) is shuffled in silico together with the low-sequence identity pair (serine protease D and proteinase K) in equal ratios. The key question is whether the low-sequence identity pair will simply dilute the fragment pool that can form heteroduplexes, depressing crossover generation by a factor of 2, or whether synergism in the reassembly will dominate. Even though the average pairwise sequence identity for the four subtilase system is as low as 58%, a number of crossovers comparable with the (subtilisin E and subtilisin BPN′) single-pair case is found (see Table 2). This implies that synergistic reassembly is taking place, alluding to the contribution of "bridging" crossovers by the low-sequence identity pairs. The full power of synergistic reassembly is revealed when all 12 subtilases are included, providing a computational verification of what is seen experimentally with family DNA shuffling, especially for smaller fragments. Even though the average pairwise sequence identity is only 58%, at least as many crossovers are generated (see Table 2) as for the high-sequence identity (80%) pair. More importantly, these crossovers span the entire sequence range (see Fig. 6). Admittedly though, the distribution is still multimodal, with peaks tracking the location of high sequence identity, a signature of the annealing-based reassembly characteristic of DNA shuffling. In Sec. 3, we examine the SCRATCHY protocol, which is capable of generating crossovers in nonhomologous regions and reducing the bias seen in Figs. 4 and 6.

Figure 5 Effect of annealing temperature on the number of crossovers produced for the high-sequence identity subtilase pair (subtilisin E and subtilisin BPN′).

Table 2 Average Numbers of Crossovers per Sequence Calculated for Various Fragment Lengths L and Parental Sets

L (bases)   High-sequence identity pair   Low-sequence identity pair   Set of four subtilases   Set of 12 subtilases
15          2.9                           0.5                          2.3                      4.8
25          1.3                           0.1                          0.8                      1.4
50          0.8                           0.0                          0.5                      0.8
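The dilution-versus-synergism question can be checked with simple arithmetic on the Table 2 averages. The short sketch below compares the four-subtilase values with the "factor of 2" dilution expectation derived from the high-sequence identity pair; the dictionary names are illustrative, and the comparison is only as good as the rounded table entries.

```python
# Average crossovers per sequence from Table 2, keyed by fragment length L (bases)
high_pair = {15: 2.9, 25: 1.3, 50: 0.8}
four_set = {15: 2.3, 25: 0.8, 50: 0.5}

for L in sorted(high_pair):
    diluted = high_pair[L] / 2.0     # expectation if the low-identity pair only diluted the pool
    ratio = four_set[L] / high_pair[L]
    print(L, round(diluted, 2), four_set[L], round(ratio, 2))
```

For each fragment length the four-subtilase average lies above the halved single-pair value, which is the pattern the text attributes to synergistic reassembly.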

Figure 6 Crossover probability distributions for in silico family DNA shuffling of all 12 subtilases (L = 15).

3 MODELING SCRATCHY

3.1 Background

As mentioned above, sequence homology-dependent methods for recombining genes have been successful at evolving improved proteins (1–13). An inherent limitation of these methods is their dependence on DNA sequence identity for generating diversity. This precludes the creation of crossovers between genes at loci of low homology, biasing crossover positions toward regions of highest homology. In general, a severe bias toward parental recombination is observed when sequences with less than 70% sequence identity are DNA-shuffled. Given the fact that protein structure is more frequently conserved than DNA homology, homology-dependent methods for recombining genes may potentially exclude solutions to protein engineering problems. The need for a recombination protocol capable of freely exchanging genetic diversity without sequence identity limitations has motivated the creation of incremental truncation for the creation of hybrid enzymes (ITCHY). ITCHY allows one to create comprehensive fusion libraries between fragments of genes without any sequence dependency (30,36,37). However, the main drawback of the method, as well as similar techniques (38), is that members of these libraries contain only one crossover per gene. As suggested earlier (39), the DNA shuffling of ITCHY libraries could potentially introduce multiple crossovers between the genes of interest by preserving ITCHY crossovers (prepositioned crossovers) in the starting material and by recombining regions of homology between genes (Fig. 7). This combination of ITCHY and DNA shuffling has been named SCRATCHY.

3.2 eSCRATCHY Modeling Framework

An in silico modeling framework for crossover statistics prediction, named eSCRATCHY, was developed in conjunction with experimental work on SCRATCHY. The modeling framework builds on the eShuﬄe program presented above for assessing the generation of crossovers in the context of DNA shuﬄing (17). SCRATCHY can be abstracted as the family DNA shuﬄing of an artiﬁcially created superfamily containing all single crossover hybrids between the two genes of interest. The presence of fragments that contain prepositioned crossovers during reassembly extends the sequence space accessed by SCRATCHY compared to the one available to traditional DNA shufﬂing. Therefore, when fragment–fragment hybridization is considered in the reassembly algorithm of eSCRATCHY, it is necessary to keep track of, not only the overlapping region, but also whether one (or both) fragments


Moore and Maranas

Figure 7 Schematic overview of SCRATCHY. Initially, individual incremental truncation (ITCHY) libraries of the two complementary constructs are created (a). Following functional selection (b) to recover in-frame hybrids of parental size, the libraries are mixed and submitted to DNA shuﬄing (c).

contains a prepositioned crossover and whether this crossover is located within or outside the overlapping region (Fig. 8). These considerations give rise to three hypothetical, yet distinct, mechanisms for generating crossovers in contrast to the single mechanism (i.e., the extension of a heteroduplex) encountered in eShuﬄe, namely, (a) the extension of a heteroduplex as in eShuﬄe, (b) the incorporation of a prepositioned crossover, or (c) the extension of a hybrid duplex, which occurs when a fragment already

Figure 8 The three mechanisms for generating crossovers that are tracked in silico.

containing a prepositioned crossover anneals with another fragment with the crossover positioned in the duplex. Hybrid duplexes are part stabilizing homoduplexes and part crossover-generating heteroduplexes, presumably enabling the SCRATCHY protocol to generate crossovers within narrower sequence identity stretches than DNA shuffling. It is important to note that these three hypothesized mechanisms reflect, and thus are dependent upon, the abstraction of the proposed reassembly algorithm as a recursive sequence of annealing events. Clearly, the sequence of actual hybridization events occurring in the reacting mixture over multiple cycles defines a process much more complex than the level of detail captured within eSCRATCHY. Specifically, hybrid duplexes may also occur in DNA shuffling but only after the first reassembly cycle and only between fragments arising from heteroduplex extension in regions of near-perfect sequence identity that are largely absent in low-sequence identity systems. Annealing choices from all three mechanisms are handled in a straightforward manner within the free energy-based scoring system (24). In addition, the reassembly algorithm is modified to check for each of the three crossover types for every fragment annealing event.

Additional modifications were performed to improve computational performance. The family of single crossover sequences generated in the ITCHY step is much larger than that typically used for molecular breeding, so the original eShuffle program (which scales as the square of the number of parental sequences) was customized. Specifically, fragments with identical sequences from different ITCHY parents were pooled because they do not change the outcome of fragment–fragment extensions considered by the reassembly algorithm. By aggregating their concentrations instead of considering them separately, CPU times were reduced to scale linearly with the number of parental sequences.
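The pooling of identical fragments described above can be sketched in a few lines. This is a minimal illustration and not the eSCRATCHY implementation: the function names, the choice to key each fragment by its start position together with its sequence, the equal-length toy genes, and the uniform concentrations are all assumptions made for the example.

```python
from collections import defaultdict


def single_crossover_hybrids(gene_a, gene_b):
    """Toy ITCHY-like library: gene_a[:k] + gene_b[k:] for every internal position k.

    Assumes (hypothetically) aligned genes of equal length.
    """
    if len(gene_a) != len(gene_b):
        raise ValueError("genes must have equal length in this toy model")
    return [gene_a[:k] + gene_b[k:] for k in range(1, len(gene_a))]


def pool_fragments(parents, frag_len):
    """Sum the concentrations of fragments that are identical across parents.

    parents: iterable of (sequence, concentration) pairs.
    Returns a dict mapping (start, fragment_sequence) to the pooled concentration.
    """
    pool = defaultdict(float)
    for seq, conc in parents:
        for start in range(len(seq) - frag_len + 1):
            pool[(start, seq[start:start + frag_len])] += conc
    return dict(pool)


if __name__ == "__main__":
    hybrids = single_crossover_hybrids("AAAAAAAA", "CCCCCCCC")
    pooled = pool_fragments([(h, 1.0) for h in hybrids], frag_len=3)
    instances = len(hybrids) * (8 - 3 + 1)
    print(f"{instances} fragment instances pooled into {len(pooled)} distinct species")
```

Because most fragments of a single-crossover hybrid lie entirely on one side of the crossover, they coincide with fragments of other hybrids, and the number of distinct species grows far more slowly than the number of fragment instances. The reassembly step then only needs to consider each pooled species once, weighted by its summed concentration.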
In addition, we found that for fragmentation lengths longer than 40 nt, approximating individual duplex melting curves as step functions at the duplex’s melting temperature provided a tractable and accurate approximation of the annealing thermodynamics because melting


temperatures for larger fragments are signiﬁcantly above the applied annealing temperature. A 40-nt fragment reassembly conﬁrmed that predictions vary by less than 5% when this approximation is utilized. eSCRATCHY was next used to address questions concerning the preservation of prepositioned crossovers in reassembled sequences, as well as their contribution toward multiple crossover sequences in comparison with those that also occur in homology-based reassembly. In particular, the eﬀects of fragmentation length and pairwise sequence identity on the number and positioning of crossovers produced and the relative contribution of each of the three postulated crossover mechanisms are examined. The purN/hGART system mentioned above (also see Ref. 30) is ﬁrst examined in detail. In this case study, both in-frame and parental size selection are ‘‘idealized’’ so that the crossovers present in the ITCHY library are not biased in any manner. Predictions from eSCRATCHY indicate that 52% of the reassembled sequences have multiple crossovers for a fragmentation length of 60 nt even though the nucleotide sequence identity is only 49% in the overlapping region. Note that even for fragments as short as 20 nt, predictions by eShuﬄe indicate that almost 99.9% of sequences reassembled by DNA shuﬄing alone will be wild type. Interestingly, in contrast to DNA shuﬄing, eSCRATCHY predicts that fragmentation length has little, if any, eﬀect on the average number of crossovers produced per sequence (Fig. 9). Smaller fragments imply that more annealing choices are available during reassembly and thus more opportunities to generate crossovers, but at the same time, a

Figure 9 Probability that a hybrid sequence contains a given number of crossovers after the “idealized” SCRATCHY of PurN and hGART for fragmentation sizes of 20, 40, 60, and 80 nucleotides (54°C annealing temperature). Note that the distributions are similar for each of the sizes. [Plot axes: probability of a given number of crossovers versus number of crossovers (0 to 5); series: 20-nt, 40-nt, 60-nt, and 80-nt fragments.]


smaller portion of fragments contains prepositioned crossovers. These two effects appear to cancel each other for systems with low-sequence identity. Thus, relatively large fragments can be utilized in SCRATCHY without reducing the number of crossovers, allowing for easier purification, isolation, and reassembly. In addition, predictions suggest that neglecting hybrid duplex crossovers in eSCRATCHY would produce drastically different results, as these crossovers contribute 47% of the total number of crossovers. This “emergent” mechanism, not present in eShuffle, is almost as frequent as the prepositioned crossover mechanism. Heteroduplex crossovers are negligible, as expected, for a system with 49% sequence identity. The distribution of crossovers along the sequence is shown in Fig. 10. Prepositioned crossovers are present almost uniformly along the entire sequence, showing that the unbiased nature of the ITCHY library is retained. In contrast, hybrid duplex-based crossovers track regions of high-sequence identity and are less evenly distributed. Contrary to homology-based methods, the sum of all types of crossovers fills the entire sequence length with an average frequency of 0.65% per position. The “signature” of DNA shuffling can still be detected in the form of peaks tracking regions of high-sequence identity. Next, we examined the effect of pairwise sequence identity on crossover frequencies for the recombination of the following six sequences with purN using eSCRATCHY and eShuffle (sequence identity with purN in the overlapping region in parentheses): GAR transformylases from human (49%),

Figure 10 Distribution of the different types of crossovers along the sequence after the “idealized” SCRATCHY of PurN and hGART (20-nt fragments, 54°C annealing temperature). Note that no gaps appear along the entire crossover range when the crossover types are totaled. Heteroduplex crossovers are negligible and are not pictured.


Figure 11 A comparison of the numbers of crossovers predicted for “idealized” SCRATCHY and DNA shuffling for sequence pairs of various sequence identities (20-nt fragments, 54°C annealing temperature). White bars represent contributions to SCRATCHY from prepositioned crossovers; black bars represent hybrid duplex crossovers; and crosshatched bars represent heteroduplex crossovers.

Pseudomonas aeruginosa (54%), Pasteurella multocida (60%), Vibrio cholerae (64%), and Salmonella typhimurium (79%); and methionyl-tRNA formyltransferase from E. coli (33%). As seen in Fig. 11, predictions suggest that SCRATCHY is capable of generating crossovers for all sequence pairs, regardless of sequence identity. On the other hand, DNA shuffling requires an approximate “threshold” sequence identity of 60% before any appreciable crossover generation occurs. Even for high-sequence identities, we predict that SCRATCHY outperforms DNA shuffling by an average of 1.5 crossovers per sequence. Both prepositioned and hybrid duplex crossover mechanisms remain prevalent for the entire range of sequence identities, and the heteroduplex mechanism begins to contribute at identities greater than 60% (Fig. 11). Upon utilizing parameters reflecting the specifics of the actual experimental library, eSCRATCHY’s predictions of the naive purN/hGART SCRATCHY library were reexamined and compared to the experimental data.

3.3 Comparisons with Experimental Results

3.3.1 Experimental SCRATCHY

Two ITCHY libraries encoding either the PurN/hGART (PGX) or the hGART/PurN (GPX) hybrid pairs were constructed (Fig. 7a), with members


distributed over the entire sample space, comparable to data from previous libraries (30,37). Functional selection (Fig. 7b) was used to select for in-frame members of parental size for DNA shuffling. Although the profile of representative sequences in such a library is biased, as shown in Fig. 12, the distribution of the two directional libraries allows for multiple crossovers to occur in the overlapping region. Equal amounts of both selected libraries (PGX and GPX) were DNA-shuffled (Fig. 7c), and the resulting reassembled sequences were amplified. The primer pair used for amplification anneals to outside portions on either side of the gene, yielding a comprehensive library of possible combinations

Figure 12 Proﬁles of crossover positions for the (a) PGX and (b) GPX libraries, including experimental counts (bars) and smooth ﬁtted functions of crossover probability (lines).


including wild-type constructs. From this naive library, the hybrid genes of over 50 individual colonies were analyzed by DNA sequencing, and the results are summarized in Fig. 13. For further information on the SCRATCHY protocol, see Ref. 16. Analysis of the library revealed several interesting characteristics. Most importantly, a significant portion of the sample sequences had multiple crossovers. When considering the location and number of the crossover points in the sequences, an important experimental bias emerges. The majority of sequences (70%) in the library are reassembled duplicates of GPX

Figure 13 Sequence data for the naive SCRATCHY library. The dotted lines indicate the borders of the overlapping region between amino acid positions 54 and 144.


library members, as if the GPX library were present at a higher concentration than the PGX library during DNA shuffling. Further examination of the sequencing data reveals a number of additional interesting features. The reassembly of parental wild-type (wt) sequences in SCRATCHY, in contrast to DNA shuffling of low homology sequences, is not dominant. Although a few wt-PurN sequences are identified in the naive library, wt-hGART is absent. The deficiency of wt-hGART in the recombination mixture is explained by the paucity of a contiguous bridge of hGART fragments traversing the entire gene length due to the uneven distribution of fusion points in the two ITCHY libraries (Fig. 12). The same bias, amplified by the higher effective concentration of the GPX library, is also responsible for the preponderance of hGART/PurN/hGART double crossover sequences over PurN/hGART/PurN hybrids. Reassembly of a PurN/hGART/PurN hybrid requires both a PurN-to-hGART crossover at the beginning of the overlapping region and a hGART-to-PurN crossover near the end of the overlapping region. However, both of these crossovers occur infrequently in the starting material, explaining their absence. In summary, the data show that the characteristics of the ITCHY libraries are inherited by the SCRATCHY library.

3.3.2 eSCRATCHY Comparisons

Accurate in silico analysis required the integration of two experimental presets: the crossover distribution of the employed ITCHY libraries and the fragment reassembly-based bias toward hGART/PurN library members. First, the uneven crossover distributions caused by the functional selection of the ITCHY libraries were accounted for in the eSCRATCHY program by fitting the observed crossover data with a smooth function (Fig. 12), thus customizing the relative concentration of each of the ITCHY library members. Second, as seen in the naive library, hGART/PurN library members dominate the reassembly process. This effect was accounted for by adjusting the concentration ratio of the two libraries to 86% GPX:14% PGX. This ratio was calculated by examining the 5′- and 3′-termini of the library members. The relative effective concentration of the GPX library was estimated by counting the number of sequences beginning with hGART (47 sequences) and ending with PurN (39 sequences), for a total of 86. Similarly, the PGX library estimate totaled 14 (3 + 11), resulting in the 86:14 ratio. Together, these two modifications result in crossover predictions that are in good agreement with the experimental sequence data for the naive library. The distribution matches well with what is found experimentally (Fig. 14a). The discrepancy between the numbers of multiple crossovers predicted in the “idealized” case (Fig. 9) and those found in the experiment can be attributed to the bias in the starting

Figure 14 Comparing eSCRATCHY predictions (70 nt fragmentation length, 54°C annealing temperature) for (a) the number of crossovers per naive library member and (b) naive library crossover positions against experimental data. In (b), data are grouped in histogram form with each bar representing a range of 10 nt. [Plot axes: (a) probability of a given number of crossovers versus number of crossovers (0 to 5); (b) crossover probability versus sequence position (nt, 150 to 430); series in both panels: experimental and model.]

material. In addition, predictions for crossover position statistics (Fig. 14b) capture the uneven nature of crossovers found in the reassembled sequences as a result of the same bias, which also leads to an increased 3.6:1 ratio of prepositioned/hybrid duplex crossovers compared to the ‘‘idealized’’ case. Another interesting aspect is the contribution of crossovers originating from incremental truncation or homology-based recombination. Experimentally, all fusion points observed in the SCRATCHY libraries have counterparts at locations corresponding to prepositioned crossovers, originating from the ITCHY libraries. However, the origin of the crossovers in the


homologous region between amino acids 100 and 110 could not be attributed conclusively to ITCHY or DNA shuffling. In the eSCRATCHY model, heteroduplex crossovers are rare across the entire sequence.

4 SUMMARY

eShuﬄe provided for the ﬁrst time a quantitative framework for the in silico exploration of many ‘‘what if’’ scenarios in terms of fragmentation length, annealing temperature, and parental choices in the context of DNA shuﬄing. Comparisons of the eShuﬄe predictions against experimental data revealed good agreement, particularly in light of the fact that there are no adjustable parameters in the modeling framework. The only parameters are the free energy contributions used unchanged from literature sources (24). Therefore, no reparameterization is needed when either the experimental conditions or the sequences to be shuﬄed change, providing a versatile framework for comparing diﬀerent protocol choices and setups. In the context of family DNA shuﬄing (18,19), the eShuﬄe program enabled the estimation of the relative contribution of fragments from diﬀerent parental sequences to the combinatorial DNA library. Results revealed that the pairwise sequence identities between the parental sequences do not always explain the observed parental crossover frequencies in the libraries. eShuﬄe also led to the quantiﬁcation of synergistic reassembly in family DNA shuﬄing and the elicitation of the presence of the swapping of identical fragments between high-sequence identity parental pairs (silent crossovers). The eSCRATCHY framework (a) led to a newly hypothesized mechanism for the generation of crossovers based on the extension of hybrid duplexes, (b) revealed that fragmentation length has little eﬀect on crossover statistics, and (c) veriﬁed a complete coverage of gene length with potential crossover sites. An in silico case study of six pairs of parental sequences ranging in sequence identity from 33% to 79% revealed that SCRATCHY outperforms DNA shuﬄing by approximately 1.5 crossovers per sequence for all six sequence pairs. 
Comparisons of eSCRATCHY statistics with experimental naive library sequence data were in good agreement after adjusting the concentration ratio of the incremental truncation libraries. Both eSCRATCHY and experimental results confirmed that the crossover distributions of the incremental truncation libraries are inherited by the SCRATCHY library.

ACKNOWLEDGMENTS

The authors would like to thank Professor Stephen Benkovic and Dr. Stefan Lutz for developing the SCRATCHY protocol and for other helpful discussions, and Dr. Shankar Vaidyaraman for help with the model implementation. Financial support by National Science Foundation Award BES0120277, National Science Foundation Career Award CTS9701771, and the Life Science Consortium at Penn State is gratefully acknowledged along with hardware support by the IBM-SUR program.

REFERENCES

1. S Brakmann. Discovery of superior enzymes by directed molecular evolution. ChemBioChem 2:865–871, 2001.
2. IP Petrounia, FH Arnold. Directed evolution of enzymatic properties. Curr Opin Biotechnol 11:325–330, 2000.
3. C Schmidt-Dannert. Directed evolution of single proteins, metabolic pathways, and viruses. Biochemistry 40:13125–13136, 2001.
4. K Miyazaki, PL Wintrode, RA Grayling, DN Rubingh, FH Arnold. Directed evolution study of temperature adaptation in a psychrophilic enzyme. J Mol Biol 297:1015–1026, 2000.
5. A Crameri, G Dawes, E Rodriguez, S Silver, WPC Stemmer. Molecular evolution of an arsenate detoxification pathway by DNA shuffling. Nat Biotechnol 15:436–438, 1997.
6. K Furukawa. Engineering dioxygenases for efficient degradation of environmental pollutants. Curr Opin Biotechnol 11:244–249, 2000.
7. F Bruhlmann, W Chen. Tuning biphenyl dioxygenase for extended substrate specificity. Biotechnol Bioeng 63:544–551, 1999.
8. LP Wackett. Directed evolution of new enzymes and pathways for environmental biocatalysis. Ann NY Acad Sci 864:142–152, 1998.
9. PA Patten, RJ Howard, WPC Stemmer. Applications of DNA shuffling to pharmaceuticals and vaccines. Curr Opin Biotechnol 8:724–733, 1997.
10. RG Whalen, R Kaiwar, NW Soong, J Punnonen. DNA shuffling and vaccines. Curr Opin Mol Ther 3:31–36, 2001.
11. G Marzio, K Verhoef, M Vink, B Berkhout. In vitro evolution of a highly replicating, doxycycline-dependent HIV for applications in vaccine studies. Proc Natl Acad Sci USA 98:6342–6347, 2001.
12. NW Soong, L Nomura, K Pekrun, M Reed, L Sheppard, G Dawes, WPC Stemmer. Molecular breeding of viruses. Nat Genet 25:436–439, 2000.
13. SK Powell, MA Kaloss, A Pinskstaff, R McKee, I Burimski, M Pensiero, E Otto, WP Stemmer, NW Soong. Breeding of retroviruses by DNA shuffling for improved stability and processing yield. Nat Biotechnol 18:1279–1282, 2000.
14. WPC Stemmer. Rapid evolution of a protein in vitro by DNA shuffling. Nature (London) 370:389–391, 1994.
15. WPC Stemmer. DNA shuffling by random fragmentation and reassembly: In vitro recombination for molecular evolution. Proc Natl Acad Sci USA 91:10747–10751, 1994.
16. S Lutz, M Ostermeier, GL Moore, CD Maranas, SJ Benkovic. Creating multiple crossover libraries independent of sequence identity. Proc Natl Acad Sci USA 98:11248–11253, 2001.
17. GL Moore, CD Maranas, S Lutz, SJ Benkovic. Predicting crossover generation in DNA shuffling. Proc Natl Acad Sci USA 98:3226–3231, 2001.
18. A Crameri, S Raillard, E Bermudez, WPC Stemmer. DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature (London) 391:288–291, 1998.
19. JE Ness, M Welch, L Giver, M Bueno, JR Cherry, TV Borchert, WPC Stemmer, L Minshull. DNA shuffling of subgenomic sequences of subtilisin. Nat Biotechnol 17:893–896, 1999.
20. HT Allawi, J SantaLucia. Nearest-neighbor thermodynamics of internal A–C mismatches in DNA: sequence dependence and pH effects. Biochemistry 37:9435–9444, 1998.
21. HT Allawi, J SantaLucia. Nearest neighbor thermodynamic parameters for internal G–A mismatches in DNA. Biochemistry 37:2170–2179, 1998.
22. HT Allawi, J SantaLucia. Thermodynamics of internal C–T mismatches in DNA. Nucleic Acids Res 26:2694–2701, 1998.
23. HT Allawi, J SantaLucia. Thermodynamics and NMR of internal G–T mismatches in DNA. Biochemistry 36:10581–10594, 1997.
24. J SantaLucia. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest neighbor thermodynamics. Proc Natl Acad Sci USA 95:1460–1465, 1998.
25. N Peyret, PA Seneviratne, HT Allawi, J SantaLucia. Nearest-neighbor thermodynamics and NMR of DNA sequences with internal A–A, C–C, G–G, and T–T mismatches. Biochemistry 38:3468–3477, 1999.
26. S Bommarito, N Peyret, J SantaLucia. Thermodynamic parameters for DNA sequences with dangling ends. Nucleic Acids Res 28:1929–1934, 2000.
27. PM Howley, MF Israel, M Law, MA Martin. A rapid method for detecting and mapping homology between heterologous DNAs. Evaluation of polyomavirus genomes. J Biol Chem 254:4876–4883, 1979.
28. GL Moore, CD Maranas. Modeling DNA mutation and recombination for directed evolution experiments. J Theor Biol 205:483–503, 2000.
29. GL Moore, CD Maranas, KR Gutshall, JE Brenchley. Modeling and optimization of DNA recombination. Comp Chem Eng 24:693–699, 2000.
30. M Ostermeier, JH Shim, SJ Benkovic. A combinatorial approach to hybrid enzymes independent of DNA homology. Nat Biotechnol 17:1205–1209, 1999.
31. T Kumamaru, H Suenaga, M Mitsuoka, T Watanabe, K Furukawa. Enhanced degradation of polychlorinated biphenyls by directed evolution of biphenyl dioxygenase. Nat Biotechnol 16:663–666, 1998.
32. H Zhao, FH Arnold. Functional and nonfunctional mutations distinguished by random recombination of homologous genes. Proc Natl Acad Sci USA 94:7997–8000, 1997.
33. RJ Siezen, JA Leunissen. Subtilases: The superfamily of subtilisin-like serine proteases. Protein Sci 6:501–523, 1997.
34. K Chen, FH Arnold. Enzyme engineering for nonaqueous solvents: Random mutagenesis to enhance activity of subtilisin E in polar organic media. Bio/Technology 9:1073–1077, 1991.
35. K Chen, AC Robinson, MEV Dam, P Martinez, C Economou, FH Arnold. Enzyme engineering for nonaqueous solvents: II. Additive effects of mutations on the stability and activity of subtilisin E in polar organic media. Biotechnol Prog 7:125–129, 1991.
36. M Ostermeier, SJ Benkovic. Construction of hybrid gene libraries involving the circular permutation of DNA. Biotechnol Lett 23:303–310, 2001.
37. S Lutz, M Ostermeier, SJ Benkovic. Rapid generation of incremental truncation libraries for protein engineering using alpha-phosphothioate nucleotides. Nucleic Acids Res 29:E16, 2001.
38. V Sieber, CA Martinez, FH Arnold. Libraries of hybrid proteins from distantly related sequences. Nat Biotechnol 19:456–460, 2001.
39. M Ostermeier, AE Nixon, SJ Benkovic. Incremental truncation as a strategy in the engineering of novel biocatalysts. Bioorg Med Chem 7:2139–2144, 1999.

9 Rational Redesign of Enzymes

Jens Erik Nielsen*
University of California, San Diego, La Jolla, California, U.S.A.

1 INTRODUCTION

The enormous acceleration of reaction rates that enzymes provide is one of the foundations of life as we know it. For many years, the study of metabolic enzymes, their reactions, and the way that ATP is generated has been the main focus of enzymology (1). However, classic metabolism is only the most well-known function that enzymes perform. The enzymes that participate in DNA repair, transcription, and replication, and the enzymes that play roles in signal transduction and cell morphology control, are examples of often bigger and more complicated enzymes and enzyme complexes that are currently being studied intensely (2–4). Therefore, it is not hard to ﬁnd a good reason to study how enzymes work, and, consequently, there is a steady ﬂow of information on new enzymes, their substrate speciﬁcities, catalytic mechanisms, and cellular roles in current biological journals. The aim of the current review is not to provide a comprehensive description of diﬀerent classes of enzymes and their functional diversity,

*Current aﬃliation: University College Dublin, Dublin, Ireland.


Nielsen

but rather to provide a description of how well we understand enzymes, and how good we are at using our knowledge to manipulate enzymes. The manipulation of enzymes is desirable not only from a scientific point of view, but also because enzyme-catalyzed reactions often present a more environmentally sound alternative to industrial processes (5,6). The conditions of industrial processes are often very different from the conditions for similar reactions in nature, and it is therefore sometimes necessary to improve the thermostability, to change the substrate specificity, or to change another characteristic of an enzyme to make it perform adequately in an industrial process (6). The ability to change a certain characteristic of an enzyme is promoted by a good understanding of the characteristic that is to be changed. Thus, if one wants to increase the catalytic performance of an enzyme at acidic pH and high temperature, it is generally a good idea to know how and why the high temperature and the acidic pH influence the enzymatic rate enhancement. Recent years have seen the development of several methods that rely on selection or screening methods to pick out optimized enzyme mutants from artificially generated enzyme populations [the so-called directed evolution techniques (7)]. These methods do not depend on a detailed understanding of the enzyme in question, and the relative success of these methods compared to classic protein engineering techniques in designing novel and more efficient enzymes (8–10) provides a somewhat sobering measure of our understanding of enzymes. However, while directed evolution techniques to a large extent may be able to provide enzymes with almost any set of characteristics, they have so far failed to give us detailed insights into the principles of enzyme catalysis.
Although this matters little for industrial applications, it is scientifically unsatisfactory to employ black-box methods that simply provide solutions without explanations. In the following sections, I will focus mainly on what we know about enzymes, on how they work, and on what rational enzyme engineering studies have told us about enzymes.

2 HOW DO ENZYMES WORK?

The classical view of enzyme catalysis is that enzymes are optimized to stabilize the transition state of the reactants, and in that way accelerate the rate of catalysis (11,12). This may give the impression that designing an efﬁcient enzyme is simply about making a protein that binds the transition state of a given reaction as strongly as possible. Fig. 1 shows the energy diagram of a hypothetical reaction for both an uncatalyzed pathway in water and an enzyme-catalyzed pathway. It is seen that the rate enhancement of enzymes is not solely dependent on a strong binding of the transition state, but also on the ability to bind the substrate and product ground states much less tightly than the transition state as compared to the energy of these three


Figure 1 Energy diagram for a hypothetical reaction showing both the uncatalyzed and enzyme-catalyzed pathways. It is seen that the true rate enhancement provided by the enzyme stems not only from a high stabilization of the transition state, but also from a relatively low stabilization of the substrate and product ground states.

states in the reference state (13,14). If an enzyme equally reduces the energy of all three states (i.e., in the figure ΔG‡ = ΔG_S = ΔG_P), then the enzyme does not enhance the rate of the reaction at all compared to the rate in the reference state. Thus, it is the differential stabilization of the transition state as compared to the stabilization of the ground state of both the substrate and the product that is the foundation of the rate acceleration of enzymes. In recent years, much work has been devoted to understanding how enzymes achieve this remarkable stabilization of the transition state relative to the two ground states. Several specific effects such as the low-barrier hydrogen bond (15), solvent reorganization (14), and electrostatics (13) have been proposed to account for the differential stabilization of the transition state. The references above provide a comprehensive discussion of these effects and


their proposed importance, and as such these theories provide several logical routes to understanding enzymes. However, the application of these theories in redesigning enzymes is still very limited. So although it is clear that low-barrier hydrogen bonds, solvent reorganization, and the detailed electrostatics play important roles in enzyme catalysis, it is not possible to translate this information into a set of point mutations that will make a given enzyme more eﬃcient (16). This is partly because we do not have a good understanding of exactly how the above eﬀects play a role, and also because we, in most cases, do not have a suﬃciently detailed understanding of the catalytic mechanism of the enzyme in question. Without a very detailed understanding of the reaction mechanism, it is diﬃcult to rationally improve the way the enzyme diﬀerentially binds the two ground states and the transition state.
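The differential-stabilization argument of this section can be made semi-quantitative with a short sketch. It assumes simple transition-state theory with the same pre-exponential factor for the catalyzed and reference reactions, so that the rate ratio depends only on the change in barrier height. The function name and the numerical values are illustrative assumptions, not data from this chapter.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def rate_enhancement(stab_ts, stab_ground, temperature=298.15):
    """Ratio k_enzyme / k_reference under simple transition-state theory.

    stab_ts:     stabilization of the transition state by the enzyme (kJ/mol)
    stab_ground: stabilization of the substrate ground state (kJ/mol)
    Both are positive when the enzyme lowers the energy of that state.
    The barrier changes by (stab_ground - stab_ts), so only the
    difference between the two stabilizations affects the rate ratio.
    """
    return math.exp((stab_ts - stab_ground) / (R_KJ * temperature))


if __name__ == "__main__":
    # Equal stabilization of both states: no rate enhancement.
    print(rate_enhancement(40.0, 40.0))  # 1.0
    # Hypothetical numbers: 40 kJ/mol of differential stabilization.
    print(f"{rate_enhancement(50.0, 10.0):.2e}")
```

The product ground state, which the text also discusses, is left out of this sketch; it would enter an analogous expression for the reverse reaction. The point of the example is only that uniform stabilization of all states leaves the rate unchanged, whereas each additional RT·ln(10) of differential stabilization (about 5.7 kJ/mol at 298 K) corresponds to a tenfold rate increase under these assumptions.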

3 CATALYTIC MECHANISMS

Understanding the catalytic mechanism is an essential step in improving the performance of an enzyme. Classic biochemistry experiments for determining catalytic mechanisms involve the mutation of a set of residues in the active site to identify residues that are essential for the catalytic activity. Once these residues are identified and once a 3-D structure or a homology-built model of the enzyme is available, the next logical step is to obtain structural information on the binding mode of proposed transition state inhibitors, and optionally suicide inhibitors, in an attempt to elucidate the details of the catalytic mechanism. All of these methods provide valuable information on the catalytic mechanism, but they also leave ample room for different interpretations of the experimental results, as will be exemplified in the following.

3.1 Alpha-Amylases

In the α-amylases (17) and the related cyclodextrin glycosyltransferases (CGTases) (18), the active site includes three highly conserved acidic residues that can all potentially act as an acid/base catalyst or as the nucleophile in the catalytic mechanism. Mutation of any one of the three acids to the corresponding amide leads to a significant loss of activity. However, while some research groups reported a complete loss of activity when knocking out any one of the three acids (19,20), other groups reported that traces of activity were present when mutating one or more of the active-site residues (21,22). Consequently, it was unclear which two protein residues were the true active residues. Only recent x-ray structural work (23) has been able to resolve the issue, although controversy still exists on the level of activity of the active-site

Rational Redesign of Enzymes

217

Figure 2 The two possible catalytic mechanisms for HEWL. Path A, the currently accepted view. Path B, the previously accepted mechanism. R, oligosaccharide chain, RV, peptidyl side chain. (From Ref. 28.)

218

Nielsen

mutants (23,24). Clear information on the catalytic mechanism is very important in, e.g., the attempts to reengineer alpha-amylase and CGTase pH–activity proﬁles (25,26), which is of importance in the industrial application of these enzymes (27). 3.2

Lysozyme and Thermolysin

For other enzymes, accepted catalytic mechanisms have been contested in recent years. For hen egg-white lysozyme (HEWL), this meant that one of the textbook examples of a catalytic mechanism had to be revised. It had long been accepted that the catalytic mechanism of HEWL proceeds via a carbocation intermediate (Fig. 2), created by protonation of the glycosidic oxygen followed by an SN1 elimination of the aglycon part of the substrate. However, recent experimental work by Vocadlo et al. (28) using suicide inhibitors showed that the catalytic reaction proceeds via a covalent substrate–enzyme intermediate, although the previously accepted mechanism cannot be completely ruled out.

In the case of thermolysin, the experimental data are less clear-cut, and there are currently two different views of the catalytic mechanism. The original mechanism was proposed by Kester and Matthews (29) on the basis of x-ray data for thermolysin in complex with inhibitors. However, pH-dependence data from kinetic studies carried out by Mock et al. (30,31) are seemingly inconsistent with the mechanism originally proposed by Kester and Matthews, and have led Mock et al. to propose an alternative reverse protonation mechanism for the catalytic reaction. This has spurred new interest in the catalytic mechanism of thermolysin (32,33), which has led to new insights into its details.

Examining the arguments in favor of each of the two proposed mechanisms for thermolysin is beyond the scope of this review, but it is clear from the three examples above that it is nontrivial to be absolutely certain about the details of the catalytic mechanism, even for extremely well-studied enzymes. Similar controversies are likely to appear for a large number of enzymes once their catalytic mechanisms are investigated in detail. For the field of enzyme design, the consequences are far-reaching because this introduces an extra layer of uncertainty when designing point mutations and interpreting their results.

4 REDESIGN OF ENZYMES

Redesigning an enzyme normally has a very specific goal, and most redesigned enzymes have had their stability improved for use in specific industrial processes. The highly stabilized enzymes used in washing powder and in starch liquefaction processes are well-known examples of this class of redesigned enzymes (24,34,35). Examples of other types of reengineering come from the cyclodextrin glycosyltransferases, where the product specificity has been changed (Ref. 18 and references therein); from a butyrylcholinesterase, which has been reengineered as a cocaine hydrolase (36); and from Bacillus circulans xylanase, where Joshi et al. (37) engineered a remarkable shift in the pH–activity profile of the enzyme.

Briefly, enzyme engineering can be conceptually divided into three categories depending on the objective of the engineering project: activity engineering (changing kcat), substrate specificity engineering (changing the substrate specificity), and stability engineering (improving the resistance of the enzyme structure to temperature, pH, or other effects).

4.1 Stability Engineering

Several concepts for engineering protein stability have been established and found to be applicable to a wide range of proteins. Stabilization by the insertion of prolines (38), by cavity filling (39), and by helix capping (40) are all well-known concepts and will not be reviewed here. Recent years have seen the development of protein design algorithms (PDAs), which combine knowledge of the determinants of protein stability with in silico mutations of a protein. In this way, it is possible to predict point mutations that will stabilize a given protein (41). PDAs contain a force field for evaluating the stability of a given protein and a search algorithm that searches sequence/structure space in an attempt to optimize the stability given by the force field. In recent years, significant progress has been made both in constructing better force fields and in constructing search algorithms that search sequence/structure space faster and more efficiently. The most accurate force fields for predicting protein stability are currently statistically based methods (42–44) and semiempirical methods (45–48). Both approaches provide predictions of protein stability with an accuracy of around 1 kcal/mol for a single point mutation. Current state-of-the-art protein search algorithms are based on the mean-field method and on the dead-end elimination theorem (Ref. 49 and references therein). It is now feasible to perform quite extensive searches of sequence/structure space even for quite large proteins. However, most PDAs presently use relatively simple force fields, which provide only a rough description of steric effects and desolvation energies and a simple treatment of hydrogen bonds. The coming years will undoubtedly see the incorporation of more accurate force fields in PDAs to improve the prediction accuracy.

4.2 Engineering Substrate Specificity

Engineering of enzyme substrate specificity is closely related to the inverse problem of structure-based drug design (50,51), where numerous tools have been developed for the construction and docking of small ligands in enzyme active sites (52–55). Fewer tools are available for designing novel substrate specificities for enzymes, but several PDAs have been modified to optimize the stability of a given enzyme–substrate complex by performing point mutations in the enzyme. The program DEZYMER (56,57) has been successfully used to construct an iron superoxide dismutase (58) using Escherichia coli thioredoxin as a scaffold. Other examples of the engineering of novel substrate specificities/catalytic activities have also been reported (59,60).

4.3 Stability vs. Activity

An aspect of protein stability that is unique to enzymes is the correlation between thermostability and catalytic activity for naturally occurring enzymes. If enzymatic activity is measured at a given temperature, it is often found that an enzyme from a psychrophilic organism has a higher activity than the corresponding enzyme from a mesophilic organism, which in turn has a higher activity than the version of the enzyme from a thermophilic organism. The stability of the same enzymes is found to be in the reverse order, with the thermophilic enzyme being the most stable and the psychrophilic enzyme the least stable. This inverse relationship between the stability of a given enzyme and its activity suggests that enzymes perform best at temperatures relatively close to their melting temperature (Tm). The reason for this behavior is that enzymes in nature have evolved to be only marginally stable at the temperature where they function, because an overly stable enzyme would be difficult for the cell to degrade once its activity is no longer beneficial (61). The lower activity of more stable enzymes is a result of decreased mobility in the active-site region. It is therefore possible to engineer enzymes that have a high catalytic activity even at temperatures far from their Tm (62), provided that the mobility of the active-site region is maintained.

4.4 Engineering Catalytic Activity

In contrast to the better-understood field of stability engineering, the reengineering of the catalytic mechanism itself is only emerging. The boundaries between engineering substrate specificity or stability and engineering catalytic activity are poorly defined, because the stabilization of an enzyme often influences its catalytic activity by changing the active-site mobility, and the engineering of the substrate or product specificity of an enzyme almost certainly changes the kcat and kcat/Km values for certain substrates. Here I define engineering of catalytic activity as engineering that aims at modifying a property of the catalytic activity of the enzyme that is limited neither by the substrate/product specificity of the enzyme nor by its stability. Such an aim could be, e.g., to increase the activity of the enzyme at a certain temperature, to increase the activity of an enzyme at acidic pH, to make an enzyme independent of its natural cofactor, or to completely redesign the catalytic mechanism of the enzyme.

Rational reengineering of the catalytic mechanism of an enzyme has not yet, to the author’s knowledge, been achieved, and attempts both at changing the pH–activity profile of an enzyme and at constructing enzymes with higher activity have proven difficult. However, the interpretation of the results obtained in the engineering process often leads to surprising new insights into the catalytic mechanism, as illustrated by the example below.

4.5 Changing the pH–Activity Profile of B. circulans Xylanase

The family 11 xylanases [for classification nomenclature, see the CAZy database (63)] use a double-displacement mechanism similar to that of lysozyme (Fig. 2) for cleaving xylan polymers. In this type of mechanism, the acidic limb of the pH–activity profile is controlled by the pKa of the nucleophilic residue (Asp-52 in HEWL), and the alkaline limb is determined by the pKa of the acid/base catalyst (Glu-35 in HEWL). For B. circulans xylanase, it had been shown that the apparent pKa values from the pH–activity profile coincide with the pKa values of the nucleophile (Glu-78, pKa 4.6) and the acid/base catalyst (Glu-172, pKa 6.7) (64). In xylanases with an acidic pH optimum, an aspartic acid forms a hydrogen bond with Glu-172, while in xylanases with a neutral pH optimum, this residue is an asparagine (37). Joshi et al. (37) studied the effect of substituting this asparagine (Asn-35) in a neutral xylanase with an aspartic acid (Asp-35) and observed, in accordance with expectations, that the pH optimum of the enzyme was shifted to more acidic pH values. Measurement of the pKa values of the active-site acids surprisingly revealed that the pH dependence of catalysis in the mutant is determined by the pKa values of Asp-35 (pKa 3.7) and Glu-78 (pKa 5.7). Joshi et al. (37) confirmed that the catalytic mechanism was unchanged in the mutant, and therefore concluded that the mutant xylanase operates by a reverse protonation mechanism, in which the enzyme is active when Asp-35 is protonated and Glu-78 is charged. Because the pKa of Asp-35 is significantly lower than the pKa of Glu-78, the fraction of enzyme molecules in this protonation state is lower than 1% at all pH values.
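The size of this population can be checked with a short calculation. The sketch below is an illustration only: it assumes that the two carboxyl groups titrate independently and follow the Henderson–Hasselbalch equation (coupling between the groups is ignored), and it uses the pKa values quoted above. The function names are hypothetical and chosen for this example.

```python
def frac_deprotonated(pH, pKa):
    """Henderson-Hasselbalch fraction of an acid in its charged (deprotonated) form."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


def active_fraction(pH, pKa_protonated, pKa_charged):
    """Fraction of enzyme with one group protonated and the other charged.

    Assumes the two groups titrate independently (an idealization).
    """
    protonated = 1.0 - frac_deprotonated(pH, pKa_protonated)
    charged = frac_deprotonated(pH, pKa_charged)
    return protonated * charged


def max_active_fraction(pKa_protonated, pKa_charged):
    """Scan pH 0-14 in 0.01 steps; return (best pH, largest active fraction)."""
    grid = [i / 100.0 for i in range(0, 1401)]
    best_pH = max(grid, key=lambda pH: active_fraction(pH, pKa_protonated, pKa_charged))
    return best_pH, active_fraction(best_pH, pKa_protonated, pKa_charged)


# Wild type: Glu-172 (pKa 6.7) must be protonated, Glu-78 (pKa 4.6) charged.
wt_pH, wt_frac = max_active_fraction(pKa_protonated=6.7, pKa_charged=4.6)

# N35D mutant, reverse protonation: Asp-35 (pKa 3.7) protonated, Glu-78 (pKa 5.7) charged.
mut_pH, mut_frac = max_active_fraction(pKa_protonated=3.7, pKa_charged=5.7)

print(f"wild type: optimum near pH {wt_pH:.2f}, active fraction {wt_frac:.3f}")
print(f"mutant:    optimum near pH {mut_pH:.2f}, active fraction {mut_frac:.4f}")
```

Under these assumptions the reverse-protonated state of the mutant never exceeds about 0.8% of the population (near pH 4.7), whereas the catalytically competent state of the wild type reaches about 84% (near pH 5.65), a difference of roughly two orders of magnitude.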
Because the mutant enzyme is approximately 20% more active than the wild type, the authors suggest that the inherent catalytic activity of the mutant enzyme must be at least a hundredfold higher than that of the wild type (37), thus providing a surprising explanation for the observed pH-optimum shift. Current methods for predicting pKa values of protein titratable groups (65) can aid in the interpretation of such results (66), but no tools or analytical methods exist that can reliably predict the proposed 100-fold increase in catalytic activity of the mutant protein.

5 DISCUSSION

Remarkable results in the redesign of enzymes for specific applications have been achieved in the last decade. Thanks to extensive research into the mechanisms that govern protein stability, it is now possible to improve the thermostability of almost any given protein by following a few simple rules. Further improvements in thermostability may be achieved by applying protein design algorithms, which in some cases can produce even more thermostable variants of an enzyme by repacking parts of the hydrophobic core of the protein.

So far, the engineering of catalytic activity has proven to be the most difficult area of enzyme design. This is mainly because of our poor understanding of the specific effects that make a certain enzyme a highly efficient catalyst (high kcat) compared to other similar enzymes. Experimental and theoretical results (67–70) suggest that the dynamics of the enzyme play an important role in the catalytic mechanism. Because neither present theoretical methods nor present experimental techniques can give clear-cut answers on how important dynamics are and on which correlated motions are necessary for catalysis, it is very difficult to include this information in the enzyme design process.

A further drawback in the field of enzymology is that only mutational data that lead directly to publishable conclusions become available to the scientific community. The field of protein stability has greatly benefited from the large amount of data on stability changes resulting from point mutations, which has been produced in studies of two-state folding proteins. Much of this data has been compiled into the ProTherm database (71), which now provides an essential service to computational biologists who are trying to construct algorithms that can predict changes in protein stability. For enzymes, there is no comparable mass production of experimental kcat, kcat/Km, or Km values for enzyme mutants. Kinetic data for enzymes are sparse, are often measured under widely different conditions, and are almost never deposited in electronic form on the World Wide Web. If we are to successfully understand how enzymes work and how we can manipulate their catalytic properties, it will be necessary to generate reproducible kinetic data for a large number of mutant enzymes and to store these data electronically in a publicly available database. With such data, the task of interpreting mutational data and understanding the principles of catalytic mechanisms will be much more feasible, and will hopefully lead to theoretical models that can reproduce and predict experimental observations for a wide range of enzymes.
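As an illustration of what one electronically stored record might contain, the sketch below defines a minimal data structure for a single kinetic measurement on a single enzyme variant. Everything in it is hypothetical: the class name, field names, units, and JSON layout are assumptions made for this example and do not correspond to any existing database. The design choice it illustrates is that the assay conditions are stored together with the rate constants, since values measured under widely different conditions cannot otherwise be compared.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class KineticRecord:
    """One kinetic measurement for one enzyme variant (hypothetical schema)."""
    enzyme: str           # name of the parent enzyme
    mutation: str         # e.g. "A100G"; "WT" for the wild type
    substrate: str
    kcat_per_s: float     # turnover number, s^-1
    km_molar: float       # Michaelis constant, M
    pH: float             # assay conditions are stored with the data
    temperature_c: float

    def __post_init__(self):
        # Reject physically meaningless values at construction time.
        if self.kcat_per_s < 0 or self.km_molar <= 0:
            raise ValueError("kcat must be >= 0 and Km must be > 0")

    @property
    def specificity_constant(self) -> float:
        """kcat/Km in M^-1 s^-1."""
        return self.kcat_per_s / self.km_molar

    def to_json(self) -> str:
        """Serialize the record, including the derived kcat/Km."""
        data = asdict(self)
        data["kcat_over_km_per_molar_per_s"] = self.specificity_constant
        return json.dumps(data, sort_keys=True)


# Placeholder numbers chosen only to exercise the code; they are not measurements.
example = KineticRecord(
    enzyme="hypothetical enzyme", mutation="A100G", substrate="hypothetical substrate",
    kcat_per_s=10.0, km_molar=2.0e-3, pH=7.0, temperature_c=25.0,
)
print(example.to_json())
```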

ACKNOWLEDGMENTS

The author wishes to thank Stewart Adcock for suggestions on the manuscript. This work was supported in part by the Danish Natural Science Research Council and by the Howard Hughes Medical Institute.

REFERENCES

1. CK Matthews, KE van Holde. Biochemistry. Redwood City, CA: Benjamin/Cummings Publishing Company, Inc., 1990, pp 339–538.
2. D Jeruzalmi, O Yurieva, Y Zhao, M Young, J Stewart, M Hingorani, M O’Donnell, J Kuriyan. Mechanism of processivity clamp opening by the delta subunit wrench of the clamp loader complex of E. coli DNA polymerase III. Cell 106:417–428, 2001.
3. M Huse, J Kuriyan. The conformational plasticity of protein kinases. Cell 109:275–282, 2002.
4. MJ Tyska, DM Warshaw. The myosin power stroke. Cell Motil Cytoskelet 51:1–15, 2002.
5. S Panke, MG Wubbolts. Enzyme technology and bioprocess engineering. Curr Opin Biotechnol 13:111–116, 2002.
6. ME Bruins, AE Janssen, RM Boom. Thermozymes and their applications: A review of recent literature and patents. Appl Biochem Biotechnol 90:155–186, 2001.
7. P Forrer, S Jung, A Pluckthun. Beyond binding: Using phage display to select for structure, folding and enzymatic activity in proteins. Curr Opin Struct Biol 9:514–520, 1999.
8. JR Cherry, MH Lamsa, P Schneider, J Vind, A Svendsen, A Jones, AH Pedersen. Directed evolution of a fungal peroxidase. Nat Biotechnol 17:379–384, 1999.
9. ET Farinas, T Bulter, FH Arnold. Directed enzyme evolution. Curr Opin Biotechnol 12:545–551, 2001.
10. T Sakamoto, JM Joern, A Arisawa, FH Arnold. Laboratory evolution of toluene dioxygenase to accept 4-picoline as a substrate. Appl Environ Microbiol 67:3882–3887, 2001.
11. L Pauling. Molecular architecture and biological reactions. Chem Eng News 24:1375–1377, 1946.
12. AJ Kirby. Enzyme mechanisms, models, and mimics. Angew Chem, Int Ed Engl 35:707–724, 1996.
13. A Warshel. Electrostatic origin of the catalytic power of enzymes and the role of preorganized active sites. J Biol Chem 273:27035–27038, 1998.
14. WR Cannon, SJ Benkovic. Solvation, reorganization energy, and biological catalysis. J Biol Chem 273:26257–26260, 1998.
15. WW Cleland, PA Frey, JA Gerlt. The low barrier hydrogen bond in enzymatic catalysis. J Biol Chem 273:25529–25532, 1998.
16. JR Knowles. Enzyme catalysis: Not different, just better. Nature 350:121–124, 1991.

17. EA MacGregor, S Janecek, B Svensson. Relationship of sequence and structure to specificity in the alpha-amylase family of enzymes. Biochim Biophys Acta 1546:1–20, 2001.
18. BA van der Veen, JC Uitdehaag, BW Dijkstra, L Dijkhuizen. Engineering of cyclodextrin glycosyltransferase reaction and product specificity. Biochim Biophys Acta 1543:336–360, 2000.
19. A Nakamura, K Haga, S Ogawa, K Kuwano, K Kimura, K Yamane. Functional relationships between cyclodextrin glucanotransferase from an alkalophilic Bacillus and alpha-amylases. Site-directed mutagenesis of the conserved two Asp and one Glu residues. FEBS Lett 296:37–40, 1992.
20. K Takase, T Matsumoto, H Mizuno, K Yamane. Site-directed mutagenesis of active site residues in Bacillus subtilis alpha-amylase. Biochim Biophys Acta 1120:281–288, 1992.
21. C Klein, J Hollender, H Bender, GE Schulz. Catalytic center of cyclodextrin glycosyltransferase derived from X-ray structure analysis combined with site-directed mutagenesis. Biochemistry 31:8740–8746, 1992.
22. RM Knegtel, B Strokopytov, D Penninga, OG Faber, HJ Rozeboom, KH Kalk, L Dijkhuizen, BW Dijkstra. Crystallographic studies of the interaction of cyclodextrin glycosyltransferase from Bacillus circulans strain 251 with natural substrates and products. J Biol Chem 270:29256–29264, 1995.
23. EH Rydberg, C Li, R Maurus, CM Overall, GD Brayer, SG Withers. Mechanistic analyses of catalysis in human pancreatic alpha-amylase: Detailed kinetic and structural studies of mutants of three conserved carboxylic acids. Biochemistry 41:4492–4502, 2002.
24. JE Nielsen, TV Borchert. Protein engineering of bacterial alpha-amylases. Biochim Biophys Acta 1543:253–274, 2000.
25. JE Nielsen, TV Borchert, G Vriend. The determinants of alpha-amylase pH–activity profiles. Protein Eng 14:505–512, 2001.
26. RD Wind, JC Uitdehaag, RM Buitelaar, BW Dijkstra, L Dijkhuizen. Engineering of cyclodextrin product specificity and pH optima of the thermostable cyclodextrin glycosyltransferase from Thermoanaerobacterium thermosulfurigenes EM1. J Biol Chem 273:5771–5779, 1998.
27. H Guzman-Maldonado, O Paredes-Lopez. Amylolytic enzymes and products derived from starch: A review. Crit Rev Food Sci Nutr 35:1730–1742, 1995.
28. DJ Vocadlo, GJ Davies, R Laine, SG Withers. Catalysis by hen egg-white lysozyme proceeds via a covalent intermediate. Nature 412:835–838, 2001.
29. WR Kester, BW Matthews. Crystallographic study of the binding of dipeptide inhibitors to thermolysin: Implications for the mechanism of catalysis. Biochemistry 16:2506–2516, 1977.
30. WL Mock, DJ Stanford. Arazoformyl dipeptide substrates for thermolysin. Confirmation of a reverse protonation catalytic mechanism. Biochemistry 35:7369–7377, 1996.
31. WL Mock, M Aksamawati. Binding to thermolysin of phenolate-containing inhibitors necessitates a revised mechanism of catalysis. Biochem J 302(pt 1):57–68, 1994.

32. V Pelmenschikov, MR Blomberg, PE Siegbahn. A theoretical study of the mechanism for peptide hydrolysis by thermolysin. J Biol Inorg Chem 7:284–298, 2002.
33. A Beaumont, MJ O’Donohue, N Paredes, N Rousselet, M Assicot, C Bohuon, MC Fournie-Zaluski, BP Roques. The role of histidine 231 in thermolysin-like enzymes. A site-directed mutagenesis study. J Biol Chem 270:16803–16808, 1995.
34. PN Bryan. Protein engineering of subtilisin. Biochim Biophys Acta 1543:203–222, 2000.
35. BS Hartley, N Hanlon, RJ Jackson, M Rangarajan. Glucose isomerase: Insights into protein engineering for increased thermostability. Biochim Biophys Acta 1543:294–335, 2000.
36. H Sun, YP Pang, O Lockridge, S Brimijoin. Re-engineering butyrylcholinesterase as a cocaine hydrolase. Mol Pharmacol 62:220–224, 2002.
37. MD Joshi, G Sidhu, I Pot, GD Brayer, SG Withers, LP McIntosh. Hydrogen bonding and catalysis: A novel explanation for how a single amino acid substitution can change the pH optimum of a glycosidase. J Mol Biol 299:255–279, 2000.
38. BW Matthews, H Nicholson, WJ Becktel. Enhanced protein thermostability from site-directed mutations that decrease the entropy of unfolding. Proc Natl Acad Sci U S A 84:6663–6667, 1987.
39. M Karpusas, WA Baase, M Matsumura, BW Matthews. Hydrophobic packing in T4 lysozyme probed by cavity-filling mutants. Proc Natl Acad Sci U S A 86:8237–8241, 1989.
40. R Aurora, GD Rose. Helix capping. Protein Sci 7:21–38, 1998.
41. SM Malakauskas, SL Mayo. Design, structure and stability of a hyperthermophilic protein variant. Nat Struct Biol 5:470–475, 1998.
42. D Gilis, M Rooman. PoPMuSiC, an algorithm for predicting protein mutant stability changes: Application to prion proteins. Protein Eng 13:849–856, 2000.
43. D Gilis, M Rooman. Predicting protein stability changes upon mutation using database-derived potentials: Solvent accessibility determines the importance of local versus non-local interactions along the sequence. J Mol Biol 272:276–290, 1997.
44. D Gilis, M Rooman. Stability changes upon mutation of solvent-accessible residues in proteins evaluated by database-derived potentials. J Mol Biol 257:1112–1126, 1996.
45. E Lacroix, AR Viguera, L Serrano. Elucidating the folding problem of alpha-helices: Local motifs, long-range electrostatics, ionic-strength dependence and prediction of NMR parameters. J Mol Biol 284:173–191, 1998.
46. V Munoz, L Serrano. Development of the multiple sequence approximation within the AGADIR model of alpha-helix formation: Comparison with Zimm–Bragg and Lifson–Roig formalisms. Biopolymers 41:495–509, 1997.
47. K Takano, M Ota, K Ogasahara, Y Yamagata, K Nishikawa, K Yutani. Experimental verification of the "stability profile of mutant protein" (SPMP) data using mutant human lysozymes. Protein Eng 12:663–672, 1999.
48. R Guerois, JE Nielsen, L Serrano. Predicting changes in the stability of proteins and protein complexes: A study of more than 1000 mutations. J Mol Biol 320(2):369–387, 2002.
49. LL Looger, HW Hellinga. Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: Implications for protein design and structural genomics. J Mol Biol 307:429–445, 2001.
50. G Klebe. Recent developments in structure-based drug design. J Mol Med 78:269–281, 2000.
51. TL Blundell, H Jhoti, C Abell. High-throughput crystallography for lead discovery in drug design. Nat Rev Drug Discov 1:45–54, 2002.
52. F Osterberg, GM Morris, MF Sanner, AJ Olson, DS Goodsell. Automated docking to multiple target structures: Incorporation of protein mobility and structural water heterogeneity in AutoDock. Proteins 46:34–40, 2002.
53. HJ Bohm. A novel computational tool for automated structure-based drug design. J Mol Recognit 6:131–137, 1993.
54. H Claussen, C Buning, M Rarey, T Lengauer. FlexE: Efficient molecular docking considering protein structure variations. J Mol Biol 308:377–395, 2001.
55. M Rarey, B Kramer, T Lengauer. Time-efficient docking of flexible ligands into active sites of proteins. Proc Int Conf Intell Syst Mol Biol 3:300–308, 1995.
56. HW Hellinga, JP Caradonna, FM Richards. Construction of new ligand binding sites in proteins of known structure II. Grafting of a buried transition metal binding site into Escherichia coli thioredoxin. J Mol Biol 222:787–803, 1991.
57. HW Hellinga, FM Richards. Construction of new ligand binding sites in proteins of known structure I. Computer-aided modeling of sites with pre-defined geometry. J Mol Biol 222:763–785, 1991.
58. AL Pinto, HW Hellinga, JP Caradonna. Construction of a catalytically active iron superoxide dismutase by rational protein design. Proc Natl Acad Sci U S A 94:5562–5567, 1997.
59. DN Bolon, SL Mayo. Enzyme-like proteins by computational design. Proc Natl Acad Sci U S A 98:14274–14279, 2001.
60. DN Bolon, CA Voigt, SL Mayo. De novo design of biocatalysts. Curr Opin Chem Biol 6:125–129, 2002.
61. R Jaenicke. Stability and stabilization of globular proteins in solution. J Biotechnol 79:193–203, 2000.
62. B Van den Burg, G Vriend, OR Veltman, G Venema, VG Eijsink. Engineering an enzyme to resist boiling. Proc Natl Acad Sci U S A 95:2056–2060, 1998.
63. PM Coutinho, B Henrissat. Carbohydrate-active enzymes: An integrated database approach. In: HJ Gilbert, G Davies, B Henrissat, B Svensson, eds. Recent Advances in Carbohydrate Bioengineering. Cambridge: The Royal Society of Chemistry, 1999, pp 3–12.
64. LP McIntosh, G Hand, PE Johnson, MD Joshi, M Korner, LA Plesniak, L Ziser, WW Wakarchuk, SG Withers. The pKa of the general acid/base carboxyl group of a glycosidase cycles during catalysis: A 13C-NMR study of Bacillus circulans xylanase. Biochemistry 35:9958–9966, 1996.
65. JE Nielsen, G Vriend. Optimizing the hydrogen-bond network in Poisson–Boltzmann equation-based pK(a) calculations. Proteins 43:403–412, 2001.

66. MD Joshi, G Sidhu, JE Nielsen, GD Brayer, SG Withers, LP McIntosh. Dissecting the electrostatic interactions and pH-dependent activity of a family 11 glycosidase. Biochemistry 40:10115–10139, 2001.
67. D Vitkup, D Ringe, GA Petsko, M Karplus. Solvent mobility and the protein "glass" transition. Nat Struct Biol 7:34–38, 2000.
68. BF Rasmussen, AM Stock, D Ringe, GA Petsko. Crystalline ribonuclease A loses function below the dynamical transition at 220 K. Nature 357:423–424, 1992.
69. PK Agarwal, SR Billeter, PT Rajagopalan, SJ Benkovic, S Hammes-Schiffer. Network of coupled promoting motions in enzyme catalysis. Proc Natl Acad Sci U S A 99:2794–2799, 2002.
70. PT Rajagopalan, SJ Benkovic. Preorganization and protein dynamics in enzyme catalysis. Chem Rec 2:24–36, 2002.
71. MM Gromiha, H Uedaira, J An, S Selvaraj, P Prabakaran, A Sarai. ProTherm, thermodynamic database for proteins and mutants: Developments in version 3.0. Nucleic Acids Res 30:301–302, 2002.

10 Details in the Reaction Mechanism of Chitinases

Vincent G. H. Eijsink, Gustav Kolstad, Sigrid Gåseidnes, and Bjørnar Synstad
Agricultural University of Norway, Ås, Norway

Martin G. Peter
University of Potsdam, Golm, Germany

Jens Erik Nielsen*
University of California, San Diego, La Jolla, California, USA

David Komander, Douglas Houston, and Daan M. F. van Aalten
University of Dundee, Dundee, Scotland

*Current affiliation: University College Dublin, Dublin, Ireland.

1 INTRODUCTION

Chitin, a linear polymer of β(1,4)-linked N-acetyl glucosamine (NAG), is one of the most abundant polysaccharides in nature. It occurs in a large number of species, most prominently as a principal structural component of the exoskeleton of insects and crustaceans, as well as in the cell walls of a variety of fungi (1,2). Thus, it is not surprising that chitinolytic enzymes are abundant in nature, occurring in a large variety of organisms ranging from prokaryotes to man. Chitin-containing organisms need chitinases during their normal life cycles, whereas other organisms (mainly bacteria) use these enzymes to exploit chitin as an energy source. Plants produce chitinolytic enzymes as part of their defense against chitin-containing pathogens. Interestingly, the malaria-causing Plasmodium falciparum depends on chitinases during its life cycle (3,4).

Chitinases are of biotechnological interest for several reasons. First, these enzymes may be used for the conversion of chitin and chitosan (partly deacetylated chitin) into chito-oligosaccharides and NAG. The development of enzymatic methods for these conversions is of interest because the presently established chemical procedures for synthesis are highly laborious. Second, chitinases may be applied as inhibitors of chitin-containing pathogenic fungi and insects (5). Third, inhibitors of chitinases are of interest because they may be used to interfere with the life cycles of chitin-containing pathogens, or to interfere with chitinase-dependent transmission mechanisms of parasites (3,4,6).

The use of chitinolytic enzymes as industrial biocatalysts requires the availability of enzymes that are sufficiently stable and active under process conditions. The chito-oligosaccharides that are produced have several potential applications in agriculture and medicine, and there is evidence that the biological activity of these compounds depends on both oligomer length and the sequence of NAG and glucosamine residues (see Ref. 7 and references therein) (8). Thus, during the development of industrial chitinolytic biocatalysts, the substrate specificity and product specificity of the enzymes must also be considered.

Protein engineering studies of chitinases are scarce, but recently, several interesting studies have appeared (9–12). Almost all published studies concern work aimed at unraveling the catalytic mechanism, which is a basis for future efforts in rational redesign. The catalytic mechanism of one important class of chitinases has now been determined in great detail, providing fascinating insights into the complexity of catalysis.

2 CATALYSIS IN GLYCOSIDE HYDROLASES

Chitinases are just one group within an enormous collection of naturally occurring enzymes that are able to hydrolyze glycosidic bonds. These so-called glycoside hydrolases (or glycosidases) have been classified into more than 80 families by Coutinho and Henrissat (see Chapter 2 in this book and http://afmb.cnrs-mrs.fr/CAZY/index.html). Glycosidases are often multidomain proteins, consisting of a catalytic domain and one or more relatively small domains that play a role in interactions with the substrates (13). Families 18 and 19 contain chitinases, whereas several other families contain enzymes called chitosanases. In terms of nomenclature, the difference between chitinases and chitosanases is rather arbitrary because many enzymes classified as chitinases also degrade chitosan. In fact, it has been suggested that family 19 chitinases and some of the chitosanases belong to the same enzyme family (14).

In general, glycosidases act by two fundamentally different catalytic mechanisms that lead to either retention or inversion of the configuration at the anomeric carbon (Fig. 1) (15–19). Retaining enzymes operate via a double displacement mechanism. In the first step, protonation of the glycosidic oxygen by one acidic residue (Glu35 in Fig. 1) and a concomitant nucleophilic attack on the anomeric carbon by another acidic residue (Asp52 in Fig. 1) lead to breakage of the scissile bond and formation of a covalent glycosyl–enzyme intermediate (19). In the second step, this intermediate is hydrolyzed by a water molecule that approaches the anomeric carbon from a position close to that of the original glycosidic oxygen. Inverting enzymes operate via a direct displacement mechanism. Protonation of the glycosidic oxygen and breakage of the scissile bond occur concomitantly with nucleophilic attack by an activated water molecule. This water molecule approaches the anomeric carbon from the other side of the sugar plane, explaining why this mechanism leads to an inversion of the configuration at the anomeric carbon. An important structural difference between retaining and inverting glycoside hydrolases is seen in the distance between the catalytic acid and the catalytic base/nucleophile (Fig. 1), which is approximately 4 Å larger in inverting enzymes. Thus, the active site of inverting enzymes provides sufficient space for a water molecule, which attacks the anomeric center with direct displacement of the glycosidic oxygen (Fig. 1).

Figure 1 Catalytic mechanisms of glycoside hydrolases. The double displacement mechanism (left; residue numbering as in hen egg-white lysozyme) leads to retention of anomeric configuration, whereas direct displacement (right) leads to inversion. (From Refs. 15–19.)

It should be noted that the mechanisms shown in Fig. 1 are, in fact, gross simplifications. Additional amino acid residues besides the catalytic acid are needed for optimal activity, for example, because they contribute to tuning the acidity of this proton donor during the catalytic cycle. The presence and the importance of larger interaction networks in the active sites of glycosidases are well illustrated by recent work on a retaining family 11 xylanase (20,21).

Family 19 chitinases are inverting enzymes, and their active sites resemble the active sites of other inverting glycoside hydrolases. However, in what is probably the most widespread family of chitinases, family 18, catalysis proceeds via a unique substrate-assisted mechanism (22–24) (Fig. 2), which has recently been unraveled in detail by protein engineering and crystallography.

Figure 2 Catalytic mechanism of family 18 chitinases as proposed in Ref. 24 on the basis of results described in Refs. 22 and 23. Family 18 chitinases are retaining enzymes that, however, lack the second acidic residue observed in other retaining glycoside hydrolases (e.g., Asp52 in hen egg-white lysozyme; Fig. 1). The need for this acidic residue is alleviated by anchimeric assistance by the N-acetyl group of the sugar in the −1 position, which leads to formation of an oxazolinium ion intermediate. Note the distortion of the −1 sugar, which is essential for this mechanism to be possible.

3 DETAILS OF THE CATALYTIC MECHANISM OF FAMILY 18 CHITINASES

With respect to mechanism, chitinase B (ChiB) from the soil bacterium Serratia marcescens is, to date, one of the most intensively studied family 18 chitinases (10,25–27). ChiB is an exochitinase containing a catalytic domain with a (βα)8 (“TIM barrel”) fold and a small C-terminal chitin-binding domain, which is likely to be involved in interactions with the substrate (27,28) (Fig. 3A). The substrate-binding cleft contains six subsites, running from −3 (nonreducing end of the substrate) to +3 (reducing end of the substrate). The chitin-binding domain expands the substrate-binding surface toward the “+” (reducing end) direction, which implies that ChiB degrades chitin from the nonreducing ends of the polysaccharide chains. The substrate-binding groove (Fig. 3B) is covered by two loops (Fig. 3A and B) that protrude from the catalytic TIM barrel, thus conferring a tunnel character to the substrate-binding site (see Ref. 17 for background information on active site architecture). The binding of the substrate results in closure of the roof of the tunnel, thus improving interactions with the substrate (10).
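The subsite numbering described above can be made concrete with a small sketch. It assumes the usual glycoside hydrolase convention that there is no subsite 0 and that the scissile bond lies between subsites −1 and +1; the function names and the binding registers used in the example calls are hypothetical and serve only as illustration.

```python
# Illustrative sketch only: subsite labels follow the usual glycoside
# hydrolase convention (no subsite 0; cleavage between -1 and +1).

def subsite_labels(first, last):
    """Return the subsite labels from `first` to `last`, skipping 0."""
    return [s for s in range(first, last + 1) if s != 0]


def cleavage_products(n_sugars, nonreducing_subsite, cleft=(-3, 3)):
    """Sizes of the two products for an oligomer of `n_sugars` units whose
    nonreducing-end sugar sits in `nonreducing_subsite`.

    Returns (glycon_size, aglycon_size): the units bound in minus subsites
    and the units bound in plus subsites (or extending past the cleft on
    the reducing side).
    """
    labels = subsite_labels(*cleft)
    if nonreducing_subsite not in labels or nonreducing_subsite > 0:
        raise ValueError("nonreducing end must occupy a minus subsite")
    glycon = -nonreducing_subsite      # sugars in subsites -k .. -1
    aglycon = n_sugars - glycon        # sugars on the reducing side
    if aglycon < 1:
        raise ValueError("substrate does not span the -1/+1 junction")
    return glycon, aglycon


print(subsite_labels(-3, 3))       # six subsites, -3 .. +3
print(cleavage_products(5, -2))    # pentamer with its end in -2
print(cleavage_products(6, -3))    # hexamer filling the whole cleft
```

In this picture, a chain whose nonreducing end sits in subsite −2 always loses a disaccharide from that end, whatever its total length.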

Figure 3 The structure of ChiB (panels A and B) and ChiA (panel C) from S. marcescens (Refs. 27,51). Panel A: View into the substrate-binding groove of ChiB showing aromatic side chains that interact with the substrate (Refs. 10,27). The most clearly distinguishable subsites are numbered. The chitin-binding domain lies to the right of the catalytic core. The loop blocking the substrate-binding groove beyond the −3 subsite (arrow) and two loops that form the “roof” of the active site tunnel are drawn as sticks. The side chain of the catalytic acid (Glu144) is shown as a ball-and-stick model. Panel B: The active site groove/tunnel of ChiB viewed from the “minus” side (the side where the nonreducing end of the substrate binds; in this view, the chitin-binding domain lies behind the catalytic core). The side chain of Glu144 is shown as a ball-and-stick model. The side chains of Asp316 and Trp97 are shown as sticks; closure of the “roof” of the active site tunnel upon substrate binding involves interactions between these two side chains (Ref. 10). The eight β-strands that make up the core of the (βα)8 barrel are shown as arrows. For clarity, the chitin-binding domain and the loop blocking the substrate-binding groove in front of the −3 subsite were omitted from the figure. Panel C: View into the substrate-binding groove of ChiA, showing aromatic side chains that interact with the substrate (Refs. 12,32). The most clearly distinguishable subsites are numbered. The chitin-binding domain lies to the left of the catalytic core. The side chain of the catalytic acid (Glu315) is shown as a ball-and-stick model. Note that ChiA and ChiB differ in terms of the location of their chitin-binding domains. The substrate-binding groove of ChiA is open on both sides; in ChiB, there is no −4 subsite but an insertion (compared with ChiA; see panel A) that hampers substrate binding beyond the −3 subsite. The pictures were made using PyMOL. (From Ref. 52.)


For comparison, Fig. 3C shows the structure of another chitinase, ChiA, from S. marcescens. This enzyme has a more open active site groove than ChiB and its chitin-binding domain expands the groove on the nonreducing side of the catalytic center. Consequently, ChiA is able to exert some endoactivity (25,26) and, most importantly, its exoactivity results in degradation of the chitin chains from their reducing ends (12,26,27). The main product of chitin hydrolysis by ChiB is NAG2 (25), which, as explained above, is released from the nonreducing end of the polysaccharide chains (27). Crystallography revealed that NAG5 preferably binds to the −2 to +3 subsites (10). However, NAG6 binds to the −3 to +3 subsites (25). Longer (polymeric) substrates, covering the chitin-binding domain, apparently do not occupy the nonreducing end −3 subsite and are cleaved predominantly to yield the disaccharide.

In family 18 chitinases, the catalytic acid that protonates the glycosidic oxygen is generally a glutamic acid residue (9), which, in the case of ChiB, is Glu144. As illustrated in Fig. 4A, family 18 chitinases contain several other conserved acidic residues. Mutation of these residues resulted in severe decreases in activity (9,29,30), but their functions were, until recently, unknown. Farther away from the active site, near Asp140 in ChiB, family 18 chitinases contain two more conserved residues (Tyr10 and Ser93 in ChiB) whose roles had not been studied until recently. Fig. 4B illustrates that these conserved residues play important roles during catalysis, as explained below.

The mutation of Asp140, Asp142, Glu144, and Asp215 to their corresponding amides decreased the catalytic activity of ChiB (Table 1). Similar or even more profound negative effects were observed when these residues were replaced by alanine (30). The alanine and amide mutants displayed similar pH activity profiles, which, in most cases, differed considerably from the pH activity profile of the wild type (WT; see below). This shows that the residual activity found in the amide mutants is not due to deamidation of the introduced amide residue.

Analysis of the basic arm of the pH activity profiles (Fig. 5) of ChiB variants yielded some important insights. Intuitively, one would think that loss of activity at alkaline pH would be caused by deprotonation of the catalytic acid. However, in contrast to all other mutants, the pH activity profile of the E144Q mutant was essentially the same as that of the wild-type enzyme.
Two major types of shifts in pH activity profiles were observed: the D142N mutation yielded an alkaline shift of the pH optimum of at least two units (Fig. 5), whereas in the D140N and D215N mutants, the pH optimum was shifted to the acidic side by at least two units (not shown). The acidic shifts found in D215N and D140N were expected (although difficult to predict and explain quantitatively) because Asp215 and Asp140 are likely to increase the pKa of neighboring ionizable groups (e.g., Asp142 and Glu144). The basic shift in the pH activity profile upon the D142N mutation indicates that the basic arm of the pH activity profile in this mutant is determined by a group with a pKa ≥ 10. The shift also indicates that the basic arm of the pH activity profile in the wild-type enzyme reflects a titration of Asp142. The best candidate for the group determining the (neither visible nor measurable) basic arm of the pH activity profile in D142N is the catalytic acid, Glu144. It is conceivable that, in the enzyme–substrate complex, Glu144 has an exceptionally high pKa because it is shielded from the

Figure 4 Details of the active site of ChiB highlighting important conserved residues in family 18 chitinases. The pictures are taken from the crystal structures of the E144Q mutant (A) and the E144Q mutant in complex with NAG5 (B) (Ref. 10). Upon substrate binding, Asp142 rotates toward Glu/Gln144; the buried and charged side chain of Asp140 is stabilized because: (1) a hydrogen-bonding water molecule (sphere in panel A) is replaced by a better proton donor, namely the hydroxyl group of Tyr10 (which comes closer to Asp140), and (2) the side chain of Ser93 moves closer, thus strengthening the hydrogen bond between Ser93 and Asp140. In panel B, only three of the sugar moieties, occupying subsites +1 to −2, are shown. Dotted lines indicate hydrogen bonds. The arrow in panel B points from Glu144 toward the glycosidic oxygen that is to be protonated.

Table 1 Catalytic Activity of Active Site Mutants of ChiB

Variant      kcat (s−1)         Km (µM)
Wild type    16 ± 4             32 ± 12
D140N        0.051 ± 0.010      52 ± 16
D142N        0.33 ± 0.05        5 ± 3
E144Q        0.0037 ± 0.0005    20 ± 4
D215N        0.92 ± 0.30        45 ± 22

Reactions were conducted at pH 6.0 and 37°C. The substrate, 4-methylumbelliferyl-NAG2, was converted exclusively to (fluorescent) 4-methylumbelliferone and NAG2 (for experimental details, see Refs. 25 and 30).
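To compare the variants in Table 1 at a glance, the mean values can be turned into catalytic efficiencies (kcat/Km) and into fold reductions in kcat relative to the wild type. The sketch below is only an illustration: it uses the mean values from the table, ignores the reported uncertainties, and the names `TABLE_1` and `summarize` are hypothetical.

```python
# kcat in s^-1 and Km in µM, transcribed from Table 1 (mean values only;
# the reported uncertainties are deliberately ignored in this sketch).
TABLE_1 = {
    "Wild type": (16.0, 32.0),
    "D140N": (0.051, 52.0),
    "D142N": (0.33, 5.0),
    "E144Q": (0.0037, 20.0),
    "D215N": (0.92, 45.0),
}


def summarize(table, reference="Wild type"):
    """Return {variant: (kcat/Km, fold reduction in kcat vs. reference)}."""
    ref_kcat, _ = table[reference]
    return {
        name: (kcat / km, ref_kcat / kcat)
        for name, (kcat, km) in table.items()
    }


summary = summarize(TABLE_1)
for name, (efficiency, fold) in summary.items():
    print(f"{name:10s} kcat/Km = {efficiency:.2e} s^-1 µM^-1, "
          f"kcat reduced {fold:.0f}-fold")
```

Because the Km values stay within roughly one order of magnitude while kcat spans several, the mutations appear to affect turnover far more than apparent substrate affinity.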

Figure 5 The pH activity profiles of WT ChiB (.) and the E144Q (E), D142N (D), S93A (o), and Y10F (n) variants. Because the plots are based on kcat, the curves reflect pKa values of residues in the enzyme–substrate complex. (From Ref. 10; Copyright © 2001 National Academy of Sciences, USA.)
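The reasoning about acidic and basic arms of a pH activity profile can be illustrated with the textbook two-pKa model, in which only the singly protonated form of the enzyme–substrate complex is active. This is a generic sketch, not the analysis used in Ref. 10; the pKa values below are hypothetical and were chosen only to show how raising the pKa that governs the basic arm moves the optimum toward alkaline pH.

```python
def relative_activity(ph, pka_acid, pka_base):
    """Bell-shaped pH profile for a singly protonated active form.

    Activity falls on the acidic side below `pka_acid` and on the basic
    side above `pka_base`; the result is a fraction of the limiting rate.
    """
    return 1.0 / (1.0 + 10 ** (pka_acid - ph) + 10 ** (ph - pka_base))


def ph_optimum(pka_acid, pka_base):
    """For this model the maximum lies midway between the two pKa values."""
    return (pka_acid + pka_base) / 2.0


# Hypothetical pKa pairs, chosen only to illustrate the curve shapes.
wild_type_like = (4.0, 8.0)
alkaline_shifted = (6.0, 10.0)

for label, (pka1, pka2) in [("wild-type-like", wild_type_like),
                            ("alkaline-shifted", alkaline_shifted)]:
    optimum = ph_optimum(pka1, pka2)
    print(label, "optimum pH:", optimum,
          "relative activity at pH 9:",
          round(relative_activity(9.0, pka1, pka2), 3))
```

In such a model, a basic arm governed by a group with a very high pKa lies outside the measurable pH range, which is the situation proposed in the text for the D142N variant.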

solvent and because the glycosidic oxygen in its vicinity (3.5 Å) is likely to carry a partial negative charge due to close contacts between the latter and ionized Asp215 (the closest contact between Asp215–Oδ and the glycosidic oxygen is 3.4 Å).

Further insights into the roles of the acidic residues were obtained from structural studies of the E144Q mutant (Fig. 4A), the complex of E144Q with NAG5 (Fig. 4B), the wild type in complex with the pseudotrisaccharide allosamidin,* and a cryo-trapped reaction intermediate obtained by soaking WT crystals in a NAG5 solution (10). The structures of the E144Q–NAG5 and WT–allosamidin complexes showed that Asp142 rotates toward Glu144 upon substrate binding (compare Fig. 4A with Fig. 4B). Rotation of Asp142 is accompanied by movements of residues in the core of the TIM barrel, which bring Tyr10 and Ser93 closer to Asp140 (Fig. 4). The importance of Ser93 and Tyr10 was confirmed by the observation that the Y10F and S93A mutations strongly decreased activity (Fig. 5). Most importantly, the pH activity profiles of the S93A and Y10F mutants looked very similar to that of D142N, suggesting that these three residues act in a concerted fashion during catalysis (Fig. 5).

The E144Q–NAG5 complex also showed that the sugar in the −1 subsite is distorted (Fig. 4B), as had been suggested previously (Fig. 2). The sugar ring is in the boat conformation, whereas the N-acetyl group is frozen by hydrogen bonding with Asp142 in a conformation that places its oxygen atom only 3.0 Å from the anomeric carbon. This results in a nearly colinear orientation of the glycosidic oxygen, the anomeric carbon, and the N-acetyl oxygen, thus setting the stage for an SN2-type displacement. The structure of a cryo-trapped reaction intermediate revealed density in the −1 subsite, which showed a good overlap with the density observed for the allosamizoline group of allosamidin, thus providing evidence for an oxazolinium ion intermediate (Fig. 2). The structure also showed a well-ordered, displaced disaccharide, which occupied a position between the +1/+2 and +2/+3 subsites. Displacement of the product from the catalytic center is essential to provide sufficient space for the approach of a water molecule toward the anomeric carbon from the β-face, which is necessary for completion of the reaction. Indeed, in the structure, a well-ordered water molecule was visible at 3.0 Å from the anomeric carbon (10). Interestingly, the displaced disaccharide interacted with the loops that constitute the “roof” of the active site tunnel (Fig. 3A and B). Thus, it is conceivable that reversal of the roof closure that accompanies substrate binding actively contributes to substrate displacement.

Taken together, the results from mutagenesis, enzymological studies, and crystallographic studies lead to the conclusion that family 18 chitinases act by the mechanism displayed in Fig. 6. The crucial role of Asp142 in this mechanism has been confirmed recently by work on a hexosaminidase (31). In family 18 chitinases, rotation of the protonated Asp142 has three important consequences: (a) the conformation of the N-acetyl group of the NAG residue at the −1 subsite is frozen in a position optimal for nucleophilic attack at the anomeric carbon; (b) the hydrogen bond donated by the OH group of Asp142 increases the acidity of Glu144, thus promoting protonation of the glycosidic oxygen; and (c) the positive charge developing upon formation of the oxazolinium ion is stabilized by tight interactions with Asp142. The presence of a proton on rotated Asp142 is crucial, and the results indicate that the basic arm of the pH activity profile of the wild-type enzyme reflects titration of this residue. It is important to note that acid catalysis by Glu144 is not enhanced only by Asp142, as catalysis is also closely coupled to Asp140, Tyr10, Ser93, and Asp215 (the role of the latter residue is as yet relatively unclear). One of the most important conclusions of this work is that catalysis in family 18 chitinases involves the concerted action of many residues, not all of which are obvious elements of what one would call the “catalytic site” (see Ref. 21 for an analysis of a similarly complex network in another glycosidase).

* Allosamidin is the pseudotrisaccharide N,N′-diacetyl-allosaminobiosyl allosamizoline (50). It binds to subsites −3 to −1 in ChiB and in other family 18 chitinases. The allosamizoline moiety occupies subsite −1 and resembles the proposed oxazolinium ion intermediate shown in Figs. 2 and 6.

Figure 6 Catalytic mechanism of ChiB. (A) Resting enzyme (note that Glu144 is deprotonated in the empty enzyme; protonation occurs as water is displaced by the substrate). (B) Binding of substrate (only the sugar binding to the −1 subsite is shown) causes distortion of the −1 pyranose ring to a boat conformation and rotation of Asp142 toward Glu144; the rotated Asp142 distorts the N-acetyl group, increases the acidity of Glu144, and stabilizes the developing positive charge. (C) The hydrolysis of the oxazolinium ion by an incoming water molecule completes the reaction. The structural location of the amino acid residues, as well as the mechanism employed to stabilize the buried negatively charged Asp140 after rotation of Asp142, are visualized in Fig. 4. See text for further details. (From Ref. 10; Copyright © 2001 National Academy of Sciences, USA.)

4

SUBSTRATE SPECIFICITY AND PRODUCT SPECIFICITY

So far, there have been no reports in the literature regarding rational engineering of chitinase subsites with the aim of changing substrate specificity or product specificity. Such engineering efforts are likely to be needed to obtain biocatalysts that efficiently convert chitin or chitosan into defined oligosaccharides. Our own preliminary (unpublished) observations indicate that interesting results may be obtained, but that rationalization is difficult.

When attempting a rational redesign of the substrate specificity and product specificity of chitinases and other polysaccharide-degrading enzymes, it is important to note that naturally occurring enzymes display fundamental differences that may give important leads for such attempts. For example, ChiA and ChiB from S. marcescens (Fig. 3) share similar catalytic cores but act on chitin in rather different manners. As explained above, the two enzymes degrade chitin chains in different directions. This difference in directionality is likely to have profound effects on substrate affinities in at least some of the subsites because, for example, the −2 site is a “product site” in ChiB (binding one of the sugars in the dimer that is being cleaved off), whereas it is a “substrate site” in ChiA (binding to a sugar that is part of a long chitin chain). Indeed, potentially relevant differences in subsite architecture are readily detectable by a structural comparison of ChiA and ChiB, and these differences are amenable to further studies using site-directed mutagenesis. Although it is reasonable to assume so, there is no proof that ChiA and ChiB are processive enzymes (32). Processive action would render the structural basis of differences in directionality even more interesting.

S. marcescens produces at least one additional chitinase, ChiC, which is an example of another type of family 18 chitinase, presumably having a much more open and shallow active site cleft than ChiA and ChiB. This putative architecture is inferred from sequence comparisons with ChiB and ChiA, which show that major parts of the sequences that make up the “walls” of the active site groove in ChiA and ChiB are lacking in ChiC. Thus, the catalytic core of ChiC may be compared with the plant chitinase, hevamine, for which a crystal structure is known (33) and which is a clear and well-described example of a family 18 chitinase with a shallow active site groove. These chitinases do not interact with individual sugars as intimately as enzymes possessing deep substrate-binding grooves or tunnels. They display little or no activity toward smaller substrates such as NAG3 and NAG4 (26,34) and are thus good candidates as biocatalysts for oligomer production. However, it may be difficult to further modify the specificity of these enzymes (e.g., preferences for acetylated versus nonacetylated sugars) because the limited number of interactions per subsite gives the protein engineer a limited number of residues to play with.

5

PERSPECTIVES FOR ENGINEERING CATALYTIC PROPERTIES

A rational redesign of the catalytic properties of enzymes is still a formidable challenge, especially if the goal is to engineer enzymes that are efficient industrial biocatalysts (see Ref. 35 and references therein). One important challenge is understanding the interplay between binding and catalysis (36), which severely complicates a successful redesign of enzyme active sites. Another major challenge lies in the fact that (long-range) electrostatic interactions are difficult to rationalize, although they play a major role during catalysis (37,38) (see also Chapter 9 by J. E. Nielsen). Finally, dynamics and flexibility, phenomena that are not easy to address experimentally, are extremely important for catalytic efficiency (36,39–43). The work on chitinases described in this chapter provides an example of the complexity of catalysis, illustrating the importance of electrostatic interactions (cf. the Tyr10–Ser93–Asp140–Asp142–Glu144 assembly), concerted substrate binding and distortion, and conformational flexibility (the abovementioned movement of Tyr10 is achieved by backbone movements of up to 2 Å). In principle, the detailed knowledge of catalysis provides a basis for future efforts in a rational redesign of the catalytic properties of chitinases, but the major effect of this new knowledge is probably a strong notion that such redesign is quite a challenge. There is hope though, as illustrated by a considerable number of examples of successful redesign in the literature (44–48). The complexity of catalysis and the lack of detailed knowledge of catalytic mechanisms indicate that, today, probably the fastest general route to the development of enzymes with improved catalytic properties includes the use of the combinatorial approaches discussed in other chapters of this volume (49).
Rational protein engineering studies are, however, essential for understanding how enzymes work, and will eventually create a knowledge base that leads to an increasing success rate in the rational redesign of catalytic properties.

REFERENCES

1. MG Peter. Chitin and chitosan from fungi. In: EJ Vandamme, S De Baets, A Steinbüchel, eds. Biopolymers Vol. 6. Weinheim: Wiley-VCH, 2002, pp 123–157. 2. MG Peter. Chitin and chitosan from animal sources. In: EJ Vandamme, S De Baets, A Steinbüchel, eds. Biopolymers Vol. 6. Weinheim: Wiley-VCH, 2002, pp 481–574. 3. RC Langer, JM Vinetz. Plasmodium ookinete-secreted chitinase and parasite penetration of the mosquito peritrophic matrix. Trends Parasitol 17:269–272, 2001. 4. YL Tsai, RE Hayward, RC Langer, DA Fidock, JM Vinetz. Disruption of Plasmodium falciparum chitinase markedly impairs parasite invasion of mosquito midgut. Infect Immun 69:4048–4054, 2001. 5. A Herrera-Estrella, I Chet. Chitinases in biological control. In: P Jollès, RAA Muzzarelli, eds. Chitin and Chitinases. Basel: Birkhäuser, 1999, pp 171–184. 6. DR Houston, K Shiomi, N Arai, S Omura, MG Peter, A Turberg, B Synstad, VGH Eijsink, DMF van Aalten. High-resolution structures of a chitinase complexed with natural product cyclopentapeptide inhibitors: mimicry of carbohydrate substrate. Proc Natl Acad Sci USA 99:9127–9132, 2002. 7. S Bahrke, JM Einarsson, J Gislason, S Haebel, MC Letzel, J Peter-Katalinic, MG Peter. Sequence analysis of chitooligosaccharides by matrix-assisted laser desorption ionization postsource decay mass spectrometry. Biomacromolecules 3:696–704, 2002. 8. MC Letzel, B Synstad, VGH Eijsink, J Peter-Katalinic, MG Peter. Libraries of chito-oligosaccharides of mixed acetylation patterns and their interactions with chitinases. In: MG Peter, A Domard, RAA Muzzarelli, eds. Advances in Chitin Science Vol. 4. Potsdam: Universität Potsdam, 2000, pp 545–557. 9. T Watanabe, K Kobori, K Miyashita, T Fujii, H Sakai, M Uchida, H Tanaka. Identification of glutamic acid-204 and aspartic acid-200 in chitinase-A1 of Bacillus circulans WL-12 as essential residues for chitinase activity. J Biol Chem 268:18567–18572, 1993. 10. DMF van Aalten, D Komander, B Synstad, S Gaseidnes, MG Peter, VGH Eijsink. Structural insights into the catalytic mechanism of a family 18 exochitinase. Proc Natl Acad Sci USA 98:8979–8984, 2001.
11. E Bokma, HJ Rozeboom, M Sibbald, BW Dijkstra, JJ Beintema. Expression and characterization of active site mutants of hevamine, a chitinase from the rubber tree Hevea brasiliensis. Eur J Biochem 269:893–901, 2002. 12. Y Papanikolau, G Prag, G Tavlas, CE Vorgias, AB Oppenheim, K Petratos. High resolution structural analyses of mutant chitinase A complexes with substrates provide new insight into the mechanism of catalysis. Biochemistry 40:11338–11343, 2001. 13. Y Bourne, B Henrissat. Glycoside hydrolases and glycosyltransferases: families and functional modules. Curr Opin Struct Biol 11:593–600, 2001. 14. AF Monzingo, EM Marcotte, PJ Hart, JD Robertus. Chitinases, chitosanases, and lysozymes can be divided into procaryotic and eucaryotic families sharing a conserved core. Nat Struct Biol 3:133–140, 1996. 15. DE Koshland. Stereochemistry and the mechanism of enzymatic reactions. Biol Rev Camb Philos Soc 28:416–436, 1953. 16. JD McCarter, SG Withers. Mechanisms of enzymatic glycoside hydrolysis. Curr Opin Struct Biol 4:885–892, 1994.


17. G Davies, B Henrissat. Structures and mechanisms of glycosyl hydrolases. Structure 3:853–859, 1995. 18. JC Uitdehaag, R Mosi, KH Kalk, BA van der Veen, L Dijkhuizen, SG Withers, BW Dijkstra. X-ray structures along the reaction pathway of cyclodextrin glycosyltransferase elucidate catalysis in the alpha-amylase family. Nat Struct Biol 6:432–436, 1999. 19. DJ Vocadlo, GJ Davies, R Laine, SG Withers. Catalysis by hen egg-white lysozyme proceeds via a covalent intermediate. Nature 412:835–838, 2001. 20. LP McIntosh, G Hand, PE Johnson, MD Joshi, M Korner, LA Plesniak, L Ziser, WW Wakarchuk, SG Withers. The pKa of the general acid/base carboxyl group of a glycosidase cycles during catalysis: a 13C-NMR study of Bacillus circulans xylanase. Biochemistry 35:9958–9966, 1996. 21. MD Joshi, G Sidhu, JE Nielsen, GD Brayer, SG Withers, LP McIntosh. Dissecting the electrostatic interactions and pH-dependent activity of a family 11 glycosidase. Biochemistry 40:10115–10139, 2001. 22. AC Terwisscha van Scheltinga, A Armand, KH Kalk, A Isogai, B Henrissat, BW Dijkstra. Stereochemistry of chitin hydrolysis by a plant chitinase/lysozyme and x-ray structure of a complex with allosamidin: evidence for substrate assisted catalysis. Biochemistry 34:15619–15623, 1995. 23. I Tews, A Perrakis, A Oppenheim, Z Dauter, KS Wilson, CE Vorgias. Bacterial chitobiase structure provides insight into catalytic mechanism and the basis of Tay–Sachs disease. Nat Struct Biol 3:638–648, 1996. 24. I Tews, AC Terwisscha van Scheltinga, A Perrakis, KS Wilson, BW Dijkstra. Substrate-assisted catalysis unifies two families of chitinolytic enzymes. J Am Chem Soc 119:7954–7959, 1997. 25. MB Brurberg, IF Nes, VGH Eijsink. Comparative studies of chitinases A and B from Serratia marcescens. Microbiology 142:1581–1589, 1996. 26. K Suzuki, N Sugawara, M Suzuki, T Uchiyama, F Katouno, N Nikaidou, T Watanabe. Chitinases A, B, and C1 of Serratia marcescens 2170 produced by recombinant Escherichia coli: enzymatic properties and synergism on chitin degradation. Biosci Biotechnol Biochem 66:1075–1083, 2002.
27. DMF van Aalten, B Synstad, MB Brurberg, E Hough, BW Riise, VGH Eijsink, RK Wierenga. Structure of a two-domain chitotriosidase from Serratia marcescens at 1.9-angstrom resolution. Proc Natl Acad Sci USA 97:5842–5847, 2000. 28. T Ikegami, T Okada, M Hashimoto, S Seino, T Watanabe, M Shirakawa. Solution structure of the chitin-binding domain of Bacillus circulans WL-12 chitinase A1. J Biol Chem 275:13654–13661, 2000. 29. T Watanabe, M Uchida, K Kobori, H Tanaka. Site-directed mutagenesis of the Asp-197 and Asp-202 residues in chitinase A1 of Bacillus circulans WL-12. Biosci Biotechnol Biochem 58:2283–2285, 1994. 30. B Synstad, S Gåseidnes, G Vriend, JE Nielsen, VGH Eijsink. On the contribution of conserved acidic residues to catalytic activity of chitinase B from Serratia marcescens. In: MG Peter, A Domard, RAA Muzzarelli, eds. Advances in Chitin Science Vol. 4. Potsdam: Universität Potsdam, 2000, pp 524–529.


31. SJ Williams, B Mark, DJ Vocadlo, MN James, SG Withers. Aspartate 313 in the Streptomyces plicatus hexosaminidase plays a critical role in substrate assisted catalysis by orienting the 2-acetamido group and stabilizing the transition state. J Biol Chem 277:40055–40065, 2002. 32. T Uchiyama, F Katouno, N Nikaidou, T Nonaka, J Sugiyama, T Watanabe. Roles of the exposed aromatic residues in crystalline chitin hydrolysis by chitinase A from Serratia marcescens 2170. J Biol Chem 276:41343–41349, 2001. 33. AC Terwisscha van Scheltinga, KH Kalk, JJ Beintema, BW Dijkstra. Crystal structures of hevamine, a plant defence protein with chitinase and lysozyme activity, and its complex with an inhibitor. Structure 2:1181–1189, 1994. 34. E Bokma, T Barends, AC Terwisscha van Scheltinga, BW Dijkstra, JJ Beintema. Enzyme kinetics of hevamine, a chitinase from the rubber tree Hevea brasiliensis. FEBS Lett 478:119–122, 2000. 35. S Gåseidnes, B Synstad, JE Nielsen, VGH Eijsink. Rational engineering of the stability and the catalytic performance of enzymes. J Mol Catal B Enzym 21:3–8, 2003. 36. AR Fersht. Structure and Mechanism in Protein Science. New York: WH Freeman, 1998. 37. SE Jackson, AR Fersht. Contribution of long-range electrostatic interactions to the stabilization of the catalytic transition state of the serine protease subtilisin BPN′. Biochemistry 32:13909–13916, 1993. 38. A de Kreij, B van den Burg, G Venema, G Vriend, VGH Eijsink, JE Nielsen. The effects of modifying the surface charge on the catalytic activity of a thermolysin-like protease. J Biol Chem 277:15432–15438, 2002. 39. HR Faber, BW Matthews. A mutant T4 lysozyme displays five different crystal conformations. Nature 348:263–266, 1990. 40. M Gerstein, AM Lesk, C Chothia. Structural mechanisms for domain movements in proteins. Biochemistry 33:6739–6749, 1994. 41. OR Veltman, VGH Eijsink, G Vriend, A deKreij, G Venema, B van den Burg. Probing catalytic hinge bending motions in thermolysin-like proteases by glycine→alanine mutations. Biochemistry 37:5305–5311, 1998.
42. MJ Osborne, J Schnell, SJ Benkovic, HJ Dyson, PE Wright. Backbone dynamics in dihydrofolate reductase complexes: role of loop flexibility in the catalytic mechanism. Biochemistry 40:9846–9859, 2001. 43. EZ Eisenmesser, DA Bosco, M Akke, D Kern. Enzyme dynamics during catalysis. Science 295:1520–1523, 2002. 44. HM Wilks, KW Hart, R Feeney, CR Dunn, H Muirhead, WN Chia, DA Barstow, T Atkinson, AR Clarke, JJ Holbrook. A specific, highly active malate dehydrogenase by redesign of a lactate dehydrogenase framework. Science 242:1541–1544, 1988. 45. JJ Perona, L Hedstrom, WJ Rutter, RJ Fletterick. Structural origins of substrate discrimination in trypsin and chymotrypsin. Biochemistry 34:1489–1499, 1995. 46. E Quemeneur, M Moutiez, JB Charbonnier, A Menez. Engineering cyclophilin into a proline-specific endopeptidase. Nature 391:301–304, 1998.


47. F Cedrone, A Menez, E Quemeneur. Tailoring new enzyme functions by rational redesign. Curr Opin Struct Biol 10:405–410, 2000. 48. D Becker, C Braet, H Brumer III, M Claeyssens, C Divne, BR Fagerstrom, M Harris, TA Jones, GJ Kleywegt, A Koivula, S Mahdi, K Piens, ML Sinnott, J Stahlberg, TT Teeri, M Underwood, G Wohlfahrt. Engineering of a glycosidase family 7 cellobiohydrolase to more alkaline pH optimum: the pH behaviour of Trichoderma reesei Cel7A and its E223S/A224H/L225V/T226A/D262G mutant. Biochem J 356:19–30, 2001. 49. FH Arnold. Combinatorial and computational challenges for biocatalyst design. Nature 409:253–257, 2001. 50. S Sakuda, A Isogai, S Matsumoto, A Suzuki. Search for microbial insect growth regulators: II. Allosamidin, a novel insect chitinase inhibitor. J Antibiot (Tokyo) 40:296–300, 1987. 51. A Perrakis, I Tews, Z Dauter, AB Oppenheim, I Chet, KS Wilson, CE Vorgias. Crystal-structure of a bacterial chitinase at 2.3-angstrom resolution. Structure 2:1169–1180, 1994. 52. WL DeLano. The PyMOL Molecular Graphics System. San Carlos, CA, USA: DeLano Scientiﬁc 2002 (www.pymol.org).

11 Kinetic Evolution to the Catalytic Core of the Bacterial Phosphotriesterase Frank M. Raushel Texas A&M University College Station, Texas, U.S.A.

1

AMIDOHYDROLASE SUPERFAMILY

The amidohydrolase superfamily is a related group of enzymes that catalyze the hydrolysis of bonds to carbonyl and phosphoryl centers. The most prominent members of this family of proteins include urease (URE), phosphotriesterase (PTE), adenosine deaminase (ADA), dihydroorotase (DHO), and atrazine chlorohydrolase (1). The reactions catalyzed by some of these enzymes are illustrated in Sch. 1. Structurally, all of these proteins have been shown to fold into a typical (βα)8-barrel motif, although the level of overall sequence identity is rather low. The hallmark of this family of enzymes is a cluster of four histidine residues that come together in three-dimensional space to form a highly structured binding site for divalent metal ions (2–4). The most common arrangement is a binuclear metal center, as observed in the x-ray crystal structures of URE, PTE, DHO, and the phosphotriesterase homology protein (PHP), although a mononuclear metal binding site has been observed with ADA (5). Within the binuclear metal ion clusters, there are always two ligands that bridge the two metal ions: a hydroxide from solvent and a carboxylate group. In the case of URE, PTE, and DHO, the bridging carboxylate originates from the side chain of a conserved lysine residue that has reacted with CO2 to form a carbamate functional group (2–4). In PHP, the bridging group is contributed by the side chain of a glutamate residue (6). A cartoon of the binuclear metal center in PTE is shown in Fig. 1.

Scheme 1 Reactions catalyzed by members of the amidohydrolase superfamily.

Figure 1 Representation of the binuclear metal center within the active site of phosphotriesterase. (From Ref. 10.)

2

CHEMICAL MECHANISM

The apparent role of the metal centers within the active sites of these enzymes is to activate the hydrolytic water molecule and substrate for nucleophilic attack. The actual chemical transformation is best understood in the reaction catalyzed by DHO because an x-ray crystal structure was determined with the substrate and product bound to separate monomers within the dimeric protein (4). The proposed reaction mechanism is summarized in Sch. 2 for the hydrolytic cleavage of dihydroorotate. In this chemical mechanism, the binding of dihydroorotate to the active site polarizes the carbonyl group via ligation to the β-metal ion. This binding interaction weakens the coordination of the bridging hydroxide to the β-metal (as evidenced by the longer bond to the β-metal ion relative to the α-metal ion). The hydroxide attacks the polarized carbonyl group, with assistance from Asp250, to form a tetrahedral adduct that now bridges the two divalent cations. Proton transfer from the protonated form of Asp250 to the incipient amide nitrogen initiates the collapse of this intermediate. Carbamoyl aspartate departs the active site and the binuclear metal center is recharged with a hydroxide ion from solvent. Similar mechanisms have been proposed for other members of the amidohydrolase superfamily.

Scheme 2 Catalytic reaction mechanism for dihydroorotase.

It would appear that this family of enzymes has evolved as a "delivery device" for the nucleophilic attack of hydroxide on target substrates. The architecture for the metal center has remained remarkably intact, but the individual active sites have been tailored through molecular evolution to recognize a specific set of substrates and associated functional groups for binding and chemical cleavage. Therefore, the amidohydrolase superfamily of enzymes offered a rather attractive target with which to test the limits for a rational reconstruction of an active site. Modulation of the substrate selectivity and stereoselectivity of PTE through site-directed mutagenesis was used as a stringent test of this proposition.

3 PHOSPHOTRIESTERASE

A bacterial version of phosphotriesterase (also known as organophosphate hydrolase or OPH) has been discovered in strains of Pseudomonas and Flavobacterium (7,8). The Flavobacterium isolate was originally identified from a rice paddy in the Philippines where bacterial soil samples had been tested for their ability to hydrolyze specific organophosphate insecticides (7). The gene encoding the enzyme was identified, cloned, and overexpressed in Escherichia coli, and the protein was purified to homogeneity (9). The three-dimensional x-ray structure of PTE has been determined by the Holden laboratory to very high resolution (10).

4 REACTION MECHANISM

Bacterial PTE hydrolyzes a variety of organophosphate triesters of the type shown in Sch. 1, using the insecticide paraoxon as an example. Which substituent functions as the leaving group depends strongly on its pKa; thus, with paraoxon, only the p-nitrophenol group is cleaved from the phosphorus center (11). The enzyme does not hydrolyze diesters at an appreciable rate and thus only a single substituent is subjected to cleavage (12). The substrate specificity of the native enzyme is reasonably broad in that the phosphoryl oxygen can be substituted with sulfur and the other three substituents can be replaced with a variety of other groups (13).

The native bacterial phosphotriesterase served as an ideal candidate for the directed reconstruction of a substrate-binding site. The breadth of substrates recognized by the amidohydrolase superfamily of enzymes convinced us that the structural fold of the (βα)8 barrel could accommodate a variety of perturbations to the specific interactions between proteins and substrates. Moreover, the binuclear metal center within these proteins demonstrated that hydroxide could be delivered to a variety of trigonal and tetrahedral reaction centers. The construction of mutant variants of PTE would be quite useful in the detoxification and detection of chemical warfare agents and agricultural insecticides. Our initial objective was to identify the structural determinants of substrate specificity for the wild-type protein and then to utilize this information to construct mutant forms of PTE where substrate specificity would be enhanced for specific substrates.

Scheme 3 Generic substrate for phosphotriesterase.

The initial substrate library was a series of organophosphate triesters bearing a p-nitrophenol leaving group of the type presented in Sch. 3. The p-nitrophenyl group was chosen because of the ease with which the kinetics of hydrolysis could be monitored spectrophotometrically. The substituents X and Y could be varied with large and small alkyl groups through straightforward synthetic procedures, and chiral substrates could be constructed of either stereochemistry. Altogether, 16 such substrates were prepared using all possible combinations of methyl, ethyl, isopropyl, and phenyl groups. Shown in Fig. 2 are the relative kcat values for the four possible achiral variants of the target substrate (Y and X are the same substituents). These studies demonstrated that the wild-type protein accepted any of the four substituents in either the proS or proR position, but that not all of these substituents were accommodated in the same way by the protein (10).

Figure 2 Relative values for kcat for the wild-type phosphotriesterase with achiral substrate analogs. (From Ref. 11.)

Kinetic assays of the six pairs of racemic mixtures demonstrated that the wild-type enzyme exhibited a distinct preference for one stereoisomer over the other, as shown in Fig. 3. In every case, except for the pair of methyl and ethyl, there was a >20-fold preference for one isomer over the other and this catalytic preference rose to about 100:1 for the methyl phenyl substrate. Kinetic assays with the individual enantiomers demonstrated that the SP-stereoisomer, in every case examined, was preferred over the RP-enantiomer for this series of chiral substrates. If this preference is defined in terms of steric bulk, then the "large" substituent is preferred in the proS position whereas the "smaller" substituent is preferred in the proR position, as illustrated in Sch. 3.

Figure 3 Relative values for kcat for the wild-type phosphotriesterase with chiral substrate analogs. The values for SP enantiomers are shown in black whereas the RP enantiomers are shown in gray. (From Refs. 16,17.)

5 KINETIC ENGINEERING OBJECTIVES

Our immediate goals for the mutagenesis of PTE were focused on a rational rearrangement of the active site binding cavity such that the inherent stereoselectivity using the aforementioned library of 16 model substrates could be manipulated. Thus, we were interested in enhancing the stereoselectivity possessed by the wild-type enzyme such that the catalytic preference for the SP isomer would be even more pronounced. The construction of mutants of this type would be quite useful, in a practical sense, for the kinetic resolution of racemic mixtures through the hydrolysis of a single stereoisomer while leaving the other enantiomer intact (14). Second, we were interested in relaxing the stereoselectivity of the wild-type enzyme. The goal here was to make the initially slower RP isomer as fast as the SP isomer (rather than making the SP isomer as slow as the RP isomer). Such mutants would be appropriate for the detoxiﬁcation of racemic mixtures of organophosphate triesters where both isomers are toxic. Mutants of this type would also be useful as an initial stepping stone for the ﬁnal objective, which was to create mutants where the stereoselective preference was reversed. With such mutants, the RP isomer would be hydrolyzed in preference to the SP isomer. In order to accomplish this goal, the SP isomers must be made poorer substrates while, simultaneously, the RP isomers must be made much better.
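The kinetic resolution mentioned above can be illustrated with a minimal sketch. It assumes the standard competitive model in which both enantiomers of a racemate are hydrolyzed by the same enzyme under kcat/Km conditions, so that the slow isomer is consumed in proportion to the fast isomer raised to the power 1/selectivity. The function names and the selectivity values (20 and 100) are illustrative assumptions chosen to be of the same order as the preferences described for the wild-type enzyme; they are not data or methods taken from the study.

```python
def slow_fraction_left(selectivity, fast_fraction_left):
    """Fraction of the slow enantiomer remaining when a given fraction of
    the fast enantiomer remains.

    For two enantiomers competing for one enzyme, the ratio of their rates
    is selectivity * [fast] / [slow], which integrates to
    ln(slow/slow0) = ln(fast/fast0) / selectivity.
    """
    return fast_fraction_left ** (1.0 / selectivity)


def ee_remaining(selectivity, fast_fraction_left):
    """Enantiomeric excess of the unreacted substrate (slow isomer in excess)."""
    slow = slow_fraction_left(selectivity, fast_fraction_left)
    return (slow - fast_fraction_left) / (slow + fast_fraction_left)


def conversion(selectivity, fast_fraction_left):
    """Overall fraction of an initially 1:1 racemate that has been hydrolyzed."""
    slow = slow_fraction_left(selectivity, fast_fraction_left)
    return 1.0 - (slow + fast_fraction_left) / 2.0


if __name__ == "__main__":
    # Hypothetical selectivities; hydrolysis is stopped when 1% of the
    # fast (SP) isomer is left.
    for selectivity in (20.0, 100.0):
        ee = ee_remaining(selectivity, 0.01)
        conv = conversion(selectivity, 0.01)
        print(f"E = {selectivity:g}: ee = {ee:.3f} at conversion {conv:.3f}")
```

Under this model, a more selective enzyme leaves the slow isomer at a given enantiomeric purity after sacrificing less of it, which is why enhancing the preference for the SP isomer is attractive for preparative isolation of the RP isomer.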

Scheme 4 Cartoon of the substrate binding pocket for phosphotriesterase.

To accomplish these objectives, we set out to retool the active site in a semirational manner. Single-site mutants were constructed sequentially and then specific mutations were combined with one another to achieve the desired effect. Our starting premise for this endeavor was based on the assumption that alterations to substrate specificity could be accommodated by the expansion and contraction of the individual subsites for each of the three substituents attached to the phosphorus core. A cartoon showing these three subsites is illustrated in Sch. 4. An additional assumption for this endeavor was that only one of these three subsites would be properly oriented for the expulsion of the leaving group. Therefore, the remaining two subsites would define the substrate specificity and stereospecificity for PTE. The most obvious problem here is that substrate binding can occur in any one of the three possible orientations and thus there is the potential for nonproductive binding.

6 IDENTIFICATION OF SUBSITES

In order to identify those amino acid side chains that came together in three-dimensional space to form the individual subsites for the substrate, Vanhooke et al. (15) (University of Wisconsin) solved the structure of PTE bound to the nonhydrolyzable substrate analog shown in Sch. 4. From the x-ray structure, it was concluded that the proS ethoxy group was oriented in what we defined as the leaving group pocket. The remaining two substituents, the methylbenzyl group and the proR ethoxy group, were oriented within the large and small pockets, respectively. The large and small pockets were named to acknowledge the stereoselective preference exhibited by the wild-type enzyme for the initial library of organophosphate esters (13). The residues that surrounded the leaving group pocket included W131, F132, F306, and Y309, whereas those that comprised the large pocket included H254, H257, L271, and M317. The small pocket was defined by the side chains of G60, I106, L303, and S308. However, it should be noted that many of these residues are actually localized between these subsites, and thus the assignments are to some extent arbitrary (15).

7 CONTRACTION OF SMALL SUBSITES

In order to construct mutants of PTE that were more stereoselective than the wild-type enzyme, we anticipated that the small subsite would have to be reduced in size. This reduction in the cavity size for the small subsite would likely obstruct or impair the positioning of substrates with bulky groups that would be required to bind within this region of the active site. Of the four residues that were probed in this manner, the mutation of Gly60 to an alanine proved to be the most effective. Shown in Fig. 4 is a direct comparison of the stereoselectivity (ratio of kcat/Km values for the SP and RP isomers) for the wild-type and G60A mutant (16). The results are extraordinary considering that only a single –CH2– group has been added to a sea of nearly 2000 carbon atoms. In every case, the G60A mutant is considerably more stereoselective than the wild-type enzyme. For example, the SP isomer of methyl ethyl p-nitrophenyl phosphate is hydrolyzed 10 times faster than the RP isomer, whereas no difference in the rate of hydrolysis for these enantiomers was observed with the wild-type enzyme. Moreover, the ratio of kcat/Km values for the two enantiomers of methyl phenyl p-nitrophenyl phosphate increased from 20:1 to 10,000:1 with the mutant G60A. This mutant has proven to be quite effective in the kinetic resolution of organophosphate triesters (14). Gram quantities of single RP isomers with ee values >98% have been obtained in a few minutes with this enzyme.

Figure 4 Ratios of kcat/Km for chiral substrate analogs with the wild-type and G60A mutant of phosphotriesterase. The ratios are given for SP/RP. (From Ref. 16.)

8 RELAXATION OF STEREOSELECTIVITY

In order to relax the stereoselectivity of the wild-type PTE, our approach was to enlarge the cavity space of the small subsite by mutation of residues within this site to either alanine or glycine. A simple alanine scan of residues C59, G60, S61, I106, W131, F132, H254, H257, L271, L303, F306, S308, Y309, and M317 showed that a signiﬁcant increase in the overall rate of hydrolysis of the initially slower RP isomers could be realized when some of these residues are changed to alanine (16). In general, the initially slower RP isomer has gotten faster in every case. Those residues that had the greatest overall impact in the improvement of the rate for the initially slower RP isomers were found to be I106, F132, and S308 (16). Further improvements in the relaxation of stereoselectivity were achieved by the construction of glycine mutants at the critical residue positions and through the combination of multiple alanine or glycine mutants at selected residue positions (17).
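To give a sense of the size of the design space implied by combining alanine and glycine substitutions, the following minimal sketch enumerates every variant that carries Ala, Gly, or the wild-type residue at the three positions singled out above (I106, F132, S308). The enumeration scheme and all names in the code are illustrative assumptions; the set of mutants actually constructed in the study is not specified here and was not necessarily exhaustive.

```python
from itertools import product

# Small-subsite positions highlighted in the text, with their wild-type residues.
POSITIONS = {106: "I", 132: "F", 308: "S"}
# Side-chain-truncating substitutions considered at each position.
SUBSTITUTIONS = ("A", "G")


def enumerate_variants():
    """Return names of all single, double, and triple Ala/Gly variants."""
    options = []
    for position, wild_type in POSITIONS.items():
        # None stands for "leave this position as wild type".
        options.append(
            [None] + [f"{wild_type}{position}{new}" for new in SUBSTITUTIONS]
        )
    variants = []
    for combo in product(*options):
        mutations = [m for m in combo if m is not None]
        if mutations:  # skip the all-wild-type combination
            variants.append("/".join(mutations))
    return variants


if __name__ == "__main__":
    names = enumerate_variants()
    print(len(names), "variants, for example:", names[:3])
```

Three choices at each of three positions give 27 combinations, one of which is the wild type, so even this small set of positions yields 26 candidate mutants; this is one reason a sequential, semirational strategy is attractive.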

9 REVERSAL IN SPECIFICITY

In order to reverse the stereoselectivity inherent within the wild-type PTE, two adjustments to the active site needed to be accomplished simultaneously. The small subsite must be expanded whereas the large subsite must be shrunk in size. The constriction of the large subsite was initiated in an attempt to make it more difficult for the larger and bulkier groups to properly fit within this portion of the active site. If effective, this would reduce the rate of hydrolysis of the SP stereoisomers, relative to the values exhibited by the wild-type enzyme. The overall dimensional space of the large subsite was reduced by replacing H254, H257, L271, and M317 with the larger aromatic residues tyrosine, phenylalanine, or tryptophan. Shown in Fig. 5 are the effects of these mutations on the kcat values for the hydrolysis of methyl phenyl p-nitrophenyl phosphate. The kcat value has been reduced from a value that exceeds 40,000 s⁻¹ for the wild-type enzyme to a value that is about 200 s⁻¹ for the H254F mutant. Overall, the most interesting mutant within this series of modified enzymes was H257Y (17). The kinetic constants for the SP isomers for the six chiral organophosphates were all reduced, relative to those of the wild-type enzyme. The largest reductions were observed for those compounds containing a single phenyl substituent. Therefore, the H254Y mutation at the large subsite was combined with the mutations previously made within the small subsite in the rational search for novel proteins where the stereoselectivity was the opposite to that of the wild-type enzyme.

Figure 5 Diminution in the value of kcat for the SP enantiomer of methyl phenyl p-nitrophenyl phosphate when the indicated residues within the large subsite of phosphotriesterase are mutated. (From Ref. 17.)

We had demonstrated that enlargement of the small subsite with the substitution of glycine and/or alanine residues for I106, F132, and S308 resulted in significant improvements in the rates of hydrolysis for most of the initially slower RP enantiomers of the substrate library. However, the mutations made to the small subsite had much smaller effects on the rates of hydrolysis for the initially faster SP enantiomers. In contrast, reduction in the size of the large subsite with the mutant H254Y resulted in the diminution in the kinetic parameters for most of the faster SP enantiomers but relatively smaller effects on the kinetic parameters for the initially slower RP enantiomers. These results indicated that it should be possible to create variants of the native PTE that could reverse stereoselectivity by modulation of the sizes of the large and small subsites simultaneously, if the effects at the individual sites were additive. A total of 11 mutants were constructed in an attempt to reshape the structure of the small and large subsites simultaneously (17). Mutant enzymes were identified for the reversal of each pair of stereoisomers, with the single exception of ethyl isopropyl p-nitrophenyl phosphate. The most dramatic example is the case for the two stereoisomers of the substrate isopropyl phenyl p-nitrophenyl phosphate. The wild-type enzyme prefers the SP isomer by a factor of 35 whereas the mutant I106G/H254Y/S308G prefers the RP stereoisomer by a factor of 460. The enhancements in the rates of hydrolysis for the RP isomers caused by these mutants were very similar to those observed with the glycine and alanine mutants of I106, F132, and S308 that only enlarged the small subsite.

10 SUMMARY

The investigation of the enantiomeric selectivity of PTE is of considerable practical significance. A variety of toxic pesticides and chemical warfare agents are phosphorus compounds that contain a chiral phosphorus center (18). Previous studies have shown that the more toxic isomers of these acetylcholinesterase inactivators are the poorer substrates of the wild-type PTE (19). This study has clearly demonstrated that the reactivity and stereoselectivity of PTE can be enhanced, relaxed, or reversed by the rational evolution of specific active site residues. The enhancement and reversal of stereoselectivity have made it possible to utilize variants of PTE for the kinetic resolution of racemic mixtures of chiral organophosphates and to obtain either isomer with substantial enantiomeric excess (14). The relaxation of stereoselectivity is desired for bioremediation when catalysts are needed to efficiently detoxify hazardous pesticides and chemical warfare agents. The overall success of this effort, directed at the modulation of the kinetic properties of the wild-type enzyme, is graphically presented in Fig. 6. The relative kinetic parameters for ethyl phenyl p-nitrophenyl phosphate are provided for the wild-type enzyme and for the best mutant enzymes in which the stereoselectivity has been enhanced, relaxed, or reversed. Enhancements in stereoselectivity for the preferred SP enantiomer of up to three orders of magnitude have been achieved by the mutant G60A for all substrates tested. Multiple mutations within the active site have led to a complete reversal of the original chiral selectivity. These results suggest that further mutations within the active site could be engineered to accommodate nearly any organophosphate.

Figure 6 Manipulation of the stereoselectivity of phosphotriesterase for the chiral forms of ethyl phenyl p-nitrophenyl phosphate. The ratios of kcat/Km for the wild-type and selected mutants where the stereoselectivity has been enhanced (G60A), relaxed (I106G/F132G/S308G), and reversed (I106G/F132G/H257Y/S308G) are presented.
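The selectivity ratios quoted in this chapter can be put on an energy scale with a short calculation. The sketch below converts a ratio of kcat/Km values into the corresponding difference in transition-state free energy, using ΔΔG‡ = RT ln(ratio). The temperature of 25 °C and the function name are assumptions made for illustration; the ratios themselves (20:1 and 10,000:1 for methyl phenyl p-nitrophenyl phosphate, and 35 versus 460 for the isopropyl phenyl substrate) are those stated in the text.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def ddg_from_ratio(ratio, temperature_k=298.15):
    """Free-energy difference (kJ/mol) equivalent to a kcat/Km ratio.

    The 25 degC default temperature is an assumption, not a value
    taken from the chapter.
    """
    return R_KJ * temperature_k * math.log(ratio)


if __name__ == "__main__":
    wild_type = ddg_from_ratio(20.0)       # SP/RP, wild type, methyl phenyl
    g60a = ddg_from_ratio(10_000.0)        # SP/RP, G60A, methyl phenyl
    print(f"wild type: {wild_type:.1f} kJ/mol, G60A: {g60a:.1f} kJ/mol")
    print(f"gain from one added methylene: {g60a - wild_type:.1f} kJ/mol")

    # Reversal for isopropyl phenyl: from 35-fold for SP to 460-fold for RP,
    # that is, an overall swing of 35 * 460 in the selectivity ratio.
    swing = ddg_from_ratio(35.0 * 460.0)
    print(f"swing on reversal: {swing:.1f} kJ/mol")
```

On this scale, changes in selectivity of several orders of magnitude correspond to energy differences of a few tens of kJ/mol at most, which is comparable to a small number of noncovalent contacts and is consistent with the large effects seen for single-residue changes.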

ACKNOWLEDGMENTS

Financial support for this project has been obtained from the National Institutes of Health, Office of Naval Research, and the Advanced Technology Program from the state of Texas.

REFERENCES

1. L Holm, C Sander. An evolutionary treasure: Unification of a broad set of amidohydrolases related to urease. Proteins 28:72–82, 1997.
2. E Jabri, MB Carr, RP Hausinger, PA Karplus. The crystal structure of urease from Klebsiella aerogenes. Science 268:998–1004, 1995.
3. MM Benning, JM Kuo, FM Raushel, HM Holden. Three-dimensional structure of the binuclear metal center of phosphotriesterase. Biochemistry 34:7973–7978, 1995.
4. JB Thoden, GN Phillips, TM Neal, FM Raushel, HM Holden. Molecular structure of dihydroorotase: A paradigm for catalysis through the use of a binuclear metal center. Biochemistry 40(24):6989–6997, 2001.
5. DK Wilson, FA Quiocho. A pre-transition-state mimic of an enzyme: X-ray structure of adenosine deaminase with bound 1-deazaadenosine and zinc-activated water. Biochemistry 32(7):1689–1694, 1993.
6. JL Buchbinder, RC Stephenson, MJ Dresser, TS Scanlan, RJ Fletterick. Biochemical characterization and crystallographic structure of an Escherichia coli protein from the phosphotriesterase gene family. Biochemistry 37(15):17445–17450, 1998.
7. T Sethunathan, T Yoshida. A flavobacterium sp. that degrades diazinon and parathion. Can J Microbiol 19:873–875, 1973.
8. DM Munneke. Enzyme hydrolysis of organophosphate insecticides, a possible pesticide disposal method. Appl Environ Microbiol 32:7–13, 1976.
9. FM Raushel, HM Holden. Phosphotriesterase: An enzyme in search of its natural substrate. Adv Enzymol 74:51–93, 2000.
10. MM Benning, H Shim, FM Raushel, HM Holden. High resolution x-ray structures of different metal-substituted forms of phosphotriesterase from Pseudomonas diminuta. Biochemistry 40:2712–2722, 2001.
11. SB Hong, FM Raushel. Metal substrate interactions facilitate the catalytic activity of the bacterial phosphotriesterase. Biochemistry 35:10904–10912, 1996.
12. H Shim, SB Hong, FM Raushel. Hydrolysis of phosphodiesters through transformation of the bacterial phosphotriesterase. J Biol Chem 273:17445–17450, 1998.
13. SB Hong, FM Raushel. Stereochemical constraints on the substrate specificity of phosphotriesterase. Biochemistry 38:1159–1165, 1999.
14. F Wu, WS Li, M Chen-Goodspeed, M Sogorb, FM Raushel. Rationally engineered mutants of phosphotriesterase for preparative scale isolation of chiral organophosphates. J Am Chem Soc 122:10206–10207, 2000.
15. JL Vanhooke, MM Benning, FM Raushel, HM Holden. Three-dimensional structure of the zinc-containing phosphotriesterase with the bound substrate analog diethyl 4-methylbenzylphosphonate. Biochemistry 35:6020–6025, 1996.
16. M Chen-Goodspeed, MA Sogorb, F Wu, SB Hong, FM Raushel. Structural determinants of the substrate and stereochemical specificity of phosphotriesterase. Biochemistry 40:1325–1331, 2001.
17. M Chen-Goodspeed, MA Sogorb, F Wu, FM Raushel. Enhancement, relaxation, and reversal of the stereoselectivity for phosphotriesterase by rational evolution of active site residues. Biochemistry 40:1332–1339, 2001.
18. HL Boter, C Van Dijk. Stereospecificity of hydrolytic enzymes on reaction with asymmetric organophosphorus compounds. 3. The inhibition of acetylcholinesterase and butyrylcholinesterase by enantiomeric forms of sarin. Biochem Pharmacol 18:2403–2407, 1969.

12 Protein Engineering of PQQ Glucose Dehydrogenase

Satoshi Igarashi and Koji Sode
Tokyo University of Agriculture and Technology, Tokyo, Japan

1 INTRODUCTION

1.1 PQQ and PQQ-Harboring Enzymes

Pyrroloquinoline quinone (PQQ) was first proposed in the 1960s as the third major prosthetic group (along with pyridine nucleotides and flavins) for redox enzymes (1). About two decades later, the structure of PQQ (Fig. 1) was determined by two groups (2,3). PQQ is an ortho-quinone at the C4 and C5 positions of the quinone ring. The C5 carbonyl group in the oxidized form is very reactive towards nucleophiles such as alcohols, sugars, amines, ammonia, cyanide, and amino acids. The biology, biochemistry, and electrochemistry of PQQ have been summarized in several reviews (4–12). To date, many PQQ-harboring proteins and PQQ- and heme-harboring proteins have been discovered, but only in Gram-negative bacteria (Table 1). Most of the PQQ-harboring enzymes are dehydrogenases (4–31): PQQ methanol dehydrogenases (PQQMDH), PQQ ethanol dehydrogenases (PQQEDH), and PQQ glucose dehydrogenases (PQQGDH).

Figure 1 The structure of pyrroloquinoline quinone (PQQ).

PQQMDHs oxidize methanol to formaldehyde during the growth of methylotrophic bacteria on methane or methanol (32–34). PQQMDH from Methylotroph sp. was the first PQQ enzyme for which a tertiary structure was elucidated. PQQMDH is a soluble periplasmic enzyme with an α2β2 heterotetrameric structure (35). The catalytic subunit, the α-subunit (about 60 kDa), possesses one PQQ molecule and one Ca2+ ion. The α-subunit was shown to have an 8-bladed β-propeller fold (36). Other PQQ enzymes also appear to be β-propeller proteins. The β-propeller structure is composed of a repetitive folding unit called the W-motif, which is arranged circularly like the blades of a propeller (Fig. 2). The W-motif is composed of four antiparallel β-strands. β-Propeller proteins having four to eight W-motifs have been reported (37).

PQQ-dependent alcohol dehydrogenases, including PQQMDHs, can be categorized into three types. The first group, named ADH I, comprises soluble alcohol dehydrogenases, including PQQMDH. PQQMDH and PQQEDH differ simply in substrate specificity (34). Type I PQQEDHs are homodimers of identical subunits of 60 kDa each, and their structure is an 8-bladed β-propeller fold similar to that of PQQMDHs (38,39). ADH II is classified as heme-possessing PQQADH (15). The overall structure is composed of two domains: the N-terminal domain (1–566), an 8-bladed β-propeller fold containing one PQQ molecule and one calcium ion in its active site, and the C-terminal type I cytochrome domain (591–667) (40). ADH III is a membrane-bound alcohol dehydrogenase comprised of three subunits: α (catalytic), β (cytochrome), and a small subunit (41,42). The α subunit has one PQQ molecule and a single heme C, and the β subunit possesses three heme C's. In electrochemical studies, direct electron transfer from PQQ to an electrode via heme C was observed (43). The substrate specificity profile of ADH III is relatively restricted compared with other ADHs (12).

Table 1 The List of PQQ or PQQ-Heme-Harboring Proteins (cell values are listed column by column in source order; the row alignment of the printed table is not recoverable here)

Alcohol dehydrogenases (Type I, Type II, Type III)
Enzyme: Methanol dehydrogenase; Ethanol dehydrogenase; Alcohol dehydrogenase; Tetrahydrofurfuryl alcohol dehydrogenase; Polyvinyl alcohol dehydrogenase; Polyethylene glycol dehydrogenase; Polypropylene glycol dehydrogenase; Alcohol dehydrogenase
Prosthetic group: PQQ; PQQ; PQQ/heme C; PQQ/heme C; ?; PQQ; PQQ/heme C/3 heme C's; PQQ/heme C; PQQ/heme C; ?; ?
Location: Periplasm; Periplasm; Periplasm; Periplasm/membrane-bound; Periplasm/membrane-bound; Membrane-bound
Component: α2β2; Homodimeric; Monomeric; Monomeric; Monomeric; Tetrameric; Homodimeric; α/β/small
Organism: Methylotrophs; Paracoccus denitrificans; Pseudomonas sp.; Comamonas testosteroni; P. putida HK5; Ralstonia eutropha strain B0; Pseudomonas sp. strain VM15C; Rhodopseudomonas acidophila; Flavobacterium sp.; Pseudomonas sp.; Stenotrophomonas maltophilia; Gluconobacter sp.; Acetobacter sp.
Ref.: 4; 12; 15; 16, 17; 18; 19; 20; 20; 21, 21

Glucose dehydrogenases and other dehydrogenases
Enzyme: Glucose dehydrogenase (m); Glucose dehydrogenase (s); Cyclic alcohol dehydrogenase; D-Arabitol dehydrogenase; Formaldehyde dehydrogenase
Prosthetic group: PQQ; PQQ; PQQ; PQQ; PQQ
Location: Membrane-bound; Periplasm; Membrane-bound; Membrane-bound; Membrane-bound
Component: Monomeric; Homodimeric; Monomeric; Heterodimeric (α/β); Homotetrameric
Organism: Enteric bacteria; Gluconobacter sp.; Pseudomonas sp.; Acinetobacter calcoaceticus; Acetobacter sp.; Pseudomonas sp. N11; Acinetobacter sp.; G. frateurii CHM9; G. suboxydans IFO3257; Methylococcus capsulatus
Ref.: 12; 12, 22, 12; 23; 24; 25; 12, 12, 12

Enzyme: Lupanine hydroxylase; Sorbitol dehydrogenase; Sorbose/sorbosone dehydrogenase; Quinate dehydrogenase; 1-Butanol dehydrogenase; Glycerol dehydrogenase
Prosthetic group: PQQ/heme C; PQQ/heme C/3 heme C's; PQQ; PQQ; PQQ; PQQ/heme C; PQQ
Location: Periplasm; Membrane-bound; Membrane-bound; Periplasm; Particle-bound; Periplasm; Periplasm; Membrane-bound
Component: Monomeric; α/β/small; Oligomer?; Heterodimeric; Monomeric; Monomeric; Monomeric; Monomeric
Organism: P. putida; Gluconobacter sp.; G. suboxydans IFO3255; The strain DSM4025; A. calcoaceticus; Gluconobacter sp.; P. butanovora; P. butanovora; G. industrius
Ref.: 26; 27; 28; 16; 29; 30, 31

Figure 2 Overall structure of water-soluble quinoprotein glucose dehydrogenase (PQQGDH-B). This model was completed by adding PQQ, Ca2+, and some loop regions to a previously reported model, followed by energy minimization (From Ref. 68) (PDB code: 1QBI).

PQQ-dependent glucose dehydrogenases (PQQGDHs) have also been studied extensively. PQQGDHs are described in the next section.

1.2 PQQ Glucose Dehydrogenases: The Basic Science and Industrial Application

There are two types of glucose dehydrogenases harboring PQQ as their prosthetic group (44,45). Membrane-bound glucose dehydrogenase (PQQGDH-A) has been isolated from various Gram-negative bacteria such as Escherichia coli, Acinetobacter calcoaceticus, Pseudomonas sp., and acetic acid bacteria (12). PQQGDH-As are all single polypeptides with molecular weights of about 87 kDa containing one PQQ molecule (46,47). PQQGDH-As make a bioenergetic contribution by coupling the oxidation of glucose to the respiratory chain through ubiquinone (48,49). Five genes encoding PQQGDH-A have been elucidated (46,47,50–52). The 3-D structure of PQQGDH-A is predicted to be a β-propeller composed of eight W-motifs, based on homology modeling with PQQMDH (53) and on CD spectroscopy of an enzyme from which the membrane-spanning region was deleted (54). The N-terminal region was predicted to be the membrane-spanning anchoring region (55). The authors have reported the first site-directed mutagenesis study on a PQQ enzyme, PQQGDH-A. Since then, several mutations have been introduced into this enzyme, including those described in this review, to elucidate the enzyme mechanism (56–66).

Besides the membrane-bound glucose dehydrogenase, A. calcoaceticus possesses a completely different PQQGDH, the water-soluble glucose dehydrogenase (PQQGDH-B or s-GDH), which does not share any obvious homology with the primary structures of other PQQ enzymes (67). A BLAST search for PQQGDH-B homologs identified two open reading frames, from the E. coli K-12 strain MG1655 genome and the Synechocystis sp. strain PCC6803 genome, and two incomplete sequences from the genomes of Pseudomonas aeruginosa and Bordetella pertussis. The functions of these four deduced open reading frames are uncertain, and the protein localizations predicted with the programs PSORT and SignalP also differ (68,69). PQQGDH-B is a homodimeric enzyme consisting of two identical subunits of approximately 50 kDa (67,70). The monomer has one PQQ molecule and three Ca2+ ions, two of which are located in the dimer interface, whereas the third Ca2+ ion is near PQQ (68). The physiological roles of PQQGDH-B have not yet been elucidated. PQQGDH-B does not couple with the respiratory chain of A. calcoaceticus. The substrate specificity profile of PQQGDH-B is broad compared with that of PQQGDH-A. This enzyme catalyzes the oxidation of glucose, allose, 3-O-methyl-glucose, and also the disaccharides lactose, cellobiose, and maltose (71). PQQGDH-B contains a 24-amino acid signal peptide at its N-terminus and is secreted into the periplasmic space after excision of the signal sequence.

PQQGDH-B is also a β-propeller, but apparently forms a 6-bladed structure (68). PQQ resides in a deep, broad, positively charged cleft at the top of the propeller near the 6-fold pseudosymmetry axis (72). In this model, PQQ is directly exposed to the solvent. The Ca2+ ion is bound to the N6, O7A, and O5 atoms of PQQ. These bonds are similar to those of PQQMDH, which indicates that catalysis requires the Ca2+ ion near the PQQ as a cofactor. The active site of PQQGDH-B is composed of loop1D2A, loop2D3A, loop4BC, loop4D5A, and loop6BC. The substrate-binding residues have been reported and are located mainly in loop1D2A, loop2D3A, and loop4BC (72). Among them, His168 was identified as an important residue that abstracts a proton from the substrate, because His168 is the only base close to the glucose O1 atom, and the glucose C1 atom is positioned directly above the PQQH2 C5 atom (72).

1.3 The Industrial Significance of PQQGDH: The Glucose Sensors

Diabetes mellitus is a serious metabolic disorder that places patients at increased risk of coronary and vascular disease, as well as debilitating


conditions such as retinopathy, nephropathy, and neuropathy. Therefore rapid and accurate blood glucose monitoring is essential for treating critically ill patients and managing diabetic patients. The glucose sensor is a traditional biosensor, first reported by Clark in 1962 (73). Clark's sensor used glucose oxidase (GOD) as its sensing constituent, and GOD-based glucose sensors dominate the current market. GOD is a stable protein that is easily produced and purified from Aspergillus sp. GOD-based sensors are commonly of the electron mediator type. An inherent property of GOD is that it uses oxygen as its electron acceptor. This limits further application in this field because the enzyme activity is a function of oxygen partial pressure (74–76). Various glucose sensors employing PQQGDHs have been reported (77–81). The merits of using PQQGDHs as a glucose sensor component are as follows:

1. PQQGDHs show high catalytic efficiency compared with GOD. The high activity allows rapid glucose sensing.
2. PQQ is tightly bound to GDH; therefore it is not necessary to add an extra cofactor such as NAD(P).
3. PQQGDHs do not use dissolved oxygen as their electron acceptor during glucose oxidation. This property improves the accuracy of glucose measurement in human blood.

Owing to these merits, PQQGDH-B glucose sensors are already on the market. However, despite their superior features, further improvements are required. This is particularly true when PQQGDH is compared with GOD, which has better substrate specificity and operational stability. The establishment of an economical recombinant enzyme production system is also essential. The authors' research group initiated, and is currently the only group engaged in, the protein engineering of PQQGDHs to develop an optimized glucose sensor enzyme. This review summarizes the current status of PQQGDH protein engineering.

2

PROTEIN ENGINEERING OF PQQGDH-A

Highly homologous primary structures have been observed among the PQQGDH-As cloned from various Gram-negative bacteria; however, their enzymatic characteristics depend on the bacterial source. Although the tertiary structure was hard to elucidate because of the protein's hydrophobic properties, the highly homologous primary structures enabled us to initiate protein engineering of this enzyme by using homologous recombination to construct a chimeric enzyme library (58,60,63).


Among various properties, we focused on the difference in cofactor binding stability as the marker for the chimeric enzyme library. PQQGDHs require a divalent ion for holoenzyme formation with PQQ; however, divalent ions such as Ca²⁺ are removed by chelating reagents such as EDTA, resulting in apoenzyme formation (82). Therefore EDTA tolerance can be interpreted as an indicator of cofactor binding stability. A. calcoaceticus PQQGDH-A is a representative EDTA-tolerant enzyme, whereas E. coli PQQGDH-A is a representative EDTA-sensitive enzyme (57,58,70). The high homology between the E. coli and A. calcoaceticus PQQGDH-A structural genes provided a strategy for constructing a chimeric enzyme library based on homologous recombination. Investigation of the chimeric PQQGDH-A library identified the region responsible for EDTA tolerance (58,60) (Table 2). One of the chimeric enzymes, designated E97A3, in which the N-terminal 97% is derived from E. coli and the remaining 3% from A. calcoaceticus PQQGDH-A, showed increased thermal stability (57). This observation suggested that the interaction between the C-terminal and N-terminal regions may play a crucial role in maintaining the overall structure of β-propeller proteins (63). We have also carried out the first site-directed mutagenesis studies on PQQ enzymes, particularly PQQGDH-A, focusing on the highly conserved C-terminal region previously postulated as the putative PQQ binding site (83). Cleton-Jansen et al. (50) reported a mutant PQQGDH-A in Gluconobacter oxydans whose substrate specificity was broadened by a substitution in the conserved C-terminal His region. Based on this information, site-directed mutagenesis studies on the conserved C-terminal His residues of E. coli PQQGDH-A (His775) and of A. calcoaceticus PQQGDH-A (His781) were carried out (59,62). Substitution of E. coli His775 with Asn increased both the Km value (from 0.9 to 1.5 mM) and the Vmax/Km ratio (from 116 to 287 U/mg protein mM) for glucose compared with the wild type (Table 2). The substrate specificity of His775Asn changed drastically and was higher than that of wild-type E. coli PQQGDH-A. The Vmax/Km ratios for all substrates other than glucose decreased compared with the wild type; consequently, His775Asn scarcely oxidized sugars other than glucose. His775Asp also showed a significant increase in Km values for all the saccharides used in the study (Table 2) and, like His775Asn, showed improved substrate specificity compared with wild-type E. coli PQQGDH-A. Amino acid substitution at His781 in A. calcoaceticus also significantly affected substrate specificity. On the basis of the accumulated information from these studies, we constructed an enzyme composed of all the regions that showed improved

Table 2 Substrate Specificity Profiles of PQQGDH-A His775 Variants

Enzyme     Activity (U/mg)  D-Glucose  2-Deoxyglucose  D-Mannose  D-Allose  D-Galactose  D-Xylose  Maltose
Wild type  110              100        81              30         105       38           48        14
His775Asn  417              100        30              1          13        2            4         5
His775Asp  140              100        52              0          21        0            1         3
His775Glu  79               100        47              3          29        6            4         2
His775Lys  5.6              100        177             0          161       6            16        23
His775Ser  195              100        64              3          73        7            5         4
His775Leu  7.9              100        39              1          29        6            4         2
His775Tyr  3.1              100        46              2          105       12           3         1
His775Trp  1.1              100        49              2          58        3            9         4

Substrate concentration is 1 mM. The values for each substrate are relative activities (%) compared with the activity toward glucose as the substrate.

Table 3 The Enzymatic Properties of Multichimeric PQQ Glucose Dehydrogenases

Enzyme             EDTA tolerance  T1/2 at 45°C  D-Glucose  2-Deoxy-D-glucose  D-Mannose  D-Allose  D-Galactose  D-Xylose  Maltose
E. coli PQQGDH     −               <1            100        69                 5          81        8            21        3
H775N              −               <1            100        11                 0          4         0            2         2
E97A3              −               <1            100        76                 4          93        7            10        0
E97A3 H775N        −               1.4           100        27                 0          29        0            0         0
E32A27E41          +               27            100        69                 7          92        18           37        4
E32A27E38A3        +               21            100        61                 11         87        17           35        6
E32A27E41 H782N    +               7.7           100        31                 13         58        15           17        13
E32A27E38A3 H782N  +               39            100        18                 1          46        7            1         7

Substrate concentration is 1 mM. The values for each substrate are relative activities (%) compared with the activity toward glucose as the substrate. x: His775 or His782 mutation; +: shows EDTA tolerance; −: does not show EDTA tolerance.

enzymatic characteristics with the goal of engineering an optimized sensor enzyme. Multichimeric PQQGDH-As with improved enzymatic characteristics were engineered by substituting and combining various PQQGDH-A constructs: the region responsible for EDTA tolerance (A27 region), for the thermal stability (A3 region), and for the substrate speciﬁcity (conserved His residue in PQQGDH-A) (57,63). The resulting chimeric PQQGDH-As were E32A27E38A3 His782Asn and E32A27E38A3 His782Asp. Both multichimeric PQQGDH-As showed increased cofactor binding stability, thermal stability, and alteration in substrate speciﬁcity. Moreover, E32A27E38A3 His782Asp showed a 10-fold increase in the Km value for glucose compared with the wild-type E. coli PQQGDH-A (Table 3). This study indicated the complementarity of the protein regions responsible for the improvement of diﬀerent enzymatic properties of PQQGDH-A.
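The chimera names used in this section encode their own composition: E97A3, for example, denotes an enzyme whose N-terminal 97% comes from E. coli and whose remaining 3% comes from A. calcoaceticus. As a minimal sketch, assuming that the same letter-plus-percentage convention holds for every construct (E for E. coli, A for A. calcoaceticus) and that segments are listed from the N-terminus to the C-terminus, a name can be decoded as follows. The function name and output format are illustrative and do not come from the original studies.

```python
import re

# Assumed mapping of the one-letter source codes used in the chimera names.
SOURCES = {"E": "E. coli", "A": "A. calcoaceticus"}

def decode_chimera(name):
    """Split a chimera name such as 'E32A27E38A3' into (source, percent) segments."""
    segments = re.findall(r"([EA])(\d+)", name)
    # Reject names that contain anything besides letter-plus-percentage segments.
    if "".join(code + pct for code, pct in segments) != name:
        raise ValueError(f"unrecognized chimera name: {name!r}")
    decoded = [(SOURCES[code], int(pct)) for code, pct in segments]
    total = sum(pct for _, pct in decoded)
    if total != 100:
        raise ValueError(f"segments of {name!r} sum to {total}%, not 100%")
    return decoded

for name in ("E97A3", "E32A27E41", "E32A27E38A3"):
    print(name, decode_chimera(name))
```

Under this reading, E32A27E38A3 combines the A27 segment associated with EDTA tolerance and the C-terminal A3 segment associated with thermal stability within an otherwise E. coli-derived sequence.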

3

PROTEIN ENGINEERING OF PQQGDH-B

3.1

3-D Engineering Approaches

We have carried out protein engineering of PQQGDH-B to improve its thermal stability, catalytic efficiency, and substrate specificity. Although the tertiary and quaternary structures of this enzyme are now available, they provide only limited information on enzyme function. Successful engineering will still require multiple strategies; our approach combines random mutagenesis with rational design of amino acid substitutions. We have been attempting to develop an optimized PQQGDH-B for sensor applications. In this section, we summarize our current efforts to engineer PQQGDH-B.

3.1.1

Glu277Lys

Prior to the elucidation of the tertiary structure of PQQGDH-B, the authors carried out site-directed mutagenesis on the putative active site of PQQGDH-B, which was identified from the enzymatic properties of PCR mutants of this enzyme (71). A random mutant enzyme found in our laboratory, designated No. 87, contained eight amino acid substitutions. This mutant showed a decreased Km value and decreased EDTA tolerance, indicating decreased holoenzyme and thermal stability compared with the wild type. On the basis of mutational analyses, we found that the substitution of Glu277 with Gly was responsible for the properties of No. 87. Mutational analyses of the residues neighboring Glu277 (Asp275Glu, Asp276Glu, Ile278Phe, and Asn279His) were also carried out. Considering


that Asp275Glu, Asp276Glu, and Glu277Gly showed drastic decreases in EDTA tolerance, we assumed that this region might be the PQQGDH-B active site and/or a binding site for Ca²⁺ or PQQ. This was later confirmed by elucidation of the tertiary structure. Glu277 variants all showed decreased Km values and altered substrate specificity profiles. Among them, Glu277Lys showed enzymatic activity and thermal stability similar to those of the wild-type enzyme, but its catalytic efficiency (kcat/Km) was approximately 3-fold higher than that of the wild type (349 vs. 128 s⁻¹ mM⁻¹) (Table 4). According to the 3-D structure of PQQGDH-B, Glu277 is located in strand 4C. This strand is connected to loop4BC, one of the loop regions that form the enzyme's cavity. Glu277 mainly interacts with Ca²⁺ ion (II), which is located in the dimer interface loop region. Therefore the replacement of Glu277 with other amino acids may affect the dimer conformation. Furthermore, the neighboring residue Asp275 apparently works as one of the residues that connect a water molecule to Ca²⁺ ion (III). Asp276 is part of Ca²⁺ ion binding site (I), which is located at the active site. In addition, Asn279 also contributes to connecting a water molecule to Ca²⁺ ion (III). Thus the region from Asp275 to Asn279 is a hot spot related to all of the Ca²⁺ ion binding sites and to the active site of PQQGDH-B. We can conclude that the decreased EDTA tolerance and thermal stability are consistent with the location and function of these residues. The recent emergence of self-testing markets for blood glucose requires less painful methods for taking the sample and for enhanced measurement,

Table 4 Kinetic Parameters of Wild-Type and Glu277Lys PQQGDH-B for Various Substrates

                   Wild type                                   Glu277Lys
Substrate          Km (mM)  kcat (s⁻¹)  kcat/Km (s⁻¹ mM⁻¹)    Km (mM)  kcat (s⁻¹)  kcat/Km (s⁻¹ mM⁻¹)
Glucose            26.8     3436        128 (100%)            8.8      3071        349 (100%)
2-Deoxyglucose     90       331         4 (3%)                88       1063        12 (3%)
Mannose            22       267         12 (9%)               22       861         39 (11%)
Allose             35.5     2509        71 (55%)              21       4563        287 (82%)
3-O-methylglucose  28.7     3011        105 (82%)             27       3198        118 (34%)
Galactose          5.3      232         44 (34%)              6.8      630         78 (22%)
Xylose             14.3     201         14 (11%)              34       678         20 (6%)
Lactose            18.9     1659        88 (69%)              7.5      1795        239 (68%)
Maltose            26       1930        74 (58%)              14.3     1015        71 (20%)
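The catalytic efficiencies listed in Table 4 follow directly from the tabulated Km and kcat values. The short sketch below recomputes kcat/Km for glucose and the fold change between the two enzymes. Only the glucose entries of Table 4 are used, and the variable and function names are illustrative.

```python
# Glucose kinetic parameters taken from Table 4: Km in mM, kcat in s^-1.
params = {
    "wild type": {"km": 26.8, "kcat": 3436.0},
    "Glu277Lys": {"km": 8.8, "kcat": 3071.0},
}

def catalytic_efficiency(kcat, km):
    """Return kcat/Km in s^-1 mM^-1."""
    return kcat / km

eff = {name: catalytic_efficiency(p["kcat"], p["km"]) for name, p in params.items()}
fold = eff["Glu277Lys"] / eff["wild type"]

for name, value in eff.items():
    print(f"{name}: kcat/Km = {value:.0f} s^-1 mM^-1")
print(f"fold change: {fold:.1f}")
```

The ratio comes out at about 2.7, which the text rounds to an approximately 3-fold improvement. The gain comes from the lower Km of Glu277Lys rather than from kcat, which is slightly lower than that of the wild type.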


simplicity, and reliability. Semi- or minimally invasive systems are considered optimal. Minimizing the volume of the blood or interstitial fluid (ISF) sample implies the need for high catalytic efficiency in the sensor element. Because Glu277Lys showed approximately three times higher catalytic efficiency, this variant has great potential as a component of a highly sensitive glucose monitoring system.

3.1.2

Ser231Lys

Another PCR mutant of PQQGDH-B, Ser231Cys, was found to retain higher thermal stability than the wild-type PQQGDH-B (69) (Fig. 3). Therefore the authors replaced Ser231 with a series of amino acids and analyzed their impact on thermal stability. Ser231Lys showed the highest thermal stability: during thermal inactivation at 55°C its half-life was approximately 8-fold longer than that of the wild-type enzyme (Ser231Lys: 40 min; wild type: 5 min), without a decrease in catalytic activity. Therefore a higher yield of active enzyme preparation is expected, which may improve the cost effectiveness of producing glucose sensor components with this enzyme. Moreover, higher thermal stability usually results in higher storage stability; therefore the use of this mutant enzyme as a glucose sensor constituent may lead to a more stable glucose sensor.
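The half-lives quoted above can be turned into a simple decay model. As a minimal sketch, assuming that thermal inactivation at 55°C follows single-exponential (first-order) kinetics, which the text does not state explicitly, the residual activity after a given incubation time can be estimated from the half-life. The function name and the 20-minute time point are illustrative choices.

```python
import math

def residual_activity(t_min, half_life_min):
    """Fraction of activity remaining after t_min minutes, assuming first-order decay."""
    k = math.log(2) / half_life_min  # inactivation rate constant (min^-1)
    return math.exp(-k * t_min)

# Half-lives at 55 degC in minutes, as quoted in the text.
half_lives = {"wild type": 5.0, "Ser231Lys": 40.0}

for name, t_half in half_lives.items():
    remaining = residual_activity(20.0, t_half)
    print(f"{name}: {remaining:.1%} of initial activity left after 20 min")

print("fold increase in half-life:", half_lives["Ser231Lys"] / half_lives["wild type"])
```

Under this assumption the wild type would retain roughly 6% of its activity after 20 min (four half-lives), whereas Ser231Lys would retain roughly 71% (half of one half-life).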

Figure 3 Thermal stability of wild-type and Ser231Lys PQQGDH-B at 55°C. o: Ser231Lys; .: wild type. The residual activity of PQQGDHs was determined at 25°C.


The replacement of Ser231 with hydrophobic residues does not affect the thermal stability or the substrate specificity profile. Furthermore, it is reasonable to suggest that neither the charge nor the size of the side chain at position 231 correlates with the enzymatic properties. Generally, the β-propeller fold can be divided into three regions: (1) β-sheet regions that provide the scaffold structure; (2) loopBC and loopDA regions (the loops connecting strand B to strand C and strand D to strand A, respectively), which tend to be the functional regions involved in coenzyme or metal ion binding; and (3) loopAB and loopCD regions, which are located opposite the functional regions. Since Ser231 is located in loop3CD, the replacement of this residue might not affect the catalytic properties of this enzyme. Modeling analysis of Ser231Lys suggested that the replacement of Ser231 with Lys effectively increased the hydrophobicity of the loop3CD region. This observation suggested that the increased hydrophobic interaction strengthened the packing of the W-motif and/or the β-propeller structure.

3.1.3

Asn452Thr

The conserved C-terminal amino acid residues in PQQGDH-A, His775 (in E. coli PQQGDH-A) and His781 (in A. calcoaceticus PQQGDH-A), were shown to be responsible for their substrate specificity profiles (59,62). The primary structure of PQQGDH-B has little similarity to that of the PQQGDH-As. However, in both enzymes, the active site lies on the side opposite the region where the C-terminus and N-terminus interact to circularize the β-propeller. On the basis of this similarity, we assumed that residues with the same 3-D orientation could affect the substrate specificity of both PQQGDH-A and PQQGDH-B. This would make the C-terminus the region of interest. Moreover, according to the structural information on the PQQGDH-B active site, the substrate glucose is located in the cavity composed of the 1D2A, 2D3A, 3BC, 4D5A, and 6BC loops and interacts with amino acid residues located in 1D2A, 2D3A, and 3BC (72). However, loop6BC does not have amino acid residues that interact with glucose. One of the characteristic properties of PQQGDH-B substrate specificity is that the enzyme reacts with disaccharides such as lactose and maltose. If loop6BC is not involved in substrate binding, engineering loop6BC for direct substrate interaction, or for interaction with other loops to create indirect substrate interaction, may alter the size of the cavity and create novel catalytic properties. More specifically, such engineering could result in a PQQGDH-B with narrowed substrate specificity. Therefore we introduced amino acid substitutions into the loop6BC region to improve the substrate specificity profile. We focused on polar amino acid residues in the loop6BC region and constructed a series of variants. Among these mutants, we found that Asn452Thr did, in


fact, show a narrowed substrate specificity profile without a decrease in catalytic activity (84). Because Asn452Thr showed greater substrate specificity than the wild-type enzyme, we investigated the effect of lactose on glucose measurement using both wild-type PQQGDH-B and Asn452Thr (Fig. 4). With the wild-type enzyme, the rate of the DCIP decolorization reaction toward 10 mM glucose increased with increasing lactose concentration. No saturation was observed up to 25 mM lactose. In the presence of 10 mM lactose, the effect was more than 10%. This influence was due to the high affinity and catalytic efficiency of the wild-type enzyme for lactose relative to those for glucose. In contrast, when Asn452Thr was used to measure glucose, the addition of 25 mM lactose caused only a 5% increase in the apparent rate of the reaction. Furthermore, saturation was reached at around 15 mM lactose, so the signal increased by a maximum of 5%. This was due to the lower affinity for lactose and higher affinity for glucose compared with the wild-type enzyme. Considering that Asn452Thr also showed lower Vmax/Km values toward maltose and galactose than the wild-type enzyme, the engineered enzyme appears to be more specific for glucose.

3.2

4-D Engineering Approaches

Optimized sensor fabrication using PQQGDH-B requires maximization of enzyme stability. There are several ways to increase the protein thermal

Figure 4 The reactivity of wild-type and Asn452Thr PQQGDH-B toward 10 mM glucose in the presence of lactose. o: Asn452Thr; .: wild type. The rate of reaction toward 10 mM glucose, in the absence of lactose, is presented as 100%.


stability, such as increasing the interactions of amino acid residues responsible for conformational stability by site-directed mutagenesis. It has also been proposed that the stability of a protein may be enhanced by excluding conformers that lie on the denaturation pathway. On the assumption that the first step in the inactivation of PQQGDH-B is dimer dissociation, stabilizing the quaternary structure should enhance the stability of PQQGDH-B. To prevent dissociation of the quaternary structure, we designed a chemically cross-linked PQQGDH using glutaraldehyde (81). The chemically cross-linked PQQGDH-B showed higher thermal stability than the wild-type enzyme: its half-life at 55°C is 63 min, whereas that of the native enzyme is 4 min. This suggested that the thermal stability of PQQGDH-B is significantly improved by stabilizing the quaternary structure. However, chemical modification resulted in a decrease in specific activity (to less than 10% of that of the native enzyme) as a result of nonspecific modification of amino acid residues. Therefore additional rational design parameters must be applied. Here we present our efforts to stabilize the PQQGDH quaternary structure by protein engineering (Table 5).

3.2.1

Tethered PQQGDH-B

To decrease the chance of dissociation of the PQQGDH-B subunits, we attempted to construct a linked dimeric PQQGDH-B by in-frame gene fusion (85). The tethered PQQGDH-B is constructed using the linker peptide "Glu-Leu-Gly-Thr-Arg-Gly-Ser-Ser-Arg-Val-Asp-Leu-Gln," derived from a part of β-galactosidase in the expression vector pTrc99A. We produce the tethered PQQGDH-B in E. coli as an active soluble enzyme (86). This enzyme shows enhanced thermal stability over the native enzyme expressed in E. coli. At incubation temperatures above 45°C, the residual

Table 5 Comparison of the Enzymatic Properties of Engineered PQQGDH-Bs

Enzyme         Half-life at 55°C (min)  Specific activity (U/mg protein)  Km (mM)
Wild type      5                        4104                              25
Ser231Lys      40                       3313                              27
Cross-linking  63 (a)                   389                               20
Tethered       16                       897                               20
Ser415Cys      183                      4134                              16

(a) Mixture of cross-linking status.
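The comparisons drawn from Table 5 (fold increase in half-life, fraction of wild-type activity retained) are simple ratios. The sketch below recomputes them from the tabulated values. The dictionary layout and the function name are illustrative.

```python
# Values from Table 5: half-life at 55 degC (min) and specific activity (U/mg protein).
table5 = {
    "Wild type":     {"half_life": 5.0,   "activity": 4104.0},
    "Ser231Lys":     {"half_life": 40.0,  "activity": 3313.0},
    "Cross-linking": {"half_life": 63.0,  "activity": 389.0},
    "Tethered":      {"half_life": 16.0,  "activity": 897.0},
    "Ser415Cys":     {"half_life": 183.0, "activity": 4134.0},
}

wild_type = table5["Wild type"]

def relative_to_wild_type(enzyme):
    """Return (fold change in half-life, fraction of wild-type specific activity)."""
    row = table5[enzyme]
    return (row["half_life"] / wild_type["half_life"],
            row["activity"] / wild_type["activity"])

for enzyme in table5:
    fold, fraction = relative_to_wild_type(enzyme)
    print(f"{enzyme:14s} half-life x{fold:5.1f}   activity {fraction:6.1%}")
```

On these numbers only Ser415Cys combines a large gain in half-life (about 37-fold) with essentially unchanged specific activity, whereas the cross-linked and tethered enzymes retain roughly 10% and 22% of the wild-type activity, respectively.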


activity of the tethered PQQGDH-B is more than twice that of the native dimeric enzyme. Moreover, we evaluated the thermal stability at 55°C using the half-life. The tethered PQQGDH-B shows a longer half-life at 55°C (17 min) than the wild type (5 min) (Fig. 5). The Vmax value of the tethered PQQGDH-B is 897 U/mg protein, about 10–40% of the catalytic activity of the native enzyme. The presence of the linker region prevents the complete dissociation of the subunits. By linking the subunits, the entropy of denaturation is decreased, with a concomitant increase in thermal stability. However, the length and flexibility of the linker peptide should be further optimized to construct a thermostable tethered enzyme with appropriate catalytic activity.

3.2.2

Ser415Cys

Although stabilization of the quaternary structure of PQQGDH-B by chemical modification (81) or by tethering with a linker peptide (86) improves the thermal stability of PQQGDH-B, these modifications resulted in a decrease in catalytic activity. We therefore attempted to introduce a disulfide bond into the dimer interface of PQQGDH-B to form a covalent bond between the subunits and stabilize the quaternary structure (87). We searched for residues that are not associated with the active site but face each other across the interface. We identified Ser415; this residue is in loop5CD, does not participate in the active site, and faces the dimer interface. The distance between the two side chains is 6.12 Å (Oγ–Oγ), so

Figure 5 Thermal stability of wild-type and tethered PQQGDH-B at 55°C. o: Tethered PQQGDH-B; .: wild type. The residual activity of PQQGDHs was determined at 25°C.


that a disulfide bond may be formed after substitution with a Cys residue. Ser415 was therefore selected for substitution with Cys. Ser415Cys shows 36 times higher thermal stability at 55°C than the wild type (half-life: 183 vs. 5 min) without any decrease in catalytic activity (kcat: 3461 s⁻¹) (Fig. 6). Moreover, after incubation at 70°C for 10 min, Ser415Cys retains more than 90% of its GDH activity. Disulfide bond formation between the subunits was confirmed by comparing SDS-PAGE patterns in the presence and absence of reductants. Our results indicate that the introduction of one Cys residue into each monomer of PQQGDH-B resulted in the formation of a disulfide bond at the dimer interface, thereby achieving a large increase in the thermal stability of the enzyme. Ser415Cys shows about four times higher thermal stability than Ser231Lys at 55°C.

3.3

Recombinant Production of PQQGDH

Considering the huge market for self-monitoring blood glucose sensors and the potential to engineer these enzymes, the next obvious challenge is to produce the recombinant enzymes efficiently. The expression of recombinant PQQGDH-A and PQQGDH-B of A. calcoaceticus, and of PQQGDH-A of E. coli, using E. coli as the host strain was first reported by Cleton-Jansen et al. (67). Since both enzymes require PQQ and a bivalent metal ion for enzymatic activity, recombinant PQQGDHs in E.

Figure 6 Thermal stability of Ser415Cys PQQGDH-B compared with wild-type and Ser231Lys at 55°C. o: Ser415Cys; .: wild type; n: Ser231Lys. The residual activity of PQQGDHs was determined at 25°C.


coli were produced as apoenzymes. We found that the apo-PQQGDHs are less thermally stable than the holo-PQQGDHs. To prevent the denaturation of recombinant PQQGDHs during E. coli-based production, we first tried recombinant PQQGDH-A production with PQQ in the medium (88). The presence of PQQ and Ca²⁺ in the medium resulted in increased productivity. PQQ is known to be synthesized in several microorganisms, including Acinetobacter, Pseudomonas, and Klebsiella, and the operon encoding the PQQ biosynthetic pathway has been cloned from several genera. The introduction of a PQQ biosynthesis operon into E. coli enabled PQQ biosynthesis. An E. coli strain carrying both a vector with the Klebsiella-derived PQQ biosynthesis operon (89) and an expression vector for the PQQGDH-A structural gene was generated. This double recombinant produced holo-PQQGDH-A. However, this system had the problem that the population of E. coli cells harboring both plasmids decreased during cultivation. Considering this genetic instability, integration of the PQQ operon into the host chromosome may be preferable. Alternatively, a PQQ-synthesizing bacterium could be used as the expression host for PQQGDH, as has been reported for Acinetobacter and Pseudomonas strains. However, the use of broad-host-range vectors in these host strains has inherent problems with respect to the production of recombinant proteins. In contrast, Klebsiella can synthesize PQQ and also maintain several E. coli expression vector systems. We recently reported recombinant PQQGDH-B production using Klebsiella pneumoniae as the host strain and a conventional E. coli expression vector (90). The recombinant K. pneumoniae expressed PQQGDH-B in its holo form at levels about equal to those achieved in recombinant E. coli (Fig. 7). The signal sequence of recombinant PQQGDH-B was correctly processed.

In the above-mentioned recombinant systems, PQQGDHs accumulate in the cell during production. Therefore cell disruption is essential for recovery of the holoenzyme. Considering that PQQGDH-B is secreted into the periplasmic space of Gram-negative bacteria after posttranslational processing of the signal peptide, extracellular production of recombinant PQQGDH-B can be expected. We have achieved this type of production of PQQGDH-B using the methylotrophic yeast Pichia pastoris. P. pastoris is known for the expression of heterologous genes requiring secretion. One of the important features of this system is the alcohol oxidase I (AOX1) promoter of P. pastoris, which allows the expression of foreign genes to be regulated by the concentration of methanol. Furthermore, since the molecular genetic manipulation of P. pastoris is similar to that of the well-characterized Saccharomyces cerevisiae expression system, P. pastoris is also widely accepted. Instead of the native signal sequence of PQQGDH-B, the S. cerevisiae α-factor signal sequence was used for the secretion of PQQGDH-B in P. pastoris (91). The productivity of secreted PQQGDH-B reached 218 kU/l (43 mg/l), which was almost the same as that of the recombinant PQQGDH-B previously produced in E. coli (Fig. 8). The PQQGDH-B secreted by P. pastoris was glycosylated but showed enzymatic properties similar to those of the recombinant PQQGDH-B produced in E. coli. Further optimization of the downstream process and culture conditions for high-level production of recombinant PQQGDH-B by P. pastoris is expected to achieve industrial-level production.

Figure 7 Time course of PQQGDH-B production in E. coli PP2418 and K. pneumoniae NCTC418. .: Growth of K. pneumoniae; o: productivity in K. pneumoniae; n: growth of E. coli; 5: productivity in E. coli. Each culture contains 1 mM CaCl2.

Figure 8 Growth curve and GDH production in recombinant P. pastoris. .: Cell concentration OD600; o: produced PQQGDH-B (kU/L).

4

APPLICATION OF ENGINEERED PQQGDH FOR GLUCOSE SENSORS AND FOR DNA SENSORS

4.1

Glucose Sensors

The ultimate goal in the engineering of PQQGDHs is the construction of an optimized enzyme for monitoring glucose. Our first attempt involved the use of engineered PQQGDH-A for the construction of a glucose sensor with an extended dynamic range (92). Fig. 9 shows the correlation between glucose concentration and the enzymatic activities of wild-type E. coli PQQGDH-A and His775Asp. The enzymatic activity of wild-type E. coli PQQGDH-A was almost saturated at glucose concentrations higher than 10 mM. The enzymatic activity of His775Asp increased with increasing glucose concentration between 10

Figure 9 A calibration curve of the extended-range glucose sensor employing the engineered PQQGDHs His775Asn and His775Asp. 5: His775Asn (linearity; 2–31 mM); n: His775Asp (linearity; 9–80 mM); o: His775Asn and His775Asp (linearity; 3–70 mM).


and 50 mM. In diabetes, the blood glucose concentration is often over 200 mg/dl (11.1 mM). Considering that in disposable glucose sensors blood samples are applied directly to the sensor element, the high Km value of His775Asp is an important property for the sensor element. The increased dynamic range of glucose measurement with the engineered enzyme enabled us to develop a strategy for a glucose sensor with an expanded dynamic range (92). The proposed strategy expands the dynamic range of the biosensor by combining protein-engineered PQQGDH-As with different Km values. A composite extended-range glucose sensor employing two engineered PQQGDH-As, His775Asp and His775Asn, was demonstrated. The extended-range glucose sensor showed not only an expanded dynamic range (3–70 mM) but also greater substrate specificity for glucose owing to the engineered enzymes (Fig. 9). Another type of glucose sensor was constructed using one of the multichimeric PQQGDH-As, E32A27E38A3 His782Asp, which has increased cofactor binding stability, increased thermal stability, altered substrate specificity, and an increased Km value for glucose compared with the wild-type E. coli PQQGDH-A (93). The application of E32A27E38A3 His782Asp in an amperometric glucose sensor achieved an expanded dynamic range together with increased operational stability and greater substrate specificity. The sensor can measure glucose from 5 to 40 mM, which should allow the direct measurement of high blood glucose levels in diabetic patients. We have also reported a glucose enzyme sensor with engineered PQQGDH-B. We employed the enzyme with increased thermal stability, Ser231Lys (81). The residual activity after heat treatment at 60°C for 2 h was 80% of the initial activity, whereas that of the electrode fabricated with wild-type PQQGDH-B was 30% (Fig. 10). This result showed that the sensor employing Ser231Lys exhibited significantly enhanced thermal stability, which promises high operational stability.

4.2

Application of PQQGDH for DNA Sensors

DNA sensing has become very important because it is a powerful tool for detecting toxic microorganisms in food or the environment and may also be used for fundamental studies in molecular biology (94). Many types of DNA sensing systems have been developed, such as DNA microarrays based on fluorescence detection of hybridization using probe DNA labeled with fluorescent compounds. An electrochemical DNA sensing system would also be of interest since it requires only an electrode and relatively simple electrochemical instrumentation. Most current electrochemical DNA sensing


Figure 10 Thermal stability of the sensor employing PQQGDH-Bs at 60°C. o: Ser231Lys; .: wild type. After heat treatment, the responses at 0.99 mM glucose were measured at 25°C. The measurement was carried out in 10 mM MOPS-NaOH (pH 7.0) containing 1 mM methoxy-PMS, 1 μM PQQ, and 1 mM CaCl2. Operating potential: +100 mV vs. Ag/AgCl; temperature for measurement: 25°C.

systems are based on electrochemically active probes that detect hybridization events. Examples include the use of redox intercalators to recognize double-stranded DNA, DNA detection via DNA-mediated electron transfer to the electrode using mediators, and the use of a ferrocene-labeled oligonucleotide probe that is hybridized to DNA immobilized on the electrode. To improve sensitivity, enzyme labels have been used for the detection of hybridization, since enzyme labels can dramatically amplify the signal produced by DNA hybridization. We proposed a novel DNA sensing system using PQQGDH-B as the probing enzyme for DNA detection. To detect the hybridization of the DNA probe with the target DNA, the PQQGDH was chemically conjugated with avidin. Using this sensor system, we aimed to detect a specific DNA sequence of a pathogenic bacterium, the Salmonella virulence gene invA. Probe DNA complementary to the specific sequence of the invA gene was immobilized on a carbon paste electrode, and the biotinylated target DNA was hybridized to it. After hybridization, the PQQGDH–avidin conjugate was added, and the electric current generated by the PQQGDH-catalyzed oxidation of glucose, with 1-methoxyphenazine methosulfate (m-PMS) as a mediator, was measured. The sensor response increased upon glucose addition, and in the presence of 6.3 mM glucose it increased with DNA concentration over the range from 5.0 × 10⁻⁸ to 1.0 × 10⁻⁵ M (Fig. 11).


Figure 11 Calibration curve of the sensor employing PQQGDH-B for DNA. The gray region shows the concentration range measurable in this study. For measurement, 0.2 unit of the PQQGDH–avidin conjugate was added to 10 ml of 10 mM MOPS-NaOH (pH 7.0) containing 1 mM m-PMS, 1 mM CaCl2, 1 µM PQQ, and 6.3 mM glucose. Operating potential: +100 mV vs. Ag/AgCl. Temperature for measurement: 25°C. ○: probe DNA; ●: control DNA.
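As a small worked illustration of the measurable window quoted for the DNA sensor, the following Python sketch computes how wide that window is (as a fold change and in decades) and checks whether a given DNA concentration falls inside it. Only the two limits, 5.0 × 10⁻⁸ M and 1.0 × 10⁻⁵ M, come from the text; the function names and the logarithmic concentration axis used in `log_position` are assumptions made for illustration, since the source does not state how the axis is scaled.

```python
import math

# Measurable DNA concentration window quoted in the text (mol/L).
LOW_M = 5.0e-8
HIGH_M = 1.0e-5


def dynamic_range(low_m, high_m):
    """Return (fold range, span in decades) of a concentration window."""
    if low_m <= 0 or high_m <= low_m:
        raise ValueError("expected 0 < low_m < high_m")
    fold = high_m / low_m
    return fold, math.log10(fold)


def in_measurable_range(conc_m, low_m=LOW_M, high_m=HIGH_M):
    """True if a DNA concentration lies inside the measurable window."""
    return low_m <= conc_m <= high_m


def log_position(conc_m, low_m=LOW_M, high_m=HIGH_M):
    """Fractional position of a concentration on a log-scaled axis.

    Hypothetical helper: 0 at the lower limit, 1 at the upper limit.
    """
    span = math.log10(high_m) - math.log10(low_m)
    return (math.log10(conc_m) - math.log10(low_m)) / span


if __name__ == "__main__":
    fold, decades = dynamic_range(LOW_M, HIGH_M)
    print(f"fold range: {fold:.0f}, decades: {decades:.2f}")
    print("1 uM measurable:", in_measurable_range(1.0e-6))
```

The window spans a 200-fold change in concentration, a little over two decades, which is why a logarithmic axis is a natural (though assumed) way to display it.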

Routine use of DNA-based analyses, such as SNP or pathogen detection, will require both simplicity and sensitivity in sensor design. Compared with peroxidase-based labeling, our DNA sensing system has the advantage of a stable substrate, since hydrogen peroxide is very reactive and is rapidly depleted. Glucose oxidase (GOD) may also be used, but samples for DNA sensing can come from sources such as human blood, foods, and soils, so the dissolved oxygen concentration (a variable in the GOD reaction) might differ from sample to sample.

5 CONCLUSION

Because of the higher specific activity of PQQGDH-B compared with PQQGDH-A, current glucose sensor technology employing GDH is limited to PQQGDH-B. However, when the catalytic efficiencies (kcat/Km) of PQQGDH-A and PQQGDH-B are compared, no significant difference is observed. Considering the narrow substrate specificity of PQQGDH-A, the development of enzyme sensors employing this enzyme will expand the use of quinoprotein dehydrogenases. The applications of GOD may be limited by its low catalytic efficiency (compared with both types of PQQGDH) and its dependence on O2 partial pressure. Our recent advances in the protein engineering of PQQGDHs indicate that these enzymes may be the most versatile and best suited for glucose sensors. Of equal importance, the application of the engineered enzymes is not limited to glucose sensors; they may also be used to detect DNA (and possibly other molecules) via electrochemical coupling.

REFERENCES

1. JG Hauge. Glucose dehydrogenase from Bacterium anitratum: an enzyme with a novel prosthetic group. J Biol Chem 239:3630–3639, 1964.
2. J Westerling, J Frank, JA Duine. The prosthetic group of methanol dehydrogenase from Hyphomicrobium X: electron spin resonance evidence for a quinone structure. Biochem Biophys Res Commun 87:719–724, 1979.
3. SA Salisbury, H Forrest, WBT Cruse, O Kennard. A novel coenzyme from bacterial primary alcohol dehydrogenases. Nature 280:843–844, 1979.
4. C Anthony. Methanol dehydrogenase in Gram-negative bacteria. In: V Davidson, ed. Principles and Applications of Quinoproteins. New York: Dekker, 1993, pp 17–45.
5. C Anthony. Quinoprotein-catalyzed reactions. Biochem J 320:697–711, 1996.
6. C Anthony. The structure and function of the PQQ-containing quinoprotein dehydrogenases. Prog Biophys Mol Biol 69:1–21, 1998.
7. C Anthony. Pyrroloquinoline quinone (PQQ) and quinoprotein enzymes. Antioxid Redox Signal 3:757–774, 2001.
8. JA Duine. PQQ and quinoprotein research: the first decade. Biofactors 2:87–94, 1989.
9. JA Duine. Quinoproteins: enzymes containing the quinonoid cofactor pyrroloquinoline quinone, topaquinone or tryptophan–tryptophyl quinone. Eur J Biochem 200:271–284, 1991.
10. JA Duine. The PQQ story. J Biosci Bioeng 88:231–236, 1999.
11. PM Goodwin, C Anthony. The biochemistry, physiology and genetics of PQQ and PQQ-containing enzymes. Adv Microb Physiol 40:1–80, 1998.
12. K Matsushita, O Adachi. Bacterial quinoproteins glucose dehydrogenase and alcohol dehydrogenase. In: V Davidson, ed. Principles and Applications of Quinoproteins. New York: Dekker, 1993, pp 245–273.
13. K Matsushita, H Toyama, M Yamada, O Adachi. Quinoproteins: structure, function, and biotechnological applications. Appl Microbiol Biotechnol 58:13–22, 2001.


14. WS McIntire. Quinoproteins. FASEB J 8:513–521, 1994.
15. BW Groen, MA van Kleef, JA Duine. Quinohaemoprotein alcohol dehydrogenase apoenzyme from Pseudomonas testosteroni. Biochem J 234:611–615, 1986.
16. H Toyama, A Fujii, K Matsushita, E Shinagawa, M Ameyama, O Adachi. Three distinct quinoprotein alcohol dehydrogenases are expressed when Pseudomonas putida is grown on different alcohols. J Bacteriol 177:2442–2450, 1995.
17. G Zarnt, T Schrader, JR Andreesen. Catalytic and molecular properties of the quinohemoprotein tetrahydrofurfuryl alcohol dehydrogenase from Ralstonia eutropha strain Bo. J Bacteriol 183:1954–1960, 2001.
18. M Shimao, K Ninomiya, O Kuno, N Kato, C Sakazawa. Existence of a novel enzyme, pyrroloquinoline quinone-dependent polyvinyl alcohol dehydrogenase, in bacterial symbiont, Pseudomonas sp. strain VM15C. Appl Environ Microbiol 51:268–275, 1986.
19. M Yasuda, A Cherepanov, JA Duine. Polyethylene glycol dehydrogenase activity of Rhodopseudomonas acidophila derives from a type I quinohaemoprotein alcohol dehydrogenase. FEMS Microbiol Lett 138:23–28, 1996.
20. F Kawai, H Yamanaka, M Ameyama, E Shinagawa, K Matsushita, O Adachi. Identification of the prosthetic group and further characterization of a novel enzyme, polyethylene-glycol dehydrogenase. Agric Biol Chem 49:1071–1076, 1985.
21. K Matsushita, H Toyama, O Adachi. Respiratory chain and bioenergetics of acetic acid bacteria. In: AH Rose, DW Tempest, eds. Advances in Microbial Physiology. Vol. 36. London: Academic Press, 1994, pp 247–301.
22. K Sode, K Matsumura, W Tsugawa, M Tanaka. Isolation of a marine bacterial pyrroloquinoline quinone-dependent glucose dehydrogenase. J Mar Biotechnol 2:214–218, 1995.
23. D Moonmangmee, Y Fujii, H Toyama, G Theeragool, N Lotong, K Matsushita, O Adachi. Purification and characterization of membrane-bound quinoprotein cyclic alcohol dehydrogenase from Gluconobacter frateurii CHM 9. Biosci Biotechnol Biochem 65:2763–2772, 2001.
24. O Adachi, Y Fujii, MF Ghaly, H Toyama, E Shinagawa, K Matsushita. Membrane-bound quinoprotein D-arabitol dehydrogenase of Gluconobacter suboxydans IFO 3257: a versatile enzyme for the oxidative fermentation of various ketoses. Biosci Biotechnol Biochem 65:2755–2762, 2001.
25. JA Zahn, DJ Bergmann, JM Boyd, RC Kunz, AA DiSpirito. Membrane-associated quinoprotein formaldehyde dehydrogenase from Methylococcus capsulatus Bath. J Bacteriol 183:6832–6840, 2001.
26. DJ Hopper, J Rogozinski. Redox potential of the haem group in the quinocytochrome, lupanine hydroxylase, an enzyme located in the periplasm of a Pseudomonas sp. Biochim Biophys Acta 1383:160–164, 1998.
27. ES Choi, EH Lee, SK Rhee. Purification of membrane-bound sorbitol dehydrogenase from Gluconobacter suboxydans. FEMS Microbiol Lett 125:45–49, 1995.
28. A Asakura, T Hoshino. Isolation and characterization of a new quinoprotein dehydrogenase, L-sorbose/L-sorbosone dehydrogenase. Biosci Biotech Biochem 62:469–478, 1999.
29. MAG van Kleef, JA Duine. Bacterial NAD(P)-independent quinate dehydrogenase is a quinoprotein. Arch Microbiol 150:32–36, 1988.
30. AS Vangnai, DJ Arp, LA Sayavedra-Soto. Two distinct alcohol dehydrogenases participate in butane metabolism by Pseudomonas butanovora. J Bacteriol 184:1916–1924, 2002.
31. M Ameyama, E Shinagawa, K Matsushita, O Adachi. Solubilization, purification and properties of membrane-bound glycerol dehydrogenase from Gluconobacter industrius. Agric Biol Chem 49:1001–1010, 1985.
32. C Anthony, LJ Zatman. Isolation and properties of Pseudomonas sp. M27. Biochem J 92:609–614, 1964.
33. C Anthony, LJ Zatman. The methanol-oxidizing enzyme of Pseudomonas sp. M27. Biochem J 92:614–621, 1964.
34. C Anthony. The bacterial oxidation of methane and methanol. Adv Microb Physiol 27:113–210, 1986.
35. DN Nunn, DJ Day, C Anthony. The second subunit of methanol dehydrogenase of Methylobacterium extorquens AM1. Biochem J 260:857–862, 1989.
36. ZX Xia, WW Dai, JP Xiong, ZP Hao, VL Davidson, S White, FS Mathews. The 3-dimensional structures of methanol dehydrogenase from 2 methylotrophic bacteria at 2.6 Å resolution. J Biol Chem 267:22289–22297, 1992.
37. M Paoli. Protein folds propelled by diversity. Prog Biophys Mol Biol 76:103–130, 2001.
38. H Görisch, M Rupp. Quinoprotein ethanol dehydrogenase from Pseudomonas. Antonie van Leeuwenhoek 56:35–45, 1989.
39. T Keitel, A Diehl, T Knaute, JJ Stezowski, W Höhne, H Görisch. X-ray structure of the quinoprotein ethanol dehydrogenase from Pseudomonas aeruginosa: basis of substrate specificity. J Mol Biol 297:961–974, 2000.
40. A Oubrie, HJ Rozeboom, KH Kalk, EG Huizinga, BW Dijkstra. Crystal structure of quinohemoprotein alcohol dehydrogenase from Comamonas testosteroni: structural basis for substrate oxidation and electron transfer. J Biol Chem 277:3727–3732, 2002.
41. O Adachi, K Tayama, E Shinagawa, K Matsushita, M Ameyama. Purification and characterization of particulate alcohol dehydrogenase from Gluconobacter suboxydans. Agric Biol Chem 42:2045–2056, 1978.
42. O Adachi, E Miyagawa, E Shinagawa, K Matsushita, M Ameyama. Purification and properties of particulate alcohol dehydrogenase from Acetobacter aceti. Agric Biol Chem 42:2331–2340, 1978.
43. A Ramanavicius, K Habermüller, E Csöregi, V Laurinavicius, W Schuhmann. Polypyrrole-entrapped quinohemoprotein alcohol dehydrogenase. Evidence for direct electron transfer via conducting-polymer chains. Anal Chem 71:3581–3586, 1999.
44. K Matsushita, E Shinagawa, O Adachi, M Ameyama. Quinoprotein D-glucose dehydrogenase of the Acinetobacter calcoaceticus respiratory chain: membrane-bound and soluble forms are different molecular species. Biochemistry 28:6276–6280, 1989.


45. AM Cleton-Jansen, N Goosen, TJ Wenzel, P van de Putte. Cloning of the gene encoding quinoprotein glucose dehydrogenase from Acinetobacter calcoaceticus: evidence for the presence of a second enzyme. J Bacteriol 170:2121–2125, 1988.
46. AM Cleton-Jansen, N Goosen, G Odle, P van de Putte. Nucleotide sequence of the gene coding for quinoprotein glucose dehydrogenase from Acinetobacter calcoaceticus. Nucleic Acids Res 16:6228, 1988.
47. AM Cleton-Jansen, N Goosen, O Fayet, P van de Putte. Cloning, mapping, and sequencing of the gene encoding Escherichia coli quinoprotein glucose dehydrogenase. J Bacteriol 172:6308–6315, 1990.
48. K Matsushita, Y Ohno, E Shinagawa, O Adachi, M Ameyama. Membrane-bound, electron transport-linked, D-glucose dehydrogenase of Pseudomonas fluorescens. Interaction of the purified enzyme with ubiquinone or phospholipid. Agric Biol Chem 46:1007–1011, 1982.
49. M Beardmore-Gray, C Anthony. The oxidation of glucose by Acinetobacter calcoaceticus: interaction of the quinoprotein glucose dehydrogenase with the electron transport chain. J Gen Microbiol 132:1257–1268, 1986.
50. AM Cleton-Jansen, S Dekker, P van de Putte, N Goosen. A single amino acid substitution changes the substrate specificity of quinoprotein glucose dehydrogenase in Gluconobacter oxydans. Mol Gen Genet 229:206–212, 1991.
51. CJ Pujol, CI Kado. gdhB, a gene encoding a second quinoprotein glucose dehydrogenase in Pantoea citrea, is required for pink disease of pineapple. Microbiology 145:1217–1226, 1999.
52. AH Goldstein, PU Krishnaraj. Submitted for publication (GenBank accession number: AF441442).
53. GE Cozier, C Anthony. Structure of quinoprotein glucose dehydrogenase of Escherichia coli modeled on that of methanol dehydrogenase from Methylobacterium extorquens. Biochem J 312:679–685, 1995.
54. AB Witarto, S Ohuchi, M Narita, K Sode. Secondary structure study of pyrroloquinoline quinone glucose dehydrogenase. J Biochem Mol Biol Biophys 2:209–213, 1999.
55. M Yamada, K Sumi, K Matsushita, O Adachi, Y Yamada. Topological analysis of quinoprotein glucose dehydrogenase in Escherichia coli and its ubiquinone-binding site. J Biol Chem 268:12812–12817, 1993.
56. K Sode, H Sano. Glu742 substitution to Lys enhances the EDTA tolerance of Escherichia coli PQQ glucose dehydrogenase. Biotechnol Lett 16:455–460, 1994.
57. K Sode, K Watanabe, S Ito, K Matsumura, T Kikuchi. Thermostable chimeric PQQ glucose dehydrogenase. FEBS Lett 364:325–327, 1995.
58. K Sode, H Yoshida, K Matsumura, T Kikuchi, M Watanabe, N Yasutake, S Ito, H Sano. Elucidation of the region responsible for EDTA tolerance in PQQ glucose dehydrogenases by constructing Escherichia coli and Acinetobacter calcoaceticus chimeric enzymes. Biochem Biophys Res Commun 211:268–273, 1995.
59. K Sode, K Kojima. Improved substrate specificity and dynamic range for glucose measurement of Escherichia coli PQQ glucose dehydrogenase by site directed mutagenesis. Biotechnol Lett 19:1073–1077, 1997.
60. K Sode, H Yoshida. Construction and characterization of a chimeric Escherichia coli PQQ glucose dehydrogenase (PQQGDH) with increased EDTA tolerance. Denki Kagaku 65:444–451, 1997.
61. H Yoshida, K Sode. Thr424 to Asn substitution alters bivalent metal specificity of pyrroloquinoline quinone glucose dehydrogenase. J Biochem Mol Biol Biophys 1:89–93, 1997.
62. J Okuda, H Yoshida, K Kojima, M Himi, K Sode. The role of conserved His775/781 in membrane-binding PQQ glucose dehydrogenase of Escherichia coli and Acinetobacter calcoaceticus. J Biochem Mol Biol Biophys 4:415–422, 2000.
63. H Yoshida, K Kojima, AB Witarto, K Sode. Engineering a chimeric pyrroloquinoline quinone glucose dehydrogenase: improvement of EDTA tolerance, thermal stability and substrate specificity. Protein Eng 12:63–70, 1999.
64. LD Elias, M Tanaka, H Izu, K Matsushita, O Adachi, M Yamada. Functions of amino acid residues in the active site of Escherichia coli PQQ-containing quinoprotein glucose dehydrogenase. J Biol Chem 275:7321–7326, 2000.
65. LD Elias, M Tanaka, M Sakai, M Toyama, K Matsushita, O Adachi, M Yamada. C-terminal periplasmic domain of Escherichia coli quinoprotein glucose dehydrogenase transfers electrons to ubiquinone. J Biol Chem 276:48356–48361, 2001.
66. GE Cozier, RA Salleh, C Anthony. Characterization of the membrane quinoprotein glucose dehydrogenase from Escherichia coli and characterization of a site-directed mutant in which histidine-262 has been changed to tyrosine. Biochem J 340:639–647, 1999.
67. AM Cleton-Jansen, N Goosen, K Vink, P van de Putte. Cloning, characterization and DNA sequencing of the gene encoding the Mr 50,000 quinoprotein glucose dehydrogenase from Acinetobacter calcoaceticus. Mol Gen Genet 217:430–436, 1989.
68. A Oubrie, HJ Rozeboom, KH Kalk, JA Duine, BW Dijkstra. The 1.7 Å crystal structure of the apo-form of the soluble quinoprotein glucose dehydrogenase from Acinetobacter calcoaceticus reveals a novel internal conserved sequence repeat. J Mol Biol 289:319–333, 1999.
69. K Sode, T Ohtera, M Shirahane, AB Witarto, S Igarashi, H Yoshida. Increasing the thermal stability of the water-soluble pyrroloquinoline quinone glucose dehydrogenase by single amino acid replacement. Enzyme Microb Technol 26:491–496, 2000.
70. P Dokter, J Frank, JA Duine. Purification and characterization of quinoprotein glucose dehydrogenase from Acinetobacter calcoaceticus L.M.D.79.41. Biochem J 239:163–167, 1986.
71. S Igarashi, T Ohtera, H Yoshida, AB Witarto, K Sode. Construction and characterization of mutant water-soluble PQQ glucose dehydrogenases with altered Km value: site-directed mutagenesis studies on the putative active site. Biochem Biophys Res Commun 264:820–824, 1999.
72. A Oubrie, HJ Rozeboom, KH Kalk, AJJ Olsthoorn, JA Duine, BW Dijkstra. Structure and mechanism of soluble quinoprotein glucose dehydrogenase. EMBO J 18:5187–5194, 1999.
73. LC Clark, C Lyons. Electrode systems for continuous monitoring in vascular surgery. Ann NY Acad Sci 102:29–45, 1962.


74. AEG Cass, DG Francis, HAO Hill, WJ Aston, IJ Higgins, EV Plotkin, LD Scott, APF Turner. Ferrocene-mediated enzyme electrode for amperometric determination of glucose. Anal Chem 56:667–671, 1984.
75. DR Matthews, RR Holman, E Brown, J Steemson, A Watson, S Hughes, D Scott. Pen sized digital 30-second blood glucose meter. Lancet 4:778–779, 1987.
76. Y Degani, A Heller. Direct electrical communication between chemically modified enzymes and metal electrodes. 1. Electron transfer from glucose oxidase to metal electrodes via electron relays, bound covalently to the enzyme. J Phys Chem 91:1285–1288, 1987.
77. EJ D'Costa, IJ Higgins, APF Turner. Quinoprotein glucose dehydrogenase and its application in an amperometric glucose sensor. Biosensors 2:71–87, 1986.
78. L Ye, M Hammerle, AJJ Olsthoorn, W Schuhmann, HL Schmidt, JA Duine, A Heller. High current density "wired" quinoprotein glucose dehydrogenase electrode. Anal Chem 65:238–241, 1993.
79. GJ Kost, HT Vu, JH Lee, P Bourgeois, FL Kiechle, C Martin, SS Miller, AO Okorodudu, JJ Podczasy, R Webster, KJ Whitlow. Multicenter study of oxygen-insensitive handheld glucose point-of-care testing in critical care/hospital/ambulatory patients in the United States and Canada. Crit Care Med 26:581–590, 1998.
80. H Yoshida, T Iguchi, K Sode. Construction of multi-chimeric pyrroloquinoline quinone glucose dehydrogenase with improved enzymatic properties and application in glucose monitoring. Biotechnol Lett 22:1505–1510, 2000.
81. Y Takahashi, S Igarashi, Y Nakazawa, W Tsugawa, K Sode. Construction and characterization of glucose enzyme sensor employing engineered water soluble PQQ glucose dehydrogenase with improved thermal stability. Electrochemistry 68:907–911, 2000.
82. JA Duine, J Frank, JA Jongejan. Detection and determination of pyrroloquinoline quinone, the coenzyme of quinoproteins. Anal Biochem 133:443–446, 1983.
83. M Yamada, H Inbe, M Tanaka, K Sumi, K Matsushita, O Adachi. Mutant isolation of the Escherichia coli quinoprotein glucose dehydrogenase and analysis of crucial residues Asp-730 and His-775 for its function. J Biol Chem 273:22021–22027, 1998.
84. K Sode, S Igarashi, A Morimoto, H Yoshida. Construction of engineered water-soluble PQQ glucose dehydrogenase with improved substrate specificity. Biocatal Biotransform 20:405–412, 2002.
85. JM Slauch, TJ Silhavy. Genetic fusions as experimental tools. Methods Enzymol 204:213–248, 1991.
86. K Sode, M Shirahane, H Yoshida. Construction and characterization of a linked-dimeric pyrroloquinoline quinone glucose dehydrogenase. Biotechnol Lett 21:707–710, 1999.
87. S Igarashi, K Sode. The stabilization of quaternary structure of water-soluble quinoprotein glucose dehydrogenase. Molec Biotech 24:97–103, 2003.
88. K Sode, AB Witarto, K Watanabe, K Noda, S Ito, W Tsugawa. Over expression of PQQ glucose dehydrogenase in Escherichia coli under holoenzyme forming condition. Biotechnol Lett 16:1265–1268, 1994.
89. K Sode, K Ito, AB Witarto, K Watanabe, H Yoshida, P Postma. Increased production of recombinant pyrroloquinoline quinone (PQQ) glucose dehydrogenase by metabolically engineered Escherichia coli strain capable of PQQ biosynthesis. J Biotechnol 49:239–243, 1996.
90. K Kojima, AB Witarto, K Sode. The production of soluble pyrroloquinoline quinone glucose dehydrogenase by Klebsiella pneumoniae, the alternative host of PQQ enzymes. Biotechnol Lett 22:1343–1347, 2000.
91. H Yoshida, N Araki, A Tomisaka, K Sode. Secretion of water soluble pyrroloquinoline quinone glucose dehydrogenase by recombinant Pichia pastoris. Enzyme Microb Technol 13:312–318, 2002.
92. T Yamazaki, K Kojima, K Sode. Extended-range glucose sensor employing engineered glucose dehydrogenases. Anal Chem 72:4689–4693, 2000.
93. H Yoshida, T Iguchi, K Sode. Construction of multi-chimeric pyrroloquinoline quinone glucose dehydrogenase with improved enzymatic properties and application in glucose monitoring. Biotechnol Lett 22:1505–1510, 2000.
94. K Ikebukuro, Y Kohiki, K Sode. Amperometric DNA sensor using the pyrroquinoline quinone glucose dehydrogenase-avidin conjugate. Biosens Bioelectron 17:1075–1080, 2002.

13 The Proline Rule: A Concept for Engineering Protein Stability

Yuzuru Suzuki
Kyoto Prefectural University, Kyoto, Japan

1 INTRODUCTION

The side chain of proline is an aliphatic five-membered pyrrolidine ring. It is unique among protein-constituting amino acids in that its terminal Cδ is covalently bonded to the preceding peptide bond nitrogen. The polypeptide backbone at this point therefore has no amide hydrogen to act as a donor in hydrogen bonding. The pyrrolidine ring rigidly constrains rotation about the N–Cα bond of the peptide backbone and fixes the rotation angle φ at approximately −58° (1). The conformational energy of a proline residue depends largely on the angle ψ of rotation about the Cα–C bond of the peptide backbone. A proline residue, when isolated from other proline residues in a polypeptide chain, has two energy minima at ψ = −55° and ψ = 145° (1). The bulky pyrrolidine ring restricts the available conformational space of the preceding residue in the polypeptide chain if its side chain extends at least to Cβ (1,2). The peptide bond (Xaa(i−1)–Pro(i)) preceding a proline residue, unlike those preceding other amino acid residues, might not be expected to have double bond character due to the lack of


an imide hydrogen (3). The Xaa(i−1)–Pro(i) peptide bond is more likely than other peptide bonds to adopt the cis rather than the trans configuration (probability: 0.1–0.3 vs. <10⁻³) (2,3). The difference in the standard free energy ΔG° between cis and trans Xaa(i−1)–Pro(i) peptide bonds is on the order of +8 to −8 kJ/mol (1 J = 0.239 cal) (4), and the activation energy for the cis–trans isomerization is about 54 kJ/mol, which is far less than the 84 kJ/mol for other peptide bonds (2,3). However, the rate of the cis–trans isomerization is very slow for the Xaa(i−1)–Pro(i) peptide bond, with a half-time of 20 min at 0°C, compared with 10¹² s⁻¹ for other peptide bonds (3). The above-mentioned exceptional properties of proline impose significant constraints on the polypeptide conformation of proteins, their folding processes, and their functions (2–9). Thus, proline tends to be a conserved residue (2,10). In 1988, the frequency of proline was determined to be 5.1 mol% among 1021 unrelated proteins of known sequence (11). This frequency is comparable with that (4.8 mol% among 393 unrelated protein sequences) reported by Chakrabarti and Pal (9). In 1982, we found a large difference in proline content between the oligo-1,6-glucosidases from Bacillus cereus ATCC7064 and B. thermoglucosidasius KP1006 (12). At that time, there was no report on the relationship between proline frequency and thermostability, except one by Crabb et al. (13) on D-glyceraldehyde-3-phosphate dehydrogenases from the facultative thermophile B. coagulans and the obligate thermophile B. stearothermophilus. On the basis of sequence comparison, they proposed that a proline residue in a solvent-exposed variable loop of the B. stearothermophilus enzyme contributes to its higher thermal stability. In 1987, we found a strong correlation between the number of proline residues and thermostability (expressed as tm*) for five Bacillus oligo-1,6-glucosidases (2,14–17).
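The cis–trans figures quoted in this section can be related to one another with two textbook relations: ΔG° = −RT ln K for an equilibrium, and the Arrhenius factor exp(−Ea/RT) for a rate. The Python sketch below is only an illustration under stated assumptions: a temperature of 25°C, equal pre-exponential factors for the two activation energies, and first-order kinetics for the half-time. None of these assumptions is stated in the text, and the function names are hypothetical.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def delta_g_trans_to_cis(p_cis, temp_k=298.15):
    """Standard free energy change (J/mol) for trans -> cis.

    p_cis is the equilibrium fraction of the cis form, so the
    equilibrium constant is K = [cis]/[trans] = p_cis / (1 - p_cis).
    """
    k_eq = p_cis / (1.0 - p_cis)
    return -R * temp_k * math.log(k_eq)


def rate_ratio(ea_low, ea_high, temp_k=298.15):
    """Arrhenius ratio k(ea_low)/k(ea_high), in J/mol.

    Assumes both processes share the same pre-exponential factor.
    """
    return math.exp((ea_high - ea_low) / (R * temp_k))


def rate_constant_from_half_life(t_half_s):
    """First-order rate constant (1/s) from a half-time in seconds."""
    return math.log(2.0) / t_half_s


def joule_to_cal(joules):
    """Convert joules to calories using the factor given in the text."""
    return joules * 0.239


if __name__ == "__main__":
    for p in (0.1, 0.3):
        dg = delta_g_trans_to_cis(p)
        print(f"p_cis={p}: dG = {dg / 1000:.1f} kJ/mol")
    print(f"rate ratio (54 vs 84 kJ/mol): {rate_ratio(54e3, 84e3):.2e}")
    print(f"k for t1/2 = 20 min: {rate_constant_from_half_life(20 * 60):.2e} 1/s")
```

Under these assumptions, cis fractions of 0.1 and 0.3 correspond to about 5.4 and 2.1 kJ/mol, which is the same order of magnitude as the free energy difference quoted above. The 30 kJ/mol lower activation energy corresponds to a rate enhancement of roughly 10⁵-fold at 25°C.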
Based on this finding, we proposed a general principle for increasing protein thermostability, known as the Proline Rule (16,18). The rule states that globular proteins can be thermostabilized by increasing the frequency of proline at the second position (i+1) of β-turns.† A direct proof

* tm is the temperature at which the half-life of the enzyme activity is 10 min at pH 6.8 (16).

† The β-turn comprises four consecutive residues (i, i+1, i+2, and i+3) in a protein in which the polypeptide chain folds back on itself at an angle of nearly 180° (20). The position i+1 refers to the second position of the β-turn. There is a hydrogen bond between the CO group of residue i and the NH group of residue i+3. A type I turn has (φ,ψ)2 = (−64°, −27°) and (φ,ψ)3 = (−90°, −7°); a type II turn has (φ,ψ)2 = (−60°, 131°) and (φ,ψ)3 = (84°, 1°); a type III turn has (φ,ψ)2 = (−60°, −30°) and (φ,ψ)3 = (−60°, −30°); a type VIII turn has (φ,ψ)2 = (−72°, −33°) and (φ,ψ)3 = (−123°, 121°); and a type II′ turn has (φ,ψ)2 = (60°, −126°) and (φ,ψ)3 = (−91°, 1°), for residues i+1 and i+2 of the β-bend, respectively (20).
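The ideal dihedral angles listed in the footnote on β-turns lend themselves to a simple lookup. The Python sketch below assigns a turn type to the observed (φ,ψ) angles of residues i+1 and i+2 by finding the ideal geometry with the smallest worst-case angular deviation. It is an illustration only: the angle values are those of the footnote with signs following the standard dihedral convention, while the 40° tolerance, the nearest-match rule, and the function names are assumptions rather than the criteria used by the author or by Ref. 20.

```python
# Ideal (phi, psi) angles in degrees for residues i+1 and i+2 of each
# beta-turn type, as listed in the footnote (standard sign convention).
IDEAL_TURNS = {
    "I": ((-64.0, -27.0), (-90.0, -7.0)),
    "II": ((-60.0, 131.0), (84.0, 1.0)),
    "III": ((-60.0, -30.0), (-60.0, -30.0)),
    "VIII": ((-72.0, -33.0), (-123.0, 121.0)),
    "II'": ((60.0, -126.0), (-91.0, 1.0)),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees (0-180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def classify_turn(phi2, psi2, phi3, psi3, tolerance=40.0):
    """Return the closest turn type, or None if no type fits.

    A type fits when all four angles lie within `tolerance` degrees of
    its ideal values (an assumed cutoff). Among fitting types, the one
    with the smallest worst-case deviation is returned.
    """
    observed = (phi2, psi2, phi3, psi3)
    best_name, best_dev = None, None
    for name, ((p2, s2), (p3, s3)) in IDEAL_TURNS.items():
        ideal = (p2, s2, p3, s3)
        worst = max(angle_diff(o, i) for o, i in zip(observed, ideal))
        if worst <= tolerance and (best_dev is None or worst < best_dev):
            best_name, best_dev = name, worst
    return best_name


if __name__ == "__main__":
    print(classify_turn(-62.0, 128.0, 80.0, 5.0))    # close to type II
    print(classify_turn(180.0, 180.0, 180.0, 180.0))  # extended chain
```

Because the ideal type I and type III geometries are close to each other, a nearest-match rule is used here instead of a first-match rule, so that an exactly ideal type III turn is not reported as type I.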


for the proposal was given by Matthews et al. (19). These authors increased the thermostability of bacteriophage T4 lysozyme by replacing an alanine residue with a proline residue in one of the β-turns to decrease the backbone entropy of the unfolded molecule. From 1994 to 1998, we cumulatively thermostabilized B. cereus oligo-1,6-glucosidase through the stepwise addition of proline residues at 12 separate positions, namely, four i+1 positions in β-turns, four N1 positions in the first turns of α-helices, and four positions in coils (21). The validity of the Proline Rule has been confirmed by site-directed mutagenesis of various proteins (21–44). Many findings correlating proline residues with thermostability have been reported for numerous proteins from microorganisms, including hyperthermophiles (45–63). Directed evolution for protein thermal stabilization has been found to include proline substitutions (64–67). It is also noteworthy that many psychrophilic proteins have a low number of proline residues (68–72). In 1999, Suzuki (38) proposed the Proline Rule in refined form. The Proline Rule suggests an evolutionary rule for protein thermal adaptation (38,73–75) and a paradigm for engineering protein thermostability (19,21–23,38). This chapter describes three steps that we have taken to examine and refine this rule (38).

2 EXAMINATION OF THE PROLINE RULE

2.1 Step 1: Folding Features of B. cereus Oligo-1,6-glucosidase

The oligo-1,6-glucosidase from B. cereus ATCC7064 has been produced in large amounts in Escherichia coli, purified to homogeneity, and crystallized (76). The tertiary structure of the enzyme has been determined by x-ray crystallography at 2.0 Å resolution (Fig. 1) (77,78). The overall structural features of the enzyme are similar to those of α-amylases from Aspergillus oryzae (79), pig pancreas (80,81), and B. licheniformis (82), and are common to the enzymes in glycoside hydrolase family 13 (83,84). The B. cereus enzyme has several prominent folding features:

(a) The enzyme has three domains: the N-terminal domain (N-domain, residues 1–104 and 175–480) sandwiched between the subdomain (residues 105–174) and the C-terminal domain (C-domain, residues 481–558). This domain topology is typical for α-amylases (79–82).
(b) A deep active-site cleft and its wall are both made of polypeptide stretches connecting β-strands to their adjoining α-helices in the (β/α)8 barrel of the N-domain. The cleft is much deeper than those of α-amylases (78–82), but is pocket-shaped similar to the


Figure 1 Representations of B. cereus oligo-1,6-glucosidase (Ref. 78). (A) Perpendicular to the axis of the (β/α)8 barrel. (B) Along the axis of the (β/α)8 barrel. Spirals, arrows, and strings show α-helices, β-strands, and loops, respectively.

cleft of Neisseria polysaccharea amylosucrase (85). The pocket-shaped cleft ensures the exo-type catalysis of the B. cereus enzyme (12). The three catalytic residues Asp199, Glu255, and Asp329, as well as some substrate-binding residues such as His103 and His328, are at the bottom of the cleft [i.e., at the C-terminus of the (β/α)8 barrel] (86).


(c) Three long, flexible polypeptide stretches contributing to the wall of the active-site cleft are situated (i) between the third β-strand Nβ3 and the third α-helix Nα3, (ii) between the fourth β-strand Nβ4 and the fourth α-helix Nα4, and (iii) between the eighth β-strand Nβ8 and the eighth α-helix Nα8. The first polypeptide stretch (i) forms the loop-rich subdomain containing an α-helix Sα1 and a sheet of three antiparallel β-strands Sβ1, Sβ2, and Sβ3.
(d) The N-domain has five additional α-helices Nα6′, Nα7′, Nα8′, Nα8″, and Nα8‴. The polypeptide stretch between the sixth β-strand Nβ6 and the sixth α-helix Nα6 contains α-helix Nα6′; the polypeptide stretch between the seventh β-strand Nβ7 and the seventh α-helix Nα7 contains α-helix Nα7′; and the polypeptide stretch between the β-strand Nβ8 and the α-helix Nα8 contains three α-helices Nα8′, Nα8″, and Nα8‴. α-Helices Nα8‴ and Nα8‴, which protrude from the (β/α)8 barrel of the N-domain, are at the edge of the wall of the active-site cleft to cover the top of the cleft.
(e) The C-domain has a β-barrel structure of eight antiparallel β-strands Cβ1–Cβ8 folded in two Greek key motifs.

Other α-glucosidases with different substrate specificities in glycoside hydrolase family 13 have the same folding architecture as the B. cereus enzyme (87).

2.2 Step 2: Sites of Proline Residues Critical for Thermostabilization

The sites and structural features likely to affect thermostability when proline substitutions are made have been sought in four oligo-1,6-glucosidases with different degrees of thermostability, from B. cereus ATCC7064, B. coagulans ATCC7060, B. thermoglucosidasius KP1006, and B. flavocaldarius KP1228 (74). Multiple sequence alignments were made, with the 3D structure of the B. cereus enzyme taken into consideration (78). The sequences of the four enzymes deduced from the corresponding genes are 40–72% identical (Fig. 2) (73,74,88–90). The four enzymes share 176 residues distributed over their entire sequences. These residues include 96 of the 108 residues commonly found in microbial α-glucosidases of glycoside hydrolase family 13 (87), and these 96 residues, in turn, include 11 key residues universally conserved among the different enzymes that belong to this family (Fig. 2) (83,84,91). These 11 key residues include three catalytic acid residues and two His residues interacting with the glycon α-glucosyl residue (i.e., at subsite 1 adjacent to the substrate bond to be cleaved) (73,74,86,88). Despite minor insertions and deletions, the four enzymes seem to contain all sequences


corresponding to secondary structure elements found in the B. cereus enzyme (78).

2.2.1 Conserved Proline Residues

The four enzymes share 14 proline residues (Fig. 2) (74). Ten of the 14 prolines are buried in the molecule and only four prolines (equivalent to Pro130, Pro231, Pro298, and Pro520 in the B. cereus enzyme) are solvent-exposed (78). Nine of the 14 prolines are located in loops, whereas three prolines (equivalent to Pro231, Pro549, and Pro520 in the same enzyme) are located at positions i+1 in β-turns of types I and II and at the first position i of a type III β-turn, and two prolines (equivalent to Pro130 and Pro362 in the enzyme) are located in β-strands Sβ1 and Nβ8 (78). The 14 conserved proline residues probably contribute to maintaining basic conformational integrity (21). For many other proteins, replacing conserved or buried proline residues with other amino acids has been shown to reduce stability drastically (25,92–105).

2.2.2 Nonconserved Proline Residues Critical for Thermostabilization

Besides the 14 conserved proline residues, the B. cereus, B. coagulans, B. thermoglucosidasius, and B. ﬂavocaldarius enzymes have 5, 10, 18, and 33 more proline residues, respectively (Fig. 2, Table 1) (74). These prolines seem critical for thermostability because their number is in direct proportion to the tm

Figure 2 Primary sequences and secondary structure elements of four Bacillus oligo-1,6-glucosidases (Ref. 74). Amino acid sequences of oligo-1,6-glucosidases from B. cereus, B. coagulans, B. thermoglucosidasius, and B. flavocaldarius are from Refs. 73, 74, 88, and 90. The enzymes are labeled Bce, Bco, Bth, and Bfl, respectively. Residues identical to those of the B. cereus enzyme are indicated by asterisks. Gaps (–) are introduced during alignment. Secondary structure elements are boxed with their designations given beneath. The number of enzymes with a proline at a given position of the alignment is indicated below the four sequences at each occurrence of proline. The proline positions are indicated by black arrowheads (▼) above the sequences. Residues shared with eight microbial α-glucosidases in glycoside hydrolase family 13 (Refs. 78,87) are indicated by # and @ above the sequences; the @ sign denotes the 11 key residues universally conserved among (β/α)8 barrel enzymes in the same family 13 (Refs. 83,84,91).


Table 1 Nonconserved Proline Residues and Their Corresponding Residues in Oligo-1,6-glucosidases from B. cereus, B. coagulans, B. thermoglucosidasius, and B. flavocaldarius (a,b)

Secondary structure       Posi-                                   Exposed or
element in Bce (domain c) tion d  Bce     Bco     Bth     Bfl     buried e

α-Helix Nα1 (N)                 Asp38   Asp37   Asp38   Pro36   e
Loop (N)                        Asn73   Lys72   Asp73   Pro71   e
Loop (N)                        Val101  Val100  Val101  Pro99   b
α-Helix Sα1 (S)           N1    Asn109  Ala108  Pro109  Pro107  e
β-Turn I (S)              i+1   Lys121  Pro120  Pro121  Pro119  e
Loop (S)                        Lys132  Pro132  Lys132  Pro131  e
Loop (S)                        Glu136  Pro136  Glu136  Pro135  e
Loop (S)                        Ser147  Ser147  Ser147  Pro146  e
β-Turn I (S)              i+1   Lys165  Lys165  Lys165  Pro164  e
α-Helix Nα3 (N)           N1    Glu175  Glu175  Pro175  Pro174  e
β-Turn II (N)             i+1   Glu208  Leu208  Pro208  Pro207  e
Loop (N)                        Glu214  Glu214  Glu214  Pro218  e
Loop (N)                        Glu216  Pro216  Pro215  Trp218  e
Loop (N)                        Glu218  —       Ser217  Pro220  e
Loop (N)                        —       Lys220  Lys220  Pro223  e
(Loop in Bfl) f (N)             —       —       —       Pro256  e
β-Turn II (N)             i+1   Pro257  Ile257  Pro258  Leu268  b
β-Turn II (N)             i+2   Gly258  Gly258  Gly259  Pro269  e
α-Helix Nα5 (N)           N1    Thr261  Val261  Pro262  Pro271  b
Loop (N)                        Glu270  Pro270  Pro271  Gly280  e
β-Strand Nβ6 (N)                Val278  Ile278  Val279  Pro284  b
β-Turn I (N)              i+1   Glu290  Pro290  Pro291  —       e
(Loop in Bco) f (N)             —       Pro293  —       —       e
β-Strand Nβ7 (N)                Asn321  Asn324  Asn322  Pro319  b
Loop (N)                        Pro331  Ala334  Pro332  Pro329  b
α-Helix Nα7 (N)           N1    Arg344  Arg347  Arg345  Pro338  e
Loop (N)                        Thr373  Val376  Thr374  Pro365  e
Loop (N)                        Glu378  Glu381  Pro379  Pro370  e
β-Turn III (N)            i+1   Ile380  Leu382  Ile381  Pro371  e
α-Helix Nα8′ (N)          N1    Ile386  Ile388  Ile387  Pro377  e
α-Helix Nα8″ (N)          N1    Ile403  Ser405  Pro404  Pro387  e
β-Turn I (N)              i+1   Asn417  Asn420  Asn418  Pro397  e
Loop (N)                        Gln423  Gln426  Gln424  Pro403  e
Loop (N)                        Asn428  Leu431  Glu429  Pro408  e
Loop (N)                        Thr440  Lys443  Pro441  Pro420  e
β-Turn I (N)              i+1   Pro443  Ser446  Pro444  Pro423  e
β-Turn I (N)              i+1   Lys457  Pro460  Pro458  Pro437  e
β-Turn III (N)            i+1   Glu475  Asp478  Asp476  Pro454  e
β-Turn I (C)              i+1   Glu487  Pro490  Glu488  Arg465  e
Loop (C)                        Pro490  Pro493  Pro491  Glu468  b
β-Strand Cβ4 (C)                Cys515  Val518  Pro516  Lys490  b
Loop (C)                        Pro541  Thr539  Glu543  Val512  e
Loop (C-terminus) (C)           Lys558  Lys555  Pro562  Asp529  e

tm (°C)                         44.5    59.6    71.0    89.2

a Oligo-1,6-glucosidases from B. cereus, B. coagulans, B. thermoglucosidasius, and B. flavocaldarius are called Bce, Bco, Bth, and Bfl, respectively.
b Source: Fig. 2 and Refs. 74 and 78.
c N, S, and C denote the N-domain, the subdomain, and the C-domain, respectively.
d N1 denotes position N1 of an α-helix; i+1 and i+2 denote the second and third positions of β-turns in Bce.
e Letter e or b denotes residues exposed to solvent or buried in the molecule, respectively (Refs. 78,106).
f Deduced by published methods (Refs. 106,107).

values* of the enzymes (Table 1) (74). Of these prolines, 18 are shared by two or three of the four enzymes, whereas the remaining 25 prolines are found in only one of the four enzymes (Table 1). A detailed investigation of these nonconserved proline residues has produced five conclusions (74):

(a) The four enzymes under investigation contain at least 43 positions (nonconserved positions) at which nonconserved proline residues occur with increasing thermostability (Table 1, Fig. 3). The majority (38 positions, 88%) of the positions are in the N-domain and the subdomain rather than the C-domain.

(b) Twenty-six of these nonconserved positions are in solvent-exposed β-turns, mainly at positions i+1 in β-turns of types I and II, or in coils within the subdomain and loops of the (β/α)8 barrel forming the active-site cleft (Table 1, Fig. 3). More proline residues are introduced into this subset of 26 positions as thermostability increases. Most of the proline residues introduced are clustered on three long, flexible loops, which connect β-

* The tm values (44.5°C, 59.6°C, 71.0°C, and 89.2°C) of the four enzymes are also in direct proportion to their activation free energy changes ΔGd* (95.4, 100.4, 103.1, and 109.2 kJ/mol at pH 6.8) for irreversible heat unfolding (74). ΔGd* is a parameter of molecular rigidity in the folded state (74).


Figure 3 Forty-three positions of B. cereus oligo-1,6-glucosidase that are equivalent to those of nonconserved proline residues in four Bacillus oligo-1,6-glucosidases (see Table 1) (Refs. 74,78). The 43 residues and their positions (●) are indicated by single letters and residue numbers. Positions 121, 165, 290, 417, 443, 457, and 487 are in positions i+1 in type I β-turns; positions 208 and 257 are in positions i+1 in type II β-turns; positions 380 and 475 are in positions i+1 in type III β-turns; position 258 is in the third position i+2 of a type II β-turn; positions 109, 175, 261, 344, 386, and 403 are in positions N1 of α-helices Sα1, Nα3, Nα5, Nα7, Nα8′, and Nα8″; positions 278, 321, and 515 are within β-strands Nβ6, Nβ7, and Cβ4; position 38 is within α-helix Nα1; all other positions are in coils. Positions 109, 121, 132, 136, 147, and 165 are in the subdomain; positions 487, 490, 515, 541, and 558 are in the C-domain; all other positions are in the N-domain. P223 and P256 of the B. flavocaldarius enzyme and P293 of the B. coagulans enzyme are in loop insertions and lack an equivalent position in the B. cereus enzyme; therefore, only their approximate location can be indicated. P223 of the B. flavocaldarius enzyme is equivalent to K220 of the B. coagulans and B. thermoglucosidasius enzymes (see Table 1). All other symbols and abbreviations are the same as in Fig. 1.

strands Nβ3, Nβ4, and Nβ8 to their adjacent α-helices Nα3, Nα4, and Nα8, and which constitute large parts of the active-site cleft area (Figs. 2 and 3).

(c) Six of the nonconserved positions are at the edge of the active-site cleft, at the solvent-exposed positions N1 in α-helices Sα1, Nα3, Nα5, Nα7, Nα8′, and Nα8″ in the N-domain and the subdomain (Table 1, Figs. 2 and 3). No proline residues were found in these


positions in the enzymes from B. cereus and B. coagulans. Proline residues occupy four of these positions (in Sα1, Nα3, Nα5, and Nα8″) in the B. thermoglucosidasius enzyme and all six positions in the B. flavocaldarius enzyme.

(d) β-Strands Nβ6 and Nβ7 and α-helix Nα1 of the (β/α)8 barrel comprise three nonconserved positions, which are filled with proline residues only in the B. flavocaldarius enzyme (Table 1, Fig. 2). Proline residues, which lack an amide hydrogen, usually serve as strong breakers of α-helices and β-strands and produce a sharp kink within them (2,108–115). The three secondary structure elements bent by proline residues in the B. flavocaldarius enzyme probably better pack the (β/α)8 barrel N-domain and increase interior hydrophobic interactions (74). In support of this notion, Pro96 within β-strand 3 of the (β/α)8 barrel α-subunit of E. coli tryptophan synthetase contributes to the stability of the enzyme (92,116). Clostridium beijerinckii alcohol dehydrogenase is thermostabilized by replacing Ala347 in β-strand 9 by a proline residue (34). Improved packing and, consequently, stabilizing effects of proline residues within α-helices have been found in Rhodobacter capsulatus photosynthetic center (117), E. coli FoF1-ATPase a-subunit (118), bacteriophage T4 lysozyme (119,120), Aquifex pyrophilus flagellin (121), and Kluyveromyces lactis heat shock transcription factor (122). There are many proline residues in α-helices in the hyperthermostable 3-isopropylmalate dehydrogenase of Thermus thermophilus (123).

(e) Proline residues mainly replace charged or polar residues in the nonconserved positions (73,74,88,90) (Table 1). The most frequently replaced residue is glutamate, followed by lysine and asparagine (Glu+Lys+Asn=45% of replaced residues).
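The proportionality between proline content, tm, and ΔGd* claimed above can be checked from the numbers quoted in the text, Table 1, and the footnote on ΔGd*. The following is a minimal sketch and not part of the cited study: it computes Pearson's r and a least-squares line in plain Python. The function and variable names are illustrative assumptions.

```python
from math import sqrt

# Values quoted in the text and Table 1 (order: Bce, Bco, Bth, Bfl).
extra_prolines = [5, 10, 18, 33]             # nonconserved prolines per enzyme
tm_celsius = [44.5, 59.6, 71.0, 89.2]        # tm values in °C
dg_activation = [95.4, 100.4, 103.1, 109.2]  # ΔGd* in kJ/mol at pH 6.8


def pearson_and_fit(x, y):
    """Return (r, slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return sxy / sqrt(sxx * syy), slope, my - slope * mx


r_pro, slope_pro, intercept_pro = pearson_and_fit(extra_prolines, tm_celsius)
r_dg, slope_dg, _ = pearson_and_fit(tm_celsius, dg_activation)

print(f"prolines vs tm: r = {r_pro:.3f}, slope = {slope_pro:.2f} °C per proline")
print(f"tm vs ΔGd*:     r = {r_dg:.3f}, slope = {slope_dg:.3f} kJ/mol per °C")
```

With only four enzymes the fit is descriptive rather than predictive, and it cannot separate the effect of proline content from other sequence differences among the enzymes.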

2.3

Step 3: Multiple Proline Substitutions by Site-Directed Mutagenesis

2.3.1

Conformational Angles φ and ψ

Analysis of (φ,ψ) values at the 40 "critical" positions of B. cereus ATCC7064 oligo-1,6-glucosidase has yielded two results (78):

(a) In Ramachandran (φ,ψ) plots (124), 22 and 13 of the 40 residues are clustered around two distinct regions of αR and β conformations, respectively, as described using the nomenclature of Efimov (125) (Fig. 4). Their mean (φ,ψ) values are (61°, 35°) for region αR and (82°, 146°) for region β. These regions are near two


Figure 4 Ramachandran diagram showing (φ,ψ) values for the 40 positions (○) of B. cereus oligo-1,6-glucosidase that are equivalent to those of nonconserved proline residues in four Bacillus oligo-1,6-glucosidases, and (φ,ψ) values of their preceding positions (●) (see Table 1 and Fig. 3) (Refs. 74,78,124). The nomenclature of loop conformations is based on that used by Efimov (125). (φ,ψ)αR=(61°, 35°) and (φ,ψ)β=(65°, 150°) denote mean (φ,ψ) values of the database set of 963 proline residues clustered in regions of αR and β conformations, respectively (Ref. 2).

theoretically predicted energy minima for proline conformations in a polypeptide chain (1,2,9,78,119,126). The (φ,ψ) values at the remaining five nonconserved positions are (139°, 11°) for Val101, (123°, 37°) for Glu216, and (91°, 1.8°) for Pro490, which are around region γR; (74°, 0.4°) for Gly258 in region γL; and (78°, 77°) for Lys558 near region ε (Fig. 4) (78,125). The latter two (φ,ψ) values are not surprising, given that Gly258 adopts a γL conformation at the third position i+2 in a type II β-turn (1,2,20) and that Lys558 is at the C-terminus of the enzyme.

(b) Thirty-two of the residues preceding the 40 nonconserved positions have extended β-conformations with mean values of (φ,ψ)β=


(96°, 141°) (Fig. 4) (78). The (φ,ψ) values of the other eight preceding residues are (55°, 36°) for Leu37 and (73°, 13°) for Asp416 in region αR; (76°, 168°) for Gly146 in region ε; (135°, 87°) for Met256 in region δ; (114°, 35°) for Tyr343 in region γR; (44°, 45°) for Leu486 in region αL; (105°, 155°) for Gly289; and (172°, 144°) for Gly540 (78,125). All these residues, with the exception of Leu486, have (φ,ψ) values within the allowed region given by Schimmel and Flory (1) and by other investigators (2,119,126).

2.3.2

Twelve Positions Selected

Twelve of the nonconserved positions were selected for substitution by proline residues in the B. cereus enzyme (21,22,37). All these positions are in the N-domain and the subdomain of the enzyme (Fig. 5). Lys121, Glu290, and Lys457 are at positions i+1 in type I β-turns that connect α-helix Sα1 to β-strand Sβ1, α-helices Nα6′ and Nα6, and α-helices Nα8‴ and Nα8, respectively. Glu208 is at position i+1 in a type II β-turn connecting β-strand Nβ4 to α-helix Nα4 (Figs. 2 and 5, Table 1). Asn109, Glu175, Thr261, and Ile403 are at positions N1 on α-helices Sα1, Nα3, Nα5, and Nα8″, respectively (Figs. 2 and 5, Table 1). Glu216, Glu270, Glu378, and Thr440 are in random coil structure in the loops connecting β-strand Nβ4 to α-helix Nα4, α-helix Nα5 to β-strand Nβ6, β-strand Nβ8 to α-helix Nα8′, and α-helices Nα8″ and Nα8‴, respectively (Figs. 2 and 5). These nonconserved residues are on the C-terminal side of the (β/α)8 barrel (i.e., on the side of the active-site cleft) (Fig. 5), with the exception of Glu270, which is on the N-terminal side of the barrel. Proline substitutions at these 12 nonconserved positions probably cause no significant change in enzyme conformation. This assumption is supported by five arguments:

(a) Proline residues favor positions i+1 in β-turns of types I and II, positions N1 in α-helices, and coils (2,9,20,23,108–112,115,127–129).

(b) The above 12 (nonconserved) residues in the B. cereus enzyme are replaced by proline residues in its thermostable counterpart from B. thermoglucosidasius (Table 1) (90). The sequences of the two proteins are very similar around the 12 nonconserved positions (Fig. 2).

(c) The 12 nonconserved residues and their preceding residues have (φ,ψ) values that are allowed for proline substitutions, although Glu216 is in region γR (Table 2) (1,2,78,119,126).


Figure 5 The 12 residues replaced by proline residues in a consecutive series of mutants of B. cereus oligo-1,6-glucosidase (Refs. 21,22,37). The order of proline substitutions is indicated by the increasing numbers of the respective mutants (Mut1→Mut12; see Table 3). The mutated residues and their positions are indicated by single letters and residue numbers. The last-mutated positions of Mut1, Mut3, Mut4, and Mut10 correspond to positions i+1 in β-turns of types I, I, II, and I on loops between α-helix Sα1 and β-strand Sβ1, between α-helices Nα6′ and Nα6, between β-strand Nβ4 and α-helix Nα4, and between α-helices Nα8‴ and Nα8, respectively. The last-mutated positions of Mut2, Mut7, Mut9, and Mut12 correspond to positions N1 of α-helices Nα3, Nα5, Sα1, and Nα8″, respectively. The last-mutated positions of Mut5, Mut6, Mut8, and Mut11 are on coils within loops between α-helix Nα5 and β-strand Nβ6, between β-strand Nβ8 and α-helix Nα8′, between β-strand Nβ4 and α-helix Nα4, and between α-helices Nα8″ and Nα8‴, respectively. All other symbols and abbreviations are the same as in Figs. 1 and 3.

(d) All residues except Thr261 are solvent-exposed (Table 1) (21,78). Solvent-exposed residues, in general, can be replaced by proline residues without a large perturbation in their neighboring residues’ conformations (53,130,131). As a matter of fact, enhanced thermostability has been achieved by proline substitution on the surface of various proteins (19,23–29,33–36,42,118).


Table 2 Dihedral Angles of 12 Critical Residues Replaced by Proline Residues as Well as of Their Preceding Residues in B. cereus Oligo-1,6-glucosidase

Preceding                   Loop            Critical                    Loop
residue     φ (°)   ψ (°)   conformation a  residue     φ (°)   ψ (°)   conformation a

His108      76.0    145.4   β               Asn109      53.0    41.0    αR
Asn120      72.3    140.8   β               Lys121      66.5    25.8    αR
Asn174      104.6   120.3   β               Glu175      77.5    10.6    αR
Glu207      71.7    130.6   β               Glu208      47.8    127.7   β
Thr215      158.2   167.4   β               Glu216      123.3   36.6    γR
Thr260      111.4   162.9   β               Thr261      53.5    34.7    αR
Gly269      59.5    131.0   β               Glu270      48.8    38.4    αR
Gly289      105.2   155.1   β               Glu290      50.3    54.0    αR
Phe377      94.3    141.7   β               Glu378      82.0    9.7     αR
Asp402      69.8    124.4   β               Ile403      56.1    43.4    αR
Ile439      146.8   147.3   β               Thr440      60.7    138.6   β
Asn456      106.9   117.4   β               Lys457      73.2    10.5    αR

a Loop conformations (see Fig. 4). The nomenclature is based on that used by Efimov (Ref. 125). Source: Ref. 78.

(e) Proline substitution of the above 12 nonconserved residues yielded active mutants whose specific activities are very close to that of the B. cereus enzyme (21,22,37).

2.3.3

Mutant Enzymes

Site-directed mutagenesis of the B. cereus enzyme gene has been done stepwise according to the method of Kunkel et al. (132) to direct mutations in the order: Lys121→Pro (Lys121Pro), Glu175→Pro (Glu175Pro), Glu290→Pro (Glu290Pro), Glu208→Pro (Glu208Pro), Glu270→Pro (Glu270Pro), Glu378→Pro (Glu378Pro), Thr261→Pro (Thr261Pro), Glu216→Pro (Glu216Pro), Asn109→Pro (Asn109Pro), Lys457→Pro (Lys457Pro), Thr440→Pro (Thr440Pro), and Ile403→Pro (Ile403Pro) (Fig. 5) (21,22,37). The 12 active mutant enzymes (Mut1 to Mut12) comprising 1–12 additional proline residues have been produced in E. coli and purified to homogeneity (21,22,37). Proline substitutions for Asn, Lys, Glu, Thr, and Ile are estimated to contribute 17 to 7 J/mol/K to the backbone entropy of unfolding (19,133). This value corresponds to an increase of 5.4–2.2 kJ/mol in the free energy of unfolding for the B. cereus enzyme at 49.3°C and pH 7.0 (19,133). However, the observed increases in thermostability vary according to the positions mutated (Table 3).
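The conversion from an entropy term to a free energy contribution in the paragraph above amounts to multiplying the entropy by the absolute temperature (49.3°C, the unfolding temperature of the native enzyme). The sketch below is illustrative and not taken from the cited work: it reproduces that arithmetic and builds the cumulative mutant series in the stated order. The one-letter mutation labels (for example K121P) and the helper names are assumptions introduced here.

```python
# Order of proline substitutions as listed in the text (one-letter shorthand).
SUBSTITUTION_ORDER = [
    "K121P", "E175P", "E290P", "E208P", "E270P", "E378P",
    "T261P", "E216P", "N109P", "K457P", "T440P", "I403P",
]

# Mut_n carries the first n substitutions of the series.
mutants = {f"Mut{n}": SUBSTITUTION_ORDER[:n] for n in range(1, 13)}


def entropy_to_free_energy(delta_s_j_per_mol_k, temp_celsius):
    """Convert an entropy term (J/mol/K) into T*ΔS in kJ/mol at the given temperature."""
    temp_kelvin = temp_celsius + 273.15
    return delta_s_j_per_mol_k * temp_kelvin / 1000.0


# Backbone-entropy contributions quoted in the text: 17 and 7 J/mol/K.
upper = entropy_to_free_energy(17.0, 49.3)
lower = entropy_to_free_energy(7.0, 49.3)

print(mutants["Mut3"])  # ['K121P', 'E175P', 'E290P']
print(f"{upper:.2f} kJ/mol, {lower:.2f} kJ/mol")
```

Both results fall within about 0.1 kJ/mol of the 5.4–2.2 kJ/mol range quoted above.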


Table 3 The tm Values and Thermodynamic Parameters of B. cereus Oligo-1,6-glucosidase and Its Mutant Enzymes (Mut1 to Mut12) with 1–12 Proline Residues Introduced (a)

               Sites (residues)  tm     Δtm c   Td d   ΔCp e       ΔH(Td) f   ΔHd g     ΔSd g      ΔΔGd h
Mutants        mutated b         (°C)   (°C)    (°C)   (kJ/mol/K)  (kJ/mol)   (kJ/mol)  (J/mol/K)  (kJ/mol)

Native enzyme                    44.5   —       49.3   21.0        1090       1090      3350       —
Mut1           T (K121)          45.9   1.4     50.6   25.7        1110       1080      3320       4.6
Mut2           H (E175)          46.3   1.8
Mut3           T (E290)          47.1   2.6
Mut4           T (E208)          47.9   3.4     52.9   25.5        1100       1000      3080       11.7
Mut5           C (E270)          47.6   3.1
Mut6           C (E378)          47.6   3.1
Mut7           H (T261)          48.1   3.6     53.1   15.3        967        908       2790       10.5
Mut8           C (E216)          48.2   3.7
Mut9           H (N109)          49.6   5.1     54.3   25.9        1040       908       2770       15.1
Mut10          T (K457)          49.8   5.3
Mut11          C (T440)          49.4   4.9
Mut12          H (I403)          50.1   5.6     54.6   18.9        929        828       2520       15.5

a Source: Refs. 21, 22, 37, 38. Mut1 to Mut12, mutated enzymes with 1–12 additional proline residues (see Fig. 5).
b T, H, and C denote positions i+1 in β-turns, positions N1 in α-helices, and positions on coils, respectively.
c Δtm is the difference between tm values at pH 6.8 of the native enzyme and each mutant enzyme.
d Td is the temperature at the midpoint of unfolding transition of each enzyme at pH 7.0, determined by differential scanning calorimetry.
e ΔCp is the difference in the heat capacity between the folded and unfolded states of each enzyme, determined by differential scanning calorimetry.
f ΔH(Td) is the enthalpy change of unfolding at Td of each enzyme, determined by differential scanning calorimetry.
g ΔHd and ΔSd are the unfolding enthalpy and entropy changes calculated for Td (49.3°C) of the native enzyme.
h ΔΔGd is the difference in the free energy change at 49.3°C between the native enzyme and each mutant enzyme.
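Footnotes g and h define ΔHd, ΔSd, and ΔΔGd at the reference temperature of 49.3°C, but the table does not state the equations used. The sketch below assumes the standard two-state relations with a temperature-independent ΔCp: Kirchhoff's law for the enthalpy, its logarithmic counterpart for the entropy, and the free energy as enthalpy minus temperature times entropy. It is applied to the first set of mutant calorimetric values in Table 3 (Td = 50.6°C, ΔCp = 25.7 kJ/mol/K, ΔH(Td) = 1110 kJ/mol), taken here to belong to Mut1. Function and variable names are illustrative.

```python
from math import log

T_REF_C = 49.3  # Td of the native enzyme, used as the reference temperature


def extrapolate_unfolding(td_c, dcp, dh_td, t_ref_c=T_REF_C):
    """Extrapolate DSC parameters from Td to a reference temperature.

    td_c  : midpoint temperature Td in °C
    dcp   : heat capacity change ΔCp in kJ/mol/K (assumed constant)
    dh_td : unfolding enthalpy at Td in kJ/mol
    Returns (ΔH in kJ/mol, ΔS in J/mol/K, ΔG in kJ/mol) at the reference temperature.
    """
    td = td_c + 273.15
    t_ref = t_ref_c + 273.15
    dh = dh_td + dcp * (t_ref - td)          # Kirchhoff's law
    ds = dh_td / td + dcp * log(t_ref / td)  # ΔG(Td) = 0, so ΔS(Td) = ΔH(Td)/Td
    dg = dh - t_ref * ds
    return dh, ds * 1000.0, dg


dh_mut1, ds_mut1, dg_mut1 = extrapolate_unfolding(50.6, 25.7, 1110.0)

# The native enzyme has ΔG = 0 at its own Td, so ΔΔGd reduces to ΔG of the mutant.
dh_nat, ds_nat, dg_nat = extrapolate_unfolding(49.3, 21.0, 1090.0)
ddg_mut1 = dg_mut1 - dg_nat

print(f"ΔHd = {dh_mut1:.0f} kJ/mol, ΔSd = {ds_mut1:.0f} J/mol/K, ΔΔGd = {ddg_mut1:.1f} kJ/mol")
```

Because the tabulated inputs are rounded to three or four significant figures, the extrapolated values agree with the table only approximately.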

2.3.4

Thermostability and Positions Mutated

The thermostability of the B. cereus enzyme and the mutant enzymes Mut1 to Mut12 has been analyzed by comparing their tm values and thermodynamic parameters determined by differential scanning calorimetry (21,22,37,38). This study has resulted in five conclusions:

(a) The mutant enzymes are thermostabilized cumulatively by increasing the number of proline residues introduced into the 12 nonconserved positions (see Δtm, Td, and ΔΔGd in Mut1 to Mut12 in Table 3). Proline substitutions increase kinetic stability as well


as thermodynamic stability of the mutants.* Cumulative stabilization by multiple substitutions (including proline substitutions) has also been found in E. coli ribonuclease HI (26), A. awamori glucoamylase (32), C. beijerinckii alcohol dehydrogenase (34), barley β-amylase (35,134), B. stearothermophilus thermolysin-like protease (36), B. subtilis 3-isopropylmalate dehydrogenase (39), Streptomyces diastaticus D-xylose isomerase (42), and bacteriophage T4 lysozyme (135).

(b) The increased thermostability of Mut1, Mut4, Mut7, Mut9, and Mut12, which were characterized in great detail, is due to an increase in the free energy change ΔGd of unfolding, which, in turn, is due to a decrease in the entropy change ΔSd of unfolding that more than compensates for the decrease in the enthalpy change ΔHd of unfolding (Table 3). This is in agreement with results from the bacteriophage T4 lysozyme Ala82Pro and Ala93Pro mutants (19,119) and the human lysozyme Ala47Pro and Val110Pro mutants (25). In these lysozyme mutants, proline replacements affect position i+1 in type I β-turns and position N1 in α-helices (19,25,119).

(c) Proline residues introduced into positions i+1 of β-turns are most effective for increasing thermostability [tm increase=1.4–0.2°C per proline (average 0.8°C) in Mut1, Mut3, Mut4, and Mut10 in Table 3]. Thermal stabilization by proline substitutions at the turn positions i+1 has been evidenced with the bacteriophage T4 lysozyme Ala82Pro mutant (19), the human lysozyme Ala47Pro mutant (25,136), the Bacillus sp. alkaline liquefying α-amylase Arg124Pro mutant (40), the B. stearothermophilus thermolysin-like protease Ser65Pro mutant (28), and the C. beijerinckii alcohol dehydrogenase Ser24Pro mutant (34,54) (each mutation in a type I turn); the Bacillus sp. alkaline serine protease Thr203Pro mutant (29) and the A. awamori glucoamylase Ser30Pro mutant (32) (each mutation in a type II turn); the murine immunoglobulin variable domain Ala15Pro, Asp56Pro, and Asp60Pro mutants (in type II, type II, and type I turns, respectively) (23); and the barley β-amylase Ser351Pro mutant (the mutation being in a type III turn of 3₁₀-helical conformation) (35,134). The β-amylase Ser350Pro mutant is also effective in increasing thermostability (35,134), as proline

* Td (the temperature at the midpoint of the transition in protein unfolding curve), ΔGd, ΔSd, and ΔHd are parameters for thermodynamic stability toward reversible heat unfolding; tm is used as a parameter for kinetic stability toward irreversible heat unfolding.


residues favor positions i in type III turns or N-caps of 3₁₀ helices as well as their turn positions i+1 or helical positions N1 (2,20). A few examples of destabilization by proline residues in β-turns have been reported: (i) Murine immunoglobulin variable domain is destabilized by substitution Gly68Pro at position i+1 of a type II′ β-turn (23). The (φ,ψ) value of Gly68 is within region ε* and, thus, disallowed for proline substitution (Fig. 4) (20,119). B. stearothermophilus thermolysin-like protease is destabilized by substitution Tyr66Pro at position i+2 of a type VIII or type I β-turn (64-ASYD) because Pro66 destroys favorable contacts between the side chains of Tyr66 and His105 (28). Proline residues have little or no propensity for positions i+2 in such turns (20). (ii) A. awamori glucoamylase is destabilized by substitution Ala27Pro at position i+1 in a type II β-turn (26-GAWV) because the pyrrolidine ring of Pro27 has unfavorable van der Waals contacts with backbone nitrogens of Val29 and Ser30 (31). In cases (i) and (ii), a conformational distortion or strain is likely to be produced around the sites of the prolines added. (iii) In the T. aquaticus TaqI endonuclease Ala56Pro mutant, proline substitution at position i+1 of a type I β-turn (55-KALE) appears to produce a looser and consequently less heat-resistant β-turn or, rather, loop (20,137).

* The (φ,ψ) values at positions replaced by prolines are within (50° to 80°, 120° to 180°) or (50° to 70°, 10° to 50°), which are consistent with the conformations observed for proline residues in highly refined protein crystal structures (2,9,119). These two regions correspond to regions β and αR in Fig. 4, respectively.

(d) Positions N1 of α-helices are almost as effective in thermal stabilization [tm increase=1.4–0.4°C per proline (average 0.8°C) in Mut2, Mut7, Mut9, and Mut12 in Table 3] as positions i+1 in β-turns. Thermal stabilization by proline replacements at helical positions N1 has been demonstrated for the B. stearothermophilus thermolysin-like protease Ala69Pro mutant (28,138), the human lysozyme Val110Pro mutant (25,136), the A. awamori glucoamylase Asp345Pro mutant (32), the C. beijerinckii alcohol dehydrogenase Ala177Pro and Leu316Pro mutants [Leu316 being in position N1 in a 3₁₀ helix (316-LSKL)] (34,54), the bacteriophage T4 lysozyme Lys60Pro and Ala93Pro mutants (119), the barley family 31 α-glucosidase Thr340Pro mutant (44), and the B. stearothermophilus neutral protease Ile140Pro mutant (41). In the same protease, proline replacement at a helical N2 position was also effective, although the Asp141Pro mutant has lower stability than the N1 mutant (41). In contrast, human lysozyme is destabilized by substitution Asp91Pro at position N2 of an α-helix because Pro91 disrupts the hydrogen bond between the side chains of Asp91 and Gln86 (25,136). These findings are consistent with results from dynamic simulation (127), which indicate that a proline residue at position N1 serves as a better helix former than those at other helical positions. Indeed, less stabilization or even destabilization has been observed for proline substitutions at helical position N3 (44,120), helical central regions (34,41,139), helical C-termini (33,41), helical N- and C-caps (26,27,34,54,137), and positions N′ preceding N-caps (137). Proline residues have lower or no propensity for these positions compared with position N1 (2,9,115,128,129,140). Proline substitutions at internal positions cause a spreading distortion on α-helices (33,34,41,120,139). The (φ,ψ) values of helical residues are within region αR and, thus, are not compatible with residues preceding proline residues (Fig. 4) (1,119,126). Proline residues at helical centers and C-termini destroy helical hydrogen bonds (33,34,41,139). Replacement Asp72Pro in a helical center of bacteriophage T4 lysozyme causes a steric clash of the Cδ of the pyrrolidine ring of Pro72 with the peptide nitrogen of Val71 or the carbonyl oxygen of Asn68; in addition, the salt bridge between Asp70 and His31 is weakened (139). E. coli ribonuclease HI is destabilized by substitution Gln113Pro at a helical C-cap (26,27).
Gln113 and the preceding Gly112 have (φ,ψ) values within regions γR and αR, respectively, which are not compatible with proline residues and their preceding residues (Fig. 4) (1,119,126). Substitution His222Pro at a helical N-cap in C. beijerinckii alcohol dehydrogenase results in loss of solvation energy (34,54).

(e) Proline residues introduced into random coils on solvent-exposed flexible loops produce marginal changes [tm change = +0.1°C to −0.4°C per proline (average −0.2°C) in Mut5, Mut6, Mut8, and Mut11 in Table 3]. Proline residues are most likely to increase the conformational energy in the folded state, which can offset the entropic gains, as found with hen egg white lysozyme (24,141). Marginal or negative thermal stabilization by prolines added in coil regions has also been observed in the Thermoanaerobacterium


thermosulfurigenes D-xylose isomerase Ala62Pro mutant (43); the hen egg white lysozyme Asp101Pro and Gly102Pro mutants (24); the B. stearothermophilus thermolysin-like protease Thr63Pro mutant (28); the dimeric 4-α-bundle protein ROP Ala31Pro mutant (142); the A. awamori glucoamylase Ala393Pro, Ala435Pro, and Ser460Pro mutants (31); the C. beijerinckii alcohol dehydrogenase Leu275Pro mutant (34); the Glu110Pro mutant of a chimeric 3-isopropylmalate dehydrogenase (143,144); the bovine pancreatic ribonuclease A Ala19Pro mutant (145); and the Streptomyces sp. family 11 xylanase Ser33Pro mutant (146). Marginal or negative effects on these mutants are caused by at least one, or a combination, of the following factors: (i) perturbations of (φ,ψ) preferences at positions replaced by proline residues and at their preceding positions (24,28,31,142); (ii) unfavorable van der Waals contacts introduced between the pyrrolidine ring of proline and neighboring atoms (24,31,43,146); and (iii) disruption by proline residues of original stabilizing noncovalent interactions such as hydrogen bonds (31,142) and hydrophobic interactions (34,145). In the A. awamori glucoamylase Ser460Pro mutant, Pro460 added in a loop disrupts the hydrogen bond between the hydroxyl oxygen of Ser460 and the backbone nitrogen of Val461, and destroys the O-glycosylation site at Ser460 (31). In the C. beijerinckii alcohol dehydrogenase Leu275Pro mutant, Pro275 added in a loop involved in dimerization weakens hydrophobic interactions contributing to dimer stability (34). In the bovine pancreatic ribonuclease A Ala19Pro mutant, Pro19 introduced in a hinge peptide (residues 16–22) sharply bends this hinge and increases its flexibility, so as to weaken the interaction of the N-segment (residues 1–15) with the main body of the protein (145). In these three cases, proline replacements appear to cause a deformation in the folded structure (31,41,145).

Despite the above-mentioned findings, there is no doubt that solvent-exposed loops of nonregular structure are potential targets for thermal stabilization, as shown with the E. coli ribonuclease HI His62Pro mutant (26,27), the A. awamori glucoamylase Ser436Pro mutant (31), the S. diastaticus D-xylose isomerase Gly138Pro mutant (42), the barley (1→3, 1→4)-β-glucan endohydrolase His300Pro mutant (33), the C. beijerinckii alcohol dehydrogenase Ala22Pro mutant (34), and the T. thermosulfurigenes D-xylose isomerase Gln58Pro mutant (43). Stabilization of these mutants, particularly the latter three mutants (33,34,43), very likely results from reduction in backbone entropy of


unfolding. In the former three mutants, proline substitutions produce additional stabilizing factors that can decrease the free energy in the folded state (141). In the S. diastaticus D-xylose isomerase Gly138Pro mutant, Pro138 added in a loop connecting β-strand 5 and α-helix 5 fills the cavity left by Gly138 and makes the loop more rigid (42). In the E. coli ribonuclease HI His62Pro mutant, Pro62 added in a loop connecting α-helix 1 and β-strand D disrupts the hydrogen bond between the imidazole nitrogen Nδ1 of His62 and the backbone carbonyl oxygen of Gln113 (26,27). This negative effect is compensated by the increase in rigidity of the loop by Pro62, or by the hydrogen bond formed between the side chain of Gln113 and water (27). In the A. awamori glucoamylase Ser436Pro mutant, the hydrogen bond between the side chains of Ser436 and Asp237 is lost (31). This negative effect is overcome by Pro436 filling the space left by Ser436 and stimulating hydrophobic interactions in a packing void (31).

3

THE PROLINE RULE IN REFINED FORM

The above examination has allowed us to refine the Proline Rule (38). "A globular protein can be cumulatively thermostabilized by increasing the frequency of proline occurrence in positions i+1 of β-turns (and in positions N1 of α-helices or in coils) on the surface of the molecule without major changes in the functional and structural integrity, in such a way that the folded state becomes less flexible while the unfolded state is further reduced in the backbone conformational entropy" (18,38). The effect on thermal stabilization will be further strengthened by the clustering of proline residues around the flexible regions of the protein (74,90). The Proline Rule offers a useful concept for protein engineering (18,19,22,23,38). The rule suggests the potential positions (positions i+1 of β-turns and positions N1 of α-helices) and residues (Glu, Lys, Asn, etc. → Pro) of substitution in a protein to improve thermostability (38).
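The preference for turn and helix N1 positions in the rule rests on the stepwise tm values in Table 3. As a rough check (a sketch with illustrative names, not the author's procedure), the tm change contributed by each added proline can be taken as the difference between consecutive mutants and grouped by position class (T, H, or C as defined in Table 3).

```python
from collections import defaultdict

# tm values (°C) from Table 3: native enzyme followed by Mut1..Mut12.
TM = [44.5, 45.9, 46.3, 47.1, 47.9, 47.6, 47.6, 48.1, 48.2, 49.6, 49.8, 49.4, 50.1]

# Position class of the last substitution in Mut1..Mut12
# (T: i+1 of a β-turn, H: N1 of an α-helix, C: coil).
CLASSES = ["T", "H", "T", "T", "C", "C", "H", "C", "H", "T", "C", "H"]


def increments_by_class(tm_values, classes):
    """Group the stepwise tm changes by the class of the newly mutated position."""
    grouped = defaultdict(list)
    for previous, current, cls in zip(tm_values, tm_values[1:], classes):
        grouped[cls].append(round(current - previous, 1))
    return dict(grouped)


steps = increments_by_class(TM, CLASSES)
for cls, values in sorted(steps.items()):
    mean = sum(values) / len(values)
    print(cls, values, f"mean = {mean:+.2f} °C")
```

Turn (T) and helix N1 (H) positions average close to +0.8°C per proline, whereas coil (C) positions average slightly below zero. Because the substitutions were made cumulatively in one fixed order, each increment also depends on the mutations already present.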

ACKNOWLEDGMENT

I thank Dr. Yosiyuki Tsujimoto for his help in the collection of references.

REFERENCES

1. PR Schimmel, PJ Flory. Conformational energies and configurational statistics of copolypeptides containing L-proline. J Mol Biol 24:105–120, 1968.
2. MW MacArthur, JM Thornton. Influence of proline residues on protein conformation. J Mol Biol 218:397–412, 1991.
3. TE Creighton. Proteins: Structures and Molecular Properties. 2nd ed. New York: WH Freeman and Company, 1993, pp 180–181.
4. K Wüthrich, C Grathwohl. A novel approach for studies of the molecular conformations in flexible polypeptides. FEBS Lett 43:337–340, 1974.
5. BW Matthews. Structural and genetic analysis of protein stability. Annu Rev Biochem 62:139–160, 1993.
6. CR Matthews. Pathways of protein folding. Annu Rev Biochem 62:653–683, 1993.
7. JL Popot, DM Engelman. Helical membrane protein folding, stability, and evolution. Annu Rev Biochem 69:881–922, 2000.
8. MC Sheldon, P Loughlin, ML Tierney, SM Howitt. Proline residues in two tightly coupled helices of the sulphate transporter, SHST1, are important for sulphate transport. Biochem J 356:589–594, 2001.
9. P Chakrabarti, D Pal. The interrelationships of side-chain and main chain conformations in proteins. Prog Biophys Mol Biol 76:1–102, 2001.
10. TE Creighton. Proteins: Structures and Molecular Properties. 2nd ed. New York: WH Freeman and Company, 1993, pp 114–119.
11. P McCaldon, P Argos. Oligopeptide biases in protein sequences and their use in predicting protein coding regions in nucleotide sequences. Proteins Struct Funct Genet 4:99–122, 1988.
12. Y Suzuki, R Aoki, H Hayashi. Assignment of a p-nitrophenyl-α-D-glucopyranoside-hydrolyzing α-glucosidase of Bacillus cereus ATCC7064 to an exo-oligo-1,6-glucosidase. Biochim Biophys Acta 704:476–483, 1982.
13. JW Crabb, AL Murdock, T Suzuki, JW Hamilton, JH MacLinden, RE Amelunxen. Sequence homology in the amino-terminal and active-site regions of thermolabile glyceraldehyde-3-phosphate dehydrogenase from a thermophile. J Bacteriol 145:503–512, 1981.
14. Y Suzuki, Y Tomura. Purification and characterization of Bacillus coagulans oligo-1,6-glucosidase. Eur J Biochem 158:77–83, 1986.
15. Y Suzuki, H Fujii, H Uemura, M Suzuki. Purification and characterization of extremely thermostable exo-oligo-1,6-glucosidase from a caldoactive Bacillus sp. KP1228. Starch/Stärke 39:17–23, 1987.
16. Y Suzuki, K Oishi, H Nakano, T Nagayama. A strong correlation between the increase in number of proline residues and the rise in thermostability of five Bacillus oligo-1,6-glucosidases. Appl Microbiol Biotechnol 26:546–551, 1987.
17. Y Suzuki, N Sugita, T Kishimoto. Purification and characterization of an oligo-1,6-glucosidase from the caldoactive thermophile Bacillus caldotenax. Starch/Stärke 49:148–154, 1997.
18. Y Suzuki. A general principle of increasing protein thermostability. Proc Jpn Acad Ser B Phys Biol Sci 65:146–148, 1989.
19. BW Matthews, H Nicholson, WJ Becktel. Enhanced thermostability from site-directed mutations that decrease the entropy of unfolding. Proc Natl Acad Sci USA 84:6663–6667, 1987.
20. EG Hutchinson, JM Thornton. A revised set of potentials for β-turn formation in proteins. Protein Sci 3:2207–2216, 1994.

The Proline Rule 21.

22. 23.

24.

25.

26.

27.

28.

29.

30.

31. 32.

33.

34.

35.

315

K Watanabe, T Masuda, H Ohashi, H Mihara, Y Suzuki. Multiple proline substitutions cumulatively thermostabilize Bacillus cereus ATCC7064 oligo1,6-glucosidase. Irrefragable proof supporting the Proline Rule. Eur J Biochem 226:277–283, 1994. K Watanabe, Y Suzuki. Protein thermostabilization by proline substitutions. J Mol Catal B Enzym 4:167–180, 1998. EC Ohage, W Graml, MM Walter, S Steinbacher, B Steipe. h-Turn propensities as paradigms for the analysis of structural motifs to engineer protein stability. Protein Sci 6:233–241, 1997. T Ueda, T Tamura, Y Maeda, Y Hashimoto, T Miki, H Yamada, T Imoto. Stabilization of lysozyme by the introduction of Gly–Pro sequence. Protein Eng 6:183–187, 1993. T Herning, K Yutani, K Inaka, R Kuroki, M Matsushima, M Kikuchi. Role of proline residues in human lysozyme stability: a scanning calorimetric study combined with x-ray structure analysis of proline mutants. Biochemistry 31: 7077–7085, 1992. S Kimura, H Nakamura, T Hashimoto, M Oobatake, S Kanaya. Stabilization of Escherichia coli ribonuclease HI by strategic replacement of amino acid residues with those from thermophilic counterpart. J Biol Chem 267:21535– 21542, 1992. K Ishikawa, S Kimura, S Kanaya, K Morikawa, H Nakamura. Structural study of mutants of Escherichia coli ribonuclease HI with enhanced thermostability. Protein Eng 6:85–91, 1993. F Hardy, G Vriend, OR Veltman, B van der Vinne, G Venema, VGH Eijsink. Stabilization of Bacillus stearothermophilus neutral protease by introduction of prolines. FEBS Lett 317:89–92, 1993. A Masui, N Fujiwara, T Imanaka. Stabilization and rational design of serine protease AprM under high alkaline and high-temperature conditions. Appl Environ Microbiol 60:3579–3584, 1994. Y Okada, N Yoshigi, H Sahara, S Koshino. Increase in thermostability of recombinant barley h-amylase by random mutagenesis. Biosci Biotechnol Biochem 59:1152–1153, 1995. Y Li, PJ Reilly, C Ford. 
Eﬀect of introducing proline residues on the stability of Aspergillus awamori. Protein Eng 10:1199–1204, 1997. MJ Allen, PM Coutinho, CF Ford. Stabilization of Aspergillus awamori glucoamylase by proline substitution and combining stabilizing mutations. Protein Eng 9:783–788, 1998. RJ Stewart, JN Varghese, TPJ Garrett, PB Høj, GB Fincher. Mutant barley (1!3, 1!4)-h-glucan endohydrolases with enhanced thermostability. Protein Eng 14:245–253, 2001. O Bogin, M Peretz, Y Hacham, Y Korkhin, F Frolow, AJ Kalb (Giboa), Y Bustein. Enhanced thermal stability of Clostridium beijerinckii alcohol dehydrogenase after strategic substitution of amino acid residues with prolines from the homologous thermophilic Thermoanaerobacter brockii alcohol dehydrogenase. Protein Sci 7:1156–1163, 1998. B Mikami, H Yoon, N Yoshigi. The crystal structure of the sevenfold mutant

316

36. 37.

38. 39.

40.

41.

42.

43.

44. 45.

46. 47.

48.

49.

Suzuki of barley h-amylase with increased thermostability at 2.5 A˚ resolution. J Mol Biol 285:1235–1243, 1999. B Van den Burg, G Vriend, OR Veltman, G Venema, VGH Eijsink. Engineering an enzyme to resist boiling. Proc Natl Acad Sci USA 95:2056–2060, 1998. K Watanabe, Y Fujita, M Usami, A Takimoto, Y Suzuki. Thermodynamic analysis of Bacillus cereus oligo-1,6-glucosidase and its cumulatively prolineintroduced mutant proteins by diﬀerential scanning calorimetry. J Mol Catal B Enzym 10:257–262, 2000. Y Suzuki. The Proline Rule—a strategy for protein thermal stabilization. Proc Jpn Acad Ser B Phys Biol Sci 75:133–137, 1999. S Akanuma, A Yamagishi, N Tanaka, T Oshima. Further improvement of the thermal stability of a partially stabilized Bacillus subtilis 3-isopropylmalate dehydrogenase variant by random and site-directed mutagenesis. Eur J Biochem 260:499–504, 1999. K Igarashi, T Ozawa, K Ikawa-Kitayama, Y Hayashi, H Araki, K Endo, H Hagihara, K Ozaki, S Kawai, S Ito. Thermostabilization by proline substitution in an alkaline, liquefying a-amylase from Bacillus sp. strain KSM-1378. Biosci Biotechnol Biochem 63:1535–1540, 1999. S Nakamura, T Tanaka, RY Yada, S Nakai. Improving the thermostability of Bacillus stearothermophilus neutral protease by introducing proline into the active site helix. Protein Eng 10:1263–1269, 1997. GP Zhu, C Xu, MK Teng, LM Tao, XY Zhu, CJ Wu, J Hang, LW Niu, YZ Wang. Increasing the thermostability of D-xylose isomerase by introduction of a proline into the turns of a random coil. Protein Eng 12:635–638, 1999. D Sriprapundh, C Vieille, JG Zeikus. Molecular determinants of xylose isomerase thermal stability and activity: analysis of thermozymes by site-directed mutagenesis. Protein Eng 13:259–265, 2000. EH Muslin, SE Clark, CA Henson. The eﬀect of proline insertions on the thermostability of a barley a-glucosidase. Protein Eng 15:29–33, 2002. LF Delboni, SC Mande, F Rentier-Delrue, V Mainfroid, S Turley, FMD Vellieux, JA Martial, WGJ Hol. 
Crystal structure of recombinant triosephosphate isomerase from Bacillus stearothermophilus. An analysis of potential thermostability factors in six isomerases with known three-dimensional structures points to the importance of hydrophobic interactions. Protein Sci 4: 2594–2606, 1995. C Vieille, JG Zeikus. Thermozymes: identifying molecular determinants of protein structural and functional stability. Trends Biotechnol 14:183–190, 1996. M Peretz, O Bogin, S Tel-Or, A Cohen, G Li, J Chen, Y Burstein. Molecular cloning, nucleotide sequencing, and expression of genes encoding alcohol dehydrogenases from the thermophile Thermoanaerobacter brockii and the mesophile Clostridium beijerinckii. Anaerobe 3:259–270, 1997. D Tsuchiya, T Sekiguchi, A Takenaka. Crystal structure of 3-isopropylmalate dehydrogenase from the moderate facultative thermophile, Bacillus coagulans: two strategies for thermostabilization of protein structures. J Biochem 122: 1092–1104, 1997. P Polverino de Laureto, E Scaramella, V De Filippis, O Marin, MG Doni, A

The Proline Rule

317

Fontana. Chemical synthesis and structural characterization of the RGDprotein decorsin: a potent inhibitor of platelet aggregation. Protein Sci 7:433– 444, 1998. 50. T Fleming, J Littlechild. Sequence and structural comparison of thermophilic phosphoglycerate kinases with a mesophilic equivalent. Comp Biochem Physiol 118A:439–451, 1997. 51. G Vogt, S Woell, P Argos. Protein thermal stability, hydrogen bonds, and ion pairs. J Mol Biol 269:631–643, 1997. 52. S Chakravarty, R Varadarajan. Elucidation of determinants of protein stability through gene sequence analysis. FEBS Lett 470:65–69, 2000. 53. C Vieille, GJ Zeikus. Hyperthermophilic enzymes: sources, uses, and molecular mechanisms for thermostability. Microbiol Mol Biol Rev 65:1–43, 2001. 54. C Li, J Heatwole, S Soelaiman, M Shoham. Crystal structure of a thermophilic alcohol dehydrogenase substrate complex suggests determinants of substrate speciﬁcity and thermostability. Proteins Struct Funct Genet 37:619–627, 1999. 55. LL Leggio, S Kalogiannis, MK Bhat, RW Pickersgill. High resolution structure and sequence of T auranticus xylanase: I. Implications for the evolution of thermostability in family 10 xylanases and enzymes with ha-barrel architecture. Proteins Struct Funct Genet 36:295–306, 1999. 56. PJ Haney, JH Badger, GL Buldak, CI Reich, CR Woese, GJ Olsen. Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species. Proc Natl Acad Sci USA 96: 3578–3583, 1999. 57. TH Tahirov, H Oki, T Tsukihara, K Ogasahara, K Yutani, K Ogata, Y Izu, S Tsunasawa, I Kato. Crystal structure of methionine aminopeptidase from hyperthermophile, Pyrococcus furiosus. J Mol Biol 284:101–124, 1998. 58. T Nakai, K Okada, S Akutsu, I Miyahara, S Kawaguchi, R Kato, S Kuramitsu, K Hirotsu. Structure of Thermus thermophilus HB8 aspartate aminotransferase and its complex with maleate. Biochemistry 38:2413–2424, 1999. 59. M Henning, R Stermer, K Kinschner, JN Jansonius. 
Crystal structure at 2.0 A˚ resolution of phosphoribosyl anthranilate isomerase from the hyperthermophile Thermotoga maritima: possible determinants of protein stability. Biochemistry 36:6009–6016, 1997. 60. S Fukuchi, K Nishikawa. Protein surface amino acid compositions distinctively diﬀer between thermophilic and mesophilic bacteria. J Mol Biol 309:835–843, 2001. 61. G Wallon, G Kryger, ST Lovett, T Oshima, D Ringe, GA Petsko. Crystal structure of Escherichia coli and Salmonella typhimurium 3-isopropylmalate dehydrogenase and comparison with their thermophilic counterpart from Thermus thermophilus. J Mol Biol 266:1016–1031, 1997. 62. MDi Giulio. The late stage of genetic code structuring took place at a high temperature. Gene 261:189–195, 2000. 63. S Parathasarathy, MRN Murthy. Protein thermal stability: insights from atomic displacement parameters (B values). Protein Eng 13:9–13, 2000. 64. K Miyazaki, PL Wintrode, RA Grayling, DN Rubingh, FH Arnold. Directed

318

65.

66.

67. 68.

69.

70.

71. 72.

73.

74.

75. 76.

77.

78.

Suzuki evolution study of temperature adaptation in a psychrophilic enzyme. J Mol Biol 297:1015–1026, 2000. J Hoseki, T Yano, Y Koyama, S Kuramitsu, H Kagamiyama. Directed evolution of thermostable kanamycin-resistance gene: a convenient selection marker for Thermus thermophilus. Biochem J 126:951–956, 1999. JK Song, JS Rhee. Simultaneous enhancement of thermostability and catalytic activity of phospholipase A1 by evolutionary molecular engineering. Appl Environ Microbiol 66:890–894, 2000. H Zhao, FH Arnold. Directed evolution converts subtilisin E into a functional equivalent of thermitase. Protein Eng 12:47–53, 1999. RJM Russel, U Gerike, MJ Danson, DW Hough, GL Taylor. Structural adaptations of the cold-active citrate synthetase from an Antarctic bacterium. Structure 6:351–361, 1998. D Georlette, ZO Jonsson, F Van Petegem, J Chessa, J Van Beeumen, U Hubscher, C Gerday. A DNA ligase from the psychrophile Pseudoalteromonas haloplanktis gives insights into the adaptation of proteins to low temperatures. Eur J Biochem 267:3502–3512, 2000. C Gerday, M Aittaleb, JL Arpigny, E Baise, J Chessa, G Carsoux, I Petrescu, G Feller. Psychrophilic enzymes: a thermodynamic challenge. Biochim Biophys Acta 1342:119–131, 1997. G Gianese, A Argos, S Pascarella. Structural adaptation of enzymes to low temperatures. Protein Eng 14:141–148, 2001. PP Sheridan, N Panasik, JM Coombs, JE Brenchley. Approaches for deciphering the structural basis of low temperature enzyme activity. Biochim Biophys Acta 1543:417–433, 2000. K Watanabe, K Kitamura, Y Suzuki. Analysis of the critical sites for protein thermostabilization by proline substitution in oligo-1,6-glucosidase from Bacillus coagulans ATCC7050 and the evolutionary consideration of proline residues. Appl Environ Microbiol 62:2066–2073, 1996. S Kashiwabara, S Matsuki, T Kishimoto, Y Suzuki. Clustered proline residues around the active-site cleft in thermostable oligo-1,6-glucosidase of Bacillus ﬂavocaldarius KP1228. 
Biosci Biotechnol Biochem 62:1093–1102, 1998. S Janecek. a-Amylase family: molecular biology and evolution. Prog Biophys Mol Biol 67:67–97, 1997. K Watanabe, K Kitamura, Y Hata, Y Katsube, Y Suzuki. Overproduction, puriﬁcation and crystallization of Bacillus cereus oligo-1,6-glucosidase. FEBS Lett 290:221–223, 1991. H Kizaki, Y Hata, K Watanabe, Y Katsube, Y Suzuki. Polypeptide folding of Bacillus cereus ATCC7064 oligo-1,6-glucosidase revealed by 3.0 A˚ resolution x-ray analysis. J Biochem 113:646–649, 1993. K Watanabe, Y Hata, H Kizaki, Y Katsube, Y Suzuki. The reﬁned crystal structure of Bacillus cereus oligo-1,6-glucosidase at 2.0 A˚ resolution: structural characterization of proline-substitution sites for protein thermostabilization. J Mol Biol 269:142–153, 1997.

The Proline Rule 79. 80.

81. 82. 83.

84. 85.

86.

87.

88.

89.

90.

91. 92.

93.

319

Y Matsuura, M Kusunoki, W Harada, M Kakudo. Structure and possible catalytic residues of Taka-amylase A. J Biochem 95:697–702, 1984. SB Larson, A Greenwood, D Cascilo, J Day, A MacPherson. Reﬁned molecular structure of pig pancreatic a-amylase at 2.1A˚ resolution. J Mol Biol 235: 1548–1560, 1994. M Qian, R Haser, F Payan. Structure and molecular model reﬁnement of pig pancreatic a-amylase at 2.1 A˚ resolution. J Mol Biol 231:785–799, 1993. M Machius, G Wiegand, R Huber. Crystal structure of calcium-depleted Bacillus licheniformis a-amylase at 2.2 A˚ resolution. J Mol Biol 246:545–559, 1995. EA MacGregor, S Janecek, B Svensson. Relationship of sequence and structure to speciﬁcity in the a-amylase family of enzymes. Biochim Biophys Acta 1546:1–20, 2001. V Horvathova, S Janecek, E Sturdik. Amylolytic enzymes: molecular aspects of their properties. Gen Physiol Biophys 20:7–32, 2001. O Mirza, LK Skov, M Remaud-Simeon, G Potocki de Montalk, C Albenne, P Moensan, M Gajhede. Crystal structures of amylosucrase from Neisseria polysaccharea in complex with D-glucose and the active site mutant Glu328Gln in complex with the natural substrate sucrose. Biochemistry 40:9032–9039, 2001. K Watanabe, K Miyake, Y Suzuki. Identiﬁcation of catalytic and substratebinding site residues in Bacillus cereus ATCC7064 oligo-1,6-glucosidase. Biosci Biotechnol Biochem 65:2058–2064, 2001. Y Takii, K Takahashi, K Yamamoto, Y Sogabe, Y Suzuki. Bacillus stearothermophilus ATCC12016 a-glucosidase speciﬁc for a-1,4 bonds of maltosaccharides and a-glucans shows high amino acid sequence similarities to seven a-D-glucohydrolases with diﬀerent substrate speciﬁcity. Appl Microbiol Biotechnol 44:629–634, 1996. K Watanabe, K Kitamura, H Iha, Y Suzuki. Primary structure of the oligo1,6-glucosidase of Bacillus cereus ATCC7064 deduced from the nucleotide sequence of the cloned gene. Eur J Biochem 192:609–620, 1990. K Watanabe, H Iha, A Ohashi, Y Suzuki. 
Cloning and expression in Escherichia coli of an extremely thermostable oligo-1,6-glucosidase gene from Bacillus thermoglucosidasius. J Bacteriol 171:1219–1222, 1989. K Watanabe, K Chishiro, K Kitamura, Y Suzuki. Proline residues responsible for thermostability occur with high frequency in the loop regions of an extremely thermostable oligo-1,6-glucosidase from Bacillus thermoglucosidasius KP1006. J Biol Chem 266:24287–24294, 1991. B Svensson. Protein engineering in the a-amylase family: catalytic mechanism, substrate speciﬁcity, and stability. Plant Mol Biol 25:141–157, 1994. K Yutani, S Hayashi, Y Sugisaki, K Ogasahara. Role of conserved proline residues in stabilizing tryptophan synthase a subunit: analysis by mutants with alanine or glycine. Proteins Struct Funct Genet 9:90–98, 1991. LM Mayr, O Landt, U Hahn, FX Schmid. Stability and folding kinetics of ribonuclase T1 are strongly altered by the replacement of cis-proline 39 with alanine. J Mol Biol 231:897–912, 1993.

320

Suzuki

94. DA Schultz, RL Baldwin. Cis proline mutants of ribonuclease A: I. Thermal stability. Protein Sci 1:910–916, 1992. 95. I Lascu, D Deville-Bonne, P Glaser, M Veron. Equilibrium dissociation and unfolding of nucleoside diphosphate kinase from Dictyosterlium discoideum. J Biol Chem 268:20268–20275, 1993. 96. I Quinkal, V Davasse, J Gaillard, J Moulis. On the role of conserved proline residues in the structure and function of Clostridium pasteurianum 2[4Fe–4S] ferredoxin. Protein Eng 7:681–867, 1994. 97. G de Part Gay, CM Johnson, AR Fersht. Contribution of a proline residue and a salt bridge to the stability of a type I reverse turn in chymotrypsin inhibitor 2. Protein Eng 7:103–108, 1994. 98. F de Lamotte-Guery, C Pruvost, P Minard, M Delsuc, M Miginiac-Maslow, J Schmitter, M Stein, P Decottignies. Structural and functional roles of a conserved proline residue in the a2 helix of Escherichia coli thioredoxin. Protein Eng 10:1425–1432, 1997. 99. AV Grinberg, R Bernhardt. Eﬀect of replacing a conserved proline residue on the function and stability of bovine adrenodoxin. Protein Eng 11:1057–1064, 1998. 100. R Yelin, S Steiner-Mordoch, B Aroeti, S Schuldiner. Glycosylation of a vesicular monoamine transporter: a mutation in a conserved proline residue affects the activity, glycosylation, and localization of the transporter. J Neurochem 71:2518–2527, 1998. 101. J McHarg, SM Kelly, NC Price, A Cooper, JA Littlechild. Site-directed mutagenesis of proline 204 in the ‘hinge’ region of yeast phosphoglycerate kinase. Eur J Biochem 259:939–945, 1999. 102. N Allocati, E Casalone, M Masulli, I Ceccarelli, E Carletti, MW Parker, C Di Ilio. Functional analysis of the evolutionarily conserved proline 53 residue in Proteus mirabilis glutathione transferase B1-1. FEBS Lett 445:347–350, 1999. 103. T Tomita, T Watabiki, R Takikawa, Y Morohashi, N Takasugi, R Kopan, B De Strooper, T Iwatsubo. 
The ﬁrst proline of PALP motif at the C terminus of presenilins is obligatory for stabilization, complex formation, and g-secretase activities of presenilins. J Biol Chem 276:33273–33281, 2001. 104. CM Deane, SCR Lummis. The role and predicted propensity of conserved proline residues in the 5-HT3 receptor. J Biol Chem 276:37962–37966, 2001. 105. H Masuda, T Uchiumi, M Wada, T Ichiba, A Hachimori. Eﬀects of replacement of prolines with alanines on the catalytic activity and thermostability of inorganic pyrophosphatase from thermophilic bacterium PS-3. J Biochem 131:58–63, 2002. 106. B Rost, C Sander. Conservation and prediction of solvent accessibility in protein families. Proteins Struct Funct Genet 20:216–226, 1994. 107. B Rost, C Sander. Combining evolutionary information and neural networks to predict protein secondary structure. Proteins Struct Funct Genet 19:55–72, 1994. 108. PY Chou, GD Fasman. Conformational parameters for amino acids in helical, h-sheet, and random coil regions calculated from proteins. Biochemistry 13:211–221, 1974.

The Proline Rule 109.

110. 111. 112. 113. 114. 115.

116.

117.

118.

119.

120.

121.

122.

123.

124. 125. 126.

321

JL Crawford, WN Lipscomb, CG Schellman. The reverse turn as a polypeptide conformation in globular proteins. Proc Natl Acad Sci USA 70:538– 542, 1973. TE Creighton. Proteins: Structures and Molecular Properties. 2nd ed. New York: WH Freeman and Company, 1993, pp 255–257. PY Chou, GD Fasman. Prediction of protein conformation. Biochemistry 13: 222–245, 1974. M Levitt. Conformational preferences of amino acids in globular proteins. Biochemistry 17:4277–4285, 1978. DN Woolfson, DH Williams. The inﬂuence of proline residues on a-helical structure. FEBS Lett 277:185–188, 1990. G van Heijne. Proline kinks in transmembrane a-helices. J Mol Biol 218:499– 503, 1991. K Gunasekaran, HA Nagarajaram, C Ramakrishnan, P Balaram. Stereochemical punctuation marks in protein structures: glycine and proline containing helix stop signals. J Mol Biol 275:917–932, 1998. K Ogasahara, K Yutani. Equilibrium and kinetic analyses of unfolding and refolding for the conserved proline mutants of tryptophan synthase a subunit. Biochemistry 36:932–940, 1997. M Schiﬀer, CF Ainsworth, Y Deng, G Johnson, FH Pascoe, DK Hanson. Proline in a transmembrane helix compensates for cavities in the photosynthetic reaction center. J Mol Biol 252:472–482, 1995. SM Mowitt, M Cleeter, L Hatch, GB Cox. Functional stability of the asubunit of the FoF1-ATPase from Escherichia coli is aﬀected by mutations in three proline residues. Biochim Biophys Acta 1144:17–21, 1993. H Nicholson, DE Tronrud, WJ Becktel, BW Matthews. Analysis of the eﬀectiveness of proline substitutions and glycine replacements in increasing the stability of phage T4 lysozyme. Biopolymers 32:1431–1441, 1992. MM Dixon, H Nicholson, L Shewchuk, WA Baase, BW Matthews. Structure of a hinge-bending bacteriophage T4 lysozyme mutant, Ile3!Pro. J Mol Biol 227:917–933, 1992. W Behammer, Z Shao, W Mages, R Rachel, K Stetter, R Schmitt. Flagellar structure and hyperthermophilicity: analysis of a single ﬂagellin gene and its product in Aquiﬂex pyrophilus. 
J Bacteriol 177:6630–6637, 1995. MA Ceruso, H Weinstein. Structural mimicry of proline kinks: tertiary packing interactions support local structural distortions. J Mol Biol 318:1237–1249, 2002. K Imada, M Sato, N Tanaka, Y Katsube, Y Matsuura, T Oshima. Threedimensional structure of a highly thermostable enzyme, 3-isopropylmalate dehydrogenase of Thermus thermophilus at 2.2 A resolution. J Mol Biol 222: 725–738, 1991. C Ramakrishnan, GN Ramachandran. Stereochemical criteria for polypeptide and protein chain conformations. Biophys J 5:909–933, 1965. AV Eﬁmov. Standard conformations of polypeptide chain in irregular regions of proteins. Mol Biol (Mosc) 20:250–260, 1986. JH Hurley, DA Mason, BW Matthews. Flexible-geometry conformational

322

127.

128. 129. 130. 131. 132. 133. 134.

135.

136.

137.

138.

139.

140. 141. 142.

Suzuki energy maps for the amino acid residue preceding a proline. Biopolymers 32: 1443–1446, 1992. RH Yun, A Anderson, J Hermans. Proline in a-helices: stability and conformation studied by dynamic simulation. Proteins Struct Funct Genet 10: 219– 228, 1991. JS Richardson, DC Richardson. Amino acid preferences for speciﬁc locations at the ends of a helices. Science 240:1648–1652, 1988. S Penel, E Hughes, AJ Doig. Side-chain structures in the ﬁrst turn of the ahelix. J Mol Biol 287:127–143, 1999. T Alber. Mutational eﬀects on protein stability. Annu Rev Biochem 58:765– 798, 1989. BM Matthews. Mutational analysis of protein stability. Curr Opin Struct Biol 1:17–21, 1991. TA Kunkel, J Roberts, R Zakour. Mutagenesis and protein engineering. Methods Enzymol 154:367–382, 1987. G Nemethy, SJ Leach, HA Scheraga. The inﬂuence of amino acid side chains on the free energy of helix–coil transitions. J Phys Chem 70:998–1004, 1966. N Yoshigi, Y Okada, H Maeba, H Sahara, T Tamaki. Construction of a plasmid used for the expression of a sevenfold-mutant barley h-amylase with increased thermostability in Escherichia coli and properties of the sevenfoldmutant h-amylase. J Biochem 118:562–567, 1995. X Zhang, WA Baase, BK Shoichet, KP Wilson, BW Matthews. Enhancement of protein stability by the combination of point mutations in T4 lysozyme is additive. Protein Eng 8:1017–1022, 1995. T Herning, K Yutani, Y Taniyama, M Kikuchi. Eﬀects of proline mutations on the unfolding and refolding of hyman lysozyme: the slow refolding kinetic phase does not result from proline cis–trans isomerization. Biochemistry 30:9882–9891, 1991. W Cao, J Lu, SG Welch, RAD Williams, F Barany. Cloning and thermostability of TaqI endonuclease isoschizomers from Thermus species SM32 and Thermus ﬁliformis Tok6A1. Biochem J 333:425–431, 1998. OR Veltman, G Vriend, PJ Middelhoven, B van den Burg, G Venema, VGH Eijsink. 
Analysis of structural determinants of the stability of thermolysin-like proteases by molecular modelling and site-directed mutagenesis. Protein Eng 9:1181–1189, 1996. UH Sauer, S Dao-Pin, BW Matthews. Tolerance of T4 lysozyme to proline substitutions within the long interdomain a-helix illustrates the adaptability of proteins to potentially destabilizing lesions. J Biol Chem 267:2393–2399, 1992. S Dasgupta, JA Bell. Design of helix ends. Amino acid preferences, hydrogen bonding and electrostatic interactions. Int J Pept Protein Res 41:499–511, 1993. A Shaw, R Bott. Engineering enzymes for stability. Curr Opin Struct Biol 6:546–550, 1996. K Peters, H Hinz, G Cesareni. Introduction of a proline residue into position 31 of the loop of the dimeric 4-a-helical protein ROP causes a drastic destabilization. Biol Chem 378:1141–1152, 1997.

The Proline Rule 143.

144.

145.

146.

323

J Kawaguchi, K Numata, T Oshima. Rigid type II h-turn contributes to stability of Thermus thermophilus isopropylmalate dehydrogenase. Protein Eng 7: 1158–1159, 1994. K Numata, Y Hayashi-Iwasaki, J Kawaguchi, M Sakurai, H Moriyama, N Tanaka, T Oshima. Thermostabilization of a chimeric enzyme by residue substitutions: four amino acid residues in loop regions are responsible for the thermostability of Thermus thermophilus isopropylmalate dehydrogenase. Biochim Biophys Acta 1545:174–183, 2001. F Catanzao, G Graziano, V Cafaro, G D’Alessio, A Di Donato, G Barone. From ribonuclease A toward bovine seminal ribonuclease: a step by step thermodynamic analysis. Biochemistry 36:14403–14408, 1997. J Georis, F De Lemos Esteves, J Lamotte-Brasseur, V Bougnet, B Devreese, F Giannotta, B Granier, J Frere. An additional aromatic interaction improves the thermostability and thermophilicity of a mesophilic family 11 xylanase: structural basis and molecular study. Protein Sci 9:466–475, 2000.

14 Homing Endonucleases: Tools and Targets for Protein Engineering

Vera Pingoud, Alfred Pingoud, Ann-Josée Noël, Shawn Steuer, and Wolfgang Wende
Justus-Liebig-Universität, Giessen, Germany

1 INTRODUCTION

1.1 Structure, Function, and Evolution of Homing Endonucleases

Homing endonucleases are a recently discovered class of enzymes (reviews: 1–5). To date (May 2002), 92 different homing endonucleases have been identified (http://rebase.neb.com/rebase/link_homing) (6). They all recognize extended specific DNA sequences of up to 40 base pairs in length and cleave the DNA in both strands of the recognition sequence or nearby, thereby producing DNA fragments with sticky ends. These fragments are recombinogenic and, provided that homologous sequences are present, may lead to the integration of homing endonuclease genes by a double-strand break repair mechanism into alleles that lack them (Fig. 1). This process of unilateral transfer of an intervening sequence was called “homing,” and the endonucleases that initiate this process were therefore called “homing endonucleases” (7,8). The reaction is not reversible because the cleavage site for


the homing endonuclease is disrupted by the insertion of the intervening sequence. The genes coding for homing endonucleases can be considered as selfish genetic elements, which increase in frequency within a population. However, in the absence of a selective pressure, these genes will mutate to give inactive variants or will be deleted altogether. Horizontal intra- and interspecies gene transfer could replace nonfunctional or deleted homing endonuclease genes or lead to the acquisition of other homing endonuclease genes (9,10). This may explain the finding that homing endonucleases are widely distributed in fungi, protists, eubacteria, archaea, and bacteria (phages).

Homing endonucleases can be coded for by open reading frames in group I and group II introns (I-nucleases) or can occur as in-frame spacers in pre-proteins, consisting of an intein (a protein intron representing the coding sequence of the PI-nuclease) and two flanking exteins (which, after protein splicing, are fused to produce a functional protein, unrelated in function to the homing endonuclease, but usually essential for the cell).

The nomenclature of homing endonucleases is similar to that of restriction endonucleases: a three-letter code designates the species from which the enzyme was first isolated, followed by a Roman numeral to distinguish different enzymes from the same organism. In addition, a prefix (I-, PI-, or F-) characterizes the genomic localization of the genes encoding the homing endonucleases (11). Thus I-CreI designates the first intron-encoded homing endonuclease discovered in Chlamydomonas reinhardtii, whereas PI-SceI was the first protein intron (intein)-encoded homing endonuclease isolated from Saccharomyces cerevisiae. Only a few homing endonucleases are encoded by free-standing open reading frames (F-nucleases), such as the Ho-endonuclease that initiates mating-type switching in yeast, i.e., F-SceII.
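The naming scheme just described (a prefix, a three-letter species code, and a Roman numeral) can be illustrated with a short parser. This is only a minimal sketch: the function name `parse_name`, the `HomingEndonucleaseName` type, and the regular expression are illustrative assumptions, and the pattern covers only names of the form given in the text (such as I-CreI, PI-SceI, and F-SceII), not every name found in databases.

```python
import re
from typing import NamedTuple

# Prefixes named in the text and the genomic localization each one signals.
PREFIX_MEANING = {
    "I": "intron-encoded",
    "PI": "intein-encoded (protein intron)",
    "F": "free-standing open reading frame",
}

# Assumed pattern: prefix, hyphen, three-letter species code (one capital,
# two lowercase letters), then a run of Roman-numeral letters.
# The numeral is not checked for being a well-formed Roman number.
_NAME = re.compile(r"^(PI|I|F)-([A-Z][a-z]{2})([IVXLCDM]+)$")


class HomingEndonucleaseName(NamedTuple):
    prefix: str
    localization: str
    species_code: str
    numeral: str


def parse_name(name: str) -> HomingEndonucleaseName:
    """Split a homing endonuclease name into its three components."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognized homing endonuclease name: {name!r}")
    prefix, species_code, numeral = match.groups()
    return HomingEndonucleaseName(
        prefix, PREFIX_MEANING[prefix], species_code, numeral
    )


if __name__ == "__main__":
    for example in ("I-CreI", "PI-SceI", "F-SceII"):
        print(example, "->", parse_name(example))
```

A name without one of the three prefixes, such as the restriction enzyme EcoRV, is rejected with a `ValueError`, which mirrors the role of the prefix in distinguishing homing endonucleases from restriction endonucleases.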
Figure 1 The homing event. Homing is a site-specific unidirectional transfer of a mobile genetic element (an intein or a group I or II intron) from one allele to a homologous allele lacking this element. The process is initiated by a double-strand cut in the recipient allele, catalyzed by a homing endonuclease that is usually intein-encoded, such as PI-SceI (left), or intron-encoded, such as I-CreI (right). Double-strand break repair, which involves duplication of the mobile element in the donor allele, leads to the specific insertion of the mobile element coding for the homing endonuclease in the recipient allele by homologous recombination.

Table 1 Subfamilies of Homing Endonucleases

Entries give, for each enzyme: subunit composition; conserved motif (P1, P2); recognition sequence (| marks the cleavage position in each strand).

LAGLIDADG motif

I-CreI
  Subunit composition: α2
  Conserved motif: P1: 10-LYLAGFVDGDGS-
  Recognition sequence:
    5′-AAAACGTCGTGA|GACAGTTT-3′
    3′-TTTTGCAG|CACTCTGTCAAA-5′

I-DmoI
  Subunit composition: α
  Conserved motif: P1: 11-AYLLGLIIGDGG-; P2: 83-AFIKGLYVAEGDK-
  Recognition sequence:
    5′-TTGCCGGGTAAG|TTCCGGCG-3′
    3′-AACGGCCC|ATTCAAGGCCGC-5′

PI-SceI
  Subunit composition: α
  Conserved motif: P1: 208-AYLLGLWIGDGL-; P2: 316-TFLAGLIDSDGYV-
  Recognition sequence:
    5′-TATGTCGGGTGC|GGAGAAAGAGGTAATGAAA-3′
    3′-ATACAGCC|CACGCCTCTTTCTCCATTACTTT-5′

PI-PfuI
  Subunit composition: α
  Conserved motif: P1: 139-YWLAGFIAGDGC-; P2: 240-SFIAGLFDAEGH-
  Recognition sequence:
    5′-TACAGAAGATGGGAGG|AGGACCGGACTCA-3′
    3′-ATGTCTTCTACC|CTCCTCCCTGGCCTGAGT-5′

His-Cys box motif/HNH motif (a)

I-PpoI
  Subunit composition: α2
  Conserved motif: 94-CTASHLCHNTRCHNPLHLC-112; 125-CPGPNGGCVHAVVC-138
  Recognition sequence:
    5′-GAGAGAATTC|CATCG-3′
    3′-CTCTCT|TAAGGTAGC-5′

GIY-YIG motif

I-TevI
  Subunit composition: α
  Conserved motif: 1-KSGIYQIKNTLNNKVYVGSAK
  Recognition sequence:
    5′-CAAC|GCTCAGT - 12 bp - GGGTCTACC-3′
    3′-GT|TGCGAGTCA - ...... - CCCAGATGG-5′

(a) The His-Cys box motif and HNH motif containing homing endonucleases were originally considered as two separate subfamilies. Based on the observation that they share a common active site architecture, it was proposed to reclassify them in a single subfamily, the ββα-Me family, to which several nonspecific nucleases, such as the colicin E9 and Serratia nuclease, also belong (Ref. 95). However, it must be emphasized that little structural homology exists outside the active site and that the enzymes having the HNH motif lack the Zn-binding motifs characteristic of the His-Cys box subfamily members.

A classification of homing endonucleases is based on different short sequence motifs (the LAGLIDADG, His-Cys box/HNH, and GIY-YIG motifs), which define three subfamilies of homing endonucleases (Table 1). Crystal structures were determined for four homing endonucleases of the LAGLIDADG subfamily and for one member of each of the other two subfamilies (Figs. 2 and 3), some of them as co-crystal structures together with their substrate and product (Fig. 3). As can be seen from these co-crystal structures, the DNA is severely bent in the I-PpoI–DNA complex and more or less straight in the I-CreI–DNA complex. The co-crystal structures of homing endonucleases, in conjunction with biochemical data, clearly show how these enzymes recognize their DNA substrates. It is noteworthy that, unlike restriction enzymes, which form multiple interactions with the bases of their relatively short recognition sequences (4 to 8 base pairs in length), homing endonucleases do not make exhaustive use of potential interactions with their recognition sequences. For example, EcoRV, a Type II restriction enzyme with a 6-base-pair recognition sequence, is involved in a similar number of base-specific contacts as I-CreI, a homing endonuclease with a 22-base-pair recognition sequence (5). Therefore it is not surprising that homing endonucleases usually accept degeneracies in their recognition sequences, which are not stringently defined and whose precise boundaries are generally not known, in contrast to what is observed with Type II restriction enzymes. The family of homing endonucleases displays a heterogeneity similar to that of restriction endonucleases, which also catalyze a very specific double-strand cleavage reaction but have different architectures and subunit compositions (reviews: 12,13). Of particular interest here is that homodimeric as well as monomeric homing endonucleases with more than one domain exist. This is most obvious for the LAGLIDADG subfamily, with homodimeric (I-CreI), small monomeric (I-DmoI), and large monomeric representatives (PI-SceI, PI-PfuI).
It is tempting to speculate that these enzymes evolved from a common homodimeric precursor by gene fusion (to generate a small monomeric enzyme) and acquisition of sequences coding for new domains to gain the catalytic machinery for protein splicing and to accommodate larger DNA recognition sequences (Fig. 4). All homing endonucleases known so far share with restriction endonucleases the requirement for Mg2+ ions (or Mn2+) as essential cofactors in the cleavage reaction, and the mode of cleavage, which is always a double-strand cleavage. This implies that even the monomeric homing endonucleases must have two active sites, which was veriﬁed for PI-SceI (14). Interestingly, diﬀerent from restriction enzymes, several homing endonucleases seem to ﬁrmly stick to one or both of their product fragments, shown, e.g., for I-CreI (15), I-SceI (16), I-TevII (17), PI-SceI (18,19), and F-SceII (Ho-endonuclease) (20), which may be of advantage for the homing process, as it protects one or both of the recombinogenic ends from degradation. The mechanism of DNA cleavage has been studied for representatives of the LADLIDADG and the His-Cys box/HNH subfamilies, both by structural and functional studies. For I-CreI, a two-metal ion mechanism


Figure 2 Crystal structures of homing endonucleases. Shown in ribbon representation are the crystal structures of four LAGLIDADG subfamily members, I-CreI (Ref. 96), I-DmoI (Ref. 97), PI-SceI (Ref. 33), and PI-PfuI (Ref. 47), and the crystal structure of a member of the His-Cys box subfamily, I-PpoI (Ref. 21).


Figure 3 Co-crystal structures of homing endonucleases and their substrates. Shown in ribbon representation are the co-crystal structures of the I-CreI–DNA (Ref. 98) and I-PpoI–DNA complexes (Ref. 99) as well as the structure of the DNA-binding domain of I-TevI with its substrate (Ref. 100); the structure of the catalytic domain of I-TevI has recently been determined (P. van Roey, personal communication). Note the pronounced bending of the DNA in the I-PpoI–DNA complex.


Figure 4 Evolution of homing endonucleases. The scheme is based on the structures of different members of the LAGLIDADG subfamily of homing endonucleases. Starting from a homodimeric enzyme, such as I-CreI, a monomeric enzyme, such as I-DmoI, evolved. By acquisition of a protein splicing domain, such as the gyrA mini-intein, a homing endonuclease was generated, which allowed "colonization" of protein-encoding open reading frames. Fusion with DNA-binding domains expanded the range of possible target sequences, e.g., PI-SceI and PI-PfuI.

Figure 5 Mechanism of DNA cleavage by homing endonucleases. Shown on top is, in general form, the transition state for a phosphodiesterase that follows an SN2-type mechanism in cleaving DNA, which is characterized by an in-line attack of an activated water molecule on phosphorus. A general base serves to activate the water molecule, a Lewis acid polarizes the P–O bond to be cleaved and stabilizes the extra negative charge accumulating in the transition state, and a general acid protonates the leaving group. In the schemes below, the active centers of I-PpoI and I-CreI are depicted. (From Ref. 5.)

has been proposed, with one metal ion being shared between the two active sites (5). I-PpoI, in contrast, seems to follow a one-metal ion mechanism (21,22) (Fig. 5).

Homing endonucleases have attracted a great deal of attention because their recognition sites (typically between 12 and 40 base pairs) are extremely rare. For example, an 18-base-pair recognition site will occur only once in a random sequence of 7 × 10^10 base pairs, i.e., on average only once in 20 mammalian-sized genomes (23). However, because homing endonucleases tolerate some sequence degeneracy within their recognition sequence (18,24–26), the effective specificity corresponds on average to a site of only 10 to 12 base pairs. Indeed, it was


shown for I-SceI that it does not cleave a variety of bacterial and phage genomes; it does, however, cleave the genome of some yeast strains once (27), as does PI-SceI, which also cleaves the yeast genome only once (28). Homing endonucleases are therefore ideal tools for mapping large genomes and, presumably in the near future, also for gene targeting. In addition, homing endonucleases that have a protein splicing activity can be used to fuse proteins for a variety of purposes [it should be noted that a homing endonuclease, PI-SceI, was the first protein shown to catalyze its own excision from a precursor protein with concomitant fusion of the flanking protein parts (29,30)]. Given the importance of homing endonucleases as tools, it is not surprising that efforts have been and are being made to improve them by protein engineering, the focus being on changing their sequence specificity. In the present contribution, we will first introduce the system that we are working with and then describe what has been achieved so far and what one can hope to achieve.

1.2 The Homing Endonuclease PI-SceI

The 119-kDa primary translation product of the VMA1 gene of S. cerevisiae undergoes an autocatalytic protein splicing reaction that excises an internal 50-kDa protein (intein), the homing endonuclease PI-SceI (formerly known as VDE, VMA1-derived endonuclease), and joins the amino- and carboxyl-terminal segments (exteins) to generate the 69-kDa subunit of the vacuolar membrane-associated H+-ATPase (29,31,32). PI-SceI therefore has two catalytic activities, protein splicing and endonucleolytic DNA cleavage, as do 17 of the 92 other homing endonucleases known to date. PI-SceI is a two-domain protein (33), with one domain (domain I) harboring the catalytic center for protein splicing and the other (domain II) being responsible for DNA cleavage (Fig. 6). These domains can be cloned and expressed separately. Whereas the isolated DNA cleavage domain has only a nonspecific DNA binding activity and is not able to hydrolyze DNA, the isolated protein splicing domain is active in protein splicing and, in addition, binds specifically to the right half of the PI-SceI recognition sequence (34). This means that both domains must be involved in DNA binding, which was demonstrated experimentally by detailed site-directed mutagenesis (35–37) and cross-linking experiments (38–40). Based on the results of these experiments and the previous finding that DNA binding by PI-SceI is associated with DNA bending by up to 75° (18,19), a model for the PI-SceI–DNA complex was proposed (39,41) (Fig. 6). This model, of course, is crude and does not consider possible conformational changes of


Figure 6 A model of the PI-SceI–DNA complex. The model is based on the crystal structure of PI-SceI, cross-linking data, and the finding that the DNA is bent by approximately 75° in the complex. Three sets of cross-linking data were used for the model: photocross-linking data obtained with iodopyrimidine-substituted DNA (dashed lines leading to black circles) (Refs. 38,39), photocross-linking data obtained with p-azidophenacyl-substituted phosphorothioate-containing DNA (lines leading to grey circles) (Ref. 39), and p-azidophenacyl-modified PI-SceI cysteine variants (lines originating from grey circles) (Ref. 41). The cross-link positions are indicated; the arrowhead denotes whether the cross-link was formed by using a substituted DNA or a substituted protein. The two stars indicate the positions of the active centers in domain II.

the protein, e.g., movement of flexible loops that may contact the DNA after initial binding (37,40) (A.J. Noël, unpublished), or hinge movements of a subdomain of domain I to optimize the protein–DNA interface (E. Werner, personal communication).

PI-SceI, like several other homing endonucleases, is a monomeric enzyme, which raises the question of how a monomeric enzyme can perform a double-strand cut, the most obvious solution being that it harbors two active sites (19). When the crystal structure of PI-SceI was determined (33), it became clear that the DNA cleavage domain has an internal symmetry (42). Subsequent experiments demonstrated that PI-SceI indeed has two catalytic centers for cleavage of the two strands of its DNA substrate (14). Relevant catalytic residues, including those responsible for Mg2+ binding, were identified by mutational analysis (35,37,43,44). Not surprisingly, the two catalytic centers of the monomeric PI-SceI can be superimposed well on the symmetry-related catalytic centers of the homodimeric I-CreI (Fig. 7). The identification of Asp-218 and Asp-326 in PI-SceI as the principal Mg2+


Figure 7 The catalytic centers of PI-SceI superimposed on the I-CreI–DNA complex structure. Structural alignment of the catalytic centers of PI-SceI and I-CreI (together with the DNA substrate and the three metal ions as seen in the co-crystal structure of the I-CreI–DNA complex). The catalytic residues in PI-SceI are D218, D229, R231, K403 (catalytic center I, attacking the top strand of the recognition sequence) and K301, D326, T341, H343 (catalytic center II, attacking the bottom strand of the recognition sequence). The homologous residues of the two identical catalytic centers of I-CreI are differentiated by a hyphen. (From Ref. 5.)

binding ligands was achieved by Mn2+-rescue experiments with the D218C and D326C variants, which are inactive in the presence of the oxophilic Mg2+, but active in the presence of the thiophilic Mn2+ (44).

2 HOMING ENDONUCLEASES AS TOOLS IN BIOTECHNOLOGY

2.1 Protein Splicing

One special feature of the LAGLIDADG family of homing endonucleases is that some members are intein-encoded. As mentioned above, these endonucleases are translated as part of a precursor polypeptide and catalyze their own excision from the pre-protein as well as the concomitant fusion of the flanking protein fragments, forming a mature host protein and the free homing endonuclease. Crystal structures of two intein-encoded homing endonucleases have been published: PI-SceI (33,45,46) and PI-PfuI (47). These proteins consist of two separate domains, which possess two different


enzymatic activities. The endonuclease domain, which is structurally homologous to other homing endonucleases of the LAGLIDADG family, is responsible for the cleavage of the DNA substrate and consequently for the homing process. The additional protein splicing domain is responsible for the autocatalytic proteolytic excision of the mature homing endonuclease. Structurally, the catalytic core of this domain resembles the C-terminal part of the Hedgehog protein [Hint domain (48)] and a mini-intein [gyrA (49)], both of which lack the endonuclease domain. Protein splicing was first described in 1990 by Kane et al. (29) for the PI-SceI intein, and more than 115 inteins with and without an endonuclease domain have been registered in the intein database InBase (50). After it had been established which amino acid residues are essential for the splicing activity (51–53), efforts were directed at finding out which parts of an intein-encoded homing endonuclease can be deleted while preserving the protein splicing activity, so as to produce a functional mini-intein (54–56). These studies not only helped to resolve the chemical mechanism of protein splicing [reviewed by Noren et al. (57)] (Fig. 8), but also supported the hypothesis that mobile inteins, as found in the homing endonucleases, arose by invasion of an endonuclease gene into a sequence encoding a small, functional protein splicing element (54,39).

Today, a versatile intein technology is available as a tool for protein engineering (58). One of its main applications is a single-column protein purification system, in which the recombinant protein is fused to an intein that can be bound via an affinity tag to a column matrix. Following the washing step of the purification procedure, the recombinant protein can be released by thiol-induced cleavage (59–61). Further developments of the intein technology led to expression systems that allow intein-mediated ligation of proteins in vitro (61–63). This technique can be used to add modified or labeled peptides as well as protein fragments to a recombinant protein (64,65), to express cytotoxic proteins (66), or to produce cyclic proteins or peptides (67,68). Most relevant is a technique for the segmental modification of proteins for NMR structural analysis (69–71). Recently, new strategies for monitoring protein–protein interactions in vivo have been developed that are based on the intein technology (72).

2.2 Genomic Mapping
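The value of rare cutters for mapping rests on simple counting: in a random sequence with equal base frequencies, a fully specified n-base-pair site is expected about once every 4^n base pairs. The following is a minimal sketch of that arithmetic. The function names and the round mammalian genome size of 3 × 10^9 base pairs are assumptions made for illustration, and the model ignores base composition, strand orientation, and the site degeneracy that real homing endonucleases tolerate.

```python
# Expected spacing of a fully specified recognition site in random DNA.
# Assumption: all four bases are equally likely and independent.

MAMMALIAN_GENOME_BP = 3_000_000_000  # assumed round figure, for illustration only


def expected_spacing_bp(site_length: int) -> int:
    """Average number of base pairs between occurrences of one n-bp site."""
    if site_length < 1:
        raise ValueError("site_length must be at least 1")
    return 4 ** site_length


def genomes_per_site(site_length: int, genome_bp: int = MAMMALIAN_GENOME_BP) -> float:
    """Number of genomes of size genome_bp needed, on average, to find one site."""
    return expected_spacing_bp(site_length) / genome_bp


if __name__ == "__main__":
    for n in (4, 6, 8, 12, 18):
        print(f"{n:>2} bp site: once per {expected_spacing_bp(n):,} bp "
              f"(~{genomes_per_site(n):.2g} genomes of the assumed size)")
```

Under these assumptions an 18-base-pair site is expected once per 4^18 = 68,719,476,736 base pairs (about 7 × 10^10), whereas an effective specificity of 12 base pairs corresponds to only 4^12 = 16,777,216 base pairs, which is why tolerated degeneracy matters so much in practice.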

A random DNA sequence is cut on average once in 4^4 (= 256), 4^6 (= 4096), or 4^8 (= 65,536) base pairs by a restriction endonuclease that has a 4-, 6-, or 8-base-pair recognition sequence, respectively. Mapping of large genomes therefore relies on enzymes that recognize longer sequences. This can be performed by Type II restriction endonucleases in conjunction with specific


DNA methyltransferases using the Achilles' heel strategy (73,74) and the PNA-assisted rare cleavage approach (75), or, more simply, by homing endonucleases (review: 76). Six homing endonucleases are currently commercially available: I-CeuI, I-DmoI, I-PpoI, I-SceI, PI-PspI, and PI-SceI. I-CeuI, for example, has only one cleavage site in the large subunit rRNA gene in eubacteria and has been instrumental in mapping these genes, thereby providing landmarks in eubacterial genomes such as that of Bacillus subtilis (77). Similarly, a single I-PpoI site seems to be present in each of the rDNA repeats in most eukaryotes, which was used for the physical mapping of the Arabidopsis thaliana genome (78). Large genomes can also be mapped by artificially introducing homing endonuclease recognition sites, by homologous recombination or by random insertion using transposons, followed by cleavage with homing endonucleases. This approach was used for physical mapping in the S. cerevisiae and Candida albicans genome sequencing projects (79,80).

2.3 Gene Targeting

Homing endonucleases, as well as dimeric restriction enzymes (81), could be extremely useful for gene targeting, gene replacement, and, ultimately, gene therapy. A major problem for gene targeting is the extremely low efficiency of recombination between an introduced DNA and the homologous chromosomal target (about 1 in 10^6 in mammalian cells). However, a specific double-strand break in the target sequence increases the frequency of specific homologous recombination events by more than 1000-fold. For this purpose, rare-cutting restriction endonucleases and, in particular, homing endonucleases can be used (82–86). Of course, it must first be demonstrated that the introduction of a homing endonuclease into a cell, or the expression of its gene, is not deleterious for the cell. This was shown, e.g., for I-SceI in murine cells, which allowed the replacement of a defective gene in an experimental system by an intact gene supplied in trans (87) (Fig. 9). The chromosomal double-strand break(s) can be repaired by nonhomologous as well as homologous recombination, depending on the degree of homology between the target DNA and the transfected DNA. Gene targeting using homing endonucleases requires that recognition sites are present at suitable locations or are artificially introduced, but if this

Figure 8 The mechanism of protein splicing. The intein first catalyzes an intramolecular rearrangement of the primary translation product (N→S acyl shift followed by transesterification) that leads to a branched intermediate. This intermediate is cleaved to produce the mature intein (hydrolysis of the succinimide) and the fused exteins (S→N acyl shift).


Figure 9 Stimulation of recombination by a targeted double-strand break. Recombination requires regions of homology (indicated by shading) in the donor (with the intact gene, supplied on a transfected plasmid) and recipient (the chromosome carrying the gene defect) and is stimulated by a double-strand cut catalyzed by a homing endonuclease, either expressed from a transfected plasmid or directly introduced into the cell.

needs to be carried out, one might just as well use site-specific recombination, e.g., the Cre-lox system of phage P1 (88). Clearly, it would be more elegant to use dedicated site-specific endonucleases!

3 HOMING ENDONUCLEASES AS TARGETS IN BIOTECHNOLOGY: CHANGING THE SPECIFICITY OF HOMING ENDONUCLEASES

Changing the specificity of enzymes has been and still is an extremely demanding goal in the field of protein engineering. This is particularly true for very specific enzymes, such as restriction enzymes, for which only limited success can be reported (89). It is not yet clear whether homing endonucleases will be more malleable than restriction enzymes in efforts to alter their specificity. Reasons to believe that this is the case are as follows: (1) Homing endonucleases in general tolerate base substitutions in their recognition sequences. (2) Some homing endonucleases, such as PI-SceI, have a two-domain structure, with an additional DNA-binding domain or subdomain that is separated from the DNA cleavage domain. Substitutions of amino acid residues involved in base recognition but not in catalysis might be more easily performed with such homing endonucleases than with single-domain homing endonucleases or restriction enzymes, in which recognition and catalysis are locally constrained and interwoven. (3) Four enzymes belonging to the LAGLIDADG subfamily cleave the same DNA substrate as I-CreI, although only 7 of the 21 amino acid interactions that I-CreI makes with the DNA are conserved in their sequences, suggesting that there is only a weak pressure in each subfamily to maintain identical protein–DNA contacts (90).

Two examples serve to illustrate that the specificity of some homing endonucleases can be altered. Ho-endonuclease of yeast (F-SceII) belongs to the subfamily of LAGLIDADG enzymes and shares with PI-SceI six of the seven intein motifs (Fig. 10), but is not an intein, making it likely that it evolved from a PI-nuclease. It is unique among the LAGLIDADG enzymes in that it uses three zinc fingers for DNA recognition, a motif often used by eukaryotic transcription factors for specific DNA binding in a modular fashion (91). Nahon and Raveh (92) have replaced the three zinc fingers and the 6th intein motif in Ho-endonuclease by the three zinc fingers from the yeast and vertebrate transcription factors Swi5 and Sp1, respectively, and demonstrated that these chimeric endonucleases are active in vivo and display the expected new specificity, but also cleave DNA at some other, unidentified sites.

Figure 10 Engineering of homing endonucleases with new specificities by fusion with DNA recognition modules. Ho-endonuclease is related to PI-SceI, but is devoid of a protein splicing activity, as it has only six of the seven conserved intein motifs. Motifs 1 and 2 can be deleted without impairing the endonuclease activity. When motif 7 and the three zinc fingers are replaced by the three zinc fingers of the Swi5 or Sp1 transcription factor, a functional chimera with a new specificity is obtained. (From Ref. 92.)

If DNA recognition modules can be exchanged among related homing endonucleases, it should be possible to generate chimeras not only with new specificities but also with specificities intermediate between those of the parent enzymes. To test this hypothesis, we selected two closely related homing endonucleases of the LAGLIDADG subfamily: PI-SceI shares with the putative homing endonuclease VMA intein from the yeast Candida tropicalis ("PI-CtrI") 32% amino acid sequence identity and a similar DNA recognition site, 24 out of 31 bases being identical. We exchanged the coding region for the DNA binding subdomain of PI-SceI (amino acid residues 86–174) for the coding region of the binding subdomain of "PI-CtrI" (amino acid residues 89–174) (Fig. 11). The chimeric enzyme was expressed in Escherichia coli, and the cleavage activity and specificity of the purified enzyme were analyzed. Although PI-SceI and "PI-CtrI" have only 25% identity in this region, the chimeric enzyme cleaves both sites with similar activity (Table 2); it is noteworthy that the chimeric enzyme is otherwise as specific as PI-SceI: it does not cleave, e.g., bacteriophage λ DNA (W. Wende, unpublished).

As mentioned above, rational protein design to change the specificity of restriction endonucleases was only partially successful; the most promising results were obtained instead by directed evolution, e.g., random mutagenesis of selected regions of the proteins combined with powerful high-throughput screening procedures (review: 93). Recently, selection systems have been developed for the screening of homing endonucleases of desired, predetermined specificity. Gruen et al.
(94) devised an in vivo selection system that is based on two plasmids, one coding for a toxic gene product, the other for a homing endonuclease variant (library), which has to cleave the plasmid coding for the toxic gene product to allow the cells to survive (Fig. 12). The two-plasmid system we have developed operates on a similar principle: one plasmid codes for the variant homing endonuclease fused to a fluorescent protein (CFP), the other for a marker gene that is expressed to give a fluorescent protein (YFP) only when the variant homing endonuclease does not cleave the plasmid (S. Steuer and W. Wende, unpublished) (Fig. 13). Whether these and similar selection systems are powerful enough to make the in vitro evolution of homing endonucleases with new specificities possible remains to be seen.

Figure 11 Subdomain "swap" between PI-SceI and "PI-CtrI" to produce a chimeric homing endonuclease with a new specificity. PI-SceI and "PI-CtrI" are closely related homing endonucleases with similar recognition sites. Replacing the DNA recognition subdomain of PI-SceI by that of "PI-CtrI" leads to a fully active chimera, which cleaves both the PI-SceI and "PI-CtrI" recognition sites.

Table 2 Relative Cleavage Activity of the Chimeric PI-SceI/"PI-CtrI" Enzyme

                                 Linear             Supercoiled
Substrate                        Mg2+     Mn2+      Mg2+     Mn2+
Plasmid with PI-SceI site        0.8      1.0       1.0      1.0
Plasmid with "PI-CtrI" site      0.3      0.5       0.6      0.6

1.0 = wt PI-SceI with linear plasmid with PI-SceI site

NOTE ADDED IN PROOF

Barry Stoddard, Brett Chevalier, and colleagues (Seattle) have succeeded in engineering an artificial endonuclease with unique site specificity using

body parts from two LAGLIDADG homing endonucleases, I-CreI and I-DmoI. The engineered enzyme, E-DreI, whose crystal structure was solved, cleaves a composite site derived from the I-CreI and I-DmoI recognition sites, with only a slightly reduced catalytic activity compared with the parent enzymes. Marlene Belfort, George Silva, and colleagues (Albany) replaced the LAGLIDADG helices of I-DmoI with those of I-CreI; the resulting variants showed reduced activity but could be reactivated by reintroducing one amino acid residue from one of the I-DmoI LAGLIDADG helices. In addition, they produced a split version of I-DmoI; the resulting heterodimer showed partial activity.

Figure 12 A positive selection system to screen for homing endonucleases with new specificities. A library of variant homing endonuclease genes is cloned into an expression plasmid. The "indicator" plasmid carries the gene for a barnase variant, which, when expressed, leads to cell death. Only cells expressing a homing endonuclease capable of cleaving the "indicator" plasmid at the recognition site for the homing endonuclease with the new specificity are viable. (From Ref. 94.)
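The logic of the positive selection in Figure 12 can be sketched in a few lines. This is a toy model and not the published protocol: the variant names, the site labels, and the exact-match rule in the hypothetical `cleaves` helper are all invented for illustration, and real enzymes tolerate degenerate sites.

```python
# Toy model of the two-plasmid positive selection (Figure 12).
# All names, sites and the exact-match cleavage rule are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    recognized_sites: frozenset  # sites this variant is assumed to cleave


def cleaves(variant: Variant, site: str) -> bool:
    """Simplified rule: a variant cleaves only sites it is listed as recognizing."""
    return site in variant.recognized_sites


def survives(variant: Variant, indicator_site: str) -> bool:
    """A cell survives only if its variant cleaves the indicator plasmid,
    which carries the gene for the toxic product."""
    return cleaves(variant, indicator_site)


def select(library, indicator_site: str):
    """Return the variants whose host cells would survive the selection."""
    return [v for v in library if survives(v, indicator_site)]


if __name__ == "__main__":
    library = [
        Variant("wild type", frozenset({"SITE_WT"})),
        Variant("variant A", frozenset({"SITE_NEW"})),
        Variant("variant B", frozenset({"SITE_WT", "SITE_NEW"})),
        Variant("inactive", frozenset()),
    ]
    for v in select(library, "SITE_NEW"):
        print(v.name)
```

In this toy model, "variant B" passes the selection although it still cleaves the wild-type site. A positive selection of this kind therefore enriches for variants that cleave the new site, not necessarily for variants that have lost the old specificity.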


Figure 13 A screening system to identify homing endonucleases with new specificities. The expression plasmid harbors the gene for a homing endonuclease variant (from a library of genes) fused to cyan fluorescent protein (CFP), which serves as a control for full-length expression of the PI-SceI variants. The "indicator" plasmid carries the gene for the yellow fluorescent protein (YFP), which is expressed only when the plasmid is not cleaved at the recognition site. Screening is therefore for cells that show the characteristic fluorescence of CFP, but not of YFP.
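The read-out of the screen in Figure 13 amounts to a small decision table. The sketch below assumes that each cell is scored simply as positive or negative for each fluorophore; the function name and the category labels are illustrative and not part of the system described above.

```python
# Decision table for the CFP/YFP screen (Figure 13); labels are illustrative.

def classify_cell(cfp_positive: bool, yfp_positive: bool) -> str:
    """Interpret one cell from its two fluorescence signals.

    CFP reports full-length expression of the endonuclease fusion.
    YFP is expressed only while the indicator plasmid remains uncleaved.
    """
    if not cfp_positive:
        # Without the CFP control the YFP signal cannot be interpreted.
        return "no full-length variant expressed"
    if yfp_positive:
        return "variant expressed, site not cleaved"
    return "candidate: variant expressed, site cleaved"


if __name__ == "__main__":
    for cfp in (True, False):
        for yfp in (True, False):
            print(f"CFP={cfp!s:<5} YFP={yfp!s:<5} -> {classify_cell(cfp, yfp)}")
```

The CFP check comes first because a cell without YFP signal is informative only if the variant is known to be expressed; otherwise loss of the indicator signal could not be attributed to cleavage.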

These achievements demonstrate that the LAGLIDADG homing endonucleases are promising candidates for engineering hybrid enzymes with new specificities.

ACKNOWLEDGMENTS

We thank Karina Urbach for typing the manuscript. Work in the authors' laboratory was supported by the Deutsche Forschungsgemeinschaft (Pi 155/2-2).

REFERENCES

1. JE Mueller, M Bryk, N Loizos, M Belfort, RS Lloyd, RJ Roberts. Homing endonucleases. In: SM Linn, RS Lloyd, RJ Roberts, eds. Nucleases. Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 1993, pp 111–143.
2. M Belfort, RJ Roberts. Homing endonucleases: keeping the house in order. Nucleic Acids Res 25:3379–3388, 1997.

3. MS Jurica, BL Stoddard. Homing endonucleases: structure, function and evolution. Cell Mol Life Sci 55:1304–1326, 1999.
4. FS Gimble. Invasion of a multitude of genetic niches by mobile endonuclease genes. FEMS Microbiol Lett 185:99–107, 2000.
5. BS Chevalier, BL Stoddard. Homing endonucleases: structural and functional insight into the catalysts of intron/intein mobility. Nucleic Acids Res 29:3757–3774, 2001.
6. RJ Roberts, D Macelis. REBASE–restriction enzymes and methylases. Nucleic Acids Res 29:268–269, 2001.
7. B Dujon, L Colleaux, A Jacquier, F Michel, C Monteilhet. Mitochondrial introns as mobile genetic elements: the role of intron-encoded proteins. Basic Life Sci 40:5–27, 1986.
8. B Dujon. Group I introns as mobile genetic elements: facts and mechanistic speculations – a review. Gene 82:91–114, 1989.
9. MR Goddard, A Burt. Recurrent invasion and extinction of a selfish gene. Proc Natl Acad Sci USA 96:13880–13885, 1999.
10. FS Gimble. Degeneration of a homing endonuclease and its target sequence in a wild yeast strain. Nucleic Acids Res 29:4215–4223, 2001.
11. FB Perler, EO Davis, GE Dean, FS Gimble, WE Jack, N Neff, CJ Noren, J Thorner, M Belfort. Protein splicing elements: inteins and exteins – a definition of terms and recommended nomenclature. Nucleic Acids Res 22:1125–1127, 1994.
12. DT Dryden, NE Murray, DN Rao. Nucleoside triphosphate-dependent restriction enzymes. Nucleic Acids Res 29:3728–3741, 2001.
13. A Pingoud, A Jeltsch. Structure and function of type II restriction endonucleases. Nucleic Acids Res 29:3705–3727, 2001.
14. F Christ, S Schoettler, W Wende, S Steuer, A Pingoud, V Pingoud. The monomeric homing endonuclease PI-SceI has two catalytic centers for cleavage of the two strands of its DNA substrate. EMBO J 18:6908–6916, 1999.
15. J Wang, HH Kim, X Yuan, DL Herrin. Purification, biochemical characterization and protein–DNA interactions of the I-CreI endonuclease produced in Escherichia coli. Nucleic Acids Res 25:3767–3776, 1997.
16. A Perrin, M Buckle, B Dujon. Asymmetrical recognition and activity of the I-SceI endonuclease on its site and on intron–exon junctions. EMBO J 12:2939–2947, 1993.
17. N Loizos, GH Silva, M Belfort. Intron-encoded endonuclease I-TevII binds across the minor groove and induces two distinct conformational changes in its DNA substrate. J Mol Biol 255:412–424, 1996.
18. FS Gimble, J Wang. Substrate recognition and induced DNA distortion by the PI-SceI endonuclease, an enzyme generated by protein splicing. J Mol Biol 263:163–180, 1996.
19. W Wende, W Grindl, F Christ, A Pingoud, V Pingoud. Binding, bending and cleavage of DNA substrates by the homing endonuclease PI-SceI. Nucleic Acids Res 24:4123–4132, 1996.
20. Y Jin, G Binkowski, LD Simon, D Norris. HO endonuclease cleaves MAT DNA in vitro by an inefficient stoichiometric reaction mechanism. J Biol Chem 272:7352–7359, 1997.
21. EA Galburt, B Chevalier, W Tang, MS Jurica, KE Flick, RJ Monnat Jr, BL Stoddard. A novel endonuclease mechanism directly visualized for I-PpoI. Nat Struct Biol 6:1096–1099, 1999.
22. SJ Mannino, CL Jenkins, RT Raines. Chemical mechanism of DNA cleavage by the homing endonuclease I-PpoI. Biochemistry 38:16178–16186, 1999.
23. M Jasin. Genetic manipulation of genomes with rare-cutting endonucleases. Trends Genet 12:224–228, 1996.
24. M Bryk, SM Quirk, JE Mueller, N Loizos, C Lawrence, M Belfort. The td intron endonuclease I-TevI makes extensive sequence-tolerant contacts across the minor groove of its DNA target. EMBO J 12:2141–2149, 1993.
25. GM Argast, KM Stephens, MJ Emond, RJ Monnat Jr. I-PpoI and I-CreI homing site sequence degeneracy determined by random mutagenesis and sequential in vitro enrichment. J Mol Biol 280:345–353, 1998.
26. M Elde, P Haugen, NP Willassen, S Johansen. I-NjaI, a nuclear intron-encoded homing endonuclease from Naegleria generates a pentanucleotide 3′ cleavage-overhang within a 19 base-pair partially symmetric DNA recognition site. Eur J Biochem 259:281–288, 1999.
27. A Thierry, A Perrin, J Boyer, C Fairhead, B Dujon, B Frey, G Schmitz. Cleavage of yeast and bacteriophage T7 genomes at a single site using the rare cutter endonuclease I-SceI. Nucleic Acids Res 19:189–190, 1991.
28. MCD Bremer, FS Gimble, J Thorner, CL Smith. VDE endonuclease cleaves Saccharomyces cerevisiae genomic DNA at a single site: physical mapping of the VMA1 gene. Nucleic Acids Res 20:5484, 1992.
29. PM Kane, CT Yamashiro, DF Wolczyk, N Neff, M Goebl, TH Stevens. Protein splicing converts the yeast TFP1 gene product to the 69-kD subunit of the vacuolar H(+)-adenosine triphosphatase. Science 250:651–657, 1990.
30. R Hirata, Y Anraku. Mutations at the putative junction sites of the yeast VMA1 protein, the catalytic subunit of the vacuolar membrane H(+)-ATPase, inhibit its processing by protein splicing. Biochem Biophys Res Commun 188:40–47, 1992.
31. R Hirata, Y Ohsumi, A Nakano, H Kawasaki, K Suzuki, Y Anraku. Molecular structure of a gene, VMA1, encoding the catalytic subunit of H+-translocating adenosine triphosphatase from vacuolar membranes of Saccharomyces cerevisiae. J Biol Chem 265:6726–6733, 1990.
32. FS Gimble, J Thorner. Homing of a DNA endonuclease gene by meiotic gene conversion in Saccharomyces cerevisiae. Nature 357:301–306, 1992.
33. X Duan, FS Gimble, FA Quiocho. Crystal structure of PI-SceI, a homing endonuclease with protein splicing activity. Cell 89:555–564, 1997.
34. W Grindl, W Wende, V Pingoud, A Pingoud. The protein splicing domain of the homing endonuclease PI-SceI is responsible for specific DNA binding. Nucleic Acids Res 26:1857–1862, 1998.
35. Z He, M Crist, H Yen, X Duan, FA Quiocho, FS Gimble. Amino acid residues in both the protein splicing and endonuclease domains of the PI-SceI intein mediate DNA binding. J Biol Chem 273:4607–4615, 1998.


36. D Hu, M Crist, X Duan, FS Gimble. Mapping of a DNA binding region of the PI-SceI homing endonuclease by affinity cleavage and alanine-scanning mutagenesis. Biochemistry 38:12621–12628, 1999.
37. W Wende, S Schoettler, W Grindl, F Christ, S Steuer, AJ Noel, V Pingoud, A Pingoud. A mutational analysis of DNA binding and cleavage by the homing endonuclease PI-SceI. Mol Biol (Mosk) 34:1054–1064, 2000.
38. V Pingoud, H Thole, F Christ, W Grindl, W Wende, A Pingoud. Photocrosslinking of the homing endonuclease PI-SceI to its recognition sequence. J Biol Chem 274:10235–10243, 1999.
39. F Christ, S Steuer, H Thole, W Wende, A Pingoud, V Pingoud. A model for the PI-SceI DNA complex based on multiple base and phosphate backbone specific photocross-links. J Mol Biol 300:867–875, 2000.
40. KL Posey, FS Gimble. Insertion of a reversible redox switch into a rare-cutting DNA endonuclease. Biochemistry 41:2184–2190, 2002.
41. D Hu, M Crist, X Duan, FA Quiocho, FS Gimble. Probing the structure of the PI-SceI–DNA complex by affinity cleavage and affinity photocross-linking. J Biol Chem 275:2705–2712, 2000.
42. FS Gimble, X Duan, D Hu, FA Quiocho. Identification of Lys-403 in the PI-SceI homing endonuclease as part of a symmetric catalytic center. J Biol Chem 273:30524–30529, 1998.
43. FS Gimble, BW Stephens. Substitutions in conserved dodecapeptide motifs that uncouple the DNA binding and DNA cleavage activities of PI-SceI endonuclease. J Biol Chem 270:5849–5856, 1995.
44. S Schoettler, W Wende, V Pingoud, A Pingoud. Identification of Asp218 and Asp326 as the principal Mg(2+) binding ligands of the homing endonuclease PI-SceI. Biochemistry 39:15895–15900, 2000.
45. BW Poland, MQ Xu, FA Quiocho. Structural insights into the protein splicing mechanism of PI-SceI. J Biol Chem 275:16408–16413, 2000.
46. R Mizutani, S Nogami, M Kawasaki, Y Ohya, Y Anraku, Y Satow. Protein-splicing reaction via a thiazolidine intermediate: crystal structure of the VMA1-derived endonuclease bearing the N- and C-terminal propeptides. J Mol Biol 316:919–929, 2002.
47. K Ichiyanagi, Y Ishino, M Ariyoshi, K Komori, K Morikawa. Crystal structure of an archaeal intein-encoded homing endonuclease PI-PfuI. J Mol Biol 300:889–901, 2000.
48. TMT Hall, JA Porter, KE Young, EV Koonin, PA Beachy, DJ Leahy. Crystal structure of a hedgehog autoprocessing domain: homology between hedgehog and self-splicing proteins. Cell 91:85–97, 1997.
49. T Klabunde, S Sharma, A Telenti, WR Jacobs Jr, JC Sacchettini. Crystal structure of GyrA intein from Mycobacterium xenopi reveals structural basis of protein splicing. Nat Struct Biol 5:31–36, 1998.
50. FB Perler. InBase: the Intein Database. Nucleic Acids Res 30:383–384, 2002.
51. SR Chong, Y Shao, H Paulus, J Benner, FB Perler, MQ Xu. Protein splicing involving the Saccharomyces cerevisiae VMA intein. The steps in the splicing pathway, side reactions leading to protein cleavage, and establishment of an in vitro splicing system. J Biol Chem 271:22159–22168, 1996.

Homing Endonucleases 52. 53. 54.

55.

56. 57. 58. 59.

60.

61.

62. 63.

64.

65.

66. 67.

68.

349

MQ Xu, FB Perler. The mechanism of protein splicing and its modulation by mutation. EMBO J 15:5146–5153, 1996. S Chong, MQ Xu. Protein splicing of the Saccharomyces cerevisiae VMA intein without the endonuclease motifs. J Biol Chem 272:15587–15590, 1997. V Derbyshire, DW Wood, W Wu, JT Dansereau, JZ Dalgaard, M Belfort. Genetic deﬁnition of a protein-splicing domain: functional mini-inteins support structure predictions and a model for intein evolution. Proc Natl Acad Sci USA 94:11466–11471, 1997. M Kawasaki, S Nogami, Y Satow, Y Ohya, Y Anraku. Identiﬁcation of three core regions essential for protein splicing of the yeast Vma1 protozyme—a random mutagenesis study of the entire Vma1-derived endonuclease sequence. J Biol Chem 272:15668–15674, 1997. DW Wood, W Wu, G Belfort, V Derbyshire, M Belfort. A genetic system yields self-cleaving inteins for bioseparations. Nat Biotechnol 17:889–892, 1999. CJ Noren, J Wang, FB Perler. Dissecting the chemistry of protein splicing and its applications. Angew Chem, Int Ed Engl 39:450–466, 2000. FB Perler, E Adam. Protein splicing and its applications. Curr Opin Biotechnol 11:377–383, 2000. S Chong, FB Mersha, DG Comb, ME Scott, D Landry, LM Vence, FB Perler, J Benner, RB Kucera, CA Hirvonen, JJ Pelletier, H Paulus, MQ Xu. Singlecolumn puriﬁcation of free recombinant proteins using a self-cleavable aﬃnity tag derived from a protein splicing element. Gene 192:271–281, 1997. SR Chong, GE Montello, AH Zhang, EJ Cantor, W Liao, MQ Xu, J Benner. Utilizing the C-terminal cleavage activity of a protein splicing element to purify recombinant proteins in a single chromatographic step. Nucleic Acids Res 26:5109–5115, 1998. MW Southworth, K Amaya, TC Evans, MQ Xu, FB Perler. Puriﬁcation of proteins fused to either the amino or carboxy terminus of the Mycobacterium xenopi gyrase A intein. Biotechniques 27:110–114, 116, 118–120, 1999. TW Muir, D Sondhi, PA Cole. Expressed protein ligation: a general method for protein engineering. 
Proc Natl Acad Sci U S A 95:6705–6710, 1998. TC Evans Jr, J Benner, MQ Xu. The in vitro ligation of bacterially expressed proteins using an intein from Methanobacterium thermoautotrophicum. J Biol Chem 274:3923–3926, 1999. GJ Cotton, B Ayers, R Xu, TW Muir. Insertion of a synthetic peptide into a recombinant protein framework: a protein biosensor. J Am Chem Soc 121:1100–1101, 1999. IR Cottingham, A Millar, E Emslie, A Colman, AE Schnieke, C McKee. A method for the amidation of recombinant peptides expressed as intein fusion proteins in Escherichia coli. Nat Biotechnol 19:974–977, 2001. TC Evans Jr, J Benner, MQ Xu. Semisynthesis of cytotoxic proteins using a modiﬁed protein splicing element. Protein Sci 7:2256–2264, 1998. TC Evans Jr, J Benner, MQ Xu. The cyclization and polymerization of bacterially expressed proteins using modiﬁed self-splicing inteins. J Biol Chem 274:18359–18363, 1999. CP Scott, E Abel-Santos, M Wall, DC Wahnon, SJ Benkovic. Production of

350

69.

70.

71. 72. 73.

74. 75. 76.

77.

78.

79.

80. 81. 82.

83.

84.

85.

Pingoud et al. cyclic peptides and proteins in vivo. Proc Natl Acad Sci USA 96:13638–13643, 1999. T Otomo, N Ito, Y Kyogoku, T Yamazaki. NMR observation of selected segments in a larger protein: central-segment isotope labeling through inteinmediated ligation. Biochemistry 38:16040–16044, 1999. R Xu, B Ayers, D Cowburn, TW Muir. Chemical ligation of folded recombinant proteins: segmental isotopic labeling of domains for NMR studies. Proc Natl Acad Sci USA 96:388–393, 1999. H Yu. Extending the size limit of protein nuclear magnetic resonance. Proc Natl Acad Sci USA 96:332–334, 1999. T Ozawa, Y Umezawa. Detection of protein–protein interactions in vivo based on protein splicing. Curr Opin Chem Biol 5:578–583, 2001. M Koob, E Grimes, W Szybalski. Conferring new speciﬁcity upon restriction endonucleases by combining repressor–operator interaction and methylation. Gene 74:165–167, 1988. M Koob. Conferring new cleavage speciﬁcities of restriction endonucleases. Methods Enzymol 216:321–329, 1992. AG Veselkov, VV Demidov, MD Frank-Kamenetskii, PE Nielsen. PNA as a rare genome-cutter. Nature 379:214, 1996. S Johansen, M Elde, A Vader, P Haugen, K Haugli, F Haugli. In vivo mobility of a group I twintron in nuclear ribosomal DNA of the myxomycete Didymium iridis. Mol Microbiol 24:737–745, 1997. T Toda, M Itaya. I-CeuI recognition sites in the rrn operons of the Bacillus subtilis 168 chromosome: inherent landmarks for genome analysis. Microbiology 141:1937–1945, 1995. GP Copenhaver, CS Pikaard. RFLP and physical mapping with an rDNAspeciﬁc endonuclease reveals that nucleolus organizer regions of Arabidopsis thaliana adjoin the telomeres on chromosomes 2 and 4. Plant J 9:259–272, 1996. A Thierry, B Dujon. Nested chromosomal fragmentation in yeast using the meganuclease I-SceI: a new method for physical mapping of eukaryotic genomes. Nucleic Acids Res 20:5625–5631, 1992. WA Fonzi, MY Irwin. Isogenic strain construction and gene mapping in Candida albicans. 
Genetics 134:717–728, 1993. Y-.G Kim, L Li, S Chandrasegaran. Insertion and deletion mutants of FokI restriction endonuclease. J Biol Chem 269:31978–31982, 1994. P Rouet, F Smih, M Jasin. Introduction of double-strand breaks into the genome of mouse cells by expression of a rare-cutting endonuclease. Mol Cell Biol 14:8096–8106, 1994. M Cohen Tannoudji, S Robine, A Choulika, D Pinto, EL F, C Babinet, D Louvard, F Jaisser. I-SceI-induced gene replacement at a natural locus in embryonic stem cells. Mol Cell Biol 18:1444–1448, 1998. B Elliott, C Richardson, J Winderbaum, JA Nickoloﬀ, M Jasin. Gene conversion tracts from double-strand break repair in mammalian cells. Mol Cell Biol 18:93–101, 1998. C Richardson, ME Moynahan, M Jasin. Double-strand break repair by

Homing Endonucleases

86.

87.

88. 89.

90.

91. 92. 93.

94. 95.

96. 97.

98. 99.

100.

351

interchromosomal recombination: suppression of chromosomal translocations. Genes Dev 12:3831–3842, 1998. C Richardson, B Elliott, M Jasin. Chromosomal double-strand breaks introduced in mammalian cells by expression of I-SceI endonuclease. Methods Mol Biol 133:453–463, 1999. P Rouet, F Smith, M Jasin. Expression of a site-speciﬁc endonuclease stimulates homologous recombination in mammalian cells. Proc Natl Acad Sci USA 91:6064–6068, 1994. NJ Kilby, MR Snaith, JA Murray. Site-speciﬁc recombinases: tools for genome engineering. Trends Genet 9:413–421, 1993. T Lanio, A Jeltsch, A Pingoud. Towards the design of rare cutting restriction endonucleases: using directed evolution to generate variants of EcoRV diﬀering in their substrate speciﬁcity by two orders of magnitude. J Mol Biol 283:59–69, 1998. P Lucas, C Otis, JP Mercier, M Turmel, C Lemieux. Rapid evolution of the DNA-binding site in LAGLIDADG homing endonucleases. Nucleic Acids Res 29:960–969, 2001. A Klug, JW Schwabe. Protein motifs 5. Zinc ﬁngers. FASEB J 9:597–604, 1995. E Nahon, D Raveh. Targeting a truncated Ho-endonuclease of yeast to novel DNA sites with foreign zinc ﬁngers. Nucleic Acids Res 26:1233–1239, 1998. T Lanio, A Jeltsch, A Pingoud. Evolutionary generation versus rational design of restriction endonucleases with novel speciﬁcity. In: S Brakmann, K Johnsson, eds. Directed Molecular Evolution of Proteins. Weinheim: Wiley-VCH, 2002, pp 309–327. M Gruen, K Chang, I Serbanescu, DR Liu. An in vivo selection system for homing endonuclease activity. Nucleic Acids Res 30:e29, 2002. UC Kuehlmann, GR Moore, R James, C Kleanthous, AM Hemmings. Structural parsimony in endonuclease active sites: should the number of homing endonuclease families be redeﬁned? FEBS Lett 463:1–2, 1999. PJ Heath, KM Stephens, RJ Monnat, BL Stoddard. The structure of I-CreI, a Group I intron-encoded homing endonuclease. Nat Struct Biol 4:468–476, 1997. GH Silva, JZ Dalgaard, M Belfort, P Van Roey. 
Crystal structure of the thermostable archaeal intron-encoded endonuclease I-DmoI. J Mol Biol 286:1123– 1136, 1999. MS Jurica, RJ Monnat Jr, BL Stoddard. DNA recognition and cleavage by the LAGLIDADG homing endonuclease I-CreI. Mol Cell 2:469–476, 1998. KE Flick, MS Jurica, RJ Monnat Jr, BL Stoddard. DNA binding and cleavage by the nuclear intron-encoded homing endonuclease I-PpoI. Nature 394:96– 101, 1998. P Van Roey, CA Waddling, KM Fox, M Belfort, V Derbyshire. Intertwined structure of the DNA-binding domain of intron endonuclease I-TevI with its substrate. EMBO J 20:3631–3637, 2001.

15 Evolutionary Methods for Protein Engineering

Huimin Zhao and Wenjuan Zha
University of Illinois at Urbana–Champaign, Urbana, Illinois, U.S.A.

1 INTRODUCTION

The dynamic Darwinian evolutionary process of diversification and selection has resulted in the myriad of shapes, functions, and systems evident in every living organism. For centuries, people have used selective breeding (classical breeding) to produce new varieties of plants and animals such as dogs, crops, and flowers. By Darwin's own account, “The key (of breeding) is man's power of accumulative selection: nature gives successive variations; man adds them up in certain directions useful to him” (1). However, only very recently have scientists begun to harness the same power of evolution to produce better molecules, ranging from drugs to industrial chemicals, and to do so in days or weeks rather than decades or millions of years. This process, which creates genetic diversity in the laboratory and selects the variants with desired features, is called “directed evolution” or “in vitro evolution.”

The principle of directed evolution at the molecular level was first demonstrated by Mills et al. (2) for nucleic acids in 1967. Almost a quarter century later, a powerful in vitro evolution scheme termed SELEX (systematic evolution of ligands by exponential enrichment) was developed to evolve functional RNA and DNA such as aptamers, ribozymes, and DNA catalysts (3–5). At about the same time, Roberts et al. (6) and Chen and Arnold (7) at Caltech were among the first to apply the principle of directed evolution to engineer proteins, a class of biomolecules of much greater practical importance than nucleic acids. This, together with the invention of DNA shuffling by Stemmer (8) in 1994, catalyzed the rapid growth of the directed evolution field. DNA shuffling mimics the natural homologous DNA recombination process, allowing the rapid accumulation of beneficial mutations and removal of deleterious mutations. The effectiveness of DNA shuffling was first demonstrated with mutants of a single gene (8). The subsequent extension of DNA shuffling to a family of related genes of different origins (family shuffling) unleashed the true power of directed evolution (9–11). Directed evolution has also been extended beyond proteins to pathways, viruses, and whole genomes (12–14).

The general methodology for directed evolution is shown in Fig. 1. As with natural evolution, genetic diversity is introduced into a target gene/pathway/genome through random mutagenesis and gene recombination. Functionally improved variants are first identified by a high-throughput screening or selection method and then used as the parents for the next round of evolution. The same process is repeated until the goal is achieved or no further improvement is possible.

Figure 1 The general scheme of directed evolution.

Directed evolution has emerged as a powerful approach for protein engineering and for studies of protein structure–function relationships. Proteins in nature have evolved, through selective pressure, to perform specific biological tasks. They are generally not optimized for specific, nonnatural applications. For example, as chemical catalysts, naturally occurring enzymes often have problems such as low stability, lack of specificity, and low catalytic efficiency. Similarly, as therapeutic agents, natural human proteins often have limitations such as low efficacy, low stability, and low selectivity. As a result, naturally occurring proteins often need to be tailored in the laboratory. A structure-based rational protein design approach was developed for protein engineering. However, because of our currently limited knowledge of protein folding, structure, function, and dynamics, this approach is rather time-consuming and unreliable. In comparison, directed evolution does not require any mechanistic or structural information, and can solve a complex protein design problem within a very short period of time. Because it avoids preconceived ideas about what is important, directed evolution can often discover unexpected solutions to a protein design problem. Directed evolution, in combination with other biochemical and biophysical methods such as kinetic or thermodynamic analysis and x-ray crystallography, has the potential to provide many new insights into protein structure–function relationships that would otherwise be very difficult to obtain. To date, directed evolution is flourishing worldwide in both academia and industry.
A number of biotechnology companies, including Maxygen, Diversa, Enchira Biotechnology, Applied Molecular Evolution, Phylos, KAIROS Scientific, Proteus (France), and Isogenica (UK), were started within the past 5 years to explore the commercial potential of directed evolution technologies in a plethora of areas including vaccines, therapeutic proteins, diagnostics, gene therapy, agriculture, chemicals, detergents, and food processing. In addition, an increasing number of academic laboratories are using directed evolution as an indispensable tool for both applied and basic research. The number of publications has increased exponentially over the past decade (Fig. 2), and this growing trend is expected to continue for the next 5–10 years.

Figure 2 The number of journal-based publications in the directed evolution field within the past decade. Data were collected from ISI Web of Science (http://www.webofscience.com) using a keyword search. Every publication has been confirmed to be relevant to directed evolution based on its abstract or full text.

The methods and applications of directed evolution in various research areas have been covered in many recent reviews and books (5,15–19). This review will focus on the methods for diversity generation in directed evolution. The methods for searching the library of variants using high-throughput screening, or in vitro or in vivo selection, which is the other component of directed evolution, will be discussed in Sec. 2. The application of directed evolution to functional nucleic acids will also not be covered in this review, and interested readers are referred to a number of recent reviews (5,20,21).

2 METHODS FOR DIVERSITY GENERATION

Evolutionary changes at the molecular level in nature are dynamic processes, which are the origin of molecular diversity. Such processes include gene duplication, shuffling of DNA (exon shuffling), random mutation, transposition, gene recombination, and gene conversion (22). These processes have created and are creating a stunning variety of living organisms, cell types, and biological molecules, each with its own highly specialized talents. However, these in vivo mechanisms operate at very low efficiency, eliciting insignificant changes of gene structures or functions even after millions of years. For example, random changes (neutral substitutions) of one residue occur at a rate of only roughly one per 10^8 years, and changes at highly conserved residues occur at a rate of less than one per 10^11 years (23). Thus, to harness the power of natural evolution, this simple diversification–selection scenario must occur very rapidly, preferably on the order of weeks or days. As a consequence, the technology development of directed evolution is two-pronged: rapid generation of functionally rich diversity at the molecular level and rapid identification of the best among a library of variants.

Two main natural evolutionary processes have been mimicked so far by directed evolution to create molecular diversity: random mutation and gene recombination. Random mutation refers to the errors occurring during DNA replication. Random mutation can be classified into four types: (a) substitutions (the replacement of one nucleotide by another), (b) deletions (the removal of one or more nucleotides), (c) insertions (the addition of one or more nucleotides), and (d) inversions (the rotation by 180° of a double-stranded DNA segment comprising two or more base pairs) (22). The first three types of random mutation have been successfully mimicked in the laboratory.

Gene recombination is the reassortment of a series of nucleotides along a nucleic acid molecule, usually of double-stranded DNA and, in exceptional cases, also of RNA. Genetic recombination can also be classified into four types: (a) homologous recombination (recombination occurring between two homologous genes), (b) illegitimate recombination (recombination occurring between two DNA duplexes with little or no DNA homology), (c) reciprocal recombination (symmetrical exchange between two DNA double helices), and (d) site-specific recombination (recombination occurring at highly preferred sites) (24). So far, three of these types of recombination (the first two and the last) have been successfully mimicked in the laboratory.
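The four classes of random mutation listed above can be made concrete on a short sequence. The following is a minimal sketch, not taken from the chapter: the function names and the 12-base example sequence are hypothetical, and an inversion is modeled as replacing a segment of the strand being read with its reverse complement, which is what a 180° rotation of the duplex segment amounts to.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def substitute(seq, pos, base):
    """(a) Substitution: replace the nucleotide at pos with base."""
    return seq[:pos] + base + seq[pos + 1:]

def delete(seq, pos, n=1):
    """(b) Deletion: remove n nucleotides starting at pos."""
    return seq[:pos] + seq[pos + n:]

def insert(seq, pos, fragment):
    """(c) Insertion: add fragment immediately before position pos."""
    return seq[:pos] + fragment + seq[pos:]

def invert(seq, start, end):
    """(d) Inversion: rotate the duplex segment seq[start:end] by 180 degrees.

    Read along the same strand, the segment becomes its reverse complement.
    """
    segment = seq[start:end]
    flipped = "".join(COMPLEMENT[b] for b in reversed(segment))
    return seq[:start] + flipped + seq[end:]

parent = "ATGGCTAAGTTC"  # hypothetical example sequence
for name, mutant in [
    ("substitution", substitute(parent, 3, "A")),
    ("deletion", delete(parent, 3, 3)),
    ("insertion", insert(parent, 3, "CAT")),
    ("inversion", invert(parent, 3, 9)),
]:
    print(f"{name:12s} {mutant}")
```

Only substitutions and inversions preserve the length of the gene; applying the same inversion twice restores the parent sequence.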
The main differences between random mutagenesis methods and gene recombination methods are illustrated in Fig. 3. Briefly, random mutagenesis starts from a single parent gene and introduces new nucleotide substitutions randomly in the progeny genes, or inserts/deletes one or more nucleotides at random positions in the progeny genes. In comparison, gene recombination usually starts from a pool of variants of a single gene, or a pool of closely related parent genes of different origins, and creates a blockwise exchange of sequence information among the parent genes. The resulting progeny genes are essentially chimeric products. Because the DNA polymerases used do not replicate with 100% fidelity, all recombination methods are expected to introduce new point mutations as well. As listed in Fig. 3, for each of these two key evolutionary processes, a number of gene diversification methods have been developed, which will be discussed in detail below.
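The contrast between the two routes to diversity, and the way both feed a screen-and-select round, can be sketched in a few lines. This is an illustrative toy under stated assumptions, not a protocol from the chapter: sequences are aligned strings of equal length, the "screen" simply counts matches to a hypothetical target sequence, and all function names and parameters are invented for the example.

```python
import random

BASES = "ACGT"

def point_mutagenesis(parent, n_mutations, rng):
    """Random mutagenesis: new substitutions at random positions of ONE parent."""
    seq = list(parent)
    for pos in rng.sample(range(len(seq)), n_mutations):
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)

def recombine(parents, n_crossovers, rng):
    """Recombination: blockwise exchange among aligned, equal-length parents."""
    length = len(parents[0])
    cuts = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + cuts + [length]
    blocks = []
    for start, end in zip(bounds, bounds[1:]):
        donor = rng.choice(parents)          # each block comes from one parent
        blocks.append(donor[start:end])
    return "".join(blocks)

def score(seq, target):
    """Toy screen (assumption): positions matching a hypothetical target."""
    return sum(a == b for a, b in zip(seq, target))

def evolve_round(parents, target, library_size, rng, n_keep=3):
    """Diversify by either route, screen the library, keep the best variants."""
    library = list(parents)                  # parents stay in the pool
    for _ in range(library_size):
        if rng.random() < 0.5:
            library.append(point_mutagenesis(rng.choice(parents), 1, rng))
        else:
            library.append(recombine(parents, 2, rng))
    return sorted(library, key=lambda s: score(s, target), reverse=True)[:n_keep]

rng = random.Random(0)
target = "ACGT" * 5
parents = ["A" * 20]
for round_number in range(1, 11):
    parents = evolve_round(parents, target, 50, rng)
    print(round_number, score(parents[0], target))
```

Because the parents are retained in each screened library, the best score can never decrease from one round to the next; new mutations enter only through `point_mutagenesis`, while `recombine` merely reshuffles what the parent pool already contains.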


Figure 3 A comparison between random mutagenesis methods and gene recombination methods. Random mutagenesis methods create a library of variants containing point mutations or insertions/deletions from a single parent gene, whereas gene recombination methods create a library of variants containing existing mutations from a pool of parental genes. Listed are the methods that have been developed so far.

2.1 Random Mutagenesis Methods

Random mutagenesis by point substitutions is the simplest way to create molecular diversity. Effective methods to introduce point mutations over the whole length of a target gene include chemical mutagens (25), UV radiation (26), mutator strains (27), and error-prone polymerase chain reaction (PCR) (28,29). Among these methods, error-prone PCR, based on inaccurate copying by DNA polymerases, is the most widely used. This method is very simple, robust, and efficient, and, most importantly, the mutation rate can be easily and precisely controlled by adjusting a single reaction parameter: the manganese chloride concentration (30). However, error-prone PCR is not without limitations. First, the number of accessible amino acid substitutions at each residue position is only about six on average, because multiple nucleotide substitutions within a single codon are rare and the genetic code is degenerate. Second, the type of mutation is biased to some extent. Transitions (substitutions between two purines or between two pyrimidines) occur much more frequently than transversions (substitutions between a purine and a pyrimidine), and mutations at AT base pairs occur much more frequently than mutations at GC base pairs. Also, the amino acid substitutions tend to be conservative: the new amino acids often have physicochemical properties similar to those of the original residue. Nonetheless, it is noteworthy that these limitations also apply, more or less, to the other random point mutagenesis methods mentioned above.

To address these limitations, several saturation mutagenesis-based methods have been developed, such as combinatorial cassette mutagenesis (31,32), recursive ensemble mutagenesis (33), scanning saturation mutagenesis (34), and codon cassette mutagenesis (35). Saturation mutagenesis entails the creation of all possible amino acids at any predetermined single residues or regions (continuous series of residues) of a protein. The target residues or regions presumed to be important for a specific protein function are often identified either from detailed structural and functional analysis (32), or from random point mutagenesis experiments (36). Typically, the randomized codons are introduced into a gene of interest using synthetic oligonucleotide-directed PCR-based methods, or mutagenic cassette-directed restriction–ligation-based methods. Codon randomization is achieved with degenerate codons in the solid-phase synthesis of oligonucleotides using mononucleotide, dinucleotide, or trinucleotide units (31–33,37,38), or a set of universal mutagenic cassettes (35). By focusing on a few residues or regions, the library size to be screened may be reduced significantly.
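The figure of about six accessible substitutions per position, quoted earlier in this section for error-prone PCR, can be checked directly against the standard genetic code. The sketch below is an illustration rather than part of the original chapter: it assumes that only single-nucleotide changes occur within a codon, weights all 61 sense codons equally, and does not count stop codons or silent changes as accessible substitutions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def accessible_amino_acids(codon):
    """Amino acids reachable from codon by exactly one nucleotide substitution."""
    original = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            neighbor = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[neighbor]
            if aa != "*" and aa != original:   # skip stops and silent changes
                reachable.add(aa)
    return reachable

sense_codons = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]
counts = [len(accessible_amino_acids(codon)) for codon in sense_codons]
average = sum(counts) / len(counts)
print(f"mean accessible substitutions per sense codon: {average:.2f} of 19 possible")
print("from ATG (Met):", sorted(accessible_amino_acids("ATG")))
```

The remaining amino acids at each position require two or three nucleotide changes within the same codon, which is the gap that the saturation mutagenesis methods described above are designed to close.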
However, identifying the target residues or regions remains an overwhelming challenge. The fact that many beneficial mutations are found far away from the active site and throughout the whole protein, as evidenced in many directed evolution experiments, highlights our limited ability to predict the critical sites for mutagenesis. Moreover, these methods require the construction of many libraries to scan the whole protein and thus are very time-consuming and laborious.

Random mutagenesis by insertions and deletions is another effective way to create molecular diversity. Unlike random point mutagenesis, random insertion/deletion alters the length of the target gene. Consequently, it opens a new sequence space that is inaccessible by random point mutagenesis, yet functionally rich. As demonstrated in the random elongation mutagenesis of catalase I, the addition of peptide tails with random sequences to the C-terminus of the enzyme resulted in a library of mutants whose diversity in activity and thermostability was equal to or higher than that obtained by random point mutagenesis (39). Unfortunately, this random elongation mutagenesis method is limited to the addition of a random peptide sequence at the C-terminus of a protein. In comparison, a transposon-based linker insertion mutagenesis method, pentapeptide scanning mutagenesis, can randomly insert a variable five-amino-acid cassette at any site in a target protein (40). This method relies on the random insertion of transposon Tn4430 and the subsequent in vitro deletion of the bulk of the transposon, after which a 15-bp insertion remains within the target gene. The limitations of this method include the fixed length of the inserted peptide and the absence of random deletions of nucleotides. Recently, a general and more versatile random insertion and deletion mutagenesis method has been developed (41). This method enables the deletion of an arbitrary number of consecutive bases, up to 16, at random positions and, at the same time, the insertion of a specific sequence or random sequences of an arbitrary number of bases at the same position. When equal numbers of bases are deleted and inserted, this method can serve as a general codon-level random point mutagenesis method to introduce all possible single mutations along the entire length of a protein. However, this method is technically demanding, very time-consuming, and difficult to iterate, and it requires a large amount of DNA as template.

Because most mutations are deleterious or neutral, and beneficial mutations are rare, a low mutation rate (one or two amino acid substitutions per protein) is typically employed in most random mutagenesis-based directed evolution strategies. Only in a few reports was a high mutation rate used to isolate functionally improved proteins (42–44), an approach that requires the screening of very large libraries of variants. When screening or selection is based on a combination of more than one protein property, the probability of finding a functionally improved variant is extremely low (18).
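The combined deletion and insertion at a single random position, described above for the random insertion and deletion method (41), can be sketched as a string operation. This is a simplified illustration, not the published laboratory protocol: the function names are hypothetical, the position is drawn uniformly, and no attempt is made to keep the reading frame.

```python
import random

BASES = "ACGT"
MAX_DELETION = 16  # upper limit on consecutive deleted bases quoted in the text

def random_bases(n, rng):
    """A random sequence of n bases."""
    return "".join(rng.choice(BASES) for _ in range(n))

def random_indel(gene, n_delete, insert_seq, rng):
    """Delete n_delete consecutive bases at a random position and
    put insert_seq in their place. Returns (mutant, position)."""
    if not 0 <= n_delete <= MAX_DELETION:
        raise ValueError("n_delete must be between 0 and 16")
    pos = rng.randrange(0, len(gene) - n_delete + 1)
    mutant = gene[:pos] + insert_seq + gene[pos + n_delete:]
    return mutant, pos

rng = random.Random(0)
gene = "ATGGCTAAGTTCGGAGATCTGCAC"  # hypothetical 24-base example

# Pure insertion of a fixed 15-base cassette (gene grows by 15 bases).
print(random_indel(gene, 0, "TGCGGCCGCAGGTGA", rng))
# Pure deletion of 6 bases (gene shrinks by 6 bases).
print(random_indel(gene, 6, "", rng))
# Equal numbers deleted and inserted: length unchanged, three bases randomized.
print(random_indel(gene, 3, random_bases(3, rng), rng))
```

The last call corresponds to the special case noted in the text: when as many bases are inserted as are deleted, the gene keeps its length and the operation acts as a localized substitution. A codon-level version would additionally restrict the position to multiples of three.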
Because of its low mutation rate, random mutagenesis can access only a very small region of sequence space and cannot discover the synergistic effects of mutations that are sometimes needed for the creation of a new protein function. In addition, in each round of directed evolution, only the best mutant is selected as the parent for the subsequent round. Other useful variants are discarded, and their beneficial mutations must be rediscovered before they can accumulate in later rounds of evolution. Furthermore, deleterious mutations cannot be removed from the evolved variants, which may limit the evolutionary potential (16). Thus, the accumulation of beneficial mutations through iterative random mutagenesis, coupled with screening/selection, is very slow and not economical, representing an “asexual” evolutionary process.

2.2 Gene Recombination Methods

In contrast to random mutagenesis, gene recombination is a “sexual” evolutionary process. The key advantage of gene recombination is its capability to accumulate beneficial mutations while simultaneously removing deleterious mutations. In nature, in vivo gene recombination plays a very important role in the survival and evolution of living organisms because it can repair damaged genes and increase the genetic variation of a population by combining different variants (45). Computational simulation studies have underscored the importance of recombination in the evolution of biological systems (46). Various approaches, both in vitro and in vivo, have been developed to mimic nature's recombination strategy, including homologous and nonhomologous recombination. Generally speaking, in vitro recombination methods offer much higher recombination efficiencies and greater experimental flexibility than in vivo approaches. Table 1 summarizes the pros and cons of each in vitro gene recombination method.

In the DNA shuffling method (8,47), parental genes from a pool of selected variants are first randomly fragmented, and the purified fragments are then reassembled into full-length gene products by repeated cycles of overlap extension. Recombinogenic events occur when fragments derived from different parental genes prime one another. Largely due to the use of low-fidelity Taq polymerase, the original DNA shuffling method is frequently associated with a high level of random point mutations, especially when small fragments are used for reassembly (~0.7% mutagenesis rate for fragments less than 50 bp). However, the associated mutagenesis rate can be reduced to as low as 0.02% with the choice of a high-fidelity DNA polymerase and appropriate reaction conditions (48). Such a high-fidelity DNA shuffling method can be used in protein structure–function relationship studies, such as distinguishing functional mutations from nonfunctional mutations among evolutionarily related proteins (49).
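To see what the two mutagenesis rates quoted above mean for a whole gene, they can be converted into an expected number of point mutations and into the fraction of reassembled genes that carry none. The calculation below is a back-of-the-envelope sketch under assumptions that are not in the chapter: a hypothetical gene length of 1000 bp, and mutations that occur independently at each base with the stated per-base rate.

```python
def expected_mutations(rate, gene_length):
    """Mean number of point mutations per gene at a given per-base rate."""
    return rate * gene_length

def fraction_unmutated(rate, gene_length):
    """Probability that a gene carries no point mutation at all,
    assuming each base mutates independently."""
    return (1.0 - rate) ** gene_length

GENE_LENGTH = 1000  # hypothetical gene length in base pairs (assumption)

for label, rate in [("original protocol, ~0.7%", 0.007),
                    ("high-fidelity protocol, 0.02%", 0.0002)]:
    mean = expected_mutations(rate, GENE_LENGTH)
    clean = fraction_unmutated(rate, GENE_LENGTH)
    print(f"{label}: {mean:.1f} mutations per gene on average, "
          f"{clean:.1%} of genes mutation-free")
```

Under these assumptions, nearly every chimera made with the original protocol carries several new point mutations, whereas with the high-fidelity protocol most chimeras differ from their parents only through recombination. This is why the latter is suited to attributing functional differences to inherited mutations.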
It is notable that a modified version of DNA shuffling, random priming recombination (RPR) (50), in which the DNA fragments to be reassembled are prepared using random short primers instead of DNase I digestion, may overcome the biased diversity introduced by the nonrandom fragmentation associated with DNase I.

A second technically and conceptually novel in vitro homologous gene recombination method is the staggered extension process (StEP) developed by Zhao et al. (51). Instead of being fragmented as in DNA shuffling, the full-length parental genes in the StEP method are used as the templates for the synthesis of chimeric gene products. One or more primers are added as “seeds” to grow new genes using repeated cycles of denaturation and extremely short annealing/extension steps. Recombinogenic events occur when the growing fragments anneal to different templates (template-switching events) based on sequence complementarity. Because this method does not require the digestion and purification of DNA fragments, it is technically simple and can be carried out in a single tube. It is noteworthy that the StEP method is somewhat similar to the process that retroviruses such as HIV use to evolve their genomes (52).
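The template-switching idea behind StEP can be illustrated with a toy simulation in which a single growing strand is extended by a few bases per cycle and re-anneals to a randomly chosen template after each denaturation. This is a deliberately crude sketch and not the published protocol: templates are assumed to be aligned and of equal length, annealing is assumed to be equally likely on every template regardless of local complementarity, and the names and the `max_extension` parameter are hypothetical.

```python
import random

def step_recombine(templates, rng, max_extension=5):
    """Grow one strand by short extensions, choosing a template anew each cycle.

    Returns the chimeric product and the number of template switches.
    """
    length = len(templates[0])
    product = []
    origins = []                                  # template index per position
    while len(product) < length:
        t = rng.randrange(len(templates))         # anneal to some template
        n = rng.randint(1, max_extension)         # abbreviated extension step
        start = len(product)
        chunk = templates[t][start:start + n]     # copy from where we stopped
        product.extend(chunk)
        origins.extend([t] * len(chunk))
    switches = sum(a != b for a, b in zip(origins, origins[1:]))
    return "".join(product), switches

rng = random.Random(0)
# Two artificial templates that differ at every position, so that the
# parental origin of each base in the product is visible by eye.
templates = ["A" * 40, "C" * 40]
chimera, switches = step_recombine(templates, rng)
print(chimera)
print("template switches:", switches)
```

Shortening the extension per cycle (a smaller `max_extension`) gives more opportunities to switch templates, which mirrors the use of extremely short annealing/extension steps in the method.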

Table 1 In Vitro Gene Recombination Methods

Method: DNA shuffling
Pros: Robust; flexible (single-sequence shuffling or family shuffling, spiked oligonucleotides)
Cons: Fragment digestion by DNase I introduces sequence bias; relatively low crossovers per gene; relies on high sequence homology; high background of parent genes in the shuffled library
Reference: 8

Method: StEP
Pros: Single-tube reaction; relatively simple protocol; low associated point mutation rate
Cons: Relies on very high sequence homology; relatively low crossovers per gene; high background of parent genes in the shuffled library
Reference: 51

Method: RACHITT
Pros: High crossovers per gene; recombines genes with low sequence homology; 100% chimeric variants in the shuffled library
Cons: Technically demanding and laborious; requires preparation of ssDNA template and ssDNA fragments
Reference: 53

Method: Random primer recombination
Pros: mRNA or ssDNA can be used directly as templates; able to recombine short DNA sequences; needs only a small amount of parent genes as templates; no sequence bias in fragments
Cons: Relies on high sequence homology; relatively low crossovers per chimeric gene; high background of parent genes in the shuffled library
Reference: 50

Method: Family shuffling with restriction enzyme digestion
Pros: Very low background of parent genes in the shuffled library; sometimes able to recombine genes that DNA shuffling cannot recombine
Cons: Limited number of crossovers per gene
Reference: 54

Method: Family shuffling with ssDNA
Pros: Relatively low background of parent genes in the shuffled library
Cons: Requires additional steps to prepare ssDNA
Reference: 55

Method: DOGS
Pros: Recombines genes with low sequence homology; low background of parent genes in the shuffled library; shuffling of particular gene segments can be adjusted by altering the segment input ratios
Cons: Relatively low crossovers per chimeric gene
Reference: 56

Method: RM-PCR
Pros: Recombines genes with low sequence homology
Cons: Limited diversity; detailed knowledge is needed
Reference: 62

Method: ITCHY
Pros: Recombines genes with no sequence homology
Cons: Only a single crossover per chimeric gene; limited to two parent genes; selection is required
Reference: 57

Method: THIO-ITCHY
Pros: Same pros as for ITCHY; more efficient and easier than ITCHY
Cons: Same cons as for ITCHY; dNTP analogs may be problematic in the subsequent steps
Reference: 58

Method: SCRATCHY
Pros: Recombines genes with no sequence homology; multiple crossovers per chimeric gene
Cons: Sequence bias in the location and number of crossovers; selection is required
Reference: 59

Method: SHIPREC
Pros: Recombines genes with no sequence homology; sequence alignment of the parent genes is well maintained; crossovers occur mostly at structurally related sites
Cons: Same cons as for ITCHY; may introduce deletions or insertions at junctions
Reference: 60

Method: Exon shuffling
Pros: Recombines genes with no sequence homology; maintains structural integrity of domains; high percentage of folded proteins
Cons: Limited diversity; detailed knowledge is needed
Reference: 61

The first eight methods are in vitro homologous gene recombination methods, whereas the last five methods are in vitro nonhomologous gene recombination methods. Advantages and disadvantages of these methods are highlighted.

A third technically and conceptually novel in vitro homologous gene recombination method is the random chimeragenesis on transient templates (RACHITT) method (53). Unlike DNA shuﬄing and the StEP method, this method does not use thermocycling, overlap extension, or staggered extension to create chimeric genes, but rather relies on the ordering, trimming, and joining of randomly cleaved single-stranded parental gene fragments annealed onto a transient uracil-containing full-length single-stranded template prepared from one of the parental genes. Recombinogenic events occur when the fragments from diﬀerent parental genes anneal to the same template. The template is eventually removed from the single-stranded reassembled gene products by uracil–DNA–glycosylase treatment, and the single-stranded reassembled gene products are rendered double-stranded by conventional PCR. Compared to other existing gene recombination methods, this method has the advantages of higher crossover rates per chimeric gene product and 100% chimeric gene products. However, this method is technically demanding and requires the preparation of single-stranded gene fragments and template DNA. In addition to creating genetic diversity from a pool of variants generated from a single parental gene, in vitro gene recombination methods can also be used to create functionally diverse libraries of variants from naturally occurring homologous genes. The process of recombining naturally occurring homologous genes is called ‘‘family shuﬄing,’’ which might be carried out by any of the three in vitro gene recombination methods described above. The power of family shuﬄing may arise from its ability to sparsely sample a larger portion of sequence space that is functionally rich because the parental genes have been selected in nature to be functional and useful. The power of DNA shuﬄing-based family shuﬄing was ﬁrst demonstrated using four cephalosporinases with 57–82% sequence identity (9). 
After one round of family shuffling, an evolved variant conferring 270-fold to 540-fold greater resistance to moxalactam than the best parent was isolated from a library of 50,000 variants. This improvement was 50-fold greater than that achieved by single-sequence DNA shuffling of any of the four parental genes separately. Two other excellent examples of DNA shuffling-based family shuffling include the shuffling of 26 subtilisin genes with pairwise sequence identity as low as 56.4%, and the shuffling of more than 20 human α-interferons that shared sequence identity of 85–95% (10,11). Although DNA shuffling-based family shuffling is technically robust, the main limitations of this method include its low recombination efficiency (typically one to four crossovers per chimeric gene product and a large fraction of parental genes in the library) and its requirement for relatively high sequence homology among the parental genes. To increase the recombination efficiency, several modifications were made to the original DNA shuffling
protocol. For example, to increase the fraction of recombined products between two highly homologous genes encoding catechol 2,3-dioxygenases, restriction enzymes were used to prepare gene fragments (54), or single-stranded DNA templates were used in the gene fragmentation step (55). Alternatively, degenerate primers could be used to create gene fragments from a family of genes as described in the degenerate oligonucleotide gene shuﬄing (DOGS) method (56). Another method shown to increase the recombination eﬃciency for family shuﬄing is the RACHITT method described above. When two dszC genes encoding dibenzothiophene monooxygenase with 89.9% sequence identity were shuﬄed to create a library of variants, an average of 14 crossovers per gene variant was obtained with RACHITT compared to typically one to four crossovers with other gene recombination methods. Moreover, all the gene variants were recombined products. In contrast to homologous gene recombination, nonhomologous gene recombination allows the creation of chimeric genes from parental genes sharing low, or even no discernible sequence homology. A few nonhomologous recombination methods were recently developed to explore the sequence space in bigger steps. Ostermeier et al. (57) described an approach, incremental truncation for the creation of hybrid enzymes (ITCHY), for the creation of a library of variants from two homologous glycinamide ribonucleotide formyltransferase genes with 50% sequence identity. This method entails carefully controlled exonuclease digestion (incremental truncation) of the two parental genes followed by blunt-end ligation of the two truncated parental genes of variable lengths. Although this ITCHY method is eﬀective in creating functional hybrid enzymes, the exonuclease digestion step was laborious and diﬃcult to control. 
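The ITCHY construction described above (a truncated 5' fragment of one gene ligated to a truncated 3' fragment of the other) can be enumerated for toy sequences. The following is a sketch, not the authors' protocol: the function names `itchy_library` and `summarize` are hypothetical, every truncation length is assumed equally likely, and the reading-frame bookkeeping is an added illustration of why such libraries generally need a selection step.

```python
def itchy_library(gene_a, gene_b):
    """Enumerate all single-crossover fusions gene_a[:i] + gene_b[j:].

    i is the number of nucleotides kept from the 5' end of gene_a,
    j is the number of nucleotides removed from the 5' end of gene_b.
    """
    hybrids = []
    for i in range(1, len(gene_a) + 1):
        for j in range(len(gene_b)):
            hybrids.append((i, j, gene_a[:i] + gene_b[j:]))
    return hybrids


def summarize(hybrids):
    """Count all fusions, in-frame fusions, and position-aligned fusions."""
    total = len(hybrids)
    # The downstream part stays in its original reading frame only
    # when i and j are congruent modulo 3.
    in_frame = sum(1 for i, j, _ in hybrids if (i - j) % 3 == 0)
    # "Aligned" fusions cross over at the same position in both genes,
    # so the hybrid has the parental length.
    aligned = sum(1 for i, j, _ in hybrids if i == j)
    return total, in_frame, aligned


if __name__ == "__main__":
    a = "ATGGCTAAA"  # arbitrary 9-nt toy sequences
    b = "ATGCCGTTT"
    print(summarize(itchy_library(a, b)))
```

Under these assumptions only about one fusion in three keeps the downstream reading frame, and far fewer cross over at equivalent positions, which is consistent with the requirement for selection noted for ITCHY in Table 1.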
To address this limitation, a modified method, THIO-ITCHY, was developed in which nucleotide triphosphate analogs were randomly incorporated into the parental genes (58). Because nucleotide triphosphate analogs can protect the DNA from exonuclease digestion, subsequent nuclease treatment will result in the generation of the desired variations in truncation. Unfortunately, the main drawbacks of these two methods are that only two parental genes can be joined at one time and each chimeric progeny gene contains only one crossover. The first drawback is difficult to address, whereas the second drawback could be addressed by combining ITCHY and DNA shuffling (the so-called SCRATCHY method) (59). Sieber et al. (60) developed an alternative approach, sequence homology-independent protein recombination (SHIPREC), that can create libraries of single-crossover chimeric variants of unrelated or distantly related proteins. Unlike ITCHY, SHIPREC entails the truncation of the two parental genes that are linked by a cleavable sequence using DNase I followed by fragment size selection and blunt-end ligation. This method
maintains the proper sequence alignment between the parental genes and introduces crossovers mainly at structurally related sites distributed over the aligned sequences. Iterative cycles of SHIPREC should be able to generate variants with multiple crossovers. Another nonhomologous gene recombination method is in vitro exon shuﬄing (61). In this method, exons or combinations of exons that encode protein domains are ampliﬁed using mixtures of chimeric oligonucleotides. Mixtures of these PCR fragments are then combinatorially assembled into full-length genes based on numerous self-priming overlap extension reactions. Recombination occurs when an exon from one gene is connected to an exon from a diﬀerent gene. The exons to be shuﬄed are determined based on structural homology, not sequence homology. Technically speaking, this method is similar to the DOGS method described above, and the random multirecombinant PCR (RM-PCR) method (62). All of them rely on the design of multiple deﬁned primers to synthesize the gene fragments to be shuﬄed. The majority of the gene recombination methods that have been developed so far are performed in vitro. Only a few methods explore the in vivo recombination mechanisms to create libraries of chimeric genes. One of the most widely used in vivo recombination mechanisms is based on yeast homologous recombination, in which the co-transformation of a linearized plasmid and a linear partially overlapping target gene yields a functional circularized plasmid (63,64). Although the in vivo gene recombination methods based on yeast homologous recombination are technically simple and the recombination eﬃciency between the linear plasmid and the linear target genes is very high, the recombination eﬃciency among the target genes is very low. To address this problem, a method termed combinatorial libraries enhanced by recombination in yeast (CLERY) was recently developed (65). 
This method essentially combines in vitro DNA shuffling and in vivo homologous recombination in yeast, leading to the creation of a library of chimeric variants with low levels of parental genes. Another in vivo recombination mechanism is based on the DNA mismatch repair system in Escherichia coli. Volkov et al. (66) described a method, random chimeragenesis by heteroduplex recombination, for in vivo gene recombination. This method relies on the DNA mismatch repair system to repair regions of nonidentity in the heteroduplex formed among different parental genes. Because it does not require PCR amplification, this method should be useful for recombining large DNA sequences.

3 BEYOND ENGINEERING OF SINGLE PROTEINS

Directed evolution has been very successful in engineering proteins with altered substrate specificity, improved activity, selectivity, affinity, thermostability, solvent stability, and protein folding (15,67), or in engineering functional nucleic acids (5). As a natural extension, directed evolution has also been successfully used to design more complex biological systems such as metabolic pathways, viruses, and whole genomes. Metabolic pathways are multienzyme complexes that involve multistep chemical transformations within living organisms. Many pathways have been successfully engineered to produce an amazing diversity of chemical compounds such as aromatics, carbohydrates, organic acids, alcohols, and secondary metabolites. Just as in structure-based protein engineering, most strategies of metabolic engineering are rational, relying on a detailed knowledge of the pathway of interest. These rational design strategies include enhancing the desired metabolic flux by overexpressing the rate-limiting enzymes, or deleting the genes encoding competing pathways, as well as creating new pathways by introducing heterologous genes. Because of our limited knowledge of the complexity of biological systems, these strategies have met with only limited success (68). As a combinatorial protein engineering approach, directed evolution complements these strategies of metabolic engineering very well. The key advantage of directed evolution is that it can optimize not only the performance of individual enzymes but also the complex interactions among individual enzymes. Crameri et al. (12) demonstrated the productivity of this approach by improving the arsenate resistance conferred by the Staphylococcus aureus arsenate resistance operon in E. coli. After three rounds of DNA shuffling and screening, an evolved variant of the operon conferred 40-fold higher resistance to arsenate. It was found that most of the mutations were clustered in the gene encoding the arsenite efflux pump and that the original episomal plasmid was also integrated into the chromosome.
These unexpected genetic solutions highlighted the power of directed evolution and our limited ability to rationally engineer pathways. Schmidt-Dannert et al. (69) described a new strategy to create novel biosynthetic pathways by first assembling heterologous genes from different microorganisms and then optimizing the synthesized pathway by directed evolution. As a demonstration, they shuffled two phytoene desaturases (encoded by crtI) and two lycopene cyclases (encoded by crtY), respectively, in the context of a carotenoid biosynthetic pathway assembled from different bacterial species, which resulted in the production of novel carotenoids. Retroviruses have been used as gene delivery vehicles for gene therapy. Unfortunately, despite decades of effort, the currently available viral vectors still have some poor clinical properties such as low stability, processing yield, specificity, and tissue-type or cell-type tropism. Recently, family shuffling-based directed evolution has proven to be an effective approach to overcome these limitations. For example, Soong et al. (13) shuffled six envelope genes from different ecotropic murine leukemia virus strains and
obtained one chimeric envelope gene with new tropism for Chinese hamster ovary (CHO) K1 cells. The same library of chimeric envelope gene variants was subjected to a subsequent selection for improved stability and processing yield under downstream processing conditions, which resulted in a few evolved mutant viruses exhibiting 30-fold to 100-fold improvement in their viral titer compared to that of the parental viruses (70). A few industrially important microorganisms have been used for decades to produce a variety of microbial products, ranging from antibiotics to chemicals. The performance of these microorganisms is usually improved through classical strain improvement consisting of random mutation and selection, which is very laborious and time-consuming. Recently, a whole genome shuffling method has been developed to rapidly evolve a whole microbial genome toward one specific direction (14). Based on in vivo homologous recombination, this method involves multiple cycles of fusions of a mixed protoplast population. Two rounds of genome shuffling of Streptomyces fradiae within a few months resulted in a new strain with performance equivalent to that of the strain SF21 engineered through 20 rounds of classical strain improvement within 20 years. The same method of genome shuffling has also been successfully used to improve the acid tolerance of a poorly characterized industrial strain, Lactobacillus (71). These results demonstrate the striking ability of genome shuffling to rapidly manipulate the complex phenotypes of whole cells and organisms.

4 CONCLUSIONS AND FUTURE PROSPECTS

Within the past decade, directed evolution has become a stand-alone research field and is on the verge of exponential growth. As a research tool, directed evolution has proven itself to be very powerful for the rapid manipulation of proteins, nucleic acids, metabolic pathways, viruses, and whole genomes, and furthers our fundamental understanding of protein structure–function relationships, protein–protein interactions, protein–DNA interactions, and protein–small molecule interactions. The rapid progress that this field has enjoyed is, to a large extent, the result of the development of various evolutionary methods. However, our available tools for gene diversity generation are still limited compared to nature's manifold evolutionary toolbox. Future advances in directed evolution will certainly entail the development of more sophisticated evolutionary methods that better mimic nature's various evolutionary processes. For example, a family shuffling method that can create libraries of chimeric variants with randomly distributed multiple crossovers at a high frequency from a pool of distantly related genes is yet to be developed (15). Directed evolution-based protein engineering is complementary to structure-based protein design. For the foreseeable future, evolutionary
methods seem to be the most fertile approach for protein engineering. However, with the significant improvements in NMR and x-ray crystallography techniques as well as computing power in recent years, the capabilities of rational design are rapidly expanding, too. Because directed evolution is best at hill climbing in the sequence landscape (fine-tuning the protein functions) whereas rational design is best at sketching the sequence landscape (creating novel but poor protein functions), an engineering approach that combines the best of the two methods will probably represent the most powerful approach for protein engineering in the future. Similar to what directed evolution has done in protein engineering, the extension of directed evolution methods to pathway engineering and genome engineering will not only lead to the development of more complex biological processes and products, but will also unlock the secrets of the complexity of biological systems. Whole genome shuffling provides a very effective tool to manipulate complex biological systems. The combination of whole genome shuffling with other systems biology approaches such as genomics, proteomics, and bioinformatics promises to be a powerful tool for various postgenomic investigations. The potential of directed evolution is limitless and the best is yet to come.
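The description of directed evolution as hill climbing in the sequence landscape can be made concrete with a toy mutate, screen, and select loop. Everything in this sketch is an assumption made for illustration: the fitness function simply counts matches to a hidden optimum sequence, each library member carries one random substitution, and the names `fitness`, `mutate`, and `evolve` are hypothetical. Real landscapes are rugged and real screens are noisy.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fitness(seq, target):
    """Toy screen: number of positions that match a hidden optimum."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq, rng):
    """Introduce one random amino acid substitution."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]


def evolve(parent, target, rounds=10, library_size=200, rng=None):
    """Iterate library generation, screening, and selection of the best variant."""
    rng = rng or random.Random()
    history = [fitness(parent, target)]
    for _ in range(rounds):
        library = [mutate(parent, rng) for _ in range(library_size)]
        best = max(library, key=lambda s: fitness(s, target))
        # Keep the parent unless the screen finds a strictly better variant.
        if fitness(best, target) > fitness(parent, target):
            parent = best
        history.append(fitness(parent, target))
    return parent, history


if __name__ == "__main__":
    target = "MKTAYIAKQR"  # arbitrary hidden optimum
    final, history = evolve("A" * len(target), target, rng=random.Random(0))
    print(final, history)
```

Because only improved variants replace the parent, the recorded fitness can never decrease from one round to the next; the loop climbs toward the nearest peak but has no means of jumping to a distant one, which is the sense in which the text contrasts it with rational design.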

REFERENCES

1. C Darwin. On the Origin of Species by Means of Natural Selection, or the Preservation of Favored Races in the Struggle for Life. London: John Murray, 1859.
2. DR Mills, RL Peterson, S Spiegelman. An extracellular Darwinian experiment with a self-duplicating nucleic acid molecule. Proc Natl Acad Sci USA 58:217–224, 1967.
3. D Irvine, C Tuerk, L Gold. SELEXION systematic evolution of ligands by exponential enrichment with integrated optimization by non-linear analysis. J Mol Biol 222:739–761, 1991.
4. AA Beaudry, GF Joyce. Directed evolution of an RNA enzyme. Science 257:635–641, 1992.
5. RW Roberts, WW Ja. In vitro selection of nucleic acids and proteins: what are we learning? Curr Opin Struct Biol 9:521–529, 1999.
6. BL Roberts, W Markland, AC Ley, RB Kent, DW White, SK Guterman, RC Ladner. Directed evolution of a protein: selection of potent neutrophil elastase inhibitors displayed on M13 fusion phage. Proc Natl Acad Sci USA 89:2429–2433, 1992.
7. K Chen, FH Arnold. Tuning the activity of an enzyme for unusual environments: sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide. Proc Natl Acad Sci USA 90:5618–5622, 1993.
8. WP Stemmer. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370:389–391, 1994.
9. A Crameri, SA Raillard, E Bermudez, WP Stemmer. DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature 391:288–291, 1998.
10. JE Ness, M Welch, L Giver, M Bueno, JR Cherry, TV Borchert, WP Stemmer, J Minshull. DNA shuffling of subgenomic sequences of subtilisin. Nat Biotechnol 17:893–896, 1999.
11. CC Chang, TT Chen, BW Cox, GN Dawes, WP Stemmer, J Punnonen, PA Patten. Evolution of a cytokine using DNA family shuffling. Nat Biotechnol 17:793–797, 1999.
12. A Crameri, G Dawes, E Rodriguez Jr, S Silver, WP Stemmer. Molecular evolution of an arsenate detoxification pathway by DNA shuffling. Nat Biotechnol 15:436–438, 1997.
13. NW Soong, L Nomura, K Pekrun, M Reed, L Sheppard, G Dawes, WP Stemmer. Molecular breeding of viruses. Nat Genet 25:436–439, 2000.
14. YX Zhang, K Perry, VA Vinci, K Powell, WP Stemmer, SB del Cardayre. Genome shuffling leads to rapid phenotypic improvement in bacteria. Nature 415:644–646, 2002.
15. C Schmidt-Dannert. Directed evolution of single proteins, metabolic pathways, and viruses. Biochemistry 40:13125–13136, 2001.
16. FH Arnold. Design by directed evolution. Acc Chem Res 31:125–131, 1998.
17. H Zhao, K Chockalingam, Z Chen. Directed evolution of enzymes and pathways for industrial biocatalysts. Curr Opin Biotechnol 13:104–110, 2002.
18. AL Kurtzman, S Govindarajan, K Vahle, JT Jones, V Heinrichs, PA Patten. Advances in directed protein evolution by recursive genetic recombination: applications to therapeutic proteins. Curr Opin Biotechnol 12:361–370, 2001.
19. FH Arnold, ed. Evolutionary Approaches to Protein Design Advanced Protein Chemistry. Vol. 55. Academic Press, 2001.
20. DS Wilson, JW Szostak. In vitro selection of functional nucleic acids. Annu Rev Biochem 68:611–647, 1999.
21. A Jaschke, B Seelig. Evolution of DNA and RNA as catalysts for chemical reactions. Curr Opin Chem Biol 4:257–262, 2000.
22. W-H Li, D Graur. Fundamentals of Molecular Evolution. Sunderland, MA: Sinauer Associates, Inc., 1991.
23. B Robson, J Garnier. Introduction to Proteins and Protein Engineering. Amsterdam: Elsevier, 1986, p 321.
24. FE Wurgler. Recombination and gene conversion. Mutat Res 284:3–14, 1992.
25. RM Myers, LS Lerman, T Maniatis. A general method for saturation mutagenesis of cloned DNA fragments. Science 229:242–247, 1985.
26. D Botstein, D Shortle. Strategies and applications of in vitro mutagenesis. Science 229:1193–1201, 1985.
27. A Greener, M Callahan, B Jerpseth. An efficient random mutagenesis technique using an E. coli mutator strain. Mol Biotechnol 7:189–195, 1997.
28. DW Leung, E Chen, DV Goeddel. A method for random mutagenesis of a defined DNA segment using a modified polymerase chain reaction. Technique 1:11–15, 1989.
29. RC Cadwell, GF Joyce. Randomization of genes by PCR mutagenesis. PCR Methods Appl 2:28–33, 1992.
30. H Zhao, JC Moore, AA Volkov, FH Arnold. Methods for optimizing industrial enzymes by directed evolution. In: AL Demain, JE Davies, eds. Manual of Industrial Microbiology and Biotechnology. 2nd ed. Washington, DC: ASM Press, 1999, pp 597–604.
31. JA Wells, M Vasser, DB Powers. Cassette mutagenesis: an efficient method for generation of multiple mutations at defined sites. Gene 34:315–323, 1985.
32. JF Reidhaar-Olson, RT Sauer. Combinatorial cassette mutagenesis as a probe of the informational content of protein sequences. Science 241:53–57, 1988.
33. S Delagrave, ER Goldman, DC Youvan. Recursive ensemble mutagenesis. Protein Eng 6:327–331, 1993.
34. G Chen, I Dubrawsky, P Mendez, G Georgiou, BL Iverson. In vitro scanning saturation mutagenesis of all the specificity determining residues in an antibody binding site. Protein Eng 12:349–356, 1999.
35. DM Kegler-Ebo, CM Docktor, D DiMaio. Codon cassette mutagenesis: a general method to insert or replace individual codons by using universal mutagenic cassettes. Nucleic Acids Res 22:1593–1599, 1994.
36. K Miyazaki, FH Arnold. Exploring nonnatural evolutionary pathways by saturation mutagenesis: rapid improvement of protein function. J Mol Evol 49:716–720, 1999.
37. P Neuner, R Cortese, P Monaci. Codon-based mutagenesis using dimer-phosphoramidites. Nucleic Acids Res 26:1223–1227, 1998.
38. J Sondek, D Shortle. A general strategy for random insertion and substitution mutagenesis: substoichiometric coupling of trinucleotide phosphoramidites. Proc Natl Acad Sci USA 89:3581–3585, 1992.
39. T Matsuura, K Miyai, S Trakulnaleamsai, T Yomo, Y Shima, S Miki, K Yamamoto, I Urabe. Evolutionary molecular engineering by random elongation mutagenesis. Nat Biotechnol 17:58–61, 1999.
40. B Hallet, DJ Sherratt, F Hayes. Pentapeptide scanning mutagenesis: random insertion of a variable five amino acid cassette in a target protein. Nucleic Acids Res 25:1866–1867, 1997.
41. H Murakami, T Hohsaka, M Sisido. Random insertion and deletion of arbitrary number of bases for codon-based random mutation of DNAs. Nat Biotechnol 20:76–81, 2002.
42. FC Christians, LA Loeb. Novel human DNA alkyltransferases obtained by random substitution and genetic selection in bacteria. Proc Natl Acad Sci USA 93:6124–6128, 1996.
43. M Zaccolo, E Gherardi. The effect of high-frequency random mutagenesis on in vitro protein evolution: a study on TEM-1 beta-lactamase. J Mol Biol 285:775–783, 1999.
44. PS Daugherty, G Chen, BL Iverson, G Georgiou. Quantitative analysis of the effect of the mutation frequency on the affinity maturation of single chain Fv antibodies. Proc Natl Acad Sci USA 97:2029–2034, 2000.
45. JF Crow. The Importance of Recombination. The Evolution of Sex: An Examination of Current Ideas. Sunderland, MA: Sinauer Associates, Inc., 1988.
46. S Forrest. Genetic algorithms: principles of natural selection applied to computation. Science 261:872–878, 1993.
47. WP Stemmer. DNA shuffling by random fragmentation and reassembly: in vitro recombination for molecular evolution. Proc Natl Acad Sci USA 91:10747–10751, 1994.
48. H Zhao, FH Arnold. Optimization of DNA shuffling for high fidelity recombination. Nucleic Acids Res 25:1307–1308, 1997.
49. H Zhao, FH Arnold. Functional and nonfunctional mutations distinguished by random recombination of homologous genes. Proc Natl Acad Sci USA 94:7997–8000, 1997.
50. Z Shao, H Zhao, L Giver, FH Arnold. Random-priming in vitro recombination: an effective tool for directed evolution. Nucleic Acids Res 26:681–683, 1998.
51. H Zhao, L Giver, Z Shao, JA Affholter, FH Arnold. Molecular evolution by staggered extension process (StEP) in vitro recombination. Nat Biotechnol 16:258–261, 1998.
52. WS Hu, EH Bowman, KA Delviks, VK Pathak. Homologous recombination occurs in a distinct retroviral subpopulation and exhibits high negative interference. J Virol 71:6028–6036, 1997.
53. WM Coco, WE Levinson, MJ Crist, HJ Hektor, A Darzins, PT Pienkos, CH Squires, DJ Monticello. DNA shuffling method for generating highly recombined genes and evolved enzymes. Nat Biotechnol 19:354–359, 2001.
54. M Kikuchi, K Ohnishi, S Harayama. Novel family shuffling methods for the in vitro evolution of enzymes. Gene 236:159–167, 1999.
55. M Kikuchi, K Ohnishi, S Harayama. An effective family shuffling method using single-stranded DNA. Gene 243:133–137, 2000.
56. MD Gibbs, KM Nevalainen, PL Bergquist. Degenerate oligonucleotide gene shuffling (DOGS): a method for enhancing the frequency of recombination with family shuffling. Gene 271:13–20, 2001.
57. M Ostermeier, JH Shim, SJ Benkovic. A combinatorial approach to hybrid enzymes independent of DNA homology. Nat Biotechnol 17:1205–1209, 1999.
58. S Lutz, M Ostermeier, SJ Benkovic. Rapid generation of incremental truncation libraries for protein engineering using alpha-phosphothioate nucleotides. Nucleic Acids Res 29:E16, 2001.
59. S Lutz, M Ostermeier, GL Moore, CD Maranas, SJ Benkovic. Creating multiple-crossover DNA libraries independent of sequence identity. Proc Natl Acad Sci USA 98:11248–11253, 2001.
60. V Sieber, CA Martinez, FH Arnold. Libraries of hybrid proteins from distantly related sequences. Nat Biotechnol 19:456–460, 2001.
61. JA Kolkman, WP Stemmer. Directed evolution of proteins by exon shuffling. Nat Biotechnol 19:423–428, 2001.
62. T Tsuji, M Onimaru, H Yanagawa. Random multi-recombinant PCR for the construction of combinatorial protein libraries. Nucleic Acids Res 29:e97, 2001.
63. D Pompon, A Nicolas. Protein engineering by cDNA recombination in yeasts: shuffling of mammalian cytochrome P-450 functions. Gene 83:15–24, 1989.
64. JR Cherry, MH Lamsa, P Schneider, J Vind, A Svendsen, A Jones, AH Pedersen. Directed evolution of a fungal peroxidase. Nat Biotechnol 17:379–384, 1999.
65. V Abecassis, D Pompon, G Truan. High efficiency family shuffling based on multi-step PCR and in vivo DNA recombination in yeast: statistical and functional analysis of a combinatorial library between human cytochrome P450 1A1 and 1A2. Nucleic Acids Res 28:e88, 2000.
66. AA Volkov, Z Shao, FH Arnold. Recombination and chimeragenesis by in vitro heteroduplex formation and in vivo repair. Nucleic Acids Res 27:e18, 1999.
67. O Kuchner, FH Arnold. Directed evolution of enzyme catalysts. Trends Biotechnol 15:523–530, 1997.
68. JE Ness, SB Del Cardayre, J Minshull, WP Stemmer. Molecular breeding: the natural approach to protein design. Adv Protein Chem 55:261–292, 2000.
69. C Schmidt-Dannert, D Umeno, FH Arnold. Molecular breeding of carotenoid biosynthetic pathways. Nat Biotechnol 18:750–753, 2000.
70. SK Powell, MA Kaloss, A Pinkstaff, R McKee, I Burimski, M Pensiero, E Otto, WP Stemmer, NW Soong. Breeding of retroviruses by DNA shuffling for improved stability and processing yields. Nat Biotechnol 18:1279–1282, 2000.
71. R Patnaik, S Louie, V Gavrilovic, K Perry, WP Stemmer, CM Ryan, S Del Cardayre. Genome shuffling of Lactobacillus for improved acid tolerance. Nat Biotechnol 20:707–712, 2002.

16 Directed Evolution by Random Mutagenesis: A Critical Evaluation

Thorsten Eggert and Karl-Erich Jaeger
Heinrich-Heine-Universität Düsseldorf, Jülich, Germany

Manfred T. Reetz
Max-Planck-Institut für Kohlenforschung, Mülheim an der Ruhr, Germany

1 INTRODUCTION

Biocatalysis using whole cells, crude cell extracts, or purified enzymes has become an important tool in the production of numerous chemical compounds including food additives, agrochemicals, cosmetics, flavors, and, particularly, pharmaceuticals. The steadily increasing demand for these compounds results in a pressing need to identify and isolate novel biocatalysts. Unfortunately, natural biocatalysts often do not meet the requirements of the chemical industries because they have been optimized by natural evolution to catalyze specific reactions inside living cells. Therefore, molecular biologists have developed a variety of different methods that allow the engineering of enzymes for specific needs. The classical, but time-consuming and cost-intensive, approach includes structure-based predictions to design site-directed mutagenesis experiments and subsequent biochemical characterization of the
resulting enzyme variants. Recently, directed evolution has been introduced as a new and powerful method to optimize the properties of a given biocatalyst without requiring knowledge of its structure, or the catalytic mechanism (1– 6). Biocatalyst properties that have successfully been optimized by directed evolution include substrate speciﬁcity (7), thermal stability (8), and organic solvent resistance (9), but also more sophisticated traits such as cofactor dependence (10) and enantioselectivity (3,11–15). Here, we will summarize the results of our directed evolution experiments carried out to evolve enantioselective lipases. Special emphasis will be given to a critical evaluation of random mutagenesis by error-prone polymerase chain reaction (ep-PCR), a method that is frequently used to generate large libraries of enzyme variants. Alternative strategies suitable to circumvent the problems encountered by ep-PCR will also be presented.
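One commonly cited limitation of ep-PCR at a low error rate is that a single nucleotide change within a codon can reach only a subset of the 19 alternative amino acids. The sketch below checks this against the standard genetic code. It is offered as an illustration and assumes that this codon-level restriction is among the ep-PCR limitations the authors go on to evaluate; the helper name `accessible` is hypothetical, and mutational bias of the polymerase is ignored.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in the conventional TCAG ordering ("*" marks stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def accessible(codon):
    """Amino acids reachable from `codon` by exactly one nucleotide substitution.

    Synonymous changes and stop codons are excluded.
    """
    original = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            aa = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
            if aa != "*" and aa != original:
                reachable.add(aa)
    return reachable


if __name__ == "__main__":
    sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
    counts = [len(accessible(c)) for c in sense_codons]
    print("mean accessible amino acids per codon:", sum(counts) / len(counts))
    print("ATG (Met) can reach:", sorted(accessible("ATG")))
```

Since each codon has only nine single-nucleotide neighbors, no position can sample all 19 alternatives in one step, which is one reason to follow up ep-PCR hits with saturation mutagenesis at the identified positions.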

2 DIRECTED EVOLUTION OF BACTERIAL LIPASES

Lipases represent the most important class of enzymes for organic chemistry because they catalyze a wide variety of different hydrolysis and synthesis reactions and also work in organic solvents (16–19). Usually, lipases exhibit a high enantioselectivity; however, they are restricted to a few substrates. Therefore, the creation of new lipases with high enantioselectivity toward a predefined substrate is a major challenge that we have tackled by using two different bacterial lipases.

2.1 Pseudomonas aeruginosa Lipase

Bacterial lipases belonging to the genus Pseudomonas are known to be useful for several biotechnological applications (20). The lipase from P. aeruginosa catalyzes with high enantioselectivity the hydrolysis and synthesis of several diﬀerent esters using alcohols, amines, and carboxylic acids as substrates (21). However, the hydrolytic kinetic resolution of the chiral ester p-nitrophenyl-2methyldecanoate (Fig. 1A) catalyzed by the wild-type lipase yields an ee of only 5% in favor of the (S)-acid at about 50% conversion (E = 1.1), making this substrate an interesting model compound to evolve an enantioselective P. aeruginosa lipase. The initial strategy was based on random mutagenesis of the lipase gene lipA using ep-PCR with a low mutagenesis frequency (resulting in one amino acid exchange per variant) and screening in microtiter plates by spectrophotometric detection at 410 nm of p-nitrophenol released from the (S)-substrates and (R)-substrates, respectively. In the ﬁrst generation created by ep-PCR, about 12 improved mutants were identiﬁed, the best one resulting in an ee value of 31% (E = 2.1) in the test reaction (Fig. 1A). The process was repeated in a second, third, and fourth round of mutagenesis in the same way,

Directed Evolution by Random Mutagenesis


Figure 1 Model reactions used for directed evolution of enantioselective lipases. (A) Hydrolytic kinetic resolution of p-nitrophenyl-2-methyldecanoate catalyzed by P. aeruginosa lipase. (B) Hydrolysis of pseudo-meso-1,4-diacetoxy-cyclopentene catalyzed by B. subtilis lipase.

always using low error rate ep-PCR. A variant catalyzing the reaction with an ee-value of 81% (E = 11.3) was identified in the fourth generation, which carried four amino acid exchanges (13). A basic problem with this type of directed evolution approach relates to the fact that upon passing from one mutant generation to the next, many different "pathways" in protein sequence space are possible. Moreover, it was reasonable to assume that the observed amino acid exchanges imply the correct position but, owing to limitations of ep-PCR (see below and Fig. 3), not necessarily the optimal amino acid. Thus, we applied saturation and site-directed mutagenesis at these "hot spots" of the enzyme. In doing so, it became obvious that the combination of ep-PCR and saturation mutagenesis constitutes an efficient way to explore protein sequence space with respect to enantioselectivity. Indeed, this strategy led to the creation of several highly (S)-selective mutant lipases (ee = 88–91%; E = 20–25) (11). The starting point for further exploration of protein sequence space was based on the assumption that recombining the best variants in combination with high-diversity mutagenesis at previously identified hot spots will lead to further improvements. Therefore, a special type of DNA shuffling, called combinatorial multiple cassette mutagenesis (CMCM) (22), was used in an appropriately modified form to recombine two previously generated mutants in combination with an oligocassette in which two codons were saturation-mutagenized, representing amino acid positions 155 and 162, which had previously been identified as


Eggert et al.

being important for enantioselectivity of this lipase. Following expression and screening, several enantioselective lipases were found, among them a variant containing six amino acid exchanges (D20N, S53P, S155M, L162G, T180I, and T234S), which shows an unprecedented enantioselectivity of E > 51 (ee>95%) and appears to be particularly active and stable (23). Furthermore, we successfully tried to reverse the sense of enantioselectivity. In order to evolve (R)-selective lipase variants for the model reaction illustrated in Fig. 1A, we used an appropriate combination of ep-PCR at high error rate (resulting in an average of three amino acid exchanges per variant) and DNA shuﬄing, leading to the identiﬁcation of a highly (R)-selective lipase (E = 30). This variant carries 11 amino acid substitutions distributed all over the sequence (M16L, A34T, P86L, T87S, V94A, D113G, S147N, T150A, L208H, V232I, and S237T) (24). The mutagenesis experiments summarized in Fig. 2A not only represent the ﬁrst example of directed evolution of an enantioselective enzyme, they also constitute the most comprehensive study

Figure 2 Directed evolution of enantioselective bacterial lipases. Various directed evolution methods (indicated by diﬀerent shaped bars) were used to evolve highly enantioselective variants of lipases originating from (A) P. aeruginosa and (B) B. subtilis.


concerning the exploration of protein sequence space with respect to enantioselectivity of a given enzyme-catalyzed reaction (3,11,13,23–25).

2.2 Bacillus subtilis Lipase

The gram-positive bacterium B. subtilis is frequently used by the biotechnology industry as a host strain to express several industrially important enzymes (26,27). In a different and ongoing study, we have chosen the B. subtilis lipase LipA (BSLA), which is composed of 181 amino acids, as the catalyst in the asymmetric hydrolysis of a meso-compound, namely meso-1,4-diacetoxy-2-cyclopentene, leading to the formation of enantiomeric alcohols (3). This reaction does not constitute a kinetic resolution and can thus be carried out to 100% conversion. Screening is done by electrospray ionization mass spectrometry (ESI-MS) (28) using the deuterium-labeled pseudo-meso substrate (Fig. 1B). The wild-type enzyme leads to an ee-value of only 38% in favor of the (1R,4S) enantiomer. Following an initial round of ep-PCR-based random mutagenesis, a mutant showing an ee-value of 58% was identified. This was increased to about 70% ee in the second round (Fig. 2B). Interestingly, no variants with further improved enantioselectivity were found in a third-generation library generated by ep-PCR. Therefore, we have recently applied a new strategy to improve the enantioselectivity of this lipase by using complete saturation mutagenesis at every amino acid position in the sequence (SA Funke, A Eipper, T Eggert, K-E Jaeger, MT Reetz, submitted for publication).

3 RANDOM IS NOT ALWAYS RANDOM: THE BIAS OF EP-PCR MUTAGENESIS

Error-prone polymerase chain reaction is a standard method to randomly introduce point mutations into a target DNA sequence during a PCR reaction performed under conditions suboptimal for the fidelity of Taq polymerase, mostly achieved by the addition of MnCl2 at concentrations of 0.1–0.5 mM or by imbalanced dNTP concentrations in the reaction buffer (29–31). The Taq polymerase, which originates from the thermophilic bacterium Thermus aquaticus, usually incorporates wrong nucleotides at a low frequency of $0.1$–$2 \times 10^{-4}$ (32,33). This error rate can be increased continuously up to $1$–$20 \times 10^{-3}$ in an ep-PCR experiment. The diversity of an enzyme library generated by ep-PCR is usually calculated by correlating the basepair substitutions introduced per gene to the amino acid exchanges introduced per enzyme molecule (i.e., an average of one to two basepair substitutions usually results in one amino acid exchange). Afterward, the overall size of a variant library can be calculated by a combinatorial algorithm (34), as shown in Table 1. This algorithm is based on the


assumption that all 19 remaining amino acids can be introduced at a single position (E = 19). Unfortunately, this is not true for the case of ep-PCR because the event of two or even three basepair exchanges per codon is highly unlikely. At best, one nucleotide of a given codon will be exchanged, thereby leading to just nine (instead of 64 possible) different codons encoding four to seven (instead of 20) different amino acids. In reality, the number of amino acid exchanges to be achieved depends on the type of the original codon, as illustrated in Fig. 3A. Silent mutations (i.e., those that do not result in an amino acid exchange) are more likely for some types of codons (e.g., CGA coding for arginine) than for other types (e.g., AAC coding for asparagine). As an example, we have calculated the real number of enzyme variants to be obtained by ep-PCR with just one mutation introduced per gene by analyzing every single codon of BSLA. The results are shown in Table 2 and indicate that only one third of the theoretical number of variants is experimentally accessible when using low error rate ep-PCR. Comparable results were obtained upon analysis of DNA sequences encoding B. subtilis lipase LipB, P. aeruginosa lipase, and Burkholderia glumae lipase (Table 3). The mutational bias of DNA polymerases poses yet another restriction to ep-PCR, leading to a further lowered diversity of the mutant libraries. In most of the published studies that used Taq polymerase in MnCl2-containing buffer, the enzyme preferentially introduced A→T, T→A transversions and

Table 1 Theoretical Number of Enzyme Variants in a Library Obtained for an Enzyme Consisting of 181 Amino Acids (e.g., Lipase A from B. subtilis) with One to Five Amino Acid Exchanges Per Molecule

Number of amino acid exchanges (M)    Number of variants(a) (N)
1                                     3,439
2                                     5,880,690
3                                     6,666,742,230
4                                     5,636,730,555,465
5                                     3,791,264,971,605,760

(a) Values calculated with E = 19 using the algorithm:

$$N = \frac{E^{M}\, X!}{(X - M)!\, M!}$$

where N = number of variants at maximum size of diversity; E = number of amino acids exchanged per position; M = total number of amino acid exchanges per enzyme molecule; and X = number of amino acids per enzyme molecule.
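The formula in Table 1 can be evaluated directly. The following is a minimal sketch, not the authors' original calculation; the function name `library_size` is an assumption introduced here for illustration. It uses the identity X!/((X − M)! M!) = C(X, M), so that N = E^M · C(X, M).

```python
from math import comb


def library_size(x: int, m: int, e: int = 19) -> int:
    """Theoretical number of variants N = E^M * X! / ((X - M)! * M!).

    x: number of amino acids per enzyme molecule (X)
    m: number of amino acid exchanges per molecule (M)
    e: number of alternative amino acids per position (E, 19 by default)
    """
    return e**m * comb(x, m)


if __name__ == "__main__":
    # BSLA has 181 amino acids (value taken from the text).
    for m in range(1, 5):
        print(f"M = {m}: N = {library_size(181, m):,}")
```

Because the binomial coefficient grows steeply with M, the theoretical library size increases by roughly three orders of magnitude for each additional exchange in a protein of this length.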

Figure 3 Mutational bias of ep-PCR. The substitution of one nucleotide per codon results in nine new triplets which may encode four to seven different amino acids depending on the type of codon. (A) The example shows that the mutation of the codon AAC coding for asparagine can yield a maximum of seven different amino acids, whereas the mutation of the codon CGA coding for arginine can yield a maximum of four different amino acids. (B) Low frequencies of transversions G→T, C→A, G→C, and C→G result in a further decrease of diversity: for codon AAC, six different amino acid exchanges may occur, and for the GC-rich codon CGA just a single new amino acid exchange is expected. Background color coding: white shows codons that encode new amino acids, gray indicates silent mutations or the formation of stop codons, and black shows codons that would require the formation of an unfavored basepair exchange (G→T, C→A, G→C, or C→G). The bold letters indicate nucleotides exchanged by ep-PCR.
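The counting illustrated in Fig. 3 can be reproduced with a short script. This is a sketch under stated assumptions rather than the procedure used by the authors: it uses the standard genetic code, considers only substitutions on the coding strand, and models the bias of Fig. 3B by excluding the four unfavored exchanges completely, although in reality they are merely rare. The names `CODON_TABLE`, `UNFAVORED`, and `accessible_amino_acids` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Base exchanges reported as rare for Taq polymerase in MnCl2 buffer (Fig. 3B).
# Assumption: they are treated here as not occurring at all.
UNFAVORED = frozenset({("G", "T"), ("C", "A"), ("G", "C"), ("C", "G")})


def accessible_amino_acids(codon: str, exclude=frozenset()) -> set:
    """Return the new amino acids reachable by exchanging one nucleotide."""
    codon = codon.upper().replace("U", "T")
    original = CODON_TABLE[codon]
    reachable = set()
    for pos, old in enumerate(codon):
        for new in BASES:
            if new == old or (old, new) in exclude:
                continue
            aa = CODON_TABLE[codon[:pos] + new + codon[pos + 1:]]
            # Silent mutations and stop codons do not yield a new variant.
            if aa != original and aa != "*":
                reachable.add(aa)
    return reachable


if __name__ == "__main__":
    for codon in ("AAC", "CGA"):
        unbiased = accessible_amino_acids(codon)
        biased = accessible_amino_acids(codon, exclude=UNFAVORED)
        print(codon, len(unbiased), len(biased))
```

For the two codons discussed in the caption, the script gives seven and four accessible amino acids without bias, and six and one when the unfavored exchanges are excluded.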


Table 2 Codon Usage (Left Panel), and Theoretical and Actual Numbers of Enzyme Variants to Be Obtained Upon ep-PCR Mutagenesis of B. subtilis Lipase LipA (Right Panel)

Codon usage                                   B. subtilis lipase LipA
Number  Codon  Amino  Amino acid      Number of  Maximum number   Real number
               acid   exchanges(a)    codons     of variants(b)   of variants(c)
 1      gca    Ala    6                3          57              18
 2      gcc    Ala    6                1          19               6
 3      gcg    Ala    6                5          95              30
 4      gcu    Ala    6                3          57              18
 5      aga    Arg    5                2          38              10
 6      agg    Arg    6                0           0               0
 7      cga    Arg    4                1          19               4
 8      cgc    Arg    6                0           0               0
 9      cgg    Arg    5                1          19               5
10      cgu    Arg    6                1          19               6
11      aac    Asn    7                7         133              49
12      aau    Asn    7               10         190              70
13      gac    Asp    7                3          57              21
14      gau    Asp    7                6         114              42
15      ugc    Cys    6                0           0               0
16      ugu    Cys    6                0           0               0
17      caa    Gln    6                4          76              24
18      cag    Gln    6                2          38              12
19      gaa    Glu    6                3          57              18
20      gag    Glu    6                0           0               0
21      gga    Gly    4                6         114              24
22      ggc    Gly    6               11         209              66
23      ggg    Gly    5                4          76              20
24      ggu    Gly    6                3          57              18
25      cac    His    7                4          76              28
26      cau    His    7                1          19               7
27      aua    Ile    6                1          19               6
28      auc    Ile    7                2          38              14
29      auu    Ile    7                7         133              49
30      cua    Leu    5                0           0               0
31      cuc    Leu    6                1          19               6
32      cug    Leu    5                5          95              25
33      cuu    Leu    6                4          76              24
34      uua    Leu    4                5          95              20
35      uug    Leu    5                1          19               5
36      aaa    Lys    6                5          95              30
37      aag    Lys    6                6         114              36
38      aug    Met    6                4          76              24
39      uuc    Phe    6                1          19               6
40      uuu    Phe    6                3          57              18
41      cca    Pro    6                2          38              12
42      ccc    Pro    6                0           0               0
43      ccg    Pro    6                2          38              12
44      ccu    Pro    6                0           0               0
45      agc    Ser    6                6         114              36
46      agu    Ser    6                1          19               6
47      uca    Ser    4                3          57              12
48      ucc    Ser    6                1          19               6
49      ucg    Ser    5                1          19               5
50      ucu    Ser    6                1          19               6
51      aca    Thr    6                6         114              36
52      acc    Thr    5                0           0               0
53      acg    Thr    6                4          76              24
54      acu    Thr    5                0           0               0
55      ugg    Trp    6                2          38              12
56      uac    Tyr    6                6         114              36
57      uau    Tyr    6                3          57              18
58      gua    Val    5                2          38              10
59      guc    Val    6                5          95              30
60      gug    Val    5                3          57              15
61      guu    Val    6                7         133              42
62      uaa    Stop
63      uag    Stop
64      uga    Stop

Total number of variants:                       3439 (100%)     1077 (31.3%)

(a) Maximum number of amino acid exchanges calculated for each of the naturally occurring 64 codons.
(b) Maximum theoretical number of enzyme variants if each amino acid is replaced by the 19 remaining ones.
(c) Actual number of amino acid exchanges to be obtained for each amino acid (see Fig. 3A).

A→G, T→C transitions. A→C and T→G transversions as well as G→A and C→T transitions were also introduced, but at much lower frequencies. The frequencies of G→T and C→A transversions were very low, and G→C and C→G transversions hardly ever occurred (11,35,36). If this mutational bias is also taken into account (Fig. 3B), the calculated library sizes represent only about 20% of the theoretical sizes, with the actual sizes also depending on the GC content of the target gene (see Table 3).


Table 3 Theoretical and Actual Numbers of Enzyme Variants to Be Obtained Upon ep-PCR Mutagenesis of Different Bacterial Lipases

Lipases                                  Number of different enzyme variants
Name(a)          GC content    Theoretical(b)   aa exchanges,                aa exchanges,
(number of aa)   of gene (%)                    experimental-unbiased(c)     experimental-biased(d)
BSLA (181 aa)    46            3439 (100%)      1077 (31.3%)                  757 (22.0%)
BSLB (182 aa)    41            3458 (100%)      1083 (31.3%)                  790 (22.8%)
PAL (285 aa)     65            5415 (100%)      1696 (31.3%)                 1068 (19.7%)
BGL (319 aa)     69            6061 (100%)      1883 (31.1%)                 1165 (19.2%)

(a) BSLA = B. subtilis lipase A; BSLB = B. subtilis lipase B; PAL = P. aeruginosa lipase; and BGL = Bu. glumae lipase.
(b) The theoretical number of different enzyme variants was calculated using the formula shown in Table 1 assuming that one amino acid exchange occurs per enzyme molecule.
(c) The actual number of amino acid exchanges was calculated using the data for codon usage shown in Table 2 (real number of variants).
(d) The actual number of amino acid exchanges was corrected for the bias in occurrence of base exchanges as shown in Fig. 3B.
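The percentages in Table 3 follow from simple arithmetic: the theoretical number is 19 exchanges per position (M = 1 in the formula of Table 1), and each accessible count is expressed relative to it. The sketch below recomputes them from the counts in the table; the dictionary layout and the function name `accessible_fraction` are assumptions made for illustration.

```python
# Counts copied from Table 3:
# name -> (number of amino acids, unbiased variants, biased variants)
LIPASES = {
    "BSLA": (181, 1077, 757),
    "BSLB": (182, 1083, 790),
    "PAL": (285, 1696, 1068),
    "BGL": (319, 1883, 1165),
}


def accessible_fraction(n_aa: int, n_variants: int, e: int = 19) -> float:
    """Percentage of the theoretical single-exchange library (E * X) that is accessible."""
    return 100.0 * n_variants / (e * n_aa)


if __name__ == "__main__":
    for name, (n_aa, unbiased, biased) in LIPASES.items():
        print(
            f"{name}: theoretical = {19 * n_aa}, "
            f"unbiased = {accessible_fraction(n_aa, unbiased):.1f}%, "
            f"biased = {accessible_fraction(n_aa, biased):.1f}%"
        )
```

The recomputed values show the trend stated in the text: the unbiased fraction is nearly constant at about 31%, whereas the biased fraction is lower for the GC-rich genes (PAL and BGL).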

Nevertheless, ep-PCR may remain a method of choice for directed evolution experiments; however, screening results should be interpreted more carefully: (a) better variants may be obtained more easily by identifying the optimal amino acid exchange at a predefined position using site-specific saturation mutagenesis, and (b) mutants showing a dramatic negative effect (e.g., instability, inactivation, or unwanted enantioselectivity) may nevertheless indicate a hot spot position at which the desired property can be improved by applying saturation mutagenesis.

4 EFFICIENT METHODS FOR THE CREATION OF A FIRST-GENERATION LIBRARY

Directed evolution experiments comprise iterative cycles of mutation and identiﬁcation of improved variants by screening or selection. If natural diversity (e.g., a family of related genes) is not available, members of the ﬁrst-generation library must provide the starting material for further directed evolution experiments. Therefore, the diversity of the ﬁrst-generation library is very important because some members of this library will parent all following variants. Diﬀerent directed evolution methods may be considered to generate this library. Larger deletions or insertions can be randomly introduced into a gene as shown recently for a haloalkane dehalogenase (37). In principle, it is also conceivable to randomize only a speciﬁc group of amino acids as, for example, the hydrophobic, hydrophilic, acidic, or basic residues. Such an approach has successfully been applied to assess the


importance of secondary structural elements for a given biocatalyst (for review, see Ref. 38). However, with this method, the overall diversity of the resulting library, as well as the number of enzymatically active biocatalyst proteins, is low. At present, most researchers construct a first-generation library using one of several different nonrecombinative or recombinative methods to introduce point mutations, thereby exchanging single amino acids, with ep-PCR being the most popular method. The drawbacks of these approaches, as outlined above, have prompted efforts to develop alternative methods for creating a first-generation library, two of which are discussed here in more detail.

4.1 Alanine-Scanning Mutagenesis

This experimental strategy was originally developed to study protein–protein interactions. Single alanine substitutions were introduced at those positions of a protein's amino acid sequence that were suspected to be involved in the recognition of potential binding partners. Because substitution by alanine removes side chains potentially involved in the recognition process, functional residues can be identified by drastic effects such as inactivation of the protein caused by the disruption of essential interactions with binding partners. Using this method, specific side chains in human growth hormone (hGH) that interact with the hGH receptor from human liver cells were identified (39), and several other functional epitopes were mapped as well (40,41). More recently, novel methods for combinatorial alanine-scanning mutagenesis, including a shotgun-scanning method, were published; they significantly accelerate the mapping of functional binding epitopes (42–45). These methods allow the identification not only of residues important for protein interaction and folding, but also of those involved in enzymatic activity and substrate selectivity (44,46,47). However, alanine scanning of an enzyme will not reveal the best possible amino acid at a given position. Instead, several important residues may be identified, which, in a second generation, can be further mutagenized either by site-specific saturation mutagenesis or by cassette mutagenesis with low or high error rate to generate high-diversity libraries. In both cases, the number of samples to be screened as well as the project duration are reduced compared with the conventional ep-PCR approach.
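The bookkeeping behind an alanine scan can be illustrated with a few lines of code. This is only a sketch: the function name `alanine_scan` and the five-residue toy sequence are invented for illustration and do not correspond to any protein discussed in this chapter. Variants are labeled in the same notation as used above for the lipase mutants (e.g., D20N), and positions that already carry alanine are skipped.

```python
def alanine_scan(sequence: str, positions=None) -> list:
    """List single alanine substitutions as (label, variant sequence) pairs.

    sequence:  protein sequence in one-letter code
    positions: optional collection of 1-based positions to scan;
               all positions are scanned if omitted
    """
    variants = []
    for number, residue in enumerate(sequence, start=1):
        if positions is not None and number not in positions:
            continue
        if residue == "A":
            continue  # already alanine, nothing to exchange
        label = f"{residue}{number}A"
        variants.append((label, sequence[: number - 1] + "A" + sequence[number:]))
    return variants


if __name__ == "__main__":
    toy_sequence = "MKDAL"  # hypothetical sequence, for illustration only
    for label, variant in alanine_scan(toy_sequence):
        print(label, variant)
```

An alanine scan therefore requires at most one variant per position, which is why it reduces the screening effort compared with a random library, at the price of testing only a single replacement per site.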

4.2 Complete Saturation Mutagenesis at Each Amino Acid Position of the Target Enzyme

Site-speciﬁc saturation mutagenesis is a method to generate all possible variants of a protein at each amino acid position as calculated by the algorithm shown in Table 1. This method introduces all possible base triplets at a given codon position, thereby resulting in the formation of all 20 amino


acids at this position of the protein. Complete mutagenesis at all positions finally yields a library comprising all single-site sequence variants of the corresponding protein. At present, two examples exist of complete saturation mutagenesis libraries being used as a starting point for directed evolution experiments. In one case, a haloalkane dehalogenase from Rhodococcus rhodochrous was mutagenized, resulting in a significant increase in the thermostability of this enzyme. A total of eight single-site variants with improved thermostability were identified in the library, and the combination of these eight mutations in a single second-generation variant further improved the half-life of the enzyme by a factor of 30,000 (48). The other case concerns the creation of an enantioselective lipase from B. subtilis (SA Funke, A Eipper, T Eggert, K-E Jaeger, MT Reetz, submitted for publication). Here, directed evolution by ep-PCR revealed several improved variants; however, further rounds of ep-PCR did not result in further improvement of enantioselectivity. A library consisting of about 70,000 clones (384 clones at each of 181 amino acid positions) of BSLA was screened for variants showing increased enantioselectivity in the asymmetric hydrolysis of the model substrate pseudo-meso-1,4-diacetoxy-cyclopentene with formation of the (1S,4R)-enantiomer and (1R,4S)-enantiomer (Fig. 1B). The wild-type enantioselectivity of about 38% ee in favor of the (1S,4R)-enantiomer was improved to an ee-value of about 65%. Additionally, a variant with reversed enantioselectivity showing an ee-value of 56% in favor of the (1R,4S)-enantiomer was identified. In addition, site-specific recombination of all better-performing variants identified so far has been performed, and the resulting library is currently being analyzed by ESI-MS screening (28).
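The library size quoted above, and the reason for screening several hundred clones per position, can be made explicit with a short calculation. The sketch below rests on simplifying assumptions that are not stated in the text: all 64 base triplets are taken to be equally likely at the randomized codon, and clones are treated as independent draws. The function names are hypothetical.

```python
def library_clones(n_positions: int, clones_per_position: int) -> int:
    """Total number of clones screened in a complete saturation library."""
    return n_positions * clones_per_position


def prob_triplet_sampled(n_clones: int, n_triplets: int = 64) -> float:
    """Probability that one particular triplet occurs at least once among n_clones.

    Assumes equally likely triplets and independent clones.
    """
    return 1.0 - (1.0 - 1.0 / n_triplets) ** n_clones


if __name__ == "__main__":
    total = library_clones(181, 384)
    print(f"Clones screened: {total:,}")
    print(f"P(a given triplet is sampled in 384 clones) = {prob_triplet_sampled(384):.4f}")
```

Under these assumptions, 384 clones correspond to a sixfold oversampling of the 64 triplets, so a given triplet is missed only rarely; an amino acid encoded by several triplets is even less likely to be absent.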

5 CONCLUSION

Directed evolution of enzymes offers a unique opportunity to increase our knowledge of the way enzymes function. However, a generally applicable directed evolution protocol does not yet exist. As outlined above, the introduction of random mutations into a given gene by ep-PCR may result in a first-generation library of much lower diversity than expected. Complete site-specific saturation mutagenesis may be used to generate a more comprehensive library, thereby allowing the identification of amino acid positions that affect the enzyme property to be optimized. Subsequently, recombinative methods such as DNA shuffling (49), StEP (50), ITCHY (51), SHIPREC (52), RACHITT (53), and ADO (54) can be used. In the near future, the application of genetic algorithms will help in exploring protein sequence space. Additionally, data management and analysis will become more important. Furthermore, the determination of enzyme 3D structures, in combination


with molecular modeling and molecular dynamics calculations, will help to uncover the structural basis of biocatalyst optimization. Undoubtedly, directed evolution will serve to create novel biocatalysts that are highly enantioselective and active, as well as stable enough to allow for a variety of biotechnological applications.

ACKNOWLEDGMENTS

This work was supported by a grant from the European Commission (project no. QLK3-CT-2001-00519). The authors thank Prof. Maarten Egmond (Department of Membrane Enzymology, University of Utrecht, the Netherlands) for critical reading of the manuscript and valuable advice.

REFERENCES

1. S Panke, MG Wubbolts. Enzyme technology and bioprocess engineering. Curr Opin Biotechnol 13:111–116, 2002.
2. H Zhao, K Chockalingam, Z Chen. Directed evolution of enzymes and pathways for industrial biocatalysis. Curr Opin Biotechnol 13:104–110, 2002.
3. K-E Jaeger, T Eggert, A Eipper, MT Reetz. Directed evolution and the creation of enantioselective biocatalysts. Appl Microbiol Biotechnol 55:519–530, 2001.
4. KA Powell, SW Ramer, SB del Cardayré, WPC Stemmer, MB Tobin, PF Longchamp, GW Huisman. Directed evolution and biocatalysis. Angew Chem Int Ed 40:1000–1026, 2001.
5. S Brakmann. Discovery of superior enzymes by directed molecular evolution. ChemBioChem 3:865–871, 2001.
6. ET Farinas, T Bulter, FH Arnold. Directed enzyme evolution. Curr Opin Biotechnol 12:545–551, 2001.
7. MM Altamirano, JM Blackburn, C Aguayo, AR Fersht. Directed evolution of new catalytic activity using the α/β-barrel scaffold. Nature 403:617–622, 2000.
8. H Zhao, FH Arnold. Directed evolution converts subtilisin E into a functional equivalent of thermitase. Protein Eng 12:47–53, 1999.
9. JC Moore, FH Arnold. Directed evolution of a para-nitrobenzyl esterase for aqueous–organic solvents. Nat Biotechnol 14:458–467, 1996.
10. H Joo, Z Lin, FH Arnold. Laboratory evolution of peroxide-mediated cytochrome P450 hydroxylation. Nature 399:670–673, 1999.
11. K Liebeton, A Zonta, K Schimossek, M Nardini, D Lang, BW Dijkstra, MT Reetz, K-E Jaeger. Directed evolution of an enantioselective lipase. Chem Biol 7:709–718, 2000.
12. K-E Jaeger, MT Reetz. Directed evolution of enantioselective enzymes for organic chemistry. Curr Opin Chem Biol 4:68–73, 2000.
13. MT Reetz, A Zonta, K Schimossek, K Liebeton, K-E Jaeger. Creation of enantioselective biocatalysts for organic chemistry by in vitro evolution. Angew Chem Int Ed 36:2830–2832, 1997.
14. O May, PT Nguyen, FH Arnold. Inverting enantioselectivity by directed evolution of hydantoinase for improved production of L-methionine. Nat Biotechnol 18:317–320, 2000.
15. S Fong, TD Machajewski, CC Mak, CH Wong. Directed evolution of D-2-keto-3-deoxy-6-phosphogluconate aldolase to new variants for the efficient synthesis of D- and L-sugars. Chem Biol 7:873–883, 2000.
16. K-E Jaeger, T Eggert. Lipases for biotechnology. Curr Opin Biotechnol 13:390–397, 2002.
17. MT Reetz. Lipases as practical biocatalysts. Curr Opin Chem Biol 6:145–150, 2002.
18. A Liese, K Seelbach, C Wandrey. Industrial Biotransformations. Weinheim: Wiley-VCH, 2000.
19. UT Bornscheuer, RJ Kazlauskas. Hydrolases in Organic Synthesis: Regio- and Stereoselective Biotransformations. Weinheim: Wiley-VCH, 1999.
20. K-E Jaeger, MT Reetz. Microbial lipases form versatile tools for biotechnology. Trends Biotechnol 16:396–403, 1998.
21. K-E Jaeger, K Liebeton, A Zonta, K Schimossek, MT Reetz. Biotechnological application of Pseudomonas aeruginosa lipase: efficient kinetic resolution of amine and alcohols. Appl Microbiol Biotechnol 46:99–105, 1996.
22. AA Crameri, WPC Stemmer. Combinatorial multiple cassette mutagenesis creates all permutations of mutant and wild-type sequences. Biotechniques 18:194–196, 1995.
23. MT Reetz, S Wilensek, D Zha, K-E Jaeger. Directed evolution of an enantioselective enzyme through combinatorial multiple-cassette mutagenesis. Angew Chem Int Ed 40:3589–3591, 2001.
24. D Zha, S Wilensek, M Hermes, K-E Jaeger, MT Reetz. Complete reversal of enantioselectivity of an enzyme-catalyzed reaction by directed evolution. Chem Commun 24:2664–2665, 2001.
25. MT Reetz, K-E Jaeger. Directed evolution as a means to create enantioselective enzymes for use in organic chemistry. In: S Brakmann, K Johnsson, eds. Directed Molecular Evolution of Proteins or How to Improve Enzymes for Biocatalysis. Weinheim: Wiley-VCH, 2002, pp 245–279.
26. JM van Dijl, A Bolhuis, H Tjalsma, JDH Jongbloed, A de Jong, S Bron. Protein transport pathways in Bacillus subtilis: a genome-based road map. In: AL Sonenshein, JA Hoch, R Losick, eds. Bacillus subtilis and Its Closest Relatives: From Genes to Cells. Washington, DC: ASM Press, 2001, pp 337–355.
27. S Bron, R Meima, JM van Dijl, A Wipat, CR Harwood. Molecular biology and genetics of Bacillus species. In: AL Demain, E Davies, eds. Manual of Industrial Microbiology and Biotechnology. 2nd ed. Washington, DC: ASM Press, 1999, pp 392–416.
28. MT Reetz, MH Becker, H-W Klein, D Stöckigt. A method for high-throughput screening of enantioselective catalysts. Angew Chem Int Ed 38:1758–1761, 1999.
29. M Zaccolo, DM Williams, DM Brown, E Gherardi. An approach to random mutagenesis of DNA using mixtures of triphosphate derivatives of nucleoside analogues. J Mol Biol 255:589–603, 1996.
30. RC Cadwell, GF Joyce. Mutagenic PCR. In: CW Dieffenbach, GS Dveksler, eds. PCR Primer: A Laboratory Manual. Cold Spring Harbor: CSHL Press, 1995, p 583.
31. YH Zhou, XP Zhang, RH Ebright. Random mutagenesis of gene-sized DNA molecules by use of PCR with Taq DNA polymerase. Nucleic Acids Res 19:6052, 1991.
32. KA Eckert, TA Kunkel. High fidelity DNA synthesis by Thermus aquaticus DNA polymerase. Nucleic Acids Res 18:3739–3744, 1990.
33. KR Tindall, TA Kunkel. Fidelity of DNA synthesis by the Thermus aquaticus DNA polymerase. Biochemistry 27:6008–6013, 1988.
34. FH Arnold. Directed evolution: creating biocatalysts for the future. Chem Eng Sci 51:5091–5102, 1996.
35. L Wan, MB Twitchett, LD Eltis, AG Mauk, M Smith. In vitro evolution of horse heart myoglobin to increase peroxidase activity. Proc Natl Acad Sci USA 95:12825–12831, 1998.
36. JP Vartanian, M Henry, S Wain-Hobson. Hypermutagenic PCR involving all four transitions and a sizeable proportion of transversions. Nucleic Acids Res 24:2627–2631, 1996.
37. MG Pikkemaat, DB Janssen. Generating segmental mutations in haloalkane dehalogenase: a novel part in the directed evolution toolbox. Nucleic Acids Res 30:e35–5, 2002.
38. SV Taylor, P Kast, D Hilvert. Investigating and engineering enzymes by genetic selection. Angew Chem Int Ed Engl 40:3310–3335, 2001.
39. BC Cunningham, JA Wells. High-resolution epitope mapping of hGH-receptor interactions by alanine-scanning mutagenesis. Science 244:1081–1085, 1989.
40. M Blaber, WA Baase, N Gassner, BW Matthews. Alanine scanning mutagenesis of the alpha-helix 115–123 of phage T4 lysozyme: effects on structure, stability and the binding of solvent. J Mol Biol 246:317–330, 1995.
41. A Ashkenazi, LG Presta, SA Marsters, TR Camerato, KA Rosenthal, BM Fendly, DJ Capon. Mapping the CD4 binding site for human immunodeficiency virus by alanine-scanning mutagenesis. Proc Natl Acad Sci USA 87:7150–7154, 1990.
42. FF Vajdos, CW Adams, TN Breece, LG Presta, AM de Vos, SS Sidhu. Comprehensive functional maps of the antigen-binding site of an anti-ErbB2 antibody obtained with shotgun scanning mutagenesis. J Mol Biol 320:415–428, 2002.
43. AB Madhankumar, A Mintz, W Debinski. Alanine scanning mutagenesis of alpha-helix D segment of interleukin-13 reveals new functionally important residues of the cytokine. J Biol Chem 277:43194–43205, 2002.
44. KL Morrison, GA Weiss. Combinatorial alanine-scanning. Curr Opin Chem Biol 5:302–307, 2001.
45. GA Weiss, CK Watanabe, A Zhong, A Goddard, SS Sidhu. Rapid mapping of protein functional epitopes by combinatorial alanine scanning. Proc Natl Acad Sci USA 97:8950–8954, 2000.
46. J Huang, J Lu, F Barany, W Cao. Mutational analysis of endonuclease V from Thermotoga maritima. Biochemistry 41:8342–8350, 2002.
47. DF Gomez-Casati, RY Igarashi, CN Berger, ME Brandt, AA Iglesias, CR Meyer. Identification of functionally important amino-terminal arginines of Agrobacterium tumefaciens ADP-glucose pyrophosphorylase by alanine scanning mutagenesis. Biochemistry 40:10169–10178, 2001.
48. KA Gray, TH Richardson, K Kretz, JM Short, F Bartnek, R Knowles, L Kan, PE Swanson, DE Robertson. Rapid evolution of reversible denaturation and elevated melting temperature in a microbial haloalkane dehalogenase. Adv Synth Catal 343:607–617, 2001.
49. WPC Stemmer. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370:389–391, 1994.
50. H Zhao, L Giver, Z Shao, JA Affholter, FH Arnold. Molecular evolution by staggered extension process (StEP) in vitro recombination. Nat Biotechnol 16:258–261, 1998.
51. M Ostermeier, JH Shim, SJ Benkovic. A combinatorial approach to hybrid enzymes independent of DNA homology. Nat Biotechnol 17:1205–1209, 1999.
52. V Sieber, CA Martinez, FH Arnold. Libraries of hybrid proteins from distantly related sequences. Nat Biotechnol 19:456–460, 2001.
53. WM Coco, WE Levinson, MJ Crist, HJ Hektor, A Darzins, PT Pienkos, CH Squires, DJ Monticello. DNA shuffling method for generating highly recombined genes and evolved enzymes. Nat Biotechnol 19:354–359, 2001.
54. D Zha, A Eipper, MT Reetz. Assembly of designed oligonucleotides as an efficient method for gene recombination: a new tool in directed evolution. Chembiochem 4:34–39, 2002.

17 Enzyme Engineering by Phage Display

Patrice Soumillion, Daniel Legendre, and Jacques Fastrez
Université Catholique de Louvain, Louvain-la-Neuve, Belgium

1 INTRODUCTION

Enzymes, the catalysts of life, are used by living cells and organisms to sustain their development. Consequently, natural evolution has only selected those catalysts that are useful for the growth of cells in their natural environment. In the last decade, several strategies have been envisaged to modify natural enzymes in order to widen their range of applications. Engineering by rational design, preferentially guided by molecular modelling, has sometimes led to remarkable results (1). However, this approach remains quite challenging. Indeed, the number of contributions reporting successful experiments has not exploded in recent years, and it is difficult to estimate the percentage of unsuccessful attempts that remained unpublished. In view of this difficulty, the alternative strategy of creating libraries of mutants and screening or selecting for interesting ones has been increasingly followed. The success of this strategy depends on the size, quality, and diversity of the libraries, and, crucially, on the sensitivity, efficiency, and discriminating power of the screening or selection technique available.


Soumillion et al.

Figure 1 Construction of a phage-enzyme by cloning into gene 3 of a filamentous phage between sequences encoding the signal peptide and the mature protein. A tetracycline resistance gene allows in vivo selection of E. coli cells infected by the phage.

Phage display has been extensively used for the selection of peptide ligands endowed with high affinities for chosen receptors and of antibodies against specific haptens (2). Nucleotide sequences encoding random peptides, protein variants, or antibodies cloned from natural repertoires are inserted within a gene encoding a phage coat protein in such a way that the encoded polypeptides are displayed on the surface of the phage (Fig. 1). The phage libraries are subjected to cycles of adsorption on supports on which the target receptor or hapten is immobilized; the bound phages are eluted after washing away the nonbinders. After a few rounds of selection, high-affinity clones are obtained and characterized (Fig. 2). The power of the phage-display technology arises from the linkage between genotype and phenotype. In highly diverse libraries, the number of copies of individual clones is quite small. As a consequence, even if an efficient protocol succeeds in extracting a useful clone, the extracted quantity is below the detection limit, let alone the characterization limit, of most, if not all, physical techniques. However, the physical linkage between the gene and the encoded peptide or protein allows the amplification of the number of copies to any desirable level by replication in a specific host.

Figure 2 Selection by affinity chromatography or “biopanning” of phages displaying peptides or proteins endowed with affinity for an immobilised ligand. Several rounds of selection are generally required to select high-affinity clones from libraries of large size. The gene encoding the displayed protein can be recloned and expressed.

Enzyme Engineering by Phage Display

Genes encoding enzymes can also be inserted into a phage genome to create phage enzymes, from which libraries of mutants can be generated using techniques similar to those described in another chapter of this book. However, the selection of interesting mutants is less direct than for peptide or antibody libraries because affinity for a substrate or an inhibitor is generally not the most important property. More intricate selection protocols have to be designed to extract active enzymes from libraries. Several strategies have been tested, among others in our group. They are reviewed in this chapter.

2 CONSTRUCTION AND CHARACTERIZATION OF PHAGE ENZYMES

The most widespread vectors used in phage display are filamentous bacteriophages such as M13, fd, or related phagemids (3). These phages are not lytic; they infect Escherichia coli strains featuring the F′ episome. They are composed of a single-stranded DNA molecule encapsulated in a long, cylindrical capsid made up of five proteins. The major one, the product of gene 8 (g8p), forms the cylinder; it is a small protein with a molecular mass of 5.2 kDa present in a number of copies adjusted to the size of the DNA [2700 copies in the wild-type (wt) phage]. The other coat proteins close the extremities of the cylinder. Among them, the product of gene 3 (g3p), present in three to five copies, is responsible for the infectivity of the phage.

Most phage enzymes described so far have been created by cloning in gene 3 (4–9). Several vectors featuring multiple cloning sites between the sequences encoding the signal peptide and mature g3p have been designed. In the encoded fusion protein, the displayed enzyme is connected to g3p through a linker peptide. It may be advantageous to include in this connector a specific protease-sensitive site to allow disconnection of the displayed enzyme from the phage when necessary (see Sec. 3.2); in our experience, a factor Xa cleavage site is suitable. Three examples of g6p fusions have also been reported. The displayed protein is attached to the carboxy terminus of g6p, which has the advantage that cDNA libraries can be functionally displayed at this site (10,11). Because, for steric reasons, phage particles cannot be assembled from rather bulky enzyme–g8p fusion proteins, no phage enzyme has been constructed by cloning in gene 8. Antibiotic resistance genes have also been inserted into the genomes of the phage vectors to facilitate the selection of infected bacteria.

Phage enzymes are produced from infected E. coli cultures and are easily purified by precipitation with polyethylene glycol from the culture medium after separation of the bacteria by centrifugation. To obtain phage enzymes of higher purity, it is advisable to purify them by ultracentrifugation in a cesium chloride gradient.

In theory, as the only copy of gene 3 is fused to the gene encoding the enzyme, all copies of g3p should display the cloned enzyme. In practice, during phage morphogenesis, proteolytic degradation of the enzyme or the connecting peptide reduces the number of copies. This phenomenon can be demonstrated by Western blot analysis of the phage proteins using an anti-g3p antibody for detection (12). Besides the fusion protein, free g3p and sometimes degradation products are detected. The number of copies of enzyme per phage can be estimated from the relative intensities of the corresponding bands. The level of proteolysis depends on the structure of the connector, the stability of the displayed enzyme, and the culture conditions. In general, production of phages at lower temperature favors a higher level of display. The level of display can have a significant influence on the efficiency of the selection experiments (see below). It also depends on the relative efficiency of incorporation of free and fusion g3p during phage morphogenesis; the properties of the fusion proteins can affect this efficiency.

Phagemids have also been used as vectors to clone genes encoding various enzymes in fusion with gene 3 or gene 8 (13–28). The production of phagemid particles is then triggered by infection with a helper phage. By definition, in this case, two forms of g3p or g8p are present: the fusion and the free coat protein. As free g3p is normally better incorporated in the particle than the fusion, the level of display is expected to be lower with phagemids than with phages. Cloning at the gene 8 site becomes feasible with a phagemid because steric interactions between fusion proteins are reduced when they are scattered among free g8p throughout the filament.
Phagemid vectors can be used to facilitate the display of toxic proteins, as their expression can be put under the control of an inducible promoter that is activated only on helper-phage infection. So far, about 30 different enzymes have been displayed on phage or phagemid (29).

The catalytic activity of a phage enzyme can be measured by following the initial rates of substrate disappearance or product appearance as a function of time and substrate concentration, as in classical enzymology. Vmax and Km values can be extracted from these primary data. Values of kcat can be obtained by dividing Vmax by the phage-enzyme concentration, which can be determined by measuring the absorbance of the stock phage solution at 265 nm. The extinction coefficient can be estimated by assuming that it is proportional to phage size and by using a value of 8.4 × 10⁷ M⁻¹ cm⁻¹ for a phage enzyme with a DNA of 10 kb. Comparison of the kcat values of phage enzymes and the corresponding free enzymes allows the determination of the number of copies of enzyme per phage. In our experience, levels of display estimated in this way agree rather well with data from Western blots. The main practical difficulty in the determination of kinetic parameters results from the fact that the “molecular weight” of phage enzymes is quite large (> 2 × 10⁶ kDa) compared to soluble enzymes, with the consequence that the upper limit of their concentration in solution is rather low (i.e., around 10⁻⁷ M). Long measurement times are then needed to detect low activities, and significant background rates of spontaneous reactions can sometimes be a problem.

As far as the questions of size limit, possible oligomerization states, and posttranslational modifications of enzymes on filamentous phage are concerned, the following provisional answers can be given. The largest protein that has been displayed on fd is penicillin acylase (86 kDa), with a maximum of one copy per phage. In our experience, smaller enzymes (≤30 kDa) can be displayed in up to three to four copies. Dimerization at least is possible because glutathione-S-transferase (GST), an enzyme active only as a dimer of 25-kDa subunits, has been functionally displayed on fd (17). Although this has not been formally demonstrated, disulfide bridges are thought to be formed in several displayed enzymes. Maturation of subtilisin from its pre-prosubtilisin precursor to its active form appears to occur without external assistance (8).

A potential limitation of filamentous phages in phage display is that the displayed protein must be exported through the cytoplasmic membrane and pass through the channel provided in the outer membrane by phage g4p. Cytoplasmic enzymes and large proteins (MW > 80 kDa) may not satisfy these criteria. Cloning for display in lytic phages such as λ and T4 removes these limitations. Two enzymes, β-lactamase and β-galactosidase, have been cloned as fusions with the λ coat proteins gpD and gpV (30,31). These display formats have not been used extensively in enzyme engineering so far.
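The concentration and kcat calculations described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the extinction coefficient is scaled linearly with genome size as the text suggests, and the absorbance, Vmax, and free-enzyme kcat values in the example are invented for demonstration rather than taken from any experiment.

```python
"""Sketch: phage-enzyme concentration, kcat, and display level from A265."""

# Value quoted in the text for a phage enzyme carrying 10 kb of DNA.
EPSILON_10KB = 8.4e7          # M^-1 cm^-1 at 265 nm
REFERENCE_GENOME_KB = 10.0


def extinction_coefficient(genome_kb: float) -> float:
    """Scale the reference coefficient in proportion to genome size (assumption)."""
    return EPSILON_10KB * genome_kb / REFERENCE_GENOME_KB


def phage_concentration(a265: float, genome_kb: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/L."""
    return a265 / (extinction_coefficient(genome_kb) * path_cm)


def kcat(vmax_molar_per_s: float, phage_molar: float) -> float:
    """kcat (s^-1) = Vmax divided by the phage-enzyme concentration."""
    return vmax_molar_per_s / phage_molar


def copies_per_phage(kcat_phage: float, kcat_free: float) -> float:
    """Display level estimated as kcat(phage enzyme) / kcat(free enzyme)."""
    return kcat_phage / kcat_free


if __name__ == "__main__":
    # Hypothetical measurement: A265 = 0.84 for a 10 kb phage enzyme.
    conc = phage_concentration(a265=0.84, genome_kb=10.0)
    k = kcat(vmax_molar_per_s=3.0e-5, phage_molar=conc)
    print(f"phage concentration: {conc:.2e} M")
    print(f"apparent kcat per phage: {k:.0f} s^-1")
    print(f"copies per phage: {copies_per_phage(k, kcat_free=1500.0):.1f}")
```

The ratio of the two kcat values is meaningful only if the displayed enzyme has the same intrinsic activity as the free enzyme, which is itself an assumption of this estimate.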

3 SELECTION FOR ACTIVITY

The selection of phage enzymes with desirable properties from libraries of mutants on the basis of their catalytic activity is far more demanding than the selection of protein variants for binding to a specific target. Indeed, it is necessary to find a way to couple the ability of an enzyme to catalyze a chemical transformation, which in principle leaves it unchanged, with an acquired binding ability. Several strategies of increasing sophistication have been conceived to achieve that goal.

3.1 Selection by Binding: Transition State Analogues vs. Substrate or Product Analogues

The first selection protocols were simply based on binding to substrate or product analogues. They were tested on libraries of mutants of staphylococcal nuclease (16) or glutathione-S-transferase, an enzyme that catalyzes the conjugation of toxic compounds to glutathione to facilitate their elimination (17). The goal of isolating mutants with modified specificities was met with limited success (16,17). Binding to transition state analogues (TSAs) represented a step forward, as these are designed to mimic the geometry and charge distribution of the true transition state of a reaction. Indeed, as first suggested by Pauling (31a), enzymes are able to catalyze reactions because they are more complementary to transition states than to substrates.

It is interesting to compare the results obtained from selections of GST mutant libraries by binding to a TSA vs. a product analogue. The A1-1-type GST is active on aromatic substrates activated for nucleophilic addition, such as chlorodinitrobenzene. In an effort to change its specificity toward substrates bearing a negative charge on the aromatic ring, Widersten and Mannervik (17) created libraries of phage-displayed GST in which 10 amino acid residues in the aromatic electrophile binding site were randomly mutated. Selection by binding to product-like affinity ligands (e.g., 1 in Fig. 3) allowed the extraction of novel GSTs with altered substrate specificity, but the specific activities of these enzymes were reduced 1000-fold compared to the wild-type enzyme. On the other hand, selection by binding to a transition state analogue was used to extract active enzymes from another library of mutants in which four residues in the same binding site had been randomly mutated. The σ-complex (compound 2), which mimics the transition state of a nucleophilic aromatic substitution, was used as an affinity ligand (Fig. 3). Several mutants were characterized after four rounds of biopanning. The catalytic efficiency of one of them was determined on several substrates: it was 20-fold to 90-fold lower than that of the wild-type enzyme on chloronitrobenzene substrates (32). Although a definite conclusion cannot be reached from a comparison between the results of these two sets of experiments because the libraries and specificities are different, the observation of a better catalytic activity after selection with a TSA suggests that a TSA is the more appropriate ligand for selecting active enzymes.

The strategy of selection on phosphonate TSAs was also applied to libraries of catalytic antibodies with specific esterase activities in the hope of finding mutants with enhanced catalytic activity. This has led to conflicting results: in one case, mutants with higher TSA affinity but lower catalytic activity were selected (33), whereas in another, better binders were more active (34).

Figure 3 Immobilized product analogue or transition state analogue used for the selection of glutathione-S-transferase mutants displayed on phage.

3.2 Selection with Suicide Substrates: Mechanism-Based Inhibitors

A suicide substrate is a relatively unreactive compound that can be transformed by an enzyme into a very reactive inhibitor. As this transformation arises by the normal enzymatic mechanism, suicide substrates are also called mechanism-based inhibitors (35). We have taken advantage of that property to design a selection strategy applicable to the extraction of active phage enzymes from libraries of mutants (Fig. 4). Mixtures or libraries of mutants are incubated under kinetic control with a limiting concentration of a biotinylated suicide inhibitor. Preferential reaction of the most active phage enzymes with the inhibitor leads to labelling of their active site and biotinylation of these phages. After a defined reaction period, excess nonbiotinylated inhibitor is added to stop all labelling. The labelled phages are then extracted from the mixture by adsorption on a streptavidin-coated support. As inhibition involves the formation of a covalent bond, the recovery of phages necessitates the cleavage either of a disulfide bond introduced in the connector between the suicide inhibitor head and biotin, or of the peptide connector between the displayed enzyme and g3p using a specific protease. In our experience, the second method is more efficient.

Figure 4 Selection of active phage enzymes by labelling with a biotinylated suicide inhibitor followed by capture on a streptavidin-coated support. The phages are released by cleavage of a disulfide bond in the activity label or by cleavage of a peptide bond in the connector between the displayed enzyme and g3p (not represented).

The method has been tested on model mixtures of the phage-displayed TEM-1 β-lactamase and mutants of known activities, using a biotinylated penicillin sulfone as the suicide inhibitor (3 in Fig. 5). The most active phage enzymes could be extracted from the libraries (5,36). However, in other model experiments designed to assess the potential of the method to select enzymes of increased activity and stability, we discovered that this simple protocol may lead to the selection of low-activity mutants. In selections run in the presence of denaturant on mixtures of phage enzymes displaying the wild-type β-lactamase and two mutants of different activities and stabilities [kcat vs. PenG and residual activity (r.a.) in 0.65 M GuHCl: wt, kcat = 1500 s⁻¹, r.a. = 40%; E104K, kcat = 2260 s⁻¹, r.a. = 29%; G238S, kcat = 45 s⁻¹, r.a. = 6.7%], we observed a twofold enrichment in the less active and less stable G238S mutant. Analysis of the kinetics of reaction with the suicide substrate revealed that this mutant actually reacted faster with the suicide inhibitor (3) than predicted from its turnover rate with the substrate. Similarly, when the protocol was applied to libraries of mutants generated by error-prone PCR, an inactive mutant (E166V) was isolated as the dominant clone after three rounds of selection (I De Conninck and J Fastrez, unpublished results).

These surprising results can be explained by the kinetic scheme describing the interaction between a suicide inhibitor and its target enzyme when the suicide event originates from a covalent intermediate on the reaction pathway (Sch. 1). The probability of forming the irreversibly inhibited enzyme from the covalent intermediate (an acyl-enzyme in the case of a β-lactamase) is directly related to the ratio between the rate constants of the suicide event (k4) and turnover (k3).
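The scheme itself is not reproduced in this text. A minimal version consistent with the rate constants named here (k2 for acylation, k3 for turnover, k4 for the suicide event) might be written as follows; the reversible binding step with constants k1 and k-1, the species labels, and the symbol P_inact are assumptions introduced for illustration, and the original scheme may differ in detail. The `\xrightarrow` arrows require the amsmath package.

```latex
% Assumed minimal scheme: E.I is the noncovalent complex, E-I the covalent
% (acyl-enzyme) intermediate, E-I* the irreversibly inhibited enzyme.
\[
E + I \;\underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}}\; E\cdot I
\;\xrightarrow{\;k_2\;}\; E\text{-}I \;\xrightarrow{\;k_3\;}\; E + P,
\qquad
E\text{-}I \;\xrightarrow{\;k_4\;}\; E\text{-}I^{*}
\]

% Partition of the covalent intermediate between turnover and inactivation.
\[
P_{\mathrm{inact}} \;=\; \frac{k_4}{k_3 + k_4}
\;=\; \frac{k_4/k_3}{1 + k_4/k_3}
\]
```

Under this sketch, each covalent intermediate is inactivated with probability P_inact, which approaches 1 as k3 becomes small relative to k4. A mutant with a very slow deacylation step is therefore labelled efficiently even though it hardly turns over.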
If the acyl-enzyme becomes too stable and the enzyme does not turn over efficiently, the ratio k4/k3 becomes high and the enzyme is efficiently inhibited. This is the case for the E166V and G238S β-lactamase mutants mentioned above. Both mutations affect the activity mainly through a reduction in k3, the effect being more dramatic for E166 (37) than for G238 (38). Without correction, this simple protocol can lead to the selection of enzymes that do not turn over.

Figure 5 Biotinylated penicillin sulfone used for the selection of phage-displayed β-lactamases.

Scheme 1

The problem can be corrected by preincubating the mixture of phage enzymes with the substrate. The phage enzymes whose ratio of rates of deacylation vs. acylation (k3/k2) is low will be blocked as acyl-enzymes and will not be available for labelling with the biotinylated suicide inhibitor. Consequently, they will not be selected. This protocol was applied to a library of β-lactamase mutants containing an unknown proportion of phage enzymes with low k3/k2 ratios, behaving like “penicillin-binding proteins” (PBPs), and other mutants with typical β-lactamase properties (high k3). The β-lactamases were efficiently selected under these conditions (39). To analyze the efficiency of this protocol involving a counterselection by substrate, it was also tested on a mixture of phages displaying two enzymes of known properties: a low-activity β-lactamase mutant (kcat = 16 s⁻¹, ≈1% of the kcat of the wild-type enzyme) and the E. coli PBP4 (a DD-peptidase with a k3 of 7.2 × 10⁻⁵ s⁻¹ and a k3/k2 ratio of 6.3 × 10⁻⁶) (40). In the absence of counterselection, the PBP4 phage enzyme was selected. Preincubation with 10⁻⁵ M substrate (benzyl penicillin) reversed the enrichment factor to ≤0.1 (S Lavenne, J Fastrez, in preparation).

If the technique of selection with suicide substrates is to be used to select mutants with a modified specificity, it is essential that the rate of labelling with suicide substrates and that of turnover of true substrates be sensitive to the same structural features. Some information on this issue is available from results obtained with the serine protease subtilisin and a lipase.
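Before turning to those two examples, the counterselection by substrate described above can be illustrated numerically. The sketch below assumes a textbook three-step acyl-enzyme mechanism (rapid binding with dissociation constant Ks, acylation k2, deacylation k3) at steady state, which the chapter does not spell out. The Ks values and the individual k2 and k3 values for the β-lactamase-like enzyme are hypothetical; only the PBP4-like k3 and k3/k2 ratio follow the figures quoted in the text.

```python
"""Sketch: steady-state fraction of enzyme trapped as acyl-enzyme during
preincubation with substrate (assumed three-step acyl-enzyme mechanism)."""


def acyl_enzyme_fraction(s: float, ks: float, k2: float, k3: float) -> float:
    """Fraction of total enzyme present as acyl-enzyme at steady state.

    s  : substrate concentration (M)
    ks : dissociation constant of the noncovalent complex (M), hypothetical
    k2 : acylation rate constant (s^-1)
    k3 : deacylation rate constant (s^-1)
    """
    km = ks * k3 / (k2 + k3)                 # Michaelis constant of the scheme
    return (k2 / (k2 + k3)) * s / (s + km)


if __name__ == "__main__":
    substrate = 1e-5  # M, the preincubation concentration quoted in the text

    # PBP4-like enzyme: k3 and k3/k2 as quoted; Ks is a hypothetical value.
    k3_pbp = 7.2e-5
    k2_pbp = k3_pbp / 6.3e-6
    blocked_pbp = acyl_enzyme_fraction(substrate, ks=1e-3, k2=k2_pbp, k3=k3_pbp)

    # Low-activity beta-lactamase-like enzyme: k2 = k3 = 32 s^-1 gives
    # kcat = k2*k3/(k2+k3) = 16 s^-1; the split between k2 and k3 and the
    # Ks value are hypothetical.
    blocked_bla = acyl_enzyme_fraction(substrate, ks=1e-4, k2=32.0, k3=32.0)

    print(f"PBP4-like enzyme blocked as acyl-enzyme: {blocked_pbp:.4f}")
    print(f"beta-lactamase-like enzyme blocked:      {blocked_bla:.4f}")
```

With these illustrative numbers, almost all of the PBP4-like enzyme sits in the acyl-enzyme form, while most of the β-lactamase-like enzyme does not. This is the qualitative behaviour the counterselection relies on.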
Although, strictly speaking, esters of acyl-amino-phosphonic acids should be referred to as covalent transition state analogues, they behave like suicide substrates for serine proteases. They form a stable phosphonyl-enzyme analogous to the classical acyl-enzyme intermediate. Furthermore, the rates of phosphonylation by these inhibitors and of acylation by the corresponding ester or amide substrates are sensitive to the same structural features. Taking advantage of this characteristic, we have tested the possibility of selecting mutants of the Bacillus lentus subtilisin whose extended active site would accept positively charged residues in the P4 site. Two residues lining the P4 binding pocket were randomized, and the phage-enzyme library was selected by labelling with a phosphonylating inhibitor (4b in Fig. 6) whose structure is as close as possible to that of a substrate of the target specificity (5b). With inhibitor 4b, mutants whose activity on the charged substrate was 4% of that of the wild-type phage enzyme on substrates (5b) could be isolated even from this small library. As a control, clones with a wild-type-like specificity were shown to be efficiently selected with inhibitor 4a (8).

Figure 6 Biotinylated phosphonylating inhibitors used for the selection of subtilisin mutants with wild-type-like activity (4a) or with a specificity modified to accept positively charged residues in the P4 site (4b), and structures of the corresponding substrates (5a and 5b).

The “detergent lipase” from Thermomyces lanuginosa, an enzyme active on emulsified lipids and responding to interfacial activation, has also been functionally displayed on phagemid. A library of mutants of high diversity has been created in which nine amino acid residues in two regions constituting the hinge controlling the opening of the active site were randomized. Selection by labelling, in the presence of detergent, with the triolein-emulsified biotinylated p-nitrophenyl-phosphonate (6 in Fig. 7) succeeded in extracting active clones from the library. However, none of the analyzed clones was more active than the wild-type enzyme on p-nitrophenyl-palmitate, particularly in the presence of detergent. In one region, a high degree of conservation of wild-type residues was observed in the active clones. In the most active ones (activity ≥50% of wt), a substitution known to alter the substrate chain length preference was selected. This may result from the fact that the long hydrophobic chain connecting the inhibitor head to biotin interacts more favorably with the enzyme than the palmitate chain in the substrate and/or from the influence of biotin on the conformation of the inhibitor in the lipid layer during phage-enzyme labelling (27).

Figure 7 Biotinylated phosphonylating inhibitor used for the selection of lipolase mutants.

In conclusion, selection with suicide substrates appears to allow the extraction of interesting mutants from libraries. However, several limitations remain. The most obvious one is that suicide substrates are not available for all classes of enzymes. Furthermore, when available, the suicide substrates are not necessarily suitable for selection of the desired activity. For instance, nearly all the suicide substrates that have been designed to inhibit proteases are activated by cleavage of a lactone-like function; consequently, the suicide event does not really require the ability to cleave an amide bond but only an ester bond, which is an easier reaction. Finally, although our investigation of the selection of various β-lactamase mutants from model mixtures or libraries was relatively successful, the technique has not yet allowed us to select clones having acquired even a weak β-lactamase activity (increased rate of deacylation) from libraries of PBP mutants. Such a result would demonstrate that the technique is suitable for isolating mutants with weak activities (i.e., the kind of activities that are likely to be found initially when trying to engineer enzymatic activities de novo). Nevertheless, some success in that direction has been achieved in the selection of catalytic antibodies with β-lactamase or glycosidase activities (41,42).

3.3 Direct Selection with Substrates

In view of the potential limitations of the suicide inhibitor approach, efforts have been devoted to developing new techniques of selection based directly on substrate transformation. In the first two reports, the substrates were attached to the phage enzymes. On reaction, occurring in an “intraphage” format, the active phage enzymes became labelled with the product. They were then captured on a support coated with a product-binding protein. This strategy was tested by Pedersen et al. on a model system with the phagemid-displayed staphylococcal nuclease (SNase), a nuclease requiring Ca²⁺ for activity. Biotinylated double-stranded DNA was attached to the phagemid and immobilized on a streptavidin-coated support. The addition of Ca²⁺ activated the enzyme and allowed the release of phagemids from their support 100 times more effectively than with a control phage displaying a Fab fragment (43).

Demartis et al. have also tested selection protocols based on substrate transformation with four different enzymes: two proteases, a GST, and a biotin ligase. Peptide substrates were attached to the phage enzymes through complex formation with a calmodulin module fused to both g3p and the displayed enzyme. The protease phage enzymes were selected, after “intraphage” proteolytic cleavage, by binding to a product-specific antibody. The GST was selected by streptavidin binding after conjugation of a glutathione-containing peptide substrate with a biotinylated electrophilic aromatic cosubstrate. The E. coli biotin ligase was similarly selected directly after biotinylation. On model systems, enrichment factors of active phage enzymes vs. control were adequate (up to 2000-fold). Unfortunately, application of the selection scheme to a library of trypsin mutants did not yield mutants with catalytic activity superior to those of the original H57A mutants (21,28). Being based on a single “intraphage” turnover, these strategies are likely to lead to a selection of low-activity enzymes.

Jestin et al. have tried to avoid this shortcoming by coupling two independent reactions: the enzymatic reaction, presumed to occur intermolecularly, and a chemical reaction connecting the substrate/product to the phage. With this strategy, they were able to enrich model mixtures of phage-displayed active and inactive DNA polymerases in active ones (22).

Recently, we have explored the possibility of selecting phages displaying active metalloenzymes by affinity chromatography associated with catalytic elution (Fig. 8).
The selection protocol includes three steps: (a) the phage enzymes are first inactivated by extraction of the metal cofactor; (b) they are adsorbed on a support coated with a penicillin substrate; and (c) the phages displaying active enzymes are then selectively eluted by the addition of the cofactor. Active enzymes transform the substrate into a product for which they normally have a significantly lower affinity. The method requires that the apoenzymes are still able to bind their substrates. The selection process was first tested with the B. cereus metallo-β-lactamase phage enzyme (fd-βLII). The wild-type fd-βLII was shown to be preferentially extracted from model mixtures containing fd-βLII and either a dummy phage, a phage displaying an inactive mutant of the serine β-lactamase TEM-1, or inactive and low-activity mutants of βLII. Enrichment factors varying between 36-fold and 820-fold were observed. The selection was also applied to extract active phage enzymes from a library of mutants generated by mutagenic PCR. The activity of the library was shown to increase 60-fold over two rounds of selection. Eleven clones from the second round were randomly picked for characterization. They contained between two and four mutations. The kcat values of the phage enzymes varied between 30% and 160%, and the Km values between 70% and 170%, of the wild-type values. All the mutant enzymes were less stable than the wild type toward thermal denaturation.

Figure 8 Selection of metallo-phage-enzymes by catalytic elution. Three kinds of phage enzymes present in the library of mutants are represented: (1) top: inactive mutant retaining affinity for the substrate; (2) middle: active phage enzyme (a star in the active site represents the metal cofactor); and (3) bottom: inactive mutant devoid of affinity. The selection operates in three steps: inactivation by metal extraction, binding on immobilised substrate, and release of active phage enzymes by catalytic elution on metal insertion in the active site.

3.4 Construction of an Allosteric Binding Site by Hierarchical Selection and Screening

Allosteric regulation ranks among the characteristic properties of enzymes. In a project whose final purpose is to design enzymes whose activity could be regulated by the binding of various ligands, we endeavored to construct an allosteric binding site in the vicinity of the active site (Fig. 9). To create the first-generation regulatable enzymes, we reasoned that it might be easier to engineer a recognition site for a protein than for a small ligand because this could simply require the building of a protuberance that would fit into an existing cavity on the surface of the target protein, whereas building a binding site for a small ligand would probably require the more difficult creation of a new cavity. Because we cannot reliably predict the sequences or structures that would constitute adequate binding sites, the strategy was to insert random sequences in surface loops of the chosen enzyme and then to select potentially interesting clones on three criteria: (a) enzymatic activity; (b) affinity for the target ligand; and (c) if possible, modulation of activity on target ligand binding (Fig. 10). Selection for these properties is best organized hierarchically.

Figure 9 Schematic representation of a β-lactamase in which three contiguous loops have been extended by insertion of decapeptides into surface loops close to the active site. The essential Ser70 located at the bottom of the active site is represented in space-filling mode.

Figure 10 Schematic representation of the effect of allosteric ligand binding on enzymatic activity.

The enzyme chosen for this project was the serine TEM-1 β-lactamase in the phage display format. Random peptides were genetically inserted in five different loops by replacement of one to three residues of the wild-type sequence by five to nine random residues. In the case of a β-lactamase, active clones can be extracted from libraries either by in vivo selection or by in vitro selection using biotinylated suicide inhibitors as explained above. In vivo selection was chosen for its simplicity: the insertion libraries were plated on solid media containing a β-lactam antibiotic at a concentration chosen to exclude inactive or low-activity clones. The percentage of active clones reflects the insertion tolerance of the loops. The following results were found. Insertions were relatively well accepted (≥5% of clones with ≥5% of wild-type activity) in replacements of G41-A42 (P Mathonet, unpublished results) or of T271 (8). These loops are symmetrically located, respectively, between the amino-terminal helix and the first β-strand, and between the last β-strand and the carboxy-terminal helix. A weak tolerance to insertion (<0.1% of clones with ≥5% of wild-type activity) was observed in replacements of A172-I173 or V103-Y105, located, respectively, at the back of an Ω-loop supporting the essential glutamic acid 166 and on the rim of the active site. Insertions were hardly accepted at all in positions 238–241 (i.e., between two β-strands located at the entrance of the active site). Active libraries could be recombined to generate more complex binding sites in which more than one loop had been extended.

The libraries of active phage enzymes were then selected by affinity chromatography on a solid support on which monoclonal antibodies (mAbs) or other proteins were immobilized. The target proteins were three monoclonal antibodies against the prostate-specific antigen (PSA) and two proteins presumed to be difficult targets because they are not known to form high-affinity complexes with other proteins: horse liver ferritin and β-galactosidase. After several rounds of selection, binders with high affinities for anti-PSA mAbs (Kd values between 10 nM and 1 µM) were isolated (8).
Low-affinity binders could also be selected on ferritin and β-galactosidase. The affinity of a ferritin binder was increased by error-prone mutagenesis and selection to a Kd of 16 nM. This enzyme had an activity similar to that of the wild-type β-lactamase. It featured two extended loops (in positions 103–105 and 271) plus one additional mutation (44). To find clones whose activities were modulated by binding of the allosteric ligand, the effect of ligand addition on activity was measured on individual clones. Many clones selected on monoclonal antibodies showed an activity modulation, mostly inhibitory. A weak effect of ferritin binding on the activity of the high-affinity clone was also observed.

4 CONCLUSION

Several strategies have been developed to select active enzymes from phage-displayed libraries. Labelling with biotinylated suicide substrates, followed by capture on streptavidin-coated beads, has achieved some success. This strategy has been tested in projects designed to engineer enzymes with modified properties (new specificity, better behavior in the presence of a detergent, etc.). Enzymes with improved properties have not yet been obtained using this protocol. In view of the lack of suicide substrates for many enzymatic activities, new strategies based directly on substrate transformation have also been considered. These methods are still at an early stage of development but have shown promising results. Besides being very active and specific catalysts, many enzymes can be regulated by interaction with allosteric ligands. Progressive construction of an allosteric binding site by insertion of random peptides in loops, hierarchical selection for activity and affinity for an allosteric ligand, screening for modulation of activity, and recombination of selected partial binding sites may, in the future, lead to efficient engineering of regulatory sites.

REFERENCES

1. F Cedrone, A Menez, E Quemeneur. Tailoring new enzyme functions by rational redesign. Curr Opin Struct Biol 10:405–410, 2000.
2. GP Smith, VA Petrenko. Phage display. Chem Rev 97:391–410, 1997.
3. P Model, M Russel. Filamentous bacteriophage. In: R Calendar, ed. The Bacteriophages. Vol. 2. New York: Plenum, 1988, pp 375–456.
4. J McCafferty, RH Jackson, DJ Chiswell. Phage-enzymes: expression and affinity chromatography of functional alkaline phosphatase on the surface of bacteriophage. Protein Eng 4:955–961, 1991.
5. P Soumillion, L Jespers, M Bouchet, J Marchand-Brynaert, G Winter, J Fastrez. Selection of β-lactamase on filamentous bacteriophage by catalytic activity. J Mol Biol 237:415–422, 1994.
6. J Ku, PG Schultz. Phage display of catalytically active staphylococcal nuclease. Bioorg Med Chem 2:1413–1415, 1994.
7. I Lasters, N Van Herzeele, HR Lijnen, D Collen, L Jespers. Enzymic properties of phage-displayed fragments of human plasminogen. Eur J Biochem 244:946–952, 1997.
8. D Legendre, N Laraki, T Graslund, ME Bjornvad, M Bouchet, P-Å Nygren, TV Borchert, J Fastrez. Display of active subtilisin 309 on phage: analysis of parameters influencing the selection of subtilisin variants with changed substrate specificity from libraries using phosphonylating inhibitors. J Mol Biol 296:87–102, 2000.
9. I Ponsard, M Galleni, P Soumillion, J Fastrez. A method for the selection of metalloenzymes by catalytic activity using phage display and catalytic elution. ChemBioChem 2:253–259, 2001.
10. M Fransen, PP Van Veldhoven, S Subramani. Identification of peroxisomal proteins by using M13 phage protein VI phage display: molecular evidence that mammalian peroxisomes contain a 2,4-dienoyl-CoA reductase. Biochem J 340:561–568, 1999.
11. L Amery, GP Mannaerts, S Subramani, PP Van Veldhoven, M Fransen. Identification of a novel human peroxisomal 2,4-dienoyl-CoA reductase related protein using the M13 phage protein VI phage display technology. Comb Chem High Throughput Screen 4:545–552, 2001.
12. D Legendre, P Soumillion, J Fastrez. Engineering a regulatable enzyme for homogeneous immunoassays. Nat Biotechnol 17:67–72, 1999.
13. DR Corey, AK Shiau, Q Yang, BA Janowski, CS Craik. Trypsin display on the surface of bacteriophage. Gene 128:129–134, 1993.
14. R Crameri, M Suter. Display of biologically-active proteins on the surface of filamentous phages: a cDNA cloning system for selection of functional gene-products linked to the genetic information responsible for their production. Gene 137:69–75, 1993.
15. R Eerola, P Saviranta, H Lilja, K Pettersson, T Lovgren, M Karp. Expression of prostate specific antigen on the surface of a filamentous phage. Biochem Biophys Res Commun 200:1346–1352, 1994.
16. J Light, RA Lerner. Random mutagenesis of staphylococcal nuclease and phage display selection. Bioorg Med Chem 3:955–967, 1995.
17. M Widersten, B Mannervik. Glutathione transferases with novel active-sites isolated by phage display from a library of random mutants. J Mol Biol 250:115–122, 1995.
18. K Maenaka, M Furuta, K Tsumoto, K Watanabe, Y Ueda, I Kumagai. A stable phage-display system using a phagemid vector: phage display of hen egg-white lysozyme (HEL), Escherichia coli alkaline phosphatase, and anti-HEL monoclonal antibody, HyHEL10. Biochem Biophys Res Commun 218:682–687, 1996.
19. JA Hunt, CA Fierke. Selection of carbonic anhydrase variants displayed on phage: aromatic residues in zinc binding site enhance metal affinity and equilibration kinetics. J Biol Chem 272:20364–20372, 1997.
20. N Dimasi, A Pasquo, F Martin, S Di Marco, C Steinkuhler, R Cortese, M Sollazzo. Engineering, characterization and phage display of hepatitis C virus NS3 protease and NS4A cofactor peptide as a single-chain protein. Protein Eng 11:1257–1265, 1998.
21. S Demartis, A Huber, F Viti, L Lozzi, L Giovannoni, P Neri, G Winter, D Neri. A strategy for the isolation of catalytic activities from repertoires of enzymes displayed on phage. J Mol Biol 286:617–633, 1999.
22. JL Jestin, P Kristensen, G Winter. A method for the selection of catalytic activity using phage display and proximity coupling. Angew Chem Int Ed 38:1124–1127, 1999.
23. RMD Verhaert, J van Duin, WJ Quax. Processing and functional display of the 86 kDa heterodimeric penicillin G acylase on the surface of phage fd. Biochem J 342:415–422, 1999.
24. JA Watson, MG Rumsby, RG Wolowacz. Phage display identifies thioredoxin and superoxide dismutase as novel protein kinase C-interacting proteins: thioredoxin inhibits protein kinase C-mediated phosphorylation of histone. Biochem J 343:301–305, 1999.
25. T Wind, S Kjaer, BFC Clark. Display of Ras on filamentous phage through cysteine replacement. Biochimie 81:1079–1087, 1999.
26. K Korn, HH Foerster, U Hahn. Phage display of RNase A and an improved method for purification of phages displaying RNases. Biol Chem 381:179–181, 2000.
27. S Danielsen, M Eklund, H-J Deussen, T Graslund, P-A Nygren, TV Borchert. In vitro selection of enzymatically active lipase variants from phage libraries using a mechanism-based inhibitor. Gene 272:267–274, 2001.
28. C Heinis, A Huber, S Demartis, J Bertschinger, S Melkko, L Lozzi, P Neri, D Neri. Selection of catalytically active biotin ligase and trypsin mutants by phage display. Protein Eng 14:1043–1052, 2001.
29. P Soumillion, J Fastrez. Investigation of phage display for the directed evolution of enzymes. In: S Brakmann, K Johnsson, eds. Directed Molecular Evolution of Proteins. Weinheim: Wiley-VCH, 2002, pp 79–110.
30. I Maruyama, HI Maruyama, S Brenner. Lambda-Foo: a Lambda-phage vector for the expression of foreign proteins. Proc Natl Acad Sci USA 91:8273–8277, 1994.
31. YG Mikawa, IN Maruyama, S Brenner. Surface display of proteins on bacteriophage lambda heads. J Mol Biol 262:21–30, 1996.
31a. L Pauling. Molecular architecture and biological reactions. Chem Eng News 24:1375–1377, 1946.
32. LO Hansson, M Widersten, B Mannervik. Mechanism-based phage display selection of active-site mutants of human glutathione transferase A1-1 catalyzing SNAr reactions. Biochemistry 36:11252–11260, 1997.
33. M Baca, TS Scanlan, RC Stephenson, JA Wells. Phage display of a catalytic antibody to optimize affinity for transition-state analog binding. Proc Natl Acad Sci USA 94:10063–10068, 1997.
34. I Fujii, S Fukuyama, Y Iwabuchi, R Tanimura. Evolving catalytic antibodies in a phage-displayed combinatorial library. Nat Biotechnol 16:463–467, 1998.
35. MA Ator, PR Ortiz de Montellano. Mechanism-based (suicide) enzyme inactivation. In: DS Sigman, PD Boyer, eds. Enzymes. 3rd ed. Vol. 19. San Diego: Academic Press, 1990, pp 213–282.
36. S Vanwetswinkel, J Marchand-Brynaert, J Fastrez. Selection of the most active enzymes from a mixture of phage-displayed β-lactamase mutants. Bioorg Med Chem Lett 6:789–792, 1996.
37. H Adachi, T Ohta, H Matsuzawa. Site-directed mutants, at position 166, of RTEM-1 β-lactamase that form a stable acyl-enzyme intermediate with penicillin. J Biol Chem 266:3186–3191, 1991.
38. I Saves, O Burletschiltz, L Maveyraud, JP Samama, JC Prome, JM Masson. Mass-spectral kinetic-study of acylation and deacylation during the hydrolysis of penicillins and cefotaxime by β-lactamase TEM-1 and the G238S mutant. Biochemistry 34:11660–11667, 1995.
39. S Vanwetswinkel, B Avalle, J Fastrez. Selection of β-lactamases and penicillin binding mutants from a library of phage displayed TEM-1 β-lactamase randomly mutated in the active site Ω-loop. J Mol Biol 295:527–540, 2000.
40. N Rhazi. Etude du mécanisme catalytique des DD-peptidases bactériennes. PhD dissertation, Université de Liège, Liège, Belgium, 2000.
41. F Tanaka, H Almer, RA Lerner, CF Barbas III. Catalytic single-chain antibodies possessing β-lactamase activity selected from a phage displayed combinatorial library using a mechanism-based inhibitor. Tetrahedron Lett 40:8063–8066, 1999.
42. JD Janda, L-C Lo, C-HL Lo, M-M Sim, R Wang, C-H Wong, RA Lerner. Chemical selection for catalysis in combinatorial antibody libraries. Science 275:945–948, 1997.
43. H Pedersen, S Holder, DP Sutherlin, U Schwitter, DS King, PG Schultz. A method for directed evolution and functional cloning of enzymes. Proc Natl Acad Sci USA 95:10523–10528, 1998.
44. D Legendre, B Vucic, V Hougardy, A-L Girboux, C Henrioul, J Van Haute, P Soumillion, J Fastrez. TEM-1 β-lactamase as a scaffold for protein recognition and assay. Protein Sci 11:1506–1518, 2002.

18

In Vivo Gene Shuffling in Yeast: A Fast and Easy Method for Directed Evolution of Enzymes

Jens Sigurd Okkels*
Novozymes A/S, Bagsværd, Denmark

*Current affiliation: Maxygen Aps, Horsholm, Denmark

1

INTRODUCTION

Exchange of genetic material by recombination occurs in all living organisms and is the major contributor to high-quality diversity generation in the evolution of the species. The recombination apparatus has evolved to various degrees of complexity in different organisms, with a number of fine tunings and mechanisms that are still not completely understood. The homologous recombination frequency varies greatly between organisms (and between cell cycle stages, including mitosis and meiosis). Saccharomyces cerevisiae has been used for many years as a eukaryotic model for studying homologous recombination (1). It has a high frequency of homologous recombination, e.g., as shown by the site-specific integration into the chromosome upon transformation with DNA. In contrast, Escherichia coli has a relatively low frequency of integration of homologous DNA into the chromosome, despite the fact that the E. coli chromosome is smaller than the S. cerevisiae chromosome. The use of in vivo recombination in S. cerevisiae as a tool for cloning a fragment into a gapped plasmid with a 40 to 150 bp overlap at the ends has been described (2). This is a very efficient method for cloning a fragment in S. cerevisiae. A similar method in E. coli gives a relatively low recombination efficiency (3).

Since the market introduction in 1988 of the first recombinantly produced detergent lipase from Thermomyces lanuginosa (earlier named Humicola lanuginosa) (4), lipases have been used as detergent enzymes to remove lipid or fatty stains from clothes and other textiles. A drawback of many detergent lipases is that they exert the best lipid-removing effect after more than one wash cycle, presumably because the known lipases, when deposited on the fatty stain, are more active during a certain period of the drying process than during the wash itself (5). Therefore at least two wash cycles, separated by a sufficient drying period, are required to obtain substantial removal of fatty stains.

This paper describes the development of an in vivo gene shuffling method in S. cerevisiae and its application to the directed evolution of the T. lanuginosa lipase, resulting in lipase variants with a strong one-cycle wash effect.

2

RESULTS

Muhlrad et al. (2) have shown that co-transformation of a PCR product with a gapped plasmid, containing homology at both ends of the PCR product, allows in vivo recombination in S. cerevisiae to repair the gap with the PCR product. This procedure is efficient and requires no subcloning steps in E. coli. Described here are the results of using in vivo recombination in S. cerevisiae for gene shuffling by co-transformation of plasmids (linearized in the middle of a gene) and PCR products (containing variants of the linearized gene), as well as further developments of this procedure. These procedures are named "in vivo gene shuffling."
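
As a rough illustration of the gap-repair step described above, the following Python sketch joins the two arms of a linearized vector through a PCR fragment wherever the fragment ends match the vector ends exactly. It is a toy string model only: the sequences, the `gap_repair` helper, and the 8-nt minimum overlap are hypothetical choices made for the example (the overlaps described for yeast are 40 to 150 bp), and recombination in the cell does not require perfect sequence identity.

```python
def gap_repair(left_arm, right_arm, fragment, min_overlap=8):
    """Toy model of gap repair: join two vector arms through a fragment.

    left_arm / right_arm: vector sequence flanking the gap (same strand).
    fragment: PCR product whose ends should overlap both arms.
    Returns the repaired sequence, or None if either end lacks homology.
    """
    def overlap(a, b):
        # Length of the longest suffix of a that is also a prefix of b.
        for n in range(min(len(a), len(b)), 0, -1):
            if a.endswith(b[:n]):
                return n
        return 0

    n_left = overlap(left_arm, fragment)    # left arm end vs. fragment start
    n_right = overlap(fragment, right_arm)  # fragment end vs. right arm start
    if n_left < min_overlap or n_right < min_overlap:
        return None  # no repair without homology at both ends
    return left_arm + fragment[n_left:] + right_arm[n_right:]


# Hypothetical sequences, chosen only to make the overlaps visible.
left = "ATGAGGAGCTCCCTTGTG"
right = "GGTACCGATCTTTAAGCT"
insert = "CCCTTGTG" + "ACGTACGTTTGACC" + "GGTACCGA"

repaired = gap_repair(left, right, insert)
print(repaired)
print(gap_repair(left, right, "ACGTACGTTTGACC"))  # fragment without homology
```

In this model a fragment lacking homology to either arm returns `None`, loosely mirroring the observation that an opened vector without a matching fragment gives very few transformants.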

Figure 1 Alignment of the nucleotide sequences of the wild-type (uppercase) and synthetic (lowercase) T. lanuginosa lipase genes. The amino acid sequence of the lipase is written above the nucleotide sequence in the one-letter code. The restriction enzyme sites, which are introduced into the synthetic gene, are written below the nucleotide sequence.

2.1

Testing In Vivo Gene Shuffling of a Wild-Type Gene and a Synthetic Variant Gene

A synthetic gene expressing the T. lanuginosa lipase was constructed and cloned into an improved S. cerevisiae expression vector, giving the plasmid pJSO37 (6). The synthetic gene contains 12 additional restriction sites compared to the wild-type gene, but no amino acid exchanges (Fig. 1). The plasmid containing the synthetic gene was cut with Nru I, Pst I, and Nru I + Pst I, respectively, to open the gene approximately in the middle of the DNA sequence encoding the lipase. The opened plasmid was transformed into S. cerevisiae YNG318 (available from ATCC) together with an approximately 0.9-kb wild-type T. lanuginosa lipase DNA fragment (see Figs. 1 and 2) prepared from pJSO26 (6) by PCR amplification. Furthermore, the opened plasmid alone (i.e., without the 0.9-kb wild-type lipase DNA fragment) was also transformed into the yeast recombination host cell. The transformed yeast cells were plated on selective agar plates, and the transformation frequency was determined (7). It was found that the transformation frequency of the opened plasmid alone was very low (10 transformants/μg opened plasmid) in comparison to the transformation frequency of plasmid/fragment (50,000 transformants/μg opened plasmid), a roughly 5000-fold difference.

The plasmid/fragment region covering the lipase gene was PCR-amplified from 20 randomly selected S. cerevisiae transformants. The recombination mixture of the 20 transformants was analyzed by restriction enzyme digestion (Fig. 2). Judged by this method, 20% contained a random exchange of the nucleotide differences between the wild-type and synthetic gene. In reality, this number is most likely higher, since only up to 8 of the possible 12 restriction sites were tested (7). This level of random exchanges was considered sufficient for any high-throughput screening system in which the improved activity encoded by the offspring DNA sequences can be distinguished from the activity encoded by the parent DNA sequences. The method was therefore applied to screening for improved detergent lipases as described below.

Figure 2 Testing of in vivo gene shuffling in S. cerevisiae as a method for gene shuffling. The wild-type T. lanuginosa lipase fragment was prepared by PCR amplification using pJSO26 as a template (Ref. 6). The plasmid, pJSO37, containing the synthetic T. lanuginosa lipase gene, which has a 96% nucleotide sequence identity with the wild-type gene, was digested with Nru I. The Nru I site is located approximately in the middle of the synthetic lipase gene.

2.2

In Vivo Gene Shuffling of T. lanuginosa Lipase Variants

From site-directed and random mutagenesis of the T. lanuginosa lipase, a number of variants have been isolated with improved performance in a 3-cycle detergent wash (8–10). None of them, however, showed an improvement after a one-cycle wash. In an attempt to obtain variants with one-cycle wash improvements, 20 of the T. lanuginosa lipase variants were in vivo gene-shuffled (Table 1). Six expression vectors were prepared from the lipase variants (a) to (f) (Table 1) by ligation into the yeast expression vector pJSO37 (6). All six vectors were digested with Nru I. DNA fragments of all 20 variants (a) to (t) (Table 1) were prepared by PCR amplification. The 20 DNA fragments and the 6 opened vectors were mixed and transformed into S. cerevisiae YNG318. The transformed cells were plated on selective agar plates on top of two filters, and after the appearance of S. cerevisiae colonies, the protein-binding filters were screened for lipase activity in the presence of detergent as described in Ref. 11. The wild type and the 20 variants showed no activity under the conditions used for screening. A number of transformants from the library showed strong activity and were isolated and tested for improved wash performance. An example of the amino acid exchanges in two positive recombinants, identified using the filter assay, is shown in Fig. 3. Further rounds of in vivo gene shuffling and screening gave several lipase variants that were strongly improved compared to any known lipase or lipase variant and that removed a major part of the lipid on cloth after one wash cycle.

2.3

Further Studies of the In Vivo Gene Shuffling Method

A number of inactivated T. lanuginosa lipase genes were constructed by introducing either frameshift mutations or stop codons at various places in the synthetic lipase gene. These were used to monitor the in vivo gene shuffling of different combinations of opened vector and fragments, which were inactivated at different places, by scoring the ratio of active vs. inactive colonies. The detailed data are described in Ref. 7. The results are summarized below.

Table 1 List of the T. lanuginosa Lipase Variants Used for In Vivo Gene Shuffling

(a) E56R,D57L,I90F,D96L,E99K
(b) E56R,D57L,V60M,D62N,S83T,D96P,D102E
(c) D57G,N94K,D96L,L97M
(d) E87K,G91A,D96R,I100V,E129K,K237M,I252L,P256T,G263A,L264Q
(e) E56R,D57G,S58F,D62C,T64R,E87G,G91A,F95L,D96P,K98I
(f) E210K
(g) S83T,N94K,D96N
(h) E87K,D96V
(i) N94K,D96A
(j) E87K,G91A,D96A
(k) D167G,E210V
(l) S83T,G91A,Q249R
(m) E87K,G91A
(n) S83T,E87K,G91A,N94K,D96N,D111N
(o) N73D,E87K,G91A,N94I,D96G
(p) L67P,I76V,S83T,E87N,I90N,G91A,D96A,K98R
(q) S83T,E87K,G91A,N92H,N94K,D96M
(r) S85P,E87K,G91A,D96L,L97V
(s) E87K,I90N,G91A,N94S,D96N,I100T
(t) I34V,S54P,F80L,S85T,D96G,R108W,G109V,D111G,S116P,L124S,V132M,V140Q,V141A,F142S,H145R,N162T,I166V,F181P,F183S,R205G,A243T,D254G,F262L

Figure 3 Amino acid exchanges in two isolated recombinants from in vivo gene shuffling of 20 T. lanuginosa lipase variants. Recombinant A is a recombination of two variants. - - - -: originates from the variant (c) (see Table 1) and ====: originates from the variant (d). Recombinant B is a recombination of vector (c) and DNA fragments (t) and (l). - - - -: originates from the vector (c), <<<<: originates from the DNA fragment prepared from variant (t), ====: originates from the DNA fragment prepared from variant (l), and #### are mutations not present in the parent sequences. These have most likely arisen from a mutation during the PCR amplification of the fragments.
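
The substitution shorthand used in Table 1 (for example, D96L means that aspartate at position 96 is replaced by leucine) lends itself to a small worked example. The Python sketch below parses two Table 1 variants and builds a single-crossover recombinant of the general kind described for recombinant A in Fig. 3, which combines variants (c) and (d). The helper names and the crossover position (residue 100) are hypothetical and chosen only for illustration; the actual exchange pattern is shown only in the figure.

```python
import re


def parse_variant(spec):
    """Parse 'D57G,N94K' into {57: ('D', 'G'), 94: ('N', 'K')}."""
    subs = {}
    for item in spec.split(","):
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", item.strip())
        wt, pos, new = match.groups()
        subs[int(pos)] = (wt, new)
    return subs


def single_crossover(upstream, downstream, crossover):
    """Substitutions before `crossover` come from one parent, the rest from the other."""
    child = {p: s for p, s in upstream.items() if p < crossover}
    child.update({p: s for p, s in downstream.items() if p >= crossover})
    return child


def format_variant(subs):
    """Write a substitution dictionary back in the Table 1 shorthand."""
    return ",".join(f"{wt}{pos}{new}" for pos, (wt, new) in sorted(subs.items()))


# Variants (c) and (d) as listed in Table 1.
variant_c = parse_variant("D57G,N94K,D96L,L97M")
variant_d = parse_variant("E87K,G91A,D96R,I100V,E129K,K237M,I252L,P256T,G263A,L264Q")

# Hypothetical crossover at residue 100 (not taken from Fig. 3).
chimera = format_variant(single_crossover(variant_c, variant_d, crossover=100))
print(chimera)
```

Because the child takes the whole upstream region from one parent, substitutions that the other parent carries in that region (here E87K, G91A, and D96R) are lost, which is what distinguishes recombination from simply pooling all mutations.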

One frameshift mutation in the vector and another in the fragment on the opposite side of the opening gave 3–32% of active colonies, depending on the location and combination. It can be concluded that the closer the mutation is to the ends of the vector, the higher the exchange is.

The same experiment performed with stop codon mutations gave up to four times more colonies with lipase activity than the frameshift mutations at the same positions, while giving the same relative difference between the opposite experiments. This might indicate that stop codon mutations, which are closer to the intended application of the method (shuffling of nucleotide substitutions), give a better exchange of parent sequences than frameshift mutations, possibly because a DNA repair mechanism is involved.

One frameshift mutation in the opened vector and two in the fragment, one on each side of the opening site, gave 4–42% of active colonies, depending on the location and combination. The active colonies in several of the combinations can be considered mosaics, meaning that more than one fragment piece has integrated into the vector parent.

Two frameshift mutations in the opened vector, one on each side of the opening site, and one in the fragment gave 0.5–3% of active colonies. All of these active colonies are mosaics of the parent DNA. A completely random exchange would have given 12.5% of active colonies (presumably 1/2 × 1/2 × 1/2, since each of the three mutated positions must independently receive the intact parental sequence).

The amount of fragment DNA relative to vector DNA influences the result: the integration of fragment sequences increases with an increasing amount of fragment relative to vector.

2.4

Test of the In Vivo Gene Shuffling in the S. cerevisiae Rad52 Mutant

The S. cerevisiae rad52 mutant transformed very well with the wild-type plasmid and expressed the T. lanuginosa lipase, but gave no transformants at all in several independent tests with any of the opened vector and fragment combinations. The Rad52 gene product is therefore absolutely required for the in vivo gene shuﬄing to occur. The Rad52 function is required for classical recombination (12), but not for unequal sister-strand mitotic recombination, indicating that the recombination of opened vector and fragments involves a classical recombination mechanism.

2.5

Recombination of Multiple Partially Overlapping Fragments

2.5.1

Two Partially Overlapping Fragments

In order to increase the exchange of parent mutations by the in vivo gene shuffling method, recombination of two fragments and one gapped vector was attempted (Fig. 4). Surprisingly, the recovery of the lipase gene is very efficient, as can be seen in Table 2. In contrast, the last five rows in Table 2 show that the opened vectors, when transformed into S. cerevisiae alone or with only one fragment not covering the whole gap, give very few colonies.

The first row is with wild-type fragments and gives, as expected, 100% of active lipase colonies. The second row is with two fragments each containing a frameshift. The PCR331 fragment has the frameshift located where a wild-type fragment does not cover it, and therefore gives close to 0% of active lipase colonies, as expected. The same is the case for rows 3 and 6. The precise location of the frameshift mutations can be deduced from the restriction sites shown in Fig. 1 and is also described in Ref. 7. In row 4, fragment PCR386 contains a frameshift, which is overlapped by wild-type sequences in the gapped vector. The frameshift was recombined into less than 10% of the genes, which is lower than the result for one-fragment recombination. In row 5, a high exchange of parent sequences is observed between the two fragments, each containing a frameshift, and the wild-type gapped vector, giving 25% active and 75% inactive lipase colonies. This is probably due to the fact that the fragment PCR321 has the frameshift in the overlap between the two fragments and in the gapped region of the vector. If fragment PCR386 contributes 10% inactive lipase colonies as in row 4, fragment PCR321 accounts for the remaining 65% inactive lipase colonies (75% − 10%); PCR386 therefore gives 35% wild type in the overlap. Row 7 is the opposite of row 4, with the frameshift on the vector and two wild-type fragments, giving an integration of the wild-type fragment into more than 90% of the vectors. Row 8 shows, as in row 5, that the frameshift of PCR321 in the overlap and gap region gives a very high number of inactive variants. In row 9, fragment PCR385 with a frameshift in the vector overlap causes a very high number of inactive lipase colonies. Row 10 gives a rather high number of inactive lipase colonies compared to rows 7 and 4. It is not increased in row 11. Row 12 shows that two frameshifts on the vector give a lower number of active lipase colonies compared to one in row 7.

Figure 4 Overview of the fragments used in the in vivo gene shuffling of two partially overlapping fragments into a gapped vector. The restriction enzyme sites indicate the approximate location of the frameshift mutation in the respective fragment (if not wild type).

Table 2 Recombination Results of Two Partially Overlapping Fragments and a Gapped Vector

Row  Vector + fragment                           Number of colonies  Colonies with lipase activity (%)
1    pJSO37/HindIII-XhoI + PCR319 + PCR327       ~2000               100
2    pJSO37/HindIII-XhoI + PCR321 + PCR331       ~2000               ~0.2
3    pJSO37/HindIII-XhoI + PCR319 + PCR331       ~1500               ~1
4    pJSO37/HindIII-XhoI + PCR319 + PCR386       ~5000               ~90
5    pJSO37/HindIII-XhoI + PCR321 + PCR386       ~5000               ~25
6    Blue 428/HindIII-XhoI + PCR321 + PCR331     400                 0.2
7    Blue 428/HindIII-XhoI + PCR319 + PCR327     ~1500               >95
8    Blue 428/HindIII-XhoI + PCR321 + PCR327     ~150                ~10
9    Blue 428/HindIII-XhoI + PCR327 + PCR385     ~1500               ~10
10   Blue 429/HindIII-XhoI + PCR319 + PCR386     ~400                ~15
11   Blue 429/HindIII-XhoI + PCR321 + PCR386     ~350                ~15
12   Blue 442/HindIII-XhoI + PCR319 + PCR327     ~1500               ~10
13   Blue 428/HindIII-XhoI                       2                   0
14   Blue 429/HindIII-XhoI                       0                   0
15   Blue 442/HindIII-XhoI                       6                   0
16   Blue 428/HindIII-XhoI + PCR331              4                   0
17   Blue 428/HindIII-XhoI + PCR321              2                   0

The vector pJSO37 contains no frameshift mutations, Blue 428 contains a frameshift at the SphI site, Blue 429 at the SpeI site, and Blue 442 contains a frameshift at both the SphI and SpeI sites (see the restriction enzyme site locations in Fig. 1). The contents and locations of frameshift mutations in the PCR fragments are shown in Fig. 4.

2.5.2

Three Partially Overlapping Fragments

The recombination of three partially overlapping fragments into a gapped vector is also surprisingly efficient, as seen in Table 3. The last row, with the vector alone, gives very few colonies. All fragments used are wild type. In the first row in Table 3, there are rather long overlaps between the vector and fragments, but in the middle row, the overlap between PCR353 and PCR355 is only 10 bp long, and it is still very efficiently recombined. This surprising result may be utilized for very easy domain shuffling of even distantly related genes. For example, three different domains from each of 10 different genes can be made as PCR fragments, designed through the primers to have a 10 to 20 bp overlap, recombined together, and subsequently screened for the best combination (10 × 10 × 10 = 1000 possible combinations).

Table 3 Recombination Result of Three Partially Overlapping Wild-Type Fragments and a Gapped Vector

Vector + fragment                                  Number of colonies  Colonies with lipase activity (%)
pJSO37/PvuII-SpeI + PCR353 + PCR354 + PCR367       ~5000               100
pJSO37/PvuII-SpeI + PCR353 + PCR355 + PCR367       ~5000               100
pJSO37/PvuII-SpeI                                  20                  100

PCR353, PCR354, PCR355, and PCR367 are all wild-type fragments. The last row is a control with gapped vector without any fragment.

3

DISCUSSION

Testing the in vivo gene shuffling in S. cerevisiae of two homologous genes demonstrated that at least 20% of the recombined genes were randomly shuffled, making it applicable for directed evolution. The presence of 30% recombinants, with one-half of the wild-type gene recombined into the synthetic gene, can be explained by the most recent models for recombination described in the literature (1). Following these models, the in vivo gene shuffling is initiated after transformation by the double-strand break in the vector, which in this case has been created by restriction enzyme digestion. This is followed by 5′-3′ exonuclease processing, yielding 3′ single-strand tails. These tails pair with the homologous fragment, and repair DNA synthesis is initiated using the fragment DNA as template. Resolution of the intermediates and ligation then complete the recombination process.

In vivo gene shuffling of T. lanuginosa lipase variants yielded improved variants that have been formed by recombination of three or more variants. One example is shown with the amino acid exchanges of the improved variant, recombinant B (Fig. 3). This variant contains amino acid exchanges from three parent sequences, one vector and two fragment parent sequences, demonstrating that more than two parent sequences can be recombined using this method. In addition, this variant contains mutations that are not a result of in vivo recombination, as none of the parent lipase variants contains any of these mutations. Consequently, these mutations are most likely a result of random mutagenesis arising during the preparation of the DNA fragments by PCR amplification. Introduction of this low level of random mutations is often beneficial for finding improved variants by gene shuffling, mimicking the natural evolution of proteins.

Protein engineering of enzymes has gone through a rapid development over the last two decades. Site-directed mutagenesis, based on structure–function predictions, has yielded a large number of improvements in the understanding of enzymes. It is, however, still very difficult to predict all the amino acid exchanges that are needed for strongly improving the performance of an enzyme for a complex application. Random mutagenesis creates a diversity of amino acid exchanges that can be screened for the desired improved properties. The diversity generated by random mutagenesis is, however, of relatively low quality because of the relatively large number of nonfunctional variants generated. In contrast, DNA shuffling of improved or naturally occurring genes gives very high-quality diversity for a desired enzyme activity because the diversity samples mostly functional sequence space and at the same time covers greater sequence space. An example of a comparison between the results from random mutagenesis and DNA shuffling, respectively, is described in Crameri et al. (13).

A powerful in vitro DNA shuffling method, using PCR, has been developed (14,15). This method is more generally applicable than the described in vivo shuffling method, since the shuffled products can be cloned and expressed in any expression host and a more uniform exchange of parent sequences can be obtained. The in vivo shuffling method, however, is a very fast, simple, and easy method if the target enzyme can be expressed and screened in yeast. The method has proven its value by producing an industrially relevant enzyme, which was not identified by intensive site-directed or random mutagenesis approaches alone.

ACKNOWLEDGMENTS

I am very grateful to Anette Holtmann, Kim Borch, Sham Patkar, Marianne Tellersen, Jesper Vind, Allan Svendsen, and other colleagues from Novozymes for their contributions to the presented work.

REFERENCES

1. P Sung, KM Trujillo, SV Komen. Recombination factors of Saccharomyces cerevisiae. Mutat Res 451:257–275, 2000.
2. D Muhlrad, R Hunter, R Parker. A rapid method for localized mutagenesis of yeast genes. Yeast 8:79–82, 1992.
3. DH Jones, AN Riley, SC Winistorfer. Production of a vector to facilitate DNA mutagenesis and recombination. Biotechniques 16:694–701, 1994.
4. E Boel, IB Huge-Jensen. Recombinant Humicola lipase and process for the production of recombinant Humicola lipases. European Patent Application EP 0305216, 1989.
5. E Gormsen. Proceedings of the 3rd World Conference on Detergents. Illinois, USA: AOCS Press, 1993, pp 198–203.
6. JS Okkels. An URA3-promoter deletion in a pYES type vector increases the expression level of a fungal lipase in Saccharomyces cerevisiae. Ann NY Acad Sci 782:202–207, 1996.
7. JS Okkels. Method for preparing polypeptide variants. International Patent Application WO97/07205. European Patent EP843725.
8. A Svendsen, IG Clausen, SA Patkar, E Gormsen. Lipase variants. International Patent Application WO 9205249, 1992.
9. JS Okkels, A Svendsen, SA Patkar, K Borch. Protein engineering of microbial lipases of industrial interest. In: FX Malcata, ed. Engineering of/with Lipases. NATO Adv Stud Inst Ser E Appl Sci 317:203–218, 1995.
10. A Svendsen, SA Patkar, E Gormsen, IG Clausen, JS Okkels, M Thellersen. Lipase variants. United States Patent US589213.
11. JS Okkels, A Svendsen, K Borch, M Thellersen, DA Petersen, SA Patkar, J Royer, T Kretzschmar. Novel lipolytic enzymes. International Patent Application WO97/07202.
12. UH Mortensen, C Bendixen, I Sunjevaric, R Rothstein. DNA strand annealing is promoted by the yeast Rad52 protein. Proc Natl Acad Sci USA 93:10729–10734, 1996.
13. A Crameri, S-A Raillard, E Bermudez, WPC Stemmer. DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature 391:288–291, 1998.
14. WPC Stemmer. DNA shuffling by random fragmentation and reassembly: in vitro recombination for molecular evolution. Proc Natl Acad Sci USA 91:10747–10751, 1994.
15. WPC Stemmer. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370:389–391, 1994.

19

Effective DNA Shuffling Methods for Enzyme Evolution

Osamu Kagami and Sang-Ho Baik
Marine Biotechnology Institute Co., Ltd., Kamaishi, Japan

Shigeaki Harayama
National Institute of Technology and Evaluation, Tokyo, Japan

1 INTRODUCTION

The ultimate purpose of protein engineering is to produce proteins having desired functions. Traditional methods of protein engineering have involved random mutagenesis by using chemicals or radiation. These methods were used by molecular biologists in demonstrating how enzymes can be tailored for applications. However, in classical mutagenesis, most mutations are harmful or neutral, and for successful mutagenesis, it is important to avoid introducing too many mutations into target genes. Therefore optimum tuning of the mutation rate is required in addition to screening large numbers of mutants (1). Another drawback of traditional mutagenesis methods is that point mutations can only introduce a limited range of amino acid substitutions because of the nature of the codon table, and the expected frequency of a change from one amino acid to another is quite different for different types of amino acid substitutions. This limitation narrows the sequence space of mutant proteins that traditional mutagenesis methods can create.

The rational design of novel proteins is another approach: selected amino acid residues of a protein are modified to develop mutant proteins with the desired properties. Knowledge of the three-dimensional structure of at least one member of a target protein family is generally a prerequisite for the selection of target amino acid residues for site-directed mutagenesis (2). However, many attempts to alter the properties of proteins by this approach have failed because the introduced amino acid substitutions had unexpected effects on the structure and function of the target enzymes. These failures might have resulted, to some extent, from difficulties in predicting the functional and structural properties of modified proteins because of an incomplete understanding of the underlying mechanisms required to enhance the desired enzyme properties.

Evolutionary protein engineering, which involves recursive random mutagenesis and selection, is a rapidly growing tool to modify proteins for biotechnological applications. It is a more developed form of the traditional mutagenesis methods. Evolutionary protein engineering usually involves the construction of large mutant protein libraries, which are screened for desired catalytic properties over multiple generations. DNA shuffling was proposed as an effective search strategy for evolutionary protein engineering (3,4), and it provides methods to introduce a wide variety of mutations.
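The codon-table constraint noted in this section can be made concrete with a short calculation. The following Python sketch, which assumes the standard genetic code, lists the amino acids that can be reached from a given codon by changing a single base. The function name is ours and is used only for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reachable_by_point_mutation(codon):
    """Amino acids reachable from `codon` by changing exactly one base.

    Stop codons and the original amino acid are excluded.
    """
    original = CODON_TABLE[codon]
    reachable = set()
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            aa = CODON_TABLE[mutant]
            if aa != "*" and aa != original:
                reachable.add(aa)
    return reachable


for codon in ("TGG", "ATG"):
    hits = reachable_by_point_mutation(codon)
    print(codon, CODON_TABLE[codon], sorted(hits), f"{len(hits)} of 19")
```

For the single tryptophan codon (TGG), for example, only 5 of the 19 other amino acids are accessible by one base change, which illustrates why point mutagenesis samples a narrow part of sequence space.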

2 DNA SHUFFLING

In vitro evolution by random mutagenesis can generate several useful mutations in the mutagenized population after one round of the mutagenesis cycle. However, only those mutations existing in the selected clones are inherited by the next generation. Moreover, selected mutants may have neutral or slightly harmful mutations in addition to the useful ones, and these deleterious mutations may also accumulate through generations (5). DNA shuffling overcomes this limitation (6,7). It involves the random fragmentation of several mutant DNAs and their subsequent reassembly by polymerase chain reaction (PCR) into full-length molecules (Fig. 1). The DNA segments of several of the improved mutants produced by random mutagenesis are mixed and randomly fragmented by deoxyribonuclease I (DNase I). This mixture of fragments is then subjected to PCR without adding any primers. Instead of DNase I fragmentation, random-priming PCR can be used to generate DNA fragments (8). The fragments are recombined randomly in vitro in this PCR to form a chimeric gene library. In addition, novel mutations caused by the PCR itself can be introduced by tuning the PCR conditions in this step. The chimeric gene library is finally subjected to a further PCR, with primers corresponding to the sequences at the 5′ and 3′ ends of the enzyme genes, to amplify full-length enzyme genes.

Figure 1 DNA shuffling. DNase I randomly fragments genes possessing various useful mutations. The resulting fragments are mixed and recombined by PCR without primers. Novel mutations can be introduced by this process. Finally, PCR with primers complementary to the 5′ and 3′ ends of the genes amplifies full-length chimeric genes. Unfilled circles indicate useful mutations; unfilled stars indicate newly introduced mutations; unfilled triangles indicate harmful mutations.

These genetically diverse gene products can be screened for desirable or improved functions, and the improved clones selected from the first-round screening procedure are reshuffled to accumulate the beneficial mutations and remove the deleterious ones. Backcrossing with the wild-type gene is effective for removing the neutral mutations. This method enables multiple useful mutations to be accumulated in the recombined mutants in each generation and thus provides faster evolution than methods that rely on single amino acid substitutions. DNA shuffling has been applied successfully to obtain a number of improved enzymes (9–13).
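The recombination step described in this section can be pictured with a toy simulation. The sketch below is a deliberate simplification and not a model of the PCR chemistry: parents are aligned strings of equal length, a chimera is built by copying from one parent and switching to another at randomly chosen crossover points, and the scoring scheme (uppercase letters for useful mutations, lowercase for harmful ones, "-" for wild-type positions) is an assumption made only for illustration.

```python
import random


def make_chimera(parents, n_crossovers, rng):
    """Copy from one parent and switch template at random crossover points."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    cuts = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + cuts + [length]
    template = rng.randrange(len(parents))
    pieces = []
    for start, end in zip(bounds, bounds[1:]):
        pieces.append(parents[template][start:end])
        # Template switch: continue copying from a different parent.
        template = rng.choice([i for i in range(len(parents)) if i != template])
    return "".join(pieces)


def make_library(parents, size, rng, max_crossovers=3):
    """Build a library of chimeras with one to max_crossovers crossovers each."""
    return [
        make_chimera(parents, rng.randint(1, max_crossovers), rng)
        for _ in range(size)
    ]


def score(gene):
    """Toy fitness: +1 per useful (uppercase), -1 per harmful (lowercase) mutation."""
    return sum(c.isupper() for c in gene) - sum(c.islower() for c in gene)


rng = random.Random(0)
parents = ["--B------d--", "-----B--d---"]  # hypothetical improved mutants
library = make_library(parents, 50, rng)
best = max(library, key=score)
print(best, score(best))
```

Screening such a library for the highest score mimics, in a very reduced form, how reshuffling can combine useful mutations from different clones while leaving harmful ones behind.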

3 FAMILY SHUFFLING

Another approach to create diversity is shuffling a family of homologous genes, instead of a set of mutant genes derived from a single gene; this is called "family shuffling" (14; Fig. 2). When a single sequence is used as the starting material, diversity is generated by random mutagenesis and is limited to point mutations around the original sequence. Furthermore, the low yield of beneficial mutations from single-point mutations (15) results in a relatively slow evolution rate of the desired function in such experiments. In naturally occurring homologous sequences, the deleterious variants have already been removed by the evolution process, and these sequences are assumed to be preenriched for functional diversity. Therefore one would expect more diverse variants to be created by family shuffling.

Figure 2 Family shuffling. Naturally occurring homologous genes (family genes) are applied to DNA shuffling to create a library of chimeric genes. The desired clones are subsequently selected by an appropriate screening procedure.

A potential problem of family shuffling is the strong tendency for reconstitution of the parental structures by PCR-based reassembly methods. A low efficiency of mosaic structure formation in family shuffling with deoxyribonuclease I has thus frequently been reported (16,17). The low yield of chimeric genes may not be a problem when effective screening methods are available. However, it is assumed to increase the difficulty of selecting clones with improved properties (e.g., thermal resistance or improved relative activity), as indicated by a relatively low signal/noise (S/N) ratio; in this case, signal corresponds to mutant activity and noise corresponds to wild-type activity.

Several techniques have been reported to improve this low crossover rate between the fragmented genes. Random chimeragenesis on transient templates (RACHITT; 18) is one of the techniques based on DNA shuffling, in which single-stranded parental gene fragments are annealed onto a full-length, single-stranded template. A several-fold higher crossover rate than that of previous methods has been reported for RACHITT in the evolution of dibenzothiophene monooxygenase (18). Combinatorial libraries enhanced by recombination in yeast (CLERY; 19) is a modified family shuffling method combined with in vivo homologous recombination in yeast (20,21). Multistep hybridization of two family sequences was conducted, before recombinant PCR, at a temperature decreasing from high to very low, which may favor low-stringency hybridization of the family sequences. The PCR products thus prepared were subsequently recombined in vivo with a cloning vector (in yeast). This in vivo recombination step may produce a second round of DNA shuffling between similar but not identical PCR products transformed into a single yeast cell. This method has yielded libraries with a low content of parental structures. However, a relatively low proportion of functionally active clones (12%) was observed, possibly because they incorporated point mutations. We also developed simple methods for family shuffling to achieve more effective recombination and utilized them for acquiring improved properties of enzymes.

4 SINGLE-STRANDED DNA SHUFFLING

It was assumed that the re-formation of the parental gene structures in family shuffling was caused by annealing of DNA fragments derived from the same gene (homoduplex formation; Fig. 3), whose probability was assumed to be much higher than that of heteroduplex formation. To reduce this homoduplex formation, a family shuffling method with single-stranded DNAs (ssDNAs) was developed (15). DNase I can cleave not only double-stranded DNAs (dsDNAs), but also ssDNAs (22). Accordingly, ssDNAs can be used as materials for family shuffling. In this method, ssDNAs of two homologous genes are prepared (Fig. 4), one being the coding strand of one gene, and the other being the noncoding strand of the other gene. By using ssDNAs of opposite direction for the two parents, high recombination efficiency between the two parental sequences can be expected, because only heteromeric pairs are hybridized in the first round of the PCR. ssDNAs can be prepared by cloning each DNA into phagemids, or by asymmetric PCR in which only one of the two primers (5′- or 3′-side) is used.

Figure 3 Potential problem with family shuffling. Homoduplex molecules may be formed in the annealing process at a higher probability than heteroduplex molecules by family shuffling with genes showing relatively low homology. The overall frequency of recombination in the products may therefore be low.

Figure 4 Family shuffling using ssDNAs. ssDNAs of opposite direction are prepared and used for family shuffling to prevent homoduplex formation. After fragmenting ssDNAs by DNase I, these fragments are applied to a PCR without primers. Only heteroduplex molecules are formed in the first cycle of PCR.

This method has been applied to the shuffling of two catechol 2,3-dioxygenase (C23O, EC 1.13.11.2) genes (23), xylE and nahH, which share 80% identity in their nucleotide sequences (15). ssDNAs of nahH and xylE were prepared by cloning the genes in the phagemid vectors pBluescript KS(+) and pBluescript SK(+), respectively. Escherichia coli TOP10F′ cells were then transformed with these phagemids. To produce phage particles containing ssDNAs with the opposite directions, each transformant was infected by the M13KO7 helper phage (24). Each ssDNA was then randomly fragmented with DNase I, and fragments in the size range of 40–100 bases were mixed and subjected to family shuffling. The reconstructed genes were reintroduced into pBluescript SK(+). After transforming the E. coli TOP10F′ cells, about 60% of the colonies grown on the plates showed C23O activity. When 50 randomly selected clones exhibiting C23O activity were analyzed for the nucleotide sequences of their C23O genes, seven of them (14%) were chimeric, the others being either parental genes (40 clones) or their point mutants (three clones). This chimera formation rate is much higher than the rate attained via the double-stranded method, which yielded less than 1% chimeric genes (15).

Although this method is applicable to only two parents at a time, three or more genes can also be dealt with. The homologous genes are first divided into sets of two different genes. If the number of homologous genes is odd, the one gene with the worst property is discarded. Each pair of genes is then subjected to shuffling with ssDNAs, and the shuffled libraries are screened for desirable properties. The improved clones selected from different libraries are next divided into sets of two and subjected to shuffling with ssDNAs again. After the process is repeated, one shuffled library finally remains and is subjected to the final screening.
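The pairing scheme for three or more genes described at the end of this section can be written down as a small scheduling routine. The following Python sketch is only an outline of the bookkeeping: the gene names, the scores, and the `shuffle_and_screen` placeholder (which here simply returns a label and assigns it the better of the two parental scores) are hypothetical and stand in for the actual laboratory steps.

```python
def pairing_rounds(genes, score, shuffle_and_screen):
    """Plan pairwise ssDNA shuffling rounds until one library remains.

    genes: list of starting genes (any hashable labels)
    score: function returning the property value of a gene or selected clone
    shuffle_and_screen: function (a, b) -> best clone from the a x b library
    Returns (schedule, final), where schedule lists the pairs used per round.
    """
    pool = list(genes)
    schedule = []
    while len(pool) > 1:
        if len(pool) % 2 == 1:
            # Odd number of candidates: discard the one with the worst property.
            pool.remove(min(pool, key=score))
        pairs = [(pool[i], pool[i + 1]) for i in range(0, len(pool), 2)]
        schedule.append(pairs)
        pool = [shuffle_and_screen(a, b) for a, b in pairs]
    return schedule, pool[0]


# Hypothetical example with five homologous genes and made-up scores.
scores = {"g1": 5.0, "g2": 2.0, "g3": 7.0, "g4": 1.0, "g5": 4.0}


def score(gene):
    return scores[gene]


def shuffle_and_screen(a, b):
    child = f"({a}x{b})"
    scores[child] = max(scores[a], scores[b])  # placeholder for real screening
    return child


schedule, final = pairing_rounds(["g1", "g2", "g3", "g4", "g5"], score, shuffle_and_screen)
for round_number, pairs in enumerate(schedule, start=1):
    print(round_number, pairs)
print(final)
```

With five starting genes, the gene with the lowest score is dropped, two pairs are shuffled in the first round, and the two selected clones are shuffled together in the second round.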

5 FAMILY SHUFFLING WITH RESTRICTION ENZYME-DRIVEN FRAGMENTATION

The efficiency of chimera formation by shuffling with ssDNAs was clearly improved, but was not particularly high. Moreover, additional steps for ssDNA purification are required. To obtain a higher probability of recombination, another family shuffling method, which uses restriction enzymes instead of DNase I, was developed (15; Fig. 5). When different family genes are digested by different restriction enzymes and subjected to family shuffling, annealing of the digested DNA fragments produces homoduplexes at a high frequency, but significant DNA elongation occurs only on the heteroduplex molecules. This method was used to shuffle nahH and xylE (14).

Figure 5 Family shuffling by using DNA fragments generated by restriction enzyme digestion. Each parental DNA is fragmented by a different set of restriction enzymes. Only heteroduplex molecules with space for elongation are amplified in the PCR reassembly process.

Any combination of restriction enzymes can be used for shuffling any homologous genes, X and Y. However, if the cleavage site on X is close to the cleavage site on Y, or if the end of a DNA fragment of X generated by restriction enzyme digestion is not strongly homologous to the corresponding region of Y, annealing between the X and Y DNAs in this region may not be effective, and hardly any DNA elongation may occur beyond this region. Because the problematic regions for DNA elongation are assumed to be different in different sets of enzyme-cleaved fragments, several sets of restriction enzyme digests were used to avoid these problems. nahH and xylE were each digested by three different restriction enzymes and then shuffled by a three-step PCR (Fig. 6). The chimeric genes obtained by this shuffling were cloned into expression plasmid pZErO-2, and a chimeric library was constructed by transforming E. coli TOP10F′. The recombinant plasmids isolated from 10 randomly selected transformants exhibiting C23O activity were then analyzed. This method yielded nahH–xylE hybrids at a frequency of 100% (14).

Figure 6 Example of family shuffling by using restriction fragments. DpnI, Fnu4HI, or ScrFI was used to cleave nahH (fragments A, B, and C, respectively). Similarly, BslI, MnlI, or TaqI was used to cleave xylE (fragments D, E, and F, respectively). Subsequently, nahH fragments A, B, and C, and xylE fragments D, E, and F were mixed as indicated in the table. The resulting nine different pairs of the mixture were then each subjected to the first PCR without a primer. The products from these nine reactions were next mixed in a single tube, and the second PCR was carried out under the same conditions. Full-length chimeric DNAs were finally amplified from the second PCR products as templates (third PCR) with the 3′- and 5′-oligonucleotide primers.

Because the digestion sites of restriction enzymes are not random, the diversity of the shuffled genes was expected to be lower than that obtained by random fragmentation with DNase I. To test whether the diversity of the library is improved by increasing the number of restriction enzymes, xylE and nahH were digested by three, four, or six different restriction enzymes and then subjected to family shuffling. Ten clones exhibiting C23O activity were randomly selected from each of the libraries made by digestion with three, four, or six restriction enzymes, and their nucleotide sequences were determined. In the chimeric gene library made by using three restriction enzymes, the average number of crossovers in a single clone was 4.4 ± 1.7. In the libraries constructed by using four and six restriction enzymes, the average numbers of crossovers in a single clone were similar, being 4.8 ± 1.1 and 3.8 ± 1.4, respectively. However, the same crossover sites recurred more frequently in independent clones when fewer restriction enzymes were used (Fig. 7). For example, among the 10 clones constructed by using three restriction enzymes, one particular crossover site was shared by six different clones, while another crossover site was shared by five different clones. In contrast, each crossover site was shared by fewer than three clones when the library was constructed by using six restriction enzymes.

Figure 7 Estimate of the diversity in chimera libraries made by family shuffling with three (filled circles), four (unfilled squares), and six (unfilled circles) different restriction enzymes. In the case of shuffling with three restriction enzymes, the nahH fragment cleaved by DdeI+AvaI and the xylE fragment cleaved by NciI were shuffled. In the case of shuffling with four restriction enzymes, four digests, nahH/HhaI (nahH DNA digested by HhaI), nahH/BstNI, xylE/NciI, and xylE/Sau3AI, were mixed in the four different combinations, nahH/HhaI+xylE/NciI, nahH/HhaI+xylE/Sau3AI, nahH/BstNI+xylE/NciI, and nahH/BstNI+xylE/Sau3AI, before being subjected to family shuffling. In the case of shuffling with six restriction enzymes, the digestion, restriction enzymes, and PCR conditions were the same as those shown in Fig. 6. Ten clones were randomly selected from each of the family shuffling libraries, and the nucleotide sequences of the clones were determined. The crossover points in each clone were analyzed, and those shared by more than one clone were identified.
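The mixing schemes described in the captions of Figs. 6 and 7 are simple Cartesian products of the digests of one parent with the digests of the other. The following Python sketch enumerates them; the enzyme names are taken from the captions, whereas the function name and the "gene/enzyme" labels are our own shorthand.

```python
from itertools import product


def pairwise_mixes(x_digests, y_digests):
    """Every combination of one digest of gene X with one digest of gene Y."""
    return list(product(x_digests, y_digests))


# Six-enzyme scheme (Fig. 6): three nahH digests x three xylE digests.
six_enzyme = pairwise_mixes(
    ["nahH/DpnI", "nahH/Fnu4HI", "nahH/ScrFI"],
    ["xylE/BslI", "xylE/MnlI", "xylE/TaqI"],
)

# Four-enzyme scheme (Fig. 7): two nahH digests x two xylE digests.
four_enzyme = pairwise_mixes(
    ["nahH/HhaI", "nahH/BstNI"],
    ["xylE/NciI", "xylE/Sau3AI"],
)

print(len(six_enzyme), "first-PCR mixtures for the six-enzyme scheme")
for mix in four_enzyme:
    print(" + ".join(mix))
```

Each mixture is reassembled separately in the first PCR, so that a poorly annealing region in one pair of digests can be bridged by fragments from another pair once the products are pooled.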

6 ISOLATION OF THERMOSTABLE C23OS BY EFFECTIVE FAMILY SHUFFLING METHODS

The C23O chimeric library constructed by the ssDNA-based or restriction enzyme-based shuffling method was screened for thermal stability. Colonies exhibiting C23O activity were transferred to membrane filters and then incubated at 65°C for 10 min (14,15). Those clones showing residual C23O activity after the heat treatment were selected. After one round of shuffling and screening, about 1.3% of the 750 clones produced by ssDNA-based family shuffling exhibited greater thermal stability than that of the wild types. On the other hand, approximately 15% of the more than 2000 clones produced by restriction enzyme-based family shuffling exhibited greater thermal stability than wild-type XylE and NahH. The most thermally stable chimeric enzyme, hybrid 6637, was obtained after two rounds of restriction enzyme-based shuffling and screening. Although the Km and kcat values for hybrid 6637 (6.5 μM and 430 sec⁻¹, respectively) are almost the same as those for NahH (4.0 μM and 440 sec⁻¹, respectively), the half-life of 6637 at 50°C (70.0 min) was much longer than that of NahH (2.7 min) or XylE (5.4 min). In spite of its greatly improved thermal stability, most of the sequence of 6637 was derived from NahH, and only a small C-terminal sequence (7 out of 307 amino acids) came from XylE (14). According to the three-dimensional structure of XylE (25), the C-terminal region is located in the monomer–monomer interface of the homotetramer. Interactions occur between the C-terminal region of one monomer and the middle regions of the adjacent monomer. Therefore, we assume that the enhanced thermostability of 6637 is caused by the increased stability of subunit–subunit interactions in the homotetramer.
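The numbers reported in this section can be compared more directly with a few lines of arithmetic. The Python sketch below uses only the half-lives and kinetic constants quoted in the text; the assumption of simple first-order (single-exponential) thermal inactivation is ours and is made only to convert half-lives into rate constants and residual activities.

```python
import math

# Values quoted in the text (half-life at 50 deg C in min; Km in uM; kcat in 1/s).
half_life_min = {"hybrid 6637": 70.0, "NahH": 2.7, "XylE": 5.4}
kinetics = {"hybrid 6637": (6.5, 430.0), "NahH": (4.0, 440.0)}


def inactivation_rate(t_half):
    """First-order inactivation constant (1/min), assuming exponential decay."""
    return math.log(2) / t_half


def residual_activity(t_half, minutes):
    """Fraction of activity left after `minutes`, assuming exponential decay."""
    return 0.5 ** (minutes / t_half)


def fold_stabilization(variant, reference):
    """Ratio of half-lives of two enzymes."""
    return half_life_min[variant] / half_life_min[reference]


def catalytic_efficiency(name):
    """kcat/Km in 1/(s * uM)."""
    km, kcat = kinetics[name]
    return kcat / km


for name, t_half in half_life_min.items():
    print(f"{name}: k = {inactivation_rate(t_half):.3f} per min, "
          f"residual after 10 min = {residual_activity(t_half, 10):.2f}")
print(f"6637 vs NahH: {fold_stabilization('hybrid 6637', 'NahH'):.1f}-fold longer half-life")
print(f"6637 vs XylE: {fold_stabilization('hybrid 6637', 'XylE'):.1f}-fold longer half-life")
print(f"kcat/Km, 6637: {catalytic_efficiency('hybrid 6637'):.0f}; "
      f"NahH: {catalytic_efficiency('NahH'):.0f}")
```

On these figures the half-life of hybrid 6637 is roughly 26 times that of NahH and 13 times that of XylE, while kcat/Km stays within a factor of two of the NahH value.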

7 CASSETTE PCR

Family shuffling is a powerful method for improving enzymes, although it requires two or more related genes. However, molecular cloning of new family genes requires many steps and laborious work. To retrieve homologous genes from a mixed bacterial culture, a method called "cassette PCR" was developed (26). This method involves the following: (1) isolation of DNA from the mixed bacterial culture, (2) PCR amplification of central segments of genes from the isolated DNA, and (3) a second PCR to join the central segments with the 5′ and 3′ regions of the gene of a known enzyme (Fig. 8).

Figure 8 Cassette PCR. Phenol- or crude oil-degrading bacteria are enriched from environmental samples (soil, seawater, etc.). Diverse bacteria possessing different catechol 2,3-dioxygenase (C23O) genes may be enriched in the mixed culture. Genomic DNAs are prepared from the mixed culture, and the central part of the C23O genes is PCR amplified (first PCR) from the genomic DNAs with the degenerate primer set (CR and CF) designed from consensus sequences of the C23O genes. The 5′ and 3′ arms are PCR amplified from a known C23O gene (nahH) with corresponding primer sets (5R and 5F, and 3R and 3F). The central C23O gene fragments and the 5′ and 3′ arms are then joined in the second PCR to compose a library of hybrid C23O genes.

To enrich the bacteria possessing C23O genes, bacteria in soil or seawater were cultured in a medium containing either phenol or crude oil as the carbon source, and DNA templates for PCR amplification were prepared from bacterial cells grown in the mixed culture. A set of degenerate primers (CR and CF) was designed from the amino acid sequences conserved among the 1.2.A subfamily C23Os (KEYTGKW and TIYFFDP) to amplify the central C23O gene segments. The central gene sequences of approximately 0.5 kb in size were amplified with the primer set, and the amplified fragments were subsequently joined to both the 5′ and 3′ arms of nahH to compose full-length hybrid C23O genes. This method enabled divergent C23O sequences to be readily isolated. More than 90% of the E. coli clones harboring hybrid plasmids expressed C23O activity, and 20 clones exhibiting C23O activity obtained by using DNAs from phenol-degrading bacteria were isolated and their nucleotide sequences determined. The nucleotide sequences of the central C23O gene segments could be classified into three types, P3, P8, and P16, according to their deduced amino acid sequences. The amino acid sequence of the central C23O gene segment encoded by P3 was identical to that of XylE in Pseudomonas putida HS1 (27), and that encoded by clone P8 showed 95% identity with the XylE sequence of P. putida mt-2 (28). However, the amino acid sequence encoded by clone P16 did not show any strong similarity to the other C23O sequences. Of these hybrid C23Os, P3 and P16 exhibited greater thermostability than NahH. These hybrid enzymes also exhibited stronger affinity for all the examined substrates (catechol, 4-methylcatechol, 3-methylcatechol, and 4-chlorocatechol) than NahH, although their Vmax values were smaller than that of NahH (26). Therefore cassette PCR allowed not only the isolation of divergent functional C23O genes, but also the acquisition of enzymes with divergent properties, without the need to isolate individual bacteria or clone their genes. Cassette PCR combined with family shuffling would be a powerful tool for enzyme evolution.
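To give a feeling for what designing degenerate primers from the conserved peptides KEYTGKW and TIYFFDP entails, the following Python sketch back-translates each peptide into IUPAC-coded DNA under the standard genetic code and counts how many distinct sequences the degenerate oligonucleotide represents. This is a generic illustration only: the actual CR and CF primer sequences are not given in this chapter, and a real primer for the downstream site would be the reverse complement and might use fewer codons.

```python
# IUPAC-coded codons (standard genetic code) for the residues in the two peptides.
IUPAC_CODON = {
    "K": "AAR", "E": "GAR", "Y": "TAY", "T": "ACN", "G": "GGN",
    "W": "TGG", "I": "ATH", "F": "TTY", "D": "GAY", "P": "CCN",
}
# Number of bases represented by each IUPAC symbol used above.
IUPAC_SIZE = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "H": 3, "N": 4}


def back_translate(peptide):
    """Degenerate coding-strand DNA for a peptide (one-letter amino acid codes)."""
    return "".join(IUPAC_CODON[aa] for aa in peptide)


def degeneracy(sequence):
    """Number of distinct nucleotide sequences represented by an IUPAC string."""
    count = 1
    for symbol in sequence:
        count *= IUPAC_SIZE[symbol]
    return count


for peptide in ("KEYTGKW", "TIYFFDP"):
    dna = back_translate(peptide)
    print(peptide, dna, degeneracy(dna))
```

The first peptide corresponds to 256 possible coding sequences and the second to 768, which shows why short, highly conserved motifs rich in low-degeneracy residues are preferred as primer sites.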

8 CONCLUSION

Recursive genetic recombination represented by DNA shuffling has become the most powerful method for enzyme evolution. One of the greatest merits of this method is that no information about the three-dimensional structure or substrate-binding sites of the enzyme is required (16). Additionally, important information about the structures and functions of enzymes can be obtained from an analysis of the isolated enzymes created by these methods without three-dimensional structural information. In practice, it is difficult to predict which specific amino acid residues are involved in thermal stability from only the primary or tertiary structure of an enzyme. On the other hand, it is possible to select thermally stable mutants from a wide variety of mutations and estimate the mechanism for thermal stability by an analysis of the primary structures of these mutants. Indeed, thermostable C23O mutants obtained from the effective family shuffling methods described here have provided important information about the stability mechanism for C23O.

Optimization of the DNA shuffling processes is required for the successful in vitro evolution of target proteins. Furthermore, the probability of desired mutations should be estimated to identify the required size of mutant libraries. However, very little information is usually available about the composition of a shuffled library. Joern et al. reported an effective method for analyzing a shuffled gene library by probe hybridization in a macroarray format (29). By characterizing hundreds of shuffled genes encoding dioxygenases with this method, they showed a notable bias in the shuffling reaction that lowers the diversity of the mutant library. As expected, recombination occurred in the regions of high sequence identity. In addition, the chimeric gene library was found to show a bias to exclude the sequences from a low-identity parent. Such bias toward recombination in high-identity sequences cannot be avoided with the current family shuffling methods, as they are based on homologous recombination.

Whole-genome shuffling was recently developed for phenotypic improvement in bacteria (30,31). This method extends in vitro DNA shuffling to entire genomes in vivo. Recursive protoplast fusion of bacterial cells derived from multiple strains enabled the rapid improvement of phenotypes: tylosin production by Streptomyces fradiae and acid tolerance of Lactobacillus strains were improved at a significantly higher rate than by any classical method for strain improvement. Although family shuffling is assumed to be more suitable for the improvement of a particular protein, genome shuffling is undoubtedly a useful technique for metabolic engineering.

It is also important to design effective screening methods for the production of improved enzymes. As described in the next part of this book, Part III: Screening, effective screening and selection strategies are a growing field. Effective DNA shuffling methods combined with a high-throughput screening method will provide new applications for both the industrial and therapeutic optimization of proteins.
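The remark above that the probability of desired mutations should be estimated to identify the required library size can be illustrated with a standard sampling argument. If desired clones occur independently at frequency f, the chance of finding at least one among N screened clones is 1 − (1 − f)^N, so N = ln(1 − P) / ln(1 − f) clones are needed for a chosen confidence P. The Python sketch below applies this to the frequencies of thermostable clones quoted in Section 6; treating those frequencies as sampling probabilities, and the 95% confidence level, are assumptions made only for illustration.

```python
import math


def chance_of_at_least_one(frequency, n_clones):
    """Probability that n_clones contain at least one clone of the given frequency."""
    return 1 - (1 - frequency) ** n_clones


def clones_needed(frequency, confidence=0.95):
    """Smallest number of clones giving at least `confidence` of one hit."""
    if not 0 < frequency < 1 or not 0 < confidence < 1:
        raise ValueError("frequency and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - frequency))


for label, frequency in (("restriction enzyme-based", 0.15), ("ssDNA-based", 0.013)):
    n = clones_needed(frequency)
    print(f"{label}: screen {n} clones "
          f"(chance of a hit = {chance_of_at_least_one(frequency, n):.3f})")
```

The same relation shows how quickly the screening burden grows as the desired variants become rarer, which is why the recombination efficiency of the shuffling method and the throughput of the screen have to be considered together.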

ACKNOWLEDGMENT

This work was supported by the New Energy and Industrial Technology Development Organization (NEDO).

REFERENCES

1. JC Moore, FH Arnold. Directed evolution of a para-nitrobenzyl esterase for aqueous–organic solvents. Nat Biotechnol 14:458–467, 1996.
2. JL Cleland, CS Craik. Protein Engineering: Principles and Practice. New York: Wiley-Liss, 1996.
3. WP Stemmer. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370:389–391, 1994.
4. WP Stemmer. DNA shuffling by random fragmentation and reassembly. Proc Natl Acad Sci USA 91:10747–10751, 1994.
5. HJ Muller. The relation of recombination to mutational advance. Mutat Res 1:2–9, 1964.
6. Z Shao, H Zhao, L Giver, FH Arnold. Random-priming in vitro recombination: an effective tool for directed evolution. Nucleic Acids Res 26:681–683, 1998.
7. A Crameri, EA Whitehorn, E Tate, WPC Stemmer. Improved green fluorescent protein by molecular evolution using DNA shuffling. Nat Biotechnol 14:315–319, 1996.
8. A Crameri, G Dawes, E Rodriguez Jr, S Silver, WPC Stemmer. Molecular evolution of an arsenate detoxification pathway by DNA shuffling. Nat Biotechnol 15:436–438, 1997.
9. JC Moore, HM Jin, O Kuchner, FH Arnold. Strategies for the in vitro evolution of protein function: enzyme evolution by random recombination of improved sequences. J Mol Biol 272:336–347, 1997.
10. T Yano, S Oue, H Kagamiyama. Directed evolution of an aspartate aminotransferase with new substrate specificities. Proc Natl Acad Sci USA 95:5511–5515, 1998.
11. JH Zhang, G Dawes, WPC Stemmer. Directed evolution of a fucosidase from a galactosidase by DNA shuffling and screening. Proc Natl Acad Sci USA 94:4504–4509, 1997.
12. A Crameri, SA Raillard, E Bermudez, WPC Stemmer. DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature 391:288–291, 1998.
13. H Zhao, FH Arnold. Optimization of DNA shuffling for high fidelity recombination. Nucleic Acids Res 25:1307–1308, 1997.
14. M Kikuchi, K Ohnishi, S Harayama. Novel family shuffling methods for the in vitro evolution of enzymes. Gene 236:159–167, 1999.
15. M Kikuchi, K Ohnishi, S Harayama. An effective family shuffling method using single-stranded DNA. Gene 243:133–137, 2000.
16. FH Arnold. When blind is better: protein design by evolution. Nat Biotechnol 16:617–618, 1998.
17. SW Michnick, FH Arnold. "Itching" for new strategies in protein engineering. Nat Biotechnol 17:1159–1160, 1999.
18. WM Coco, WE Levinson, MJ Crist, HJ Hektor, A Darzins, PT Squires, DJ Monticello. DNA shuffling method for generating highly recombined genes and evolved enzymes. Nat Biotechnol 19:354–359, 2001.
19. V Abecassis, D Pompon, G Truan. High efficiency family shuffling based on multi-step PCR and in vivo DNA recombination in yeast: statistical and functional analysis of a combinatorial library between human cytochrome P450 1A1 and 1A2. Nucleic Acids Res 28:E88, 2000.
20. C Mezard, D Pompon, A Nicolas. Recombination between similar but not identical DNA sequences during yeast transformation occurs within short stretches of identity. Cell 70:659–670, 1992.
21. D Pompon, A Nicolas. Protein engineering by cDNA recombination in yeasts: shuffling of mammalian cytochrome P-450 functions. Gene 83:15–24, 1989.
22. E Melgar, DA Goldthwait. Deoxyribonucleic acid nucleases: II. The effects of metals on the mechanism of action of deoxyribonuclease I. J Biol Chem 243:4409–4418, 1968.
23. P Cerdan, M Rekik, S Harayama. Substrate specificity differences between two catechol 2,3-dioxygenases encoded by the TOL and NAH plasmids from Pseudomonas putida. Eur J Biochem 229:113–118, 1995.
24. J Vieira, J Messing. Production of single-stranded plasmid DNA. Methods Enzymol 153:3–11, 1987.
25. A Kita, S Kita, I Fujisawa, K Inaka, T Ishida, K Horiike, M Nozaki, K Miki. An archetypical extradiol-cleaving catecholic dioxygenase: the crystal structure of catechol 2,3-dioxygenase (metapyrocatechase) from Pseudomonas putida mt-2. Struct Fold Des 7:25–34, 1999.
26. A Okuta, K Ohnishi, S Harayama. PCR isolation of catechol 2,3-dioxygenase gene fragments from environmental samples and their assembly into functional genes. Gene 212:221–228, 1998.
27. RC Benjamin, JA Voss, DA Kunz. Nucleotide sequence of xylE from the TOL pDK1 plasmid and structural comparison with isofunctional catechol 2,3-dioxygenase genes from TOL pWW0 and NAH7. J Bacteriol 173:2724–2728, 1991.
28. C Nakai, H Kagamiyama, M Nozaki, T Nakamura, S Inoue, Y Ebina, A Nakazawa. Complete nucleotide sequence of the metapyrocatechase gene on the TOL plasmid of Pseudomonas putida mt-2. J Biol Chem 258:2923–2928, 1983.
29. JM Joern, P Meinhold, FH Arnold. Analysis of shuffled gene libraries. J Mol Biol 316:643–656, 2002.
30. Y Zhang, K Perry, VA Vinci, K Powell, WPC Stemmer, SB del Cardayre. Genome shuffling leads to rapid phenotypic improvement in bacteria. Nature 415:644–646, 2002.
31. R Patnaik, S Louie, V Gavrilovic, K Perry, WPC Stemmer, CM Ryan, S del Cardayre. Genome shuffling of Lactobacillus for improved acid tolerance. Nat Biotechnol 20:707–712, 2002.

20 Exploring the Functional Space of Combinatorial Mutant Libraries for the Directed Evolution of Novel Enzyme Activities Bengt Mannervik, Lars O. Hansson, and William G. Bardsley Uppsala University Uppsala, Sweden

1 PRINCIPLES OF PROTEIN EVOLUTION BY REDESIGN

1.1 Functional Plasticity of Proteins

All life processes are dependent on the functional and structural versatility of proteins. At the molecular level, structural scaffolds, selective transport of molecules, sensing and signaling, mechanochemical locomotion, energy transduction, etc. bear evidence of the functional plasticity of proteins. Essentially all biotransformations of chemical species are catalyzed by enzymes, which govern the fluxes of reactants in the myriad of simultaneous chemical reactions in living cells. The enzymatic functions are effected with exquisite specificities and optimized catalytic efficiencies. This chapter is focused on the optimization of directed enzyme evolution based on the creation of mutant populations and identification of variants with novel activities.

1.2 Protein Molecules Regarded as Composites of Structural Modules

Proteins can be considered as being composed of structural building blocks at all hierarchical levels. The primary structure is a polypeptide chain in which linear segments of peptides are covalently linked together and where the smallest segment is a single amino acid residue. At the level of secondary structure α-helices, loops, and β-strands are modular units, and at the tertiary level modules in the form of structural domains can often be observed. In oligomeric proteins the subunits are the obvious building blocks of quaternary structure. From the viewpoint of molecular engineering, the design and assembly of proteins can be regarded as combinatorial chemistry in three dimensions. The structures of the soluble glutathione transferases (GSTs) illustrate how recombination of building blocks at all structural levels gives rise to novel functional properties (1).

1.3 Evolutionary Aspects of Structural Diversity

Studies of naturally occurring proteins suggest that tertiary structures have only a limited number of folds (2,3) and that the majority of the proteins arise by redesign of existing molecules. Gene duplication followed by mutagenesis and selection of proteins with valuable properties is a major pathway for natural protein evolution. The mutations may involve single amino acid residues or segments of primary structure. Even changes of tertiary or quaternary structures are based on alterations of the primary structure. One or a few single-point mutations may cause a dramatic increase in catalytic activity (4–6), but for acquisition of functions that differ substantially from the parent protein more substantial changes of structure are generally required (7). In natural protein evolution exon shuffling and gene conversion have likely contributed to new combinations of existing segments of primary structure (8). Similar recombination of sequences has great potential also in the engineering of proteins for novel functions. Fusion points linking segments of partners can be chosen rationally or be obtained by stochastic combinations of fragmented parental sequences.

For directed evolution in vitro the concept of a subpopulation as the evolving unit reflects the natural process. Variants of the evolving species show a distribution of properties, and this ensemble, called "quasi-species" (9), is the progenitor of the next generation. The quasi-species by virtue of its distribution of properties has a greater potential for further evolution than the "optimal" individual in the current generation. A broader genetic background also reduces the risk of dead ends in the evolutionary pathway, which are similar to the detrimental effects that arise by inbreeding of higher organisms. It would appear that tailoring a protein for entirely novel functions requires a more varied structural ancestry than does the optimization of already existing properties.

The ability of the quasi-species to evolve is governed by its structural properties and functional potential. In general, an enzyme is composed of a binding pocket, which affords specificity in the interaction with the substrate and stabilization of the transition state. In most cases the active site is furnished with functional chemical groups that promote the chemical transformation of the substrate. An important feature is also that the protein structure shields the reactants in the active site from unfavorable interactions in the surrounding medium, which could otherwise jeopardize specificity and catalytic efficiency of the catalyzed reaction. From the evolutionary perspective it is therefore often desirable to conserve an overall structure that can accommodate the necessary functional properties. This is in accord with the paradigm of redesign as a major route for protein evolution.

1.4 Strategy for Directed Evolution

An already existing protein framework with adequate stability and solubility is a good starting point for the evolution of enzymes with new functions. The generation of mutations in an evolved scaﬀold is a robust approach that minimizes the risks of unfavorable folding and low stability of the resulting structure. Repeated use of existing structural modules appears to be a powerful strategy for directed evolution in protein engineering mimicking the natural evolution of proteins. In some cases it is possible to engineer desired catalytic properties by rational redesign (1,5,6,10), but in the absence of accurate methods for prediction of function stochastic methods are generally more eﬀective. For the evolution of enzymes with altered substrate speciﬁcities recursive DNA shuﬄing (11,12) is often a successful approach. The shuﬄing generates a library of variant structures and may be based on a single DNA sequence or on several similar sequences, i.e., family shuﬄing (13). From the resulting recombinant sequences those that express improved properties are chosen for further rounds of DNA shuﬄing among the variants. This procedure is reiterated until a satisfactory result has been obtained. DNA shuﬄing has similarities to genetic recombination processes that are implied in the natural evolution of protein functions. An important feature is that a large proportion of the nearest-neighbor residues are conserved, which is expected to favor the folding and assembly of the recombined polypeptide chains. For example, a joint N-cap and hydrophobic staple motif is important for the stability and folding of some proteins such as GSTs (14) and is fully conserved in their evolution (15). DNA shuﬄing is associated with a high probability of maintaining these essential structural elements.
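The recursive cycle described above (recombine the parents, screen the library, carry the improved variants into the next round) can be illustrated with a toy simulation. This is only a sketch under stated assumptions: each variant is reduced to a string of 11 segment letters, recombination picks every segment at random from one of two parents, and the `fitness` function with its `TARGET` string is a made-up stand-in for an activity assay. None of these names or numbers comes from the chapter.

```python
import random

TARGET = "ABBABABBABA"  # hypothetical optimum, for illustration only


def fitness(variant):
    """Toy screen: number of segments that match the hypothetical optimum."""
    return sum(a == b for a, b in zip(variant, TARGET))


def shuffle_pair(parent_a, parent_b, rng):
    """Build a chimera by taking each segment from one of the two parents."""
    return "".join(rng.choice(pair) for pair in zip(parent_a, parent_b))


def evolve(parents, score, rounds=5, library_size=50, keep=5, seed=0):
    """Recursive shuffling: recombine, screen, keep several top variants, repeat."""
    rng = random.Random(seed)
    pool = list(parents)
    for _ in range(rounds):
        library = [shuffle_pair(*rng.sample(pool, 2), rng)
                   for _ in range(library_size)]
        # Rank parents and offspring together; ties are broken by the string
        # itself so that the result is reproducible.
        ranked = sorted(set(pool + library),
                        key=lambda s: (score(s), s), reverse=True)
        pool = ranked[:keep]  # a small population, not only the single best
    return pool


if __name__ == "__main__":
    survivors = evolve(["A" * 11, "B" * 11], fitness)
    print(survivors[0], fitness(survivors[0]))
```

Keeping several survivors per round, rather than the single best clone, mirrors the quasi-species argument made in Section 1.3; the value of `keep` is an arbitrary choice in this sketch.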

In the selection of variants with a high evolutionary potential for the next generation, it is desirable to identify members of the total population sharing the targeted property but also expressing the diversity characterizing the quasi-species. The boundary conditions for such a subpopulation may be diﬃcult to deﬁne, but a practical option is to focus on a manageable subgroup deviating from the main body of the total population.

2 COMBINATORIAL PROTEIN CHEMISTRY IN THE EVOLUTION OF GLUTATHIONE TRANSFERASES

2.1 Natural Evolution of Soluble Glutathione Transferases

Glutathione transferases have emerged as a superfamily of detoxication enzymes with broad but diverse substrate specificities. In biological tissues they are involved in the inactivation of genotoxic electrophiles arising by oxidative processes (16). The enzymes have been divided into classes (17). GSTs consist of two similar subunits, each composed of two distinct domains (18). The N-terminal domain is thioredoxin-like with a β-sheet flanked by α-helices. The C-terminal domain is formed by helices, and a portion of helix 4 and the C-terminal polypeptide are the main contributors to the H-site, which accommodates the electrophilic substrate (1). The thiol substrate common to the GSTs is glutathione (GSH), and this second reactant is bound to a well-conserved G-site, primarily formed by the N-terminal thioredoxin-like domain. The structural conservation of the G-site has a functional counterpart in the high specificity for the thiol substrate GSH, and the contrasting variability of the H-site reflects the diversity among the GSTs in their various selectivities for the electrophilic substrates (16,18). From the evolutionary viewpoint it appears possible that a glutathione-binding domain similar to thioredoxin has been combined with a helix bundle that affords a binding site for the second substrate. A similar module appears in several other glutathione-linked proteins such as glutaredoxin and selenium-dependent glutathione peroxidase (cf. Ref. 19). The available evidence suggests that gene duplications accompanied by mutations and recombinations of the variant GST sequences have led to the emergence of the different classes and the further multiplication of GST class members (16,18,20). In the human genome the different GST classes are represented on different chromosomes (21), demonstrating that divergence of the classes was an early event in evolution. The evolution of different class members was a later event. Five Mu class members on human chromosome 1 bear evidence of gene conversion as a contributor to diversification of the class (22). Nature amply demonstrates the functional versatility of the GSTs and their ability to evolve and acquire new functions, suggesting that GSTs have great potential for further evolution in vivo as well as by protein engineering in vitro.

2.2 Functional Versatility of GSTs

Like many other detoxication enzymes, GSTs have broad substrate specificities. As an ensemble they can catalyze the reaction of glutathione with most electrophilic compounds presented to biological systems (23–25). Nevertheless, the individual GSTs show markedly different substrate selectivities, and high catalytic efficiency for a given substrate may be restricted to a single isoenzyme. For example, the inactivation of aminochrome, an oxidation product of the neurotransmitter dopamine, is catalyzed by GST M2-2 with an activity two orders of magnitude higher than that of GST M1-1, a Mu class enzyme with 84% sequence identity (26). In comparison with GSTs tested from other classes, GST M2-2 is superior by even further orders of magnitude. Similarly, the Alpha class GST A4-4 has evolved selectively as an efficient catalyst for the inactivation of 4-hydroxynonenal, a toxic product of biological lipid peroxidation (27). Protein engineering has demonstrated that a small number of site-directed mutations may be sufficient to install a targeted activity in a suitable GST framework (1,4,10). Design of functional chimeric GSTs composed of segments from different isoenzymes has also been accomplished (4,28–31).

2.3 Creation of a Combinatorial Mu Class GST M1/M2 Library by DNA Shuffling

Mu class GSTs were investigated in order to probe structure–activity relationships and to lay a foundation for directed evolution. Mosaic sequences were generated by family shuffling (13) of DNA encoding GST M1-1 and GST M2-2 (32). The homologous DNA sequences are 89% identical, and at the protein level 34 out of 217 amino acid residues differ in the corresponding primary structures (Fig. 1). Of the 34 variant amino acids a limited number are responsible for the divergent substrate selectivities of the two enzymes. The constructed GST M1/M2 library heterologously expressed in Escherichia coli contained a high proportion of functional chimeric enzymes (32). Approximately 90% of the randomly isolated chimeras displayed enzymatic activity. Sequence analysis of isolated clones demonstrated that all the analyzed enzymes consisted of segments from both of the parental GSTs and that the amino acid sequences could be divided into 11 segments of different sizes with overlapping ends exactly matching those of the neighboring building blocks (Fig. 1). This GST M1/M2 library of chimeras was constructed such that all sequences had an N-terminal segment derived from GST M1-1, which had been optimized for high-level expression in E. coli (33). In addition to the stochastic combination of parental segments, single random point mutations were found in most sequences analyzed. These mutations add further diversity to the library.

Figure 1 Divergent amino acids in the primary structures of human GST M1-1, GST M2-2, and their occurrence in the clones of the GST M1/M2 mutant library. Above, the 34 variant amino acid residues out of the total 217 in the sequences are aligned. Below, the 11 segments of the primary structures found to be exchangeable between GST M1-1 and GST M2-2 by DNA shuffling are shown for the clones sequenced in the various clusters (cf. Fig. 5). The white segments derive from GST M1-1 and the black segments from GST M2-2. The sizes of the segments differ considerably, but their ends are not accurately defined because crossing-over occurs in the stretches of identity of the two parental DNA sequences. Segments 7 and 11 contain variant residues that contribute to substrate binding in the H-site.
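The numbers quoted in this section lend themselves to a short worked calculation. The sketch below checks that 34 differing residues out of 217 correspond to the 84% protein sequence identity mentioned in Section 2.2, and counts the segment-level chimeras that 11 exchangeable segments allow. Two simplifying assumptions are made here and are not statements from the chapter: the fixed N-terminal segment is taken to be segment 1, and point mutations are ignored, so the count is only a segment-level figure that also includes the all-M1 parent.

```python
from itertools import product

TOTAL_RESIDUES = 217    # length of the aligned sequences (from the text)
VARIANT_RESIDUES = 34   # positions that differ between GST M1-1 and GST M2-2
N_SEGMENTS = 11         # exchangeable segments observed in the library

# Fraction of identical positions at the protein level.
identity = (TOTAL_RESIDUES - VARIANT_RESIDUES) / TOTAL_RESIDUES


def segment_chimeras(n_segments=N_SEGMENTS, fixed_first="M1"):
    """Enumerate parental origins per segment with the first segment fixed.

    Assumption: the N-terminal segment (taken as segment 1) always derives
    from GST M1-1; every other segment may derive from either parent.
    """
    rest = product(("M1", "M2"), repeat=n_segments - 1)
    return [(fixed_first,) + combo for combo in rest]


if __name__ == "__main__":
    print(f"protein identity: {identity:.1%}")
    print(f"segment-level chimeras: {len(segment_chimeras())}")
```

Because crossovers occur within stretches of identity and most clones also carry point mutations, the real sequence diversity of the library is larger than this segment-level count suggests.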

2.4 Probing the Functional Space of the GST M1/M2 Mutant Library by the Use of Alternative Substrates

Clones of the library of chimeric Mu class GSTs obtained by DNA shuffling were randomly picked from colonies on bacterial culture plates. Glutathione transferase activities were measured in bacterial lysates using three alternative electrophilic substrates: 1-chloro-2,4-dinitrobenzene (CDNB), 2-cyano-1,3-dimethyl-1-nitrosoguanidine (cyanoDMNG), and aminochrome (Fig. 2). The activities of the 81 clones analyzed spanned several orders of magnitude with all three substrates (Fig. 3). Wild-type GST M2-2 is approximately 200 times as active as wild-type GST M1-1 with cyanoDMNG and aminochrome, but only twice as active with CDNB. The activities measured in the bacterial lysates ranged both below and above those of the wild-type GSTs (Fig. 3). It should be noted that the three substrates used undergo different types of chemical transformation. Aminochrome is conjugated in an addition reaction, whereas CDNB and cyanoDMNG are involved in displacement reactions with different chemistries. Thus, the functional space of the library is explored not only with respect to shape and reactivity of the electrophilic substrate but also with respect to the type of chemical reaction catalyzed. The library has previously been used to investigate the structural basis for high activity with cyanoDMNG and aminochrome (34,35). For directed evolution it is important to identify variants that functionally differ from the parental enzymes. This can be achieved through multivariate analysis of their distinctive functional properties.

Figure 2 Chemical reactions catalyzed by GSTs. Aminochrome is conjugated in an addition reaction, whereas cyanoDMNG and CDNB undergo nucleophilic displacement reactions. In all three reactions the nucleophilic sulfur of glutathione (GSH) reacts with an electrophilic center in the second substrate. GST M2-2 is highly active with all three substrates, but GST M1-1 has similar high activity with CDNB only.

2.5 Principal Component Analysis of Multidimensional Catalytic Activities in the GST M1/M2 Mutant Library

Each GST variant can be represented by a point in multidimensional factor space. In the present case the factors are the three activities measured with the different substrates in lysates of the bacterial clones. Two-dimensional projections of the factor space are shown in Fig. 3 for the 81 samples of the population. For comparison, points corresponding to bacterial colonies expressing wild-type GST M1-1 and GST M2-2 are shown. As expected, the samples of the two parental clones are close in the dimension of CDNB but separated significantly in the dimensions of cyanoDMNG and aminochrome (Fig. 3). In order to see if the mutants could be divided into distinct subgroups, a principal component analysis (36) of the data was carried out (Fig. 4). Such an analysis was originally used in the overall classification of the GSTs (17). In the present case some of the mutants appeared in the vicinity of the positions of GST M1-1 and some in the vicinity of the GST M2-2 positions. However, a more distinct subgrouping than that obtained in Fig. 3 could not be made by principal component analysis in the absence of additional information.

Figure 3 Functional space of the GST M1/M2 mutant library in projections of catalytic activities with alternative electrophilic substrates. Each enzyme variant is represented by a point in three dimensions corresponding to the expressed activities in lysates of bacterial colonies. The different dimensions are displayed pairwise. Open triangles mark colonies expressing wild-type GST M2-2 and open squares mark colonies expressing GST M1-1. The scatter of the points representing the wild-type enzymes indicates the experimental variance of the data. Activities are given in μmol min⁻¹ per ml bacterial lysate and are presented on logarithmic scales.

Figure 4 Principal component analysis (Ref. 36) of the activities of the variant GSTs with alternative substrates presented in Fig. 3. The plot shows a projection of the data on the plane spanned by the first principal components PC1 and PC2. In the analysis, based on a log transformation of the data, PC1 accounts for 81% and PC2 15% of the total variance. Small circles represent individual clones of the GST M1/M2 mutant library, open triangles wild-type GST M2-2, and open squares wild-type GST M1-1.
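A principal component analysis of log-transformed activities, as used for Fig. 4, can be sketched in a few lines. The code below is a minimal illustration and not the authors' procedure: the activity triples in `ACTIVITIES` are invented, the covariance matrix is used without further scaling, and the eigenvectors are found by simple power iteration with deflation so that no external library is needed. It therefore does not reproduce the 81% and 15% reported for the real data.

```python
import math

# Made-up activities (CDNB, cyanoDMNG, aminochrome); not data from the chapter.
ACTIVITIES = [
    (12.0, 0.05, 0.02),
    (25.0, 9.0, 4.5),
    (8.0, 0.5, 0.1),
    (30.0, 12.0, 6.0),
    (0.4, 0.01, 0.005),
    (15.0, 2.0, 0.9),
]


def log_center(rows):
    """Log10-transform the activities and subtract each column mean."""
    logged = [[math.log10(v) for v in row] for row in rows]
    means = [sum(col) / len(logged) for col in zip(*logged)]
    return [[v - m for v, m in zip(row, means)] for row in logged]


def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iterations=1000):
    """Dominant eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    d = len(matrix)
    vec = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j]
                for i in range(d) for j in range(d))
    return value, vec


def principal_components(rows, k=2):
    """Return (explained variance fraction, loading vector) for the first k PCs."""
    cov = covariance(log_center(rows))
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))
    result = []
    for _ in range(k):
        value, vec = top_eigenpair(cov)
        result.append((value / total, vec))
        # Deflation: remove the component just found before the next search.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return result


if __name__ == "__main__":
    for index, (fraction, loadings) in enumerate(principal_components(ACTIVITIES), 1):
        print(f"PC{index}: {fraction:.1%} of variance, loadings {loadings}")
```

The fraction of variance captured by the leading components gives a rough indication of how many independent activity dimensions the chosen substrates actually probe.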

2.6 Cluster Analysis of the GST M1/M2 Mutant Library

The clustering of the 81 mutants in the three-dimensional factor space was mapped in a dendrogram in order to further investigate the functional relationships among the variants (Fig. 5). In the analysis, different transformations and scalings of the data had to be considered and a distance measure chosen (37). Among the several alternatives, the conditions underlying Fig. 5 appeared to be the most reliable and robust method for grouping of the GST variants. The lines in the dendrogram showing distances indicate functional differences among the different enzyme variants and those of the subgroups. Nodes with low or moderate distances (Bray–Curtis distances less than 35 in Fig. 5) are strongly influenced by the experimental variance of the activity measurements and cannot be used for a reliable subgrouping of the data. Using this boundary condition, all of the variant GSTs analyzed can be consistently grouped into six clusters, except for seven outliers that fall into different clusters depending on the transformation of data. Fig. 6 shows a mapping of the six clusters onto the scattered data in Fig. 3 and reveals coherent subgroups in different projections of the three-dimensional substrate–activity space.

Figure 5 Cluster analysis of the mutants from the GST M1/M2 library based on their activities with alternative substrates. The distances between the points in Fig. 3 were determined as the Bray–Curtis dissimilarity (38); calculations were based on the square root transformation of activity values (37). Clusters are formed by clones separated less than the estimated experimental variance (Bray–Curtis distance less than 35) and are marked by square brackets. Diamonds identify clones that were not consistently associated with a particular subgroup in alternative analyses. Numbers denote clones subjected to DNA sequence analysis (cf. Fig. 1) and M1-1 and M2-2 represent samples of colonies of the wild-type GSTs.

Two discrete clusters contain the wild-type GST M1-1 and GST M2-2, respectively. The primary structures of the mutants in these clusters are
composed of segments predominantly derived from the corresponding parental structures (Fig. 1). A third cluster (D) is formed by mutants generally characterized by very low catalytic activities with the three substrates. In this group two clones were subjected to sequence analysis; mutant 76 is mutated and C-terminally truncated with part of the H-site missing, and mutant 58 has a L54P substitution which may affect subunit–subunit interactions and glutathione binding (Fig. 1). The remaining three clusters contain mutants with different combinations of high or low activities with the three distinguishing substrates. Of the 11 structural segments only numbers 7 and 11 contain H-site residues that distinguish GST M1-1 and GST M2-2. Even if other structural differences do have functional consequences, it is noteworthy that all sequences determined for members of cluster A contain the same combination of segment 7 from GST M2-2 and segment 11 from GST M1-1. Similarly, both sequences in cluster B have the GST M1-1 version of segment 7. A more extensive analysis involving a larger number of sequences may provide a deeper understanding of the structural basis of substrate selectivity.

For further directed evolution a subgroup can be chosen as multiple parents and its members subjected to family shuffling, or to other generators of stochastic DNA mutations, in order to optimize their functional properties. For example, if the goal were to evolve GSTs with high activities toward all three substrates, CDNB, cyanoDMNG, and aminochrome, the variants clustering with GST M2-2 could be selected. For the development of variants with suppressed CDNB activity cluster A could be chosen.

Figure 6 Segregation of the activity data into subgroups based on the cluster analysis. The data presented in the three panels of Fig. 3 are replotted with different symbols based on the clusters in the dendrogram (Fig. 5). Plus signs (+) mark the clones of the "GST M2-2 like" cluster and crosses (×) mark those of the "GST M1-1 like" cluster. Clusters A (triangles), B (squares), C (circles), and D (minus signs) are distinguishable. Diamonds mark the clones that did not show a consistent distribution among the different clusters when different transformations and scalings were used in the data analysis.

In large-scale screening, enzyme activities are commonly measured in crude bacterial lysates, and the values obtained will depend on the expression level of the protein being assayed. Thus high catalytic activity of a given clone can reflect both high intrinsic catalytic efficiency of the enzyme and high-level expression of the protein. For practical applications of protein engineering it may be desirable to select clones by both of these criteria. In order to make the analysis independent of the expression level, additional input is required to identify enzymes with high specific activities. The amount of the protein could be determined by immunoassay using specific antibodies or by suitable tagging of the protein. The most direct approach is to make activity measurements after purification of the enzymes. However, in many cases purification is tedious and not suitable for large-scale screening. In the numerical analysis of the experimental data it is also possible to scale the data, in order to make the characterization of the variant enzymes independent of their expression levels. The ratio of activities with two substrates is an alternative enzyme parameter that measures substrate selectivity independently of protein concentration. The GST M1/M2 library has been constructed such that the expression level should not vary substantially (32).
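The grouping procedure of this section (square-root transformation, Bray–Curtis dissimilarity, and a cutoff of 35) can be made concrete with a small sketch. Several points are assumptions rather than details given in the chapter: the dissimilarity is expressed in percent so that the cutoff of 35 is meaningful, clones are merged by single linkage (any pair closer than the cutoff joins the same group), and the clone names and activity values in `CLONES` are invented.

```python
import math

# Invented activity profiles (CDNB, cyanoDMNG, aminochrome); names are hypothetical.
CLONES = {
    "M1-like-a": (12.0, 0.05, 0.02),
    "M1-like-b": (10.0, 0.06, 0.03),
    "M2-like-a": (25.0, 9.0, 4.5),
    "M2-like-b": (30.0, 12.0, 6.0),
    "inactive": (0.02, 0.001, 0.001),
}


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity in percent for two non-negative vectors."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return 100.0 * numerator / denominator if denominator else 0.0


def cluster(activities, threshold=35.0):
    """Group clones whose square-root-transformed profiles are linked by
    pairwise Bray-Curtis dissimilarities below the threshold (single linkage,
    an assumption of this sketch)."""
    names = list(activities)
    profiles = {n: [math.sqrt(v) for v in activities[n]] for n in names}
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if bray_curtis(profiles[a], profiles[b]) < threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    for group in cluster(CLONES):
        print(group)
```

A full dendrogram, as in Fig. 5, requires a hierarchical clustering routine with an explicit linkage rule; the threshold grouping shown here only recovers the clusters at one cutoff.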

3 CLUSTER ANALYSIS APPLIED TO THE DIRECTED EVOLUTION OF MOLECULAR FUNCTIONS

In the stepwise redesign of molecules for new and optimized functions, the most successful strategy is generally to sample not only the best individual of the current generation for further evolution but also to add some of the contributions of variants with suboptimal properties. This approach lowers the probability of nonproductive combinations in the evolutionary pathway and is similar to strategies practiced in the breeding of plants and animals. The concept of a broad parentage for the next generation has been theoretically analyzed for the quasi-species (9). It has also been emphasized that the degree of diversity has important boundary conditions. If the rate of random mutations per generation exceeds a critical value (the error threshold) the evolutionary process collapses and does not give functional oﬀspring. Cluster analysis is an empirical approach to the evolutionary design of new molecular functions. In a given generation of functionally variant enzymes, a cluster or a group of related clusters will be the ensemble of parental structures that has the highest probability of giving improved oﬀspring by suitable recombinations. This subset of mutants is a representation of the molecular quasi-species (9). Multivariate analysis is currently used in the directed evolution of a mutant library of Theta class GSTs for new and optimized activities. By selecting diﬀerent clusters of mutants it is possible to choose alternative evolutionary pathways. For example, the Theta class library of GST variants (39) may be engineered toward improved activities with epoxide substrates in one direction and evolved for activity with alkyl halides in an alternative direction. Principal component analysis may help to determine the number of substrates necessary to deﬁne the quasi-species, i.e., estimate the dimensionality of the activities deﬁning the factor space. 
The cluster analysis applied to evolution of protein function presented here is not dependent on structural information about the evolving species. Furthermore, it is not limited to enzyme catalysis but could be applied to other properties that can be subjected to quantitative analysis (40). Other applications include the directed evolution of nucleic acids for catalytic and binding functions (41–43). Combinatorial approaches to small-molecule design (44,45), such as drug development, also benefit from similar cluster analysis.

ACKNOWLEDGMENTS

The authors' work has been supported by grants from the Swedish Research Council, the Swedish Cancer Society, and the Carl Trygger Foundation. We thank Ms. Anna-Karin Larsson and Ms. Malena Andersson for carefully reading and commenting on the manuscript.

REFERENCES

1. LO Nilsson, A Gustafsson, B Mannervik. Redesign of substrate-selectivity determining modules of glutathione transferase A1-1 installs high catalytic efficiency with toxic alkenal products of lipid peroxidation. Proc Natl Acad Sci USA 97:9408–9412, 2000.
2. JM Thornton, CA Orengo, AE Todd, FMG Pearl. Protein folds, functions and evolution. J Mol Biol 293:333–342, 1999.
3. YI Wolf, NV Grishin, EV Koonin. Estimating the number of protein folds and families from complete genome data. J Mol Biol 299:897–905, 2000.
4. R Björnestedt, S Tardioli, B Mannervik. The high activity of rat glutathione transferase 8-8 with alkene substrates is dependent on a glycine residue in the active site. J Biol Chem 270:29705–29709, 1995.
5. JJ Onuffer, JF Kirsch. Redesign of the substrate specificity of Escherichia coli aspartate aminotransferase to that of Escherichia coli tyrosine aminotransferase by homology modeling and site-directed mutagenesis. Protein Sci 4:1750–1757, 1995.
6. TM Penning, JM Jez. Enzyme redesign. Chem Rev 101:3027–3046, 2001.
7. L Wang, PG Schultz. Expanding the genetic code. Chem Commun 1–11, 2002.
8. W Gilbert. Why genes in pieces? Nature 271:501, 1978.
9. M Eigen, J McCaskill, P Schuster. Molecular quasi-species. J Phys Chem 92:6881–6891, 1988.
10. PL Pettersson, A-S Johansson, B Mannervik. Transmutation of human glutathione transferase A2-2 with peroxidase activity into an efficient steroid isomerase. J Biol Chem 277:30019–30022, 2002.
11. WPC Stemmer. DNA shuffling by random fragmentation and reassembly: in vitro recombination for molecular evolution. Proc Natl Acad Sci USA 91:10747–10751, 1994.
12. WPC Stemmer. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370:389–391, 1994.
13. A Crameri, S-A Raillard, E Bermudez, WPC Stemmer. DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature 391:288–291, 1998.
14. A Aceto, B Dragani, S Melino, N Allocati, M Masulli, C Di Ilio, R Petruzzeli. Identification of an N-capping box that affects the α6-helix propensity in glutathione S-transferase superfamily proteins: a role for an invariant aspartic residue. Biochem J 322:229–234, 1997.
15. R Cocco, G Stenberg, B Dragani, D Rossi Principe, D Paludi, B Mannervik, A Aceto. The folding and stability of human Alpha class glutathione transferase A1-1 depend on distinct roles of a conserved N-capping box and hydrophobic staple motif. J Biol Chem 276:32177–32183, 2001.
16. B Mannervik. The isoenzymes of glutathione transferase. Adv Enzymol Rel Areas Mol Biol 57:357–417, 1985.
17. B Mannervik, P Ålin, C Guthenberg, H Jensson, MK Tahir, M Warholm, H Jörnvall. Identification of three classes of cytosolic glutathione transferase common to several mammalian species: correlation between structural data and enzymatic properties. Proc Natl Acad Sci USA 82:7202–7206, 1985.
18. RN Armstrong. Structure, catalytic mechanism, and evolution of glutathione transferases. Chem Res Toxicol 10:2–18, 1997.
19. B Mannervik, I Carlberg, K Larson. Glutathione: general review of mechanism of action. In: D Dolphin, R Poulson, O Avramovic, eds. Coenzymes and Cofactors. Vol. 3A. New York: John Wiley and Sons, 1989, pp 475–516.
20. SE Pemble, JB Taylor. An evolutionary perspective on glutathione transferases inferred from class-theta glutathione transferase cDNA sequences. Biochem J 287:957–963, 1992.
21. A-S Johansson, B Mannervik. Interindividual variability of glutathione transferase expression. In: GM Pacifici, O Pelkonen, eds. Interindividual Variability in Human Drug Metabolism. London: Taylor & Francis, 2001, pp 460–519.
22. WR Pearson, WR Vorachek, S Xu, R Berger, I Hart, D Vannais, D Patterson. Identification of class-mu glutathione transferase genes GSTM1–GSTM5 on human chromosome 1p13. Am J Hum Genet 53:220–233, 1993.
23. LF Chasseaud. The role of glutathione and glutathione S-transferases in the metabolism of chemical carcinogens and other electrophilic agents. Adv Cancer Res 29:175–274, 1979.
24. WB Jakoby, WH Habig. Glutathione transferases. In: WB Jakoby, ed. Enzymatic Basis of Detoxication. Vol. 2. New York: Academic Press, 1980, pp 63–94.
25. B Mannervik, UH Danielson. Glutathione transferases: structure and catalytic activity. CRC Crit Rev Biochem 23:283–337, 1988.
26. J Segura-Aguilar, S Baez, M Widersten, CJ Welch, B Mannervik. Human class Mu glutathione transferases, in particular isoenzyme M2-2, catalyze detoxication of the dopamine metabolite aminochrome. J Biol Chem 272:5727–5731, 1997.
27. I Hubatsch, M Ridderström, B Mannervik. Human glutathione transferase A4-4: an Alpha class enzyme with high catalytic efficiency in the conjugation of 4-hydroxynonenal and other genotoxic products of lipid peroxidation. Biochem J 330:175–179, 1998.
28. R Björnestedt, M Widersten, PG Board, B Mannervik. Design of two chimaeric human–rat class Alpha glutathione transferases for probing the contribution of C-terminal segments of protein structure to the catalytic properties. Biochem J 282:505–510, 1992.
29. PH Zhang, SX Liu, S Shan, XH Ji, GL Gilliland, RN Armstrong. Modular mutagenesis of exons 1, 2, and 8 of a glutathione-S-transferase from the Mu class: mechanistic and structural consequences for chimeras of isoenzyme 3-3. Biochemistry 31:10185–10193, 1992.
30. KP Van Ness, TM Buetler, DL Eaton. Enzymatic characteristics of chimeric mYc/rYc1 glutathione S-transferases. Cancer Res 54:4573–4575, 1994.
31. A Pal, YJ Gu, SS Pan, XH Ji, SV Singh. C-terminal region amino acid substitutions contribute to catalytic differences between murine class Alpha glutathione transferases mGSTA1-1 and mGSTA2-2 toward anti-diol epoxide isomers of benzo[c]phenanthrene. Biochemistry 40:7047–7053, 2001.
32. LO Hansson, R Bolton-Grob, T Massoud, B Mannervik. Evolution of differential substrate specificities in Mu class glutathione transferases probed by DNA shuffling. J Mol Biol 287:265–276, 1999.
33. M Widersten, M Huang, B Mannervik. Optimized heterologous expression of the polymorphic human glutathione transferase M1-1 based on silent mutations in the corresponding cDNA. Protein Expr Purif 7:367–372, 1996.
34. LO Hansson, R Bolton-Grob, M Widersten, B Mannervik. Structural determinants in domain II of human glutathione transferase M2-2 govern the characteristic activities with aminochrome, 2-cyano-1,3-dimethyl-1-nitrosoguanidine and 1,2-dichloro-4-nitrobenzene. Protein Sci 8:2742–2750, 1999.
35. LO Hansson, B Mannervik. Use of chimeras generated by DNA shuffling: probing structure–function relationships among glutathione transferases. Methods Enzymol 328:463–477, 2000.
36. WJ Krzanowski. Principles of Multivariate Analysis. A User's Perspective. Rev. ed. New York: Oxford University Press, 2000, pp 48–85.
37. http://www.simfit.man.ac.uk (version: 5.4, release 4.022, WG Bardsley, University of Manchester).
38. JR Bray, JT Curtis. An ordination of the upland forest communities of Southern Wisconsin. Ecol Monogr 27:324–349, 1957.
39. K Broo, A-K Larsson, P Jemth, B Mannervik. An ensemble of Theta class glutathione transferases with novel catalytic properties generated by stochastic recombination of fragments of two mammalian enzymes. J Mol Biol 318:59–70, 2002.
40. AD Keefe, JW Szostak. Functional proteins from a random-sequence library. Nature 410:715–718, 2001.
41. L Gold, B Polisky, O Uhlenbeck, M Yarus. Diversity of oligonucleotide functions. Ann Rev Biochem 64:763–797, 1995.
42. GF Joyce. Nucleic acid enzymes: playing with a fuller deck. Proc Natl Acad Sci USA 95:5845–5847, 1998.
43. DS Wilson, JW Szostak. In vitro selection of functional nucleic acids. Ann Rev Biochem 68:611–647, 1999.
44. RA Houghten. Parallel array and mixture-based synthetic combinatorial chemistry: tools for the next millennium. Ann Rev Pharmacol Toxicol 40:273–282, 2000.
45. DA Erlanson, AC Braisted, DR Raphael, M Randal, RM Stroud, EM Gordon, JA Wells. Site-directed ligand discovery. Proc Natl Acad Sci USA 97:9367–9372, 2000.

21 Modifying the Character of an Enzyme by Producing Chimeric Enzymes: Chimeric β-Glucosidases as an Illustration

Kiyoshi Hayashi, Bong Jo Kim, Kshamata Goyal, Satya Singh, Jong-Deog Kim, Yeon-Kye Kim, Satoru Nirasawa, and Motomitsu Kitaoka
National Food Research Institute, Tsukuba, Japan

Enzymes have become essential to modern daily life; they are employed in fields as diverse as food processing and drug synthesis, and they are also found in some washing detergents. Ideally, the enzyme employed exactly matches the characteristics required for its application, although in practice this is quite difficult to achieve. Several methods aimed at changing the character of an enzyme have been developed (1,2). In recent years, protein engineering has become an increasingly important tool in the development of novel hybrid enzymes with useful catalytic functions (3). The construction of chimeric enzymes has proved to be one of the most sensitive methods for studying structure–function relationships in the parent proteins (4–7). In addition, the construction of chimeric enzymes has facilitated progress


toward the production of enzymes with improved catalytic activities and thermal stabilities. Several enzymes with improved properties and thermal stabilities have already been produced by gene shuffling experiments, and these enzymes have also proved useful in mechanistic studies (8,9). In this article, the modification of an enzyme's character through the preparation of chimeric enzymes is described, and the process is illustrated with the example of chimeric β-glucosidases.

β-Glucosidase is one of the enzymes involved in the decomposition of cellulose and the hydrolysis of cellooligosaccharides, including the conversion of cellobiose into glucose. Based on its amino acid sequence, a β-glucosidase is classified into one of two families: family 1 or family 3. Of the family 3 β-glucosidases, three enzymes from Cellvibrio gilvus (10), Agrobacterium tumefaciens (11), and Thermotoga maritima (12) were selected for the construction of chimeric enzymes because they possess markedly different characteristics, especially with regard to their thermal stabilities (Table 1). Based on the amino acid alignment of these enzymes, they were found to consist of three regions: an N-terminal domain, a C-terminal domain, and a nonhomologous region (Fig. 1). The length of the nonhomologous region differs considerably among the three enzymes. It is interesting to note that the locations of the N-terminal and C-terminal domains are inverted in the enzymes produced by Ruminococcus albus (13) and Butyrivibrio fibrisolvens (14) compared with those of the other family 3 glycosidases. This inversion can be considered an indication of gene replacement (i.e., chimeric enzymes) occurring in nature.

Regarding the three-dimensional structures of the family 3 enzymes, only one structure is available: that of the barley β-glucosidase (15). In this structure, the N-terminal domain forms an (α/β)8 barrel, which is one of the most common structures found in glycosidases. Unfortunately, the C-terminal domain of the barley β-glucosidase is rather short; as a consequence, it is not possible to elucidate the structures of the C-terminal

Table 1 Comparison of the Characteristics of the Parental β-Glucosidases

                                                      pNP-glucose(a)
Enzyme (origin)        Optimum temp. (°C)   Optimum pH   Km (mM)   kcat (sec⁻¹)   Transglycosylation activity   Number of amino acid residues
Cg (C. gilvus)         35                   6.4          0.44      42             No                            752
At (A. tumefaciens)    65                   7.3          0.012     95             Yes                           818
Tm (T. maritima)       85                   3.4          0.004     6.4            Yes                           721

(a) All kinetic parameters were measured at 30°C and at pH 6.5.


Figure 1 Homologous regions in the amino acid sequences of the β-glucosidases. The positions of the two catalytic residues, D (the nucleophile/base) and E (the proton donor), are indicated; one is located in the N-terminal domain and the other in the C-terminal domain. The number of amino acid residues from the N-terminus is indicated.

domains in the three selected β-glucosidases at present. It is important to note that the two catalytic residues, D (nucleophile/base) and E (proton donor), are located in separate domains; D is found in the N-terminal domain and E in the C-terminal domain (Fig. 1). As can be seen in Table 2, the amino acid identities of the three selected β-glucosidases lie between 32% and 45%, which is relatively low and gives rise to the prominent differences observed in their enzymatic characters.

Table 2 Amino Acid Identities Among the Parental Enzymes

Enzymes       N-terminal domain (%)   C-terminal domain (%)
Cg and At     37                      40
At and Tm     45                      37
Tm and Cg     32                      36
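The identities in Table 2 are derived from pairwise amino acid alignments. As a minimal sketch of how such a figure can be computed, the function below counts identical residues over the aligned columns in which neither sequence has a gap. The function name and the choice of denominator are assumptions made for illustration; other conventions (for example, dividing by the length of the shorter sequence) give somewhat different values, and the source does not state which one was used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment fragment (hypothetical residues, for illustration only)
print(round(percent_identity("GRNFY-A", "GRNWYKA"), 1))
```

Applied separately to the N-terminal and C-terminal parts of an alignment, the same function would give per-domain values of the kind listed in Table 2.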

1 PREPARATION OF THE CHIMERIC ENZYMES

To obtain chimeric enzymes, chimeric genes are normally constructed by one of two methods: the use of restriction enzymes or overlapping polymerase chain reaction (PCR) (16). The use of restriction enzymes is the easier method; however, a high level of identity between the parental DNA sequences is required to obtain common restriction enzyme sites in both genes. Often, the availability of common restriction sites limits the use of this strategy in the construction of chimeric genes. In contrast, there are no such limitations in defining the shuffling sites for chimeric genes constructed via the overlapping PCR technique, which requires three PCR steps. Previously, the errors incorporated into the constructed genes during the three PCR steps represented one of the largest problems limiting the use of this method. However, as several DNA polymerases with proofreading ability are now commercially available, the use of overlapping PCR for the construction of chimeric genes has become considerably simpler and more popular.

An example of constructing chimeric genes by using a proofreading DNA polymerase for the PCR is illustrated in Fig. 2. Based on the amino acid alignment of the two parental enzymes, Cg (C. gilvus) and Tm (T. maritima), four shuffling sites were selected (Fig. 2A): two in the N-terminal domain and two in the C-terminal domain. Typically, the shuffling sites are selected in highly homologous regions of the amino acid sequences, where the three-dimensional structures are also well conserved and less torsion is therefore expected at the shuffling site in the constructed chimeric enzyme. Accordingly, the four shuffling sites were selected in regions where runs of identical amino acids were found in the two parental enzymes (GRNFY in the first shuffling site, VMSDW in the second site, VGY in the third site, and QVY in the fourth site) (Fig. 2B).
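The selection rule described above, namely placing shuffling sites in runs of identical residues, can be sketched as a scan over a gapped pairwise alignment. This is only an illustration under stated assumptions: the function name, the minimum run length of three, and the toy sequences are hypothetical (only the motifs GRNFY and VGY are taken from the text), and the authors also weighed the positions of the catalytic residues and the spacing of the sites, which this sketch ignores.

```python
def conserved_blocks(aligned_a: str, aligned_b: str, min_len: int = 3):
    """Return (start_column, motif) for each maximal run of identical,
    ungapped alignment columns that is at least min_len long."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    blocks = []
    run_start = None
    for i, (a, b) in enumerate(zip(aligned_a, aligned_b)):
        match = a == b and a != "-"
        if match and run_start is None:
            run_start = i                      # a new run begins
        elif not match and run_start is not None:
            if i - run_start >= min_len:       # the run just ended; keep it if long enough
                blocks.append((run_start, aligned_a[run_start:i]))
            run_start = None
    # handle a run that extends to the end of the alignment
    if run_start is not None and len(aligned_a) - run_start >= min_len:
        blocks.append((run_start, aligned_a[run_start:]))
    return blocks


# Hypothetical alignment fragment containing two of the motifs named in the text
seq_cg = "MKGRNFYAL-VGYT"
seq_tm = "MRGRNFYSLQVGYS"
print(conserved_blocks(seq_cg, seq_tm))
```

Each returned block is a candidate crossover region; the actual junction used for overlapping PCR would still have to be chosen within it.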
In addition, the locations of the two catalytic residues and the distribution of the shuffling sites were considered in selecting the positions of the shuffling sites. Of the eight chimeric enzymes that were constructed between Cg and Tm, only two were obtained as catalytically active enzymes (Fig. 3). In general, catalytically active enzymes are produced in a soluble form, whereas

Figure 2 Homology in the amino acid sequences of the C. gilvus and T. maritima β-glucosidases. Cg and Tm represent the C. gilvus and T. maritima β-glucosidases, respectively. (A) Schematic representation of the sequence homology and positions of the constructed chimeras in the N-terminal and C-terminal domains of the Cg and Tm β-glucosidases. (B) Amino acid alignment of the C. gilvus and T. maritima β-glucosidases in the C-terminal domain. Identical and similar amino acid residues are designated by the symbols "*" and ".", respectively.


Figure 3 Constructed chimeric and parental enzymes. The name of a chimeric enzyme, such as Cg85At15, indicates that it consists of 85% of the C. gilvus β-glucosidase (Cg) at the N-terminal end and 15% of the A. tumefaciens β-glucosidase (At) at the C-terminal end. The percentage values are based not on the amino acid residues of the constructed chimeric enzyme, but on those of the parental enzymes.


the catalytically inactive ones are normally obtained as inclusion bodies, suggesting that the folding information encoded in the amino acid sequence is disturbed by the construction of the chimeric genes. A total of 18 chimeric enzymes have been constructed using the three parental β-glucosidases, and only seven of these were obtained as catalytically active enzymes. Considering the low level of identity in the amino acid sequences among the three parental enzymes (32–45%), the proportion of catalytically active enzymes (39%) is actually very high. Indeed, by using parental enzymes that are highly homologous in their amino acid sequences, catalytically active chimeric enzymes can easily be constructed. However, when the level of identity of the respective amino acid sequences is high, the characteristics of the generated enzymes are also quite similar. In general, enzymes whose amino acid identities are less than 40% display quite distinct characteristics. Therefore, differences in enzyme character and amino acid identity have to be traded off against one another.

2 REFOLDING OF THE INCLUSION BODIES

To recover the activity of the chimeric enzymes showing low activity, such as Cg25Tm4Cg70 and Cg25Tm16Cg58, several recovery attempts were made, including coexpression with GroEL/ES (17) and refolding by slow dialysis (18,19). The highest yield of recovered activity obtained via refolding was only 2%; this enzyme activity was very unstable, and it was not possible to subject the enzyme to further purification steps. We were therefore unable to find a suitable method for recovering the activity of the chimeric enzymes produced as inclusion bodies.

3 CHARACTERISTICS OF THE CHIMERIC ENZYMES

3.1 Thermal Profiles

The temperature stabilities determined for the parental and chimeric enzymes are shown in Fig. 4. Typical examples of the thermal profiles were obtained with the four constructs that used the At (A. tumefaciens) and Cg (C. gilvus) β-glucosidases as the parental enzymes. The heat stability of Cg is 41°C, whereas that of At is 67°C (20,21). The heat stability of the four chimeric enzymes (Cg92At7, Cg78At19, Cg70At27, and Cg61At37) increased by 6–16°C relative to Cg as the level of At incorporation increased from 7% to 37% (Fig. 4). Similar results were also obtained with the Tm/Cg chimeric enzymes. The heat stabilities of the two chimeras (Tm80Cg20 and Tm88Cg12) were 70°C and 74°C. The heat stabilities of the resulting chimeric enzymes were


Figure 4 Thermal stabilities of the parental and chimeric enzymes. The residual activity was determined after incubating the enzymes at pH 6.5 for 30 min. The incubation temperatures at which 50% of the activity remained are indicated. Parental enzymes (solid lines): Cg (C. gilvus), At (A. tumefaciens), and Tm (T. maritima). Chimeric enzymes (dotted lines): At76Tm26, Cg92At7, Cg78At19, Cg70At27, Cg61At37, Tm80Cg20, and Tm88Cg12.

also observed to increase as the proportion of the thermostable enzyme (Tm) incorporated into the chimeric enzyme increased. However, unlike these chimeric enzymes, which display profiles intermediate between those of their parental enzymes, the At76Tm26 chimera did not display an intermediate profile (22). The observed temperature stability for the At76Tm26 chimera was approximately 41°C, which is 26°C and 40°C lower than those of the parental enzymes At and Tm (67°C and 81°C, respectively). With regard to heat stability, these results demonstrate that the temperature stability of the chimeric enzymes generally correlates well with the thermal stabilities of the parental enzymes, although there are some exceptions. In addition, the optimal temperature for the enzymatic reaction is correlated with the heat stability of the enzyme.
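The heat stabilities quoted in this section are the incubation temperatures at which 50% of the activity remains (Fig. 4). A minimal sketch of how such a value can be read off a residual-activity curve is given below, using linear interpolation between the two measurements that bracket 50%. The function name and the data points are hypothetical; the source does not describe how the values were extracted from the curves.

```python
def t50(temps, residual, threshold=50.0):
    """Linearly interpolate the temperature at which residual activity (%)
    falls to `threshold`, given paired measurements."""
    points = sorted(zip(temps, residual))
    for (t_lo, a_lo), (t_hi, a_hi) in zip(points, points[1:]):
        # look for the first interval in which activity drops through the threshold
        if a_lo >= threshold >= a_hi and a_lo != a_hi:
            return t_lo + (a_lo - threshold) * (t_hi - t_lo) / (a_lo - a_hi)
    raise ValueError("residual activity never crosses the threshold")


# Hypothetical residual activities (%) after 30 min at each temperature (°C)
temps = [30, 35, 40, 45, 50]
residual = [100, 98, 60, 20, 2]
print(t50(temps, residual))
```

With these invented numbers the crossing lies between 40°C and 45°C; real curves with noisy points may call for a fitted model rather than a two-point interpolation.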

3.2 pH Profiles

Examples of the pH-activity profiles of the parental and chimeric enzymes are shown in Fig. 5. The Tm (T. maritima) and Cg (C. gilvus) β-glucosidases display marked differences in their pH optima: the pH optima for the Tm and Cg enzymes occur at pH 3.4 and 6.4, respectively, and those for the


Figure 5 Optimum pH profiles of the chimeric and parental β-glucosidases of Tm (T. maritima), Tm88Cg12, Tm80Cg20, and Cg (C. gilvus). The pH was adjusted with 50 mM of the following buffers: sodium phosphate (pH 1.1–3.1), sodium citrate (pH 2.2–4.1), sodium acetate (pH 4.1–5.8), sodium succinate (pH 4.3–6.5), MOPS (pH 6.2–8.2), HEPES (pH 6.6–8.5), and CAPS (pH 9.6–10.6).

Tm80Cg20 and Tm88Cg12 chimeric enzymes were both observed at approximately pH 4. The pH stability of both chimeric enzymes is closer to that of Tm than to that of the corresponding Cg β-glucosidase. The results obtained for the other chimeras (At76Tm26, Cg61At37, Cg70At27, Cg78At19, and Cg92At7) are similar; the pH optima and stabilities lie between those of the parental enzymes.

3.3 Kinetic Parameters

The kinetic parameters of the three parental and three chimeric β-glucosidases were investigated using various aryl glycosides as substrates. The Km and kcat values observed for each enzyme toward pNP-β-D-glucopyranoside, pNP-β-D-xylopyranoside, pNP-β-D-fucopyranoside, and pNP-α-L-arabinofuranoside are summarized in Table 3.

The observed Km values for the chimeric Tm80Cg20 and Tm88Cg12 enzymes toward pNP-β-D-glucopyranoside were 0.012 and 0.0082 mM, respectively. These Km values are lower than that observed for Cg (0.44 mM), but are of the same order as that obtained for Tm (0.0039 mM). However, the kcat values observed for the chimeric Tm80Cg20 and Tm88Cg12 enzymes were 5.62 and 3.84 sec⁻¹, respectively, which are lower than those obtained for both of the parental enzymes, i.e., Tm (6.4 sec⁻¹) and Cg (42.2 sec⁻¹). Similarly, the Km values observed for the chimeric Tm80Cg20 and Tm88Cg12 enzymes toward pNP-β-D-xylopyranoside were 2.8 and 3.2 mM, respectively, which are lower than the Km value observed for the Cg enzyme (10.6 mM), but very similar to that observed for the Tm enzyme (2.64 mM). In comparison, the kcat values obtained for the chimeric Tm80Cg20 and Tm88Cg12 enzymes toward pNP-β-D-xylopyranoside were 12.7 and 24.0 sec⁻¹, respectively, which are both higher than the value observed for Cg (3.08 sec⁻¹). The above data, including those obtained with the substrates pNP-β-D-fucopyranoside and pNP-α-L-arabinofuranoside, indicate that the substrate specificities of the chimeric enzymes Tm80Cg20 and Tm88Cg12 are similar to each other and are closer to those of the Tm parental enzyme than to those of the Cg enzyme.

The kinetic parameters of the At76Tm26 chimeric enzyme were also investigated. The Km values obtained for the parental enzymes At and Tm toward pNP-β-D-glucopyranoside were 0.012 and 0.0039 mM, respectively. However, the observed Km value for the At76Tm26 chimeric enzyme toward pNP-β-D-glucopyranoside was 0.081 mM, which is higher than the values observed for either of the parental enzymes. In comparison, the kcat value obtained for the chimeric enzyme toward pNP-β-D-glucopyranoside was 3.3 sec⁻¹, which is lower than those observed for both the At (95.4 sec⁻¹) and the Tm (6.4 sec⁻¹) β-glucosidases. The kcat/Km (mM⁻¹ sec⁻¹) values for the At enzyme toward pNP-β-D-glucopyranoside, pNP-β-D-xylopyranoside, pNP-β-D-fucopyranoside, and pNP-α-L-arabinofuranoside are 7950, 5780, 280, and 440, respectively, while for the Tm enzyme they are 1640, 6.97, 0.65, and 0.476, respectively (Table 3). The kcat/Km (mM⁻¹ sec⁻¹) values obtained for the chimeric enzyme At76Tm26 toward the same substrates were 41, 0.17, 0.054, and 0.038, respectively. These data indicate that the specificity of this chimeric enzyme is slightly closer

Table 3 Kinetic Parameters of the Parental and Chimeric Enzymes

Substrates                       At76Tm26   At (A. tumefaciens)   Tm (T. maritima)   Tm80Cg20   Tm88Cg12   Cg (C. gilvus)
pNP-β-D-glucopyranoside
  Km (mM)                        0.081      0.012                 0.0039             0.012      0.0082     0.44
  kcat (sec⁻¹)                   3.3        95.4                  6.4                5.62       3.84       42.2
  kcat/Km (mM⁻¹ sec⁻¹)           41         7950                  1640               468        468        95.9
pNP-β-D-xylopyranoside
  Km (mM)                        0.95       0.005                 2.64               2.8        3.2        10.6
  kcat (sec⁻¹)                   0.16       28.9                  18.4               12.7       24.0       3.08
  kcat/Km (mM⁻¹ sec⁻¹)           0.17       5780                  6.97               4.54       7.50       0.291
pNP-β-D-fucopyranoside
  Km (mM)                        0.24       0.079                 42.7               57.3       35.5       56.6
  kcat (sec⁻¹)                   0.013      22.1                  27.6               16.6       22.5       0.49
  kcat/Km (mM⁻¹ sec⁻¹)           0.054      280                   0.65               0.290      0.634      0.0087
pNP-α-L-arabinofuranoside
  Km (mM)                        0.66       0.24                  18.9               21.0       17.2       43.7
  kcat (sec⁻¹)                   0.025      119                   9.0                5.2        7.9        0.474
  kcat/Km (mM⁻¹ sec⁻¹)           0.038      440                   0.476              0.247      0.459      0.011


to that of the Tm β-glucosidase, because the two enzymes showed more or less the same specificity pattern toward the investigated substrates, even though the chimeric enzyme contains only 26% of the amino acid residues of the T. maritima enzyme. The other four chimeras that were constructed using Cg and At as the parental enzymes (Cg61At37, Cg70At27, Cg78At19, and Cg92At7) showed similar results in terms of their kinetic parameters; the values observed for the chimeras lay between those of the parental enzymes. Therefore, it can be concluded that shuffling regions in the C-terminal domain has only a slight effect on the enzyme's substrate specificity and catalytic efficiency.
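The catalytic efficiencies discussed in this section are simply the ratio kcat/Km. As a small worked sketch, the values for pNP-β-D-glucopyranoside can be recomputed from the Km and kcat entries of Table 3; the dictionary layout and function name are illustrative choices, not part of the original work.

```python
# Km (mM) and kcat (sec^-1) toward pNP-beta-D-glucopyranoside, taken from Table 3
GLUCOSIDE_PARAMS = {
    "At76Tm26": (0.081, 3.3),
    "At": (0.012, 95.4),
    "Tm": (0.0039, 6.4),
    "Tm80Cg20": (0.012, 5.62),
    "Tm88Cg12": (0.0082, 3.84),
    "Cg": (0.44, 42.2),
}


def catalytic_efficiency(km_mM: float, kcat_per_sec: float) -> float:
    """Return kcat/Km in mM^-1 sec^-1."""
    return kcat_per_sec / km_mM


# Rank the enzymes from most to least efficient on this substrate
ranked = sorted(
    GLUCOSIDE_PARAMS,
    key=lambda name: catalytic_efficiency(*GLUCOSIDE_PARAMS[name]),
    reverse=True,
)
for name in ranked:
    km, kcat = GLUCOSIDE_PARAMS[name]
    print(f"{name}: {catalytic_efficiency(km, kcat):.3g} mM^-1 sec^-1")
```

Because Km spans two orders of magnitude across these enzymes while kcat varies less, the ranking by kcat/Km is dominated by substrate affinity, which is why a chimera with a modest kcat can still show a relatively high efficiency.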

3.4 Transglycosylation Activity

One unique characteristic of two of the β-glucosidases, At and Tm, is that they possess transglycosylation activity in the presence of alcohols; indeed,

Figure 6 Transglycosylation activity of Tm, Cg, and the chimeric enzymes. Relative rates of p-nitrophenol production from pNP-β-D-glucopyranoside with the chimeric C. gilvus and T. maritima β-glucosidases were measured in the presence of 50 mM concentrations of the series of straight-chain alcohols from methanol to octanol at 30°C.


most of the family 3 glycosidases, including Cg, do not possess such transglycosylation activity. Therefore, to study the transglycosylation activity of the chimeric mutants and their parental enzymes, their activities were assayed in the presence of a series of straight-chain alcohols. Because higher alcohols are not miscible with water, 15% DMSO was used to increase their solubility. At this concentration, DMSO has little or no effect on the rate of hydrolysis of pNP-β-D-glucopyranoside (30). The Tm (T. maritima) β-glucosidase was activated by alcohols, with the highest rate of p-nitrophenol release being observed in the presence of 50 mM hexanol (Fig. 6). As expected, no transglycosylation activity was observed for the Cg (C. gilvus) β-glucosidase. However, determination of the transglycosylation activity of the chimeric mutants (Tm80Cg20 and Tm88Cg12) provided some interesting results. Of the four enzymes examined, the highest level of transglycosylation activity, which occurred optimally in the presence of heptanol, was obtained with the Tm80Cg20 chimera; this level was almost double that observed for the Tm enzyme, whereas the Tm88Cg12 chimera displayed almost half the level of transglycosylation activity of the Tm β-glucosidase. Therefore, it appears that the amino acid residues responsible for the transglycosylation activity of the Tm enzyme are located upstream of the third shuffling site (Tm80Cg20).

4 CONCLUSION

The unique features relating to the preparation of chimeric enzymes are summarized as follows:

1. Parental enzymes can be easily selected using various databases. Enzymes whose amino acid identities are as low as 30% are acceptable for the construction of chimeric enzymes.
2. Three-dimensional information regarding the target enzyme is not essential for the construction of chimeric enzymes.
3. Shuffling sites for chimeric enzymes can be easily selected from the amino acid alignment, and shuffling in highly homologous regions is recommended.
4. The construction of chimeric genes is quite simple when overlapping PCR is used with a proofreading DNA polymerase.
5. It is evident that the character of an enzyme can be modified via the preparation of chimeric enzymes. The characteristics of chimeric enzymes can be expected to lie in between those of the parental enzymes. As shown by the example of the transglycosylation activity displayed by the Tm80Cg20 chimeric enzyme, a characteristic can also exceed that of the parental enzymes.
6. Based on the results of the 18 chimeric enzymes constructed, the folding information in the family 3 β-glucosidase appears to be unevenly distributed and can be summarized as follows:
   a) It is strict in the N-terminal domain: of the seven constructed chimeras, two gave weak activity and the remainder were obtained as inclusion bodies.
   b) It is tolerant in the C-terminal domain: of the 11 constructed chimeras, seven formed active enzymes and one produced weak activity.
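The counts in item 6 can be checked against the overall figure of seven active chimeras out of 18 (39%) quoted earlier. The short tally below is only a bookkeeping sketch; the dictionary keys are illustrative, and the fate of the three C-terminal chimeras that are neither active nor weakly active is not stated in the text, so they are labeled "other" here as an assumption.

```python
# (constructed, active, weakly active) by the domain in which shuffling took place,
# as given in item 6 of the conclusion
COUNTS = {
    "N-terminal": {"constructed": 7, "active": 0, "weak": 2},
    "C-terminal": {"constructed": 11, "active": 7, "weak": 1},
}

total_constructed = sum(c["constructed"] for c in COUNTS.values())
total_active = sum(c["active"] for c in COUNTS.values())
percent_active = 100.0 * total_active / total_constructed

for domain, c in COUNTS.items():
    # chimeras that were neither active nor weakly active
    # (inclusion bodies for the N-terminal set; unspecified for the C-terminal set)
    other = c["constructed"] - c["active"] - c["weak"]
    print(f"{domain}: {c['active']} active, {c['weak']} weak, {other} other")

print(f"overall: {total_active}/{total_constructed} active ({percent_active:.0f}%)")
```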

REFERENCES

1. WP Stemmer. Nature 370:389–391, 1994.
2. A Crameri, SA Raillard, E Bermudez, WP Stemmer. Nature 391:288–291, 1998.
3. M Lehmann, L Pasamontes, SF Lassen, M Wyss. Biochim Biophys Acta 1543:408, 2000.
4. MA Arnott, RA Michael, CR Thompson, DW Hough, MJ Danson. J Mol Biol 304:657, 2000.
5. AE Medlock, HA Dailey. Biochemistry 39:7461, 2000.
6. T Kanematsu, K Yoshimura, K Hidaka, H Takeuchi, M Katan, M Hirata. Eur J Biochem 267:2731, 2000.
7. S Hong, J Preiss. Arch Biochem Biophys 378:349, 2000.
8. INS Subbayya, S Sukumaran, K Shivashankar, H Balaram. Biochem Biophys Res Comm 272:596, 2000.
9. H Yoshida, K Kojima, AB Witarto, K Sode. Protein Eng 12:63, 1999.
10. Y Kashiwagi, C Aoyagi, T Sasaki, H Taniguchi. J Ferment Bioeng 75:159–165, 1993.
11. DK Watt, H Ono, K Hayashi. Biochem Biophys Acta 1385:78–88, 1998.
12. K Goyal, P Selvakumar, K Hayashi. J Mol Cat B Enzymatic 16:43–51, 2001.
13. K Ohmiya, M Takano, S Shimizu. Nucleic Acids Res 18:671, 1990.
14. LL Lin, E Rumbak, H Zappe, JA Thompson, DR Woods. J Gen Microbiol 136:1567–1576, 1990.
15. JN Varghese, M Hrmova, GB Fincher. Structure 7:179, 1999.
16. MM Ahsan, S Kaneko, Q Wang, K Yura, M Go, K Hayashi. Enz Microb Tech 28:8, 2001.
17. S Machida, Y Yu, S Singh, JD Kim, K Hayashi, Y Kawata. FEMS Microbiol Lett 159:41–46, 1998.
18. K Hayashi, L Ying, S Singh, S Kaneko, S Nirasawa, T Shimonishi, Y Kawata, T Imoto, M Kitaoka. J Mol Cat B Enzymatic 11:811–816, 2000.
19. SP Singh, JD Kim, S Machida, K Hayashi. Ind J Biosci Biophy 39:235–239, 2002.
20. A Singh, K Hayashi. J Biol Chem 270:21928–21933, 1995.
21. A Singh, K Hayashi, TT Hoa. Biochem J 305:715–719, 1995.
22. K Goyal, Y-K Kim, M Kitaoka, K Hayashi. J Mol Cat B Enzymatic 16:43–51, 2000.

22 Assay Systems for Screening or Selection of Biocatalysts

Uwe T. Bornscheuer
Ernst-Moritz-Arndt-University Greifswald, Greifswald, Germany

1 INTRODUCTION

The application of enzymes, especially in organic synthesis, is now well documented in the literature (1–4). Biocatalytic processes are, however, not new, as the first examples are more than 100 years old; e.g., the use of an oxynitrilase for the synthesis of L-arabinose was reported back in 1894 (5). In the last century many processes based on enzymatic reactions were developed for the production of a plethora of products, such as citric acid, amino acids, and antibiotics. However, the majority of them used whole-cell systems. Only in the last two decades has the use of isolated enzymes become more and more important, and again new processes have found their way into industry (6–8). Although a considerable range of factors have contributed to this increased use of biocatalysts, the increasing availability of enzymes on an industrial scale is certainly one of the main factors. Rather recent breakthroughs in the development of novel methods for the discovery of biocatalysts as well as the development of efficient tools for their optimization


again led and will continue to lead to a substantial increase in the number of enzymes suitable for biocatalytic applications. Indeed, when a new biocatalytic route is envisaged, the central point is the availability of the biocatalyst and its integration into the whole process as depicted in Fig. 1.

Figure 1 Steps involved in designing a biocatalytic process. (From Ref. 6.)


Usually, enzymes for an envisaged reaction are identified from the following sources:

(i) Screening of culturable microorganisms (strain collections, environmental samples, etc.), plant, or mammalian tissues.
(ii) Investigation of commercially available enzymes.
(iii) Screening of clone banks (from culturable and nonculturable organisms).
(iv) Data mining, i.e., a putative enzyme is identified by sequence comparison with a known enzyme and then cloned and expressed.

Once a biocatalyst is found by any of these approaches, the enzyme usually needs to be optimized to fulfill all the criteria for an efficient production of the desired products (9). The key parameters to be considered are activity, stability, and, if applicable, enantioselectivity. If the enzyme is not commercially available, then methods for its production (either by wild-type strains or by recombinant expression) and its isolation have to be developed.

1.1 Screening of Culturable Microorganisms

This approach represents the traditional method to identify a producer of an enzyme of interest. Three important stages have to be considered for this strategy: it must be clear from the intended reaction or process which enzymatic activity is required; one has to decide which groups of microorganisms are to be selected; and one must design an appropriate, convenient, and reliable assay. Usually, identification is achieved by cultivating the microorganism in an appropriate medium followed by rather simple means to identify enzymatic activity. As an example, lipase activity is usually detected by growth on agar plates supplemented with a triglyceride such as tributyrin (10). Release of free fatty acids by the hydrolytic activity of the lipase yields clear zones around the colonies. If activity has to be detected in liquid culture, a simple spectrophotometric assay, e.g., with p-nitrophenyl palmitate, is used. Many enzymes need to be induced before the microorganism produces the biocatalyst. Prominent examples are nitrilases and nitrile hydratases, for which certain nitriles or simply urea have been used to promote enzyme production (11). Similarly, P450 monooxygenases and Baeyer–Villiger monooxygenases often need to be induced. Alternatively, a growth assay can be used for the identification of a desired organism. For instance, amino acid racemases have been identified by supplying the D-enantiomer of an amino acid in a synthetic medium lacking the corresponding L-enantiomer. Thus it is very likely that only those organisms


can grow that are able to racemize the D-enantiomer and thus to grow on the L-amino acid. Overviews of screening strategies, as well as several impressive examples of enzymes identified by this approach that are used in industrial processes, can be found in reviews (12,13).

1.2 Modern Approaches for Biocatalyst Discovery

It is estimated that only 0.001–1% of all microorganisms in an environmental sample are culturable using common fermentation technology. Consequently, only a minor fraction of the enzymes produced by them are accessible for biocatalytic applications (14). As a further drawback, this approach often only discloses that an enzymatic activity is present; no details about the substrate range, the stereoselectivity, and other enzyme properties are obtained. Thus tedious and time-consuming investigations have to be performed until it is clear whether the enzyme is indeed useful for a given application. Moreover, wild-type strains often produce the enzyme of interest only to a limited extent, and therefore either a strain-improvement program has to be conducted or the encoding gene has to be identified to allow overexpression of the biocatalyst in a suitable host.

In contrast, recent molecular biology techniques allow the isolation of the complete DNA present in a sample (in principle, the entire genomes of all organisms in the sample are accessible), followed by expression of the corresponding enzymes. Even if not all enzymes encoded by the genomes are functionally expressed by this approach, it definitely provides a very effective way to access a huge number of still undiscovered biocatalysts. For instance, a large and diverse set of nitrilases was recently found using this approach. These novel nitrilases exhibited unprecedented enantioselectivity and a broad substrate range; enantiocomplementary enzymes were also found (15).

Data mining is yet another useful approach; it takes advantage of the rapidly expanding number of sequences deposited in public (and proprietary) databases. Sequences homologous to already known ones are used to clone and express the encoded biocatalyst.

In principle, enzyme properties can be optimized by rational protein design, by directed evolution techniques, or by a combination of both (Fig. 2). In contrast to rational protein design, directed evolution does not require the structure and mechanism of the targeted enzyme to be known. Briefly, this method consists of the generation of large mutant libraries using various methods for random mutagenesis and/or recombination, followed by advanced high-throughput screening or selection methods to identify the desired variants.
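The cycle of library generation followed by screening can be pictured with a deliberately simplified toy model. In the sketch below, random point substitution stands in for random mutagenesis, and a made-up scoring function stands in for the assay readout; every name, the target string, and all parameter values are hypothetical and chosen only to make the loop runnable. Real directed evolution works on genes, includes recombination, and depends entirely on the quality of the experimental assay.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each position with a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def toy_score(seq: str, target: str = "MKTAYIAKQR") -> int:
    """Stand-in for an assay signal: number of positions matching a hidden target."""
    return sum(a == b for a, b in zip(seq, target))


def evolve(parent: str, score=toy_score, rounds: int = 5,
           library_size: int = 200, rate: float = 0.05, seed: int = 0) -> str:
    """Repeat: build a mutant library from the current best variant, screen it,
    and carry the best-scoring variant into the next round."""
    rng = random.Random(seed)
    best = parent
    for _ in range(rounds):
        library = [mutate(best, rate, rng) for _ in range(library_size)]
        # keep the current best in the pool so the score can never decrease
        best = max(library + [best], key=score)
    return best


parent = "MKAAAAAAAA"
improved = evolve(parent)
print(parent, toy_score(parent))
print(improved, toy_score(improved))
```

The sketch also makes the bottleneck visible: every round costs `library_size` evaluations of the scoring function, which in practice means that many assays.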

Figure 2 Principles and comparison of rational protein design vs. directed evolution.

Whereas a broad range of good and reliable methods for the creation of mutant libraries have been developed (16–21), the major bottleneck of directed evolution resides in the identification of suitable biocatalysts. Thus there is an increasing demand for highly sophisticated assay systems that allow the rapid and reliable identification of desired enzymes from mutant libraries obtained by directed evolution as well as from commercially available biocatalyst samples. Even for the latter, one has to take into account that, for the screening of 100 commercial enzymes, one usually investigates several substrates under varying reaction conditions (e.g., different substrate concentrations, pH, temperature, various solvents), so that the number of experiments easily reaches 500–1000. To save time, high-throughput assays are then advantageous for identifying appropriate reaction conditions on a small scale, so that only the best ones need to be verified in lab-scale reactions. In the past few years, a considerable number of reviews were published summarizing various methods for the identification of desired biocatalysts,


independent of whether screening or selection is envisaged. For more examples and a thorough introduction to the background, readers are referred to a number of reviews, a book, and especially the other chapters in this book (5,22–29).
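The estimate of 500–1000 experiments for 100 commercial enzymes follows directly from multiplying out the combinations. The sketch below enumerates such a screening plan; the substrate names, the reaction conditions, and the 96-well plate format are hypothetical placeholders rather than values from the text.

```python
import math
from itertools import product


def screening_plan(enzymes, substrates, conditions):
    """Enumerate every enzyme/substrate/condition combination to be tested."""
    return list(product(enzymes, substrates, conditions))


enzymes = [f"enzyme_{i:03d}" for i in range(1, 101)]         # 100 commercial enzymes
substrates = ["substrate_A", "substrate_B", "substrate_C"]   # hypothetical
conditions = ["pH 7, 30 C", "pH 8, 30 C", "pH 7, 40 C"]      # hypothetical

plan = screening_plan(enzymes, substrates, conditions)
plates = math.ceil(len(plan) / 96)  # assuming one reaction per well of a 96-well plate

print(len(plan), "experiments on", plates, "plates")
```

Adding a single further variable, such as two solvents, doubles the plan, which is the practical argument for running the first pass in microtiter plates.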

2 SCREENING OR SELECTION?

Various approaches for an eﬃcient, rapid, and reliable identiﬁcation of desired enzymes within expressed gene banks or mutant libraries have been developed. The initial decision to be made is usually whether screening or selection is advantageous. In this context, screening should not be confused with the screening of strain collections or environmental samples mentioned above in Sec. 1.1. Here screening refers only to high-throughput technologies (usually using isolated enzymes). For selection, a system must be designed in which the presence of the desired catalytic activity provides a growth advantage to the microorganism producing it. This includes the simple use of the substrate of interest as carbon or nitrogen source as well as sophisticated approaches in which the catalytic activity is directly linked to a survival factor, i.e., the formation of an essential metabolite, e.g., an amino acid not present in the growth media. In order to identify the enzyme of interest, the following criteria should be fulﬁlled for a suitable screening or selection system:

The assay must be specific for the targeted enzyme.
Background activity should be negligible.
The substrates must be stable under assay conditions.
Neither the substrate nor the product should undergo side reactions (i.e., no undesired metabolism within a whole-cell system).
The signal should be concentration dependent (i.e., for spectrophotometric or fluorimetric tests).
The assay should be fast.
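The criterion that the signal should be concentration dependent can be checked with a calibration series. The following is only a sketch: it assumes a linear response over the tested range and uses hypothetical calibration data; the function name is illustrative and is not taken from any of the cited assays.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical calibration: product concentration (mM) vs. absorbance
conc = [0.0, 0.1, 0.2, 0.4, 0.8]
absorbance = [0.02, 0.12, 0.22, 0.42, 0.82]

slope, intercept, r_squared = linear_fit(conc, absorbance)
print(f"slope={slope:.3f} per mM, background={intercept:.3f}, R^2={r_squared:.4f}")
```

In this reading, the intercept estimates the background signal of the blank, and an R^2 value close to 1 indicates that the signal scales with concentration over the tested range.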

Table 1 summarizes some of the principal ways for screening or selection together with their advantages and disadvantages.

2.1 Selection

2.1.1 Examples of Growth Assays

Generally, growth assays have the advantage that they are very specific: only those biocatalysts that promote the growth of the microorganism are detected. This, in turn, means that huge numbers of clones can be investigated on just one agar plate, as the fraction of active enzymes within a given library is usually very low. As a disadvantage, these assays are less useful if the wild-type enzyme already shows the activity, even at low levels, as it is very difficult to distinguish slow-growing colonies from fast-growing ones.

Table 1 Overview of Some Methods Described for Selection or Screening of Enzyme Libraries

| Method | Requirements | Pros | Cons | Example | Ref. |
|---|---|---|---|---|---|
| Selection | | | | | |
| Growth in the presence of antibiotics | Mutated cells must release enzyme which destroys the antibiotic | Very sensitive; ultra-high throughput possible | Not generally applicable; can generate highly resistant strains | Increased moxalactam resistance using shuffled cephalosporinase genes | (52) |
| Complementation | Enzymatic reaction product often must occur in metabolism | Very specific; ultra-high throughput possible | Not generally applicable; usually restricted to products of metabolism | Identification of tryptophan-producing mutants | (31) |
| Display methods coupled with detection, e.g., FACS, bio-panning, or suicide substrate | Protein must be displayed | Ultra-HTS; very sensitive | Not generally applicable; difficult to detect improved variants for existing activity | Identification of proteases; epitope mapping | (53–56) |
| Screening | | | | | |
| MTP-screening using chromogenic/fluorogenic substrates | Synthesis/design of assay substances; photometer/fluorimeter | Very sensitive; low background signal; high throughput; detection of improved variants possible | Synthesis required; surrogate substrates used | Identification of more stereoselective lipase/esterase variants; fingerprinting of various enzymatic activities | (36,37,46) |
| MTP-screening using "true" substrates | Photometer/fluorimeter; ESI-MS/NMR | Direct detection of true activities | Less sensitive; strong background possible | Determination of lipase/esterase activity (e.g., by pH shift or acetic-acid kit) | (42,45) |

FACS, fluorescence-activated cell sorting; MTP, microtiter plate; ESI-MS, electrospray ionization-mass spectroscopy.

Mutants of an esterase from Pseudomonas fluorescens (PFE) produced by directed evolution using the mutator strain Epicurian coli XL1-Red were assayed for altered substrate specificity using a selection procedure (30). Key to the identification of improved variants acting on a sterically hindered 3-hydroxy ester, which was not hydrolyzed by the wild-type esterase, was an agar plate assay system based on pH indicators, which change color upon hydrolysis of the ethyl ester. Parallel assaying of replica-plated colonies on agar plates supplemented with the glycerol derivative of the 3-hydroxy ester was used to refine the identification, because only E. coli colonies producing active esterases had access to the carbon source glycerol, leading to enhanced growth and, in turn, larger colonies. By this strategy, a double mutant was identified that efficiently catalyzed the hydrolysis. However, with this growth assay it is almost impossible to identify mutants with enhanced activity, and especially enhanced stereoselectivity, once the starting variant already shows activity. This problem might be overcome by a novel concept for a differential cell growth assay developed for the in vivo identification of enantioselective hydrolases. Here one enantiomer of a substrate is linked to acetic acid, which provides a carbon source as a growth advantage, whereas the other enantiomer is coupled to a toxic compound, the fluoro derivative of acetic acid, which causes cell death upon release (Scheme 1).
Although only very preliminary experimental data have been provided so far, this approach should, upon further optimization, considerably ease the identification of desired biocatalysts, as the throughput will be orders of magnitude higher than that of the microtiter plate screening methods described below in Sec. 2.2.

Scheme 1 Principle of the differential cell growth assay to select for enantioselective esterases. (From Ref. 51.)

Complementation of biochemical pathways has also been used to identify mutants of an enzyme involved in tryptophan biosynthesis. HisA and TrpF (isomerases involved in the biosynthesis of histidine and tryptophan, respectively) have a similar (βα)8-barrel structure, and their aminoaldose substrates (ProFAR and PRA) are very similar except for some differences in the aromatic residue, as shown in Scheme 2. Using random mutagenesis and selection by complementation on media lacking tryptophan, several HisA variants that catalyze the TrpF reaction both in vivo and in vitro were identified (31). One of these variants also retained significant HisA activity.

Digital image screening was used by Arnold's group to identify P450cam variants showing enhanced activity in naphthalene hydroxylation in the absence of the cofactor NADPH via a "peroxide shunt" pathway. Coexpression of P450cam with horseradish peroxidase in E. coli converted the hydroxylation products into fluorescent products amenable to digital screening (32). In another example, a triple mutant of P450 BM-3 obtained by directed evolution was found to hydroxylate indole, producing indigo and indirubin (33). Active mutants could be identified visually by the formation of blue/purple indigo. Similarly, mutants producing novel carotenoids could be identified visually in a library containing family-shuffled desaturase genes (34).

2.2 Screening

Screening within mutant libraries is currently the method of choice for most researchers. Although the throughput can be orders of magnitude lower than that of the selection methods mentioned above, it is often much easier to create a suitable assay system and, at the same time, to have direct access to the clone of interest, as the mutants are assayed in distinct wells of a microtiter plate or on a membrane filter. The vast majority of the methods have been developed for hydrolases and, within this class, especially for lipases, esterases, and proteases. One reason is certainly the broad interest in applying these enzymes; in addition, they are easy to handle, are stable, and do not require cofactors for activity.

Scheme 2 Principle of the growth selection assay for the identification of HisA mutants capable of tryptophan precursor synthesis. (From Ref. 31.)


Several reviews already cover the recent achievements for selection assays, and a chapter of this book is devoted to enantioselective assays only. Thus only a few selected examples are summarized below to give the reader an overview of what has been achieved so far.

For the rapid determination of enantioselectivity E (also named enantiomeric ratio) (35), several assay formats have been developed for lipases and esterases. With a few exceptions, optically pure enantiomers are assayed in parallel to identify variants with improved discrimination between enantiomers. Fig. 3 gives an overview of the steps required to assay a mutant library.

Figure 3 Principal steps required to assay a mutant library for improved enantioselectivity by determination of the apparent E value (Eapp) using optically pure enantiomers of the substrate.

Most researchers used chromogenic (36) or fluorogenic (37) substrates for the identification of more enantioselective hydrolases. Although measurements with these substances are usually very sensitive, a major disadvantage resides in the presence of bulky groups, so that these substrates usually differ considerably from the "true" desired substrate, e.g., an acetate. As a consequence, the risk is high that the identified "suitable" enzymes show different selectivities toward the "true" substrate. To overcome this problem, Kazlauskas and coworkers reported a more generally applicable assay ("Quick E") in which both acetates of chiral alcohols and esters of chiral carboxylic acids were hydrolyzed in the presence of p-nitrophenol serving as pH indicator (38,39). Addition of resorufin tetradecanoate introduces competition and provides more exact E values (40).

Other methods are based on time-resolved IR-thermographic determination of enantioselectivity (41) or on the use of "pseudo"-enantiomers, in which one form is isotopically labeled, in combination with mass spectrometric analysis (42). The major advantage of the latter is that true E values are determined, but the method needs deuterated pure enantiomers, and equipment costs are very high. A "Super-HTS" by parallel capillary electrophoresis with chiral cyclodextrins was reported, in which optical purities of FITC-labeled amines were measured (43). So far this system is restricted to amines and also requires investment in expensive equipment.

As an alternative, an assay format based on a coupled enzymatic conversion was described (Scheme 3). First, acetic acid is released by lipase- or esterase-catalyzed hydrolysis from the corresponding chiral ester, followed by a cascade of enzymatic reactions based on a readily available and cheap test kit initially developed for food analysis (44). Thus acetic acid released in the initial hydrolytic reaction is stoichiometrically "transformed" into NADH, which can be easily quantified by spectrophotometric measurement at 340 nm. By this methodology, the activity and enantioselectivity of hydrolases can be identified and quantified within less than 2 min, and it is estimated that >30,000 E values can be determined per day using standard equipment (45).

Scheme 3 Enzyme cascade reaction used for the identification of active and enantioselective lipases or esterases. (From Ref. 45.)

A broad range of versatile assays were established by Reymond's group (46,47). The most striking one is based on umbelliferone derivatives, which, in contrast to simple esters or amides, are very stable under assay conditions. Only after oxidation with sodium periodate followed by addition of bovine serum albumin (BSA) at alkaline pH is the highly fluorescent umbelliferone released. As shown in Scheme 4, a broad range of different enzyme classes can be assayed for activity and/or stereoselectivity, e.g., amidases, acylases, lipases, esterases, epoxide hydrolases, and phosphatases.

Scheme 4 Periodate/BSA-coupled fluorogenic assay to determine enzyme activity using non-activated and stable substrate analogs. (From Refs. 46 and 47.)

For P450 monooxygenases, which are known to catalyze a plethora of oxidation reactions, a versatile assay has been developed to identify enzymes or mutants that show an altered fatty acid chain-length specificity (Scheme 5). Upon hydroxylation at the ω-end of a fatty acid analog bearing a p-nitrophenyl group at this position, an unstable hemiacetal is formed, which spontaneously decomposes into an aldehyde and the p-nitrophenolate anion, which can be quantified by spectrophotometric measurement. In this way, mutants of P450 BM-3 from Bacillus megaterium were identified that can also hydroxylate medium-chain fatty acids (48–50).
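Both spectrophotometric readouts described above, NADH at 340 nm in the coupled acetic acid assay and the p-nitrophenolate anion in the P450 assay, can be converted into concentrations with the Beer–Lambert law. The sketch below is illustrative only. It assumes the commonly used molar absorption coefficient of NADH at 340 nm (6220 M^-1 cm^-1), a 1 cm light path, and hypothetical absorbance readings. For p-nitrophenolate the coefficient depends on pH and wavelength and would have to be supplied by the user, and in microtiter plates the path length is usually shorter than 1 cm.

```python
def concentration_from_absorbance(absorbance, epsilon, path_cm=1.0, blank=0.0):
    """Beer-Lambert law: c = (A - A_blank) / (epsilon * path), in mol/L."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return (absorbance - blank) / (epsilon * path_cm)


# Assumed literature value for NADH at 340 nm, in M^-1 cm^-1
EPSILON_NADH_340 = 6220.0

# Hypothetical readings from a 1 cm cuvette
a340_sample = 0.372
a340_blank = 0.061

nadh_molar = concentration_from_absorbance(
    a340_sample, EPSILON_NADH_340, path_cm=1.0, blank=a340_blank
)
# The text states that acetic acid is converted stoichiometrically into NADH,
# so the NADH concentration is taken as the released acetic acid (1:1).
acetic_acid_micromolar = nadh_molar * 1e6
print(f"{acetic_acid_micromolar:.1f} uM acetic acid released")
```

Dividing the released amount by the reaction time and the amount of enzyme would then give an activity; those steps are omitted here because they depend on the assay layout.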

Scheme 5 Assay to identify P450-mutants with altered chain-length speciﬁcity in fatty acid hydroxylation. (From Ref. 50.)
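The enantioselectivity E used throughout this section can be illustrated numerically. The sketch below is not taken from any of the cited assays. It assumes (i) that the apparent value Eapp of Fig. 3 is taken as the ratio of the initial rates measured separately with the two optically pure enantiomers, and (ii) the standard relation of Chen et al. (35) between E, the conversion c, and the enantiomeric excess of the remaining substrate for an irreversible kinetic resolution. Function names and example numbers are hypothetical.

```python
import math


def apparent_e(rate_a, rate_b):
    """Apparent E: ratio of the faster to the slower initial rate.

    The two rates are measured in separate wells with the pure enantiomers,
    so no competition between enantiomers is included.
    """
    fast, slow = max(rate_a, rate_b), min(rate_a, rate_b)
    if slow <= 0:
        raise ValueError("both initial rates must be positive")
    return fast / slow


def e_from_conversion(conversion, ee_substrate):
    """E = ln[(1-c)(1-ee_s)] / ln[(1-c)(1+ee_s)], with c and ee_s as fractions."""
    if not (0 < conversion < 1 and 0 < ee_substrate < 1):
        raise ValueError("conversion and ee must lie strictly between 0 and 1")
    numerator = math.log((1 - conversion) * (1 - ee_substrate))
    denominator = math.log((1 - conversion) * (1 + ee_substrate))
    if denominator >= 0:
        raise ValueError("ee is too high for this conversion (E would be unbounded)")
    return numerator / denominator


# Hypothetical data: initial rates (arbitrary units) and a resolution at 50% conversion
e_app = apparent_e(0.84, 0.02)
e_true = e_from_conversion(0.50, 0.90)
print(f"E_app = {e_app:.0f}, E from c and ee_s = {e_true:.1f}")
```

Because the first function ignores competition between the enantiomers, its result is only an estimate of the true E, which is the reason given above for adding a competing reference substrate in the Quick E format.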

3 CONCLUSION

From the examples summarized in this overview, it should be obvious that a considerable range of versatile methods for screening and selection to identify the desired biocatalysts has been developed in the past few years. Despite this, there is still a need to expand the range of methods. This includes tools for the discovery of further enzyme classes as well as sophisticated systems for increased throughput, independent of whether screening or selection is the envisaged approach. Indeed, no general recommendation can be made so far as to which approach is better, and decisions have to be made on a case-by-case basis depending on the enzyme class, specific substrates, or enzyme characteristics to be identified or improved.

REFERENCES

1. K Drauz, H Waldmann. Enzyme Catalysis in Organic Synthesis. 2 ed. Vol. 1–3. Weinheim: VCH, 2002.
2. K Faber. Biotransformations in Organic Chemistry. 4 ed. Berlin: Springer, 2000.
3. RN Patel. Stereoselective Biocatalysis. New York: Marcel Dekker, 2000.
4. UT Bornscheuer, RJ Kazlauskas. Hydrolases in organic synthesis - regio- and stereoselective biotransformations. Weinheim: Wiley-VCH, 1999.
5. DC Demirjan, PC Shah, F Moris-Varas. Screening for novel enzymes. Topics Curr Chem 200:1–29, 1999.
6. A Schmid, JS Dordick, B Hauer, A Kiener, M Wubbolts, B Witholt. Industrial biocatalysis today and tomorrow. Nature 409:258–268, 2001.
7. A Liese, K Seelbach, C Wandrey. Industrial Biotransformations. Weinheim: Wiley-VCH, 2000.
8. UT Bornscheuer. Industrial biotransformations. In: HJ Rehm, G Reed, A Pühler, PJW Stadler, DR Kelly, eds. Biotechnology-Series. Vol. 8b. Weinheim: Wiley-VCH, 2000, pp 277–294.
9. UT Bornscheuer, C Bessler, R Srinivas, SH Krishna. How to optimize lipase and related enzymes for efficient application. Trends Biotechnol 20:433–437, 2002.
10. K-E Jaeger, S Ransac, BW Dijkstra, C Colson, Mv Heuvel, O Misset. Bacterial lipases. FEMS Microbiol Rev 15:29–63, 1994.
11. N Layh, B Hirrlinger, A Stolz, H-J Knackmuss. Enrichment strategies for nitrile-hydrolysing bacteria. Appl Microbiol Biotechnol 47:668–674, 1997.
12. J Ogawa, S Shimizu. Microbial enzymes: new industrial applications from traditional screening methods. Trends Biotechnol 17:13–20, 1999.
13. Y Asano. Overview of screening of new microbial catalysts and their uses in organic synthesis - selection and optimization of biocatalysts. J Biotechnol 94:65–72, 2002.
14. CA Miller. Advances in enzyme discovery. Inform 11:489–495, 2000.
15. G DeSantis, Z Zhu, WA Greenberg, K Wong, J Chaplin, SR Hanson, B Farwell, LW Nicholson, CL Rand, DP Weiner, DE Robertson, MJ Burk. An enzyme library approach to biocatalysis: development of nitrilases for enantioselective production of carboxylic acid derivatives. J Am Chem Soc 124:9024–9025, 2002.
16. UT Bornscheuer, M Pohl. Improved biocatalysts by directed evolution and rational protein design. Curr Opin Chem Biol 5:137–143, 2001.
17. UT Bornscheuer. Directed evolution of enzymes. Angew Chem Int Ed Engl 37:3105–3108, 1998.
18. UT Bornscheuer. Directed evolution of enzymes for biocatalytic applications. Biocat Biotransf 19:84–96, 2001.
19. C Schmidt-Dannert, FH Arnold. Directed evolution of industrial enzymes. Trends Biotechnol 17:135–136, 1999.
20. FH Arnold, AA Volkov. Directed evolution of biocatalysts. Curr Opin Chem Biol 3:54–59, 1999.
21. S Brakmann, K Johnsson, eds. Directed Molecular Evolution of Proteins. Vol. 1. Weinheim: Wiley-VCH, 2002.
22. SA Sundberg. High-throughput and ultra-high-throughput screening: solution- and cell-based approaches. Curr Opin Biotechnol 11:47–53, 2000.
23. D Wahler, JL Reymond. Novel methods for biocatalyst screening. Curr Opin Chem Biol 5:152–158, 2001.
24. D Wahler, JL Reymond. High-throughput screening for biocatalysts. Curr Opin Biotechnol 12:535–544, 2001.
25. MT Reetz. New methods for the high-throughput screening of enantioselective catalysts and biocatalysts. Angew Chem Int Ed Engl 41:1335–1338, 2002.
26. M Olsen, B Iverson, G Georgiou. High-throughput screening of enzyme libraries. Curr Opin Biotechnol 11:331–337, 2000.
27. RP Herzberg, AJ Pope. High throughput screening: new technology for the 21st century. Curr Opin Chem Biol 4:445–451, 2000.
28. N Cohen, S Abramov, Y Dror, A Freeman. In vitro enzyme evolution: the screening challenge of isolating the one in a million. Trends Biotechnol 19:507–510, 2001.
29. AD Griffiths, DS Tawfik. Man-made enzymes - from design to in vitro compartmentalisation. Curr Opin Biotechnol 11:338–353, 2000.
30. UT Bornscheuer, J Altenbuchner, HH Meyer. Directed evolution of an esterase: screening of enzyme libraries based on pH-indicators and a growth assay. Bioorg Med Chem 7:2169–2173, 1999.
31. C Juergens, A Strom, D Wegener, S Hettwer, M Wilmanns, R Sterner. Directed evolution of a (beta-alpha)8-barrel enzyme to catalyze related reactions in two different metabolic pathways. Proc Natl Acad Sci USA 97:9925–9930, 2000.
32. H Joo, Z Lin, FH Arnold. Laboratory evolution of peroxide-mediated cytochrome P450 hydroxylation. Nature 399:670–673, 1999.
33. Q-S Li, U Schwaneberg, P Fischer, RD Schmid. Directed evolution of the fatty acid hydroxylase P450 BM-3 into an indole-hydroxylating catalyst. Chem Eur J 6:1531–1536, 2000.
34. C Schmidt-Dannert, D Umeno, FH Arnold. Molecular breeding of carotenoid biosynthetic pathways. Nat Biotechnol 18:750–753, 2000.
35. CS Chen, Y Fujimoto, G Girdaukas, CJ Sih. Quantitative analyses of biochemical kinetic resolutions of enantiomers. J Am Chem Soc 104:7294–7299, 1982.
36. MT Reetz, A Zonta, K Schimossek, K Liebeton, K-E Jaeger. Creation of enantioselective biocatalysts for organic chemistry by in vitro evolution. Angew Chem Int Ed Engl 36:2830–2832, 1997.
37. E Henke, UT Bornscheuer. Directed evolution of an esterase from Pseudomonas fluorescens. Random mutagenesis by error-prone PCR or a mutator strain and identification of mutants showing enhanced enantioselectivity by a resorufin-based fluorescence assay. Biol Chem 380:1029–1033, 1999.
38. LE Janes, AC Löwendahl, RJ Kazlauskas. Quantitative screening of hydrolase libraries using pH indicators: identifying active and enantioselective hydrolases. Chem Eur J 4:2324–2331, 1998.
39. AMF Liu, NA Somers, RJ Kazlauskas, TS Brush, F Zocher, MM Enzelberger, UT Bornscheuer, GP Horsman, A Mezzetti, C Schmidt-Dannert, RD Schmid. Mapping the substrate selectivity of new hydrolases using colorimetric screening: lipases from Bacillus thermocatenulatus and Ophiostoma piliferum, esterases from Pseudomonas fluorescens and Streptomyces diastatochromogenes. Tetrahedron: Asymmetry 20:545–556, 2001.
40. LE Janes, RJ Kazlauskas. Quick E. A fast spectrophotometric method to measure the enantioselectivity of hydrolases. J Org Chem 62:4560–4561, 1997.
41. MT Reetz, MH Becker, KM Kühling, A Holzwarth. Time-resolved IR-thermographic detection and screening of enantioselectivity in catalytic reactions. Angew Chem Int Ed Engl 37:2647–2650, 1998.
42. MT Reetz, MH Becker, HW Klein, D Stöckigt. A method for high throughput screening of enantioselective catalysts. Angew Chem Int Ed Engl 38:1758–1761, 1999.
43. MT Reetz, KM Kühling, A Deege, H Hinrichs, D Belder. Super-high-throughput screening of enantioselective catalysts by using capillary array electrophoresis. Angew Chem Int Ed Engl 39:3891–3893, 2000.
44. HU Bergmeyer, ed. Methods of Enzymatic Analysis. Vol. 1. Weinheim: Wiley-VCH, 1974.
45. M Baumann, R Stürmer, UT Bornscheuer. A high-throughput-screening method for the identification of active and enantioselective hydrolases. Angew Chem Int Ed Engl 40:4201–4204, 2001.
46. F Badalassi, D Wahler, G Klein, P Crotti, J-L Reymond. A versatile periodate coupled fluorogenic assay for hydrolytic enzymes. Angew Chem Int Ed Engl 39:4067–4070, 2000.
47. JL Reymond, D Wahler. Substrate arrays as enzyme fingerprinting tools. Chem Bio Chem 3:701–708, 2002.
48. D Appel, S Lutz-Wahl, P Fischer, U Schwaneberg, RD Schmid. A P450 BM-3 mutant hydroxylates alkanes, cycloalkanes, arenes and heteroarenes. J Biotechnol 88:167–171, 2001.
49. Q-S Li, U Schwaneberg, M Fischer, J Schmitt, J Pleiss, S Lutz-Wahl, RD Schmid. Rational evolution of a medium chain-specific cytochrome P-450 BM-3 variant. Biochim Biophys Acta 1545:114–121, 2001.
50. U Schwaneberg, C Schmidt-Dannert, J Schmitt, RD Schmid. A continuous spectrophotometric assay for P450 BM-3, a fatty acid hydroxylating enzyme, and its mutant F87A. Anal Biochem 269:259–266, 1999.
51. MT Reetz, CJ Rüggeberg. A screening system for enantioselective enzymes based on differential cell growth. Chem Commun 1428–1429, 2002.
52. A Crameri, SA Raillard, E Bermudez, WPC Stemmer. DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature 391:288–291, 1998.
53. A Christmann, A Wentzel, C Meyer, G Meyers, H Kolmar. Epitope mapping and affinity purification of monospecific antibodies by Escherichia coli cell surface display of gene-derived random peptide libraries. J Immunol Methods 257:163–173, 2001.
54. JL Jestin, P Kristensen, G Winter. A method for the selection of catalysis using phage display and proximity coupling. Angew Chem Int Ed Engl 38:1124–1127, 1999.
55. JR Betley, S Cesaro-Tadic, A Mekhakfia, JH Rickard, H Durham, LJ Partridge, A Plückthun, GM Blackburn. Direct screening for phosphatase activity by turnover-based capture of protein catalysts. Angew Chem Int Ed Engl 41:775–777, 2002.
56. J Jose, R Bernhardt, F Hannemann. Functional display of active bovine adrenodoxin on the surface of E coli by chemical incorporation of the [2Fe–2S] cluster. Chem Bio Chem 2:695–701, 2001.

23 Screening of Enzyme Variants for Thermostability

Shigenori Kanaya
Osaka University
Osaka, Japan

1 INTRODUCTION

Enzymes are among the most important biomolecules and catalyze a variety of reactions. Because of their high specificities and catalytic efficiencies, enzymes are widely used for practical purposes. However, most enzymes, except for those isolated from thermophilic organisms, are heat labile and rapidly lose their activities at high temperatures. Such instability limits the application of enzymes for practical purposes. Because a thermophilic counterpart of a mesophilic enzyme is not always available, it is important to develop techniques to increase protein stability.

Various interactions important for protein stability have been identified by introducing a series of mutations into a given protein and analyzing the effect of the mutations on the protein stability (1–4). They include hydrophobic interactions, hydrogen bonds, electrostatic interactions, and van der Waals interactions. It has been reported that either a hydrophobic interaction created by a buried single methylene or methyl group (5) or an intramolecular hydrogen bond (6) contributes roughly 1.3 kcal/mol to the protein stability. The number of these interactions in each protein molecule usually exceeds 100 in total. Nevertheless, most proteins are stabilized by only about 10 kcal/mol in free energy (ΔG) on average. These facts indicate that a protein structure is built on a delicate balance of numerous stabilizing and destabilizing interactions within a protein molecule. Therefore it is expected that protein stability can be greatly increased by adding a single stabilizing interaction or removing a single destabilizing interaction.

Two strategies have been developed to increase protein stability. One is computer-assisted design and the other is directed evolution. The first strategy can only be applied to proteins whose three-dimensional structures have been determined or for which three-dimensional structural models are available. These proteins should also be functionally well studied: without knowing the role of each amino acid residue, it may not be possible to design protein variants with desirable functions. In contrast, the second strategy does not require detailed information on the structure and function of proteins.

In directed evolution procedures, the most important step is the screening of large mutant enzyme libraries for thermostable variants. These libraries can be created by introducing random mutations into the gene encoding the enzyme of interest by error-prone PCR (7,8) and/or DNA shuffling (9,10). The resultant mutant genes are used to transform cells (Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, etc.) to generate a library of transformed cells. Therefore it is important to develop a rapid and simple assay method for the activities of the enzymes produced by these transformed cells to facilitate the identification of a desirable transformant. Two alternative methods are available to identify it. One is the in vitro assay method, and the other is the in vivo assay method. This chapter summarizes these methods, as well as the stabilization of E. coli RNase HI by a directed evolution method as a typical example.
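The energetic argument made in this introduction, that a single interaction of roughly 1.3 kcal/mol matters when the net stability is only about 10 kcal/mol, can be made concrete with a short calculation. The sketch below assumes a two-state folding equilibrium at 25°C and treats ΔG as the free energy of unfolding (positive values are stabilizing); these assumptions and the function names are illustrative and are not part of the original text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def folded_to_unfolded_ratio(delta_g_kcal, temp_k=298.15):
    """Two-state equilibrium ratio [folded]/[unfolded] = exp(dG / RT)."""
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))


def stability_fold_change(delta_delta_g_kcal, temp_k=298.15):
    """Factor by which the folded/unfolded ratio changes for a given ddG."""
    return math.exp(delta_delta_g_kcal / (R_KCAL * temp_k))


net_ratio = folded_to_unfolded_ratio(10.0)        # typical net stability
one_interaction = stability_fold_change(1.3)      # one added interaction
gross_sum = 100 * 1.3                             # 100 interactions at 1.3 kcal/mol each

print(f"folded/unfolded at 10 kcal/mol: {net_ratio:.2e}")
print(f"effect of one 1.3 kcal/mol interaction: {one_interaction:.1f}-fold")
print(f"sum of 100 such interactions: {gross_sum:.0f} kcal/mol vs. a net of about 10")
```

Under these assumptions a single additional interaction shifts the equilibrium roughly ninefold, while the gross sum of the individual contributions exceeds the net stability by more than an order of magnitude, which is the delicate balance referred to above.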

2 IN VITRO ASSAY METHOD

The in vitro assay method includes a solution assay method and a filter assay method. In the solution assay method (Fig. 1), the transformants are first grown on an agar plate to form colonies. Ideally, these transformants should be examined for production of functional enzymes by developing a signal that arises from the enzyme reaction, such as color formation from colored products (p-nitrophenol, etc.), a pH change detected by pH indicators (cresol red, etc.), fluorescence emission, or turbidity clearance. After this initial screening, colonies that give positive signals are individually grown in liquid culture. After induction of gene expression, the culture broth is centrifuged to separate the supernatant and the cells. If the enzyme is secreted into the external medium, the supernatant is used to measure the enzymatic activity. If the enzyme is not secreted into the external medium, the cells are suspended in the assay buffer and used to measure the enzymatic activity. The stability of the enzyme is evaluated by measuring the residual activities after incubation at high temperatures. The use of a 96-well format plate facilitates rapid and large-scale screening. By using this method, both secreted enzymes, such as subtilisin E (11), subtilisin S41 (12,13), and horseradish peroxidase (14), and cytoplasmic enzymes, such as p-nitrobenzyl esterase (15), α-aspartyl dipeptidase (16), and diacylglycerol kinase (17), have been successfully improved for thermostability.

Figure 1 Solution assay method. The transformants grown on agar plates are individually grown in liquid cultures using a 96-well format plate. A replica plate is prepared, incubated at high temperatures, and measured for residual activity. The transformants that exhibit higher activity than the parent are chosen as positive ones.

A filter assay method (Fig. 2) was developed to measure the activity of enzymes accumulated inside the cells. To measure the activities of these enzymes, the cells should be disrupted to expose the enzyme to the external solution containing a substrate, although the activities of several enzymes can be measured without disrupting the cells, as mentioned above. However, it would be time-consuming and laborious to prepare crude lysates from individual transformants grown in liquid culture, even if a 96-well format plate is used. In this method, the transformed cells harboring randomly mutated genes are grown on nitrocellulose membranes placed on agar plates, or are grown on an agar plate and transferred to nitrocellulose membranes. Ideally, an initial screening for transformants producing functional enzymes is made by developing a signal as mentioned above. The membranes on which the transformants are grown are soaked in lysis buffer at room temperature for cell lysis. Then the processed membranes are incubated at a temperature high enough to largely inactivate the parent enzyme. The heat-treated membranes are subjected to activity staining. Alternatively, the processed membranes are incubated at high temperatures on activity indicator plates. By using this method, several cytoplasmic enzymes, such as prolyl endopeptidase (18), β-glucuronidase (19), phospholipase A1 (20), and glycerol kinase (21), have been improved for thermostability. This filter assay method is useful for secreted proteins as well. In this case, the membranes on which the transformants are grown are washed to remove cells, heat-treated, and stained for enzymatic activity.

Figure 2 Filter assay method. The transformants grown on agar plates are transferred to nitrocellulose membranes. Alternatively, the transformants are grown on nitrocellulose membranes placed on agar plates. The membranes, on which the transformants are grown, are treated for cell lysis, incubated at high temperatures at which the parent enzyme is largely inactivated, and subjected to activity staining. The transformants that exhibit the enzymatic activities are chosen as positive ones.

It should be noted that the transformants screened by using these in vitro assay methods do not necessarily produce thermostable variants. If a mutation increases the production level or the specific activity of the enzyme without seriously affecting its stability, the transformant producing the resultant mutant enzyme will give a positive signal upon heat treatment. In this case, both the mutant and parent enzymes may lose their activities upon heat treatment at similar rates; however, the total enzymatic activity remaining in each cell after heat treatment increases as the production level or the specific activity of the enzyme increases. Likewise, if a mutation improves the reversibility of thermal denaturation without seriously affecting the stability, the transformant producing the resultant mutant enzyme will give a positive signal upon heat treatment, as long as the enzymatic activity is measured at mild temperatures. The stability of an enzyme against heat inactivation can be evaluated by measuring the residual activities at mild temperatures only when thermal denaturation of the enzyme is irreversible. Therefore we cannot conclude that the transformants screened by using in vitro assay methods produce thermostable variants unless these enzymes are purified and carefully characterized for enzymatic properties and stabilities.
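The caveat about false positives can be illustrated with a small model. The sketch below assumes first-order, irreversible heat inactivation and uses hypothetical expression levels, specific activities, and rate constants; it is meant only to show why the total activity remaining after heating does not by itself identify a more stable variant, whereas the ratio of heated to unheated activity is independent of the production level.

```python
import math


def total_activity_after_heating(expression, specific_activity, k_inact_per_min, minutes):
    """Total activity left after heating (first-order irreversible inactivation)."""
    return expression * specific_activity * math.exp(-k_inact_per_min * minutes)


def residual_fraction(k_inact_per_min, minutes):
    """Heated/unheated activity ratio; independent of the production level."""
    return math.exp(-k_inact_per_min * minutes)


HEAT_MINUTES = 20.0  # hypothetical incubation time

# Hypothetical clones (relative units): (expression, specific activity, k_inact in min^-1)
clones = {
    "parent": (1.0, 1.0, 0.10),
    "overproducer": (3.0, 1.0, 0.10),  # same stability, threefold production
    "stabilized": (1.0, 1.0, 0.03),    # slower inactivation
}

for name, (expression, specific, k_inact) in clones.items():
    total = total_activity_after_heating(expression, specific, k_inact, HEAT_MINUTES)
    fraction = residual_fraction(k_inact, HEAT_MINUTES)
    print(f"{name:12s} total after heating = {total:.3f}, residual fraction = {fraction:.3f}")
```

In this model the overproducing clone gives a stronger signal after heating than the parent although its residual fraction is identical, so only the stabilized clone would stand out if each heated well were compared with an unheated replica. Normalizing to an unheated replica is an assumption of this sketch, and it does not address the second caveat, reversible denaturation.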

3 IN VIVO ASSAY METHOD (GENETIC METHOD)

The in vivo assay method (genetic method) includes a thermoadaptation method, a plate assay method, and a suppressor mutation method. In a thermoadaptation method (Fig. 3), the gene encoding a given enzyme from a mesophile is introduced into a thermophile and the transformants that can grow at high temperatures are selected. This method can be applied to any enzyme, if a transformation system of a thermophile is available, and this thermophile acquires a new phenotype, which can be easily detected, upon transformation. So far, the enzymes that confer resistance to antibiotics, such as kanamycin nucleotidyl transferase (KNT) (22–24) and hygromycin B phosphotransferase (HTH) (25), and those that are involved in metabolic pathways, such as 3-isopropylmalate dehydrogenase (IPMDH) (26–28), have been used to demonstrate the usefulness of this method for the screening of

496

Kanaya

Figure 3 Thermoadaptation method. The gene encoding an enzyme from a mesophile (unﬁlled) is introduced into a thermophile by the chromosomal integration (left) or the transformation using a plasmid vector (right). Spontaneous mutations, followed by the selection of the transformants which can grow at high temperatures, permit to identify the genes encoding thermostable variants (ﬁlled gray).

thermostabilized mutants. KNT and HPH confer resistance to the antibiotics kanamycin and hygromycin B, respectively, and IPMDH permits the growth of the cells in the absence of leucine. The thermostable variant of KNT was isolated from those spontaneously mutated by introducing the gene encoding KNT from a mesophile, Staphylococcus aureus, into a moderate thermophile, Bacillus stearothermophilus, followed by the selection of the transformants which are resistant to kanamycin at 61–63jC (22,24). The B. stearothermophilus transformant carrying the wild-type gene is resistant to the antibiotics at<55jC, but not at >55jC, probably because S. aureus KNT is unstable at >55jC. Therefore B. stearothermophilus transformants that are resistant to the antibiotics at 61–63jC should produce the KNT variant which is more stable than the

Screening of Enzyme Variants for Thermostability

wild-type enzyme. The resultant thermostable variant was further stabilized by introducing the gene encoding this variant into an extreme thermophile, Thermus thermophilus, and selecting the transformants that are resistant to the antibiotic at temperatures at which the parent enzyme is thermally denatured (23). This selection procedure was repeated several times while the growth temperature of the transformants was gradually increased from 64°C to 81°C. The thermostabilized mutant selected at a given temperature on a 1.5% gellan gum plate with 500 µg/ml kanamycin was always used as the parent enzyme for the next selection round, in which the growth temperature of the transformant was slightly increased. In this way, a KNT variant that confers resistance to the antibiotic at 81°C was obtained. Likewise, a thermostable variant of HPH was isolated from spontaneous mutants by introducing the gene encoding E. coli HPH into a hyperthermophilic archaeon, Sulfolobus solfataricus, and selecting the transformants that are resistant to hygromycin B at 82°C (25). This is the maximum temperature permitted by growth on Gelrite plates. IPMDH, which is encoded by the leuB gene, is involved in leucine biosynthesis. Therefore a leuB-deficient strain requires leucine for growth. When the leuB gene encoding B. subtilis IPMDH is integrated into the chromosome of a leuB-deficient strain of T. thermophilus, the resultant transformants show leucine autotrophy at 56°C, but not at 61°C and above (26). Likewise, when a plasmid vector harboring the LEU2 gene encoding IPMDH from S. cerevisiae is used to transform a leuB-deficient strain of T.

Figure 4 Suppressor mutation method. A mutation that destabilizes or stabilizes the protein is shown by a cross or a circle, respectively. The proteins which are considerably less stable (unﬁlled) and more stable (ﬁlled dark) than the wild-type protein, as well as those which are roughly equivalent to the wild-type protein in stability (ﬁlled gray), are schematically shown.

thermophilus, the resultant transformants do not show leucine autotrophy at 50°C and above (28). The mutants that can grow without leucine at higher temperatures are selected from spontaneous mutants. By increasing the selection temperature stepwise, mutations are serially accumulated: the B. subtilis and S. cerevisiae IPMDH variants produced by the T. thermophilus transformants that can grow without leucine at 70°C have three and five mutations, respectively, as compared to the individual wild-type enzymes. In the plate assay method, the agar plates on which colonies or plaques are formed are incubated at higher temperatures, and those producing stabilized mutant enzymes are selected by measuring their residual activities directly on the plates. By using this method, T4 lysozyme variants with improved thermal stability have been obtained (29,30). The T4 lysozyme activity can be detected as a digestion halo surrounding phage plated on a bacterial lawn, following exposure to chloroform.
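The stepwise logic of the thermoadaptation method can be illustrated with a toy simulation. The sketch below is not taken from the cited studies: it assumes, purely for illustration, that a variant supports growth only if its Tm is at least the selection temperature, that spontaneous mutations shift Tm by a random amount that is destabilizing on average, and that the most stable survivor of each round becomes the parent of the next. The function name, pool size, and effect sizes are hypothetical.

```python
import random


def stepwise_selection(parent_tm, temperatures, rng, pool_size=1000, effect_sd=1.5):
    """Toy model of stepwise thermoadaptation (all numbers are hypothetical).

    Returns a list of (selection_temperature, parent_tm_after_round) pairs,
    one per round that yielded at least one surviving variant.
    """
    history = []
    for temp in temperatures:
        # Spontaneous mutants: each shifts the parent's Tm by a random amount,
        # with a mean of -1.0 because most mutations are assumed destabilizing.
        pool = [parent_tm + rng.gauss(-1.0, effect_sd) for _ in range(pool_size)]
        # Assumed selection rule: growth requires Tm >= selection temperature.
        survivors = [tm for tm in pool if tm >= temp]
        if not survivors:
            break  # no variant grows at this temperature, so selection stalls
        # The selected variant becomes the parent for the next, hotter round.
        parent_tm = max(survivors)
        history.append((temp, parent_tm))
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    # Selection temperatures rising in 1 degree steps, as in a 64-81 degree series.
    for temp, tm in stepwise_selection(63.0, range(64, 82), rng):
        print(f"selected at {temp} C: parent Tm is now {tm:.1f} C")
```

Because each round starts from the previous survivor, small gains accumulate, which is the reason the selection temperature is raised in small steps rather than in a single jump.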

Figure 5 Screening of thermostabilized mutants by an alternative suppressor mutation method, in which destabilized mutant protein is constructed by a C-terminal truncation. Minimum detection level represents the lowest stability of the protein that is required to make the protein functional at the growth temperature of the transformant. The proteins with and without the stabilizing mutations are shown in thick and thin lines, respectively. The broken line represents the C-terminal region which is truncated to destabilize the protein.

In the suppressor mutation method, intragenic, second-site reversions are used to identify amino acid substitutions that enhance the thermostability of a given protein (Fig. 4). By using this method, the Asn57→Ile mutation, which stabilizes iso-1-cytochrome c from S. cerevisiae by 17°C in thermal transition temperature (Tm), has been identified (31). This mutation, which arises as a second-site reversion, at least partially restores the function of a nonfunctional missense variant of iso-1-cytochrome c with the Gly34→Ser or His38→Pro mutation, probably by alleviating its instability. The disadvantage of this method is the relatively low frequency of second-site reversions: when a nonfunctional mutant protein with a single amino acid substitution is used to screen for second-site revertants, the original mutation itself often reverts. To avoid this unfavorable reversion, an alternative screening method for second-site revertants has been developed, in which a nonfunctional mutant protein is created by deletions or truncations at its N- or C-terminus (32). By using this method, a number of mutations that stabilize E. coli RNase HI by 0.8–7.8°C in Tm have been identified (32). The strategy to stabilize E. coli RNase HI by the suppressor mutation method, which is schematically shown in Fig. 5, is described below in more detail.

4 STABILIZATION OF E. COLI RNASE HI BY SUPPRESSOR MUTATION METHOD

E. coli RNase HI, which is encoded by the rnhA gene, specifically hydrolyzes the RNA strand of an RNA/DNA hybrid (33,34). The protein is composed of a single polypeptide chain with 155 amino acid residues and requires Mg2+ or Mn2+ for activity. E. coli RNase HI is not required for cell growth because the rnhA mutant strain grows normally. However, the rnhA mutation affects cell growth if combined with the recB or recC mutation (35). For example, E. coli strain MIC3001, which carries the rnhA and recB (ts) mutations, shows a temperature-sensitive growth phenotype (35): this strain can grow at 30°C, but not at 42°C. Introduction of the rnhA gene into this strain complements its temperature-sensitive growth phenotype. In other words, this strain can grow at 42°C when it is transformed with a plasmid bearing the rnhA gene. When this strain was used to examine whether a C-terminal truncation affects the enzymatic activity of E. coli RNase HI, the protein variants with truncations of ≤12 residues exhibited RNase H activity in vivo at 42°C, whereas those with truncations of ≥13 residues did not. Because it has been shown that a C-terminal truncation (up to 7 residues) destabilizes this protein without considerable loss of the enzymatic activity (32), it is likely that the protein variants with truncations of ≥13 residues do not exhibit RNase H activity in vivo at 42°C because of a dramatic decrease in protein stability. Thus the genes encoding these truncated proteins are suitable to screen for revertant

mutations that stabilize these truncated proteins and thereby make them functional in vivo at 42°C. Such mutations are expected to stabilize the wild-type protein as well. Random mutations were introduced into the gene encoding 142-RNase HI, which lacks the C-terminal 13 residues, by error-prone PCR (7,8). When E. coli MIC3001 transformants bearing plasmids harboring the randomly mutagenized rnhA genes were examined for their growth, colonies grew at 42°C with frequencies of 1/10³–1/10⁴. Plasmid DNAs were isolated from these colonies, and the DNA sequences of the mutant genes were determined. Of the 24 sequences examined, 1 encodes a 142-RNase HI variant with three mutations, 3 encode variants with two mutations, and the others encode variants with a single mutation. As a result, 13 single mutations in total were shown to restore function to 142-RNase HI in vivo. By introducing these single mutations individually into the wild-type protein and analyzing the stability of the resultant mutant proteins thermodynamically, 10 of the 13 mutations that restore function to 142-RNase HI in vivo were shown to increase the stability of the wild-type protein by 0.8–7.8°C in Tm (Fig. 6). Thus this screening method was shown to be quite effective in identifying stabilizing mutations. The other three mutations did not seriously affect the stability of the wild-type protein, suggesting either that in vivo stability does not necessarily reflect in vitro stability or that these mutations stabilize only 142-RNase HI. Of the 10 stabilized mutant proteins, H62P and K95N, in which His62 and Lys95 are replaced by Pro and Asn, respectively, have been extensively studied for their stabilization mechanisms (36–38). The H62P and K95N proteins are more stable than the wild-type protein by 4.1°C and 3.2°C in Tm, respectively (Fig. 6).
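The bookkeeping behind these screening numbers is easy to check with a few lines of arithmetic. The sketch below uses only the counts quoted above; the function names are hypothetical, and the estimate of how many transformants must be plated is a simple expected-value calculation, not a figure reported in the source.

```python
def screen_summary(sequenced=24, triple=1, double=2 + 1, distinct_single=13, stabilizing=10):
    """Summarize the revertant screen using the counts quoted in the text."""
    single = sequenced - triple - double  # clones carrying exactly one mutation
    return {
        "single_mutation_clones": single,
        # Fraction of the distinct single mutations that also stabilize
        # the wild-type protein in vitro.
        "stabilizing_fraction": stabilizing / distinct_single,
        "neutral_in_wild_type": distinct_single - stabilizing,
    }


def transformants_needed(colonies_wanted, one_in_n):
    """Expected number of transformants to plate if 1 in `one_in_n` grows at 42 C."""
    return colonies_wanted * one_in_n


if __name__ == "__main__":
    summary = screen_summary()
    print(summary["single_mutation_clones"])          # 20
    print(round(summary["stabilizing_fraction"], 2))  # 0.77
    # Revertants appear at 1/10^3 to 1/10^4, so 24 colonies correspond to
    # roughly 24,000 to 240,000 transformants.
    print(transformants_needed(24, 10**3), transformants_needed(24, 10**4))
```

About three quarters of the single mutations recovered in vivo turn out to stabilize the wild-type protein, which is the basis for calling the screen effective.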
To examine whether the elimination of the histidine residue or the introduction of the proline residue is responsible for the stabilization of the H62P protein, an additional mutant protein, H62A, in which His62 is replaced by Ala, was constructed (39). This protein was nearly as stable as the wild-type protein, suggesting that the introduction of Pro at position 62 is responsible for the stabilization of the protein. Nevertheless, the crystal structures of the H62P and H62A proteins were nearly identical to that of the wild-type protein, except for the local structures around the mutation sites (Fig. 7) (36). Because His62 is located in a loop region between the αI-helix and the βD-strand and is exposed to the solvent, the His62→Pro mutation probably increases the protein stability by decreasing the entropy of the unfolded state of the mutant protein. Likewise, to examine whether the elimination of the lysine residue or the introduction of the asparagine residue is responsible for the stabilization of the K95N protein, the additional mutant proteins K95A and K95G, in which Lys95 is replaced by Ala and Gly, respectively, were constructed (38). Analyses of the protein stability indicated that the K95A protein was as stable as the wild-type

Figure 6 Localization of the thermostabilizing mutations identified by the suppressor mutation method. The side chains of the amino acid residues with mutations that restore stability to 142-RNase HI, as well as those of the active site residues, Asp10, Glu48, and Asp70, are indicated in the crystal structure of E. coli RNase HI (PDB: 2RN2). N and C represent the N and C termini of the protein. The types of amino acid substitutions, as well as the changes in thermal transition temperature (Tm) upon mutation, are also shown. Tm represents the temperature of the midpoint of the transition in a thermal denaturation curve, which was determined by monitoring the change in the circular dichroism (CD) values at 220 nm (Ref. 37).

protein, whereas the K95G protein was more stable than the wild-type protein by 6.8°C in Tm. Determination of the crystal structures of these mutant proteins indicated that only the local structures around the mutation sites are responsible for the increase in thermostability (Fig. 8) (36). The amino acid residue at position 95 is located in a typical 3:5 type loop and forms a left-handed helical structure. In this structure, a β-carbon causes steric hindrance with the carbonyl oxygen atom within the same residue. Therefore the Lys95→Gly mutation greatly increases the protein stability, probably due to the elimination of this steric hindrance, because Gly does not have a β-carbon. The K95N protein has a β-carbon. Nevertheless, it was

Figure 7 Backbone structures of E. coli RNase HI from Leu59 to Val65. The backbone structures of the wild-type, H62P (PDB: 1RBR), and H62A (PDB: 1RBS) proteins of E. coli RNase HI around residue 62, as well as the side chains of the amino acid residues at position 62, are shown. The changes of the thermal transition temperature upon mutations are also indicated in parentheses.

Figure 8 Backbone structures of E. coli RNase HI from Thr92 to Pro97. The backbone structures of the wild-type, K95A (PDB: 1RBV), K95N (PDB: 1RBU), and K95G (PDB: 1RBT) proteins of E. coli RNase HI around residue 95 are shown. The side chains of the amino acid residues at position 95, as well as the backbone carbonyl oxygen atom within the same residue, are also shown. The changes of the thermal transition temperature upon mutations are indicated in parentheses.

more stable than the wild-type protein as well. Its crystal structure revealed that the side chain of Asn95 (Nδ atom) forms a hydrogen bond with the backbone carbonyl oxygen atom within the same residue (Fig. 8). This hydrogen bond probably compensates for the backbone instability. The stabilization mechanisms of the other mutant proteins remain to be analyzed. However, comparison of the locations and types of amino acid substitutions with those previously identified as thermostabilizing mutations allows us to propose stabilization mechanisms for some of the mutant proteins. For example, the Gly23→Ala, Ala24→Val, and Ala52→Val mutations probably stabilize the protein by filling a cavity, as does the Val74→Leu or Ile mutation (40). Gly23, Ala24, and Ala52 are fully buried inside the protein molecule and face a cavity. An increase in the volume of the side chain by the mutation is therefore expected to reduce the size of the cavity around each residue. In addition, the stabilization by the Glu119→Val mutation probably results from an increase in the stability of the β-sheet, because Glu119 is located in the βE strand and Val is more favorable for a β-sheet than Glu.

5 CONCLUDING REMARKS

Development of a rapid and efficient screening system for thermostabilized mutants is important not only for practical purposes, but also for basic research. If a number of mutations that stabilize a protein are identified, a hyperthermostable mutant protein can be created by combining these mutations by DNA shuffling or site-directed mutagenesis. Cumulative effects of thermostabilizing mutations have been reported for subtilisin BPN′ (41), λ repressor (42), and E. coli RNase HI (43). For example, the combination of the five stabilizing mutations Gly23→Ala, His62→Pro, Val74→Leu, Lys95→Gly, and Asp134→His, which contribute to the stability of E. coli RNase HI by 0.5, 1.1, 0.9, 1.9, and 1.9 kcal/mol in ΔG (1.8°C, 4.1°C, 3.3°C, 6.8°C, and 7.0°C in Tm), respectively, increased the thermal stability of this protein by 5.6 kcal/mol in ΔG (20.2°C in Tm), as compared to that of the wild-type protein (43). In addition, identification of thermostabilizing mutations, followed by analyses of their stabilization mechanisms, provides valuable information on the stability–structure relationships of proteins. Accumulation of this information will facilitate the development of a general method to increase protein stability in a rational manner. There is a controversy as to whether the enzymatic activity at low temperature decreases as the protein stability increases. It is generally believed that conformational flexibility in the active site is responsible for the enzymatic activity, and that there is an inverse relationship between the stability and the enzymatic activity at low temperature. However, directed

evolution studies have indicated that thermostable enzymes that retain high activity at low temperature are physically possible (44), which suggests that the stability of an enzyme can be improved without a cost in activity. Therefore an enzyme can, in principle, be stabilized without seriously affecting its activity, as long as an appropriate screening system is available. An enzyme molecule may generally have a number of loci outside the active site with conformational defects that affect the thermostability. Reinforcing these loci by introducing amino acid substitutions would therefore be generally effective in increasing the protein stability without seriously affecting the enzymatic activity.
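The cumulative effect reported for the quintuple RNase HI mutant can be compared with a strictly additive expectation. The sketch below simply sums the individual contributions quoted in this section and compares them with the measured values for the combined mutant; the variable and function names are hypothetical, and no data beyond the figures in the text are used.

```python
# (delta-G in kcal/mol, delta-Tm in degrees C) for each single mutation,
# as quoted in the text for E. coli RNase HI.
MUTATIONS = {
    "Gly23->Ala": (0.5, 1.8),
    "His62->Pro": (1.1, 4.1),
    "Val74->Leu": (0.9, 3.3),
    "Lys95->Gly": (1.9, 6.8),
    "Asp134->His": (1.9, 7.0),
}
# Measured stabilization of the quintuple mutant relative to wild type.
OBSERVED = (5.6, 20.2)


def additive_prediction(mutations):
    """Sum the individual contributions, assuming perfect additivity."""
    ddg = sum(g for g, _ in mutations.values())
    dtm = sum(t for _, t in mutations.values())
    return ddg, dtm


if __name__ == "__main__":
    ddg_sum, dtm_sum = additive_prediction(MUTATIONS)
    print(f"additive: {ddg_sum:.1f} kcal/mol, {dtm_sum:.1f} C")
    print(f"observed: {OBSERVED[0]:.1f} kcal/mol, {OBSERVED[1]:.1f} C")
    print(f"fraction of additive value: {OBSERVED[0] / ddg_sum:.2f} (dG), "
          f"{OBSERVED[1] / dtm_sum:.2f} (Tm)")
```

The individual contributions add up to 6.3 kcal/mol and 23.0°C, so the measured 5.6 kcal/mol and 20.2°C correspond to nearly, but not exactly, additive behavior (close to 90% of the simple sum).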

REFERENCES

1. T Alber. Mutational effects on protein stability. Ann Rev Biochem 58:765–798, 1989.
2. KA Dill. Dominant forces in protein folding. Biochemistry 29:7133–7155, 1990.
3. AR Fersht, L Serrano. Principles of protein stability determined from protein engineering experiments. Curr Opin Struct Biol 3:75–83, 1993.
4. K Takano, K Yutani. Stability and structure of a series of mutant human lysozymes. Curr Top Pept Protein Res 4:1–16, 2001.
5. CN Pace. Contribution of the hydrophobic effect to globular protein stability. J Mol Biol 226:29–35, 1992.
6. BA Shirley, P Stanssens, U Hahn, CN Pace. Contribution of hydrogen bonding to the conformational stability of ribonuclease T1. Biochemistry 31:725–732, 1992.
7. RK Saiki, DH Gelfand, S Stoffel, SJ Scharf, R Higuchi, GT Horn, KB Mullis, HA Erlich. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239:487–491, 1988.
8. DW Leung, E Chen, DV Goeddel. A method for random mutagenesis of a defined DNA segment using a modified polymerase chain reaction. Technique 1:11–15, 1989.
9. WP Stemmer. DNA shuffling by random fragmentation and reassembly: in vitro recombination for molecular evolution. Proc Natl Acad Sci U S A 91:10747–10751, 1994.
10. WP Stemmer. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370:389–391, 1994.
11. H Zhao, FH Arnold. Directed evolution converts subtilisin E into a functional equivalent of the thermitase. Protein Eng 12:47–53, 1999.
12. K Miyazaki, PL Wintrode, RA Grayling, DN Rubingh, FH Arnold. Directed evolution study of temperature adaptation in a psychrophilic enzyme. J Mol Biol 297:1015–1026, 2000.
13. K Miyazaki, FH Arnold. Exploring nonnatural evolutionary pathways by saturation mutagenesis: rapid improvement of protein function. J Mol Evol 49:716–720, 1999.
14. B Morawski, S Quan, FH Arnold. Functional expression and stabilization of horseradish peroxidase by directed evolution in Saccharomyces cerevisiae. Biotechnol Bioeng 76:99–107, 2001.
15. L Giver, A Gershenson, PO Freskgard, FH Arnold. Directed evolution of a thermostable esterase. Proc Natl Acad Sci U S A 95:12809–12813, 1998.
16. X Kong, Y Liu, X Gou, S Zhu, H Zhang, X Wang, J Zhang. Directed evolution of alpha-aspartyl dipeptidase from Salmonella typhimurium. Biochem Biophys Res Commun 289:137–142, 2001.
17. Y Zhou, JU Bowie. Building a thermostable membrane protein. J Biol Chem 275:6975–6979, 2000.
18. H Uchiyama, T Inaoka, T Ohkuma-Soyejima, H Togame, Y Shibanaka, T Yoshimoto, T Kokubo. Directed evolution to improve the thermostability of prolyl endopeptidase. J Biochem (Tokyo) 128:441–447, 2000.
19. H Flores, AD Ellington. Increasing the thermal stability of an oligomeric protein, beta-glucuronidase. J Mol Biol 315:325–337, 2002.
20. JK Song, JS Rhee. Simultaneous enhancement of thermostability and catalytic activity of phospholipase A(1) by evolutionary molecular engineering. Appl Environ Microbiol 66:890–894, 2000.
21. Y Koga, M Haruki, M Morikawa, S Kanaya. Stabilities of chimeras of hyperthermophilic and mesophilic glycerol kinases constructed by DNA shuffling. J Biosci Bioeng 91:551–556, 2001.
22. M Matsumura, S Aiba. Screening for thermostable mutant of kanamycin nucleotidyltransferase by the use of a transformation system for a thermophile, Bacillus stearothermophilus. J Biol Chem 260:15298–15303, 1985.
23. J Hoseki, T Yano, Y Koyama, S Kuramitsu, H Kagamiyama. Directed evolution of thermostable kanamycin-resistance gene: a convenient selection marker for Thermus thermophilus. J Biochem (Tokyo) 126:951–956, 1999.
24. H Liao, T McKenzie, R Hageman. Isolation of a thermostable enzyme variant by cloning and selection in a thermophile. Proc Natl Acad Sci U S A 83:576–580, 1996.
25. R Cannio, P Contursi, M Rossi, S Bartolucci. Thermoadaptation of a mesophilic hygromycin B phosphotransferase by directed evolution in hyperthermophilic Archaea: selection of a stable genetic marker for DNA transfer into Sulfolobus solfataricus. Extremophiles 5:153–159, 2001.
26. S Akanuma, A Yamagishi, N Tanaka, T Oshima. Serial increase in the thermal stability of 3-isopropylmalate dehydrogenase from Bacillus subtilis by experimental evolution. Protein Sci 7:698–705, 1998.
27. M Tamakoshi, A Yamagishi, T Oshima. Screening of stable proteins in an extreme thermophile, Thermus thermophilus. Mol Microbiol 16:1031–1036, 1995.
28. M Tamakoshi, Y Nakano, S Kakizawa, A Yamagishi, T Oshima. Selection of stabilized 3-isopropylmalate dehydrogenase of Saccharomyces cerevisiae using the host-vector system of an extreme thermophile, Thermus thermophilus. Extremophiles 5:17–22, 2001.
29. P Pjura, M Matsumura, WA Baase, BW Matthews. Development of an in vivo method to identify mutants of phage T4 lysozyme of enhanced thermostability. Protein Sci 2:2217–2225, 1993.
30. T Alber, JA Wozniak. A genetic screen for mutations that increase the thermal stability of phage T4 lysozyme. Proc Natl Acad Sci U S A 82:747–750, 1985.
31. G Das, DR Hickey, D McLendon, G McLendon, F Sherman. Dramatic thermostabilization of yeast iso-1-cytochrome c by an asparagine→isoleucine replacement at position 57. Proc Natl Acad Sci U S A 86:496–499, 1989.
32. M Haruki, E Noguchi, A Akasako, M Oobatake, M Itaya, S Kanaya. A novel strategy for stabilization of Escherichia coli ribonuclease HI involving a screen for an intragenic suppressor of carboxyl-terminal deletions. J Biol Chem 269:26904–26911, 1994.
33. RJ Crouch, M-L Dirksen. Ribonuclease H. In: SM Linn, RJ Roberts, eds. Nuclease. New York: Cold Spring Harbor Laboratory, 1982, pp 211–241.
34. S Kanaya. Enzymatic activity and protein stability of E. coli ribonuclease HI. In: RJ Crouch, JJ Toulme, eds. Ribonucleases H. Paris: INSERM, 1998, pp 1–38.
35. M Itaya, RJ Crouch. A combination of RNase H (rnh) and recBCD or sbcB mutations in Escherichia coli K12 adversely affects growth. Mol Gen Genet 227:424–432, 1991.
36. K Ishikawa, S Kimura, S Kanaya, K Morikawa, H Nakamura. Structural study of mutants of Escherichia coli ribonuclease HI with enhanced thermostability. Protein Eng 6:85–91, 1993.
37. S Kimura, H Nakamura, T Hashimoto, M Oobatake, S Kanaya. Stabilization of Escherichia coli ribonuclease HI by strategic replacement of amino acid residues with those from the thermophilic counterpart. J Biol Chem 267:21535–21542, 1992.
38. S Kimura, S Kanaya, H Nakamura. Thermostabilization of Escherichia coli ribonuclease HI by replacing left-handed helical Lys95 with Gly or Asn. J Biol Chem 267:22014–22017, 1992.
39. S Kanaya, M Oobatake, H Nakamura, M Ikehara. pH-Dependent thermostabilization of Escherichia coli ribonuclease HI by histidine to alanine substitutions. J Biotechnol 28:117–136, 1993.
40. K Ishikawa, H Nakamura, K Morikawa, S Kanaya. Stabilization of Escherichia coli ribonuclease HI by cavity-filling mutations within a hydrophobic core. Biochemistry 32:6171–6178, 1993.
41. MW Pantoliano, M Whitlow, JF Wood, SW Dodd, KD Hardman, ML Rollence, PN Bryan. Large increases in general stability for subtilisin BPN′ through incremental changes in the free energy of unfolding. Biochemistry 28:7205–7213, 1989.
42. RS Stearman, AD Frankel, E Freire, B Liu, CO Pabo. Combining thermostable mutations increases the stability of lambda repressor. Biochemistry 27:7571–7574, 1988.
43. A Akasako, M Haruki, M Oobatake, S Kanaya. High resistance of E. coli ribonuclease HI variant with quintuple thermostabilizing mutations to thermal denaturation, acid denaturation, and proteolytic degradation. Biochemistry 34:8115–8122, 1995.
44. FH Arnold, PL Wintrode, K Miyazaki, A Gershenson. How enzymes adapt: lessons from directed evolution. Trends Biochem Sci 26:100–106, 2001.

24 Combinatorial Mutagenesis Algorithms, Digital Imaging Spectroscopy, and Solid-Phase Assays for Directed Evolution

Simon Delagrave
BioTech Studio, Newark, Delaware, U.S.A.

Edward J. Bylina, William J. Coleman, Steven J. Robles, Mary M. Yang, and Douglas C. Youvan*
KAIROS Scientific Inc., San Diego, California, U.S.A.

Christin L. McConnell
Harvard University, Cambridge, Massachusetts, U.S.A.

*Current affiliation: Foundation for the Biological Manhattan Project, Frontenac, Kansas, U.S.A.

1 DIRECTED EVOLUTION OF ENZYMES

Thousands of years ago, humanity harnessed evolution to produce the great number of domesticated crops and animals we know today. Since the late
1980s, it has become apparent that macromolecules such as nucleic acids and polypeptides can also be evolved according to the selective criteria of scientists and technologists (1,2). Enzymes are used extensively to solve diﬃcult synthesis problems in the pharmaceutical and chemical industries (3–5). Directed evolution, also known as ‘‘in vitro evolution’’ or ‘‘applied molecular evolution,’’ is an enabling technology that will provide us with superior catalysts capable of working eﬃciently in a variety of solvents, including water, and under a wide range of conditions of temperature and pressure. By reducing the need for organic solvents or heavy metal catalysts in industrial processes and by decreasing energy consumption, the hope is that laboratory evolution will help to create a more environmentally friendly chemical industry. Directed evolution extends the enzyme repertoire available to chemists beyond what is found in nature, thereby making it possible to catalyze new reactions and create new compounds of value. Moreover, the ability of engineered (or evolved) enzymes to modify the properties of natural products will open the door to novel biomaterials: marketable goods made from inexpensive and renewable resources. A recent illustration is the use of oxidized guar, a naturally occurring polysaccharide oxidized by an evolved galactose oxidase, as a strength additive in paper manufacturing (6). Directed evolution of proteins is essentially a two-step process. In the ﬁrst step, a population of related, but heterogeneous, molecules is generated. Typically, thousands of mutations must be generated to identify a few variants with improved properties compared to the parent molecule. Combinatorial mutagenesis (7–9) and error-prone polymerase chain reaction (PCR) (10,11) are eﬀective means of generating molecular diversity. 
Recursive ensemble mutagenesis (REM) (12–14), target set mutagenesis (TSM) (15), and DNA shuﬄing (16) all illustrate how recombination, already known to be of great importance in natural evolution, can be coupled advantageously with the above mutagenesis methods to create optimized libraries of mutants from which large numbers of functional mutants can be readily identiﬁed. The diverse genetic library is expressed in an appropriate host organism (e.g., Escherichia coli) so that the encoded proteins can be tested. The second step of this mimicked evolutionary process is the screen, or selection, by which each of the thousands of variants generated is characterized so as to identify the rare ones having improved or interesting properties. Homogeneous liquid-phase assays have been used successfully to identify numerous improved enzymes, and many other assay formats are continually being developed (17–19). Digital imaging spectroscopy has also been used as part of a very high-throughput screen to design better mutagenesis strategies (14,20) or identify improved proteins (21), and has been combined with assays performed on solid substrates (i.e., gels) to improve enzymes (6,22). The importance of the screen cannot be overstated, as it determines in large part

the phenotypes of the proteins ultimately produced by the evolutionary process in the laboratory. Screening is also the stage at which most of the time and money is expended in a directed evolution project. In this chapter, we will review technologies developed in our laboratories to better harness evolution in vitro. Recent directed evolution experiments performed using high-throughput solid-phase assays, optimized combinatorial mutagenesis algorithms, and digital imaging spectroscopy are described.

2 Kcat COMBINES DIGITAL IMAGING SPECTROSCOPY AND SOLID-PHASE KINETICS SCREENING

Digital imaging spectroscopy and solid-phase kinetics screening can be combined to afford a very high-throughput screen of proteins expressed in microbial colonies. This combination of technologies is currently implemented on a commercial instrument called Kcat (23,24). Tens of thousands of bacterial microcolonies expressing enzyme variants can be simultaneously assayed for activity by using a digital camera and a computer to monitor the accumulation of colored reaction products. A distinct advantage of the approach, as illustrated below, is that extremely high throughput can be achieved with minimal liquid handling, while the amounts of reagent used per enzyme variant assayed are also minimized. Another advantageous feature is that poorly soluble or highly viscous substrates are more readily assayed in a solid-phase format than by methods requiring pipetting of substrate solutions. This approach will become increasingly important as polymeric compounds become targets for enzymatic modification. Moreover, the quality of a library of variants can be rapidly assessed using solid-phase assays. For example, the mutation rates of different libraries can be rapidly compared by determining the proportion of functional enzyme variants they contain. This type of quantitative, very high-throughput, solid-phase kinetics screen utilizing Kcat technology can be applied to a variety of different enzyme evolution tasks (6,23–25). Thousands of microcolonies (<1 mm in diameter) are grown on a single microporous membrane that is in contact with agar growth medium. The membrane can then be conveniently taken from the first agar plate to another containing, for example, an inducer such as isopropyl-β-D-thiogalactoside (IPTG). It can also be incubated at a different temperature, if necessary. Uniform lysis of cells in the microcolonies is performed by controlled exposure to chloroform vapor in a lysis chamber.
The membrane is then moved to an agarose gel or other solid support containing a substrate. When an active enzyme variant in a microcolony converts the substrate to the product, a detectable color change occurs at that location on the membrane.

Figure 1 Graphical user interface (GUI) of the Kcat instrument displaying data from an ultra-high density screen of more than 50,000 microcolonies. A mixture of three different E. coli stocks containing three different A. faecalis β-glucosidase mutants with varying levels of activity was deposited on an 82-mm PETE membrane for this experiment. The deposited cells were allowed to grow into microcolonies and then assayed as described in the text. The time course of blue color formation (monitored at 610 nm) due to the hydrolysis of substrate X-gal in a solid-phase assay is monitored using the Kcat instrument and software. The upper-left window of the GUI gives a view of the microcolonies. This view is also reduced and expanded in the two flanking inset images (connected by arrows). Each microcolony in this image is approximately 200 µm in diameter. Pixels highlighted in white correspond to microcolonies showing significant enzyme activity. The lower-left window of the GUI shows the change in absorbance of these pixels plotted as a function of time, and the right-hand window shows the same information in a color-coded form, which allows for rapid identification of interesting mutants. Here, the absorbance measured at various times is displayed in the right-hand window using a graded assignment of black for low intensity and white for high intensity. A full-color display of this same data is available online at: http://www.kairos-scientific.com/images/SvendsenChap/Chap24.htm.

A high-resolution charge-coupled device (CCD) camera is readily interfaced with a computer to take images of the membrane while it is being illuminated with a tunable monochromatic light source. As color develops in the microcolonies on the membrane, a timed series of digital images records the kinetics of product formation for all microcolonies simultaneously. Every pixel in these digital images can be followed as a function of time using customized software, which rapidly identifies microcolonies having the desired enzymatic activity (26). In addition to enzyme kinetics data, this instrument can also measure the absorption spectra of microcolonies simultaneously by taking digital images of the membrane at many different wavelengths in the visible and near-infrared. The software again captures the data, analyzes them, sorts them, and displays them in a form convenient for rapid identification of interesting microcolonies so that their DNA can be retrieved. The enzyme β-glucosidase from A. faecalis (Abg) hydrolyzes β-glucosides and is known to have broad substrate specificity (27). There are many soluble chromogenic glucosidase and galactosidase substrates that yield insoluble products. One well-known example is the substrate of β-galactosidase, X-gal, which gives a deep blue product upon enzymatic hydrolysis. We have performed solid-phase kinetics assays on approximately 50,000 microcolonies deposited on an 82-mm polyester track etch (PETE) membrane, using a mixture of three Abg mutants. Fig. 1 shows the result of this solid-phase assay, in which a mixture of microcolonies expressing one of three Abg mutants was screened on assay plates containing the substrate X-gal at a concentration of 1 mg/mL. Because the total volume used to make this assay plate was 5 mL, the effective reaction volume per microcolony in this experiment is approximately 100 nL.
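The per-pixel kinetic analysis and the reaction-volume estimate described above can be sketched in a few lines. This is only an illustration of the idea, not the Kcat software: it assumes that each pixel yields a short absorbance time series, estimates the rate of color formation as a least-squares slope, and ranks pixels by that rate. The function names and the example traces are hypothetical.

```python
def slope(times, values):
    """Least-squares slope of values against times (absorbance units per time unit)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def rank_pixels(times, traces):
    """Rank pixels by rate of color formation, fastest first.

    `traces` maps a pixel identifier to its absorbance readings, one per
    time point. Returns (ordered pixel identifiers, rates by pixel).
    """
    rates = {pixel: slope(times, readings) for pixel, readings in traces.items()}
    ordered = sorted(rates, key=rates.get, reverse=True)
    return ordered, rates


def effective_volume_nl(plate_volume_ml, n_colonies):
    """Assay volume available per microcolony, in nanoliters (1 mL = 1e6 nL)."""
    return plate_volume_ml * 1e6 / n_colonies


if __name__ == "__main__":
    times = [0, 1, 2, 3]  # image acquisition times, arbitrary units
    traces = {            # made-up absorbance readings at 610 nm
        "high": [0.0, 0.3, 0.6, 0.9],
        "low": [0.0, 0.1, 0.2, 0.3],
        "inactive": [0.05, 0.05, 0.05, 0.05],
    }
    ordered, rates = rank_pixels(times, traces)
    print(ordered)
    # 5 mL of assay gel shared by about 50,000 microcolonies.
    print(effective_volume_nl(5, 50_000))
```

The volume calculation reproduces the figure in the text: 5 mL divided among roughly 50,000 microcolonies is about 100 nL each.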
Microcolonies expressing the high-activity variant (Y380F) were found by identifying pixels with the greatest rate of blue color formation.

3

GALACTOSE OXIDASE, THERMOSTABILITY, AND Kcat

Recently, Kcat technology was used to evolve the enzyme galactose oxidase (GO), expressed in E. coli, as described in Ref. 6. GO catalyzes the oxidation of the primary alcohol of galactose to an aldehyde without the need for nucleotide cofactors (e.g., NAD, FAD), by concomitantly reducing O2 to H2O2. The enzyme also oxidizes the galactose side chains of guar, a polysaccharide useful for paper making and other industrial applications. However, the activity of the native enzyme on guar is too low for certain applications of this enzyme to be commercially viable. By mutating the GO gene, expressing it in E. coli, and screening the library of variants for enhanced activity on guar (or the guar surrogate, methyl-galactose), it was hoped that an industrially useful enzyme could be engineered. A solid-phase assay for screening GO mutant libraries was implemented by coupling the release of H2O2 to 4-chloronaphthol (4-CN) oxidation mediated by a peroxidase (Fig. 2). Error-prone PCR mutagenesis was used to mutate the entire GO gene, including the leader sequences (signal sequence and pro-sequence), yielding variants with improved catalytic activity and expression in E. coli.

Figure 2 A coupled assay of guar oxidation by galactose oxidase. The guar monomer, shown in brackets, comprises a galactose side chain that is oxidized at the C6 position to an aldehyde by the enzyme galactose oxidase. Along with the oxidized product, GO releases hydrogen peroxide, which is used by horseradish peroxidase (HRP) as a cosubstrate to oxidize 4-chloronaphthol. Oxidation of 4-CN yields an insoluble, deeply colored product.

It is also possible to apply Kcat to the isolation of thermostable enzyme variants (28). The GO screen described above has been adapted to rapidly identify variants with improved thermostability. To do this, a membrane carrying microcolonies expressing GO variants was transferred, after lysis by chloroform vapor, to a vessel kept in a circulating water bath. The bath was held at 64–70°C for 8–10 min, depending on the experiment. Using this technique, thousands of enzyme variants can be simultaneously exposed to a perturbing environment, after which they are simultaneously assayed, as described above, for enzymatic activity at 37°C.

Two libraries were screened for improved thermostability. The first library was generated by introducing mutations, via error-prone PCR, into the wildtype GO gene, and the second library was generated by introducing mutations in the same manner into the gene of a GO variant called 8-1, which has enhanced activity. Several mutants were isolated in the primary screen and characterized further (Table 1 and Fig. 3). Three of these, GO.05h1B, GO.05h1C, and GO.1h1C (closed symbols in Fig. 3), show noticeably increased residual activity compared to their wildtype "parent" (WT, GOK3). As seen in Fig. 3, the temperature at which wildtype GO loses half of its initial activity is approximately 57°C. Interestingly, we found evidence suggesting that substitution C383S, identified previously as causing a threefold decrease in Km toward the substrate methyl-α-D-galactose, may also decrease thermostability by at least 3°C (compare curves of mutant 8-1 and WT in Fig. 3). Consistent with this hypothesis is the observation that substitution C383S has reverted in mutant GO8-1h2A, derived from the 8-1 library, thereby yielding wildtype-like thermostability. Mutants GO8-1h3A and GO8-1h1A, also derived from the 8-1 library (open symbols in Fig. 3), show thermostability returned to wildtype-like levels, an improvement relative to their "parent" 8-1. Clone GO8-1h4A is a minor contaminant with slightly increased activity, compared to wildtype, previously observed in our GO libraries. It is interesting to note that residue G195 has been changed to Ala or Glu in two different mutants, suggesting that changes at this residue contribute to increased thermostability in GO. Assuming additivity of mutations (29), the improved thermostability of mutant GO8-1h3A (which carries the mutations G195A and Q63K relative to its parent 8-1) indicates that the G195A substitution is likely responsible for most of the enhancement (compare GO8-1h4A with WT in Fig. 3).

Table 1 Deduced Amino Acid Substitutions of GO Mutants Isolated in a Thermostability Screen

Amino acid substitutions (a)

Variant name GO.05h1B GO.05h1C GO.1h1C GO8-1h1A GO8-1h2A GO8-1h3A GO8-1h4A

N115H G195E G6R Q238L K342E C383S Q63K G195A Q63K

S553C

Y436H V494A N427T Y436H V494A C383S Y436H V494A

a Amino acid substitutions, deduced by DNA sequencing, are positioned on each line to indicate their relative positions in the GO amino acid sequence. Substitutions shown in bold are present in clone 8-1, parent of mutants GO8-1h1A, GO8-1h2A, and GO8-1h3A. Clone GO8-1h4A is a minor contaminant in some of our libraries showing activity slightly improved compared to wildtype GO.


Figure 3 Residual activity of GO mutants as a function of pretreatment temperature. GO mutants identified by solid-phase thermostability screening were characterized by heating induced cell extracts for 10 min at the specified temperature (x-axis), cooling them to room temperature, and assaying their activity using methyl-galactose as substrate. The residual activity of each clone is normalized to its residual activity after incubation at 50°C. Closed symbols are wildtype GO or mutants derived from it. Open symbols are clone 8-1 or mutants derived from it.
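The normalization used in Fig. 3, and the "temperature of half-inactivation" quoted in the text, can be illustrated with a short sketch. It assumes that the midpoint is estimated by linear interpolation between the two measured temperatures that bracket 50% residual activity; the chapter does not say how the value was read off the curves. The function names (`normalize`, `t50`) and the activity values are hypothetical and are not data from Fig. 3.

```python
def normalize(activities, reference_temp=50):
    """Divide each activity by the activity measured after the reference pretreatment."""
    ref = activities[reference_temp]
    return {temp: act / ref for temp, act in activities.items()}


def t50(residual):
    """Temperature at which normalized residual activity crosses 0.5 (linear interpolation)."""
    temps = sorted(residual)
    for lo, hi in zip(temps, temps[1:]):
        a_lo, a_hi = residual[lo], residual[hi]
        if a_lo >= 0.5 > a_hi:
            return lo + (a_lo - 0.5) * (hi - lo) / (a_lo - a_hi)
    return None  # the curve never crosses 50% in the measured range


# Hypothetical raw activities (arbitrary units) keyed by pretreatment temperature in deg C.
raw = {50: 200.0, 55: 160.0, 60: 40.0, 65: 10.0}
curve = normalize(raw)
midpoint = t50(curve)
```

Comparing `midpoint` values between a mutant and its parent gives a single number for the shift in thermostability, which is how statements such as "at least 3°C" can be read from curves of this kind.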

Sun et al. (30) also described efforts to evolve the enzyme galactose oxidase. It is both interesting and instructive to compare their results with those of the earlier report by Delagrave et al. (6). Three of the seven amino acid substitutions reported by Sun et al. were also seen in Refs. 6 and 28 and the present study: V494A, N535D, and G195E. These mutations yield relatively small (less than twofold) increases in intrinsic enzymatic activity, even when combined. The fact that Sun et al., who were selecting for both improved activity and thermostability, also isolated a mutant with the G195E substitution supports the hypothesis that substitutions at residue G195 are important for thermostability (above and Ref. 28). The very interesting substitution C383S (which generates a threefold improvement in Km) was not isolated by Sun et al., possibly because it is not favored in a combination screen for mutants with improved activity and stability, as suggested by our observations on thermostability (above and Ref. 28). Among the differences in the screening conditions used by the two groups are: (a) Delagrave et al. used the complete, native GO sequence in E. coli whereas Sun et al. used a truncated version in which the leader (signal and pro) sequence was removed; (b) Sun et al. induced their cultures at 30°C whereas Delagrave et al. induced at 26°C; and (c) Delagrave et al. used methyl-α-D-galactose and guar as substrates whereas Sun et al. used galactose. This illustrates that the screen may have a significant effect on the sequence of the mutants found, and that there are many opportunities for subtle differences to be introduced in independent directed evolution efforts.

4

OPTIMIZED COMBINATORIAL MUTAGENESIS ALGORITHMS

Although error-prone PCR is both a simple and eﬀective tool for directed evolution, other means of creating diversity are worth considering when planning a directed evolution project, especially because diﬀerent mutagenesis methods may yield entirely diﬀerent ensembles of mutations. Certain projects may beneﬁt from mutagenesis strategies that concentrate mutations within a particular region of the protein (e.g., active sites of structurally characterized proteins, complementarity determining regions of antibodies, short proteins or peptides.) In such cases, methods of recombining mutations such as DNA shuﬄing (16) may be ineﬀective because recombination frequency decreases sharply when the mutations to be recombined are close to each other, or even adjacent, in a sequence (31). Likewise, it is also occasionally the case that certain phenotypes reside in a region of sequence space, which is too distant from a given starting point to be accessed by sequential accumulation of point mutations. For example, broadening the substrate speciﬁcity of GO to include substrates such as glucose was not possible by combining error-prone PCR and screening, but could be achieved to a limited extent by combinatorial mutagenesis (32). Combinatorial mutagenesis involves the synthesis of gene segments (oligonucleotides) in which several nucleotide positions are degenerate (7,8). By cloning the degenerate sequences in the appropriate expression vector, a library of mutant genes, which can be screened for improvements in a given phenotype, is produced. However, the possible number of mutants encoded by degenerate oligonucleotides increases exponentially as the number of degenerate positions is increased. Consequently, when as few as six codons are randomly degenerate, a library of 10,000 mutants may need to be screened to identify a single functional (i.e., non-null) mutant (13,14). 
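The exponential growth in library size can be made concrete with a small sketch. It expands degenerate codons written with the standard IUPAC nucleotide codes, translates them with the standard genetic code, and multiplies the per-codon counts. The function names and the example codon `RYC` are assumptions chosen for illustration; they are not taken from the chapter or from any published library design.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand(degenerate_codon):
    """List every concrete codon encoded by a three-letter degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate_codon))]


def encoded_residues(degenerate_codon):
    """Set of amino acids (and '*' for stop) encoded by a degenerate codon."""
    return {CODON_TABLE[c] for c in expand(degenerate_codon)}


def library_size(degenerate_codons):
    """Number of distinct DNA sequences in a library built from these codons."""
    size = 1
    for codon in degenerate_codons:
        size *= len(expand(codon))
    return size


random_six = library_size(["NNN"] * 6)   # six fully random codons
restricted_six = library_size(["RYC"] * 6)   # hypothetical codon limited to four variants
```

Restricting each position to a handful of residues shrinks the DNA-level library by many orders of magnitude relative to six fully random codons, which is the practical motivation for limiting the bases allowed at each degenerate position.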
Degenerate positions in which all four bases may be incorporated can be described as random or stochastic, but degenerate positions in which a subset of the four bases is used are more accurately described as biased. If this bias is introduced intentionally, based on information available from a phylogeny or from prior mutagenesis experiments, the resulting combinatorial library is not random, but "optimized." By screening optimized combinatorial libraries, one can find functional mutants and, a fortiori, improved mutants, at a much (if not infinitely) higher rate than by simply screening random libraries.

Recursive ensemble mutagenesis (REM) was introduced as a means of optimizing combinatorial libraries for protein engineering applications (13,14,20). In REM, small numbers of selected codons are randomly mutated. An appropriate selection or screen is then applied, and functional (but not necessarily improved) mutants are isolated from this library. The isolates are then sequenced to build a small database of permissible residues at each mutated position, creating an artificial phylogeny. This information is then used to develop an optimized library from which much larger numbers of unique functional mutants are isolated. Importantly, these unique mutants correspond to the recombined sequences of the functional mutants isolated in the first round of random mutagenesis. In REM, crossover points are fixed and occur between every mutated nucleotide, no matter how near or far they are from each other. Thus, although REM is similar to DNA shuffling, it is more effective on short sequences because it enables a high density of crossovers between amino acids in a sequence, a result that cannot be achieved by shuffling (see Fig. 3, panel C, in Ref. 14).

If REM is analogous to shuffling of random mutants, then the related approach called target set mutagenesis (TSM) (15) is similar to "family shuffling" (33). TSM is a mutagenesis algorithm that relies on a phylogeny of natural sequences to design an optimized combinatorial library. As in REM, permissible amino acid substitutions at a given mutated residue are compiled and entered into a computer program called CyberDope. The program finds degenerate codons encoding the specified amino acids. Using this information, a degenerate sequence spanning several mutated codons is synthesized and an optimized combinatorial library is prepared.

5

ALTERED SUBSTRATE SPECIFICITY IN AGROBACTERIUM β-GLUCOSIDASE (ABG)

Target set mutagenesis was used to construct two different combinatorial libraries of Abg mutants using oligonucleotide cassettes. These cassettes were synthesized based on phylogenetic sequence information from related glucosidases (Fig. 4). The construction of one of these libraries was described previously (25). Briefly, the Abg structural gene was cloned into the pQE70 expression vector (QIAgen) to create an Abg expression plasmid. A BglII site was introduced at Ala389 by changing the GCC codon to GCA. Degenerate oligonucleotides corresponding to the 90-bp NarI–BglII fragment (encoding Gly360 to Ala389 in Abg) were designed and synthesized. The sequence of the degenerate oligonucleotide AbgREM1 (5′ CACCGAAAACGGCGYCKSCWHCRAKRWKRDGNTTSWGRAWDGCRRGVTCMAWGACCAGCCGCGTCTCGATTATTACGCCGAACACCTCGGCATCGTCGCAGATCTCATCC 3′) contains the combinatorial region shown in Fig. 4A. The sequence of the degenerate oligonucleotide AbgREM2 (5′ CACCGAAAACGGCGYCTGCTACAATATGGGCGTTGAAAACGGCGAGGTCAATGACSAKVVGCGTVYTKNTTWTNWCVNGVHGYATCTCGGCATCGTCGCAGATCTCATCC 3′) contains the combinatorial region shown in Fig. 4B. DNA cassettes were synthesized by PCR amplification of the degenerate oligonucleotides. The cassettes were digested with BglII–BsaHI, ligated into the Abg expression vector digested with BglII–NarI, and transformed into M15 (pREP4).

Figure 4 Directed evolution of a β-glucosidase by optimized combinatorial mutagenesis. A BLAST search of nonredundant GenBank CDS using amino acid residues 350–385 of Abg identified 20 related sequences (35). Sequence differences between these sequences and Abg were tabulated to create a pool of phylogenetically related sequences. Additional related protein sequences and sequence changes of active Abg mutants (36) were added to this pool of sequences. These phylogenetic data are shown in Panels A and B. The sequence information was entered into the CyberDope program, which outputs degenerate codon sequences encoding specific subsets of all 20 amino acids. These subsets reflect the amino acid preferences observed in the phylogeny. Panel A: The combinatorial region in oligonucleotide AbgREM1 used to mutate the wildtype Abg sequence from residues 361–373. Panel B: The combinatorial region in oligonucleotide AbgREM2 used to mutate the wildtype Abg sequence from residues 375–384.

The resulting libraries were screened with a mixture of the two chromogenic substrates X-glu and Red-gal. Enzymatic hydrolysis of the substrate X-glu gives an insoluble blue product absorbing at ~610 nm, whereas enzymatic hydrolysis of Red-gal produces an insoluble red product absorbing at ~540 nm. These substrates were mixed together such that microcolonies expressing Abg produced a mixture of the two products. Variants with improved specificity for either galactosides or glucosides could be identified in this screen. Microcolonies expressing variants with an increased preference for Red-gal showed an increased 540–610 nm spectral ratio and appeared more red, whereas microcolonies expressing variants with an increased preference for X-glu showed a decreased 540–610 nm ratio and appeared more blue.

Data acquired using Kcat were analyzed to identify microcolonies expressing Abg mutants in which the substrate specificity was altered. A generalized flowchart of the steps used to identify enzyme variants exhibiting both the fastest kinetics and highest substrate specificity is shown in Fig. 5. For these Abg experiments, kinetics data were obtained at 540 and 610 nm. If λmax is 540 nm, then following the steps outlined in the flowchart will identify the "fastest, reddest" microcolonies. If λmax is 610 nm, then following the steps in the flowchart will identify the "fastest, bluest" microcolonies. A previous publication shows some examples of how the Kcat instrument identifies "fastest, bluest" microcolonies on a membrane containing an AbgREM1 combinatorial library (25). Numerous clones that display either an increased preference for the galactoside substrate relative to the glucoside substrate or an increased preference for the glucoside substrate relative to the galactoside substrate were identified. Further characterization of these mutants is in progress.

Figure 5 Generalized flowchart of steps that can be used to identify enzyme variants exhibiting both the fastest kinetics and highest substrate specificity.

Figure 6 Solid-phase enzymatic assembly (SEA) of genes for directed evolution. Sequences to be assembled combinatorially are identified using bioinformatics. Necessary oligonucleotides are synthesized using current technology. A solid support (circle) is used to synthesize a combinatorial gene library by sequentially ligating mixtures of the oligonucleotides (rectangles). The resulting combinatorial gene library is screened for a desirable phenotype. As in REM, a recursive approach can be used to optimize the combinatorial library, accelerating the discovery of novel proteins with desired functions.

6

CONCLUSION: THE EVOLUTION OF DIRECTED EVOLUTION

Tremendous progress has been made in engineering incremental improvements in the properties of proteins, to the extent that existing activities or other measurable characteristics can usually be improved by stepwise accumulation of beneficial mutations. The next challenge, however, may reside in the task of creating proteins with entirely novel activities that are not observed in nature. Improvements in screening technologies are providing the throughput that may be necessary to achieve this goal. New strategies for generating sequence diversity may also be needed. Recently, new approaches involving the synthesis of genes using oligonucleotides as building blocks have been proposed as tools for directed evolution (Fig. 6) (34). Combinatorial gene libraries in which degeneracy is at the oligonucleotide level instead of the nucleotide level have therefore become possible. As a consequence, structural or other types of motifs are more readily swapped and linked in new ways without being constrained by sequence homology considerations, as in the case of DNA shuffling. Given the tremendous increase in the number of genome sequences available in public databases, an abundance of sequence fragments is potentially available. These novel sequences could be assembled robotically from oligonucleotides, expressed in various hosts, and screened for interesting new activities. Such an approach would also circumvent the need to acquire and maintain extensive collections of biological samples. Therefore, directed evolution may evolve into a postgenomic activity combining bioinformatics and highly automated biochemical experimentation.

ACKNOWLEDGMENTS

The authors thank the following individuals for their participation in the galactose oxidase project: Ms. Jennifer Rittenhouse Pruss, Mr. Anthony Maffia III, Ms. Christina Grek, Dr. Dennis Murphy, and Dr. Barry Marrs.

REFERENCES

1. BC Cunningham, JA Wells. Improvement in the alkaline stability of subtilisin using an efficient random mutagenesis and screening procedure. Protein Eng 1:319–325, 1987.
2. GF Joyce. Amplification, mutation and selection of catalytic RNA. Gene 82:83–87, 1989.
3. KM Koeller, CH Wong. Enzymes for chemical synthesis. Nature 409:232–240, 2001.
4. C Walsh. Enabling the chemistry of life. Nature 409:226–231, 2001.
5. A Schmid, JS Dordick, B Hauer, A Kiener, M Wubbolts, B Witholt. Industrial biocatalysis today and tomorrow. Nature 409:258–268, 2001.
6. S Delagrave, DJ Murphy, JL Pruss, AM Maffia III, BL Marrs, EJ Bylina, WJ Coleman, CL Grek, MR Dilworth, MM Yang, DC Youvan. Application of a very high-throughput digital imaging screen to evolve the enzyme galactose oxidase. Protein Eng 14:261–267, 2001.
7. JF Reidhaar-Olson, RT Sauer. Combinatorial cassette mutagenesis as a probe of the informational content of protein sequences. Science 241:53–57, 1988.
8. AR Oliphant, K Struhl. An efficient method for generating proteins with altered enzymatic properties: application to beta-lactamase. Proc Natl Acad Sci USA 86:9094–9098, 1989.
9. DK Dube, LA Loeb. Mutants generated by the insertion of random oligonucleotides into the active site of the beta-lactamase gene. Biochemistry 28:5703–5707, 1989.
10. KQ Chen, FH Arnold. Enzyme engineering for nonaqueous solvents: random mutagenesis to enhance activity of subtilisin E in polar organic media. Biotechnology (New York) 9:1073–1077, 1991.
11. K Chen, FH Arnold. Tuning the activity of an enzyme for unusual environments: sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide. Proc Natl Acad Sci USA 90:5618–5622, 1993.
12. AP Arkin, DC Youvan. An algorithm for protein engineering: simulations of recursive ensemble mutagenesis. Proc Natl Acad Sci USA 89:7811–7815, 1992.
13. S Delagrave, ER Goldman, DC Youvan. Recursive ensemble mutagenesis. Protein Eng 6:327–331, 1993.
14. S Delagrave, DC Youvan. Searching sequence space to engineer proteins: exponential ensemble mutagenesis. Biotechnology (New York) 11:1548–1552, 1993.
15. ER Goldman, DC Youvan. An algorithmically optimized combinatorial library screened by digital imaging spectroscopy. Biotechnology (New York) 10:1557–1561, 1992.
16. WP Stemmer. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370:389–391, 1994.
17. DC Demirjian, PC Shah, F Moris-Varas. Screening for novel enzymes. In: W-D Fessner, ed. Biocatalysis—From Discovery to Application. 1st ed. Heidelberg, Germany: Springer-Verlag, 1999, pp 1–29.
18. D Wahler, JL Reymond. High-throughput screening for biocatalysts. Curr Opin Biotechnol 12:535–544, 2001.
19. N Cohen, S Abramov, Y Dror, A Freeman. In vitro enzyme evolution: the screening challenge of isolating the one in a million. Trends Biotechnol 19:507–510, 2001.
20. AP Arkin, ER Goldman, SJ Robles, CA Goddard, WJ Coleman, MM Yang, DC Youvan. Applications of imaging spectroscopy in molecular biology: II. Colony screening based on absorption spectra. Biotechnology (New York) 8:746–749, 1990.
21. S Delagrave, RE Hawtin, CM Silva, MM Yang, DC Youvan. Red-shifted excitation mutants of the green fluorescent protein. Biotechnology (New York) 13:151–154, 1995.
22. H Joo, A Arisawa, Z Lin, FH Arnold. A high-throughput digital imaging screen for the discovery and directed evolution of oxygenases. Chem Biol 6:699–706, 1999.
23. EJ Bylina, WJ Coleman, MR Dilworth, CM Silva, MM Yang, DC Youvan. Solid phase enzyme kinetics screening in microcolonies. US Patent No. 5914245, 1999.
24. EJ Bylina, WJ Coleman, MR Dilworth, CM Silva, MM Yang, DC Youvan. Solid-phase enzyme screening. ASM News 66:211–217, 2000.
25. EJ Bylina, WJ Coleman, CL Grek, MM Yang, DC Youvan. Directed evolution and solid phase enzyme screening. Biotechnology et alia 7:1–6, 2000.
26. MM Yang, MR Dilworth, DC Youvan. Graphical user interface for single-pixel spectroscopy. Biotechnology et alia 5:1–8, 2000.
27. JB Kempton, SG Withers. Mechanism of Agrobacterium beta-glucosidase: kinetic studies. Biochemistry 31:9961–9969, 1992.
28. S Delagrave, AM Maffia III, DJ Murphy, JL Rittenhouse, EJ Bylina, WJ Coleman. Variant galactose oxidase, nucleic acid encoding same and methods of using same. US Patent Application No. 20010051369, 2001.
29. JA Wells. Additivity of mutational effects in proteins. Biochemistry 29:8509–8517, 1990.
30. L Sun, IP Petrounia, M Yagasaki, G Bandara, FH Arnold. Expression and stabilization of galactose oxidase in Escherichia coli by directed evolution. Protein Eng 14:699–704, 2001.
31. JN Pelletier. A RACHITT for our toolbox. Nat Biotechnol 19:314–315, 2001.
32. L Sun, T Bulter, M Alcalde, IP Petrounia, FH Arnold. Modification of galactose oxidase to introduce glucose-6-oxidase activity. ChemBioChem 3:781–783, 2002.
33. A Crameri, SA Raillard, E Bermudez, WP Stemmer. DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature 391:288–291, 1998.
34. S Delagrave, BL Marrs. Methods for the enzymatic assembly of polynucleotides and identification of polynucleotides having desired characteristics. WIPO Patent Application No. WO01/88173, 2001.
35. SF Altschul, TL Madden, AA Schaffer, J Zhang, Z Zhang, W Miller, D Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402, 1997.
36. DE Trimbur, RAJ Warren, SG Withers. Region-directed mutagenesis of residues surrounding the active site nucleophile in beta-glucosidase from Agrobacterium faecalis. J Biol Chem 267:10248–10251, 1992.

25

Screen Automation and Robotics

Michael H. Lamsa
Novozymes Biotech, Inc., Davis, California, U.S.A.

Nils Buchberg Jensen
Novo Nordisk Engineering A/S, Bagsværd, Denmark

Steen Krogsgaard
Novozymes A/S, Bagsværd, Denmark

1

INTRODUCTION

In this chapter, the field of automation and enzyme screening will be presented from a variety of views: those of engineers, users, programmers, assay developers, molecular biologists, and cell biologists. Some results of the authors' screening and automation experience will be presented. It is not the intention of this chapter to present one right way to automate or exactly how to automate. Rather, it is to share experiences in the industrial enzyme screening area, to reference some pertinent literature related to the field, and to discuss the effects current and future technologies will have on shaping industrial enzyme screening. There are two major aspects to industrial enzyme screening: improving function and improving yield. This chapter will primarily focus on screening automation related to improving enzyme function.

Screen automation is not just about automation and robotics equipment. An understanding of the origin of the samples, the analytical methods, and the biological, chemical, and mechanical variables that will affect the success of the screening program is integral to screen automation. It is a team effort involving many scientific disciplines (1). For industrial enzyme screening, more than nearly any other kind of screening environment, it is about keeping all expenses low (2), with product yield being a key focus of our industry (3). Industrial enzymes are in a market where price competition is fierce and yields need to be in the multiple grams per liter in the fermentation process (2). Compared with pharmaceutical products, enzymes are very inexpensive commodities. The industrial enzyme market in the year 2000 was >$2 billion and growing, with over 60% of the enzymes used in the food, starch, and detergent industry produced in recombinant organisms (4). Our customers need to purchase inexpensive enzymes in high volume, an important point and limitation of our industry. Almost all projects within industrial enzymology require that the product (the improved enzyme) be produced inexpensively in large quantities. This is the first constraint for any candidate enzyme to be improved; profitability is an important consideration in the screen automation decision. Rarely will a program start without a clear view of potential profit and benefit. Although automation of screening lowers the cost on a per sample basis, the high throughput required and high equipment costs make it expensive to set up even a modest program to perform automated screening.

When considering automation, there are at least two, if not three, obstacles to clear to be successful. The most important is the final screen, i.e., the application of the enzyme.
Are there resources, or buy-in from the resources, to test the candidates? What are the limitations of those resources? If the final application can only test a small number of hits, requiring large quantities of enzyme for the test, this can severely inhibit the success of the program. It also raises a challenge for the screener to go beyond the primary screen thought process and to consider the integration of applications-relevant or applications-correlating screens into the design process. Answers to these questions will ultimately define how far to go with a screening program. Fermentation capacity and process costs present the final two hurdles to consider. Capacity has two facets, sample production for applications testing and manufacturing capability of the final product. Whether this is in-house or at a toll facility, is there an appropriate industrial expression host to produce the samples and the final product? If this is in-house, what is the capacity to test your candidates? How many can be processed? Nothing will hurt progress faster than overwhelming this capability or sending marginal hits through the process. In the case of a yield improvement program, whether it is a classical or molecular biology approach, the final screen is the current fermentation method. This needs to be emphasized because rarely can the manufacturing process be changed significantly. A winner may be produced in the screen, but if it has a unique set of requirements to fulfill its potential, it could fail in manufacturing if the process is impractical and unmanageable. Although this is a chapter on automating the screening process for programs that alter enzyme function, yield and cost are the ultimate hurdles. Many programs will be academically interesting, but will not make it past these two hurdles. Rarely is an industrial enzyme so valued that unlimited resources will be invested in its success.

Thanks to the pharmaceutical industry, the genomics area, and the human genome project, industries that could bear the brunt of the investment to encourage laboratory automation, there are now many choices to assist screening efforts. Without these driving forces, we would indeed have limited choices for automating our screening processes. There is, however, a big difference in what the industrial enzyme screener can spend on day-to-day reagent and disposable costs as compared to the pharmaceutical companies. This is an unfortunate but vital aspect of designing your screening process. On paper, methods are available; in reality, however, costs are prohibitive to perform the "best" process available for a screen, and innovations must be made and tested to circumvent these requirements. In addition, sometimes the desired methods may be extremely complicated to automate, and a simplified version of the desired analytical methods may need to be devised. This is an often misunderstood part of the screening process, where a scientist expects results but does not understand these constraints.

Why automate enzyme screening? Many of the processes involved in enzyme screening are composed of relatively simple and repetitive unit operations.
A typical manual assay sequence could be that a microtiter plate is placed on the workbench, samples are added from another microtiter plate, one or more reagents are added manually or by semiautomated dispenser, and, finally, the activity of the samples is quantified by reading the absorbance using a plate reader. In this setup, thousands of samples could be analyzed each day. This process satisfies two characteristics for successful automation: simplicity and repetition. The value that automation offers in this setup is reproducibility, throughput, and traceable quantification of the screening experiment. Further improvement of throughput can be achieved by integrating more hardware and converting your system to higher density microtiter plate formats. This is a simple example that will be further discussed in the pages to follow.

There are some remarkable technologies available today in screening and automation. These will benefit screening efforts if they are used wisely, if their capabilities are not oversold (a common problem that has dogged laboratory automation from the beginning), and if they are combined to make the most of the new ways to produce more candidates (5–13a). In researching this chapter, this was clearly evident in the numerous papers discussing increased screening capabilities and generation of hits. Current screening programs are already on the verge of overwhelming the capability of the applications and fermentation side to test all hits in a meaningful way. A new bottleneck of candidates from the ultrahigh throughput methods cited above is being created in our field. This bottleneck parallels the effects of HTS in the pharmaceutical industry (14). This may in fact be good news. If successful and routine, these are the ultimate primary screens that can be utilized to provide higher quality candidates to run on existing, lower throughput workstation and integrated robotics systems. It is now common in our own laboratories to use these systems for more intensive follow-up and verification screening of hits from primary screens. It is unlikely that these systems will become obsolete if planning for change and upgrading is part of the initial process of automation design (1,5–19).

2

BRIEF HISTORY

Laboratory automation coupled to enzyme screening was first implemented in the mid-1980s (3). By the mid-1990s, directed evolution had become a primary use for this automation (20). Most screening automation is microtiter plate-based, and this is likely to remain an important part of automated screening methods (21). The automation discussed in this chapter will focus on microtiter plates. However, integration of the newer technologies with microtiter plate-based screening has become common in many research laboratories and is likely to increase as more effort is focused on faster and easier screening.

2.1 Automation for Screening

Routine, widely dispersed use of automation was unheard of prior to the 1990s. Since then, laboratory automation has undergone a small revolution. In the past, a scientist who wanted to use automation had to build a system or contract with one of the few systems integrators that began to spring up in the late 1980s. Early integration relied on devices that, in general, were not designed for automation and were typically difficult to interface with existing robotic arms and other devices. The early days were fraught with failures and unrealistic expectations. Since then, and especially since the mid-1990s, there has been an explosion of growth in these fields (15,16,18,19,22–25), so much so that the lines between application areas have blurred to some extent, with laboratory automation now serving several versions of high throughput screening (HTS). Enzyme screening operations in the past were relatively low

Screen Automation and Robotics

Figure 1 This is an example of early automation as it was configured in one of our research laboratories. The system operated with five PCs, whose monitors can be seen at the top of the photo. Four of the PCs control the Biomek 1000 robotic workstations (bottom of the photo), and one controls the ORCA robot, seen holding the plate in the photo. The system operates either fully integrated or in varying combinations with the robot arm, allowing unused workstations to be used individually if needed.

in technology and existed in fractured environments within companies. The work tended to be done by individuals or small groups manually performing many steps of the screening process. With today’s automation, it is likely that many small and large biotechnology companies have some elements of automation available to all laboratory staff.

2.2 Example of Industrial Experience

As an international company with many research locations, Novozymes has implemented a multitude of different approaches to automation at different sites. As in the early days of screening described in the previous section, automation efforts are dispersed among many different groups internationally. The systems in use have been put together in four different ways: by internal automation engineers; solely by the users themselves (Fig. 1); by the interaction of users, internal automation engineers, and external integration companies; and by purchase of off-the-shelf systems put together by a vendor. A multitude of off-the-shelf systems has become available from vendors such as Beckman, Packard, Tecan, CRS, and Zymark (23,25). Interfaces are now available that allow automation components from most vendors to work together. Most manufacturers of peripherals (such as spectrophotometers, plate washers, sealers, and dispensers) who do not themselves make a complete integrated system now provide interfaces for all the major systems. No single way is seen as the best, and much depends on the particular needs and skills within the group using the automation. However, improvements in off-the-shelf systems and their wider availability have made automation accessible to a larger group of researchers.

3 ROBOTICS AND AUTOMATION SYSTEM CATEGORIES

Medium to high throughput screening systems can be divided into three categories according to their capacity and complexity: workstations, single platform systems, and integrated systems.

3.1 Workstations

Workstations are specialized instruments performing a single specific task or a small number of highly related tasks. Examples include liquid handlers or plate readers in combination with a stacker/feeder mechanism. Commercially successful examples are the Tomtec Quadra series of liquid handling workstations and the Zymark Twister, which integrate a few instruments in a relatively simple manner. For screens which are not too complicated to set up and do not have strict timing

requirements, a workstation approach to automation can often offer better value for money than the more complex integrated systems (22,26). This applies where the throughput requirement is below 50–100 plates a day (which may leave out many enzyme activity assays). In many cases, a number of workstations, with occasional human intervention to move stacks of plates between the stations, can be more efficient and lower in cost than a big integrated system. In addition, a number of workstations are often considerably more flexible than most big systems, both because new instruments can be added easily and because changes in, for example, assay setup can be implemented with minimal effort. The workstation approach is often a good one to follow when first starting out in screen automation, since getting started and gaining some experience can be achieved for a relatively small investment. Almost regardless of the experiences gained and the decisions made later, the investment will likely retain its value: most screening laboratories have good use for a few workstations whatever integrated systems they may have (e.g., for assay development), and the workstation hardware can be built into a later, fully integrated system.

3.2 Single Platform Systems

The next step on the automation ladder is the single platform system, such as the Tecan Genesis Freedom, where screens of medium complexity and throughput can be performed on a single platform. Such systems combine a number of tasks, such as plate storage, liquid handling, incubation, centrifugation, and plate reading, on a platform with a relatively small footprint. A major advantage is that protocols that are not too complex may be run just as efficiently as on a large, fully integrated system but at a much lower cost. Compared to a large integrated system, the capacity and throughput of a single platform system are smaller and its capability to run complex assays is diminished. On the other hand, two or three such systems cost about as much as one integrated system, and a couple of midsized systems are likely to give much more flexibility for running different screens in parallel than one large system. With several systems, it is also likely that at least some instrumentation will be available to work with at any time, whereas large integrated systems are often all or nothing: if one module is down, the entire system is down. In some cases, the benefits of workstations, single platform systems, and a fully integrated system can be combined (Fig. 2). Such combinations can be considerably more expensive than simple, stand-alone workstations or single platform systems; however, they may offer the ultimate in flexibility for multiple uses and multiple users.
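
The all-or-nothing argument can be made concrete with a little arithmetic. The following is a minimal sketch in Python, assuming that modules fail independently and that every module has the same availability. The 95% figure, the module counts, and the function names are hypothetical illustrations and are not taken from any system described in this chapter.

```python
def serial_availability(module_availability: float, n_modules: int) -> float:
    """Availability of a system that is down whenever any one module is down."""
    return module_availability ** n_modules


def at_least_one_available(system_availability: float, n_systems: int) -> float:
    """Probability that at least one of several independent systems is usable."""
    return 1.0 - (1.0 - system_availability) ** n_systems


if __name__ == "__main__":
    # Hypothetical numbers: every module is up 95% of the time.
    large = serial_availability(0.95, 10)  # one integrated system, 10 modules
    small = serial_availability(0.95, 4)   # one single platform system, 4 modules
    print(f"large integrated system available: {large:.2f}")
    print(f"one single platform system available: {small:.2f}")
    print("at least one of three platforms available: "
          f"{at_least_one_available(small, 3):.3f}")
```

Under these assumptions, the chance that some instrumentation is usable rises quickly with the number of independent platforms, while the availability of a single large system falls with every module added to it. Real failure rates are rarely independent or equal, so the numbers should be read as an illustration only.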

Figure 2 Example of commercial automation and how it might be conﬁgured as it became available in the middle of the 1990s. The unlabeled areas could be incubators or other devices or workstations that integrated with the system.

3.3 Integrated Systems

For screens that are highly complex and/or where capacity and throughput requirements are substantial (>100 plates per day), it may be necessary to invest in one or more large integrated systems. Integrated systems can be divided into two groups: robot-centric systems and conveyor-based systems.

3.3.1 Robot-Centric Systems

Robot-centric systems, such as the Beckman/Sagian Core System, CRS HTS/UHTS, and Zymark Presto, have all equipment placed around an articulated robot. The robot may be fixed on the worktable, giving it a relatively small cylindrical working volume, or it may be placed on a track, giving it a larger working volume, especially in systems that allow equipment to be placed on both sides of a central track. Such systems are very flexible: equipment can be placed freely around the robot or track, allowing redundant resources or several types of microtiter plate readers in the same system. In very large and complex systems, however, the robot can become the limiting resource, as it is responsible for all transportation of plates between equipment. Even with a fast robot, the cycle time for a complex protocol involving many steps becomes several minutes. Adding redundant resources, such as more liquid handlers, often does not solve the problem. Some companies have implemented dual-robot (or even triple-robot) systems to overcome this limitation (18), but the systems then become very complex and can be recommended only for organizations with extensive experience and skills in the field of laboratory automation.

3.3.2 Conveyor-Based Systems

Conveyor-based systems, such as the CCS Packard PlateTrak, CRS Dimension4, and Zymark Allegro, have all equipment aligned along a conveyor mechanism that transports plates between instrument positions. The conveyor can be at least as long as a robot track, allowing systems to be just as complex and flexible with regard to the amount of equipment. Relatively simple pick-and-place robots transport plates between the conveyor and the instruments. For a very high throughput system, a conveyor-based system should be considered in preference to a robot-centric system. With a conveyor, several plates can be transported to and from instruments and between instrument positions at the same time, leading to cycle times of a minute or less. The industry tendency over the last few years seems to be away from robot-centric systems and toward conveyor-based systems, because in very complex and very high throughput systems the robot quickly becomes the bottleneck. A conveyor-based system may therefore be the best choice for methods involving these elements.

3.4 Schedulers

A scheduler is a software program that organizes and optimizes the workflow of an automated system, typically a fully integrated system. Schedulers have several advantages over user-programmed scheduling of tasks. They can find a processing pattern that is more efficient than a sequential one. They can utilize redundant resources, and the more advanced schedulers can recover from the loss of a resource and dynamically reschedule the process. To some extent, they can also ensure similar timing for all plates in a given step, e.g., an incubation step. Schedulers are also very useful in allowing the user to test several configurations to optimize the process without having to take the time to run it. One disadvantage of using schedulers is that the process pattern is not always predictable. Plates within a run may be treated differently, and the derived process pattern is probably not the most efficient. (Frequently, the programmer can see inefficiencies that they would like to fix, but may be unable to do so within such a system.) Also, if bar codes are not used, it can become difficult to track plates and the data associated with them without going through the method log and transportation schedule. Many systems with scheduler software perform the very same (and few) processes over long periods of time. In situations where protocols are not frequently changed, or where the system does not have redundant resources, a better performing system may be obtained by carefully creating processing sequences without a scheduler.

3.5 Software Development Environments

In many cases, it is desirable for the user to write software for the entire system or to enhance aspects of a commercial system. Whether to attempt this depends largely on the user’s programming background and the environment of the commercial system. Some older systems, such as the ORCA robot from Beckman/Sagian, came with robot control software that could be used to control the entire system. The user could set this up or, more commonly, the control software was set up by a systems integrator. Microsoft Visual Basic (VB) is also commonly used to control automation equipment (27,28), often serving as the interface within the commercial software. Again, depending on user expertise, common automation tasks can be programmed in VB or nearly any other programming language. It is now more common, however, for the vendor-supplied software to be more than adequate and easy enough to use that many users can learn and operate their systems with relative ease.

3.5.1 Proprietary Environments

Proprietary environments with their own programming language, such as Beckman/Sagian MDS, Tecan Pegasus, and Zymark EasyLab, have several advantages. They get a system up and running quickly. They create a framework for most common applications. They cover all low- and mid-level details and contain all common procedures programmed beforehand. Among the disadvantages is that very complex tasks can be difficult to perform; some must often be done by separate programs or systems. The programming paradigm has been chosen by the vendor, making it difficult, if not impossible, to create applications outside the intended scope, and the systems can be less flexible because of these constraints. Some vendors have done a good job of including nearly all the functionality needed for nearly every situation. However, the paradigms can often miss features important to some users, and it can be time-consuming to learn all the functionality of a program. The industry tendency is not clear. For example, Beckman/Sagian and, to some extent, Zymark left the proprietary route several years ago. In contrast, Tecan is marketing a brand new proprietary environment. In conclusion, proprietary environments are good for the applications their designers had in mind, but can be severely constraining where an application needs to go outside the intended scope.

3.5.2 Device Libraries

Device libraries, such as Beckman/Sagian ORCA NT and ActiveX components from various vendors, are designed to handle equipment control, leaving the application programming to a generic programming language such as Visual Basic or C++. Device libraries have the following advantages: low-level functionality is still provided by the vendor, they are more flexible than proprietary environments, and they give a more homogeneous run time environment, e.g., control of devices from several vendors can be integrated into a single software package. The disadvantages are that programming experience is required and that development is likely to be more time-consuming than application development in a proprietary environment. Device libraries are still somewhat limited in scope. When device libraries from several vendors are used in the same application, the program feel and the GUI can become somewhat inhomogeneous.

3.5.3 Custom-Written Code

Custom code written entirely from scratch can be made for very specific purposes, where the above-mentioned methodologies cannot fulfill the given set of requirements. This route should be considered by very experienced users who have the necessary resources for application development and maintenance. Experienced developers tend to prefer device libraries to proprietary environments because of the additional flexibility, but be aware that this flexibility comes at the cost of much more development effort.

3.5.4 Data Management

The volume of data generated by HTS systems can come as a surprise to new users. Very often, the instrumentation vendors do not prepare users for the mountains of data their systems will produce, and typically the solutions are left to the users or to the company information management team. Fully integrated systems tend to have data logging, but that is usually the extent of it. A simple solution is to store the data files from the detection instruments and then import them into a standard spreadsheet application. Although this method may work nicely for some applications, it has serious drawbacks if the objective is a historical archive of data that is easily accessible. For most enzyme screening, the data are sorted and only a subset is retained, relating to the selected features identified in the screen. In general, libraries exist as mixtures in test tubes and are not retained as individual, unique isolates that are screened repeatedly. In the past, extensive, large databases have not been a high priority for enzyme screening. However, as more hits are generated and interest grows in retaining more of them as candidates for further manipulation in enzyme evolution, it will become more important to have a good database system. It is important to keep all primary data for data mining purposes. Key parameters of the improved mutants may be discovered in the course of retesting at larger scale, and it is then good to be able to go back and reselect. Key traits of each run that help to determine how reliable the results are may also be discovered. These retrospective analyses are only possible if the data are kept in a suitably annotated form. A substantial part of the resources allocated for HTS should be spent on creating and maintaining a good system for data collection, storage, reduction, and analysis. Inspiration could be gained from the software used in medicinal chemistry, where the use of data reduction and analysis tools is common practice. There are a number of specialized applications with very strong visualization capabilities, such as those produced by Spotfire™ and Partek™. Database companies such as IDBS (http://www.id-bs.com) and NuGenesis (http://www.nugenesis.com/) offer some of the solutions needed; however, these companies are geared more toward the pharmaceutical industry than toward industrial enzymes. In general, working within the capabilities already present within the organization is a good place to start. An excellent example of an in-house database that could be used as a design model for screening can be found in Ref. 29. Many of the features described there could be applied to the design of a database for enzyme screening.
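
As a rough illustration of keeping primary data in an annotated form that allows later reselection, the sketch below stores raw well readings per bar-coded plate in a small relational database. It is a minimal sketch only, written in Python with the standard `sqlite3` module. The table layout, column names, bar code format, and function names are all hypothetical, and the design is not the one described in Ref. 29.

```python
import sqlite3

# Hypothetical two-table layout: one row per plate (the annotation),
# one row per well (the primary data).
SCHEMA = """
CREATE TABLE IF NOT EXISTS plate (
    barcode TEXT PRIMARY KEY,
    run_id  TEXT NOT NULL,
    assay   TEXT NOT NULL,
    note    TEXT
);
CREATE TABLE IF NOT EXISTS well (
    barcode    TEXT NOT NULL REFERENCES plate(barcode),
    position   TEXT NOT NULL,
    absorbance REAL NOT NULL,
    PRIMARY KEY (barcode, position)
);
"""


def open_store(path=":memory:"):
    """Open (or create) the data store and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_plate(conn, barcode, run_id, assay, readings, note=""):
    """Save every raw reading of one plate together with its annotation."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO plate VALUES (?, ?, ?, ?)",
                     (barcode, run_id, assay, note))
        conn.executemany("INSERT INTO well VALUES (?, ?, ?)",
                         [(barcode, pos, value) for pos, value in readings.items()])


def reselect(conn, run_id, min_absorbance):
    """Go back to the primary data of a run and pick wells above a new cutoff."""
    rows = conn.execute(
        "SELECT w.barcode, w.position, w.absorbance "
        "FROM well AS w JOIN plate AS p ON p.barcode = w.barcode "
        "WHERE p.run_id = ? AND w.absorbance >= ? "
        "ORDER BY w.absorbance DESC",
        (run_id, min_absorbance))
    return rows.fetchall()


if __name__ == "__main__":
    store = open_store()
    store_plate(store, "P0001", "run-1", "heat challenge",
                {"A1": 0.12, "A2": 0.85, "B1": 0.40}, note="made-up readings")
    print(reselect(store, "run-1", 0.4))
```

Because all wells are kept rather than only the hits chosen at the time, the cutoff can be changed later and the selection repeated, which is the kind of retrospective analysis argued for above. A production system would need far more annotation (instrument, operator, reagent lots, method version) than this sketch carries.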

4 SCREENING AND AUTOMATION ISSUES

4.1 Enzyme Evolution in the Literature

Most of the push towards automation has been driven by the technologies now available to make immense libraries of enzyme variants to be screened (5,30–38). In combination with this, many assays and methods adapted to microtiter plate formats have been demonstrated in automated screens or miniaturized manual screens (20,39–43). Screening is recognized as a necessary evil in finding useful enzymes, as tying selection to a screen is not always possible or functional.

4.2 Automation Evolution

4.2.1 Older System Configurations

Automation has evolved significantly since it was first implemented in laboratories. Fig. 1 is a photograph of an integrated system in a screening laboratory. In this system, four separate PCs controlling the individual workstations (Biomek 1000, Beckman Instruments) were integrated with the PC controlling the robot arm (ORCA, produced by Hewlett-Packard at the time of purchase, now owned by Beckman Instruments) to make the system function. When this was set up in the early 1990s, the idea was to use a multifunctional workstation so that methods could be adapted as screening research projects and priorities changed. Each workstation pictured has liquid handling (single-tip or eight-tip tools) and plate reading capability (optical density tool), in addition to being compatible with many vendor-supported capabilities such as PCR blocks, high-density replicating tools, and bulk dispensing. Workstations such as these were the precursors of the larger, more capable workstations available today. The system was built around a nonrandom access, plate-stacking system that was actually very fast. Plates were often moved around in stacks to speed delivery and reduce transport times. A feature of this system is the ability to use each workstation as an easily accessible stand-alone unit. Over the years, various devices have been added and upgrades performed to extend the functionality of the system.

4.2.2 Newer System Configurations

An example of the newer type of system, utilizing some of the same principles, is illustrated in Fig. 2. It is a commercial system in which all the components have been tested and integrated off-site by the vendor. The software that runs the system takes advantage of scheduling to ease programming and assay optimization. This particular system is designed to function much as the older system in Fig. 1. Each large, multifunctional workstation is situated to be human-accessible; the system can be run with both workstations in use, or with only one, leaving the other available for analytical development or other small-scale screening tasks. A design such as this leaves open the option of performing two completely different screens simultaneously. Redundant components also minimize downtime in the event of the failure of a major component. The major advantage of this type of system is that a motivated user with only a minimal background in programming can learn to program it more easily.

4.3 Automated Screening for Enzyme Function

A very common method that has been available since the mid-1990s is the use of a colony-picking robot to select active clones for a liquid-based screen in microtiter plates. Fig. 3 illustrates a typical flow of a microtiter plate-based screen that begins with a colony picker. This could be compatible with nearly any microtiter plate density format, depending on the scale of the plates and robotics. Typically, if a solid-phase screen was available to identify clones expressing active enzymes, that screen would be employed in conjunction with a colony-picking robot to pick only those active clones to the liquid phase. If no activity-based solid-phase screen was

Figure 3 This is a simplified view of a complete automated screening method for a directed evolution program to improve enzyme function. A library is transformed into a host; the cells of the microorganism are plated onto agar plates containing medium, or medium and substrate, that can be detected by a CCD camera on a colony-picking robot. Colonies are picked to liquid medium and grown up to produce broth samples containing the enzyme being screened; an automated treatment and/or analytical method is then used to screen the library clones.

available, in order to have an efficient method, all clones would be picked into all wells in the microtiter plates. In the past, prior to the invention and commercialization of colony-picking devices, a common way to inoculate microtiter plates was by dilution, relying on Poisson distribution statistics. In Fig. 4, the graph shows the number of wells holding a single colony as one line (diamonds), the number of wells holding more than one colony as a second line (triangles), and the sum of the two as the final line (squares). For instance, at an average of 1 colony per well, 36 wells hold single colonies, 24 wells hold multiple colonies (on average, 2.5 colonies/well), and the remaining 36 wells are empty. The same statistics apply to the higher density plates now being produced for screening. An example of this is the GigaMatrix™ (13). This plate and other approaches like it rely on these statistics for inoculation of the plates with clones. These are dip-and-test plates, currently useful for selection and, to a degree, for screening. However, by the

Figure 4 Poisson statistics graphed to illustrate the typical distribution of microbial colonies in a 96-well microtiter plate. The theoretical inoculation rate represents how many colonies are added to a given volume of medium to achieve up to 192 total colonies in the total volume of medium to be distributed into the microtiter plate in this example. The graph depicts the predicted distribution of colonies within individual wells of the 96-well microtiter plate. Multiple colonies per well represent two or more, and in this example, they represent a range of two to six colonies per well.
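
The distribution in Fig. 4 can be approximated with a few lines of code. The sketch below is a minimal illustration in Python, assuming ideal Poisson behavior (well-mixed inoculum, every cell forming a colony); the function names are hypothetical. Under that assumption, an average of one colony per well gives roughly 35 empty wells, 35 single-colony wells, and 25 multiple-colony wells on a 96-well plate.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k colonies in a well for a given mean per well."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def expected_well_counts(mean_per_well: float, n_wells: int = 96) -> dict:
    """Expected numbers of empty, single-colony, and multiple-colony wells."""
    p_empty = poisson_pmf(0, mean_per_well)
    p_single = poisson_pmf(1, mean_per_well)
    p_multiple = 1.0 - p_empty - p_single
    # Average number of colonies in the wells that hold two or more.
    if p_multiple > 0:
        mean_in_multiple = (mean_per_well - p_single) / p_multiple
    else:
        mean_in_multiple = 0.0
    return {
        "empty": n_wells * p_empty,
        "single": n_wells * p_single,
        "multiple": n_wells * p_multiple,
        "colonies_per_multiple_well": mean_in_multiple,
    }


if __name__ == "__main__":
    for mean in (0.1, 0.5, 1.0, 2.0):
        counts = expected_well_counts(mean)
        print(f"mean {mean:>3}: "
              f"empty {counts['empty']:.1f}, "
              f"single {counts['single']:.1f}, "
              f"multiple {counts['multiple']:.1f}")
```

Running the loop over several inoculation rates shows the trade-off behind dilution plating: lowering the mean reduces the share of mixed wells but leaves most wells empty.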

current methods of filling and inoculation, if the goal of this design is to screen unique isolates, the plate must be inoculated such that only one-tenth of the total number of wells is utilized. This is still impressive; however, a workable solution is needed that makes better use of all the wells so that the plate can reach its full potential as a screening tool.

4.3.1 Defining Noise in a Screening Method

Understanding what is going on in the screening process is important in making the screen work optimally. The advantage (and curse) of automating the process is that it is now easier to track noise quantitatively in the screening system. A few examples are given here. In Fig. 5, an enzyme assay is performed in 96-well v-bottomed polycarbonate plates in custom-designed heating blocks. In these graphs, the same concentration of enzyme is tested in all wells to map noise in the heating device. The temperatures at which the assays are performed and the % CV data are given in the figure legend. To simplify visualization of the data, the absorbance reading obtained in each well is normalized to the mean value of the assay; the graph is then rotated (in MS Excel®) so that a plate with 0% CV would yield a line that disappears, with all data points lying at 100% on the z-axis. The x- and y-axes correspond to microtiter plate row and column locations, respectively. With a template set up to view assay data in this manner, variations above and below the mean can easily be visualized. To enhance the view further, the graph can be rotated in MS Excel® to look along the columns or rows and see where the noise resides. This is a nice way to visualize % CV for an analytical method. It can quickly show trouble spots and trends that are not apparent when the results are reported simply as a % CV. Thermal inactivation using the custom-designed heating blocks is illustrated in Fig. 6. In contrast to the activity assays (enzyme plus substrate) in Fig. 5, the data obtained when a heating block is used for heat killing of enzyme in polycarbonate plates (in a buffer without substrate) behave quite differently. Heating blocks should be validated for the type of measurements being generated using the block. Enzyme without substrate is much more susceptible to noise at temperatures where the enzyme begins to denature. In Fig. 6, graphs A, B, C, and D are treatments of 0, 5, 10, and 15 min, respectively, at a fixed temperature. As temperature or time is increased, the noise associated with this type of measurement generally increases, more so than in the case of the activity assay in Fig. 5. These figures illustrate why it is important to define and improve conditions that affect analytical methods, particularly those related to temperature effects (44).
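
The percent-of-mean normalization and the % CV described above are simple to compute outside a spreadsheet. The following minimal sketch in Python is offered only as an illustration: the function names and the example readings are made up, and the use of the sample standard deviation is an assumption, since the text does not say which form was used.

```python
import statistics


def percent_cv(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def percent_of_mean(plate):
    """Express every well as a percent of the plate mean.

    `plate` is a list of rows (A-H), each a list of column readings (1-12).
    """
    wells = [value for row in plate for value in row]
    plate_mean = statistics.mean(wells)
    return [[100.0 * value / plate_mean for value in row] for row in plate]


def row_and_column_means(normalized):
    """Average the normalized plate by row and by column to locate trends."""
    row_means = [statistics.mean(row) for row in normalized]
    column_means = [statistics.mean(column) for column in zip(*normalized)]
    return row_means, column_means


if __name__ == "__main__":
    # Made-up readings: a uniform plate with slightly low values in row A.
    plate = [[1.00] * 12 for _ in range(8)]
    plate[0] = [0.90] * 12
    wells = [value for row in plate for value in row]
    print(f"% CV = {percent_cv(wells):.1f}")
    rows, columns = row_and_column_means(percent_of_mean(plate))
    for label, value in zip("ABCDEFGH", rows):
        print(label, f"{value:.1f}")
```

Averaging the normalized values by row and by column plays the same role as rotating the graph to look along the rows or columns: an edge or gradient effect shows up as a row or column mean that departs from 100%.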

Figure 5 This is a graphical illustration of an activity assay of a hydrolytic enzyme performed in v-bottomed 96-well polycarbonate plates placed in a heating block. Heating block temperatures are A = ambient (20°C), B = 45°C, C = 60°C, and D = 75°C. The same amounts of enzyme and substrate are added to all of the wells; after treatment, the samples are transferred to a flat-bottomed 96-well plate for reading absorbance values. The mean assay result is calculated and individual results are divided by the mean and expressed as a percent of the mean (z-axis). The x-axis = rows (A–H), y-axis = columns 1–12 of the 96-well plate. Variation within the plate can be visualized by the percent above or below the mean; no visible line (a line exactly at 100%) would indicate no variation. In this example, variation is expressed as % CV, and for graphs A, B, C, and D, it is 0.9%, 2.4%, 2.8%, and 5.3%, respectively.

4.3.2 Optimization of a Screening Process

An early-stage screen result is illustrated in Fig. 7 as an example of results that can be obtained when a screening method is scaled up from bench scale to high throughput scale. This is an example of a directed evolution library screening where a treatment is applied and the eﬀect is measured as % residual activity after heat challenge. Approximately 700 control data points were compared with 5500 data points from a mutant library under the conditions initially chosen for this screen. These data

Figure 6 This is a graphical illustration of the thermal treatment of a hydrolytic enzyme in a buffer followed by an activity assay at ambient temperature. The same volume and concentration of enzyme is added to each well of v-bottomed 96-well polycarbonate plates, which are then placed in heating blocks. Graph A = control (20°C); graphs B, C, and D are 5, 10, and 15 min, respectively, at 70°C. After treatment, the same sample volume is removed from each well and an ambient temperature activity assay is performed in a new, flat-bottomed 96-well plate. The mean assay result is calculated and individual results are divided by the mean and expressed as a percent of the mean (z-axis). The x-axis = rows (A–H), y-axis = columns 1–12 of the 96-well plate. Variation within the plate can be visualized by the percent above or below the mean; no visible line (a line exactly at 100%) would indicate no variation. In this example, variation is expressed as % CV, and for graphs A, B, C, and D, it is 1.3%, 8.5%, 8.8%, and 9.7%, respectively.

show that it was virtually impossible under the initial conditions chosen for the screen to clearly identify mutants that were improved compared to the controls.

Figure 7 This is an example of early-stage screen development for a hydrolase enzyme after adapting the method from a manual method to an automated method. It illustrates the variation of the screen (which could include both the analytical and mechanical aspects of automating a screen). The mutants are a unique set of mutants; the controls are all the same hydrolase. As was discovered in follow-up analysis of mutants selected from this screen, most of the hits were false positives or noise associated with analytical and mechanical aspects of the screen that required further attention (data not shown).

In the example depicted in Fig. 8, a hydrolase is being screened after conditions have been further optimized. Low and high % residual activity controls have been added to validate the method. In this screening method, improved variants are much easier to spot than in the previous example. A very useful statistic can be applied to a screen to determine the quality of the data. Table 1 uses the data graphed in Fig. 8 to calculate the Z-statistic (45,46) for this particular screen, referred to as Z′ in this instance (46), where:

$$Z' = 1 - \frac{3\sigma_{\text{Control 1}} + 3\sigma_{\text{Control 2}}}{\left|\mu_{\text{Control 1}} - \mu_{\text{Control 2}}\right|}$$

In general, Z values >0.5 indicate excellent assays, with large separation bands. This is illustrated in Fig. 9, a graphical representation of the data in Table 1, where the wild-type control and the high control distribution peaks are well separated from each other. This illustrates the value of applying Z-factors in evaluating screening method performance and shows the separation bands more clearly than the same data graphed as a scatter plot in Fig. 8. The wild-type and low controls are too close to each


Lamsa et al.

Figure 8 This figure illustrates the effects of improvements made in the screening method for the hydrolase illustrated in Fig. 7. By adjusting mechanical and analytical aspects of the screen, a much improved, more reliable screening process was achieved. The wild-type control is now more distinct from the mutants being screened. These data are from directed evolution libraries of variants that were improved relative to wild type. A high-control variant and a low-control variant of wild type, known to perform better and worse, respectively, than wild type with this particular treatment, were available for this comparison.

Table 1 Applying the Z-Statistic to Screen Development for an Improved Hydrolase

Description   Median   3SD     n      Z-factor relationship   Z-factor
WT            6.36     3.99    36     WT–high                 0.701
Low           0.70     3.30    65     WT–low                  0.181
High          74.09    18.90   89     Low–high                0.697
Mutants       11.49    28.23   7664   NA                      NA


Figure 9 This ﬁgure is the data for the hydrolase from Fig. 8 graphed as a distribution. It can help to visualize the Z-statistic described in the text and presented in Table 1. WT control is the wild-type control enzyme, low control and high control are the same as described in Fig. 8 legend. Mutants are all the variants screened in this particular screening experiment.
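The Z-statistic discussed in the text can be computed directly from the summary values of two control populations. The sketch below is a minimal illustration, not the authors' own calculation: the function name `z_prime` is hypothetical, and it assumes that the median reported in Table 1 can stand in for the mean in the formula and that the "3SD" column is three times the standard deviation. It uses the low and high control rows of Table 1 and gives a value close to the Low–high entry listed there.

```python
def z_prime(center_1, sd_1, center_2, sd_2):
    """Z'-factor for two control populations.

    center_*: mean (or, as assumed here, median) signal of each control
    sd_*:     standard deviation of each control
    """
    separation = abs(center_1 - center_2)
    if separation == 0:
        raise ValueError("controls have identical centers; Z' is undefined")
    return 1.0 - (3.0 * sd_1 + 3.0 * sd_2) / separation

# Low and high controls from Table 1; the table lists 3SD, so divide by 3.
low_center, low_3sd = 0.70, 3.30
high_center, high_3sd = 74.09, 18.90
z_low_high = z_prime(low_center, low_3sd / 3, high_center, high_3sd / 3)
print(f"Z'(low-high) = {z_low_high:.3f}")
```

The same function could be applied to any single step of a screen (for example the assay alone, or the heating-block treatment alone) as long as two control populations are measured for that step.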

other to clearly distinguish improved variants. As the statistic goes toward 0, the quality of the assay diminishes toward a yes/no type of assay, and values below 0 indicate that it is virtually impossible to screen in any meaningful way. Evaluating your method with this statistic can aid in adjusting the screening parameters to improve the separation bands of control populations. In the above examples, the Z-statistic calculation takes into consideration a mixture of analytical methods, any one of which could be evaluated by itself for Z′ factors. These include the enzyme assay itself or the heating block treatment effect. Nearly any aspect of the parameters used in a screen could be analyzed for Z′ factors to measure the quality of the process or devices used, helping to pinpoint particularly troublesome aspects of the screening process. There are many interactions of biology, chemistry, and machinery going on during a screen. Nothing should be taken for granted, and careful


dissection of a screen is often necessary to define the parameters that affect the screen. Sometimes it is necessary to run the screen with the noise issues "intact" for a while to look at reproducibility. Often, in an automated system, this is more a job for the screeners than for the programmers or automation experts. Seemingly simple factors, such as the locations of air-conditioning ducts in the laboratory, reagents sticking to tubing or reservoirs, or heating block and incubator effects, show up when the number of test samples is increased. It can take patience and repeated testing to clearly identify problems in the process. Awareness of these issues when optimizing the process can pay big dividends in the quality of the data generated (47,48).

Interpretation of screening data tends to be biased by previous experiences and expectations. It is not always clear what statistical parameter to apply, and because of undiscovered noise, the screening landscape can be a moving one. Within industrial enzyme screening, many programs do not have the longevity to allow the comfort of clear identification of all the parameters important to improving the quality of the data. Generic techniques can, to a certain extent, be optimized. In the changing landscape of expression system-specific, enzyme-specific, and even device-specific issues, however, optimization can be elusive.

4.3.3 Variants Selected From a Directed Evolution Screening Process

In Fig. 10, hydrolase variants isolated from the primary screening process depicted in Fig. 8 were followed up in a variety of screens with slightly different parameters to evaluate the differences among the selected variants. In this way, a fingerprint of each variant can be obtained, and this can also be used to show that two similar variants selected in the primary screen are most likely the result of different gene products. It can also help in evaluating which of a group would be better to test in an application if the follow-up test is more stringent than the original screen. In this figure, it is worth comparing the movement of the Library A and Library B variants relative to the high control. Two isolates from Library B are clearly better than the rest (Graph B), while Library A variants retain function in the low-temperature destabilizing condition (Graph D). The final example, depicted in Fig. 11, is of a directed evolution screen that correlated with a baking application that would be difficult to automate. One of the mutants (from Library 9, the gray triangle in the upper left) was the best performer in the application. A large number of the variants depicted on this graph were tested in the application to clearly identify the improved variant. The application, the scale of the test required, and the market for the


Figure 10 This figure illustrates the fingerprinting of improved variants selected in the type of hydrolase screen found graphed in Fig. 8. Graphs A, B, C, and D are combinations of temperature and treatment cocktails used to help identify and further differentiate selected variants of Libraries A and B from the controls, in particular the high control. Graph A is the data where no treatment is applied other than temperature and buffer. Graph B is a harsh treatment where the wild-type enzyme is known to be unstable. Graph C uses a stabilizer known to protect the enzyme, combined with a challenge at a very high temperature. Graph D uses a treatment known to quickly destabilize the enzyme at a low temperature. The x-axis = relative activity, the y-axis = % residual activity, calculated by a comparison of initial samples and final samples that have been treated by the indicated conditions. The high control retains activity after all treatments; ranking of the variants changes depending upon the treatment. In Graph B, Library B has the most improved variants; in Graph D, Library A variants are the most improved. Key to symbols : x = wild type, n = high control, D = low control, . = Library A, w = Library B.
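The % residual activity on the y-axis can be illustrated with a short calculation. This is a sketch under an assumption: the legend says only that initial and treated samples are compared, and the code takes that comparison to be the ratio of treated to initial signal. The function name, the sample names, and all numbers are hypothetical and are not data from the figure.

```python
def percent_residual_activity(initial, treated):
    """Residual activity (%) of one sample: treated signal relative to untreated."""
    if initial <= 0:
        raise ValueError("initial activity must be positive")
    return 100.0 * treated / initial

# Hypothetical assay signals as (initial, after treatment).
samples = {
    "wild type":    (1.00, 0.06),
    "high control": (0.90, 0.67),
    "variant A1":   (1.10, 0.33),
}
residual = {name: percent_residual_activity(i, t) for name, (i, t) in samples.items()}

# Rank from most to least stable under this one treatment.
ranked = sorted(residual, key=residual.get, reverse=True)
for name in ranked:
    print(f"{name}: {residual[name]:.1f}% residual")
```

Repeating the ranking for each treatment condition gives the per-variant fingerprint described in the text. Because each sample is compared with its own untreated aliquot, differences in enzyme concentration between wells should largely cancel in this ratio.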


Figure 11 This ﬁgure is an example of a directed evolution screening program performed over a period of approximately a year, where wild type (WT) was a carbohydrase subjected to poison PCR mutagenesis. Successive rounds of shuﬄing of improved variants created these libraries. These were tested for improved thermal stability in a buﬀered screening system in 96-well plates. All of the improved mutants were tested against each other at this time in the follow-up screen. In this test, the cluster of mutants in the upper left corner provided a mutant with greatly improved performance in a baking process.

product often set the limits of the number of variants that can be tested from a screening program for industrial enzymes. The difficulty of scaling down and automating follow-up screens that correlate with applications is a severe limitation for improving industrial enzymes.

4.4 Automated Screening for Increased Enzyme Yield

4.4.1 Growth of Microorganisms in Microtiter Plates

Factors such as type of organism, media composition and strength, volume, shaking, temperature, and moisture all influence the amount of enzyme measured in a well. Also, the inoculation method and the chemistry of the assay will have a great influence on the amount of retesting that will be required. When the goal is the identification of mutants that produce higher enzyme titers, it is important to design the microtiter screening setup in a way that mimics the conditions that the strain will experience in a production environment. A mutant isolated from a microtiter plate screen should retain the increased performance when grown in production fermentors.

Production processes are virtually impossible to replicate in microtiter plates. The biomass concentration in the production tank is much higher than what can be achieved in a microtiter plate, and it is not easy to add nutrients or to control pH in the wells during growth. In addition, the key parameters that are usually monitored in a production fermentor, such as pH, temperature, and oxygen tension, are virtually impossible to monitor in a microtiter plate. Methods for measuring many of the parameters can be too intrusive at this scale, and it usually makes sense to design the screen such that the composition of the fermentation medium keeps the key parameters stable. For instance, if the growth medium is sufficiently dilute, the growth rate and biomass yield of the microorganism, and thereby the oxygen consumption rate, will be low enough to avoid oxygen limitation of the culture. Additionally, by buffering the media, the pH can be maintained within an acceptable range.

It is necessary to evaluate whether the chosen conditions correlate to the results that are obtained when growing the organisms in fermentation tanks. The total yield of enzyme per mass unit of carbohydrate is often a good figure to use in this correlation if the cultures are limited by available carbohydrate. Another good figure to correlate is the productivity, i.e., the production of enzyme per unit time.
Here, the success criterion would be that production of enzyme over time in the two systems has similar kinetics. Additionally, the correlation between the systems can be evaluated by growing strains that produce different amounts of enzyme. The ranking, and preferably also the relative difference, between the strains should be maintained at all scales.

Screening of mutagenized cells or spores is further complicated by the fact that the mutagenized cells require a varying amount of time before they start to grow. This probably depends on the harshness of the mutagenesis and the nature of the genes mutated in each cell. The problem arises when the cultures are assayed at a fixed time after inoculation. The difference in the timing of the initiation of growth between the cultures leads to a corresponding difference in the effective incubation time when the cultures are assayed. Therefore, in theory, a high-producing, slow-starting mutant could be lost because it did not have sufficient time to show its potential, whereas a low-producing, fast-starting mutant would be identified as an improved candidate.
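One way to check whether the ranking of strains is maintained between scales, as recommended above, is a rank correlation between the yields measured in the two systems. The sketch below uses Spearman's rank correlation, which is a choice made here for illustration and not a method named in the text. The function names and the yield values are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (lowest) upward, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical relative yields for five strains at two scales.
plate_yield     = [1.0, 1.4, 1.2, 2.1, 1.7]
fermentor_yield = [1.0, 1.3, 1.5, 2.4, 1.9]
rho = spearman(plate_yield, fermentor_yield)
print(f"rank correlation between scales: {rho:.2f}")
```

A value near 1 would suggest that the plate screen orders the strains the way the fermentor does; it says nothing about whether the relative differences are preserved, which would need a separate comparison of the yields themselves.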


One way to overcome some of this asynchrony between the wells is to extend the growth period so that all the cultures eventually reach the maximum expression level allowed by the media. However, the produced enzymes are not always very stable in the outgrown cultures, so it is possible that the measured enzyme activity will decline during prolonged incubation. Another way to overcome the asynchrony is to pre-grow the mutagenized cells before they are grown in the screening medium. With unicellular organisms, this could be done by outgrowing the mutants individually in a duplicate set of microtiter plates and then inoculating the screening culture from these plates (e.g., a mother–daughter plate approach); however, this is at the expense of an additional step. Additionally, the entire population could be outgrown to get healthy cells (a fairly standard and sometimes necessary practice), with the cost being that many copies of each mutant are present when the screen is performed. With multicellular organisms such as fungi, this method is generally not performed, but it is feasible to similarly outgrow and harvest spores from plates after mutagenesis, again at the expense of an additional step, uneven dilution of the mutant population, and the need to screen more of your library. Any of these "outgrowth" methods can have unknown enrichment effects on your pool of mutants.

4.4.2 Expression of Enzymes

The enzyme expression level of a microorganism depends not only on the genetic makeup of the strain, but also on the regulation of the expression of the enzyme in question. When a given enzyme is to be produced, it can be made from the organism where it is initially identiﬁed, or the gene encoding the enzyme can be moved by gene technology techniques to another host that may be more suitable for the production environment or where the expression of the gene is higher. The industrially interesting enzymes are usually hydrolytic enzymes such as proteases, lipases, and carbohydrases. All these enzymes are inducible enzymes, meaning that they are only produced in signiﬁcant amount when certain inducing nutrients are present. For instance, proteases are induced by the presence of polypeptides, and amylases are induced by the presence of starch or maltose. Additionally, many of these genes are also repressed when other nutrients are available. Amylase genes are usually repressed when glucose is present in signiﬁcant concentrations, independent of the presence of the inducing agent (maltose). Furthermore, the details about these often quite complex regulation mechanisms are not always fully understood. The complexity of the regulatory mechanisms makes the design of screening medium more complicated. Whereas it is simple to determine the composition of the medium when the growth is initiated, it can be virtually


impossible to control how the medium composition changes during the fermentation. For instance, as starch is a potent inducer of amylases, one could design a screening medium in which the main carbon source is soluble starch. As the strain grows, it produces amylases that degrade the starch, making it accessible to the cells as a carbon source. However, since the industrially relevant strains produce very high amounts of amylase, the amylase concentration would soon be so high that all the starch would be degraded very quickly. This would release a lot of glucose, leading to a high glucose concentration in the medium. The high glucose concentration would repress the expression of the amylase genes, effectively shutting down the synthesis of amylase. As the cells grow, they consume the glucose, so the glucose concentration would gradually fall, allowing amylase synthesis to resume. This cycle of induction and repression would lead to a quite unpredictable expression pattern, making it difficult to determine when the cultures should be assayed for amylase activity.

In the example above, the starch was soluble, so the growth medium was homogeneous. However, many inducing nutrients are insoluble, such as soy flour, potato protein, or cellulose. This adds an extra level of complexity to the medium, as the accessibility of these complex nutrients to the cells may vary during the growth phase. It is therefore always an advantage to design screening medium using soluble, simple components if possible. In the amylase example above, it would be possible to use maltose as a carbon source in the medium instead of starch and still get the required induction of the amylase. As an added bonus, few amylases can degrade maltose into glucose, so the amylase production would not lead to a boost in glucose concentration, and the multiphase expression pattern would be avoided.
Interestingly, the enzyme expression issues discussed for yield improvement screening are virtually ignored when screening enzymes for improved function, such as in directed evolution programs. In many cases, it is possible that the activity measurements frequently reported in the literature are misleading indicators of enzyme functional improvement. In general, one should not interpret activity in functional enzyme screening as specific activity. Activity is specific activity times the enzyme concentration. Unless a method to confirm the concentration of the specific enzyme protein is available, only rough estimates of specific activity can be made. This needs to be understood in screen design for evaluating libraries of enzymes for functional characteristics.

4.4.3 Sampling and Assay

For yield improvement screens, a key requirement of sampling is that the sample is representative of the enzyme concentration in the well. Although


this looks trivial at first glance, it can be seriously complicated by several factors, such as the homogeneity of the culture, the nature and concentration of the biomass, and the evaporation in microtiter plates.

A culture of a microorganism is seldom homogeneous. The medium and the biomass will always comprise two different phases no matter how uniformly the biomass is distributed within the well. Cultures of unicellular bacteria can often be regarded as homogeneous for all practical purposes, but even unicellular eukaryotes like yeast are a little more complicated, as the cells tend to sediment in the wells. Filamentous organisms such as actinomycetes and fungi are not dispersed in the wells but are often present as one or more pellets. This makes the sampling quite complicated, as the pellets tend to clog pipette tips. Even if the pellets are easily transferable, they will occupy a varying volume of the transferred sample and thus introduce an uncertainty in the volume of the sample. Furthermore, many enzymes tend to stick to the biomass, making the concentration of the enzyme very much dependent on the presence of biomass in the sample. Adherence of enzyme to the biomass should be avoided; adding surface-active components such as detergents or emulsifiers to the growth medium can reduce it.

Evaporation from microtiter plates is not generally uniform from well to well but can be greatly minimized by incubating the plates in a sealed, moist box during growth. The wells located at the edges of the plate have a much higher evaporation rate than the wells in the center, which concentrates the samples located at the edges. Performing enzyme assays directly in the growth plates can reduce this problem. In practice, this can be accomplished simply by adding buffers and reagents directly to the growth plate after incubation.
This method requires that the assay can function under these conditions, and that putatively improved strains isolated from the screening wells survive the assay. The assays that are used are usually scaled-down versions of standard enzyme assays. If the assays require addition of insoluble compounds or a filtration or centrifugation step, these unit operations are more difficult to automate, and alternative approaches or solutions may need to be devised. The enzyme assays can be divided into two categories: kinetic assays, in which the enzyme reaction is monitored over time, and endpoint assays, which employ one measurement made at the end of the assay. Kinetic assays are superior to endpoint assays in terms of robustness and precision; since the enzyme concentration in a kinetic assay is determined from the differences in absorbance, it is self-blanking. If a kinetic assay is too slow, then it is feasible to use a multiple read method where batches of plates are processed with repeated visits to the plate reader over a period of time. Endpoint assays work well if the blank background is low or known


(and relatively reproducible), or if a large-enough sample dilution is done such that the blank is negligible. In general, assays done at ambient temperature are most suitable and most easily controlled in a yield improvement screen setting.

4.4.4 Enzyme Yield Screen Throughput

Table 2 illustrates the results of a screening program to achieve an improvement in yield for an industrially relevant enzyme and host. Generally, when a screening program is started, the industrial host already produces enzymes at a high level, but further improvement is required to make the process economical. The data in the table are from a program undertaken before substantial improvements in assay and automation technology were implemented. With the technology and improvements discussed above, it is generally possible to screen hundreds of thousands of mutants in a month if needed. The limitations are the rounds of mutagenesis and the follow-up screening required, including fermentation time, to clearly identify the best candidate for another round of mutagenesis and screening. Nevertheless, the table illustrates the types of improvements that can be obtained with the numbers screened. In this program, a fivefold improvement in the yield of an industrially relevant enzyme was achieved.

4.4.5 Secondary Screening

In yield improvement screening projects, a considerable amount of time is spent on designing the screening setup and in retesting the candidates isolated from the primary screen. There is generally low correlation between the primary screen and the production process due to many of the factors discussed in the preceding sections. It is important to test the isolated candidates in systems of increasing scale, to eliminate the false positive hits

Table 2 Results of a Screening Program for Increased Yield of an Industrial Enzyme

Parent    Mutagen   96-well   24-well   Shake flasks   24-well   Shake flasks   Fermentors
Initial   EMS       37,900    255       20             6         2              2
Round 1   UV        4,700     93        12             6         5              5
Round 2   NTG       26,800    490       41             16        16             5
Round 3   NTG       9,800     300       20             12        12             3
Round 4   NTG       14,900    368       44             22        22             1
Totals              94,100    1,506     137            62        57             16
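The totals in Table 2 can be used to estimate how strongly each stage narrows the pool of isolates. The short sketch below is only an illustration and assumes that the columns run in the order in which isolates were tested, which the flattened table does not state explicitly. The labels for the repeated 24-well and shake flask columns are hypothetical.

```python
# Totals row of Table 2, in the order the columns appear.
stages = ["96-well", "24-well", "shake flasks",
          "24-well (second listing)", "shake flasks (second listing)", "fermentors"]
totals = [94_100, 1_506, 137, 62, 57, 16]

def carry_forward(counts):
    """Fraction of isolates carried from each stage to the next."""
    return [later / earlier for earlier, later in zip(counts, counts[1:])]

fractions = carry_forward(totals)
for (src, dst), frac in zip(zip(stages, stages[1:]), fractions):
    print(f"{src} -> {dst}: {frac:.2%}")

# Overall fraction of primary-screen isolates that reached a fermentor.
overall = totals[-1] / totals[0]
print(f"96-well to fermentor overall: {overall:.4%}")
```

Under that assumption, under 2% of the isolates screened in 96-well plates were retested in 24-well plates, and only a small number of candidates reached fermentors.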


at each scale, and to reduce the number of isolates that have to be tested at each scale. The isolated candidates can be retested in microtiter plates (384, 96, and 24 wells), with several parallel cultures of each candidate. After retesting in microtiter plates, the candidates can be tested in ordinary shake flasks or in fed-batch shake flasks where nutrients are added continuously during the fermentation. Finally, the candidates can be tested in laboratory-scale fermentors and ultimately at pilot plant scale. Laboratory fermentor scale systems correlate better with the production processes than the microtiter-based systems, largely because the physical conditions and other parameters can be strictly controlled in fermentation vessels. Fermentor testing is the final screen of the enzyme yield improvement screening process. Ultimately, it is also the final screen of any enzyme that goes through an improvement program for enzyme function.

5 THE FUTURE

There are many new approaches being applied to screening and selection. Among these are methods involving solid phase screening with digital imaging, single molecule detection assay technologies, phage display, protein display on microorganisms, fluorescence-activated cell sorting (FACS), nanoscale growth and selection in liquid phase, and man-made cell-like compartments (5–12). These new approaches can make the current automation approaches more necessary and efficient, rather than obsolete, as more and more candidates are generated for applications testing. A likely scenario is a new emphasis on using these robotic systems for application-relevant or application-correlating screens. As larger and larger libraries are screened, more and more hits will be obtained, further complicating the process of selecting the best variants, as has been seen in the drug discovery field (14). Like the explosion of information since the Internet, this will further increase the need for a variety of automation methods at nearly all microtiter plate scales, although some say these newer methods could eventually replace screening in microtiter plates (8). Keeping automation as flexible as possible will allow existing systems to be adapted to some of these new technologies (15,16,18,19). Scientists involved in screening will find it easier to use automation because off-the-shelf automation is now more accessible at many different levels of complexity, with more variations constantly in development. Those who previously thought they could not use automation will find it easier to use, so more scientists will become involved. The timing is right; there is more screening-related work to be done at a faster pace due to the explosion of results from these new primary screening


methods mentioned above. Automation of screening will remain a multidisciplinary activity, however, largely because of the complexity of the different sciences that must come together to make up a screening program. It will become more important for all disciplines to share this function to make it as successful as it can be.

REFERENCES

1. M Divers. Point: screen development as a shared function. J Biomol Screen 3:263–266, 1998.
2. GE Nedwin. Green chemistry: using enzymes as benign substitutes for synthetic chemicals and harsh conditions in industrial processes. In: G Salyer, ed. Biotechnology in the Sustainable Environment. New York: Plenum Press, 1997, pp 13–32.
3. M Lamsa, P Bloebaum. Mutation and screening to increase chymosin yield in a genetically-engineered strain of Aspergillus awamori. J Ind Microbiol 5:229–238, 1990.
4. A Demain. Genetics and microbiology of industrial microorganisms. J Ind Microbiol Biotech 27:352–356, 2001.
5. M Olsen, B Iverson, G Georgiou. High-throughput screening of enzyme libraries. Curr Opin Biotechnol 11:331–337, 2000.
6. JM Joern, T Sakamoto, A Arisawa, FH Arnold. A versatile high throughput screen for dioxygenase activity using solid-phase digital imaging. J Biomol Screen 6:219–223, 2001.
7. S Delagrave, DJ Murphy, JL Rittenhouse Pruss, AM Maffia, BL Marrs, EJ Bylina, WJ Coleman, CL Grek, MR Dilworth, MM Yang, DC Youvan. Application of a very high throughput digital imaging screen to evolve the enzyme galactose oxidase. Protein Eng 14:261–267, 2001.
8. KJ Moore, S Turconi, S Ashman, M Ruediger, U Haupts, V Emerick, AJ Pope. Single molecule detection technologies in miniaturized high throughput screening: fluorescence correlation spectroscopy. J Biomol Screen 4:335–353, 1999.
9. H Joo, A Arisawa, Z Lin, FH Arnold. A high-throughput digital imaging screen for the discovery and directed evolution of oxygenases. Chem Biol 6:699–706, 1999.
10. DC Youvan, E Goldman, S Delagrave, MM Yang. Digital imaging spectroscopy for massively parallel screening of mutants. Methods in Enzymology. New York: Academic Press, 1995, pp 232–248.
11. DS Tawfik, AD Griffiths. Man-made cell-like compartments for molecular evolution. Nat Biotechnol 16:652–656, 1998.
12. AD Griffiths, DS Tawfik. Man-made enzymes: from design to in vitro compartmentalization. Curr Opin Biotechnol 11:338–353, 2000.
13. WM Lafferty. GigaMatrix™: 100,000-well Screening Platform. Podium Presentation: LabAutomation 2002, Palm Springs, 2002, p. 66.
13a. JM Perkel. Going Super-Duper Throughput. The Scientist 15(17):24, 2001.
14. DJ Ausman. Screening's age of insecurity. Mod Drug Discov, 32–39, 2002.
15. G Karet. Transforming HTS. Drug Discov 5:20–26, 2002.
16. J Babiak. Transforming your robotics into an infrastructure of the future. J Biomol Screen 2:139–143, 1997.
17. M Banks, A Binnie, S Fogarty. Point: high throughput screening using fully integrated robotic screening. J Biomol Screen 2:133–135, 1997.
18. M Beggs, H Blok, A Diels. The high throughput screening infrastructure: the right tools for the task. J Biomol Screen 4:143–149, 1999.
19. J Major. Challenges and opportunities in high throughput screening: implications for new technologies. J Biomol Screen 3:13–17, 1998.
20. JR Cherry, MH Lamsa, P Schneider, J Vind, A Svendsen, A Jones, A Pedersen. Directed evolution of a fungal peroxidase. Nat Biotechnol 17:379–384, 1999.
21. JJ Burbaum. Point: the evolution of miniaturized well plates. J Biomol Screen 5:5–8, 2000.
22. KR Oldenburg. Point: automation basics: robotics vs workstations. J Biomol Screen 4:53–56, 1999.
23. G Karet. Options flood the liquid handler market. Drug Discov 5:29–32, 2002.
24. TA Bateman, RA Ayers, RB Greenway. An engineering evaluation of four fluid transfer devices for automated 384-well high throughput screening. Lab Robot Autom 11:250–259, 1999.
25. JW Armstrong, RA Gerren, SD Hamilton. A review of automation options to support plate preparation, cherry picking, and homogeneous assays. J Biomol Screen 3:271–275, 1998.
26. MA Sills. Counterpoint: integrated robotics vs. task-oriented automation. J Biomol Screen 2:137–138, 1997.
27. B Rasnow, K Kearns, P Grandsard. Open-Sourcing Laboratory Automation Control Software. Poster T068: LabAutomation 2002, Palm Springs, 2002.
28. MF Russo, MM Echols. Automating Science and Engineering Laboratories with Visual Basic. New York: John Wiley and Sons, 1999, pp 1–355.
29. TG Holt, C Dufresne, JM Liesch, GK Mallow. The design and development of an integrated natural products screening database. J Biomol Screen 5:421–433, 2000.
30. H Zhao, FH Arnold. Combinatorial protein design: strategies for screening protein libraries. Curr Opin Struct Biol 7:480–485, 1997.
31. JC Moore, HM Jin, O Kuchner, FH Arnold. Strategies for the in-vitro evolution of protein function: enzyme evolution by random recombination of improved sequences. J Mol Biol 272:336–347, 1997.
32. FH Arnold, JC Moore. Optimizing industrial enzymes by directed evolution. In: T Schenor, ed. Advances in Biochemical Engineering/Biotechnology. Berlin: Springer-Verlag, 1997, pp 1–14.
33. O Kuchner, FH Arnold. Directed evolution of enzyme catalysts. Tibtech 15:523–530, 1997.
34. KE Jaeger, T Eggert, A Eipper, MT Reetz. Directed evolution and the creation of enantioselective biocatalysts. Appl Microbiol Biotechnol 55:519–530, 2001.
35. FH Arnold, AA Volkov. Directed evolution of biocatalysts. Curr Opin Chem Biol 3:54–59, 1999.
36. H Zhao, FH Arnold. Functional and nonfunctional mutations distinguished by random recombination of homologous genes. Proc Natl Acad Sci 94:7997–8000, 1997.
37. A Zaks. Industrial biocatalysis. Curr Opin Chem Biol 5:130–136, 2001.
38. RR Chirumamilla, R Muralidhar, R Marchant, P Nigam. Improving the quality of industrially important enzymes by directed evolution. Mol Cell Biochem 224:159–168, 2001.
39. M Sivaraja, J Giordano, MG Peterson. High-throughput screening assay for helicase enzymes. Anal Biochem 265:22–27, 1998.
40. FC Christians, L Scapozza, A Crameri, G Folkers, WPC Stemmer. Directed evolution of thymidine kinase for AZT phosphorylation using DNA family shuffling. Nat Biotechnol 17:259–264, 1999.
41. L Giver, A Gershenson, PO Freskgard, FH Arnold. Directed evolution of a thermostable esterase. Proc Natl Acad Sci 95:12809–12813, 1998.
42. S Turconi, K Shea, S Ashman, K Fantom, DL Earnshaw, RP Bingham, UM Haupts, MJB Brown, AJ Pope. Real experiences of uHTS: a prototypic 1536-well fluorescence anisotropy-based uHTS screen and application of well-level quality control procedures. J Biomol Screen 6:275–290, 2001.
43. T Lanio, A Jeltsch, A Pingoud. Automated purification of His6-tagged proteins allows exhaustive screening of libraries generated by random mutagenesis. Biotech 29:338–342, 2000.
44. S Silberblatt, RA Felder, TE Mifflin. Optimizing reaction conditions of the NanoOrange protein quantitation method for use with microplate-based automation. JALA 6:83–87, 2001.
45. JH Zhang, TDY Chung, KR Oldenburg. A simple statistical parameter for use in evaluation and validation of high throughput screening assays. J Biomol Screen 4:67–73, 1999.
46. P Lavery, MJB Brown, AJ Pope. Simple absorbance-based assays for ultra-high throughput screening. J Biomol Screen 6:3–9, 2001.
47. PB Taylor, FP Steward, DJ Dunnington, ST Quinn, CK Schulz, KS Vaidya, E Kurali, TR Lane, WC Xiong, TP Sherrill, JS Snider, ND Terpstra, RP Hertzberg. Automated assay optimization with integrated statistics and smart robotics. J Biomol Screen 5:213–225, 2000.
48. K Slinker. The statistics of synergism. J Mol Cell Cardiol 30:723–731, 1998.

26 Screening for Enantioselective Enzymes
Manfred T. Reetz
Max-Planck-Institut für Kohlenforschung, Mülheim an der Ruhr, Germany

1 INTRODUCTION

Enantiomerically pure or enriched organic compounds play a prominent role in pharmaceutical, agricultural, synthetic organic, and natural products chemistry (1). For example, the so-called chiral market of industrial products amounted to $100 billion in 2000 (1d,e). Many of these products can be prepared in the laboratories of organic chemists. Although conventional separation of enantiomers is still the preferred process in industry (1d), catalytic processes are likely to dominate in the future because asymmetric catalysis has the potential to be the economically and ecologically most attractive strategy. The two most important options available to organic chemists are synthetic chiral transition metal catalysts (2), on the one hand, and enzymes, on the other (3). Indeed, both areas are growing in importance. In the case of enantioselective biocatalysts, a formidable number of wild-type enzymes and/or whole cells are used in the industrial production of chiral organic compounds (3,4). Examples include carnitine dehydratase-catalyzed hydroxylation of γ-aminobutyric acid with formation of L-carnitine (Lonza), lipase-catalyzed kinetic resolution of chiral amines (BASF), lipase-catalyzed kinetic resolution of 3-(4-methoxyphenyl)glycidic acid ester in the production of diltiazem (DSM/Tanabe/Sepracor), and aminoacylase-catalyzed kinetic resolution of N-acyl-methionine (Degussa) (4), to mention only a few. It is certain that the traditional methods of isolating or harvesting such enzymes or whole cells will continue to be applied. Indeed, because only a very small fraction of all enzymes existing on earth have been identified, it is likely that many more useful ones will be found and applied industrially in the future. Because only a very small fraction of soil microorganisms (0.1% to 1%) can be readily cultured by standard techniques, methods have recently been developed which allow access to the biodiversity available in uncultured microorganisms (5). This new approach is based on

Figure 1 Directed evolution of an enantioselective enzyme (Ref. 7c).

novel methods for collecting genes from the environment and expressing them in recombinant organisms. Thus metagenome libraries of environmental DNA are being established in companies, leading to huge numbers of hitherto unknown enzymes. This raises the interesting possibility of testing hundreds of thousands of enzymes in enantioselective transformations of interest to organic chemists. The parameter of interest is the enantiomeric excess (ee) and/or, in the case of kinetic resolution, the so-called selectivity factor, E, which reflects the relative rate of reaction of the (R) and the (S) substrate. Obviously, such a task can only be carried out if appropriate high-throughput screening systems are available.

The need to develop high-throughput ee assays also arises for another reason. Recently, the first case in which the methods of directed evolution (6) were applied to the development of an enantioselective enzyme was reported (7). Accordingly, the appropriate combination of molecular biological methods for random mutagenesis and gene expression coupled with an efficient high-throughput ee-screening system forms the basis of a new area of research (Fig. 1). The gene of the wild-type enzyme, which catalyzes a given reaction of interest A → B, but not with an acceptable degree of enantioselectivity, is first subjected to random mutagenesis [e.g., error-prone polymerase chain reaction (PCR), saturation mutagenesis, DNA shuffling]. Following expression in a suitable bacterial host, the bacterial colonies are plated out on agar plates and harvested by a colony picker (Fig. 2) (7). Once the colonies have been placed in the wells of microtiter plates (e.g., 96-well format) containing nutrient broth, arrays of thousands of spatially addressable catalysts become available. Following high-throughput screening, the most enantioselective enzyme variant is identified. Then the corresponding mutant gene is subjected once more to mutagenesis, expression, and screening, a process which creates evolutionary pressure (6,7). Because full exploitation of natural diversity (4,5) as well as the evolution-based extension of diversity (7,8) provides huge numbers of new

Figure 2 Individual steps in the directed evolution of an enantioselective enzyme (Ref. 7c).


and potentially enantioselective enzymes (certainly thousands, probably millions), the importance of rapid ee assays cannot be overestimated (Fig. 3). This chapter summarizes the present status of research concerning high-throughput ee assays. As will be seen, some of them not only deliver information regarding the enantiopurity of samples, but are also time-resolved, so that activity is also (crudely) measured. If this is not the case, the assay needs to be applied at given time intervals if information regarding activity is desired. In some cases, it may be necessary to add an internal standard. Alternatively, to exclude nonactive mutants, colony-based on-plate pretests can be applied. Examples include the tributyrin test for lipase activity (9a) and a colorimetric test for epoxide hydrolase activity (10).

At this point, the difference between screening and selection, which are sometimes confused, needs to be reemphasized (6d). Screening is the process of identifying, by some analytical tool, a desired member of a library of enzyme variants, e.g., the most enantioselective variant as a catalyst in a given reaction of interest A → B. If selection is applied in evolutionary experiments, only the desired member(s) of a potential library appears, e.g., as a viable microbial clone. Thus far, no selection system for the directed evolution of enantioselective enzymes has been developed, although an ee screen based on differential cell growth was recently introduced (11).

This chapter focuses on screening systems for assaying the enantioselectivity of enzyme-catalyzed reactions. Most of these developments arose from the need to apply directed evolution to the creation of enantioselective enzymes for use in organic chemistry. Others were developed by chemists active in the field of combinatorial transition metal catalysis. In most cases, a given assay can be applied to both types of catalysts (9).
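The cycle of mutagenesis, expression, and screening outlined in this introduction can be summarized as a simple loop. The following Python sketch is only a toy model meant to show the control flow: the function `mutate`, the log-normal scatter of E values, and all numerical settings are hypothetical and are not taken from the work cited here.

```python
import random


def mutate(parent_e, rng):
    """Hypothetical stand-in for mutagenesis plus expression.

    Returns the E value of one variant, scattered around the parent's
    E value. The log-normal scatter is an assumption for illustration.
    """
    return max(1.0, parent_e * rng.lognormvariate(0.0, 0.25))


def evolve(wild_type_e, generations, library_size, seed=0):
    """Run the mutagenesis -> screening -> best-hit cycle.

    Returns the best E value available after each generation.
    """
    rng = random.Random(seed)
    parent = wild_type_e
    history = []
    for _ in range(generations):
        # "Mutagenesis and expression": build a library from the parent.
        library = [mutate(parent, rng) for _ in range(library_size)]
        # "Screening": identify the most enantioselective variant.
        best = max(library)
        # Keep the parent if no variant is better.
        parent = max(parent, best)
        history.append(parent)
    return history


if __name__ == "__main__":
    # Four generations of 2000 variants, starting from E = 1.1.
    print(evolve(1.1, generations=4, library_size=2000))
```

In a real experiment the call to `max` is the expensive step, because every library member has to be assayed; this is why the throughput of the ee assay limits the library size that can be explored.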

Figure 3 Two sources of large libraries of potentially enantioselective enzymes.

2 HIGH-THROUGHPUT ASSAYS FOR EVALUATING ENANTIOSELECTIVE ENZYMES

2.1 UV/VIS-Based Assays

A number of assays for screening the (approximate) activity of enzymes have been developed which do not address enantioselectivity; they are generally based on color tests or fluorescence (6,9,10,12). They are often rather simple and practical and may in fact be used as rough prescreens to sort out “dead” mutants. Unfortunately, extension of these tests to enantioselectivity is not trivial. Indeed, prior to 1997, not a single high-throughput ee assay existed. Conventional ways to determine the ee of a reaction were based on gas chromatography (GC) or high-performance liquid chromatography (HPLC) using chiral columns; however, this allowed only a few dozen samples to be analyzed per day.

In a seminal project designed to test directed evolution as a means to create enantioselective enzymes, the lipase-catalyzed hydrolytic kinetic resolution of the p-nitrophenol ester 1 was chosen as a model reaction (7). The wild-type lipase from Pseudomonas aeruginosa catalyzes this transformation with only marginal enantioselectivity in favor of (S)-2. The selectivity factor, E, which reflects the relative rate of reaction of the two enantiomers, is only 1.1. The p-nitrophenol ester, rather than the usual methyl or ethyl ester, was chosen because the hydrolysis product, p-nitrophenolate (3), can easily be detected by ultraviolet/visible (UV/VIS) spectroscopy using a standard plate reader which addresses microtiter plates in a high-throughput manner. However, if a racemate (rac-1) is used as in a normal kinetic resolution, only the overall activity can be ascertained. To solve this fundamental problem, enantiomerically pure (R)-1 and (S)-1 were used separately in pairs of wells, which means that 48 enzyme variants can be tested on a 96-well microtiter plate (7a).
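Because each variant is monitored in two wells, one per enantiomer, the two absorbance time courses can be reduced to a single number by comparing their slopes. The following Python sketch shows one way this could be done; it is not the evaluation procedure of the original work. The ratio of slopes is only an apparent selectivity, since the enantiomers do not compete in this format, and the threshold `min_rate` is a hypothetical cutoff for "no activity".

```python
def slope(times, values):
    """Least-squares slope of values plotted against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def apparent_e(times, abs_s, abs_r, min_rate=1e-6):
    """Apparent selectivity from separate (S) and (R) wells.

    Returns (preferred enantiomer, ratio of slopes). Returns (None, None)
    if neither well shows a slope above min_rate (assumed cutoff).
    """
    rate_s = slope(times, abs_s)
    rate_r = slope(times, abs_r)
    fast, slow = max(rate_s, rate_r), min(rate_s, rate_r)
    if fast < min_rate:
        return None, None  # inactive variant in the chosen time interval
    preferred = "S" if rate_s >= rate_r else "R"
    if slow < min_rate:
        return preferred, float("inf")  # slower enantiomer not converted
    return preferred, fast / slow


if __name__ == "__main__":
    t = [0, 1, 2, 3]                      # time points (arbitrary units)
    print(apparent_e(t, [0.0, 0.2, 0.4, 0.6], [0.0, 0.1, 0.2, 0.3]))
```

Any hit found in this way would still have to be confirmed on the racemate, where both enantiomers are present at the same time.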

Two typical experimental plots are shown in Fig. 4. The top one shows the result for the wild-type lipase, in which the slopes of the (S) and (R) lines are almost identical, indicating almost no enantioselectivity. The bottom plot displays the results using an enzyme variant from the first generation of random mutagenesis (library of 2000 members), signaling increased (S) selectivity. Such a hit is then studied in detail by running a lab-scale kinetic resolution on the racemate, chiral GC serving as the analytical tool. Notice

Figure 4 Course of the lipase-catalyzed hydrolysis of the (R) and (S) ester 1 as a function of time measured by a UV/VIS plate reader (Ref. 7a). (a) Wild-type lipase from P. aeruginosa; (b) improved mutant in the first generation.

that the plots in Fig. 4 also provide some information regarding enzyme activity. Whereas quantification of the reaction rate is not possible, experimental lines showing no slope indicate no activity in the time interval chosen. Thus enzyme variants showing such low activity are eliminated, although some of them may actually be enantioselective. A total of four cycles of mutagenesis/expression/screening were performed, about 2000–4000 enzyme variants being screened in each generation. This led to the creation of an enzyme having an E value of 11 (7a). Later this was increased to E = 26 by applying error-prone PCR (epPCR) and saturation mutagenesis (7b), and most recently a mutant showing even higher enantioselectivity (E > 51) was evolved from the same parent wild-type using recombinant methods (DNA shuffling) (7d). All in all, about 40,000 enzyme mutants were screened using this UV/VIS assay. Moreover, it was possible to invert the direction of enantioselectivity (E = 30 in favor of (R)-2) (13), in which case another 40,000 variants were screened.

Although this is the first high-throughput ee assay in the literature, allowing between 500 and 800 samples to be tested per day, it suffers from several drawbacks. The most serious disadvantage is that the process of evolution focuses on the p-nitrophenol ester (1), which will certainly not be used in real industrial applications. The methyl or ethyl ester would be industrially relevant, but these do not release a UV/VIS-active alcohol. Moreover, because the (S) and (R) esters are assayed separately, the two enantiomers are not allowed to compete for the enzyme, which may distort the results. That is why the hits in a library, once identified, need to be studied using the racemate in a lab-scale reaction; the exact ee (or E) is then determined by GC.

On the practical side, it is useful to carry out the well-known tributyrin prescreening test to eliminate enzyme variants having no lipase activity whatsoever. Accordingly, the agar plates containing the bacterial colonies are charged with tributyrin. Because of its insolubility in the medium, the plates have a milky appearance. In the case of active lipases, hydrolysis occurs and clear spots appear (9a).

Another colorimetric assay for testing the enantioselectivity of lipases or esterases in ester hydrolysis reactions is based on a different principle (14). To simulate the competitive conditions of an enzymatic process, the so-called Quick-E-Test was developed, in which a mixture of the p-nitrophenol ester of one enantiomeric form of a chiral ester 4 and a resorufin ester 6 is subjected to enzyme-catalyzed hydrolysis, the latter taking on the “role” of the other enantiomer. The two hydrolyses are monitored by recording the UV/VIS absorption of the two products 3 and 8 at two distinctly different wavelengths (410 vs. 570 nm). Although this makes a more precise determination of E values possible, the method suffers from the same disadvantage noted previously, namely, the necessity of employing the p-nitrophenol ester of the chiral acid. Nevertheless, appropriate automation should allow a throughput of a thousand or more samples per day.
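The way a reference ester can stand in for the missing enantiomer may be easier to see in a short calculation. The Python sketch below assumes that, for each enantiomer, the selectivity relative to the reference is the ratio of the concentration-normalized initial rates, and that the estimate of E is the quotient of the two selectivities obtained in this way. This evaluation scheme and all names in the code are assumptions made for illustration; the procedure actually used should be taken from Ref. 14.

```python
def selectivity_vs_reference(rate_substrate, rate_reference,
                             conc_substrate, conc_reference):
    """Selectivity of the enzyme for one enantiomer over the reference.

    Initial rates are normalized by the starting concentrations so that
    unequal amounts of substrate and reference do not bias the ratio.
    """
    values = (rate_substrate, rate_reference, conc_substrate, conc_reference)
    if min(values) <= 0:
        raise ValueError("rates and concentrations must be positive")
    return (rate_substrate / conc_substrate) / (rate_reference / conc_reference)


def quick_e(well_fast, well_slow):
    """Estimate E from two wells, one per enantiomer.

    Each well is a dict with the keyword arguments of
    selectivity_vs_reference. The common reference cancels in the ratio.
    """
    return (selectivity_vs_reference(**well_fast)
            / selectivity_vs_reference(**well_slow))


if __name__ == "__main__":
    # Hypothetical initial rates (absorbance units per minute, already
    # converted to concentration units) and concentrations (mM).
    fast = dict(rate_substrate=4.0, rate_reference=1.0,
                conc_substrate=1.0, conc_reference=1.0)
    slow = dict(rate_substrate=1.0, rate_reference=2.0,
                conc_substrate=1.0, conc_reference=1.0)
    print(quick_e(fast, slow))
```

The point of the design is that each enantiomer always competes with the same reference compound, so that the two wells can be compared even though the enantiomers never meet.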

Yet another UV/VIS test useful in determining the enantioselectivity of lipases or esterases is based on the notion that hydrolysis of an ester leads to a change in acidity which is measurable by an appropriate pH indicator (15). Upon using a buffer, N,N-bis(2-hydroxyethyl)-2-aminoethanesulfonic acid (BES), and a pH indicator (p-nitrophenol) having the same pKa value, a linear correlation between the acid generated and the protonation of the indicator was established. In this case, the two enantiomeric esters are studied separately pairwise, the color changes upon protonation in each case being monitored colorimetrically. Currently, it is not quite clear how general and how precise this method actually is because it was later observed that appreciable discrepancies between the E value obtained and the E value measured conventionally in control experiments exist in some cases (16). Nevertheless, it is likely that the basic concept can be optimized. A related assay was later reported which makes use of a different and more convenient indicator (bromothymol blue) (17). This system seems to be very practical. However, it should be noted that all of the assays based on pH change reported so far refer to the use of isolated enzymes. In real applications such as directed evolution studies, supernatants are likely to be used. In supernatants, however, pH variations may occur. Therefore an optimized assay was recently developed in which supernatants are employed (18). In doing so, the pH of the buffer is adjusted to the acidity of the medium. About 4000 samples in a kinetic resolution study can then be roughly screened per day.

The above screening systems are restricted to the hydrolytic kinetic resolution of esters catalyzed by lipases, esterases, or proteases. They are based on the original idea of testing (R) and (S) substrates separately pairwise on microtiter plates (7a). The same applies to an interesting version of this concept in the hydrolysis of chiral acetates (19). In this case, the liberated acetic acid is quantified by conversion into NADH, which is monitored by a UV/VIS plate reader at 340 nm. The quantitative conversion of acetic acid into NADH occurs via a cascade of enzyme-catalyzed reactions using a

Figure 5 The hydrolase-catalyzed reaction releases acetic acid, which is converted by acetyl-CoA synthetase (ACS) to acetyl-CoA in the presence of adenosine triphosphate (ATP) and coenzyme A (CoA) (Ref. 19). Citrate synthase (CS) catalyzes the reaction between acetyl-CoA and oxaloacetate to give citrate. The oxaloacetate required for this reaction is formed from L-malate and NAD+ in the presence of L-malate dehydrogenase (L-MDH). Initial rates of acetic acid formation can thus be determined by the increase in absorption at 340 nm due to the increase in NADH concentration. Use of optically pure (R) or (S) acetates allows the determination of the apparent enantioselectivity, Eapp.

commercially available enzyme kit (Fig. 5). About 540 E values can be obtained within 1 h, which corresponds to ca. 13,000 determinations per day. Of course, because the enantiomers are tested separately, the hits need to be studied conventionally to ascertain real E values. Obviously, esters other than acetates cannot be used (19).

2.2 Fluorescence-Based Systems

The primary advantage of assays based on fluorescence is the high degree of sensitivity, which allows the use of very dilute substrate concentrations and extremely small amounts of catalysts (20). An elegant fluorogenic assay for the hydrolytic kinetic resolution of certain chiral acetates, e.g., 9, has been developed recently (Fig. 6) (21). It is based on a sequence of two coupled enzymatic steps that converts a pair of enantiomeric alcohols formed by the asymmetric hydrolysis under study [e.g., (R)- and (S)-10] to a fluorescent product (e.g., 12). In step 1, the (R) and (S) substrates 9 are subjected separately to hydrolysis in reactions catalyzed by a mutant enzyme (lipase or esterase), a catalytic antibody, or, in principle, a synthetic catalyst compatible with the system. The goal of the assay is to measure the enantioselectivity of this kinetic resolution. The relative amount of (R)- and (S)-10 produced after a given reaction time is a measure of enantioselectivity and

Figure 6 Fluorescence-based assay for enantioselectivity of ester hydrolysis (Ref. 21).

can be ascertained rapidly, but not directly. Two subsequent chemical transformations are necessary. In step 2, the enantiomeric alcohols (R)- and (S)-10 are oxidized separately to the ketone 11 by horse-liver alcohol dehydrogenase (HLDH), from which the fluorescent final product umbelliferone (12) is released in each case by the catalytic action of bovine serum albumin (BSA) (step 3). Thus by measuring the fluorescence of 12 for the (R) and the (S) substrate separately, the relative amounts of (R)- and (S)-10 can be determined. The authors tested 30 different esterases and lipases and followed the rate of release of 12 by fluorescence in the wells of standard microtiter plates (21). Control experiments ensured that the apparent rate of umbelliferone release is directly proportional to the rate of acetate hydrolysis. The predicted and observed E and ee values (as checked by standard chiral HPLC assay of a lab-scale kinetic resolution) were found to agree within ±20%. Only in one case was a larger discrepancy observed, a result that was believed to be caused by the occurrence of an unusually low KM for one of the enantiomers. Because the test can be carried out on 96-well microtiter plates, high throughput should be possible. Of course, the inherent disadvantage noted earlier for some of the colorimetric tests also applies here, namely, the fact that the optimization of a potential catalyst is focused on a specific substrate 9 modified by the incorporation of a probe, in this case, the fluorogenic moiety 12.

A novel fluorescence-based method for assaying the activity of synthetic catalysts in acylation reactions of alcohols has been described (22a). The underlying idea is to use a molecular sensor which fluoresces upon formation of an acidic product (acetic acid). Protonation of an appropriate chemosensor leads to intense fluorescence. Chiral modification is possible (22b).

2.3 Assays Based on Gas Chromatography, HPLC, Thin-Layer Chromatography, or Capillary Array Electrophoresis

As already delineated, conventional GC or HPLC based on the use of chiral stationary phases can only handle a few dozen ee determinations per day (23,24). However, it was recently demonstrated that GC can be modified so that, in certain cases, about 700 exact ee and E determinations are possible per day (25). The case study concerns the lipase-catalyzed kinetic resolution of the chiral alcohol (R)- and (S)-13 with formation of the acylated forms (R)- and (S)-14. Thousands of mutants of the lipase from P. aeruginosa were created by error-prone PCR for use as catalysts in the model reaction (26).
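The ee and E values referred to here are derived from integrated peak areas. The following Python sketch illustrates the arithmetic under stated assumptions: the two substrate enantiomers are baseline-separated with equal detector response, the conversion is known independently (for example from an internal standard), and E is calculated from the conversion and the ee of the remaining substrate with the standard relation for irreversible kinetic resolutions, E = ln[(1 − c)(1 − ee_s)] / ln[(1 − c)(1 + ee_s)]. The function names are hypothetical and are not part of the software described in this section.

```python
import math


def ee_from_areas(area_r, area_s):
    """Signed enantiomeric excess (fraction) from two peak areas.

    Positive values mean an excess of (R), negative values an excess
    of (S). Equal detector response for both enantiomers is assumed.
    """
    total = area_r + area_s
    if total <= 0:
        raise ValueError("no signal: peak areas must sum to a positive value")
    return (area_r - area_s) / total


def selectivity_factor(conversion, ee_substrate):
    """Selectivity factor E from conversion and ee of remaining substrate.

    Both arguments are fractions (0.5 means 50%). Only the magnitude of
    the ee is used; its sign indicates which enantiomer reacts faster.
    """
    ee = abs(ee_substrate)
    if not 0.0 < conversion < 1.0:
        raise ValueError("conversion must lie strictly between 0 and 1")
    remaining = 1.0 - conversion
    if ee >= 1.0 or remaining * (1.0 + ee) >= 1.0:
        raise ValueError("ee is not compatible with the given conversion")
    return math.log(remaining * (1.0 - ee)) / math.log(remaining * (1.0 + ee))


if __name__ == "__main__":
    ee_s = ee_from_areas(area_r=75.0, area_s=25.0)   # 0.5, i.e. 50% ee
    print(ee_s, selectivity_factor(0.5, ee_s))
```

Because the relation becomes very sensitive to small errors in ee and conversion when E is high, values obtained in a screen are best regarded as a ranking aid and confirmed for the hits.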


The initial approach concerned the use of two columns in a single GC oven (25,26). However, this turned out to have a number of disadvantages. The successful construction consists of two GC instruments (27), one prep-and-load sample manager (PAL) (28), and a PC (Fig. 7). The instruments are connected via a standardized data bus (HP-IB) (27) to the PC, which controls pressure, temperature, etc., and handles other data such as that of the detector. A wash station as well as a drawer system with a maximum of eight microtiter plates were included. Using a special construction developed in-house, the sample manager was attached to the unit in such a way as to reach both injection ports. Because the sample manager can inject samples from 96- or 384-well microtiter plates, over 3000 samples can be handled without manual intervention. The software (ChemStation®) (29) enables additional programs (macros) to be applied before and after each analytical run. One such macro controls the sample manager, each position on the microtiter plate being labeled via the sequence table. Another macro ensures analysis following each sample run in a specified manner; that is, the peaks of the chiral compound 13 are analyzed quantitatively. The analytical data are transferred to an Excel® sheet via dynamic data exchange (DDE) (30) in table form or in microtiter format, allowing for a rapid overview. Finally, the setup includes H2 guards which monitor the hydrogen concentration in the ovens; at concentrations exceeding 1% (potentially explosive at >4% H2), the system responds and automatically switches to nitrogen as the carrier gas (25,26). Using a stationary phase based on a β-cyclodextrin derivative (β-CD), 2,3-di-O-ethyl-6-O-tert-butyldimethylsilyl-β-CD, complete separation of

Figure 7 Schematic representation of a GC-screening system comprising two GC instruments (Ref. 25).


(R)- and (S)-13 [but not of (R)/(S)-14] was achieved within 3.9 min (25,26). Because the configuration illustrated in Fig. 7 comprises two simultaneously operating GC units, about 700 exact ee determinations of (R)/(S)-13 are possible per day. Moreover, the corresponding values for the conversion and the selectivity factor, E (or s), are likewise automatically provided in microtiter format. A typical example is shown in Fig. 8, in which the data corresponding to the most selective mutant enzymes are shown in gray boxes (EV2.4) (25). A conversion of 0% implies complete lack of enzyme activity within the predetermined time span. Negative values for ee indicate reversal of enantioselectivity.

Contrary to common belief, it is thus possible to utilize GC in high-throughput screening of enantioselectivity in appropriate cases. This type of GC setup should also be useful in the screening of nonchiral transformations. Moreover, it is sometimes possible to increase throughput even further by injecting samples at proper times which are shorter than the total time span of the actual chromatogram, enabling maximum use of time between runs (interlocking chromatograms) (25,26). Major advantages relative to the employment of two totally separate GC units include the optimal use of laboratory space and the utilization of a single sampler and a

Figure 8 Excel® sheet of GC data in microtiter format showing values for percent conversion (c), percent ee, and selectivity factor (E) for mutant lipases catalyzing the hydrolytic kinetic resolution of alcohol 13 (Ref. 25).

computer system, resulting in high instrumental and economic efficiency. Although optimization needs to be performed for each new chiral compound to be tested, it can be anticipated that in appropriate cases 600–800 samples can easily be handled per day. It has recently been shown that HPLC can be developed analogously to suit the requirements of a given analytical problem (31). However, it is unlikely that truly high-throughput ee determinations, meaning many thousands of samples per day, can be achieved in a general way on the basis of GC or HPLC. Of course, depending upon the particular problem at hand, a throughput of 600–800 ee determinations per day may suffice.

A related question concerns high-throughput screening of enantioselectivity based on thin-layer chromatography (TLC) (9a,26). It is easy to imagine that hundreds of TLC plates can be scanned rapidly using appropriate computer image processing which “integrates” spots on a given surface. The real challenge is to find efficient chiral selectors which result in sufficient enantiomer separation. Although the above-mentioned chromatographic techniques may well serve as practical assays in special cases, truly high throughput encompassing thousands of ee values per day is outside the realm of these assays.

In sharp contrast, capillary array electrophoresis (CAE) has recently been modified to allow the high-throughput determination of enantioselectivity (32). It is well known that traditional capillary electrophoresis (CE) in which the electrolyte contains chiral selectors, such as β-cyclodextrin derivatives (β-CDs), can be used to determine the enantiomeric purity of a given sample (33). Unfortunately, the conventional forms of this analytical technique allow for only a few dozen ee determinations per day.
However, because of the analytical demands arising from the Human Genome Project, inter alia, CE has been revolutionized in recent years so that efficient techniques for instrumental miniaturization are now available, making super-high-throughput analysis of biomolecules possible for the first time (34,35). Two different approaches have emerged, namely, capillary array electrophoresis (CAE) (34) and CE on microchips (also called CAE on chips) (35). Both techniques can be used to carry out DNA sequence analyses and/or to analyze oligonucleotides, DNA restriction fragments, amino acids, or PCR products. Hundreds of thousands of analytical data points or more can be accumulated per day (34,35). In the case of CAE, commercially available instruments have been developed which contain a high number of capillaries in parallel, e.g., the 96-capillary unit MegaBACE®, which consists of 6 bundles of 16 capillaries (36). The system can, therefore, address a 96-well microtiter plate. Each capillary is about 50 cm long. This system was adapted as a super-high-throughput analytical tool for ee determination (32). In this study, chiral amines of the type 17, which are of importance in the synthesis of pharmaceutical and agrochemical products (37), were used as the model substrates. They are potentially accessible by catalytic reductive amination of ketones 15, Markovnikov addition of ammonia to olefins 16, or enzymatic hydrolysis of acetamides 18 (the reverse reaction is also possible).

In exploratory experiments, the conditions for conventional CE assay of the amines 17 were first optimized using various α- and β-CD derivatives as chiral selectors (32). To enable a sensitive detection system, namely laser-induced fluorescence detection (LIF), the amines were first derivatized by conventional reaction with fluorescein isothiocyanate (19), leading to fluorescence-active compounds 20. Although extensive optimization was not carried out (only six CD derivatives were tested), satisfactory baseline separation was accomplished in all cases.

The next step involved the use of compounds 17c/20c as the model substrates for CAE analysis using an instrument of the MegaBACE® type. Known enantiomeric mixtures of the amine 17c were transformed into the fluorescence-active derivative 20c. The latter samples were then analyzed by CAE. Unfortunately, the results of the conventional single-capillary system could not be reproduced in the CAE experiments because of unstable electrophoretic runs. The problem was solved by developing a special electrolyte having a higher viscosity. It is composed of 40 mM CHES pH 9.1/6.25 mM γ-CD, diluted 5:1 with a buffer containing linear polyacrylamide. The MegaBACE® instrument was operated at a potential of 10 kV/8 μA and a sample injection potential of 2 kV/9 s. Under these conditions, baseline separation is excellent. The agreement between ee values of (R)/(S) mixtures of 20c determined by CAE and those of the corresponding (R)/(S) mixtures of 17c as measured by GC turned out to be excellent (32). The enantiomer separation of (R)/(S)-20c on the MegaBACE® instrument required about 19 min. This means that although the conditions are far from optimized, the automated 96-array system provides more than 7000 ee determinations in a single day (32). In related cases, optimization resulted in shorter analysis times for enantiomer separation so that a daily throughput of 15,000 to 30,000 ee determinations is realistic. Such super-high-throughput screening for enantioselectivity is not readily possible by any other currently available technology. In view of the possibility of chiral selector optimization and the fact that CAE has many advantages, such as extremely small amounts of samples, essentially no solvent consumption, absence of high-pressure pumps and valves, as well as high durability of columns, this CAE assay is ideally suited for high-speed ee determination.

A variation of the above method, namely, high-throughput screening of the ee of chiral organic compounds by capillary electrophoresis on microchips, has been proposed (38).
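The daily throughput figures quoted in this section follow from simple arithmetic on the number of parallel channels and the time per run. The short Python sketch below reproduces two of them under the simplifying assumption of uninterrupted 24-h operation with no time lost to sample loading, washing, or data handling; the function name is hypothetical.

```python
def daily_throughput(parallel_channels, minutes_per_run, hours_per_day=24):
    """Number of analyses per day for a parallel instrument.

    Only complete runs are counted. Overheads between runs are ignored,
    which is an idealization.
    """
    runs_per_channel = int(hours_per_day * 60 // minutes_per_run)
    return parallel_channels * runs_per_channel


if __name__ == "__main__":
    # 96 capillaries, about 19 min per enantiomer separation.
    print(daily_throughput(96, 19))
    # Two GC units, 3.9 min per separation.
    print(daily_throughput(2, 3.9))
```

With these inputs the 96-capillary system comes out at roughly 7000 determinations per day and the two-instrument GC setup at roughly 700, in line with the values given in the text. The same calculation shows why shortening the separation time by optimization has such a large effect on the capillary array system.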
CE (or more specifically CAE) on microchips (typically 10 × 10 cm), in general, had previously been developed for the analysis of biomolecules (35). Traditional photolithographic techniques are thereby used to produce capillary arrays on plastic or glass microchips. However, the enantiomer separation of organic molecules on plastic microchips is not generally feasible due to the chemical instability of such systems. The situation is quite different in the case of glass chips (32). In such a modification, enantiomer separation, e.g., of compound 20c, is possible, the detection being based on laser-induced fluorescence (LIF). Optimization and automation using robotics still need to be carried out. Nevertheless, a cheap and efficient CAE-based assay for super-high-throughput ee determination may emerge in a few years (38b).

In summary, the two forms of capillary array electrophoresis are emerging as powerful methods for the determination of enantiomeric purity of chiral compounds in a truly high-throughput manner. Of course, for a given analytical problem, derivatization and antipode separation need to be efficient, which means that universal generality cannot be claimed. Various modifications are possible, e.g., detection systems based on UV/VIS, MS, or electrical conductivity. Moreover, chiral selectors in the CE electrolyte are not even necessary if the mixture of enantiomers is first converted into diastereomers, e.g., using chiral fluorescence-active derivatization agents (32).

2.4 Assays Based on Circular Dichroism

An alternative to HPLC employing chiral columns which separate the enantiomers of interest is the use of normal columns which simply separate the starting materials from the enantiomeric products, the enantiomeric excess (ee) of the mixture of enantiomers then being determined by circular dichroism (CD) spectroscopy. Indeed, this principle was first established in 1980 (39) and developed further in later research (40,41). Recently, it was shown that the method can be applied in the screening of combinatorially prepared enantioselective transition metal catalysts (42). However, it should be amenable to enzyme-catalyzed processes as well. The method is based on the use of sensitive detectors for HPLC which determine in a parallel manner both the circular dichroism (Δε) and the UV absorption (ε) of a sample at a fixed wavelength in a flow-through system (39–42). The CD signal depends only on the enantiomeric composition of the chiral products, whereas the absorption relates to their concentration. Thus only short HPLC columns are necessary (39,40,42). Upon normalizing the CD value with respect to absorption, the so-called anisotropy factor g is obtained (42):

$$ g = \frac{\Delta\varepsilon}{\varepsilon} $$

For a mixture of enantiomers, it is thus possible to determine the ee value without recourse to complicated calibration. The fact that the method is theoretically valid only if the g factor is independent of concentration and if it is linear with respect to ee has been emphasized repeatedly (39–42). However, it needs to be pointed out that these conditions may not hold if the chiral compounds form dimers or aggregates because such enantiomeric or diastereomeric species would give rise to their own particular CD effects. Although such cases have yet to be reported, it is mandatory that this possibility be checked in each new system under study.

This precaution was described in detail in the development of a CD-based ee assay for chiral alcohols (43). In work concerning the directed evolution of enantioselective enzymes, there was the need to develop fast and efficient ways to determine the enantiomeric purity of these compounds, which can be produced enzymatically either by reduction of the prochiral ketone (e.g., 21) using reductases or by kinetic resolution of rac-acetates (e.g., 23) by lipases. In both systems, the CD approach is theoretically possible. In the former case, an LC column would have to separate the educt 21 from the product (S)/(R)-22, whereas in the latter case, (S)/(R)-22 would have to be separated from (S)/(R)-23.

Because acetophenone (21) has a considerably higher extinction coefficient than 1-phenylethanol (22) at a similar wavelength (near 260 nm), the separation of starting material from product was absolutely necessary; this was accomplished using a relatively short HPLC column based on a reversed phase system. In preliminary experiments using enantiomerically pure product 22, the maximum value of the CD signal was determined (43). Mixtures of 22 having different enantiomer ratios (and, therefore, ee values) were prepared and analyzed precisely by chiral GC in control experiments. The same samples were studied by CD, resulting in the compilation of g values. Upon plotting the g against the ee values, a linear dependency was in fact observed with a correlation coefficient of r = 0.99995, which translates into the following simple equation for enantioselectivity:

$$ ee = 3176.4\,g - 8.0 $$

The possible dependency of the g factors on concentration was then studied (43). A mixture of (S)- and (R)-22 corresponding to an enantiomeric excess of ee = 20% was prepared at a concentration of 10 µl ml⁻¹ in acetonitrile, which was then successively diluted. It was shown that no dependency of g on concentration (standard deviation = 2.6%) exists. Thus possible aggregation due to hydrogen bonding between two or more molecules of the product (S)- and (R)-22 in this medium, which could lead to artifacts, is not involved, making the system amenable to CD analysis and, therefore, to high-throughput analysis. Although complete optimization was not carried out, separation of 21 from (S)/(R)-22 was in fact accomplished using reversed phase silica as the column material and methanol/water (47/53) as the eluant. In view of the results concerning the dependency of the g factor on concentration (see above), aggregation can be excluded in this protic medium. Fig. 9 shows the corresponding HPLC chromatogram in which the mixture is fully separated within less than 1.5 min. Thus using the JASCO-CD-1595 instrument in conjunction with a robotic autosampler, it is possible to perform about 700–900 exact ee determinations per day (26,43).

Figure 9 HPLC chromatogram of a mixture of 21 (peak 1) and (S)/(R)-22 (peak 2) (Ref. 25).

In some cases, it is also possible to obtain reliable ee values using CD-based assays although no LC separation is performed whatsoever (26,43,44). A prerequisite is a prochiral substrate (e.g., a meso-compound) as well as a UV-active product (chromophore) which is formed as the enantioselective reaction proceeds. The absorption maximum of the prochiral compound has to differ considerably from that of the desired chiral product. This new principle is illustrated by the lipase-catalyzed enantioselective acylation of the meso-diol 24 by benzoic acid p-nitrophenyl ester (25) with formation of the chiral product 26 and the yellow-colored p-nitrophenolate (3) having a characteristic UV/VIS absorption at 410 nm. Upon measuring the g value at the absorption maximum of 26 and the additional UV absorption of 3, all information necessary to determine conversion and enantiopurity is available without the need to perform any LC separation. The advantage of this novel approach has to do with ease of performance and the obvious prospect of higher throughput (26,43,44).
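The calibration procedure described for the HPLC–CD assay (g values of standards plotted against ee values from chiral GC, followed by a check that g does not drift on dilution) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and all numerical values are hypothetical and are not taken from the cited work.

```python
def anisotropy_factor(delta_eps, eps):
    """Return g = delta_eps / eps (CD signal normalized by UV absorption)."""
    if eps == 0:
        raise ValueError("UV absorption must be non-zero")
    return delta_eps / eps


def fit_line(xs, ys):
    """Least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def relative_spread(values):
    """Sample standard deviation as a percentage of the mean.

    Applied to g values of one mixture at several dilutions, a small
    result indicates that g does not depend on concentration.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return 100.0 * variance ** 0.5 / abs(mean)


# Hypothetical calibration standards: g values paired with ee (%) from chiral GC.
g_standards = [0.000, 0.010, 0.020, 0.030]
ee_standards = [0.0, 30.0, 60.0, 90.0]
slope, intercept = fit_line(g_standards, ee_standards)

# ee of an unknown sample from its CD and UV signals (made-up numbers).
g_sample = anisotropy_factor(delta_eps=0.3, eps=20.0)
ee_sample = slope * g_sample + intercept
```

Fitting the line from measured standards, rather than hard-coding coefficients, reflects the point made in the text that linearity and concentration independence have to be verified for each new system.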

On the basis of these and previous studies, HPLC–UV–CD or UV–CD alone may well constitute a viable high-throughput screening system for enantioselective enzymes in a given situation. Success will depend upon the particular substrate under study. Moreover, the precautions as delineated above need to be considered.

2.5 IR-Thermographic Assays

Modern photovoltaic IR cameras equipped with focal plane array detectors are capable of detecting infrared radiation (black body radiation) emitted by objects (45). The picture obtained thereby provides a two-dimensional thermal image which is nothing but a spatial map of the temperature and the emissivity distribution of all objects in the picture. It is customary to use different colors in the pictures to visualize different photon intensities of the detected infrared radiation, e.g., red areas indicate "hot spots" and blue areas denote "cold spots". The technique was first used to monitor the dynamics of reactions on solid surfaces (46) and was extended to obtain temperature profiles of exothermic gas-phase reactions catalyzed by SiO2-supported platinum particles (47). The first cases of parallel testing of the activity of the members of a library of heterogeneous catalysts were reported later (48–50).

An important conceptual advancement pertains to emissivity-corrected IR thermography of large libraries of heterogeneous catalysts, a technique that requires only very small amounts of catalysts (<200 µg) (50). The main object of this study was to visualize temperature differences solely due to the catalytic activity of the catalysts, which was achieved by applying a linear correction to the detector response and subtracting the IR image of the library just before the start of the reaction, i.e., as background (offset), from the images during catalytic experiments. This means that local emissivity differences are no longer visible, and the heat evolution due to the catalytic reactions on a microtiter plate can then be reliably detected. This was successfully applied to large libraries of heterogeneous catalysts in gas-phase hydrogenation and oxidation (50).

Following these developments, application in the area of enantioselective homogeneous transition metal catalysis and enzyme catalysis was reported (51,52). In doing so, a commercially available Eppendorf-Thermomixer was modified such that the top was replaced by an aluminum plate into which holes were drilled for cylindrical glass reaction vessels about 8 mm in diameter and 35 mm in height. The whole microtiter plate can be shaken so as to ensure agitation of the reaction contents in each well. The method was illustrated by experiments involving kinetic resolution of chiral substrates; that is, the (R) and (S) compounds are reacted separately pairwise, heat evolution signaling reactivity. In one series of experiments, time-resolved detection of an enantioselective enzyme-catalyzed kinetic resolution was demonstrated (52). In this case, the enzyme (lipase from Candida antarctica) was added to the wells of the microtiter plate in immobilized form; that is, the reaction was catalyzed by a heterogeneous catalyst. Using (R)- and (S)-1-phenylethanol (22) as the substrates separately and vinyl acetate as the acylating agent, it was demonstrated that the reaction is highly (R)-selective; that is, hot spots appeared above the wells of the microtiter plate containing (R)-22 (Fig. 10). This is in perfect agreement with the literature data according to which the ee value of the acylated form at 50% conversion is >99% in favor of (R)-23 (53).
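The background (offset) subtraction behind emissivity-corrected IR thermography, and the pairwise comparison of wells holding the (R) and (S) substrates, can be illustrated with a small sketch. It assumes that each image is a plain two-dimensional list of detector readings and that the pixels belonging to each well are already known; the linear correction of the detector response is not modeled. All names and numbers are hypothetical.

```python
def subtract_background(frame, background):
    """Pixel-wise difference between a reaction frame and the pre-reaction frame."""
    same_shape = len(frame) == len(background) and all(
        len(f_row) == len(b_row) for f_row, b_row in zip(frame, background)
    )
    if not same_shape:
        raise ValueError("frame and background must have the same shape")
    return [
        [f - b for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(frame, background)
    ]


def mean_signal(image, pixels):
    """Average the difference signal over the (row, col) pixels covering one well."""
    return sum(image[r][c] for r, c in pixels) / len(pixels)


def find_hot_wells(image, wells, threshold):
    """Return the names of wells whose mean difference signal exceeds the threshold."""
    return [name for name, pixels in wells.items()
            if mean_signal(image, pixels) > threshold]


# Hypothetical 2 x 4 pixel images: left half covers the (R) well, right half the (S) well.
background = [[20.0, 20.5, 21.0, 21.5],
              [20.0, 20.5, 21.0, 21.5]]
frame = [[20.8, 21.3, 21.0, 21.5],
         [20.8, 21.3, 21.1, 21.5]]
wells = {
    "R": [(0, 0), (0, 1), (1, 0), (1, 1)],
    "S": [(0, 2), (0, 3), (1, 2), (1, 3)],
}

difference = subtract_background(frame, background)
hot = find_hot_wells(difference, wells, threshold=0.5)
```

Because the raw readings of the two wells already differ before the reaction starts (different emissivities in this made-up example), only the difference image reveals which well actually warms up.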

Figure 10 Time-resolved IR thermographic imaging of the lipase-catalyzed enantioselective acylation of 22 after a) 0.5 min, b) 0.5 min, and c) 3.5 min. The control experiment without enzyme is given in the bottom row in each case. The bar on the far right is the temperature/color key of the temperature window used [°C] (Ref. 52).

In conclusion, IR thermography is a viable tool in the high-throughput identification of highly active and enantioselective enzymes (or other catalysts) in exothermic processes. The method allows one to distinguish such "hits" from other members of a library of catalysts which are much less active or less enantioselective. However, quantification has yet to be achieved. This means that small differences in enantioselectivity, as usually observed in sequential rounds of enzyme mutagenesis, cannot be picked up by IR-thermographic assays. Moreover, the fact that meaningful comparisons on microtiter plates can only be made if the same amount of enzyme is present in each well needs to be kept in mind when attempting to apply this technology in real situations.

2.6 Assays Based on Mass Spectrometry

Although several assays based on mass spectrometry (MS) have been developed for use in combinatorial catalysis (54,55), application to the screening of enantioselective catalysts is not obvious because the (R) and (S) forms of a chiral compound show identical mass spectra. However, certain chiral effects, such as enantioselective host/guest interactions, ion/molecule reactions with chiral reagents, and parallel reactions for enantiomeric quantification of peptides, have been studied by MS (56). Moreover, the absolute configuration of chiral alcohols can be determined using the Horeau method in which the substrate is derivatized by (R)- and (S)-configured reagents, one of them being mass-tagged (e.g., with deuterium) (57). The diastereomers can thus be distinguished by MS. To be able to measure the enantiomeric excess (ee) of a sample, a measurable degree of kinetic resolution is theoretically necessary. This principle forms the basis of a novel high-throughput ee assay (58). The technique makes use of an equimolar mixture of pseudo-enantiomeric mass-tagged chiral acylating agents that differ in a substituent remote from the stereogenic center (e.g., methyl vs. hydrogen) in such a way that the mass of the molecule correlates with its absolute configuration. In principle, the reactions of enantiomers with chiral reagents can proceed with unequal rate constants (kf > ks; f = fast, s = slow). Chiral alcohols (R)-OH and (S)-OH and mass-tagged enantiomerically pure acylating agents are illustrated in Fig. 11 (58). The enantiomeric alcohols (R)-OH and (S)-OH are first reacted with the chiral mass-tagged acids A-CO2H and B-CO2H in the presence of 1,3-dicyclohexylcarbodiimide. The relative amounts of the diastereomeric product esters as measured by MS can then be used to determine the enantiomeric composition of the starting mixture (R)-OH/(S)-OH and, therefore, the ee value, provided two calibration measurements are performed.
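How the intensity ratio of the two mass-tagged esters translates into an ee value can be made concrete with a short sketch. It rests on assumptions that are stated here rather than taken from the cited work: the two acylating agents are equimolar and in large excess, agent A reacts with (R)-OH faster than with (S)-OH by the factor s = kf/ks while agent B shows the mirror-image preference, and both s and the ionization correction q are assumed to come from the calibration measurements. Under these assumptions the corrected ratio y = q·I_A/I_B obeys y = [(s+1) + ee(s−1)] / [(s+1) − ee(s−1)], which is inverted below. The function name is hypothetical.

```python
def ee_from_mass_tags(intensity_a, intensity_b, s, q=1.0):
    """Estimate ee (%) from the MS intensities of the two mass-tagged esters.

    intensity_a, intensity_b: peak intensities of the esters derived from
        acylating agents A and B.
    s: selectivity factor kf/ks of the derivatization (assumed known).
    q: correction factor for unequal ionization (assumed known).
    A positive result means an excess of the alcohol that reacts faster with A.
    """
    if intensity_a < 0 or intensity_b <= 0:
        raise ValueError("intensities must be positive")
    if s <= 0 or s == 1:
        # Without kinetic resolution (s = 1) the ratio carries no ee information.
        raise ValueError("a selectivity factor different from 1 is required")
    y = q * intensity_a / intensity_b
    return 100.0 * (y - 1.0) * (s + 1.0) / ((y + 1.0) * (s - 1.0))


# Made-up example: intensities 15.5 and 6.5 with s = 10 correspond to ee = 50%.
example_ee = ee_from_mass_tags(15.5, 6.5, s=10.0)
```

The explicit error for s = 1 mirrors the requirement that a measurable degree of kinetic resolution is necessary for the assay to work.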
Specifically, proline-derived mass-tagged acylating agents were used in great excess (Fig. 11) (58). The system fails if no measurable degree of kinetic resolution occurs. The sensitivity of the method was shown to be ±10% of the ee values. Moreover, the authors have pointed out the possibility of a robotic high-throughput screening system using microtiter plates (58). In principle, the method can be used to study (bio)catalytic desymmetrization of meso-compounds, kinetic resolution of racemates, as well as transformation of prochiral compounds into chiral products.

Figure 11 MS-based ee determination in the kinetic resolution of alcohols (R)-OH/(S)-OH (58). I = peak intensity; q = correction factor for ionization; DCC = dicyclohexylcarbodiimide.

In a rather different MS-based approach, diastereomer formation by chiral derivatization is not necessary (59). In the original version, about 1000 catalyst evaluations are possible per day (59). Indeed, the method is currently in operation in the directed evolution of enantioselective lipases and epoxide hydrolases (60). Recently, throughput has been increased by a factor of about 8 to 10 (see discussion below) (61). Ionization can be accomplished by a number of standard methods including electrospray ionization (ESI) and matrix-assisted laser desorption ionization (MALDI). Two basically different stereochemical processes can be monitored by this
method, namely, kinetic resolution of racemates and asymmetric transformation of substrates which are prochiral due to the presence of enantiotopic groups (59,61). The underlying principle is based on the use of isotopically labeled substrates in the form of pseudo-enantiomers or pseudo-prochiral compounds (Fig. 12). The course of the asymmetric transformation, i.e., the relative amounts of reactants and/or products, is detected by ESI–MS. In the case of kinetic resolution, pseudo-enantiomers 27 and 28, differing in absolute configuration and in labeling at the functional group FG*, need to be prepared in enantiomerically pure form and then mixed in a 1:1 ratio, simulating a racemate (Fig. 12a). Following asymmetric functional group transformation (in an ideal kinetic resolution 50% conversion), true enantiomers 29 and 30 are formed in addition to nonlabeled and labeled achiral products 31a and 31b, respectively. The ratios of the total intensities of 27/28 and 31a/31b in the MS spectra (m/z intensities of the quasi-molecular ions) allow for the determination of enantiomeric purity and, therefore, enantioselectivity of a catalyst. In some cases, it may be advantageous to use an internal standard to determine the conversion (59,61).

As a variation of this theme, kinetic resolution of the pseudo-enantiomers 27 and 32 in which labeling occurs at residue R2 affords a new pair of pseudo-enantiomers 29 and 33 (Fig. 12b). Based on the m/z intensities of the quasi-molecular ions of 27/32 and 29/33, the conversion, enantioselectivity, and selectivity factor (s or E value) can be obtained. An internal standard is not necessary (59,61). In the case of prochiral substrates having enantiotopic groups, e.g., meso-compounds (Fig. 12c), the synthesis of a single pseudo-meso-compound suffices, e.g., 34, because the stereodifferentiating reaction of interest delivers a mixture of two MS-detectable pseudo-enantiomers 35 and 36. The same applies to other pseudo-prochiral substrates of the type 37 (Fig. 12d).

Figure 12 a) Asymmetric transformation of a mixture of pseudo-enantiomers involving cleavage of the functional groups FG and labeled FG*. b) Asymmetric transformation of a mixture of pseudo-enantiomers involving either cleavage or bond formation at the functional group FG; isotopic labeling at R2 is indicated by the asterisk. c) Asymmetric transformation of a pseudo-meso-substrate involving cleavage of the functional groups FG and labeled FG*. d) Asymmetric transformation of a pseudo-prochiral substrate involving cleavage of the functional groups FG and labeled FG* (Ref. 59).

The first system to be tested concerns the kinetic resolution of racemic 1-phenylethyl acetate (23) (59). For this purpose, the pseudo-enantiomers (S)-23 and (R)-40 were prepared in enantiomerically pure form. To test the assay system, these two compounds were mixed in various ratios and the resulting mixtures were analyzed by GC to ascertain the exact pseudo-ee values as a control. Thereafter, the same samples were analyzed by ESI–MS. A typical ESI–mass spectrum is shown in Fig. 13. Because the sodium adducts of (S)-23 and (R)-40 appear at different m/z values due to the deuterium labeling, integration is a simple matter. A total of 17 control samples were studied, and the correspondence between the ee values determined by GC and by ESI–MS is excellent (±5%) (59). Therefore in real practice, as in the directed evolution of enantioselective lipases, 1:1 mixtures of (S)-23 and (R)-40 are used to simulate a racemate in the actual catalytic process. In contrast to a number of other methods which suffer from the fact that (R)- and (S)-configured substrates are tested separately as pairs on microtiter plates, the present system utilizes 1:1 mixtures of pseudo-enantiomers in kinetic resolutions. Moreover, analogous reactions on the solid phase, if necessary, should pose no problems.

Figure 13 ESI–mass spectrum of a sample containing (S)-23 and (R)-40 (Ref. 59).
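The way peak intensities of a pseudo-enantiomeric pair lead to ee, conversion, and the selectivity factor E can be sketched as follows. The sketch assumes that the labeled and nonlabeled species ionize equally well (otherwise a correction factor would be needed), that the reaction is irreversible, and that isotope effects are negligible. The relations c = ee_s/(ee_s + ee_p) and E = ln[(1−c)(1−ee_s)] / ln[(1−c)(1+ee_s)] are the standard expressions for kinetic resolutions and are not quoted from the chapter; the function names and the intensities are hypothetical.

```python
import math


def ee_from_intensities(i_major, i_minor):
    """Fractional ee from the MS intensities of two pseudo-enantiomers."""
    total = i_major + i_minor
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return (i_major - i_minor) / total


def conversion(ee_substrate, ee_product):
    """Conversion of a kinetic resolution from substrate and product ee."""
    return ee_substrate / (ee_substrate + ee_product)


def selectivity_factor(c, ee_substrate):
    """Selectivity factor E from the conversion and the ee of unreacted substrate."""
    return (math.log((1.0 - c) * (1.0 - ee_substrate))
            / math.log((1.0 - c) * (1.0 + ee_substrate)))


# Made-up intensities of the remaining pseudo-enantiomeric substrates
# and of the labeled/nonlabeled products formed from them.
ee_s = ee_from_intensities(75.0, 25.0)   # unreacted substrate pair
ee_p = ee_from_intensities(75.0, 25.0)   # product pair
c = conversion(ee_s, ee_p)
e_value = selectivity_factor(c, ee_s)
```

Because both members of the pair are measured in the same spectrum, the conversion follows from the two ee values alone, which is consistent with the remark that an internal standard is not always necessary.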

An experimental setup capable of high-throughput screening of enantioselective reactions was then devised. This was achieved by combining an automated liquid sampler for microtiter plates (96-format) with an ESI–MS system, both commercially available (Fig. 14) (59,61). This first-generation unit allows in a single day about 1000 rather precise determinations of the ee value and the conversion (and thus E) of such transformations as the above model reaction. The uncertainty in the ee value is only ±2%. As apparent from Fig. 12, deuterium labeling can be performed at any position of the substrate. It is advisable to perform a quick kinetic study of labeled and nonlabeled substrates to exclude possible secondary isotope effects.

Figure 14 ESI–MS-based ee-screening system (Refs. 59,61).

As an example of asymmetric transformation of a prochiral substrate bearing reactive enantiotopic groups, the desymmetrization of cis-1,4-diacetoxy-cyclopentene was described (59). In this case, the pseudo-prochiral compound 43 was prepared. The products of asymmetric transformation are compounds 44 and 45, each having two stereogenic centers. Because they are pseudo-enantiomers differing in mass, they can easily be distinguished by ESI–MS. It has been demonstrated that this assay is well suited to the directed evolution of an enantioselective lipase from Bacillus subtilis, by performing saturation mutagenesis systematically at every position of the enzyme, specifically for evolving mutants that catalyze the enantioselective hydrolysis of 43 (60,61).

More recently, a significant increase in throughput has been achieved, which was possible on the basis of instrumental improvements (61). MS instruments equipped with eight-channel multiplex spray systems are now available. Appropriate second-generation modification to suit the demands of high throughput allows the determination of 8000 to 10,000 ee values per day, making this screening system one of the most powerful, precise, and practical ee assays currently available (61). It is being applied in the directed evolution of enantioselective lipases and epoxide hydrolases (60,61). Other types of isotopic labeling are also possible, e.g., 15N (59b). This has been applied in the directed evolution of an enantioselective nitrilase (62).

2.7 Assays Based on DNA Microarrays

A novel ee assay was recently reported in which DNA microarrays are used (63). This type of technology had previously been employed to determine relative gene expression levels on a genomewide basis as measured by the ratio of fluorescent reporters (64). In the case of the ee assay, the goal was to measure the enantiopurity of chiral amino acids (63). Mixtures of (R)/(S) amino acids were first subjected to acylation at the amino function with formation of N-Boc-protected derivatives. Samples were then covalently attached to amine-functionalized glass slides in a spatially arrayed manner (Fig. 15). In a second step, the uncoupled surface amino functions were acylated exhaustively. The third step involved complete deprotection to afford the free amino function of the amino acid. Finally, in a fourth step, two pseudo-enantiomeric fluorescent probes were attached to the free amino groups on the surface of the array. An appreciable degree of parallel kinetic resolution in the process of amide coupling is a requirement for the success of the ee assay, as in one of the MS-based systems described above (58) [Horeau principle (57)]. In the present case, the ee values are accessible by measuring the ratio of the relevant fluorescent intensities. It was reported that 8000 ee determinations are possible per day, precision amounting to ±10% of the actual value. Although it was not explicitly demonstrated that this ee assay can be used to evaluate enzymes (e.g., proteases), this should in fact be possible. The question whether other types of substrates (and enzymes) are amenable to this type of screening also needs to be addressed.

Figure 15 Reaction microarrays in high-throughput ee determination (Ref. 63). Reagents and conditions: step 1) BocHNCH(R)CO2H, PyAOP, iPr2NEt, N,N-dimethylformamide (DMF); step 2) Ac2O, pyridine; step 3) 10% CF3CO2H and 10% Et3SiH in CH2Cl2, then 3% Et3N in CH2Cl2; step 4) pentafluorophenyl diphenylphosphinate, iPr2NEt, 1:1 mixture of the two fluorescent proline derivatives, DMF, 20°C.

2.8 Enzymatic Method for Determining Enantiomeric Excess (EMDee)

Recently an enzymatic method for determining enantiomeric excess (EMDee) has been described (65). It is based on the idea that an appropriate enzyme can be used to selectively process one enantiomer of a product from a catalytic or a biocatalytic reaction. In the original paper, the well-known catalytic addition of diethylzinc (47) to benzaldehyde (46) was chosen as a test reaction for demonstrating EMDee. The reaction product, 1-phenylpropanol (48), can be oxidized to ethyl phenyl ketone (49) using the alcohol dehydrogenase from Thermoanaerobium sp., this process being completely (S)-selective (Fig. 16). It was possible to measure the rate of this enzymatic oxidation by monitoring the formation of nicotinamide adenine dinucleotide phosphate (NADPH) by UV spectroscopy at 340 nm.

Figure 16 Scheme illustrating EMDee in the case of 1-phenylpropanol produced by asymmetric addition of diethylzinc to benzaldehyde (Ref. 65).
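The rate measurement underlying EMDee, i.e., following NADPH formation through the absorbance at 340 nm, can be sketched as a linear fit of absorbance against time followed by conversion to a concentration rate via the Beer–Lambert law. The molar absorption coefficient of NADPH at 340 nm (6220 M⁻¹ cm⁻¹) is the commonly used literature value; the 1 cm path length, the function names, and the readings are assumptions made for illustration, and the effective path length in a microtiter plate well would have to be determined separately.

```python
NADPH_EPSILON_340 = 6220.0  # molar absorption coefficient at 340 nm, M^-1 cm^-1


def slope(xs, ys):
    """Least-squares slope of ys versus xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired readings")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def nadph_formation_rate(times_s, a340, path_cm=1.0):
    """Rate of NADPH formation in mol l^-1 s^-1 from A340 readings.

    path_cm is an assumed optical path length, not a value from the text.
    """
    return slope(times_s, a340) / (NADPH_EPSILON_340 * path_cm)


# Hypothetical readings taken every 10 s during the initial, linear phase.
times_s = [0.0, 10.0, 20.0, 30.0]
a340 = [0.1000, 0.1622, 0.2244, 0.2866]
rate = nadph_formation_rate(times_s, a340)
```

Comparing such rates for unknown samples with those of standards of known composition is one plausible way to relate the measured rate to the content of the enantiomer accepted by the enzyme.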

Decisive for the success of the assay is the finding that the rate of oxidation constitutes a direct measure of the ee (65). High throughput was demonstrated by analyzing 100 samples in a 384-well format using a UV/fluorescence plate reader. Each sample contained 1 µmol of 1-phenylpropanol (48) in a volume of 100 µl. The accuracy in the ee value amounts to ±10% as checked by independent GC determinations. About 100 samples could be processed within 30 min (65), which corresponds to about 4800 ee determinations per day. It should be noted that EMDee does not distinguish between processes that proceed with low enantioselectivity but high conversion and those that proceed with high enantioselectivity but low conversion. Therefore EMDee was extended to provide information regarding both ee and conversion (64). In a second set of assays, the (R)-selective alcohol dehydrogenase from Lactobacillus kefir was used to quantify the amount of (R)-48 present in the mixture. Because the amounts of (R)-48 and (S)-48 are then known, conversion can be calculated. It is currently unclear how general the EMDee assay is in the case of other chiral alcohols which do not show such high enantioselectivity in the alcohol dehydrogenase-catalyzed oxidation. In such cases, a different and more selective alcohol dehydrogenase should be used. Indeed, a large number of such enzymes are commercially available. In summary, EMDee constitutes an interesting way to determine the ee of alcohols in a high-throughput manner using standard instrumentation. Of course, the assay has to be optimized in each new case of a chiral alcohol under study.

2.9 Enzyme Immunoassays as a Means to Measure Enantiomeric Excess

Another recent development concerns high-throughput screening of enantioselective catalysts by enzyme immunoassays (66), a technology that is routinely applied in biology and medicine. As in the case of some of the other screening systems, this new assay was not developed specifically for enzyme-catalyzed processes. In fact, it was illustrated by analyzing (R)/(S) mixtures of mandelic acid generated by enantioselective Ru-catalyzed hydrogenation of benzoyl formic acid (50) (Fig. 17). By employing an antibody that binds both enantiomers, it was possible to measure the concentration of the reaction product, thereby allowing the yield to be calculated. The use of an (S)-specific antibody then makes the determination of ee possible (Fig. 17). Of course, the success of this assay depends upon the availability of specific antibodies; indeed, these can be raised to almost any compound of interest. Moreover, simple automated equipment comprising a plate washer and a plate absorbance reader is all that is necessary. About 1000 ee determinations are possible per day, the precision amounting to ±9% (66).
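The arithmetic that combines the two antibody readouts can be written down in a few lines. The sketch assumes that the nonselective antibody has already been calibrated to give the total product concentration and the (S)-specific antibody the concentration of the (S)-enantiomer, both in the same units; the function name, the sign convention (positive ee = excess of S), and the numbers are hypothetical.

```python
def yield_and_ee(total_conc, s_conc, theoretical_conc):
    """Return (yield %, ee %) from two immunoassay concentrations.

    total_conc: product concentration from the antibody binding both enantiomers.
    s_conc: concentration of the (S)-enantiomer from the (S)-specific antibody.
    theoretical_conc: product concentration expected at complete conversion.
    """
    if total_conc <= 0 or theoretical_conc <= 0:
        raise ValueError("concentrations must be positive")
    if not 0 <= s_conc <= total_conc:
        raise ValueError("the (S) concentration cannot exceed the total")
    r_conc = total_conc - s_conc
    product_yield = 100.0 * total_conc / theoretical_conc
    ee = 100.0 * (s_conc - r_conc) / total_conc
    return product_yield, ee


# Made-up example: 8.0 mM product in total, 6.0 mM of it (S), 10.0 mM expected.
example_yield, example_ee = yield_and_ee(8.0, 6.0, 10.0)
```

Since the ee is obtained as a difference of two measured concentrations, the errors of both readouts propagate into it, which is worth keeping in mind when interpreting the quoted precision.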

Figure 17 Scheme illustrating high-throughput screening of enantioselective catalysts by competitive enzyme immunoassays (Ref. 66). The antibody marked blue recognizes both enantiomers, whereas the antibody marked red is (S)-speciﬁc, making the determination of yield and ee possible.

2.10 NMR-Based Assays

Nuclear magnetic resonance (NMR) spectroscopy is traditionally viewed as a relatively slow analytical procedure and, therefore, may not appear to be amenable to high-throughput analyses. However, progress has in fact been made in the utilization of NMR methods in combinatorial drug discovery processes both on solid supports and in solution (67). Additional advancements pertaining to the miniaturization of probes and the development of cryo-probes are expected to stimulate future progress in the use of NMR spectroscopy in combinatorial chemistry. Because high throughput stands at the heart of combinatorial catalysis, NMR spectroscopy, if applied in this area, needs to be modified. One possible approach is magnetic resonance imaging (NMR tomography) which is used successfully in medicine to image tissues and organs. In high-throughput ee screening, the goal is to obtain tomograms of microtiter plates on which enantioselective reactions are occurring. In exploratory experiments, the principle was illustrated, but quantitative evaluation and implementation in a real system still need to be accomplished (9a,26). However, the current technological state of instrumentation makes real applications difficult. The test measurements were made with a sample head having a diameter of 5 mm, 8.5 min being needed. However, the length of a microtiter plate is 12 cm, and the recording time increases rapidly with increasing cavity size in the magnet. Nevertheless, this method has some potential in ee determinations (and other types of screening) if a combination of specially formed sample vessels (which are designed to utilize the round magnet more efficiently) and a small sample head were to be developed.

A second and very different NMR-based approach promises to be truly practical (68). Indeed, organic chemists and biochemists are well versed in solution NMR spectroscopy and may thus prefer this method. In one manifestation, the assay makes use of the concept of isotopically labeled pseudo-enantiomers and pseudo-meso-compounds. This is related to one of the MS systems described above (59,61). In this case, 1H NMR is the detection system. This again means that monitoring kinetic resolution of chiral compounds and/or desymmetrization of prochiral compounds bearing reactive enantiotopic groups is possible. In the present NMR system, isotope labeling of one enantiomer in a given pseudo-racemate is best performed with 13C. An example is the lipase-catalyzed kinetic resolution of the pseudo-enantiomeric pair (R)-23/(S)-51 in which the latter contains a 13C-labeled acetoxy moiety. In kinetic resolution, it is necessary to determine the ratio of the two pseudo-enantiomers after a given period of time (e.g., at 50% conversion). The 1H NMR spectra of (R)-23 and (S)-51 are quite different due to the 13C–1H coupling in the methyl group of (S)-51. In the unlabeled case (R)-23, the methyl group appears as a singlet, whereas the 13C-labeled (S)-51 gives rise to a doublet, and these peaks are easily integrated.
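The evaluation of such a spectrum reduces to comparing the integral of the methyl singlet of unlabeled (R)-23 with the summed integrals of the two lines of the 13C-coupled doublet of (S)-51. A minimal sketch, assuming that both species contribute the same number of protons to their signals, that the integrals are already baseline-corrected, and that the labeled compound is fully 13C-enriched, could look as follows; the function name, the sign convention (positive = excess of R), and the integrals are hypothetical.

```python
def pseudo_ee_from_nmr(singlet_integral, doublet_integrals):
    """Pseudo-ee (%) from 1H NMR methyl integrals of a pseudo-enantiomeric pair.

    singlet_integral: integral of the methyl singlet of unlabeled (R)-23.
    doublet_integrals: integrals of the two lines of the 13C-coupled methyl
        doublet of (S)-51.
    A positive value indicates an excess of the unlabeled (R) compound.
    """
    labeled = sum(doublet_integrals)
    total = singlet_integral + labeled
    if total <= 0:
        raise ValueError("total integral must be positive")
    return 100.0 * (singlet_integral - labeled) / total


# Made-up integrals: singlet 3.0, doublet lines 0.5 each.
example_ee = pseudo_ee_from_nmr(3.0, (0.5, 0.5))
```

Summing both doublet lines before comparison matters because the 13C coupling splits the signal of the labeled compound into two lines of roughly half the intensity each.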

It was readily demonstrated that in various mixtures of (R)-23/(S)-51, integration of the methyl peaks allows for the determination of the relative amounts of the pseudo-enantiomers present. Thus ee values are accessible. These turned out to be remarkably accurate (±2%) as checked by chiral GC. Moreover, using flow-through systems now commercially available, it is possible to measure at least 1400 samples per day (68). With new NMR cell systems, it is likely that throughput can be increased by a factor of at least 8. In a second and more general manifestation of this NMR-based ee assay, the mixture of enantiomers is first derivatized by a chiral reagent using a robotic system (68). Using a flow-through cell system, integration of the signals of the diastereomers affords ee values. Precision in this embodiment is slightly lower (±5%). Currently, throughput amounts to about 1400 samples per day. Upon using parallel flow-through cells and chemical imaging (tomography), it should be possible to increase this by a factor of about 8.

2.11 IR-Based Assays

The concept of using isotopic labeling in order to distinguish enantiomers can also be applied to IR spectroscopy in appropriate cases, as in the kinetic resolution of acetates of the type 23 (69). In this case 13C labeling is introduced at the carbonyl C atom. The method is inexpensive.

3 CONCLUSION

A number of approaches have been described which make possible the high-throughput determination of ee values. Not all of them have been developed for assessing the enantioselectivity of enzymes. However, modifications to include biocatalysts are possible. On the practical side, it is useful to apply some kind of an (achiral) prescreening test to eliminate nonactive ("dead") enzyme mutants. This reduces the size of the library to be tested and, therefore, maximizes efficiency. Presently, several colorimetric ee assays are available for certain types of transformations, e.g., ester hydrolyses catalyzed by lipases, esterases, proteases, or catalytic antibodies. Such assays allow the screening of up to a few thousand samples per day. However, they are only semiquantitative; that is, they are accurate enough to identify hits which then need to be analyzed by conventional analytical techniques. In some cases, instrumentally modified chiral GC or HPLC may suffice; in favorable cases, the exact analysis of about 700 samples is possible per day. Of the other approaches reported so far for the high-throughput screening of enantioselective enzymes, the MS-, IR-, and NMR-based assays using isotopically labeled substrates belong to the most efficient and accurate systems currently known. They allow the precise determination of 1000 to 10,000 ee values per day. Capillary array electrophoresis is also worthy of mention because a throughput of 8000 to 20,000 samples per day at high precision is realistic in some cases. Other assays show lower precision in the ee value (±10%), which suffices in the early stages of directed evolution of an enantioselective enzyme. Hits are thereby easily identified; however, in the later stages of optimization, e.g., when going from ee = 90% to ee > 98%, problems arise. Then more precise ee assays are necessary. It needs to be emphasized that no single assay is universal. Rather, the best systems may turn out to be complementary.

REFERENCES

1a. AN Collins, GN Sheldrake, J Crosby. Chirality in Industry: The Commercial Manufacture and Applications of Optically Active Compounds. Chichester: Wiley, 1992, pp 409. 1b. AN Collins, GN Sheldrake, J Crosby, eds. Chirality in Industry II: Developments in the Commercial Manufacture and Applications of Optically Active Compounds. Chichester: Wiley, 1997, pp 411. 1c. RA Sheldon. Chirotechnology: Industrial Synthesis of Optically Active Compounds. New York: Dekker, 1993, pp 416. 1d. SC Stinson. Chiral drug interactions. Chem Eng News 77(41):101–120, 1999.

1e. SC Stinson. Chiral drugs. Chem Eng News 78(43):55–78, 2000. 1f. AM Rouhi. Chiral roundup as pharmaceutical companies face bleak prospects, their suppliers diligently tend the fertile ﬁeld of chiral chemistry in varied ways. Chem Eng News 80(23):43–50, 2002. 2a. EN Jacobsen, A Pfaltz, H Yamamoto. Comprehensive Asymmetric Catalysis. Vol. I–III. Berlin: Springer, 1999, pp 1500. 2b. H Brunner, W Zettlmeier. Handbook of Enantioselective Catalysis with Transition Metal Compounds. Vol. I-II. Weinheim: VCH, 1993. 2c. R Noyori. Asymmetric Catalysis in Organic Synthesis. Wiley, 1994, pp 378. 2d. I Ojima, Ed. Catalytic Asymmetric Synthesis. Weinheim: VCH, 1993, pp 476. 2e. DJ Berrisford, C Bolm, KB Sharpless. Ligand-accelerated catalysis. Angew Chem 107: 1159–1171, 1995; Angew Chem, Int Ed Engl 34:1059–1070, 1995. 3a. HG Davies, RH Green, DR Kelly, SM Roberts. Biotransformations in Preparative Organic Chemistry: The Use of Isolated Enzymes and Whole Cell Systems in Synthesis. London: Academic Press, 1989, pp 268. 3b. CH Wong, GM Whitesides. Enzymes in Synthetic Organic Chemistry. Tetrahedron Organic Chemistry Series. Vol. 12. Oxford: Pergamon, 1994, pp 370. 3c. K Drauz, H Waldmann. Enzyme Catalysis in Organic Synthesis: A Comprehensive Handbook. Vol. I-II. Weinheim: VCH, 1995. 3d. K Faber. Biotransformations in Organic Chemistry. 3rd ed. Berlin: Springer, 1997, pp 402. 4. A Liese, K Seelbach, C Wandrey. Industrial Biotransformations. Weinheim: Wiley-VCH, 2000, pp 423. 5a. SF Brady, CJ Chao, J Handelsman, J Clardy. Cloning and heterologous expression of a natural product biosynthetic gene cluster from eDNA. Org Lett 3:1981–1984, 2001. 5b. P Hugenholtz, BM Goebel, NR Pace. Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity. J Bacteriol 180:4765– 4774, 1998. 5c. SB Bintrim, TJ Donohue, J Handelsman, GP Roberts, RM Goodman. Molecular phylogeny of archaea from soil. Proc Natl Acad Sci U S A 94:277–282, 1997. 5d. 
G DeSantis, Z Zhu, WA Greenberg, K Wong, J Chaplin, SR Hanson, B Farwell, LW Nicholson, CL Rand, DP Weiner, DE Robertson, MJ Burk. An enzyme library approach to biocatalysis: development of nitrilases for enantioselective production of carboxylic acid derivatives. J Am Chem Soc 124: 9024–9025, 2002. 6a. FH Arnold. Combinatorial and computational challenges for biocatalyst design. Nature (London) 409:253–257, 2001. 6b. KA Powell, SW Ramer, SB del Cardayre´, WPC Stemmer, MB Tobin, PF Longchamp, GW Huisman. Directed evolution and biocatalysis. Angew Chem 113:4068–4080; Angew Chem Int Ed 40:3948–3959, 2001. 6c. RC Cadwell, GF Joyce. Randomization of genes by PCR mutagenesis. PCR Methods Appl 2:28–33, 1992. 6d. MT Reetz, K-E Jaeger. Superior biocatalysts by directed evolution. Top Curr Chem 200:31–57, 1999.

592

Reetz

6e. JD Sutherland. Evolutionary optimisation of enzymes. Curr Opin Chem Biol 4:263–269, 2000. 6f. MT Reetz. Directed evolution of selective enzymes and hybrid catalysts. Tetrahedron 58:6595–6602, 2002. 7a. MT Reetz, A Zonta, K Schimossek, K Liebeton, K-E Jaeger. Creation of enantioselective biocatalysts for organic chemistry by in vitro evolution. Angew Chem 109:2961–2963; Angew Chem, Int Ed Engl. 36:2830–2832, 1997. 7b. K Liebeton, A Zonta, K Schimossek, M Nardini, D Lang, BW Dijkstra, MT Reetz, K-E Jaeger. Directed evolution of an enantioselective lipase. Chem Biol 7:709–718, 2000. 7c. MT Reetz, K-E Jaeger. Enantioselective enzymes for organic synthesis created by directed evolution. Chem Eur J 6:407–412, 2000. 7d. MT Reetz, S Wilensek, D Zha, K-E Jaeger. Directed evolution of an enantioselective enzyme through combinatorial multiple cassette mutagenesis. Angew Chem 113:3701–3703; Angew Chem Int Ed 40:3589–3591, 2001. 7e. D Zha, A Eipper, MT Reetz. Assembly of designed oligonucleotides as an eﬃcient method for gene recombination: a new tool in directed evolution. Chem BioChem 4:34–39, 2003. 8a. UT Bornscheuer, J Altenbuchner, HH Meyer. Directed evolution of an esterase for the stereoselective resolution of a key intermediate in the synthesis of epothilones. Biotechnol Bioeng 58:554–559, 1998. 8b. O May, PT Nguyen, FH Arnold. Inverting enantioselectivity by directed evolution of hydantoinase for improved production of L-methionine. Nat Biotechnol 18:317–320, 2000. 8c. S Fong, TD Machajewski, CC Mak, C-H Wong. Directed evolution of D-2keto-3-deoxy-6-phosphogluconate aldolase to new variants for the eﬃcient synthesis of D- and L-sugars. Chem Biol 7:873–883, 2000. 9. Reviews of high-throughput ee-assays: 9a. MT Reetz. Combinatorial and evolution-based methods in the creation of enantioselective catalysts. Angew Chem 113:292–320; Angew Chem Int Ed 40:284–310, 2001. 9b. MT Reetz. 
New methods for the high-throughput screening of enantioselective catalysts and biocatalysts. Angew Chem 114:1391–1394; Angew Chem Int Ed 41:1335–1338, 2002. 9c. D Wahler, J-L Reymond. High-throughput screening for biocatalysts. Curr Opin Biotechnol 12:535–544, 2001. 10. F Zocher, MM Enzelberger, UT Bornscheuer, B Hauer, RD Schmid. A colorimetric assay suitable for screening epoxide hydrolase activity. Anal Chim Acta 391:345–351, 1999. 11. MT Reetz, CJ Ru¨ggeberg. A screening system for enantioselective enzymes based on diﬀerential cell growth. Chem Commun (Cambridge) 1428–1429, 1996. 12. See, for example: 12a. H Zhao, FH Arnold. Combinatorial protein design: strategies for screening protein libraries. Curr Opin Struct Biol 7:480–485, 1997.

Screening for Enantioselective Enzymes

593

12b. G Gauglitz. Optical detection methods for combinatorial libraries. Curr Opin Chem Biol 4:351–355, 2000. 12c. I Venekei, L Hedstrom, WJ Rutter. A rapid and eﬀective procedure for screening protease mutants. Protein Eng 9:85–93, 1996. 12d. G Xue, H Pang, ES Yeung. Multiplexed capillary zone electrophoresis and micellar electrokinetic chromatography with internal standardization. Anal Chem 71:2642–2649, 1999. 12e. H Fenniri. Rapid screening of biocatalysts. Chemtech 26:15–25, 1996. 12f. RP Hertzberg, AJ Pope. High-throughput screening: new technology for the 21st century. Curr Opin Chem Biol 4:445–451, 2000. 12g. RKC Knaust, P Nordlund. Screening for soluble expression of recombinant proteins. Anal Biochem 297:79–85, 2001. 12h. D Wahler, F Badalassi, P Crotti, J-L Reymond. Enzyme ﬁngerprints by ﬂuorogenic and chromogenic substrate assays. Angew Chem 113:4589–4592; Angew Chem Int Ed 40:4457–4460, 2001. 12i. KD Janda, L-C Lo, C-HL Lo, M-M Sim, R Wang, C-H Wong, RA Lerner. Chemical selection for catalysis in combinatorial antibody libraries. Science (Washington, DC) 275:945–948, 1997. 13. D Zha, S Wilensek, M Hermes, K-E Jaeger, MT Reetz. Complete reversal of enantioselectivity of an enzyme-catalyzed reaction by directed evolution. Chem Commun (Cambridge) 2664–2665, 2001. 14. LE Janes, RJ Kazlauskas. Quick E. A fast spectrophotometric method to measure the enantioselectivity of hydrolases. J Org Chem 62:4560–4561, 1997. 15. LE Janes, AC Lo¨wendahl, RJ Kazlauskas. Quantitative screening of hydrolase libraries using pH indicators: identifying active and enantioselective hydrolases. Chem Eur J 4:2324–2331, 1998. 16. R Kazlauskas. Abstract of a lecture presented at the Enzyme Technologies 2000 Pre-Conference, Workshop on High-Throughput Screening, International Business Communications, Las Vegas, 2000. 17. F Morı´ s-Varas, A Shah, J Aikens, NP Nadkarni, JD Rozzell, DC Demirjian. 
Visualization of enzyme-catalyzed reactions using pH indicators: rapid screening of hydrolase libraries and estimation of the enantioselectivity. Bioorg Med Chem 7:2183–2188, 1999. 18. C Ru¨ggeberg. Beitra¨ge zur gerichteten Evolution von Enzymen fu¨r die organische Synthese. PhD dissertation, Ruhr-Universita¨t, Bochum, Germany, 2001. 19. M Baumann, R Stu¨rmer, UT Bornscheuer. A high-throughput-screening method for the identiﬁcation of active and enantioselective hydrolases. Angew Chem 113:4329–4333; Angew Chem Int Ed 40:4201–4204, 2001. 20a. AW Czarnik, (Ed.). (1996). Fluorescent Chemosensors for Ion and Molecule Resognition. ACS Symp Ser (538). Washington, DC: American Chemical Society, 1993, pp 235. 20b. AP de Silva, HQN Gunaratne, T Gunnlaugsson, AJM Huxley, CP McCoy, JT Rademacher, TE Rice. Signaling recognition events with ﬂuorescent sensors and switches. Chem Rev (Washington, DC) 97: 1515–1566, 1997. 20c. G Zandonella, L Haalck, F Spener, K Faber, F Paltauf, A Hermetter. Enan-

594

21. 22a. 22b.

23. 24. 25.

26.

27. 28. 29. 30. 31. 32a.

32b. 33a. 33b. 33c. 33d. 33e.

33f.

Reetz tiomeric perylene-glycerolipids as ﬂuorogenic substrates for a dual wavelength assay of lipase activity and stereoselectivity. Chirality 8:481–489, 1996. G Klein, J-L Reymond. Enantioselective ﬂuorogenic assay of acetate hydrolysis for detecting lipase catalytic antibodies. Helv Chim Acta 82:400–406, 1999. GT Copeland, SJ Miller. A chemosensor-based approach to catalyst discovery in solution and on solid support. J Am Chem Soc 121:4306–4307, 1999. ER Jarvo, CA Evans, GT Copeland, SJ Miller. Fluorescence-based screening of asymmetric acylation catalysts through parallel enantiomer analysis. Identiﬁcation of a catalyst for tertiary alcohol resolution. J Org Chem 66: 5522–5527, 2001. WA Ko¨nig. Gas Chromatographic Enantiomer Separation with Modiﬁed Cyclodextrins. Heidelberg: Hu¨thig, 1992, pp 163. AM Krstulovic, ed. Chiral Separation by HPLC. Chichester: Ellis Horwood, 1989, pp 548. MT Reetz, KM Ku¨hling, S Wilensek, H Husmann, UW Ha¨usig, M Hermes. A GC-based method for high-throughput screening of enantioselective catalysts. Catal Today 67:389–396, 2001. KM Ku¨hling. Beitra¨ge zur Antibiotikaforschung. Naturstoﬃsolierung, enzymatische Racematspaltung und Screening Systeme. PhD dissertation, RuhrUniversita¨t, Bochum, Germany, 1999. GC instruments and data bus (HP-IB) are commercially available from Hewlett-Packard, Waldbronn, Germany. The sample manager PAL is commercially available from CTC, Schlieren, Switzerland. ChemstationR is commercially available from Hewlett-Packard, Waldbronn, Germany. Microsoft Excel is commercially available from Microsoft, Unterschleissheim, Germany. MT Reetz, A Deege, F Daligauld, unpublished results. MT Reetz, KM Ku¨hling, A Deege, H Hinrichs, D Belder. Super-high-throughput screening of enantioselective catalysts by using capillary array electrophoresis. Angew Chem 112:4049–4052; Angew Chem Int Ed 39:3891–3893, 2000. MT Reetz, KM Ku¨hling, A Deege, H Hinrichs, D Belder. Studiengesellschaft Kohle mbH. 
Patent application DE-A 100 42 451.1, 2000. B Chankvetadze. Capillary Electrophoresis in Chiral Analysis. Chichester: Wiley, 1997. E Gassmann, JE Kuo, RN Zare. Electrokinetic separation of chiral compounds. Science (Washington, DC) 230:813–814, 1985. LG Blomberg, H Wan. Determination of enantiomeric excess by capillary electrophoresis. Electrophoresis 21:1940–1952, 2000. H Nishi, T Fukuyama, S Terabe. Chiral separation by cyclodextrin-modiﬁed micellar electrokinetic chromatography. J Chromatogr 553:503–516, 1991. S Fanali. Separation of optical isomers by capillary zone electrophoresis based on host guest complexation with cyclodextrins. J Chromatogr 474:441–446, 1989. A Guttman, A Paulus, AS Cohen, N Grinberg, BL Karger. Use of complexing agents for selective separation in high-performance capillary electrophoresis—

Screening for Enantioselective Enzymes

33g.

33h. 33 i. 34a. 34b. 34c. 34d.

34e.

35a.

35b.

35c.

35d.

35e. 35f.

35g.

35h. 36. 37.

595

chiral resolution via cyclodextrins incorporated within polyacrylamide-gel columns. J Chromatogr 448:41–53, 1988. D Belder, G Schomburg. Chiral separations of basic and acidic compounds in modiﬁed capillaries using cyclodextrin-modiﬁed capillary zone electrophoresis. J Chromatogr A 666:351–365, 1994. D Wistuba, V Schurig. Enantiomer separation of chiral pharmaceuticals by capillary electrochromatography. J Chromatogr A 875:255–276, 2000. G Blaschke, B Chankvetadze. Enantiomer separation of drugs by capillary electromigration techniques. J Chromatogr A 875:3–25, 2000. XC Huang, MA Quesada, RA Mathies. DNA sequencing using capillary array electrophoresis. Anal Chem 64:2149–2154, 1992. H Kambara, S Takahashi. Multi-sheathﬂow capillary array DNA analyser. Nature (London) 361:565–566, 1993. NJ Dovichi. DNA sequencing by capillary electrophoresis. Electrophoresis 18:2393–2399, 1997. G Xue, H Pang, ES Yeung. Multiplexed capillary zone electrophoresis and micellar electrokinetic chromatography with internal standardization. Anal Chem 71:2642–2649, 1999. S Behr, M Ma¨tzig, A Levin, H Eickhoﬀ, C Heller. A fully automated multicapillary electrophoresis device for DNA analysis. Electrophoresis 20:1492–1507, 1999. DJ Harrison, K Fluri, K Seiler, Z Fan, CS Eﬀenhauser, A Manz. Micromachining miniaturized capillary electrophoresis-based chemical analysis system on a chip. Science (Washington, DC) 261:895–897, 1993. SC Jacobson, R Hergenroder, LB Koutny, RJ Warmack, JM Ramsey. Eﬀects of injection schemes and column geometry on the performance of microchip electrophoresis devices. Anal Chem 66:1107–1113, 1994. LD Hutt, DP Glavin, JL Bada, RA Mathies. Microfabricated capillary electrophoresis amino acid chirality analyzer for extraterrestrial exploration. Anal Chem 71:4000–4006, 1999. D Schmalzing, L Koutny, A Adourian, P Belgrader, P Matsudaira, D Ehrlich. DNA typing in thirty seconds with a microfabricated device. Proc Natl Acad Sci U S A 94:10273–10278, 1997. 
SC Jacobson, CT Culbertson, JE Daler, JM Ramsey. Microchip structures for submillisecond electrophoresis. Anal Chem 70:3476–3480, 1998. S Liu, H Ren, Q Gao, DJ Roach, RT Loder Jr, TM Armstrong, Q Mao, I Blaga, DL Barker, SB Jovanovich. Automated parallel DNA sequencing on multiple channel microchips. Proc Natl Acad Sci U S A 97:5369–5374, 2000. SR Wallenborg, CG Bailey. Separation and detection of explosives on a microchip using micellar electrokinetic chromatography and indirect laser-induced ﬂuorescence. Anal Chem 72:1872–1878, 2000. I Rodriguez, LJ Jin, SFY Li. High-speed chiral separations on microchip electrophoresis devices. Electrophoresis 21:211–219, 2000. MegaBACE is commercially available from Amersham Pharmacia Biotech, Freiburg, Germany. F Balkenhohl, K Ditrich, B Hauer, W Ladner. Optisch aktive Amine durch

596

38. 39.

40. 41. 42a.

42b.

43.

44. 45.

46.

47.

48.

49a.

49b. 50.

51. 52a.

Reetz Lipase-katalysierte Methoxyacetylierung. J Prakt Chem/Chem-Ztg 339:381– 384, 1997. MT Reetz, A Zonta, K Schimossek, K Liebeton, K-E Jaeger. Studiengesellschaft Kohle mbH. Patent application DE-A 197 31 990.4, 1997. AF Drake, JM Gould, SF Mason. Simultaneous monitoring of lightabsorption and optical activity in the liquid chromatography of chiral substances. J Chromatogr 202:239–245, 1980. P Salvadori, C Bertucci, C Rosini. Circular dichroism detection in HPLC. Chirality 3:376–385, 1991. A Mannschreck. On-line measurement of circular dichroism spectra during enantioselective liquid chromatography. Trends Anal Chem 12:220–225, 1993. K Ding, A Ishii, K Mikami. Super high throughput screening (SHTS) of chiral ligands and activators: asymmetric activation of chiral diol-zinc catalysts by chiral nitrogen activators for the enantioselective addition of diethylzinc to aldehydes. Angew Chem 111:519–523; Angew Chem Int Ed 38:497–501, 1999. R Angelaud, Y Matsumoto, T Korenaga, K Kudo, M Senda, K Mikami. Optical rotation per refractive index unit, or enantiomeric (e) factor, for screening enantioselective catalysts through asymmetric activation of carbohydrates. Chirality 12:544–547, 2000. MT Reetz, KM Ku¨hling, H Hinrichs, A Deege. Circular dichroism as a detection method in the screening of enantioselective catalysts. Chirality 12:479– 482, 2000. MT Reetz, A Eipper, KM Ku¨hling, unpublished results. U Glu¨ckert. Erfassung und Messung von Wa¨rmestrahlung: Eine praktische Einfu¨hrung in die Pyrometrie und Thermographie. Mu¨nchen: Franzis, 1992, pp 153. PC Pawlicki, RA Schmitz. Spatial eﬀects on supported catalysts: thermal infrared imaging is a useful tool for studying local rate variations on catalytic surfaces in situ. Chem Eng Prog 83:40–45, 1987. G Georgiades, VA Self, PA Sermon. IR-emission analysis of temperature proﬁles of Pt/SiO2 catalysts in exothermic reactions. Angew Chem 99:1050– 1052, 1987. Angew Chem, Int Ed Engl 26:1042–1043, 1987. 
FC Moates, M Somani, J Annamalai, JT Richardson, D Luss, RC Willson. Infrared thermographic screening of combinatorial libraries of heterogeneous catalysts. Ind Eng Chem Res 35:4801–4803, 1996. SJ Taylor, JP Morken. Thermographic selection of eﬀective catalysts from an encoded polymer-bound library. Science (Washington, DC) 280:267–270, 1998. DE Bergbreiter. Infrared thermographic screening of combinatorial libraries of heterogeneous catalysts. Chemtracts 10:683–686, 1997. A Holzwarth, H-W Schmidt, WF Maier. Detection of catalytic activity in combinatorial libraries of heterogeneous catalysts by IR thermography. Angew Chem 110:2788–2792; Angew Chem Int Ed 37:2644–2647, 1998. HM Becker. Neue Screening-Systeme fu¨r die enantioselektive Bio- und Metallkatalyse. PhD dissertation, Ruhr-Universita¨t, Bochum, Germany, 2000. MT Reetz, MH Becker, KM Ku¨hling, A Holzwarth. Time-resolved IR-ther-

Screening for Enantioselective Enzymes

52b.

53.

54a.

54b. 55a.

55b. 55c.

55d.

56a.

56b. 56c.

56d.

56e.

56f.

597

mographic detection and screening of enantioselectivity in catalytic reactions. Angew Chem 110:2792–2795; Angew Chem Int Ed 37:2647–2650, 1998. MT Reetz, M Hermes, MH Becker. Infrared-thermographic screening of the activity and enantioselectivity of enzymes. Appl Microbiol Biotechnol 55:531– 536, 2001. ALE Larsson, BA Persson, J-E Ba¨ckvall. Enzymatic resolution of alcohols coupled with ruthenium-catalyzed racemization of the substrate alcohol. Angew Chem 109:1256–1258.; Angew Chem, Int Ed Engl 36:1211–1212, 1997. Reviews of combinatorial methods in materials science and in catalysis: B Jandeleit, DJ Schaefer, TS Powers, HW Turner, WH Weinberg. Combinatorial materials science and catalysis. Angew Chem 111:2648–2689; Angew Chem Int Ed 38:2494–2532, 1999. S Senkan. Combinatorial heterogeneous catalysis—a new path in an old ﬁeld. Angew Chem 113:322–341; Angew Chem Int Ed 40:312–329, 2001. WF Maier. Combinatorial chemistry—challenge and chance for the development of new catalysts and materials. Angew Chem 111:1294–1296; Angew Chem Int Ed 38:1216–1218, 1999. PP Pescarmona, JC van der Waal, IE Maxwell, T Maschmeyer. Combinatorial chemistry, high-speed screening and catalysis. Catal Lett 63:1–11, 1999. S Senkan, K Krantz, S Ozturk, V Zengin, I Onal. High-throughput testing of heterogeneous catalyst libraries using array microreactors and mass spectrometry. Angew Chem 111:2965–2971. Angew Chem Int Ed 38:2794–2799, 1999. C Hinderling, P Chen. Rapid screening of oleﬁn polymerization catalyst libraries by electrospray ionization tandem mass spectrometry. Angew Chem 111:2393–2396; Angew Chem Int Ed 38:2253–2256, 1999. G Smith, JA Leary. Diﬀerentiation of diastereomeric nickel(II) N-glycoside complexes using tandem mass spectrometry and kinetic energy release measurements. J Am Chem Soc 118:3293–3294, 1996. J Ramirez, F He, CB Lebrilla. Gas-phase chiral diﬀerentation of amino acid guests in cyclodextrin hosts. J Am Chem Soc 120:7387–7388, 1998. 
M Sawada, Y Takai, H Yamada, J Nishida, T Kaneda, R Arakawa, M Okamoto, K Hirose, T Tanaka, K Naemura. Chiral amino acid recognition detected by electrospray ionization (ESI) and fast atom bombardment (FAB) mass spectrometry (MS) coupled with the enantiomer-labelled (EL) guest method. J Chem Soc, Perkin Trans 2(3):701–710, 1998. DV Dearden, C Dejsupa, Y Liang, JS Bradshaw, RM Izatt. Intrinsic contributions to chiral recognition: discrimination between enantiomeric amines by dimethyldiketopyridino-18-crown-6 in the gas phase. J Am Chem Soc 119:353– 359, 1997. S Piccirillo, C Bosman, D Toja, A Giardini-Guidoni, M Pierini, A Troiani, M Speranza. Gas-phase enantiodiﬀerentation of chiral molecules: chiral recognition of 1-phenyl-1-propanol/2-butanol clusters by resonance enhanced multiphoton ionization spectroscopy. Angew Chem 119:1816–1818; Angew Chem, Int Ed Engl 36:1729–1731, 1997. EN Nikolaev, EV Denisov, MI Nikolaeva, JH Futrell, VS Rakov, FJ Winkler. Elucidation of inﬂuence of chirality on formation and decomposition of ion

598

56g. 56h. 56 i.

57.

58.

59a.

59b.

60. 61.

62. 63.

64. 65. 66.

67a. 67b.

68. 69.

Reetz molecular complexes in the dialkyltartrate class using mass spectrometry. Adv Mass Spectrom 14:279–313, 1998. A Filippi, A Giardini, S Piccirillo, M Speranza. Gas-phase enantioselectivity. Int J Mass Spectrom 198:137–163, 2000. M Sawada. Chiral recognition detected by fast atom bombardment mass spectrometry. Mass Spectrom Rev 16:73–90, 1997. WA Tao, RG Cooks. Parallel reactions for enantiomeric quantiﬁcation of peptides by mass spectrometry. Angew Chem 113:779–782; Angew Chem Int Ed 40:757–760, 2001. A Horeau, A Nouaille. Micromethod for determination of the conﬁguration of secondary alcohols by kinetic resolution. Use of mass spectroscopy. Tetrahedron Lett 31:2707–2710, 1990. J Guo, J Wu, G Siuzdak, MG Finn. Measurement of enantiomeric excess by kinetic resolution and mass spectrometry. Angew Chem 111:1868–1871; Angew Chem Int Ed 38:1755–1758, 1999. MT Reetz, MH Becker, H-W Klein. D Sto¨ckigt. A method for highthroughput screening of enantioselective catalysts. Angew Chem 111:1872– 1875; Angew Chem Int Ed 38:1758–1761, 1999. MT Reetz, MH Becker, D Sto¨ckigt, HW Klein. High throughput screening method for determining enantioselectivity. Patent application DE-A 199 13858.3 (26.3.1999). MT Reetz, A Eipper, H Krumm, M Hermes, A Funke, T Eggert, KE Jaeger, unpublished results. W Schrader, A Eipper, DJ Pugh, MT Reetz. Second generation MS-based high-throughput screening system for enantioselective catalysts and biocatalysts. Can J Chem 80:626–632, 2002. M Burk. Lecture at Chiral Europe, London, May 12, 2003. GA Korbel, G Lalic, MD Shair. Reaction microarrays: a method for rapidly determining the enantiomeric excess of thousands of samples. J Am Chem Soc 123:361–362, 2001. B Phimister. Going global. Nat Genet 21:1, 1999. P Abato, CT Seto. EMDee: an enzymatic method for determining enantiomeric excess. J Am Chem Soc 123:9206–9207, 2001. F Taran, C Gauchet, B Mohar, S Meunier, A Valleix, PY Renard, C Cre´minon, J Grassi, A Wagner, C Mioskowski. 
High-throughput screening of enantioselective catalysts by immunoassay. Angew Chem 114:132–135; Angew Chem Int Ed 41:124–127, 2002. MJ Shapiro, JS Gounarides. NMR methods utilized in combinatorial chemistry research. Prog Nucl Magn Reson Spectrosc 35:153–200, 1999. H Schro¨der, P Neidig, G Rosse´. High-throughput structure veriﬁcation of a substituted 4-phenylbenzopyran library by using 2D NMR techniques. Angew Chem 112:3974–3977; Angew Chem Int Ed 39:3816–3819, 2000. MT Reetz, A Eipper, P Tielmann, R Mynott. Studiengesellschaft Kohle mbH. Patent application DE-A 102 09 177.3, 2002. P Tielmann, M Boese, M Luft, MT Reetz. A practical high-throughput screening system for enantioselectivity using FTIR spectroscopy. Chem Eur J, in press.

27 Enzyme Engineering by Microbial Cell Surface Display

Thorsten M. Adams and Harald Kolmar
Georg-August-Universität Göttingen, Göttingen, Germany

During recent years, structure-based protein design and directed evolution have been widely applied to engineer enzyme activity, specificity, or stability (1). Methodologies such as gene shuffling (2) and combinatorial mutagenesis (3) have made it possible to generate diverse molecular repertoires of enzyme variants that were successfully screened for the desired improvements. Screening of large mutant libraries in the range of 10^4 to 10^7 different variants is a crucial step in the process and often becomes the limiting factor (4,5). In cases where the desired enzyme function can be coupled to microbial growth or survival, selection may be applicable. Otherwise, single bacterial cells are clonally expanded and each population is individually tested for the desired activity. Commonly, the bacterial cells are compartmentalized, e.g., by transfer into microtiter plates, followed by cell lysis and testing of the lysate for the desired novel or improved function.

One of the most interesting advancements of recent molecular biotechnology is the ability to directly display peptides and proteins on the surface of host organisms. This technology obviates cell lysis and allows functional screening in a defined environment. It opens new avenues for various applications such as the generation of bacterial live vaccines, whole cell biosorbents, cell-based diagnostics, and recombinant biocatalysts. Moreover, it allows very high throughput screening of combinatorial peptide and enzyme libraries by fluorescence-activated cell sorting (FACS) for the desired function, including expression level, stability, ligand binding, and catalysis. In this chapter, we will mainly focus on novel approaches in cellular display of enzymes and their applications in enzyme technology.

1 DISPLAY STRATEGIES

1.1 Escherichia coli Cell Surface Display

Numerous expression systems have been developed for the display of peptides and proteins on the surface of E. coli, which is the preferred host for the generation, propagation, and maintenance of large molecular repertoires that may be derived from over 10^10 individual transformants. For microorganisms other than E. coli, the library size is limited by the transformation efficiency and, realistically, cannot be much larger than 10^5 clones. To become exposed on the outer surface of an E. coli cell, the protein of interest, which is synthesized in the bacterial cytoplasm, has to pass two membranes, namely the cytoplasmic membrane and the outer membrane (Fig. 1). Surface exposition of a heterologous passenger protein is commonly achieved by genetic fusion of the passenger with a translocator protein that is completely or in part located on the outer surface of the microbial host cell.

Figure 1 E. coli cell surface display formats: (A) porins; (B) Lpp–OmpA fusion; (C) fimbriae; (D) autotransporters; (E) ice-nucleation protein; (F) intimin; (P) passenger protein.

1.1.1 Porins

Genetic insertion of a target sequence into the genes for outer membrane proteins is a frequently used strategy to enable membrane translocation and subsequent surface anchoring of the recombinant passenger gene products (6). Porins are abundant outer membrane proteins that form a β-barrel structure, in which the β-strands traverse the outer membrane with the connecting loops facing either the periplasm or the cell surface (7). Short peptides of up to several dozen amino acids in length can be displayed on the cell surface via insertion into surface-exposed loops of porins such as OmpC and LamB (6). However, the position and the length of the target sequence play a critical role in efficient display because sequences exceeding approximately 50–60 residues interfere with the folding and membrane insertion of the carrier protein. Unfortunately, most if not all porins are inserted into the outer membrane in such a way that both termini face the bacterial periplasm. This renders them unsuitable as carboxy-terminal or amino-terminal fusion partners for cell surface exposition of the fused passenger protein.

To overcome this drawback, Francisco et al. (8) developed the Lpp–OmpA system, a sophisticated display format based on a tripartite fusion protein consisting of the first nine amino acids of the E. coli major outer membrane lipoprotein (Lpp), three membrane-spanning β-strands comprising residues 46–159 of the OmpA protein, and the protein of interest. In the shortest version of the targeting vehicle, a single membrane-spanning β-strand of OmpA is used together with the Lpp moiety (8,9). The amino-terminal cysteine residue of Lpp provides an outer membrane anchor, which consists of a cysteinyl–glycerol molecule to which two fatty acids are attached by ester linkages and one fatty acid is attached by an amide linkage (10). With this expression system, enzymes such as β-lactamase (8), an organophosphorus hydrolase (11), and Cellulomonas fimi exoglucanase Cex as well as its cellulose binding domain (12), together with single-chain antibodies (scFv) (13) and a protease inhibitor (14), have been successfully displayed on the E. coli cell surface (Table 1).

1.1.2 Fimbriae

Fimbriae and flagella of gram-negative bacteria are complex filamentous structures on the cell surface that are composed of thousands of copies of the respective fimbrial or flagellar protein. Flagella display is based on the fact that large peptides can be fused into the variable domain of the flagellar major subunit FliC without loss of flagellar synthesis and function. By in-frame insertion of various bacterial adhesin gene fragments in a permissive site of the fliC gene of E. coli, display of a number of peptides ranging from 30 up to over 300 amino acids in size in the E. coli flagellum was achieved (15). Flagella display allows presentation of large peptides in thousands of intimately associated copies on the outer surface of E. coli and has proven a valuable tool for a variety of applications such as epitope mapping, binding analyses, or molecular studies of adhesin–receptor interactions (16). Similarly, fimbriae displaying metal-binding motifs have been shown to work well for the sequestration of metals by recombinant E. coli cells (17,18). However, the passenger protein is inserted into the structural framework of the flagellar/fimbrial protein, which can hamper passenger protein folding or filament formation.

Table 1 Examples of Microbial Display of Enzymes

Displayed enzyme               Display format         Display host     Reference
β-Lactamase                    Lpp–OmpA fusion        E. coli          8
Organophosphorus hydrolase     Lpp–OmpA fusion        E. coli          11
C. fimi exoglucanase Cex       Lpp–OmpA fusion        E. coli          12
β-Lactamase                    AIDA                   E. coli          25
Levansucrase                   INP                    E. coli          27
Organophosphorus hydrolase     INP                    E. coli          11
Carboxymethylcellulase         INP                    E. coli          28
OmpT protease                  OmpT                   E. coli          62
Lipase                         α-Agglutinin fusion    S. cerevisiae    48
Glucoamylase                   α-Agglutinin fusion    S. cerevisiae    45,49
β-Glucosidase                  α-Agglutinin fusion    S. cerevisiae    50
Carboxymethylcellulase         α-Agglutinin fusion    S. cerevisiae    50

1.1.3 Autotransporters

The autotransporters form a family of secreted proteins from gram-negative bacteria. They share a unifying structure comprising three functional domains: the amino-terminal leader sequence, the secreted passenger domain, and a carboxy-terminal β-domain that forms a β-barrel pore for the secretion of the passenger protein (19). The prototype of autotransporters is the IgA protease (IgAβ) from Neisseria gonorrhoeae (20), where the amino-terminal protease domain is released into the culture medium after cell surface exposition and autoproteolytic cleavage (21). Autotransport and cell surface exposure of the amino-terminal domain of IgAβ also function if it is heterologously expressed in E. coli. IgAβ has been engineered by replacing the amino-terminal protease domain with the passenger polypeptide to be transported, as exemplified by the cell surface display of cholera toxin B subunit (22) and a protease inhibitor of the squash family (9). However, it was found that overexpression of the fusion protein is highly toxic for E. coli, which makes library screening very difficult (9). Recently, AIDA (the adhesin involved in diffuse adherence), another autotransporter from E. coli, was used to expose the cholera toxin B subunit (23), small T-cell epitopes, and the 11.6-kDa B subunit of the E. coli heat labile toxin (LTB) (24) on the E. coli cell surface. Furthermore, it was possible to display an enzymatically active β-lactamase on the E. coli surface via fusion to AIDA (25).

1.1.4 Ice-Nucleation Protein

The ice-nucleation protein (INP) of Pseudomonas syringae, which is capable of catalyzing the formation of ice in supercooled water, is attached to the outer surface of the bacterial cell via a glycosyl-phosphatidylinositol (GPI) anchor (26). INP was found to remain surface exposed when expressed in E. coli. Several proteins were successfully displayed on the cell surface of E. coli via genetic fusion to INP, such as levansucrase (27), organophosphorus hydrolase (11), carboxymethylcellulase (28), HIV gp120 (29), hepatitis B virus surface antigen (30), and synthetic phytochelatins (31).

1.1.5 Intimin

Intimins are members of a family of bacterial adhesins of pathogenic gram-negative bacteria, which specifically interact with diverse eukaryotic cell surface receptors (32). They are integrated into the bacterial outer membrane with their amino-terminal region, while the carboxy-terminal 280–300 amino acids are surface exposed. The cell binding activity of the EaeA intimin from enterohaemorrhagic E. coli has been localized to its C-terminal 280 residues, and the structure of the carboxy-terminal domains has been determined (33,34). It is assumed that the amino-terminal 550 residues of intimin form a porin-like structure and are folded into an antiparallel β-barrel. The entire extracellular segment of intimin is an elongated and relatively rigid rod made up of three immunoglobulin-like domains and a C-terminal lectin-like domain that interacts with the receptor. This domain resides on a rigid extracellular arm, which is most likely anchored to the amino-terminal transmembrane domain through a flexible hinge made by two glycine residues, allowing mechanical movement between the extracellular rod and the bacterial outer membrane (34). Intimin thus provides a structural scaffold ideally suited for the cell surface display of receptor binding domains remote from the bacterial cell surface. Intimin variants have been constructed in which the two carboxy-terminal extracellular domains that mediate the adhesion of enteropathogenic and enterohaemorrhagic E. coli to target epithelia have been replaced by various passenger proteins. A derivative of the Ecballium elaterium trypsin inhibitor, the Bence–Jones protein REIv, human interleukin-4 (35), as well as calmodulin, ubiquitin, and β-lactamase inhibitor protein from Streptomyces clavuligerus, were efficiently targeted to the surface of E. coli cells (T. Adams, A. Wentzel, H. Kolmar, unpublished results). Approximately 30,000 passenger proteins were found to be surface exposed on a single E. coli cell (35).

1.2 Alternative Microbial Hosts

Surface display on gram-positive bacteria has also been considered, mainly for vaccine development. Approaches based on attenuated mycobacteria, nonpathogenic staphylococci, streptococci, and lactococci, as well as Bacillus subtilis, have been developed (36). Single-chain antibody fragments (37) and the cellulose binding domain from Trichoderma reesei cellulase were displayed on recombinant staphylococci (38), which could serve as inexpensive tools in diagnostic tests and as novel types of microbial biocatalysts. The protozoan Tetrahymena has also been shown to be capable of displaying fusion proteins on its surface (39). Furthermore, several investigators have developed mammalian cell surface display formats (40–42). For the past several years, the expression of proteins on the surface of the yeast Saccharomyces cerevisiae has been very actively studied (for reviews, see Refs. 43,44). For some applications, display of proteins on the cell surface of S. cerevisiae offers several advantages over bacterial display hosts: S. cerevisiae is widely used in the industrial production of proteins and chemicals, so enzyme-coated yeast cells could be used as whole-cell catalysts for many biotransformations. To fix heterologous enzymes to the cell wall of S. cerevisiae, Murai et al. (45) developed a yeast display system that relies on a tripartite fusion consisting of a secretion signal sequence, the passenger protein, and the glycosyl-phosphatidylinositol (GPI)-anchor attachment signal sequence of the native cell-wall-anchored protein α-agglutinin. Boder and Wittrup (43) used the small Aga2p-binding domain of the yeast a-agglutinin mating receptor as a cell wall anchor, which forms two disulfide bonds to the Aga1p cell-wall protein. a-Agglutinin is a mannoprotein involved in the mating of type a S. cerevisiae cells with mating type α cells. Examples of yeast cell surface display of heterologous proteins and enzymes are scFv antibody

Enzyme Engineering by Microbial Cell Surface Display

fragments (43), T cell receptors (46), the epidermal growth factor-like domain of human urokinase plasminogen activator (47), lipase (48), glucoamylase (45,49), β-glucosidase, and carboxymethylcellulase (50).

1.3 Choice of the Appropriate Display Format

At present, S. cerevisiae and E. coli are the preferred organisms for the display of populations of variant polypeptides. Yeast combines the advantages of a eukaryotic secretory pathway with the ease of manipulation of a single-celled microorganism, and E. coli, because of its high transformation efficiency, is generally the preferred host for the generation of combinatorial polypeptide libraries. The development of a robust and versatile E. coli cell surface display system was hampered for several years by the finding that overproduction of outer membrane fusions is often associated with severe reductions in cell viability (13,35). Very little is known about the mechanism by which outer membrane proteins find their way into the bacterial outer membrane, and the reason for the growth defects is not yet known. Several successful approaches were made to overcome that problem, including tight regulation of outer membrane fusion protein expression (13), careful adjustment of fusion protein net accumulation to a tolerable level, and utilization of well-tolerated autologous translocator proteins such as AIDA (23) or intimin (35). For applications where bacterial cells are used as biosorbents or as microparticles for enzyme display, cell viability may not be a major concern. It might therefore be desirable to use E. coli cells carrying a maximum number of surface-exposed molecules per cell. High-level accumulation of passenger proteins on the E. coli cell surface exceeding 10,000 molecules per cell has been reported for several display formats (13,35). Common to all E. coli display systems is the requirement that the passenger protein pass both the cytoplasmic and the outer membrane. As a consequence, these display systems are subject to the same restrictions as filamentous phage display, where phage assembly occurs within the periplasmic space and therefore requires secretion of the passenger/coat–protein fusion.
As a rule, proteins that are secreted in their natural host are more likely to be amenable to successful surface display than cytoplasmic proteins. However, no rules or predictions can yet be made about which candidate passenger protein will become exposed on the surface of a particular host cell. Several cytoplasmic proteins have been displayed, and some periplasmic proteins failed to be displayed. Nonetheless, proteins that are refractory to display can be optimized for cell surface display by random mutagenesis and flow cytometry selection. Kieke et al. (51) have successfully applied a strategy of random mutagenesis and selection for surface expression of T cell receptor

(TCR) variants via labeling of the cells with a fluorescent anti-TCR antibody followed by flow cytometry screening. For the isolation of ligand-binding proteins from molecular libraries, periplasmic expression of the protein of interest may be sufficient, as long as the ligand is able to pass the E. coli outer membrane. Chen et al. (52) have described the PECS system (periplasmic expression with cytometric screening), in which E. coli cells expressing a library of proteins that are secreted into the periplasmic space are incubated with a fluorescent conjugate of the ligand. Ligand molecules as large as about 10 kDa can enter the E. coli periplasm and equilibrate within the periplasmic space without compromising the cell's integrity or viability. The bacterial cell envelope effectively serves as a dialysis bag that selectively retains receptor–fluorescent probe complexes but not free ligand. Flow cytometry screening of a bacterial cell population expressing variant anti-digoxigenin scFv fragments was used to isolate cells with elevated fluorescence, which were shown to produce scFv antibodies with higher affinity to digoxigenin.
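In computational terms, isolating cells with elevated fluorescence amounts to ranking the population by signal and keeping the brightest fraction. The following Python sketch illustrates this under stated assumptions: the fluorescence values are simulated from log-normal distributions with arbitrary parameters, and the names `sort_gate` and `top_fraction` are hypothetical and do not correspond to any cytometer software.

```python
import random

def sort_gate(fluorescence, top_fraction):
    """Return indices of the cells in the brightest `top_fraction` of the population."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    n_keep = max(1, round(len(fluorescence) * top_fraction))
    # Rank cell indices from brightest to dimmest
    ranked = sorted(range(len(fluorescence)),
                    key=lambda i: fluorescence[i], reverse=True)
    return ranked[:n_keep]

random.seed(0)
# Assumed toy population: 9,990 background cells plus 10 much brighter binders
background = [random.lognormvariate(0.0, 0.5) for _ in range(9990)]
binders = [random.lognormvariate(4.0, 0.3) for _ in range(10)]
cells = background + binders

selected = sort_gate(cells, top_fraction=0.001)
# Binders occupy the last 10 positions of `cells`
recovered = sum(1 for i in selected if i >= len(background))
print(f"sorted {len(selected)} cells, {recovered} of them binders")
```

In a real experiment the two populations overlap far more than in this toy example, which is why several rounds of sorting and regrowth are normally needed.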

2 APPLICATIONS OF MICROBIAL CELL SURFACE DISPLAY

2.1 Microbial Cells as Self-Amplifying Solid Supports

Cell surfaces can be regarded as solid supports for the immobilization of proteins, similar to the immobilization of proteins on microbeads. Conceptually, one can display an expressed protein on the surface of producing cells and then handle the cells as if they were beads of an inert support matrix. Display of proteins provides a means to circumvent separate expression, purification, and immobilization of binding proteins and enzymes. An interesting aspect of bacterial cell surface display is the use of recombinant bacteria as bioadsorbents for heavy metals. Metallothioneins inserted into the permissive loop of LamB increased the Cd²⁺ sequestration of recombinant E. coli 20-fold (53). Even more intriguing is the possibility of engineering soil bacteria that are able to survive in polluted environments for an extended period of time. Valls et al. (54) fused mouse metallothionein to the autotransporter domain of the IgA protease from N. gonorrhoeae and displayed it on the surface of Ralstonia metallidurans, which resulted in a threefold increase of Cd²⁺ binding. Enzymes have been displayed on cell surfaces for various applications. Yeast strains were constructed that display active lipase of Rhizopus oryzae (48), glucoamylase mediating starch utilization (45), or carboxymethylcellulase together with β-glucosidase, whose coexpression permits cellulose utilization (50). Multivalent display in the context of adjuvant immune-stimulating components of the cell surface makes microbial display a promising avenue for vaccine development (55).

2.2 Functional Screening of Cell-Surface-Exposed Enzyme Libraries

2.2.1 Direct Positive Selection

Direct methods for screening or selection link improved enzyme activity to colony phenotypes or to the survival or growth rates of cells, respectively. Examples include colony screening on plates using chromogenic substrates (56), selection on plates containing increasing antibiotic concentrations (2), and complementation selection with auxotrophs (57). However, intracellular expression of the enzyme of interest requires that it does not interfere negatively with cellular metabolism, that the enzymatic activity can be distinguished from the background of all other cell reactions, and that an externally added substrate readily enters the cytoplasm of the enzyme-producing cell. These restrictions can be overcome by displaying the protein of interest on the surface of the microbial cell. Recently, the ice nucleation protein (INP)-based bacterial surface display system has been used to selectively screen enzyme libraries for improved catalytic activity of carboxymethyl cellulase (CMCase) (28). The substrate of this enzyme, carboxymethyl cellulose (CMC), is a high-molecular-weight polymer that is not transported into cells. As a result, only cells displaying CMCase on their surface are able to hydrolyze CMC in agar plates, and they can easily be identified because they are surrounded by a clear halo after Congo red staining. Furthermore, the growth rates of E. coli cells displaying CMCase variants on minimal medium containing CMC as the sole carbon source were found to correlate with the activities of the displayed CMCase variants. As a consequence, by selecting rapidly growing colonies, cells containing improved CMCase variants with fivefold increased activity could be isolated (28).

2.2.2 Screening of Microbial Populations by Flow Cytometry

One major advantage of microbial surface display over other display formats lies in the ability to use fluorescence-activated cell sorting (FACS) for very high throughput screening of polypeptide libraries. As more powerful combinatorial mutagenesis methods are now available, designing screening strategies becomes the most critical step in the successful exploitation of molecular diversity. In this respect, flow cytometry has been established in recent years as a powerful tool for screening microbial populations for elevated gene expression, enhanced catalytic performance, or improved binding capabilities, as detailed in the following sections. To isolate proteins with enhanced binding capabilities to a particular ligand, the library of cells, in which each cell individually displays numerous copies of a unique protein variant, is incubated with a fluorescently labeled ligand. After thorough washing, only cells displaying a protein variant with

aﬃnity to the ligand remain ﬂuorescent and are isolated using FACS (Fig. 2). Several methods have been described to introduce a ﬂuorescent label into the protein of interest (Fig. 3). Fluorescent reporter groups can be directly introduced by chemical coupling. Protein ligands that are produced by heterologous gene expression can be engineered such that they contain an additional epitope sequence, which is recognized by a monoclonal antibody. Ligand binding can then be detected by consecutive incubation of the cell

Figure 2 Combinatorial library screening by FACS. A library of microbial cells displaying a protein of interest is incubated with a fluorescent ligand. Cells that are capable of binding the ligand are detected by the laser optics of the fluorescence-activated cell sorter (FACS). The flow cytometer nozzle is vibrated at a high frequency, which causes the microscopic fluid stream to break into discrete droplets. As a fluorescent cell enclosed in a droplet reaches the droplet break-off point, it receives a positive or negative charge. As the droplets individually pass through two vertical deflection plates, the electric field created by those plates directs them into a collection vial. Uncharged droplets flow into a waste receptacle. Positive cells are cultivated overnight on agar plates to prevent loss of clones that grow more slowly than others. Colonies are scraped off and used to inoculate liquid medium, to which, at an appropriate optical density, inducer is added to induce expression of the translocator/passenger protein fusion. This procedure is repeated for several rounds until fluorescent cells are enriched.

Figure 3 Procedures for fluorescent detection of protein/ligand interaction. (A) Fluorophore-coupled ligand. (B) Fluorophore-coupled antibody. (C) Quaternary complexes generated by consecutive rounds of incubation with ligand, primary antibody, biotinylated second antibody, and streptavidin, R-phycoerythrin conjugate. (D) Labeling with streptavidin-coated magnetic beads.

population with the ligand protein followed by incubation with fluorescence-labeled anti-epitope antibody. Cell labeling can also be achieved by consecutive rounds of incubation with ligand protein, anti-epitope antibody, and biotinylated second antibody, followed by incubation with streptavidin, R-phycoerythrin conjugate. All these compounds for indirect fluorescent labeling are commercially available. By using a polyclonal second antibody that is biotinylated at multiple sites in conjunction with a streptavidin, R-phycoerythrin conjugate, which contains 35 or more fluorophores depending on the organism of origin (58), a dramatic signal amplification can be achieved because numerous fluorophores are bound per cell-surface-exposed protein variant. As a consequence, fewer than 1000 surface-displayed protein molecules per microbial cell are sufficient to achieve a suitable signal-to-noise ratio for detecting ligand-binding cells (35). To demonstrate the feasibility of indirect cell labeling and FACS for the isolation of rare binders, Wentzel et al. (35) described a model experiment in which 100 E. coli cells displaying a particular epitope sequence were mixed with 10⁹ control cells. Epitope-displaying E. coli variants could be isolated from the

1:10⁷ mixture after only three consecutive rounds of cell labeling, sorting, and recultivation of the fraction of enriched cells. The major advantage of fluorescence-activated cell sorting as a tool for high-throughput screening lies in the ability to perform biological assays on large populations in solution with single-cell resolution. Flow cytometric analysis of cell surface binding of fluorescence-labeled ligands provides linear quantitation of binding constants and dissociation rates in situ, and of surface expression, across several orders of magnitude. Analysis of typically more than 10,000 protein molecules per cell surface effectively eliminates the stochastic uncertainty inherent in scaffolds displaying only a few protein molecules. FACS screening of a library of single-chain Fv antibody fragments displayed on S. cerevisiae allowed Boder et al. (59) to isolate variants with femtomolar antigen-binding affinity, the highest ligand-binding affinity yet reported for a monovalent protein. With modern flow cytometers, such as the FACSVantage from Becton Dickinson or the MoFlo from Cytomation Inc., it is now possible to sort cell populations at rates of approximately 90,000 cells per second or more (60). Hence a total of approximately 3 × 10⁸ cells can be screened per hour. Under the assumption that each clone of a library should be represented by at least three bacterial cells presenting a particular variant on their cell surface, molecular repertoires of up to 10⁹ different variants can be screened in 1 day. Because the number of FACS-positive cells after one sorting round is usually less than 1/1000 of the initial population, a correspondingly smaller cell population has to be screened in the next sorting round, which requires only minutes of sorting time. If necessary, libraries exceeding 10⁹ members can be processed in the first sorting round using magnetic cell sorting to pre-enrich target cells.
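The throughput figures quoted above follow from simple arithmetic, which the Python sketch below reproduces from the stated sort rate of 90,000 cells per second and threefold oversampling. The function names are illustrative only. The per-round enrichment estimate rests on an added assumption, made purely for illustration, that each of the three sorting rounds in the model experiment enriches target cells by the same factor until they are as abundant as the control cells.

```python
SORT_RATE_PER_S = 90_000   # cells per second, the sort rate quoted in the text
OVERSAMPLING = 3           # cells screened per library variant

def cells_sorted(hours, rate_per_s=SORT_RATE_PER_S):
    """Number of cells passing the sorter in `hours` of continuous sorting."""
    return rate_per_s * 3600 * hours

def max_library_size(hours, oversampling=OVERSAMPLING):
    """Largest library that can be covered `oversampling`-fold in `hours`."""
    return cells_sorted(hours) // oversampling

def enrichment_per_round(initial_ratio, rounds):
    """Uniform per-round enrichment needed to raise `initial_ratio` to about 1:1."""
    return (1 / initial_ratio) ** (1 / rounds)

per_hour = cells_sorted(1)       # 324,000,000, i.e. about 3 x 10^8
per_day = cells_sorted(24)       # 7,776,000,000
library = max_library_size(24)   # 2,592,000,000, i.e. on the order of 10^9

# Model experiment: 100 target cells among 10^9 control cells, three rounds
factor = enrichment_per_round(100 / 1e9, 3)

print(per_hour, per_day, library, round(factor))
```

Under these assumptions, a full day of sorting covers roughly 2.6 × 10⁹ variants at threefold oversampling, and an average enrichment of a little over 200-fold per round would suffice to recover cells present at 1:10⁷ within three rounds.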
For this magnetic pre-enrichment, streptavidin- or antibody-coated superparamagnetic microbeads are used instead of streptavidin, R-phycoerythrin conjugate in the labeling scheme described above, and cells are captured by passage of the cell population through a separation column, which is placed in a strong permanent magnet. The column matrix serves to create a high-gradient magnetic field. The magnetically labeled cells are retained in the column while nonlabeled cells pass through (14). After removal of the column from the magnetic field, the magnetically retained cells are eluted. A single-pass enrichment ratio of over 1000-fold has been reported (61). Over 10¹¹ bacterial cells can be handled in parallel in a single experiment, thus allowing one to screen large repertoires with reasonable library oversampling (T. Adams, unpublished results). Olsen et al. (62) recently developed a technology that allows one to screen a library of enzyme-displaying E. coli cells by flow cytometry for rare variants with enhanced catalytic turnover. This has been achieved by using a substrate molecule that is converted into a fluorescent product

upon catalytic turnover. Because this product attaches to the surface of the enzyme-displaying cell, a direct correlation between the turnover rate of the cell-exposed enzyme variant and the cellular fluorescence is established. To obtain such a linkage, Olsen et al. (62) used a cell-surface-associated fluorescence resonance energy transfer (FRET) substrate, which consists of a fluorophore, a scissile bond to be cleaved by the desired enzyme, a quenching fluorophore, and a positively charged moiety to direct the substrate to the negatively charged cell envelope. Enzymatic cleavage of the scissile bond (Fig. 4) disrupts FRET quenching; the quencher is released from the cell and

Figure 4 Binding of FRET substrate to the cell surface of E. coli cells displaying a protein library. The positively charged FRET substrate is attached to the negatively charged polysaccharide matrix of the cell surface. Upon catalytic turnover, the FRET substrate displays FL ﬂuorescence, which is otherwise quenched by Q. FL: BODIPY; Q: tetramethylrhodamine.

drifts away, while the fluorophore remains attached to the cell surface because of the overall positive charge. The research group isolated a mutant of OmpT protease with a 60-fold increase in catalytic activity toward a nonpreferred substrate in a single round of screening from a library of about 2 × 10⁶ variants (62).

3 CONCLUSION

Bacteria and yeast displaying heterologous receptors or enzymes on their surface hold great potential as whole-cell adsorbents and biocatalysts. Microbes in which the target protein is covalently attached to the cell surface can be regarded as living and self-amplifying microbeads, and are valuable matrices for various analytical and biotechnological applications. Numerous powerful methodologies are now available for the surface exposure of heterologous proteins in various microbial hosts, ranging from E. coli cells and gram-positive bacteria to yeasts. Upon microbial surface exposure, the protein of interest becomes directly accessible to potential interaction partners, enzyme substrates, or inhibitors. Substantial progress has been made in the past few years in the development of new tools for the generation of very large mutant enzyme libraries. Although fairly large improvements have been made in parallel in the development of screening tools for the isolation of enzymes with enhanced catalytic performance, only a fraction of the generated library clones, which may exceed 10⁸ different mutants, can be screened by conventional enzyme activity assays in a microplate format, even with sophisticated automated robotic systems. Current FACS technology allows the screening of approximately 10⁹ variant cells per day. Promising examples of FACS screening of cell-based libraries for a desired enzyme function have been described recently. However, for many interesting enzyme-catalyzed reactions, further work remains to be invested in the development of strategies for linking enzyme performance to a corresponding fluorescence signal that can be read out by flow cytometry. Nevertheless, the ability to quantitatively screen libraries of very large size not only for folding stability, protein interaction, and inhibitor binding, but also for catalytic activity, opens new avenues to the directed evolution of enzymes.

REFERENCES

1. P Forrer, S Jung, A Pluckthun. Beyond binding: using phage display to select for structure, folding and enzymatic activity in proteins. Curr Opin Struct Biol 9:514–520, 1999.
2. WP Stemmer. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370:389–391, 1994.
3. JF Reidhaar-Olson, RT Sauer. Combinatorial cassette mutagenesis as a probe of the informational content of protein sequences. Science 241:53–57, 1988.
4. N Cohen, S Abramov, Y Dror, A Freeman. In vitro enzyme evolution: the screening challenge of isolating the one in a million. Trends Biotechnol 19:507–510, 2001.
5. M Olsen, B Iverson, G Georgiou. High-throughput screening of enzyme libraries. Curr Opin Biotechnol 11:331–337, 2000.
6. H Lang. Outer membrane proteins as surface display systems. Int J Med Microbiol 290:579–585, 2000.
7. SW Cowan, T Schirmer, G Rummel, M Steiert, R Ghosh, RA Pauptit, JN Jansonius, JP Rosenbusch. Crystal structures explain functional properties of two E. coli porins. Nature 358:727–733, 1992.
8. JA Francisco, CF Earhart, G Georgiou. Transport and anchoring of beta-lactamase to the external surface of Escherichia coli. Proc Natl Acad Sci U S A 89:2713–2717, 1992.
9. A Wentzel, A Christmann, R Kratzner, H Kolmar. Sequence requirements of the GPNG beta-turn of the Ecballium elaterium trypsin inhibitor II explored by combinatorial library screening. J Biol Chem 274:21037–21043, 1999.
10. J Ghrayeb, M Inouye. Nine amino acid residues at the NH2-terminal of lipoprotein are sufficient for its modification, processing, and localization in the outer membrane of Escherichia coli. J Biol Chem 259:463–467, 1984.
11. M Shimazu, A Mulchandani, W Chen. Cell surface display of organophosphorus hydrolase using ice nucleation protein. Biotechnol Prog 17:76–80, 2001.
12. JA Francisco, C Stathopoulos, RA Warren, DG Kilburn, G Georgiou. Specific adhesion and hydrolysis of cellulose by intact Escherichia coli expressing surface anchored cellulase or cellulose binding domains. Biotechnology (N Y) 11:491–495, 1993.
13. PS Daugherty, MJ Olsen, BL Iverson, G Georgiou. Development of an optimized expression system for the screening of antibody libraries displayed on the Escherichia coli surface. Protein Eng 12:613–621, 1999.
14. A Christmann, K Walter, A Wentzel, R Kratzner, H Kolmar. The cystine knot of a squash-type protease inhibitor as a structural scaffold for Escherichia coli cell surface display of conformationally constrained peptides. Protein Eng 12:797–806, 1999.
15. B Westerlund-Wikstrom, J Tanskanen, R Virkola, J Hacker, M Lindberg, M Skurnik, TK Korhonen. Functional expression of adhesive peptides as fusions to Escherichia coli flagellin. Protein Eng 10:1319–1326, 1997.
16. B Westerlund-Wikstrom. Peptide display on bacterial flagella: principles and applications. Int J Med Microbiol 290:223–230, 2000.
17. K Kjaergaard, MA Schembri, P Klemm. Novel Zn(2+)-chelating peptides selected from a fimbria-displayed random peptide library. Appl Environ Microbiol 67:5467–5473, 2001.
18. MA Schembri, K Kjaergaard, P Klemm. Bioaccumulation of heavy metals by fimbrial designer adhesins. FEMS Microbiol Lett 170:363–371, 1999.
19. IR Henderson, F Navarro-Garcia, JP Nataro. The great escape: structure and function of the autotransporter proteins. Trends Microbiol 6:370–378, 1998.
20. J Pohlner, R Halter, K Beyreuther, TF Meyer. Gene structure and extracellular secretion of Neisseria gonorrhoeae IgA protease. Nature 325:458–462, 1987.
21. J Jose, F Jahnig, TF Meyer. Common structural features of IgA1 protease-like outer membrane protein autotransporters. Mol Microbiol 18:378–380, 1995.
22. T Klauser, J Pohlner, TF Meyer. Selective extracellular release of cholera toxin B subunit by Escherichia coli: dissection of Neisseria Iga beta-mediated outer membrane transport. EMBO J 11:2327–2335, 1992.
23. J Maurer, J Jose, TF Meyer. Autodisplay: one-component system for efficient surface display and release of soluble recombinant proteins from Escherichia coli. J Bacteriol 179:794–804, 1997.
24. MP Konieczny, M Suhr, A Noll, IB Autenrieth, M Alexander Schmidt. Cell surface presentation of recombinant (poly-) peptides including functional T-cell epitopes by the AIDA autotransporter system. FEMS Immunol Med Microbiol 27:321–332, 2000.
25. CT Lattemann, J Maurer, E Gerland, TF Meyer. Autodisplay: functional display of active beta-lactamase on the surface of Escherichia coli by the AIDA-I autotransporter. J Bacteriol 182:3726–3733, 2000.
26. LM Kozloff, MA Turner, F Arellano. Formation of bacterial membrane ice-nucleating lipoglycoprotein complexes. J Bacteriol 173:6528–6536, 1991.
27. HC Jung, JM Lebeault, JG Pan. Surface display of Zymomonas mobilis levansucrase by using the ice-nucleation protein of Pseudomonas syringae. Nat Biotechnol 16:576–580, 1998.
28. YS Kim, HC Jung, JG Pan. Bacterial cell surface display of an enzyme library for selective screening of improved cellulase variants. Appl Environ Microbiol 66:788–793, 2000.
29. YD Kwak, SK Yoo, EJ Kim. Cell surface display of human immunodeficiency virus type 1 gp120 on Escherichia coli by using ice nucleation protein. Clin Diagn Lab Immunol 6:499–503, 1999.
30. EJ Kim, SK Yoo. Cell surface display of hepatitis B virus surface antigen by using Pseudomonas syringae ice nucleation protein. Lett Appl Microbiol 29:292–297, 1999.
31. W Bae, A Mulchandani, W Chen. Cell surface display of synthetic phytochelatins using ice nucleation protein for enhanced heavy metal bioaccumulation. J Inorg Biochem 88:223–227, 2002.
32. BA Vallance, BB Finlay. Exploitation of host cells by enteropathogenic Escherichia coli. Proc Natl Acad Sci U S A 97:8799–8806, 2000.
33. M Batchelor, S Prasannan, S Daniell, S Reece, I Connerton, G Bloomberg, G Dougan, G Frankel, S Matthews. Structural basis for recognition of the translocated intimin receptor (Tir) by intimin from enteropathogenic Escherichia coli. EMBO J 19:2452–2464, 2000.
34. Y Luo, EA Frey, RA Pfuetzner, AL Creagh, DG Knoechel, CA Haynes, BB Finlay, NC Strynadka. Crystal structure of enteropathogenic Escherichia coli intimin–receptor complex. Nature 405:1073–1077, 2000.
35. A Wentzel, A Christmann, T Adams, H Kolmar. Display of passenger proteins on the surface of Escherichia coli K-12 by the enterohemorrhagic E. coli intimin EaeA. J Bacteriol 183:7273–7284, 2001.
36. M Hansson, P Samuelson, E Gunneriusson, S Stahl. Surface display on gram positive bacteria. Comb Chem High Throughput Screen 4:171–184, 2001.
37. E Gunneriusson, P Samuelson, M Uhlen, PA Nygren, S Stahl. Surface display of a functional single-chain Fv antibody on staphylococci. J Bacteriol 178:1341–1346, 1996.
38. J Lehtio, H Wernerus, P Samuelson, TT Teeri, S Stahl. Directed immobilization of recombinant staphylococci on cotton fibers by functional display of a fungal cellulose-binding domain. FEMS Microbiol Lett 195:197–204, 2001.
39. J Gaertig, Y Gao, T Tishgarten, TG Clark, HW Dickerson. Surface display of a parasite antigen in the ciliate Tetrahymena thermophila. Nat Biotechnol 17:462–465, 1999.
40. P Holmes, M Al-Rubeai. Improved cell line development by a high throughput affinity capture surface display technique to select for high secretors. J Immunol Methods 230:141–147, 1999.
41. JD Chesnut, AR Baytan, M Russell, MP Chang, A Bernard, IH Maxwell, JP Hoeffler. Selective isolation of transiently transfected cells from a mammalian cell population with vectors expressing a membrane anchored single-chain antibody. J Immunol Methods 193:17–27, 1996.
42. WC Chou, KW Liao, YC Lo, SY Jiang, MY Yeh, SR Roffler. Expression of chimeric monomer and dimer proteins on the plasma membrane of mammalian cells. Biotechnol Bioeng 65:160–169, 1999.
43. ET Boder, KD Wittrup. Yeast surface display for screening combinatorial polypeptide libraries. Nat Biotechnol 15:553–557, 1997.
44. ET Boder, KD Wittrup. Yeast surface display for directed evolution of protein expression, affinity, and stability. Methods Enzymol 328:430–444, 2000.
45. T Murai, M Ueda, M Yamamura, H Atomi, Y Shibasaki, N Kamasawa, M Osumi, T Amachi, A Tanaka. Construction of a starch-utilizing yeast by cell surface engineering. Appl Environ Microbiol 63:1362–1366, 1997.
46. PD Holler, PO Holman, EV Shusta, S O'Herrin, KD Wittrup, DM Kranz. In vitro evolution of a T cell receptor with high affinity for peptide/MHC. Proc Natl Acad Sci U S A 97:5387–5392, 2000.
47. JR Stratton-Thomas, HY Min, SE Kaufman, CY Chiu, GT Mullenbach, S Rosenberg. Yeast expression and phagemid display of the human urokinase plasminogen activator epidermal growth factor-like domain. Protein Eng 8:463–470, 1995.
48. M Washida, S Takahashi, M Ueda, A Tanaka. Spacer-mediated display of active lipase on the yeast cell surface. Appl Microbiol Biotechnol 56:681–686, 2001.
49. Y Shibasaki, N Kamasawa, S Shibasaki, W Zou, T Murai, M Ueda, A Tanaka, M Osumi. Cytochemical evaluation of localization and secretion of a heterologous enzyme displayed on yeast cell surface. FEMS Microbiol Lett 192:243–248, 2000.
50. T Murai, M Ueda, T Kawaguchi, M Arai, A Tanaka. Assimilation of cellooligosaccharides by a cell surface-engineered yeast expressing beta-glucosidase and carboxymethylcellulase from Aspergillus aculeatus. Appl Environ Microbiol 64:4857–4861, 1998.
51. MC Kieke, EV Shusta, ET Boder, L Teyton, KD Wittrup, DM Kranz. Selection of functional T cell receptor mutants from a yeast surface-display library. Proc Natl Acad Sci U S A 96:5651–5656, 1999.
52. G Chen, A Hayhurst, JG Thomas, BR Harvey, BL Iverson, G Georgiou. Isolation of high-affinity ligand-binding proteins by periplasmic expression with cytometric screening (PECS). Nat Biotechnol 19:537–542, 2001.
53. C Sousa, A Cebolla, V de Lorenzo. Enhanced metalloadsorption of bacterial cells displaying poly-His peptides. Nat Biotechnol 14:1017–1020, 1996.
54. M Valls, S Atrian, V de Lorenzo, LA Fernandez. Engineering a mouse metallothionein on the cell surface of Ralstonia eutropha CH34 for immobilization of heavy metals in soil. Nat Biotechnol 18:661–665, 2000.
55. JS Lee, KS Shin, JG Pan, CJ Kim. Surface-displayed viral antigens on Salmonella carrier vaccine. Nat Biotechnol 18:645–648, 2000.
56. D Wahler, JL Reymond. Novel methods for biocatalyst screening. Curr Opin Chem Biol 5:152–158, 2001.
57. JA Smiley, SJ Benkovic. Selection of catalytic antibodies for a biosynthetic reaction from a combinatorial cDNA library by complementation of an auxotrophic Escherichia coli: antibodies for orotate decarboxylation. Proc Natl Acad Sci U S A 91:8319–8323, 1994.
58. S Ritter, RG Hiller, PM Wrench, W Welte, K Diederichs. Crystal structure of a phycourobilin-containing phycoerythrin at 1.90-Å resolution. J Struct Biol 126:86–97, 1999.
59. ET Boder, KS Midelfort, KD Wittrup. Directed evolution of antibody fragments with monovalent femtomolar antigen-binding affinity. Proc Natl Acad Sci U S A 97:10701–10705, 2000.
60. RG Ashcroft, PA Lopez. Commercial high speed machines open new opportunities in high throughput flow cytometry (HTFC). J Immunol Methods 243:13–24, 2000.
61. YA Yeung, KD Wittrup. Quantitative screening of yeast surface-displayed polypeptide libraries by magnetic bead capture. Biotechnol Prog 18:212–220, 2002.
62. MJ Olsen, D Stephens, D Griffiths, P Daugherty, G Georgiou, BL Iverson. Function-based isolation of novel enzymes from a large library. Nat Biotechnol 18:1071–1074, 2000.

28 Overexpression and Secretion of Biocatalysts in Pseudomonas

Frank Rosenau and Karl-Erich Jaeger
Heinrich-Heine-Universität Düsseldorf, Jülich, Germany

1 INTRODUCTION

Enzymes are naturally occurring biocatalysts that operate in living cells. Over billions of years of biological evolution, they have been optimized to catalyze a given reaction with high activity and substrate specificity. Nowadays, several weeks are sufficient to mimic natural evolution in a test tube by using "directed" or in vitro evolution, during which enzyme variants with desired properties are identified in large libraries of mutated genes. In principle, this technique does not require knowledge of the enzyme's structure, catalytic mechanism, or biosynthesis, making it a powerful tool for enzyme optimization (1–5). The directed evolution of a given biocatalyst is a multistep process including (a) identification of a candidate enzyme, which preferably should have catalytic activity toward the substrate of interest; (b) cloning of the respective enzyme gene; (c) generation of a large number of mutant genes; (d) expression of these genes to generate large libraries of enzyme variants; (e) identification of better-performing biocatalysts in the libraries by high-throughput screening or selection; and, finally, (f) production of the best-performing biocatalyst at a large scale. Therefore, the construction of a potent overexpression system for a gene of interest constitutes a major part of devising an efficient directed evolution strategy.
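The iterative core of this process, steps (c) to (e), can be illustrated with a toy simulation. The sketch below is only a minimal illustration and not a laboratory protocol: sequences are plain strings, the mutation rate and library size are arbitrary, and the scoring function `toy_score` with its `TARGET` sequence is a hypothetical stand-in for a real activity assay.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def mutate(sequence, rate, rng):
    """Step (c): return a variant carrying random point substitutions."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else residue
        for residue in sequence
    )


def evolve(parent, score, rounds=5, library_size=200, rate=0.05, seed=0):
    """Steps (c) to (e), repeated: build a library, screen it, keep the best."""
    rng = random.Random(seed)
    best = parent
    history = [score(best)]
    for _ in range(rounds):
        library = [mutate(best, rate, rng) for _ in range(library_size)]
        # Screening: the parent stays in the pool, so the score never drops.
        best = max(library + [best], key=score)
        history.append(score(best))
    return best, history


# Hypothetical stand-in for an activity assay: similarity to a target sequence.
TARGET = "MKKLLAVAG"


def toy_score(sequence):
    return sum(a == b for a, b in zip(sequence, TARGET))


best, history = evolve("MAAAAAAAG", toy_score)
print(best, history)
```

In a real campaign, the scoring step is the expensive part, which is why the expression level reached in each library member matters as much as the size of the library.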

2 HOW TO DEVISE AN EFFICIENT OVEREXPRESSION SYSTEM

An efficient overexpression system consists of a vector harboring the gene(s) of interest behind strong and inducible promoters, which may be under the control of regulatory elements allowing controlled gene expression in prokaryotic or eukaryotic cells. In bacteria, tightly regulated promoters are used, such as the λPL and λPR promoters derived from the Escherichia coli bacteriophage λ or the E. coli lac operon-based promoters Plac, Ptac, and Ptrc (6). A suitable host strain allows easy handling, has a short generation time when grown in a variety of different media, and does not require extreme conditions such as high temperatures or an anoxic environment. Additionally, such a strain should allow DNA transformation with high efficiency. The expression level should be high enough to allow high-throughput screening, which is usually performed in microtiter plates (culture volume: 10–100 µL) after growth of the cultures for several hours. Among other factors, the throughput of a screening method per unit of time is determined by the amount of biocatalyst produced per cell. Furthermore, a suitable expression system should offer the possibility to produce the optimized biocatalyst at a larger scale, preferably also allowing cost-effective downstream processing. Therefore, secretion of the biocatalyst into the culture supernatant should also be envisaged. The best-known bacterial overexpression system uses E. coli host strains and the pET vector series (commercialized by Novagen, Madison), in which genes are expressed from a strong promoter derived from bacteriophage T7. Another popular example is the pBAD system marketed by Invitrogen (Carlsbad), which involves a promoter that can be induced transiently by the addition of arabinose to the culture medium. However, a considerable number of biocatalyst proteins cannot be expressed using one of these systems because their production requires several accessory cellular functions. 
Some enzymes may need essential cofactors for activity and unique chaperones for folding. Moreover, secretion may also be essential to achieve an enzymatically active state of the enzyme. Therefore, the development of an efficient system allowing for overexpression and secretion is not trivial and requires detailed knowledge of the biocatalyst protein, its biochemical properties, and the cellular pathway involved in its biosynthesis, folding, and secretion. Bacterial lipases originating from the genera Pseudomonas and Burkholderia represent prototypic examples of such biocatalysts. They have an exceptionally high potential for numerous biotechnological applications (1,5,7–12), but a complex pathway of folding and secretion is required to obtain enzymatically active protein (13).

3 BOTTLENECKS FOR OVEREXPRESSION OF LIPASES

Many bacterial lipases are secreted into the environment. Therefore, their transcription, folding, and secretion represent potential bottlenecks for lipase production (see Fig. 1). Lipases from the Gram-negative bacterial genera Pseudomonas and Burkholderia belong to three distinct groups within lipase family I, which is divided into six subfamilies (14). Prototype lipases of subfamilies I.1 and I.2 are those from P. aeruginosa and Burkholderia glumae, respectively, which share not only a high degree of sequence homology but also several physiological features, including their biogenesis and secretion. The lipase LipA produced by P. aeruginosa is encoded in a bicistronic operon together with its cognate foldase, Lif (15). Under physiological growth conditions, the transcription and translation of this lipase are downregulated to a level that results in the production of only about 200 lipase molecules per cell (16). The physiological regulation of transcription can be circumvented by placing the lipase operon behind an artificial promoter; however, it has been demonstrated that the ratio of lipase to foldase molecules also determines the yield of extracellular lipase. Recently, we have shown for both P. aeruginosa and B. glumae that overproduction of the Lif foldases relative to the corresponding lipases increased the amount of enzymatically active extracellular lipase by at least a factor of 20, indicating that the ratio of lipase to foldase indeed represents an important bottleneck for lipase overexpression (M El Khattabi, F Rosenau, W Bitter, K-E Jaeger, J Tommassen, submitted for publication). The secretion of P. aeruginosa lipase is a two-step process involving translocation across both the inner membrane and the outer membrane (13). The signal peptide-dependent translocation into the periplasm is mediated by the Sec machinery (17). 
The existing experimental data suggest that the capacity of the Sec machinery does not represent a limiting factor during lipase overexpression. After the periplasm has been reached, proper folding of the lipase and of several other biocatalysts requires the correct formation of disulphide bonds mediated by the Dsb proteins (reviewed in Ref. 18). In P. aeruginosa, DsbA and DsbC are absolutely required for lipase to reach a secretion-competent conformation (19,20), demonstrating that periplasmic folding represents another bottleneck for overexpression. Finally, secretion through the bacterial outer membrane occurs via a multisubunit protein complex named the type II secretion machinery (21,22). Only correctly folded proteins are recognized by certain components of this machinery, whereas others are rapidly degraded by periplasmic proteases (19,20), which makes this recognition process another potential bottleneck during overexpression.

Figure 1 Bottlenecks (A–D) for overexpression of P. aeruginosa lipase. The lipase structural gene lip is located in an operon together with a second gene, lif, encoding a lipase-specific foldase. Biosynthesis requires efficient transcription of the operon (A), for which at least two different promoters (P1 and P2) have been identified. Both the amount and the stability of the mRNA influence the efficiency of translation (B). Newly synthesized lipase is translocated into the periplasm (p) via the Sec machinery located in the inner membrane (i.m.), where the signal sequence (ss) is removed by a specific signal peptidase. Periplasmic folding of the lipase (C) requires the action of Lif and of additional folding catalysts including Dsb proteins, which catalyze the formation of disulphide bonds. Rapid degradation by periplasmic proteases is avoided by correct folding of lipase molecules, which are subsequently recognized by the Xcp machinery (D) and translocated across the outer membrane (o.m.) into the extracellular medium.
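To put the figure of about 200 lipase molecules per cell (16) into perspective, a copy number per cell can be converted into a mass concentration in the culture. The following sketch shows the arithmetic only. The culture density and the molar mass of the lipase are illustrative assumptions, not values taken from this chapter, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def mass_concentration_mg_per_l(molecules_per_cell, cells_per_ml, molar_mass_g_per_mol):
    """Convert a per-cell copy number into mg of protein per liter of culture."""
    molecules_per_l = molecules_per_cell * cells_per_ml * 1000.0
    moles_per_l = molecules_per_l / AVOGADRO
    return moles_per_l * molar_mass_g_per_mol * 1000.0  # g -> mg


# 200 molecules per cell is the value quoted in the text (16).
# The culture density and molar mass below are assumed, illustrative values.
wild_type = mass_concentration_mg_per_l(
    molecules_per_cell=200,
    cells_per_ml=1e9,             # assumption: dense stationary-phase culture
    molar_mass_g_per_mol=29_000,  # assumption: lipase of roughly 29 kDa
)
print(f"{wild_type * 1000:.1f} micrograms per liter")
```

Under these assumptions, the physiological level corresponds to roughly 10 micrograms of lipase per liter, which illustrates why the native regulation has to be bypassed before screening or production becomes practical.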

4 OVEREXPRESSION OF LIPASES IN PSEUDOMONAS

First of all, overexpression requires efficient transcription of the target gene. This has been generally accepted since the outstanding work of Tabor and Richardson, who established an overexpression system in E. coli that uses an RNA polymerase encoded by the E. coli bacteriophage T7. The concept is based on the high processivity of this enzyme and its exceptionally high specificity for T7-derived promoters (23,24). Because of its unusual processivity, T7-RNA polymerase requires tight regulation to ensure a low basal level of expression under noninducing conditions and thus to avoid background expression of target proteins, which may turn out to be harmful or deleterious to the host cells. Modern plasmids such as the pET series (Novagen), which are devised for T7-RNA polymerase-dependent expression in E. coli, therefore contain additional lac operator sequences preceding the promoter site and also encode extra copies of the lac repressor gene to reduce basal gene expression. Upon induction of target gene expression, high amounts of mRNA and subsequently of protein are produced. However, a high transcript level as such does not always guarantee high-level production of the respective protein because it is often accompanied by misfolding and intracellular deposition of the protein in the form of insoluble inclusion bodies. The molecular mechanisms resulting in the formation of inclusion bodies are still largely unknown (25). Despite these drawbacks, E. coli-based systems using inducible T7-RNA polymerase have proven their potential for the overexpression of target genes from various sources and are commercially available in a number of variants. Moreover, the unique features of T7-RNA polymerase have been used to increase the expression levels of several target genes in yeast, plant, or mammalian cells (26–28). The T7 overexpression system has also been adapted to construct P. aeruginosa overexpression strains (6,29–33). 
Table 1 lists several currently available plasmids, all of which harbor an inducible T7 promoter. The novel expression vector, pBBR22b (Fig. 2), is based on the mobilizable broad host range vector, pBBR1MCS (30). An AseI/PvuI fragment containing the multiple cloning site of pBBR1MCS, the promoter region, and the lacZα reporter gene allowing for blue-white selection in E. coli was exchanged for a PshAI/PpuMI fragment derived from the commercially available expression vector, pET22b (Novagen). Unlike pET22b, the resulting vector, pBBR22b, can replicate in a variety of gram-negative bacteria and is therefore suitable for the overexpression of different target genes in strains other than E. coli. It combines a T7 promoter (PT7) with a lac operator (lacO), a feature typical of vectors of the pET series. In expression strains that provide the Lac repressor for transcriptional control of the T7-RNA polymerase gene, this leads to a significant reduction of background expression under noninducing conditions. This effect is further enhanced by a constitutively expressed additional copy of the Lac repressor gene (lacIq) encoded on the vector. A strong ribosomal binding site (SD) enables efficient translation initiation of target genes cloned into the polylinker region. Furthermore, in-frame fusions with the pelB signal sequence allow for Sec-dependent translocation of overexpressed target proteins into the periplasm of several different host cells. Moreover, in-frame fusions with a His-tag coding sequence yield target proteins that carry a carboxy-terminal affinity tag enabling an easy one-step affinity purification.

Table 1 Broad Host Range Plasmids for T7-RNA Polymerase-Dependent Protein Expression in P. aeruginosa

Plasmids             Selectable markers   Additional features                                        References
pBBR22b              cmr                  lac operator, lacIq gene, pelB signal sequence, His tag    This chapter
pBSPIIKS/pBSPIISK    ampr                 Blue-white selection                                       (12)
pEB12                ampr                 Multiple terminators                                       (29)
pEB14                ampr                 lac operator                                               (29)
pBBR1MCS             cmr                  Blue-white selection                                       (30)
pUCPKS/pUCPSK        ampr                 Blue-white selection                                       (31)

Figure 2 The broad host range expression vector, pBBR22b. This vector was constructed by insertion of a PshAI/PpuMI fragment derived from the commercially available expression vector, pET22b (Novagen, Madison, Wisconsin, USA), into the mobilizable broad host range vector, pBBR1MCS (Ref. 30). It harbors the combination of a T7 promoter (PT7) and a lac operator (lacO), a strong ribosomal binding site (SD), and additionally allows the construction of in-frame fusions with the pelB signal sequence (ss) and a His-tag coding sequence. The chloramphenicol resistance gene (Cmr) and the elements needed for mobilization (MOB) and replication (REP) in gram-negative bacteria originating from pBBR1MCS are indicated as black arrows (Ref. 30).

Brunschwig and Darzins constructed the P. aeruginosa strain PADD 1976, which contains a cassette composed of the T7-RNA polymerase structural gene under the control of a lacUV5 promoter and, in addition, the lacIq gene encoding the E. coli Lac repressor (29). As lacIq is constitutively expressed in P. aeruginosa, the transcription of the T7-RNA polymerase gene from the lacUV5 promoter is repressed under noninducing conditions. Addition of the synthetic inducer, isopropyl-β-D-thiogalactoside (IPTG), induces the synthesis of T7-RNA polymerase, which in turn transcribes plasmid-encoded target gene(s) starting from the T7 promoter. Initial experiments demonstrated that P. aeruginosa PADD 1976 was a suitable host strain for the overexpression of P. aeruginosa lipase (32). However, its use as an expression host for mutant lipases was limited because it also harbored the wild-type gene and therefore produced a significant background lipase activity. To exclude this effect, we constructed different strains in which the chromosomal lipase gene was inactivated. P. aeruginosa PABST7.1 was based on the mutant strain P. aeruginosa PABS1, which carries a large deletion in the lipase operon covering about 600 bp of the lipase structural gene lipA and about 300 bp of the foldase gene lif. However, in this strain, the expression cassette containing the IPTG-inducible T7-RNA polymerase gene was inserted into the chromosome at an unknown position by random integration of the phagemid pEB1. Therefore, strain P. aeruginosa PAFRT7.7 was constructed by site-specific integration of the expression cassette into the lipase operon (32). This strain now represents a lipase-negative mutant suitable for overexpression of lipase (Fig. 3), which can also be used for the background-free expression of mutant lipase genes (e.g., derived from libraries constructed by directed evolution experiments). Both P. aeruginosa strains yielded a lipase overexpression level that exceeded that of the P. aeruginosa wild type by at least five orders of magnitude.

Figure 3 Construction of the overexpression strain P. aeruginosa PAFRT7.7. The T7-expression cassette obtained from phagemid pEB1 (Ref. 29) harboring the T7-RNA polymerase gene and the gene encoding the Lac repressor was cloned into the lipA gene of pLip3-S, a pBluescript derivative carrying the P. aeruginosa lipase operon. The resulting gene disruption construct was subcloned into the mobilizable suicide vector, pME3087 (kindly provided by Dieter Haas, University of Lausanne, Switzerland), giving pMELipT7, which was then inserted into the chromosome of the wild-type strain P. aeruginosa PAO1 by triparental conjugation and recombination replacing the wild-type lipA gene. For the resulting strain P. aeruginosa PAFRT7.7, the allelic exchange was confirmed by Southern blotting and the lipase-deficient phenotype was determined by lipase activity assays and Western blotting.

Standard T7 promoter-based overexpression protocols recommend inducing expression during the logarithmic growth phase. On the other hand, extracellular lipase is produced and secreted by wild-type P. aeruginosa only when the cells reach the stationary growth phase. Furthermore, it is known that the type II secretion machinery needed to transport lipase into the culture medium is subject to growth phase regulation, being expressed only at high cell densities (34). Therefore, we determined the optimal time point for the induction of T7-RNA polymerase-dependent lipase gene expression. The most efficient lipase production was obtained when T7-RNA polymerase expression was induced at the beginning of the stationary growth phase, followed by a lipase production phase of 24 h (unpublished results). Extracellular lipase production was increased to 150 mg/L culture supernatant without any further optimization of media and growth conditions (32).
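One simple way to locate the beginning of the stationary phase in routine optical density readings is to follow the specific growth rate and flag the point at which it has dropped to a small fraction of its maximum. The sketch below is an assumption-laden illustration, not the procedure used for the experiments described here: the function name, the 10% cutoff, and the synthetic logistic growth curve are all invented for the example.

```python
import math


def stationary_onset(times_h, od_values, fraction=0.1):
    """Return the first sampling time at which growth has slowed to
    `fraction` of its peak specific rate, or None if it never does.

    The specific growth rate between two samples is
    (ln OD2 - ln OD1) / (t2 - t1).
    """
    points = list(zip(times_h, od_values))
    rates = [
        (math.log(od2) - math.log(od1)) / (t2 - t1)
        for (t1, od1), (t2, od2) in zip(points, points[1:])
    ]
    peak_index = max(range(len(rates)), key=rates.__getitem__)
    threshold = fraction * rates[peak_index]
    for i in range(peak_index + 1, len(rates)):
        if rates[i] <= threshold:
            return times_h[i]  # start of the first slow interval
    return None


# Synthetic logistic growth curve (invented data, not measurements).
times = [float(t) for t in range(0, 21, 2)]
ods = [4.0 / (1.0 + 79.0 * math.exp(-0.8 * t)) for t in times]
induce_at = stationary_onset(times, ods)
print(induce_at)
```

With real cultures, the cutoff would have to be tuned to the sampling interval and the noise of the readings, and the induction time found this way would still need to be confirmed against measured lipase yields.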

5 OVEREXPRESSION USING HETEROLOGOUS BACTERIAL HOST STRAINS

In several countries, including Germany, containment regulations pose severe restrictions on the use of potentially pathogenic bacteria for large-scale biotechnological applications (e.g., industrial-scale fermentations and downstream processes). To circumvent these restrictions, it is desirable to use expression systems derived from nonpathogenic strains. Mainly for this reason, much effort has been put into attempts to overexpress Pseudomonas lipases in E. coli as a nonpathogenic host strain (35–38). However, because of the complexity of the folding and secretion pathway used by these lipases, these attempts resulted in the production of enzymatically inactive lipase protein, which subsequently had to be subjected to time-consuming in vitro refolding procedures to obtain active enzymes (37,38). More recently, the co-overexpression of different cellular chaperones has become a general strategy to improve the folding capacity of heterologous expression strains, thereby enhancing the production level of those proteins that tend to form insoluble aggregates or misfolded and inactive intermediates (39–44). The same concept has successfully been applied to optimize the folding process of heterologous proteins in the bacterial periplasm (45). However, the successful heterologous overexpression of a Pseudomonas lipase has not yet been reported. Recently, we have developed a heterologous lipase overexpression system that allows both the expression and the secretion of P. aeruginosa lipase into the culture supernatant of the nonpathogenic strain Pseudomonas putida, which has been characterized as an extremely versatile bacterium with a high potential in bioremediation and biocontrol applications. In contrast to other pseudomonads, it is classified as a GRAS (generally recognized as safe) organism, which qualifies this strain as a production host for the large-scale industrial production of biocatalysts (46). Moreover, P. putida offers the practical advantage of having a lipase-negative phenotype. For overexpression, the P. aeruginosa lipase operon was cloned into plasmid pBBR1MCS to give pBBL7, which resulted in the production of several milligrams per liter of lipase protein upon induction with IPTG. However, periplasmic folding and Xcp-mediated secretion through the bacterial outer membrane were also needed, as outlined above. There are good reasons to assume that at least some periplasmic foldases such as Dsb proteins, rotamases, or general chaperones are present in P. putida. Secretion across the outer membrane may also be a limiting step, as demonstrated in an elegant approach carried out to identify cellular factors that could increase extracellular lipase production by P. alcaligenes: screening of a genomic cosmid library identified a plasmid that harbored the genes for a type II secretion machinery and greatly enhanced the production of lipase (47). In P. putida, xcp homologous genes have also been identified (48,49). 
However, a functional Xcp secretion machinery has not been described. If it exists at all, it cannot mediate the secretion of heterologous proteins, as demonstrated for P. aeruginosa elastase (50) and lipase (F Rosenau and K-E Jaeger, unpublished results). To reconstitute a functional Xcp machinery, we identified the xcp gene cluster in a P. aeruginosa genomic cosmid library and cloned it in its entirety. Upon transformation of P. putida with both pXCP7, containing 12 P. aeruginosa xcp genes, and pBBL7, harboring the lipase operon, P. aeruginosa lipase was efficiently produced and secreted by P. putida (Fig. 4). The level of secreted and enzymatically active lipase was comparable to that observed for P. aeruginosa wild type, demonstrating that the P. aeruginosa Xcp machinery was fully functional in the heterologous host, P. putida. This result suggests that P. putida can be developed as a heterologous nonpathogenic expression and secretion host for biocatalysts produced by related species of the genera Pseudomonas and Burkholderia.


Figure 4 P. putida as a nonpathogenic host for heterologous expression and secretion of P. aeruginosa lipase. The production of extracellular P. aeruginosa lipase in P. putida requires expression of both the lipase protein and the Xcp secretion machinery from P. aeruginosa. (A) Production of lipase was analyzed on agar plates containing tributyrin as the substrate. The formation of clear halos around the colonies indicates the secretion of enzymatically active lipase. Significant lipase production can only be observed in P. putida when the bacteria contain both the lipase operon on plasmid pBBL7 and the Xcp machinery encoded on plasmid pXCP7 originating from pLAFR3, which served as a control. (B) This result was confirmed by quantitative analysis of cell-free culture supernatants from P. putida grown in LB medium. Lipase activity was measured spectrophotometrically using p-nitrophenyl palmitate as the substrate and expressed as enzyme activity (µkat/mL) per unit of bacterial cell density, determined spectrophotometrically as O.D. at 580 nm. (C) Lipase protein was detected by Western immunoblotting using a lipase-specific antiserum.
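The normalization described for panel (B) can be written out as a short calculation. The sketch below assumes a continuous photometric assay evaluated with the Beer-Lambert law. The function name, the extinction coefficient, the volumes, and the readings are hypothetical example values and are not taken from this chapter; only the definition 1 kat = 1 mol/s is standard.

```python
def lipase_activity_per_od(
    delta_abs_per_min,
    extinction_m_cm,
    path_cm,
    assay_volume_ml,
    sample_volume_ml,
    od580,
):
    """Return activity in microkatal per mL of supernatant per unit of O.D. 580 nm."""
    # Beer-Lambert law: product formation in mol per liter per second.
    rate_mol_per_l_s = (delta_abs_per_min / 60.0) / (extinction_m_cm * path_cm)
    # Product formed per second in the whole assay volume (1 kat = 1 mol/s).
    kat = rate_mol_per_l_s * assay_volume_ml / 1000.0
    microkat_per_ml = kat * 1e6 / sample_volume_ml
    return microkat_per_ml / od580


# All numbers below are assumed example readings.
activity = lipase_activity_per_od(
    delta_abs_per_min=0.3,     # absorbance increase per minute
    extinction_m_cm=15_000.0,  # assumed coefficient for the released p-nitrophenol
    path_cm=1.0,
    assay_volume_ml=1.0,
    sample_volume_ml=0.1,      # volume of culture supernatant added to the assay
    od580=2.0,
)
print(f"{activity:.2e} microkatal per mL per O.D. unit")
```

Dividing by the cell density makes cultures that grew to different densities comparable, which matters when a secretion-deficient control strain is compared with a strain carrying both plasmids.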

6 CONCLUSION

A rapidly increasing number of genes encoding novel biocatalysts are currently being isolated. Environmental DNA (the so-called metagenome) is screened for functional open reading frames, new genes are detected by determination of the complete genome sequences of many prokaryotic and eukaryotic organisms, and modern protein engineering as well as directed evolution techniques generate large libraries consisting of billions of mutants derived from appropriate wild-type genes. However, to assess the functionality of all these novel genes, they have to be expressed in a way that yields enzymatically active protein in amounts high enough to allow the determination of at least the most important biochemical properties, including enzymatic activity, substrate specificity, and enantioselectivity. Therefore, efficient systems are needed for the production and isolation of biocatalyst proteins from various sources. Recently, the genus Pseudomonas has emerged as a reasonable alternative to the well-known overexpression systems operating in E. coli. Several P. aeruginosa strains have been adapted to allow specific and effective overexpression from the T7 promoter driven by an inducible T7-RNA polymerase. In addition, efficient secretion of overexpressed enzymes into the bacterial culture supernatant is possible, as was demonstrated for P. aeruginosa lipase. Furthermore, heterologous overexpression and secretion of a P. aeruginosa lipase in P. putida were achieved by co-expression of both the biocatalyst gene itself and the corresponding homologous type II secretion machinery. Undoubtedly, Pseudomonas bacteria will belong to the toolbox of novel and efficient biocatalyst overexpression systems that molecular biologists need to develop in the near future.

ACKNOWLEDGMENT

The work reported here was funded by the European Union project Nanofoldex.

REFERENCES

1. KE Jaeger, MT Reetz. Directed evolution of enantioselective enzymes for organic chemistry. Curr Opin Chem Biol 4:68–73, 2000.
2. IP Petrounia, FH Arnold. Designed evolution of enzymatic properties. Curr Opin Biotechnol 11:325–330, 2000.
3. UT Bornscheuer, M Pohl. Improved biocatalysts by directed evolution and rational protein design. Curr Opin Chem Biol 5:137–143, 2001.
4. ET Farinas, T Bulter, FH Arnold. Directed enzyme evolution. Curr Opin Biotechnol 12:545–551, 2001.
5. KE Jaeger, T Eggert, A Eipper, MT Reetz. Directed evolution and the creation of enantioselective biocatalysts. Appl Microbiol Biotechnol 55:519–530, 2001.
6. HP Schweizer. Vectors to express foreign genes and techniques to monitor gene expression in pseudomonads. Curr Opin Biotechnol 12:439–445, 2001.
7. A Liese, K Seelbach, C Wandrey. Industrial Biotransformations. Weinheim: Wiley-VCH, 2000.
8. KE Jaeger, T Eggert. Lipases for biotechnology. Curr Opin Biotechnol 13:390–397, 2002.
9. MT Reetz. Lipases as practical biocatalysts. Curr Opin Chem Biol 6:145–150, 2002.
10. KE Jaeger, K Liebeton, A Zonta, K Schimossek, MT Reetz. Biotechnological application of Pseudomonas aeruginosa lipase: efficient kinetic resolution of amine and alcohols. Appl Microbiol Biotechnol 46:99–105, 1996.
11. KE Jaeger, MT Reetz. Microbial lipases form versatile tools for biotechnology. Trends Biotechnol 16:396–403, 1998.
12. K Liebeton, A Zonta, K Schimossek, M Nardini, D Lang, BW Dijkstra, MT Reetz, KE Jaeger. Directed evolution of an enantioselective lipase. Chem Biol 7:709–718, 2000.
13. F Rosenau, KE Jaeger. Bacterial lipases from Pseudomonas: regulation of gene expression and mechanisms of secretion. Biochimie 82:1023–1032, 2000.
14. JL Arpigny, KE Jaeger. Bacterial lipolytic enzymes: classification and properties. Biochem J 343:177–183, 1999.
15. KE Jaeger, BW Dijkstra, MT Reetz. Bacterial biocatalysts: molecular biology, three-dimensional structures, and biotechnological applications of lipases. Annu Rev Microbiol 53:315–351, 1999.
16. W Stuer, KE Jaeger, UK Winkler. Purification of extracellular lipase from Pseudomonas aeruginosa. J Bacteriol 168:1070–1074, 1986.
17. EH Manting, AJ Driessen. Escherichia coli translocase: the unravelling of a molecular machine. Mol Microbiol 37:226–238, 2000.
18. JF Collet, JC Bardwell. Oxidative protein folding in bacteria. Mol Microbiol 44:1–8, 2002.
19. K Liebeton, A Zacharias, KE Jaeger. Disulfide bond in Pseudomonas aeruginosa lipase stabilizes the structure but is not required for interaction with its foldase. J Bacteriol 183:597–603, 2001.
20. A Urban, M Leipelt, T Eggert, KE Jaeger. DsbA and DsbC affect extracellular enzyme formation in Pseudomonas aeruginosa. J Bacteriol 183:587–596, 2001.
21. M Koster, W Bitter, J Tommassen. Protein secretion mechanisms in Gram-negative bacteria. Int J Med Microbiol 290:325–331, 2000.
22. M Sandkvist. Biology of type II secretion. Mol Microbiol 40:271–283, 2001.
23. S Tabor, CC Richardson. A bacteriophage T7 RNA polymerase/promoter system for controlled exclusive expression of specific genes. Proc Natl Acad Sci USA 82:1074–1078, 1985.
24. FW Studier, BA Moffatt. Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. J Mol Biol 189:113–130, 1986.
25. C Schlieker, B Bukau, A Mogk. Prevention and reversion of protein aggregation by molecular chaperones in the E. coli cytosol: implications for their applicability in biotechnology. J Biotechnol 96:13–21, 2002.
26. BM Benton, WK Eng, JJ Dunn, FW Studier, R Sternglanz, PA Fisher. Signal-mediated import of bacteriophage T7 RNA polymerase into the Saccharomyces cerevisiae nucleus and specific transcription of target genes. Mol Cell Biol 10:353–360, 1990.
27. MW Lassner, A Jones, S Daubert, L Comai. Targeting of T7 RNA polymerase to tobacco nuclei mediated by an SV40 nuclear location signal. Plant Mol Biol 17:229–234, 1991.
28. A Lieber, U Kiessling, M Strauss. High level gene expression in mammalian cells by a nuclear T7-phage RNA polymerase. Nucleic Acids Res 17:8485–8493, 1989.
29. E Brunschwig, A Darzins. A two-component T7 system for the overexpression of genes in Pseudomonas aeruginosa. Gene 111:35–41, 1992.
30. ME Kovach, RW Phillips, PH Elzer, RM Roop II, KM Peterson. pBBR1MCS: a broad-host-range cloning vector. Biotechniques 16:800–802, 1994.
31. AA Watson, RA Alm, JS Mattick. Construction of improved vectors for protein production in Pseudomonas aeruginosa. Gene 172:163–164, 1996.
32. KE Jaeger, B Schneidinger, F Rosenau, M Werner, D Lang, BW Dijkstra, A Zonta, MT Reetz. Bacterial lipases for biotechnological applications. J Mol Catal B Enzym 3:3–12, 1997.
33. TT Hoang, AJ Kutchma, A Becher, HP Schweizer. Integration-proficient plasmids for Pseudomonas aeruginosa: site-specific integration and use for engineering of reporter and expression strains. Plasmid 43:59–72, 2000.
34. M Akrim, M Bally, G Ball, J Tommassen, H Teerink, A Filloux, A Lazdunski. Xcp-mediated protein secretion in Pseudomonas aeruginosa: identification of two additional genes and evidence for regulation of xcp gene expression. Mol Microbiol 10:431–443, 1993.
35. JL Aamand, AH Hobson, CM Buckley, ST Jorgensen, B Diderichsen, DJ McConnell. Chaperone-mediated activation in vivo of a Pseudomonas cepacia lipase. Mol Gen Genet 245:556–564, 1994.
36. F Ihara, I Okamoto, K Akao, T Nihira, Y Yamada. Lipase modulator protein (LimL) of Pseudomonas sp. strain 109. J Bacteriol 177:1254–1258, 1995.
37. DT Quyen, C Schmidt-Dannert, RD Schmid. High-level formation of active Pseudomonas cepacia lipase after heterologous expression of the encoding gene and its modified chaperone in Escherichia coli and rapid in vitro refolding. Appl Environ Microbiol 65:787–794, 1999.
38. PC Traub, C Schmidt-Dannert, J Schmitt, RD Schmid. Gene synthesis, expression in E. coli, and in vitro refolding of Pseudomonas sp. KWI 56 and Chromobacterium viscosum lipases and their chaperones. Appl Microbiol Biotechnol 55:198–204, 2001.
39. E Ailor, MJ Betenbaugh. Overexpression of a cytosolic chaperone to improve solubility and secretion of a recombinant IgG protein in insect cells. Biotechnol Bioeng 58:196–203, 1998.
40. K Nishihara, M Kanemori, M Kitagawa, H Yanagi, T Yura. Chaperone coexpression plasmids: differential and synergistic roles of DnaK–DnaJ–GrpE and GroEL–GroES in assisting folding of an allergen of Japanese cedar pollen, Cryj2, in Escherichia coli. Appl Environ Microbiol 64:1694–1699, 1998.
41. C Vonrhein, U Schmidt, GA Ziegler, S Schweiger, I Hanukoglu, GE Schulz. Chaperone-assisted expression of authentic bovine adrenodoxin reductase in Escherichia coli. FEBS Lett 443:167–169, 1999.
42. K Nishihara, M Kanemori, H Yanagi, T Yura. Overexpression of trigger factor prevents aggregation of recombinant proteins in Escherichia coli. Appl Environ Microbiol 66:884–889, 2000.
43. K Ikura, T Kokubu, S Natsuka, A Ichikawa, M Adachi, K Nishihara, H Yanagi, S Utsumi. Co-overexpression of folding modulators improves the solubility of the recombinant guinea pig liver transglutaminase expressed in Escherichia coli. Prep Biochem Biotechnol 32:189–205, 2002.
44. KH Lee, HS Kim, HS Jeong, YS Lee. Chaperonin GroESL mediates the protein folding of human liver mitochondrial aldehyde dehydrogenase in Escherichia coli. Biochem Biophys Res Commun 298:216–224, 2002.
45. Z Zhang, ZH Li, F Wang, M Fang, CC Yin, ZY Zhou, Q Lin, HL Huang. Overexpression of DsbC and DsbG markedly improves soluble and functional expression of single-chain Fv antibodies in Escherichia coli. Protein Expr Purif 26:218–228, 2002.
46. MC Ronchel, JL Ramos. Dual system to reinforce biological containment of recombinant bacteria designed for rhizoremediation. Appl Environ Microbiol 67:2649–2656, 2001.
47. G Gerritse, R Ure, F Bizoullier, WJ Quax. The phenotype enhancement method identifies the Xcp outer membrane secretion machinery from Pseudomonas alcaligenes as a bottleneck for lipase production. J Biotechnol 64:23–38, 1998.
48. A de Groot, JJ Krijger, A Filloux, J Tommassen. Characterization of type II protein secretion (xcp) genes in the plant growth-stimulating Pseudomonas putida, strain WCS358. Mol Gen Genet 250:491–504, 1996.
49. A de Groot, G Gerritse, J Tommassen, A Lazdunski, A Filloux. Molecular organization of the xcp gene cluster in Pseudomonas putida: absence of an xcpX (gspK) homologue. Gene 226:35–40, 1999.
50. P Braun, W Bitter, J Tommassen. Activation of Pseudomonas aeruginosa elastase in Pseudomonas putida by triggering dissociation of the propeptide–enzyme complex. Microbiology 146:2565–2572, 2000.

29 Analysis of Catalytic and Structural Stability of Native and Covalently Modified Enzymes

P.V. Sundaram and S. Srimathi
Centre for Protein Engineering and Biomedical Research, The Voluntary Health Services, Madras, India

1 INTRODUCTION

Although demand for enzymes is largest in the industrial markets, new growth areas are emerging in many other fields of biotechnology. Novel uses for countless new enzymes have been a major source of growth in recent years. Industrial processes, including drug design in the pharmaceutical industry, chiral synthesis, therapeutic applications, food processing, detergents, animal feeds, and fuel alcohol production, are the areas bound to demand increased attention and investment in biotechnology. This demand has spurred new approaches to the large-scale production of stable enzymes. Added to this is the prospect of countless new proteins and enzymes entering the public domain in the new era of proteomics. Techniques have been established for the large-scale production of enzymes whose structures have been changed by protein engineering techniques such as site-directed mutagenesis (SDM), gene shuffling, or directed evolution of proteins (1).


To achieve gross shifts in properties, most R&D efforts rely on protein engineering techniques for structure modification. In this chapter, we demonstrate that a chemical approach to modifying the structure of an already folded protein is a viable alternative method for producing stable enzymes. Unlike the genetic approach, it is also cost-effective.

How does one measure enzyme stability? We recognize that catalytic stability (CS) is distinct from structural or thermodynamic stability (SS), but it is naturally the former that is more relevant, in the sense that for most of those interested in applications, the retention of catalytic activity under varying conditions will be the prime concern. Structural stability is of course important, but, as we see from our experience with a variety of enzymes, SS need not resemble CS in its response to temperature changes or to the denaturing effect of chaotropic agents. In other words, very often catalytic activity may be lost without perceptible changes in structure, and in some cases there may be noticeable changes in structure without much alteration in activity.

Although the role of the naturally inherited structure of a protein may be understood in qualitative terms, it is not always clear what the quantitative contributions of the various forces are. Stability may depend on (1) the primary sequence as well as the tertiary structure of a fully folded protein, the latter, for example, producing S–S bonds and ionic interactions at critical places, and (2) factors independent of the sequence such as water structure, charge distribution on the protein, and the dipole moment of the medium.
For a long time, efforts to preserve catalytic activity have relied mainly on adding so-called cosolvents such as sugars, salts, or suitable buffer ions to enzyme solutions, or additives such as glycerol or ammonium sulfate. Similarly, monosaccharides and disaccharides are often added to lyophilized enzyme preparations. More recently, the concept of immobilizing enzymes on insoluble carriers to conserve activity began to emerge, followed by genetic engineering techniques that alter protein structures by changing primary sequences through substitutions. This last approach relies on intervention at the DNA level, commonly known as site-directed mutagenesis (SDM). It has been followed by gene shuffling and guided evolution, the last now being considered the best approach for structural changes aimed at optimizing performance and stability.

One of the primary objectives of enzyme engineering is to produce stable enzymes useful for various biotechnological applications including organic synthesis. Native enzymes can be structurally engineered to suppress their inactivation under different conditions. This can be achieved by mutagenesis, immobilization, or chemical modification. In our approach of stabilizing enzymes by chemical modification, we concentrate on both the catalytic and the structural stabilities.

2 ANALYSIS OF CATALYTIC STABILITY

2.1 Thermal Inactivation

The catalytic stability of an enzyme is measured by following its progressive inactivation at various temperatures and pH values. The half-life of inactivation, t1/2, is obtained by exposing the enzyme to various temperatures for a definite period before measuring its activity at a temperature at which it is stable. The slope of the plot of log percentage residual activity against time gives the inactivation constant, ki, at that temperature. The half-life is calculated using the following relationship.

\[ t_{1/2} = \frac{\ln 2}{k_i} \tag{1} \]

The inactivation constants calculated at various temperatures can be used to make an Arrhenius plot of log ki against 1/T, from which the energy of inactivation, Eai, is calculated.

\[ E_{ai} = -2.303\,R \times \text{slope} \tag{2} \]

The free energy (ΔG*), enthalpy (ΔH*), and entropy (ΔS*) associated with inactivation can be calculated using the following relationships

\[ k_{cat} = \frac{k_B T}{h}\,\exp\left(-\frac{\Delta G^{*}}{RT}\right) \tag{3} \]

\[ \Delta H^{*} = E_{ai} - RT \tag{4} \]

\[ \Delta S^{*} = \frac{\Delta H^{*} - \Delta G^{*}}{T} \tag{5} \]

where ki is the first-order inactivation rate constant (h⁻¹), kB is Boltzmann's constant, and h is Planck's constant.
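The relations in Eqs. (1) to (5) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the rate constants are synthetic values generated from an assumed Eai of 160 kJ/mol, and Eq. (3) is applied here to the inactivation rate constant, which is an assumption suggested by the definition of ki given with the equations. Because kB/h has units of s⁻¹ K⁻¹, the rate constant passed to Eq. (3) must be expressed in s⁻¹.

```python
import math

R = 8.314462618       # gas constant, J mol^-1 K^-1
K_B = 1.380649e-23    # Boltzmann constant, J K^-1
H = 6.62607015e-34    # Planck constant, J s


def half_life(k_i):
    """Eq. (1): t1/2 = ln 2 / k_i, in the time unit of 1/k_i."""
    return math.log(2) / k_i


def inactivation_energy(temps_K, k_values):
    """Eq. (2): Eai = -2.303 R x slope of log10(k_i) against 1/T (J/mol)."""
    x = [1.0 / t for t in temps_K]
    y = [math.log10(k) for k in k_values]
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    slope = (sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
             / sum((a - x_mean) ** 2 for a in x))
    return -math.log(10) * R * slope   # ln 10 is the factor 2.303


def free_energy(k_per_s, temp_K):
    """Eq. (3) rearranged: dG* = RT ln(kB T / (h k)), with k in s^-1 (J/mol)."""
    return R * temp_K * math.log(K_B * temp_K / (H * k_per_s))


def enthalpy(e_ai, temp_K):
    """Eq. (4): dH* = Eai - RT (J/mol)."""
    return e_ai - R * temp_K


def entropy(d_h, d_g, temp_K):
    """Eq. (5): dS* = (dH* - dG*) / T (J mol^-1 K^-1)."""
    return (d_h - d_g) / temp_K


# Synthetic rate constants (s^-1) generated from an assumed Eai of 160 kJ/mol.
temps_K = [333.15, 343.15, 353.15, 363.15]
k_values = [1e20 * math.exp(-160000.0 / (R * t)) for t in temps_K]
e_ai = inactivation_energy(temps_K, k_values)
d_g = free_energy(k_values[0], temps_K[0])
d_h = enthalpy(e_ai, temps_K[0])
print(f"Eai = {e_ai / 1000:.1f} kJ/mol, t1/2 = {half_life(k_values[0]) / 3600:.1f} h")
print(f"dG* = {d_g / 1000:.1f} kJ/mol, dH* = {d_h / 1000:.1f} kJ/mol, "
      f"dS* = {entropy(d_h, d_g, temps_K[0]):.1f} J/(mol K)")
```

The Arrhenius slope is obtained here by an ordinary least-squares fit; any other linear regression routine could be substituted.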

2.2 Thermal Activation

In order to calculate the energies and entropies of activation, the theory of absolute reaction rates must be used. The influence of temperature on the rate constant depends on the equilibrium between the activated complex and the unactivated reactants. The rate of breakdown of these complexes is given by

\[ k_{cat} = \frac{k_B T}{h}\,K \tag{6} \]

where K is the equilibrium constant. Applying the thermodynamic equations to this equilibrium

\[ \Delta G^{*} = -RT \ln K \tag{7} \]

Substituting Eq. (7) in Eq. (6)

\[ k_{cat} = \frac{k_B T}{h}\,\exp\left(-\frac{\Delta G^{*}}{RT}\right) \tag{8} \]

The rate constants measured at various temperatures can be used to make an Arrhenius plot to get the activation energy, Ea. The slope of this plot (ln kcat against 1/T) is equal to −Ea/R. The enthalpy and entropy of activation are calculated using Eqs. (4) and (5). Lonhienne et al. (2) have pointed out in their review that the contribution of kcat to the free energy should be taken into consideration while calculating ΔG* in Eq. (9). According to them, the determination of kcat at various temperatures eliminates this error in the ΔG* values estimated using Eq. (3).

\[ \Delta G^{*} = RT\,(23.76 + \ln T - \ln k_{cat}) \tag{9} \]
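For readers who want the intermediate step, Eq. (9) follows from Eq. (8) by taking natural logarithms and solving for the free energy. The numerical constant is ln(kB/h) evaluated with kB and h in SI units, which assumes that kcat is expressed in s⁻¹.

```latex
\begin{align*}
k_{cat} &= \frac{k_B T}{h}\,\exp\!\left(-\frac{\Delta G^{*}}{RT}\right) \\
\ln k_{cat} &= \ln\frac{k_B}{h} + \ln T - \frac{\Delta G^{*}}{RT} \\
\Delta G^{*} &= RT\left(\ln\frac{k_B}{h} + \ln T - \ln k_{cat}\right),
\qquad
\ln\frac{k_B}{h} = \ln\!\left(\frac{1.381\times 10^{-23}}{6.626\times 10^{-34}}\right) \approx 23.76
\end{align*}
```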

In order to see how sensitive the ΔG* values are to alterations in kcat, we examined the reactions catalyzed by porcine trypsin using BAPNA as the substrate. In the temperature range 35°C to 40°C, kcat values were around 182 s⁻¹; increasing this 10-fold, i.e., to 1820 s⁻¹, lowers the ΔG* value from 62.18 to 56.29 kJ/mol, a clear 5.9 kJ/mol decrease. It is worth pointing out that in many cases, such as the porcine trypsin mentioned here, the Km values remain within the same order of magnitude and do not vary dramatically in the temperature range 35°C to 70°C.
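The trypsin example can be checked with a short Python sketch of Eq. (9). The function name is hypothetical, and the temperature of 35°C (308.15 K) is an assumption, since the text quotes only a range. With these inputs the sketch gives values within about 0.1 kJ/mol of those quoted, and the 10-fold change in kcat lowers ΔG* by RT ln 10, about 5.9 kJ/mol.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def delta_g_activation(k_cat, temp_K):
    """Eq. (9): dG* = RT (23.76 + ln T - ln k_cat), k_cat in s^-1, result in J/mol."""
    return R * temp_K * (23.76 + math.log(temp_K) - math.log(k_cat))


temp_K = 308.15  # 35 °C, assumed; the text gives the range 35 to 40 °C
dg_low = delta_g_activation(182.0, temp_K)
dg_high = delta_g_activation(1820.0, temp_K)
print(f"k_cat = 182 s^-1:  dG* = {dg_low / 1000:.2f} kJ/mol")
print(f"k_cat = 1820 s^-1: dG* = {dg_high / 1000:.2f} kJ/mol")
print(f"decrease = {(dg_low - dg_high) / 1000:.2f} kJ/mol")
```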

3 ANALYSIS OF STRUCTURAL STABILITY

The primary sequence of a native protein folds into a stable three-dimensional conformation called the native conformation. There is always a balance between the conformation assumed and the catalytic activity achieved. Specific regions in a protein contribute to its structural and catalytic stabilities, and it is often difficult to correlate one with the other. To know the effect of mutation or chemical modification, it is essential to know the conformational stability of the native and denatured forms of the wild-type and mutant, or native and modified, protein. Since it is impossible to determine the absolute conformational stability, it is calculated as the difference in free energy for the native and the unfolded form relative to the transition state.

Protein unfolding is normally induced by pH, temperature, or chaotropic agents such as urea and guanidine hydrochloride. The technique chosen will depend on the property to be measured to follow the unfolding process. UV difference spectroscopy, fluorescence spectroscopy, and circular dichroism (CD) spectroscopy are the most useful techniques. Unfolding curves obtained from these techniques can be analyzed further to get the essential information about the denaturation (4).

3.1 The Mathematics of Protein Unfolding

A typical unfolding curve can be divided into three regions, with y the optical property measured to monitor the unfolding process:

(i) The pretransition region, which shows the variation of y for the native, folded protein with respect to the denaturant concentration, pH, and temperature, measured as yN.
(ii) The transition region, where y varies as the protein unfolds.
(iii) The posttransition region, which shows the variation of y for the denatured, unfolded protein, measured as yD.

We use equilibrium methods to examine the effect of modification on the relative energies of the stable native and unfolded forms. Measuring the conformational stability requires the determination of the equilibrium constant for the denaturation process. It is assumed that there exist at least two stable states, the native, folded state (N) and the denatured, unfolded state (D).

\[ N \rightleftharpoons D \tag{10} \]

Unfolding of many globular proteins has been found to approach a two-state folding mechanism as shown above. Determining whether protein unfolding is a single-step transition between two states or a multistep conformational transition with intermediate states necessitates the use of high-precision instrumentation. The smooth single-step transition involves subtransitions, detection of which is purely dependent on the sensitivity of measurement. At any point during unfolding, only the folded and unfolded conformations are present at significant concentrations. It can also be said that even if there are intermediates involved, they can be classified as native-like and denatured-like intermediates. Therefore

\[ f_N + f_D = 1 \tag{11} \]

where fN is the fraction of folded conformation and fD is the fraction unfolded. Values of y characteristic of the native state, yN, and of the denatured state, yD, can be obtained by extrapolation of the pretransition and posttransition regions, respectively. For a two-state mechanism, since fN + fD = 1,

\[ y = y_N f_N + y_D f_D \tag{12} \]

Combining Eqs. (11) and (12)

\[ f_D = \frac{y - y_N}{y_D - y_N} \tag{13} \]

\[ f_N = \frac{y_D - y}{y_D - y_N} \tag{14} \]

Thus the equilibrium constant, KD, and the free energy of unfolding, ΔGD, can be calculated using the following relations

\[ K_D = \frac{f_D}{f_N} = \frac{y - y_N}{y_D - y} \tag{15} \]

\[ \Delta G_D = -RT \ln K_D \tag{16} \]
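The conversion from a measured signal to ΔGD in Eqs. (13), (15), and (16) can be sketched as follows. The function names and the numerical signal values are hypothetical and serve only to illustrate the arithmetic; in practice yN and yD at a given denaturant concentration come from extrapolating the pretransition and posttransition baselines.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def fraction_unfolded(y, y_native, y_denatured):
    """Eq. (13): fD = (y - yN) / (yD - yN)."""
    return (y - y_native) / (y_denatured - y_native)


def unfolding_constant(y, y_native, y_denatured):
    """Eq. (15): KD = fD / fN = (y - yN) / (yD - y)."""
    return (y - y_native) / (y_denatured - y)


def unfolding_free_energy(y, y_native, y_denatured, temp_K):
    """Eq. (16): dGD = -RT ln KD, in J/mol."""
    return -R * temp_K * math.log(unfolding_constant(y, y_native, y_denatured))


# Hypothetical normalized signal: native baseline 1.0, denatured baseline 0.2.
y_n, y_d, temp_K = 1.0, 0.2, 298.15
for y in (0.92, 0.60, 0.28):
    f_d = fraction_unfolded(y, y_n, y_d)
    d_g = unfolding_free_energy(y, y_n, y_d, temp_K)
    print(f"y = {y:.2f}: fD = {f_d:.2f}, dGD = {d_g / 1000:.2f} kJ/mol")
```

Only points inside the transition region give usable values, because KD becomes ill-defined as y approaches either baseline.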

Three methods are currently used to estimate ΔGD(H2O), which is ΔGD at zero concentration of denaturant. They are Tanford's model, the denaturant binding model, and the linear extrapolation model (LEM), discussed in detail by Pace (4). The LEM is the simplest and the most widely used method. It is based on the linear dependence of ΔGD on the denaturant concentration in the transition region and assumes that this linearity continues to zero concentration of the denaturant. The data are fitted to the equation

\[ \Delta G_D = \Delta G_D^{H_2O} - m\,[\text{denaturant}] \tag{17} \]

where m is the measure of the dependence of ΔGD on denaturant concentration and is obtained from the slope of the plot of ΔGD vs. [D] (with the sign convention of Eq. (17), the slope equals −m). Also, at the midpoint of the unfolding curve

\[ [D]_{1/2} = \frac{\Delta G_D^{H_2O}}{m} \tag{18} \]
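A linear extrapolation of this kind can be sketched with an ordinary least-squares fit. The function names and the data points below are hypothetical (they are not taken from the chapter); the sketch fits Eq. (17) to transition-region values of ΔGD and then applies Eq. (18).

```python
def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    slope = (sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
             / sum((a - x_mean) ** 2 for a in x))
    return y_mean - slope * x_mean, slope


def lem_parameters(denaturant_M, delta_g):
    """Fit Eq. (17) and return (dG_H2O, m, D_half), using Eq. (18) for D_half."""
    intercept, slope = linear_fit(denaturant_M, delta_g)
    m = -slope  # Eq. (17) is written with a minus sign before m
    return intercept, m, intercept / m


# Hypothetical transition-region data: urea concentration (M) and dGD (kJ/mol).
urea_M = [3.0, 3.5, 4.0, 4.5, 5.0]
dg_kj = [5.1, 2.4, 0.1, -2.6, -4.9]
dg_h2o, m, d_half = lem_parameters(urea_M, dg_kj)
print(f"dG_H2O = {dg_h2o:.2f} kJ/mol, m = {m:.2f} kJ/(mol M), [D]1/2 = {d_half:.2f} M")
```

The extrapolated intercept is only as reliable as the assumption that the linearity seen in the transition region holds down to zero denaturant.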

where [D]1/2 is the denaturant concentration at which there is 50% unfolding. Also, at [D]1/2, the free energy ΔGD = 0. We are interested in determining the conformational stability of proteins with modified structures. Homologous series of proteins, mutants, and chemically modified proteins may be compared, as their structures may show minor variations. The difference in conformational stability, Δ(ΔG), can be determined either from the ΔGD(H2O) values of the proteins or from the [D]1/2 and m values as follows:

\[ \Delta(\Delta G) = \Delta[D]_{1/2} \times \frac{\sum m}{n} \quad \text{kcal/mol} \tag{19} \]

where Δ[D]1/2 is the difference in [D]1/2 values and Σm/n is the average value of m for the n proteins.
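As a small worked illustration of Eq. (19), the sketch below compares two proteins through their midpoints and the mean m value. The function name and the numbers are hypothetical, and m is assumed to be given in kcal mol⁻¹ M⁻¹ so that the result is in kcal/mol.

```python
def delta_delta_g(d_half_reference, d_half_modified, m_values):
    """Eq. (19): D(dG) = D[D]1/2 x (sum of m / n), in the energy unit of m."""
    mean_m = sum(m_values) / len(m_values)
    return (d_half_modified - d_half_reference) * mean_m


# Hypothetical midpoints (M) and m values (kcal mol^-1 M^-1) for two proteins.
ddg = delta_delta_g(4.0, 4.6, [1.2, 1.3])
print(f"D(dG) = {ddg:.2f} kcal/mol")
```

A positive result indicates that the second protein unfolds at a higher denaturant concentration than the reference.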

3.2 Techniques Used to Follow Unfolding

The intrinsic properties of the protein structure and of its amino acid residues are exploited as probes to follow the unfolding path. The change in molar extinction coefficient upon unfolding can be monitored by UV difference spectroscopy, selecting 287 nm for tyrosine and 292 nm for tryptophan as the absorption maxima. When the intrinsic fluorescence of tyrosine and tryptophan is used to trace unfolding, the protein is excited and the emission spectra are recorded with a spectrofluorophotometer in the presence of the denaturant of interest. Excitation at 281 nm gives both tyrosine and tryptophan fluorescence. However, when the excitation wavelength is 295 nm, only tryptophan is excited. The emission maximum is observed between 345 and 350 nm when the tryptophan residues are solvent-exposed, and between 320 and 330 nm when they are in the interior of a globular protein.

Both UV difference spectroscopy and fluorescence spectroscopy reflect changes in the tertiary structure governed by the aromatic side chains. With circular dichroism (CD) spectroscopy, however, both secondary and tertiary structural changes can be monitored. The peptide backbone of a globular protein forms the basic secondary structure as a result of protein folding. The three major secondary structures, α-helix, β-sheet, and random coil, change with denaturant concentration. The far-UV CD spectrum (190–250 nm) shows the changes in secondary structure (5). The ellipticity changes at 222 and 208–210 nm show the changes in helix, and those between 216 and 218 nm estimate the changes in β-sheet. The unordered or random coil shows a strong negative band around 200 nm. The near-UV CD spectrum (250–300 nm) measures the tertiary structural changes characterized by the aromatic side chains. Ellipticity changes at 270 and 296 nm show the changes in tertiary structure during unfolding. Noncoincidence of the transition curves obtained from the ellipticity changes in the far-UV (222 nm) and near-UV (270 and 296 nm) regions shows that at least one intermediate is involved.

The results obtained by the various spectroscopic techniques can be analyzed using the same two-state model and equations discussed earlier. It is also possible to analyze the thermal denaturation of proteins by CD. When analyzed similarly, it gives an additional parameter, Tm, the melting temperature. At Tm, ΔGD = 0. However, comparing thermal denaturation results can be difficult, because denaturation of proteins under identical conditions but monitored by different techniques, such as fluorescence or CD, need not produce results showing similar trends. For example, Tm values obtained through fluorescence show an increase for trypsins modified with a variety of modifiers, whereas the opposite trend is noticed for the Tm values of the samples in thermal denaturation studies carried out by CD.

4 ALTERNATIVES IN STRUCTURE MODIFICATION AND THEIR RELATIVE MERITS

Protein engineering, including the various approaches mentioned earlier, has been the most popular approach used in improving and preserving the catalytic activity of enzymes. However, protein structures may also be modiﬁed by in vitro covalent modiﬁcation using a variety of reagents (not necessarily composed of amino acids) of diﬀerent physicochemical properties such as aldehydes, sugars, carbohydrate polymers, and PEG derivatives to mention a few.

4.1 General Approaches to In Vitro Modification

Several factors are inherent to the final strategy employed in the in vitro covalent modification of enzymes:

a) Molecular size and subunit structure of the enzyme.
b) Availability of the primary sequence information and crystal structure data of the enzyme, which is ideal.
c) Knowledge of the interlysine and inter-COOH group distances calculated from the crystal data in (b), which is useful in selecting the modifier.
d) General information on the pH and temperature dependence, and any metal ion effects that deserve attention.
e) The molar ratio of enzyme to modifier used during modification.
f) The choice of the enzyme-to-modifier molar ratio, which is likely to influence the production of a heterogeneous mixture of enzyme adducts having different degrees of modification.

4.2 Consequences of Modification

Chemical modification of an enzyme could induce changes in conformation which, in turn, may affect the catalytic activity and structural stability. Product yield, homogeneity or heterogeneity of the products formed, the percentage of the targeted protein groups modified, and the percentage activity retained by the structurally modified protein are of primary concern before further characterization of the products is undertaken. The extent to which the catalytic and structural parameters are altered depends on the degree of modification, since chemical modification procedures are group-specific and not site-specific and therefore form a heterogeneous mixture of modified enzymes. In this context, the term percentage modification can be misleading because the same percentage modification can be produced by different sets of target groups. Two situations are possible: (1) activity may be the same whereas the percentage of the groups modified may be different, or (2) the catalytic activity retained may vary though the enzyme may have been modified to the same extent. One may or may not see a trend between the degree of modification and the activity retained.

A generic procedure has been developed to modify the structure of enzymes by in vitro covalent procedures (6–8). The approach takes into account the inherent properties of the various enzymes so that there is room to design the physicochemical properties of the modifier molecules. The primary concern is that, while modifying the enzymes, their specificity and, in broad terms, their native properties are retained while their thermotolerance and chemotolerance improve. Having achieved this, the aim is to characterize the various parameters connected with the efficient performance of the catalysts, including their thermotolerance and chemotolerance. This detailed analysis should indicate which parameters depend on the primary sequence of the enzymic protein and which are the sequence-independent factors that affect stability. In other words, how complex and predictable the molecular mechanism of stabilization is should become clearer in our analysis.

4.2.1 Effect of Product Heterogeneity

The number of specific groups modified in a protein during chemical modification can introduce a catalytic activity-independent heterogeneity. This kind of variation will be more pronounced when a relatively large polymeric modifier with multiple reactive sites is used. However, if all the reactive sites are not blocked simultaneously due to steric factors of the reactants (enzyme or polymeric modifier), the initial states of the available binding sites on the modifier molecule might influence the succeeding steps. The effective stoichiometry in reactions involving macromolecular reactants is subject to change depending on (a) the conformation of the macromolecules, (b) the orientation of the molecules, and (c) the relative reactivities of the potential reactive sites, which may begin to vary once the initial sites have reacted. Heterogeneously modified enzymes separated by gel filtration chromatography show varying degrees of catalytic and structural stability.

Chemical modification of proteins targeting specific amino acids can only be group-specific and not site-specific. For this reason, and since a given protein has only a finite number of the targeted amino acids, it is possible to modify the protein to the same extent (% modification) but at a different set of residues, e.g., a different set of ε-NH2 groups when one is targeting lysines. Under identical conditions, the extent of modification achieved may be reproducible, but the activities retained need not be equal and reproducible. In general, it has been found that high degrees of modification result in reduced activity of an otherwise more stable conformer. Under identical conditions, a product of predictable conformation can be obtained. Since solvents can also affect protein conformation, the choice of solvent composition during modification becomes another variable that can be exploited.

4.2.2 Michaelis Parameters and Their Significance

It is important to know the catalytic efficiency (kcat/Km) and the specific activity of a modified enzyme before more detailed characterization is carried out. Catalytic efficiency will also be affected by the molecular size of the substrates. Proteases are a good example with which to test this phenomenon, since these enzymes may be studied with small synthetic substrates such as ATEE or BAPNA or with natural substrates such as casein or BSA. Deepthi et al. (9) demonstrated how steric limitations to large substrates affect the Km, kcat, and kcat/Km of several proteases such as chymotrypsin, trypsin, and papain. Similarly, in the case of α-amylases, steric problems surface when starch is used as a substrate instead of the synthetic substrate p-nitrophenyl maltopentoside.
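Because kcat and Km are often reported in different units, a short sketch of the unit handling behind kcat/Km may help. The function name is hypothetical; the inputs are the native and P1 papain values reported in Table 1 of this chapter (kcat in min⁻¹, Km in mM), and the result is expressed in M⁻¹ s⁻¹.

```python
def catalytic_efficiency(k_cat_per_min, k_m_mM):
    """Return kcat/Km in M^-1 s^-1 from kcat in min^-1 and Km in mM."""
    k_cat_per_s = k_cat_per_min / 60.0   # min^-1 to s^-1
    k_m_M = k_m_mM / 1000.0              # mM to M
    return k_cat_per_s / k_m_M


native = catalytic_efficiency(3.82, 1.00)   # native papain, Table 1
p1 = catalytic_efficiency(4.01, 0.94)       # preparation P1, Table 1
print(f"native: {native:.1f} M^-1 s^-1, P1: {p1:.1f} M^-1 s^-1")
```

The values obtained this way agree with the kcat/Km column of Table 1 to within rounding.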

5 MODEL ENZYMES STUDIED

5.1 Papain

Rajalakshmi and Sundaram (6) carried out an independent study of papain covalently modified with an oxidized sucrose polymer (OSP 400) of molecular weight 400 kDa. Among the three preparations, made at enzyme-to-modifier molar ratios of 16:1 (P1), 4:1 (P2), and 2:1 (P3), P3 showed a decrease of 20% in specific activity. The catalytic efficiency of P3 also decreased the most. Table 1 shows the results for papain modified with OSP 400 at the different molar ratios: Km values do not change much, whereas kcat decreases as the proportion of OSP 400 in the molar ratio increases. Preparations containing more of the modifying sucrose polymer show reduced catalytic activity and hence a lower kcat/Km value. Thus P3 showed the lowest catalytic efficiency, although its catalytic stability was the highest.

Table 1  Specific Activity and Affinity Constants of Native and Modified Papain

Papain form   Specific activity (µmol pNA/mg protein/h)   Km (mM)       kcat (min⁻¹)   kcat/Km (M⁻¹ s⁻¹)
Native        1.76 ± 0.06                                 1.00 ± 0.02   3.82 ± 0.09    63.70
P1            1.82 ± 0.08                                 0.94 ± 0.08   4.01 ± 0.07    71.06
P2            1.70 ± 0.10                                 1.09 ± 0.06   3.52 ± 0.09    53.76
P3            1.76 ± 0.06                                 1.00 ± 0.03   2.98 ± 0.10    49.70

P1, P2, and P3 refer to PS-papain, modified in ratios of 16:1, 4:1, and 2:1, respectively. The Km and kcat values were determined from Lineweaver–Burk plots. The values given are the mean ± standard deviation from at least three independent determinations.

Topt and T50 are the parameters that describe the heat resistance of the enzyme. As may be seen in Table 2, both parameters increase after modification, indicating improved thermotolerance, with P3 the best of the preparations. In contrast to Topt and T50, the activation energy Ea increases only slightly for the modified enzymes. This correlates with the fact that the specific activity of the enzyme (Table 1) is lowered after modification.

Table 2  Temperature Optima and Stability of Native and Modified Papain

              Unheated                     Preheated                    T50 (°C)
Papain type   Topt (°C)     Ea (kJ/mol)    Topt (°C)     Ea (kJ/mol)    Without urea   With 8 M urea
Native        62.6 ± 0.21   33.4 ± 0.38    59.8 ± 0.18   33.3 ± 0.18    68             29
P1            72.4 ± 0.41   32.5 ± 0.32    66.0 ± 0.25   32.0 ± 0.27    75             53
P2            71.2 ± 0.16   33.4 ± 0.22    70.6 ± 0.12   32.2 ± 0.19    74             56
P3            73.6 ± 0.10   36.3 ± 0.25    71.2 ± 0.12   35.1 ± 0.22    78             60

Topt: optimum temperature for activity; Ea: energy of activation for thermal activation. All Topt and Ea values are given as mean ± SD from at least four independent determinations. T50 is the temperature at which 50% of the initial activity is retained; values represent the average of at least four independent assays, and standard errors in T50 values are not more than ~1°C.

The slopes of the lines obtained by plotting log % residual activity (RA) of papain against time at different temperatures (Fig. 1) give ki, the rate constant of the inactivation reaction. Plotting log ki against 1/T (T in K) yields Eai (Fig. 2), the activation energy of inactivation for native and modified papain, by using Eq. (2).

Figure 1  First-order plots of thermal inactivation of papain: (A) native, (B) P1, (C) P2, and (D) P3, at 37, 50, 60, 70, 80, and 90°C. P1, P2, and P3 are modified preparations of papain.

Figure 2  Arrhenius plots of thermal inactivation for native papain and modified papain P1, P2, and P3. Slopes of the plots in Fig. 1 yield ki, the first-order inactivation rate constants.

Table 3 contains values of the stabilization factor (SF) of the three papain adducts P1, P2, and P3 in the temperature range 60°C to 90°C. For P1, the SF value remains around 2.5 from 60°C to 90°C, whereas for P2 and P3, maximum stabilization is obtained at 70°C.

Table 3  Temperature-Dependent Stabilization Factor (SF) of Papain Modified with OSP 400

Papain type   60°C    70°C    80°C   90°C
P1            2.15    3.00    2.5    2.5
P2            4.66    11.00   2.68   1.43
P3            20.50   32.00   5.00   2.14

SF = t1/2 (modified)/t1/2 (native). Molar ratios of enzyme to OSP 400 are 16:1 (P1), 4:1 (P2), and 2:1 (P3).

Table 4 contains the data on t1/2 and the kinetic activation parameters of native and modified papains. The values for t1/2 are calculated from Eq. (1) (t1/2 = ln 2/ki), and their interpretation loses significance when the half-life of a modified enzyme such as P3 is very large, especially at lower temperatures. A more adequate measure of stability is obtained from Eai, the activation energy of inactivation [see Eq. (2), which uses the Arrhenius equation]. A larger value of Eai implies that more energy is required to inactivate the protein. The t1/2 values of both the native enzyme and the modified enzymes P1, P2, and P3 (Table 4) drop steeply as the temperature increases above 60°C. The half-life of the OSP 400 papain preparations (P1 to P3) shows a vast improvement depending on the amount of OSP used. Thus the t1/2 values were 2- to 20-fold higher at 60°C, 3- to 30-fold higher at 70°C, 4- to 8-fold higher at 80°C, and 2-fold higher at 90°C.

The free energy of inactivation ΔG* and the corresponding enthalpy (ΔH*) and entropy (ΔS*) values calculated from Eqs. (3)–(5) are also found in Table 4. These values did not change markedly in the temperature range 60°C to 90°C, although, compared to the native enzyme, the adducts showed larger changes in ΔG*, ΔH*, and ΔS*. ΔG* for native papain ranged between 105.1 and 114 kJ/mol, its average enthalpy was 162 kJ/mol, and the entropy values were between 144.5 and 156.5 J mol⁻¹ K⁻¹. P1, which has the lowest content of OSP, showed a decrease of 17 J mol⁻¹ K⁻¹ in its ΔS* and a decrease in ΔH* of 3.3 kJ/mol relative to the native enzyme, although its ΔG* was higher than that of the native by 2.1 to 3.1 kJ/mol. P2 and P3 showed an increase in ΔS*, ΔH*, and ΔG*. Δ(ΔG*) for all the modified papains was highest at 70°C, which indicates maximum stability at this temperature.

Table 4  Half-Life and Kinetic Activation Parameters for Thermal Inactivation of Native and Modified Papains

                       Temperature (°C)
                       60             70             80             90
(A) Native papain
t1/2 (h)               48.00 ± 1.1    6.00 ± 0.20    1.60 ± 0.06    0.28 ± 0.02
ΔG* (kJ/mol)           114.0 ± 0.20   110.1 ± 0.08   108.3 ± 0.02   105.1 ± 0.02
ΔH* (kJ/mol)           162.1 ± 0.22   162.1 ± 0.22   161.9 ± 0.22   161.9 ± 0.22
ΔS* (J mol⁻¹ K⁻¹)      144.5 ± 0.30   151.5 ± 0.25   151.8 ± 0.28   156.5 ± 0.28

(B) Papain P1
t1/2 (h)               103.00 ± 3     18 ± 0.9       4 ± 0.1        0.7 ± 0.04
Δt1/2 (a)              55.00 ± 1.90   12 ± 0.70      2.4 ± 0.04     0.47 ± 0.02
ΔG* (kJ/mol)           116.1 ± 0.15   113.2 ± 0.13   111.1 ± 0.17   107.8 ± 0.21
Δ(ΔG*) (a)             2.1 ± 0.05     3.1 ± 0.05     2.8 ± 0.15     2.7 ± 0.19
ΔH* (kJ/mol)           158.8 ± 0.21   158.7 ± 0.21   158.6 ± 0.21   158.5 ± 0.21
ΔS* (J mol⁻¹ K⁻¹)      128.2 ± 0.09   132.7 ± 0.12   134.6 ± 0.06   140.0 ± 0.0

(C) Papain P2
t1/2 (h)               224 ± 2.5      66 ± 1.1       4.3 ± 0.2      0.4 ± 0.04
Δt1/2 (a)              176 ± 1.40     60 ± 0.90      2.7 ± 0.14     0.12 ± 0.00
ΔG* (kJ/mol)           118.2 ± 0.15   118.0 ± 0.18   111.3 ± 0.13   106.2 ± 0.23
Δ(ΔG*) (a)             4.2 ± 0.05     7.9 ± 0.10     3.0 ± 0.11     1.1 ± 0.21
ΔH* (kJ/mol)           179.6 ± 0.24   179.5 ± 0.24   179.4 ± 0.24   179.3 ± 0.24
ΔS* (J mol⁻¹ K⁻¹)      184.1 ± 0.13   179.3 ± 0.09   192.9 ± 0.16   201.4 ± 0.15

(D) Papain P3
t1/2 (h)               988 ± 4        192 ± 3        8 ± 0.6        0.6 ± 0.05
Δt1/2 (a)              940 ± 2.90     186 ± 2.80     6.4 ± 0.54     0.32 ± 0.02
ΔG* (kJ/mol)           122.3 ± 0.18   120.1 ± 0.17   113.1 ± 0.19   107.4 ± 0.27
Δ(ΔG*) (a)             8.3 ± 0.02     10.0 ± 0.09    4.8 ± 0.17     2.3 ± 0.25
ΔH* (kJ/mol)           236.6 ± 0.27   240.4 ± 0.27   240.4 ± 0.27   240.3 ± 0.27
ΔS* (J mol⁻¹ K⁻¹)      342.6 ± 0.14   351.0 ± 0.15   360.6 ± 0.12   366.1 ± 0.0

(a) Difference between modified and native papain. Standard deviations represent deviations in values calculated from the maximum and minimum values of the first-order rate constants.
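The stabilization factors in Table 3 follow directly from the half-lives in Table 4 through SF = t1/2 (modified)/t1/2 (native). The sketch below recomputes them; the dictionary layout and function name are hypothetical, and only the mean half-lives from Table 4 are used (the quoted uncertainties are ignored).

```python
# Mean half-lives in hours from Table 4, keyed by papain form and temperature (°C).
HALF_LIFE_H = {
    "Native": {60: 48.0, 70: 6.0, 80: 1.6, 90: 0.28},
    "P1": {60: 103.0, 70: 18.0, 80: 4.0, 90: 0.7},
    "P2": {60: 224.0, 70: 66.0, 80: 4.3, 90: 0.4},
    "P3": {60: 988.0, 70: 192.0, 80: 8.0, 90: 0.6},
}


def stabilization_factor(form, temp_C):
    """SF = t1/2 of the modified form divided by t1/2 of the native enzyme."""
    return HALF_LIFE_H[form][temp_C] / HALF_LIFE_H["Native"][temp_C]


for form in ("P1", "P2", "P3"):
    row = ", ".join(f"{t} °C: {stabilization_factor(form, t):.2f}"
                    for t in (60, 70, 80, 90))
    print(f"{form}: {row}")
```

Small differences from the tabulated SF values can arise from rounding of the half-lives.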

5.1.1 Urea Denaturation

The urea denaturation pattern of papain before and after modification is rather unusual. The native enzyme and sample P1, which has the least OSP relative to the enzymic protein (E/M = 16:1), are inhibited by urea (0 to 8 M), whereas P2 and P3, which are made with enzyme-to-modifier molar ratios of 4:1 and 2:1, show an increase in activity of 120% and 170%, respectively, in the initial 4 h, after which a loss in activity was observed. This initial increase in activity is attributed to the active site becoming more flexible in urea. Further rupture of the H-bonds leads to denaturation and a gradual loss of activity. t1/2 (the time for 50% inactivation on exposure to urea) was 33 min for the native enzyme, which increased to 130 min for P1 due to this effect. t1/2 could not be calculated for P2 and P3 because of the initial activation observed.

5.2 Chymotrypsin

Similar to papain, bovine a-chymotrypsin was also modiﬁed by OSP 70, OSP 400, CMC (12 kDa), and Dextran (73 and 250 kDa), all of them by the reductive alkylation procedure. Table 5 shows the data on residual activities of native and modiﬁed chymotrypsin after heat, urea, and SDS treatment. The pH optimum for this enzyme was virtually the same after modiﬁcation. The Km value for the CMC adduct moves up from 0.49 to 0.66 mM. The kcat/Km for native enzyme was 122.44 M1 S1 and was 81.82 M1 S1 for the CMC adduct. 5.2.1

Activity Against BSA

Using BSA as natural substrate (molecular weight taken as 64 000, extinction coeﬃcient after total digestion as 6000, and setting the native enzyme activity as 100%), the following were the percentage activities shown by the various modiﬁed chymotrypsin adducts: native (100%) > CMC-C (80%)>OSP 400-C (35%)>Dextran-C (28.9%)>OSP 70-C (26.2%). The kcat/Km value for CMC-C was the highest at 6.00104 M1 S1 as against 7.2104 M1 S1 for the native enzyme, while other modiﬁed products ranged between 2.04104 and 2.2104 M1 S1. Topt values increased by only 0.5jC to 10jC from 56.5jC for the native chymotrypsin. Table 5 contains data on the residual activities (% RA) of chymotrypsin before and after modiﬁcation, the extent of modiﬁcation, T50 and

648

Sundaram and Srimathi

Table 5 Residual Activities of Native and Modiﬁed a-Chymotrypsin After Heat, Urea, and SDS Treatment Type of enzyme OSP 400-C OSP 70-C

CMC-C Dextran 250-C Native-C

Enzyme/polymer molar ratio

Modiﬁcation (%)

% RAa

T50 (jC)b

U50 (M)c

% RA in SDSd

1:0.5 M1 1:0.5 M2 1:0.5 O1 1:1.0 O2 1:2.0 O3 1:333 C1 1:500 C2 1:0.2 D1 1:0.5 D2 1:0.0

58 80 60 67 75 82 90 90 98 0.00

64 55 75 68 65 80 77 53 50 100

64 63 60 61 61 58 57 55 54 50

8.6 – – – 7.9 4.5 – 4.9 – 3.6

100 – – – 100 45 – 82 – 5.6

a % RA refers to the residual activity remaining after modification as compared with the native enzyme whose activity is taken as 100%. b T50 is the temperature where 50% of the initial activity is retained. c U50 is the concentration of the urea at which 50% initial activity is retained. d Incubation with 0.3% SDS for 15 min.

U50 values (the temperature and urea concentration at which 50% of the initial activity is retained), and % RA after SDS treatment. The four modifiers gave between 58% and 98% modification. It is worth noting how the extent of modification affects % RA, T50 (°C), U50 (M), and the denaturing effect of SDS. With OSP 400, the 58% modified adduct M1 is more active and more stable in all respects than M2, which is 80% modified. A similar trend is seen with O1, O2, and O3, the three OSP 70-modified enzymes. An increase of 4°C to 14°C in the T50 values of the modified enzymes indicates their improved thermotolerance, with M1, the OSP 400-modified enzyme, being the best of the lot. Varying the molar ratio of enzyme to polymer does not change T50 noticeably, although other parameters appear to change. Table 6 contains t1/2, Eai, and other thermal inactivation parameters such as ΔG‡, ΔH‡, and ΔS‡ for native and variously modified chymotrypsins. Although the t1/2 values of both native and modified enzymes decreased steeply with increasing temperature, the OSP 400-modified enzyme showed a 60- to 90-fold higher t1/2 than the native enzyme, whereas the dextran-, CMC-, and OSP 70-modified enzymes showed 8-, 32-, and 80-fold higher values, respectively. Correspondingly, the Eai values also rose, indicating stabilization. The ΔG‡ values of the modified enzymes rise by 2.3 to 12.4 kJ/mol.
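T50 and U50 are both defined as the condition at which 50% of the initial activity remains, so either can be read off a measured inactivation curve by interpolation. The sketch below shows one simple way to do this, assuming linear interpolation between neighbouring data points; the function name `midpoint_50` and both data sets are hypothetical illustrations, not measurements from this chapter.

```python
def midpoint_50(x, residual_activity, threshold=50.0):
    """Return the x value (temperature or urea molarity) at which the
    residual activity crosses `threshold`, by linear interpolation
    between the two neighbouring points that bracket it."""
    points = list(zip(x, residual_activity))
    for (x0, a0), (x1, a1) in zip(points, points[1:]):
        if a0 >= threshold >= a1 and a0 != a1:
            return x0 + (a0 - threshold) * (x1 - x0) / (a0 - a1)
    raise ValueError("residual activity never crosses the threshold")


# Hypothetical heat-inactivation curve: temperature (°C) vs. % residual activity
temps_c = [40, 50, 60, 70]
heat_activity = [100, 90, 40, 5]
t50 = midpoint_50(temps_c, heat_activity)

# Hypothetical urea-inactivation curve: urea (M) vs. % residual activity
urea_m = [0, 2, 4, 6, 8]
urea_activity = [100, 95, 70, 30, 10]
u50 = midpoint_50(urea_m, urea_activity)

print(f"T50 = {t50:.1f} °C, U50 = {u50:.1f} M")
```

A fitted sigmoid would be a reasonable alternative when the data are noisy; linear interpolation is used here only because it makes the definition explicit.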


Table 6 Half-Life and Kinetic Activation Parameters for the First Phase (k1) of the Thermal Inactivation of Native and Modified Chymotrypsin

| α-Chymotrypsin (α-CT) | t1/2 (h) | Eai (kJ/mol) | ΔG‡ (kJ/mol) | ΔH‡ (kJ/mol) | ΔS‡ (J mol⁻¹ K⁻¹) |
|---|---|---|---|---|---|
| Native α-CT | 5.5 ± 0.115 to 0.033 ± 0.001 | 183.3 ± 0.42 | 106.9 ± 0.213 to 97.6 ± 0.02 | 182.4 ± 0.255 to 182.3 ± 0.255 | 233.7 ± 0.514 to 250.52 ± 0.44 |
| OSP 400-α-CT (M1) | 362.0 ± 5.43 to 1.3 ± 0.037 | 338.0 ± 0.9 | 118.11 ± 0.15 to 108.1 ± 0.21 | 337.5 ± 0.5 to 336.74 ± 0.49 | 677 ± 0.54 to 676.46 ± 0.2 |
| OSP 70-α-CTᵃ (O3) | 160 ± 1.92 to 0.9 ± 0.027 | 302 ± 0.7 | 115.08 ± 0.15 to 107 ± 0.32 | 300 ± 0.54 to 299.7 ± 0.54 | 581.2 ± 0.46 to 570 ± 0.42 |
| CMC-α-CT (C1) | 87.5 ± 1.31 to 0.4 ± 0.01 | 285.0 ± 0.6 | 114.32 ± 0.16 to 104.35 ± 0.22 | 284.44 ± 0.43 to 284.31 ± 0.43 | 527 ± 0.38 to 532.42 ± 0.43 |
| Dextran-α-CT (D1) | 9 ± 0.19 to 0.086 ± 0 | 227.6 ± 0.4 | 109.2 ± 0.19 to 100.3 ± 0.134 | 227 ± 0.32 to 226.9 ± 0.32 | 365 ± 0.36 to 375 ± 0.28 |

Experimental data collected at 50–65°C. The values are given as ±SD from triplicates. Ea is the activation energy of inactivation. It is obtained by plotting log ki, the first-order inactivation constant, against the reciprocal of temperature as per the Arrhenius equation; ki is obtained from the slopes of plots of log % residual activity against time in hours. The enzymes in assay buffer were incubated at various temperatures. At regular intervals, aliquots were removed to measure the residual activity, expressed relative to that of the unheated control.
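The two-step procedure described in the footnote to Table 6 (first ki from the decay of residual activity with time, then Ea from an Arrhenius plot of ki) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' analysis: the function names are hypothetical, the decay curves are generated from an assumed Ea of 183.3 kJ/mol and an assumed ki of 0.126 h⁻¹ at 50°C (a half-life of about 5.5 h), and natural logarithms are used, so slopes taken from log10 plots would have to be multiplied by 2.303 first.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def linear_fit(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def inactivation_constant(times_h, residual_pct):
    """First-order ki (h^-1) from the slope of ln(% residual activity) vs. time."""
    slope, _ = linear_fit(times_h, [math.log(a) for a in residual_pct])
    return -slope


def activation_energy(temps_k, k_values):
    """Ea (J/mol) from the slope of ln(ki) vs. 1/T (Arrhenius plot)."""
    slope, _ = linear_fit([1.0 / t for t in temps_k],
                          [math.log(k) for k in k_values])
    return -slope * R


# Synthetic decay curves generated from assumed "true" parameters
EA_TRUE = 183_300.0           # J/mol, assumed for the illustration
K_REF, T_REF = 0.126, 323.15  # h^-1 at 50 °C, assumed
temps_k = [323.15, 328.15, 333.15, 338.15]
times_h = [0.0, 0.5, 1.0, 1.5, 2.0]

k_fitted = []
for temp in temps_k:
    k_true = K_REF * math.exp(-EA_TRUE / R * (1.0 / temp - 1.0 / T_REF))
    residual = [100.0 * math.exp(-k_true * t) for t in times_h]
    k_fitted.append(inactivation_constant(times_h, residual))

ea_kj = activation_energy(temps_k, k_fitted) / 1000.0
half_life_50c = math.log(2) / k_fitted[0]  # t1/2 = ln 2 / ki
print(f"Ea = {ea_kj:.1f} kJ/mol, t1/2 at 50 °C = {half_life_50c:.2f} h")
```

Because the synthetic data are noise-free, the fit returns the assumed Ea; with real triplicate data the regression would also yield the scatter reported as ±SD.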

Similarly, Table 5 shows that U50, the concentration of urea at which 50% of the initial activity is retained, also increases noticeably for OSP-modified chymotrypsin. Fluorescence spectra of the native and modified enzymes, measured at several temperatures between 30°C and 70°C, revealed that native chymotrypsin is denatured by heat at 60°C, resulting in unfolding of the enzyme. This is borne out by a distinct redshift in the wavelength maximum and a corresponding decrease in the fluorescence intensity (Fig. 3A). Under similar conditions, an OSP 70-modified enzyme (preparation O3 in Table 5) showed a smaller redshift and a smaller loss of fluorescence intensity (Fig. 3B). These experiments show that the modified enzyme retains its conformational stability as well as its catalytic activity.

5.3 β-Glucosidase

β-Glucosidase from sweet almonds modified by conjugation with oxidized sucrose polymer (OSP), carboxymethylated sucrose polymer (CMOSP), or CM cellulose (CMC) showed the effect of structural changes mainly seen in


Figure 3 Fluorescence emission spectra of (A) native and (B) OSP 70-modified (O3 in Table 5) chymotrypsin. Enzyme samples were incubated for 1 h at 25°C (●) and 60°C (○) before their fluorescence emission spectra were recorded.


improved thermotolerance, with a 2- to 6-fold increase in t1/2 at 50°C (the temperature range tested was 40°C to 70°C; Table 7). The free energy of thermoinactivation (ΔG‡) increased by 4.97 kJ/mol for the CMC-modified enzyme and by 7.5 kJ/mol for a sucrose polymer-modified enzyme (9). CMC-modified β-glucosidase shows a 3.3-fold increase in thermostability (preparation C in Table 8A), whereas OSP modification enhances thermotolerance 18.9-fold (preparation B in Table 8B). Ea, the activation energy, decreased from 4.89 kcal/mol for the native enzyme to 1.46 kcal/mol for the CMC-enzyme and to 4.31 and 3.64 kcal/mol for the OSP-enzyme and the CMOSP-enzyme, respectively. This indicates that the CMC-modified enzyme is the most efficient catalyst in this group. Table 7 contains the data on t1/2, ΔG‡, ΔH‡, and ΔS‡ of the modified β-glucosidases; details are given in the footnotes. The molar ratio of enzyme to modifier (E/M) may be varied to obtain optimum stabilization. Table 8A shows that the Δ(ΔG‡) value changes with the E/M ratio. OSP-modified β-glucosidase at an E/M of 1:0.89 gave the best results, with Δ(ΔG‡) varying between 7.5 and 5.89 kJ/mol in the temperature range 55°C to 70°C. The stabilization factor (SF) is likewise the highest for the same adduct (Table 8).

5.3.1 Stabilization in Nonaqueous Media

Most of the chemically modified enzymes that we have studied tolerate high concentrations of polar solvents. This finding is very useful when enzymes are considered for the synthesis of esters, peptides, or carbohydrate polymers. Here we discuss the effect of solvents on β-glucosidase activity. The increased negative charge on the enzyme adduct, as in ECMC or ECMOSP, could affect the polarity and the dipole moment of the enzyme, which might either stabilize or destabilize the enzyme in water-miscible solvents (8). The stabilities of the native enzyme (E) and the CMC-modified adduct (ECMC) in 60% (v/v) solvents such as acetone, CH3CN, dioxane, DMF, DMSO, and ethanol were compared (Fig. 4). ECMC was more stable than E in all cases except dioxane, in the following order: DMF > DMSO = CH3CN > acetone > ethanol; the order of the dipole moments of the pure solvents is DMSO = CH3CN > DMF > acetone > ethanol. Pure dioxane has a dipole moment of zero. This suggests that DMF is ideal for the stabilization of ECMC, and that a shift in either direction reduces the extent of stabilization. An important finding was that in all these solvents, the modified enzyme was more stable than the native enzyme. EOSP and


Table 7 Half-Life (t1/2) and Kinetic Activation Parameters for Thermal Inactivation of Native and Modified β-Glucosidase

| Enzyme | Parameter | 303 K | 323 K | 333 K | 343 K |
|---|---|---|---|---|---|
| Native | t1/2 (h) | 56.9 | 12 | 4.25 | 0.48 |
| Native | ΔG‡ (kJ/mol) | – | 108.7 | 109.15 | 106.69 |
| Native | ΔH‡ (kJ/mol) | – | 142 | 142.92 | 142.83 |
| Native | ΔS‡ (J mol⁻¹ K⁻¹) | – | 106.2 | 101.41 | 105.62 |
| ECMC | t1/2 (h) | ND | 74.5 | 6 | 1.03 |
| ECMC | ΔG‡ (kJ/mol) | – | 113.67 | 110.43 | 109.1 |
| ECMC | Δ(ΔG‡)ᵃ | – | 4.97 | 1.26 | 2.41 |
| ECMC | ΔH‡ (kJ/mol) | – | 187.53 | 187.44 | 187.36 |
| ECMC | ΔS‡ (J mol⁻¹ K⁻¹) | – | 229.1 | 231.26 | 228.16 |
| EPS | t1/2 (h) | 75.98 | 27.13 | 5.36 | 0.6 |
| EPS | ΔG‡ (kJ/mol) | – | 111.04 | 110.09 | 107.27 |
| EPS | Δ(ΔG‡)ᵃ | – | 2.34 | 0.94 | 0.56 |
| EPS | ΔH‡ (kJ/mol) | – | 166.27 | 166.19 | 166.1 |
| EPS | ΔS‡ (J mol⁻¹ K⁻¹) | – | 170.99 | 168.46 | 171.51 |
| ECMOSP | t1/2 (h) | 83.9 | 22.2 | 5.6 | 0.57 |
| ECMOSP | ΔG‡ (kJ/mol) | – | 110.51 | 110.19 | 107.12 |
| ECMOSP | Δ(ΔG‡)ᵃ | – | 1.81 | 1.04 | 0.43 |
| ECMOSP | ΔH‡ (kJ/mol) | – | 166.27 | 166.19 | 166.10 |
| ECMOSP | ΔS‡ (J mol⁻¹ K⁻¹) | – | 172.63 | 168.16 | 171.95 |

ND: not determined. The experiments were done in triplicate and rate measurement data varied within ±2%. The data presented are averages of the triplicates. All the parameters, i.e., Eai, t1/2, ΔG‡, ΔH‡, and ΔS‡, are obtained from the ki values determined in the temperature-dependent inactivation measurements using Eqs. (2)–(4). After regression analysis, the correlation coefficient of the data was found to range between 0.9837 and 0.9967. Ea for the native enzyme (E) was 4.89 kcal/mol and for ECMC, EPS, and ECMOSP, the values were 1.46, 4.31, and 3.64 kcal/mol, respectively. Thus the magnitude of ΔEa (Ea M − Ea N) was 3.43, 0.58, and 1.25 kcal/mol for ECMC, EPS, and ECMOSP, respectively, where M and N denote modified and native enzymes. The half-life (t1/2) of the enzyme at different temperatures was estimated by incubation at 30°C, 40°C, 50°C, 60°C, and 70°C. Aliquots were removed at definite time intervals over a period of 2 h and assayed for activity at 28°C. The molar ratio (enzyme/modifier) for making ECMC is 1:16, and for EPS and ECMOSP it is 1:0.5 and 1:1, respectively.
ᵃ Difference in ΔG‡ between native and modified enzyme.
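Eqs. (2)–(4) are not reproduced in this section, so the following is only a sketch of how entries such as those in Table 7 can be obtained from half-lives, assuming that the free energy of inactivation follows the Eyring relation ΔG‡ = RT ln(kB·T / (h·ki)) with ki = ln 2 / t1/2 and a transmission coefficient of 1. Under that assumption the difference between modified and native enzyme reduces to Δ(ΔG‡) = RT ln(t1/2,M / t1/2,N). The function names are illustrative.

```python
import math

R = 8.314462618     # gas constant, J mol^-1 K^-1
K_B = 1.380649e-23  # Boltzmann constant, J K^-1
H = 6.62607015e-34  # Planck constant, J s


def delta_g_inactivation(t_half_h, temp_k):
    """ΔG‡ of inactivation (kJ/mol) from a half-life in hours (Eyring, assumed)."""
    k = math.log(2) / (t_half_h * 3600.0)  # first-order constant in s^-1
    return R * temp_k * math.log(K_B * temp_k / (H * k)) / 1000.0


def delta_delta_g(t_half_mod_h, t_half_nat_h, temp_k):
    """Δ(ΔG‡) (kJ/mol) between modified and native enzyme at one temperature."""
    return R * temp_k * math.log(t_half_mod_h / t_half_nat_h) / 1000.0


# Half-lives at 323 K taken from Table 7: native 12 h, ECMC 74.5 h
native_323 = delta_g_inactivation(12.0, 323.0)
ecmc_323 = delta_g_inactivation(74.5, 323.0)
ddg_323 = delta_delta_g(74.5, 12.0, 323.0)

print(f"native: {native_323:.2f} kJ/mol, ECMC: {ecmc_323:.2f} kJ/mol, "
      f"difference: {ddg_323:.2f} kJ/mol")
```

With these inputs the sketch gives roughly 109 kJ/mol for the native enzyme and a difference of about 4.9 kJ/mol, close to the tabulated 108.7 and 4.97 kJ/mol; the small offsets suggest that the tabulated values were computed from the fitted ki rather than from the rounded half-lives.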


Table 8 (A) Standard Free Energy Changes Δ(ΔG‡) and (B) Stabilization Factor (SF) of Modified β-Glucosidase

| E:M ratio | Δ(ΔG‡) (kJ/mol) 55°C | Δ(ΔG‡) 60°C | Δ(ΔG‡) 65°C | Δ(ΔG‡) 70°C | SF 55°C | SF 60°C | SF 65°C | SF 70°C |
|---|---|---|---|---|---|---|---|---|
| (A) CMC-β-glucosidase | | | | | | | | |
| 1:0.112 | 2.47 | 2.12 | 1.38 | 2.05 | 1.81 | 2.16 | 1.64 | 2.15 |
| 1:0.561 | 3.25 | 3.39 | 2.02 | 2.52 | 2.28 | 3.42 | 2.07 | 2.51 |
| 1:1.01 | 3.79 | 3.69 | 2.8 | 2.89 | 2.78 | 3.2 | 2.72 | 2.87 |
| (B) OSP-β-glucosidase | | | | | | | | |
| 1:0.112 | 4.46 | 2.76 | 3.62 | 3.31 | 5.12 | 2.69 | 4.04 | 3.21 |
| 1:0.89 | 7.5 | 6.07 | 4.44 | 5.89 | 18.3 | 10.8 | 5.09 | 9.6 |
| 1:1.6 | 6.07 | 4.69 | 4.33 | 4.36 | 11.25 | 6.58 | 5.65 | 5.7 |

ΔG‡ values are obtained using Eq. (2). M and N denote modified and native enzyme, respectively. Experiments were done in triplicate and averages were taken. Correlation coefficients after regression analysis for the values of Δ(ΔG‡) for the three CMC preparations were around 0.9941 to 0.9988 and were in the range 0.9537 to 0.9953 for the OSP β-glucosidase. SF is t1/2M/t1/2N.

ECMOSP were less stable than ECMC, although ECMOSP showed slightly better activity than EOSP in these solvents (Table 9).

5.4 Subtilisin A

Subtilisin A (subtilisin Carlsberg, isolated from Bacillus licheniformis and supplied by Novozymes A/S), an alkaline protease, showed no improvement in stability upon cross-linking with monoglutaraldehyde (MGA). In fact, the enzyme became a poorer catalyst, displaying lower values of Topt and t1/2 and a dramatic 4-fold increase in Ea. The reaction with MGA must have altered the structure drastically, because Km decreases from 0.282 mM for the native enzyme to 0.005 mM for MGA-subtilisin. kcat decreases from 7.76 to 0.137 s⁻¹ for the modified enzyme, so the net result is that the catalytic efficiency of the enzyme, kcat/Km, remains unaltered at 27.5 mM⁻¹ s⁻¹. These data imply that the enzyme with the modified structure binds the substrate too strongly and does not release the product readily enough. However, modification with OSP 400 produced an enzyme that retained 85% of the original activity with casein as the substrate. This reduction in activity could be due to steric hindrance toward the macromolecular substrate created by the enzyme already being attached to OSP


Figure 4 Effect of water-miscible organic solvents on β-glucosidase activity. E and ECMC were incubated for 24 h in 60% (v/v) solvents at room temperature (28–30°C). An aliquot of enzyme was assayed for activity in buffer.

Table 9 Correlation Between Solvent Polarity and Stability of β-Glucosidase After Modification

| Solvent, 60% (v/v) | Dipole moment of pure solvent (D) | Relative activity (modified/native): ECMC | ECMOSP | EPS |
|---|---|---|---|---|
| Dioxane | 0 | 0.8 | 0.56 | 0.59 |
| Ethanol | 1.69 | 1.0 | 0.78 | 0.69 |
| Acetone | 2.88 | 1.2 | 0.89 | 0.64 |
| DMF | 3.86 | 1.59 | 0.84 | 0.85 |
| DMSO | 3.90 | 1.35 | 1.2 | 1.1 |
| Acetonitrile | 3.92 | 1.33 | 1.88 | 1.4 |

% Residual activity was calculated by comparing the absolute activity in solvent with that without solvent after 24-h incubation at room temperature for each enzyme. The relative activity is a comparison of the residual activities of the modified enzyme with the native enzyme in each solvent.
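The comparison drawn in the text between the stability order of ECMC and the dipole-moment order of the solvents can be reproduced directly from Table 9. The short sketch below does this, assuming the column assignment read from the table (dipole moment, then the ECMC relative activity); the dictionary and variable names are illustrative only.

```python
# (dipole moment of pure solvent, relative activity of ECMC vs. native),
# transcribed from Table 9
table9 = {
    "dioxane": (0.00, 0.80),
    "ethanol": (1.69, 1.00),
    "acetone": (2.88, 1.20),
    "DMF": (3.86, 1.59),
    "DMSO": (3.90, 1.35),
    "acetonitrile": (3.92, 1.33),
}

# Solvents ranked by how much the CMC adduct outperforms the native enzyme
by_activity = sorted(table9, key=lambda s: table9[s][1], reverse=True)

# Solvents ranked by dipole moment of the pure solvent
by_dipole = sorted(table9, key=lambda s: table9[s][0], reverse=True)

# Solvents in which the adduct is strictly more stable than the native enzyme
stabilized = [s for s in table9 if table9[s][1] > 1.0]

print("by relative activity:", " > ".join(by_activity))
print("by dipole moment:    ", " > ".join(by_dipole))
print("ECMC more stable in: ", ", ".join(stabilized))
```

The two rankings agree except for the position of DMF, which heads the activity ranking while lying third in dipole moment; this is the basis for the statement that a shift in either direction from DMF reduces the stabilization.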


Table 10 Stabilization Factor (SF) for OSP 400-Modified Subtilisin A

| Temperature (°C) | t1/2 (h), modified | t1/2 (h), native | SF |
|---|---|---|---|
| 50 | 17 | 3.2 | 5.3 |
| 55 | 10.6 | 0.57 | 18.6 |
| 60 | 5 | 0.47 | 10.6 |
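The stabilization factor is defined in this chapter as the ratio of the half-life of the modified enzyme to that of the native enzyme (SF = t1/2M/t1/2N). As a quick check, the sketch below recomputes the SF column of Table 10 from the two half-life columns; the variable and function names are illustrative.

```python
# temperature (°C) -> (t1/2 of modified enzyme, t1/2 of native enzyme), in hours
half_lives_h = {
    50: (17.0, 3.2),
    55: (10.6, 0.57),
    60: (5.0, 0.47),
}


def stabilization_factor(t_half_modified, t_half_native):
    """SF = t1/2 (modified) / t1/2 (native)."""
    return t_half_modified / t_half_native


sf = {
    temp: round(stabilization_factor(modified, native), 1)
    for temp, (modified, native) in half_lives_h.items()
}
best_temp = max(sf, key=sf.get)

print(sf)
print("maximum stabilization at", best_temp, "°C")
```

The recomputed values agree with the tabulated ones, with the maximum at 55°C.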

400, a large polymer. Ea, the Arrhenius activation energy, decreases by 1.66 kJ/mol after modification, whereas Topt remains unchanged at 70°C and T50 increases from 53°C to 61°C. Maximum stabilization occurs at 55°C, with the stabilization factor SF reaching 18.6 (Table 10). The kinetic parameters do not change much in the temperature range 30°C to 70°C. After exposure to 1% SDS, OSP 400-subtilisin retains nearly 50% of its activity, as against around 15% for the native enzyme.

6 DISCUSSION

In proteins, the primary sequence determines structure, and structure influences function. In enzymes, function implies catalytic activity, and how the latter may be persuaded to remain stable has been discussed in this review with some specific enzymes as examples. In general, protein structures depend on amino acid composition, and a given linear sequence is expected to fold into a structure that segregates the hydrophobic amino acids, although this may not hold in all cases. In extreme cases, proteins may not fold properly but tend to aggregate and form precipitates, which is why inclusion bodies are very often formed during production when scaling up is attempted.

6.1 Changes in the Kinetic Parameters and Their Significance

We have described how parameters such as Topt, t1/2, U50, ΔG‡, ΔH‡, m, and Eai are estimated; when the values of all these parameters increase, it is an indication that the catalytic stability of the enzyme has also increased. When Ea, the Arrhenius activation energy, goes down in value, it implies that the enzyme has become a better catalyst.

6.2 Effect of Modification on Protein Structure

Altering structure by in vitro covalent methods could also lead to proteins with improved properties including catalytic performance and stability. However, there are elements that are a natural part of the in vitro approach


which arise from the fact that the modifiers used for changing the structures can be varied in their physicochemical properties. Protein modification manifests itself in two ways: (a) directly visible changes in structure, e.g., the exchange of one amino acid for another as in engineered proteins, or (b) in the in vitro method, attachment of a modifier molecule, such as an acid or a carbohydrate, to an amino acid target on the protein, for example by reaction with a lysine ε-NH2 group. For the in vitro method, which is our main concern, the following consequences may be foreseen.

6.2.1 Water Structure

One consequence is a change in water structure resulting from the cluster of OH groups present in the carbohydrate, e.g., the disaccharides or polymers used as modifiers. This will “rigidify” the protein and stabilize it considerably.

6.2.2 Charges on the Protein

The charge on the protein can be neutralized, reversed, or increased considerably, leading to a change in the pI of the protein. Ultimately, the activity and stability pattern of the enzyme can change as a result of such structural changes.

6.2.3 Solvent Effects

The thermodynamics of water–alcohol (30:70) systems has been investigated at the molecular level using neutron diffraction with hydrogen isotope labeling by Maurel (10). Even in a seemingly simple system such as this, the H-bonding pattern appears quite complex. We draw attention to this phenomenon only to emphasize that the orientation of (a) the hydrophobic head groups (CH3 in methanol, in a 70% aqueous mixture of the solvent), (b) the oxygens in ether, or (c) the CN in CH3CN, and so on, probably produces sizable effects on the enzyme molecule when its function is studied after long exposure to predominantly organic media, which is important when enzyme-mediated synthesis is a concern. The H-bond network in a medium is characteristic of the composition of the medium, owing to the complex thermodynamics of aqueous mixtures of organic solvents. Given this situation, it is not difficult to see how different solvents influence a given enzyme in different ways. It is conceivable that a fine layer of the “solvent cage” orients itself on the enzyme molecule and may affect the enzyme functionally. For example, we have pointed out (Fig. 4) the role played by the dipole moment and the dielectric


constant of the solvents in affecting the catalytic efficiency of β-glucosidase in our studies. When looking at the effect of solvents on the properties of an enzyme, one must be conscious of the water structure in the medium. A carbohydrate-modified enzyme in an aqueous medium already strongly influences the water structure because the modifier introduces a sizable cloud of –OH groups, which in turn forms a network of H-bonds. The effect of organic solvents on such a system will differ from that found in a purely aqueous phase. Apart from the intramolecular and intermolecular H-bonds found in the protein molecules, one encounters H-bond formation by water molecules and the –OH groups of the carbohydrate in the medium.

6.2.4 Temperature Coefficient and Enzyme Stability

It must be pointed out that the catalytic rate is a combination of the thermal stability and the temperature coefficient, and in the descending limb of the “temperature vs. enzyme activity” plot, denaturation becomes significant. Because of this, the denaturation effect increases with increasing assay time and with increasing temperature. The overall effect is that the real Topt, the optimum temperature for enzymatic activity, could shift to lower values as the ratio of (Einact) to (Eact) in the equilibrium mixture increases with exposure time and at high temperatures, as suggested by Daniel et al. (3). In our experiments, covalently modified enzymes, which show greater stability, have noticeably higher t1/2 and Topt values. This implies that the onset of denaturation is shifted to a higher temperature.
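The dependence of the apparent Topt on assay time can be illustrated with a deliberately simple model. The sketch below does not implement the equilibrium model of Daniel et al. (3); it assumes instead the classical picture in which the catalytic constant follows the Arrhenius equation while the enzyme is lost by irreversible first-order inactivation, so that the product formed in an assay of length t is kcat(T)·(1 − exp(−ki(T)·t))/ki(T). All parameter values (the two activation energies, the reference rate constants, and the temperature grid) are assumptions chosen only for illustration.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def arrhenius(k_ref, ea, temp_k, t_ref=323.15):
    """Rate constant at temp_k, given its value k_ref at t_ref and Ea in J/mol."""
    return k_ref * math.exp(-ea / R * (1.0 / temp_k - 1.0 / t_ref))


def product_formed(temp_k, assay_s, ea_cat=50_000.0, ea_inact=183_300.0,
                   kcat_ref=1.0, ki_ref=3.5e-5):
    """Product per unit enzyme after assay_s seconds (arbitrary units).

    Integral of kcat * exp(-ki * t) from 0 to assay_s; all defaults are
    illustrative assumptions (ki_ref corresponds to t1/2 of about 5.5 h).
    """
    kcat = arrhenius(kcat_ref, ea_cat, temp_k)
    ki = arrhenius(ki_ref, ea_inact, temp_k)
    return kcat * -math.expm1(-ki * assay_s) / ki


def apparent_topt_c(assay_s):
    """Temperature (°C) giving the most product on a 30–90 °C grid, 0.1 °C steps."""
    temps_c = [30.0 + 0.1 * i for i in range(601)]
    return max(temps_c, key=lambda t: product_formed(t + 273.15, assay_s))


topt_10min = apparent_topt_c(600)
topt_60min = apparent_topt_c(3600)
print(f"apparent Topt: {topt_10min:.1f} °C (10 min assay), "
      f"{topt_60min:.1f} °C (60 min assay)")
```

Under these assumptions the longer assay gives the lower apparent optimum, which is the behaviour described above; a stabilized enzyme (smaller ki at a given temperature) moves both optima upward.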

7 EPILOGUE

Anfinsen’s paradigm that the primary sequence of a protein defines its folded structure and consequently its properties is still accepted. Based on this, the primary sequence of enzymes is modified by protein engineering techniques to change or improve their properties. Introduction of disulfide bridges at critical points in the protein could also lead to stabilization, provided that the S–S bridge has the right stereochemistry (12). We have shown that a viable alternative to this expensive and labor-intensive procedure is an in vitro method for altering protein structures by covalent chemical methods. Methods to analyze the catalytic and structural stabilities have been discussed with choice examples of three proteases and a glycosidase which have been modified with a variety of modifier molecules. The results reinforce the contention that covalently produced structural changes


often lead to enzymes possessing increased resistance to thermal and chemical stress, including a spectrum of solvent effects.

7.1 Influence of Carbohydrate Structure on Stabilization

Among hydrophilic modifiers such as carbohydrates, it is in some instances not yet clear why one sugar, whether a disaccharide or a polysaccharide, is a better stabilizer than the others. We have tried coupling several disaccharides, such as sucrose, maltose, lactose, or trehalose, after oxidation with periodate, to proteins like papain, chymotrypsin, and trypsin. Usually, sucrose has been found to be the best (Venkatesh et al., unpublished), with maltose the next best stabilizer. Although trehalose, which is used as an additive or a cosolvent, is considered very efficient in maintaining the stability of enzymes, it is not a suitable modifier for covalent coupling. We have tried both periodate-oxidized and carboxymethylated trehalose in modifying subtilisin (unpublished). Among the carbohydrate polymers, dextran is a linear molecule, whereas the sucrose polymer used in our studies is synthetically made and is branched. The branched nature probably makes it a more efficient stabilizer.

In conclusion, it may be summarized that:

1. Enzyme structure may be modified by protein engineering techniques or in vitro covalent procedures.
2. We have established several procedures to activate a variety of modifier molecules that will react with proteins, causing a permanent change in their structures.
3. Procedures optimized for assessing catalytic (functional) and structural stabilities, such as ΔG(H2O), have been used in our studies.
4. In making the protein more hydrophilic, its solubility increases, or if its hydrophobicity is increased, it may alter its behavior in nonaqueous solvents.
5. We have shown how disaccharides and natural and synthetic polysaccharides may be attached to proteins covalently.
6. Our studies showed that the physical addition of carbohydrates did not improve enzymatic behavior as compared with covalent coupling.
7. The variability in the bulk, as in the case of carbohydrate polymers added to the protein molecule, can contribute to the change in the microenvironment of the protein. This could offset some of the conformational stability parameters such as ΔG(H2O) and m, the midpoint of unfolding transition.
8. Our studies show that there is sufficient scope to synthesize new modifier molecules to further enlarge our approach to enzyme stabilization using the in vitro covalent coupling procedures.
9. It is also clear that catalysis in organic media may be made more facile using in vitro modification of enzymes.

REFERENCES

1. R Rudolph. Successful protein folding on an industrial scale. Protein Engineering Principles and Practice. New York: Wiley-Liss, 1996, pp 283–298.
2. T Lonhienne, C Gerday, G Feller. Psychrophilic enzymes: revisiting the thermodynamic parameters of activation may explain local flexibility. Biochim Biophys Acta 1543:1–10, 2000.
3. RM Daniel, MJ Danson, R Eisenthal. The temperature optima of enzymes: a new perspective on an old phenomenon. Trends Biochem Sci 26:223–225, 2001.
4. CN Pace. Determination and analysis of urea and guanidine hydrochloride denaturation curves. In: CHW Hirs, SN Timasheff, eds. Methods in Enzymology. Vol. 131. New York: Academic Press, 1986, pp 266–280.
5. JT Yang, C-SC Wu, HM Martinez. Calculation of protein conformation from circular dichroism. In: CHW Hirs, SN Timasheff, eds. Methods in Enzymology. Vol. 130. New York: Academic Press, 1986, pp 208–269.
6. N Rajalakshmi, PV Sundaram. Stability of native and modified papain. Protein Eng 8:1039–1049, 1995.
7. R Venkatesh, PV Sundaram. Modulation of stability properties of bovine trypsin after in vitro structural changes with a variety of chemical modifiers. Protein Eng 11:691–698, 1998.
8. PV Sundaram, R Venkatesh. Retardation of thermal and urea induced inactivation of α-chymotrypsin by modification with carbohydrate polymers. Protein Eng 11:699–705, 1998.
9. S Deepthi, R Venkatesh, PV Sundaram. Catalytic efficiency of covalently modified proteases against proteinaceous substrates. Ann NY Acad Sci 864:521–523, 1998.
10. P Maurel. Relevance of dielectric constant and solvent hydrophobicity to the organic solvent effect in enzymology. J Biol Chem 193:1677–1683, 1978.
11. L Subramaniam, PV Sundaram. Kinetics of thermal inactivation of β-glucosidase stabilized by covalent modification with soluble carbohydrate polymers. Submitted for publication.
12. A Fersht. Protein stability. Structure and Mechanism in Protein Science, A Guide to Enzyme Catalysis and Protein Folding. New York: W.H. Freeman and Company, 1999, pp 534–535.

Index

Page numbers in boldface indicate in-depth discussion of the subject.

Acinetobacter calcoaceticus, 266 Activation: energy, 73, 294, 636 interfacial, 122 pathway, 124, 214 thermal, 635 Active site, 169, 216 conserved, 18 metal cluster, 249 mutants, 238 Activity: protein engineering concepts, 2, 220 speciﬁc, 4, 37, 219 Activity proﬁle, pH-dependent, 6, 10, 37, 44 Acylation, 48, 62 Agrobacterium, 516 tumefaciens, 462 Alanine scanning, 7, 385 Alcohol dehydrogenase, 170 Algorithm, 11, 115

Alignment: sequence, 4, 25, 61, 161, 298, 366, 473 structure, 44, 55, 89, 336 Aminohydrolase superfamily, 247 Amylase, 17, 216, 296 Analysis, stability, 491, 633 Angle: bond, 39, 109 dihedral, 293, 307 energy, 109 equilibrium, 110 Euler, 130 hinge, 45 phi, 293, 307 psi, 293, 307 rotation, 130, 293 volume, 107 Ankyrin, 4 Aspergillus: awamori, 8, 310 fumigatus, 5 661

662 [Aspergillus] niger, 4 terreus, 5 Assay, 475, 492, 563, 568 automation, 525 circular dichroism, 574 DNA microarrays, 584 ﬁlter, 493 ﬂuorescence, 567 in vivo, 495 IR-thermographic, 577 mass spectroscopy, 579 NMR-based, 588 solid phase, 507 spectroscopic, 563 Automation, 525

Bacillus: agaradherens, 165 amyloliquefaciens, 43 caldolyticus, 3 cereus, 8, 296 circulans, 163, 221 lentus, 41 licheniformis, 41 megaterium, 486 stearothermophilus, 8 subtilis, 3, 379 Bacteriorhodopsin, 44, 50 Barnase, 3, 175 Beta-lactamase, 401 Binase, 3 Binding, 63, 106, 170, 600 aﬃnity,79, 85 change of, 54 cofactor, 36, 269 DNA, 332 domain, 233, 446 energy, 79 mode, 64, 216 site, 40, 61 substrate, 4, 28, 40, 47, 121, 133, 237, 249 Brownian dynamics, 124

Index Candida rugosa, 60 Carbohydrate active enzymes, 15–34, 216, 229 Cassette PCR, 436 Catalytic: identical machinery, 25 mechanism, 19, 60, 216, 231, 240 properties engineering, 242 CAZY, 20 Cellvibrio gilvus, 462 Chaperone, 638 Chemical modiﬁcation, 1, 73, 277, 634 Chimeric, 268, 342, 357, 429, 447, 461 Chromogenic substrate, 484, 511, 518, 607 Chymotrypsin, 647 Classiﬁcation: by conserved structural elements, 62 EC number, 16 of glutathion tranferases, 451 of homing endonucleases, 327 sequence and folding similarities, 17 Cleavage site, 7, 325, 329, 394, 432 Cluster analysis of mutant library, 452 Codon usage, 382 Cold shock protein, 3 Colony pick, 538, 561 Combinatorial: algorithms, 507 cassette mutagenesis, 359, 377 library, 209, 366, 428, 515, 605 modelling of statistics, 185 mutagenesis, 507 mutant libraries, 443 COMBINE, 79 COMFA, 80 Comparative binding energy analysis, 79 Computational methods, 59, 79, 97 Computer simulation, 97–148 Concepts (see Protein engineering concepts) Conﬁguration: inverting, 19, 232 retaining, 19, 232

Index Conformational change, 37, 44, 54, 82, 122, 152 Conformational ﬂexibility, 44, 152, 242, 503 Consensus sequence, 4, 133, 136, 437 Conserved residues, 236, 357 prolines, 299 Continuum dielectric model, 159 Coordinate shift, 38 Coulombic interactions, 113 Covalently: crosslinked, 277 modiﬁed, 633 Crystallography, 38 Cumulative eﬀect, 295, 505 Cytochrome f, 44, 50 Databases: CAZY, 20 Lipase Engineering Database, 61 MEROPS, 32 Deamidation, 7 Degenerate DNA, 359, 365, 437, 515– 518 Dehalogenase, 79–96 Dehydrogenase, 261, 303, 587 alcohol, 170 glucose, 261 3-isopropylmalate, 495 Deletion, 36, 49, 299, 357–363, 384, 499, 623 Dielectric: boundary, 160 cavity, 175 constant, 110, 124, 152, 159–160, 161, 175, 656 medium, 160, 175 Digital imaging, 482, 507 Dihedral angles (see Angle) Directed evolution, 9, 32, 40, 475 methods, 353–373, 375, 413 modelling and optimization protocols, 185–212 optimization, 443

663 Diversity: chimera, 435 error-prone, 381, 384 generation, 356, 401, 413, 428, 515, 607 shuﬄing, 361, 428, 449 structural, 40, 444 theoretical, 185, 380, 515 DNA shuﬄing, 354, 361, 364, 413, 425, 443, 461 modelling, 186 Docking, 2, 65–77, 79–96, 101, 106, 220 Domain, 2, 44, 106, 463, 599 binding, 82, 233, 601 catalytic, 26, 233 noncatalytic, 26 shuﬄing, 422, 444 Drosophila lebanonensis, 170 EC-number, 16 Electrostatic, 2, 44 interactions, 110, 113, 149–184 in denatured proteins, 174 Enantioselectivity, 2, 10, 59, 376, 559 assay, 484 in silico assay, 64 modelling of, 59–78 predicting, 65 screening, 559–598 Endonucleases, 325 Engineering: activity, 4, 220, 242, 253 selectivity, 376 speciﬁcity, 219, 241, 253, 275, 340, stability, 3, 219, 274, 293 Environment: electrostatic, 2 steric, 2 Error-prone, 358, 502 diversity of, 384 mutational bias of, 381 polymerase chain reaction, 376–390 Escherichia coli, 266, 499, 600 Evolutionary method, 353

664 Expression: in Pichia pastoris, 280 in Pseudomonas, 617 Filter assay, 493 Flexibility, 38, 53, 162 Flow cytometry, 607 Fluorescence, 567 activated cell sorting (FACS), 600 Fluorescent, 238, 283, 482, 567, 585, 606 protein, 342 Fluorogenic: assay, 567 substrate, 485 Fold, 6, 17, 23, 37, 39, 106 alpha-beta-hydrolase fold, 60 (beta-alpha)8 –fold, 23, 247 beta-propeller, 262 of oligo-1,6-glucosidase, 296 Foldase, 618 Folding, 6, 36, 637 chaperone, 618 families, 25 feature of oligo-1,6-glucosidase, 296 mutant impact, 355, 385, 467, 493 pattern, 43 simulation, 99 Force ﬁeld, 11, 80, 101, 106, 219 Forces: long-range, 111 short-range, 111 Fragment reassembly (see Reassembly) Free energy: of deprotonation, 154 duplex formation, 187 electrostatic free energy of desolvation, 80 Functional space, 443 Galactose oxidase, 511 Glucose dehydrogenase, 261 Glucosidases, 15–34, 216, 231, 461 alpha-, 216 beta-, 461, 516, 649

Index Glutathione transferases, 444 Glycosidases, 231 Glycosyltransferases, 15 GRID, 82 Haloalkane dehalogenase: DhlA, 85 linB, 90 Hamiltonian operator, 102 H-bonds, 2, 8 Helix-capping, 9 High throughput, 508, 525, 563 Homing endonucleases, 325 Homologous regions, 209, 463 Homology, 42, 199, 203, 216, 430 approaches, 3 model, 18, 216, 266, 362 Hydrogen bonds, 2, 8 Hydrophobic interactions, 2, 8, 491 Hygroycin B phosphotransferase, 495 Immunoglobulin, 3 Inactivation, thermal/heat, 495, 540, 635 Inhibitor: HIV protease, 81 phospholipase A2, 81 Insertions, 359 Intein, 337 Intermediate: carbamoyl, 249 carbo cation, 218 cryotrapped, 239 folding, 637 oxazolinium ion, 232 product, 62 tetrahedral, 71, 122 transition state, 134 Inverting, 19, 232 In vitro evolution, 353 In vivo gene shuﬄing, 414 Ion: binding, 269 cluster, 249

Index Ionization equilibria in proteins, 149 ITCHY, 199, 363 Kanamycin nucleotidyl transferase, 495 Kcat, 37, 48, 222, 255, 273, 511, 642 Kinetic, 247, 253 parameters of chimers, 470 Klebsiella pneumoniae, 280 Km, 37, 222, 255, 273 prediction of, 85 Lipase, 59, 121, 376, 402, 563, 619 Lipid interface, 123 Lipid-lipase interactions, 121 Lyases, 15 Lysozyme, 218 Mechanism, 19, 217, 229 chitinases, 231 conserved, 18 glycosidase, 19, 217, 229 Methionine, 7 Michaelis–Menten complex, 85, 122, 138 Microtiter plates, 481, 537, 570 Modelling: annealing events, 186 directed evolution, 185 homology, 18 Michaelis–Menten complex, 85 quantitive modelling, 59 Modular organization, 15, 26 Molecular dynamics simulation, 98, 101 enantiomer analysis, 71 Molecular interactions, 107 Monooxygenase, 365, 477 Monte Carlo method, 99 Mutant: combination, 3, 8 libraries (see Variant library) many in same position, 44 Mutational bias, 381 Newton’s second law of motion, 102 p-Nitrophenol, 485, 492, 563

665 Nonaqueous media, 651 Nucleophile: active site residue, 10, 86, 122, 133, 216, 232, 249, 463 reaction, 398, 449 Oligo-1,6-glucosidase, 296 Optically active, 62 Oxyanion, 122, 133 hole, 62 Oxydation: cysteine, 7 methionine, 7 Papain, 642 PCR, 358, 436 P450 monooxygenase, 486 pH: activity proﬁle, 6, 10, 37, 44, 468 changing, 6, 221, 236 Phage display, 391–412 Phosphotriesterase, 247 Phylogeny, 515, 517 Phytase, 1–14 pKa: of residues, 134, 152, 175, 221 shift, 2, 10 Plates: agar, 416, 477, 492, 509 culture, 449 microtiter (see Microtiter plates) solid media, 408 Poisson-Boltzmann equation, 160 Polymerase chain reaction (PCR), 358, 436 Potential energy functions, 107 Prediction: of activity, 80 of annealing temperature, 187 of enantioselectivity, 65 of mechanism 18, 20 overprediction, 28 of stabilization, 11 of structure 18, 20 Proline rule, 2, 293

666 Protease: resistant, 7 susceptibility, 7 trypsin-like, 36 Protein core, 8 Protein engineering, rational, 2 Protein engineering concepts, 1–14 cavity ﬁlling, 7 consensus, 2 homologous enzymes, 2 homology approaches, 2, 3 proline rule, 2, 293 replacing cysteines and methionines, 7 stability, 219 Protein structure, 3-D, 1 Protein tyrosine phosphatase, 131 Protonation and deprotonation, 150 Pseudomonas: aeruginosa, 376, 563, 619 cepacia, 60 ﬂuorescence, 480 putida, 626 Pyrroloquinoline quinine glucose dehydrogenase (PQQ-GDH), 261 QSAR, 80 Quantitative modelling, 59 Quantitative structure activity relationship, 80 Quantum mechanical calculation, 88 Quantum mechanics, 102 Quick-E-Test, 565 RACHITT, 364, 428 Ramachandran plot, 39, 304 Random chimeragenesis on transient templates (RACHITT), 364 Random mutagenesis, 358, 375 Rational protein engineering, 2 Rational redesign, 79, 213 Reaction: conditions, 70 mechanism, 19, 217, 229

Reassembly, 190, 202, 207, 361, 426
Recombination, 357
  gene, 360
  theoretical, 185
Redesign, 79
Redox enzymes, 261
Reducing entropy, 2
Retaining, 19, 232
Rhizomucor miehei, 123, 126
Rhizopus delemar, 126
RNase, 499
Saccharomyces cerevisiae, 413, 604
Salt bridge, 2, 7, 8, 39
Saturation mutagenesis, 385
Scratchy, 188, 199, 363
Screening, 406, 475–490
  automation, 525
  of culturable microorganism, 477
  high throughput, 520, 525
  kinetics, 509
  microtiter plate (MTP), 481, 537, 570
  noise, 540
  optimization, 541
  robotics, 525
  thermostability, 491–506
Secondary structural elements, 2, 9, 39
  variant in helices, 44
Secretion, 617
Selection, 406, 475
  complementation, 480
  display methods, 480
  growth in presence of antibiotics, 480, 495
Sequence alignment, 4, 25, 61, 298, 366, 473
Serratia marcescens, 233
  clans, 24
  families, 15
Shuffling, 425
  family, 364, 428
  gene, 413
  single-stranded DNA, 430

sn-1, sn-2 and sn-3 position, 68
SN1, 218
SN2 reaction, 88
Solvent:
  channel, 50
  effects, 70, 656
Specific activity, 2, 46, 131
Specificity, 2, 37, 46, 131, 241
  chain length, 61
  enantioselectivity, 2, 59, 253
Spectroscopy, 309, 563, 579
Sphingomonas paucimobilis, 90
Stability, 2, 44, 219, 633
  catalytic, 633
  cofactor, 269
  improved, 3, 37
  structural, 633
  thermostability, 3, 219
Staggered extension process (StEP), 361
Staphylococcus aureus, 496
Stereoselectivity, 253, 482
Structural:
  alignment, 44, 55, 89, 336
  diversity, 444
  modules, 444
  motif, 36
  stability, 633
Structure:
  activity, 9
  -based approaches, 6
  stability, 7
Subsites, 254
Substrate:
  binding site, 10, 40
  configuration, 25
  docked, 64
  specificity, 2, 37, 44, 131
Subtilisin, 37, 43, 44–49, 195, 653
Subtle changes, 42, 50
Suicide substrates, 398
Suppressor mutation, 495
Surface display cell, 599–616
T4-lysozyme, 44, 45
Taq-polymerase, 379

Temperature factor, 52
Thermal:
  activation, 635
  inactivation, 635
  profiles, 467
Thermoadaptation, 495
Thermolysin, 8, 216
Thermomyces lanuginosus, 130, 402, 414
Thermostabilization, 3, 293, 491, 511
Thermotoga maritima, 462
Three-dimensional (3-D) structure, 1, 6, 18, 24, 35, 39
  subtle changes, 42, 50
  of variant enzymes, 35–58
Time scale of motions, 104–106
Titratable group, 2
Titration, 152
  irregular, 167
Torsion:
  angle, 39, 67–69, 107
  potential function, 109
Transition:
  DNA, 359
  state, 134, 214
    analogues, 396
Transversions, 359
Turnover, 48, 122, 400, 610
Umbelliferone, 486
Unfolding, 637
  energy, 174, 307, 638
van der Waals interaction, 110, 491
Variant library:
  combinatorial DNA library, 185
  library size, 599
Water structure, 656
Yeast, 413
Xanthobacter autotrophicus, 85
X-ray crystallography, 38, 35–58
  isomorphous, 41
  molecular replacement, 41
Xylanase, 163, 221