Chemical Biology
Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess
Volume 1
Related Titles
Larijani, B., Woscholski, R., Rosser, C. A. (eds.)
Chemical Biology
Applications and Techniques
2006 Hardcover ISBN 978-0-470-09064-0
Klipp, E., Herwig, R., Kowald, A., Wierling, C., Lehrach, H.
Systems Biology in Practice
Concepts, Implementation and Application
2005 Hardcover ISBN 978-3-527-31078-4
Kubinyi, H., Müller, G. (eds.)
Chemogenomics in Drug Discovery
A Medicinal Chemistry Perspective
2004 Hardcover ISBN 978-3-527-30987-0
Gasteiger, J. (ed.)
Handbook of Chemoinformatics
From Data to Knowledge
2003 Hardcover ISBN 978-3-527-30680-0
Nicolaou, K. C., Hanko, R., Hartwig, W. (eds.)
Handbook of Combinatorial Chemistry
Drugs, Catalysts, Materials
2002 Hardcover ISBN 978-3-527-30509-4
Beck-Sickinger, A., Weber, P.
Combinatorial Strategies in Biology and Chemistry
2002 Hardcover ISBN 978-0-471-49726-4
1807-2007 Knowledge for Generations Each generation has its unique needs and aspirations. When Charles Wiley first opened his small printing shop in lower Manhattan in 1807, it was a generation of boundless potential searching for an identity. And we were there, helping to define a new American literary tradition. Over half a century later, in the midst of the Second Industrial Revolution, it was a generation focused on building the future. Once again, we were there, supplying the critical scientific, technical, and engineering knowledge that helped frame the world. Throughout the 20th Century, and into the new millennium, nations began to reach out beyond their own borders and a new international community was born. Wiley was there, expanding its operations around the world to enable a global exchange of ideas, opinions, and know-how. For 200 years, Wiley has been an integral part of each generation’s journey, enabling the flow of information and understanding necessary to meet their needs and fulfill their aspirations. Today, bold new technologies are changing the way we live and learn. Wiley will be there, providing you the must-have knowledge you need to imagine new worlds, new possibilities, and new opportunities. Generations come and go, but you can always count on Wiley to provide you the knowledge you need, when and where you need it!
William J. Pesce President and Chief Executive Officer
Peter Booth Wiley Chairman of the Board
Chemical Biology From Small Molecules to Systems Biology and Drug Design Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess
WILEY-VCH Verlag GmbH & Co. KGaA
The Editors
Prof. Dr. Stuart L. Schreiber, Howard Hughes Medical Institute, Chemistry and Chemical Biology, Harvard University, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
Prof. Dr. Tarun M. Kapoor, Laboratory of Chemistry and Cell Biology, Rockefeller University, 1230 York Ave., New York, NY 10021, USA
Prof. Dr. Günther Wess, GSF - Forschungszentrum für Umwelt und Gesundheit, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
All books published by Wiley-VCH are carefully produced. Nevertheless, authors, editors, and publisher do not warrant the information contained in these books, including this book, to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.
Library of Congress Card No.: applied for British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Internet at <http://dnb.d-nb.de>.
© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form - by photoprinting, microfilm, or any other means - nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.
Typesetting Laserwords Private Ltd, Chennai, India Printing betz-druck GmbH, Darmstadt Binding Litges & Dopf GmbH, Heppenheim Cover Schulz Grafik-Design, Fußgönheim Wiley Bicentennial Logo Richard J. Pacifico Printed in the Federal Republic of Germany Printed on acid-free paper
ISBN 978-3-527-31150-7
Contents
Preface
List of Contributors
Volume 1
Part I Chemistry and Biology - Historical and Philosophical Aspects
1 Chemistry and Biology - Historical and Philosophical Aspects
Gerhard Quinkert, Holger Wallmeier, Norbert Windhab, and Dietmar Reichert
1.1 Prologue
1.2 Semantics
1.2.1 Synthesis - Genesis - Preparation
1.2.2 Synthetic Design - Synthetic Execution
1.2.3 Preparative Chemistry - Synthetic Chemistry
1.3 Bringing Chemical Solutions to Chemical Problems
1.3.1 The Present Situation
1.3.2 Historical Periods of Chemical Synthesis
1.3.3 Diels-Alder Reaction - Prototype of a Synthetically Useful Reaction
1.4 Bringing Chemical Solutions to Biological Problems
1.4.1 The Role of Evolutionary Thinking in Shaping Biology
1.4.2 On the Sequence of Chemical Synthesis (Preparation) and Biological Analysis (Screening)
1.5 Bringing Biological Solutions to Chemical Problems
1.5.1 Proteins [99]
1.5.2 Antibodies
1.6 Bringing Biological Solutions to Biological Problems
1.7 Epilogue
1.7.1 The Fossil Fuel Dilemma of Present Chemical Industry
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
1.7.2 Two Lessons From the Wealth of Published Total Syntheses
Acknowledgments
References
Part II Using Natural Products to Unravel Biological Mechanisms
2 Using Natural Products to Unravel Biological Mechanisms
2.1 Using Small Molecules to Unravel Biological Mechanisms
Michael A. Lampson and Tarun M. Kapoor
Outlook
2.1.1 Introduction
2.1.2 Use of Small Molecules to Link a Protein Target to a Cellular Phenotype
2.1.3 Small Molecules as Probes for Biological Processes
2.1.4 Conclusion
References
2.2 Using Natural Products to Unravel Cell Biology
Jonathan D. Gough and Craig M. Crews
Outlook
2.2.1 Introduction
2.2.2 Historical Development
2.2.3 General Considerations
2.2.4 Applications and Practical Examples
2.2.5 Future Development
2.2.6 Conclusions
Acknowledgments
References
3 Engineering Control Over Protein Function Using Chemistry
3.1 Revealing Biological Specificity by Engineering Protein-Ligand Interactions
Matthew D. Simon and Kevan M. Shokat
Outlook
3.1.1 Introduction
3.1.2 The Selection of Resistance Mutations to Small-molecule Agents
3.1.3 Exploiting Sensitizing Mutations to Engineer Nucleotide Binding Pockets
3.1.4 Engineering the Ligand Selectivity of Ion Channels
3.1.5 Conclusion
References
3.2 Controlling Protein Function by Caged Compounds
Andrea Giordano, Sirus Zarbakhsh, and Carsten Schultz
3.2.1 Introduction
3.2.2 Photoactivatable Groups and Their Applications
3.2.3 Caged Peptides and Proteins
3.2.4 Caged Proteins by Introduction of Photoactive Residues via Site Directed, Unnatural Amino Acid Mutagenesis
3.2.5 Small Caged Molecules Used to Control Protein Activity
3.2.6 Conclusions
References
3.3 Engineering Control Over Protein Function; Transcription Control by Small Molecules
John T. Koh
Outlook
3.3.1 Introduction
3.3.2 The Role of Ligand-dependent Transcriptional Regulators
3.3.3 Engineering New Ligand Specificities into NHRs
3.3.4 The Requirement of “Functional Orthogonality”
3.3.5 Overcoming Receptor Plasticity
3.3.6 Nuclear Receptor Engineering by Selection
3.3.7 Ligand-dependent Recombinases
3.3.8 Complementation/Rescue of Genetic Disease
3.3.9 De Novo Design of Ligand-binding Pockets
3.3.10 Light-activated Gene Expression from Small Molecules
References
4 Controlling Protein-Protein Interactions
4.1 Chemical Complementation: Bringing the Power of Genetics to Chemistry
Pamela Peralta-Yahya and Virginia W. Cornish
Outlook
4.1.1 Introduction
4.1.2 History/Development
4.1.3 General Considerations
4.1.4 Applications
4.1.5 Future Development
References
4.2 Controlling Protein-Protein Interactions Using Chemical Inducers and Disrupters of Dimerization
Tim Clackson
Outlook
4.2.1 Introduction
4.2.2 Development of Chemical Dimerization Technology
4.2.3 Dimerization Systems
4.2.4 Applications
4.2.5 Future Development
4.2.6 Conclusion
Acknowledgments
References
4.3 Protein Secondary Structure Mimetics as Modulators of Protein-Protein and Protein-Ligand Interactions
Hang Yin and Andrew D. Hamilton
Outlook
4.3.1 Introduction
4.3.2 History and Development
4.3.3 General Considerations
4.3.4 Applications and Practical Examples
4.3.5 Future Developments
4.3.6 Conclusion
Acknowledgments
References
5 Expanding the Genetic Code
5.1 Synthetic Expansion of the Central Dogma
Masahiko Sisido
Outlook
5.1.1 Introduction
5.1.2 Aminoacylation of tRNA with Nonnatural Amino Acids
5.1.2.2 Micelle-mediated Aminoacylation
5.1.2.3 Ribozyme-mediated Aminoacylation
5.1.2.4 PNA-assisted Aminoacylation
5.1.2.5 Directed Evolution of Existing aaRS/tRNA Pair to Accept Nonnatural Amino Acids
5.1.3 Other Biomolecules That Must Be Optimized for Nonnatural Amino Acids
5.1.3.2 Adaptability of EF-Tu to Aminoacyl-tRNAs Carrying a Wide Variety of Nonnatural Amino Acids
5.1.3.3 Adaptability of Ribosome to Wide Variety of Nonnatural Amino Acids
5.1.4 Expansion of the Genetic Codes
5.1.4.2 Four-base Codons
5.1.4.3 “Synthetic Codons” That Contain Nonnatural Nucleobases
5.1.5 In vivo Synthesis of Nonnatural Mutants
5.1.6 Application of Nonnatural Mutagenesis - Fluorescence Labeling
5.1.7 Future Development and Conclusion
Acknowledgments
References
Part III Engineering Control Over Protein Function Using Chemistry
6 Forward Chemical Genetics
Stephen J. Haggarty and Stuart L. Schreiber
Outlook
6.1 Introduction
6.2 History/Development
6.3 General Considerations
6.3.1 Small Molecules as a Means to Perturb Biological Systems Conditionally
6.3.2 Forward and Reverse Chemical Genetics
6.3.3 Phenotypic Assays for Forward Chemical-Genetic Screening
6.3.4 Nonheritable and Combinations of Perturbations
6.3.5 Multiparametric Considerations: Dose and Time
6.3.6 Sources of Phenotypic Variation: Genetic versus Chemical Diversity
6.3.7 The “Target Identification” Problem
6.3.8 Relationship between Network Connectivity and Discovery of Small-molecule Probes
6.3.9 Computational Framework for Forward Chemical Genetics: Legacy of Morgan and Sturtevant
6.3.10 Mapping of Chemical Space Using Forward Chemical Genetics
6.3.11 Dimensionality Reduction and Visualization of Chemical Space
6.3.12 Discrete Methods of Analysis of Forward Chemical-genetic Data
6.4 Applications and Practical Examples
6.4.1 Example 1: Mitosis and Spindle Assembly
6.4.2 Example 2: Protein Acetylation
6.4.3 Example 3: Chemical-genomic Profiling
6.5 Future Development
6.6 Conclusion
Acknowledgments
References
7 Reverse Chemical Genetics Revisited
7.1 Reverse Chemical Genetics - An Important Strategy for the Study of Protein Function in Chemical Biology and Drug Discovery
Rolf Breinbauer, Alexander Hillisch, and Herbert Waldmann
7.1.1 Introduction
7.1.2 History/Development
7.1.3 General Considerations
7.1.4 Applications and Practical Examples
7.1.5 Future Developments
7.1.6 Conclusion
Acknowledgments
References
7.2 Chemical Biology and Enzymology: Protein Phosphorylation as a Case Study
Philip A. Cole
Outlook
7.2.1 Overview
7.2.2 The Enzymology of Posttranslational Modifications of Proteins
References
7.3 Chemical Strategies for Activity-based Proteomics
Nadim Jessani and Benjamin F. Cravatt
Outlook
7.3.1 Introduction
7.3.2 History/Development
7.3.3 General Considerations
7.3.4 Applications and Practical Examples
7.3.5 Future Development
7.3.6 Conclusions
Acknowledgments
References
8 Tags and Probes for Chemical Biology
8.1 The Biarsenical-tetracysteine Protein Tag: Chemistry and Biological Applications
Stephen R. Adams
Outlook
8.1.1 Introduction
8.1.2 History and Design Concepts of the Tetracysteine-biarsenical System
8.1.3 General Considerations
8.1.4 Practical Applications of the Biarsenical-tetracysteine System
8.1.5 Future Developments and Applications
8.1.6 Conclusions
Acknowledgments
References
8.2 Chemical Approaches to Exploit Fusion Proteins for Functional Studies
Anke Arnold, India Sielaff, Nils Johnsson, and Kai Johnsson
Outlook
8.2.1 Introduction
8.2.2 General Considerations
8.2.3 Applications and Practical Examples
8.2.4 Conclusions and Future Developments
Acknowledgments
References
Volume 2
Part IV Controlling Protein-Protein Interactions
9 Diversity-oriented Synthesis
9.1 Diversity-oriented Synthesis
Derek S. Tan
9.2 Combinatorial Biosynthesis of Polyketides and Nonribosomal Peptides
Nathan A. Schnarr and Chaitan Khosla
10 Synthesis of Large Biological Molecules
10.1 Expressed Protein Ligation
Matthew R. Pratt and Tom W. Muir
10.2 Chemical Synthesis of Proteins and Large Bioconjugates
Philip Dawson
10.3 New Methods for Protein Bioconjugation
Matthew B. Francis
11 Advances in Sugar Chemistry
11.1 The Search for Chemical Probes to Illuminate Carbohydrate Function
Laura L. Kiessling and Erin E. Carlson
11.2 Chemical Glycomics as Basis for Drug Discovery
Daniel B. Werz and Peter H. Seeberger
12 The Bicyclic Depsipeptide Family of Histone Deacetylase Inhibitors
Paul A. Townsend, Simon J. Crabb, Sean M. Davidson, Peter W. M. Johnson, Graham Packham, and Arasu Ganesan
Part V Expanding the Genetic Code
13 Chemical Informatics
13.1 Chemical Informatics
Paul A. Clemons
13.2 WOMBAT and WOMBAT-PK Bioactivity Databases for Lead and Drug Discovery
Marius Olah, Ramona Rad, Liliana Ostopovici, Alina Bora, Nicoleta Hadaruga, Dan Hadaruga, Ramona Moldovan, Adriana Fulias, Maria Mracec, and Tudor I. Oprea
Volume 3
Part VI Forward Chemical Genetics
14 Chemical Biology and Drug Discovery
14.1 Managerial Challenges in Implementing Chemical Biology Platforms
Frank L. Douglas
14.2 The Molecular Basis of Predicting Druggability
Bissan Al-Lazikani, Anna Gaulton, Gaia Paolini, Jerry Lanfear, John Overington, and Andrew Hopkins
15 Target Families
15.1 The Target Family Approach
Hans Peter Nestler
15.2 Chemical Biology of Kinases Studied by NMR Spectroscopy
Marco Betz, Martin Vogtherr, Ulrich Schieborr, Bettina Elshorst, Susanne Grimme, Barbara Pescatore, Thomas Langer, Krishna Saxena, and Harald Schwalbe
15.3 The Nuclear Receptor Superfamily and Drug Discovery
John T. Moore, Jon L. Collins, and Kenneth H. Pearce
15.4 The GPCR - 7TM Receptor Target Family
Edgar Jacoby, Rochdi Bouhelal, Marc Gerspacher, and Klaus Seuwen
15.5 Drugs Targeting Protein-Protein Interactions
Patrick Chène
16 Prediction of ADMET Properties
Ulf Norinder and Christel A. S. Bergström
Part VII Reverse Chemical Genetics Revisited
17 Computational Methods and Modeling
17.1 Systems Biology of the JAK-STAT Signaling Pathway
Jens Timmer, Markus Kollmann, and Ursula Klingmüller
17.2 Modeling Intracellular Signal Transduction Processes
Jason M. Haugh and Michael C. Weiger
18 Genome and Proteome Studies
18.1 Genome-wide Gene Expression Analysis: Practical Considerations and Application to the Analysis of T-cell Subsets in Inflammatory Diseases
Lars Rogge and Elisabetta Bianchi
18.2 Scanning the Proteome for Targets of Organic Small Molecules Using Bifunctional Receptor Ligands
Nikolai Kley
Part VIII Tags and Probes for Chemical Biology
19 Chemical Biology - An Outlook
Günther Wess
Index
Preface
Small molecules are at the heart of chemical biology. The contributions in this book reveal the many ways in which chemical biologists’ studies of small molecules in the context of living systems are transforming science and society.
Macromolecules are the basis of heritable information flow in living systems. This is evident in the Central Dogma of biology, where heritable information is replicated via DNA and flows from DNA to RNA to proteins. Small molecules are the basis for dynamic information flow in living systems. They constitute the hormones and neurotransmitters, many intra- and intercellular signaling molecules, the defensive and offensive “natural products” used in information flow between organisms, among many others. They are the basis for memory and cognition, sensing and signaling, and, of course, for many of the most effective therapeutic agents.
One dominant theme in many of the chapters concerns small molecules and small-molecule screening. Together, these have dramatically affected life-science research in recent years. Many of the contributors to Chemical Biology themselves both provided new tools for understanding living systems and effected smoother transitions from biology to medicine. The chapters they have provided offer riveting examples of the field’s impact on life science. The range of approaches and the creativity that fueled these projects are truly inspiring.
After a period of widely recognized advances by geneticists and molecular and disease biologists, chemists and chemical biologists are returning to a position of prominence in the consciousness of the larger scientific community. The trend towards small molecules and small-molecule screening has resulted in an urgent need for advances in synthetic planning and methodology. Synthesis routes are needed for candidate small molecules and for improved versions of candidates identified in biological discovery efforts.
Several contributors offer hints toward answering the question: How do we synthesize candidate structures most effectively poised for optimization? They note that planning and performing multi-step syntheses of natural products in the past resulted in the recognition and, often, resolution of gaps in synthetic methodology. The synergistic relationship between organic synthesis planning and methodology is even more profound as synthetic organic chemists tackle the new challenges noted above. The objects of synthesis planning, no longer limited by the biochemical transformations used by cells in synthesizing naturally occurring small molecules, require radically new strategies and methodologies.
Several contributors help us answer a related question that also influences synthetic planning: What are the structural features of small, organic molecules most likely to yield specific modulation of disease-relevant functions? They note that the ability to assess the performance of these compounds, and to compare their performance to other small molecules such as commercially available or naturally occurring ones, is possible through public small-molecule screening efforts and public small-molecule databases (e.g., WOMBAT, PubChem, ChemBank). These developments are reminiscent of the early stage of genomics research, where visionary scientists recognized the need to create a culture of open data sharing and to develop public data repositories (e.g., GenBank) and analysis environments (e.g., Ensembl, UCSC Genome Browser).
Sometimes the line between small and macromolecules is blurred. Oligosaccharides are often presented as a third class of macromolecules, yet several contributions here reveal arguably greater similarities of carbohydrates to small-molecule terpenes than to nucleic acids and proteins, both in terms of their biosynthesis and cellular functions. Oligosaccharides are shown to be synthesized by glycosyl transferases (analogous to isopentenyl pyrophosphate transferases used in terpene biosynthesis) and, like the terpenes, are subject to tailoring enzymes. Transferase enzymes are used to attach oligosaccharides and terpenes to proteins, where they serve key functions (e.g., glycoproteins, farnesylated Ras).
Chemical biologists have illuminated and manipulated oligosaccharides and the unquestionable member of the macromolecule family, the proteins, with great aplomb. Several of our contributors are pioneers in the revolution of protein chemistry and protein engineering, and their chapters provide clear testimony to the consequences of these advances to life science.
Finally, in examining the similarities of and synergies between chemical biology and systems biology, several of our contributors have perhaps offered a glimpse into the future of these fields.
Stuart L. Schreiber, Cambridge
Tarun M. Kapoor, New York
Günther Wess, Neuherberg
January 2007
List of Contributors
Stephen R. Adams, Department of Pharmacology, University of California, San Diego, 310 George Palade Laboratories 0647, La Jolla, CA 92093-0647, USA
Anke Arnold, Ecole Polytechnique Federale de Lausanne (EPFL), Institute of Chemical Sciences and Engineering, 1011 Lausanne, Switzerland
Christel A. S. Bergström, AstraZeneca R&D, Discovery Medicinal Chemistry, 15185 Södertälje, Sweden
Marco Betz, Center for Biomolecular Magnetic Resonance, Institute of Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe-University Frankfurt, Max-von-Laue-Str. 7, 60439 Frankfurt, Germany
Elisabetta Bianchi, Immunoregulation Laboratory, Department of Immunology, Institute Pasteur, 25, rue du Dr. Roux, 75724 Paris Cedex 15, France
Alina Bora, Division of Biocomputing, University of New Mexico School of Med, MSC11 6445, Albuquerque, NM 87131, USA
Rochdi Bouhelal, Novartis Institutes for BioMedical Research, Lichtstrasse 35, 4056 Basel, Switzerland
Rolf Breinbauer, Institute of Organic Chemistry, University of Leipzig, Johannisallee 29, 04103 Leipzig, Germany
Erin E. Carlson, Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, WI 53706, USA
Patrick Chène, Oncology Research, Novartis Institutes for Biomedical Research, 4002 Basel, Switzerland
Tim Clackson, ARIAD Pharmaceuticals, Inc., 26 Landsdowne Street, Cambridge, MA 02139-4234, USA
Paul A. Clemons, Chemical Biology, Broad Institute of Harvard & MIT, 7 Cambridge Center, Cambridge, MA 02142, USA
Philip A. Cole, Department of Pharmacology, Johns Hopkins School of Medicine, 725 N. Wolfe St., Baltimore, MD 21205, USA
Jon L. Collins, Discovery Research, GlaxoSmithKline Discovery Research, Research Triangle Park, NC 27709, USA
Virginia W. Cornish, Department of Chemistry, Columbia University, 3000 Broadway, MC 3167, New York, NY 10027-6948, USA
Simon J. Crabb, School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
Craig M. Crews, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06510, USA
Benjamin F. Cravatt, Neuro-Psychiatric Disorder Institute, The Skaggs Institute for Chemical Biology, The Scripps Research Institute, BCC 159, 10550 North Torrey Pines Rd., La Jolla, CA 92037, USA
Sean M. Davidson, The Hatter Cardiovascular Institute, 67 Chenies Mews, University College Hospital, London WC1E 6DB, United Kingdom
Philip Dawson, Department of Cell Biology and Chemistry, The Scripps Research Institute, 10550 N. Torrey Pines Road, La Jolla, CA 92037, USA
Frank L. Douglas, Aventis Pharma, Industriepark Höchst, 65926 Frankfurt, Germany
Bettina Elshorst, Center for Biomolecular Magnetic Resonance, Institute of Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe-University Frankfurt, Max-von-Laue-Str. 7, 60439 Frankfurt, Germany
Matthew B. Francis, Department of Chemistry, University of California, Berkeley, Berkeley, CA 94720-1460, USA
Adriana Fulias, Division of Biocomputing, University of New Mexico School of Med, MSC11 6445, Albuquerque, NM 87131, USA
Arasu Ganesan, School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
Anna Gaulton, Pfizer Global Research and Development, Pfizer Ltd., Sandwich, Kent, CT13 9NJ, United Kingdom
Marc Gerspacher, Novartis Institutes for BioMedical Research, Klybeckstrasse 141, 4057 Basel, Switzerland
Andrea Giordano, European Molecular Biology Laboratory, Gene Expression Programme, Meyerhofstr. 1, 69117 Heidelberg, Germany
Jonathan D. Gough, Yale University, Department of Molecular, Cellular, and Developmental Biology, Kline Biology Tower 442, New Haven, CT 06520-8103, USA
Susanne Grimme, Center for Biomolecular Magnetic Resonance, Institute of Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe-University Frankfurt, Max-von-Laue-Str. 7, 60439 Frankfurt, Germany
Dan Hadaruga, Division of Biocomputing, University of New Mexico School of Medicine, MSC11 6445, Albuquerque, NM 87131, USA
Nicoleta Hadaruga, Division of Biocomputing, University of New Mexico School of Med, MSC11 6445, Albuquerque, NM 87131, USA
Stephen J. Haggarty, Broad Institute of Harvard and MIT, 320 Bent Street, Cambridge, MA 02141, USA
Andrew D. Hamilton, Department of Chemistry, Yale University, 225 Prospect St., New Haven, CT 06520-8107, USA
Jason M. Haugh, Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695-7905, USA
Alexander Hillisch, Bayer Healthcare AG, PH-GDD-EURC-CR, Aprather Weg 18a, 42096 Wuppertal, Germany
Andrew Hopkins, Pfizer Global Research and Development, Pfizer Ltd., Sandwich, Kent, CT13 9NJ, United Kingdom
Edgar Jacoby, Novartis Institute for Biomedical Research, Lichtstrasse 35, 4056 Basel, Switzerland
Nadim Jessani, Department of Cell Biology, Celera, 180 Kimball Way, South San Francisco, CA 94080, USA
Kai Johnsson, Ecole Polytechnique Federale de Lausanne (EPFL), Institute of Chemical Sciences and Engineering, 1011 Lausanne, Switzerland
Nils Johnsson, Center for Molecular Biology of Inflammation, Institute of Medical Biochemistry, University of Muenster, Von-Esmarch-Str. 56, 48149 Muenster, Germany
Peter W. M. Johnson, School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
Tarun M. Kapoor, Laboratory of Chemistry and Cell Biology, Rockefeller University, Flexner Hall, 1230 York Ave., New York, NY 10021, USA
Laura L. Kiessling, Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, WI 53706, USA
Nikolai Kley, GPC Biotech, Inc., 610 Lincoln Street, Waltham, MA 02451, USA
Chaitan Khosla, Department of Chemistry, Stanford University, 381 North South Mall, Stanford, CA 94305, USA
Ursula Klingmüller, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
John T. Koh, Department of Chemistry and Biochemistry, University of Delaware, Newark, DE 19716, USA
Markus Kollmann, Physics Institute, Hermann-Herder-Str. 3, 79104 Freiburg, Germany
Michael A. Lampson, Laboratory of Chemistry and Cell Biology, Rockefeller University, Flexner Hall, 1230 York Ave., New York, NY 10021, USA
Jerry Lanfear, Pfizer Global Research and Development, Pfizer Ltd., Sandwich, Kent, CT13 9NJ, United Kingdom
Thomas Langer, Center for Biomolecular Magnetic Resonance, Institute of Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe-University Frankfurt, Max-von-Laue-Str. 7, 60439 Frankfurt, Germany
Bissan Al-Lazikani, Inpharmatica Ltd., 60 Charlotte Street, London, W1T 2NU, United Kingdom
Ramona Moldovan, Division of Biocomputing, University of New Mexico School of Med, MSC11 6445, Albuquerque, NM 87131, USA
John T. Moore, Discovery Research, GlaxoSmithKline Discovery Research, Research Triangle Park, NC 27709, USA
Maria Mracec, Division of Biocomputing, University of New Mexico School of Med, MSC11 6445, Albuquerque, NM 87131, USA
Tom W. Muir, The Rockefeller University, 1230 York Avenue, New York, NY 10021, USA
Hans Peter Nestler, Sanofi aventis, Combinatorial Technologies Center, 1580 East Hanley Blvd., Tucson, AZ 85737, USA
Ulf Norinder, AstraZeneca R&D, Discovery Medicinal Chemistry, 15185 Södertälje, Sweden
Marius Olah, Division of Biocomputing, University of New Mexico School of Med, MSC11 6445, Albuquerque, NM 87131, USA
Tudor I. Oprea, Division of Biocomputing, University of New Mexico School of Med, MSC11 6445, Albuquerque, NM 87131, USA
Liliana Ostopovici, Division of Biocomputing, University of New Mexico School of Med, MSC11 6445, Albuquerque, NM 87131, USA
John Overington, Inpharmatica Ltd., 60 Charlotte Street, London, W1T 2NU, United Kingdom
Graham Packham, School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
Gaia Paolini, Pfizer Global Research and Development, Pfizer Ltd., Sandwich, Kent, CT13 9NJ, United Kingdom
Kenneth H. Pearce, Gene Exp. and Protein Chem., GlaxoSmithKline Discovery Research, Research Triangle Park, NC 27709, USA
Pamela Peralta-Yahya, Department of Chemistry, Columbia University, 3000 Broadway, MC 3167, New York, NY 10027-6948, USA
Barbara Pescatore, Center for Biomolecular Magnetic Resonance, Institute of Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe-University Frankfurt, Max-von-Laue-Str. 7, 60439 Frankfurt, Germany
Matthew R. Pratt, Laboratory of Synthetic Protein Chemistry, The Rockefeller University, New York, NY 10021, USA
Ramona Rad, Division of Biocomputing, University of New Mexico School of Med, MSC11 6445, Albuquerque, NM 87131, USA
Dietmar Reichert, Degussa AG, Exclusive Synthesis & Catalysis, Rodenbacher Chaussee 4, 63457 Hanau, Germany
Lars Rogge, Immunoregulation Laboratory, Department of Immunology, Institute Pasteur, 25, rue du Dr. Roux, 75724 Paris Cedex 15, France
Gerhard Quinkert, Institut für Organische Chemie und Chemische Biologie, Johann Wolfgang Goethe-Universität, Marie-Curie-Str. 11, 60439 Frankfurt, Germany
Krishna Saxena, Center for Biomolecular Magnetic Resonance, Institute of Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe-University Frankfurt, Max-von-Laue-Str. 7, 60439 Frankfurt, Germany
Ulrich Schieborr, Center for Biomolecular Magnetic Resonance, Institute of Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe-University Frankfurt, Max-von-Laue-Str. 7, 60439 Frankfurt, Germany
Nathan A. Schnarr, Department of Chemistry, Stanford University, 381 North South Mall, Stanford, CA 94305, USA
Harald Schwalbe, Center for Biomolecular Magnetic Resonance, Institute of Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe-University Frankfurt, Max-von-Laue-Str. 7, 60439 Frankfurt, Germany
Stuart L. Schreiber, Howard Hughes Medical Institute, Department of Chemistry and Chemical Biology, Harvard University, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
Carsten Schultz, European Molecular Biology Laboratory, Gene Expression Programme, Meyerhofstr. 1, 69117 Heidelberg, Germany
Peter H. Seeberger, Laboratory for Organic Chemistry, Swiss Federal Institute of Technology Zurich, ETH-Hönggerberg HCI F315, Wolfgang-Pauli-Str. 10, 8093 Zurich, Switzerland
Klaus Seuwen, Novartis Institutes for BioMedical Research, Lichtstrasse 35, 4056 Basel, Switzerland
Kevan M. Shokat, Department of Cellular and Molecular Pharmacology, UC San Francisco, 600 16th Street, Box 2280, San Francisco, CA 90143-2280, USA
India Sielaff, Ecole Polytechnique Federale de Lausanne (EPFL), Institute of Chemical Sciences and Engineering, 1011 Lausanne, Switzerland
Matthew D. Simon, Department of Cellular and Molecular Pharmacology, UC San Francisco, 600 16th Street, Box 2280, San Francisco, CA 90143-2280, USA
Masahiko Sisido, Department of Bioscience and Biotechnology, Okayama University, 3-1-1 Tsushimanaka, Okayama 700-8530, Japan
Derek S. Tan, Laboratory of Chemistry and Chemical and Chemical Genetic, Sloan-Kettering Cancer Center, 1275 York Ave., RRL 1317, New York, NY 10021, USA
Jens Timmer, Physics Institute, Hermann-Herder-Str. 3, 79104 Freiburg, Germany
Paul A. Townsend, School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
Martin Vogtherr, Center for Biomolecular Magnetic Resonance, Institute of Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe-University Frankfurt, Max-von-Laue-Str. 7, 60439 Frankfurt, Germany
Herbert Waldmann, MPI of Molecular Physiology, University of Dortmund, Otto-Hahn-Str. 11, 44227 Dortmund, Germany
Holger Wallmeier, Aventis Pharma Deutschland GmbH, Research & Technologies, Industriepark Höchst, K801, 65926 Frankfurt am Main, Germany
Michael C. Weiger, Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695-7905, USA
Daniel B. Werz, Laboratory for Organic Chemistry, Swiss Federal Institute of Technology Zurich, ETH-Hönggerberg HCI F315, Wolfgang-Pauli-Str. 10, 8093 Zurich, Switzerland
Günther Wess, GSF - Forschungszentrum für Umwelt und Gesundheit, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
Norbert Windhab, Degussa AG, CREAVIS, Rodenbacher Chaussee 4, 63457 Hanau, Germany
Hang Yin, Department of Chemistry, Yale University, 225 Prospect St., New Haven, CT 06520-8107, USA
Sirus Zarbakhsh, European Molecular Biology Laboratory, Gene Expression Programme, Meyerhofstr. 1, 69117 Heidelberg, Germany
PART I Introduction
Chemical Biology. From Small Molecules to System Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
1 Chemistry and Biology - Historical and Philosophical Aspects
Gerhard Quinkert, Holger Wallmeier, Norbert Windhab, and Dietmar Reichert
Dedicated to Profs. Helmut Schwarz and Utz-Hellmuth Felcht on the occasion of their respective 60th birthdays.
1.1 Prologue
The reductionistic attitude of philosophers [1] has given way to the emergence-based thinking [2] of biologists. In place of the view that phenomena occurring at a higher level in a complex system [3] with hierarchically structured levels of organization can also be described by rules and in terms of concepts already verified at a lower level, it has come to be accepted that some of these rules or concepts may be altered or even gained in the transition from lower to higher level. This applies even in the case of the structural and functional basic unit of all biological systems: the living cell. The living cell is a protected region in which diverse ensembles of molecules interact with one another in a harmony achieved through self-assembly [4]. The reality of the cell, with its overlapping functional networks [5] (for regulation of metabolism, signal transduction, or gene expression, for example) can serve as a model. The question of the hierarchical organization of such networks arises. Top-down analysis proceeds in the direction of decreasing complexity of the biological systems, a cell, a tissue, or even an organism, step by step all the way down to the level of molecules underlying their intra- and intermolecular interactions. From chemistry’s molecules and supermolecules, bottom-up synthesis starts in the direction of increasing complexity to reach the totality of the cell and its higher organizations emerging through modular motifs and supramodular functional units [6]. Bottom-up synthesis and top-down analysis are signposts for changes in complexity in emergent systems, lending themselves not only to narrative representation of what is, but also to reflective conjecture on why something is as it is. The interdisciplinary union of the worlds of chemistry and of biology has to begin with the different entry points to the two disciplines. In the world of chemistry, for material atoms and their associated interactions within and between molecules, the crucial aid is the open sesame represented by the periodic system of the chemical elements. In the world of biology, the fundamental information flow and the associated ascent from the biochemical network of metabolism to the biological network of genetic information transfer can be deciphered by the Rosetta Stone that is the genetic code. Fundamental to this is the understanding that in biology - as in cosmology¹⁾, but quite unlike chemistry (and physics) - earlier historical events influence future developments. It is a characteristic of historical events that they may have been played out completely differently under other circumstances. In such cases, it is reasonable to ask why questions. Why did Darwinian evolution eventually come to entrust its further fate to the chemistries of two polymer types, nucleic acids and proteins, and their later collaboration in a ribosome? Why did the dice fall in favor of a genetic code with triplet character? Why did protein genesis satisfy itself with the 20 canonical amino acids? For a transdisciplinary perspective it is worth addressing such cases in which the emergence of chemistry (or, more precisely, biochemistry) into biology (or, more precisely, molecular biology) signifies a tipping point. This came about with the appearance of macromolecules possessing the aptitude to store and distribute information and to translate it into catalytic function [8a]. It became manifest as awareness grew of the double-faceted nature of protein synthesis: as an enzymatic chain of chemical reaction steps in biochemical space and as a genetic information transfer process in molecular biological space [9]. This essay deals with the structures and functions of material things produced by chemical or biological means.
While the products obtained in both routes are comparable, if not identical, the production facilities differ substantially. As facilities of human design, they happen to be formed by machines in the laboratory or in the factory; as facilities of Darwinian evolution, they start to exist in generative supermolecules of the living world. Having distinguished the generation of natural products by supramolecular facilities built up by self-assembly of complementary molecules from the production of materials in man-made facilities, it seems appropriate to add a brief excursion into semantics.
1.2 Semantics
1.2.1 Synthesis - Genesis - Preparation
By a chemical reaction, whether it takes place in a laboratory, in a factory, or in a living cell, an educt is converted into a product. If the product is structurally more complex than the related educt, the conversion is called a construction (in biochemistry: an anabolic pathway). In contrast, the conversion is called a degradation (in biochemistry: a catabolic pathway) if the product is less complex than the related educt. According to another classification, one may distinguish between synthesis, genesis, and preparation. While execution follows a subtle plan in the first case and the instructions of a naturally selected program in the second, tinkering takes place in the last instance. That such a differentiation may prove useful to the keen mind of a synthetic chemist is demonstrated by the example of the natural dye, indigo. While its first offspring is often popularly held to be urea, synthetic chemistry actually began in the last quarter of the nineteenth century, with the production of artificial indigo [10]. This dissent can be resolved if consensus is reached on what should be understood by the term synthesis in organic chemistry [11]. If it is taken to mean an attempt to construct a previously decided upon target molecule with a known structure from a suitable starting molecule (or molecules) according to some plan [12], the choice has to be for indigo. Urea, in contrast, was discovered by chance as an isomerization product of ammonium cyanate by Wöhler [13] in 1828, and was not in any way prepared intentionally [14]. This qualification, however, does not mean that the urea synthesis can be discounted as inconsequential. On the contrary, Friedrich Wöhler’s production of artificial urea from hydrogen cyanate and ammonia in 1828 was a key discovery for the dawning chemical sciences, and researchers at the ever-advancing frontiers of the science have to this day venerated the narrative connection between Wöhler’s urea synthesis and their own new findings and future perspectives. What historians like to unmask as a benign legend [14] serves scientists as a rhetorical shorthand and metaphorical paraphrase.
1) The developments of stars and galaxies offer no analog to Darwinian evolution by natural selection, of course [7].
In the industrially used Heumann-Pfleger synthesis, N-phenylglycine 1, readily accessible from aniline, is transformed through indoxyl 2 into indigo 3 in a targeted fashion (Scheme 1-1). This process represents the culmination of a development first set in motion in the laboratories of München University under Adolf Baeyer. Baeyer had begun his efforts to prepare indigo in the laboratory at a time (before 1883) when the constitution of indigo was not even known [16], starting his endeavors with degradation products (aniline, anthranilic acid, isatin) obtained by the application of one of the usual degradative methods (alkali melt, effect of oxidizing agents) to the naturally occurring dyestuff. These degradation products were treated with an extraordinarily broad range of chemicals in a form of intuitive combinatorial process, to examine whether the resulting products would contain 3. In this way, Baeyer and Emmerling succeeded in transforming isatin 10 into 3 in 1870. The preparation of 10 (from phenylacetic acid 4: 1878) was, however, too elaborate to be commercially viable (Scheme 1-2). As long as the constitution of a target molecule is unknown, the above definition of a synthesis is inadmissible. The sequence of reactions depicted in Scheme 1-2, however, characterizes a venture that serves for the preparation of indigo. Two other pathways that afforded indigo in the laboratory were also not industrially viable. A. von Baeyer encouraged BASF and Farbwerke Hoechst to undertake a systematic search for an industrial synthesis of artificial indigo (the constitution of which had meanwhile been established) in competition with one another. This was finally achieved in a strategically clear and tactically flexible manner through the already mentioned Heumann-Pfleger synthesis (Scheme 1-1). It was envisaged that the artificial preparation of dyes from coal tar should become a source of national wealth. Baeyer’s München University laboratories and the two representatives of Germany’s flowering chemical industry had exchanged ideas and experiences on a previously unknown scale and had thus passed the test for a collaboration in partnership. In 1905, Adolf von Baeyer was awarded the Nobel Prize for Chemistry for his contribution to the development of organic chemistry and the chemical industry.
Scheme 1-1 Industrial production of indigo 3 by the Heumann-Pfleger synthesis [15]: from 1 via 2 to 3.
Scheme 1-2 Laboratory studies of the preparation of indigo 3 by A. (von) Baeyer and his colleagues.
It has thus been demonstrated that the example of indigo is suitable for conceptual differentiation between molecule construction according to a plan (synthesis) and one without a plan (preparation). It can also provide an illustration, based on the different character of the synthetic steps involved, of differentiation between chemical and biological synthesis steps within the overall indigo syntheses. Chemical synthesis steps [17a] can be understood to include transformations achieved not only through the use of reagents or catalysts prepared by chemists but also those in which enzymes, antibodies, or even dead cells are used. Synthesis steps in which the synthetic capabilities of living cells, either possessing their original genomes or new recombinant variants, are deployed in a targeted manner, are classified as a part of biological synthesis [17a]. Indigo was synthesized biologically in 1983 (Scheme 1-3) [18]. Biological indigo synthesis made use of an Escherichia coli strain with a recombinant genome, capable of converting aromatic hydrocarbons in general into cis-1,2-dihydrodiols and, in particular, indole (obtained from tryptophan 11 with the aid of tryptophanase) into cis-2,3-dihydroxy-2,3-dihydroindole 13. The recombinant E. coli strain was augmented with the genes expressing naphthalene dioxygenase from Pseudomonas putida. The initially produced oxidation product spontaneously loses water, and the resulting indoxyl 2 is converted by aerial oxidation into 3, which can be taken up into organic solvents.
Scheme 1-3 Formation of indigo 3 in a recombinant strain of E. coli. Tryptophan 11 is converted by tryptophanase into indole 12; naphthalene dioxygenase oxidizes 12 to cis-2,3-dihydroxy-2,3-dihydroindole 13, which loses H2O to give indoxyl 2; air oxidation of 2 gives 3.
Scheme 1-4 On the formation of indigo 3 (indole-3-glycerol phosphate, 12, 2, 3).
After the discussion on the biological synthesis of indigo with the aid of a recombinant E. coli strain, one question still remaining relates to the programmed genesis of indigo precursors in plants. Plants cultivated for indigo production contain 2, stabilized by glycosylation (e.g., as indican = indoxyl β-D-glucoside or as isatan B = indoxyl 5-ketogluconate) [19]. Indoxyl, for its part, is produced from indole-3-glycerol phosphate [20] (Scheme 1-4), and that in turn by the chorismate pathway. This essay deals not only with preparation (intuitive) and synthesis (planned) but also with genesis (programmed). Such (genetically and somatically regulated) programs have arisen through Darwinian evolution. A plan for a synthesis is devised by a synthetic chemist as designer and enacted by the synthetic chemist as molecule maker. How is a synthesis planned?
1.2.2 Synthetic Design - Synthetic Execution
Unlike the bottom-up-oriented execution of a synthesis, involving real molecules, the designing of a synthesis is a top-down event using virtual structures²⁾. Design begins with the target structure and moves through a greater or lesser number of intermediate structures to the starting structure, with the complexity generally decreasing. The starting structure is worthy of that name once it can reasonably be said to represent a comfortably accessible starting molecule for the carrying out of the synthesis. E. J. Corey coined some terms for top-down-oriented synthesis design which were intended to highlight the fact that retrosynthetic structure analysis and synthetic building up of the molecule are concurrent processes. Whilst bottom-up synthesis takes place with molecules and in synthetic steps through the deployment of suitable synthetic building blocks, from the appropriate starting molecule to the resulting target molecule, top-down retrosynthesis operates with structures and in transformation steps through the identification of appropriate retron structure elements, from the particular target structure to the resulting starting structure. Some of Corey’s achievements through his endeavors in the logic of synthesis [21] include: the fact that organic synthesis can be taught [22] even where it is not actively practiced; the availability of computer-aided synthesis planning [23] as a procedure to generate a population of synthesis plans from which the synthetic chemist can select the best one to use; and his being awarded the 1990 Nobel Prize for Chemistry for development and methodology of organic synthesis. Twenty-five years earlier, R. B. Woodward had been awarded the Chemistry Nobel Prize for his outstanding achievements in the art of organic synthesis. Woodward’s categorical imperative [12] - Synthesis must always be carried out by plan - rapidly became the sign of the coming generation of natural products’ synthesis chemists. His qualifying statement in the following sentence can easily go unremarked: “The synthetic frontier can be defined only in terms of the degree to which realistic planning is possible”. This is probably the reason for Woodward’s comment at the end of his essay on the total synthesis of chlorophyll [24a]: “At the beginning there was detailed synthetic planning. The degree to which our plans proved realizable is very gratifying, but laboratory discoveries and knowledge obtained from observation and experimentation contributed at least as much to the advancement of our studies. We learned and found out much that would previously not have been knowable or at best would have been only approximately imaginable.” Elsewhere he sounds the Leitmotif of natural products synthesis [24b]: “In our time many organic chemists address themselves explicitly to mechanistic and theoretical problems - and make outstanding contributions in so doing - it should not be forgotten that questions too self-consciously asked of Nature may well receive subconsciously determined answers - answers which only with difficulty contain more than was presupposed in the questions. It is important to keep open the avenues for innovation and surprise.”
2) Differentiation between abstract structures and concrete molecules will also pay for itself in other circumstances.
1.2.3 Preparative Chemistry - Synthetic Chemistry
The terms preparative chemistry and synthetic chemistry are often used synonymously. We wish to draw some distinction between them: in preparative chemistry we see a rich fund of knowledge from which the synthetic chemist can draw, gained from work on chemical reactions. The preparative chemist is concerned with broadly aimed investigations geared toward the discovery of chemical reactions and the development and improvement of already known ones. A chemical reaction may qualify as “mature” [17a] if it is capable of transforming a starting compound of not too restricted substrate specificity in a predictable manner: under easily maintainable reaction conditions; as far as possible with the use of substoichiometric proportions of effective catalysts; without restriction to a particular scale; with high chemical yield; and with high regio- and stereospecificity into an envisaged product. There is now such an extensive available reservoir of preparatively useful reactions of this level of comprehensiveness that for the construction of molecular skeletons it appears expedient to switch to a handful of trusted reactions in the first instance [25]. In the introduction, modification, and elimination of functional groups, the a priori restriction to only a few methods is already becoming more difficult. Organic synthesis presupposes a substantial body of knowledge, usually developed through bottom-up strategies, of the structures and reactivities of organic molecules. In education, though, it is important to begin concurrently practicing top-down approaches based on this knowledge and its extension and further enrichment, as early as possible. An example speaks louder than a long discussion of principles: to demonstrate the problem-solving potential of synthetic chemistry, it would be useful to identify a molecule that has served for a long time, commanding undiminished interest both in the past and in the present, as a sought-after target molecule for a solid synthetic pathway. One such molecule is estrone. If a particular target structure has been decided upon, it is appropriate to select a particular synthetic pathway from the multitude of virtual ones identifiable by combinatorial analysis (Scheme 1-5). In the process, it usually remains open whether the whole set of alternative synthetic pathways for the particular decision is evaluated or intuitively only a part of it is considered.
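The combinatorial analysis of virtual pathways mentioned above can be made concrete with a small enumeration. The following is only a sketch under simplifying assumptions that are not part of the original text: the four steroid rings are treated as abstract labels, every subset of rings counts as a conceivable building block, and chemical feasibility is ignored. The function names `building_blocks` and `stepwise_pathways` are hypothetical and chosen for illustration.

```python
from itertools import combinations, permutations

RINGS = ("A", "B", "C", "D")  # ring labels of the steroid skeleton


def building_blocks(size):
    """All conceivable building blocks containing `size` distinct rings."""
    return ["".join(combo) for combo in combinations(RINGS, size)]


def stepwise_pathways(start):
    """Pathways that begin with one ring and add the other three one at a time.

    Each pathway is a list of growing ring sets,
    for example ['A', 'AB', 'ABC', 'ABCD'].
    """
    others = [ring for ring in RINGS if ring != start]
    pathways = []
    for order in permutations(others):
        present = [start]
        stages = [start]
        for ring in order:
            present.append(ring)
            stages.append("".join(sorted(present)))
        pathways.append(stages)
    return pathways


single = building_blocks(1)   # candidates for a one-ring starting block
pairs = building_blocks(2)    # candidates for a two-ring starting block
all_stepwise = [p for ring in RINGS for p in stepwise_pathways(ring)]

print("one-ring blocks:", single)
print("two-ring blocks:", pairs)
print("stepwise pathways starting from A:")
for pathway in stepwise_pathways("A"):
    print("  " + " -> ".join(pathway))
print("stepwise pathways in total:", len(all_stepwise))
```

Even this toy model shows why a chemist rarely evaluates the whole set: the number of formal alternatives grows quickly, and each of them would still have to be judged against real reactions and accessible starting molecules.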
1.3 Bringing Chemical Solutions to Chemical Problems
1.3.1 The Present Situation
At the beginning of the twenty-first century chemistry finds itself in the middle of a phase of reorientation. In the chemical industry there is a clear trend toward specialization and concentration. It cannot be ignored that traditional organizational structures can be altered appreciably by investment and disinvestment decisions, the maxim being away from the broadly diversified chemical concern of yesterday toward the megacorporation of tomorrow, with its focus on a few core competences. Measures adopted in established organizations are disposition of particular branches, horizontal fusion of adjoining core activities, and vertical integration of new high-tech ventures. In the chemical sciences, progressive integration with chemical biology and also with nanotechnology is underway. Self-organization of molecules and modules into supramolecular and supramodular functional units plays a prominent role in both fields of development, as is clear from research and teaching in the top academic institutions. That this has been possible is due to the development of physical methods without the aid of which it would be impossible even to establish the existence or presence of systems with particular properties.
Scheme 1-5 Virtual synthetic pathways toward the steroid skeleton with rings A, B, C, and D. Top row: stepwise conversion of a ring A (B, C, or D)-building block into the ABCD system; middle row: expansion in a single step of an AB (AC, AD, BC, BD, or CD)-building block into the ABCD system; bottom row: expansion in a single step of an A (B, C, or D)-building block into the ABCD system.
The core competence of chemistry, though, remains the provision of new molecules through synthesis, a mission equally valid for synthetic chemists in both industrial and academic environments. Both can point to great successes in the past. Nonetheless, synthesis finds itself in a dilemma. Academic synthetic chemists tended to give the highest priority to the elegance of the design of a synthesis, and this veneration was passed on to their students. For industry’s molecular engineers, the expediency with which the synthesis could be carried out held center stage: a concept which new graduates did not have to come to terms with until their entry into their industrial careers. Meanwhile, the constructive tension between elegance and efficiency was usurped by the dream of the perfect reaction and the ideal synthesis. The perfect reaction can be summarized in Derek Barton’s utopian view: 100% yield, 100% stereoselectivity [25a]. B. M. Trost [25b] seeks to advance toward the ideal through observance of atom-economy, and M. Beller [25c] through transformation of multiple-component educts into single-component products. The ideal synthesis conforms to the prescription of K. B. Sharpless [26]: rather than being concerned with the innumerable synthetic methods in the textbooks one should assemble a handful of “perfect” reactions that may be used again and again by synthetic chemists in the many-step construction of a molecular framework. A solution to this dilemma lies in a radical new orientation, as the synthetic chemist begins to take on a role in chemistry similar to those long played by the medical doctor in biology or the engineer in physics [27]. In this way, the synthetic chemist provides assistance to the fundamental scientist as a practicing technologist for mutual benefit, being capable of demonstrating that, and in what way, fundamental chemical knowledge may be applied in a targeted fashion to problem solving in synthesis.
There is still the matter of future target molecules for the synthetic chemist. The times are gone when it was sufficient to synthesize a target molecule just because it had not yet been synthesized in another laboratory. The accent of interest in chemistry has shifted. There are two reasons for this: one is that the structure space of supramolecular chemistry, unlike that of molecular chemistry, is in many regions only thinly populated and awaits selective filling. The attention of chemists has therefore moved from molecular structure to molecular function [28]. Molecules that combine themselves into supramolecular functional units attract particular attention from synthetic chemists. A. Eschenmoser’s vision [29] of creating synthetically accessible supramolecular systems that will spontaneously assemble and may even be capable of reproducing themselves, thus representing the first artificial models of living systems, is heading in this direction, although far into the future.
1.3.2 Historical Periods of Chemical Synthesis
From a distance, scientific and technological advancements look like a continuous stream, contributed to by many activists. On closer inspection, though, discontinuities due to outstanding contributions by individuals are unmistakable. If the development of chemical synthesis is reviewed, it is possible informally to identify three phases, following on from one another in the sense that a later phase is characterized by a greater degree of selectivity than the earlier, with which it partially overlaps. It is easy to make out prominent protagonists for each of the three phases. The example of the female sex hormone estrone serves well to demonstrate how the synthetic chemist has succeeded in meeting growing demands for selectivity.
1.3.2.1 The pre-Woodwardian Era
The first phase of chemical synthesis, ending at about the beginning of the Second World War, might be termed the pre-Woodwardian era.
The pre-Woodwardian era largely concerned itself with the collection and classification of synthetic tools: chemical reactions suited to broad application to the constitutional construction of molecular skeletons (including Kiliani’s chain-extension of aldoses, reactions of the aldol type, and cycloadditions of the Diels-Alder type). The pre-Woodwardian era is dominated by two synthetic chemists: Emil Fischer and Robert Robinson. Emil Fischer was emphasizing the importance of synthetic chemistry in biology as early as 1907 [30]. He was probably the first to make productive use of the three-dimensional structures of organic molecules, in the interpretation of isomerism phenomena in carbohydrates with the aid of the van’t Hoff and Le Bel tetrahedron model (cf. family tree of aldoses in Scheme 1-6), and in the explanation of the action of an enzyme on a substrate, which assumes that the complementarily fitting surfaces of the mutually dependent partners are noncovalently bound for a little while to one another (shape complementarity) [31]. Robert Robinson looked for suitable reactions with the aid of which constitutional modifications in a pathway to, for example, a steroid synthesis might be achieved. He was probably the first to employ mechanistic considerations in the process. There is a tendency toward charge balancing between anionoid and cationoid atom groups [32] through space and through the bonds lying between them (charge complementarity). Robinson used a transparent accounting system (curly arrows) to illustrate the direction of charge displacement (Scheme 1-7).
Case Study Estrone: Elisabeth Dane’s attempts to produce estrone 24 (Scheme 1-8) synthetically [33], beginning with a Diels-Alder reaction that might formally give rise to two regioisomeric adduct components, ended in disappointment: whilst no adduct at all was obtained from an attempted reaction between the Dane diene 14 and the monoketonic dienophile 15a, the reaction between 14 and the diketonic dienophile 19a resulted in a mixture of rac-20a and rac-21a, in which rac-20a, with the steroidal molecular skeleton, was present only as a minor component. It is thus no surprise that the Dane strategy was consigned to the files at the end of the 1930s.
Scheme 1-6 The family tree of aldoses derived from (+)-glyceraldehyde. The Fischer projections of the corresponding aldaric acids are, variously, chiral and asymmetrical (C1), chiral and symmetrical (C2), or achiral and symmetrical (Cs).
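The family tree of Scheme 1-6 and the symmetry classes named in its caption can be reproduced with a few lines of code. This is a minimal sketch, not the authors' procedure; it assumes a string encoding of Fischer projections that is introduced here for illustration only: each stereocentre is written as `r` (OH on the right) or `l` (OH on the left), listed from C2 downward. The function names are likewise hypothetical.

```python
# Assumed encoding: 'r' = OH on the right, 'l' = OH on the left in the Fischer
# projection, listed from C2 down to the last stereocentre.
# In the D-series the last stereocentre is always 'r'.
FLIP = {"r": "l", "l": "r"}

D_ALDOHEXOSES = {
    "rrrr": "allose", "lrrr": "altrose", "rlrr": "glucose", "llrr": "mannose",
    "rrlr": "gulose", "lrlr": "idose", "rllr": "galactose", "lllr": "talose",
}


def chain_extend(config):
    """Kiliani-type extension: the old C1 becomes a new C2 stereocentre."""
    return ["r" + config, "l" + config]


def family_tree(generations):
    """Levels of the tree, starting from glyceraldehyde ('r')."""
    level = ["r"]
    tree = [level]
    for _ in range(generations):
        level = [child for parent in level for child in chain_extend(parent)]
        tree.append(level)
    return tree


def mirror(config):
    """Mirror image: every OH changes sides."""
    return "".join(FLIP[c] for c in config)


def turn(config):
    """In-plane rotation of the projection by 180 degrees.

    Allowed for aldaric acids because both chain ends are COOH.
    """
    return "".join(FLIP[c] for c in reversed(config))


def aldaric_symmetry(config):
    """Symmetry class of the aldaric acid derived from an aldose."""
    if turn(mirror(config)) == config:  # superimposable on its mirror image
        return "Cs"
    if turn(config) == config:          # chiral, but with a twofold axis
        return "C2"
    return "C1"


hexoses = family_tree(3)[3]
for config in hexoses:
    print(D_ALDOHEXOSES[config], config, aldaric_symmetry(config))
```

Under this encoding the two achiral hexaric acids come from allose and galactose, the two C2-symmetric ones from mannose and idose, and the remaining four hexoses give asymmetric aldaric acids.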
1.3.2.2 The Woodwardian Era
In the second phase of organic synthesis, which could reasonably be termed the Woodwardian era, beginning in 1937³⁾, chemical reactions characterized by diastereoselection in the construction of a molecular skeleton found favor. Here as well, two synthetic chemists tower over all their contemporaries: one, naturally, is R. B. Woodward, who advanced the intellectualization of organic synthesis like no one else. Woodward’s seminars set a new standard for natural products chemistry⁴⁾. The other is Albert Eschenmoser⁵⁾, the sole recipient of the privilege of a “collaborative competition” with Woodward [35]. To master the demands of stereoselection it is necessary to know the mechanism of the reaction used and its stereostructural consequences. In particular, knowledge of a mechanism demands the capability to gauge the diastereomorphic transition states of rival parallel reactions (see Scheme 36 in [37]). A necessary prerequisite for the acceptance of proposed ideas is that they should be able to predict the sense of chirality of the main product components accurately.
Case Study (±)-Estrone (rac-24): In 1991 [33c], the presumed dead Dane strategy was resurrected by the use of Lewis acids as mediators. Compound 14 does in fact react with 15a between 0 °C and room temperature in CH2Cl2 - to provide a mixture of (mainly) rac-16a and (as a minor product) rac-17a - as soon as Et2AlCl is added [33d]. In the presence of TiCl4 in CH2Cl2 at -80 °C an 89% yield of rac-18a is obtained.
Scheme 1-7 Analysis of the relative orientation of Dane’s diene 14 and the complementary dienophile following Robinson’s way.
Scheme 1-8 Collection of formulae relevant to Dane’s concept of a steroid synthesis following the AB + D → ABCD aufbau principle. Compounds 15 to 22: a, R = Me; b, R = Et.
3) Woodward graduated as a Doctor of Philosophy in 1937, after submission of his dissertation at MIT (Cambridge, Mass.) [34].
4) I have no doubt that they (Woodward’s seminars at ETH Zurich) played a major role in stimulating my own predilection for and enthrallment with the synthesis of complex natural products; A. E.: in [35].
5) See the concise Preface in [36a].
1.3.2.3 The post-Woodwardian Era
Characteristic of the third phase of organic synthesis, which would logically be termed the post-Woodwardian era, is that the constitutional construction of a molecular framework is now concerned not only with the problem of diastereoselection but also with the more demanding problem of enantioselection [37]. Certain chemical reactions serving as key stages in multistep syntheses have been developed to perfection through the preparation of tailor-made catalysts by Barry Sharpless6) [38a], R. Noyori [39], and E. J. Corey [40], setting the standard for the further development of organic synthesis.
Case Study: (+)-Estrone 24. The “Dane-style estrone synthesis” provides a classic example of stereoselective access to an envisaged target molecule. The Diels-Alder reactions between 14 and 15a or 19a are chirogenic7) reaction steps or, put another way, the enantioselective access to the Diels-Alder adducts can already be set at this stage. This requires, for example, the participation of a nonracemic Lewis acid with the “right” sense of chirality. In the presence of a Ti-TADDOLate [42], cycloadduct 20a was thus obtained from the Dane diene 14 and the bidentate dienophile 19a and was further transformed via 23 into (+)-estrone 24 8) [33d].
Before leaving estrone, a synthetic model for oral contraceptives, as synthetic biologicals (vide infra), it should be pointed out that each historical period of chemical synthesis can be correlated with a characteristic synthetic level amenable to conscious perception [37]. The resurrection [33c] of the Dane strategy for estrone prompted synthetic chemists working on the design of metal-free, chirality-transferring catalysts to use the chirogenic opening step as a selection assay. In this context, acceleration of adduct formation and changes in the ratios of the resulting regioisomers are encouraging signs that enantioselection, which may be finished off here by recrystallization if necessary, may be anticipated [33d]. M. W. Göbel and coworkers [43] and E. J. Corey and coworkers [44] have reported on the application of amidinium catalysts and oxazaborolidinium catalysts, respectively, for the enantioselective treatment of the Dane diene 14 with 19a or with acyclic dienophiles9).
1.3.3 Diels-Alder Reaction - Prototype of a Synthetically Useful Reaction
The Diels-Alder reaction occupies a cherished place in the hearts of organic synthetic chemists, not only in the synthesis of steroids [45] but far and wide in the synthesis of structurally complex natural products [46]. The Diels-Alder reaction comes closest to meeting the stipulations of K. B. Sharpless [26] and B. M. Trost [25b] set out in Section 1.3.1. It only remains to comment that, besides diverse intermolecular examples, the intramolecular version10) of a Diels-Alder reaction was not neglected in the synthesis of estrone and its derivatives. Scheme 1-9 summarizes the construction of a steroid framework by the A + D → AD → [AD]* → ABCD aufbau principle11). [AD]* 25a is a photoenol generated in situ, and reacts under meticulously determined conditions [48] by cycloaddition and subsequent dehydration to provide the estrone derivatives 26a and 27a. The mixture of regioisomeric styryl derivatives can be reduced to give 24 after temporary protection of the 17-keto group. The photoenol 25a is produced by regioselective electronic excitation of the Michael adduct 28a with light having wavelengths of >340 nm. The Michael adduct is accessible by treatment of the chiral enolate anion 30a with the achiral acceptor 29 [49]. The strength (the trans fusion of rings C and D is directly accessible) and weakness (there is still no solution to the problem of substitution of the multistep procedure that delivers diastereoselection for a shorter route proceeding in tandem with enantioselection) of the photochemical synthesis of 24 have already been commented upon [36b].
6) The bottom line in Scheme 1-6 shows the eight aldohexoses of natural origin; they all belong to the D-series. Their L-configured enantiomers have been synthesized by use of the abiotic Sharpless catalyst [38b].
7) See [41] for the meaning of the term “chirogenic reaction step” and the usefulness of its application.
8) The (S,S)-configured Ti-TADDOLate [42] complex with four phenanthren-9-yl residues is used at -80 °C in CH2Cl2: 65% chemical yield, 93% ee or 78% chemical yield, and 85% ee (2 or 0.2 equiv, respectively).
9) With cyclic dienophiles, rings C and D in the cycloadduct are joined in cis fashion. With acyclic dienophiles containing E-configured C=C bonds, an adduct in which the atom groups necessary for construction of the D ring are oriented trans is produced; see Chapter 3 in [33d].
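The ee values quoted in footnote 8 (93% and 85%) can be translated into enantiomer ratios. The following is a minimal sketch, assuming the standard definition of enantiomeric excess, ee = (major - minor) / (major + minor), which the text itself does not spell out; the function names are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def enantiomer_fractions(ee_percent: float) -> Tuple[float, float]:
    """Return (major, minor) enantiomer shares in percent for a given ee in percent.

    Assumes the standard definition ee = (major - minor) / (major + minor)
    with major + minor = 100 %.
    """
    if not 0.0 <= ee_percent <= 100.0:
        raise ValueError("ee must lie between 0 and 100 percent")
    major = (100.0 + ee_percent) / 2.0
    minor = (100.0 - ee_percent) / 2.0
    return major, minor


def enantiomeric_excess(major: float, minor: float) -> float:
    """Inverse operation: ee in percent from the two enantiomer shares."""
    return 100.0 * (major - minor) / (major + minor)


if __name__ == "__main__":
    # ee values taken from footnote 8 (Ti-TADDOLate-mediated cycloaddition)
    for ee in (93.0, 85.0):
        major, minor = enantiomer_fractions(ee)
        print(f"{ee:.0f}% ee corresponds to {major:.1f} : {minor:.1f}")
```

Under this definition, 93% ee means that the minor enantiomer still makes up 3.5% of the product, which is why a final recrystallization may be needed to reach enantiomeric purity.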
[Scheme 1-9: structural formulae of compounds 25-30 (a: R = Me; b: R = Et).]
Scheme 1-9 Collection of formulae relevant to a steroid synthesis following an A + D → AD → [AD]* → ABCD aufbau principle.
10) For further examples see the section “Intramolecular Diels-Alder Reactions” in Ref. [47].
11) Optimization of the reaction conditions was carried out in the racemic series [48]. See [49] for the synthesis of the enantiomerically pure target compounds.
1 Chemistry and Biology - Historical and Philosophical Aspects
1.4 Bringing Chemical Solutions to Biological Problems
1.4.1 The Role of Evolutionary Thinking in Shaping Biology
Biology is such a hugely diversified field that a historical guide hardly helps as an aid to orientation. Given this, it might then be reasonable to consciously pick out some particular partial aspect, as Theodosius Dobzhansky did in his famous statement “Nothing in Biology makes Sense except in the Light of Evolution”. With evolutionary biology as a compass, it is not hard to discern three historical periods.
1.4.1.1
The pre-Darwinian Era
One prominent event in the pre-Darwinian era is the Cuvier-Geoffroy debate (concerning the primacy of anatomical structure over anatomical function or vice versa) before the Académie des Sciences in Paris in the spring of 1830 12). Its immediate focus involved opposed viewpoints in comparative anatomy, while indirectly it represented endeavors to turn “the static Chain of Being into an ever-moving escalator” [51]. Cuvier represented the functionalist approach of the designer: Form follows Function. Geoffroy Saint-Hilaire expanded the theme and took the structuralist standpoint of the evolutionist: Function follows Form. The public argument was unable to settle the difference between the two adversaries, though it became clear that fundamental scientific discussions would in future no longer take place in a neutral environment13). It was also evident that evolutionary thinking in biology could no longer be kept in its cage.
1.4.1.2
The Darwinian Era
In the narrow sense, the Darwinian era began with the publication of The Origin of Species in 1859 and ended at the beginning of the twentieth century with the rediscovery of Gregor Mendel’s 1866 Versuche über Pflanzen-Hybriden (Experiments in Plant Hybridization). Charles Darwin’s book “The Origin of Species by Means of Natural Selection could be read as one long argument. It supported the claims of science to understand the world in its own terms. Animals and plants are not the product of special design or special creation. Natural selection was not self-evident in nature, nor was it the kind of theory in which one could say, “Look here and see”. Darwin had no crucial experiment that conclusively demonstrated evolution in action. His whole concept of natural selection rested on analogy”, an analogy between selective processes taking place under either artificial or natural conditions [53]. A series of questions was left open; the question of whether a mixture of the genes involved took place in the union of two gametes into a zygote (blending inheritance) occupied a key position. It could only be answered after Gregor Mendel [54] had set out statistical rules for the passing on of particular hereditary characteristics from generation to generation, which are useful for discussion on the complex relationships in questions of heredity, and Wilhelm Johannsen [55] had coined the terms phenotype and genotype, which made it possible to distinguish between a statistically apparent type (the phenotype) of observable properties and the corresponding genetic make-up (the genotype) of an organism. The distinction between genotype and phenotype facilitated the separation between genetics and embryology. It is clear from this separation that the differentiation between genetic and environmental causes in embryology and the wider discipline of developmental biology merits discussion.
12) See [50] for the Cuvier-Geoffroy debate before and beyond the Académie.
13) See [52]: Discussions between Goethe and Eckermann of the 2nd August 1830.
1.4.1.3
The post-Darwinian Era
The post-Darwinian era saw the vision of Darwinian evolution through natural selection being accepted as a reality. Since then, evolution has been observed in action in many living organisms and also in innumerable viruses [56, 57]. Through Manfred Eigen’s paper on the role of “Self-organization of Matter and the Evolution of Biological Macromolecules” [58] Darwin’s ideas have been placed on firm physical foundations and have been tested by in vitro evolution experiments [59]. The Darwinian view of evolution has prompted biologists to think in terms of dynamic populations while considering a species [60]. To avoid misunderstandings among nonbiologists, Eigen introduced the term quasispecies. Because of mutability, self-replicating systems are always ensembles of mutants and are not, in any circumstances, single species made up of uniform individuals. To indicate quantitative proportional relationships between quasispecies and their mutants, Eigen’s evolutionary model uses a multidimensional representation (sequence space). In a nucleic acid space [61] (protein space [62]14)), each nucleic acid (protein) sequence is represented in the sequence space by a point and each change in the sequence by a vector. If the points in a sequence space are assigned specific scalar fitness values, a fitness landscape is obtained. The metaphor of a fitness landscape (adaptive landscape) was introduced into evolutionary biology in 1932 by Sewall Wright [64] and was afterwards used abundantly, if with a certain breadth of interpretation, by theoretical biologists15). The picture conveyed by the metaphor is that of an evolving population subject to exclusion of unfit mutants making uphill progress until a local peak is reached. For the evolutionary process in the high-dimensional sequence space, local peaks in the vicinity may readily be reached by small jumps, without the need to traverse the valleys between them, and a continuous sequence of small jumps to reach a global summit is a realistic prospect. To use Eigen’s own words: “Because of frequent criss-crossing of paths in multidimensional sequence space, by virtue of its inherent non-linear mechanism which gives the appearance of goal-directedness the process of evolution is steered in the direction of optimal value peak” [8b]. In brief, biological evolution uses two processes: genetic mutation (as a means of generating random diversity) and natural selection (as a means to optimize the peak-jumping technique) in the environmentally shaped fitness landscape.
Through the removal of subdisciplinary barriers, biology’s evolutionary thinking has contributed on two occasions to enhance that science’s voice in the choir of the natural sciences. In the 1940s and 1950s, a union of Darwinian and Mendelian perspectives took place in the Modern Synthesis [65], whilst at the turn of the twentieth to the twenty-first century a union of developmental and evolutionary biology into evolutionary developmental biology (Evo-Devo) is taking place before our eyes in the New Synthesis [66].
14) See [63]: Footnote 10.
15) R. A. Fisher, J. B. S. Haldane, and S. Wright count as mathematical biologists; their publications were understood only by some of their professional colleagues. T. Dobzhansky, G. G. Simpson, and E. Mayr successfully interpreted the mathematically formulated theorems [65].
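The picture of sequence space, point mutations as unit steps, and a scalar fitness landscape can be made concrete with a toy model. The sketch below is only an illustration under loudly stated assumptions: the fitness function (number of positions matching an arbitrarily chosen optimum sequence) defines a single smooth peak and is not Eigen’s quasispecies model; the sequences, parameter values, and all function names are hypothetical.

```python
import random

ALPHABET = "ACGU"  # nucleic acid sequence space: 4**n points for length n


def hamming(a: str, b: str) -> int:
    """Distance in sequence space: number of positions at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def fitness(seq: str, optimum: str) -> int:
    """Toy scalar fitness (an assumption): positions matching an assumed optimum."""
    return len(seq) - hamming(seq, optimum)


def mutate(seq: str, rng: random.Random) -> str:
    """Point mutation: one step of Hamming distance 1 in sequence space."""
    pos = rng.randrange(len(seq))
    choices = [c for c in ALPHABET if c != seq[pos]]
    return seq[:pos] + rng.choice(choices) + seq[pos + 1:]


def evolve(start: str, optimum: str, generations: int, offspring: int, seed: int = 0):
    """Mutation generates random diversity; selection keeps the fittest variant."""
    rng = random.Random(seed)
    current = start
    history = [fitness(current, optimum)]
    for _ in range(generations):
        # The parent competes with its mutants, so fitness never decreases.
        population = [current] + [mutate(current, rng) for _ in range(offspring)]
        current = max(population, key=lambda s: fitness(s, optimum))
        history.append(fitness(current, optimum))
    return current, history


if __name__ == "__main__":
    final, history = evolve("AAAAAAAA", "GAUCGAUC", generations=50, offspring=10)
    print("start fitness:", history[0], "final fitness:", history[-1], "final:", final)
```

Because the parent is retained in every round, the recorded fitness can only stay level or rise, which mirrors the “uphill progress” of the metaphor; a rugged landscape with several local peaks would require a different, less trivial fitness function.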
1.4.2 On the Sequence of Chemical Synthesis (Preparation) and Biological Analysis (Screening)
In an ideal starting situation for the synthetic chemist the structure of the target molecule is already given. In the real world of the search for active substances, the matter of whether a target molecule is to be synthesized is determined by its presumed profile of properties. If a management decision is made in favor of a target molecule to be synthesized, the synthetic chemist then looks for a way to relate molecular function back to molecular structure. This is based on the supposition that a functional unit should contain at least two structurally complementary molecules non-covalently bound to one another in a supermolecule. The idea of supermolecules as supramolecular functional units, nowadays preached and systematically further developed most conspicuously by Jean-Marie Lehn [67], goes back directly to Emil Fischer [31], who introduced the instructive lock-and-key metaphor as early as 1894. Fischer’s metaphor, as the tip of the submerged model of molecular recognition, traces the function of a supermolecule back to structural interactions between its complementary constituents. Through this, the complementarity between substrate and enzyme was to become the basis of enzymology. Paul Ehrlich seized on the lock-and-key metaphor in his 1908 Nobel lecture [68], and the goal of chemotherapeutic endeavor thereafter came to be regarded as the activation or deactivation of a receptor through noncovalent binding of a complementary effective substance. Structural complementarity of effector and receptor accordingly represents the fundamentals of chemotherapy, similar to the way in which complementarity of antigen and antibody is regarded as central to immunology.
The goal of synthesizing a target molecule with particular properties can be achieved with the aid of two problem-solving processes based on different principles. In one problem-solving process, illustrated by the image of the key and its lock, the maxim is to modify a designed target structure little by little until the corresponding target molecule has the very properties of interest. It involves an iterative procedure, usually of several rounds, based on trial and error. It is trivial to note that the screening can take place only after the synthesis. In the other problem-solving process, which can be illustrated by the image of an assortment of keys, hopefully containing the key that will be complementary to a given lock, the maxim is to develop a parallel structured search method, with the aid of which the matching key will be found, without it being necessary to subject the whole ensemble of candidates to the totality of functional tests. This is a procedure based on the principle of trial and selection. Since a distinction has been drawn between synthesis and preparation (Section 1.2.1), some spin doctoring should come as no surprise. After preparation is performed on a microscale, screening will follow before the synthesis on a macroscale. For the time being, we should come back to the traditional search for a biological with a very particular function.
1.4.2.1
Single-component Consecutive Procedure
In traditional single-component consecutive procedures, the synthetic chemist focuses each time on one structure (one molecule) from a series of successive candidates. The example of the total synthesis of estrone in Sections 1.3.2 and 1.3.3 demonstrates the adaptation of synthetic goals to the state of the art in organic synthesis. The case studies described there have academic value that should not be underestimated, though for industrial synthetic practices they are not directly relevant because estrone will in general be commercially more advantageously accessible through partial synthesis than through total synthesis. In the search for an ovulation inhibitor outlined below, however, total synthesis plays a commercially acceptable role, since partial synthesis drops out as a serious contender from the second generation of inhibitors onward.
1.4.2.1.1
Oral Contraceptives
Thanks to initiatives instigated by Margaret Sanger, probably the highest-profile campaigner worldwide for family planning, a project geared toward the development of an orally administrable contraceptive was initiated in the early 1950s under the reproductive biologist Gregory G. Pincus at the Worcester Foundation for Experimental Biological Research [69a]. It was known that progesterone established and maintained pregnancy as an endogenous gestagen and so was able to act as a contraceptive. As progesterone was not suited for oral application, a systematic search of the steroidal structure space was carried out for an exogenous gestagen [69b] that, orally administered, would bind to the progesterone receptor, thereby initiating a series of molecular events culminating in the induction or repression of a certain set of target genes. Binding of a gestagen to the progesterone receptor is necessary but not sufficient for the former’s playing an active role as an agonist in reproductive biology. This became clear as soon as an antigestagen like RU 486 [70] was found, which bound to the progesterone receptor but, unlike an agonist, was unable to trigger the gestagenic response. As it turned out, there is no known parameter of effector binding that can predict differential agonistic or antagonistic activity of a steroid. If a metaphorical statement can ever reveal “how things are”, Emil Fischer’s static lock-and-key metaphor [31a] ought to be replaced with a dynamic one. This was done by D. E. Koshland’s induced-fit concept [31b], which readily produced the self-explanatory hand-and-glove metaphor. Binding of a given effector will bring about a conformational change of the receptor that is favorable for catalytic activity of the formed supermolecule.
G. G. Pincus and M. C. Chang investigated a diverse range of variants of about 200 steroids [69b], which were in most cases not naturally occurring compounds but products that had accrued in countless laboratories as a result of arduous individual studies on their biological functions. They found that combinations of a gestagenic and an estrogenic 19-nor-steroid exhibited the desired effects.
These findings from animal experiments (rabbit and rat) were also confirmed in humans, in almost militarily planned (Pincus) clinical studies (by the gynaecologists J. Rock and C. R. Garcia). In the early 1960s, a combination pill made up of norethindrone (prepared by C. Djerassi at Syntex in 1951 [71]) and 17α-ethynylestradiol (prepared by H. H. Inhoffen at Schering AG in 1938 [72]) reached the market as the first-generation pill.
Members of the First Generation
Norethindrone 31a, the gestagenic component in the combination pill, is smoothly accessible from estrone methyl ether by partial synthesis [71]. The reaction sequence begins with a dearomatization (Birch reduction) and ends with an ethynylation (Scheme 1-10), necessary for the oral applicability. Technical production of estrone 24 (or estradiol) from inexpensive steroids such as diosgenin or cholesterol by partial synthesis is also feasible. Pyrolytic aromatization (Inhoffen at Schering AG) assists the transition from the steroid to the 19-nor-steroid class (such as from androsta-1,4-dien-17β-ol-3-one 32 to estradiol 33 [72]).
[Scheme 1-10: structural formulae of compounds 31-38 (a: R = Me; b: R = Et).]
Scheme 1-10 Collection of formulae relevant to Torgov’s concept of a steroid synthesis following the AB + D → ABD → ABCD aufbau principle.
Members of the Second Generation
Here the gestagen (-)-norethindrone 31a has been supplanted by (-)-norgestrel 31b. The difference between the two molecular structures, minor in itself, still has far-reaching consequences for biological action and synthetic accessibility. The presence of the ethyl group in place of the methyl group at C(13) slows down the compound’s metabolism, thereby increasing bioavailability and also ordaining that total synthesis now has to take the place of partial synthesis. This begins (Scheme 1-10) with the condensation of (±)-1-vinyl-1-hydroxy-6-methoxy-1,2,3,4-tetrahydronaphthalene (rac-34) with 2-ethylcyclopentane-1,3-dione (35b) [73]. The resulting seco-dione 36b, with a meso configuration, can be reduced microbiologically to one of four stereoisomers: the microorganism used (Saccharomyces uvarum) approaches the surface of the five-membered ring differentially from one of the two diastereotopic half-spaces and selectively attacks only one of the two enantiotopic carbonyl groups [74b]. The reduction product 37b can be stereoselectively converted into (-)-38b (as reported by V. Torgov [74a]) and finally (H. Smith [75]) into (-)-norgestrel 31b.
Members of Later Generations
The search for unnatural gestagens with improved properties by the trial and error approach continues. Oral applicability (through ethynylation at C(17)) at low dosages (thanks to slow metabolism because of the ethyl group at C(13)) has already been achieved. A new, exogenous gestagen therefore has prospects of being favored over already known preparations only if it distinguishes itself in at least one of the three following aspects: through a higher binding specificity to the complementary receptor (i.e., biological); through more economically advantageous accessibility (i.e., chemical); and/or through some advantage arising from patent law (i.e., commercial). What this means in detail should become clear through illustration with later-generation gestagens.
Gestodene 39 (Scheme 1-11) has the lowest ovulation inhibitory dose of all gestagens known to date. It displays both antiestrogenic and antimineralocorticoid activity. Its low affinity for the androgen receptor is not sufficient to produce measurable anabolic androgenic effects. The pathway to 39 passes through compound 47 (Scheme 1-12) [76] and, after microbiological introduction of an O function at C(15) (with the aid of Penicillium raistrickii), on through the stations 48 (R = H or Ac) and 49 [77]. Compound 31b, incidentally, can be easily obtained starting from 47 [78].
Desogestrel 40 (Scheme 1-11) is a progestagen that is transformed in the intestinal mucosa and in the liver into the actual effective metabolite 3-ketodesogestrel. The bioavailability is around 75%. Desogestrel, obtained partially synthetically by chemists at Organon [79], displays minimal androgenic and estrogenic activity. The long pathway from the 19-norsteroid estr-4-ene-3,17-dione includes a microbiological hydroxylation of the steroid skeleton at C(11) and an intramolecular functionalization of C(18).
[Scheme 1-11: structural formulae of compounds 39, 40, and 41.]
Scheme 1-11 Gestagens of the Pill of later generations: (-)-gestodene 39, (-)-desogestrel 40, and (-)-drospirenone 41.
[Scheme 1-12: structural formulae of compounds 42-49.]
Scheme 1-12 Collection of formulae relevant to syntheses of (-)-norgestrel 31b and (-)-gestodene 39, in both cases via 47.
E. J. Corey et al. [80] reported a total synthesis (Scheme 1-13) beginning with the reduction product 50, easily accessible from 42 16). Alkylation of the metallated enol derived from 52 with m-methoxyphenylethyl iodide to afford the tricyclic β-keto ester 53, followed by cationic cyclization of this to furnish the steroid derivative 54, warrants particular attention. Corey and colleagues have recently published another total synthesis of 40 [82], beginning with an enantioselective Diels-Alder reaction between Dane’s diene 14 and dienophile 61. An oxazaborolidinium salt (see Section 1.3.2.3) was used as an efficient catalyst (Scheme 1-14).
Drospirenone 41 (Scheme 1-11), the latest of the exogenous gestagens, differs from its antecedents in some characteristic ways: constitutionally, in that both angular positions are occupied by methyl groups whilst the tetracyclic steroid skeleton is endowed with three additional rings, and biologically, in that 41 is an unnatural gestagen that both acts as an aldosterone antagonist and at the same time displays pronounced antiestrogenic and antiandrogenic properties. With this combination of activities in one and the same dosage, drospirenone currently holds a leading position in hormonal contraception, although it requires a higher dosage than gestagens with an ethyl group at C(13).
The synthesis of drospirenone 41 (Scheme 1-15) [83] starts with the inexpensive androstenolone 66, which can be converted microbiologically (Colletotrichum lini) into the 7α,15α-dihydroxy derivative 67. A selective epimerization at C(7) proceeds by way of the acetal 68. Methylenation of the intermediate (C=C) bond appearing between C(15) and C(16) is successfully accomplished with the aid of dimethylsulfoxonium methylide to provide 71, and that of the (C=C) bond between C(6) and C(7) through a Simmons-Smith reaction. The conversion of 76 into 41 can be carried out in a one-pot procedure, with a Pd-catalyzed hydrogenation being followed by a Ru-catalyzed oxidation and a hydrochloric acid-induced dehydration.
16) The bicyclic, chiral, non-racemic building block 42 represents a milestone in the history of organic chemistry. It is accessible in high chemical yield and enantiomeric excess from the achiral triketone precursor through a proline-catalyzed, intramolecular aldol condensation (Hajos-Parrish-Eder-Sauer-Wiechert reaction [76, 81]).
[Scheme 1-13: structural formulae; the surviving compound labels are 54-60.]
Scheme 1-13 Collection of formulae relevant to a synthesis of (-)-desogestrel 40 opened by the asymmetric Hajos-Parrish-Eder-Sauer-Wiechert reaction.
[Scheme 1-14: structural formulae of compounds 61-65.]
Scheme 1-14 Collection of formulae relevant to a synthesis of (-)-desogestrel 40 opened by an asymmetric Diels-Alder reaction of Dane’s diene 14 and dienophile 61.
[Scheme 1-15: structural formulae of compounds 66-76.]
Scheme 1-15 Collection of formulae relevant to a synthesis of (-)-drospirenone 41 starting from the easily accessible androstenolone 66.
Pincus and Chang (Section 1.4.2.1.1), in their search for orally applicable contraceptives, had decided upon norethindrone after some 200 steroidal candidates had been examined one by one. Chemists at Schering AG had stumbled upon drospirenone after some 600 newly prepared molecules with antialdosterone activity had become available [84]. It can be justifiably stated that the hardly ineffectual pharmaceutical industry had finished up in a blind alley in its search for new active substances by using traditional strategies [85]. The rapidly progressing expansion of the world market, where new suppliers have arrived in great numbers (globalization), places serious decisions before the management of every multinational company [86] (see Section 1.3.1). These are not merely restricted to restructuring of portfolios of the products manufactured; they also do not exclude the reorganization of the entire company structure17). Under real pressure from financial analysts and presumptive pressure from shareholders, questions have also been directed toward the scientists involved: whether there might be new methods that could afford more rapid access to new active substances. The answer was not long in coming: with chirotechnology18) and the combinatorial acceleration of the preparation and screening of whole populations of molecular candidates, a new turn has been taken in the solution of biological problems through chemical methods.
1.4.2.2
Multicomponent Simultaneous Procedure
Darwinian evolution is kept in motion by a continual succession of newly arising variation and its modification by natural selection. The search for active substances proceeds through multiple-component simultaneous procedures, in which a restricted variant population is prepared on a microscale by a combinatorial strategy, to be subjected to the new form of selection, that is, collective screening. After a successfully applied unnatural selection of a particular variant with the desired properties, synthesis on a macroscale can take place. In Section 1.4.2.2.1 a static variation is going to be prepared and screened for anti-inflammatory activity of individual variants that might be useful in controlling asthmatic inflammation19).
The worldwide incidence, morbidity, and mortality of allergic asthma are increasing. Asthma has become an epidemic, affecting 155 million individuals throughout the world. It is a complex disorder characterized by local and systemic allergic inflammation, mucus hypersecretion, and reversible airway obstruction [88]. The pathogenesis of asthma reflects the activity of cytokines from TH2 cells. Without these cells there is no asthma. Animal models support important roles for the cytokines IL-4, IL-5, and the more recently described IL-13 [89]. The latter is closely related to IL-4: both bind to the same IL-4 receptor, in particular to the α-chain of that receptor.
The molecular biologist is interested in the molecular consequences of allergen binding to the T-cell receptor. Experimental investigations have revealed various signal-transduction pathways that link T-cell surface molecules with nuclear transcription events. A [Ca2+]-dependent route has been discovered, emanating from the T-cell receptor, which can be blocked by natural products of fungi: cyclosporine A (CsA) and FK 506 (Scheme 1-16). Another signal-transducing pathway, independent of [Ca2+], emanates from the IL-2 receptor and controls translational events on ribosomes. It can be blocked by a third natural product, rapamycin, but not by CsA or FK 506. Two signaling pathways have been targeted for pharmacological treatment of unwanted immune responses. It is essential to realize that blocking signal transduction leading to regulated transcription or regulated translation requires CsA or FK 506 on the one hand and rapamycin on the other to be more than an inhibitor of a cognate target protein: calcineurin in the former and FKBP-rapamycin-associated protein (FRAP) in the latter case.
17) The consequences arising from reorganization of the structure of a business may be guessed by careful market analysis. Most difficult to predict is the reaction of employees. If the creative people among them are not convinced by the new orientation, or have even been put off by the way in which it has been implemented, they may defect to the competition, thus doubly weakening their previous employer.
18) One of the main challenges of synthetic chemistry in the post-Woodwardian era (see Section 1.3.2.3) is to find routes to enantiomerically pure compounds that satisfy the demands of industrial applicability [37]. In 1992, various international journals (Financial Times, Neue Zürcher Zeitung, Science, and Chemical & Engineering News), as if coordinated by a global editor, touched on the phenomenon of chirality. C&EN even predicted that chirotechnology may progress in the future as biotechnology had grown in the past.
As a matter of fact, the fungi-derived ligands in each case act as a “molecular glue” that mediates the interaction of primary and secondary receptors, forming a ternary receptor-ligand-receptor complex. Calcineurin is blocked by CsA and by FK 506, but only after each ligand has formed a complex with its primary receptor, cyclophilin A and FK 506 binding protein 12 (FKBP 12), respectively. In a similar way, rapamycin, on forming a binary complex with the primary receptor FKBP 12, is enabled to block the secondary receptor FRAP on ternary complex formation (Table 1-1). An antigen bound by the receptor of a T cell sets in motion a long cascade of signal carriers and the subsequent proliferation of T cells. In allergic subjects, this signal cascade can be initiated by allergens, which are in themselves harmless, leading to undesired T-cell overproduction. For allergy sufferers, therefore, it is desirable to specifically interrupt or slow down the transcriptional or translational signal cascades involved in T-cell production. Because FK 506, rapamycin, and CsA are effective immunosuppressants, they cannot be
19) Project of the German Federal Ministry of Education and Research [87a], initiated by A. Kleemann, K. Brune, G. Quinkert; for details see [63] and [87b]. Beginning: 1 July 1994.
1 Chemistry and Biology - Historical and Philosophical Aspects
Scheme 1-16 Natural immunosuppressants: FK 506, rapamycin, and CsA.
Table 1-1 Naturally occurring immunosuppressants (ligands) and their receptor complexes

Ligand          Primary receptor    Secondary receptor
Cyclosporine    Cyclophilin         Calcineurin
FK 506          FKBP                Calcineurin
Rapamycin       FKBP                FRAP

Ligand and primary receptor form the binary complex; ligand, primary receptor, and secondary receptor together form the ternary complex.
considered suitable for long-term treatment of allergic patients. The search is on for non-natural20) ligands with a more specific action on the immune system. A collection of non-natural ligands - synthesized independently in various laboratories - demonstrates the immense synthetic effort invested in the search for specific modulators of the immune system with significantly reduced
20) V. Prelog [90] has underlined the view that natural products hold a worthwhile message. H. Waldmann et al. [91] entertain the plausible argument that “natural products are biologically validated starting points in structural space for compound library development”.
1.4 Bringing Chemical Solutions to Biological Problems
molecular complexity. One can’t help wondering why the traditional method (making one compound at a time, analyzing it, and evaluating it biologically) was applied by all the synthetic groups involved. As the synthetic target structures aimed at are represented by isolated points scattered irregularly over a relatively small segment of structure space, a combinatorial approach furnishing a focused variation, whose members ought to be represented by a cluster of points in abstract structural space, would seem promising.
1.4.2.2.1 Preparation and Screening of a Static Variation
The combinatorial approach that was pursued in search of an antiasthma drug was based on a split-and-mix strategy [92], a practical use of the operational principle of parsimony: to get the most with the least; in this case, to get 343 different variants in only 21 reaction steps. Scheme 1-17 sketches
Scheme 1-17 Construction of a binary-encoded [93] combinatorial variation using the split-and-mix protocol (resulting in a one-bead-one-variant state) and an encoding-decoding alternation (resulting in a state with every bead carrying a single tripeptide sequence).
how a biased variation of 343 members was obtained on resin beads in three preparative rounds, each round allowing for the parallel attachment of one out of seven building blocks available. The complete set of monomeric building blocks used in the construction of the combinatorial variation of Scheme 1-17 is shown in Scheme 1-18. The aesthetic elegance of the combinatorial strategy reveals itself when compared with alternative strategies21). The bead-bound substrate variation was screened for binding to a biological receptor (a fluorescence-conjugated immunophilin [87]) by mixing a sample of the charged beads with a buffer containing the complementary protein. The beads that carry variants with affinity for the receptor are easily identified by visual inspection under a microscope with a fluorescent illuminator and removed with the aid of a (non-plastic) syringe. The sequence of each bead-bound substrate variant was determined indirectly but unambiguously by Clark Still’s encoding-decoding alternation [93].
Molecular encoding: During each step of the construction of a focused variation of tripeptides (see Scheme 1-17), tagging molecules are attached to the beads
Scheme 1-18 21 building blocks for the preparation of the 343 tripeptides of Scheme 1-17 (building blocks 6, 10, and 11 were used as racemates).
21) A divergent approach would require 399 (7 + 7² + 7³) reaction steps, a serial approach even 1029 (7³ + 7³ + 7³) reaction steps to reach the same 343 variants [63, 87].
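The step counts quoted for the three strategies follow from simple arithmetic, which can be checked with a short sketch. The function names below are illustrative assumptions, not terms from the source; only the numbers 7 (building blocks per round) and 3 (rounds) are taken from the text.

```python
def library_size(blocks_per_round: int, rounds: int) -> int:
    """Number of distinct variants: every combination of one block per round."""
    return blocks_per_round ** rounds


def split_and_mix_steps(blocks_per_round: int, rounds: int) -> int:
    """Split-and-mix: each round needs one reaction per building block."""
    return blocks_per_round * rounds


def divergent_steps(blocks_per_round: int, rounds: int) -> int:
    """Divergent approach: every intermediate is elaborated separately (7 + 7^2 + 7^3)."""
    return sum(blocks_per_round ** k for k in range(1, rounds + 1))


def serial_steps(blocks_per_round: int, rounds: int) -> int:
    """Serial approach: each final variant is built from scratch (7^3 + 7^3 + 7^3)."""
    return rounds * blocks_per_round ** rounds


if __name__ == "__main__":
    print(library_size(7, 3), split_and_mix_steps(7, 3),
          divergent_steps(7, 3), serial_steps(7, 3))
```

With 7 blocks and 3 rounds, the four functions reproduce the figures given in the text and in footnote 21.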
that encode both the step number (one through 21) and the reagent (amino acid or acid chloride, respectively) used in that step. A combinatorial encoding of the 21 reaction steps requires altogether seven molecular tags (i.e., A, B, C; AB, AC, BC; ABC in one round). Molecular decoding: After screening the variation, the molecular tags22) can be cleaved photochemically from each of the selected beads and analyzed by gas chromatography [93]. The specified on-bead selection test afforded a mixture of rac-77 and rac-78 (Scheme 1-19). To explore its biological properties in various functional tests [94], a substantial amount had to be synthesized. Instead of going for 79 (Scheme 1-19), the more distant compound 80 (Scheme 1-20) was targeted by conventional synthesis. The replacement of target structure 79 by 80 came about by accident, during a search for solid-phase synthesis linkers that can be cleaved enzymatically. Substitution of the β-methoxyethylamino residue by the Z-protected lysine residue [87] led to higher biological activity in various functional tests. Compound 80 has recently been considered [94] a promising candidate for the treatment of diseases accompanied by immunological inflammation. The combinatorial approach produces large variations of related molecules, which can be exploited by appropriate screening techniques. As far as the production of these variations and their screening are concerned, combinatorial chemistry is reminiscent of the immune system. In the immune system, antibodies recognize cognate antigens. Those antibody-producing cells that are effective against a particular type of invader molecule preferentially evolve from a huge population. If the invaders are pathogens or parasites, dynamic
Scheme 1-19 On-bead molecules (rac-77 and rac-78) selected from the variation of Scheme 1-17, and the seeming target structure 79.
22) The molecular tags that were used are composed of a series of electrophoric tags (halophenol derivatives) plus a photolabile linker [93].
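The idea behind the binary encoding described above (three tags A, B, and C giving the seven combinations A, B, C, AB, AC, BC, ABC in one round) can be illustrated with a minimal sketch. The mapping of building-block index to tag subset used here is a hypothetical assumption chosen for illustration; the source does not state which combination was assigned to which building block.

```python
TAGS = ("A", "B", "C")  # three tags per round, as listed in the text


def encode(block_index: int) -> frozenset:
    """Map a building block (1..7) to a non-empty tag subset via its binary digits.

    The index-to-subset assignment is an illustrative assumption.
    """
    if not 1 <= block_index <= 7:
        raise ValueError("block_index must be between 1 and 7")
    return frozenset(tag for bit, tag in enumerate(TAGS) if (block_index >> bit) & 1)


def decode(tags: frozenset) -> int:
    """Recover the building-block index from the set of tags found on a bead."""
    return sum(1 << TAGS.index(tag) for tag in tags)


if __name__ == "__main__":
    for i in range(1, 8):
        print(i, "".join(sorted(encode(i))))
```

Three presence/absence tags give 2³ - 1 = 7 non-empty combinations, which is exactly enough to distinguish the seven building blocks of one round.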
a) Boc2O, aq. NaOH, dioxane, 90 %; b) MeOH, SOCl2, 98 %; c) 2-chloro-1-methylpyridinium iodide, CH2Cl2, NEt3, 50 %; d) MeOH, 2.5 N NaOH, 74 %; e) 2-chloro-1-methylpyridinium iodide, CH2Cl2, NEt3, 86 %.
Scheme 1-20 Collection of formulae relevant to a synthesis of the biologically active candidate 80.
coevolution between them and the host may occur. There is, however, a tremendous difference between a static variation and the immune system. While the processes of preparation and screening of a static variation were designed by chemists, what happens in immunology was not designed but rather evolved. The preparation of a dynamic variation (to be described in the following section) lies somewhere between the two extremes, though very much closer to the designer's end.
1.4.2.2.2 Preparation and Screening of a Dynamic Variation23)
23) For dynamic non-covalent chemistry see [95].
In the previous section, a well-known method was applied to a long-standing biological problem: the discovery of a new biologically active substance. With
the intention of finding such a substance displaying properties closest to a set-up profile, a static molecular variation was prepared (on microscale) and screened (collectively) to afford a select variant qualifying as the candidate for subsequent synthesis (on macroscale). In this section, we present the self-assembly of a variation of three sets of conjugates from which an added receptor selects a number of effectors by molecular recognition. This selection works by way of the interactions of protein surfaces within the receptor-effector supermolecule, the knowledge of which ought to be helpful in drug design. The self-assembly to be introduced is based on three pyranosyl-RNA (p-RNA) [96] single strands (a, b, and c, Scheme 1-21) associating in a Watson-Crick-like manner, initially into binary and further on into ternary supermolecules24). In
Scheme 1-21 Base-pairing dynamics of single strands a, b, and c.
24) Project of the German Federal Ministry of Education and Research [97a]; for details see [87b], [97b]. Initiated by A. Eschenmoser, U.-H. Felcht, G. Quinkert [97c]. Beginning: 1 April 1995.
addition to the H bridges, intercatenary π,π-stacking effects make a substantial contribution to the stabilization of the resulting duplexes [96a, 96d]. In its current form, the self-assembly is based on three p-RNA single strands with 7 (a and b) or 14 (in the case of c) nucleobases. The two short strands are sequence complementary to the first seven or the last seven bases in the longer strand. The pairing eventually gives rise to water-soluble ternary complexes acb (Scheme 1-21). Strand c is involved in all the equilibria. Since strands a and b are unable to pair with one another, and as they bind to non-overlapping regions of c, they do not compete with each other in binding to c. The unusual designation acb is used to reflect the dominant role of the longer strand c in complex formation. The following equilibria, with five independent equilibrium constants25), apply to the pairing of the complementary strands:
\begin{align}
c_i + a_j &\rightleftharpoons a_j{:}c_i \tag{1}\\
c_i + b_k &\rightleftharpoons c_i{:}b_k \tag{2}\\
a_j{:}c_i + b_k &\rightleftharpoons a_j{:}c_i{:}b_k \tag{3}\\
a_j + c_i{:}b_k &\rightleftharpoons a_j{:}c_i{:}b_k \tag{4}\\
a_j + c_i + b_k &\rightleftharpoons a_j{:}c_i{:}b_k \tag{5}
\end{align}
Subscripts i, j, and k are used to distinguish various possible sequences displaying the required complementarity. Scheme 1-22 shows a network representation of the above set of equilibria. The nodes in the network correspond to the individual strands involved in the equilibria, while the lines represent their possible associations or dissociations. Along a given line, the concentrations of a single strand or of several strands vary between zero and the maximum disposable value. Each of the colored lines corresponds to a single strand, whilst black lines relate to more than one strand or to a binary complex. With the exceptions of a and b, which have only two connections each, all other nodes have at least three available connections, whilst the node for the ternary acb complex has as many as five. The network here results from the superposition of the synchronous formation from a, b, and c with the formation both from ac plus b and from cb plus a.
25) (1) and (2) form closed subsystems. As soon as all three components are present, however, the full system of equilibria (1-5) is valid. Equilibrium (5) represents the synchronous formation of the ternary complex out of the three single conjugates. Since this corresponds to third-order kinetics, a process of this type is significantly less probable than the purely bimolecular processes (1-4).
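For one of the closed binary subsystems, for example the pairing of a with c, the law of mass action gives the duplex concentration in closed form. The following is a minimal sketch under stated assumptions: the dissociation constant `k_d` and the numerical values in the usage line are hypothetical and are not taken from the source; only the 1:1 stoichiometry of the pairing is.

```python
import math


def duplex_concentration(a_total: float, c_total: float, k_d: float) -> float:
    """Equilibrium concentration x of the duplex a:c for a + c <=> a:c.

    Solves k_d = (a_total - x) * (c_total - x) / x, i.e. the quadratic
    x^2 - (a_total + c_total + k_d) * x + a_total * c_total = 0,
    and returns the physically meaningful (smaller) root.
    All concentrations and k_d must share the same unit.
    """
    s = a_total + c_total + k_d
    return (s - math.sqrt(s * s - 4.0 * a_total * c_total)) / 2.0


if __name__ == "__main__":
    # Hypothetical values in nM; k_d = 100 nM is an assumption for illustration.
    x = duplex_concentration(555.0, 555.0, 100.0)
    print(f"[a:c] = {x:.1f} nM, free [a] = {555.0 - x:.1f} nM")
```

The full system (1)-(5) couples several such relations and generally has to be solved numerically; the sketch covers only an isolated binary subsystem.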
Scheme 1-22 Network representation of equilibria (1)-(5). Nodes: a, b, c, and their complexes up to acb; colored lines: variation of [a], variation of [b], variation of [c].
In a three-dimensional representation, the strands and their complexes can be arranged as the vertices of a trigonal bipyramid, its edges corresponding to the equilibrium arrows from (1)-(5)26). Each state of the system is thus a point within the trigonal bipyramid. The stability of the complexes may be preserved when the pairing-capable strands a, b, and c are extended into sets of conjugates27) A, B, and C (Scheme 1-23). Coupling with a series of oligopeptides transforms the pairing system (self-assembly system) with the three single strands a, b, and c into an exploring system (molecular recognizing system) with the three sets of conjugates A, B, and C. The equilibria (1)-(5) also apply to the conjugates, if the subscripts i, j, and k are used to denote the oligopeptides employed. For the resulting system there is a particular assignment of roles: the pairing system based on the p-RNA strands a, b, and c serves to bring the peptide regions into proximity with each other, thus supporting their joint function. The law of mass action applies here not only to the self-assembly but also to molecular recognition, ensuring that the full potential of the structural variation can be exploited. As effectors, the triple peptide combinations are capable of entering into specific interactions with a further component, a receptor R (Scheme 1-24). As a selector of complementary oligopeptide combinations, the receptor enables unnatural selection from the variation of conjugates.
26) It should be pointed out that the transition from ac to cb does not take place as a direct, single process, but should be regarded only as a conflation of the processes ac ⇌ a + c and cb ⇌ c + b. The corresponding edge of the bipyramid thus - unlike the other edges - does not symbolize a single equilibrium.
27) For the conjugates the following p-RNA sequences have been used: a = [CGGGGGN], b = [NGAAGGG], and c = [CCCTCTNCC CCCG]. N is a tryptamine nucleoside [98], which serves to attach the oligopeptides (discrete random variation of hexapeptides composed of the amino acids C, E, F, H, K, L, N, R, S, T, W).
Scheme 1-23 Equilibria between members of the three sets of conjugates of types A, B, and C, each with p-RNA moieties (gray) to make self-assembly possible and oligopeptide moieties (green) to allow molecular recognition.
The equilibria (1-5) described above now need to be supplemented, first to take account of the receptor itself, and second to allow for the receptor complexes with the various components of binary and ternary aggregates shown in Scheme 1-23: altogether eight molecular species are now involved. Scheme 1-25 shows the corresponding network of 8 nodes and 28 possible equilibria, each of the nodes having 7 connections. As in Scheme 1-22, green, red, and blue lines represent the possible binary equilibria, whilst black lines denote potential ternary and quaternary equilibria. In the interactions with a receptor, unlike in the case of the separate ternary complex, there are several types of substitution equilibria in which conjugates
Scheme 1-24 Sketch of molecular recognition of a receptor (R) by a complementary effector (here by a discrete variant of type ACB).
are exchanged. There are three types of pure binary substitutions, and two higher order substitutions where one conjugate is substituted for two others at a time. Whether these simultaneous exchanges of several conjugates, as well as the higher order associations and dissociations, are relevant remains to be determined experimentally. The alternative of stepwise processes is available in any case. Topologically, the molecular species can be ordered into four levels of complexity28) (Scheme 1-25). On the simplest level is the free receptor R. The level above is represented by the binary complexes R:A, R:B, and R:C, the next level by the ternary complexes RAB, RAC, and RBC, whilst the level of highest complexity is occupied by the quaternary complex R:ACB. Accordingly, the participating species can be arranged as vertices of a cube. All possible equilibria are now either edges, or face- or space-diagonals of the cube, and the system is, by definition, described by a point inside the cube at any time. The cube-style representation shows, firstly, that pathways from one species to another are possible either via both edges and diagonals, or exclusively via
28) The free ternary complex and its subsystems are found on these levels likewise and are continuously present over the full span of equilibria. For the sake of clarity, however, they are not explicitly taken into account here.
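The counts given for the receptor network (8 species, 28 pairwise equilibria, 7 connections per node, four levels of complexity) can be verified by treating each species as the subset of conjugates bound to R, that is, as a vertex of a cube. The sketch below is an illustration only; the variable and function names are assumptions, and it counts possible pairwise connections without saying anything about which of them are experimentally relevant.

```python
from collections import Counter
from itertools import combinations

CONJUGATES = ("A", "B", "C")

# Each species is the set of conjugates bound to the receptor R:
# the empty set is free R, the full set is the quaternary complex R:ACB.
species = [frozenset(s) for r in range(len(CONJUGATES) + 1)
           for s in combinations(CONJUGATES, r)]


def connection_type(x: frozenset, y: frozenset) -> str:
    """Classify a pair of cube vertices by how many conjugates differ."""
    return {1: "edge", 2: "face diagonal", 3: "space diagonal"}[len(x ^ y)]


levels = Counter(len(s) for s in species)
connections = Counter(connection_type(x, y) for x, y in combinations(species, 2))

if __name__ == "__main__":
    print("species per level:", dict(levels))
    print("connections:", dict(connections), "total:", sum(connections.values()))
```

The 12 edges correspond to stepwise gain or loss of a single conjugate, while the 12 face diagonals and 4 space diagonals correspond to the higher-order processes whose relevance the text leaves open.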
Scheme 1-25 Network representation of all possible equilibria extending Scheme 1-24. The eight nodes are labeled by bold characters. All other intersections are artifacts of the two-dimensional representation. For the sake of clarity, face- and space-diagonals of the cube are not shown.
edges or diagonals. Secondly, it also demonstrates the high syntactic symmetry (equivalence of the different types of interactions) of the system and underlines the exchangeability of receptor and effectors. To delineate pharmacological properties of members of the dynamic system shown in Scheme 1-25, data of an enzyme-binding experiment from a real-time biomolecular interaction analysis29) and data of an enzyme-inhibition experiment from a photometric assay30) have been correlated (Scheme 1-26). One can see that the strongest affinity (binding) does not give rise to the greatest activity (inhibition). Affinity is not proportional to activity. Species RAC shows the strongest affinity, whilst species RACB causes the greatest activity. Since species RCB has the weakest affinity, it is clear that B makes no cooperative contribution to affinity, but is important for effective activity.
29) The biotinylated conjugates (ACB, AC, BC, or C) are captured by a sensor chip, whose surface is coated with immobilized streptavidin and which acts via surface plasmon resonance as a tool for enzyme (R) binding experiments.
30) The enzyme is mixed with its photolabeled substrate S. Upon cleavage by the enzyme, the label is activated and fluorescence can be detected. In case of inhibition by the effector, cleavage does not occur and fluorescence is not detected.
Obviously, there is no additivity of the individual conjugates’ contributions. From the quantitative point of view this corresponds to non-linear behavior. The influence on the enzymatic reaction has to be interpreted in terms of either competitive inhibition (ACB:R)31), uncompetitive inhibition (ACB:RS), mixed inhibition (ACB:R + ACB:RS), or substrate capture by the conjugates. It should be noted that the interactions of A, B, and C with the receptor may influence one another in either a cooperative or an anticooperative fashion. Furthermore, the coordinating role that conjugate C plays in self-assembly (Scheme 1-23) may be pushed into the background or may even be absent entirely while interacting with the receptor.
Scheme 1-26 Correlation diagram of affinity (binding) and activity (inhibition) for some nodes of the network of Scheme 1-25. Values for ACB are set to 100%.
31) Here, and in the other possibilities mentioned, ACB:R stands for any of the molecular species from Scheme 1-25 containing the receptor.
For a screening experiment on enzyme inhibition (Scheme 1-27), a variation of conjugates of types A, B, and C was arrayed in a spatially addressable format on 16 microtiter plates. One out of 1308 different C conjugates was placed in each well, together with 1 of 8 different A conjugates and 1 of 11 different B conjugates, as indicated on the margins. In 99 of the remaining wells, the single A or B conjugates were placed as inactive blank controls. The last well was filled with solvent and buffer only. To each of the mixtures the enzyme was added, together with its fluorescence-labeled substrate S. In each well, the enzyme could select either the substrate or the conjugates of Scheme 1-25. In the first case, the labeled substrate would be cleaved by the enzyme and fluorescence observed. In the second case, inhibition of the enzyme would occur and little or no fluorescence would be detected. The color coding in Scheme 1-27 indicates the degree of inhibitory activity found in each case. White and pale blue denote inactive substances, red and violet denote strong inhibitory effects. In a separate measurement, an IC50 value of 23 nM was found for the strongest inhibitor (position A8/B11 on the plate in the fourth column, third row). Surprisingly, there are not only single point hits but also whole clusters of hits in which the participating conjugates display inhibitory activity. A closer inspection of, for example, all the wells in which conjugate A4 is present reveals that the majority indeed show activity, independently of the B and C conjugates added. This notwithstanding, not all 16 plates show the same distribution of active and inactive triplets, even though the A and B conjugates are the same in each plate. So, variation in the C conjugate significantly influences the activity of the A and B conjugates.
This is especially apparent in the mixtures of A3 with B1 through B8 and of A2 with B1, B3, and B5 through B7 in the plate of the second column, third row. Only in the presence of a C conjugate do A and B conjugates contribute to the observed activity in this case. The law of mass action suggests departing from the 1:1:1 stoichiometry in the search for maximum activity. On changing the concentrations of individual conjugates, one shifts the molecular system parallel to edges or planes of the cube (Scheme 1-25). The statistical weights of the contributions of individual conjugates to the network of interactions are altered in the process. Scheme 1-28 shows the results of a pilot experiment in which the inhibitory activity was measured as a function of the concentrations of the A and B conjugates32). The results are displayed as a hypersurface for a constant concentration of conjugate C. The sigmoidal dose-activity relationship is clearly evident with regard to both A and B. The stoichiometric composition with [A] = [B] = [C] = 555 nM is represented by a point located on top of a ridge, separating a flat region of the hypersurface from a descending slope. Starting from the stoichiometric point, activity increases with the concentrations of A and B. The strongest inhibition value was found at the bottom of the slope
32) Results relate to the second strongest inhibitor found in the screening. In Scheme 1-27 it is to be found on the plate in the third row and the second column with the conjugates A3/B1. The results presented in Scheme 1-26 refer to the same complex.
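A sigmoidal dose-activity relationship of the kind mentioned above is commonly described by a Hill-type function. The sketch below is an illustration under explicit assumptions: the source does not state which model, if any, was fitted; the Hill coefficient is hypothetical; and the value of 23 nM is borrowed from the IC50 reported in the text for the strongest inhibitor purely as an example input, not as a parameter of the pilot experiment.

```python
def fractional_inhibition(conc_nm: float, ic50_nm: float, hill: float = 1.0) -> float:
    """Hill-type dose-response: fraction of enzyme activity inhibited (0..1).

    conc_nm  - inhibitor concentration in nM
    ic50_nm  - concentration giving 50 % inhibition, in nM
    hill     - Hill coefficient (assumed; 1.0 means no cooperativity)
    """
    if conc_nm <= 0.0:
        return 0.0
    return 1.0 / (1.0 + (ic50_nm / conc_nm) ** hill)


if __name__ == "__main__":
    # Illustrative concentrations spanning the range discussed in the text.
    for conc in (2.3, 23.0, 230.0, 555.0, 5000.0):
        print(f"{conc:7.1f} nM -> {fractional_inhibition(conc, 23.0):.2f}")
```

Plotted against the logarithm of concentration, such a function yields the familiar S-shaped curve; the two-dimensional hypersurface of Scheme 1-28 would require a model in both [A] and [B], which this one-variable sketch does not attempt.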
Scheme 1-28 Three-dimensional (hypersurface) view of the enzyme-inhibition activity of a combination of three conjugates, A, B, and C, as a function of the concentrations of conjugates A and B. The stoichiometric composition [A] = [B] = [C] = 555 nM is close to a ridge. Increasing the concentrations of A and B enhances the activity.
with [A] = [B] = 5000 nM and [C] = 555 nM, where the properties of A and B have a roughly 10 times greater statistical weight than those of C33). From the foregoing discussion it can be directly inferred that the activity of a conjugate triplet is not connected to a single molecular species from Scheme 1-25. Given the dynamics of the supramolecular system described, one could go a step further and transgress the confines of molecular constitution. It should be just as possible to use carbohydrates, steroids, terpenes or even nonbiogenic substance classes - dendrimers, for example - in place of the peptides. Through the addition of conjugates of different types of constitution, the transition from one type to another could be studied in a quasi-continuous way, opening up a further, new option for the determination of structure-activity relationships. The dynamics of the system allows it to adapt to changes in the environment. Adaptation here means that the balance between the interactions inside the
33) Comparing Scheme 1-28 with Scheme 1-26, one can see that the increase of activity on going from C to ACB, from CB to ACB, and from AC to ACB is consistent with the topology of the hypersurface in Scheme 1-28.
1.5 Bringing Biological Solutions to Chemical Problems
effector (between the individual conjugates) on the one hand and those between the effector and the receptor on the other hand, can change. Therefore, depending on the prevailing conditions, different molecular species may be responsible for the effects produced at the receptor. Particular combinations of members of the three sets described may be used to map the affinity profile of the receptor. In short: receptor profiling directly results from a thorough investigation of the dynamic system under discussion. It reveals the complementarity between the sites of the interacting surfaces of receptor and effectors and suggests the design for a specific, biologically active substance that finally takes over from the analyzing effectors. Ultimately, the potential of biologically active substances can only be assessed in actual biological systems by means of animal experiments (Scheme 1-29) and confirmed by subsequent clinical studies. En route to this, however, the dynamic system described here offers various options for the analysis and optimization of pharmacological parameters like affinity and activity. It is the heterobifunctional character of the dynamic system that allows the synthetic chemist to influence both intrinsic self-assembly and extrinsic molecular recognition in a controlled way.
1.5 Bringing Biological Solutions to Chemical Problems
1.5.1 Proteins [99]
Among the bio-macromolecules, proteins are distinguished all-round players. As fibrous proteins they are used for structural purposes. As enzymes they catalyze almost every chemical reaction in a cell with great power and high specificity. As gene regulators they control gene expression in development and evolution. As antibodies (immunoglobulins) they bind invading antigens. As motor proteins they convert chemical energy into kinetic energy. As transport proteins they mediate transmembrane movements of ions or metabolites.
1.5.1.1 A Look at Protein Structure and Generation from Different Angles
The chemist fills the void in structure space left by the physicist who dislikes the integrated complexity of the molecular world. Even the chemist, for some time, had been treating his structure space rather unevenly. According to the Beilstein Doctrine34), macromolecules neglected by the organic chemist for a
34) Beilstein Handbook of Organic Chemistry, an encyclopedia of known micromolecular carbon compounds, does not concern itself with macromolecular carbon compounds [17e].
Scheme 1-29 Outlook: supramolecular network concept in pharmacology.
long time [17f], were finally taken up by the biochemist, who could not afford to ignore bio-macromolecules like nucleic acids and proteins any longer. The bottom-up view of the biochemist was eventually complemented by the top-down attitude of the (molecular) biologist. Quite a few of those scientists who considered themselves molecular biologists entertained the idea [100a] that “‘other laws of physics’ might be discovered by studying the gene”. This search for the physical paradox [100b] remained an important element of the psychological infrastructure of the creators of molecular biology. As a matter of fact, the physicists among the new group were going to create a new approach to biology [101].
1.5.1.1.1 The Chemist’s Look [102]
The Hofmeister-Fischer Theory of Protein Structure was made public in 1902 [103, 104]. Accordingly, proteins consist of polypeptide chains in which the individual α-amino acids are linked to one another through amide (peptide) bonds formed between the COOH group of one amino acid and the NH2 group of the next amino acid. Taking the structure of proteins as his example, Linus Pauling demonstrated some time later how deep knowledge of chemistry can lead to general rules [105]. The nature of the strong peptide bond, the role of weak hydrogen bonding, and the importance of complementarity [106] were such rules used in model building: one of Pauling’s methods to work out the structure of bio-macromolecules. Stepwise protein synthesis normally requires [107] protection of the amino group of the first amino acid and the carboxy group of the next amino acid; activation of the carboxy group of the amino acid carrying the protected amino group to form a peptide bond; and finally, removal of the protecting groups. Polypeptide synthesis on insoluble polymer supports was pioneered by R. B. Merrifield [108]. This method could be automated and has facilitated protein synthesis enormously [109]. Chemical ligation of even unprotected peptide segments has recently been reported [110]. To summarize: systematic variation of structure with the aim of developing peptides for therapeutic use gives the synthetic chemist a good excuse for chemical synthesis. α-Amino acids, obtained from natural sources or from the synthetic chemist’s laboratory, play a trailblazing role in the gradual growth of chemical biology. For the synthetic protein chemist they are the obvious building blocks; for the teaching chemical generalist they are ideal demonstration objects with an unmistakable structural profile: two unlike functional groups and - with the exception of glycine - at least one stereogenic center within the smallest possible space.
Nearly 50 years were to pass from Emil Fischer’s view that synthetic chemistry should contribute to the solution of biological problems [30] to du Vigneaud’s synthesis of the neuropeptide oxytocin [111]. Preparative stumbling blocks in the selective protection and/or activation of functional groups, as well as in the effective separation of complex reaction products, first had to be cleared from the path. Methodological progress toward automated solid-phase synthesis, with or even without the use of protecting group technology, finally made peptide synthesis more or less a routine matter. Sophisticated methods have been developed to ligate smaller peptide segments together to make larger peptides. As far as larger proteins are concerned, the chemist’s ability to control their structure (and functions) specifically is still in its infancy.
1.5.1.1.2 The Biochemist’s Look [112]
In his study of endergonic protein genesis35), the biochemist is driven by the desire to understand how the energy barrier from the amino acids to the peptide is overcome [113]. Paul C. Zamecnik, Mahlon Hoagland, and their colleagues developed and used a cell-free system for the in vitro study of the mechanistic details of protein genesis [114]. By the use of radioactive amino acids, it could be shown that, in an initial step, each of the 20 amino acids is enzymatically activated at the expense of ATP hydrolysis, according to the reaction:
Amino acid + ATP —[enzyme]→ AMP-amino acid residue:enzyme + pyrophosphate
The resulting adenylated amino acid appears to be tightly bound to its specific enzyme, the corresponding aminoacyl-tRNA synthetase. Without leaving its enzyme, the former, in a consecutive step, reacts with a low-molecular-weight RNA (called soluble RNA = sRNA, later more logically known as transfer RNA = tRNA) to afford an aminoacyl-tRNA [115, 116].
AMP-amino acid residue:enzyme + tRNA —[GTP]→ Amino acid residue-tRNA + AMP + enzyme
This transacylation furnishes conjugates that structurally bridge the gap between amino acids and their ordered arrangement in proteins.
1.5.1.1.3 The Molecular Biologist’s Look [117]
Aminoacyl-tRNAs not only bridged the gap between activated amino acids and their ordered arrangement in proteins but also, rather dramatically, brought together the experimental biochemist and the theoretical molecular biologist [113, 118]. The biochemist, beyond biogenesis, takes a lively interest in the flow of matter and energy during metabolism. The molecular biologist takes additional interest in the flow of genetic information during gene expression on the one-way road: DNA → RNA → Protein. M. Hoagland [115] and P. C. Zamecnik [116] with their sRNAs acted as the experimental biochemists, while Francis Crick, by offering his adaptor hypothesis [119], figured as the theoretical biologist. Several years before sRNAs were discovered, Crick had already proposed 20 types of adaptor-RNA molecules, which could line up along an unspecified template-RNA, and each bind to a particular amino acid. In his own words: “one would require twenty adaptors, one for each amino acid, and separate enzymes would be needed to join each adaptor to its cognate amino
35) We distinguish in this essay products of protein synthesis which were designed by man from products of protein genesis which were produced by evolution.
1.5 Bringing Biological Solutions to Chemical Problems
acid. Thus one is lead to suppose that after the activating step, discovered by Hoagland and described earlier (vide supra), some other more specific step is needed before the amino acid can reach the template”. Which template? Several observations had excluded rRNAs from being candidates for acting as templates. A cell, for example, could make a new type of protein without making a new type of ribosome. The template-RNA was finally disinterred as a class of unstable intermediates, self-explanatorilycalled messenger-RNAs ( ~ R N A s ) ~When ~ ) . J . D. Watson informed the scientific community “About the Involvement of RNA in the Synthesis of Protein” [117a]he could begin with the sentence: “The ordered interaction of the three classes of RNA controls the assembly of amino acids into protein”. Now essential details in brief: protein genesis (translation) is the central event in molecular biology. It takes place in the incredibly complex machinery3’) of the ribosome [124], where the syntactic structure of ribonucleic acids is translated into the syntactic structure of proteins. During the translation process, the information contained in a triplet codon of mRNA is decrypted by an anticodon of a tRNA molecule, according to the instructions of the genetic code. The genetic code is an abstract scheme for the redundant correlation of 64 “words” (nucleoside triplets) in the language of nucleic acids with 20 “words” (canonical amino acids) in the language of proteins. The synthetic chemist accepts the limitation on the number of amino acid building blocks as the price for his readymade use of the ribosomal protein generating system. The undisputed leading actors in the translation process at the stage of information transfer from ribonucleic acids to proteins are aminoacyl-tRNAs [ 1251. These are conjugates made up of proportions of both biopolymer types (language systems), produced through esterification of an amino acid with a tRNA. 
36) Messenger-RNAs were the last of the RNA trio engaged in protein genesis to be detected [120]. A further type of RNA has been discovered as a widespread, universal tool in biology for gene regulation by means of antisense-like interactions [121]. It is called interfering RNA (RNAi) and is produced from double-stranded RNA in a cascade of enzymatic processes by a set of specific RNases. Several regulatory pathways involving RNAi are known in many eukaryotes, including plants and mammals. RNAi is used extensively as a research tool, and its therapeutic potential is becoming more and more obvious [122].
37) In an urgent appeal, which we are certainly going to follow henceforth, Carl Woese [123] asks us to stop looking at an organism as a molecular machine. The machine metaphor, in his view, overlooks much of what biology is. To understand living systems in any deep sense, “we must come to see them not materialistically, as machines, but as stable complex, dynamic organization”.
A particular tRNA with its anticodon corresponding to a specific amino acid is covalently coupled (esterified) with precisely this amino acid. The esterification takes place with the help of an enzyme (an aminoacyl-tRNA synthetase) capable of specifically recognizing and coupling that particular tRNA and its cognate amino acid [126]. Whilst the self-assembly of mRNA and tRNA during translation is due to codon-anticodon interaction, based on Watson-Crick pairing of complementary nucleobases, the mutual recognition of a tRNA and its cognate synthetase during aminoacyl-tRNA formation is due to molecular shape complementarity.
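The redundancy of the genetic code and the codon-anticodon pairing described above can be illustrated by simple counting. The following Python sketch is an illustration only, not part of the original essay: it builds the standard codon table from the conventional one-letter amino acid string, counts sense and stop codons, and derives a strictly Watson-Crick complementary anticodon. The names `CODON_TABLE`, `anticodon`, `translate`, and `code_statistics` are hypothetical, and wobble pairing as well as start-codon selection are deliberately ignored.

```python
from itertools import product

BASES = "UCAG"  # conventional ordering used to lay out the codon table

# One-letter amino acid codes for all 64 codons in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon):
    """Return the Watson-Crick complementary anticodon, written 5'->3' (no wobble)."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def translate(mrna):
    """Read an mRNA string codon by codon until a stop codon or the end is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # nonsense codon: translation breaks off
            break
        peptide.append(aa)
    return "".join(peptide)


def code_statistics():
    """Return (total codons, sense codons, distinct amino acids, stop codons)."""
    stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
    sense = len(CODON_TABLE) - len(stops)
    amino_acids = set(CODON_TABLE.values()) - {"*"}
    return len(CODON_TABLE), sense, len(amino_acids), stops
```

Calling `code_statistics()` recovers the 64 triplets and the 20 canonical amino acids mentioned in the text; a dictionary lookup is used here simply because the code is a fixed, many-to-one mapping.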
1.5.1.2 The Genetic Code [127]
1.5.1.2.1 Cracking the Genetic Code
The genetic code was cracked in the early 1960s, beginning with investigations by Marshall Nirenberg and Heinrich Matthaei using a cell-free E. coli system. In an inaugural experiment, the NIH researchers demonstrated that the homopolymer polyuridylic acid coded for the nonnatural protein polyphenylalanine [128]. Clearly, the natural system of protein genesis would translate any appropriate message, natural or artificial, into a polypeptide chain, natural or artificial [116].
1.5.1.2.2 Expanding the Genetic Code
By Natural Selection
The genetic code has the potential for 64 (= 4^3) triplet codons, 61 of which redundantly specify the 20 canonical amino acids. The methionine-specifying triplet codon AUG may take on the role of a starting signal at the beginning of protein synthesis: it thus has a double function. Three triplet codons in an mRNA - UAA (ochre), UGA (opal), and UAG (amber) - known as nonsense codons, specify no amino acids; that is, there are no tRNAs with complementary anticodons for these codons. As a consequence, translation breaks off here. The nonsense codons are therefore also termed stop signals (termination codons). Broader roles in protein genesis, however, have also been established for two of these three stop signals in recent years. In E. coli (and also in a whole range of other organisms) the UGA codon may be redefined to perform one of two different functions: either it may function as a stop codon and thus end the elongation of the protein chain under construction, or further growth of the polypeptide chain may carry on with incorporation of selenocysteine [129], which is not a member of the standard set of canonical amino acids. Which of the two instructions is followed by the translation system is dictated by the secondary and tertiary structure of the mRNA to be decrypted (and possibly by protein factors). Similarly, structural alterations in mRNA are able to modify the programming of the UAG codon: once more, a stop codon is turned into a codon that continues a translation in progress, in this case through the incorporation of pyrrolysine [130]. The genetic code is thus naturally expanded from the standard set. Instead of the original 20 amino acids, 22 amino acids specified by mRNA sequences are currently recognized. Further, as yet unrecognized extensions of the genetic code through natural selection cannot be excluded. Why no sense codon has (yet) been found to be doubly coded is unclear. The discovery that the genetic code, as a result of natural selection, already has more than 20 amino acid building blocks for protein genesis in store poses the question of whether the genetic code might also be expandable by design; that is, whether amino acids not specified by the genetic code in its original version might be introducible into a polypeptide chain by translation.
By Design [131]
Peter G. Schultz, a leading protagonist of the movement to consider biology an engineering discipline, is aiming at the construction of new proteins and, eventually, of new organisms with enhanced properties. Two alternatives have been designed for the site-specific in vivo incorporation into proteins of amino acids not specified by the genetic code in its original version: systematic reassignment of three-base nonsense codons, or use of supersized codons. In the first case, the addition of a noncanonical amino acid to the genetic code requires additional components of the protein producing system: a noncanonical amino acid, an exogenous tRNA/aminoacyl-tRNA synthetase pair, and a unique codon that specifies the amino acid of interest. Orthogonality between the exogenous translational components (Scheme 1-30) and their endogenous counterparts is the key feature of this approach. It means that the codon for the noncanonical amino acid should not encode a canonical amino acid; that neither the new tRNA nor its cognate aminoacyl-tRNA synthetase should cross-react with any endogenous tRNA/synthetase pair; and that the new synthetase should recognize only the noncanonical amino acid and none of the canonical ones.
A completely autonomous bacterium with a 21 amino acid genetic code was engineered. The bacterium can generate p-aminophenylalanine from basic carbon sources and incorporate this amino acid into proteins in response to the amber nonsense codon [132]. As the small number of non-coding triplet codons limits the number of noncanonical amino acids that can be encoded, the question arises as to whether expansion of the genetic code by use of a supersized codon and a cognate tRNA with an expanded anticodon loop might be possible. A study, “Exploring the Limits of Codon and Anticodon Size” [133], reveals that the E. coli ribosome is capable of using codons of three to five nucleobases. The tRNAs that decode these codons are most efficient with a Watson-Crick complementary anticodon containing two additional nucleotides on either side of the normal-sized anticodon in the loop. An orthogonal synthetase/tRNA pair was designed and constructed which site-specifically incorporates a noncanonical amino acid (L-homoglutamine) into proteins of E. coli in response to the four-base codon AGGA [134].
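The two design strategies described above, reassigning a nonsense codon and reading a four-base codon, can be pictured with a toy decoder. The Python sketch below is a simplified illustration under stated assumptions, not a model of the published systems: the function `translate_expanded`, the residue labels `"pAF"` and `"hGln"`, and the rule that a registered four-base codon always wins over the overlapping triplet are all hypothetical choices made for clarity. In a real cell the orthogonal tRNA competes with release factors and endogenous tRNAs, which this sketch ignores.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter amino acid codes in UCAG order; "*" is a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_expanded(mrna, reassigned=None, quadruplets=None):
    """Translate mRNA with optional reassigned triplets and four-base codons.

    reassigned  -- hypothetical map from a triplet (e.g. a nonsense codon)
                   to the label of a noncanonical amino acid
    quadruplets -- hypothetical map from a four-base codon to a residue label
    Returns the list of residue labels in order of incorporation.
    """
    reassigned = reassigned or {}
    quadruplets = quadruplets or {}
    residues = []
    i = 0
    while i + 3 <= len(mrna):
        quad = mrna[i:i + 4]
        if quad in quadruplets:  # assumption: the four-base codon always wins
            residues.append(quadruplets[quad])
            i += 4
            continue
        codon = mrna[i:i + 3]
        if codon in reassigned:  # suppressed nonsense codon
            residues.append(reassigned[codon])
        else:
            aa = CODON_TABLE[codon]
            if aa == "*":  # unsuppressed stop codon ends translation
                break
            residues.append(aa)
        i += 3
    return residues


# Amber (UAG) reassigned to a label standing in for p-aminophenylalanine.
amber_message = "AUG" + "UAG" + "UUU" + "UAA"
# Four-base codon AGGA read as a label standing in for L-homoglutamine.
quad_message = "AUG" + "AGGA" + "UUU" + "UAA"
```

With these inputs, `translate_expanded(amber_message, reassigned={"UAG": "pAF"})` continues through the amber codon, whereas the same message without the reassignment stops after methionine. Reading `quad_message` without the four-base entry shifts the reading frame, which is the reason the quadruplet is decoded as a single unit in this sketch.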
Scheme 1-30 Incorporation of (a) canonical (yellow) and (b) noncanonical (red) amino acids into proteins in vivo.
1.5.2 Antibodies
The ribosomal system is not the only evolutionary accomplishment the synthetic chemist might use in pursuit of his ends. The immune system offers an example of how a biological solution can successfully be brought to bear on a chemical problem: the use of antibodies as enzyme-like catalysts. As far as their functions are concerned, enzymes and antibodies normally are quite different. Enzymes have been selected for the transition state of a catalyzed reaction over millions of years [105]. Antibodies have been selected for their affinity for the immunogen over a period of weeks [135]. If the immunogen were a transition state analogue, the resulting antibodies should catalyze the appropriate reaction. Richard A. Lerner and Peter G. Schultz with their respective colleagues have designed molecules that could be used to guide the process of clonal expansion and somatic mutation to generate catalytic antibodies for a variety of reactions [136]. Rather than going into details here, we refer to the authoritative book on catalytic antibodies [137]. The various articles of that book make for interesting reading, both for the synthetic chemist who wants to design new catalysts and for the molecular biologist who wants to gain structural insight into antibody evolution.
1.6 Bringing Biological Solutions to Biological Problems
The composition of this essay followed a matrix in which chemical and biological solutions are brought to chemical problems and to biological problems.
Biological answers to biological questions are, of course, given by Nature directly. Man may use the complex systems of Nature with the aim of correcting a fault (as was done, e.g., by Robert Edwards and Patrick Steptoe [138] in reproductive medicine). Reproductive medicine cannot be discussed without regard to bioethical aspects [139]. The present authors are not competent to meet the bioethical requirements; for this reason, reproductive medicine is not commented on further. Up to now synthetic chemistry has been the dominant part of our reflection. Now synthetic biology comes in to meet the requirements of the sophisticated observer who wants to be informed about the newest developments. At any rate, the fundamental question, WHAT IS LIFE?, comes up. Under this title, two essays have been published: one by Erwin Schrodinger [140] in 1944 and the other by J. B. S. Haldane [141] in 1949. While the former focused on the physical aspect of the living cell, the latter considered life essentially as a pattern of chemical processes. A very pragmatic point of view was formulated in 1994 by Antonio Lazcano [142] with the statement: “Life is like music, you can describe it, but not define.” In a state-of-the-art survey, Biology and the Future of Man [143], of the US National Academy of Sciences, the chances of realizing the dream of a man-made cell were pondered. The conclusion reached was: “Those who are hopeful about synthesizing a cell in the foreseeable future have every reason to retain their optimism.” However, they should be warned against false claims. Synthesis of life is one such false claim. Living things (i.e., a cell) can be synthesized but not life itself, and that is what people really mean when they are talking about synthesizing life. A question that keeps scientists in chemistry as well as in biology busy is where the line separating inanimate from animate matter can be drawn. In the past, attempts were made to link the problem to the question of life’s origin in terms of molecular evolution [144]. Recently, sequencing of the human and other complete genomes has shed some new light on this field. The question of which minimal set of genes would be necessary for a living organism can be put more concisely in the context of what is now called synthetic biology [145]. Both approaches, the top-down way of deactivating more and more genes of an existing species [146] and the bottom-up way of assembling genes to build an organism with a fully synthetic genome [147], have not yet reached the goal of explaining the transition from the inanimate to the animate world. On the one hand, results obtained through different methods to identify the minimal set of genes that constitute a living organism point to roughly 250 genes [148]. On the other hand, none of the synthetic constructs obtained so far covers the central functionality of life: self-construction, metabolism, adaptation, self-repair, reproduction, and evolution [149]. Nonetheless, the bottom-up route has turned into an engineering approach to synthetic biology [150]. The strategy is to combine predefined DNA modules, so-called bio-bricks, into bio-circuits designed to implement biological functions [151]. In that sense, synthetic biology is seen as the successor of molecular cloning, in particular with respect to safety issues.
1.7 EPILOGUE
To round off this essay, we point to two issues gaining more and more emphasis in chemistry. One is the problem of shared use of the limited sources of energy and raw materials. The other is the concept of a total synthesis, in particular for complex natural substances. Both topics underline that organic chemistry is far from being pure routine in which a comprehensive toolbox is applied to solve any problem in synthesis [152]. Medical therapeutics, agrochemicals, and high-performance materials must be provided by organic chemistry to fulfill global needs.
1.7.1 The Fossil Fuel Dilemma of Present Chemical Industry
For the chemical industry, the interdependence of energy source and raw material supply is typical. This double function of fossil fuel, acting as a source of raw material as well as a source of energy, will have to be terminated in the not-too-distant future [153]. Being the main source of raw material, fossil fuel should be preserved for the chemical industry as long as possible. A final way out, to disentangle energy requirement and raw material supply, would be to find new sources for one field or the other. Nuclear energy, despite political moves to dispense with nuclear power, could play a role as an alternative to fossil fuel. With petroleum supplies dwindling, there is increasing interest in selective methods for transforming other carbon feedstocks into hydrocarbons suitable for transportation fuel. The reductive oligomerization of CO and H2 to produce hydrocarbons (specifically n-alkanes) with highly controlled molecular weight (Fischer-Tropsch process [154]) from the vast reserve of coal, natural gas, oil, or biomass is one such process; it was developed in the 1920s. The Goldman-Brookhart process (tandem alkane dehydrogenation-olefin metathesis [155]) is of a similar kind, but of recent origin.
1.7.2 Two Lessons From the Wealth of Published Total Syntheses
For a long time it was common procedure to regard the structure of a natural product as finally proven once the product had also been synthesized in the chemist’s lab [156]. In a few cases, disagreement raised a few eyebrows. This was the case for patchouli alcohol and for a molecule called hexacyclinol [157]. Quinine is an example of the difficulties associated with the notion of a total synthesis. Shouts [35, 37, 158] and murmurs [11b, 159] have been expressed to comment on the wealth of total syntheses of natural products performed in the second half of the twentieth century.
1.7.2.1 Synthetic Lesson from Patchouli Alcohol: The Trouble with “the Last Structural Proof” [160]
The peculiar case of patchouli alcohol (87) (Scheme 1-31) was told and commented on by Jack D. Dunitz [160a]. Following the advice of W. H. Perkin jun. [156] to perform a total synthesis of a natural product as a final proof of structure, 87 was synthesized [160c]. The synthetic product proved to be identical to the sesquiterpene whose structure had been derived from the results of a long series of chemical experiments lasting more than 50 years and apparently confirmed in 1961 by total synthesis [160c]. In spite of this, X-ray structure determination [160a] revealed that the accepted structure of patchouli alcohol was wrong. A careful reinvestigation showed that a rearrangement of the molecular skeleton had taken place during chemical degradation as well as during synthesis. The first reaction step of the chemical degradation (acetate pyrolysis affording patchoulene 88) and the last reaction step of the chemical synthesis (hydrolysis of the epoxide 89 obtained from 88) were each accompanied by a rearrangement, the one proceeding in precisely the reverse direction of the other. Taking this finding into consideration, a new synthetic approach furnished 87 without any difficulty [160d].
Scheme 1-31 Synthesis and degradation of patchouli alcohol.
1.7.2.2 Synthetic Lesson From Quinine 90: The Trouble with Formal Total Syntheses [161a]
In the period between 1918 and 2001, a series of publications appeared that turned the claimed total synthesis of 90 (Scheme 1-32) from a fact into a myth. It started with a paper by Rabe and Kindler in 1918 [161b] on the partial synthesis of 90 from quinotoxine (91) via quininone (92) (Scheme 1-32a). 91 is a relay compound to 90, since it can easily be made from 90. In 1944 and 1945, Woodward and Doering published two papers [161e] in which they linked the partial synthesis of Rabe and Kindler to their own synthesis of 91 (Scheme 1-32b), taking the combination as a total synthesis of 90. Not being convinced of the view of Woodward and Doering, Stork published a new total synthesis of 90 in 2001 [161f]. He started from the Taniguchi lactone (94) and proceeded via desoxyquinine (95) (Scheme 1-32c). According to Stork, a distinction between a real total synthesis and a formal one is necessary. Accordingly, the work of Woodward and Doering is an example of a formal total synthesis.
Scheme 1-32 Synthesis of 90. (a) The Rabe-Kindler partial synthesis of 90 [161b]. (b) The Woodward-Doering/Rabe-Kindler formal total synthesis of 90 [161e]. (c) The Stork total synthesis of 90 [161f].
Acknowledgments
Our own investigations on multicomponent simultaneous procedures were supported by the German Ministry of Education and Research and carried out by a team of postdoctoral fellows. In addition to those colleagues whose names are mentioned in the references, Susanne Feiertag, Stefan Kienle, Stefan Raddatz, Jochen Muller-Ibeler, Jochen Muth, Christoph Brucher, Heike Behrensdorf, Andreas Kappel, and Marc Pignot have contributed to our understanding of dynamic variations. Oliver Boden took care of the equipment for the electronic version of the manuscript. We are indebted to Theodora Ruppenthal for patient and skillful secretarial help. The greater part of this essay has been translated from German into English by Dr. Andrew Beard. We are grateful to the persons mentioned for their assistance and to the indicated institution for its generous support. Last but not least, we would like to emphasize that it was Albert Eschenmoser's idea to use p-RNA or analogs for selecting appropriate candidates from a self-assembly of a dynamic variation.
References
1. (a) F.J. Ayala, T. Dobzhansky (Eds.), Studies in the Philosophy of Biology - Reduction and Related Problems, Macmillan, London, 1974; (b) J. Cornwell (Ed.), Nature’s Imagination - The Frontiers of Scientific Vision, Oxford University Press, Oxford, 1995; (c) G.R. Bock, J.A. Goode (Eds.), Novartis Foundation Symposium 213, The Limits of Reductionism in Biology, John Wiley and Sons, Chichester, 1998; (d) F. Crick, The Astonishing Hypothesis (Introduction), Simon & Schuster, New York, 1995.
2. A. Stephan, Emergenz - Von der Unvorhersagbarkeit zur Selbstorganisation, Dresden University Press, Dresden, 1999.
3. (a) Several authors, Special section on complex structures, Science 1999, 284, 79; (b) T. Vicsek, The bigger picture, Nature 2002, 418, 131; (c) J.M. Ottino, Engineering complex systems, Nature 2004, 427, 399.
4. (a) Z.N. Oltvai, A.-L. Barabasi, Life’s complexity pyramid, Science 2002, 298, 763; (b) L.H. Hartwell, J.J. Hopfield, S. Leibler, A.W. Murray, From molecular to modular biology, Nature 1999, 402, C47.
5. Several authors, Special section on networks in biology, Science 2003, 301, 1863.
6. Several authors, Special section on systems biology, Science 2002, 295, 1661.
7. M. Rees, Our Cosmic Habitat, Weidenfeld & Nicolson, London, 2001.
8. M. Eigen, R. Winkler-Oswatitsch, Steps Towards Life; (a) Part II, Chapter 4; (b) Part III, Oxford University Press, Oxford, 1992.
9. (a) H.-J. Rheinberger, Toward History of Epistemic Things - Synthesizing Proteins in the Test Tube, Stanford University Press, Stanford, 1997; (b) H.-J. Rheinberger, A history of protein biosynthesis and ribosome research, in Protein Synthesis and Ribosome Structure, Eds.: K.H. Nierhaus, D.N. Wilson, Wiley-VCH, Weinheim, 2004.
10. (a) M. Seefelder, Indigo - Kultur, Wissenschaft und Technik, 2nd ed., ecomed Verlagsgesellschaft, Landsberg, 1994; (b) W. Wetzel, Naturwissenschaften und Chemische Industrie in Deutschland, Franz Steiner Verlag, Stuttgart, 1991; (c) W. Abelshauser (Ed.), Die BASF - Eine Unternehmensgeschichte, Verlag C.H. Beck, Munchen, 2002; (d) E. Baumler, Ein Jahrhundert Chemie (zum 100jahrigen Jubilaum der Farbwerke Hoechst AG), Dusseldorf, 1963; (e) E. Steingruber, Indigo and indigo colorants, Ullmann’s Encyclopedia of Industrial Chemistry, 5th ed., Vol. A14, Verlag Chemie, Weinheim.
11. (a) C.A. Russell, Role of synthesis in organic chemistry, Ambix 1987, 34, 169; (b) J.W. Cornforth, The trouble with synthesis, Aust. J. Chem. 1993, 46, 157.
12. R.B. Woodward, in Perspectives in Organic Chemistry, Ed.: A. Todd, Interscience Publishers, New York, 1956, p. 155.
13. F. Wohler, Uber kunstliche Bildung des Harnstoffs, Ann. Phys. Chem. 1828, 12, 253.
14. J. Weyer, 150 Jahre Harnstoffsynthese, Nachr. Chem. Tech. Lab. 1978, 26, 564.
15. C. Voigt, Immer eine Idee besser - Forscher und Erfinder der Degussa, Degussa AG, Frankfurt am Main, 1998.
16. A. von Baeyer, Zur Geschichte der Indigo-synthese, Ber. Dtsch. Chem. Ges. 1900, 33, LI (Sonderheft).
17. G. Quinkert, E. Egert, C. Griesinger, Aspects of Organic Chemistry, Verlag Helvetica Chimica Acta, Basel, 1996; (a) p. 2; (b) p. 55; (c) Fig. 5.4; (d) Section 10.2.6; (e) p. 5 and p. 79; (f) Section 7.5.
18. B.D. Ensley, B.J. Ratzkin, T.D. Osslund, M.I. Simon, L.P. Wackett, D.T. Gibson, Expression of naphthalene oxidation genes in Escherichia coli results in the biosynthesis of indigo, Science 1983, 222, 167.
19. (a) Zhi-Qiang X, M.H. Zenk, Biosynthesis of indigo precursors in higher plants, Phytochemistry 1992, 31, 2695; (b) H. Marcinek, W. Weyler, B. Deus-Neumann, M.H. Zenk, Indoxyl-UDPG-glucosyltransferase from Baphicacanthus cusia, Phytochemistry 2000, 53, 201.
20. T. Maugard, E. Enaud, P. Choisy, M.D. Legoy, Identification of an indigo precursor from leaves of Isatis tinctoria (Woad), Phytochemistry 2001, 58, 897.
21. (a) E.J. Corey, M. Ohno, R.B. Mitra, P.A. Vatakencherry, Total synthesis of longifolene, J. Am. Chem. Soc. 1964, 86, 478; (b) E.J. Corey, General methods for the construction of complex molecules, Pure Appl. Chem. 1967, 14, 19; (c) E.J. Corey, Xue-Min Cheng, The Logic of Chemical Synthesis, Wiley, New York, 1989; (d) E.J. Corey, The Logic of Chemical Synthesis, Nobel Lectures Chemistry 1981-1990, World Scientific, Singapore, 1992, p. 686.
22. S. Warren, Designing Organic Syntheses, Wiley, Chichester, 1978.
23. (a) E.J. Corey, W. Todd Wipke, Computer-assisted design of complex organic syntheses, Science 1969, 166, 178; (b) E.J. Corey, Computer-assisted analysis of complex synthetic problems, Q. Rev. 1971, 25, 455; (c) E.J. Corey, A.K. Long, S.D. Rubenstein, Computer-assisted analysis in organic synthesis, Science 1985, 228, 408.
24. (a) R.B. Woodward, Totalsynthese des Chlorophylls, Angew. Chem. 1960, 72, 651; (b) R.B. Woodward, Fundamental studies in the chemistry of macrocyclic systems related to chlorophyll, Ind. Chim. Belg. 1962, 11, 1293.
25. (a) D.H.R. Barton, The invention of reactions useful for the synthesis of specifically fluorinated natural products, Pure Appl. Chem. 1977, 49, 1241; (b) B.M. Trost, Atom economy - A challenge for organic synthesis, Angew. Chem., Int. Ed. Engl. 1995, 34, 259; (c) J.F. Hartwig, Raising the bar for the “Perfect Reaction”, Science 2002, 297, 1653.
26. H.C. Kolb, M.G. Finn, K.B. Sharpless, Click chemistry: Diverse chemical function from a few good reactions, Angew. Chem., Int. Ed. Engl. 2001, 40, 2004.
27. A. Eschenmoser, in Neuorientierung der Chemie - Mode oder mehr? Podiumsdiskussion, Aventis Deutschland, Frankfurt am Main, 2002.
28. G.S. Hammond, Restructuring of chemistry and chemical curricula, Pure Appl. Chem. 1970, 22, 3.
29. A. Eschenmoser, Various comments made on organic synthesis and life sciences, in Chemical Synthesis - Gnosis to Prognosis, Eds.: C. Chatgilialoglu, V. Snieckus, Kluwer Academic Publishers, Dordrecht, 1996.
30. E. Fischer, Synthetical chemistry in its relation to biology, J. Chem. Soc. 1907, 1749.
31. (a) E. Fischer, Bedeutung der stereochemischen Resultate fur die Physiologie, Ber. Dtsch. Chem. Ges. 1894, 27, 3228; (b) D.E. Koshland, Jr., The key-lock-theory and the induced-fit-theory, Angew. Chem., Int. Ed. Engl. 1995, 33, 2375.
32. A. Todd, J.W. Cornforth, Robert Robinson, Biographical Memoirs of the Fellows of the Royal Society, 1976, 22, 415.
33. (a) E. Dane, Synthesen in der Reihe der Steroide, Angew. Chem. 1939, 52, 655; (b) G. Singh, Structure of Dane’s adduct, J. Am. Chem. Soc. 1956, 78, 6109; (c) G. Quinkert, M. Del Grosso, A. Bucher, J.W. Bats, G. Durner, E. Dane’s route to estrone revisited, Tetrahedron Lett. 1991, 32, 3357; (d) G. Quinkert, M. Del Grosso, A. Doring, W. Doring, R.I. Schenkel, M. Bauch, G.T. Dambacher, J.W. Bats, G. Zimmermann, G. Durner, Total synthesis with a chirogenic opening move demonstrated on steroids with estrone or 18a-homoestrone skeleton, Helv. Chim. Acta 1995, 78, 1345.
34. R.B. Woodward, Experiments on the synthesis of estrone, J. Am. Chem. Soc. 1940, 62, 1478.
35. A. Eschenmoser, RBW, Vitamin B12, and the Harvard-ETH Collaboration, in Robert Burns Woodward, Eds.: O.T. Benfey, P.J.T. Morris, Chemical Heritage Foundation, Philadelphia, 2001.
36. G. Quinkert, M.V. Kisakurek, From Molecular Structure Towards Biology, Verlag Helvetica Chimica Acta, Zurich, 2001; (a) p. VII; (b) Section 3.2.1.
37. (a) G. Quinkert, H. Stark, Stereoselective synthesis of enantiomerically pure natural products - estrone as example, Angew. Chem., Int. Ed. Engl. 1983, 22, 637; (b) B. List, J.W. Yang, The organic approach to asymmetric catalysis, Science 2006, 313, 1584.
38. (a) K.B. Sharpless, Searching for new reactivity, Nobel Lecture Chemistry 2001; (b) S.Y. Ko, A.W.M. Lee, S. Masamune, L.A. Reed, III, K.B. Sharpless, F.J. Walker, Total synthesis of the L-hexoses, Tetrahedron 1990, 46, 245.
39. R. Noyori, Asymmetric catalysis: Science and opportunity, Nobel Lecture Chemistry 2001.
40. E.J. Corey, Catalytic enantioselective Diels-Alder reactions: Methods, mechanistic fundamentals, pathways, and applications, Angew. Chem., Int. Ed. Engl. 2002, 41, 1650.
41. S. Drenkard, J. Ferris, A. Eschenmoser, Chemie von α-Aminonitrilen, Helv. Chim. Acta 1990, 73, 1373.
42. (a) D. Seebach, A.K. Beck, A. Heckel, TADDOL and its derivatives - our dream of universal chiral auxiliaries, in From Molecular Structure Towards Biology, Verlag Helvetica Chimica Acta, Zurich, 2001; (b) K. Narasaka, Chiral Lewis acids in catalytic asymmetric reactions, Synthesis 1990, 1.
43. S.B. Tsogoeva, G. Durner, M. Bolte, M.W. Gobel, A C2-chiral bis(amidinium) catalyst for a Diels-Alder reaction constituting the key step of the Quinkert-Dane estrone synthesis, Eur. J. Org. Chem. 2003, 1661, and earlier papers.
44. Qi-Ying Hu, P.D. Rege, E.J. Corey, Simple, catalytic enantioselective syntheses of estrone and desogestrel, J. Am. Chem. Soc. 2004, 126, 5984.
45. (a) G. Quinkert, Five Decades of Steroid Synthesis, Vorlesungsreihe Schering, Berlin, 1988, Heft 19; (b) G. Quinkert, M. Del Grosso, Progress in the Diels-Alder reaction means progress in steroid synthesis, in Stereoselective Synthesis, Eds.: E. Ottow, K. Schollkopf, B.G. Schulz, Springer Verlag, Berlin, 1993, S. 109.
46. K. Nicolaou, S.A. Snyder, T. Montagnon, G.E. Vassilikogiannakis, The Diels-Alder reaction in total synthesis, Angew. Chem., Int. Ed. Engl. 2002, 41, 1668.
47. (a) M.B. Groen, F.J. Zeelen, Steroid total synthesis, Recl. Trav. Chim. Pays-Bas 1986, 105, 465; (b) F.J. Zeelen, Steroid total synthesis, Nat. Prod. Rep. 1994, 607.
48. G. Quinkert, W.-D. Weber, U. Schwartz, H. Stark, H. Baier, G. Durner, Hochselektive Totalsynthese von 19-Nor-Steroiden mit photochemischer Schlusselreaktion: Racemische Zielverbindungen, Liebigs Ann. Chem. 1981, 2335.
49. G. Quinkert, U. Schwartz, H. Stark, W.-D. Weber, F. Adam, H. Baier, G. Frank, G. Durner, Asymmetrische Totalsynthese von 19-Nor-Steroiden mit photochemischer Schlusselreaktion: Enantiomerenreine Zielverbindungen, Liebigs Ann. Chem. 1992, 1999.
50. T.A. Appel, The Cuvier-Geoffrey Debate, Oxford University Press, New York, 1987.
51. M. Ruse, Evolution, in The Oxford Companion to Philosophy, Ed.: T. Honderich, Oxford University Press, Oxford, 1995.
52. J.P. Eckermann, Gesprache mit Goethe in den Letzten Jahren Seines Lebens, C. Michel, H. Gruters (Hrsg.), Deutscher Klassiker Verlag, Frankfurt am Main, 1999.
53. J. Browne, Charles Darwin, Vol. II, A.A. Knopf, New York, 2002.
54. E.A. Carlson, Mendel’s Legacy, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 2004.
55. W. Johannsen, Elemente der Exakten Erblichkeitslehre, G. Fischer, Jena, 1909.
56. F.M. Burnet, Evolution made visible, in The Evolution of Living Organisms, Ed.: G.W. Leeper, Melbourne University Press, Melbourne, 1962.
57. M. Eigen, Viren als Modelle der molekularen Evolution, Paul-Ehrlich-Ludwig Darmstadter Award Lecture, Frankfurt am Main, March 14th 1992.
58. (a) M. Eigen, Self-organization of Matter and the Evolution of Biological Macromolecules, Naturwissenschaften 1971, 58, 465; (b) Der Code des Lebens, 3 SAT, 26.04.2006, DVD, ZDF, 2006; (c) M. Eigen, From Strange Simplicity to Complex Familiarity, in preparation.
59. G. Strunk, T. Ederhof, Machines for automated evolution experiments in vitro based on the serial-transfer concept, Biophys. Chem. 1997, 66, 193.
60. E. Mayr, What Evolution is, Weidenfeld & Nicolson, London, 2002.
61. I. Rechenberg, Evolutionsstrategie ‘94, frommann-holzboog, Stuttgart-Bad Cannstatt, 1994.
62. J. Maynard Smith, Concept of protein space, Nature 1979, 280, 445.
63. G. Quinkert, H. Bang, D. Reichert, Variation and selection, Helv. Chim. Acta 1996, 79, 1260.
64. W.B. Provine, Sewall Wright and Evolutionary Biology, The University of Chicago Press, Chicago, 1986.
65. D.L. Hull, History of evolutionary thought, in Encyclopedia of Evolution, Vol. I, Ed.: M. Pagel, Oxford University Press, Oxford, 2002.
66. (a) S.C. Gilbert, J.M. Opitz, R.A. Raff, Resynthesizing evolutionary and developmental biology, Dev. Biol. 1996, 173, 357; (b) J.S. Robert, Embryology, Epigenesis, and Evolution - Taking Development Seriously, Cambridge University Press, Cambridge, 2004; (c) C.R. Woese, A new biology for a new century, Microbiol. Mol. Biol. Rev. 2004, 173, 68; (d) K.M. Weiss, The phenogenetic logic of life, Nat. Rev. Genet. 2005, 6, 36.
67. J.-M. Lehn, Supramolecular Chemistry, VCH, Weinheim, 1995.
68. P. Ehrlich, Partial cell functions, Nobel Lecture Physiology or Medicine 1908.
69. (a) B. Asbell, The Pill, Random House, New York, 1995; L.V. Marks, Sexual Chemistry, Yale University Press, New Haven, 2001; C. Djerassi, This Man’s Pill, Oxford University Press, Oxford, 2001; (b) G. Pincus, Control of contraception by hormonal steroids, Science 1966, 153, 493.
70. (a) A. Brzozowsky, A.C.W. Pike, Z. Dauter, R.E. Hubbard, T. Bonn, O. Engstrom, L. Ohman, G.L. Greene, J.-A. Gustafsson, M. Carlquist, Molecular basis of agonism and antagonism in the oestrogen receptor, Nature 1997, 389, 753; (b) E.-E. Beaulieu, Contragestion and other clinical applications of RU 486, an antiprogesterone at the receptor, Science 1989, 245, 1351.
71. C. Djerassi, L. Miramontes, G. Rosenkranz, F. Sondheimer, Synthesis of 19-Nor-17α-ethynyltestosterone and 19-Nor-17α-methyltestosterone, J. Am. Chem. Soc. 1954, 76, 4092.
72. G. Quinkert, Hans Herloff Inhoffen in His Times, Eur. J. Org. Chem. 2004, 3727.
73. C. Rufer, H. Kosmol, E. Schroder, K. Kiesslich, H. Gibian, Totalsynthese von optisch aktiven 13-Ethyl-gonan-Derivaten, Liebigs Ann. Chem. 1967, 702, 141.
74. (a) I.V. Torgov, Progress in the total synthesis of steroids, Pure Appl. Chem. 1963, 6, 525; (b) C.H. Kuo, D. Taub, N.L. Wendler, Mechanism of the coupling reaction of a vinyl carbinol with a β-diketone, J. Org. Chem. 1968, 33, 3126.
75. H. Smith et al., 13β-Alkylgona-1,3,5(10)-trienes, 13β-Alkylgon-4-en-3-ones, and related compounds, J. Chem. Soc. (London), 1964, 4472.
76. (a) U. Eder, G. Sauer, R. Wiechert, Neuartige asymmetrische Cyclisierung zu optisch aktiven Steroid-CD-Teilstucken, Angew. Chem., Int. Ed. Engl. 1971, 10, 496; (b) Z.G. Hajos, D.R. Parrish, Asymmetric synthesis of bicyclic intermediates of natural product chemistry, J. Org. Chem. 1974, 39, 1615.
77. H. Hofmeister, K. Annen, H. Laurent, K. Petzoldt, R. Wiechert, Syntheses of gestodene, Drug Res. 1986, 36, 781.
78. G. Sauer, U. Eder, G. Haffer, G. Neef, R. Wiechert, Synthesis of D-Norgestrel, Angew. Chem., Int. Ed. Engl. 1975, 14, 417.
79. M.J. van den Heuvel, C.W. van Bokhoren, H.P. de Jongh, F.J. Zeelen, A partial synthesis of desogestrel based upon intramolecular oxidation of an 11β-hydroxy-19-norsteroid, Recl. Trav. Chim. Pays-Bas 1988, 107, 331.
80. E.J. Corey, A.X. Huang, A short enantioselective total synthesis of the third-generation oral contraceptive desogestrel, J. Am. Chem. Soc. 1999, 121, 710.
81. B. List, Proline-catalyzed asymmetric reactions, Tetrahedron 2002, 58, 5573.
82. Qi-Ying Hu, P.D. Rege, E.J. Corey, Simple, catalytic enantioselective syntheses of estrone and desogestrel, J. Am. Chem. Soc. 2004, 126, 5984.
83. (a) H. Laurent, D. Bittler, H. Hofmeister, K. Nickisch, R. Nickolson, K. Petzoldt, R. Wiechert, Synthesis and activities of anti-aldosterones, J. Steroid Biochem. Mol. Biol. 1983, 19, 771; (b) W. Elger, S. Beier, K. Pollow, R. Garfield, S.Q. Shi, A. Hillisch, Conception and pharmacodynamic profile of drospirenone, Steroids 2003, 68, 891.
84. R. Wiechert, in Schering 1971-1993, S. 149, Schering AG, Berlin, 2005.
85.
86.
87.
88.
89.
90.
91.
92.
(a) G. Quinkert, in High-Tech-Das neue Gesicht der Arzneimitte(forschung, H.1. Dengler, S . Meuer (Hgb.), G . Fischer, Stuttgart, 1995; (b) Several authors in: Special Issue of Science on Drug Discovery 2005, 309, 721-735. F. Aftalion, A History ofthe International Chemical Industry, 2nd. ed., Chemical Heritage Press, Philadelphia, 2001. (a) G. Quinkert, D. Reichert, H.-G. Schaible, B. Cezanne, Final Report of the BMBF Project No. 0310792, Projekttrager Jiilich, 2000; (b) G. Quinkert, Kombinatorische Chemie-ein Paradigmenwechsel in der Chemischen Synthese, Verh. Ges. Dtsch. Naturforscher u. Arzte, 120. Vers., Hirzel Verlag, Stuttgart, 1999; (c) H . 4 . Schaible, Kombinatorische Synthese codierter Verbindungsbibliotheken und Selektion immunsuppressiver Verbindungen, Dissertation, University of Frankfurt am Main, 1997. (a) W.W. Busse, R.F. Lemanske, Asthma, N. Engl. /. Med. 2001, 344, 350; (b) Several authors in: Nature 1999, B l , 402. M. Wills-Karp, J. Luyimbazi, X. Xu, B. Schofield, T.Y. Neben, C.L. Karp, D.D. Donaldson, Interleukin-13: central mediator of allergic asthma, Science 1998, 282, 2258. V. Prelog, Gedanken nach 118 Semestern Chemiestudium, in Chemie und Geseflschaft, Ed.: G . Boche, Wissenschaftl Verlagsges, Stuttgart, 1984, p. 57. D. Brohm, S. Metzger, A. Bhargava, 0. Muller, F. Lieb, H. Waldmann, Natural products are biologically validated starting points in structural space for compound library development, Angew. Chem., Int. Ed. Engl. 2002, 41, 307. (a) A. Furka, F. Sebestyen, M. Asgedom, G. Dibo, General method for rapid synthesis of multicomponent peptide mixtures, Int. /. Pep. Protein Res. 1991, 37, 487; (b) K.S. Lam, S.E. Salmon,
64
I
I Chemistry and Biology
93.
94.
95.
96.
97.
-
Historical and Philosophical Aspects
E.M. Hersh,V.J. Ruby, W.M. Kazmierski, R.J. Knapp, A new type of peptide library for identifying ligand-binding activity, Nature 1991, 354, 82. (a) M.H.J. Ohlmeyer, R.N. Swanson, L.W. Dillard, J.C. Reader, G. Asouline, R. Kobayashi, M. Wigler, W.C. Still, Complex synthetic chemical libraries indexed with molecular tags, Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 10922; (b) H.P. Nestler, P.A. Bartlett, W.C. Still, A general method for molecular tagging of encoded combinatorial chemistry libraries, I. Org. Chem. 1994,59,4723. A. Pahl, M. Zhang, K. Torok, H. Kuss, U. Friedrich, Z. Magyar, J. Szekely, I<. Horvath, K. Brune, I. Szelenyi, Anti-inflammatory effects of a cyclosporine receptor binding compound, D-43787,/. Phamacol. Exp. Ther. 2002, 301, 738. 1.-M. Lehn, Dynamic combinatorial chemistry and virtual combinatorial libraries, Chem. - Eur. J . 1999, 5, 2455. (a) S. Pitsch, S. Wendeborn, B. Jaun, A. Eschenmoser, Pranosyl-RNA (p-RNA),Helv. Chim. Acta 1993, 76, 2161; (b) I. Schlonvogt, S. Pitsch, C. Lesneur, A. Eschenmoser, B. J a m , R.M. Wolf, Pyranosyl-RNA (p-RNA): Duplex formation by self-pairing, Helv. Chim. Acta 1996, 79, 2316; (c) M. Bolli, R. Micura, S. Pitsch, A. Eschenmoser, Pyranosyl-RNA: Further observations on replication, Helv. Chim. Acta 1997, 80, 1901; (d) S. Ilin, I. Schlonvogt, M.-0. Ebert, B. Jaun, H. Schwalbe, Comparison of the N M R spectroscopy solution structures of pyranosyl-RNA and its Nucleo-b-peptide analogue, Chembiochem 2002,3,93. (a) N. Windhab, Final Report of the BMBF Project No. 0311030, Projekttrager Julich, 2001; (b) C. Miculka, N. Windhab, G. Quinkert, A. Eschenmoser, Novel substance library and supramolecular complexes produced therewith, PCT Int. Appl. WO 97143232. Chem. Abstr.
1998, 128, 34984; (c) G. Quinkert, Visionen-paradigmenwechseltechnologieschube, in Chemie-Eine reqe lndustrie oder weiterhin Innovationsmotor? Blazek & Bergmann, Frankfurt am Main, 2000. 98. C. Hamon, T. Brandstetter, N. Windhab, Pyranosyl-RNA supramolecules containing non-hydrogen bonding base-pairs, Synlett 1999, (suppl. l),940. 99. (a) C. Tanford, J. Reynolds, Nature’s Robots, Oxford University Press, Oxford, 2001; (b) Th. Creighton, Proteins Structures and Molecular Properties, 2nd Ed., Freeman, 2002; (c) Proteins at Work, Science 2006, 312(S10), 211-230. 100. (a) G.S. Stent, That was the molecular biology that was, Science 1968, 160, 390; (b) G.S. Stent, Introduction: Waiting for the paradox, in Phage and the Origins of Molecular Biology, Eds.: J. Cairns, G.S. Stent, J.D. Watson, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 1992, p. 3. 101 (a) M. Delbriick, A physicist looks at biology, in Phage and the Origins of Molecular Biology, Eds.: J. Cairns, G.S. Stent, J.D. Watson, Cold Spring Harbor Laboratory Press, 1992, p. 9; (b) A Physicist’s renewed look at biology-twenty years later, Nobel Lecture Medicine, 1969. 102. H.G. Khorana, Chemical Biology, World Scientific, Singapore, 2000. 103. F. Hofmeister, Uber Bau und Gruppierung der Eiweisskorper, Ergeb. Physiol. 1902, I, 759. 104. E. Fischer, Uber die Hydrolyse der Proteinstoffe, Chem. Ztg. 1902, 26, 939. 105. L. Pauling, Molecular architecture and biological reactions, Chem. Eng. News 1946,24,1375. 106. L. Pauling, M. Delbriick, The nature of intermolecular forces operating in biological processes, Science 1940, 92, 77. 107. M. Bergmann, L. Zervas, Uber ein allgemeines verfahren der Peptidsynthese, Ber. Dtsch. Chem. Gei. 1932, 65, 1192
References I 6 5 108. 109.
110.
111.
112.
113.
114.
115.
116.
117.
118. 119.
120.
B. Merrifield, Solid phase synthesis, Nobel Lecture Chemistry, 1984. S.B.H. Kent, Chemical synthesis of peptides and proteins, Annu. Rev. Biochem. 1988, 57,957. P.E. Dawson, S.B.H. Kent, Synthesis of native proteins by chemical ligation, A n n u . Rev. Biochem.. 2000, 69, 923. V. Du Vigneaud, C. Ressler, I.M. Swan, C.W. Roberts, P.G. Katsoyannis, S. Gardon, The synthesis of an octapeptide amid with the hormonal activity of oxytocin, 1. Am. Chem. SOC.1953, 75,4879. P.C. Zamecnik, Historical aspects of protein synthesis, A n n . N.Y. Acad. Sci. 1979, 325, 269. T. Pederson, 50 years ago protein synthesis met molecular biology: the discoveries of amino-acid activation and transfer RNA, F A S E B ] . 2005, 19, 1583. P. Zamecnik, The machinery of protein synthesis, Trends Biol. Sci. Lett. 1984, 9, 464. P.C. Zamecnik, Historical and current aspects of the problem of protein synthesis, Harvey Lecture, 1959. M. Hoagland, Toward the Habit of Truth, W.W. Norton & Company, New York, 1990. (a) J.D. Watson, Involvement of RNA in the synthesis of proteins, Science 1963, 140, 17; (b) P.B. Moore, T.A. Steitz, The roles of RNA in the synthesis of protein, in The R N A World, Eds.: R.F. Gesteland, T.R. Cech, J.F. Atkins, 3rd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 2006. M. Hoagland, Enter transfer RNA, Nature 2004, 431,249. (a) F.H.C. Crick, On protein synthesis, Syrnp. SOC.Exp. Biol. 1958, 12, 138; (b) F. Crick, W h a t M a d Pursuit, Basic Books, New York, 1988. (a) S. Brenner, F. Jacob, M. Meselson, An unstable intermediate carrying information from genes to ribosomes for protein synthesis, Nature 1961, 190, 576; (6) F. Gros, H. Hiatt, W. Gilbert,
121.
122.
123.
124. 125.
126.
127.
128.
129.
C.G. Kurland, R.W. Risebrough, J.D. Watson, Unstable ribonucleic acid revealed by pulse labelling of Escherichia Coli, Nature 1961, 170, 581; (c) Walter Gilbert, The RNA World, Nature 1986, 319, 618. A. Fire, D. Albertson, S.W. Harrison, D.G. Moerman, Production of antisense RNA leads to effective and specific inhibition of gene expression in C. elegance muscle, Development 1991, 113,503. (a) Gregory I. Hannon, John J. Rossi, Unlocking the potential of the human genome with RNA interference, Nature 2004, 431, 371; (b) Chistian P. Petersen, John G. Doench, Alla Grishok, Phillip A. Sharp, The Biology of Short RNAs, in: 7'he R N A World, 3rd Edition, Eds.: R.F. Gesteland, T.R. Cech, J.F. Atkins, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 2006. CarlR. Woese, A new biology for a new century, Microbiol. Mol. Biol. Rev. 2004, 68, 173. T.R. Cech, The ribosome is a ribozyme, Science 2000, 289, 878. M. Ibba, D. Soll, The renaissance of aminoacyl-tRNA synthesis, E M B O Rep. 2001, 2, 382. I?. Schimmel, L.R. De Pouplana, Footprints of aminoacyl-tRNA synthetase are everywhere, Trends Biol. Sci. ( T I B S )2000, 25, 207. (a) F.H.C. Crick, On the genetic code, Nobel Lecture Physiology or Medicine, 1962; (b) M. Nirenberg, The genetic code, Nobel Lecture Physiology or Medicine, 1968; (c) H.G. Khorana, Nucleic acid synthesis in the study of the genetic code, Nobel Lecture Physiology or Medicine, 1968. M.W. Nirenberg, J.H. Matthaei, The dependence of cell-free protein synthesis in E. cali upon naturally occurring or synthetic polyribonucleotides, Proc. Natl. Acad. Sci. U.S.A. 1961, 47, 1588. D.L. Hatfield, In Soon Choi, B.J. Lee, J.E. lung, Selenocysteine a new addition-to the universal genetic code,
66
I
I Chemistry and 6io/ogy - Historical and Philc)sophical Aspects
130.
131.
132.
133.
134.
135.
136.
137. 138.
139.
in Transfer RNA in Protein Synthesis, 140. E. Schrodinger, What is Lfe?, Cambridge University Press, Eds.: D.L. Hatfield, B.J. Lee, R.M. Cambridge, 1944. Pirtle, CRC Press, Boca Raton, 1992. (a) P. Schimmel, K. Beebe, Genetic 141. J.B.S. Haldane in Philosophy of code seizes pyrrolysine, Nature Biology, Ed.: M. Ruse, Macmillan 2004, 431, 257; (b) J.F. Atkins, Publishing Comp., New York, 1989. R. Gesteland, The 22nd amino acid, 142. A. Lazcano in Early Lij on Earth, Ed.: Science 2002, 296, 1409. S. Bengton, Columbia University (a) L. Wang, P.C. Schultz, Expanding Press, New York, 1994. the genetic code, Chem. Commun. 143. P. Handler, Biology and the Future 2002, I, 1; (b) J. Xie, P.G. Schultz, An o f M a n , Ed.: P. Handler, Oxford expanding genetic code, Methods University Press, New York, 2005, 36, 227; (c) L. Wang, P.G. 1970. Schultz, Expanding the genetic code, 144. (a) S.L. Miller, A production of amino Angew. Chem., Int. Ed. Engl. 2005, acids under possible primitive earth 44, 34. conditions, Science 1953, I 1 7, 528; (a) R.A. Mehl, J.C. Anderson, S.W. (b) S.L. Miller, L.E. Orgel, The Origins Santoro, L. Wang, A.B. Martin, D.S. ofLfe on the Earth, Concepts of King, D.M. Horn, P.G. Schultz, Modern Biology Series, Prentice Hall, Generation of a bacterium with a 21 Englewood Cliffs, 1974; (c) L.E. amino acid genetic code, J . Am. Orgel, Molecular replication, Nature Chem. SOC.2003, 125,935; 1992,358,203. (b) L. Wang, P.G. Schultz, A general 145. (a) S.A. Benner, A.M. Sismour, approach for the generation of Synthetic biology, Nature Reviews orthogonal tRNAs, Chem. Biol. 2001, Genetics 2005, 6, 533; 8,883. (b) R. MeDaniel, R. Weiss, Advances J.C. Anderson, T.J. Magliery, P.G. in synthetic biology: on the path from Schultz, Exploring the limits of prototypes to applications, Curr. codon and anticodon size, Chem. Opin. Biotechnol. 2005, 16, 476. Biol. 2002, 9, 237. J.C. Anderson, N. Wu, S.W. Santoro, 146. (a) C.A. 
Hutchinson et al., Global transposon mutagenesis and V. Lakshman, D.S. King, P.G. minimal mycoplasma genome, Schultz, An expanded genetic code Science 1999, 286, 2165; (b) G. Posfai with a functional quadruplet codon, et al., Emergent Properties of Proc. Natl. Acad. Sci. U.S.A. 2004, Reduced-Genome Escherichia coli, 101,7566. Science 2006, 312, 1044. S. Tonegawa, Somatic generation of 147. H.O. Smith et al., Generating a antibody diversity, Nature 1983, synthetic genome by whole genome 302, 575. assembly: 4x174 bacteriophage from P.G. Schultz, Bringing biological synthetic oligonucleotides, Proc. Natl. solutions to chemical problems, Proc. Acad. Sci. 2003, 100, 15440. Natl. Acad. Sci. U.S.A. 1998, 95, 148. (a) E.V. Koonin, How many genes 14590. can make a cell: the minimal-gene-set E. Keinan (Ed.),Catalytic Antibodies, concept, Annu. Rev. Genomics Hum. Wiley-VCH, Weinheim, 2005. Genet. 2000, I, 99; (b) P.L. Luisi, Robert Edwards, P. Steptoe, Matter of T. Oberholzer, A. Lazcano, The Lfe, W. Morrow & Company, New notion of a DNA minimal cell, Helv. York, 1980. Chim. Acta 2002, 85, 1759; (a) J. Maienschein, Whose View of (c) F. Arigoni, F. Talabot, M. k i t s c h , Lfe? Harvard University Press, M.D. Edgerton, E. Meldrum, A Cambridge, 2003; (b) R.M. Green, The Human Embryo Research Debates, genome-based approach for the Oxford University Press, Oxford, identification of essential bacterial 2001. genes, Nature Biotech. 1998, 16, 851.
References 167 149.
150.
151. 152. 153.
154.
155.
156.
157.
(a) P.L. Luisi, About various definitions of life, Origins ofL@ and Evolution ofthe Biosphere 1998, 28, 613; (b) B. Korzeniewski, Cybernetic formulation of the definition of life, /. theor. Biol. 2001, 209, 275; (c) Y.N. Zhuravlev, V.A. Avetisov, The definition of life in the context of its origin, Biogeosciences 2006, 3, 281; (d) D.E. Koshland Jr.,The seven pillars of life, Science 2002, 295, 2215. (a) E. Andrianantoandro, S. Basu, D.K. Karig, R. Weiss, Synthetic biology: new engineering rules for an emerging discipline, Mol. Systems Biol. 2006, 2, msb4100073; (b) P. Fu, A perspective of synthetic biology: assembling building blocks for novel functions, Biotechnol. /. 2006, 1, 690; (c) J.B. Tucker, R.A. Zilinskas, The promise and perils of synthetic biology, Trte New Atlantis 2006, Spring 2006,25. A registry of standardized modules can be found at http://parts.mit.edu. Editorial, Beauties of Synthesis, Nature 2006, 443, 1. K. Weissermel, Energie und Rohstoff entkoppeln, aber wie?, Lecture given in Frankfurt am Main, Feb. 22nd, 1980, Hicom GmbH, http://www.hicom.de. K. Weissermel, H.-J. Arpe, Industrial Organic Chemistry, Fourth Edition, Wiley-VCH, Weinheim, 2003. A.S. Goldman, A.H. Roy, Z. Ahuja, W. Schinski, M. Brookhart, Catalytic Alkane Metathesis by Tandem Alkane Dehydrogenation-Olefin Metathesis, Science 2006, 312, 257. W.H. Perkin, Jr., Experiments on the synthesis of the terpenes. Part I., /. Chem. Soc. 1904,85,654. E. Marris, The proofis in the product, Nature 2006, 442,492.
D.H.R. Barton, The relevance of organic chemistry, Chem. Britain 1973, 9, 149. 159. (a) R. Huisgen, The adventure Playground of Mechanisms and Novel Reactions, in: Profiles, Pathways, and Dreams, J.I. Seeman (Ed.),American Chemical Society, Washington DC, 1994, p. X X I I ; (b) P. Schmalz, Interview mit Gilbert Stork: Organische - Zukunft und Gegenwart, Nachr. Chew. Tech. Lab. 1987, 35, 349. 160. (a) J.D. Dunitz, X-Ray Analysis and the Structure of Organic Molecules, Cornell University Press, Ithaca, 1978, p. 310; (b) J. Fleming, Selected Organic Syntheses, Wiley, London, 1973, p. 125; (c) G. Buchi, R.E. Erickson, N. Wakabayashi, Constitution of Patchouli Alcohol, /. A m . Chem. Soc. 1961, 83,927; (d) G. Buchi, W.D. McLeod jr., J. Padilla O., Synthesis of Patchouli Alcohol, 1.Am. Chem. SOL. 1964, 86,4438. 161. (a) S.M. Weinreb, Synthetic lessons from quinine, Nature 2001, 21 1, 429; (b) P. Rabe, K. Kindler, Uber die partielle Synthese des Chinins, Ber. dtsch. chem. Ges. 1918, 51, 466; (c)T.S. Kaufman, E.A. Ruveda, The quest for quinine: Those Who Won the Battles and Those Who Won the War, Angew. Chem. Internat. Ed. 2005, 44, 854; (d) ].I. Seeman, The 158.
Woodward-Doeringl Rabe- Kindler Total Synthesis of Quinine: Setting the Record Straight, Angew. Chem. Internat. Ed. in press; (e) R.B. Woodward, W.E. Doering, The total synthesis of quinine, J . A m . Chem. Soc. 1994, 66, 849; 1945, 67,860; (fl G. Stork, D. Niu, A. Fujimoto, E.R. Koft, J.M. Balkovec, J.R. Tata, G.R. Dake, The first stereoselective synthesis of quinine, J . Am. Chem. Soc. 2001, 123, 3239.
PART II Using Small Molecules to Explore Biology
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
2 Using Natural Products to Unravel Biological Mechanisms
2.1 Using Small Molecules to Unravel Biological Mechanisms
Michael A. Lampson and Tarun M . Kapoor
Outlook
Experimental strategies designed around small molecule inhibitors have been critical in advancing our understanding of biological mechanisms. This chapter introduces a series of biological questions and illustrates how they have been addressed by using small molecules to perturb protein function.
2.1.1 Introduction
Our understanding of biological processes often develops from discovering or designing ways to perturb the process and observe the effects of the perturbation. While genetic approaches have been widely used for this purpose, small molecule inhibitors have several advantages as a means of perturbing protein function. First, small molecules provide a high degree of temporal control, generally acting within minutes or even seconds, and are often reversible, allowing both rapid inhibition and activation of protein function. The ability to design perturbations on short timescales has proved particularly valuable in examining dynamic biological processes. Second, dose can easily be controlled with small molecule inhibitors to allow varying degrees of inhibition. Third, small molecules can be applied in multiple biological systems, including different organisms, different cell types, and in vitro systems. The examples discussed in this chapter illustrate how these properties of small molecules have been exploited in designing strategies to dissect biological mechanisms.
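The point about dose control can be made concrete with a small numerical sketch. It assumes the simplest possible case, which the text does not specify: a reversible inhibitor that binds a single site on its target, is present in large excess over the target, and inhibits in proportion to occupancy. The dissociation constant used here (100 nM) and the function names are hypothetical and chosen only for illustration.

```python
# Minimal sketch of graded inhibition under simple one-site reversible binding.
# Assumptions (not from the chapter): inhibitor in excess over target,
# equilibrium reached, inhibition proportional to the fraction of target bound.

HYPOTHETICAL_KD_NM = 100.0  # illustrative dissociation constant, in nM


def fraction_bound(inhibitor_nM: float, kd_nM: float) -> float:
    """Return the fraction of target occupied: [I] / ([I] + Kd)."""
    if inhibitor_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return inhibitor_nM / (inhibitor_nM + kd_nM)


def dose_response(doses_nM, kd_nM=HYPOTHETICAL_KD_NM):
    """Pair each dose with the fraction of target bound at that dose."""
    return [(dose, fraction_bound(dose, kd_nM)) for dose in doses_nM]


if __name__ == "__main__":
    # A tenfold step below and above Kd spans partial to near-complete occupancy.
    for dose, frac in dose_response([10.0, 100.0, 1000.0]):
        print(f"{dose:>7.1f} nM -> {frac:.0%} of target bound")
```

Under these assumptions, a dose equal to the dissociation constant occupies half of the target, and moving the dose tenfold in either direction gives weak or nearly complete inhibition. Real inhibitors can deviate from this picture, for example through cooperative binding or limited cell permeability.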
2.1.2 Use of Small Molecules to Link a Protein Target to a Cellular Phenotype
Small molecules with dramatic cellular phenotypes have been used, without knowledge of their protein target, to provide insight into biological processes. If the effects of a small molecule are well characterized, then identification of the protein target immediately provides a wealth of information about its cellular functions because of the known inhibition phenotypes.
2.1.2.1 Colchicine and Tubulin
Cell division is the process by which cells divide their contents into two daughter cells, each of which must receive genetic material identical to that of the mother cell. Each chromosome is replicated before cell division begins, and a complex and highly regulated process known as mitosis has evolved to ensure that the replicated chromosomes are equally partitioned between the two daughter cells. Progress through mitosis is closely linked to chromosome movements (Fig. 2.1-1(a)). Chromosomes first move to the center of the spindle, and only after correct positioning of all chromosomes at metaphase (Fig. 2.1-1(a) iii) do the sister chromosomes split apart at anaphase (Fig. 2.1-1(a) iv) and move to opposite sides of the cell before the final division into two daughter cells (Fig. 2.1-1(a) v, vi). All of these coordinated chromosome movements occur over the course of approximately one hour. The result is that each daughter cell receives exactly one copy of each replicated chromosome. Failure of this process leads to loss or gain of whole chromosomes in the daughter cells, a condition known as aneuploidy, which is strongly associated with developmental defects and human diseases such as cancer (reviewed in Ref. [1]). Examination of fixed samples revealed the existence of a fibrous structure, known as the mitotic spindle, which appears at each mitosis and disappears after the chromosomes have separated. One of the great challenges in the study of cell division has been to understand the organization and function of the mitotic spindle. Use of the small molecule colchicine (Fig. 2.1-1(b)) has contributed to our understanding of the physical properties of the spindle fibers and how they might drive chromosome movements, as well as their molecular components.
The fibers that make up the mitotic spindle are optically anisotropic, or birefringent, with different indices of refraction in different directions (i.e., parallel or perpendicular to the fiber axis). Exploiting this property of the fibers, Inoue developed a sensitive polarized light microscope that allowed him to directly observe the spindle in living cells [2]. The small molecule colchicine (Fig. 2.1-1(b)) was known to disrupt spindle function, but its mechanism of action was not known. Using the polarized light microscope, Inoue showed that the birefringence of the spindle fibers disappeared after colchicine treatment, indicating loss of the fibers [3]. The time course of this effect ranged from a few minutes to an hour, depending on the concentration. If colchicine was removed, the fibers recovered.

Fig. 2.1-1 (a) Overview of mitosis. (i) Chromosomes are replicated before mitosis. (ii) The spindle forms and chromosomes attach to spindle fibers. (iii) Chromosomes move to the center of the spindle at metaphase. (iv) Sister chromosomes separate at anaphase and move in opposite directions. (v) The cell divides as the cleavage furrow forms between the separated chromosomes. (vi) Two daughter cells form, each with exactly one copy of each chromosome. (b) Structures of two small molecules that target microtubules: colchicine and taxol.

Small molecule inhibitors of protein synthesis
were used to demonstrate that the fibers recovered by assembly from an available pool of material [4]. Similar results were obtained by changing the temperature to manipulate the fibers [5]. Together, these findings suggested that the observed birefringence was due to oriented polymers that were in equilibrium with free molecules in solution. The equilibrium is shifted toward the depolymerized state by colchicine or by low temperature, and returns to its original state after removal of the inhibitor or rewarming. To demonstrate the potential functional significance of the spindle fiber dynamics, the same experimental paradigm was used: perturbation of spindle function combined with observation of the fibers in living cells. Treatment with low concentrations of colchicine caused the fibers to contract slowly rather than immediately eliminating the birefringence. As the fibers contracted, chromosomes were pulled toward one pole of the spindle, which was anchored at the cell surface [3]. The effect was reversible, as fibers elongated after removal of colchicine and chromosomes moved away from the pole. This experiment demonstrated that force could be generated by coupling polymerization and depolymerization of the fibers to chromosome movement. In the studies discussed above, colchicine was used to probe spindle function without knowing its mechanism of action. Tight binding to an intracellular
target was implied by the low concentration (100 nM) required to arrest cells
in mitosis. A strategy was developed to isolate a colchicine-binding protein. First, colchicine was labeled with ³H with high specific activity and tested with a variety of cells, tissues, and organelles [6]. High binding activity was observed with multiple preparations, including the mitotic spindle, cilia, sperm tails, and brain tissue, that are enriched in intracellular fibers called microtubules, the same fibers that Inoue observed in the spindle [7, 8]. These results suggested that the target of colchicine was a subunit of microtubules. Isolated sea urchin sperm tails were dissolved to extract the colchicine-binding activity, which was then purified by gel filtration and sedimentation over a sucrose gradient. A single component with a sedimentation constant of 6S was identified. Using porcine brain as a starting material, the same component was isolated and shown to bind guanosine triphosphate (GTP) [9, 10]. Because this component was believed to be the primary constituent of microtubules, the protein was named tubulin [11]. The functions of microtubules in cells depend on the activities of numerous microtubule-associated proteins (MAPs), including regulators of polymerization dynamics and molecular motors that move along microtubule tracks. Identification of MAPs was made difficult by the dynamic nature of microtubule fibers, particularly the tendency to depolymerize under conditions used to prepare extracts for biochemical purification. The small molecule taxol (Fig. 2.1-1(b)) was shown to promote microtubule assembly and to stabilize polymerized microtubules [12, 13], and these properties were exploited to develop a procedure for purification of MAPs [14]. Taxol was added to brain or cell extract to polymerize microtubules, which were subsequently isolated together with bound MAPs. Washing with high salt released MAPs from the microtubules, which were stabilized with taxol, so that the soluble MAPs could be separated from the microtubules.
One prominent application of this strategy was the discovery of the founding member of the kinesin family of microtubule-based motor proteins [15]. The potential of small molecules targeting microtubules as cancer therapeutics was demonstrated by the vinca alkaloids, such as vincristine and vinblastine, which have been used in the clinic for 40 years. At high concentrations (10-100 nM), these compounds depolymerize microtubules, which eliminates the mitotic spindle. At lower concentrations that are used clinically, microtubules remain stable but microtubule dynamics are suppressed. Taxol, which also inhibits microtubule dynamics, is widely used to treat a variety of cancers (reviewed in Ref. [16]). These drugs induce a mitotic arrest, which eventually leads to cell death [17] through mechanisms that are only beginning to be understood [18, 19, 20].
2.1.2.2 Cytochalasin and Actin
Fig. 2.1-2 (a) Structure of cytochalasin B, a small molecule that targets actin. (b) Force production by the contractile ring in cytokinesis. A ring of actin filaments forms at the plasma membrane and contracts to divide the cell in half.

While colchicine was a valuable tool for examining cellular processes that relied on microtubules, electron microscopy revealed another filamentous structure,
termed microfilaments, that was distinct from microtubules. A key step in understanding the function of microfilaments was to observe a correlation between the presence of the filaments, their disruption by the small molecule cytochalasin (Fig. 2.1-2(a)), and the phenotype of cytochalasin treatment in multiple systems. Although the molecular target of cytochalasin was unknown, it was shown to inhibit many forms of cellular or intracellular movement, such as cytoplasmic cleavage in cytokinesis (Fig. 2.1-2(b)), cell motility, membrane ruffling, and nerve outgrowth [21, 22]. In all of these systems, microfilaments were observed and were shown to be disrupted by cytochalasin. Cells recovered after removal of cytochalasin as the microfilaments returned to their normal state. Furthermore, the actions of cytochalasin and colchicine were generally mutually exclusive, suggesting that the two types of filamentous structures could function independently in the cell. Microtubule-dependent processes, which were inhibited by colchicine, were often insensitive to cytochalasin, while processes inhibited by cytochalasin were generally insensitive to colchicine [22]. The conclusion from these correlative data was that microfilaments likely played a fundamental role in the generation of forces at the cellular level: “the evidence seems overwhelming that microfilaments are the contractile machinery of nonmuscle cells” [22]. The action of the myosin motor, which uses energy from adenosine triphosphate (ATP) hydrolysis to slide filaments made up of polymers of the protein actin, was known to drive contractility in muscle, but the relevance of this mechanism to other cellular processes had not been demonstrated. Using actin filaments purified from muscle, cytochalasin was shown to decrease the viscosity of actin in solution. This experiment, which established a direct link between cytochalasin and actin, led to two important conclusions. First, cytochalasin interacts directly with actin.
Second, “an interaction of cytochalasin with actin or actin-like proteins in vivo could account for the
ability of cytochalasin to inhibit various forms of cell motility and contraction” [23].As the molecular target of cytochalasin, actin was implicated as a critical component of the microfilaments involved in cytochalasin-sensitiveprocesses. 2.1 2 . 3 Small Molecules and Thermal Sensation Another example of a small molecule with a dramatic cellular phenotype is capsaicin (Fig. 2.1-3(a)),the natural product that makes chili peppers “hot”. Its mechanism of action is of particular interest because of the link to more general pain sensation. A class of neurons that are excited by various noxious stimuli (chemical, mechanical, or temperature) are also sensitive to capsaicin [24]. Therefore, capsaicin could be a useful tool in understanding the basic mechanisms underlying pain sensation. The discovery of a capsaicin
Fig. 2.1-3
(a) Structures ofthe small molecule capsaicin and menthol. (b) Schematic o f the VR1 receptor, a nonspecific cation channel. The channel is
gated by capsaicin binding, heat, and protons. (c) Response of the VR1 receptor channel t o capsaicin, temperature, and pH. Adapted from [Ref. 281.
2. J Using Small Molecules to Unravel Biological Mechanisms
receptor, in particular, would provide a molecular handle on this process. Studies in cultured neurons showed that capsaicin induced a rapid calcium influx through activation of a cation channel [25, 261. On the basis of this knowledge, an expression cloning strategy was devised to identify the receptor [27]. The underlying logic of this approach was that if nonneuronal cells were not sensitive to capsaicin simply because they did not express the receptor, expression of the receptor would lead to a capsaicin-induced increase in intracellular calcium. A neuronal cDNA library was transfected into human embryonic kidney (HEK293) cells and screened by calcium imaging in living cells. The cloned receptor, named VR1 (vanilloid receptor subtype 1) was shown to be a nonselective cation channel expressed in sensory neurons (Fig. 2.1-3(b-c)). The sensitivity of VR1 to heat and acid, as well as capsaicin, indicated its more general physiological importance in detecting noxious stimuli [28]. At the whole animal level, the role of V R l in detection of noxious stimuli has been demonstrated by gene disruption studies in mice [29, 301. A similar expression cloning strategy was used to identify a receptor involved in transduction of cold sensation. In this case, the natural product used to induce calcium influxwas menthol (Fig. 2.1-3(a)),which was known to produce a sensation of cold and even suggested to interact directly with a cold detection pathway [31].Transient receptor potential (TRPM8), a cation channel from the same family as VR1, was cloned and shown to be activated by both menthol and cold [32, 331. Thus, small molecules were used to link our perceptions of both heat and cold to specific receptors in sensory neurons involved in thermosensation. Identification of these receptors has opened the door to an understanding of thermosensation at a molecular level [34]. 2.1.3 Small Molecules as Probes for Biological Processes
In strategies developed to use small molecules as probes to understand biological processes, the effects of the small molecule on the biological system as a whole are often more important than the specific protein target, which may not even be known. A number of insightful experiments have been designed around such perturbations by examining how the system responds to or recovers from the induced state. Because of the temporal control available with small molecules and the reversibility of inhibition, these approaches are particularly powerful with dynamic processes. As initially shown with colchicine, the mitotic spindle is a highly dynamic structure and small molecules have played an integral role in understanding its function.
2.1.3.1 Progression through Mitosis
It is clear from observing chromosome movements that cell division occurs in an ordered sequence of events (Fig. 2.1-1(a)). Chromosomes attach to spindle microtubule fibers and move to the spindle equator before sister chromosomes separate at anaphase and move to opposite sides of the cell, followed by division into two daughter cells. Successful chromosome segregation requires that events occur in this order. If anaphase begins prematurely, before chromosomes have properly attached to the spindle, the sister chromosomes will not segregate equally, leading to aneuploid daughter cells. Mechanisms that determine the timing of anaphase onset are therefore critical for the success of mitosis. One hypothesis for how anaphase onset might be regulated was through feedback control. This term refers to a mechanism for controlling progression past a certain point in the cell cycle, known as a checkpoint, where the completion of an event generates a signal that allows the next event to begin. Failure to complete the event causes a cell-cycle arrest. In the context of progression through mitosis, some critical process, such as spindle assembly, would be monitored to generate a signal regulating anaphase onset. Consistent with this hypothesis, colchicine was known to induce a mitotic arrest by disrupting the spindle. The effect of colchicine did not prove the existence of a feedback control mechanism, however, because the mitotic arrest could also be explained by direct inhibition of another microtubule-dependent process required for anaphase. A prediction of the feedback-control hypothesis is that mutations in genes required for feedback signaling would allow cells to bypass the colchicine-induced arrest and progress through mitosis without completing spindle assembly. A genetic screen was designed to identify such mutations in budding yeast, using benomyl, a small molecule inhibitor of microtubule polymerization that is effective in yeast, to perturb spindle assembly. Benomyl could either be used at a low dose or washed out, as the effect is reversible, so that cells would survive the treatment.
Cells were arrested in mitosis with high benomyl (70 µg mL⁻¹), which prevents spindle formation, but proceeded normally through mitosis after removal of benomyl and continued to grow (Fig. 2.1-4(a)) [35]. Alternatively, spindle assembly was slowed with low benomyl (15 µg mL⁻¹), and anaphase onset was delayed to allow completion of spindle assembly, but cells continued to grow [36]. In both cases, massive chromosome missegregation and cell death were expected if cells entered anaphase prematurely in the presence of benomyl with incomplete or nonexistent spindles. The difference in survival between cells with functional and defective feedback control was used to select mutations in genes required for feedback control [35, 36]. After creating random genetic mutations, cells that failed to grow after benomyl treatment were selected (Fig. 2.1-4(b)). As in Inoue's studies with colchicine, the reversibility of the small molecule and the ability to achieve partial inhibition by decreasing the dose were important components of the benomyl-screening strategies. The identification of genetic mutations that abolished the benomyl-induced mitotic arrest provided evidence for a feedback mechanism that delays anaphase onset until completion of spindle assembly, now often referred to as the mitotic spindle checkpoint. The names Mad, for mitotic arrest deficient, and Bub, for budding uninhibited by benomyl, were used for the genes identified in these screens.
Fig. 2.1-4 Screening strategy used to identify genes required for feedback control of anaphase onset in budding yeast [35]. (a) Cells were arrested in mitosis for 20 h with benomyl, a small molecule that targets tubulin and prevents spindle formation. After removal of benomyl, wild-type cells form a spindle and proceed normally through mitosis. Mutant cells fail to arrest and enter anaphase without forming a spindle, causing chromosome missegregation and eventual cell death. (b) Cells were mutagenized, and colonies were grown from single cells and then transferred to create two replicate plates. One plate (top) was grown without benomyl. The second plate (bottom) was treated with benomyl. Colonies that failed to grow on the second plate, indicating defective feedback control, were selected from the first plate to identify the mutated gene.
The Mad and Bub genes, which are well conserved from yeast to mammals, have provided the foundation for much of our current understanding of the mitotic spindle checkpoint. Studies in transgenic mice have confirmed the importance of several of these genes for faithful chromosome segregation in higher eukaryotes, as reduced expression increases both aneuploidy and cancer susceptibility. In human tumors, mutations have been reported in Mad1, Mad2, Bub1, and BubR1, a related vertebrate protein (reviewed in [Ref. 1]). Additionally, human germline mutations in BubR1 have been linked to mosaic variegated aneuploidy, a condition associated with high risk of cancer [37]. Experiments examining the intracellular localization of Mad2 have suggested a model for how the feedback control mechanism might operate [38, 39]. At early stages of mitosis, Mad2 localizes to the kinetochore, a structure that forms on each chromosome and mediates attachment to spindle microtubules. As cells progress through mitosis, however, Mad2 disappears from kinetochores, and at anaphase onset none of the kinetochores have detectable Mad2. The loss of Mad2 from kinetochores correlates with microtubule attachment. Furthermore, when spindle microtubules are depolymerized with the small molecule nocodazole, Mad2 localizes to all kinetochores. These findings suggest a mechanistic basis for the feedback-control model. Mad2 binds kinetochores that lack microtubule attachment as a signal that mitosis is not complete, which prevents anaphase onset. Microtubule binding displaces Mad2 from kinetochores, so that when all kinetochores have bound microtubules, anaphase can begin. It should be noted that the small molecule benomyl was used in the Mad/Bub genetic screens not because of its specific protein target but because of the perturbation of spindle assembly. In principle, the same experiments could be done by targeting a different component of the spindle. The generality of the spindle checkpoint has been demonstrated through the use of monastrol, a small molecule inhibitor of the mitotic kinesin Eg5, which was identified in a screen for small molecules that arrest cells in mitosis without targeting tubulin [40]. Because Eg5 is required to separate the spindle poles, monastrol treatment arrests cells in mitosis with monopolar spindles. In the presence of monastrol, the checkpoint can be overridden by inhibition of Mad2, through microinjection of inhibitory antibodies [41]. This finding indicates that the principle of feedback control applies generally to spindle perturbations through highly conserved mechanisms. Inhibitors of Eg5 are currently in development as anticancer drugs because, like taxol and the vinca alkaloids, they arrest cells in mitosis by activating the spindle checkpoint. The efficacy of these drugs, as demonstrated by recent studies, requires a prolonged, checkpoint-dependent mitotic arrest [42, 19].
Drug resistance is conferred by a compromised spindle checkpoint, for example, through reduced expression of Mad2.
2.1.3.2 Positioning the Cleavage Plane in Cytokinesis
Monastrol, the small molecule inhibitor of Eg5, has been used in several studies to address questions in the biology of cell division [41, 43, 44]. One important question is how the position of the cell division (or cleavage) plane is determined in cytokinesis. The cleavage plane is typically positioned in the center of the cell so that cellular components are equally divided between the two daughter cells. Asymmetric divisions do occur, however, and are particularly important during development, when the location of the cleavage plane can determine the fate of the daughter cells. Models to explain the position of the cleavage plane relied on the presence of the bipolar microtubule array of the mitotic spindle, which would place the division plane in between the spindle poles. To test this idea directly, monastrol was used in an experiment designed to determine if cytokinesis could occur in cells with monopolar spindles [41]. To allow cells to enter anaphase in the presence of monastrol, inhibitory antibodies against Mad2 were microinjected to override the mitotic spindle checkpoint. After entering anaphase, the injected cells successfully completed cytokinesis (Fig. 2.1-5). This experiment demonstrated that a bipolar microtubule array is not required for cytokinesis. Careful analysis of microtubule dynamics during anaphase in the monopolar spindles showed that a population of microtubules near the chromosomes was stabilized at the location where the cleavage plane formed. These findings suggest a model in which the position of the cleavage plane is determined by local regulation of microtubule dynamics, through association with chromosomes.
Fig. 2.1-5 Assay to examine cytokinesis in the presence of a monopolar spindle [41]. Treatment with monastrol, a small molecule inhibitor of the kinesin Eg5, causes cells to arrest in mitosis with monopolar spindles due to activation of the spindle checkpoint. Microinjection of an antibody against the protein Mad2 inactivates the checkpoint so that cells divide with monopolar spindles.
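The checkpoint logic that this experiment relies on can be written down as a toy boolean model. The sketch below is only an illustration of the feedback-control idea described in the text (Mad2 on any unattached kinetochore delays anaphase, and inhibiting Mad2 removes the delay). The class and function names are hypothetical, and the model ignores everything else that happens in a real cell.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Kinetochore:
    """One kinetochore; `attached` is True once spindle microtubules are bound."""
    attached: bool


def has_mad2(kinetochore: Kinetochore) -> bool:
    # Model from the text: Mad2 localizes to kinetochores lacking microtubule attachment.
    return not kinetochore.attached


def anaphase_allowed(kinetochores: Iterable[Kinetochore],
                     mad2_inhibited: bool = False) -> bool:
    """Return True if this simplified checkpoint permits anaphase onset."""
    if mad2_inhibited:
        # Corresponds to anti-Mad2 antibody injection: the "wait" signal is lost.
        return True
    # Otherwise a single Mad2-positive kinetochore is enough to delay anaphase.
    return not any(has_mad2(k) for k in kinetochores)


if __name__ == "__main__":
    cell = [Kinetochore(attached=True), Kinetochore(attached=False)]
    print(anaphase_allowed(cell))                       # checkpoint holds the cell in mitosis
    print(anaphase_allowed(cell, mad2_inhibited=True))  # checkpoint overridden
```

In this simplified picture, the Mad and Bub mutants and the antibody-injected cells behave like the `mad2_inhibited=True` case: anaphase proceeds regardless of the state of the spindle.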
2.1.3.3 Correcting Errors in Chromosome-Spindle Attachments
Accurate chromosome segregation in mitosis requires not only feedback control of anaphase onset but also regulation of chromosome attachment to the spindle. Each pair of replicated chromosomes must achieve a particular orientation in which microtubule fibers attach sister chromosomes to opposite poles of the spindle. Experiments in yeast showed that inhibition of the Ipl1/Aurora family of kinases stabilized improper attachments [45, 46], but how the active kinase corrected attachment errors was not known. Because attachment errors are rarely observed in the presence of active Aurora kinase, this problem was particularly difficult to address. Inhibition of Aurora kinase, through experimental approaches such as genetic mutation, could be used to accumulate attachment errors, but not to examine error correction by the active kinase. Reversible small molecule Aurora kinase inhibitors present a solution to this problem because they can be used to inhibit kinase function and subsequently removed to activate the kinase. Understanding the function of Aurora kinases is particularly important because they have been linked to oncogenesis, and Aurora kinase inhibitors are currently in development as cancer therapeutics [47, 48]. Several issues had to be resolved to devise a strategy for determining how attachment errors were corrected. First, kinase inhibition should be temporally controlled to experimentally isolate the error correction process, as Aurora kinases have been implicated in multiple mitotic processes. Second, error correction likely involves some regulation of the dynamics of the microtubule fibers that attach chromosomes to the spindle. These dynamics can be analyzed with high temporal and spatial resolution by high-resolution microscopy in living cells. Finally, the dynamics of an individual microtubule fiber are difficult to analyze if that fiber is obscured by other microtubules in the spindle. The dynamics can be clearly observed, however, under conditions in which the improperly attached chromosomes are positioned away from the spindle body. All of these issues were addressed through the development of an assay using several reversible small molecule inhibitors (Fig. 2.1-6) [44]. First, treatment with the Eg5 inhibitor monastrol arrests cells in mitosis with monopolar spindles (Fig. 2.1-6(b) i). A particular chromosome attachment error in which both sisters are attached to the single spindle pole, referred to as syntelic attachment, is frequent in the monopolar spindles [49]. If monastrol is removed, the spindle becomes bipolar, all of the accumulated attachment errors are corrected, and anaphase proceeds normally. An Aurora kinase inhibitor was added immediately after removal of monastrol to determine if Aurora kinase activity is required for correction of the attachment errors. Because the Aurora kinase inhibitor was added only at this point, kinase activity was unperturbed for all the preceding stages of mitosis. To control for possible off-target activities of the Aurora kinase inhibitors, the assay was performed with two structurally unrelated inhibitors (Fig. 2.1-6(a)). Cells expressing GFP (green fluorescent protein) tubulin were used to examine spindle bipolarization in the presence of an Aurora kinase inhibitor (Fig. 2.1-6(b-d)). Both chromosome and microtubule dynamics were analyzed at high resolution by multimode fluorescence and transmitted light microscopy. The syntelic attachment errors persisted as the spindle bipolarized, directly demonstrating that Aurora kinase activity is required for correction of these errors. Notably, some of the improperly attached microtubule fibers could be clearly observed, unobstructed by other spindle microtubules, as the chromosomes attached to these fibers were positioned away from the spindle body. After spindle bipolarization, the Aurora kinase inhibitor was removed to examine how the active kinase might correct the syntelic attachment errors. One hypothesis was that attachment errors would correct by chromosome release from the attached microtubule fiber [50]. Instead, improperly attached chromosomes were observed to remain attached to the microtubule fibers and to be pulled to the spindle pole as the fibers shortened. Properly attached chromosomes were not affected, suggesting local regulation of microtubule dynamics by Aurora kinase activity. After disassembly of the microtubule fibers, the chromosomes moved to their usual position at the center of the spindle as correct attachments formed.
Fig. 2.1-6 Correction of improper chromosome attachments by activation of Aurora kinase [44]. (a) Structures of two Aurora kinase inhibitors (AKI), hesperadin and AKI-1. (b) Assay schematic. (i) Treatment with the Eg5 inhibitor monastrol arrests cells in mitosis with monopolar spindles, in which sister chromosomes are often both attached to the single spindle pole. (ii) Hesperadin, an Aurora kinase inhibitor, is added as monastrol is removed. As the spindle bipolarizes with Aurora kinase inhibited, attachment errors fail to correct so that some sister chromosomes are still attached to the same pole of the bipolar spindle. (iii) Removal of hesperadin activates Aurora kinase. Incorrect attachments are destabilized by disassembling the microtubule fibers, pulling the chromosomes to the pole, while correct attachments are stable. (iv) Chromosomes move from the pole to the center of the spindle as correct attachments form. (c) Spindles were fixed after bipolarization either in the absence (i) or in the presence (ii) of an Aurora kinase inhibitor. Chromosomes are shown in blue and microtubule fibers in green. The arrows indicate sister chromosomes that are both attached to the same spindle pole. Projections of multiple image planes are shown, with optical sections of boxed regions (1 and 2) to highlight attachment errors. Scale bar 5 µm. (d) After removal of hesperadin, GFP tubulin (top) and chromosomes (bottom) were imaged live by three-dimensional confocal fluorescence microscopy and differential interference contrast (DIC), respectively. The arrow and arrowhead show two chromosomes that move to the spindle pole (marked by a circle in DIC images) as the associated kinetochore-microtubule fibers shorten, and then move to the center of the spindle. Time (min:s) after removal of hesperadin. Scale bar 5 µm. (With permission from Lampson et al. Nat. Cell Biol. 2004, Ref. 44.)
Several advantages of small molecule inhibitors, particularly in combination with high-resolution live-cell microscopy, are demonstrated by this assay. In a highly dynamic process such as mitosis, many events occur on timescales of minutes or seconds. Ideally, perturbation of protein function and observation of the effects of the perturbation would be possible on similar timescales. Manipulation of protein function through the use of reversible small molecule inhibitors, together with live-cell imaging, makes this possible. In the assay described here, inhibitors of both the kinesin Eg5 and Aurora kinases were effectively used as switches to turn enzymes on and off. With this high degree of temporal control, a mechanism for correcting chromosome attachment errors could be dissected without perturbing the preceding processes, such as those involved in spindle assembly.
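The "switch" character of the assay can be made explicit by listing, for each step, which inhibitors are present and which enzymes are therefore active. The following is a minimal sketch under stated assumptions: the inhibitor-to-target pairs come from the text, whereas the step labels, the data layout, and the function names are hypothetical and chosen only for illustration.

```python
# Targets named in the text: monastrol inhibits Eg5, hesperadin inhibits Aurora kinase.
INHIBITOR_TARGETS = {"monastrol": "Eg5", "hesperadin": "Aurora kinase"}

# Hypothetical encoding of the assay steps: (label, inhibitors present in the medium).
ASSAY_STEPS = [
    ("i: monopolar arrest, syntelic errors accumulate", {"monastrol"}),
    ("ii: spindle bipolarizes, errors persist", {"hesperadin"}),
    ("iii-iv: errors corrected", set()),
]


def active_enzymes(inhibitors_present):
    """Return the sorted list of enzymes that are not inhibited at a given step."""
    inhibited = {INHIBITOR_TARGETS[name] for name in inhibitors_present}
    return sorted(set(INHIBITOR_TARGETS.values()) - inhibited)


def describe(steps):
    """Pair each step label with the enzymes active during that step."""
    return [(label, active_enzymes(inhibitors)) for label, inhibitors in steps]


if __name__ == "__main__":
    for label, active in describe(ASSAY_STEPS):
        print(f"{label}: active = {active}")
```

Read this way, Aurora kinase is switched off only during step ii, which is what isolates error correction from the earlier Aurora-dependent stages of mitosis.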
2.1.3.4 Brefeldin A Principles of Membrane Transport
Our understanding of cell division has benefited greatly from studies with small molecules, but these tools have also been applied successfully to other dynamic processes in cell biology. One such process is the transport of lipids and proteins between distinct membrane-bound compartments, or organelles, inside the cell. The small molecule Brefeldin A (BFA) was instrumental in uncovering some of the basic principles of intracellular transport. A fundamental question in cell biology is how an organelle can maintain its identity in the presence of constant inward and outward flow of lipids and proteins. In the secretory pathway, for example, proteins are synthesized in the endoplasmic reticulum (ER), then transported to the Golgi apparatus for processing, and finally exit the Golgi in transport intermediates that fuse with the plasma membrane to release their contents outside the cell (Fig. 2.1-7(a)). As an indication of the flow of lipids and proteins through this pathway, bulk ER membrane was estimated to be depleted by transport to the Golgi with a half-time of 10 min [51]. This observation suggested the existence of a recycling pathway to return membrane to the ER, but the first direct demonstration of this recycling pathway came from studies with Brefeldin A (Fig. 2.1-7(b)). Early studies had shown that BFA blocked transport of proteins out of the ER and caused disassembly of the Golgi [52, 53]. Careful analysis of BFA-treated cells demonstrated that within minutes of BFA treatment, resident Golgi proteins redistributed to the ER. The redistribution was shown both by localization of Golgi proteins and biochemically, as resident ER glycoproteins were processed by the redistributed Golgi enzymes in the presence of BFA [54, 55]. After removal of BFA, the Golgi rapidly reformed and the usual localization of Golgi proteins was reestablished, again within minutes. These findings provided direct evidence for a Golgi-ER recycling pathway and highlighted the dynamic nature of membrane transport between the two organelles. Subsequent studies with BFA led to additional insights into some essential features of membrane traffic from the Golgi.
A careful analysis of the timing of events after BFA treatment showed that a 110-kD peripheral membrane protein, whose identity was at that point unknown, dissociated from Golgi membranes as the earliest detectable event (within 30 s) in BFA action and reassociated after removal of BFA as the Golgi reformed [56]. Other peripheral membrane proteins did not dissociate but redistributed to the ER instead, as had been shown for resident Golgi proteins. These findings suggested that the 110-kD protein played a critical role in the regulation of membrane transport from the Golgi. The 110-kD protein was subsequently purified and cloned and shown to be identical to β-COP, a component of the coat protein I (COPI), or coatomer, complex, which forms the coat of vesicles budding from the Golgi [57, 58]. This finding, together with the known effects of BFA, led to the hypothesis that COPI-coated vesicles mediate forward membrane flow from the Golgi. Inhibition of this process with BFA would allow retrograde flow to dominate, so that Golgi membranes would be transported back to the ER, as observed. The hypothesis was tested in a cell-free system in which the budding of COPI-coated vesicles from Golgi membranes could be reconstituted in vitro [59]. BFA prevented the assembly of the COPI coat in this system, as predicted.
Fig. 2.1-7 (a) Schematic of the secretory pathway. Transport vesicles carry membrane and soluble material from the ER to the Golgi and from the Golgi to the plasma membrane, where the soluble contents are released into the extracellular space. (b) Structure of the small molecule Brefeldin A. (c) Regulation of vesicle budding by the ARF GTPase. Exchange of GDP for GTP on ARF triggers ARF-GTP binding to Golgi membranes. After ARF-GTP binding, the coatomer complex assembles on the membrane and induces budding of a transport vesicle. ARF hydrolyzes GTP after vesicle budding to release coatomer and ARF-GDP from the membrane.
Together these experiments linked the COPI complex with forward membrane transport from the Golgi, through the observed effects of BFA on both COPI coat assembly and the dynamics of ER-Golgi trafficking.
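To give a sense of scale for the ER-Golgi membrane flux discussed above, the 10-min half-time reported for depletion of bulk ER membrane can be converted into a rate. The worked equation below assumes simple first-order (exponential) loss with no return pathway; that kinetic form is an assumption made here for illustration and is not stated in the source.

```latex
% Assumption: first-order depletion of ER membrane with half-time t_{1/2} = 10 min
\[
  \frac{M(t)}{M(0)} = e^{-kt} = 2^{-t/t_{1/2}},
  \qquad
  k = \frac{\ln 2}{t_{1/2}} = \frac{0.693}{10\ \text{min}} \approx 0.069\ \text{min}^{-1}
\]
% Example: after three half-times (t = 30 min)
\[
  \frac{M(30\ \text{min})}{M(0)} = 2^{-3} = 0.125
\]
```

Under this assumption only about one-eighth of the ER membrane would remain after 30 min without replenishment, which is why a return pathway from the Golgi to the ER was inferred.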
BFA continued to be instrumental in understanding the regulation of coat assembly. In a semipermeabilized cell system, GTPγS, a nonhydrolyzable analog of GTP, was shown to prevent the BFA-induced dissociation of the 110-kD protein (at that point not known to be β-COP) from the Golgi [60]. This finding suggested that the GTP-GDP (guanosine diphosphate) cycle was involved in the process inhibited by BFA. A small GTP-binding protein, adenosine diphosphate ribosylation factor (ARF), was a candidate for involvement in this mechanism because it was known to associate with the Golgi and had been implicated in Golgi transport processes [61]. When the sensitivity of this protein to BFA was examined, BFA was found to inhibit ARF binding to Golgi membranes, both in cells and in vitro, while GTPγS prevented this inhibition [62]. These results were consistent with the effects of BFA and GTPγS on β-COP. Furthermore, ARF was shown to be a subunit of the COPI coat [63]. Together, these findings suggested that the GTP-binding state of ARF regulates COPI coat formation. To place the events in an ordered biochemical process, association of ARF with Golgi membranes, the step inhibited by BFA, was shown to come first, and ARF was then required for binding of β-COP [64]. A more detailed biochemical understanding of the mechanism of BFA action was provided by the finding that an activity associated with Golgi membranes catalyzes GDP-GTP exchange on ARF and is inhibited by BFA [65, 66]. The interpretation was that BFA acts by preventing nucleotide exchange on ARF, which prevents ARF binding to membrane, an event required for coat assembly and vesicle budding. This result suggested a general model for membrane transport in which ARF proteins regulate assembly of coated vesicles through changes in the GTP-GDP binding state and therefore control vesicular trafficking (Fig. 2.1-7(c)) [67].
Much more work has been done with BFA, for example, to understand its mechanism of action in more detail [68], but the studies discussed here illustrate many key features of the small molecule approach. Interest in BFA was initially stimulated by its dramatic effect on a biological process: traffic of proteins through the secretory pathway. Before the underlying mechanism was understood in molecular detail, the inhibitor was instrumental in a series of experiments that revealed some of the key principles of membrane transport. Though BFA was not directly involved in all of the experiments, interpretation of many of the findings depended on placing the results in the context of BFA action. These experiments demonstrated the dynamic nature of ER-Golgi transport and the role of the COPI coat complex in vesicle formation. Furthermore, the role of the ARF GTPase in coat assembly led to a model for regulation of vesicular trafficking. Several properties of BFA as a small molecule were exploited throughout these experiments. Reversibility and temporal control were used to understand the dynamic nature of the events and to place them in an ordered process. In addition, BFA was used in multiple systems, including various cell types and in vitro, so that insights from biochemical experiments could be interpreted in the context of a complex cellular process.
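The ordered sequence of events inferred from the BFA experiments (nucleotide exchange on ARF, then ARF binding to Golgi membranes, then coatomer binding) can be summarized as a small piece of boolean logic. This is a sketch under stated assumptions, not a biochemical model: the function and key names are hypothetical, and the effect of GTPγS is encoded simply as "ARF stays in its GTP-bound form even when BFA is present", which is one way to reproduce the reported observation that GTPγS prevents the BFA effect.

```python
def golgi_coat_state(bfa: bool = False, gtp_gamma_s: bool = False) -> dict:
    """Toy ordering of COPI coat recruitment, following the events in the text."""
    # Step 1: a Golgi-associated activity exchanges GDP for GTP on ARF; BFA inhibits it.
    exchange_active = not bfa
    # Assumption: GTPγS keeps ARF in its GTP-bound form regardless of BFA.
    arf_gtp_form = exchange_active or gtp_gamma_s
    # Step 2: only the GTP-bound form of ARF associates with Golgi membranes.
    arf_on_membrane = arf_gtp_form
    # Step 3: coatomer (including β-COP) binding requires membrane-bound ARF.
    coatomer_on_membrane = arf_on_membrane
    return {
        "exchange_active": exchange_active,
        "arf_on_membrane": arf_on_membrane,
        "coatomer_on_membrane": coatomer_on_membrane,
    }


if __name__ == "__main__":
    for bfa, gtp_gamma_s in [(False, False), (True, False), (True, True)]:
        state = golgi_coat_state(bfa=bfa, gtp_gamma_s=gtp_gamma_s)
        print(f"BFA={bfa}, GTPγS={gtp_gamma_s}: {state}")
```

The three printed cases correspond to untreated membranes, BFA treatment (ARF and coatomer both lost), and BFA in the presence of GTPγS (both retained).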
2.1.3.5 Catalysis by Ribosomal RNA
Small molecules can be used to address problems at the level of biochemical reactions as well as larger-scale cellular processes. Puromycin, a small molecule inhibitor of protein synthesis, has contributed to our understanding of the catalysis of peptide bond formation. Protein synthesis in a cell takes place on a large assembly of protein and RNA components called the ribosome. This structure carries out the complex task of reading the codons of an mRNA molecule, selecting the appropriate amino acid for each codon, and catalyzing the formation of a peptide bond between that amino acid and the preceding one in the polypeptide chain (the peptidyl transferase reaction). It was initially assumed that ribosomal proteins were responsible for the peptidyl transferase activity, but experiments in the 1970s suggested that ribosomal RNA might be directly involved. The discovery of catalytic RNA in the 1980s [69, 70] led to the hypothesis that ribosomal RNA, rather than protein, might catalyze peptide bond formation. An experiment was designed to test this idea on the basis of the logic that if catalysis is RNA based, it might be possible to remove ribosomal proteins without loss of peptidyl transferase activity. The assay used to measure transferase activity had been developed two decades earlier as a model reaction to study the mechanism of peptide bond formation [71]. In this assay, both ribosomal substrates, the growing polypeptide chain and the incoming aminoacyl-tRNA, are replaced with simplified molecules: a tRNA fragment, CAACCA-formyl-methionine, and the small molecule puromycin (Fig. 2.1-8). The “fragment reaction” requires only the large (50S) ribosomal subunit, without small subunits or other factors. Peptidyl transferase activity can be measured as formation of the product f-Met-puromycin, using 35S-labeled methionine.
Exploiting this model system, catalytic activity was measured following extraction of ribosomal proteins from the 50S subunit, using a procedure designed to cause minimal perturbation of RNA structure. Ninety-five percent of the ribosomal protein could be removed by treatment with SDS (sodium dodecyl sulfate) and proteinase K, followed by phenol extraction, while maintaining over 80% activity [72]. In contrast, transferase activity was rapidly lost upon treatment with ribonuclease. While this result could not formally exclude the possibility that catalysis was carried out by the remaining 5% of ribosomal proteins, it strongly supported the hypothesis that ribosomal RNA was responsible for peptidyl transferase activity. In the fragment reaction, the ability of puromycin to mimic the aminoacyl-tRNA in the peptidyl transferase reaction was exploited to measure catalytic activity. Puromycin was subsequently used to design a transition-state analog for the peptidyl transferase reaction, known as the Yarus inhibitor, in which it is linked to the oligonucleotide CCdA by a phosphoramide group [73]. In a complex with the 50S ribosomal subunit, the Yarus inhibitor was used to define the catalytic site in a high-resolution crystal structure. No protein was found within 18 Å of this site [74]. This result demonstrated conclusively that the catalytic activity indeed resides in the ribosomal RNA.
Fig. 2.1-8 (a) Elongation of a polypeptide chain. The amino group of the incoming aminoacyl-tRNA joins the carbonyl group of the growing polypeptide chain to replace the peptidyl-tRNA. (b) The small molecule puromycin replaces the aminoacyl-tRNA in the polypeptide chain and prevents further elongation.
2.1.4 Conclusion
The experiments described in this chapter illustrate how small molecule inhibitors have been used to design strategies to address fundamental biological problems. As our understanding of the biology advances, the use of small molecules should complement genetic and RNAi-based approaches. The advantages of small molecule inhibitors have been emphasized here, but there are also significant limitations that should be considered, particularly in comparison with genetic approaches. For example, genetics can be used to target any gene for mutation or deletion without direct effects on any other gene. Discovery of a new small molecule inhibitor, however, is challenging. Another limitation is the difficulty of demonstrating specificity of small molecule inhibitors. Taking a kinase inhibitor as an example, testing the effects on over 500 kinases in the human genome is a substantial undertaking. Using small molecules in focused assays is one way to address specificity, so that a narrowly defined biological process is examined and off-target effects are less likely to be relevant. In combination with this approach, several inhibitors that target the same protein can be compared. If the inhibitors are chemically unrelated, they are not expected to have similar off-target activities.
2.1.4.1 Future Directions
Only the availability of inhibitors and the assays that can be designed around them limit the future use of small molecule inhibitors to address biological questions. Currently, only a small fraction of the proteome can be targeted by small molecules. As new inhibitors are identified, small molecule-based strategies will be applicable to an increasing range of biological problems. The development of methods to monitor protein function with high temporal and spatial resolution, particularly in living cells, will also increase the scope for using small molecules. Recent advances in fluorescence-based probes, for example, have made it possible to monitor numerous properties of living cells, including membrane potential, pH, posttranslational modifications, protease activity, and mediators of intracellular signaling such as Ca2+ and cyclic adenosine monophosphate (cAMP) [75]. These high-resolution readouts, with the temporal control afforded by small molecule inhibitors, should be a powerful combination for examining biological mechanisms in living cells. Methods have also been developed to measure the enzymatic activities of single protein molecules in vitro. Investigating the effects of small molecule inhibitors, both at this level and in a more complex cellular context, should continue to provide insight into protein function.
References
1. G.J. Kops, B.A. Weaver, D.W. Cleveland, On the road to cancer: aneuploidy and the mitotic checkpoint, Nat. Rev. Cancer 2005, 5, 773-785.
2. S. Inoue, Polarization optical studies of the mitotic spindle. I. The demonstration of spindle fibers in living cells, Chromosoma 1953, 5, 487-500.
3. S. Inoue, The effect of colchicine on the microscopic and submicroscopic structure of the mitotic spindle, Exp. Cell Res. 1952, 2(Suppl.), 305.
4. S. Inoue, H. Sato, Cell motility by labile association of molecules. The nature of mitotic spindle fibers and their role in chromosome movement, J. Gen. Physiol. 1967, 50(Suppl.), 259-292.
5. S. Inoue, Organization and function of the mitotic spindle, in Primitive Motile Systems in Cell Biology, (Eds.: R.D. Allen, K. Kamiya), Academic Press, New York, 1964, 549-598.
6. E.W. Taylor, The mechanism of colchicine inhibition of mitosis. I. Kinetics of inhibition and the binding of H3-colchicine, J. Cell Biol. 1965, 25(Suppl.), 145-160.
7. G.G. Borisy, E.W. Taylor, The mechanism of action of colchicine. Binding of colchicine-3H to cellular protein, J. Cell Biol. 1967a, 34, 525-533.
8. G.G. Borisy, E.W. Taylor, The mechanism of action of colchicine. Colchicine binding to sea urchin eggs and the mitotic apparatus, J. Cell Biol. 1967b, 34, 535-548.
9. M.L. Shelanski, E.W. Taylor, Isolation of a protein subunit from microtubules, J. Cell Biol. 1967, 34, 549-554.
10. R.C. Weisenberg, G.G. Borisy, E.W. Taylor, The colchicine-binding protein of mammalian brain and its relation to microtubules, Biochemistry 1968, 7, 4466-4479.
11. H. Mohri, Amino-acid composition of “Tubulin” constituting microtubules of sperm flagella, Nature 1968, 217, 1053-1054.
12. P.B. Schiff, J. Fant, S.B. Horwitz, Promotion of microtubule assembly in vitro by taxol, Nature 1979, 277, 665-667.
13. P.B. Schiff, S.B. Horwitz, Taxol stabilizes microtubules in mouse fibroblast cells, Proc. Natl. Acad. Sci. U.S.A. 1980, 77, 1561-1565.
14. R.B. Vallee, A taxol-dependent procedure for the isolation of microtubules and microtubule-associated proteins (MAPs), J. Cell Biol. 1982, 92, 435-442.
15. R.D. Vale, T.S. Reese, M.P. Sheetz, Identification of a novel force-generating protein, kinesin, involved in microtubule-based motility, Cell 1985, 42, 39-50.
16. M.A. Jordan, L. Wilson, Microtubules as a target for anticancer drugs, Nat. Rev. Cancer 2004, 4, 253-265.
17. M.A. Jordan, K. Wendell, S. Gardiner, W.B. Derry, H. Copp, L. Wilson, Mitotic block induced in HeLa cells by low concentrations of paclitaxel (Taxol) results in abnormal mitotic exit and apoptotic cell death, Cancer Res. 1996, 56, 816-825.
18. C.L. Rieder, H. Maiato, Stuck in division or passing through: what happens when cells cannot satisfy the spindle assembly checkpoint, Dev. Cell 2004, 7, 637-651.
19. W. Tao, V.J. South, Y. Zhang, J.P. Davide, L. Farrell, N.E. Kohl, L. Sepp-Lorenzino, R.B. Lobell, Induction of apoptosis by an inhibitor of the mitotic kinesin KSP requires both activation of the spindle assembly checkpoint and mitotic slippage, Cancer Cell 2005, 8, 49-59.
20. B.A. Weaver, D.W. Cleveland, Decoding the links between mitosis, cancer, and chemotherapy: the mitotic checkpoint, adaptation, and cell death, Cancer Cell 2005, 8, 7-12.
21. S.B. Carter, Effects of cytochalasins on mammalian cells, Nature 1967, 213, 261-264.
22. N.K. Wessells, B.S. Spooner, J.F. Ash, M.O. Bradley, M.A. Luduena, E.L. Taylor, J.T. Wrenn, K. Yamada, Microfilaments in cellular and developmental processes, Science 1971, 171, 135-143.
23. J.A. Spudich, S. Lin, Cytochalasin B, its interaction with actin and actomyosin from muscle (cell movement-microfilaments-rabbit striated muscle), Proc. Natl. Acad. Sci. U.S.A. 1972, 69, 442-446.
24. M.J. Caterina, D. Julius, The vanilloid receptor: a molecular gateway to the pain pathway, Annu. Rev. Neurosci. 2001, 24, 487-517.
25. U. Oh, S.W. Hwang, D. Kim, Capsaicin activates a nonselective cation channel in cultured neonatal rat dorsal root ganglion neurons, J. Neurosci. 1996, 16, 1659-1667.
26. J.N. Wood, J. Winter, I.F. James, H.P. Rang, J. Yeats, S. Bevan, Capsaicin-induced ion fluxes in dorsal root ganglion cells in culture, J. Neurosci. 1988, 8, 3208-3220.
27. M.J. Caterina, M.A. Schumacher, M. Tominaga, T.A. Rosen, J.D. Levine, D. Julius, The capsaicin receptor: a heat-activated ion channel in the pain pathway, Nature 1997, 389, 816-824.
28. M. Tominaga, M.J. Caterina, A.B. Malmberg, T.A. Rosen, H. Gilbert, K. Skinner, B.E. Raumann, A.I. Basbaum, D. Julius, The cloned capsaicin receptor integrates multiple pain-producing stimuli, Neuron 1998, 21, 531-543.
29. M.J. Caterina, A. Leffler, A.B. Malmberg, W.J. Martin, J. Trafton, K.R. Petersen-Zeitz, M. Koltzenburg, A.I. Basbaum, D. Julius, Impaired nociception and pain sensation in mice lacking the capsaicin receptor, Science 2000, 288, 306-313.
30. J.B. Davis, J. Gray, M.J. Gunthorpe, J.P. Hatcher, P.T. Davey, P. Overend, M.H. Harries, J. Latcham, C. Clapham, K. Atkinson, S.A. Hughes, K. Rance, E. Grau, A.J. Harper, P.L. Pugh, D.C. Rogers, S. Bingham, A. Randall, S.A. Sheardown, Vanilloid receptor-1 is essential for inflammatory thermal hyperalgesia, Nature 2000, 405, 183-187.
31. H. Hensel, Y. Zotterman, The effect of menthol on the thermoreceptors, Acta Physiol. Scand. 1951, 24, 27-34.
32. D.D. McKemy, W.M. Neuhausser, D. Julius, Identification of a cold receptor reveals a general role for TRP channels in thermosensation, Nature 2002, 416, 52-58.
33. A.M. Peier, A. Moqrich, A.C. Hergarden, A.J. Reeve, D.A. Anderson, G.M. Story, T.J. Earley, I. Dragoni, P. McIntyre, S. Bevan, A. Patapoutian, A TRP channel that senses cold stimuli and menthol, Cell 2002, 108, 705-715.
34. S.E. Jordt, D.D. McKemy, D. Julius, Lessons from peppers and peppermint: the molecular logic of thermosensation, Curr. Opin. Neurobiol. 2003, 13, 487-492.
35. M.A. Hoyt, L. Totis, B.T. Roberts, S. cerevisiae genes required for cell cycle arrest in response to loss of microtubule function, Cell 1991, 66, 507-517.
36. R. Li, A.W. Murray, Feedback control of mitosis in budding yeast, Cell 1991, 66, 519-531.
37. S. Hanks, K. Coleman, S. Reid, A. Plaja, H. Firth, D. Fitzpatrick, A. Kidd, K. Mehes, R. Nash, N. Robin, N. Shannon, J. Tolmie, J. Swansbury, A. Irrthum, J. Douglas, N. Rahman, Constitutional aneuploidy and cancer predisposition caused by biallelic mutations in BUB1B, Nat. Genet. 2004, 36, 1159-1161.
38. R.H. Chen, J.C. Waters, E.D. Salmon, A.W. Murray, Association of spindle assembly checkpoint component XMAD2 with unattached kinetochores, Science 1996, 274, 242-246.
39. J.C. Waters, R.H. Chen, A.W. Murray, E.D. Salmon, Localization of Mad2 to kinetochores depends on microtubule attachment, not tension, J. Cell Biol. 1998, 141, 1181-1191.
40. T.U. Mayer, T.M. Kapoor, S.J. Haggarty, R.W. King, S.L. Schreiber, T.J. Mitchison, Small molecule inhibitor of mitotic spindle bipolarity identified in a phenotype-based screen, Science 1999, 286, 971-974.
41. J.C. Canman, L.A. Cameron, P.S. Maddox, A. Straight, J.S. Tirnauer, T.J. Mitchison, G. Fang, T.M. Kapoor, E.D. Salmon, Determining the position of the cell division plane, Nature 2003, 424, 1074-1078.
42. T. Sudo, M. Nitta, H. Saya, N.T. Ueno, Dependence of paclitaxel sensitivity on a functional spindle assembly checkpoint, Cancer Res. 2004, 64, 2502-2508.
43. A. Khodjakov, L. Copenagle, M.B. Gordon, D.A. Compton, T.M. Kapoor, Minus-end capture of preformed kinetochore fibers contributes to spindle morphogenesis, J. Cell Biol. 2003, 160, 671-683.
44. M.A. Lampson, K. Renduchitala, A. Khodjakov, T.M. Kapoor, Correcting improper chromosome-spindle attachments during cell division, Nat. Cell Biol. 2004, 6, 232-237.
45. S. Biggins, F.F. Severin, N. Bhalla, I. Sassoon, A.A. Hyman, A.W. Murray, The conserved protein kinase Ipl1 regulates microtubule binding to kinetochores in budding yeast, Genes Dev. 1999, 13, 532-544.
46. T.U. Tanaka, N. Rachidi, C. Janke, G. Pereira, M. Galova, E. Schiebel, M.J. Stark, K. Nasmyth, Evidence that the Ipl1-Sli15 (Aurora kinase-INCENP) complex promotes chromosome bi-orientation by altering kinetochore-spindle pole connections, Cell 2002, 108, 317-329.
47. E.A. Harrington, D. Bebbington, J. Moore, R.K. Rasmussen, A.O. Ajose-Adeogun, T. Nakayama, J.A. Graham, C. Demur, T. Hercend, A. Diu-Hercend, M. Su, J.M. Golec, K.M. Miller, VX-680, a potent and selective small-molecule inhibitor of the Aurora kinases, suppresses tumor growth in vivo, Nat. Med. 2004, 10, 262-267.
48. P. Meraldi, R. Honda, E.A. Nigg, Aurora kinases link chromosome segregation and cell division to cancer susceptibility, Curr. Opin. Genet. Dev. 2004, 14, 29-36.
49. T.M. Kapoor, T.U. Mayer, M.L. Coughlin, T.J. Mitchison, Probing spindle assembly mechanisms with monastrol, a small molecule inhibitor of the mitotic kinesin, Eg5, J. Cell Biol. 2000, 150, 975-988.
50. R.B. Nicklas, S.C. Ward, Elements of error correction in mitosis: microtubule capture, release, and tension, J. Cell Biol. 1994, 126, 1241-1253.
51. F.T. Wieland, M.L. Gleason, T.A. Serafini, J.E. Rothman, The rate of bulk flow from the endoplasmic reticulum to the cell surface, Cell 1987, 50, 289-300.
52. T. Fujiwara, K. Oda, S. Yokota, A. Takatsuki, Y. Ikehara, Brefeldin A causes disassembly of the Golgi complex and accumulation of secretory proteins in the endoplasmic reticulum, J. Biol. Chem. 1988, 263, 18545-18552.
53. Y. Misumi, K. Miki, A. Takatsuki, G. Tamura, Y. Ikehara, Novel blockade by brefeldin A of intracellular transport of secretory proteins in cultured rat hepatocytes, J. Biol. Chem. 1986, 261, 11398-11403.
54. R.W. Doms, G. Russ, J.W. Yewdell, Brefeldin A redistributes resident and itinerant Golgi proteins to the endoplasmic reticulum, J. Cell Biol. 1989, 109, 61-72.
55. J. Lippincott-Schwartz, L.C. Yuan, J.S. Bonifacino, R.D. Klausner, Rapid redistribution of Golgi proteins into the ER in cells treated with brefeldin A: evidence for membrane cycling from Golgi to ER, Cell 1989, 56, 801-813.
56. J.G. Donaldson, J. Lippincott-Schwartz, G.S. Bloom, T.E. Kreis, R.D. Klausner, Dissociation of a 110-kD peripheral membrane protein from the Golgi apparatus is an early event in brefeldin A action, J. Cell Biol. 1990, 111, 2295-2306.
57. R. Duden, G. Griffiths, R. Frank, P. Argos, T.E. Kreis, Beta-COP, a 110 kD protein associated with non-clathrin-coated vesicles and the Golgi complex, shows homology to beta-adaptin, Cell 1991, 64, 649-665.
58. T. Serafini, G. Stenbeck, A. Brecht, F. Lottspeich, L. Orci, J.E. Rothman, F.T. Wieland, A coat subunit of Golgi-derived non-clathrin-coated vesicles with homology to the clathrin-coated vesicle coat protein beta-adaptin, Nature 1991b, 349, 215-220.
59. L. Orci, M. Tagaya, M. Amherdt, A. Perrelet, J.G. Donaldson, J. Lippincott-Schwartz, R.D. Klausner, J.E. Rothman, Brefeldin A, a drug that blocks secretion, prevents the assembly of non-clathrin-coated buds on Golgi cisternae, Cell 1991, 64, 1183-1195.
60. J.G. Donaldson, J. Lippincott-Schwartz, R.D. Klausner, Guanine nucleotides modulate the effects of brefeldin A in semipermeable cells: regulation of the association of a 110-kD peripheral membrane protein with the Golgi apparatus, J. Cell Biol. 1991b, 112, 579-588.
61. T. Stearns, M.C. Willingham, D. Botstein, R.A. Kahn, ADP-ribosylation factor is functionally and physically associated with the Golgi complex, Proc. Natl. Acad. Sci. U.S.A. 1990, 87, 1238-1242.
62. J.G. Donaldson, R.A. Kahn, J. Lippincott-Schwartz, R.D. Klausner, Binding of ARF and beta-COP to Golgi membranes: possible regulation by a trimeric G protein, Science 1991a, 254, 1197-1199.
63. T. Serafini, L. Orci, M. Amherdt, M. Brunner, R.A. Kahn, J.E. Rothman, ADP-ribosylation factor is a subunit of the coat of Golgi-derived COP-coated vesicles: a novel role for a GTP-binding protein, Cell 1991a, 67, 239-253.
64. J.G. Donaldson, D. Cassel, R.A. Kahn, R.D. Klausner, ADP-ribosylation factor, a small GTP-binding protein, is required for binding of the coatomer protein beta-COP to Golgi membranes, Proc. Natl. Acad. Sci. U.S.A. 1992a, 89, 6408-6412.
65. J.G. Donaldson, D. Finazzi, R.D. Klausner, Brefeldin A inhibits Golgi membrane-catalysed exchange of guanine nucleotide onto ARF protein, Nature 1992b, 360, 350-352.
66. J.B. Helms, J.E. Rothman, Inhibition by brefeldin A of a Golgi membrane enzyme that catalyses exchange of guanine nucleotide bound to ARF, Nature 1992, 360, 352-354.
67. J.E. Rothman, The protein machinery of vesicle budding and fusion, Protein Sci. 1996, 5, 185-194.
68. C.L. Jackson, Brefeldin A revealing the fundamental principles governing membrane dynamics and protein transport, Subcell. Biochem. 2000, 34, 233-272.
69. T.R. Cech, A.J. Zaug, P.J. Grabowski, In vitro splicing of the ribosomal RNA precursor of Tetrahymena: involvement of a guanosine nucleotide in the excision of the intervening sequence, Cell 1981, 27, 487-496.
70. C. Guerrier-Takada, K. Gardiner, T. Marsh, N. Pace, S. Altman, The RNA moiety of ribonuclease P is the catalytic subunit of the enzyme, Cell 1983, 35, 849-857.
71. R.E. Monro, K.A. Marcker, Ribosome-catalysed reaction of puromycin with a formylmethionine-containing oligonucleotide, J. Mol. Biol. 1967, 25, 347-350.
72. H.F. Noller, V. Hoffarth, L. Zimniak, Unusual resistance of peptidyl transferase to protein extraction procedures, Science 1992, 256, 1416-1419.
73. M. Welch, J. Chastang, M. Yarus, An inhibitor of ribosomal peptidyl transferase using transition-state analogy, Biochemistry 1995, 34, 385-390.
74. P. Nissen, J. Hansen, N. Ban, P.B. Moore, T.A. Steitz, The structural basis of ribosome activity in peptide bond synthesis, Science 2000, 289, 920-930.
75. J. Zhang, R.E. Campbell, A.Y. Ting, R.Y. Tsien, Creating new fluorescent probes for cell biology, Nat. Rev. Mol. Cell Biol. 2002, 3, 906-918.
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
2.2 Using Natural Products to Unravel Cell Biology
Jonathan D. Gough and Craig M. Crews
Outlook
In recent years, a new discipline has emerged from the interface of chemistry and biology, known as chemical biology. The unique foundation of this field is the examination of biological questions through the use of chemical probes. An example of chemical genetics is the use of biologically active natural products as “inducible alleles” for elucidating protein function. In this chapter, we discuss a variety of different natural products and their use in understanding cell biology.
2.2.1 Introduction
With the sequencing of the human genome, biological research has advanced rapidly. The use of genetic knockouts, RNA interference, and site-directed mutagenesis to understand the roles of genes and gene products is now becoming commonplace. Fundamentally, these methods perturb protein expression at the genetic or transcriptional level. Although these new tools have significantly improved our understanding of molecular, cellular, and developmental biology, many questions still remain intractable. Through the use of chemical genetics, biologically active compounds are now being used as another means to address difficult biological questions. Small molecules offer a significant advantage over classical genetic techniques in that they can serve as “conditional alleles”. For example, a small molecule that targets a specific protein can be used to “knock out” or inhibit that protein only at a certain point during the cell cycle or during an organism’s developmental process. Used in this temporal manner to induce or inhibit a specific biological response, small molecules provide a method to selectively investigate cell-signaling events within a narrow temporal window. In this way, chemical genetics has provided the means to answer biological questions that are difficult to study with standard genetic methods.
2.2.2 Historical Development
Evolution has taught us that biological systems find or create ways to adapt to exogenous forces or stressors. Natural products are often the result of this survival mechanism. These often highly potent small molecules encompass a diverse array of structural variation and biological activities. Historically, isolated compounds and extracts have been utilized as herbal remedies or drugs. Initially, pharmaceutical companies utilized natural products as a source of, or lead toward, new drug candidates. Although most of these compounds lack the potential for use as drugs, biologists in recent years have found that natural products are useful for perturbing model cell systems. As a class of compounds, they offer a unique starting point for investigating biological systems. Because they are created in a living system, they are often cell permeable and have specific biological targets. Using structure-activity relationships, via the analysis of analogs, natural products provide a starting point for the development of new synthetic biological probes and insight into their mechanism of action.
2.2.3 General Considerations
It is doubtful that Aspergillus fumigatus evolved to produce the potent antiangiogenic natural product fumagillin as a means to inhibit endothelial cell growth. Nevertheless, secondary metabolites from many natural sources have unexpected biological activities and have proved useful as cellular probes or even as drug candidates. While many biologically active natural products are isolated each year, not all have the potential to be effective biological tools. Natural products are often isolated based on relatively simple bioassays such as cell growth inhibition. Those compounds that block cell growth in a nonselective manner (i.e., DNA intercalation, ionophore activity, electron-transport disruption) offer little in the ability to control specific intracellular signaling processes. Thus, those natural products that most likely serve as ligands for enzymes offer the most potential use as chemical genetic probes.
2.2.4 Applications and Practical Examples
2.2.4.1 HDAC Inhibitors: Histone Deacetylase Inhibitors
The posttranslational modification of histones provides a code for the correct regulation of gene expression by affecting chromatin structure and interaction with regulatory factors. Modifications include acetylation, deacetylation, phosphorylation, methylation, and ubiquitination [1]. Histone acetyltransferases (HATs) serve to activate gene transcription by acetylating the ε-amine of lysine residues of histone tails. Conversely, histone deacetylases (HDACs) deacetylate these lysine residues, resulting in chromatin condensation and subsequent transcriptional silencing [2]. Since the discovery of the first HDAC inhibitors, trichostatin A (TSA) 1 and trapoxin (TPX) 2, in the 1990s [3], these and other similar inhibitors have provided insight into a diverse array of cell-signaling events: cell cycle arrest, apoptosis, cell differentiation, angiogenesis, and metastasis inhibition. The general mechanism of action for many of these natural products entails an aliphatic chain with a metal-chelating moiety that interferes with zinc coordination in the binding pocket of their targeted HDACs.
2.2.4.1.1 Trichostatin A
The antifungal natural product TSA, originally isolated from a Streptomyces, was found to have reversible biological activity at low nanomolar concentrations. Yoshida and coworkers [4] demonstrated that TSA causes the induction of Friend leukemia cell differentiation as well as inhibition of the cell cycle of normal rat fibroblasts in the G1 and G2 phases. This initial work revealed that at low nanomolar concentrations, TSA induces the accumulation of acetylated histones by inhibiting HDAC activity within the cell. TSA has also been shown to induce apoptosis in various tumor cell lines [5], thereby making HDACs possible targets for cancer treatment. By blocking HDACs, inhibitors such as TSA affect the level of gene transcription, causing both the up- and downregulation of many genes (~2% of the genome) [6]. For example, TSA was found to reduce the expression of cyclin B1, a key cyclin for G2-M transition, but also stimulated expression of p21CIP/WAF, an inhibitor of cyclin-dependent kinase (CDK) and Cdc2. Through TSA-mediated HDAC inhibition, the G2-M transition is blocked because of altered transcription of the cell cycle regulators p21CIP/WAF and cyclin B1. This occurs via the modulation of histone acetylation at these gene promoters [7].
In addition, TSA has proved useful in the elucidation of important nuances of cell differentiation. Cell cycle inhibitors had shown that inhibition of proliferation was necessary, but not sufficient, for the differentiation of neuronal precursor cells into oligodendrocytes [8]. Given the significant level of chromatin remodeling that accompanies cellular differentiation, Marin-Husstege and colleagues [9] hypothesized that histone acetylation plays a role in oligodendrocyte differentiation. Using synchronized primary neonatal rat cortical progenitors that were induced to differentiate into oligodendrocytes, the authors showed that there is a temporal window during which histone deacetylation is correlated with the acquisition of a branched morphology and myelin gene expression. TSA-treated progenitors were able to exit from the cell cycle but did not progress into oligodendrocytes. The ability of HDAC inhibitors to inhibit oligodendrocyte differentiation is cell lineage dependent, as TSA did not affect the precursor cells' ability to differentiate into astrocytes. These results suggest that transcriptional repression is a crucial event during oligodendrocyte lineage progression.
2.2.4.1.2 Trapoxin
The irreversible HDAC inhibitor TPX was first isolated as a fungal metabolite that induced morphological reversion of v-sis-transformed NIH 3T3 cells [10]. Using the known structure-activity relationship between other HDAC inhibitors as a guide, a TPX affinity reagent was synthesized and used to identify its target protein as an HDAC [11]. TPX was used to elucidate the protein interactions necessary for HDAC-mediated transcriptional repression via the Mad:Max ternary complex [12]. Previous studies had suggested that Mad:Max transcriptional repression was mediated by ternary complex formation with another unknown protein. Biochemical experiments identified the proteins mSin3A or B as the primary candidates responsible for this negative transcriptional function. Coexpression of activated or inactivated MAD (a DNA-binding transcription factor) in the presence of TPX demonstrated that HDAC activity was necessary for ternary complex formation. Additionally, these and other experiments showed that the Mad:Max heterocomplexes repress transcription in a mSin3A-associated HDAC-dependent manner.
2.2.4.1.3 Apicidin and Depudecin
Like TPX, the microbially derived HDAC inhibitor depudecin 4 was also isolated based on its ability to reverse the transformed cellular phenotype of tumor cells. This diepoxide-containing natural product induced a flat phenotype in Ki-ras-transformed NIH 3T3 cells and was further characterized as an HDAC inhibitor by its ability to induce the accumulation of acetylated histones [13]. Apicidin (APC) 3, a cyclic tetrapeptide HDAC inhibitor with structural similarity to TPX, was shown to possess potent antiproliferative activity against various cancer cell lines [14], and like depudecin, displays potent in vitro and in vivo antiangiogenic activities [15, 16]. Thus, given the ability of HDAC inhibitors to arrest cell proliferation and reverse tumor cell morphology, HDAC inhibitors have generated much attention as a new class of antitumor drugs.
2.2.4.2 Cyclin-dependent Kinase Inhibitors
Cyclin-dependent kinases (CDKs) play key roles in regulating cell cycle progression. Throughout the cell cycle, different CDKs are activated and are directly responsible for driving the cell from one phase to the next. Individual CDK activity is regulated by a number of cellular processes: cyclin association, association with cyclin-dependent inhibitors (CDI), CDK synthesis, proteolysis, and various posttranslational modifications. Progression through the cell cycle is controlled by the concentrations of different cyclin proteins. Thus, cyclin degradation results in the loss of activity from its CDK partner, leading to the arrest of the cell cycle. The regulation of cell cycle progression is important for the cells’ ability to deal with external stresses. Therefore, CDKs serve a checkpoint function, in that cellular stress can block entry into the next phase of the cell cycle through the expression of a member of one of the three major CDI families, which include p21CIP/WAF and p16INK4a [17].
The idea of targeting CDKs represents a completely different strategy for treating tumor cells: finding small molecules that inhibit specific molecular targets as opposed to drugs that just kill tumor cells. Functionally, all CDK inhibitors act by competitive inhibition of ATP binding to a CDK. Whereas disruption of the CDK-cyclin interaction is an attractive therapeutic strategy given its requirement for CDK activity, the large protein-protein-binding surface of this interaction makes it a less-than-ideal target relative to the small, well-defined ATP-binding pocket of CDK. Accordingly, several antiproliferative natural products target the ATP-binding site on CDKs.
2.2.4.2.1 Purine Analogs
The natural products olomoucine 7 and roscovitine 8 are relatively selective kinase inhibitors that bind CDK1, 2, and 5 but have little effect on CDK4 and 6 [18]. These selective CDK inhibitory profiles result in cell cycle arrest in the G1 and G2 phases. Both inhibitors act in a dose-dependent and reversible manner, thus allowing temporal control of CDK activity at different stages of the cell cycle. CDK inhibition by these potent natural products results in four major cellular consequences: (a) inhibition of cell proliferation; (b) induction of apoptosis in mitotic cells; (c) induction of cellular differentiation; and (d) protection from apoptosis. Several studies have shown that purine derivatives arrest cells in either G1 or G2 [19-21], primarily due to CDK2 and CDK1 inhibition; however, an effect on Erk1/2 activity has also been demonstrated [22]. CDK purine derivative inhibitors also induce apoptosis in mitotic cells when combined with other drug treatments. For example, roscovitine and olomoucine were found to synergize with a farnesyltransferase inhibitor [23] to induce apoptosis of human cancer cell lines. In addition, the combination of the microtubule-stabilizing drug Taxol® with the CDK1 inhibitor purvalanol A 9 results in HeLa cell apoptosis [24]. Treatment with either Taxol® or purvalanol A alone, or with the two in the reverse order, was ineffectual, demonstrating an ordered cooperativity between the two drugs. Likewise, the induction of differentiation in murine erythroleukemia cells is triggered by the combined sequential inhibition of CDK2 (with roscovitine) and CDK6 (via p16INK4a), while the reverse sequence of inhibition was ineffective [20, 25, 26]. Finally, purine analog inhibitors of CDKs (5-10) can protect cells from apoptosis via a mechanism yet undefined. Examples of this phenomenon include the prevention of cAMP-induced apoptosis in rat leukemia cells [27], etoposide-induced apoptosis in rat fibroblasts [28], and cell death in human immunodeficiency virus (HIV)-induced syncytia [29].
2.2.4.2.2 Flavopiridol
Flavopiridol (FLV) 11 is a semisynthetic flavonoid derived from rohitukine, a natural product from a plant indigenous to India [30]. FLV can induce cell cycle arrest by three mechanisms: (a) direct inhibition of CDK via binding in the ATP-binding site; (b) inhibition of CDK7/cyclin H, consequently leading to loss of CDK activation [31]; and (c) decreased levels of cyclin D1, an oncogene that is overexpressed in many human neoplasias [32]. Initial studies revealed that FLV arrested cells in G1 or G2 due to CDK1 and CDK2 inhibition [33-35]. In vitro studies, however, revealed that FLV inhibits all CDKs thus far examined (IC50 100-300 nM) [35-37].
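Because these inhibitors compete with ATP, a measured IC50 depends on the ATP concentration used in the assay. The sketch below illustrates this with the standard Michaelis-Menten rate law for a purely competitive inhibitor and the Cheng-Prusoff relation. All numerical values (the inhibition constant, the Km for ATP, and the ATP concentrations) are hypothetical and chosen only for illustration; they are not measured values for FLV or any other compound discussed here.

```python
def fractional_activity(inhibitor_nM, atp_uM, km_atp_uM, ki_nM):
    """Rate with inhibitor divided by rate without inhibitor, assuming
    purely competitive inhibition with respect to ATP."""
    uninhibited = atp_uM / (km_atp_uM + atp_uM)
    inhibited = atp_uM / (km_atp_uM * (1.0 + inhibitor_nM / ki_nM) + atp_uM)
    return inhibited / uninhibited


def apparent_ic50_nM(atp_uM, km_atp_uM, ki_nM):
    """Cheng-Prusoff relation for a competitive inhibitor:
    IC50 = Ki * (1 + [ATP] / Km)."""
    return ki_nM * (1.0 + atp_uM / km_atp_uM)


# Hypothetical parameters, for illustration only.
KI_NM = 40.0        # assumed inhibition constant
KM_ATP_UM = 25.0    # assumed Km of the kinase for ATP

for atp_uM in (25.0, 1000.0):  # an assay-like and a cell-like ATP level (assumed)
    ic50 = apparent_ic50_nM(atp_uM, KM_ATP_UM, KI_NM)
    half = fractional_activity(ic50, atp_uM, KM_ATP_UM, KI_NM)
    print(f"[ATP] = {atp_uM:g} uM: apparent IC50 = {ic50:g} nM, "
          f"activity at that dose = {half:.2f}")
```

Under these assumptions, the same compound appears less potent at high ATP concentrations, which is one reason why IC50 values obtained in vitro need not translate directly into the concentrations required in cells.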
2.2.4.3
Proteasome Inhibitors
Cell homeostasis and proliferation depend on both protein synthesis and protein degradation. The proteasome serves as the primary regulator of intracellular proteolysis. Specifically, the proteasome is a 700 kDa, multicatalytic protease complex composed of two 19S regulatory particles flanking the 20S proteolytic cylinder [38], itself consisting of 28 subunits organized into four rings. The proteasome has three major classes of protease activity: (a) trypsin-like activity; (b) chymotrypsin-like activity; and (c) peptidylglutamyl peptide hydrolyzing (PGPH) or caspase-like activity. Each protease function appears to act independently, and together they degrade most proteins into peptides of six to eight amino acids. Proteins are targeted for proteolysis via conjugation to the 76-amino-acid polypeptide ubiquitin (Ub), catalyzed by a multistep process involving a series of enzymes that activate the Ub monomer (E1), recognize the protein targeted for degradation (E3), and transfer Ub monomers to lysine residues on the targeted protein (E2). The proteasome has been implicated as a key player in a number of important cellular processes, including apoptosis, cell differentiation, MHC class I antigen presentation, NF-κB activation, tumor suppression, and cell division. In particular, the prominent role that the proteasome plays in cellular proliferation has drawn much attention to the use of proteasome inhibitors as antitumor chemotherapeutic agents. As more and more cellular functions are linked to the proteasome, proteasome inhibitors will become increasingly important in the investigation of various signaling interactions.
2.2.4.3.1
Lactacystin
Originally characterized as a microbial metabolite that induced neurite outgrowth in neuroblastoma cells [39, 40], lactacystin 14 was later found to be a potent inhibitor of cell proliferation [41]. Using a [3H]lactacystin analog, Fenteany and coworkers [39] demonstrated that lactacystin and its related clasto-β-lactone covalently bind the N-terminal threonine of the 20S proteasome subunit. Functionally, lactacystin is a relatively nonspecific protease inhibitor, also showing significant inhibition of peptidyl peptidase II and cathepsin A [40]. Despite this cross-inhibitory activity, lactacystin has been used to investigate the role of the Ub-proteasome pathway in a diverse array of systems and fields, such as Alzheimer’s disease, breast cancer, neurobiology, kidney research, and nephrology, to name a few [41-46].
2.2.4.3.2
α,β-Epoxyketones
Selective covalent inhibitors of the proteasome have also been developed. Epoxomicin and eponemycin are members of the α,β-epoxyketone class of proteasome inhibitors that were isolated from actinomycete strains and found to exhibit in vivo antitumor activity against B16 melanoma [47, 48]. Early structure-activity studies and structural motifs present in similar molecules suggested that the terminal epoxyketone moiety was an important part of the functional pharmacophore, possibly acting via covalent modification of its target protein. Through synthetic chemistry and biochemical affinity techniques, the natural products and corresponding biotinylated affinity reagents have been used to identify the 20S proteasome as the molecular target of epoxomicin 12 and eponemycin 13 [38, 49]. X-ray crystallographic analysis demonstrated that the epoxyketone pharmacophore of epoxomicin forms a covalent adduct, a morpholino ring [50], with the amino-terminal threonine of the 20S proteasome. Epoxomicin draws its specificity from the uniqueness of the proteasomal N-terminal threonine; nonproteasomal proteases lack an N-terminal nucleophilic residue and thus cannot form a stable covalent morpholino adduct with the epoxomicin epoxyketone pharmacophore [50]. These potent and specific proteasome inhibitors have been used to answer questions in a number of biological fields and systems. For example, proteasome inhibitors have been used to investigate inflammation, cancer biology, and neuroscience. In immune research, chemokines and their receptors play an important role in host immune surveillance and are important mediators of HIV pathogenesis and the inflammatory response. Chemokines and their receptors have also been implicated in hematopoiesis, angiogenesis, embryonic development, and breast cancer metastasis. Specifically, they play important roles in immune and inflammatory responses by regulating the directional migration and activation of leukocytes. The chemokine receptors CXCR4 and CCR5 have been shown to act as coreceptors for the entry and infection of HIV-1 and HIV-2. The proteasome inhibitors lactacystin and epoxomicin have been used to show that downmodulation mechanisms and chemotaxis mediated by CCR5 and CXCR4 are dependent upon proteasome activity [51].
2.2.4.3.3
TMC-95A
Recently, more selective noncovalent inhibitors of the proteasome have been developed. TMC-95A 15 is a potent, reversible, and selective inhibitor of the chymotrypsin-like, trypsin-like, and caspase-like activities of the 20S proteasome. By comparison, TMC-95A shows no inhibition of calpain, cathepsin, or trypsin. This selectivity has led to a great deal of current biological interest in TMC-95A [50, 52, 53], including X-ray crystallographic analysis showing that TMC-95A does not covalently bind the yeast proteasome [54].
2.2.4.4
ATPase Inhibitors
Vacuolar ATPases (V-ATPases) are a class of enzymes found throughout eukaryotes. Fundamentally, these multisubunit complexes function as proton pumps, moving hydrogen ions from one side of a membrane to the other. In so doing, they alter the pH of the distal compartment. Typically, V-ATPases perform this function on the membrane of cellular vacuoles and depend on ATP for the energy required. Structurally, eukaryotic V-ATPases are composed of 13 different polypeptides, which are organized into two functional domains: Vo is the transmembrane ion channel domain and V1 is the ATPase or ATP-binding domain. Small molecule V-ATPase inhibitors are thought to function primarily by binding to and inhibiting the Vo domain. In recent years, V-ATPases have become important drug targets because their inhibition leads to highly specific cytotoxic effects [55].
2.2.4.4.1
Bafilomycins and Concanamycins
A series of macrolides, the bafilomycins 17 and concanamycins 16, were isolated in a screen for secondary microbial metabolites having effects similar to those of the cardiac glycosides ouabain and digitoxin [56]. Their V-ATPase inhibitory effects were not recognized until Bowman and colleagues discovered that bafilomycins inhibit H+ V-ATPases at nanomolar concentrations [57]. Until then these compounds had been shown to exhibit a wide range of biological activities: in vitro inhibition of P-ATPase, antihelminthic activity against Caenorhabditis elegans, stimulation of γ-aminobutyric acid release from rat brain synaptosomes, selective antifungal activity, and inhibition of concanavalin-A-stimulated T-cell proliferation.
From a functional standpoint, V-ATPases act as regulators of organelle pH by pumping protons from the cytoplasm into the lumen. Inhibition of this regulatory effect results in cytotoxicity. However, because these compounds bind reversibly, they can be used to perturb a given system for the purpose of understanding the effect of pH change on other cellular functions or protein interactions. In addition, as they are reversible, recovery from drug treatment can also be observed. Examples include inhibition of acidification in pinocytic vesicles, inhibition of lysosomal acidification and degradation of epidermal growth factor (EGF) in mammalian cells [55].
2.2.4.5
Angiogenesis Inhibitors
Angiogenesis is the formation of new blood vessels from preexisting blood vessels and is required for wound healing and reproduction. In addition to these homeostatic roles, the formation of new blood vessels has been found to be required for the metastasis and growth of tumors. Since Judah Folkman [58] proposed the link between angiogenesis and tumor growth/metastasis, much effort has focused on the identification and development of antiangiogenic small molecules as antitumor chemotherapeutic agents. Angiogenesis is closely regulated through the complex interactions of endogenous factors that promote and inhibit the process. In general, angiogenesis proceeds through three steps [59, 60]: degradation of the basement membrane, invasion or migration of cells through the degraded matrix, and differentiation into mature blood vessels. For endothelial cell proliferation to occur, the existing blood vessel cells must degrade the underlying basement membrane and invade the stroma of the neighboring tissue. Once the barrier has been broken, cells proliferate and migrate into the underlying tissue. The cells differentiate and form capillary loops. Subsequently, cell polarity is established and the formation of the lumen begins. Small molecules that interrupt the various phases of angiogenesis have provided insight into the signaling events that regulate the processes involved.
2.2.4.5.1
Curcuminoids
Curcuminoids, a group of natural products originally isolated from the Indian spice turmeric, have long been known to be potent antioxidant and anti-inflammatory agents. Curcuminoids reduce tissue factor (TF) gene expression through inhibition of the AP-1 and NF-κB transcription factors and thus block the initiation of angiogenesis [61, 62].
2.2.4.5.2
Fumagillin and TNP-470
Fifteen years ago, an astute observation made during the routine culturing of endothelial cells led to the identification of a new antiangiogenic natural product. The natural product fumagillin 18 was isolated from a contaminating Aspergillus fumigatus Fresenius colony in the Folkman laboratory. Subsequent derivatization of the parent natural product by Takeda Pharmaceuticals yielded the drug candidate TNP-470 19, which was 50 times more potent than fumagillin [63]. Using the structure-activity relationship as a guide, a biotinylated affinity reagent was synthesized and used to identify methionine aminopeptidase 2 (MetAP-2) as the molecular target of fumagillin and TNP-470 [64]. X-ray crystal structures of the free and the fumagillin-bound MetAP-2 revealed the mechanism of action of this potent natural product: a covalent bond between the reactive spirocyclic epoxide of fumagillin and histidine-231 of MetAP-2 blocks the active site. Endothelial cells, unlike fibroblasts, display an impressive sensitivity to fumagillin and TNP-470. At the molecular level, TNP-470 does not inhibit early G1 mitogenic events such as cellular protein tyrosyl phosphorylation or the expression of immediate early genes [65]. However, TNP-470 was found to induce expression of the CDK inhibitor p21CIP/WAF and of p53 in endothelial cells [66]. Moreover, both p21CIP/WAF and p53 were shown to be essential for the endothelial cell cycle G1 arrest induced by TNP-470, and lack of p21CIP/WAF abrogates the inhibitory activity of TNP-470 on corneal angiogenesis in vivo. Thus it was shown that these antiangiogenic compounds induce G1 cell cycle arrest through p21CIP/WAF induction.
2.2.4.6
Immunosuppressant Natural Products
Using the immunosuppressive natural products cyclosporin A (CsA) 20, rapamycin 22, and FK 506 21, researchers were able to unravel two key signal transduction pathways in T lymphocytes (T cells). T cells respond to an immune stimulus through the binding of an antigen-presenting cell to the T-cell receptor (TCR). Binding subsequently initiates a cascade of intracellular signaling events leading to activation and proliferation of the T cells and other cell types required for an immune response. Importantly, this process induces the transcription, and thereby production, of a range of effector molecules such as interleukin 2 (IL-2); IL-2 is secreted and binds to IL-2 receptors on various cells, including T lymphocytes, and stimulates the cells to progress from G1 to the S phase of the cell cycle. This sequential chain of events drives the immune response. Immunosuppressive natural products have proved useful in the elucidation of several immune cell signal transduction pathways through the identification of specific target proteins.
2.2.4.6.1
Cyclosporin A and FK 506
CsA is a cyclic undecapeptide that was isolated from the fungi Cylindrocarpon lucidum Booth and Tolypocladium inflatum Gams in 1970 by the Sandoz Laboratory. Interestingly, CsA has both high potency and selectivity for inhibition of T-cell activation, with low cytotoxicity. The structurally unrelated polyketide metabolite FK 506, isolated in 1984 by the Fujisawa Pharmaceutical Company from the actinomycete Streptomyces tsukubaensis 9996, proved to have 100 times greater immunosuppressive activity than CsA. Although the two natural products are structurally different and have different potencies, they exhibit the same phenotypic biological activity; both compounds prevent the progression from G0 to G1 during T-cell activation. CsA and FK 506 have proved to be critical tools in elucidating the signaling events downstream of the TCR. Both were found to block the same step in Ca2+-dependent signaling pathways. Additionally, these natural products were found to bind to peptidyl-prolyl cis-trans isomerases, collectively known as immunophilins. CsA binds cyclophilin [67] and FK 506 binds FKBP 12 [68]. Although it appeared that both natural products functioned through the same mechanism of calcium-dependent gene expression, oddly neither target protein alone initiated the release of calcium. For cell cycle inhibition, both the small molecule and the protein must be present. Using affinity chromatography with immobilized protein-natural product complexes, the phosphatase calcineurin was identified as the target of both protein-drug complexes [69]. In vivo, the protein-ligand pairs form immunosuppressive complexes that inhibit the calcium-dependent phosphatase activity of calcineurin. The T-cell specific transcription factor NFAT is held in the cytosol through the presence of an inhibitory phosphorylated residue. Upon TCR-mediated calcium release, calcineurin dephosphorylates NFAT, which then translocates to the nucleus. CsA and FK 506 have proved useful in identifying this pharmaceutically vulnerable step in immune cell signaling [70].
2.2.4.6.2
Rapamycin
The microbial immunosuppressive agent rapamycin was isolated from Streptomyces hygroscopicus, originally found in a soil sample from Rapa Nui (Easter Island), in 1975. Although structurally similar to FK 506, rapamycin demonstrated markedly different activity. Rapamycin does not affect the progression from G0 to G1, but rather blocks T-cell progression from G1 to S phase. As FK 506 and rapamycin share structural similarities, it was not surprising that rapamycin also bound FKBP 12. However, binding studies revealed that the FKBP 12-rapamycin complex does not target calcineurin, as the FK 506-FKBP 12 complex does. Rather, using the FKBP 12-rapamycin complex as an affinity reagent, the lipid kinases target of rapamycin 1 and 2 (TOR1 and TOR2) were identified [71]; these proteins possess homology to the mammalian phosphatidylinositol-3-kinases, which are involved in the regulation of cell cycle progression in stimulated cells. Studies have shown that growth factor addition to cells leads to TOR activation and a subsequent increase in p70 S6 kinase activity [72].
2.2.4.7
Other Examples of Biologically Active Natural Products
2.2.4.7.1
Capsaicin
Some of the most frequently used spices throughout the world are hot peppers of the Capsicum family, of which capsaicin 23 is the major pungent ingredient. Because of its analgesic and anti-inflammatory activities, topical application of capsaicin has been used for the treatment of a variety of neuropathic pain conditions. Autoradiographic visualization of a tritiated resiniferatoxin probe in tissues of various species identified the vanilloid receptor (VR) as a molecular target [73, 74]. Additionally, capsaicin was used as a molecular probe to isolate the first nociceptive receptor, VR1 [75]. Characterization of VR1 revealed it to be a member of the Transient Receptor Potential (TRP) ion channel family and a nonselective cation channel activated by capsaicin or elevated temperatures.
2.2.4.7.2
Parthenolide
Parthenolide 24 is the biologically active natural product in the medicinal herb feverfew, which has been used for 2000 years to treat fevers, headaches, and inflammation [76]. Initial studies of the anti-inflammatory activity of parthenolide showed that it was a potent inhibitor of NF-κB nuclear translocation as well as IκB phosphorylation. Affinity chromatography experiments with a biotinylated analog of parthenolide revealed that parthenolide formed a covalent adduct with IκB kinase β (IKKβ) in a specific and dose-dependent manner [77]. This specific interaction between IKKβ and parthenolide was confirmed by mass spectrometric analysis. Parthenolide was shown to form a covalent adduct with Cys179 of IKKβ, which lies between the two phosphorylated serines in the kinase activation loop. Moreover, constitutively activated protein with a Cys179Ala point mutation was found to be insensitive to 40 µM parthenolide, indicating that parthenolide inhibits IKKβ via Michael addition by Cys179 in the kinase activation loop [77].
2.2.5 Future Development
Mechanism-of-action studies of biologically active natural products have profited greatly from the emerging field of chemical biology, as chemists and biologists have worked more closely together over the last 15 years. Moreover, these natural products will continue to be of great use as drug development leads in addition to their use as tools for understanding intracellular processes.
2.2.6 Conclusions
After a decade out of favor, both natural products and cell-based bioassay screening are enjoying a renaissance in the pharmaceutical industry. Natural products still offer an impressive range of chemical diversity and have a long track record of providing scaffolds for successful drugs. A greater appreciation of their potential for the identification of novel hit structures is propelling a new interest in the use of natural product screens in the pharmaceutical industry. Likewise, cell-based bioassays are regaining some of their previous acceptance in the drug development process, primarily because of the success of novel target deconvolution strategies. New proteomic technologies are largely behind the belief that the pharmaceutical industry has the ability to identify the targets of compounds identified in cell-based assays. Obviously, not all biologically active compounds identified in these screens will be developed into therapeutic agents. However, this renewed interest in both natural products and cell-based assays will, in turn, offer many new opportunities for the development of novel cell biological probes, using the fruits of these screens.
Acknowledgments
The authors would like to acknowledge the financial support of the NIH (grant GMG21G0).
References
1. A.U. Khan, S. Krishnamurthy, Histone modifications as key regulators of transcription, Front. Biosci. 2005, 10, 866-872.
2. M. Grunstein, Histone acetylation in chromatin structure and transcription, Nature 1997, 389, 349-352.
3. M. Yoshida, S. Horinouchi, T. Beppu, Trichostatin A and trapoxin: novel chemical probes for the role of histone acetylation in chromatin structure and function, BioEssays 1995, 17, 423-430.
4. M. Yoshida, M. Kijima, M. Akita, T. Beppu, Potent and specific inhibition of mammalian histone deacetylase both in vivo and in vitro by trichostatin A, J. Biol. Chem. 1990, 265, 17174-17179.
5. M.H. Kuo, C.D. Allis, Roles of histone acetyltransferases and deacetylases in gene regulation, BioEssays 1998, 20, 615-626.
6. M. Yoshida, A. Matsuyama, Y. Komatsu, N. Nishino, From discovery to the coming generation of histone deacetylase inhibitors, Curr. Med. Chem. 2003, 10, 2351-2358.
7. Y. Sowa, T. Orita, S. Hiranabe-Minamikawa, K. Nakano, T. Mizuno, H. Nomura, T. Sakai, Histone deacetylase inhibitor activates the p21/WAF1/Cip1 gene promoter through the Sp1 sites, Ann. N.Y. Acad. Sci. 1999, 886, 195-199.
8. X.M. Tang, J.S. Beesley, J.B. Grinspan, P. Seth, J. Kamholz, F. Cambi, Cell cycle arrest induced by ectopic expression of p27 is not sufficient to promote oligodendrocyte differentiation, J. Cell. Biochem. 1999, 76, 270-279.
9. M. Marin-Husstege, M. Muggironi, A. Liu, P. Casaccia-Bonnefil, Histone deacetylase activity is necessary for oligodendrocyte lineage progression, J. Neurosci. 2002, 22, 10333-10345.
10. H. Itazaki, K. Nagashima, K. Sugita, H. Yoshida, Y. Kawamura, Y. Yasuda, K. Matsumoto, K. Ishii, N. Uotani, H. Nakai et al., Isolation and structural elucidation of new cyclotetrapeptides, trapoxins A and B, having detransformation activities as antitumor agents, J. Antibiot. (Tokyo) 1990, 43, 1524-1532.
11. J. Taunton, C.A. Hassig, S.L. Schreiber, A mammalian histone deacetylase related to the yeast transcriptional regulator Rpd3p, Science 1996, 272, 408-411.
12. C.A. Hassig, T.C. Fleischer, A.N. Billin, S.L. Schreiber, D.E. Ayer, Histone deacetylase activity is required for full transcriptional repression by mSin3A, Cell 1997, 89, 341-347.
13. K. Sugita, H. Yoshida, M. Matsumoto, S. Matsutani, A novel compound, depudecin, induces production of transformation to the flat phenotype of NIH3T3 cells transformed by ras-oncogene, Biochem. Biophys. Res. Commun. 1992, 182, 379-387.
14. J.W. Han, S.H. Ahn, S.H. Park, S.Y. Wang, G.U. Bae, D.W. Seo, H.K. Kwon, S. Hong, H.Y. Lee, Y.W. Lee, H.W. Lee, Apicidin, a histone deacetylase inhibitor, inhibits proliferation of tumor cells via induction of p21WAF1/Cip1 and gelsolin, Cancer Res. 2000, 60, 6068-6074.
15. S.H. Kim, S. Ahn, J.W. Han, H.W. Lee, H.Y. Lee, Y.W. Lee, M.R. Kim, K.W. Kim, W.B. Kim, S. Hong, Apicidin is a histone deacetylase inhibitor with anti-invasive and anti-angiogenic potentials, Biochem. Biophys. Res. Commun. 2004, 315, 964-970.
16. T. Oikawa, C. Onozawa, M. Inose, M. Sasaki, Depudecin, a microbial metabolite containing two epoxide groups, exhibits anti-angiogenic activity in vivo, Biol. Pharm. Bull. 1995, 18, 1305-1307.
17. K. Vermeulen, D.R. Van Bockstaele, Z.N. Berneman, The cell cycle: a review of regulation, deregulation and therapeutic targets in cancer, Cell Prolif. 2003, 36, 131-149.
18. N. Gray, L. Detivaud, C. Doerig, L. Meijer, ATP-site directed inhibitors of cyclin-dependent kinases, Curr. Med. Chem. 1999, 6, 859-875.
19. N. Villerbu, A.M. Gaben, G. Redeuilh, J. Mester, Cellular effects of purvalanol A: a specific inhibitor of cyclin-dependent kinase activities, Int. J. Cancer 2002, 97, 761-769.
20. R.T. Abraham, M. Acquarone, A. Andersen, A. Asensi, R. Belle, F. Berger, C. Bergounioux, G. Brunn, C. Buquet-Fagot, D. Fagot et al., Cellular effects of olomoucine, an inhibitor of cyclin-dependent kinases, Biol. Cell 1995, 83, 105-120.
21. F. Alessi, S. Quarta, M. Savio, F. Riva, L. Rossi, L.A. Stivala, A.I. Scovassi, L. Meijer, E. Prosperi, The cyclin-dependent kinase inhibitors olomoucine and roscovitine arrest human fibroblasts in G1 phase by specific inhibition of CDK2 kinase activity, Exp. Cell Res. 1998, 245, 8-18.
22. M. Knockaert, P. Lenormand, N. Gray, P. Schultz, J. Pouyssegur, L. Meijer, p42/p44 MAPKs are intracellular targets of the CDK inhibitor purvalanol, Oncogene 2002, 21, 6413-6424.
23. H. Edamatsu, C.L. Gau, T. Nemoto, L. Guo, F. Tamanoi, Cdk inhibitors, roscovitine and olomoucine, synergize with farnesyltransferase inhibitor (FTI) to induce efficient apoptosis of human cancer cell lines, Oncogene 2000, 19, 3059-3068.
24. D.S. O’Connor, N.R. Wall, A.C. Porter, D.C. Altieri, A p34(cdc2) survival checkpoint in cancer, Cancer Cell 2002, 2, 43-54.
25. I. Matushansky, F. Radparvar, A.I. Skoultchi, Reprogramming leukemic cells to terminal differentiation by inhibiting specific cyclin-dependent kinases in G1, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 14317-14322.
26. A. Borgne, R.M. Golsteyn, The role of cyclin-dependent kinases in apoptosis, Prog. Cell Cycle Res. 2003, 5, 453-459.
27. T. Sandal, C. Stapnes, H. Kleivdal, L. Hedin, S.O. Doskeland, A novel, extraneuronal role for cyclin-dependent protein kinase 5 (CDK5): modulation of cAMP-induced apoptosis in rat leukemia cells, J. Biol. Chem. 2002, 277, 20783-20793.
28. S. Adachi, A.J. Obaya, Z. Han, N. Ramos-Desimone, J.H. Wyche, J.M. Sedivy, c-Myc is necessary for DNA damage-induced apoptosis in the G(2) phase of the cell cycle, Mol. Cell. Biol. 2001, 21, 4929-4937.
29. M. Castedo, T. Roumier, J. Blanco, K.F. Ferri, J. Barretina, L.A. Tintignac, K. Andreau, J.L. Perfettini, A. Amendola, R. Nardacci, P. Leduc, D.E. Ingber, S. Druillennec, B. Roques, S.A. Leibovitch, M. Vilella-Bach, J. Chen, J.A. Este, N. Modjtahedi, M. Piacentini, G. Kroemer, Sequential involvement of Cdk1, mTOR and p53 in apoptosis induced by the HIV-1 envelope, EMBO J. 2002, 21, 4070-4080.
30. R.G. Naik, S.L. Kattige, S.V. Bhat, B. Alreja, N.J. Desouza, R.H. Rupp, An antiinflammatory cum immunomodulatory piperidinylbenzopyranone from dysoxylum-binectariferum-isolation, structure and total synthesis, Tetrahedron 1988, 44, 2081-2086.
31. S. Mani, C. Wang, K. Wu, R. Francis, R. Pestell, Cyclin-dependent kinase inhibitors: novel anticancer agents, Expert Opin. Investig. Drugs 2000, 9, 1849-1870.
32. E.A. Sausville, D. Zaharevitz, R. Gussio, L. Meijer, M. Louarn-Leost, C. Kunick, R. Schultz, T. Lahusen, D. Headlee, S. Stinson, S.G. Arbuck, A. Senderowicz, Cyclin-dependent kinases: initial approaches to exploit a novel therapeutic target, Pharmacol. Ther. 1999, 82, 285-292.
33. G. Kaur, M. Stetler-Stevenson, S. Sebers, P. Worland, H. Sedlacek, C. Myers, J. Czech, R. Naik, E. Sausville, Growth inhibition with reversible cell cycle arrest of carcinoma cells by flavone L86-8275, J. Natl. Cancer Inst. 1992, 84, 1736-1740.
34. P.J. Worland, G. Kaur, M. Stetler-Stevenson, S. Sebers, O. Sartor, E.A. Sausville, Alteration of the phosphorylation state of p34cdc2 kinase by the flavone L86-8275 in breast carcinoma cells. Correlation with decreased H1 kinase activity, Biochem. Pharmacol. 1993, 46, 1831-1840.
35. M.D. Losiewicz, B.A. Carlson, G. Kaur, E.A. Sausville, P.J. Worland, Potent inhibition of CDC2 kinase activity by the flavonoid L86-8275, Biochem. Biophys. Res. Commun. 1994, 201, 589-595.
36. B. Carlson, T. Lahusen, S. Singh, A. Loaiza-Perez, P.J. Worland, R. Pestell, C. Albanese, E.A. Sausville, A.M. Senderowicz, Down-regulation of cyclin D1 by transcriptional repression in MCF-7 human breast carcinoma cells induced by flavopiridol, Cancer Res. 1999, 59, 4634-4641.
37. B.A. Carlson, M.M. Dubay, E.A. Sausville, L. Brizuela, P.J. Worland, Flavopiridol induces G1 arrest with inhibition of cyclin-dependent kinase (CDK)2 and CDK4 in human breast carcinoma cells, Cancer Res. 1996, 56, 2973-2978.
38. N. Sin, K. Kim, M. Elofsson, L. Meng, H. Auth, B.H.B. Kwok, C.M. Crews, Total synthesis of the potent proteasome inhibitor epoxomicin: a useful tool for understanding proteasome biology, Bioorg. Med. Chem. Lett. 1999, 9, 2283-2288.
39. G. Fenteany, R.F. Standaert, W.S. Lane, S. Choi, E.J. Corey, S.L. Schreiber, Inhibition of proteasome activities and subunit-specific amino-terminal threonine modification by lactacystin, Science 1995, 268, 726-731.
40. H. Ostrowska, C. Wojcik, S. Omura, K. Worowski, Lactacystin, a specific inhibitor of the proteasome, inhibits human platelet lysosomal cathepsin A-like enzyme, Biochem. Biophys. Res. Commun. 1997, 234, 729-732.
41. S. Omura, H. Takeshima, Lactacystin: a tool for elucidation of proteasome functions, Tanpakushitsu Kakusan Koso 1996, 41, 327-336.
42. J.Y. Zhang, S.J. Liu, H.L. Li, J.Z. Wang, Microtubule-associated protein tau is a substrate of ATP/Mg(2+)-dependent proteasome protease system, J. Neural Transm. 2005, 112, 547-555.
43. T. Tsukinoki, H. Sugiyama, R. Sunami, M. Kobayashi, T. Onoda, Y. Maeshima, Y. Yamasaki, H. Makino, Mesangial cell Fas ligand: upregulation in human lupus nephritis and NF-kappaB-mediated expression in cultured human mesangial cells, Clin. Exp. Nephrol. 2004, 8, 196-205.
44. C. Lorz, P. Justo, A.B. Sanz, J. Egido, A. Ortiz, Role of Bcl-xL in paracetamol-induced tubular epithelial cell death, Kidney Int. 2005, 67, 592-601.
45. K.L. De Moliner, M.L. Wolfson, N. Perrone Bizzozero, A.M. Adamo, Growth-associated protein-43 is degraded via the ubiquitin-proteasome system, J. Neurosci. Res. 2005, 79, 652-660.
46. M.R. Brown, V. Bondada, J.N. Keller, J. Thorpe, J.W. Geddes, Proteasome or calpain inhibition does not alter cellular tau levels in neuroblastoma cells or primary neurons, J. Alzheimers Dis. 2005, 7, 15-24.
47. K. Sugawara, M. Hatori, Y. Nishiyama, K. Tomita, H. Kamei, M. Konishi, T. Oki, Eponemycin, a new antibiotic active against B16 melanoma. I. Production, isolation, structure and biological activity, J. Antibiot. 1990, 43, 8-18.
48. M. Hanada, K. Sugawara, K. Kaneta, S. Toda, Y. Nishiyama, K. Tomita, H. Yamamoto, M. Konishi, T. Oki, Epoxomicin, a new antitumor agent of microbial origin, J. Antibiot. 1992, 45, 1746-1752.
49. L. Meng, R. Mohan, B.H.B. Kwok, M. Elofsson, N. Sin, C.M. Crews, Epoxomicin, a potent and selective proteasome inhibitor, exhibits in vivo anti-inflammatory activity, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 10403-10408.
50. M. Groll, K.B. Kim, R. Huber, C.M. Crews, Crystal structure of epoxomicin:20S proteasome reveals molecular basis for selectivity of α′,β′-epoxyketone proteasome inhibitors, J. Am. Chem. Soc. 2000, 122, 1237-1238.
51. A.Z. Fernandis, R.P. Cherla, R.D. Chernock, R.K. Ganju, CXCR4/CCR5 down-modulation and chemotaxis are regulated by the proteasome pathway, J. Biol. Chem. 2002, 277, 18111-18117.
52. J. Kohno, Y. Koguchi, M. Nishio, K. Nakao, M. Kuroda, R. Shimizu, T. Ohnuki, S. Komatsubara, Structures of TMC-95A-D: novel proteasome inhibitors from Apiospora montagnei Sacc. TC 1093, J. Org. Chem. 2000, 65, 990-995.
53. Y. Koguchi, J. Kohno, M. Nishio, K. Takahashi, T. Okuda, T. Ohnuki, S. Komatsubara, TMC-95A, B, C, and D, novel proteasome inhibitors produced by Apiospora montagnei Sacc. TC 1093. Taxonomy, production, isolation, and biological activities, J. Antibiot. (Tokyo) 2000, 53, 105-109.
54. M. Groll, Y. Koguchi, R. Huber, J. Kohno, Crystal structure of the 20 S proteasome:TMC-95A complex: a non-covalent proteasome inhibitor, J. Mol. Biol. 2001, 311, 543-548.
55. S. Drose, K. Altendorf, Bafilomycins and concanamycins as inhibitors of V-ATPases and P-ATPases, J. Exp. Biol. 1997, 200, 1-8.
56. L. Huang, G. Albers-Schonberg, R.L. Monaghan, K. Jakubas, S.S. Pong, O.D. Hensens, R.W. Burg, D.A. Ostlind, J. Conroy, E.O. Stapley, Discovery, production and purification of the Na+, K+ activated ATPase inhibitor, L-681,110 from the fermentation broth of Streptomyces sp. MA-5038, J. Antibiot. (Tokyo) 1984, 37, 970-975.
57. E.J. Bowman, A. Siebers, K. Altendorf, Bafilomycins: a class of inhibitors of membrane ATPases from microorganisms, animal cells, and plant cells, Proc. Natl. Acad. Sci. U.S.A. 1988, 85, 7972-7976.
58. J. Folkman, Tumor angiogenesis, Adv. Cancer Res. 1974, 19, 331-358.
59. S.M. Hyder, G.M. Stancel, Regulation of angiogenic growth factors in the female reproductive tract by estrogens and progestins, Mol. Endocrinol. 1999, 13, 806-811.
60. S. Liekens, E. De Clercq, J. Neyts, Angiogenesis: regulators and clinical applications, Biochem. Pharmacol. 2001, 61, 253-270.
61. S. Singh, B.B. Aggarwal, Activation of transcription factor NF-kappa B is suppressed by curcumin (diferuloylmethane) [corrected], J. Biol. Chem. 1995, 270, 24995-25000.
62. T.S. Huang, M.L. Kuo, J.K. Lin, J.S. Hsieh, A labile hyperphosphorylated c-Fos protein is induced in mouse fibroblast cells treated with a combination of phorbol ester and anti-tumor promoter curcumin, Cancer Lett. 1995, 96, 1-7.
63. D. Ingber, T. Fujita, S. Kishimoto, K. Sudo, T. Kanamaru, H. Brem, J. Folkman, Synthetic analogues of fumagillin that inhibit angiogenesis and suppress tumour growth, Nature 1990, 348, 555-557.
64. N. Sin, L. Meng, M.Q.W. Wang, J.J. Wen, W.G. Bornmann, C.M. Crews, The anti-angiogenic agent fumagillin covalently binds and inhibits the methionine aminopeptidase, MetAP-2, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 6099-6103.
65. H. Koyama, Y. Nishizawa, M. Hosoi, S. Fukumoto, K. Kogawa, A. Shioi, H. Morii, The fumagillin analogue Tnp-470 inhibits DNA synthesis of vascular smooth muscle cells stimulated by platelet-derived growth factor and insulin-like growth factor-I-possible involvement of cyclin-dependent kinase 2, Circ. Res. 1996, 79, 757-764.
66. J.R. Yeh, R. Mohan, C.M. Crews, The antiangiogenic agent TNP-470 requires p53 and p21CIP/WAF for endothelial cell growth arrest, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 12782-12787.
67. R.E. Handschumacher, M.W. Harding, J. Rice, R.J. Drugge, D.W. Speicher, Cyclophilin: a specific cytosolic binding protein for cyclosporin A, Science 1984, 226, 544-547.
68. G.D. Van Duyne, R.F. Standaert, P.A. Karplus, S.L. Schreiber, J. Clardy, Atomic structure of FKBP-FK506, an immunophilin-immunosuppressant complex, Science 1991, 252, 839-842.
69. J. Liu, J.D.J. Farmer, W.S. Lane, J. Friedman, I. Weissman, S.L. Schreiber, Calcineurin is a common target of cyclophilin-cyclosporin A and FKBP-FK506 complexes, Cell 1991, 66, 807-815.
73. A. Szallasi, Autoradiographic visualization and pharmacological characterization of vanilloid (capsaicin) receptors in several species, including man, Acta Physiol. Scand. Suppl. 1995, 629, 1-68.
74. A. Szallasi, S. Nilsson, T. Farkas-Szallasi, P.M. Blumberg, T. Hokfelt, J.M. Lundberg, Vanilloid (capsaicin) receptors in the rat: distribution in the brain, regional differences in the spinal cord, axonal transport to the periphery, and depletion by systemic vanilloid treatment, Brain Res. 1995, 703, 175-183.
75. S.M. Huang, T. Bisogno, M. Trevisani, A. Al-Hayani, L. De Petrocellis, F. Fezza, M. Tognetto, T.J. Petros, J.F. Krey, C.J. Chu, J.D. Miller, S.N. Davies, P. Geppetti, J.M. Walker, V. Di Marzo, An endogenous capsaicin-like substance with high potency at recombinant and native vanilloid VR1 receptors, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 8400-8405.
76. S. Heptinstall, D.V. Awang, B.A. Dawson, D. Kindack, D.W. Knight, J. May, Parthenolide content and bioactivity of feverfew (Tanacetum parthenium (L.)
Schultz-Bip.). N.A. Clipstone, G.R. Crabtree, Estimation of commercial and Identification of calcineurin as a key authenticated feverfew products, 1. signalling enzyme in T-lymphocyte P h a m . Phamacol. 1992,44,391-395. activation, Nature 1992, 357, 695-697. 77. B.H. Kwok, B. Koh, M.I. Ndubuisi, E.J. Brown, M.W. Albers, T.B. Shin, M. Elofsson, C.M. Crews, The K. Ichikawa, C.T. Keith, W.S. Lane, anti-inflammatory natural product S.L. Schreiber, A mammalian protein parthenolide from the medicinal herb targeted by G1-arresting Feverfew directly binds to and inhibits rapamycin-receptor complex, Nature IkappaB kinase, Chew. B i d . 2001, 8, 1994,369,756-758. 759-766. 1. Mann, Natural products as immunosuppressive agents, Nat. Prod. Rep. 2001, 18, 417-430.
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN 978-3-527-31150-7
3 Engineering Control Over Protein Function Using Chemistry
3.1 Revealing Biological Specificity by Engineering Protein-Ligand Interactions
Matthew D. Simon and Kevan M. Shokat
Outlook
Protein function can be altered in a rapid and graded manner through small molecule ligand binding in both natural systems and through drug design. In natural systems evolutionary pressure can lead to accumulation of mutations that influence ligand binding specificity, thereby altering protein function. Similarly, in the laboratory, mutations that have well defined effects on a protein’s ligand specificity can provide a functional handle to elucidate the protein’s biological role. Here we explore examples of mutations, introduced in the laboratory or found in nature, that cause significant changes to protein ligand specificity, with an emphasis on the biological and biochemical lessons learned from these studies. The examples described here illustrate both the challenges and the power of engineering protein-ligand interactions in order to elucidate a protein’s biological role.
3.1.1 Introduction
The exquisite specificity observed in biological systems emerges from the composite specificity of interactions at the molecular level. Understanding the mapping between molecular interactions and their functional consequences is the aim of molecular biology. While it is common to characterize biochemical activities of a protein in vitro, identifying the biological importance of these activities in a complex environment such as a cell extract, an intact cell, or even an entire organism remains a daunting task. Genetic approaches provide powerful means to investigate these biological activities (e.g., observing the phenotype that results from a gene disruption). However, protein engineering can provide complementary information that connects the biochemical specificity of a protein to its functional role. Here we discuss examples where protein-ligand interactions can be engineered to provide a specificity handle that can in turn be used to link a molecular interaction to a biological result. In these experiments a protein is mutated to alter its ligand specificity. The resulting engineered protein-ligand interaction is then used to infer the role of the unmodified protein in the biological system.
The success of this strategy requires that we specifically engineer protein-ligand interactions. How feasible is it to alter a protein’s ligand specificity in a well-defined manner? Are mutations that change the ligand specificity of a protein common or rare? Are mutations that alter the specificity of small-molecule binding more or less likely to destroy other functions or properties of the protein such as its catalytic activity, stability, or cellular localization? From all the potential mutations at a protein-ligand interface, what strategy do we use to identify the productive mutations that have a desired effect on protein-ligand interactions?
Molecular evolution in nature provides inspiration to help answer these questions. While the mechanistic details accounting for the success of natural molecular evolution are distinct from the practical details governing protein engineering, there are similarities that are worth elaborating. In particular, molecular evolution in nature demonstrates that a small number of mutations is often sufficient to cause dramatic changes in the ligand-binding properties of a protein. Similarly, in protein engineering, a single point mutation is often enough to provide a specificity handle that allows a protein to be uniquely sensitive or uniquely resistant to a small-molecule ligand. By keeping engineered changes to the protein simple, the potential to rationally engineer proteins is increased, and the chances of other adverse effects are minimized. In fact, many engineering strategies based on individual mutations are essentially indistinguishable from natural strategies found throughout evolution. Here we discuss several such examples.
3.1.2 The Selection of Resistance Mutations to Small-molecule Agents
3.1.2.1
HIV Protease Inhibition and Substrate Selectivity
Drug resistance mutations are common in patients treated with anti-HIV compounds such as indinavir and nelfinavir. These drugs act by inhibiting HIV protease (HIV PR), one of the essential HIV proteins required for viral growth and infection. They inhibit HIV PR by competing with the peptide substrate for binding in the active site (see Fig. 3.1-1). The rapid emergence of inhibitor resistance is caused, in part, by the low fidelity of the HIV reverse transcriptase. For example, nelfinavir is a potent inhibitor of the wild-type HIV PR (Ki = 0.28 nM). However, a double mutant of HIV PR (V82F/I84V) that has been observed in patients causes the virus to become refractory to nelfinavir (Ki,mut/Ki,wt = 86) [1].
[Chemical structures: indinavir and nelfinavir]
Fig. 3.1-1 HIV PR bound to a NC-p1 peptide substrate [3] (a) and nelfinavir (b) [4].
Given that the inhibitors mimic the protease’s natural peptide substrate, it is perhaps surprising that HIV PR mutants can overcome inhibition without losing a critical level of enzymatic activity (see footnote 1). The most common resistance-causing mutations are found in close proximity to the substrate-binding pocket. Given that the inhibitor-binding surface largely overlaps with the substrate-binding surface, how do these mutations disrupt inhibitor binding while retaining substrate recognition? Structural analysis reveals that the inhibitors tend to penetrate deeply into the same pockets that the protease uses to bind the side chains of its substrates - in fact, the inhibitors tend to penetrate more deeply into the pockets than the substrates themselves. Therefore, mutations in the protease can disrupt the deepest inhibitor contacts while having a smaller effect on substrate binding [2]. Indeed, it appears that the majority of the characterized inhibitor-resistance mutations work by disrupting these deep inhibitor contacts, thereby selectively disrupting the binding of one ligand (the inhibitor) without affecting another ligand (the substrate).
1) The V82F/I84V mutant HIV PR is functionally active - a virus with these mutations is viable - yet the catalytic efficiency of the mutant (kcat/KM = 0.5 mM⁻¹ s⁻¹) is compromised relative to the wild-type enzyme (kcat/KM = 30 mM⁻¹ s⁻¹) [1].
While many of the characterized HIV PR mutants do not substantially alter the protease substrate specificity, there are other resistance-causing mutations that do, the best characterized of which is V82A. When the in vitro substrate specificity of the V82A mutant (inhibitor resistant) was compared with that of the wild-type (inhibitor sensitive) enzyme, the V82A-containing enzyme was found to have a statistically significant increase in activity for Val over Ala at the P2 position of the substrate (P2 is the second amino acid N-terminal to the scissile bond) [5]. So, in this case, mutations in HIV PR were selected that disrupt the inhibitor-binding surface, but in doing so, the substrate specificity of the protease was also affected. So how does the virus accommodate this change in specificity? HIV PR cleaves several substrates during viral development. Among these sites, cleavage of the nucleocapsid-p1 (NC-p1) site is rate limiting to viral maturation. The occurrence of the V82A mutation in HIV PR correlates with an alanine-to-valine mutation at the NC-p1 cleavage site. In other words, it appears that, under the pressure of selection caused by the HIV PR inhibitor, an HIV PR mutant (V82A) was selected with alterations to the inhibitor-binding site, thereby changing the substrate specificity at P2. Along with the altered substrate specificity at P2 came a compensatory mutation in one of the substrate sequences (Ala-to-Val at P2). Residue V82 does not make direct van der Waals contact with the P2 side chain. Rather, incorporation of an Ala-to-Val mutation at P2 generally increases the quality of fit between the substrate and the enzyme, thereby compensating for the loss of substrate-enzyme contacts at V82 [3]. This structural difference explains why the V82A mutation in the enzyme and the Ala-to-Val mutation in the substrate are found to coevolve.
There are at least three lessons from HIV PR inhibitor resistance. First, relatively few mutations are often sufficient to induce inhibitor resistance, and in many cases a single point mutation is sufficient. Interestingly, several mutations allow HIV to overcome inhibitor sensitivity, demonstrating that there are numerous solutions to the same engineering problem. While the mutations are focused in regions that directly contact the inhibitor, as we might expect, some are sufficiently subtle (e.g., acting through slight rearrangements of the protein core) that it is hard to imagine predicting similar mutations while attempting to rationally engineer a protein. Second, relatively few mutations may be necessary to convergently engineer a protein and its substrate - in this case natural selection led to an HIV PR mutation (V82A) that changed its substrate selectivity and a compensatory change in one of its substrates. While the first two lessons are encouraging for the purposes of engineering proteins with altered specificities, the third lesson is largely cautionary: protein functions can be intimately interconnected. In at least one case, altering the inhibitor-binding surface of HIV PR affected the substrate specificity of the mutant protease. For this reason, engineering projects that intend to dissect individual functions of a given protein must also take care to control for other unintended changes to the protein’s function. For example, it is common for engineering a protein to adversely affect its stability or activity. This natural example demonstrates the feasibility but also the challenges of mutating a protein to alter its ligand specificity using only a small number of mutations.
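The trade-off between inhibitor resistance and catalytic efficiency can be made concrete with the numbers quoted in this section and its footnote. The following Python sketch is a minimal illustration, not part of the original analysis: the variable and function names are hypothetical, and only the four numerical values come from the text.

```python
# Values quoted in the text and footnote for HIV protease (HIV PR).
KI_WT_NM = 0.28        # nelfinavir Ki against wild-type HIV PR, in nM
KI_FOLD_CHANGE = 86.0  # Ki(mutant) / Ki(wild type) for V82F/I84V
KCAT_KM_WT = 30.0      # wild-type catalytic efficiency, mM^-1 s^-1
KCAT_KM_MUT = 0.5      # V82F/I84V catalytic efficiency, mM^-1 s^-1


def mutant_ki(ki_wt_nm: float, fold_change: float) -> float:
    """Ki of the mutant implied by the wild-type Ki and the fold change."""
    return ki_wt_nm * fold_change


def efficiency_loss(kcat_km_wt: float, kcat_km_mut: float) -> float:
    """Fold reduction in catalytic efficiency (wild type over mutant)."""
    return kcat_km_wt / kcat_km_mut


ki_mut_nm = mutant_ki(KI_WT_NM, KI_FOLD_CHANGE)
fold_loss = efficiency_loss(KCAT_KM_WT, KCAT_KM_MUT)

print(f"Implied nelfinavir Ki for V82F/I84V: {ki_mut_nm:.1f} nM")
print(f"Loss of catalytic efficiency: {fold_loss:.0f}-fold")
```

Read this way, the double mutant weakens nelfinavir binding by 86-fold at the cost of a 60-fold drop in catalytic efficiency, which is consistent with the footnote's remark that the mutant virus is viable but compromised.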
3.1.2.2
Identification of the Target of Rapamycin
While the emergence of HIV strains resistant to HIV PR inhibitors presents a serious medical challenge, there are cases where the development of resistance mutations to an inhibitor can be invaluable, particularly when the mode of action of an inhibitor is unknown. Such was the case with the small-molecule immunosuppressant rapamycin [6]. The natural product rapamycin became the subject of intensive study after it was demonstrated to block helper T-cell activation through an unknown mechanism. Indeed, this is a common problem with small-molecule agents; although it is straightforward to isolate compounds that cause interesting phenotypes, identifying the phenotypically relevant targets of the molecule can be challenging. Similarly, once a putative target is identified, it can be difficult to establish whether inhibition of that target is sufficient to cause the biological effect or whether other targets may also contribute to the observed phenotype. In the case of rapamycin, several groups set out to determine the specific underlying biochemical interactions that lead to its phenotypic effects.
[Chemical structure: rapamycin]
Finding the binding partners of a small molecule is one common approach in target identification. Attempts to identify the physiologically relevant cellular targets of rapamycin led to the observation that rapamycin binds tightly to the abundant peptidylprolyl rotamase, FK506 binding protein (FKBP). While binding to FKBP appeared to be important for the activity of rapamycin, several lines of evidence suggested that binding and inhibiting FKBP is not sufficient to account for the cellular activity of rapamycin. For example, rapamycin is toxic to yeast, yet strains lacking FKBP are viable; since FKBP is not essential, its inhibition would not be expected to cause toxicity. Intriguingly, however, yeast lacking FKBP are insensitive to rapamycin. This and other lines of evidence [6, 7] led to the subsequent realization that rapamycin binds FKBP and that the FKBP-rapamycin complex then targets other cellular factors; it is these other cellular factors that are responsible for the specific cellular activity of rapamycin.
Focusing on yeast, a genetic screen was performed to identify mutations that conferred rapamycin resistance [8]. To accomplish this, yeast cells were mutagenized and rapamycin-resistant strains were identified. Some of the mutations recovered localized to FKBP, as would be expected (see Fig. 3.1-2(b)). These mutations are recessive, consistent with the role of FKBP as an accessory protein; even in the presence of the mutant, the wild-type copy is sufficient to form the active rapamycin-FKBP complex. In addition to the recessive FKBP mutations, two other proteins were implicated in rapamycin activity, TOR1 and TOR2 (for target of rapamycin). Unlike the FKBP mutations, the TOR mutations had dominant effects, suggesting that the TOR proteins (later identified as protein kinases related to the lipid PI3 kinase) are the relevant targets of rapamycin responsible for cellular activity (see Fig. 3.1-2(c)). Indeed, the mutations in the TOR proteins that cause complete rapamycin resistance have been shown in vitro to block the binding and, therefore, inhibition of TOR by the FKBP-rapamycin complex. Furthermore, although the initial identification of TOR was performed in yeast, several studies demonstrated that a human homolog of TOR is responsible for the immunosuppressive activity of rapamycin. In fact, that the mammalian TOR deserves its name can be demonstrated using a similar line of experimentation; the mutation in mammalian TOR analogous to one of those discovered in yeast (S1975I) confers rapamycin resistance to mammalian cells.
Fig. 3.1-2 Mechanism of rapamycin inhibition and resistance. (a) Rapamycin inhibits TOR through an FKBP-rapamycin complex. (b) Resistance mutations in FKBP lead to loss of inhibition. (c) Dominant resistance mutations in TOR prevent FKBP-rapamycin binding and inhibition.
In summary, mutations in a protein that caused small-molecule resistance were used to map the phenotype induced by the small molecule to its functionally relevant biochemical targets. Specifically, rapamycin acts by binding the abundant protein FKBP. The resulting small molecule-protein complex subsequently binds to and inhibits the TOR proteins, leading to the observed cellular effects of rapamycin. This seemingly complicated mechanism of action is similar for the immunosuppressants FK506 and cyclosporin A (FK506 binds to FKBP and the complex inhibits the phosphatase calcineurin; cyclosporin A binds cyclophilin and the resulting complex also inhibits calcineurin). In the case of rapamycin, it was possible to demonstrate that TOR is the target using a dominant mutant of TOR that is resistant to FKBP-rapamycin inhibition. These resistance mutations are the single most definitive means of demonstrating the phenotypically relevant target of a small molecule. Unfortunately, the availability of resistance mutations can be limiting; attempts to engineer such a mutation may adversely affect the function of the protein (as was demonstrated by the altered substrate specificity of the V82A mutant of HIV PR). Similarly, isolating resistance mutations from genome-wide screens (as was the case with TOR) is only tractable in organisms such as yeast that are conducive to genetic manipulation. Nonetheless, the use of resistance mutations to uncover and prove the functionally relevant targets of an inhibitor is a powerful and definitive experiment.
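The genetic reasoning in this section, recessive resistance for the accessory protein and dominant resistance for the true target, can be expressed as a small truth table. The Python sketch below is a toy model under stated assumptions: it treats the cell as diploid with two alleles per gene, collapses TOR1 and TOR2 into a single hypothetical TOR gene, and reduces each allele to one boolean. The class and function names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DiploidCell:
    # True if the allele encodes an FKBP that can bind rapamycin.
    fkbp_binds_rapamycin: Tuple[bool, bool]
    # True if the allele encodes a TOR that the FKBP-rapamycin complex can bind.
    tor_binds_complex: Tuple[bool, bool]


def is_rapamycin_sensitive(cell: DiploidCell) -> bool:
    """Toy rule: growth is blocked only if the inhibitory complex can form
    and no TOR allele escapes inhibition."""
    complex_forms = any(cell.fkbp_binds_rapamycin)   # one good FKBP copy suffices
    every_tor_inhibited = all(cell.tor_binds_complex)  # one resistant TOR rescues
    return complex_forms and every_tor_inhibited


GENOTYPES = {
    "wild type": DiploidCell((True, True), (True, True)),
    "FKBP mutant / wild type": DiploidCell((True, False), (True, True)),
    "FKBP mutant / FKBP mutant": DiploidCell((False, False), (True, True)),
    "TOR mutant / wild type": DiploidCell((True, True), (True, False)),
}

for name, cell in GENOTYPES.items():
    state = "sensitive" if is_rapamycin_sensitive(cell) else "resistant"
    print(f"{name:28s} -> {state}")
```

In this model a single mutant FKBP allele leaves the cell sensitive (recessive), whereas a single resistant TOR allele is enough for resistance (dominant), which mirrors the pattern that pointed to TOR as the functionally relevant target.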
3.1.2.3
Kinase Inhibitors and Resistance
While inside the laboratory whole-genome screens (enabled by organisms amenable to genetic manipulation) have made possible the identification of resistance mutations, outside the laboratory similar screens are inadvertently taking place in the lives of cancer patients who are treated with antineoplastic drugs. The ability to search for increased gene copy number of known oncogenes and loss of heterozygosity at tumor suppressor loci, the development of array-based comparative genomic hybridization for identification of translocation events, and, most relevant here, the ability to carry out high-throughput DNA sequencing of candidate resistance genes have allowed identification of numerous molecular markers of resistance to chemotherapeutics. Many resistance loci are associated with increases in the cancer cell’s ability to pump out the antineoplastic agent, such as drug efflux pump mutants. These resistance mechanisms are independent of the targeting agent, causing resistance to cis-platinum, doxorubicin, and other general antiproliferative agents. Resistance mechanisms to molecularly targeted therapeutics, in contrast, provide discrete insights into the mechanism of action of these new-generation antineoplastics.
[Chemical structures: BAY 43-9006 and imatinib]
The prototype molecularly targeted therapeutic agent is imatinib, an inhibitor of the Bcr-Abl tyrosine kinase. This oncogenic kinase is produced by a translocation that fuses the Bcr locus on chromosome 22 to the c-Abl tyrosine kinase gene on chromosome 9. The resulting abnormal chromosome is termed the Philadelphia chromosome because of its discovery in 1960 by Peter Nowell of the University of Pennsylvania School of Medicine and David Hungerford of the Institute for Cancer Research [9]. It was later demonstrated in 1973 by Janet Rowley that the Philadelphia translocation was responsible for a specific form of leukemia, chronic myelogenous leukemia (CML) [10]. In 2001, imatinib was approved for treatment of CML patients, and produced remarkable results, with more than 92% of patients achieving 14-month progression-free survival on imatinib as a monotherapy.
By demonstrating the efficacy of a small-molecule tyrosine kinase inhibitor for cancer therapy, imatinib has had broad implications for molecularly targeted therapeutics. First, it discounted the notion that protein kinases could not be targeted selectively by small molecules that bind to the ATP-binding site. In particular, the ATP-binding pockets of different protein kinases were thought to be too similar for small molecules to discriminate between them, yet imatinib only targets a handful of kinases (the known targets of imatinib include Bcr-Abl, c-Abl, PDGFR, and c-Kit). Also, the high concentration of cellular ATP (>1 mM) was expected to limit the potency of ATP-competitive drugs, yet imatinib is a potent inhibitor (IC50 < 1 µM). It was also believed that the side effects associated with inhibition of wild-type kinases (such as c-Abl) would be prohibitive, yet imatinib causes no overt toxicity in normal cells while inducing apoptosis in CML cells. Second, because of its ability to target Bcr-Abl-expressing tumors, patients could be classified into potential responders based on their Philadelphia chromosome status.
This genomic prescreening for responder populations is widely viewed as a major avenue for improving therapeutic efficacy and minimizing unnecessary toxicity in nonresponder populations, and it heralds the era of personalized medicine. A third and more cautionary lesson from imatinib has been the rapid emergence of imatinib resistance in CML patients. Initially, the advanced-stage CML patients, those in the so-called blast crisis stage, who received imatinib late in disease, showed high rates of resistance. Currently, all CML patients are given imatinib upon diagnosis, and thus the rate of emergence of resistance is slower, although still a major challenge to these patients’ long-term survival.
The molecular mechanism behind imatinib resistance mirrors its molecular mechanism of action. Bcr-Abl gene duplication as well as transcriptional mechanisms leading to increases in Bcr-Abl transcript levels can lead to imatinib resistance. Thus, Bcr-Abl inhibition exerts selective pressure on CML tumors to increase Bcr-Abl signaling, which is manifested by upregulation of Bcr-Abl messenger RNA. Another common mechanism of resistance is the mutation of the Bcr-Abl kinase ATP-binding pocket in which imatinib binds [11]. The mutation in the ATP-binding pocket produces a Bcr-Abl protein kinase that can carry out ATP-dependent substrate phosphorylation but cannot be inhibited by imatinib. Strikingly, the cancer has identified selectivity determinants for imatinib binding that do not affect ATP binding. One particular mutation, T315I, is most frequently identified in imatinib-resistant tumors and serves as an illustration of how a single point mutation can exquisitely control ligand selectivity (see Fig. 3.1-3). The amino acid at position 315 of Bcr-Abl makes contact with the exocyclic amine of ATP and, thus, lines the adenine-binding pocket of the kinase. The ATP-binding pocket of most protein kinases is larger than necessary for binding ATP, especially in the vicinity of the exocyclic amine of ATP. Thus, a large hydrophobic pocket adjacent to adenine is available for small-molecule inhibitor binding. Importantly, the size of the amino acid residue at position 315 controls access to this extra pocket, and thus it has been termed the gatekeeper residue. In the T315I mutant Bcr-Abl kinase, imatinib cannot access the hydrophobic pocket because the larger isoleucine residue blocks its access. Since the bulkier isoleucine occupies a pocket not used by the substrate ATP, the T315I mutant is still able to efficiently bind ATP and catalyze phosphotransfer reactions.
Fig. 3.1-3 The crystal structure of imatinib bound to Abl kinase [12]. The gatekeeper residue (T315, colored red) packs tightly against imatinib (PDB: 1IEP).
As the predominance of imatinib resistance mechanisms can be traced to Bcr-Abl functional upregulation, the clinical resistance offers another proof of mechanism akin to the genetic screen that identified TOR as the target of rapamycin, discussed in Section 3.1.2.2. Imatinib was more or less designed to be a Bcr-Abl inhibitor, so its target was known from the outset of the clinical trial. In the case of rapamycin, a genetic screen was carried out to identify its target(s) and thus the molecular basis for its effect on immune suppression. In an amalgam of these two paradigms for target identification and clinical efficacy, the B-Raf inhibitor BAY 43-9006 displayed disappointing efficacy in clinical trials of melanoma patients, despite the identification of activating mutations in B-Raf in this form of cancer. Luckily, BAY 43-9006 was also used in clinical trials of other cancer types, where it showed surprising efficacy in the treatment of renal cancer, which is thought to be particularly dependent on vascularization.
Subsequent biochemical studies demonstrated that BAY 43-9006, which was originally thought to be a highly specific B-Raf inhibitor, is a potent inhibitor of the vascular endothelial growth factor receptor (VEGFR), providing a post facto rationale for its efficacy in this VEGFR-dependent cancer type [13]. In another case of small-molecule assisted target identification, the imatinib response of patients with idiopathic hypereosinophilic syndrome led to the identification of a chromosomal rearrangement involving the tyrosine kinase and known imatinib target PDGFR as a likely cause of this syndrome [14]. The link between the PDGFR fusion and hypereosinophilic syndrome was further strengthened when, after extended imatinib therapy, a relapse in one patient was observed to correlate with the emergence of a T674I mutation in PDGFRA. T674 is the gatekeeper residue in PDGFRA. Similarly, imatinib has been found to be a useful therapy for gastrointestinal stromal tumors (GIST), which are driven by the c-Kit tyrosine kinase, a previously known “off-target” of imatinib when it was being developed as a Bcr-Abl inhibitor. Again, resistance to imatinib in GIST patients has emerged, and c-Kit ATP-binding site mutations of the gatekeeper residue (T670I) are commonly found [15].
The lessons learned from imatinib and BAY 43-9006 suggest that cancers can be uniquely dependent on the catalytic activity of a single kinase. Moreover, because of the highly conserved nature of the kinase ATP-binding pocket, drug candidates often inhibit multiple family members. In some cases, off-target effects will lead to new medicines (BAY 43-9006). In other cases, of course, off-target effects will lead to toxic side effects, and will predictably lead to failures of clinical trials. Moreover, because a single amino acid in the binding pocket of kinases, the gatekeeper residue, can control inhibitor-binding specificity, resistance to these drugs has emerged quickly in cancer patients. A central challenge in all therapeutic areas is to identify key kinase targets for the treatment of the signaling defects in human diseases.
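The three clinical resistance mutations named in this section share a pattern that is easy to check mechanically once the mutation labels are parsed. The Python sketch below is a minimal illustration: the parser, the `PointMutation` type, and the table layout are hypothetical helpers, while the kinase names and mutation labels are those quoted in the text.

```python
import re
from typing import NamedTuple


class PointMutation(NamedTuple):
    wild_type: str  # one-letter code of the original residue
    position: int   # residue number
    mutant: str     # one-letter code of the replacement residue


_MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> PointMutation:
    """Parse a label such as 'T315I' into its three components."""
    match = _MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    return PointMutation(match.group(1), int(match.group(2)), match.group(3))


# Gatekeeper resistance mutations quoted in this section.
GATEKEEPER_RESISTANCE = {
    "Bcr-Abl": "T315I",
    "PDGFRA": "T674I",
    "c-Kit": "T670I",
}

for kinase, label in GATEKEEPER_RESISTANCE.items():
    mutation = parse_mutation(label)
    print(f"{kinase}: {mutation.wild_type}{mutation.position} -> {mutation.mutant}")
```

All three entries are threonine-to-isoleucine substitutions at the gatekeeper position, that is, a small side chain replaced by a bulkier one, which matches the structural explanation given for T315I.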
3.1.3 Exploiting Sensitizing Mutations to Engineer Nucleotide Binding Pockets
3.1.3.1
Engineering Uniquely Inhibitable Kinases
One approach for determining the function of every protein kinase in the genome is to develop a highly selective small-molecule inhibitor of each kinase. Achieving high specificity is daunting, since the human genome encodes over 500 kinases with highly similar ATP-binding pockets. Our laboratory has addressed this specificity problem by using protein engineering to target a kinase inhibitor to any kinase of interest. In fact, this is the inverse of the problem discussed in Section 3.1.2.3, the generation of imatinib-resistant alleles of Bcr-Abl (T315I). Rather than creating an inhibitor-resistant allele, the approach is to create a uniquely sensitive kinase allele, which will be inhibited by a molecule that does not inhibit any wild-type protein kinase.
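As the rest of this section describes, the sensitizing mutation replaces the gatekeeper residue with glycine or alanine. That design rule can be written as a one-line predicate. The Python sketch below is a deliberately crude illustration: it reduces inhibitor binding to the identity of the gatekeeper residue alone, the function name and table are hypothetical, and the gatekeeper identities are those quoted in this chapter.

```python
# Gatekeeper residues (one-letter codes) that the chapter describes as
# sensitizing; no wild-type kinase is reported to carry them.
SENSITIZING_GATEKEEPERS = frozenset({"G", "A"})


def accepts_bulky_analog(gatekeeper: str) -> bool:
    """Rule of thumb: True only for a glycine or alanine gatekeeper."""
    return gatekeeper.upper() in SENSITIZING_GATEKEEPERS


# Gatekeeper identities quoted in the chapter (illustrative table only).
GATEKEEPERS = {
    "Bcr-Abl (wild type)": "T",
    "Bcr-Abl T315I": "I",
    "c-Src (wild type)": "T",
    "v-Src (wild type)": "I",
    "v-Src I338G": "G",
    "c-Src T338G": "G",
}

for kinase, residue in GATEKEEPERS.items():
    verdict = "sensitive" if accepts_bulky_analog(residue) else "not sensitive"
    print(f"{kinase:20s} gatekeeper {residue}: {verdict}")
```

Only the engineered glycine alleles pass the test, which is the sense in which the engineered kinase is uniquely inhibitable. A real prediction would of course depend on the inhibitor and on the rest of the binding pocket.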
[Chemical structures: PP1 and 1NM-PP1]
A uniquely sensitive allele is created by mutating the gatekeeper residue in the wild-type kinase to a small alanine or glycine residue. Importantly, there are no human, mouse, worm, fly, or yeast kinases with an alanine or glycine gatekeeper residue, making the mutant kinase unique. A pyrazolopyrimidine-based inhibitor, 1NM-PP1, was designed (based on the parent inhibitor PP1) that is only capable of inhibiting kinases containing a glycine or alanine gatekeeper residue. Importantly, the kinases with the smallest naturally occurring gatekeeper residues, serine and threonine, are not inhibited by 1NM-PP1 (Fig. 3.1-4). It is interesting to note that the gatekeeper residue was selected on the basis of kinase-ATP crystal structures and docking models of pyrazolopyrimidine-based inhibitors, prior to the discovery of the gatekeeper mutations in imatinib-resistant CML patients. The fact that gatekeeper mutations can be used to confer inhibitor sensitivity through rational design and inhibitor resistance through natural selection highlights that this residue is a dominant feature controlling small-molecule access to the ATP-binding pocket without affecting kinase activity.
Fig. 3.1-4 The structure of the kinase inhibitor PP1 bound to the ATP-binding pocket of Hck kinase. The gatekeeper residue (the surface of which is colored red) packs tightly against the tolyl substituent of PP1 [16] (PDB: 1QCF).
3.1.3.2
Analog-specific Kinases
The enzymatic function of protein kinases is carried out by phosphorylation of serine, threonine, or tyrosine residues on target proteins. As an estimated 30% of human proteins are thought to be phosphorylated, identification of the direct substrates of all human protein kinases is a daunting challenge. Although a wide range of methods have been developed for isolating the
1
127
128
I
3 Engineering Control Over Protein Function Using Chemistry
4-(03P)30 OH OH
ATP
OH OH N6-Benzyl ATP
phosphoproteome, critical information about the kinase or kinases responsible for a given phosphorylation event is not provided by phosphoproteomics. To directly label and identify the targets of each kinase in the genome, kinases can be engineered to accept surrogate phosphodonors that are not accepted by any wild-type protein kinases. These N6-substituted ATP analogs, most commonly N6-benzyl ATP, are accepted by kinases containing an alanine or glycine gatekeeper residue. The N6-benzyl ATP-accepting oncogenic tyrosine kinase (I338G) v-Src has been the best characterized analog-specific protein kinase. Several critical design criteria must be satisfied by an engineered kinase for it to be useful in studying kinase-signaling pathways. First and foremost, the substrates phosphorylated by the mutant kinase must be identical to those phosphorylated by the wild-type protein. Three lines of evidence suggest that mutation of the gatekeeper residue does not alter substrate specificity. First, using combinatorial peptide substrates, wild-type Src and (I338G) Src protein kinases exhibit identical sequence specificity patterns [17]. Second, using a cellular transformation assay, v-Src and I338G v-Src produce equivalent levels of anchorage-independent cell growth, confirming that the cellular targets phosphorylated by the mutant are able to fully recapitulate the wild-type kinase-induced phenotype [18]. Lastly, at the structural level, the crystal structure of the mutant Src (T338G c-Src, see Fig. 3.1-5) shows no rearrangements in the kinase domain in the phosphoacceptor binding pocket. In fact, the cocrystal structure with N6-benzyl ADP shows that the nucleotide binding is unchanged from that of the ADP/c-Src cocomplex.
Thus, available biochemical, genetic, and structural evidence suggests that mutation of the gatekeeper residue in the Src kinase causes very little change in the function of the kinase, while allowing the use of inhibitors or ATP analogs for the study of Src. Currently, over 30 protein kinases from yeast, mouse, humans, Arabidopsis, and tomato have been successfully engineered for substrate labeling or inhibitor development.
3.1.3.3
From GTPases to XTPases
Given the convergence between the resistance mutations found in cancer and the mutations used to engineer orthogonal kinase ligands, it is reasonable
Fig. 3.1-5 N6-benzyl ADP is shown bound in the ATP-binding pocket of the analog-sensitized Src kinase (PDB: 1KSW) (Ref.: Witucki, L.A. et al., Chem. Biol. 2002, 9, 25-33).
to consider the gatekeeper residue particularly amenable to engineering. But the gatekeeper residue is not alone. In fact, the strategy to engineer orthogonal kinase ligands is the descendant of a similarly successful strategy to engineer orthogonal nucleotide specificity into the nucleotide-binding pocket of GTPases. The key mutation was discovered by Hwang et al. while dissecting the GTP-binding pocket of EF-Tu, a GTPase essential for ribosome function in Escherichia coli [19]. Introducing an aspartate-to-asparagine mutation (D138N) disrupted the hydrogen-bonding interactions between GTP and the GTPase, thus impairing the GTPase activity of the protein. Remarkably, using XTP as substrate rather than GTP restored hydrogen bonding (now reversed, see Fig. 3.1-6), and the activity of the GTPase-turned-XTPase was nearly identical to that of the wild-type enzyme. Therefore, this mutation allows the construction of an orthogonal nucleotide specificity (the XTPase accepts only XTP; the GTPase only GTP). This engineered GTPase was particularly useful for dissecting the GTP requirements of the E. coli ribosome. In vitro translation experiments had established that two GTPases are necessary for each round of amino acid addition to a growing polypeptide. EF-G (one of these two GTPases) is responsible for the translocation of the peptidyl-tRNA from the A site to the P site of the ribosome. The other GTPase involved in this process is EF-Tu, the GTPase previously engineered into an XTPase by Hwang et al. The
Fig. 3.1-6 GTPases contain a conserved aspartate that hydrogen bonds to the guanine of GTP. An aspartate-to-asparagine mutation changes the nucleotide specificity from GTP to XTP by altering these hydrogen bonds.
role of EF-Tu is to ensure proper binding of the appropriate aminoacyl-tRNA to the ribosome (Fig. 3.1-7). Because the D138N EF-Tu nucleotide specificity is orthogonal to wild-type EF-G, Weijland and Parmeggiani were able to use this mutant and radiolabeled nucleotides (either XTP or GTP) to quantitate the nucleotide consumption of each protein during the translation cycle [20, 21]. From this work it was established that, for every amino acid incorporated into a growing peptide chain, EF-Tu (D138N) consumes two molecules of XTP and EF-G (wt) consumes one molecule of GTP. At the time when Miller et al. developed the GTPase-to-XTPase mutation in EF-Tu, they proposed that, because this mutation is in a highly conserved loop shared by most GTPases, the D138N mutation should be applicable to endow other GTPases with XTPase activity. This proposal has proven remarkably accurate; numerous GTPases have been converted into XTPases using this strategy [22].
3.1.4 Engineering the Ligand Selectivity of Ion Channels
3.1.4.1
Resistance Mutations in L-type Calcium Channel Signaling
For kinases and GTPases, point mutations can be used to study one member of a large family by allowing the engineered member to bind to a unique
Fig. 3.1-7 The crystal structure of EF-Tu bound to a nonhydrolyzable GTP analog shows Asp138 hydrogen bonding to guanine (PDB: 1EXM).
ligand or substrate. An alternative means of isolating the activity of a single protein in a family is to engineer the protein of interest to be uniquely resistant to a general inhibitor. This way, the activity of the protein can be unmasked by inhibiting all the other family members. The function of L-type calcium channels was dissected in this manner. Voltage-gated calcium channels play an important role in neuronal signaling. While there are several different types of voltage-gated calcium channels, they share a common activity: allowing an influx of calcium into the cytoplasm upon activation. Despite this commonality, calcium influx from different types of channels is not equivalent; L-type calcium channel-specific blockers diminish calcium-dependent cAMP-response element binding (CREB) protein phosphorylation and activation of the MAP kinase pathway, while N- and P/Q-type channel blockers have little to no effect. This and other differences led to the proposal that calcium signals can act locally. For example, L-type calcium channels may have the means of directing the entering calcium to affect signaling molecules positioned near the channel. These signaling molecules may then activate other signaling pathways (such as the MAP kinase pathway). Testing this hypothesis requires a means of isolating the role of calcium influx through L-type calcium channels from the role of calcium influx through other types of voltage-gated calcium channels. This feat
was accomplished using a mutant L-type calcium channel that is resistant to
nimodipine, a dihydropyridine (DHP) antagonist of L-type calcium channel activity. A dihydropyridine-resistant L-type calcium channel was identified while trying to map the DHP-binding site [23]. Initially, the binding site was probed using photoaffinity labels and chimeric channels. These studies implicated a specific region as responsible for DHP binding. Site-directed mutagenesis in this region identified several mutations that altered DHP sensitivity. One mutation in particular, T1006Y, was shown to be resistant to antagonism by a DHP. The agonist binding to the mutant channel was dramatically decreased, as demonstrated in a radioligand-binding assay. That this effect might be caused by nonspecific disruption of the channel's structure was ruled out by demonstrating that channel activation and inactivation were not affected by this mutation. Therefore, biochemical and electrophysiological evidence suggests that this mutant channel is similar to the wild-type channel with the exception that the mutant is resistant to DHP antagonists. In neurons, the T1006Y mutant channel's activity can be distinguished from that of the endogenous channel by treating the cells with nimodipine, thus blocking the wild-type copy and revealing the activity of the transfected mutant [24]. Upon membrane depolarization in the presence of nimodipine, the mutant channel rescues the Ca2+ influx and other downstream signaling pathways including CREB phosphorylation and the stimulation of the MAP
Fig. 3.1-8 The activity of an exogenous L-type calcium channel was dissected using a mutation that confers nimodipine resistance (T1006Y). In the presence of nimodipine, the endogenous, wild-type channel (blue) is blocked and the activity of the mutant channel (green) is revealed.
kinase pathway (Fig. 3.1-8). Thus, the DHP-resistant T1006Y mutant L-type calcium channel provides the specificity handle necessary to dissect the activity of L-type calcium signaling. For example, this T1006Y channel was instrumental in the identification of a calmodulin-binding site on the C-terminus of the channel. This binding site provides insight as to how L-type calcium channel signaling can use local Ca2+ influx to interface specifically with other cellular signaling pathways.
3.1.4.2
Capsaicin Sensitivity
Similar to the engineering of DHP-resistant mutant calcium channels, there are natural examples of the emergence of uniquely resistant channels. One example comes from the small molecule capsaicin, the component of hot chili peppers that induces the sensation of burning pain. Capsaicin accomplishes this effect by binding to and opening the VR1 cation channel found in nerve endings, including those in the mouth. That we consider chili peppers “hot” is not arbitrary: the VR1 channel is also responsible for recognition of noxious stimuli including heat (>43 °C) and acid [25].
[Structure: capsaicin]
It has been proposed that capsaicin serves chili peppers by selectively deterring predators. Birds, productive vectors for seed dispersal, do not respond to capsaicin. In contrast, mammals are predatory but are deterred by the capsaicin (with the exception of humans) [26]. The molecular basis of the differential capsaicin sensitivity between birds and mammals can be traced to VR1 [27]. The avian homolog of VR1, like its mammalian counterpart, is responsive to heat, but unlike its mammalian counterpart, avian VR1 does not respond to capsaicin. The chicken VR1 ortholog (capsaicin insensitive) was compared with the rat VR1 (capsaicin sensitive), and chimeric channels were used to identify sites on the chicken VR1 sufficient to render rat VR1 capsaicin insensitive. When a short stretch of the rat VR1 channel in the third transmembrane-spanning region (presumably at the capsaicin-binding site, although there are no high-resolution structures of the VR1 channel) is substituted with the chick sequence, the mutant channel is rendered capsaicin insensitive. Using this chimera as inspiration, it was possible to find individual point mutations sufficient to render the rat channel capsaicin insensitive while having only a
modest impact on the channel's response to heat and acid. Interestingly, the
best resistance-inducing mutations were the unnatural ones found in neither receptor; the use of these natural differences serves as an excellent guide but, as with many of the examples above, it is often necessary to test a panel of mutations before a productive mutant is found. Perhaps more remarkable than the ability to use the differences between chick (insensitive) and rat (sensitive) to construct a mutant, insensitive rat VR1 was the use of the rat receptor to guide the construction of a capsaicin-sensitive chick receptor. Building the binding pocket required more than a point mutation; the active construct borrowed 45 amino acids from the rVR1, inserted into the correct position in the cVR1. Essentially, the molecular basis of this selective deterrence, which causes birds, but not mammals, to consume chili peppers, is explained by a biochemical change in ligand specificity induced by a few amino acids that differ between mammalian and avian VR1.
3.1.5 Conclusion
3.1.5.1
Challenges in Protein Engineering
We have presented several natural and synthetic examples of the alteration of protein-ligand interactions. Several other examples exist and have been reviewed elsewhere [28-31]. While the utility of altering ligand specificity is clear, protein engineering remains challenging. Even for the successful examples presented here, the mutant proteins frequently suffer some level of compromised function. For example, the space-creating mutation in the ATP-binding site of Cdk1, an essential yeast kinase involved in the regulation of cell cycle progression, has a substantial impact on the KM of the kinase for ATP (wild type, KM = 35 µM; mutant, KM = 320 µM) [32]. In this case, the compromised KM does not significantly impact the utility of the engineered kinase because the high cellular concentrations of ATP (>1 mM) are substantially above the KM for both the wild-type kinase and the mutant. This and other similar concerns can be addressed by using one of the great advantages of convergent engineering strategies: the activity of the mutant can always be compared to the activity of the wild type, both with and without ligand (see Table 3.1-1). Because of these controls, unintended changes to the function of the protein or the ligand can be dissected. In the case of the analog-sensitized Cdk1, the mutant compares favorably with the temperature-sensitive mutant that had previously been used to dissect the function of this kinase. Specifically, this mutant kinase has been used with 1NM-PP1 to demonstrate the role of Cdk1 in the G2/M transition [32] and with γ-32P-labeled N6-benzyl ATP to identify numerous substrates of this kinase [33]. Even when the reengineered mutants do not match the function of the wild type perfectly, they can still serve as useful tools.
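The argument that a roughly ninefold increase in KM is tolerable at cellular ATP levels can be checked with a short calculation. The sketch below assumes simple Michaelis-Menten kinetics with respect to ATP, reads the quoted KM values as 35 and 320 µM, and uses 1 mM as a stand-in for the ">1 mM" cellular ATP concentration; the function and constant names are illustrative, not taken from the cited work.

```python
def fractional_velocity(substrate_um: float, km_um: float) -> float:
    """Return v/Vmax = [S] / (KM + [S]) for simple Michaelis-Menten kinetics."""
    if substrate_um < 0 or km_um <= 0:
        raise ValueError("substrate must be >= 0 and KM must be > 0")
    return substrate_um / (km_um + substrate_um)


# Values quoted in the text; the ATP level is an assumed lower bound (>1 mM).
KM_WILD_TYPE_UM = 35.0
KM_MUTANT_UM = 320.0
CELLULAR_ATP_UM = 1000.0

wild_type_fraction = fractional_velocity(CELLULAR_ATP_UM, KM_WILD_TYPE_UM)
mutant_fraction = fractional_velocity(CELLULAR_ATP_UM, KM_MUTANT_UM)

print(f"wild type runs at {wild_type_fraction:.2f} of Vmax")  # 0.97
print(f"mutant runs at {mutant_fraction:.2f} of Vmax")        # 0.76
```

Under these assumptions, both enzymes operate at most of their maximal rate, and the gap narrows further as the ATP concentration rises above 1 mM. The calculation says nothing about differences in kcat, which would have to be measured separately.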
Table 3.1-1 Controls available when using orthogonally engineered protein-ligand interactions to study the biological function of a protein

             Without ligand                            With ligand
Wild type    Reference state                           Control for the off-target effects of the ligand
Mutant       Control for the effect of the mutation    Experimental condition to probe the functional consequences of the protein-ligand interaction
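One way to see how the four conditions in Table 3.1-1 work together is to treat each as a numerical readout and subtract the controls. The sketch below is only an illustration: it assumes that the effects of the mutation and of the ligand are additive, and the readout values are invented placeholders, not data from any of the studies cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlReadouts:
    """One readout (arbitrary units) per cell of the two-by-two design."""
    wt_no_ligand: float     # reference state
    wt_with_ligand: float   # control for off-target effects of the ligand
    mut_no_ligand: float    # control for the effect of the mutation
    mut_with_ligand: float  # experimental condition


def dissect_effects(r: ControlReadouts) -> dict[str, float]:
    """Separate mutation, off-target, and on-target effects (additive model)."""
    mutation_effect = r.mut_no_ligand - r.wt_no_ligand
    off_target_effect = r.wt_with_ligand - r.wt_no_ligand
    # Change caused by the ligand in the mutant, minus what the ligand
    # also does to the wild type, which cannot bind it productively.
    on_target_effect = (r.mut_with_ligand - r.mut_no_ligand) - off_target_effect
    return {
        "mutation": mutation_effect,
        "off_target": off_target_effect,
        "on_target": on_target_effect,
    }


# Hypothetical readouts chosen only to make the arithmetic easy to follow.
example = ControlReadouts(
    wt_no_ligand=100.0,
    wt_with_ligand=95.0,
    mut_no_ligand=90.0,
    mut_with_ligand=20.0,
)
effects = dissect_effects(example)
print(effects)
```

With these placeholder numbers the mutation alone costs 10 units, the ligand alone costs 5 units in the wild type, and the remaining 65-unit drop is attributed to the engineered protein-ligand interaction.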
But sometimes the engineered mutations have a substantial impact on the activity of the protein. For example, while the GTPase-to-XTPase mutation described in Section 3.1.3.3 has been general for most of the GTPases studied, attempts to use the Asp-to-Asn mutation to study G-protein coupled receptor (GPCR) signaling through Gαo were initially unsuccessful because the mutation (D273N) compromises nucleotide binding and GTPase activity of these G-proteins [34]. In this case, it was possible to rescue the activity of the mutant G-protein using an additional mutation (Q250L) that resides on the other side of the GTP-binding pocket from D273. The discovery of this mutation was apparently serendipitous; Q250L mutants are usually GTPase deficient. Similarly, the space-creating mutations used to study kinases (see Section 3.1.3.1) occasionally compromise the activity of a kinase severely. In several cases, it has been possible to identify second-site suppressor mutations that rescue the activity of the mutant kinase [35]. In light of the natural examples we have presented above, perhaps this level of feasibility is to be expected; within the set of single mutants of a given protein there appears to be significant functional diversity in ligand-binding activities. The best mutations are sometimes, but not always, easy to rationalize. While using rational strategies to identify productive mutations undoubtedly improves the chances of finding mutants with the desired activities, testing several mutations is likely necessary. Nonetheless, both the natural and synthetic examples above illustrate that reengineering a protein’s ligand specificity is a tractable problem.
3.1.5.2
Engineering Extended Biomolecular Interfaces
While this chapter has focused on the engineering of selectivity for small-molecule ligands, primarily using single mutations, a similar strategy would clearly be useful for studying the biological specificity of larger interfaces if the reagents were available. Toward this end, several studies have attempted more ambitious engineering projects to redesign large regions of protein interfaces. For example, computational approaches were instrumental in developing mutants of maltose-binding protein (and related members of the
family) with completely reengineered ligand specificities [36]. Similarly, many other computational approaches have made significant progress to aid in the reengineering of protein interfaces [37]. Alternatively, in vitro selections have provided a means of enriching desired binders from large libraries of mutants. For instance, phage display has been used to reengineer both protein-protein [38] and protein-DNA [39-41] interactions. While reengineering complex biomolecular interfaces remains difficult, these advances, alone or in combination, will aid in the development of specifically engineered binding partners that will provide powerful tools to study the biological importance of these interactions.
3.1.5.3
Conclusion
Reengineering protein-ligand interactions can provide powerful information that complements traditional biochemical and genetic approaches. The power of these engineering approaches will increase as new methods are developed both in protein engineering and in our ability to genetically manipulate the organisms we wish to study. These engineering approaches are most useful in vitro or in organisms where genetic manipulation is tractable, such as bacteria, yeast, flies, and mice. As pharmacological agents that target wild-type proteins become increasingly selective, these reagents will complement chemical genetic tools. Even in these cases, however, engineering protein-ligand interactions can provide important information about the specificity of the pharmacological agent, as was discussed earlier for rapamycin. While the genome is vast, many of its features (e.g., domains and cofactors) recur in several different signaling contexts. This biochemical similarity presents a specificity problem on one hand but an engineering opportunity on the other; introducing specificity handles using carefully designed mutations can help provide insight into critical connections between biochemical specificity and biological function.
References
1. R.M. Klabe, L.T. Bacheler, P.J. Ala, S. Erickson-Viitanen, J.L. Meek, Resistance to HIV protease inhibitors: a comparison of enzyme inhibition and antiviral potency, Biochemistry 1998, 37(24), 8735-42.
2. N.M. King, M. Prabu-Jeyabalan, E.A. Nalivaika, C.A. Schiffer, Combating susceptibility to drug resistance: lessons from hiv-1 protease, Chem. Biol. 2004, 11(10), 1333-8.
3. M. Prabu-Jeyabalan, E.A. Nalivaika, N.M. King, C.A. Schiffer, Structural basis for coevolution of a human immunodeficiency virus type 1 nucleocapsid-p1 cleavage site with a v82a drug-resistant mutation in viral protease, J. Virol. 2004, 78(22), 12446-54.
4. S.W. Kaldor, V.J. Kalish, J.F.N. Davies, B.V. Shetty, J.E. Fritz, K. Appelt, J.A. Burgess, K.M. Campanale, N.Y. Chirgadze, D.K. Clawson, B.A. Dressman, S.D. Hatch, D.A. Khalil, M.B. Kosa, P.P. Lubbehusen, M.A. Muesing, A.K. Patick, S.H. Reich, K.S. Su, J.H. Tatlock, Viracept (nelfinavir mesylate, ag1343): a potent, orally bioavailable inhibitor of hiv-1 protease, J. Med. Chem. 1997, 40(24), 3979-85.
5. D.S. Dauber, R. Ziermann, N. Parkin, D.J. Maly, S. Mahrus, J.L. Harris, J.A. Ellman, C. Petropoulos, C.S. Craik, Altered substrate specificity of drug-resistant human immunodeficiency virus type 1 protease, J. Virol. 2002, 76(3), 1359-68.
6. J.L. Crespo, M.N. Hall, Elucidating tor signaling and rapamycin action: lessons from saccharomyces cerevisiae, Microbiol. Mol. Biol. Rev. 2002, 66(4), 579-91.
7. S.L. Schreiber, Chemistry and biology of the immunophilins and their immunosuppressive ligands, Science 1991, 251(4991), 283-7.
8. J. Heitman, N.R. Movva, M.N. Hall, Targets for cell cycle arrest by the immunosuppressant rapamycin in yeast, Science 1991, 253(5022), 905-9.
9. P.C. Nowell, D. Hungerford, A minute chromosome in chronic granulocytic leukemia, Science 1960, 132, 1497.
10. J.D. Rowley, Letter: a new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and giemsa staining, Nature 1973, 243(5405), 290-3.
11. M.E. Gorre, M. Mohammed, K. Ellwood, N. Hsu, R. Paquette, P.N. Rao, C.L. Sawyers, Clinical resistance to sti-571 cancer therapy caused by bcr-abl gene mutation or amplification, Science 2001, 293(5531), 876-80.
12. B. Nagar, W.G. Bornmann, P. Pellicena, T. Schindler, D.R. Veach, W.T. Miller, B. Clarkson, J. Kuriyan, Crystal structures of the kinase domain of c-abl in complex with the small molecule inhibitors pd173955 and imatinib (sti-571), Cancer Res. 2002, 62(15), 4236-43.
13. S.M. Wilhelm, C. Carter, L. Tang, D. Wilkie, A. McNabola, H. Rong, C. Chen, X. Zhang, P. Vincent, M. McHugh, Y. Cao, J. Shujath, S. Gawlak, D. Eveleigh, B. Rowley, L. Liu, L. Adnane, M. Lynch, D. Auclair, I. Taylor, R. Gedrich, A. Voznesensky, B. Riedl, L.E. Post, G. Bollag, P.A. Trail, Bay 43-9006 exhibits broad spectrum oral antitumor activity and targets the raf/mek/erk pathway and receptor tyrosine kinases involved in tumor progression and angiogenesis, Cancer Res. 2004, 64(19), 7099-109.
14. J. Cools, D.J. DeAngelo, J. Gotlib, E.H. Stover, R.D. Legare, J. Cartes, J. Kutok, J. Clark, I. Galinsky, J.D. Griffin, N.C. Cross, A. Tefferi, J. Malone, R. Alam, S.L. Schrier, J. Schmid, M. Rose, P. Vandenberghe, G. Verhoef, M. Boogaerts, I. Wlodarska, H. Kantarjian, P. Marynen, S.E. Coutre, R. Stone, D.G. Gilliland, A tyrosine kinase created by fusion of the pdgfra and fip1l1 genes as a therapeutic target of imatinib in idiopathic hypereosinophilic syndrome, N. Engl. J. Med. 2003, 348(13), 1201-14.
15. E. Tamborini, L. Bonadiman, A. Greco, V. Albertini, T. Negri, A. Gronchi, R. Bertulli, M. Colecchia, P.G. Casali, M.A. Pierotti, S. Pilotti, A new mutation in the kit atp pocket causes acquired resistance to imatinib in a gastrointestinal stromal tumor patient, Gastroenterology 2004, 127(1), 294-9.
16. T. Schindler, F. Sicheri, A. Pico, A. Gazit, A. Levitzki, J. Kuriyan, Crystal structure of hck in complex with a src family-selective tyrosine kinase inhibitor, Mol. Cell 1999, 3(5), 639-48.
17. L.A. Witucki, X. Huang, K. Shah, Y. Liu, S. Kyin, M.J. Eck, K.M. Shokat, Mutant tyrosine kinases with unnatural nucleotide specificity retain the structure and phospho-acceptor specificity of the wild-type enzyme, Chem. Biol. 2002, 9(1), 25-33.
18. K. Shah, K.M. Shokat, A chemical genetic screen for direct v-src substrates reveals ordered assembly of a retrograde signaling pathway, Chem. Biol. 2002, 9(1), 35-47.
19. Y.W. Hwang, D.L. Miller, A mutation that alters the nucleotide specificity of elongation factor tu, a gtp regulatory protein, J. Biol. Chem. 1987, 262(27), 13081-5.
20. A. Weijland, A. Parmeggiani, Toward a model for the interaction between elongation factor tu and the ribosome, Science 1993, 259(5099), 1311-4.
21. A. Weijland, G. Parlato, A. Parmeggiani, Elongation factor tu d138n, a mutant with modified substrate specificity, as a tool to study energy consumption in protein biosynthesis, Biochemistry 1994, 33(35), 10711-7.
22. A. Bishop, O. Buzko, S. Heyeck-Dumas, I. Jung, B. Kraybill, Y. Liu, K. Shah, S. Ulrich, L. Witucki, F. Yang, C. Zhang, K.M. Shokat, Unnatural ligands for engineered proteins: new tools for chemical genetics, Annu. Rev. Biophys. Biomol. Struct. 2000, 29, 577-606.
23. M. He, I. Bodi, G. Mikala, A. Schwartz, Motif iii s5 of l-type calcium channels is involved in the dihydropyridine binding site. a combined radioligand binding and electrophysiological study, J. Biol. Chem. 1997, 272(5), 2629-33.
24. R.E. Dolmetsch, U. Pajvani, K. Fife, J.M. Spotts, M.E. Greenberg, Signaling to the nucleus by an l-type calcium channel-calmodulin complex through the map kinase pathway, Science 2001, 294(5541), 333-9.
25. M.J. Caterina, M.A. Schumacher, M. Tominaga, T.A. Rosen, J.D. Levine, D. Julius, The capsaicin receptor: a heat-activated ion channel in the pain pathway, Nature 1997, 389(6653), 816-24.
26. J.J. Tewksbury, G.P. Nabhan, Seed dispersal. directed deterrence by capsaicin in chilies, Nature 2001, 412(6845), 403-4.
27. S.E. Jordt, D. Julius, Molecular basis for species-specific sensitivity to “hot” chili peppers, Cell 2002, 108(3), 421-30.
28. M.A. Shogren-Knaak, P.J. Alaimo, K.M. Shokat, Recent advances in chemical approaches to the study of biological systems, Annu. Rev. Cell Dev. Biol. 2001, 17, 405-33.
29. J.T. Koh, Engineering selectivity and discrimination into ligand-receptor interfaces, Chem. Biol. 2002, 9(1), 17-23.
30. A.R. Buskirk, D.R. Liu, Creating small-molecule-dependent switches to modulate biological functions, Chem. Biol. 2005, 12(2), 151-61.
31. B.N. Cook, C.R. Bertozzi, Chemical approaches to the investigation of cellular systems, Bioorg. Med. Chem. 2002, 10(4), 829-40.
32. A.C. Bishop, J.A. Ubersax, D.T. Petsch, D.P. Matheos, N.S. Gray, J. Blethrow, E. Shimizu, J.Z. Tsien, P.G. Schultz, M.D. Rose, J.L. Wood, D.O. Morgan, K.M. Shokat, A chemical switch for inhibitor-sensitive alleles of any protein kinase, Nature 2000, 407(6802), 395-401.
33. J.A. Ubersax, E.L. Woodbury, P.N. Quang, M. Paraz, J.D. Blethrow, K. Shah, K.M. Shokat, D.O. Morgan, Targets of the cyclin-dependent kinase cdk1, Nature 2003, 425(6960), 859-64.
34. B. Yu, V.Z. Slepak, M.I. Simon, Characterization of a goalpha mutant that binds xanthine nucleotides, J. Biol. Chem. 1997, 272(29), 18015-9.
35. C. Zhang, D.M. Kenski, J.L. Paulson, A. Bonshtien, G. Sessa, J.V. Cross, D.J. Templeton, K.M. Shokat, A second-site suppressor strategy for chemical genetic analysis of diverse protein kinases, Nat. Methods 2005, 2(6), 435-41.
36. L.L. Looger, M.A. Dwyer, J.J. Smith, H.W. Hellinga, Computational design of receptor and sensor proteins with novel functions, Nature 2003, 423(6936), 185-90.
37. T. Kortemme, D. Baker, Computational design of protein-protein interactions, Curr. Opin. Chem. Biol. 2004, 8(1), 91-7.
38. S. Atwell, M. Ultsch, A.M. De Vos, J.A. Wells, Structural plasticity in a remodeled protein-protein interface, Science 1997, 278(5340), 1125-8.
39. H.A. Greisman, C.O. Pabo, A general strategy for selecting high-affinity zinc finger proteins for diverse dna target sites, Science 1997, 275(5300), 657-61.
40. R.R. Beerli, B. Dreier, C.F. Barbas, Engineering polydactyl zinc-finger transcription factors, Nat. Biotechnol. 2002, 20(2), 135-41.
41. M.D. Simon, K.M. Shokat, Adaptability at a protein-dna interface: re-engineering the engrailed homeodomain to recognize an unnatural nucleotide, J. Am. Chem. Soc. 2004, 126(26), 8078-9.
3.2 Controlling Protein Function by Caged Compounds
Andrea Giordano, Sirus Zarbakhsh, and Carsten Schultz
3.2.1 Introduction
One of the major tasks in biological sciences is to dissect complex specimens to learn more about structures, their functions, and the connections between the components. These days, science is focusing predominantly on the microscopic and molecular level and therefore the behavior of each molecule, its fate, its mobility, and the interaction with other molecules is of interest. To achieve this, it is required to generate data with high spatial and temporal resolution. Most standard methods cannot provide the latter, because they require the destruction of cells. Even modern techniques like ribonucleic acid interference (RNAi) or artificial expression of proteins are crude in this respect because large populations of molecules are affected. It would be most desirable to interfere with a small subset of molecules in a specific area of a cell or an organism. Even more advanced would be techniques that permit the onset of a biochemical reaction or a translocation event at a certain time point and under the control of the observer. Photoactivatable compounds could serve these purposes. With a flash of light focused at a particular region of the specimen, a biologically active compound may be generated or destroyed within seconds. The caged compound is usually a small molecule that is able to modulate protein function [1]. In the last decade or so, proteins or peptides themselves are increasingly equipped with photoactivatable groups generating switchable, biologically active molecules under the direct control of the experimentalist [2, 3]. When applied to proteins, the photolytic removal would activate or inactivate the molecule spontaneously thus mimicking fast intracellular changes in enzyme activity. In a few cases, the methodology was used for other macromolecules like DNA and RNA [4-6]. This chapter gives a brief overview of the various known caging groups suitable for forming caged proteins, their pros and cons, and the methods of introducing the groups chemically.
Chiefly, the current knowledge of applying cages to proteins and the questions answered by using caged proteins are described. During the preparation of this manuscript, a splendid book describing most of our knowledge on caged compounds and proteins was published [7].
Chemical Biology. From Small Molecules to System Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
3.2.2 Photoactivatable Groups and Their Applications
3.2.2.1 Nitrobenzyl and Nitrophenyl Groups
In 1962, Barltrop et al. reported the release of glycine from its nitrobenzyl carbamate upon photolysis [8]. Today, the o-nitrobenzyl group and its derivatives are the most prevalent photocleavable caging groups in use. Formally, the reaction is a photochemically induced isomerization of o-nitrobenzyl alcohol into o-nitrosobenzaldehyde, thereby releasing the substituent as the free acid (Scheme 3.2-1). Esters, carbamates, and carbonates are converted into an acetal derivative that spontaneously collapses into the aldehyde and the released fragment. If the leaving group is a carbamate or a carbonate, the latter undergoes spontaneous decarboxylation and yields free amines or alcohols, respectively. The groups are usually uncharged, of average lipophilicity, and fairly small; all features that are desirable for cell applications. Nitrobenzyl groups as well as other caging groups were successfully employed, especially to mask charged groups like acids, phosphates, and amines (as carbamates) [9]. For compounds like cAMP, the corresponding nitrobenzyl or coumaryl esters were rendered uncharged by the masking groups and the compounds were, therefore, able to penetrate cell membranes [11, 12]. After photolysis, however, the released charged compounds were again impermeable and hence trapped inside cells. This prodrug-like approach combines two crucial features of biochemical tools: cell permeability and photoactivation. This combination of properties could also be of major interest in peptide-based tools in the future. The unsubstituted 2-nitrobenzyl (NB) group (Fig. 3.2-1A) has several shortcomings that limit its application. First, the wavelength that is required for deprotection (260 nm) is too short for optical equipment and is known to damage living cells [13]. Second, the NB caging group is not suitable to examine fast reactions because there is a lag of a few milliseconds between the photolysis and the release of the bioactive molecule [14, 15].
Third, the photoproduct 2-nitrosobenzaldehyde may react with the released compound or other components, leading to cell damage [16]. These three factors (photolysis wavelength, kinetics, and by-product) are the most relevant for all cages used in living cells. A more suitable photolysis by-product is released from the 1-(2-nitrophenyl)ethyl (NPE) group (Fig. 3.2-1B) [16], which is also removed by short-wavelength UV light (265 nm). It generates the less reactive nitrosoacetophenone and therefore exhibits less toxicity. Also, the photolysis rates of NPE are significantly higher at 260 nm than those of NB (10000 versus 850 s⁻¹). Even better are α-carboxy-2-nitrobenzyl (CNB) groups (Fig. 3.2-1C, 17000 s⁻¹) [17]. However, the NPE group is chiral, a property that is often undesirable because of the formation of diastereomers with chiral biomolecules. The diastereomers might have different biological and photochemical properties, and separation is usually difficult on a preparative scale. NPE-caged ATP was used to probe the kinetics of muscle contraction, but its release rate was modest and, more importantly, the caged compound was not completely inactive [18, 19]. Sometimes, the increased lipophilicity of the cage is undesirable. To prevent the interaction of NPE-caged carbamoylcholine with the nicotinic acetylcholine receptor before photolysis, a negatively charged
Fig. 3.2-1 Structures of nitrobenzyl groups used for light-induced deprotection: (A) NB, (B) NPE, (C) CNB, (D) Npg, (E) DMNB, (G) DMNPE, (H) DNP, (K) NPT, (L) DMNPT. X represents a leaving group, either in the reagent used to introduce the cage or for the photochemical release.
carboxylate group was attached to the cage (CNB, Fig. 3.2-1C), eliminating the problem [17]. In addition, this CNB group showed faster release kinetics than the NB group [17]. CNB has also been successfully used to cage glycine derivatives [20]. However, additional charges are not always beneficial. cAMP-dependent protein kinase A (PKA) was reacted with CNB bromide to yield a caged version of the enzyme [21]. The caging group was introduced at Cys199 and inactivated PKA. Unfortunately, the caged protein was unable to undergo significant photoactivation. In contrast, PKA modified with simple o-nitrobenzyl bromide not only exhibited a substantial loss in kinase activity but also showed a 20-30-fold reactivation of the catalytic activity upon exposure to UV light (for more detailed information on caged PKA, see below). A particular form of CNB is (2-nitrophenyl)glycine (Npg). This artificial amino acid (Fig. 3.2-1D) was successfully incorporated into ion channels like the nicotinic acetylcholine receptor [22] by nonsense suppression, a technique developed by Peter Schultz and coworkers [23-26]. Irradiation (4 h, >360 nm) of proteins containing Npg led to peptide backbone cleavage in Xenopus oocytes [22]. Like the nitrobenzyl group, NPE and CNB groups absorb only weakly at wavelengths greater than 340 nm, which limits applications in the suitable range of 350-400 nm. Wavelengths under 300 nm are inconvenient because of considerable absorption by proteins and nucleic acids as well as by any kind of glass, including microscope lenses. This limitation was overcome when electron-donating groups were added to the aromatic moiety. The 4,5-dimethoxy-2-nitrobenzyl (DMNB) cage (2-nitroveratryl, Fig. 3.2-1E) was introduced in 1970 by Patchornik and Woodward as
a “nitrogen” protecting group [27]. The substituents on the aromatic ring
were located to give a major absorption band at 350 nm. This relatively long wavelength is attractive because absorption of radiation by proteins and nucleic acids is significantly reduced. To this day, DMNB remains one of the few photolabile protecting groups that work at lower energy levels (up to 420 nm). Marriott employed DMNB chloroformate (Fig. 3.2-2A) to cage G-actin at Lys61 [28]. This modification blocked the polymerization of G-actin to F-actin. Additionally, he prepared a cysteine-caged myosin using DMNB bromide [29]. DMNB chloroformate and DMNB bromide (Fig. 3.2-2A and B) are both commercially available and are the most commonly used reagents to introduce the DMNB group. Nitrophenyl-substituted Michael acceptor systems (Fig. 3.2-2C) have also been employed to cage proteins, for instance β-galactosidase, probably by reaction with a cysteine residue [30]. Katritzky et al. examined how the electronic nature of nitrobenzyl groups and two different types of linkage, ether and carbonate, affect photolysis [31]. The 4-monomethoxy-substituted nitrobenzyl group (Fig. 3.2-1F) had a more electron-rich benzylic carbon atom than the 4,5-dimethoxy-substituted nitrobenzyl compounds because, according to the authors, the methoxy substituent in the meta position was electron withdrawing with respect to the benzylic carbon atom. On the basis of quantitative structure-activity relationship calculations, it was expected that monomethoxy-substituted nitrobenzyl molecules would decompose faster than their dimethoxy analogs under photolysis conditions [31]. Dimethoxy substitution of caged nitrobenzyl phenylephrine increased the maximum absorption wavelength and also increased the rate of photolysis relative to the unsubstituted nitrobenzyl phenylephrine analog, showing that electron-donating benzyl substituents promote photolytic cleavage of 2-nitrobenzyl phenolic ethers.
Furthermore, it was shown that molecules with ether linkages decompose faster than molecules with a carbonate linkage. The faster release kinetics of DMNB-caged compounds compared to the corresponding NPE-caged versions were demonstrated for caged cyclic nucleotides [32]. The 1-(4,5-dimethoxy-2-nitrophenyl)ethyl (DMNPE) group (Fig. 3.2-1G), which combines the modifications of the DMNB and NPE groups, failed to show fast release kinetics with ATP or amino acids [32, 33]. As mentioned above, another major problem is the formation of diastereomers due to the
Fig. 3.2-2 Structures of commonly used DMNB reagents.
stereocenter at the benzylic carbon atom. As expected, the DMNPE group is removed with UV light >350 nm, which is less harmful to cells. Furthermore, the photo-by-product is again a nitrosoacetophenone that is less reactive than the corresponding aldehyde released by photolysis of commonly used o-nitrobenzyl caging groups. Therefore, depending on the application, the use of the DMNPE group might be beneficial, especially when the formation of diastereomers does not cause problems. The isomeric 2-ethyl form [34] as well as the related 2-propyl variety [35] were also examined as caging groups. Here, photorelease occurs via β-elimination. Because of favorable quantum yields, these groups may be among the most promising caging groups for future applications. Some of the isomeric nitroaromatic groups were tried as photocages for phosphates in the 1960s. The 3,5-dinitrophenyl (DNP) (Fig. 3.2-1H) caged inorganic phosphate was converted by irradiation at 300-360 nm (ε about 3000 M⁻¹ cm⁻¹) with a reasonable quantum efficiency (0.67) and released phosphate at >10⁴ s⁻¹ at pH 7. However, the only successful example that employed the DNP group was the photorelease of phosphate in crystals of glycogen phosphorylase b, which made it possible to monitor the catalytic cycle by Laue X-ray diffraction [36]. DNP-caged ATP was at least 100-fold less photosensitive than DNP phosphate, clearly a setback for applications involving compounds with a chromophore. Recently, N-methyl-N-(2-nitrophenyl)carbamoyl chloride (MNPCC) was introduced to specifically mask the catalytic serine in butyrylcholinesterase (BChE). Reactivation was achieved by irradiation at 365 nm [37]. Very recent additions to the nitrobenzyl-based photocleavable protecting groups are the 1-(2-nitrophenyl)-2,2,2-trifluoroethyl (NPT) (Fig. 3.2-1K) and the 1-(4,5-dimethoxy-2-nitrophenyl)-2,2,2-trifluoroethyl (DMNPT) (Fig. 3.2-1L) groups [38].
However, these groups are not stable under the harsh reaction conditions of the Williamson synthesis. Therefore, the NPT and DMNPT groups had to be attached to various alcohols via Mitsunobu coupling. Primary alcohols reacted in good yields, while secondary alcohols coupled only poorly. An advantage of the NPT and DMNPT groups is their high quantum yields (0.4-0.7). Unfortunately, besides the slow fragmentation kinetics observed for decaging alcohols [38], these caging groups exhibited very poor hydrolytic stability for carboxylic esters (M. Goeldner, personal communication). An interesting nitrobenzyl-based photocage is the 2,2'-dinitrobenzhydryl (DNB) group [27]. Here, the benzylic methylene group is substituted with another o-nitrophenyl group. This group, which was used to cage amino acids, does not lead to diastereomers because of its symmetry. The related bis(2-nitro-4,5-dimethoxyphenyl)methyl group was used to cage ion chelators [39, 40]. A novel cage variety is the 2-(dimethylamino)-5-nitrophenyl (DMNP) group. With its major absorption band at 400 nm, fast release kinetics, and a decent extinction coefficient (9000 M⁻¹ cm⁻¹), this group appears to be promising for in vivo applications [41].
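To put the extinction coefficients and quantum yields quoted above into perspective, the following minimal Python sketch applies the Beer-Lambert law (A = ε·c·l) and the commonly used product ε·Φ as a rough figure of merit for photolysis efficiency. The ε values (3000 M⁻¹ cm⁻¹ for DNP phosphate, 9000 M⁻¹ cm⁻¹ for DMNP) and the quantum efficiency of 0.67 are taken from the text; the concentration (100 µM) and path length (1 cm) are purely illustrative assumptions, not values from the cited studies.

```python
def absorbed_fraction(epsilon: float, conc_molar: float, path_cm: float) -> float:
    """Fraction of incident light absorbed according to the Beer-Lambert law."""
    absorbance = epsilon * conc_molar * path_cm
    return 1.0 - 10.0 ** (-absorbance)


def uncaging_efficiency(epsilon: float, quantum_yield: float) -> float:
    """Product of extinction coefficient and quantum yield (M^-1 cm^-1)."""
    return epsilon * quantum_yield


# Extinction coefficients (M^-1 cm^-1) quoted in the text.
EPSILON = {"DNP phosphate": 3000.0, "DMNP": 9000.0}

# Hypothetical sample conditions, chosen only for illustration.
ASSUMED_CONC_M = 100e-6
ASSUMED_PATH_CM = 1.0

fractions = {
    name: absorbed_fraction(eps, ASSUMED_CONC_M, ASSUMED_PATH_CM)
    for name, eps in EPSILON.items()
}
# Quantum efficiency of 0.67 is the value quoted for DNP-caged phosphate.
dnp_efficiency = uncaging_efficiency(EPSILON["DNP phosphate"], 0.67)

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of incident light absorbed")
print(f"DNP phosphate: epsilon * phi = {dnp_efficiency:.0f} M^-1 cm^-1")
```

Under these assumed conditions, the threefold higher extinction coefficient of DMNP translates into a substantially larger fraction of absorbed light, which illustrates why a high ε at long wavelengths is attractive for in vivo work.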
Is it possible to use several of these photoactivatable groups in one molecule for orthogonal deprotection by wavelength-selective cleavage? First attempts with various nitrobenzyl derivatives were only partially successful, mainly because of energy transfer between the chromophores [42, 43].
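The release rate constants quoted in this section for NB, NPE, and CNB (850, 10000, and 17000 s⁻¹) can be converted into half-times with t½ = ln 2 / k. The Python sketch below does this under the simplifying assumption that release is a single first-order step, which is an idealization and not a claim about the actual photolysis mechanism.

```python
import math

# Release rate constants (s^-1) quoted in the text for photolysis near 260 nm.
RELEASE_RATES = {"NB": 850.0, "NPE": 10_000.0, "CNB": 17_000.0}


def half_time_ms(k_per_s: float) -> float:
    """Half-time of an assumed first-order release step, in milliseconds."""
    return math.log(2) / k_per_s * 1e3


half_times = {cage: half_time_ms(k) for cage, k in RELEASE_RATES.items()}

for cage, t_half in half_times.items():
    print(f"{cage}: t1/2 = {t_half:.3f} ms")
```

For NB this gives a half-time of roughly 0.8 ms, the same order of magnitude as the millisecond lag mentioned for this cage, while NPE and CNB release more than ten times faster.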
3.2.2.2 Other Caging Groups
A significant number of photoremovable protecting groups that are not nitrobenzyl derivatives have been devised by organic chemists for applications in peptide and nucleotide synthesis. These groups and their respective uses have been extensively reviewed [9, 44]. We will therefore describe in detail only those groups that have proved useful for caging peptides and proteins. However, several caging groups used to date for small molecules or as photoremovable protecting groups for synthetic purposes may be very useful for applications with proteins in the future. Unfortunately, many of these require photolysis with short-wavelength ultraviolet light (<300 nm) and would be impractical for biological systems. Some, however, are cleaved at longer wavelengths and do not cause the photodestruction of amino acids such as tryptophan and tyrosine. These are, in particular, phenacyl esters. They were used to mask phosphates [45, 46] and peptides [47] and generated mostly phenylacetic acid derivatives after photocleavage owing to an intramolecular rearrangement [48, 49]. Sheehan introduced substituted benzoin esters as a protecting group for the carboxyl group over 30 years ago [50]. Later, this moiety was reinvestigated as a replacement for the NPE group to protect phosphates. Promising results were achieved with α-benzoyl-3,5-dimethoxybenzyl phosphate owing to a high quantum yield (0.78 at 347 nm/0.64 at 366 nm) and fast photolysis rates (>10’ s⁻¹) [51, 52]. A water-soluble diacetic acid derivative was also introduced [53]. A very elegant application of a benzoin group is the formation of a peptidic loop by cyclization via a bifunctional chromophore that keeps the peptide in a partially unfolded state. Photolysis of the benzoin broke the cyclic structure, thereby permitting the peptide to fold, which was followed by CD spectroscopy [54].
Other groups like the sisyl (tris(trimethylsilyl)silyl) group are probably too lipophilic to be used in an aqueous environment and might interfere with protein conformation or solubility [55]. This problem was also anticipated for coumarin-based cages. While coumarins were successfully used for caging γ-aminobutyric acid (GABA) derivatives [56] and for two-photon photolysis of glutamate in brain slices [57], the (7-methoxycoumarin-4-yl)methyl esters of cAMP and cGMP were poorly soluble [58, 59]. More recently, however, the substituted coumarylmethyl esters (7-diethylaminocoumarin-4-yl)methyl ester (DEACM), (7-carboxymethoxycoumarin-4-yl)methyl ester (CMCM), and [6,7-bis(carboxymethoxy)coumarin-4-yl]methyl ester (BCMCM) were developed to cage cyclic nucleotide monophosphates. The CMCM and BCMCM groups increased the hydrophilicity and solved the solubility problem [59]. The DEACM protecting group, on the other hand, exhibited remarkable photochemical
properties [60]. The caged cyclic nucleotides could be efficiently released at nondamaging wavelengths (405 nm). All caged compounds were released very quickly and showed very high rates of photocleavage. 7-Hydroxycoumarinylmethyl esters of cAMP were also sufficiently soluble to allow for biological applications [61]. Hence, coumarin-based groups have a high potential for successful applications in proteins. Other groups worth investigating are aryl azides [62], nitroindolines [63, 64], as well as N-acyl-2-thionothiazolidines [65] and 5-azido-1,3,4-oxadiazoles [66]. Most of these groups suffer from laborious preparation procedures or have simply not been investigated for applications with large molecules. Exceptions are cinnamate-based caging groups.
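The recurring preference for longer photolysis wavelengths can be made quantitative through the photon energy, E = N_A·h·c/λ. The Python sketch below evaluates this expression for several wavelengths mentioned in the text (260, 300, 350, 365, and 405 nm); the physical constants are the exact SI values, and the selection of wavelengths is simply an illustrative sample.

```python
PLANCK = 6.62607015e-34     # Planck constant, J s (exact SI value)
LIGHT_SPEED = 2.99792458e8  # speed of light, m/s (exact SI value)
AVOGADRO = 6.02214076e23    # Avogadro constant, 1/mol (exact SI value)


def photon_energy_kj_per_mol(wavelength_nm: float) -> float:
    """Energy of one mole of photons at the given wavelength, in kJ/mol."""
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK * LIGHT_SPEED / wavelength_m * AVOGADRO / 1e3


# Wavelengths (nm) mentioned in the text for various caging groups.
WAVELENGTHS_NM = [260, 300, 350, 365, 405]
energies = {w: photon_energy_kj_per_mol(w) for w in WAVELENGTHS_NM}

for wavelength, energy in energies.items():
    print(f"{wavelength} nm: {energy:.0f} kJ/mol")
```

Moving from 260 nm to 405 nm lowers the energy per mole of photons from about 460 to about 295 kJ/mol, which is one way to rationalize why cages cleaved at longer wavelengths are less damaging to cells.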
3.2.2.3 Vinylogenic Photocleavable Groups
The cinnamate cage was used in one of the earliest examples of a caged enzyme. In contrast to other caging groups, the cinnamate cage relies on E → Z photoisomerization (Scheme 3.2-2). Porter and coworkers showed that a number of serine proteinases could be inactivated with p-amidinophenyl-o-hydroxymethylcinnamate, which forms a stable acyl enzyme intermediate upon release of the p-amidinophenol leaving group [67, 68]. After photoisomerization to the Z derivative, the aromatic hydroxy group was sufficiently close to the ester of the acylated enzyme to permit reesterification (Scheme 3.2-2). This sterically favorable arrangement allowed the regeneration of the free serine hydroxy group and gave the decaged protein. Limitations reside in the extensive overlap between the absorbance spectra of enzyme and inhibitor: the intensity of the light source had to be substantial, while long irradiations degraded the enzyme. Other photocleavable protecting groups that take advantage of E → Z photoisomerization are the vinylsilanes (Fig. 3.2-3) [69, 70]. Unfortunately, these compounds require harsh, short-wavelength light (254 nm) for photoconversion. The introduction of a methylenedioxy group (Fig. 3.2-3B) failed to shift the absorption to longer wavelengths, but the naphthalene derivative (Fig. 3.2-3C) was effectively photolyzed at 350 nm in methanol.
3.2.2.4 Attaching Photoactivatable Groups
The introduction of a cage usually requires a nucleophilic group on the molecule of interest. The relevant groups in proteins and peptides are amino, thiol, or
Scheme 3.2-2 Decaging of a proteinase via an intramolecular reesterification.
Fig. 3.2-3 Vinylsilanes as photocleavable protecting groups require photoisomerization (E → Z).
alcohol groups. Amino groups react readily with chloroformate derivatives (Scheme 3.2-3). In fact, the most commonly used nitrobenzyl derivative (DMNB-OCOCl) is commercially available. Other reagents are prepared by reaction of the alcohol with phosgene or, alternatively, with carbonyldiimidazole (CDI) [42]. Caging reactions proceed under mild conditions in aqueous solution at slightly basic pH (9-10) [28]. An alternative is the use of p-nitrophenyl carbonate esters. The leaving group permitted the formation of a carbamate directly from the hydrochloride salt of glutamate in the presence of 4-(dimethylamino)pyridine (DMAP) at room temperature (Scheme 3.2-3) [57]. Thiol groups are preferentially reacted with arylmethyl halides, for instance, bromo nitrobenzyl derivatives (Scheme 3.2-4). The conditions are extremely mild (Tris buffer, pH 7.2), and reactions were reported to be complete within an hour [71]. When the reactive caging group is equipped with a suitable amino acid docking sequence, a specific cysteine can be labeled, even with a 300-fold excess of the reagent [21]. Another photoactivatable caging reagent that covalently binds to thiols in proteins is the α-haloacetophenone group. Its aromatic character is recognized particularly well by phosphotyrosine phosphatases (PTPs) [72, 73]. Accordingly, haloacetophenone groups are potent photoreleasable inhibitors of PTPs in vitro. No details about the labeling procedure have been published so far. Labeling serine and threonine residues is of special interest because of their role as acceptors for posttranslational modifications, namely, phosphorylation.
Scheme 3.2-3 Introduction of caging groups to amino residues.
Scheme 3.2-4 Introduction of caging groups to thiol residues (Tris buffer, pH 7.2). X is OH or halogen.
To achieve the necessary alkylation, much harsher conditions are required. Unfortunately, the strongly basic conditions of the Williamson ether synthesis are unsuitable for halogenated o-nitrobenzyl reagents [74]. A more suitable leaving group than the halogen is the well-known trichloroacetimidate group (Scheme 3.2-5). However, the reaction requires strongly acidic conditions (CF3SO3H) and is used for protected amino acids rather than entire peptides [75]. A milder method that is suitable for caging hydroxyl groups in proteins is reesterification with p-amidino esters of arylcinnamates. With the help of the leaving group, deactivation of thrombin was achieved within 8 h at pH 7.4 [68]. The phosphorylated varieties are as important for functional studies of peptides and proteins as the hydroxyl groups. Since the nucleophilicity of a phosphate is only moderate, thiophosphates are frequently used as targets for caging reactions. The same conditions that work for labeling cysteines are applied to thiophosphates [71]. Alternatively, 4-hydroxyphenacyl bromide (HP-Br) was employed to label a thiophosphothreonine in protein kinase A under very mild conditions (1 mM reagent, pH 7.3) [76]. For peptides, caged phosphates can be conveniently introduced during solid-phase synthesis via phosphorus(III) reagents [77-80]. The above-mentioned coumarin cages were introduced into cAMP or cGMP via the corresponding diazoalkanes [60]. The introduction of cages via diazo compounds is highly versatile and has been used for numerous applications, in particular for caging small biologically active phosphate esters like ATP and myo-inositol 1,4,5-trisphosphate (InsP3) [10, 81]. Usually, the carbonyl
Scheme 3.2-5 A method that does not require base to form ethers of hydroxy amino acids (CF3SO3H, CH2Cl2).
Scheme 3.2-6 Two commonly used synthetic routes to diazo compounds. Ts = tosyl.
derivative of the caging group was reacted with hydrazine, followed by oxidation to the diazo compound in the presence of MnO2 (Scheme 3.2-6(a)) [10, 81, 82]. After the removal of MnO2 by filtration and several washes, the diazo reagent was mostly used without further purification. In an alternative method, a tosyl hydrazone was formed. Treatment with base then gave the diazo compound (Scheme 3.2-6(b)) [60, 61].
3.2.3 Caged Peptides and Proteins
The synthesis of caged peptides is accompanied by a series of obstacles, which explains why so few caged peptides have been available compared with other low-molecular-weight caged species. Proteins contain a variety of nucleophilic sites, and therefore the major problem is the site-specific modification of a protein with an exogenous caging agent. Furthermore, the absence of an appropriate nucleophilic residue at or near the desired site of modification can be a problem. Finally, unlike low-molecular-weight compounds, proteins and most peptides are generally not membrane-permeant. The most obvious way to prepare a caged protein seems to be the addition of a photoactivatable group to a residue that is essential for protein function. The problem is that the chemistry has to deal with entire proteins and that the residue of interest is usually not unique within the protein. Nevertheless, several approaches have addressed the direct introduction of caging groups into proteins, either on several residues simultaneously or specifically on a single amino acid side chain.
3.2.3.1 Multiresidue Protein Caging
Preparation of caged proteins by introduction of an o-nitrobenzyl group directed toward specific residues dates back to the mid-1990s. In a pilot study, bovine serum albumin (BSA) was randomly labeled with up to 15
o-nitrobenzyl groups at Lys residues using either 2-nitrobenzyl alcohol or 1-(2-nitrophenyl)ethanol in the presence of diphosgene or 1,1'-CDI, which yielded up to 90% of caged protein [83]. Notably, the secondary alcohol coupled with diphosgene, but not with 1,1'-CDI. Exposure of NB-labeled BSA to UV light led to the release of about 60% of the coupled cages. The incomplete photolysis was probably due to the propensity of the photoproduct nitrosobenzaldehyde either to back-react with the protein or to dimerize to azobenzene-2,2'-dicarboxylic acid, which was suggested to act as an internal filter that lowers the efficiency of photolysis [84]. NPE-labeled BSA, on the other hand, readily furnished up to 95% of the native protein after UV treatment (365 nm), with a time-dependent release of about 1/3 of the coupled residues after 1-2 min and about 2/3 after 5 min of exposure. Applying the same caging strategy and using antibodies as models for both receptors and ligands, these authors successfully modulated the affinity of antibody-binding sites for antigen, antigen-binding sites for antibodies, and antibody Fc binding sites for protein A using NPE-coated human IgG before and after UV treatment [85]. With the aim of studying the regulation of the G-actin monomer pool and the assembly of F-actin filaments in living cells, Marriott described both the preparation and the properties of G-actin conjugates [28, 86]. Using the lysine-directed 4,5-dimethoxy-2-nitrobenzyl chloroformate (DMNB-OCOCl) and an optimized water-based protocol that avoided overlabeling of the target protein (and thus circumvented problems of denaturation, insolubility, and low yields of photoactivation), caged monomeric G-actin was prepared in 30-60% yield, with an average of four DMNB groups per monomer. Such Lys61-caged G-actin proved unable to polymerize to F-actin in vitro, confirming that residue Lys61 forms part of an actin-actin interface in F-actin.
Upon photodeprotection with UV light (320-400 nm) for 12 min, polymerized F-actin was obtained in 60-95% yield. More recently, Lys-targeted protein caging with DMNB-OCOCl was performed on the G-actin binding protein thymosin β4 (Tβ4) [87]. Tβ4 is thought to be involved in the regulation of the large intracellular G-actin pool. Native Tβ4 is known to inhibit actin polymerization in vitro by binding to G-actin via a conserved nine-residue segment (LKKTETQEK, residues 17-25) [88]. In the cited study, DMNB-labeled Tβ4 was shown to be unable to bind to G-actin in vitro, as indicated by the unaffected rate of polymerization compared to control actin. Subsequently, DMNB-labeled Tβ4 was introduced by bead loading into locomoting fish epithelial keratocytes and was photoactivated locally in the cell wings [87]. Upon UV irradiation (365 nm), very specific changes in the global locomotory pattern of keratocytes were observed in vivo, with noticeable turning of cells. These observations may be explained by a local perturbation of actin filament dynamics brought about by the spontaneous increase in the concentration of active, decaged Tβ4 in the region of irradiation.
3.2.3.2 Single Residue Protein Caging
A second labeling strategy for the preparation of caged protein conjugates is based on the targeted modification of essential cysteine residues using photolabile alkyl halides [86], such as 2-bromo-2-(2-nitrophenyl)acetic acid (CNB-Br), NB-Br, or DMNB-Br. Proteins to be caged at Cys residues can be engineered from other proteins by cysteine-scanning mutagenesis: the useful mutant is the one that is inactive only after labeling with a thiol-reactive caging reagent. Because only a single cage group has to be removed from a cysteine-targeted caged protein, the photoactivation yield is usually higher than for DMNB-caged proteins [89]. The main disadvantage of this approach may be the need to generate and screen a large collection of mutants. The synthesis and use of the water-soluble CNB-Br as a Cys-targeted caging reagent was reported by Bayley and coworkers [90]. Staphylococcal α-hemolysin (αHL) is a toxic polypeptide lacking cysteine residues. The protein self-assembles to form a heptameric pore in cell membranes. The single-cysteine mutant R104C maintained this feature, while pore-forming activity toward rabbit erythrocytes was lost upon derivatization of Cys104 with CNB-Br (100 10 mM dithiothreitol in aqueous buffer at pH 8.5, yield ca. 80%). Toxicity of the R104C mutant was regenerated by photoactivation with UV light (300 nm, 30 min, yield ca. 60%) and subsequent exposure to rabbit erythrocytes (Fig. 3.2-4). Marriott and Heidecker reported a Cys-caged heavy meromyosin (HMM) prepared with DMNB-Br and evaluated the capacity of photoactivated HMM to couple the energy of calcium/actin-activated ATP hydrolysis to the movement of F-actin filaments in an in vitro motility assay [29, 86]. It was known from labeling studies with the thiol-reactive fluorophore tetramethylrhodamine
Fig. 3.2-4 Hemolytic activity of decaged R104C α-hemolysin (black circles) toward rabbit red blood cells (rRBC), measured by monitoring light scattering at 595 nm as a function of time (min), versus a nonilluminated sample (white circles). With permission from Ref. [90].
iodoacetate (IA-TMR) directed against Cys707 that this residue was crucial for the sliding of F-actin filaments in the in vitro motility assay. Therefore, it was reasoned that Cys707-caged HMM could show a similar behavior, which eventually could be reversed upon photoactivation. HMM was reacted with DMNB-Br in aqueous buffer at pH 7.4. Two cage groups per HMM molecule (or one cage per ATPase domain of HMM) were incorporated in the reported protocol. Although the calcium/ATPase activity of purified caged HMM was increased fivefold compared to unlabeled HMM, caged HMM failed to produce appreciable sliding of F-actin filaments unless irradiated with pulsed (500 ms) 340-400 nm UV light. These conditions produced sliding of 90% of F-actin filaments in the in vitro motility assay with a velocity of up to 4 µm s⁻¹, a value comparable to that of unmodified HMM [%I. Protein kinases constitute a large family of enzymes (>500) whose activity includes the transfer of the γ-phosphoryl group of ATP to serine, threonine, and tyrosine residues in a wide range of protein substrates, giving rise to a large collection of phosphorylation-based signal transduction pathways. A kinase that can be activated with well-defined spatial and temporal control is of invaluable utility in elucidating many aspects of signal transduction phenomena in living cells, under both physiological and pathological conditions. One of the best-studied kinases is protein kinase A. An interesting comparison of the behavior of three different caged catalytic subunits of PKA was reported by Bayley and colleagues [91]. Working with a single-cysteine mutant (C343S) of the murine catalytic subunit of PKA, the unique Cys residue 199 was masked with the thiol-reactive cage groups NB-Br, CNB-Br, and DMNB-Br. Cys199 lies in close proximity to the critical Thr197 in the “activation loop” of the enzyme [92].
The caged proteins showed, as expected, a significant inactivation when kinase activity was tested in vitro with the artificial substrate Kemptide (LRRASLG). Interestingly, among the three, only the NB-caged enzyme showed low residual activity after caging (3-5%) and satisfactory activity after photolysis (pH 6.0, 80-100%) with respect to the unmodified enzyme. Moreover, the quantum yield of photolysis was an impressive 0.84. The “lesson” from this work, in the authors' phrasing, is that for a given target protein a variety of photoremovable protecting groups has to be tested, since a reagent that works well with one protein (for instance, the CNB-caged αHL described earlier) may not work well with others. Cofilin is a kinase-regulated, F-actin binding protein whose activation state is regulated by phosphorylation at Ser3 by the LIM-domain-containing kinase (LIM kinase). Unphosphorylated cofilin monomers bind cooperatively to F-actin in vitro, leading to depolymerization of actin filaments [93], while phosphorylation by LIM kinase inactivates these features of cofilin function (Fig. 3.2-5). Lawrence and coworkers [94] observed that the cysteine mutant S3C cofilin is constitutively active because it is unable to undergo phosphorylation by LIM kinase, while CNB-caged S3C cofilin is unable to depolymerize actin filaments in vitro. This shows the importance of Ser3 for cofilin activity. Accordingly, S3C cofilin activity was restored up to 80% upon irradiation and
Fig. 3.2-5 Activity of cofilin initiated by local decaging. A 2-s laser pulse aimed at the area indicated in F gave local protrusion within 1 to 3 min. With permission from Ref. [95].
depolymerization of rhodamine-labeled actin filaments was assessed via an in vitro light microscopy assay. Subsequently, these investigators elegantly extended the study of cofilin's role to the in vivo setting by microinjecting caged CNB-S3C cofilin (up to 20 µM) into MTLn3 carcinoma cells and exposing cell territories to UV irradiation [95]. Cell-wide photoactivation increased free barbed ends, F-actin content, and cellular locomotion, while highly localized activation generated lamellipodia and determined the direction of cell locomotion. Demonstrating the power of caged proteins in biological investigations in vivo, this study expanded the known role of cofilin beyond that in in vitro motility models, where cofilin was predicted only to depolymerize F-actin. Protein phosphorylation on tyrosine residues is an important posttranslational modification that plays a vital part both in physiological processes, such as transmembrane signaling, and in pathological processes, for instance, in cancer and immune dysfunctions [96]. The levels of tyrosine phosphorylation are regulated by the opposing action of protein tyrosine kinases (PTKs), which catalyze the formation of phosphotyrosine residues (pY) on target proteins, and phosphotyrosine phosphatases (PTPs), which hydrolyze pY. PTPs of various origins share a common domain of about 250 residues containing the unique “signature motif” (I/V)-HCxAGxxR(S/T), in which the catalytic cysteine is located [97]. Because PTPs are generally less well characterized than protein kinases, their precise role in physiological and pathological conditions still remains to be investigated in more detail. Recently, α-halogenated acetophenones (phenacyl groups) have been reported as a novel, membrane-permeant class of caging reagents that is not based on o-nitrobenzyl. They are capable of covalent, photoreversible (350 nm) inhibition of PTPs at the catalytic cysteine (Scheme 3.2-7) [72, 73]. The different
α-bromo and α-chloro acetophenone derivatives were employed in vitro to cage the catalytic cysteine of various prototypical phosphatases such as PTP1B, SHP-1, and the catalytic domain of SHP-1, SHP-1(ΔSH2). Recovery of enzyme activity after irradiation at 350 nm (15 min) reached, in some cases, a maximum of 80% of the original value.
In recent years, reports have demonstrated the possibility of producing caged proteins by targeting specific amino acid residues other than lysine or cysteine. After having described a catalytic Cα subunit of PKA caged at Cys199, Bayley along with Zou and others presented a Cα caged at the active threonine (Thr197) using the above-mentioned 4-hydroxyphenacyl photoremovable protecting group [76]. The advantages of this caging group over the classical o-nitrobenzyl derivatives were the rapid photodeprotection (k ≈ 10⁷ to 10⁸ s⁻¹) and the lack of reactivity of the photolysis product 4-hydroxyphenylacetic acid [47, 98]. The phenacyl methodology was also employed to prepare caged thiophosphoryl peptides (see also below) [76, 99]: the Cα catalytic subunit was first expressed as a recombinant mutant protein (H6-T197C199A/C343S) in Escherichia coli. Exclusive thiophosphorylation of Thr197 was performed with the phosphoinositide-dependent kinase (PDK-1) in the presence of ATP(γ)S. Thiophosphorylation was confirmed by Western blotting and gel-shift electrophoresis. Finally, purified thiophosphorylated Cα was caged with 4-HP-Br (Scheme 3.2-8), giving rise to the modified protein HP-PsT197Cα, which showed an 18-fold reduction of specific kinase activity in vitro toward Kemptide. Activation by photolysis was performed with UV light (312 nm) at pH 7.3 with an 85-90% yield in photoactivation, a quantum yield of 0.21, and a 15-fold increase in activity. These are promising values for future in vivo studies.
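The figures quoted for the 4-hydroxyphenacyl-caged Cα subunit can be checked against each other with a few lines of arithmetic. The following is a minimal sketch, not part of the cited work: it assumes that the release step is first order (so that a rate constant in s⁻¹ translates into a half-life) and that the 18-fold reduction and the 15-fold increase refer to the same specific-activity scale. The function names are hypothetical.

```python
import math


def first_order_half_life(rate_constant_per_s: float) -> float:
    """Half-life in seconds of a first-order step with rate constant k (s^-1).

    Assumption: photorelease follows simple first-order kinetics.
    """
    return math.log(2) / rate_constant_per_s


def activity_after_uncaging(fold_reduction: float, fold_increase: float) -> float:
    """Fraction of native activity reached after photolysis.

    Assumption: caged activity = native / fold_reduction, and
    photolysed activity = caged activity * fold_increase.
    """
    return fold_increase / fold_reduction


# Values quoted in the text: k of 1e7 to 1e8 s^-1, 18-fold reduction, 15-fold increase.
for k in (1e7, 1e8):
    print(f"k = {k:.0e} s^-1 -> half-life = {first_order_half_life(k) * 1e9:.1f} ns")
print(f"activity after uncaging = {activity_after_uncaging(18, 15):.0%} of native")
```

Under these assumptions the release half-life lies between roughly 7 and 70 ns, and a 15-fold gain on an 18-fold loss corresponds to about 83% of native activity, which is close to the 85-90% photoactivation yield reported.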
Scheme 3.2-7 Cysteine-containing proteins like phosphatases are caged in the active site with phenacyl bromides or chlorides (X = Cl, Br; R = H, CH3, CH2COOH).
Scheme 3.2-8 Caging of the catalytic subunit Cα of PKA was achieved by thiophosphorylation and subsequent alkylation of the thiophosphate by 4-hydroxyphenacyl bromide (HP-Br).
Photoregulation of the catalytic activity of natural and recombinant human BChE was described in 2003 [37]. This enzyme is closely related to acetylcholinesterase (AChE), the serine hydrolase that terminates cholinergic transmission by hydrolysis of the neurotransmitter acetylcholine. Although its endogenous substrate has not yet been identified, this enzyme plays a key role in detoxification by degrading esters such as succinylcholine and cocaine. In the reported study, BChE was treated with a novel photoremovable alcohol-protecting group, MNPCC, targeted at the catalytic serine residue of the enzyme. MNPCC seemed to act as a pseudoirreversible inhibitor, and the X-ray structure of the MNPCC:BChE conjugate showed an unambiguous carbamylation of the catalytic residue as the only modification on the protein [37]. Reactivation of the caged enzyme was obtained at 365 nm (20 min, pH 7.4) and exhibited an efficiency larger than 80%, as determined by the Ellman test. The same group had previously set out to exploit efficient photoregulation in crystals of the MNPCC:BChE conjugate to further determine the mechanistic properties of BChE by time-resolved X-ray crystallography under cryophotolytic conditions [100].
3.2.4 Caged Proteins by Introduction of Photoactive Residues via Site-Directed, Unnatural Amino Acid Mutagenesis
Photochemical control of processes such as protein folding, protein-protein or protein-ligand interactions may be achieved via an alternative procedure by which the photochemical trigger - that is, the caged amino acid - is directly incorporated into the native protein sequence as an unnatural residue. The elegant and sophisticated - yet laborious - biosynthetic methodology introduced by Peter Schultz made a wider exploration of protein functions possible by de facto expanding the natural genetic code [23-25]. Introduction of an unnatural amino acid follows a series of defined steps that are summarized here briefly: (a) the codon for the amino acid to be replaced
is substituted with a nonsense codon (like the amber stop codon UAG) via standard site-directed mutagenesis, (b) a specific “nonsense suppressor” tRNA able to recognize this codon is prepared and acylated with the desired unnatural amino acid, and (c) addition of the mutagenized gene or mRNA and the aminoacylated suppressor tRNA to an in vitro extract or biosynthetic apparatus generates a mutant protein containing the unnatural amino acid at the desired position. Thus, the generation of the specific suppressor tRNA, its acylation with the unnatural residue, and the synthesis of a sufficient amount of mutagenized protein are the key steps of the entire methodology, which has more recently been expanded in some technical aspects from its original design [101-103].
With this technique, caged amino acids have been successfully introduced into various protein sequences as unnatural residues. Enzymatic catalysis before and after photoirradiation has been explored by means of caged residues replacing the natural ones in critical positions. Schultz and coworkers described a mutant phage T4 lysozyme (T4L) containing an aspartyl β-nitrobenzyl ester in place of the wild-type Asp20 in the active site of the enzyme [104]. This residue, along with Glu11, is responsible for the catalytic activity [105]. The caged protein, produced in 37% yield, showed no activity in vitro. After irradiation at 315 nm (Hg-Xe arc lamp, 1000 W), activity was restored to 32% of that of the wild-type enzyme. In another experiment these investigators managed to photochemically initiate protein splicing from the Thermococcus litoralis Vent DNA polymerase by introducing the 2-nitrobenzyl ether of serine in place of the conserved Ser1082 [106]. NB- or DMNB-caged aspartates were instrumental in controlling the dimerization of HIV-1 protease [107]. This enzyme exists as a 22-kDa monomer that self-assembles into the active dimeric aspartyl protease.
The active site is placed at the interface of the homodimer and consists of Asp25 and Asp125, both necessary for the proteolytic activity [108, 109]. Introduction of an NB-Asp into position 25 led to minimal proteolytic activity, while its recovery after UV irradiation (500 W mercury-xenon lamp, 10 min, 0 °C, pH 6.0) was about 97%, as revealed by a fluorescence-based protease assay [110]. The introduction of the caged aspartate did not prevent dimerization, suggesting that H bonding involving the wild-type residue is not a prerequisite for monomer association of HIV-1 protease. Instead, it was believed that it affected the stability of the dimer [107]. A similar behavior was shown by the H133A mutant of BamHI endonuclease having incorporated a caged Lys132 [111]. Lys132, along with Glu167, Glu170, and His133, participates in the salt-bridge network at the dimer interface of the active wild-type enzyme [112, 113]. Site-directed introduction of DMNB-OCOLys132 (yield 55%) in the H133A mutant did not prevent dimer formation but abolished enzyme activity almost completely. Photoirradiation (365 nm, 20 min, 0 °C) led to a recovery of both activity and specificity toward a substrate DNA (λ DNA). A different behavior was shown for the H133A BamHI mutant incorporating DMNB-Glu167 or DMNB-Glu170, which did not
exhibit recovery of activity after photoactivation, suggesting misfolding of the protein subsequent to the introduction of these caged residues. A site-directed incorporation of a phenylazo-Phe residue (azoAla) at the same position 132 was also performed (incorporation efficiency of 52%) [114]. Dimer formation and enzyme activity were achieved by inducing trans-cis photoisomerization of the azobenzene moiety. The substitution K132azoAla produced a mutant enzyme with drastically reduced activity (measured by cleavage efficiency of a DNA substrate), while after irradiation and trans-cis isomerization almost full activity was recovered compared to the wild-type enzyme. Thus, in its trans conformation, the bulkiness of the azoAla residue prevented a correct association of monomers, while the more compact size of the cis isomer did not preclude the proper assembly into the active form. Gradual gain of activity was observed within 5 min of photoirradiation (366 nm, 0 °C) without further increase over a total exposure time of 20 min.
Several proteins are naturally produced as inactive proenzymes and acquire full activity only when cleaved at a specific position by another enzyme. Caspase-3, a cysteine protease, is a key component of the apoptosis signaling pathway. Its inactive form procaspase-3 is cleaved at position Ser176 by caspase-8 in the “death receptor-induced” apoptosis pathway, eventually forming the active tetramer. Majima and coworkers artificially reproduced the activation mechanism of procaspase-3 by photoinducing the cleavage of the backbone in a mutant protein containing an Npg residue specifically introduced at position 176 [115]. The incorporation efficiency of Npg by using an in vitro transcription/translation system was only 15%.
Nevertheless, photoactivation (366 nm, 0 °C, up to 10 min exposure time) of Npg-caspase-3 was followed within 1 min by a clear increase in enzymatic activity, as quantified by the change in fluorescence of the peptidic substrate Z-DEVD-rhodamine 110.
All these studies were performed in vitro. Some in vivo experiments with caged proteins engineered by nonsense suppression were successful, especially on the acetylcholine receptor. In the mouse muscle nicotinic receptor (nAChR), NB-tyrosine was incorporated at positions 93 and 198 of the α subunit. These are conserved residues crucial for acetylcholine binding. The mutagenized mRNA and the corresponding nonsense suppressor tRNA charged with NB-Tyr were injected into Xenopus oocytes. The channel was successfully expressed and incorporated into the egg membrane [116]. In the subsequent voltage-clamp study, a train of about 20 near-UV laser pulses (300-350 nm) increased the acetylcholine-induced conductance across the membrane, with about 5% of the Tyr residues decaged in any one flash. A qualitatively similar result was achieved in another elegant experiment where the same ion channel was mutagenized by direct incorporation of NB-Cys or NB-Tyr replacing a conserved leucine residue in the γ subunit that is known to be involved in channel gating [117]. As stated by these authors, the work represented the first successful incorporation of caged amino acids into a transmembrane segment of a membrane protein. Interestingly, the presence
of the bulky nitrobenzyl group disturbed neither assembly nor trafficking of the receptor, but likely distorted its conformation, leading to an alteration of the conductance. This condition was reversed by photoactivation performed with 1-ms pulses of UV light. The different and characteristic kinetics of channel activation after flash photolysis were determined for the tyrosine- and cysteine-caged receptors. Oocytes expressing the mutant acetylcholine receptor αVal132Npg showed acetylcholine-induced conductance similar to the wild type, but upon photoinduced cleavage of the backbone in the localized region of the α subunit about 90% of the current was lost. Thus, in addition to playing a key role in the correct assembly of the various subunits, this conserved portion proved to be essential for receptor function [22]. The work of this group clearly showed the importance and usefulness of caged proteins as tools for the elucidation of protein function in living cells [118-120].
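Two points from this section lend themselves to a short sketch: step (a) of the nonsense-suppression procedure, in which the codon of the residue to be replaced is exchanged for the amber codon, and the cumulative effect of a train of flashes that each uncage a small fraction of residues. The code below is illustrative only. The toy gene and the function names are hypothetical, and the flash calculation assumes that every flash uncages the same fraction (here about 5%) of the residues that are still caged.

```python
AMBER_DNA = "TAG"  # DNA form of the amber stop codon UAG


def introduce_amber_codon(coding_dna: str, residue_number: int) -> str:
    """Replace the codon of a 1-based residue number with the amber codon."""
    seq = coding_dna.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    n_codons = len(seq) // 3
    if not 1 <= residue_number <= n_codons:
        raise ValueError("residue number lies outside the coding sequence")
    start = 3 * (residue_number - 1)
    return seq[:start] + AMBER_DNA + seq[start + 3:]


def cumulative_uncaged(fraction_per_flash: float, n_flashes: int) -> float:
    """Fraction uncaged after n flashes.

    Assumption: each flash uncages the same fraction of what is still caged.
    """
    return 1.0 - (1.0 - fraction_per_flash) ** n_flashes


# Hypothetical toy gene: Met-Ala-Asp-Lys-stop; residue 3 (Asp) is targeted.
toy_gene = "ATGGCTGATAAATAA"
print(introduce_amber_codon(toy_gene, 3))

# About 5% per flash over a train of 20 flashes, as quoted for the nAChR study.
print(f"{cumulative_uncaged(0.05, 20):.2f}")
```

Under the stated assumption, a train of 20 flashes would uncage roughly two-thirds of the caged residues, which illustrates why a pulse train rather than a single flash was used to build up the response.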
3.2.5 Small Caged Molecules Used to Control Protein Activity
An alternative to modifying the protein of interest is to control its function by an inhibiting or activating ligand. Since these ligands can be small peptides or other small molecules, a caging group is usually introduced by preparative chemistry. After decaging, interaction between ligand and protein is permitted and the protein is either silenced or activated. For live cell applications, the small molecule ligand has to be membrane-permeant or needs to be introduced by physical methods like microinjection or electroporation. Among the many caged ligands reported so far are various cyclic and noncyclic nucleotides [19, 59, 82, 121, 122], nitric oxide [123], lipids [90, 124-126], carbohydrates [80, 127, 128], inositol polyphosphates [81, 129-131], ion chelators [40, 132, 133], amino acids [57, 134, 135], receptor agonists [136, 137], and many others [138]. Because the synthesis and application of these small molecules have been thoroughly reviewed before [1, 7, 44, 139], we will not discuss them in detail.
3.2.5.1
Caged Peptides
Some of the most potent modulators of protein function are peptides. To introduce a cage at the correct position, essential residues need to be known. Alternatively, libraries of potential binding peptides have to be prepared and tested. There are only a handful of amino acid residues suitable for introducing a caged group. Typical side chains are those of the basic and acidic amino acids and the nucleophilic thiol group of cysteine. In addition, phosphorylation usually takes place at the alcohol groups of serine, threonine, or tyrosine and caging groups on these residues render the phosphorylation site inaccessible until the cage is removed. Solid phase peptide synthesis (SPPS) also permits
the introduction of phosphorylated residues equipped with a cage group
attached to the phosphate. From a synthetic standpoint, there are two ways of preparing caged phosphopeptides: by using an already assembled caged phosphoamino acid or by introducing the caged phosphate after cleavage of the mature peptide from the resin. Phosphopeptides will bind to proteins that usually interact with phosphoproteins as soon as the cage is removed. With the help of membrane-penetrating peptide sequences, “peptide interference” is now on its way into biology labs.
3.2.5.1.1
Caged Basic Residues
Caged lysine in the form of Nε-o-nitrobenzyloxycarbonyllysine was reported as a building block suitable for Fmoc-SPPS. It was used for the preparation of caged AIPs, autocamtide-2 related inhibitory peptides [2, 140]. AIP (KKALRRQEAVDAL) is a highly specific inhibitor of calcium/calmodulin-dependent protein kinase II (CaMKII). The first two lysine residues play an important role for its activity [141]. As expected, caged AIPs showed significantly reduced inhibitory activity in vitro toward CaMKII (IC50 = 1.2 x M) and gave instantaneous recovery of activity after irradiation (IC50 = 3.6 x lo-’ M, as for natural AIP). Interestingly, the photolysis byproduct nitrosobenzaldehyde did not interfere with the behavior of the photoactivated peptides.
3.2.5.1.2
Caged Tyrosine Residues
One of the first caged peptides contained an NB-caged tyrosine that was introduced via Fmoc-SPPS [142]. Fmoc-Tyr(NB) was used to prepare caged neuropeptide Y (NPY) and caged angiotensin II (AII) peptide [142]. NPY is a 36-amino acid peptide containing Tyr residues at both the N- and the C-termini. It localizes in both the central and peripheral nervous system and is potentially involved in various physiological roles, including blood pressure regulation, anxiety, circadian rhythms, and feeding behavior. Structure/activity relationship studies indicated that both the N- and the C-terminal fragments of NPY are essential for the activation of Y1 receptors [143]. Introduction of one caged Tyr at the naturally occurring Tyr positions at the N/C-termini of NPY led to a decrease of about 1 order of magnitude in the activation of the Y1 receptors in SK-N-MC cells, with additional reduction when two caged Tyr were incorporated at both termini of NPY. Restoration of activity, assessed by the binding assay performed after UV irradiation, demonstrated the successful role of the nitrobenzyl group as a cage for Tyr residues and for the NPY peptide itself. Interestingly, no differences in activity toward AII receptors in human neuroblastoma SMS-KAN cells were found between caged and unmodified AII peptides, indicating that the Tyr residue in this eight-amino acid peptide is not involved in binding to the receptor [142].
The 20-amino acid residue peptide RS-20, whose sequence derives from smooth muscle myosin light chain kinase (MLCK), is a well-known calmodulin binding peptide [144]. Both RS-20 and LMS-1, a 13-residue peptide derived from the autoinhibitory domain of MLCK, are capable of inhibiting MLCK phosphorylation activity, which is normally directed toward the actin-binding molecular motor protein myosin II, involved in physiological phenomena like cell polarization and locomotion [145, 146]. The interaction of RS-20 with its target protein calmodulin has been extensively studied, and the hydrophobic residues Trp5 and Leu18 were shown to be critical for binding [147, 148]. Tyr9 in the LMS-1 peptide is in turn crucial for the inhibitory effect, as predicted from mutagenesis studies on MLCK [149]. Walker and others expanded the study of these molecules, both in vitro and in vivo, using caged versions of both peptides (Scheme 3.2-9) [150]. Trp5 in RS-20 was replaced with a masked tyrosine bearing a CNB cage on the phenolic group. The carboxylic group of the cage mimicked the negative charge of a glutamate, a mutation known to have a negative effect on binding. Accordingly, the caged RS-20 peptide was largely unable to bind to calmodulin, as assessed in vitro by a quantitative calmodulin-dependent MLCK assay. The photoproduct 5YRS-20 generated after 10-min irradiation at 300-400 nm showed an apparent 50-fold increase in its affinity toward calmodulin. A similarly Tyr9-caged LMS-1 proved to be an effective switchable inhibitor of MLCK in vitro, being indistinguishable from authentic LMS-1 in its inhibitory potency. The effect of local photoactivation of the two caged peptides was finally assessed in vivo in fast-moving newt eosinophil cells [151]. Peptides were introduced by microinjection at an estimated concentration of 20-100 µM. Photoactivation
was performed locally by pulsed near-UV laser light (series of 10 pulses with a 5 ms duration at 20 ms intervals) with concomitant microscopic observation of cells. Photorelease of active peptides was followed, within a few seconds, by acute paralysis of cell movement, ceased flow of cytoplasmic granules, and inhibition of forward motion of the leading lamellipodia. These results suggested that calcium/calmodulin regulation of MLCK activity is a major signaling pathway underlying locomotion in eosinophil cells in vivo, and that the myosin II motor target of MLCK activity is strongly involved in these motility functions.
Scheme 3.2-9 The calmodulin binding peptide RS-20 and LMS-1, a peptide that inhibits myosin phosphorylation, caged at different tyrosines (5cgY-RS-20: H2N-ARRKYQKTGHAVRAIGRLSS-COOH; 9cgY-LMS-1: H2N-LSKDRMKKYMARR-COOH). Both peptides were successfully used in eosinophils after microinjection [151].
3.2.5.1.3
Caged Cysteine and Thiophosphoryl Residues
As mentioned above, Pan and Bayley reported a generally applicable approach for caging cysteine-containing peptides or peptides thiophosphorylated on serine residues in aqueous solution using o-nitrobenzyl bromides such as NB-Br, CNB-Br, and DMNB-Br [71]. Kemptide (LRRASLG), C-Kemptide (LRRACLG), and CS-Kemptide dimer (LRRACLGLRRASLG) were used as model peptides in this study. Both Kemptide and CS-Kemptide dimer were successfully thiophosphorylated on the Ser residue using ATP(γ)S and PKA catalytic subunit. Thiophosphorylated Kemptide was subsequently treated with each of the three cages. At pH 7.2, only NB-Br and DMNB-Br were found to react satisfactorily with the thiophosphate group, producing the corresponding caged peptides in 95% yield. CNB-Br was found to be close to unreactive (10% yield at pH 4.0); hence the synthesis of this caged peptide was no longer pursued. Photoactivation of NB- and DMNB-thiophosphoryl-caged Kemptide at 290-380 nm was obtained with yields of 70 and 55% and with quantum yields of 0.23 and 0.06, respectively (Scheme 3.2-10). Selective caging was examined on the CS-Kemptide dimer. The goal was to selectively introduce a cage at a thiophosphoryl-Ser residue over a cysteine
Scheme 3.2-10 The selective introduction of cages to thiophosphates versus cysteines is pH dependent.
residue. At pH 4.0 NB-Br (2 mM in 100 mM sodium acetate) showed good selectivity for the alkylation of thiophosphate, while at pH 7.2 both Cys and thiophosphoryl residues reacted with NB-Br, as determined by MALDI-MS. Cysteine-targeted caging of C-Kemptide was performed with all three photolabile groups mentioned above at pH 7.2 with a consistent yield of caged product (95%), while photolysis with UV light (λmax = 312 nm) gave yields varying from 62 to 70% and quantum yields from 0.15 to 0.62 at pH 5.8, with a slight decrease in performance at pH 7.2. Finally, NB-caged thiophosphoryl Kemptide was used to test the activity of phage λ protein phosphatase (λ-PPase) before and after photoactivation. The thiophosphate group of NB-caged thiophosphoryl Kemptide was fully protected against λ-PPase activity, whereas the corresponding group in the unmodified peptide was hydrolyzed to an extent of 90% when incubated at 30 °C for 3 h. After UV treatment for 40 min, the uncaged thiophosphoryl Kemptide underwent thiophosphate hydrolysis to about 70%.
A similar strategy was employed to produce caged thiophosphotyrosyl peptides [99]. The sequence EPQYEEIPILG was thiophosphorylated on the tyrosine residue by the action of hematopoietic cell kinase (Hck) in the presence of Co²⁺ ions (the authors explain how thiophosphorylation on Tyr with ATP(γ)S and tyrosine kinases failed in conditions that normally work well with standard ATP) and afterwards caged at the thiophosphate group with NB-Br or 4-HP-Br. The peptides EPQYPs(HP)EEIPILG and EPQYPs(NB)EEIPILG were obtained in 90 and 75% yield, respectively, regardless of the pH of the reaction buffer (range 5.8 to 8.0). Irradiation of the EPQYPs(HP)EEIPILG peptide at 312 nm afforded the photoproduct EPQYPsEEIPILG in 50-70% yield. Quantum yields were 0.65 and 0.56 at pH 5.8 and 7.3, respectively.
The same treatment of EPQYPs(NB)EEIPILG gave EPQYPsEEIPILG in 50 to 60% yield, with quantum yields of 0.37 and 0.25 at pH 5.8 and 7.3, respectively. It was verified that the caged peptides were no longer able to bind to an SH2 domain in vitro, while this feature was completely restored upon photolysis (Scheme 3.2-11). Despite the promising characteristics of the above described thiophosphorylated peptides (especially the HP-caged one), to the best of our knowledge, no study has yet been reported in vivo.
By means of caged peptides, Lawrence and coworkers successfully prepared a caged protein kinase A in two different ways, (a) via a peptidic affinity label [21] and (b) via a caged inhibitor [152]. The peptidic affinity label was designed to target Cys199 in the active loop of the enzyme, interacting with the PKA active site only in its caged form, while transforming itself into a low affinity ligand upon photoactivation. This peptide was synthesized by SPPS (see Fig. 3.2-6) and coupled at the C-terminus to the α-carboxyl group of a CNB cage via a diethylamine linker. The caged ligand was subsequently coupled to the thiol group of Cys199, finally affording the caged enzyme.
Scheme 3.2-11 Tyrosine residues equipped with various caging groups rendered peptides inactive with respect to SH2-domain binding.
Fig. 3.2-6 Protein kinase A labeling approach. Underlined letters represent amino acids in the one-letter code.
This caged PKA showed less than 2% of the activity displayed by the native protein, while UV irradiation (300-400 nm, up to 15 min) restored about 50% of the activity of the unmodified enzyme in vitro. Following these in vitro observations, 3-7 µM solutions of caged PKA were microinjected into living rat embryo fibroblasts (REF; a 10-fold dilution was estimated after injection) and irradiated with near-UV light (300-400 nm, up to 15 min). In these cells, photoactivation of PKA led to disruption of actin stress fibers, membrane ruffling, and change of cell shape from flat to rounded, in accordance with the phenotype observed when unmodified, active catalytic PKA subunit was injected into the same cells. Microinjected cells that were not exposed to UV irradiation retained their stress fibers and flat morphology, indicating that the PKA-induced pathway had not been activated [21]. PKI is a heat-stable protein first described in 1982 as a potent inhibitor of PKA [153]. On the basis of a short binding sequence, a potent inhibitor peptide with the sequence GRTGRRNAI was identified. The underlined Arg residue played an essential role for the inhibitory behavior of this
peptide [154]. Consequently, a peptide containing an L-ornithine replacing the arginine residue was prepared. The latter was guanidinylated to obtain a caged arginine, the first example described of this kind [152]. The guanidinylating reagent was synthesized from DMNB-OCOCl and S-methylisothiourea (Scheme 3.2-12). The caged peptide was shown to be a 50-fold poorer inhibitor of PKA in vitro (Ki = 20 µM) compared to the uncaged counterpart (Ki = 420 nM). When REFs were exposed to the membrane-permeant PKA activator 8-(4-chlorophenylthio)-cAMP (CPT-cAMP), they underwent the same morphological transformation as described above (disruption of actin stress fibers leading to cell shape changes). In contrast, cells microinjected with the caged peptide (5 µM estimated intracellular concentration) and exposed to UV irradiation were unable to respond to the CPT-cAMP stimulus, demonstrating that the CPT-cAMP activation of the PKA pathway had been efficiently blocked in vivo by the decaged peptide [152].
3.2.5.1.4
Caged Phosphorylation Sites and Caged Phosphopeptides
Recently, a Ser-caged, photoactivatable fluorescent peptide probe that monitors protein kinase C (PKC) activity was described [75]. As expected, the Ser-caged peptide failed to serve as an effective PKC substrate in vitro, but upon light-induced deprotection (300-400 nm, λmax 360 nm, 90 s), the serine became phosphorylated and enzyme activity was recorded as a convincing change in the fluorescent properties of the probe. Photoconversion was estimated to occur with 60% yield and a quantum yield of 0.06. With this probe, the investigators also studied the light-induced sampling of PKC activity in HeLa cells in vivo. Exposure of cells to phorbol ester (TPA) normally induces PKC activity. HeLa cells microinjected with the caged probe at an estimated concentration of 20 µM failed to display a fluorescent response to TPA, while a robust response was recorded as a result of concomitant TPA treatment and UV irradiation (365 nm at 1 J cm⁻²).
Scheme 3.2-12 A peptide caged at an arginine residue was prepared by attaching a DMNB-coupled S-methylisothiourea reagent to ornithine [152]. R represents further amino acids.
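The roughly 50-fold difference in inhibitory potency quoted for the arginine-caged PKI-derived peptide follows directly from the two inhibition constants once they are expressed in the same unit. The following minimal sketch takes the caged value as 20 µM and the uncaged value as 420 nM; the helper names and the unit table are hypothetical and serve only to make the unit conversion explicit.

```python
# Conversion factors from common concentration units to mol/L.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_molar(value: float, unit: str) -> float:
    """Convert a concentration given in the stated unit to mol/L."""
    return value * UNIT_TO_MOLAR[unit]


def fold_difference(ki_caged_molar: float, ki_uncaged_molar: float) -> float:
    """Ratio of inhibition constants: how much poorer the caged inhibitor is."""
    return ki_caged_molar / ki_uncaged_molar


ki_caged = to_molar(20, "uM")     # caged peptide, value taken from the text
ki_uncaged = to_molar(420, "nM")  # uncaged peptide, value taken from the text
print(f"caged/uncaged Ki ratio = {fold_difference(ki_caged, ki_uncaged):.1f}")
```

The ratio comes out just under 48, consistent with the approximately 50-fold figure given for this peptide.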
The phosphorylated varieties with a cage attached to the phosphate are as desirable as caged serine or threonine. Imperiali and colleagues have lately introduced an elegant and general method for the synthesis of peptides containing 2-nitrophenylethyl-caged phosphoserine, phosphothreonine, and phosphotyrosine by integrating an interassembly approach into Fmoc-SPPS [78]. The recently reported method for the synthesis of the phosphocaged Fmoc building blocks - namely, N-α-Fmoc-phospho(1-nitrophenylethyl-2-cyanoethyl)-L-serine, -threonine and -tyrosine - is superior to the introduction of cages to the growing peptide on resin. In particular, the oxidation step required in phosphorus(III) chemistry was potentially hazardous toward oxidation-sensitive residues C-terminal of the caged amino acid [79]. A caged phosphoserine octapeptide equipped with the environmentally sensitive fluorophore 6-(2-dimethylaminonaphthoyl)alanine (DANA) [155] was used in vitro to probe the phosphorylation-dependent binding to 14-3-3 proteins [156], a highly conserved family of proteins that plays a role as an intermediate in cell cycle regulation through phosphorylation-dependent protein-protein interactions [157]. The caged phosphopeptide was unable to bind to the target 14-3-3 protein, as opposed to the photoproduct after irradiation at 365 nm. This could be monitored by the shift of fluorescence of the DANA amino acid from λem1 = 522 nm (unbound peptide) to λem2 = 501 nm (bound peptide). The investigators have more recently described the use of such caged phosphoserine-containing phosphopeptides to perform a UV-induced, “chemical” knock-down of the entire 14-3-3 protein family, thereby observing the effects on cell cycle progression in vivo [158]. A derivative caged at the phosphoserine position of a good 14-3-3-binding motif sequence like MARRLYRpSLPAKK [159] was prepared by SPPS.
The efficiency of the photoactivation was first tested in vitro under conditions mimicking irradiation of cultured cells (365 nm, 90 s, 2.8 J m-’ irradiation dose). The uncaged phosphopeptide was obtained in about 80% yield with a quantum yield of 0.43 and was able to compete with cellular proteins for 14-3-3 binding in vitro, as demonstrated by competitive binding assays performed in U2OS cell lysates (Scheme 3.2-13). The caged phosphopeptide was subsequently supplied to living U2OS cells by connecting it to the cell-permeable Penetratin sequence [161] via a disulfide bond between N-terminal cysteine residues. After internalization and release from the vector peptide by spontaneous hydrolysis of the disulfide bridge, the effects of the uncaged phosphopeptide on 14-3-3 binding to natural target proteins were studied under several conditions. For instance, synchronized U2OS cells that received the peptides in an early G2 phase and were subjected to UV treatment (365 nm, 90 s) showed (a) an increased cell death ratio compared to controls, (b) an uncontrolled premature entry into M phase accompanied by mitotic catastrophe, and (c) a striking reduction in the stable G1 cell population, suggesting that 14-3-3 proteins normally regulate the onset and timing of mitosis in cycling cells and maintain stable interphase arrest in noncycling cells. The role of 14-3-3 proteins in
Scheme 3.2-13 An octapeptide equipped with the environmentally sensitive dye DANA. Only after decaging is binding to 14-3-3 domains possible; it is measured by a shift in fluorescence (522 nm to 501 nm) due to the change in the lipophilicity of the environment [160].
the S-phase checkpoint response to DNA damage is speculative, since cells incubated with caged peptides and simultaneously exposed to both UV-A and UV-B (respectively 365 and 302 nm, 90 s) to induce uncaging and DNA damage were unable to sustain S-phase arrest compared to controls, resulting in ca SO% early apoptotic cell death. To prepare larger phosphoproteins with cages on the phosphate moiety, it was necessary to combine the synthesis of caged phosphopeptides [78, 791 with expressed protein ligation [162, 1631. The ligation of a recombinant Smad2-MH2 thioester with the doubly NPE-caged C-terminal phosphopeptide yielded a recombinant protein that formed a heterodimer with the cytosolic retention factor Sara (Smad anchor for receptor activation). Decaging permitted the release of Sara and subsequently the formation of active homotrimers. Decaging was also followed in digitonin-permealized HeLa cells by monitoring nuclear entry of Srnad2-MH2 after illumination [162]. This methodology was extended using a cage in the backbone of the MH2 peptide. Photorelease of the bulky N-terminus permitted homotrimerization. This was made visible by adding fluorescein next to the phosphorylation sites and a dabcyl quencher to the N-terminus. Photoinduced homotrimerization was therefore accompanied by a strong increase of the fluorescein fluorescence [164].
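As a rough illustration of the photoactivation figures quoted above, the following sketch assumes that uncaging follows simple first-order kinetics under constant illumination. That kinetic model is an assumption made here for illustration and is not stated in the text. The apparent rate constant is back-calculated from the single reported data point (about 80% uncaged phosphopeptide after 90 s at 365 nm), and the function names are hypothetical.

```python
import math


def apparent_rate_constant(fraction_uncaged: float, seconds: float) -> float:
    """Apparent first-order rate constant (1/s) from one (time, conversion) point.

    Assumes fraction_uncaged(t) = 1 - exp(-k * t); this model is an assumption.
    """
    if not 0.0 < fraction_uncaged < 1.0:
        raise ValueError("fraction_uncaged must lie strictly between 0 and 1")
    if seconds <= 0.0:
        raise ValueError("seconds must be positive")
    return -math.log(1.0 - fraction_uncaged) / seconds


def fraction_uncaged(k: float, seconds: float) -> float:
    """Fraction of caged peptide converted after `seconds` of illumination."""
    return 1.0 - math.exp(-k * seconds)


# Reported data point: about 80% uncaging after 90 s at 365 nm.
k = apparent_rate_constant(0.80, 90.0)

for t in (30.0, 60.0, 90.0, 180.0):
    print(f"{t:5.0f} s -> {fraction_uncaged(k, t):.2f} uncaged (model estimate)")
```

Under this assumed model, doubling the irradiation time would raise the conversion only from 80% to 96%, which may help when weighing additional uncaging against UV damage to the cells.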
3.2.5.1.5
Other Caged Residues
Some N-formylated peptides are known to promote chemotaxis in mammalian leukocytes, acting specifically via the formyl peptide receptor (FPR) located on the plasma membrane of neutrophils [165]. Among them, the most active peptide is the tripeptide N-formyl-(L)Met-(L)Leu-(L)Phe. Caged versions of this peptide have been synthesized employing either nitroveratrylaldehyde or nitropiperonaldehyde as photoremovable protecting groups at the N-formyl moiety (Fig. 3.2-7) [166]. Although the described caged peptides exhibited a drop in activity of 3-4 orders of magnitude in a rat basophilic leukemia RBL2H3 cell line, a study of photoactivation in vivo and the related effects on chemotaxis has not yet been reported.
Fig. 3.2-7 A chemotactic tripeptide caged at the N-formyl group.
3.2.6 Conclusions
Caged compounds, including caged proteins, are extremely useful tools for studying biochemical processes inside and outside living cells. These molecules have been employed in a wide variety of areas. However, the overall number of research groups benefiting from the technology is still fairly small. It would be desirable if novel caging groups, caged molecules, and ready-to-use decaging equipment were more easily accessible. We will certainly see more exciting applications in the future. For this, as in more and more areas of biology, close collaboration between chemists and biologists will be indispensable.
References
1. J.M. Nerbonne, Curr. Opin. Neurobiol. 1996, 6, 379-386.
2. Y. Tatsu, Y. Yumoto, N. Shigeri, Pharmacol. Ther. 2001, 91, 85-92.
3. K. Lawrence, D.S. Curley, Curr. Opin. Chem. Biol. 1999, 3, 84-88.
4. L. Heckel, A. Krock, Angew. Chem. Int. Ed. 2005, 44, 471-473.
5. H. Furuta, T. Tsien, R.Y. Okamoto, H. Ando, Nat. Genet. 2001, 28, 317-325.
6. H. Furuta, T. Okamoto, H. Ando, Methods Cell. Biol. 2004, 77, 159-171.
7. M. Givens, R.S. Goeldner, (Eds.), Dynamic Studies in Biology, Wiley-VCH, New York, 2005.
8. J.A. Schofield, P. Barltrop, Tetrahedron Lett. 1962, 697-699.
9. C.G. Bochet, J. Chem. Soc. Perkin Trans. 1 2002, 125-142.
10. J.W. Reid, G.P. McCray, J.A. Trentham, D.R. Walker, J. Am. Chem. Soc. 1988, 110, 7170-7177.
11. V. Frings, S. Bendig, J. Lorenz, D. Wiesner, B. Kaupp, U.B. Hagen, Angew. Chem. Int. Ed. 2002, 41, 3625-3628.
12. J. Schlaeger, E.J. Engels, J. Med. Chem. 1977, 20, 907-911.
13. B.Z.U. Patchornik, A. Amit, Isr. J. Chem. 1974, 103-113.
14. H. Wong, W.K. Schnabel, W. Schupp, J. Photochem. 1987, 36, 85-97.
15. Q.Q. Schnabel, W. Schupp, H. Zhu, J. Photochem. 1987, 39, 317-332.
16. J.H. Forbush, B. Hoffman, J.F. Kaplan, Biochemistry 1978, 17, 1929-1935.
17. T. Matsubara, N. Billington, A.P. Udgaonkar, J.B. Walker, J.W. Carpenter, B.K. Webb, W.W. Marque, J. Denk, W. McCray, J.A. Hess, G.P. Milburn, Biochemistry 1989, 28, 49-55.
18. E. Millar, N.C. Homsher, Annu. Rev. Physiol. 1990, 52, 875-896.
19. J.E.T. Barth, A. Munasinghe, V.R.N. Trentham, D.R. Hutter, M.C. Corrie, J. Am. Chem. Soc. 2003, 125, 8546-8554.
20. A.P. Walstrom, K.M. Ramesh, D. Guzikowski, A.P. Carpenter, B.K. Hess, G.P. Billington, Biochemistry 1992, 31, 5500-5507.
21. K. Lawrence, D.S. Curley, J. Am. Chem. Soc. 1998, 120, 8573-8574.
22. P.M. Lester, H.A. Davidson, N. Dougherty, D.A. England, Proc. Natl. Acad. Sci. U. S. A. 1997, 94, 11025-11030.
23. V.W. Mendel, D. Schultz, P.G. Cornish, Angew. Chem. Int. Ed. 1995, 34, 621-633.
24. C.J. Anthonycahill, S.J. Griffith, M.C. Schultz, P.G. Noren, Science 1989, 244, 182-188.
25. D. Cornish, V.W. Schultz, P.G. Mendel, Annu. Rev. Biophys. Biomol. Struct. 1995, 24, 435-462.
26. L.E. Collins, C.S. Gilmore, M.A. Carlson, J.E. Ross, J.B.A. Chamberlin, A.R. Steward, J. Am. Chem. Soc. 1997, 119, 6-11.
27. B. Woodward, R.B. Patchornik, A. Amit, J. Am. Chem. Soc. 1970, 92, 6333-6335.
28. G. Marriott, Biochemistry 1994, 33, 9092-9097.
29. G. Heidecker, M. Marriott, Biochemistry 1996, 35, 3170-3174.
30. R. Zehavi, U. Naim, M. Patchornik, A. Smirnoff, P. Golan, Biochim. Biophys. Acta Prot. Struct. Mol. Enzymol. 1996, 1293, 238-242.
31. A.R. Xu, Y.J. Vakulenko, A.V. Wilcox, A.L. Bley, K.R. Katritzky, J. Org. Chem. 2003, 68, 9100-9104.
32. J.F. Wootton, D.R. Trentham in Photochemical Probes in Biochemistry, (Ed.: P.E. Nielsen), Kluwer Academic Publishers, 1989, pp 277-296.
33. M. Viola, R.W. Johnson, K.W. Billington, A.P. Carpenter, B.K. Mccray, J.A. Guzikowski, A.P. Hess, G.P. Wilcox, J. Org. Chem. 1990, 55, 1585-1589.
34. H. Eisele-Buhler, S. Hermann, C. Kvasyuk, E. Charubala, R. Pfleiderer, W. Giegrich, Nucleosides Nucleotides 1998, 17, 1987-1996.
35. K.R. DeLisi, C. Laursen, R.A. Bhushan, Tetrahedron Lett. 2003, 44, 8585-8588.
36. E.M.H. Hadfield, A. Waiters, S. Wakatsuki, S. Bryan, R.K. Johnson, L.N. Duke, Phil. Trans. Royal Soc. Ser. A Math Phys. Eng. Sci. 1992, 340, 245-261.
37. S. Nicolet, Y. Masson, P. Fontecilla-Camps, J.C. Bon, S. Nachon, F. Goeldner, M. Loudwig, ChemBioChem 2003, 4, 762-767.
38. A. Goeldner, M. Specht, Angew. Chem. Int. Ed. 2004, 43, 2008-2012.
39. M.A. Goldman, Y.E. Trentham, D.R. Ferenczi, J. Physiol. (London) 1989, 418, P155.
40. S.R. Kao, J.P.Y. Tsien, R.Y. Adams, J. Am. Chem. Soc. 1989, 111, 7957-7968.
41. A. Grewer, C. Ramakrishnan, L. Jager, J. Gameiro, A. Breitinger, H.G.A. Gee, K.R. Carpenter, B.K. Hess, G.P. Banerjee, J. Org. Chem. 2003, 68, 8361-8367.
42. C.G. Bochet, Tetrahedron Lett. 2000, 41, 6341-6346.
43. A. Bochet, C.G. Blanc, J. Org. Chem. 2002, 67, 5567-5577.
44. G. Prestwich, G.D. Dorman, Trends Biotechnol. 2000, 18, 64-77.
45. R.S. Athey, P.S. Matuszewski, B. Kueper, L.W. Xue, J.Y. Fister, T. Givens, J. Am. Chem. Soc. 1993, 115, 6001-6012.
46. R.S. Kueper, L.W. Givens, Chem. Rev. 1993, 93, 55-66.
47. R.S. Jung, A. Park, C.H. Weber, J. Bartlett, W. Givens, J. Am. Chem. Soc. 1997, 119, 8369-8370.
48. K. Corrie, J.E.T. Munasinghe, V.R.N. Wan, P. Zhang, J. Am. Chem. Soc. 1999, 121, 5625-5632.
49. A. Falvey, D.E. Banerjee, J. Am. Chem. Soc. 1998, 120, 2965-2966.
50. J.C. Wilson, R.M. Sheehan, J. Am. Chem. Soc. 1964, 86, 5277.
51. J.E.T. Trentham, D.R. Corrie, J. Chem. Soc. Perkin Trans. 1 1992, 2409-2417.
52. Y.J. Corrie, J.E.T. Wan, P. Shi, J. Org. Chem. 1997, 62, 8278-8279.
53. R.S. Chan, S.I. Rock, J. Am. Chem. Soc. 1998, 120, 10766-10767.
54. K.C. Rock, R.S. Larsen, R.W. Chan, S.I. Hansen, J. Am. Chem. Soc. 2000, 122, 11567-11568.
55. M.A. Balduzzi, S. Mohamed, M. Gottardo, C. Brook, Tetrahedron 1999, 55, 10027-10040.
56. B. Kullmann, P.H. Bier, M.E. Kandler, K. Schmidt, B.F. Curten, Photochem. Photobiol. 2005, 81, 641-648.
57. T. Wang, S.S.H. Dantzker, J.L. Dore, T.M. Bybee, W.J. Callaway, E.M. Denk, W. Tsien, R.Y. Furuta, Proc. Natl. Acad. Sci. U. S. A. 1999, 96, 1193-1200.
58. T. Torigai, H. Sugimoto, M. Iwamura, M. Furuta, J. Org. Chem. 1995, 60, 3953-3956.
59. V. Bendig, J. Frings, S. Eckardt, T. Helm, S. Reuter, D. Kaupp, U.B. Hagen, Angew. Chem. Int. Ed. 2001, 40, 1045-1048.
60. V. Frings, S. Wiesner, B. Helm, S. Kaupp, U.B. Bendig, J. Hagen, ChemBioChem 2003, 4, 434-442.
61. T. Takeuchi, H. Isozaki, M. Takahashi, Y. Kanehara, M. Sugimoto, M. Watanabe, T. Noguchi, K. Dore, T.M. Kurahashi, T. Iwamura, M. Tsien, R.Y. Furuta, ChemBioChem 2004, 5, 1119-1128.
62. D.H.R. Sammes, P.G. Weingarten, G.G. Barton, J. Chem. Soc. (C) 1971, 721-725.
63. D.A. Patchornik, A. Amit, B. Ben-Efraim, J. Am. Chem. Soc. 1976, 843-844.
64. G. Ogden, D.C. Barth, A. Corrie, J.E.T. Papageorgiou, J. Am. Chem. Soc. 1999, 121, 6503-6504.
65. L.P.J. White, J.D. Burton, Tetrahedron Lett. 1980, 21, 3147-3150.
66. P.N. Woodward, R.B. Confalone, J. Am. Chem. Soc. 1983, 105, 902-906.
67. A.D. Pizzo, S.V. Rozakis, G.W. Porter, N.A. Turner, J. Am. Chem. Soc. 1987, 109, 1274-1275.
68. A.D. Pizzo, S.V. Rozakis, G. Porter, N.A. Turner, J. Am. Chem. Soc. 1988, 110, 244-250.
69. M.C. Lee, Y.R. Pirrung, J. Org. Chem. 1993, 58, 6961-6963.
70. M.C. Fallon, L. Zhu, J. Lee, Y.R. Pirrung, J. Am. Chem. Soc. 2001, 123, 3638-3643.
71. P. Bayley, H. Pan, FEBS Lett. 1997, 405, 81-85.
72. G. Guo, X.C. Beebe, K.D. Coggeshall, K.M. Pei, D. Arabaci, J. Am. Chem. Soc. 1999, 121, 5085-5086.
73. G. Yi, T. Fu, H. Porter, M.E. Beebe, K.D. Pei, D.H. Arabaci, Bioorg. Med. Chem. Lett. 2002, 12, 3047-3050.
74. L. Goeldner, M. Peng, J. Org. Chem. 1996, 61, 185-191.
75. W.F. Nguyen, Q. McMaster, G. Lawrence, D.S. Veldhuyzen, J. Am. Chem. Soc. 2003, 125, 13358-13359.
76. K.Y. Cheley, S. Givens, R.S. Bayley, H. Zou, J. Am. Chem. Soc. 2002, 124, 8220-8229.
77. D.M. Peterson, E.J. Vazquez, M.E. Brandt, G.S. Dougherty, D.A. Imperiali, B. Rothman, J. Am. Chem. Soc. 2005, 127, 846-847.
78. D.M. Vazquez, E.M. Vogel, E.M. Imperiali, B. Rothman, Org. Lett. 2002, 4, 2865-2868.
79. D.M. Vazquez, M.E. Vogel, E.M. Imperiali, B. Rothman, J. Org. Chem. 2003, 68, 6795-6798.
80. C. Wichmann, O. Schultz, C. Dinkel, Tetrahedron Lett. 2003, 44, 1153-1155.
81. J.W. Feeney, J. Trentham, D.R. Walker, Biochemistry 1989, 28, 3272-3280.
82. J.W. Reid, G.P. Trentham, D.R. Walker, Methods Enzymol. 1989, 172, 288-301.
83. S. Spoors, J.A. Fawcett, M.C. Self, C.H. Thompson, Biochem. Biophys. Res. Commun. 1994, 201, 1213-1219.
84. V.N. Pillai, Synthesis 1980, 1-26.
85. C.H. Thompson, S. Self, Nat. Med. 1996, 2, 817-820.
86. G. Ottl, J. Heidecker, M. Gabriel, D. Marriott, Methods Enzymol. 1998, 291, 95-116.
87. P. Rajfur, Z. Jones, D. Marriott, G. Loew, L. Jacobson, K. Roy, J. Cell Biol. 2001, 153, 1035-1047.
88. D. Nachmias, V.T. Safer, Bioessays 1994, 16, 473-479.
89. G. Roy, P. Jacobson, K. Marriott, Methods Enzymol. 2003, 360, 274-288.
90. C.Y. Niblack, B. Walker, B. Bayley, H. Chang, Chem. Biol. 1995, 2, 391-400.
91. C.Y. Fernandez, T. Panchal, R. Bayley, H. Chang, J. Am. Chem. Soc. 1998, 120, 7661-7662.
92. L.N. Noble, M.E.M. Owen, D.J. Johnson, Cell 1996, 85, 149-158.
93. J.R. McGough, A. Ono, S. Bamburg, Trends Cell. Biol. 1999, 9, 364-370.
94. M. Ichetovkin, I. Song, X.Y. Condeelis, J.S. Lawrence, D.S. Ghosh, J. Am. Chem. Soc. 2002, 124, 2440-2441.
95. M. Song, X.Y. Mouneimne, G. Sidani, M. Lawrence, D.S. Condeelis, J.S. Ghosh, Science 2004, 304, 743-746.
96. T. Hunter, Cell 1995, 80, 225-236.
97. B.G. Tonks, N.K. Neel, Curr. Opin. Cell. Biol. 1997, 9, 193-204.
98. C.H. Givens, R.S. Park, J. Am. Chem. Soc. 1997, 119, 2453-2463.
99. K.Y. Miller, W.T. Givens, R.S. Bayley, H. Zou, Angew. Chem. Int. Ed. 2001, 40, 3049-3051.
100. A. Ursby, T. Weik, M. Peng, L. Kroon, J. Bourgeois, D. Goeldner, M. Specht, ChemBioChem 2001, 2, 845-848.
101. L. Brock, A. Herberich, B. Schultz, P.G. Wang, Science 2001, 292, 498-500.
102. T. Ashizuka, Y. Murakami, H. Sisido, M. Hohsaka, Nucleic Acids Res. 2001, 29, 3646-3651.
103. T. Ashizuka, Y. Taira, H. Murakami, H. Sisido, M. Hohsaka, Biochemistry 2001, 40, 11060-11064.
104. D. Ellman, J.A. Schultz, P.G. Mendel, J. Am. Chem. Soc. 1991, 113, 2758-2760.
105. L.H. Matthews, B.W. Weaver, J. Mol. Biol. 1987, 193, 189-199.
106. S.N. Jack, W.E. Xiong, X. Danley, L.E. Ellman, J.A. Schultz, P.G. Noren, C.J. Cook, Angew. Chem. Int. Ed. 1995, 34, 1629-1630.
107. G.F. Lodder, M. Laikhter, A.L. Arslan, T. Hecht, S.M. Short, J. Am. Chem. Soc. 1999, 121, 478-479.
108. L.J. Tomaszek, T.A. Roberts, G.D. Carr, S.A. Magaard, V.W. Bryan, H.L. Fakhoury, S.A. Moore, M.L. Minnich, M.D. Culp, J.S. Desjarlais, R.L. Meek, T.D. Hyland, Biochemistry 1991, 30, 8441-8453.
109. L.J. Tomaszek, T.A. Meek, T.D. Hyland, Biochemistry 1991, 30, 8454-8463.
110. E.D. Wang, G.T. Krafft, G.A. Erickson, J. Matayoshi, Science 1990, 247, 954-958.
111. M. Nakayama, K. Majima, T. Endo, J. Org. Chem. 2004, 69, 4292-4298.
112. M. Strzelecka, T. Dorner, L.F. Schildkraut, I. Aggarwal, A.K. Newman, Structure 1994, 2, 439-452.
113. M. Strzelecka, T. Dorner, L.F. Schildkraut, I. Aggarwal, A.K. Newman, Nature 1994, 368, 660-664.
114. K. Endo, M. Majima, T. Nakayama, Chem. Commun. 2004, 2386-2387.
115. M. Nakayama, K. Kaida, Y. Majima, T. Endo, Angew. Chem. Int. Ed. 2004, 43, 5643-5645.
116. J.C. Silverman, S.K. England, P.M. Dougherty, D.A. Lester, H.A. Miller, Neuron 1998, 20, 619-624.
117. K.D. Gallivan, J.P. Brandt, G.S. Dougherty, D.A. Lester, H.A. Philipson, Am. J. Physiol. Cell. Physiol. 2001, 281, C195-C206.
118. E.J. Brandt, G.S. Zacharias, N.M. Dougherty, D.A. Lester, H.A. Petersson, Biophotonics Pt A 2003, 360, 258-273.
119. G.S. Tong, Y.H. Li, M. Lester, H.A. Dougherty, D.A. Brandt, Biochemistry 2000, 39, 1575-1576.
120. Y.H. Brandt, G.S. Li, M. Shapovalov, G. Slimko, E. Karschin, A. Dougherty, D.A. Lester, H.A. Tong, J. Gen. Physiol. 2001, 117, 103-118.
121. R. Gee, K. Lee, H.C. Aarhus, J. Biol. Chem. 1995, 270, 7745-7749.
122. L.J. Corrie, J.E.T. Wootton, J.F. Wang, J. Org. Chem. 2002, 67, 3474-3478.
123. L.R. Tsien, R.Y. Makings, J. Biol. Chem. 1994, 269, 6282-6285.
124. X.P. Sreekumar, R. Patel, J.R. Walker, J.W. Huang, Biophys. J. 1996, 70, 2448-2457.
125. J. Gadella, T.W.J. Goedhart, Biochemistry 2004, 43, 4263-4271.
126. B.T. Reich, R. Neeman, M. Bercovici, T. Liscovitch, M. Williger, J. Biol. Chem. 1995, 270, 29656-29659.
127. S. Hirokawa, R. Iwamura, M. Watanabe, Bioorg. Med. Chem. Lett. 1998, 8, 3375-3378.
128. J.E.T. Corrie, J. Chem. Soc. Perkin Trans. 1 1993, 2161-2166.
129. W.-h. Llopis, J. Whitney, M. Zlokarnik, G. Tsien, R.Y. Li, Nature 1998, 392, 936-941.
130. C. Schultz, C. Dinkel, Tetrahedron Lett. 2003, 44, 1157-1159.
131. J.A. Prestwich, G.D. Chen, Tetrahedron Lett. 1997, 38, 969-972.
132. S.R. Kao, J.P.Y. Grynkiewicz, G. Minta, A. Tsien, R.Y. Adams, J. Am. Chem. Soc. 1988, 110, 3212-3220.
133. G.C.R. Kaplan, J.H. Barsotti, R.J. Ellis-Davies, Biophys. J. 1996, 70, 1006-1016.
134. R. Ramesh, D. Carpenter, B.K. Hess, G.P. Wieboldt, Biochemistry 1994, 33, 1526-1533.
135. F.M. Margulis, M. Tang, C.M. Kao, J.P.Y. Rossi, J. Biol. Chem. 1997, 272, 32933-32939.
136. L. Wieboldt, R. Ramesh, D. Carpenter, B.K. Hess, G.P. Niu, Biochemistry 1996, 35, 8136-8142.
137. F.M. Kao, J.P.Y. Rossi, J. Biol. Chem. 1997, 272, 3266-3271.
138. Y.Q. Angleson, J.K. Kutateladze, A.G. Wan, J. Am. Chem. Soc. 2002, 124, 5610-5611.
139. S.R. Tsien, R.Y. Adams, Annu. Rev. Physiol. 1993, 55, 755-784.
140. Y. Shigeri, Y. Ishida, A. Kameshita, I. Fujisawa, H. Yumoto, N. Tatsu, Bioorg. Med. Chem. Lett. 1999, 9, 1093-1096.
141. A. Shigeri, Y. Tatsu, Y. Uegaki, K. Kameshita, I. Okuno, S. Kitani, T. Yumoto, N. Fujisawa, H. Ishida, FEBS Lett. 1998, 427, 115-118.
142. Y. Shigeri, Y. Sogabe, S. Yumoto, N. Yoshikawa, S. Tatsu, Biochem. Biophys. Res. Commun. 1996, 227, 688-693.
143. A.G. Jung, G. Beck-Sickinger, Biopolymers 1995, 37, 123-142.
144. T.J. Burgess, W.H. Prendergast, F.G. Lau, W. Watterson, D.M. Lukas, Biochemistry 1986, 25, 1458-1464.
145. K. Debiasio, R. Taylor, D.L. Hahn, Nature 1992, 359, 736-738.
146. K.A. Taylor, D.L. Giuliano, Curr. Opin. Cell. Biol. 1995, 7, 4-12.
147. M. Clore, G.M. Gronenborn, A.M. Zhu, G. Klee, C.B. Bax, A. Ikura, Science 1992, 256, 632-638.
148. A. Ikura, M. Crivici, Annu. Rev. Biophys. Biomol. Struct. 1995, 24, 85-116.
149. M. Ikebe, R. Matsuura, M. Ikebe, M. Tanaka, EMBO J. 1995, 14, 2839-2846.
150. R. Ikebe, M. Fay, F.S. Walker, J.W. Sreekumar, Methods Enzymol. 1998, 291, 78-94.
151. J.W. Gilbert, S.H. Drummond, R.M. Yamada, M. Sreekumar, R. Carraway, R.E. Ikebe, M. Fay, F.S. Walker, Proc. Natl. Acad. Sci. U. S. A. 1998, 95, 1568-1573.
152. J.S. Koszelak, M. Liu, J. Lawrence, D.S. Wood, J. Am. Chem. Soc. 1998, 120, 7145-7146.
153. S. Walsh, D.A. Whitehouse, J. Biol. Chem. 1982, 257, 6028-6032.
154. H.C. Kemp, B.E. Pearson, R.B. Smith, A.J. Misconi, L. Vanpatten, S.M. Walsh, D.A. Cheng, J. Biol. Chem. 1986, 261, 989-992.
155. B.E. McAnaney, T.B. Park, E.S. Jan, Y.N. Boxer, S.G. Jan, L.Y. Cohen, Science 2002, 296, 1700-1703.
156. M.E. Rothman, D.M. Imperiali, B. Vazquez, Org. Biomol. Chem. 2004, 2, 1965-1966.
157. A.J. Tanner, J.W. Allen, P.M. Shaw, A.S. Muslin, Cell 1996, 84, 889-897.
158. A. Rothman, D.M. Stehn, J. Imperiali, B. Yaffe, M.B. Nguyen, Nat. Biotechnol. 2004, 22, 993-1000.
159. M.B. Rittinger, K. Volinia, S. Caron, P.R. Aitken, A. Leffers, H. Gamblin, S.J. Smerdon, S.J. Cantley, L.C. Yaffe, Cell 1997, 91, 961-971.
160. M.E. Nitz, M. Stehn, J. Yaffe, M.B. Imperiali, B. Vazquez, J. Am. Chem. Soc. 2003, 125, 10150-10151.
161. D. Chassaing, G. Prochiantz, A. Derossi, Trends Cell. Biol. 1998, 8, 84-87.
162. M.E. Muir, T.W. Hahn, Angew. Chem. Int. Ed. 2004, 43, 5800-5803.
163. T.W. Muir, Annu. Rev. Biochem. 2003, 72, 249-289.
164. J.P. Hahn, M.E. Muir, T.W. Pellois, J. Am. Chem. Soc. 2004, 126, 7170-7171.
165. E.L. Bleich, H.E. Day, A.R. Freer, R.J. Glasel, J.A. Visintainer, J. Becker, Biochemistry 1979, 18, 4656-4668.
166. M.C. Drabik, S.J. Ahamed, J. Ali, H. Pirrung, Bioconjug. Chem. 2000, 11, 679-681.
Chemical Biology. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
3.3 Engineering Control Over Protein Function; Transcription Control by Small Molecules
John T. Koh
Outlook
Ligand-inducible transcription factors, whether derived from heterologously expressed prokaryotic regulatory proteins or reengineered eukaryotic receptors, continue to play an invaluable role in studying gene function. Through the study of reengineering ligand-binding specificities of nuclear receptors and other transcription factors, new tools for exploring emerging extranuclear roles for these receptors can be generated. Developing new strategies for selective, functionally orthogonal, ligand-receptor pairs can be applied more broadly in chemical biology in the form of chemical inducers of dimerization (CIDs), or analog-specific enzymes. Similar design principles may also be applied to the functional rescue of disease-associated mutant proteins that have defects in binding small molecules. The impact that ligand-inducible transcription factors have had on the study of biology over the past decade highlights the importance of developing new methods to precisely manipulate and study complex biological systems at the molecular level. The availability of multiple ligand-dependent transcription factors further increases the level of complexity and sophistication with which we can probe complex biological phenomena. In the future new systems such as light-directed transcription control may play a powerful role in dissecting the roles of genes that act through their unique spatiotemporal patterns in tissue. These efforts will similarly require continued development of new tools based on the marriage of both chemical and biological methods.
3.3.1 Introduction
This chapter reviews strategies for manipulating or engineering de novo proteins that can regulate gene expression in response to small molecules. Methods that allow us to control the expression of genes in a spatially and temporally defined manner provide powerful tools for the study of gene function. The study of naturally occurring ligand-inducible transcriptional regulators affords insights into the strategies that nature uses to remotely regulate protein function, thus providing a basis with which to control and study the actions of virtually any gene product through the remote regulation of its expression. Ligand-receptor engineering can be used to create new transcriptional regulators, to provide the means to selectively activate one of many cellular pathways responsive to the same ligand, and may further provide new strategies to rescue disease-associated mutants of ligand-dependent proteins. In addition, new methods to control gene expression with light can be used to spatially and temporally pattern genes in tissues.
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
3.3.2 The Role of Ligand-dependent Transcriptional Regulators
Many naturally occurring inducible transcriptional regulators have been used in heterologous systems for controlling protein expression. More recently, a number of research groups have used a combination of chemical and genetic approaches to reengineer the specificity of transcriptional regulators [1-3]. Emerging methods may allow one to convert otherwise nonligand-responsive proteins into ligand-responsive systems. Several new technologies offer unprecedented control over gene expression using nucleic acids, such as antisense, ribozyme interference, and RNAi [4-6]. New methods to control mRNA translation in a ligand-dependent manner add a further dimension of control over gene expression [7, 8]. These methods can be used in conjunction with ligand-dependent expression systems to provide spatial and temporal control of genes. In addition to strictly “on/off” responses, dose-dependent or “rheostatic” control of expression can provide exquisite control of gene functions that depend on specific stoichiometries or spatiotemporal patterning.
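The contrast between “on/off” and “rheostatic” control can be made concrete with a simple dose-response sketch. The code below assumes that reporter expression follows a Hill-type saturation curve in ligand concentration. This is a common modelling assumption rather than something the chapter specifies, and every name and number in it (EC50, basal and maximal levels, Hill coefficient) is hypothetical.

```python
def expression_level(ligand_nM: float, ec50_nM: float, basal: float,
                     maximal: float, hill: float = 1.0) -> float:
    """Expression (arbitrary units) at a given inducer concentration.

    Hill-type model (an assumption): occupancy = L^n / (EC50^n + L^n).
    """
    if ligand_nM < 0.0 or ec50_nM <= 0.0:
        raise ValueError("ligand_nM must be >= 0 and ec50_nM must be > 0")
    occupancy = ligand_nM ** hill / (ec50_nM ** hill + ligand_nM ** hill)
    return basal + (maximal - basal) * occupancy


def fold_induction(ligand_nM: float, ec50_nM: float, basal: float,
                   maximal: float, hill: float = 1.0) -> float:
    """Expression relative to the uninduced (basal) level."""
    return expression_level(ligand_nM, ec50_nM, basal, maximal, hill) / basal


# Hypothetical system: EC50 of 10 nM, basal level 1, maximal level 1000.
for dose in (0.0, 1.0, 10.0, 100.0, 1000.0):
    level = expression_level(dose, ec50_nM=10.0, basal=1.0, maximal=1000.0)
    print(f"{dose:7.1f} nM inducer -> {level:8.1f} arbitrary units")
```

With a Hill coefficient near 1 the response is graded over roughly two orders of magnitude of inducer concentration, which is the regime in which rheostatic control is practical; a larger coefficient would make the same system behave more like a switch.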
3.3.2.1
Ligand-dependent Transcriptional Regulators Derived from Natural Repressors
In practical applications, ligand-regulated transcription factors need to function in a highly specific manner and should ideally affect only the expression of the gene of interest. Some of the most common ligand-inducible transcriptional regulators are derived from naturally occurring proteins. For example, the lac repressor binds to operators (i.e., genetic binding sites) in the promoter region preceding genes, and blocks transcription (Fig. 3.3-1). The lac repressor forms a homotetramer that spans two operator sites and blocks association of the transcription machinery through a combination of direct occlusion of DNA-binding sites and perturbation of DNA topology [9, 10]. Binding of the small molecule lactose or the stable synthetic analog isopropylthiogalactopyranoside (IPTG) occurs near a dimerization interface and induces a conformational change that disrupts oligomerization and DNA binding, thus exposing the full promoter sequence and allowing gene expression. The lac operon is highly inducible and widely used for controlling protein production in prokaryotes. Such control is particularly important when expression of the target protein is toxic to the cell or otherwise adversely affects growth. Several prokaryotic repressors can also be used in eukaryotes. Most notable is the tetracycline (TET) inducible expression system, which has had a tremendous impact on the study of eukaryotic protein function [11, 12]. Similar to LacR, DNA binding of TET is also conformationally controlled by the association of a small molecule, tetracycline (tet). Tet binding triggers dissociation of the repressor dimer and loss of DNA binding (Fig. 3.3-1(a)).
Fig. 3.3-1 Prokaryotic repressors can be exploited to control eukaryotic expression. (a) Repressor used to turn off transcription, (b) repressor-activator chimera used to “turn on” gene expression. AD - activation domain.
These ligand-dependent repressors have been converted to ligand-dependent activators through fusion of LacR or TET to the potent HSV (herpes simplex virus) transactivation domain (VP16). These systems provide tight control over genes of interest placed behind minimal promoters having three flanking operator binding sequences (Fig. 3.3-1(b)). The LacR-VP16 chimera has approximately 1000-fold inducibility but is slow to respond to the addition of IPTG [10]. In the original TET system, gene expression is repressed in cells continuously treated with tet and is activated when tet is removed. The need to treat cells continuously with tet was a significant drawback, as it was unclear what effects long-term exposure to tet could have on a specific system. Bujard et al. were able to reengineer TET so that it bound DNA only in the presence of doxycycline (dox). Fusion of this modified form of TET to the VP16 transactivation domain formed a dox-responsive transactivator, rtTA or “Tet-On,” that tightly and rapidly upregulates transcription 10⁵-fold when dox is added [12]. These pioneering studies have led to the development of a number of ligand-inducible activators based on prokaryotic proteins. Ligand-dependent transcriptional regulators have been derived from prokaryotic repressors that bind DNA in response to small-molecule ligands such as commercially available antibiotics of the macrolide and streptogramin families [13, 14]. Because the protein binds DNA only when liganded, chimeras generated from the fusion of these repressors to transactivation domains can serve as potent ligand-dependent transcriptional activators.
Fig. 3.3-2 Prokaryotic regulators of transcription have been adapted for use as eukaryotic transcriptional regulators: (a) macrolide, (b) streptogramin, (c) quorum signaling β-oxo-hexanoylhomoserine lactone.
3.3.2.2
Exploiting Prokaryotic Ligand-dependent Activators
Another class of small-molecule-dependent transcription factors is the quorum-sensing receptors, which often respond to surprisingly simple small molecules (Fig. 3.3-2(c)). These naturally occurring small-molecule-dependent transcriptional regulators have been pursued as a means to control prokaryotic genes only recently, and therefore their discussion here is brief [15]. Nonetheless, these naturally occurring prokaryotic transcriptional regulators hold promise as an important new source of ligand-dependent transcriptional regulators of eukaryotic genes. An interesting example is the acetaldehyde-responsive protein, which controls expression in response to a gaseous molecule and can therefore enforce transcription control in a whole-animal transgenic model through its air supply [16].
3.3.2.3
Reprogramming Eukaryotic Transcriptional Regulators
A critical requirement for any transcriptional regulator to be used in the study of gene function is the strict selectivity of the ligand-receptor pair, so that only the gene of interest is activated. Several groups have developed methods to “reprogram” the ligand-binding specificity and gene-targeting specificity of transcriptional regulators. The need to change both ligand-binding specificity and DNA-binding specificity greatly limits the possible receptors that can be directly reengineered to provide control over transgene expression. For example, G-protein coupled receptors (GPCRs) are an important class of signaling receptors that regulate gene expression in response to small molecules. However, GPCRs regulate expression through signaling pathways that involve the intermediary actions of multiple proteins. In this case, the ligand-binding and DNA-recognition events are separated on many different proteins, making their “reprogramming” difficult.
Nuclear and steroid hormone receptors (NHRs), in contrast to GPCRs, are ligand-inducible transcription factors which, when liganded, bind directly to hormone response elements in eukaryotic genes and upregulate transcription through a ligand-dependent transactivation domain [17, 18]. In their unliganded forms, most steroid hormone receptors are not bound to DNA, but are instead sequestered by heat-shock proteins (hsps). Steroid hormone binding causes dissociation from hsps, dimerization, and DNA binding. In contrast, the unliganded forms of the “nuclear” receptors such as the thyroid hormone receptor, the retinoic acid receptor, the peroxisome proliferator-activated receptor (PPAR), and the vitamin D receptor (VDR) are generally bound to DNA as heterodimers with the retinoid X receptor (RXR) and bind corepressor proteins that actively repress gene expression (Fig. 3.3-3).
Fig. 3.3-3 General mechanism of nuclear/steroid hormone receptor action. (a) Steroid hormone receptors are generally sequestered by heat-shock proteins (hsp) in their unliganded forms. (b) Nuclear receptors can bind to DNA in the absence of ligand and can associate with transcriptional repressors.
The ligand-dependent transactivation domain and the DNA-binding domains of these receptors function relatively independently of each other, allowing one to create functional chimeras that redirect the actions of specific hormones to new genes through alternate DNA-binding domains. For example, an early study by Greene and Chambon demonstrated that by exchanging the glucocorticoid receptor (GR) ligand-binding domain for that of the estrogen receptor (ER), glucocorticoid-responsive genes could be rendered responsive to estradiol (E2) [19]. A number of other functional chimeras have been constructed by exchanging DNA-binding domains from other NHRs, including thyroid hormone receptor (TR)/retinoid X receptor chimeras [20], retinoic acid/VDR chimeras [21], and TR/GR chimeras [22]. Functional chimeras have also been generated using non-NHR DNA-binding domains, such as the progesterone receptor (PR)/Gal4 DNA-binding domain chimera developed by Wang and O’Malley [23]. Several studies have shown that DNA-binding domains can be reengineered or evolved to bind to new DNA-binding sequences [24, 25]. Therefore, NHRs are an attractive scaffold from which to develop new, selective transcriptional regulators, as they can in principle be modified to regulate almost any transgene of interest. However, application of these systems is still limited by the presence of other endogenous receptors that are also responsive to the same hormone.
The use of heterologous NHRs is one way to selectively control only the targeted gene of interest. The ecdysone receptor (EcR) is unique to insects and crustaceans and therefore has been widely used to selectively regulate mammalian genes [26, 27]. Inducible gene expression in mammals or mammalian cell culture can be achieved with EcR, although highly inducible expression generally requires coexpression of RXR. It is unclear whether overexpression of RXR influences expression of other NHR-responsive genes. Nonetheless, the EcR has become an important heterologous regulator of mammalian gene expression. The need for additional and multiple ligand-inducible transcription factors has prompted several groups to develop new transcriptional regulators by reengineering the ligand-binding domains of existing NHRs.
3.3.3 Engineering New Ligand Specificities into NHRs
The reengineering of NHR ligand-binding domains to respond selectively to synthetic ligands has proved to be an important and challenging area in ligand-receptor engineering. Since the original studies of Kirsch and Holbrook directed toward reengineering the substrate specificity of enzymes, ligand-receptor engineering has become an important tool for studying complex biomolecular systems [28, 29]. Schreiber was perhaps the first to use a combination of mutagenesis and synthesis to generate selective probes for biological function in the form of chemical inducers of dimerization (CIDs; covered elsewhere in this volume) [30-32]. The basic design principle used in these studies was the use of “bumps and holes” to alter the interface between ligand and protein in a complementary manner [31]. The bump refers to a molecular appendage on the ligand that would cause a steric clash if the ligand were to bind to the wild-type receptor. However, the “bumped ligand” can bind to a receptor that has been appropriately modified through mutagenesis to contain a compensatory “hole.” The “bump and hole” approach to ligand-receptor engineering has been applied to a number of protein-ligand and enzyme-substrate systems. One of the most successful is the ATP analog-selective kinase system of Shokat et al. [33].
3 Engineering Control Over Protein Function Using Chemistry
3.3.4 The Requirement of “Functional Orthogonality”
The application of “bump and hole” engineering toward the generation of selective transcriptional regulators has been limited, largely because “hole-modified” proteins often retain substantial affinity for their natural ligand [34]. For some applications, such as the selective labeling of kinase substrates by radiolabeled ATP analogs that are recognized only by modified kinases, eliminating competing reactions of the kinase with its natural substrate (nonlabeled ATP) is not strictly required [35]. However, a selective transcriptional regulator used to study gene function would have to function independently of any endogenous receptors. Absolute selectivity over all concentrations of ligand is rarely observed. In practice, it is sufficient for a modified ligand-receptor pair to be “functionally orthogonal,” such that the modified receptor is nonresponsive to endogenous concentrations of the natural ligand and the modified ligand is unable to activate the natural receptor at the concentrations used to modulate the modified receptor [33, 34]. It is important to recognize that while high potency is generally desirable, the ligand analog need not bind the modified receptor with the same affinity as the natural ligand-receptor pair, so long as it has high selectivity.
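The two conditions for functional orthogonality can be made concrete with a small numerical sketch. The Python below is illustrative only: it assumes a simple Hill-type dose-response model, and the names (`DoseResponse`, `receptor_selectivity`, `is_functionally_orthogonal`), the on/off thresholds, and every EC50 value mentioned in the note that follows are hypothetical choices, not data from the studies cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoseResponse:
    """Hill-type dose-response curve (an assumed model, for illustration only)."""
    ec50_nM: float         # concentration giving half-maximal response
    hill: float = 1.0      # Hill coefficient (assumed to be 1 by default)
    efficacy: float = 1.0  # maximal response as a fraction of the reference maximum

    def response(self, conc_nM: float) -> float:
        """Fractional transcriptional response at the given ligand concentration."""
        if conc_nM <= 0:
            return 0.0
        x = (conc_nM / self.ec50_nM) ** self.hill
        return self.efficacy * x / (1.0 + x)


def receptor_selectivity(ec50_wild_type_nM: float, ec50_mutant_nM: float) -> float:
    """Selectivity of a ligand for the mutant receptor: EC50(wild type) / EC50(mutant)."""
    return ec50_wild_type_nM / ec50_mutant_nM


def is_functionally_orthogonal(
    analog_on_mutant: DoseResponse,
    analog_on_wild_type: DoseResponse,
    natural_on_mutant: DoseResponse,
    analog_dose_nM: float,
    natural_level_nM: float,
    on_threshold: float = 0.8,   # hypothetical cutoff for "activated"
    off_threshold: float = 0.1,  # hypothetical cutoff for "silent"
) -> bool:
    """Check the conditions described in the text at the working concentrations."""
    # The analog must switch on the engineered receptor at the dose used.
    mutant_on = analog_on_mutant.response(analog_dose_nM) >= on_threshold
    # The analog must leave the wild-type receptor silent at that same dose.
    wild_type_off = analog_on_wild_type.response(analog_dose_nM) <= off_threshold
    # The natural hormone, at its endogenous level, must not activate the mutant.
    mutant_ignores_natural = natural_on_mutant.response(natural_level_nM) <= off_threshold
    return mutant_on and wild_type_off and mutant_ignores_natural
```

With hypothetical EC50 values of 1 nM (analog on mutant), 400 nM (analog on wild type), and 500 nM (natural hormone on mutant), an analog dose of 10 nM and an endogenous hormone level of 1 nM satisfy all three conditions. Lowering the analog's wild-type EC50 to 10 nM, a 10-fold selectivity, fails the test at the same dose, which illustrates why modest selectivity does not give a functionally orthogonal pair.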
3.3.5 Overcoming Receptor Plasticity
The greatest challenge presented by engineered nuclear receptors is the significant structural flexibility of the ligand-binding domain. NHR ligand-binding domains undergo substantial structural reorganization upon hormone binding. The hormone generally provides a hydrophobic nucleus around which the ligand-binding domain repacks its core. The structural changes to the receptor’s core cause changes to the receptor surface, resulting in coactivator recruitment and changes in receptor dimerization. It is therefore not surprising that the estrogen receptor binds many ligands that are substantially larger than E2 and would otherwise appear to be too large to fit within the binding pocket observed in the E2-ER crystal structure (Fig. 3.3-4) [36]. These studies imply that identifying “bumped” hormones that will not bind wild-type NHRs could be more challenging than ligand-receptor engineering with more rigid proteins. Using targeted site-directed mutagenesis, Corey et al. searched a library of failed drug candidates, “near drugs,” for compounds able to selectively activate mutants of the 9-cis retinoic acid receptor, RXR [37, 38]. These mutants were carefully selected not to have significant activity with the natural hormone 9-cis retinoic acid. Although mutants with improved activity toward these ligands were identified, the ligands, which largely contained only hydrophobic groups aside from the requisite carboxylate, were generally less than 10-fold
Fig. 3.3-4 The estrogen receptor has sufficient flexibility to accommodate a diverse array of ligands that interact with ER at low or sub-nanomolar potencies. Ligands shown: estradiol (E2); RU-58668; 17α-phenylvinyl estradiol (Hanson); ICI-182,780; 4-allyl (Katzenellenbogen).
selective for the mutant over the wild-type receptor. These studies highlight the remarkable ability of the wild-type receptor to accommodate ligands that differ in hydrophobic shape, even when modeling might suggest that these ligands should not be accommodated by the ligand-binding site. In general, protein plasticity limits the use of “bump and hole” engineering of flexible proteins. Our group has therefore focused on exploring methods to manipulate polar groups to impart specificity to engineered ligand/receptor pairs, following the general notion that polar interactions impart specificity to molecular recognition events because mismatched polar interactions cannot be easily avoided by simple side-chain reorganization. In an early work on the retinoic acid receptor, hormone-binding selectivity was changed by modifying a key arginine residue (Arg278) that forms a salt bridge to the carboxylate of bound retinoic acid [39]. Although a neutral ethylamide analog of retinoic acid displayed some mutant versus wild-type selectivity, this analog was notably less potent than the wild-type retinoic acid-RAR (retinoic acid receptor) pair and showed only partial selectivity. A more dramatic attempt to impart selectivity through the manipulation of polar interactions was the reversal of a ligand-receptor salt bridge by creating a guanidine-functionalized retinoid, which showed selective but weak activity for the charge-complementing mutant RARγ(S289G/R278E). The weaker cellular activity of this ligand-receptor pair is not entirely unexpected in the light of studies by Warshel suggesting that salt-bridge interactions are stabilized by protein dipoles that would be destabilizing if the salt bridge were reversed [40, 41]. In general, charged or neutral polar
groups found in the interior of proteins are stabilized by multiple polar interactions from the protein in the form of ion pairs, hydrogen bonds, and local or macrodipoles. Adding, removing, or rearranging polar groups found in the interior of protein-ligand complexes is generally disfavored, as it leaves the associated polar groups unsatisfied. The solution to this problem of selectivity is not immediately obvious, but in at least some cases it can be solved. The Koh and the Katzenellenbogen groups simultaneously explored estrogen analogs that could complement the same Glu353 → Ala or Ser mutation in the estrogen receptor [42-44]. Glu353 forms an intramolecular salt bridge with Arg274, and both residues form key hydrogen bonds to the 3-hydroxyl of E2 (Fig. 3.3-5(a)). Mutations to Glu353 greatly reduce the receptor’s affinity for the natural ligand E2. While a number of estrogen analogs bearing neutral functional groups in place of the 3-hydroxyl of E2 could activate the Glu353 mutants with high affinity, in almost all cases these analogs activated the wild-type ERs with equal or greater potency. A few low-potency ligands (<2% wild-type potency) show receptor selectivities as high as 34-fold (mutant/wild type) (Fig. 3.3-6(a)) [42]. By comparison, carboxylate-functionalized estrogen analogs designed to restore (intermolecularly) the lost protein salt bridge with Arg274 form high affinity/potency complexes with the mutant receptor (Fig. 3.3-5(b)). These complexes are not of higher affinity than those of the analogs having neutral appendages, suggesting that the favorable energetics of forming a salt bridge with Arg274 is offset by the substantial cost of desolvating the ligand-associated carboxylate [44]. However, carboxylate-functionalized ligands of appropriate size and shape provided a significant gain in selectivity, which can be as high as 95- to 400-fold in favor of the mutant over the wild-type
Fig. 3.3-5 Accessible surface model of functionally orthogonal ER/ES8 pair. (a) Wild-type ER-E2 receptor based on structure modeled by Brzozowski et al. [45]. (b) Modeled structure of ES8-ER(E353A).
Fig. 3.3-6 Complements for ERα(E353A). (a) Neutral modifications tend to provide only modest mutant versus wild-type selectivity. (b) Acidic analogs of appropriate structure provide high selectivity without significant loss in affinity. RTP - relative transcription potency; RS - receptor selectivity (EC50 wild type/EC50 mutant).
Values shown with the structures, in order of appearance: RTP = 1.5, RS = 34; RTP = 0.8, RS = 22; RTP = 15, RS = 1.3; RTP = 17, RS = 11; RTP = 3.0, RS = 95; RTP = 0.9, RS = 9.2; RTP = 2, RS = 1.6; RTP = 38, RS = 56.
ERs (Fig. 3.3-6(b)). This greater selectivity results from weaker binding of the carboxylate-functionalized ligands to the wild type, presumably because of mismatched polar interactions at the ligand-receptor interface. We termed the process of exchanging polar groups across the ligand-receptor interface “polar group exchange” [43]. In essence, the same key functional groups are present in more or less the same positions in the wild-type and the engineered ligand-receptor complexes, which differ only in the covalent connectivity of a key polar group. In the present example, the carboxylate group is presumed to be in more or less the same position but covalently linked to the ligand rather than to the receptor. This minimizes the impact of altering polar groups within the interior of the protein by preserving the orientation of key dipolar interactions. The most selective system reported is ERβ(E305A) with the synthetic ligand ES8. This mutant is no longer activated by endogenous concentrations of E2, but can be fully activated by concentrations of ES8 that do not activate the wild-type ERs. This system therefore comprises a functionally orthogonal ligand-receptor pair that, in principle, can be used to regulate gene expression independent of endogenous estrogen-responsive receptors.
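To give a feel for the size of these selectivities, the receptor selectivity ratios quoted in this section (34-fold, 95-fold, and 400-fold) can be converted into apparent free-energy differences. This is only a rough sketch: it assumes that EC50 values track binding affinity, which cellular transcription assays do not guarantee, and that T = 298 K. The symbol \Delta\Delta G_{app} is introduced here for illustration and does not appear in the cited work.

```latex
\[
\mathrm{RS} = \frac{\mathrm{EC}_{50}^{\text{wild type}}}{\mathrm{EC}_{50}^{\text{mutant}}},
\qquad
\Delta\Delta G_{\text{app}} = RT \ln(\mathrm{RS}),
\qquad
RT \approx 0.592\ \text{kcal mol}^{-1}\ \text{at } 298\ \text{K}
\]
\[
\begin{aligned}
\mathrm{RS} = 34  &: \quad \Delta\Delta G_{\text{app}} \approx 0.592 \times 3.53 \approx 2.1\ \text{kcal mol}^{-1} \\
\mathrm{RS} = 95  &: \quad \Delta\Delta G_{\text{app}} \approx 0.592 \times 4.55 \approx 2.7\ \text{kcal mol}^{-1} \\
\mathrm{RS} = 400 &: \quad \Delta\Delta G_{\text{app}} \approx 0.592 \times 5.99 \approx 3.5\ \text{kcal mol}^{-1}
\end{aligned}
\]
```

Under these assumptions, even the largest selectivities correspond to a discrimination of only a few kcal/mol between the mutant and wild-type complexes.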
3.3.6 Nuclear Receptor Engineering by Selection
Miller and Whelan were perhaps the first to recognize the potential of screening or selecting NHR mutants from receptor libraries to identify ERs with modified ligand specificities [46, 47]. Using error-prone PCR, they generated populations of mutant ERs in yeast and identified mutants that had decreased responsiveness to E2 but increased responsiveness to the synthetic diphenyl indene-ol GR132706X. Despite their
elegant plan, the selected mutants had good potencies but relatively modest selectivities, exhibiting only a 10- to 25-fold improvement in the potency of GR132706X with the mutant when compared to wild type. One of the limitations of the Miller and Whelan study was that their modified ER regulated the expression of β-galactosidase, which was laboriously followed colorimetrically. Doyle and coworkers have recently succeeded in using a true selection method to identify, from codon-randomized libraries of RXR, variants that are activated by the synthetic compound LG335 [2]. A key component of their strategy was to utilize a fusion of the nuclear receptor coactivator ACTR linked to the potent Gal4 activation domain (ACTR-GAD). This provided tight control of ADE2 expression to conditionally control survival of the PJ69-4A auxotroph on media lacking Trp and Leu. The mutant RXR(I268V/A272V/I310L/F313M) was 300 times more responsive toward LG335 than wild-type RXR in mammalian cell culture. This particular ligand-receptor pair has only 30% of the wild-type efficacy but nonetheless represents a significant advance in the strategies used to develop functionally orthogonal transcriptional regulators. This general strategy could be easily extended to other NHRs.
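The practical advantage of a survival-based selection over a colorimetric screen becomes clearer when the size of a codon-randomized library is estimated. The sketch below is a back-of-the-envelope calculation, not a description of the cited work: the NNK randomization scheme, the choice of four randomized positions, and the function names are assumptions made here for illustration, and the coverage estimate idealizes every variant as equally likely.

```python
import math

# (codons, amino acids) encoded by common degenerate codon schemes.
# NNK and NNS each give 32 codons that cover all 20 amino acids plus one stop codon.
CODON_SCHEMES = {"NNN": (64, 20), "NNK": (32, 20), "NNS": (32, 20)}


def library_diversity(n_positions: int, scheme: str = "NNK") -> tuple[int, int]:
    """Return (DNA-level variants, protein-level variants) for full randomization
    of n_positions codons with the given degenerate scheme."""
    codons, amino_acids = CODON_SCHEMES[scheme]
    return codons ** n_positions, amino_acids ** n_positions


def clones_for_coverage(n_variants: int, probability: float = 0.95) -> int:
    """Number of independent clones needed so that any given variant is present
    with the stated probability, assuming all variants are equally likely."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    if n_variants < 2:
        return 1
    # Solve 1 - (1 - 1/V)^n >= probability for n.
    return math.ceil(math.log(1.0 - probability) / math.log1p(-1.0 / n_variants))


if __name__ == "__main__":
    dna_variants, protein_variants = library_diversity(4, "NNK")
    print(dna_variants, protein_variants, clones_for_coverage(dna_variants))
```

Under these assumptions, four fully randomized positions already give about a million DNA-level variants and require roughly three million clones for 95% coverage, a scale at which growth selection is far more practical than following a reporter enzyme clone by clone.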
3.3.7 Ligand-dependent Recombinases
Other NHR reengineering strategies do not require the engineered ligand-bound complex to be transcriptionally active but can exploit the ligand-dependent association of steroid receptors with hsps. Pioneering work by Chambon’s group demonstrated that site-specific recombinases can be placed under the control of nuclear receptor ligand-binding domains [48-50]. The chimeric fusion protein composed of the site-specific recombinase Cre and the ER ligand-binding domain is only active in the presence of an ER ligand such as E2 or the antagonist tamoxifen (Fig. 3.3-7). The unliganded ER ligand-binding domain is associated with hsp90 and interferes with the formation of the tetrameric Cre complex, which mediates recombination. Ligand-dependent recombinases provide a powerful tool for controlling gene expression because flanking a gene of interest with Cre recognition sites can be used to permanently turn its expression on or off. Because recombination causes a permanent change to the cellular genome, all the progeny of a cell that has undergone recombination will propagate the same genomic change. Conditional recombinases used in conjunction with cell-type-specific promoters can therefore be powerful tools for following cell lineages in vivo [51]. Since the development of the original Cre-ER system, mutagenesis and screening strategies have identified modified ER ligand-binding domains that have reduced responsiveness to E2 but can mediate tamoxifen-dependent recombination [48]. It is important to make the distinction that these modified ligand-receptor pairs do not necessarily form transcriptionally active complexes. Since the first report of the Cre-ER system, several new systems
Fig. 3.3-7 Site-specific recombinases can be used to control gene expression. (a) Homologous recombination by Cre is performed at specific LoxP sites (site sequence as labeled in the figure: 5’-TATAACTTCGTATAGATATGCTATACGAAGTTAT-3’). (b) The chimeric Cre-ER is only active in the presence of an ER ligand. Recombination can be used to switch on or off genes by placing them downstream of promoter sequences.
have been reported that make use of Cre or the site-specific recombinase Flp, including Cre-PR (progesterone receptor fusion), Cre-GR (glucocorticoid receptor fusion), and EcR-Flp [51-53]. Although some of these ligand-dependent recombinases have been reengineered to respond selectively to synthetic receptor antagonists, such as tamoxifen-responsive Cre-ER or RU486-responsive Cre-PR, the need to treat cells for up to several days with these potent receptor antagonists may cause unwanted side effects, particularly when used in in vivo developmental models [50, 53]. This suggests that functionally orthogonal ligands may still have an important role to play in providing the next generation of highly selective ligand-dependent recombinases.
3.3.7.1 Chemical Biology of NHRs and the Potential of Engineered Nuclear Receptors
A rapidly emerging area in nuclear receptor biology is the “nongenomic” or “extranuclear” actions of NHRs [54]. Several lines of evidence suggest that nuclear receptors may activate signaling complexes outside of the nucleus that only indirectly affect gene transcription. For example, the rapid nongenomic
actions of the vitamin D receptor (VDR) have been known for many years. Vitamin D analogs that selectively activate the nongenomic actions of vitamin D have played an invaluable role in the study of its nonnuclear actions [55-57]. Nongenomic activities of thyroid hormone [58], glucocorticoids, androgens, and mineralocorticoids have also been identified [54, 59]. Currently, the best characterized of these systems involves estrogen and the estrogen receptor. In addition to identifying the GPCR GPR30 as an estrogen-responsive receptor [60-62], several studies have confirmed that the estrogen receptor can also act outside the nucleus, either in complex with scaffolding proteins such as MNAR to activate Src kinase or in palmitoylated form in association with caveolins to activate PI3 kinase (Fig. 3.3-8) [63-66]. In this case, the nuclear receptor is found to play multiple extranuclear roles in regulating cellular signaling pathways. Analog-selective hormone receptors may yet play an important role in dissecting the multiple signaling pathways activated by steroid hormones.
3.3.8 Complementation/Rescue of Genetic Disease
The development of analog-specific forms of nuclear/steroid hormone receptors has prompted us to investigate many naturally occurring mutations found in nuclear receptors associated with genetic disease. Mutations to
Fig. 3.3-8 Estradiol is involved in many different signaling pathways, some of which involve the same ligand-receptor pair. a - classic nuclear activation of transcription, b - MNAR scaffolded activation of Src kinase, c - palmitoylated ER can localize to caveolins in an estrogen dependent manner, d - GPCR signaling by estradiol. Pathways a, b, and c may potentially involve ERα and ERβ.
nuclear receptors are associated with a family of human genetic diseases, which include VDR mutations associated with rickets, TR mutations associated with resistance to thyroid hormone, mineralocorticoid resistance, PPAR mutations associated with certain forms of severe insulin independent diabetes, and androgen receptor mutations associated with androgen insensitivity syndrome [67-69]. Additionally, mutations to the androgen, estrogen, and thyroid hormone receptors are associated with the pathology of prostate, breast, and thyroid cancers [70]. A significant subset of these disease-associated mutations is located at the receptor-hormone interface, suggesting that appropriately designed hormone analogs may be able to “complement” or “rescue” the function of these receptors. Unlike current gene therapy strategies that use nucleic acid analogs, hormone analogs typically have good druglike properties (i.e., bioavailability, biostability), suggesting that hormone receptor complements may represent a new strategy for developing treatments for genetic disease. The possibility of using hormone analogs to rescue nuclear receptor mutations was perhaps first explored by DeGroot et al., who demonstrated that some synthetic hormone analogs were more potent than triiodothyronine (T3) in mutant forms of TR associated with resistance to thyroid hormone [71]. More recently, Feldman and Peleg similarly screened vitamin D3 analogs that partially complement VDR mutants associated with vitamin D resistant rickets [72], and Chatterjee et al. have identified PPAR agonists that can restore activity to PPAR mutants associated with severe insulin independent diabetes [73].
The first example of a molecule designed to rescue the function of a mutant protein associated with a genetic disease was the thyroid hormone analog HY1, which was designed to complement the RTH (resistance to thyroid hormone) associated mutant TRβ(R320C) [74]. This study represented a significant advance over the earlier studies by DeGroot, in that the complementing analog was selective for the mutant form of TRβ over the TRα subtype. In more recent work, new thyroid hormone analogs have been developed that restore efficacy and potency to three of the most common RTH-associated mutants, Arg320 → Cys, Arg320 → His, and Arg316 → His (Fig. 3.3-9) [75, 76]. All of the compounds used to rescue these mutations affect the carboxylate-binding cluster of arginines and are based on the same general complementation strategy involving more neutral hydrogen-bonding groups in place of the ligand’s carboxylate. This suggests that once general rules for designing complementing analogs are established, the process of identifying new compounds may be reasonably efficient. It is important to distinguish these “functional rescue” studies from several other important studies showing that small molecules can stabilize or chaperone folding of mutant proteins such as mutant p53 associated with cancer [77, 78], mutant forms of V2R associated with nephrogenic diabetes insipidus [79, 80], mutant forms of opsin associated with retinitis pigmentosa [81], and β-glucosidase mutants associated with Gaucher disease [82, 83]. By contrast, nuclear receptor mutants are often well-folded, stable proteins that
Fig. 3.3-9 Analogs that rescue function to TRβ mutants TRβ(R320C), TRβ(R320H), and TRβ(R316H) associated with resistance to thyroid hormone. Receptor selectivity of ligand, mut/α, defined as (EC50 with TRα)/(EC50 with mutant TRβ).
Values shown with the structures, in order of appearance: HY1, TRβ(R320C), EC50 = 7.0 nM, mut/α selectivity = 5.5; KG-8, TRβ(R320C), EC50 = 7 nM, mut/α selectivity = 12; TRβ(R320H), EC50 = 0.46 nM, mut/α selectivity = 1.0; TRβ(R316H), EC50 = 12.6 nM, mut/α selectivity = 4.
have lost ligand-dependent transactivation function that can be complemented by appropriate ligand design. The challenge in designing compounds that rescue mutations associated with genetic diseases is that there are generally very few individuals with any specific mutation. This poses an even greater challenge to chemists to efficiently design compounds that can complement any specific mutation in a receptor-binding pocket. We evaluated the ability of computer-aided design to discover molecular complements for the rickets-associated mutation VDR(R274L), which is more than 1000 times less responsive to the natural hormone 1,25-dihydroxyvitamin D3. We used a virtual screening strategy to evaluate a focused library of analogs of the synthetic VDR agonist LG190155 (Fig. 3.3-10) [84]. Although the bound structure of LG190155 with wild-type VDR was not available, half of the analogs selected by virtual screening were able to restore more than 60% activity at 200 nM. When tested in cell-based assays, the best analogs were able to restore almost fully the potency and efficacy to this otherwise unresponsive mutant. Computer-aided design was similarly successful at identifying seco-steroid analogs that could complement this same mutant (Fig. 3.3-10) [85]. These findings suggest that for at least some mutants, computer-aided molecular design can be used to efficiently design compounds that rescue genetic mutations.
3.3.9 De Novo Design of Ligand-binding Pockets
In addition to reengineering existing ligand-binding pockets, it is also possible to generate de novo ligand-binding sites in proteins. A notable early example shown by Matthews was the formation of de novo benzene- and guanidine-binding sites by making Phe → Ala or Arg → Ala mutations in
Fig. 3.3-10 Molecular rescue of rickets associated mutant VDR(R274L) by designed synthetic analogs of known agonists.
Values shown with the structures, in order of appearance: 1,25-dihydroxyvitamin D3, wild-type VDR, EC50 = 2.0 nM; VDR(R274L), EC50 2000 nM. LG190155, wild-type VDR, EC50 = 110.0 nM; VDR(R274L), EC50 = 85 nM. ss-III, VDR(R274L), EC50 = 7.0 nM. ss-III, VDR(R274L), EC50 = 3.3.
lysozyme [86]. Although these de novo binding sites have only weak affinity for these solvent substrates, they clearly demonstrated that new small-molecule binding sites could be created in proteins. Barbas and Schultz have been able to use this strategy to create zinc finger domains that bind only in the presence of isoindole derivatives [87]. By fusing these inducible zinc finger domains to transactivation domains, the isoindoles can be used to remotely regulate gene transcription. Currently, the affinity of these de novo designed cavities for their ligands is only modest. However, combined with recent advances in computational methods for the de novo design of ligand-binding cavities [88-91], this general strategy provides a potentially powerful approach to creating ligand-inducible transcriptional regulators.
3.3.10 Light-activated Gene Expression from Small Molecules
A new and exciting area in ligand-induced transcriptional regulators is the development of photoresponsive transcriptional regulators, which utilize photocaged small molecules. Just as ligand-inducible transcriptional regulators have revolutionized our study of protein function, light-activated transcription (or translation) systems may prove to be a powerful tool for studying the function of genes that elicit their effects only through their expression in precise three-dimensional patterns, gradients, or arrays. These include morphogens, which are important guidance cues for neurogenesis, vasculogenesis, and limb development, as well as other critical steps during development [92, 93]. Spatial gene patterning may also potentially play a role in creating artificial tissues.
By photocaging nuclear receptor agonists, Koh et al. were able to show that transcription could be controlled in an exposure-dependent manner [94]. Currently, photocaged agonists for the estrogen, thyroid, retinoic acid, and VDRs have been used to place nuclear receptor mediated transcription under the control of light [94-961. Using a photocaged agonist of the ecdysone receptor, Lawrence et al. have demonstrated that even though photoreleased agonists are freely diffusing, spatially discrete patterns of expressed genes can be made on the micron scale in cultured cells [97]. The photoregulation of gene expression by uncaging small molecules presents many challenges. Small-molecule triggers for transcription have the advantage of being easily delivered into cells by passive diffusion. Therefore, a multicellular system or organism is only light sensitive after the addition of the caged compound. Conversely, a cultured cell monolayer can be again rendered light-insensitive minutes after the caged compound is removed from the media. Ligand diffusion can affect the resolution at which genes can be patterned, as the photoreleased activator can diffuse into neighboring cells. When the patterned feature sizes are small, the region of activation will be confined through the effects of ligand dilution upon bulk diffusion. In other words, the concentration of released hormone activator will be too dilute to activate cells that are remote to the site of activation. Photocaged antagonists may provide a means to selectively turn off gene expression in a small region of cells within a larger tissue [96]. The photorelease of nuclear receptor agonists in a subpopulation of cells within a tissue presents another challenge, as the diffusion of ligand back out of the cell will limit the duration of transcription response. 
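The dilution argument above can be illustrated with the textbook solution for free diffusion from an instantaneous point source. The sketch below is a deliberately idealized model and not the analysis used in the cited studies: it assumes unhindered three-dimensional diffusion in a homogeneous medium, ignores cellular uptake, receptor binding, and the geometry of a cell monolayer, and the function names and any parameter values supplied to them are hypothetical.

```python
import math


def concentration(n_molecules: float, r_um: float, t_s: float, d_um2_per_s: float) -> float:
    """Concentration (molecules per cubic micrometre) at distance r_um and time t_s
    after an instantaneous point release that spreads by free 3D diffusion:
    C(r, t) = N / (4 pi D t)^(3/2) * exp(-r^2 / (4 D t))."""
    if t_s <= 0:
        raise ValueError("t_s must be positive")
    spread = 4.0 * math.pi * d_um2_per_s * t_s
    return n_molecules / spread ** 1.5 * math.exp(-r_um ** 2 / (4.0 * d_um2_per_s * t_s))


def time_of_peak(r_um: float, d_um2_per_s: float) -> float:
    """Time at which the concentration at distance r_um is maximal: t = r^2 / (6 D)."""
    return r_um ** 2 / (6.0 * d_um2_per_s)


def peak_concentration(n_molecules: float, r_um: float) -> float:
    """Highest concentration ever reached at distance r_um. It scales as 1 / r^3
    and does not depend on the diffusion coefficient."""
    return n_molecules * (3.0 / (2.0 * math.pi * math.e)) ** 1.5 / r_um ** 3
```

In this model the peak concentration seen at a given distance falls with the cube of that distance, so doubling the distance from the uncaging site lowers the highest concentration a cell ever experiences eightfold. A small photoreleased dose therefore drops below any fixed activation threshold within a short distance, consistent with the confinement described in the text.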
For some ligand-receptor pairs, the duration of reporter gene response may be limited to less than a few hours, whereas for other ligand-receptor pairs, reporter gene expression can last for several hours and as long as 1.5 days [95]. The duration of reporter gene response is much longer than the half-life of free ligand within the cell because many ligand-receptor complexes have very slow off-rates. However, ligand-receptor pairs with apparently slow off-rates can have a relatively limited duration of response, as NHR transcription complexes are generally disassembled by chaperones and are targets of ubiquitin ligases and proteasomes [98-101]. The effects of photoreleased antagonists used to turn off gene expression can be similarly limited by ligand off-rates and receptor proteolysis. Even when a covalent-binding antagonist that has a very long ligand-receptor half-life is used, gene expression is recovered over several hours as new protein is resynthesized by the cell [96]. The long-duration response observed for at least some ligand-receptor pairs suggests that photocaged agonists can be used to generate unique spatiotemporal patterns of gene expression. The use of small molecules to activate gene expression should be compared to methods used to photocage proteins or nucleic acids [102-106]. In general, photocaged biopolymers are difficult to deliver into cells or organisms, whereas caged small molecules can in principle be added in vitro or in vivo but require the use of transfected cells or transgenic animals. Tsien et al. elegantly
demonstrated that photocaged forms of RNA and DNA can be injected into zebrafish oocytes (single-cell stage) and are sufficiently stable to be carried into essentially all cells of the developed organism [106]. The caged RNA could then be released in a subpopulation of cells, where it is locally translated into gene product. The use of caged nucleic acids to photoregulate gene expression was first demonstrated by Haselton et al. in mouse models [103-105]. The application of caged RNAs has recently been expanded to light-activated RNAi methods by Friedman [107].
References
1. D.F. Doyle, D.J. Mangelsdorf, D.R. Corey, Modifying ligand specificity of gene regulatory proteins, Curr. Opin. Chem. Biol. 2000, 4, 60-63.
2. L.J. Schwimmer, P. Rohatgi, B. Azizi, K.L. Seley, D. Doyle, Creation and discovery of ligand-receptor pairs for transcriptional control with small molecules, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 14707-14712.
3. A.R. Buskirk, D.R. Liu, Creating small-molecule-dependent switches to modulate biological functions, Chem. Biol. 2005, 12, 151-161.
4. D.A. Braasch, D.R. Corey, Novel antisense and peptide nucleic acid strategies for controlling gene expression, Biochemistry 2002, 41, 4503-4510.
5. S.A. Raillard, G.F. Joyce, Targeting sites within HIV-1 cDNA with a DNA-cleaving ribozyme, Biochemistry 1996, 35, 11693-11701.
6. L. Malphettes, M. Fussenegger, Macrolide- and tetracycline-adjustable siRNA-mediated gene silencing in mammalian cells using polymerase II-dependent promoter derivatives, Biotechnol. Bioeng. 2004, 88, 417-425.
7. M. Mandal, M. Lee, J.E. Barrick, Z. Weinberg, G.M. Emilsson, W.L. Ruzzo, R.R. Breaker, A glycine-dependent riboswitch that uses cooperative binding to control gene expression, Science 2004, 306, 275-279.
8. J.E. Barrick, K.A. Corbino, W.C. Winkler, A. Nahvi, M. Mandal, J. Collins, M. Lee, A. Roth, N. Sudarsan, I. Jona, J.K. Wickiser, R.R. Breaker, New RNA motifs suggest an expanded scope for riboswitches in bacterial genetic control, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 6421-6426.
9. M. Lewis, G. Chang, N.C. Horton, M.A. Kercher, H.C. Pace, M.A. Schumacher, R.G. Brennan, P.Z. Lu, Crystal structure of the lactose operon repressor and its complexes with DNA and inducer, Science 1996, 271, 1247-1254.
10. S.B. Baim, M.A. Labow, A.J. Levine, T. Shenk, A Chimeric Mammalian Transactivator Based on the Lac Repressor That Is Regulated by Temperature and Isopropyl Beta-D-Thiogalactopyranoside, Proc. Natl. Acad. Sci. U.S.A. 1991, 88, 5072-5076.
11. M. Gossen, H. Bujard, Tight control of gene expression in mammalian cells by tetracycline-responsive promoters, Proc. Natl. Acad. Sci. U.S.A. 1992, 89, 5547-5551.
12. M. Gossen, A.L. Bonin, H. Bujard, Control of gene activity in higher eukaryotic cells by prokaryotic regulatory elements, Trends Biochem. Sci. 1993, 18, 471-475.
13. W. Weber, M. Fussenegger, Approaches for trigger-inducible viral transgene regulation in gene-based tissue engineering, Curr. Opin. Biotechnol. 2004, 15, 383-391.
14. W. Weber, C. Fux, M. Daoud-El Baba, B. Keller, C.C. Weber, B.P. Kramer, C. Heinzen, D. Aubel, J.E. Bailey, M. Fussenegger, Macrolide-based transgene control in mammalian cells and mice, Nat. Biotechnol. 2002, 20, 901-907.
15. P. Neddermann, C. Gargioli, E. Muraglia, S. Sambucini, F. Bonelli, R. De Francesco, R. Cortese, A novel, inducible, eukaryotic gene expression system based on the quorum-sensing transcription factor TraR, EMBO Rep. 2003, 4, 159-165.
16. W. Weber, M. Rimann, M. Spielmann, B. Keller, M. Daoud-El Baba, D. Aubel, C.C. Weber, M. Fussenegger, Gas-inducible transgene expression in mammalian cells and mice, Nat. Biotechnol. 2004, 22, 1440-1444.
17. A.C.U. Steinmetz, J.P. Renaud, D. Moras, Binding of ligands and activation of transcription by nuclear receptors, Annu. Rev. Biophys. Biomol. Struct. 2001, 30, 329-359.
18. A. Aranda, A. Pascual, Nuclear hormone receptors and gene expression, Physiol. Rev. 2001, 81, 1269-1304.
19. S. Green, P. Chambon, Oestradiol induction of a glucocorticoid-responsive gene by a chimaeric receptor, Nature 1987, 325, 75-78.
20. I.J. Lee, P.H. Driggers, J.A. Medin, V.M. Nikodem, Recombinant Thyroid Hormone Receptor and Retinoid X Receptor Stimulate Ligand-Dependent Transcription in vitro, Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 1647-1651.
21. S.M. Pemrick, P. Abarzua, C. Kratzeisen, M.S. Marks, J.A. Medin, K. Ozato, J.F. Grippo, Characterization of the chimeric retinoic acid receptor RARalpha/VDR, Leukemia 1998, 12, 554-562.
22. C.C. Thompson, R.M. Evans, Trans-activation by Thyroid Hormone receptors: functional parallels with Steroid Hormone receptors, Proc. Natl. Acad. Sci. U.S.A. 1989, 86, 3494-3498.
23. Y. Wang, B.W. O’Malley, Jr, S.Y. Tsai, B.W. O’Malley, A regulatory system for use in gene transfer, Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 8180-8184.
24. H.A. Greisman, C.A. Pabo, A general strategy for selecting high-affinity Zinc finger proteins for diverse DNA targets, Science 1997, 275, 657-661.
25. Y. Choo, A. Klug, Selection of DNA binding sites for Zinc fingers using rationally randomized DNA reveals coded interactions, Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 11168-11172.
26. D. No, T.P. Yao, R.M. Evans, Ecdysone-inducible gene expression in mammalian cells and transgenic mice, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 3346-3351.
27. S.T. Suhr, E.B. Gil, M.C. Senut, F.H. Gage, High level transactivation by a modified Bombyx ecdysone receptor in mammalian cells without exogenous retinoid X receptor, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 7999-8004.
28. C.N. Cronin, B.A. Malcolm, J.F. Kirsch, Reversal of substrate charge specificity by site-directed mutagenesis of aspartate aminotransferase, J. Am. Chem. Soc. 1987, 109, 2222-2223.
29. A.R. Clarke, T. Atkinson, J.J. Holbrook, From analysis to synthesis: new ligand binding sites on the lactate dehydrogenase framework. Part I, Trends Biochem. Sci. 1989, 14, 101-105.
30. D.M. Spencer, T.J. Wandless, S.L. Schreiber, G.R. Crabtree, Controlling signal transduction with synthetic ligands, Science 1993, 262, 1019-1024.
31. P.J. Belshaw, J.G. Schoepfer, K.-Q. Liu, K.L. Morrison, S.L. Schreiber, Rational Design of Orthogonal Receptor-Ligand Combinations, Angew. Chem., Int. Ed. Engl. 1995, 34, 2129-2132.
32. S.N. Ho, S.R. Biggar, D.M. Spencer, S.L. Schreiber, G.R. Crabtree, Dimeric ligands define a role for transcriptional activation domains in reinitiation, Nature 1996, 382, 822.
33. A. Bishop, O. Buzko, S. Heyeck-Dumas, I. Jung, B. Kraybill, Y. Liu, K. Shah, S. Ulrich, L. Witucki, F. Yang, C. Zhang, K.M. Shokat, Unnatural ligands for engineered proteins: new tools for chemical genetics, Annu. Rev. Biophys. Biomol. Struct. 2000, 29, 577-606.
34. J.T. Koh, Engineering selectivity and discrimination into ligand-receptor interfaces, Chem. Biol. 2002, 9, 17-23.
35. S.M. Ulrich, O. Buzko, K. Shah, K.M. Shokat, Towards the engineering of an orthogonal protein kinase/nucleotide triphosphate pair, Tetrahedron 2000, 56, 9495-9502.
36. J.A. Katzenellenbogen, R. Muthyala, B.S. Katzenellenbogen, Nature of the ligand-binding pocket of estrogen receptor alpha and beta: the search for subtype-selective ligands and implications for the prediction of estrogenic activity, Pure Appl. Chem. 2003, 75, 2397-2403.
37. D.J. Peet, D.F. Doyle, D.R. Corey, D.J. Mangelsdorf, Engineering novel specificities for ligand-activated transcription in the nuclear hormone receptor RXR, Chem. Biol. 1998, 5, 13-21.
38. D.F. Doyle, D.A. Braasch, L.K. Jackson, H.E. Weiss, M.F. Boehm, D.J. Mangelsdorf, D.R. Corey, Engineering orthogonal ligand-receptor pairs from “Near Drugs”, J. Am. Chem. Soc. 2001, 123, 11367-11371.
39. J.T. Koh, M. Putnam, M. Tomic-Canic, C.M. McDaniel, Selective regulation of gene expression using rationally-modified retinoic acid receptors, J. Am. Chem. Soc. 1999, 121, 1984-1985.
40. A. Warshel, J. Aqvist, Electrostatic energy and macromolecular function, Annu. Rev. Biophys. Chem. 1991, 20, 267-298.
41. J.-K. Hwang, A. Warshel, Why ion pair reversal by protein engineering is unlikely to succeed, Nature 1988, 334, 270-272.
42. R. Tedesco, J.A. Thomas, B.S. Katzenellenbogen, J.A.
43.
44.
45.
46.
47.
48.
49.
Katzenellenbogen, The estrogen receptor: a structure-based approach to the design of new specific hormone-receptor combinations, Chem. Biol. 2001, 8,277-287. Y. Shi, J.T. Koh, Selective regulation of gene expression by an orthogonal estrogen receptor-ligand pair created by polar-group exchange, Chem. Biol. 2001, 8,501-510. Y.H. Shi, J.T. Koh, Functionally orthogonal ligand-receptor pairs for the selective regulation of gene expression generated by manipulation of charged residues at the ligand-receptor interface of ER alpha and ER beta, /. Am. Chem. Soc. 2002, 124,6921-6928. A.M. Brzozowski, A.C. Pike, Z. Dauter, R.E. Hubbard, T. Bonn, 0. Engstrom, L. Ohrnan, G.L. Greene, J.A. Gustafsson, M. Carlquist, Molecular basis of agonism and antagonism in the oestrogen receptor, Nature 1997, 389, 753-758. N. Miller, J. Whelan, Random mutagenesis of human estrogen receptor ligand binding domain identifies mutations that decrease sensitivity to estradiol and increase sensitivity to a diphenol indene-ol compound: basis for a regulatable expression system, 1.Steroid. Biochem. Mol. Biol. 1998, 64, 129-135. J. Whelan, N. Miller, Generation of estrogen receptor mutants with altered ligand specificity for use in establishing a regulatable gene expression system, /. Steroid. Biochem. Mol. Biol. 1996, 58, 3-12. D. Metzger, J. Clifford, H. Chiba, P.Chambon, Conditional site-specific recombination in mammalian cells using a ligand-dependent chimeric Cre recombinase, Proc. Natl. Acad. Sci. U.S.A. 1995, 92,6991-6995. R. Feil, J. Brocard, B. Mascrez, M. LeMeur, D. Metzger, P. Chambon, Ligand-activated site-specific recombination in mice,
194
I
3 Engineering Control Over Protein Function Using Chemistry
50.
51.
52.
53.
54.
55.
56.
57.
58.
59.
Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 10887-10890. R. Feil, J. Wagner, D. Metzger, P. Chambon, Regulation of Cre recombinase activity by mutated estrogen receptor ligand-binding domains, Biochem. Biophys. Res. Commun. 1997,237,752-757. J.A.Sawicki, B. Monks, R.J. Morris, Cell-specific ecdysone-inducible expression of FLP recombinase in mammalian cells, Biotechniques 1998, 25,868-870,872-865. J. Brocard, R. Feil, P. Chambon, D. Metzger, A chimeric Cre recombinase inducible by synthetic,but not by natural ligands of the glucocorticoid receptor, Nucleic Acids Res. 1998, 26,4086-4090. C. Kellendonk, F. Tronche, A.P. Monaghan, P.O. Angrand, F. Stewart, G. Schutz, Regulation of Cre recombinase activity by the synthetic steroid RU 486, Nucleic Acids Res. 1996, 24, 1404-1411. R. Losel, M. Wehling, Nongenomic actions of steroid hormones, Nut. Rev. Mol. Cell Bio. 2003, 4, 46-56. M.C. Farach-Carson, I. Nemere, Membrane receptors for vitamin D steroid hormones: potential new drug targets, Curr. Drug Targets 2003, 4, 67-76. K. Nemere, S.E. Safford, B. Rohe, M.M. DeSouza, M.C. Farach-Carson, Identification and characterization of 1,25D(3)-membrane-associated rapid response, steroid (1,25D(3)-MARRS) binding protein, J . Steroid Biochem. Mol. Biol. 2004, 89-90, 281-285. R. Khoury, A.L. Ridall, A.W. Norman, M.C. Farachcarson, Analogs of vitamin-D(3) selectively activate genomic and nongenomic pathways in osteoblasts, J. Bone Miner. Res. 1993, 8, S220-S220. P. J. Davis, F.B. Davis, Nongenomic actions of thyroid hormone on the heart, Thyroid 2002, 12,459-466. E. Falkenstein, H.C. Tillmann, M. Christ, M. Feuring, M. Wehling, Multiple actions of steroid hormones - A focus on rapid,
60.
61.
62.
63.
64.
65.
66.
67.
68.
nongenomic effects, Pharmacol. Rev. 2000,52, 513-555. E.J. Filardo, J.A. Quinn, K.I. Bland, A.R. Fracltelton, Estrogen-induced activation of Erk-1 and Erk-2 requires the G protein-coupled receptor homolog, GPR30, and occurs via trans-activation of the epidermal growth factor receptor through release of HB-EGF, Mol. Endocrinol. 2000, 14,1649-1660. E.J. Filardo, J.A. Quinn, A.R. Frackelton, K.I. Bland, Estrogen action via the G protein-coupled receptor, GPR30: stimulation of adenylyl cyclase and CAMP-mediated attenuation of the epidermal growth factor receptor-to-MAPK signaling axis, Mol. Endocrinol. 2002, 16, 70-84. P. Thomas, Y. Pang, E. J. Filardo, J. Dong, Identity of an estrogen membrane receptor coupled to a G protein in human breast cancer cells, Endocrinology 2005, 146,624-632. S. Balasenthil, R.K. Vadlamudi, Functional interactions between the estrogen receptor coactivator PELPl/MNAR and retinoblastoma protein, 1.Biol. Chem. 2003, 278, 22119-22127. F. Barletta, C.W. Wong, C. McNally, B.S. Kornm, B. Katzenellenbogen, B.J. Cheskis, Characterization of the interactions of estrogen receptor and MNAR in the activation of cSrc, Mol. Endocrinol. 2004, 18,1096-1108. D.P. Edwards, V. Boonyaratanakornkit, Rapid extranuclear signaling by the estrogen receptor (ER): MNAR couples ER and Src to the MAP kinase signaling pathway, Mol. Intern. 2003,3,12-15. L. Li, M.P. Haynes, J.R. Bender, Plasma membrane localization and function of the estrogen receptor alpha variant (ER46) in human endothelial cells, Proc. Natl. Acad. Sci. U.S.A. 2003, 100,4807-4812. D.S. Latchman, Transcription-factor mutations and disease, N. Engl. J. Med. 1996,334,28-33. D.M. Tanenbaum, Y. Wang, S.P. Williams, P.B. Sigler,
References I195
69.
70.
71.
72.
73.
74.
Crystallographic comparison of the estrogen and progesterone receptor’s ligand binding domains, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 5998-6003. I . Barroso, M. Gurnell, V.E. Crowley, M. Agostini, J.W. Schwabe, M.A. Soos, G.L. Maslen, T.D. Williams, H. Lewis, A.J. Schafer, V.K. Chatterjee, S. O’Rahilly, Dominant negative mutations in human PPARgamma associated with severe insulin resistance, diabetes mellitus and hypertension [see comments], Nature 1999, 402,880-883. M. Marcelli, M. Ittmann, S. Mariani, R. Sutherland, R. Nigam, L. Murthy, Y.L. Zhao, D. DiConcini, E. Puxeddu, A. Esen, J. Eastham, N.L. Weigel, D.J. Lamb, Androgen receptor mutations in prostate cancer, Cancer Res. 2000, 60,944-949. T. Takeda, S. Suzuki, R.T. Liu, L.J. DeGroot, Triiodothyroacetic acid has unique potential for therapy of resistance to thyroid hormone, J . Clin.Endocrinol. Metab. 1995, 80, 2033-2040. S.A. Gardezi, C. Nguyen, P.J. Malloy, G.H. Posner, D. Feldman, S. Peleg, A rationale for treatment of hereditary vitamin D-resistant rickets with analogs of 1 alpha,25dihydroxyvitamin D-3,J . Biol. Chem. 2001, 276 29148-29156. M. Agostini, M. Gurnell, D.B. Savage, E.M. Wood, A.G. Smith, 0. Rajanayagam, K.T. Garnes, S.H. Levinson, H.E. Xu, J.W.R. Schwabe, T.M. Willson, S. O’Rahilly, V.K. Chatterjee, Tyrosine Agonists reverse the molecular defects associated with dominant-negative mutations in human peroxisome proliferator-activated receptor gamma, Endocrinology 2004, 145, 1527-1538. H.F. Ye, K.E. O’Reilly, J.T. Koh, A subtype-selective thyromimetic designed to bind a mutant thyroid hormone receptor implicated in resistance to thyroid hormone, J . Am. Chem. Soc. 2001, 223,1521-1522.
75.
76.
77.
78.
79.
80.
81.
82.
Y. Shi, H. Ye, K.H. Link, M.C. Putnam, I. Hubner, S. Dowdel, J.T. Koh, Mutant-selective thyromimetics for the chemical rescue of thyroid hormone mutants associated with resistance to thyroid hormone (RTH), Biochem. J . 2005, 44,4612-4626. A. Hashimoto, Y. Shi, K. Drake, J.T. Koh, Design and synthesis of complementing ligands for mutant thyroid hormone receptor TRb(R320H): a tailor-made approach towards the treatment of resistance to thyroid hormone, Bioorg. Med. Chem. 2005, 13(11):3627-3639 In Press. B.A. Foster, H.A. Coffey, M.J. Morin, F. Rastinejad, Pharmacological rescue of mutant p53 conformation and function, Science 1999, 286, 2507- 25 10. A.N. Bullock, A.R. Fersht, Rescueing the function of mutant p53, Nat. Rev. Cancer 2001, I , 68-76. V. Bernier, J.P. Morello, A. Salahpour, M.F. Arthus, A. Laperriere, M. Lonergan, M. Bouvier, D.G. Bichet, A pharmacological chaperone acting at the V2-vasopressin receptor offers a treatment for nephrogenic diabetes insipidus, F A S E B J . 2002, 16, A142-Al43. J.P. Morello, A. Salahpour, A. Laperriere, V. Bernier, M.F. Arthus, M. Lonergan, U. PetajaRepo, S. Angers, D. Morin, D.G. Bichet, M. Bouvier, Pharmacological chaperones rescue cell-surface expression and function of misfolded V2 vasopressin receptor mutants, J . Clin. Invest. 2000, 105, 887-895. S.M. Noonvez, V. Kuksa, Y.Imanishi, L. Shu, S. Filipek, K. Palczewski, S. Kauushal, Pharmacological Chaperone-mediated in vivo folding and stabilization of the P23H-opsin mutant associated with Autosomal Dominant Retinitis Pigmentosa, J . Biol. Chew. 2003,278,14442-14450. A.R. Sawkar, W.C. Cheng, E. Beutler, C.H. Wong, W.E. Balch, J.W. Kelly, Chemical chaperones increase the cellular activity of N370S beta-glucosidase: a therapeutic
196
I
3 Engineering Control Over Protein Function Using Chemistry
83.
84.
85.
86.
87.
88.
89.
90.
91.
92.
93.
the life of a mouse, Development strategy for Gaucher Disease, Proc. 2002, 129,815-829. Natl. Acad. Sci. U.S.A. 2002, 99, 94. F.G. Cruz, J.T. Koh, K.H. Link, 15428-15433. Light-activated gene expression, J . F.E. Cohen, J.W. Kelly, Therapeutic Am. Chem. SOC.2000, 122, approaches to protein-misfolding 877778778, diseases, Nature 2003, 426,905-909. 95. K.H. Link, F.G. Cruz, H.-F. Ye, S.L. Swann, J. Bergh, M.C. FarachK. O’Reilly, S. Dowdell, J.T. Koh, Carson, C.A. Ocasio, J.T. Koh, Photo-caged agonists of the nuclear Stmcture-based design of selective receptors RARg and TRb provide agonists for a rickets-associated unique time-dependent gene mutant of the vitamin D receptor, J . expression profiles for light-activated Am. Chem. SOC.2002, 124, gene patterning, Bioorg. Med. Chem. 13795- 13805. 2004, 12,5949-5959. S.L. Swann, J.J. Bergh, M.C. 96. Y.H. Shi, J.T. Koh, Light-activated Farach-Carson, J.T. Koh, Rational transcription and repression by using design of vitamin D-3 analogues photocaged SERMs, Chembiochem which selectively restore activity to a vitamin D receptor mutant associated 2004,5,788-796. 97. W.Y. Lin, C. Albanese, R.G. Pestell, with rickets, Org. Lett. 2002, 4, 3863-3866. D.S. Lawrence, Spatially discrete, E. Baldwin, W.A. Baase, X.J. Zhang, light-driven protein expression, V. Feher, B.W. Matthews, Generation Chem. Biol. 2002, 9,1347-1353. of ligand binding sites in T4 98. 2 . Nawaz, D.M. Lonard, A.P. Dennis, lysozyme by deficiency-creating C.L. Smith, B.W. O’Malley, substitutions, J . Mol. Biol. 1998, 277, Proteasome-dependent degradation 467 -485. of the human estrogen receptor, Proc. Q. Lin, C.F. Barbas, P.G. Schultz, Natl. Acad. Sci. U.S.A. 1999, 96, Small-molecule switches for zinc 1858-1862. finger transcription factors, J . Am. 99. A. Dace, L. Zhao, K.S. Park, Chem. Soc. 2003, 125, 612-613. T. Fumno, N. Takamura, L.L. Looger, M.A. Dwyer, J.J.Smith, M. Nakanishi, B.L. West, J.A. H.W. Hellinga, Computational Hanover, S. 
Cheng, Hormone design of receptor and sensor binding induces rapid proteins with novel functions, Nature proteasome-mediated degradation of 2003,423,185-190. thyroid hormone receptors, Proc. M. Allert, S.S. Rizk, L.L. Looger, Natl. Acad. Sci. U.S.A. 2000, 97, H.W. Hellinga, Computational 8985-8990. design of receptors for an 100. D.L. Osburn, G. Shao, H.M. Seidel, organophosphate surrogate of the I.G. Schulman, Ligand-dependent nerve agent soman, Proc. Natl. Acad. degradation of retinoid X receptors Sci. U.S.A.2004, 101,7907-7912, does not require transcriptional X. Yang, J.G. Saven, Computational activity or coactivator interactions, combinatorial protein design: Mol. Cell. Biol. 2001, 21, 4909-4918. sequence search and statistical 101. M. Qiu, C.A. Lange, MAP kinases design, Abstr. Pap. Am. Chem. SOC. couple multiple functions of human 2004,228, U523-US23. progesterone receptors: degradation, J.G. Saven, Combinatorial protein transcriptional synergy, and nuclear design, Curr. Opin. Struct. Biol. 2002, association, J. Steroid Biochem. Mol. 12,453-458. Biol. 2003, 85, 147-157. C. Tickle, Patterning i n Vertibrate 102. K. Curley, D.S. Lawrence, Development, Vol. 41, Oxford Light-activated proteins, Curr. Opin. University Press, Oxford, 2003. Chem. Biol. 1999, 3, 84-88. M. Zernicka-Goetz, Patterning of the 103. M.S. Chang, F.R. Haselton, Light embryo: the first spatial decisions in activated protein expression using
References
caged transfected plasmid 11: delivery by gene gun to organ cultured corneas, Invest. Ophthalmol. Vis. Sci. 1997,38,2083-2083. 104. F.R. Haselton, W.C. Tseng, M.S.
Chang, Light activated protein expression using caged transfected plasmid I: delivery by liposomes to cultured retinal endothelium, Invest. Ophthalmol. Vis. Sci. 1997, 38, 2082-2082. 105.
W.T. Monroe, M.M. McQuain, M.S. Chang, J.S. Alexander, F.R. Haselton,
Targeting expression with light using caged DNA, 1.Biol. Chem. 1999, 274, 20895-20900. 106. H. Ando, T. Fumta, R.Y. Tsien,
H. Okamoto, Photo-mediated gene activation using caged RNA/DNA in zebrafish embryos, Nat. Genet. 2001, 28,317-325.
Shah, S. Rangarajan, S.H. Friedman, Light-activated RNA interference, Angew. Chem. Int. Ed
107. S.
2005,44,1328-1332.
I
197
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
4 Controlling Protein-Protein Interactions
4.1 Chemical Complementation: Bringing the Power of Genetics to Chemistry
Pamela Peralta-Yahya and Virginia W. Cornish
Outlook
Genetics in many ways is the underpinning of modern cell biology, having provided a straightforward experimental approach to identify the proteins involved in a given biological pathway. As practised, however, genetics leaves us with a picture of the cell composed largely of proteins. The roles of other molecules, such as phosphoinositides or siRNAs, have long been overlooked. With growing interest in developing a complete description of a living cell and with the backdrop of the genome sequencing projects, the question would seem to be how to extend the ease of genetics to these other classes of molecules. With a complete palette, it would then be possible to fully harness the powerful synthetic and functional capabilities of the cell for chemistry beyond that naturally carried out by the cell (Fig. 4.1-1). Here we consider a particular genetic assay, the yeast two-hybrid assay, in light of these challenges.
4.1.1 Introduction
The two-hybrid assay, which detects protein-protein interactions as reconstitution of a transcriptional activator, provides a general, high-throughput assay for cloning any protein on the basis of its interaction with another protein. Introduced only in 1989, the two-hybrid assay has proven so robust that today roughly half of the known protein-protein interactions are determined in part using the two-hybrid assay. In this chapter, we look at more recent efforts to extend this powerful genetic assay to read out the other important molecules in the cell, such as nucleic acids and small molecules. We also consider the possibilities for exploiting the two-hybrid assay for chemical discovery, extending the power of genetics to chemistry not naturally carried out in the cell.
Fig. 4.1-1 Chemical Complementation combines the power of genetic assays and small molecule chemistry to understand small molecule function and develop new chemistry inside the cell.
The two-hybrid assay detects a protein-protein interaction as the reconstitution of a transcriptional activator (a natural eukaryotic transcription factor) and the resulting activation of a reporter gene. One protein is fused to the DNA-binding domain (DBD) of the transcriptional activator, and the other protein is fused to the activation domain (AD). If the two proteins bind to one another, they effectively dimerize and hence reconstitute the transcriptional activator (Fig. 4.1-2). In practice, this assay is used not just to test a single protein-protein interaction, but to test all of the proteins expressed in a given organism or cell line for binding to the protein of interest. A library of AD-fusion proteins, encoding all ca. 10⁴ different proteins, is transformed en masse into an appropriate two-hybrid selection strain containing the DBD-protein fusion of interest. Only cells expressing an AD-protein fusion that binds to the DBD-protein fusion will then survive under the appropriate reporter gene selection conditions.
Fig. 4.1-2 In the yeast two-hybrid system, dimerization of fusion proteins X-DNA-binding domain and Y-activation domain reconstitutes the transcriptional activator. The reconstituted transcriptional activator recruits the transcriptional machinery to the promoter region of the reporter gene, initiating its transcriptional activation.
The assay is general because the transcription-based selection works for any protein-protein interaction. Therefore, while traditional genetic assays rely on pathway-specific cell survival selections or phenotypic screens, to which not all pathways or proteins in a pathway are amenable, the two-hybrid assay can be applied to any given protein-protein interaction, since the transcription-based read-out is independent of the particular pathway being studied. The assay is high-throughput because standard molecular biology techniques allow large libraries (ca. 10⁵ to 10⁷ in yeast) to be tested simultaneously, where only the cells expressing an interacting protein pair survive. The other strength of the two-hybrid assay is the ease with which it can be carried out using modern methods in molecular biology. At the end of a two-hybrid selection, the interacting proteins can be read out simply by extracting the DNA encoding the AD-fusion proteins from the surviving cells and sequencing it. As a proof of the power of this approach, the two-hybrid assay is now essential to any effort to clone proteins along a given biological pathway. Moreover, the fortuitous development of the two-hybrid assay concurrent with the genome sequencing projects enables the construction of exact cDNA-AD libraries based on these data, so that protein identity can be readily extracted from a random DNA library. The high-throughput nature of the two-hybrid assay even allows protein interaction studies to be carried out on a genome-wide scale, for example, by analyzing all ca. 6000 proteins expressed in yeast for binding to one another, testing all 6000 DNA-binding protein fusions against their 6000 AD counterparts.
As with the field of genetics as a whole, the two-hybrid assay is biased toward proteins. As variations of this assay that can detect DNA, RNA, and small molecule binding are now developed, it is exciting to imagine the potential for basic science discovery of the roles of these molecules in the cell. Furthermore, these so-called n-hybrid assays extend these powerful transcription-based genetic assays to chemistry not naturally carried out in the cell. This extension should allow these genetic assays to be used not only for the discovery of biological pathways but also for new chemistry, including drug discovery and the directed evolution of molecules with new functional properties.
4.1.2 History/Development
Since the conception of the two-hybrid assay to detect protein-protein interactions in vivo at the end of the 1980s, key modifications to this assay have expanded its scope to detect DNA-, RNA-, and small molecule-protein interactions in so-called n-hybrid assays. More recently, “n-hybrid” assays have also been used to detect enzyme catalysis, where enzyme activity is linked to cell survival via transcription of a reporter gene. Here we look at the initial publications that moved the two-hybrid assay into each of these new directions.
4.1.2.1 Protein-Protein Interactions
In 1989, Fields and Song introduced the "yeast two-hybrid assay", which provides a straightforward method for detecting protein-protein interactions in vivo [1]. Until the development of the two-hybrid methodology, protein-binding interactions had been detected using traditional biochemical techniques such as coimmunoprecipitation, affinity chromatography, and photoaffinity labeling [2]. There are three significant advantages to this in vivo assay that led almost immediately to its widespread use: first, it is technically straightforward and can be carried out rapidly; second, the sequence of the two interacting proteins can be read off directly from the DNA sequence of the plasmids encoding them; and third, it does not depend on the identity of the interacting proteins and so is general.
The two-hybrid assay was based on the observation that eukaryotic transcriptional activators can be dissected into two functionally independent domains, a DBD and a transcription AD, and that hybrid transcriptional activators can be generated by mixing and matching these two domains [3]. It appears that the DBD only needs to bring the AD into the proximity of the transcription start site, suggesting that the linkage between the DBD and the AD can be manipulated without disrupting activity. Thus, the linkage in the two-hybrid assay is the noncovalent bond between the two interacting proteins.
As outlined in Fig. 4.1-3(a), the yeast two-hybrid system consists of two protein chimeras and a reporter gene downstream from the binding site for the transcriptional activator. If the two proteins of interest (X and Y) interact, they effectively dimerize the DNA-binding protein chimera (DBD-X) and the transcription activation protein chimera (AD-Y). Dimerization of the DBD and the transcription AD helps to recruit the transcription machinery to a promoter adjacent to the binding site for the transcriptional activator, thereby activating transcription of the reporter gene.
Fig. 4.1-3 Different yeast n-hybrid systems that have been developed to study protein-protein, protein-DNA, protein-RNA, and protein-small molecule interactions. (a) In the original version of the yeast two-hybrid system, transcriptional activation of the reporter gene is reconstituted by recruitment of the activation domain (AD) to the promoter region through direct interaction of protein X and Y, since protein X is fused to a DNA-binding domain (DBD) and protein Y is fused to the AD. (b) In the one-hybrid system, the AD is fused directly to the DBD. This system can be used to assay either DBDs that can bind to a specific DNA sequence or the in vivo binding site for a given DBD. (c) The three-hybrid system that can detect RNA-protein interactions has one more component than the yeast two-hybrid system: a hybrid RNA molecule. One half of the hybrid RNA is a known RNA (R) that binds to the MS2 coat protein (MS2) with high affinity and serves as an anchor. The other half is RNA X, whose interaction with protein Y is being tested. (d) Another version of the yeast three-hybrid system can be used to detect small molecule-protein interactions. Ligand L1 that interacts with protein X is covalently linked to ligand L2. Thus, if L2 interacts with Y, transcriptional activation of the reporter gene will be reconstituted.
The assay was demonstrated initially by using two yeast proteins known to be physically associated in vivo [1]. The yeast SNF1 protein, a serine-threonine protein kinase, was fused to the GAL4 DBD, and the SNF1 activator protein SNF4 was fused to the GAL4 transcription AD. A GAL4 binding sequence was placed upstream of a β-galactosidase reporter gene (lacZ). Plasmids encoding the protein fusions and the reporter gene were introduced into the yeast. Positive protein-protein interactions lead to an increase in β-galactosidase activity inside the cell, which can be tested in a colorimetric assay using 5-bromo-4-chloro-3-indolyl β-D-galactopyranoside (X-gal), which turns the cells blue, or by direct measurement of enzyme activity using chlorophenol red β-D-galactopyranoside as a substrate. Control experiments established that neither the DBD and AD domains on their own nor the individual protein chimeras induced β-galactosidase synthesis above background levels. β-Galactosidase synthesis levels were increased 200-fold when the DBD-SNF1 and SNF4-AD fusion proteins were introduced together. By comparison, the direct DBD-AD fusion protein activated β-galactosidase synthesis levels 4000-fold.
It was quickly realized that the strength of the two-hybrid assay would lie not in its ability to detect a single protein-protein interaction but rather in its ability to screen an entire genome for novel protein-protein interactions [4-9]. For example, Murray and coworkers, as a first step toward testing their hypothesis that the cell-cycle regulator Cdc20 is involved in the spindle assembly checkpoint in budding yeast, used the yeast two-hybrid assay to determine if any of the proteins known to be involved in the spindle checkpoint physically interact with Cdc20 [10]. In this experiment, haploid strains containing DBD-MAD (mitotic arrest defective) fusions were crossed with haploid strains containing AD-Cdc20 fusions. Protein-protein interactions in the resulting diploids lead to transcription activation of the lacZ reporter gene. As controls, haploid strains containing SNF1-AD and SNF4-DBD fusions were also mated and tested for β-galactosidase activity. The yeast two-hybrid system detected three new protein partners for Cdc20: MAD1, MAD2, and MAD3. In this experiment, the yeast two-hybrid assay was the key to rapidly and effectively identifying the new protein-protein interactions. Identification of these interactions using more traditional biochemical methods, such as coimmunoprecipitation, would have been cumbersome and time consuming since those methods require prior isolation of large quantities of all possible interacting proteins before running the assays. By facilitating the discovery of cascades of interacting proteins - in this case, the spindle assembly checkpoint - the yeast two-hybrid assay helps researchers put together entire biochemical pathways and begin to understand how these proteins function together inside a cell.
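The reconstitution logic and the control comparisons described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of the underlying biochemistry: the `Fusion` class and the `reporter_on` and `relative_activation` helpers are hypothetical names introduced here, and the set of interacting pairs is simply the set of interactions reported in the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class Fusion:
    """A hybrid protein: a transcription-factor domain fused to a partner."""
    domain: str                    # "DBD" or "AD"
    partner: Optional[str] = None  # fused protein; None means the bare domain


def reporter_on(dbd: Optional[Fusion], ad: Optional[Fusion],
                interactions: Set[FrozenSet[str]]) -> bool:
    """Return True if the two fusions reconstitute the activator.

    The reporter is transcribed only when both fusions are present and
    their fused partners interact, bringing the AD to the promoter.
    """
    if dbd is None or ad is None:
        return False
    if dbd.partner is None or ad.partner is None:
        return False
    return frozenset((dbd.partner, ad.partner)) in interactions


def relative_activation(fold_two_hybrid: float, fold_direct_fusion: float) -> float:
    """Two-hybrid signal as a fraction of the direct DBD-AD fusion signal."""
    return fold_two_hybrid / fold_direct_fusion


# Interactions taken from the experiments described in the text.
KNOWN = {
    frozenset(("SNF1", "SNF4")),
    frozenset(("Cdc20", "MAD1")),
    frozenset(("Cdc20", "MAD2")),
    frozenset(("Cdc20", "MAD3")),
}

# Controls: a bare domain or a single chimera gives no signal.
print(reporter_on(Fusion("DBD", "SNF1"), None, KNOWN))                  # False
print(reporter_on(Fusion("DBD", "SNF1"), Fusion("AD"), KNOWN))          # False
print(reporter_on(Fusion("DBD", "SNF1"), Fusion("AD", "SNF4"), KNOWN))  # True

# Mating-style test of each DBD-MAD bait against the AD-Cdc20 prey.
prey = Fusion("AD", "Cdc20")
hits = [bait for bait in ("MAD1", "MAD2", "MAD3")
        if reporter_on(Fusion("DBD", bait), prey, KNOWN)]
print(hits)  # ['MAD1', 'MAD2', 'MAD3']

# 200-fold (two-hybrid) versus 4000-fold (direct fusion), as quoted in the text.
print(relative_activation(200, 4000))  # 0.05
```

On these numbers, the SNF1-SNF4 two-hybrid signal is about 5% of the signal from the direct DBD-AD fusion, yet still well above the background seen in the controls.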
4.1.2.2 DNA-Protein Interactions
Early on it was appreciated that, just as the yeast two-hybrid assay could be used to detect protein-protein interactions, transcriptional activators could be used directly, in a "one-hybrid" assay, to detect DNA-protein interactions (Fig. 4.1-3(b)) [11, 12]. DNA-binding proteins that bind to a given target DNA sequence could be isolated from cDNA libraries encoding all the proteins expressed in a given organism or specific cell type. Alternatively, the optimal or naturally occurring recognition sequences for a given regulatory protein could be determined. With such an approach, Wang and Reed isolated a complementary DNA for the transcriptional activator Olf-1, believed to be the critical switch for the coordinated expression of olfactory-specific genes [13]. To achieve this, they fused an olfactory cDNA library, consisting of 3.6 million clones, to the GAL4 transcription AD. The reporter plasmid consisted of three tandem Olf-1 binding sites upstream of a low-activity promoter directing the transcriptional activation of the HIS3 gene. Transcription of HIS3 from this reporter plasmid requires the AD-cDNA fusion protein to bind to the Olf-1 sites. Therefore, only cells expressing an AD-cDNA fusion that binds these sites are able to grow on medium lacking histidine.
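A one-hybrid selection can be pictured as filtering a library by growth on histidine-free medium. The sketch below assumes a toy library and a lookup table of which proteins bind which DNA sites; the clone names other than Olf-1, the table, and the function names are hypothetical and merely stand in for the 3.6 million clones of the real screen.

```python
from typing import Dict, List, Set


def his3_transcribed(protein: str, reporter_site: str,
                     binds_dna: Dict[str, Set[str]]) -> bool:
    """HIS3 is activated only if the AD-fused protein binds the reporter site."""
    return reporter_site in binds_dna.get(protein, set())


def select_without_histidine(library: List[str], reporter_site: str,
                             binds_dna: Dict[str, Set[str]]) -> List[str]:
    """Return the clones that would grow on medium lacking histidine."""
    return [clone for clone in library
            if his3_transcribed(clone, reporter_site, binds_dna)]


# Hypothetical toy library; only Olf-1 is named in the text.
library = ["clone_A", "Olf-1", "clone_B", "clone_C"]
# Assumed binding table: which AD-fused proteins recognize which site.
binds_dna = {"Olf-1": {"Olf-1 site"}, "clone_B": {"unrelated site"}}

survivors = select_without_histidine(library, "Olf-1 site", binds_dna)
print(survivors)  # ['Olf-1']
```

Note that `clone_B` does bind DNA in this toy table, but not the site placed upstream of HIS3, so it is not selected; the read-out is specific to the reporter's binding site.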
4.1.2.3
RNA-Protein Interactions
Selecting for RNA-protein interactions is less straightforward because RNAprotein fusions cannot be generated directly in vivo and because routine biochemical assays that turn RNA-binding events into an amplifiable signal are not available. These difficulties were circumvented by adding a third component to the two-hybrid system to generate a “three-hybrid” assay (Fig. 4.1-3(c)) [14, 151. The third component is a hybrid RNA molecule, in which one half is a well-studied RNA molecule that binds to a known protein with high affinity and the other half is the RNA molecule of interest whose protein-bindingpartner is in question. In total, the three-hybrid system consists of two protein chimeras, one RNA chimera, and a reporter gene. The hybrid RNA molecule bridges the DNA-binding and AD-fusion proteins and activates transcription of a reporter gene. In a proof of principle experiment, Wickens and coworkers showed that the RNA three-hybrid system could detect the interactions between two wellstudied protein-RNA pairs: the iron regulatory protein (IRPl) to the iron response element (IRE) RNA sequence, and the HIV transactivator (TAT) protein to the HIV transactivation response (TAR) element RNA sequence [16]. First, they constructed a bifunctional RNA containing a RNA sequence known to bind the coat protein MS2 and the RNA sequence of either IRE or TAR. Next, they fused the DNA-binding domain to the coat protein MS2, and the AD to either the IRPl or TAT proteins. The two protein fusions and the bifunctional RNA were introduced in a yeast strain containing a reporter construct that directs activation of both a lacZ reporter gene and a H I S 3 reporter gene upon RNA-protein interaction. These reporter genes allow the authors to carry the assay as a colorimetric screen using the lacZ reporter gene and as a selection where only cells containing an interacting RNA-protein pair survive on medium lacking histidine. 
Furthermore, using 3-amino-1,2,4-triazole (3-AT), a competitive inhibitor of the enzyme encoded by the HIS3 gene, Wickens and coworkers were able to select only cells with elevated expression levels of the HIS3 gene, reducing the number of false positives in the HIS3 growth selection.
4.1.2.4
Small Molecule-Protein Interactions
Just as a dimeric RNA molecule can be introduced to mediate the interaction between the DNA-binding domain and the AD, so can a dimeric small molecule [17]. In fact, well before their use in a small molecule three-hybrid assay, dimeric small molecules were used as "chemical inducers of dimerization" (CIDs) to artificially oligomerize fusion proteins in vivo [18]. In the yeast three-hybrid system, the union of two protein fusions and a CID reconstitutes the transcription of a reporter gene (Fig. 4.1-3(d)). In 1996, Licitra and Liu built what they called a yeast three-hybrid assay [19]. This assay consists of two fusion proteins and a heterodimeric small molecule CID that brings these fusion proteins together to activate the transcription of a reporter gene. Licitra and Liu employed two fusion proteins: the glucocorticoid receptor (GR) fused to the DBD LexA, and FK506-binding protein (FKBP12) fused to the transcription AD B42 [19]. A heterodimeric dexamethasone (Dex)-FK506 molecule, whose two halves bind to GR and FKBP12, respectively, bridges the two fusion proteins and activates the transcription of a lacZ reporter gene. Further, using the GR-LexA fusion protein and the Dex-FK506 molecule in their yeast three-hybrid assay, Licitra and Liu were able to isolate the FKBP isoform with the highest affinity for FK506 (FKBP12) from a Jurkat cDNA library. This experiment opened the yeast three-hybrid system as a tool for drug discovery.
4.1.2.5
Catalysis
In all the previous applications, the n-hybrid assay is used to detect a binding event, whether it is protein, DNA, RNA, or small molecule binding. Our laboratory and others have been interested in the idea that this powerful genetic assay could be brought to bear on a broader variety of questions. Several different approaches have now been devised for linking enzyme catalysis to reporter gene transcription using the n-hybrid assay. Our laboratory introduced "Chemical Complementation", which detects enzyme catalysis of bond formation or cleavage reactions on the basis of covalent coupling of two small molecule ligands in vivo (Fig. 4.1-4) [20]. In this assay, the enzyme is introduced as a fourth component to the small molecule yeast three-hybrid system, and the linker in the small molecule CID acts as the substrate for the enzyme. Bond formation is detected as synthesis of the CID and hence the activation of an essential reporter gene; bond cleavage is detected as cleavage of the CID and hence the repression of a toxic reporter gene. In theory, this approach should be readily extended to new chemistry, simply by synthesizing small molecule heterodimers with different chemical linkers as the enzyme substrates. Inspired by traditional genetics, our hope is to make a general complementation assay that would link enzyme catalysis of a broad range of chemical reactions to cell survival, extending genetic selections to chemistry beyond that naturally carried out in the cell.
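The read-out logic described above can be summarized in a few lines of Python. This is only an illustrative sketch: the function names `cid_is_intact` and `cell_survives` are hypothetical, the model is purely Boolean (it ignores affinities, expression levels, and partial turnover), and it assumes that the DBD and AD fusion proteins are always present.

```python
def cid_is_intact(reaction: str, enzyme_active: bool) -> bool:
    """Return True if the two CID ligands end up covalently linked."""
    if reaction == "formation":
        # Ligands start unlinked; an active enzyme joins them into a CID.
        return enzyme_active
    if reaction == "cleavage":
        # The CID starts intact; an active enzyme cuts the linker.
        return not enzyme_active
    raise ValueError(f"unknown reaction type: {reaction!r}")


def cell_survives(reaction: str, enzyme_active: bool) -> bool:
    """Couple the reporter state to survival as described in the text."""
    # An intact CID bridges the DBD and AD fusions and turns the reporter on.
    reporter_on = cid_is_intact(reaction, enzyme_active)
    if reaction == "formation":
        # Essential reporter: the cell needs it transcribed.
        return reporter_on
    # Cleavage: toxic reporter, so the cell needs it silent.
    return not reporter_on


if __name__ == "__main__":
    for reaction in ("formation", "cleavage"):
        for active in (True, False):
            print(reaction, active, cell_survives(reaction, active))
```

In this simplified picture, the choice of an essential reporter for bond formation and a toxic reporter for bond cleavage means that survival tracks enzyme activity in both cases.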
Fig. 4.1-4 Chemical Complementation. A reaction-independent complementation assay for enzyme catalysis based on the yeast three-hybrid assay. A heterodimeric small molecule bridges a DNA-binding domain-receptor fusion protein and an activation domain-receptor fusion protein, activating transcription of a downstream reporter gene in vivo. Enzyme catalysis of either cleavage or formation of the bond between the two small molecules can be detected as a change in transcription of the reporter gene. The assay can be applied to new chemical reactions simply by synthesizing small molecules with different substrates as linkers and adding an enzyme as a fourth component to the system. (Figure labels: E, Substrate, DBD, DNA binding site, Reporter gene.)
In our initial report, we chose cephalosporin hydrolysis by the Enterobacter cloacae P99 β-lactamase (P99) as a well-studied enzyme-catalyzed cleavage reaction around which to develop Chemical Complementation [20]. Cephalosporins are β-lactam antibiotics, and β-lactamases are the bacterial resistance enzymes that hydrolyze and inactivate these antibiotics. The P99 β-lactamase is well characterized biochemically and structurally, and the synthesis of cephalosporins is well established. First, we designed a small molecule CID cephalosporin substrate, incorporating the CID ligands at the C3′ and C7 positions of the cephem core. Using a lacZ reporter gene, we showed that Chemical Complementation could be used to detect β-lactamase activity using this dexamethasone-methotrexate (Dex-Mtx) heterodimer with a cephem linker (Dex-Cephem-Mtx). In the absence of enzyme, the Dex-Cephem-Mtx CID dimerizes the appropriate DBD- and AD-fusion proteins, activating transcription of a lacZ reporter gene. Expression of the P99 β-lactamase then presumably leads to cleavage of the Dex-Cephem-Mtx CID, disrupting transcription activation. We also showed, via a lacZ screen, that the system could distinguish the wild-type (wt) enzyme from the inactive P99:S64A variant, in which the critical active site serine nucleophile has been mutated to an alanine. These experiments established the feasibility of detecting enzyme catalysis using the yeast n-hybrid assay.
Benkovic and coworkers took a related approach in an assay they called Quest (Querying for Enzymes using the Three-hybrid system), which detects catalysis by coupling substrate turnover to transcription of a reporter gene [21]. Here, the CID that dimerizes the transcriptional activator is a homodimer of the substrate. Enzyme catalysis of free substrate to product is detected as displacement of the homodimeric CID substrate from the transcriptional activator fusion proteins. Although this approach has the advantage of using unmodified substrate, a new CID-protein pair has to be developed for each new reaction.
In a more biological approach, Peterson and coworkers have developed a two-hybrid-based system to detect protein tyrosine kinase (PTK) activity [22]. This assay relies on the PTK-dependent phosphorylation of a tyrosine residue present in a peptide that has been fused to the DBD. The phosphorylated tyrosine is then bound by a phosphotyrosine-binding protein fused to the AD, leading to transcriptional activation of the reporter gene. While limited to peptide substrates, this approach has the advantage that it does not require chemical synthesis, making it more accessible to biologists.
4.1.3 General Considerations
Whether applied as in the original two-hybrid assay to detect protein-protein interactions or in the related n-hybrid assays to detect protein-DNA, protein-RNA, or protein-small molecule interactions, the basic components of the n-hybrid assay remain the same. Thus, while we focus in this section on the small molecule three-hybrid assay because this is the assay in which our laboratory specializes, this section could also be used as a technical introduction to any of the other n-hybrid systems. The real strength of the n-hybrid assays lies in how straightforward they are to implement in the laboratory with basic knowledge of Escherichia coli and Saccharomyces cerevisiae molecular biology. Moreover, the commercial availability of the components of the two-hybrid system permits any laboratory to implement the system rapidly. Finally, laboratories without prior experience working with S. cerevisiae should not be deterred from carrying out n-hybrid assays, as molecular biology techniques for this organism are similar to those for E. coli.
4.1.3.1
The Chemical Inducer of Dimerization (CID)
The effectiveness of any three-hybrid system depends critically on the CID used to dimerize the transcriptional activator in vivo [23, 24]. The subject of CIDs has been considered fully in the previous chapter by Clackson, so here we focus on the issues we have found particularly important for the use of CIDs in the three-hybrid assay. Our presentation of these considerations is based largely on our own work with the yeast three-hybrid system and the CID ligand/receptor pairs Dex/GR, FK506/FK506-binding protein 12 (FK506/FKBP12), the synthetic FK506 analog SLF/FK506-binding protein 12 (SLF/FKBP12), methotrexate/dihydrofolate reductase (Mtx/DHFR), O6-benzylguanine/O6-alkylguanine-DNA alkyltransferase (BG/AGT), estrone/estrogen receptor (ES/ER), and biotin/streptavidin (biotin/SA) (Fig. 4.1-5) [19, 23-28].
Fig. 4.1-5 Small molecules used to create chemical inducers of dimerization (CIDs) for the yeast three-hybrid system. (Structures shown: dexamethasone, FK506, SLF, trimethoprim, estrone, biotin.)
First and foremost, a successful three-hybrid system seems to require a high-affinity (low nanomolar KD) CID pair [29]. Using the most sensitive reporter genes commercially available for the Brent LexA yeast three-hybrid system, we found that FK506-Dex, Mtx-Dex, Mtx-Mtx, and Mtx-SLF could all activate transcription, but Dex-Dex and Dex-SLF could not [25]. Second, the directionality of the system is important for a strong transcription read-out. We reported that the Dex-Mtx yeast three-hybrid system showed higher levels of transcription activation when DHFR was fused to the DBD than when it was fused to the AD [30]. Third, as with any CID application, the ligand/receptor pair must be considered in the context of the host cell line. For example, the Dex/GR interaction is dependent on associated heat shock proteins. Thus, the KD of this interaction is significantly higher in S. cerevisiae, in which there are only homologous heat shock proteins, than in the native mammalian background. Also, this CID pair cannot be used in E. coli, in which there are no such homologous heat shock proteins. Finally, there are also more subtle effects. For example, for reasons we do not understand, only the E. coli DHFR, not the murine homolog, is functional in the Dex-Mtx yeast three-hybrid system [30].
4.1.3.2
The Genetic Assay
For a laboratory new to the three-hybrid assay, we recommend beginning with the yeast two-hybrid system, which is based on reconstitution of a eukaryotic transcriptional activator protein. Not only is this assay straightforward to practice but also all the necessary strains and plasmids are commercially available. As discussed below, however, there are potential advantages to working in E. coli or using a nontranscription-based assay. Several E. coli-based transcription assays and general protein complementation assays (PCAs) have now been developed as two-hybrid assays. Notably, while the E. coli transcription assays have proven amenable to the introduction of small molecule CIDs, the PCAs have not.
4.1.3.2.1
The Yeast n-Hybrid System
There are two key versions of the yeast two-hybrid system. The GAL4 system originally introduced by Fields and Song uses the DBD and the AD of the yeast GAL4 gene [1]. The LexA system introduced by Brent and coworkers uses the E. coli DBD LexA and the E. coli B42 AD [31]. Over time, these two systems have benefited from a number of improvements. Convenient DBD and AD vectors were developed to carry diverse bacterial drug-resistance markers, yeast origins of replication, and yeast auxotrophic markers. These technical improvements facilitate the testing of large pools of protein variants (ca. 10⁶) using growth selections. In addition to the basic activator system, reverse and split-hybrid systems were developed to detect the disruption of protein-protein interactions, and a transcriptional repressor-based system has been reported [32, 33]. Today components for these systems are commercially available from several suppliers, including Stratagene and Clontech, which market the Gal4 system; Origene, which markets the LexA system; and Invitrogen, which offers versions of both systems. All of the basic features of the two-hybrid system have been covered already in several excellent reviews and the chapters on methods.
In our laboratory we have used the Brent two-hybrid system to build our Dex-Mtx yeast three-hybrid system. We favor the Brent system, which uses LexA, an E. coli transcription factor, and B42, an artificial activator isolated from E. coli genomic DNA. Both LexA and B42 are orthogonal to standard yeast genetic tools and nontoxic to the yeast cell, yet the artificial LexA-B42 transcriptional activator is on par with the strongest transcriptional activators endogenous to S. cerevisiae [31]. Moreover, the LexA system permits the use of the tightly regulated GAL1 promoter to drive the expression of the LexA DBD and B42 AD-protein fusions, so that expression can be tuned by varying the ratio of galactose to glucose in the growth medium. As reported by Lin et al., we use pMW103, a multicopy 2μ plasmid with a HIS3 marker, to encode the LexA DBD fusions and pMW102, a multicopy 2μ plasmid with a TRP1 marker, to encode the B42 AD fusions. Rather than the original EGY48 LEU2 selection strain, we chose the FY251 strain (MATa trp1Δ63 his3Δ200 ura3-52 leu2Δ1 Gal+), which provides an additional selective marker for greater flexibility. The LEU2 or URA3 markers can then be used either for the transcription activation growth selection or for the introduction of additional plasmids. In this initial publication, we then used the lacZ reporter plasmid pMW112, which encodes the lacZ gene under control of eight tandem LexA operators. Thus, small molecule CID-induced transcription activation could be detected using standard lacZ transcription assays either on plates or in liquid culture [25].
Further optimization of the yeast three-hybrid system in our lab led us to conclude that integration of either the AD or the DBD fusion into the yeast chromosome stabilizes the transcription read-out of the reporter gene without losing transcriptional strength, effectively reducing the number of false positives in the detection of novel ligand-receptor interactions [34].
4.1.3.2.2
E. coli Transcription Activation Assays
Widespread use of the yeast two-hybrid system led several groups to develop alternate transcription-based assays. While the yeast two-hybrid assay is quite powerful, a bacterial equivalent would increase by several orders of magnitude the number of proteins that could be tested, as the transformation efficiency and doubling rate of E. coli are significantly greater than those of S. cerevisiae. There may also be applications where it is advantageous to test a eukaryotic protein in a prokaryotic environment, in which many pathways are not conserved. The yeast two-hybrid assay cannot, however, be transferred directly to bacteria since the components of the transcription machinery and the mechanism of transcriptional activation differ significantly between bacteria and yeast. The first bacterial repressor assay was developed in 1990 by Sauer and coworkers, who adapted a bacterial λ transcriptional repressor system to read out dimerization of a GCN4 leucine zipper fusion [35]. The transcriptional repressor λcI controls the lytic/lysogenic pathway in bacteriophage λ. As a dimer, λcI is bound to the λ operator and prevents the expression of genes involved in the lytic pathway, allowing integration of the λ DNA into the bacterial chromosome. Taking advantage of the λcI dimerization requirement, Sauer and coworkers fused the DNA-binding domain of λcI to a GCN4 leucine zipper dimerization motif to restore a functional hybrid repressor. Seven years later, Hochschild and coworkers designed a bacterial two-hybrid activation system based on the transcription mechanism of E. coli RNA polymerase (RNAP) [36]. This assay is based on their observation that binding of the C-terminus of the α subunit of the RNAP (α-CTD) to an upstream element leads to transcription activation of a downstream gene. To create a bacterial two-hybrid system, the authors replaced the α-CTD with the C-terminus of the transcriptional repressor λcI (λcI-CTD), generating an α-λcI chimera. Binding of the transcriptional repressor λcI to the λ operator leads to recruitment of RNAP via the α-λcI chimera, which in turn directs transcription activation of a reporter gene downstream of the λ operator. By replacing the direct α-λcI linkage with arbitrary protein-protein interactions, they created a bacterial two-hybrid activation system. This technology was successfully applied to detect two interacting yeast proteins, Gal4 and Gal11, fused to λcI and α-NTD (N-terminus of the α subunit of the RNAP), respectively (Fig. 4.1-6).
Fig. 4.1-6 The bacterial two-hybrid system developed by Hochschild and coworkers. The λcI repressor and the α-subunit of RNAP are fused to two arbitrary proteins, X and Y. Binding of the λcI repressor to the λ operator followed by dimerization of X and Y recruits RNAP, leading to transcription activation of a downstream reporter gene.
Our development of a successful yeast three-hybrid system, and the advantages promised by an analogous system in bacteria, led us to construct a bacterial three-hybrid system from the RNAP two-hybrid system developed by Hochschild and coworkers [37]. We chose to adapt this assay because it is a transcriptional activation system, and reconstitution of transcriptional activation should be largely conformation independent. The key to converting this two-hybrid assay into a three-hybrid system was the design of a dimeric ligand that could bridge λcI and α-NTD through the receptors of the ligand. For the bridging small molecule, we chose to prepare a heterodimer of Mtx and a synthetic analogue of FK506 (SLF). We call this heterodimer Mtx-SLF. We did not pursue building a bacterial three-hybrid system based on the Mtx-Dex heterodimer previously used in our yeast three-hybrid system because the Dex/GR interaction requires heat shock proteins that are absent in E. coli. The heterodimer Mtx-SLF gives a strong transcription read-out in the E. coli RNAP three-hybrid system, providing a robust platform for high-throughput assays based on protein-small molecule interactions.
4.1.3.3
Protein Complementation Assay
All of the above assays are based on transcription of a reporter gene. A different method for studying protein-protein interactions is the use of a PCA. Here an enzyme with a phenotype detectable via either a screen or a selection is divided into two nonfunctional fragments that are fused to proteins to be tested for dimerization. If the tested proteins dimerize, the two enzyme fragments are brought into close proximity, leading to reconstitution of enzyme activity (Fig. 4.1-7) [38, 39]. Since PCAs are independent of the cell's transcription machinery, they can be used to detect protein interactions in any cell type or cell compartment in vivo or in vitro. Furthermore, PCAs can potentially quantify protein-protein interactions since there is a simple relationship between protein dimerization and reconstituted enzyme activity. PCAs have been developed using a variety of proteins including β-galactosidase, β-lactamase, DHFR, GFP (green fluorescent protein), and YFP (yellow fluorescent protein) [40-42]. For example, in a proof-of-principle paper, Michnick and coworkers showed that mDHFR can be split into two fragments that show no detectable enzyme activity on their own but can effectively reconstitute enzyme activity when fused to two interacting proteins. Bacteria expressing a functionally reassembled mDHFR can easily be selected since mDHFR activity is essential for growth of E. coli in the presence of trimethoprim, which selectively inhibits bacterial DHFR but not its eukaryotic counterpart mDHFR. Further, the mDHFR PCA works as a selection system in eukaryotic cells deficient in endogenous DHFR activity [43]. In a remarkable application of this system, Michnick and coworkers were able to detect a protein-protein interaction, locate the interaction to a specific cell compartment, and place the interaction in a signal transduction pathway with a single assay based on the DHFR PCA in mammalian cells deficient in DHFR [44]. Specifically, they examined protein interactions in the well-studied signal transduction pathway of receptor tyrosine kinase, which mediates control of initiation of translation in eukaryotes. From 35 interactions tested, the DHFR PCA selection identified 14 interacting partners that were localized to specific intracellular compartments using fluorescein-Mtx, a fluorophore conjugate in which the Mtx portion binds to the reconstituted DHFR with nanomolar affinity. The position of the protein interaction in the signal transduction pathway was determined by using three small molecule inhibitors known to act at key points of the pathway.
Fig. 4.1-7 Protein complementation assays. A protein that carries out a detectable function is separated into two fragments (blue and green) that show no detectable enzyme activity on their own but can effectively reconstitute enzyme activity when fused to two interacting proteins, X and Y.
In view of the advantages PCAs would bring to the detection of protein-small molecule interactions, our laboratory has made some efforts to develop a small molecule PCA three-hybrid assay, though without success [45]. Specifically, we tested both the Mtx-SLF adenylate cyclase PCA and the Mtx-SLF β-lactamase PCA in E. coli (E. Althoff, V. Cornish, unpublished results). In addition, we tested a Dex-Mtx GFP PCA, also in E. coli, in collaboration with Regan and coworkers (E. Althoff, V. Cornish, T. Magliery, L. Regan, unpublished results). From both a simple thermodynamic consideration and these results, we hypothesize that without the high degree of cooperativity found in the transcription-based assays, the PCAs cannot detect a three-component interaction.
4.1.3.4
Problem Choice
The two-hybrid assay was originally used simply for cloning proteins based on their interaction with other proteins in a given biological pathway. However, the more recent development of one- and three-hybrid assays opens the door to studying DNA, RNA, and small molecule interactions, and even catalysis. Though developed as a genetic assay for cloning, there is no reason that the n-hybrid assays cannot be used for a broad range of applications, including drug discovery, directed evolution, and enzymology.
It is interesting to consider how well suited the two-hybrid assay is for its original purpose, the discovery of new proteins on the basis of their binding to other known proteins, particularly as this assay begins to be carried out on a genome-wide scale. An important paper that bears on this question, in our opinion, comes from Golemis and Brent, in which they estimated that the KD cutoff for the yeast two-hybrid assay is ca. 1 μM [46]. Assuming that the proteins are being expressed at ca. 1 μM concentrations, the two-hybrid assay can only detect relatively high-affinity interactions (KD ca. 1 μM). Thus, while the two-hybrid assay is quite successful at identifying new interactions, it is probably not appropriate to assume that a high-throughput two-hybrid assay gives a snapshot of all interactions. In fairness, however, it should be pointed out that traditional affinity chromatography approaches are even further impaired because they rely on the natural abundance of any given protein in the cell. Extending this analysis to drug discovery using the small molecule three-hybrid assay, it is our opinion that the three-hybrid assay was long underutilized because the original systems had low sensitivity owing to the CID anchor. Recently, we have shown that our Mtx three-hybrid system has a KD cutoff of ca. 100 nM [29]. Consistent with this idea, GPC Biotech reported last year the use of the Mtx three-hybrid system for identification of protein targets of CDK inhibitors [47]. Interestingly, Hochschild and coworkers have shown that they can build additional sensitivity into their bacterial two-hybrid assay by adding cooperative interactions [48].
The n-hybrid assay can also be used for directed evolution. For example, Pabo and coworkers have adapted a bacterial one-hybrid assay to evolve zinc-finger variants with defined DNA-binding specificities [49]. Starting with a three zinc-finger protein that has nanomolar affinity for its DNA-binding site, the authors replaced the binding site for the third zinc finger with a new DNA sequence and then randomized the third finger to evolve a zinc-finger variant with increased affinity for the target sequence. Impressively, the evolved zinc finger showed DNA affinity within 10-fold of the wt protein, KD = 0.01 nM, and a 10- to 100-fold preference for the modified over the wt DNA sequence. Given the low KD cutoff and the fact that the n-hybrid assay is governed by equilibrium binding, there are two likely limitations to using this assay for directed evolution. First, the assay cannot effectively detect initial, weak binders. Second, the assay is limited in its ability to distinguish evolved variants on the basis of improvements in KD since energy differences of only a few kilocalories per mole determine whether a molecule is bound at equilibrium. In theory, however, these limitations could be overcome by varying the concentration of the n-hybrid components or, again, by building in a series of tunable, cooperative interactions. Pabo and coworkers, then, chose their problem well. They began with a zinc-finger protein with two out of three zinc fingers intact. This initial binding affinity enabled them to select good binders in a single round of selection, rather than trying to improve binding affinity through multiple rounds of selection. A similar analysis suggests that the n-hybrid assays may be ideally suited to catalysis applications since large differences in catalytic activity are needed to significantly affect the half-life of product formation.
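The equilibrium-binding argument above can be made concrete with a short calculation. The sketch below is illustrative only and rests on assumptions that are not part of the original analysis: a simple 1:1 binding model in which the free concentration of one partner is treated as fixed, a temperature of 298.15 K, and hypothetical function names.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_bound(partner_conc: float, kd: float) -> float:
    """Fraction of a protein bound at equilibrium for simple 1:1 binding.

    Assumes the free concentration of the binding partner is
    approximately `partner_conc` (same units as `kd`).
    """
    return partner_conc / (partner_conc + kd)


def delta_delta_g(kd_ratio: float, temp_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) corresponding to a fold change in KD."""
    return R_KCAL * temp_k * math.log(kd_ratio)


if __name__ == "__main__":
    conc = 1e-6  # ca. 1 uM expression level, as assumed in the text
    for kd in (1e-8, 1e-7, 1e-6, 1e-5, 1e-4):
        print(f"KD = {kd:.0e} M -> fraction bound = {fraction_bound(conc, kd):.3f}")
    for fold in (10, 100, 1000):
        print(f"{fold}-fold change in KD = {delta_delta_g(fold):.2f} kcal/mol")
```

Under these assumptions, a protein expressed at 1 μM is half bound when KD is 1 μM, and each 10-fold change in KD corresponds to roughly 1.4 kcal/mol, which is why a few kilocalories per mole separate a fully bound from an essentially unbound partner.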
4.1.4 Applications
Although introduced only in 1989, the yeast two-hybrid assay has emerged as an integral tool for biology research. Two-hybrid screens now appear regularly in the biology literature. Genome-wide two-hybrid screens are even the focus of major research publications. Somewhat surprisingly, then, there have been few applications of the related n-hybrid technologies to detect protein interactions with DNA, RNA, and small molecules, or applications beyond cloning. Here we look at more recent applications of n-hybrid assays with an eye to asking whether this discrepancy results from the relative power of these different n-hybrid assays or rather from the biases of current research.
4.1.4.1
Protein-Protein Interactions
Traditional genetic assays and more recently the yeast two-hybrid assay have been primarily used to identify natural protein-protein interactions. Two-hybrid screens are now fully integrated into the biologist's toolbox and appear routinely in the published literature. Almost half of the published protein-protein interactions to date have been detected, at least in part, using the yeast two-hybrid assay [50]. Beyond these simple cloning applications, the two-hybrid assay would seem perfectly suited for genomics. For example, automation techniques were used to identify all possible protein-protein interactions in S. cerevisiae [51]. Every open reading frame encoding a protein, ca. 6000 in S. cerevisiae, was fused both to a DNA-binding domain and to an AD, and the two fusion libraries were screened against one another. The major challenge in this project was how to transform all combinations of the 6000 DBD and 6000 AD fusions into yeast and then how to assay so many cells. Since a library of 10⁷ is at the limit of the transformation efficiency of yeast, it is in theory achievable. Uetz and coworkers compared two approaches. In the first approach, they explicitly mated haploid cells of opposite mating types (MATa and MATα), one containing 192 DBD fusions and the other containing the 6000 AD fusions, in a spatially addressable format, such as a microtiter plate, and assayed each well using a HIS3 growth selection. In the second approach, haploid cells containing the 6000 DBD fusions were mated with haploid cells of the opposite mating type containing the 6000 AD fusions, and only diploids that survived in a LEU2 growth selection were arrayed and analyzed individually. Interestingly, there were significantly more "hits" in the first, spatially addressable format, underscoring the importance of parameterizing new methods for high-throughput screening and the problem of distinguishing false positives and negatives in genomics.
This example highlights how well suited the n-hybrid assays are for extracting some of the information provided by recent genome sequencing efforts. While the two-hybrid method has been extensively used to detect natural protein-protein interactions, it should also be well suited for protein evolution. Brent and coworkers demonstrated that the two-hybrid assay can be used to
4. J Chemical Complementation: Bringing the Power ofGenetics to Chemistry
I
217
Table4.1-1 The sequences and binding affinities of 14 different
aptamers for binding to Cdk2 isolated in a yeast two-hybrid system Aptamer
KO (n M)
Amino acid sequence
Pep1 Pep2 pep3 pep4 pep5 Pep6 pep7 Pep8 pep9 Peplo Pep11 Pep12 pep13 pep14
ND[~~ 64 16 112 4~17 ND 52f3 ND ND 3nf5 ND 105 10 87 7 ND ND ND
ELRHRLGRAL SEDMVRGLAW GPTSHCATVP GRSDLWRVIR LVCKSYRLDW EAGALFRSLF YRWQQGWPS NMASCSFRQ SSFSLWLLMV KSIKRAAWEL GPSSAWNTSG WASLSDFY SVRMRYGIDA FFDLGGLLHG RVKLGYSFWA QSLLRCISVG QLYAGCYLGV VIASSLSIRV YSFVHHGFFN FRVSWREMLA QQRFVFSPSW FTCAGTSDFW GPEPLFDWTR D QVWSLWALGW RWLRRYGWNM WRRMELDAEI RWVKPISPLE RPLTGRWVVW GRRHEECGLT PVCCMMYGHR TAPHSVFNVD WSPELLRAMV AFRWLLERRP
a
*
* *
ND
-
not determined
identify peptide aptamers that inhibit Cdk2 from a library of random peptide sequences (Table 4.1-1) [52]. The 20-residue peptide library was displayed in the active site loop of E. coli thioredoxin (TrxA). The TrxA loop library was fused to the AD, and Cdk2 was fused to the DBD. In a single round of assay, 6 × 10⁶ TrxA-AD transformants, a very small percentage of the possible 20mers, were tested for binding to LexA-Cdk2. From this assay, they isolated 66 colonies that activated transcription of both a LEU2 and a lacZ reporter gene. Remarkably, these colonies converged on 14 different peptide sequences that bound Cdk2 with high affinity. Using surface plasmon resonance, the peptide aptamers were shown to bind Cdk2 with KD values of 30-120 nM. In kinase inhibition assays, the peptide aptamers had IC50 values of 1-100 nM for the Cdk2/cyclin E kinase complex. What is particularly impressive about this experiment is that nanomolar-affinity ligands were isolated in a single round of selection from a library only on the order of 10⁶-10⁸ members. Similar results have been obtained using peptide aptamers in a traditional genetic selection [53]. Given the success of this and related “aptamer” selections, it is somewhat surprising that these “aptamer” scaffolds are not more widely used.

There are several potential advantages to directed evolution over traditional monoclonal antibody technology for generating selective binding proteins. Even optimistically, six months are required to generate a monoclonal antibody, from the start of immunization, through immortalization, to screening. On the other hand, if several peptide aptamer libraries were maintained for routine use, the libraries could be screened against a new target, false positives could be sorted out, and biochemical assays could validate a target in less than a month and at considerably less expense. Moreover, protein scaffolds other than antibodies may prove more robust for use as reagents and in therapeutic applications. Perhaps because monoclonal antibody technology has become so robust over the years, the momentum does not seem to be there to seriously explore replacing this technology with directed evolution. It is also interesting to compare these “aptamer” scaffolds to chemical genetic approaches for generating inhibitors for a broad array of biological targets.
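The library-coverage argument made above for the Cdk2 aptamer selection can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes 20 standard amino acids at each of the 20 randomized positions and treats every one of the 6 × 10⁶ transformants as a unique sequence, which gives an upper bound on the fraction of sequence space sampled. The function and constant names are hypothetical.

```python
from fractions import Fraction

AMINO_ACIDS = 20             # assumption: 20 standard amino acids per position
PEPTIDE_LENGTH = 20          # residues displayed in the TrxA loop (from the text)
TRANSFORMANTS = 6 * 10**6    # TrxA-AD transformants assayed (from the text)


def sequence_space(alphabet_size: int, length: int) -> int:
    """Number of distinct peptides of the given length."""
    return alphabet_size ** length


def fraction_sampled(screened: int, space: int) -> Fraction:
    """Upper bound on coverage, assuming every screened clone is unique."""
    return Fraction(screened, space)


space = sequence_space(AMINO_ACIDS, PEPTIDE_LENGTH)
coverage = fraction_sampled(TRANSFORMANTS, space)

print(f"Possible 20mers:  {space:.3e}")
print(f"Fraction sampled: {float(coverage):.3e}")
```

Under these assumptions the screened library covers a vanishingly small part of the possible 20mers, which is what makes the recovery of nanomolar binders in one round notable.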
4.1.4.2
DNA-Protein Interactions
Just as the yeast two-hybrid assay can be used to detect protein-protein interactions, transcriptional activators can be used directly to detect protein-DNA interactions. In truth, this type of experiment was done before the one-hybrid assay was conceptualized as such. For example, as early as 1983 a His6 → Pro Mnt variant was generated that preferentially binds a mutant Mnt operator using a transcription-based selection [54]. A plasmid encoding Mnt was mutagenized both by irradiation with UV light and by passage through a mutator strain. The mutant plasmids were then introduced into E. coli and selected against binding to the wt operator and for binding to the mutant operator. Because a variety of convenient reporter genes are available, the E. coli could be engineered to link DNA recognition to cell survival in both the negative (selection against binding to the wt operator) and the positive (selection for binding to the mutant operator) directions. Binding to the wt Mnt operator was selected against by placing a tet resistance (tetR) gene under negative control of the wt Mnt operator. If a Mnt mutant bound the wt operator, it would block expression of the tetR gene, and the E. coli cells would die in the presence of tetracycline. Mnt variants with altered DNA-binding specificity were then selected for on the basis of immunity to infection by a P22 phage containing a mutant Mnt operator. The mutant Mnt operator controlled synthesis of the proteins responsible for lysing the bacterial host. If a Mnt variant could bind to this mutant operator, it would turn off the lytic machinery, and the bacteria would survive phage infection. Four independent colonies were isolated from the two selections. Again, only a single round of selection was required for each step. All four colonies encoded the same His6 → Pro mutation, two by a CAC → CCC and two by a CAC → CCT mutation.
Not only did these mutants bind to the mutant operator, but they also did not bind efficiently to the wt operator.

More recently, Pabo and coworkers adapted a bacterial two-hybrid assay into a bacterial one-hybrid system to evolve zinc-finger variants with defined DNA-binding specificities [49]. In this assay, three tandem zinc fingers function as the DBD of the one-hybrid system and are fused to the Gal11 protein, which is known to dimerize with Gal4, which in turn is fused to the RNA polymerase (RNAP). Binding of the three tandem zinc fingers to a specific DNA sequence upstream of the reporter gene recruits the RNAP to the promoter region of the reporter gene and initiates its transcription (Fig. 4.1-8). This assay allows testing of 10⁸ protein variants per round of selection. However, if all three zinc fingers were to be randomized simultaneously, it would create 8 × 10²⁴ protein variants (using 24 codons at six amino acids per three zinc fingers = (24⁶)³), which cannot be covered by this high-throughput method. Thus, the authors are limited to randomizing one finger at a time, while keeping the other two unchanged. We believe that conserving the high affinity of two zinc fingers for the DNA may be important for the success of Pabo and coworkers’ directed evolution, because starting a directed evolution with a protein that has high affinity for DNA ensures that the evolved proteins fall within the dynamic range of the n-hybrid system. For this zinc-finger evolution, they created a library of ca. 10' variants, and identified a total of nine sequences that bound specifically to three target DNAs with a preference of 10- to 100-fold for the modified over the wt DNA. Comparing their results for the zinc-finger evolution using the bacterial hybrid system with earlier results obtained in a similar zinc-finger evolution study using phage display, Pabo and coworkers conclude that the affinity and specificity of the selected zinc fingers are superior to those obtained in the earlier phage display studies. Moreover, the bacterial hybrid system is a more rapid alternative to phage display because it permits isolation of functional fingers in a single selection step instead of multiple rounds of enrichment. Speaking to the power of this approach, Sangamo uses a modified one-hybrid assay for its selection of artificial DNA-binding proteins for commercial applications [55, 56]. The success found here raises the question of other binding interactions. One could speculate that the success here depends on starting with two known zinc fingers with high affinity for their DNA target, except that the protein “aptamer” scaffold selections described in the previous section began with scaffolds with no measurable affinity for their protein target.

Fig. 4.1-8 Development of zinc fingers specific for a specific DNA sequence using a one-hybrid assay adapted from a bacterial two-hybrid system. Zinc fingers (ZF) 1, 2, and 3 from the Zif268 protein were fused to the Gal11 protein. The Gal4 protein, which binds Gal11 with high affinity, was fused to the α-subunit of RNAP. If ZF3 bound to the first site with high affinity, the RNAP complex would be recruited, activating transcription of a HIS3 reporter gene. Significantly, in just one round of assay, several proteins were identified that bound specifically to the target DNA sequence.
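The reasoning above about why only one zinc finger can be randomized at a time comes down to comparing library size with assay capacity. The following sketch, with hypothetical names, takes the figures quoted in the text (24 codons at each of six positions per finger, three fingers, roughly 10⁸ variants testable per round) and computes the DNA-level library sizes with exact integer arithmetic. It is a back-of-the-envelope illustration, not a description of the authors' library design.

```python
CODONS_PER_POSITION = 24    # from the text
POSITIONS_PER_FINGER = 6    # randomized amino acid positions per finger (from the text)
ASSAY_CAPACITY = 10**8      # variants testable per round (from the text)


def library_size(codons: int, positions: int, fingers: int) -> int:
    """DNA-level library size when `fingers` zinc fingers are randomized together."""
    return (codons ** positions) ** fingers


one_finger = library_size(CODONS_PER_POSITION, POSITIONS_PER_FINGER, 1)
three_fingers = library_size(CODONS_PER_POSITION, POSITIONS_PER_FINGER, 3)

for label, size in (("one finger", one_finger), ("three fingers", three_fingers)):
    # Ratio > 1 means the library exceeds what one round of selection can test.
    ratio = size / ASSAY_CAPACITY
    print(f"{label:>13}: {size:.2e} variants, {ratio:.2e} x assay capacity")
```

A single randomized finger gives a library of the same order as the assay capacity, whereas three fingers randomized together exceed it by many orders of magnitude.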
4.1.4.3
RNA-Protein Interactions
Before the development of the RNA three-hybrid system, identification of protein-RNA interactions was limited to in vitro methods such as pull-down assays using radiolabeled RNA. The introduction of the RNA three-hybrid system has allowed not only the detection of well-studied protein-RNA pairs, but also the identification of novel protein-RNA interactions. An impressive application of this system is the cloning of a regulatory protein from Caenorhabditis elegans that binds to the 3′ untranslated region of fem-3 (fem-3 3′UTR) and mediates the sperm/oocyte switch in hermaphrodites [57]. In this assay, a bifunctional RNA plasmid possessing the fem-3 3′UTR and the RNA ligand for the MS2 coat protein was introduced into a yeast strain expressing a DBD-MS2 fusion upstream of the HIS3 and lacZ reporter genes. Into this strain, a complementary DNA-AD (cDNA-AD) library was introduced. Cells containing a positive protein-RNA interaction were selected first for HIS3 and lacZ activation, followed by screening for the presence of the bifunctional RNA plasmid. Successful candidates were then cured of the RNA plasmid by reverse selection, and the cells were tested again for lacZ activation to reduce the number of false positives. Cells that failed to activate lacZ after plasmid loss were tested for fem-3 3′UTR binding specificity by reintroduction of the bifunctional RNA plasmids. The protein encoded in the only cDNA-AD that satisfied all selection and screening criteria was found to have 93% homology at the nucleotide level with two genes encoded in the C. elegans genome. Further tests confirmed these genes to be regulators of the sperm/oocyte switch in hermaphrodite C. elegans. The specificity with which the RNA three-hybrid assay selected just one protein from thousands for the selected protein-RNA interaction illustrates the power of this assay for finding novel protein-RNA interactions [16]. The recent discovery of RNAi, for example, highlights the need not to forget about molecules other than proteins when carrying out genetic assays [58, 59].
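The selection and counter-screening steps described above for the fem-3 3′UTR experiment can be summarized as a chain of filters. The sketch below is a schematic illustration only: the clone names and readouts are invented, and the use of a control RNA in the specificity test is an assumption about how "binding specificity" would be scored, not a detail taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical readouts for one cDNA-AD clone."""
    name: str
    his3_active: bool            # grows under HIS3 selection
    lacz_active: bool            # activates the lacZ reporter
    has_rna_plasmid: bool        # bifunctional RNA plasmid is present
    lacz_after_rna_loss: bool    # lacZ still active once the RNA plasmid is lost
    lacz_with_target_rna: bool   # lacZ active after reintroducing the target RNA
    lacz_with_control_rna: bool  # assumption: lacZ readout with an unrelated RNA


def is_rna_dependent(c: Candidate) -> bool:
    # Activation that survives loss of the RNA plasmid is a false positive.
    return c.has_rna_plasmid and not c.lacz_after_rna_loss


def is_specific(c: Candidate) -> bool:
    return c.lacz_with_target_rna and not c.lacz_with_control_rna


def passes_screen(c: Candidate) -> bool:
    return c.his3_active and c.lacz_active and is_rna_dependent(c) and is_specific(c)


CANDIDATES = [
    Candidate("clone_A", True, True, True, True, True, True),      # RNA-independent activator
    Candidate("clone_B", True, True, True, False, True, False),    # satisfies every criterion
    Candidate("clone_C", True, True, True, False, True, True),     # binds RNA nonspecifically
    Candidate("clone_D", True, False, True, False, False, False),  # fails the lacZ screen
]

hits = [c.name for c in CANDIDATES if passes_screen(c)]
print(hits)
```

Ordering the checks this way mirrors the logic of the experiment: each successive step removes a distinct class of false positive before the more laborious specificity test is run.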
4.1.4.4
Small Molecule-Protein Interactions
While several small molecule three-hybrid systems have now been reported, it was only in 2004 that such a system was used successfully for drug discovery research. Specifically, Becker and coworkers reported that the Mtx yeast three-hybrid system developed in our laboratory could be used to clone novel protein targets of CDK inhibitors (Table 4.1-2) [47]. The CIDs used in this study took advantage of the low picomolar affinity of Mtx for DHFR [25]. Three known CDK inhibitors, roscovitine, purvalanol B, and indenopyrazole, were linked to Mtx and introduced into a yeast strain expressing a DBD-DHFR protein fusion upstream of the HIS3 reporter gene and a library of kinase cDNAs linked to a transcription AD. With this system they isolated, besides the known CDK targets, 29 new kinase targets, 22 of which were confirmed by either in vitro binding or enzyme inhibition assays. We speculate that the success here resulted from the use of the high-affinity Mtx/DHFR anchor, which, as we recently showed, gives a KD cutoff of ca. 100 nM in the yeast three-hybrid assay.
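The idea of a KD cutoff mentioned above can be made concrete with the standard single-site binding isotherm, in which the fraction of receptor occupied is [L]/([L] + KD). The sketch below is a deliberately simplified illustration: the target names and KD values are hypothetical, the free ligand concentration is an assumed parameter, and a real three-hybrid readout also depends on ternary complex formation, expression levels, and cell permeability, none of which is modeled here.

```python
KD_CUTOFF_NM = 100.0  # approximate cutoff quoted in the text for the yeast three-hybrid assay


def fraction_bound(free_ligand_nm: float, kd_nm: float) -> float:
    """Single-site occupancy: [L] / ([L] + KD). Both arguments in nM."""
    if free_ligand_nm < 0 or kd_nm <= 0:
        raise ValueError("concentration must be >= 0 and KD must be > 0")
    return free_ligand_nm / (free_ligand_nm + kd_nm)


def within_cutoff(kd_nm: float, cutoff_nm: float = KD_CUTOFF_NM) -> bool:
    """True if the interaction is tight enough to be expected to score in the assay."""
    return kd_nm <= cutoff_nm


# Hypothetical ligand-target affinities (nM), for illustration only.
HYPOTHETICAL_KDS_NM = {"target_1": 5.0, "target_2": 80.0, "target_3": 900.0}

detectable = sorted(name for name, kd in HYPOTHETICAL_KDS_NM.items() if within_cutoff(kd))
print(detectable)

# Occupancy at an assumed free ligand concentration equal to the cutoff.
for name, kd in HYPOTHETICAL_KDS_NM.items():
    print(f"{name}: KD = {kd:g} nM, occupancy = {fraction_bound(KD_CUTOFF_NM, kd):.2f}")
```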
Table 4.1-2 Summary of biochemical analysis of purvalanol B-protein interactions. Binding of proteins to immobilized purvalanol B but not to CDK-inactive N6-methylated purvalanol B was evaluated by immunoblotting or liquid chromatography-mass spectrometry (for endogenous Jurkat proteins). Enzyme assays were performed with purified enzymes, and the percentage inhibition of kinase activity was determined with 1 µM purvalanol B.

4.1.4.5
Catalysis
The widespread utility and robust transcription read-out of the n-hybrid system motivated several laboratories to develop general methods to detect enzyme catalysis in vivo around the small molecule three-hybrid system. Several proof-of-principle papers have been published in the last few years, and now the key test of these systems is whether they can be readily applied to new chemistry. Toward that end, our laboratory recently demonstrated that Chemical Complementation could be used to detect glycosidic bond formation using a glycosynthase [60]. We chose glycosidic bond formation because, despite the fundamental role of carbohydrates in biological processes and their potential use as therapeutics, carbohydrates remain difficult to synthesize. Specifically, this system was developed using the E197A mutant of Cel7B from Humicola insolens, which had previously been shown to be an efficient “glycosynthase” using an α-fluoro donor substrate. Here, enzymatic activity is detected as formation of a bond between a Mtx-disaccharide-fluoride donor (Mtx-Lac-F) and a dexamethasone-disaccharide acceptor (Dex-Cel); the product dimerizes DBD-eDHFR and AD-GR, activating transcription of a LEU2 reporter gene that permits survival under appropriate selective conditions. The growth advantage conferred by the glycosynthase activity was used to select the Cel7B:E197A glycosynthase from a pool of inactive variants (Cel7B). A mock library containing inactive variants and glycosynthase in a 100:1 ratio underwent 400-fold enrichment in glycosynthase after a single round of selection. Encouraged by this result, we carried out the directed evolution of the glycosidase Cel7B to improve its glycosynthase activity using a Glu197 saturation library. From a library of 10⁵ mutants, Cel7B:E197S was selected, which showed a fivefold improvement in glycosynthase activity over the known Cel7B:E197A glycosynthase (Table 4.1-3). As intended, no further modifications to Chemical Complementation were needed to extend this assay to detect glycosynthase activity. All that was required was to add the Dex and Mtx saccharide substrates. This result shows the generality of Chemical Complementation, and the ease with which it can be applied to new chemical reactions. Moreover, it shows that Chemical Complementation can detect not only bond cleavage but also bond formation reactions. Although the Glu197 saturation library selected here was quite small, with only 32 members at the DNA level, the transformation efficiency of S. cerevisiae allows much larger libraries, on the order of 10⁵-10’.

4.1.5 Future Development
The yeast two-hybrid assay no doubt will continue to be a mainstay technique for the discovery of new protein-protein interactions. As biological pathways are being studied increasingly at the systems level, the two-hybrid assay has the potential to be quite useful for analyzing total protein dynamics in living cells. As seen in the PCA work by Michnick and coworkers, it is here that technical improvements will prove important for the two-hybrid assay. But it is the n-hybrid assays that have the potential to extend the power of genetics to molecules other than proteins, such as nucleic acids and small molecules. Despite this enormous potential, use of these other n-hybrid assays pales in comparison to that of the two-hybrid assay. As we argue in this chapter, a consideration of the published literature suggests that this discrepancy is not the result of some inherent technical limitation of the n-hybrid assays, but rather likely reflects the bias of current practice. Thus, it is here that we believe there is most potential for the future development of the n-hybrid assay and indeed genetics as a whole.

Technically, the n-hybrid assays probably can still be further developed for different classes of molecules or posttranslational modifications. But already in their present form these assays seem to have tremendous potential for biological discovery, uncovering new functions for the many classes of molecules that make up the cell. These advances also expand our ability to engineer the cell to harness its synthetic and functional capabilities for chemical discovery. Just as protein engineering impacted both basic research and the biotechnology and pharmaceutical industries in the last 25 years, so should cell engineering in this century. Such systems engineering likely will require a much more quantitative understanding of cellular processes, and accordingly the n-hybrid assays will have to be characterized and rebuilt on this level, allowing, for example, the KD cutoff of the assay to be dialed in. Using this genetic assay in entirely new ways should then open the door for new chemistry, with the potential to match the complexity of cell function.

Table 4.1-3 Glycosynthase activities and protein purification yields for Cel7B variants

Cel7B variant    Specific activity                Protein purification yield
                 (mol [F])/(min mol [E])          (nmol l⁻¹)
E197A            8 ± 2                            6.1
E197S            40 ± 5                           4.6
N196D/E197A      7 ± 1                            7.3

Glycosynthase activity for tetrasaccharide synthesis from α-lactosyl fluoride and p-nitrophenyl β-cellobioside (PNPC) was measured for the Humicola insolens Cel7B variants in sodium phosphate buffer, pH 7.0, at room temperature. Specific activities were determined by measuring the fluoride ion release rate with a fluoride ion selective electrode. The protein purification yields are the yields of purified protein as determined by western analysis from total cell culture.
References
1. S. Fields, O. Song, A novel genetic system to detect protein-protein interactions, Nature 1989, 340, 245-246.
2. E.M. Phizicky, S. Fields, Protein-protein interactions: methods for detection and analysis, Microbiol. Rev. 1995, 59, 94-123.
3. L. Keegan, G. Gill, M. Ptashne, Separation of DNA binding from the transcription-activating function of a eukaryotic regulatory protein, Science 1986, 231, 699-704.
4. E.A. Golemis, Protein-Protein Interactions: a Molecular Cloning Manual, 1st ed., Cold Spring Harbor Lab Press, New York, 2002.
5. B.T. Carter, H. Lin, V.W. Cornish, in Directed Molecular Evolution of Proteins, (Eds.: S. Brakmann, K. Johnsson), Wiley-VCH Verlag, Weinheim, 2002.
6. E. Phizicky, P.I. Bastiaens, H. Zhu, M. Snyder, S. Fields, Protein analysis on a proteomic scale, Nature 2003, 422, 208-215.
7. C.R. Geyer, R. Brent, Selection of genetic agents from random peptide aptamer expression libraries, Methods Enzymol. 2000, 328, 171-208.
8. H. Lin, V.W. Cornish, In vivo protein-protein interaction assays: beyond proteins, Angew. Chem., Int. Ed. Engl. 2001, 40, 871-875.
9. H. Lin, V.W. Cornish, Screening and selection methods for large-scale analysis of protein function, Angew. Chem., Int. Ed. Engl. 2002, 41, 4402-4425.
10. L.H. Hwang, L.F. Lau, D.L. Smith, C.A. Mistrot, K.G. Hardwick, E.S. Hwang, A. Amon, A.W. Murray, Budding yeast Cdc20: a target of the spindle checkpoint, Science 1998, 279, 1041-1044.
11. J.A. Chong, G. Mandel, in The Yeast Two-Hybrid System, (Eds.: P.L. Bartel, S. Fields), Oxford University Press, New York, 1997, pp. 289-297.
12. M.K. Alexander, D. Bourns, V.A. Zakian, in Two-Hybrid Systems, Methods and Protocols, Vol. 177 (Ed.: P.N. MacDonald), Humana Press, New Jersey, 2001, pp. 241-260.
13. M.M. Wang, R.R. Reed, Molecular cloning of the olfactory neuronal transcription factor Olf-1 by genetic selection in yeast, Nature 1993, 364, 121-126.
14. S. Jaeger, G. Eriani, F. Martin, Results and prospects of the yeast three-hybrid system, FEBS Lett. 2004, 556, 7-12.
15. B. Zhang, B. Kraemer, D. SenGupta, S. Fields, M. Wickens, Yeast three-hybrid system to detect and analyze interactions between RNA and protein, Methods Enzymol. 1999, 306, 93-113.
16. D.J. SenGupta, B. Zhang, B. Kraemer, P. Pochart, S. Fields, M. Wickens, A three-hybrid system to detect RNA-protein interactions in vivo, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 8496-8501.
17. N. Kley, Chemical dimerizers and three-hybrid systems: scanning the proteome for targets of organic small molecules, Chem. Biol. 2004, 11, 599-608.
18. S.L. Schreiber, Chemistry and biology of the immunophilins and their immunosuppressive ligands, Science 1991, 251, 283-287.
19. E.J. Licitra, J.O. Liu, A three-hybrid system for detecting small ligand-protein receptor interactions, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 12817-12821.
20. K. Baker, C. Bleczinski, H. Lin, G. Salazar-Jimenez, D. Sengupta, S. Krane, V.W. Cornish, Chemical complementation: a reaction-independent genetic assay for enzyme catalysis, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 16537-16542.
21. S.M. Firestine, F. Salinas, A.E. Nixon, S.J. Baker, S.J. Benkovic, Using an AraC-based three-hybrid system to detect biocatalysts in vivo, Nat. Biotechnol. 2000, 18, 544-547.
22. D.D. Clark, B.R. Peterson, Rapid detection of protein tyrosine kinase activity in recombinant yeast expressing a universal substrate, J. Proteome Res. 2002, 1, 207-209.
23. D.M. Spencer, T.J. Wandless, S.L. Schreiber, G.R. Crabtree, Controlling signal transduction with synthetic ligands, Science 1993, 262, 1019-1024.
24. J.F. Amara, T. Clackson, V.M. Rivera, T. Guo, T. Keenan, S. Natesan, R. Pollock, W. Yang, N.L. Courage, D.A. Holt, M. Gilman, A versatile synthetic dimerizer for the regulation of protein-protein interactions, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 10618-10623.
25. H. Lin, W. Abida, R. Sauer, V.W. Cornish, Dexamethasone-methotrexate: an efficient chemical inducer of protein dimerization in vivo, J. Am. Chem. Soc. 2000, 122, 4247-4248.
26. S.J. Kopytek, R.F. Standaert, J.C. Dyer, J.C. Hu, Chemically induced dimerization of dihydrofolate reductase by a homobifunctional dimer of methotrexate, Chem. Biol. 2000, 7, 313-321.
27. S. Gendreizig, M. Kindermann, K. Johnsson, Induced protein dimerization in vivo through covalent labeling, J. Am. Chem. Soc. 2003, 125, 14970-14971.
28. S.S. Muddana, B.R. Peterson, Facile synthesis of CIDs: biotinylated estrone oximes efficiently heterodimerize estrogen receptor and streptavidin proteins in yeast three hybrid systems, Org. Lett. 2004, 6, 1409-1412.
29. K.S. de Felipe, B.T. Carter, E.A. Althoff, V.W. Cornish, Correlation between ligand-receptor affinity and the transcription readout in a yeast three-hybrid system, Biochemistry 2004, 43, 10353-10363.
30. W.M. Abida, B.T. Carter, E.A. Althoff, H. Lin, V.W. Cornish, Receptor-dependence of the transcription read-out in a small-molecule three-hybrid system, ChemBioChem 2002, 3, 887-895.
31. J. Gyuris, E. Golemis, H. Chertkov, R. Brent, Cdi1, a human G1 and S phase protein phosphatase that associates with Cdk2, Cell 1993, 75, 791-803.
32. M. Vidal, R.K. Brachmann, A. Fattaey, E. Harlow, J.D. Boeke, Reverse two-hybrid and one-hybrid systems to detect dissociation of protein-protein and DNA-protein interactions, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 10315-10320.
33. H.M. Shih, P.S. Goldman, A.J. DeMaggio, S.M. Hollenberg, R.H. Goodman, M.F. Hoekstra, A positive genetic selection for disrupting protein-protein interactions: identification of CREB mutations that prevent association with the coactivator CBP, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 13896-13901.
34. K. Baker, D. Sengupta, G. Salazar-Jimenez, V.W. Cornish, An optimized dexamethasone-methotrexate yeast 3-hybrid system for high-throughput screening of small molecule-protein interactions, Anal. Biochem. 2003, 315, 134-137.
35. J.C. Hu, E.K. O’Shea, P.S. Kim, R.T. Sauer, Sequence requirements for coiled-coils: analysis with lambda repressor-GCN4 leucine zipper fusions, Science 1990, 250, 1400-1403.
36. S.L. Dove, J.K. Joung, A. Hochschild, Activation of prokaryotic transcription through arbitrary protein-protein contacts, Nature 1997, 386, 627-630.
37. E.A. Althoff, V.W. Cornish, A bacterial small-molecule three-hybrid system, Angew. Chem., Int. Ed. Engl. 2002, 42, 2327-2330.
38. S.W. Michnick, I. Remy, F.X. Campbell-Valois, A. Vallee-Belisle, J.N. Pelletier, Detection of protein-protein interactions by protein fragment complementation strategies, Methods Enzymol. 2000, 328, 208-230.
39. I. Remy, J.N. Pelletier, A. Galarneau, in Protein-Protein Interactions, (Ed.: E. Golemis), Cold Spring Harbor Laboratory Press, New York, 2001, pp. 449-475.
40. S.W. Michnick, I. Remy, F. Valois, in Methods in Enzymology, Vol. 14, (Eds.: J. Abelson, S. Emr, J. Thorner), Academic Press, London, 2000, pp. 208-230.
41. F. Rossi, C.A. Charlton, H.M. Blau, Monitoring protein-protein interactions in intact eukaryotic cells by beta-galactosidase complementation, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 8405-8410.
42. T. Wehrman, B. Kleaveland, J.H. Her, R.F. Balint, H.M. Blau, Protein-protein interactions monitored in mammalian cells via complementation of beta-lactamase enzyme fragments, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 3469-3474.
43. I. Remy, S.W. Michnick, Clonal selection and in vivo quantitation of protein interactions with protein-fragment complementation assays, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 5394-5399.
44. I. Remy, S.W. Michnick, Visualization of biochemical networks in living cells, Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 7678-7683.
45. E.A. Althoff, Engineering Ligand-Receptor Interactions Using a Bacterial Three-Hybrid System, Columbia University, New York, 2004.
46. J. Estojak, R. Brent, E.A. Golemis, Correlation of two-hybrid affinity data with in vitro measurements, Mol. Cell. Biol. 1995, 15, 5820-5829.
47. F. Becker, K. Murthi, C. Smith, J. Come, N. Costa-Roldan, C. Kaufmann, A. Hanke, S. Dedier, S. Dill, D. Kinsman, N. Hediger, N. Bockovich, S. Meier-Ewert, A three-hybrid approach to scanning the proteome for targets of small molecule kinase inhibitors, Chem. Biol. 2004, 11, 211-223.
48. A. Hochschild, M. Ptashne, Cooperative binding of lambda repressors to sites separated by integral turns of the DNA helix, Cell 1986, 44, 681-687.
49. K. Joung, E. Ramm, C. Pabo, A bacterial two-hybrid selection system to study protein-DNA and protein-protein interactions, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 7382-7387.
50. I. Xenarios, L. Salwinski, X.J. Duan, P. Higney, S.M. Kim, D. Eisenberg, DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions, Nucleic Acids Res. 2002, 30, 303-305.
51. P. Uetz, L. Giot, G. Cagney, T.A. Mansfield, R.S. Judson, J.R. Knight, D. Lockshon, V. Narayan, M. Srinivasan, P. Pochart, A. Qureshi-Emili, Y. Li, B. Godwin, D. Conover, T. Kalbfleisch, G. Vijayadamodar, M. Yang, M. Johnston, S. Fields, J.M. Rothberg, A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae, Nature 2000, 403, 623-627.
52. P. Colas, B. Cohen, T. Jessen, I. Grishina, J. McCoy, R. Brent, Genetic selection of peptide aptamers that recognize and inhibit cyclin-dependent kinase 2, Nature 1996, 380, 548-550.
53. M. Yang, Z. Wu, S. Fields, Protein-peptide interactions analyzed with the yeast two-hybrid system, Nucleic Acids Res. 1995, 23, 1152-1156.
54. P. Youderian, A. Vershon, S. Bouvier, R.T. Sauer, M.M. Susskind, Changing the DNA-binding specificity of a repressor, Cell 1983, 35, 777-783.
55. S. Tan, D. Guschin, A. Davalos, Y.L. Lee, A.W. Snowden, Y. Jouvenot, H.S. Zhang, K. Howes, A.R. McNamara, A. Lai, C. Ullman, L. Reynolds, M. Moore, M. Isalan, L.P. Berg, B. Campos, H. Qi, S.K. Spratt, C.C. Case, C.O. Pabo, J. Campisi, P.D. Gregory, Zinc-finger protein-targeted gene regulation: genomewide single-gene specificity, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 11997-12002.
56. Sangamo BioSciences Inc., www.sangamo.com, 2005; biotechnology company focused on the research and development of novel transcription factors for regulating human, plant, and microbial genes.
57. B. Zhang, M. Gallegos, A. Puoti, E. Durkin, S. Fields, J. Kimble, M.P. Wickens, A conserved RNA-binding protein that regulates sexual fates in the C. elegans hermaphrodite germ line, Nature 1997, 390, 477-484.
58. G.J. Hannon, RNA interference, Nature 2002, 418, 244-251.
59. D.R. Engelke, J.J. Rossi, RNA Interference, Methods Enzymol. Vol. 392, 2005, 1-454.
60. H. Lin, H. Tao, V.W. Cornish, Directed evolution of a glycosynthase via chemical complementation, J. Am. Chem. Soc. 2004, 126, 15051-15059.
Chemical Biology. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
4.2 Controlling Protein-Protein Interactions Using Chemical Inducers and Disrupters of Dimerization
Tim Clackson
Outlook
Transient interactions between proteins are a common mechanism of information transfer in biological systems. Chemical inducers of dimerization allow these interactions to be brought under specific, real-time chemical control, and have become established tools for cell biology research. This chapter reviews the diverse types of ligands and cognate binding proteins that can be used to control protein-protein associations, and discusses the applications of the technology, both in basic research and in potential therapeutic settings.
4.2.1 Introduction
Many cellular processes are triggered by the induced interaction of signaling proteins [1, 2]. Examples include the clustering of cell surface receptors by extracellular growth factors and the subsequent stepwise recruitment and activation of intracellular signaling proteins. Indeed, many signaling cascades proceed almost entirely through such interactions, from the initial extracellular receptor engagement through signaling to the nucleus, proximity-driven activation of gene transcription, and subsequent effector steps such as regulated protein secretion. A chemical inducer of dimerization, or “dimerizer”, is a cell-permeant organic molecule with two separate motifs, each of which binds with high affinity to a specific protein module. In principle, any cellular process that is activated (or inactivated) by protein-protein interactions can be brought under dimerizer control by fusing the protein(s) of interest to the binding domain(s) recognized by the dimerizer. Addition of the dimerizer then noncovalently links the chimeric signaling proteins, activating the cellular event that it controls (Fig. 4.2-1(a)). This conceptually simple approach, described more than 10 years ago [3], has proved broadly applicable and has been widely adopted not only in the chemical biology community but also across biological research in general. It has also spawned several related technologies, such as systems for “reverse dimerization”. This chapter will review the various protein-ligand systems that have been designed, and describe examples of their use, both in research and drug discovery.

Chemical Biology. From Small Molecules to System Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
Fig. 4.2-1 Schemes showing the principle of chemically induced dimerization of proteins. (a) Homodimerization. In this example, fusion proteins are tethered to the cell membrane through fusion to a peptide sequence that becomes myristoylated inside cells. (b) Heterodimerization. In this example, one fusion protein is membrane tethered; the other is expressed as a soluble cytosolic protein and is recruited to the cell membrane upon addition of dimerizer.
4.2.2 Development of Chemical Dimerization Technology
The concept of chemically induced dimerization was introduced by Schreiber and Crabtree and their colleagues in 1993 [3]. The inspiration for their work came from the mechanism of the natural product immunosuppressive drug FK506, which binds simultaneously to FK506 binding protein 12 (FKBP12 or FKBP), a ubiquitous peptidyl-prolyl cis-trans isomerase, and the signaling phosphatase calcineurin, inhibiting the latter’s phosphatase activity and hence blocking signaling. This suggested a general way to bring any protein-protein interaction under small molecule control. Bifunctional organic molecules could be designed, with two protein-binding moieties. Target proteins for these molecules could be appended to the signaling domains of interest at the genetic level to create chimeric proteins. Addition of the bifunctional organic molecule to cells expressing the chimeric proteins would induce dimerization of the engineered proteins, mimicking the natural activation process (Fig. 4.2-1(a)).
In the initial paper, Spencer et al. used the FK506-FKBP interaction itself to provide building blocks for the dimerizer system. They generated a dimerizer by linking two molecules of FK506 to create FK1012, a molecule that can bind two FKBP domains simultaneously (but not calcineurin). They then created a suitable variant of their target protein, the T-cell receptor zeta chain, by appending three copies of FKBP. Addition of FK1012 to cells expressing the engineered protein led to clustering of the protein and activation of authentic downstream cellular events.

FK1012 is a homodimerizer, with two identical binding motifs. It was quickly recognized that induced heterodimerization should also be feasible, by fusing the two proteins of interest to different protein-binding domains that are targeted by a suitable nonsymmetrical dimerizer (Fig. 4.2-1(b)) [4-6]. Dimerizers used for such approaches have included, for example, dimers of FK506 and cyclosporine (FK-CsA) [4]. However, it is most straightforward to simply use the bifunctional natural products directly. Rapamycin, an immunosuppressive drug related to FK506, functions by binding simultaneously to FKBP and the protein kinase FRAP/mTOR [7] and can be used to heterodimerize proteins fused to these protein modules [5, 6].

The ability to induce a protein-protein interaction inside cells provided a general way to generate inducible alleles of signaling and other proteins, alleles that can be activated in real time, in contrast to classical genetic approaches [8]. This suggested a series of important applications, ranging from mechanistic analysis of protein function to understanding the consequences of activating signaling in whole cells and even transgenic animals. Initial hopes have been more than fulfilled, and several hundred papers have now been published that describe diverse uses of the technology [9].
4.2.3 Dimerization Systems
A major focus, following the initial reports, was on refining the tools used to achieve chemical dimerization - in particular, the dimerizers themselves. Important aims were to improve chemical feasibility, specificity, and pharmacological properties, the latter to permit studies in experimental animals. This section will describe the options that have evolved for different types of induced dimerization. The focus will be on the FKBP-based technologies and applications developed by the author’s group and its collaborators, although other systems will also be mentioned.
4.2.3.1 Homodimerization
A series of FK1012 variants has been described with different linkers and, in some cases, facile syntheses using FK506 as a starting point (Fig. 4.2-2) [10]. All of these can be used to effect dimerization between FKBP fusion proteins.
4 Controlling Protein-Protein Interactions
Fig. 4.2-2 First-generation homodimerization agents FK1012 and AP1510. These molecules are able to induce homodimers between wild-type FKBP fusion proteins. The variant FK1012s differ only in the linker region.
We sought to develop fully synthetic, lower-molecular-weight replacements for FK1012, to allow full exploration of structure-activity relationship (SAR) and optimization of pharmaceutical properties. These efforts led to the design of AP1510 (Fig. 4.2-2), which comprises two synthetic FKBP-binding ligands joined by a short spacer [11]. Although AP1510 binds less tightly to FKBP than FK1012, it is more potent in most applications, perhaps due to a greater conformational rigidity. FK1012s or AP1510 can be used to induce discrete homodimers between molecules of an FKBP fusion protein when that protein contains a single FKBP domain. Higher-order clustering can, in principle, be achieved by including two or more FKBP domains, although the geometry and stoichiometry of the resulting complexes are difficult to control. In addition to FKBP-based systems, homodimerization has also been achieved using the naturally dimeric natural product coumermycin, which can dimerize proteins fused to Escherichia coli DNA gyrase [12].
4.2.3.2 Heterodimerization
Although early heterodimerization studies used molecules such as FK-CsA, the most common approach is the use of rapamycin, which naturally functions as a heterodimerizer [7]. One protein is fused to FKBP, and the other to the ~100 amino acid domain of FRAP/mTOR that binds to the FKBP-rapamycin complex, termed FRB (for FKBP-rapamycin binding domain) [13]. FKBP and FRB have no detectable affinity for one another in the absence of rapamycin, yet the drug binds simultaneously to both proteins with high affinity. Thus, addition of rapamycin to cells expressing FKBP and FRB fusion proteins leads to strictly drug-dependent heterodimerization. Because of its inherent directionality, heterodimerization is often a more precise tool than homodimerization and can be used in many configurations. For example, a protein can be inducibly recruited to the plasma membrane by fusing it to one of the drug-binding domains, and fusing the other to a myristoylation motif (see Fig. 4.2-1(b)) [4]. A major application of heterodimerization is in the control of transcription (see Section 4.2.3.4) [5, 6]. In addition to the rapamycin system, other heterodimerization systems have been described, including dimers of methotrexate and dexamethasone to target dihydrofolate reductase and glucocorticoid receptor fusion proteins, respectively [14, 15], and dimers of estrogen analogs and biotin analogs to target fusions to estrogen receptors and streptavidin [16].
4.2.3.3 Refining Ligand-Protein Pairs: “Bumps and Holes”
Although the ligand-protein interfaces provided by nature are good starting points for building dimerization systems, there is room for improvement. In particular, it is desirable to maximize the selectivity of the ligands for their target fusion proteins compared to endogenous proteins, to ensure that the ligands have no effect on natural cellular physiology. In the case of FKBP-based homodimerization, the ligands might interfere with the natural function of FKBP as a modulator of transmembrane signaling proteins (although this is unlikely given the high intracellular FKBP levels). There is also the possibility that dimerizer potency could be blunted by sequestration of the drug into the extensive cellular FKBP pool. In the case of rapamycin-based heterodimerization, adding rapamycin to cells inhibits endogenous mTOR/FRAP activity, inducing antiproliferative effects.
The solution devised for these problems has become known as “bumps and holes”, and takes advantage of the fact that the sequences of the drug-binding domains are available for genetic modification, since they are being expressed heterologously in the cell (Fig. 4.2-3). In this approach, the ligand is modified to introduce a steric clash (a “bump”) that abolishes binding to the target protein. Then, using structure-guided or screening approaches, one or more compensating mutations are identified in the drug-binding domain that restore the ability to bind the modified ligand (a “hole”). The bumped dimerizer is now able to bind only to the modified drug-binding domain of the chimeric protein and not to endogenous proteins. In addition to affording highly specific protein-ligand pairs, this interface-engineering approach has also provided insights into the structural and thermodynamic plasticity of small molecule-protein interfaces [17, 18]. The approach has echoes in many other areas of chemical biology, in particular the pioneering work of Shokat and coworkers in engineering allele-selective kinase inhibitors and substrates (see Chapter 3.1).
Fig. 4.2-3 Engineering specificity into FKBP dimerizing agents using “bumps and holes”. (a) Homodimerization system. Bumped homodimerizers are able to induce dimers between FKBP fusion proteins engineered with appropriate “holes”, while evading endogenous FKBP. (b) Rapamycin-based heterodimerization system. Bumped “rapalogs” are able to induce heterodimers between FKBP fusion proteins and FRB fusion proteins engineered with a specific “hole”. The compounds can still bind to endogenous FKBP, but have reduced or eliminated antiproliferative activity because this complex cannot bind effectively to endogenous FRAP/mTOR.
4.2.3.3.1 Bumped Homodimerizers
Highly potent and selective homodimerizers have been designed by engineering the interface between AP1510 and FKBP. X-ray crystallographic analysis suggested that alkyl substitution of a specific carbonyl group on the FKBP ligand would destroy binding and that loss-of-size mutations at FKBP residue F36 should restore affinity (Fig. 4.2-3(a)). Subsequent studies resulted in AP1903, a bumped dimerizer with very high affinity (Kd 0.1 nM) and 1000-fold selectivity for the FKBP mutant F36V compared to the wild-type protein (Fig. 4.2-4) [19]. Related dimerizers with different linkers but equivalent potencies have also been described (such as AP20187; see Fig. 4.2-4). These dimerizers, in general, have proved to be much more potent than their unbumped cousins and suitable for in vivo studies in a range of experimental animals. Numerous studies have reported the use of FKBP-F36V fusion proteins and AP20187 to control cellular processes [9], and AP1903 itself has completed a phase I clinical trial in healthy human volunteers, where it was found to be safe and well tolerated [20].
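The practical meaning of a 1000-fold selectivity window can be illustrated with a rough occupancy estimate. The sketch below is not taken from the cited work: it assumes ideal 1:1 equilibrium binding with free ligand in excess, so that the fraction of sites occupied is [L]/([L] + Kd). It uses the 0.1 nM value quoted above for F36V and assumes, purely for illustration, that the 1000-fold selectivity corresponds to a wild-type Kd of 100 nM. The function name and ligand concentrations are hypothetical, and sequestration by the cellular FKBP pool is ignored.

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Fractional site occupancy for ideal 1:1 binding with free ligand in excess."""
    if ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_nM / (ligand_nM + kd_nM)


KD_F36V_NM = 0.1                 # value quoted in the text for FKBP-F36V
KD_WT_NM = KD_F36V_NM * 1000.0   # assumption: 1000-fold weaker binding to wild-type FKBP

if __name__ == "__main__":
    # Illustrative free-ligand concentrations (hypothetical, in nM).
    for ligand in (0.1, 1.0, 10.0):
        mutant = fraction_bound(ligand, KD_F36V_NM)
        wild_type = fraction_bound(ligand, KD_WT_NM)
        print(f"[L] = {ligand:5.1f} nM  F36V: {mutant:.3f}  wild type: {wild_type:.3f}")
```

Under these assumptions, 1 nM free ligand occupies about 91% of F36V sites but only about 1% of wild-type sites, which is the kind of window the bump-hole design is meant to open.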
Fig. 4.2-4 Bumped homodimerizers. These compounds are designed to bind potently and specifically to the F36V mutant of FKBP.
4.2.3.3.2 Bumped Heterodimerizers: “Rapalogs”
“Bumped” rapamycin systems have been developed by chemically modifying the FRB-binding portion of rapamycin, to generate “rapalogs” with reduced or eliminated FRB binding and, hence, biological activity. Compensating mutations in FRB have then been identified using structure-guided mutagenesis and screening/selection, which can then be introduced into target protein FRB fusions (Fig. 4.2-3(b)). Several rapamycin bump-hole solutions have been described (Fig. 4.2-5). In one, bulky substitutions at the C16 methoxy group of rapamycin were used to abrogate binding to wild-type FRB. In a structure-guided screen, mutation of FRB residue Thr2098 (which abuts C16) to Leu was found to allow binding of a wide range of C16-substituted rapalogs (Ref. 21 and our unpublished work) (Fig. 4.2-5). In fact, the T2098L substitution is a versatile “tag” that functionally accommodates numerous rapamycin analogs, modified at C16 and/or other positions, as well as rapamycin itself. As a result it is routinely incorporated into all our FRB fusion protein constructs and has been used with C16-bumped rapalogs in numerous in vitro and in vivo studies. Another system uses C20-methallyl rapamycin (Ma-rap; Fig. 4.2-5), which is unable to bind wild-type FRB and is therefore devoid of FRAP/mTOR inhibitory activity [22]. Ma-rap was found in a screen to bind very specifically to a triple mutant of FRB known as PLF [22]. Using the PLF mutant of FRB, Ma-rap can be used to achieve highly selective heterodimerization of proteins
Fig. 4.2-5 Bumped rapalogs used as heterodimerizers. The rapalogs listed in the panel are all active in dimerization systems incorporating the T2098L mutation in FRB fusion proteins. Ma-rap (C20-methallyl rapamycin), in which the triene portion of rapamycin is modified as shown, is active in dimerization systems incorporating the specific FRB triple mutation PLF (K2095P/T2098L/W2101F) [22].
(Panel: rapamycin and the rapalogs AP22594, AP1861, AP21967, and AP23102, tabulated by their R16 and R32 substituents, together with Ma-rap.)
Fig. 4.2-6 Schemes for controlling transcription using chemically induced dimerization. (a) Control using homodimerizers. (b) Control using heterodimerizers (rapalogs).
of FKBP binds to itself in a manner that can be reversed using an FKBP ligand [27]. The phenomenon was initially noted in a two-hybrid assay and subsequently confirmed by biophysical studies on the purified protein. Although the monomer-monomer affinity is relatively weak (Kd 30 µM), the interaction is specific, and concatenated F36M domains form discrete aggregates by virtue of multivalent binding. Interactions can be completely disrupted by addition of a monomeric “bumped” ligand analogous to one half of AP1903 (see Fig. 4.2-4), suggesting that the F36M mutation, similar to F36V, introduces a “hole” in the protein surface. This result also implies that the proteins interact through their ligand-binding sites - a finding confirmed crystallographically (see next section). This system can be used to reversibly aggregate any protein to which multiple F36M domains are attached. For example, intracellular expression of a fusion between four F36M domains and green fluorescent protein (GFP) results in large fluorescent intracellular aggregates that disperse within minutes upon adding monovalent ligand [27]. Removal of ligand leads to rapid re-formation of aggregates.
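To give a feel for what a 30 µM monomer-monomer affinity implies, the following sketch estimates how much of a single-domain F36M protein would be dimeric at a given total concentration. It is an illustration only and makes several assumptions not stated in the source: a simple two-state equilibrium 2M ⇌ D, the convention Kd = [M]^2/[D], and no ligand present. Concatenated domains and multivalent aggregation are not modeled, and the function name is hypothetical.

```python
import math


def dimer_fraction(total_uM: float, kd_uM: float = 30.0) -> float:
    """Fraction of protomers found in dimers for 2M <-> D with Kd = [M]^2 / [D]."""
    if total_uM < 0 or kd_uM <= 0:
        raise ValueError("total must be >= 0 and Kd must be > 0")
    if total_uM == 0:
        return 0.0
    # Mass balance: total = [M] + 2[D] = [M] + 2[M]^2 / Kd.
    # Solving the quadratic for the free monomer concentration [M]:
    monomer = kd_uM * (math.sqrt(1.0 + 8.0 * total_uM / kd_uM) - 1.0) / 4.0
    return 1.0 - monomer / total_uM


if __name__ == "__main__":
    # Illustrative total protein concentrations (hypothetical, in µM).
    for total in (1.0, 30.0, 100.0):
        print(f"total = {total:6.1f} µM  dimer fraction = {dimer_fraction(total):.2f}")
```

With these assumptions, half of the protomers are dimeric when the total concentration equals Kd, and only about 6% at 1 µM. A single weak contact of this kind is therefore easily reversed, which fits the observation that stable aggregates require several concatenated F36M domains.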
Fig. 4.2-7 Comparison of conventional and “reverse” FKBP dimerization systems. (a) Induced dimerization using bumped homodimerizer AP20187 and F36V fusion proteins. (b) Reverse dimerization system using monomeric ligand (AP21998) and F36M fusion proteins.
4.2.3.6 Structural Basis of Induced Dimerization
One attraction of using inducible dimerization is that the interacting molecules are understood in great detail. The high-resolution X-ray structures of all three FKBP-based complexes in the dimerized state are available - the AP1903 homodimerization system (our unpublished work), rapamycin heterodimerization system [7], and the F36M reverse dimerization system [27] (Fig. 4.2-8). These structures have been invaluable for engineering and optimizing the drug-protein interfaces. In addition, they provide important guidance on the orientations in which the binding proteins can be fused to heterologous proteins of interest, in order to induce dimerization of the appropriate geometry.
4.2.4 Applications
With protein-protein interactions being pervasive throughout biology, chemically controlled dimerization has proved to be a remarkably versatile technology, and more than 300 papers have described use of the approach [9]. These applications can be broadly separated into two classes. The first is the use of dimerization technologies in basic and applied biological research, to understand the functions of proteins or pathways, and to create inducible animal models of disease. The second is the direct use of the technologies in potential therapeutic applications, generally in the context of cell or gene therapies. Examples of both will be reviewed in the following sections.
Fig. 4.2-8 X-ray crystal structures of dimerized complexes. In each case, protein N-termini are marked in blue and C-termini in red. (a) Structure of AP1903 in complex with two molecules of FKBP-F36V (our unpublished data). The two proteins are brought close to each other in a “parallel” configuration, and intramolecular drug-drug interactions are extensive. (b) Structure of rapamycin in complex with wild-type FKBP (green) and the FRB domain of FRAP/mTOR (gray) (Protein Data Bank (PDB) ID: 4FAP) [7]. (c) Structure of the homodimeric complex of the self-associating FKBP mutant F36M (PDB ID: 1EYM) [27]. The two molecules interact through their ligand-binding sites in an “antiparallel” configuration.
4.2.4.1 Analysis of Protein Function
A very common and powerful application is creating an inducible allele of a protein in order to dissect its function. Typically, the protein of interest is fused to a dimerization domain, cells expressing the fusion protein are exposed to dimerizer, and the consequences are assessed by any appropriate technique, such as assaying downstream signaling or profiling mRNA expression. The key advantages of chemically induced dimerization are that activation can be restricted to one particular protein and can be initiated and then monitored in “real time” by addition of drug. This allows very specific questions to be asked about the function of a protein or of the pathway that it controls. Over 100 proteins have been successfully brought under dimerizer control in this way [9]. In many cases, these are signaling proteins such as cell surface receptors, intracellular protein kinases, and signaling proteases such as caspases. Often, the experimental goal is simply to test whether dimerization is sufficient to activate the protein. For example, such studies support an induced proximity model for activation of Raf-1 [12], caspase 8 [28], and leukemia-associated fusion proteins [29]. However, more complex questions can be asked, particularly through combined use of homo- and heterodimerization. Dimerizable alleles of the epidermal growth factor (EGF) receptor family have been used to show that EGFR1 homodimers, EGFR2 (HER2) homodimers, and EGFR1-EGFR2 heterodimers all have different effects on breast tumor cell proliferation and invasion in three-dimensional culture models [30]. By using dimerizable alleles, the roles of each complex could be probed independently and without the complicating effects of the natural receptor ligands.
More broadly, dimerization can be used to gain control over a specific molecular process or even cellular event that can be induced by proximity: examples include cell adhesion and rolling [31], DNA looping [32], recombinase enzymatic activation [33], RNA splicing [34], protein splicing [35], and glycosylation [36]. These inducible alleles allow the process in question to be dissected, but often also provide tools that have applications in their own right: for example, the use of inducible recombinase activity to achieve temporal control of gene deletion [33].
4.2.4.2 Animal Models of Disease
Because the inducing compounds are suitable for use in vivo, and are generally orthogonal to mammalian biology, studies can also be performed in a whole-animal context. A common approach is to generate transgenic mice in which expression of the fusion protein is restricted to a tissue of interest. These mice allow study of protein or pathway function in vivo, but can also provide an inducible model of any disease that is associated with activation (or inhibition). For example, transgenic mice expressing inducible versions of either fibroblast growth factor receptor 1 (FGFR1) or FGFR2 specifically in the prostate have been used to show that only the former receptor can induce the neoplasia and hyperplasia typical of early prostate cancer [37] (Fig. 4.2-9). These mice could also be used to test potential drugs for the ability to block the induced FGFR1 signal and its consequences.
A general approach to creating animal models of degenerative diseases is to induce apoptosis specifically in target tissues or organs. This can be achieved through tissue-specific expression of inducible alleles of the Fas receptor or through any number of downstream caspases. Mice in which hepatocytes can be inducibly ablated represent a valuable model for liver diseases [38], and mice expressing inducible caspase in macrophages are a valuable resource for probing the roles of these cells [39].
Fig. 4.2-9 Use of dimerization technology to probe the roles of FGF receptor subtypes in prostate cancer development. Transgenic mice were prepared in which dimerizer-inducible alleles of FGFR1 or FGFR2 were expressed exclusively in prostate tissue. Administration of dimerizer (AP20187) induced prostate neoplasia and hyperplasia only in the FGFR1 mice, implicating this receptor subtype in early prostate cancer development.
4.2.4.3 Regulated Cell Therapies
A powerful use of dimerizer technology is in controlling the proliferation, differentiation, and/or survival of genetically engineered cells [40]. Cell therapies have broad potential to treat diseases but suffer from limitations, including the inability to manipulate the cells once introduced into the body. Blau and coworkers have used dimerizer-activated alleles of cytokine receptors to acquire control over cell proliferation. Cells modified with a gene of interest are also engineered with this “cell growth switch”; administration of dimerizer then stimulates proliferation only of modified cells, in vitro or in vivo (Fig. 4.2-10). This approach has been successfully demonstrated in small [41] and large animal studies [42] and offers a way to expand very rare modified cell populations into a therapeutic range. Other signaling proteins can be used to achieve different outcomes - for example, dimerizing CD40 induces a potent immunomodulatory response in cells and could be used as part of a cellular cancer vaccine [43].
The opposite approach to inducing proliferation is to induce cell death, using conditional alleles of Fas or caspases. A Fas “death switch” has been used to eliminate engineered T cells infused into animals [44], as a model for depleting the T cells that cause graft-versus-host disease following bone marrow transplantation [45]. More potent caspase-based switches can also be used [46] and, in principle, could be installed into any therapeutic cell to provide a “fail-safe” mechanism for cell destruction should adverse events ensue.
Fig. 4.2-10 Application of a dimerization-based “cell growth switch” to achieve expansion of genetically modified cells. Hemopoietic cells are transduced with a retrovirus encoding a therapeutic gene along with a fusion between FKBP-F36V and the signaling domain of mpl, a cytokine receptor. Although transduced cells are rare, following infusion in vivo they can be selectively expanded by administering dimerizer (AP20187), which induces their proliferation and differentiation. Expansion can also be carried out in cell culture.
4.2.4.4 Regulated Transcription and Regulated Gene Therapies
Use of dimerizers to control transcription of engineered target genes represents an alternative to technologies such as the tetracycline-inducible (“Tet”) system that rely on allosteric activation [47]. A key advantage of dimerizer approaches is the very low background transcription in the absence of dimerizer, most likely because the activation domain (AD) is physically separated from DNA prior to activation (see Fig. 4.2-6) [25]. This feature has been exploited to achieve inducible expression of proteins that are highly toxic, such as diphtheria toxin [21], or highly potent, such as activators of viral replication [48]. The modular nature of the dimerizer system also facilitates control of endogenous (as opposed to introduced) genes, achieved by fusing FKBP modules to a DNA-binding domain (DBD) engineered to recognize the appropriate natural promoter [49].
There is considerable interest in the use of dimerizer-controlled gene expression in regulated gene therapies. Extensive work has gone into optimizing the rapamycin-inducible system for potential clinical use, including identifying rapalogs with optimal pharmacology, and developing “humanized” DNA-binding and activation domains, so that all protein components of the system are of human origin to minimize immunogenicity in a clinical setting (reviewed in Refs 25, 47). The rapamycin system has been successfully incorporated into most gene therapy vector contexts, including adenovirus and adeno-associated virus (AAV) [50], onco-retrovirus, lentivirus, herpes simplex virus, and naked DNA (reviewed in Ref. 25). Tightly controlled erythropoietin (Epo) production in response to rapamycin has been demonstrated in nonhuman primates for over 6 years following a single intramuscular administration of AAV vectors (Fig. 4.2-11) [51]. Rapamycin- or rapalog-controlled gene expression has also been demonstrated in animal models after gene delivery to the liver [52], eye [53], and brain [54]. These studies support the concept of bringing therapeutic protein production under dimerizer control in the clinical setting.
Fig. 4.2-11 Use of dimerizer-controlled transcription to achieve long-term regulated expression of a therapeutic gene in a nonhuman primate. At time zero, the animal received a single intramuscular injection of adeno-associated viral vectors encoding primate erythropoietin (Epo) under the control of the rapamycin-regulated dimerization system. Subsequent administrations of rapamycin at the indicated doses (mg/kg, intravenously; triangles) induced discrete and reversible increases in serum Epo levels (black symbols, left axis) and commensurate elevations in hematocrit (open symbols, right axis). Inducibility has persisted for over 6 years to date and the study is ongoing. This figure was originally published in Blood [51]. © The American Society of Hematology.
4.2.4.4.1 Three-hybrid Approaches
Another use of dimerizer-controlled transcription is in three-hybrid assays [14, 15]. In these applications, the “third hybrid” is the dimerizer, and gene activation serves merely as an assay to report on the interaction between a dimerizer and the two fusion proteins, rather than as the end in itself. Three-hybrid assays can be used to identify target proteins for a given small molecule (by incorporating the molecule into a dimerizer and screening against a cDNA library fused to an AD; see Chapter 18.2), or to identify small molecules that bind a given target (by cloning the target as an AD fusion protein and screening against a library of dimerizers in which one monomer is diversified). More recently, they have been applied to directed evolution of the catalytic properties of proteins using “chemical complementation” (see Chapter 4.1).
4.2.4.5 Regulated Secretion Using “Reverse Dimerization” System
The reverse dimerization system (Section 4.2.3.5) has been used to develop an approach for the regulated pulsatile secretion of proteins [55]. The aim of this work was to mimic the natural, rapid release of proteins such as insulin using a regulated gene therapy strategy. Since control at the transcriptional level takes place on the timescale of days, it is necessary to directly regulate the secretion process. To achieve this, the protein of interest is expressed as a secreted protein fused to tandem copies of the FKBP-F36M domain, resulting in the formation of aggregates in the endoplasmic reticulum (ER) that are too large to exit to the Golgi (Fig. 4.2-12). Addition of a monomeric ligand breaks up the aggregates, allowing the proteins to proceed to the Golgi, where they are processed by the endogenous protease furin, releasing the authentic protein for secretion. Using this system, rapid pulses of insulin secretion could be iteratively induced by adding ligand to cells in vitro (Fig. 4.2-12(c)). Furthermore, in a mouse model of insulin-dependent diabetes, induced release of insulin could transiently reverse hyperglycemia [55]. More recently, we have incorporated the system into an AAV vector and demonstrated long-term inducible secretion following gene transfer into mice (our unpublished studies). These findings suggest that regulated secretion could be useful for regulating the expression of proteins that require delivery in rapid pulses. The ability to reversibly induce large protein aggregates has also provided a useful tool in basic research on the mechanisms of intracellular transport - for example, allowing demonstration, for the first time, of the existence of “megavesicles” that traffic between the ER and Golgi [56].
Fig. 4.2-12 Use of the reverse dimerization system to control protein secretion in mammalian cells. (a) Scheme for inducible secretion. (b) Chemical structure of monomeric ligand AP21998. (c) Pulsatile release of insulin from engineered cells. Cells expressing an insulin-F36M fusion protein were exposed to AP21998 for three 1-h periods as indicated, and medium was collected every hour and assayed for insulin levels [55].
4.2.5 Future Development
Inducible dimerization technologies are now firmly established as research tools. The components of the various systems are largely developed, although refinements will likely continue in some areas - for example, the optimization of protein-ligand pairings, particularly rapamycin analogs. A worthwhile goal now within reach is the simultaneous regulation of multiple pathways or proteins using dimerizers and binding proteins that are completely orthogonal to one another [24].
Some of the most powerful research applications of the technology are only now starting to be explored - a consequence of the time necessary to establish transgenic mouse lines expressing appropriate fusion proteins. The next few years will likely see many more reports using such mice to dissect the roles of individual proteins and pathways in normal physiology and in disease. Similarly, although the feasibility and promise of therapeutic uses of dimerizer technology have been well established in animal models, translation into the clinic has been slow owing to the general issues and complexities associated with gene and cell therapies. As these issues are resolved, dimerizer technology may have a key role to play in conferring control and safety on such therapies.
Looking further ahead, interesting extensions of the dimerizer concept are emerging. These include attempts to enhance the potency of drugs by linking them to another small molecule, such as an FKBP ligand, that can recruit an endogenous protein and improve overall binding affinity [57]. The ultimate extrapolation of chemical dimerization would be dimerizers that bind directly to native target proteins, as opposed to engineered fusion proteins. Attempts to build fully synthetic transcriptional activators that directly bind both DNA and transcriptional regulators are a step in this direction [58], and compounds that directly dimerize and activate cytokine receptors may, in time, become a therapeutic alternative to recombinant proteins such as Epo [59].
4.2.6 Conclusion
Chemically controlled dimerization represents a clear and successful example of how chemical biology approaches can “cross over” into mainstream biology and become established as powerful and generally accepted research tools. The technology has contributed significant new insights into numerous biological processes and, in turn, has inspired new directions in chemical biology research. Both of these benefits are likely to continue as the technology becomes more broadly utilized.
Acknowledgments
I thank Len Rozamus, Xiaotian Zhu, Vic Rivera, and Renate Hellmiss for preparing the figures. I am indebted to my many ARIAD colleagues and collaborators, past and present, who have contributed to our work on dimerization technology. Particular thanks are due to Vic Rivera for numerous discussions over many years. Kits for the regulated dimerization of proteins may be requested through ARIAD’s website at www.ariad.com/regulationkits.
References
1. G.R. Crabtree, S.L. Schreiber, Three-part inventions: intracellular signaling and induced proximity, Trends Biochem. Sci. 1996, 21, 418-422.
2. J.D. Klemm, S.L. Schreiber, G.R. Crabtree, Dimerization as a regulatory mechanism in signal transduction, Annu. Rev. Immunol. 1998, 16, 569-592.
3. D.M. Spencer, T.J. Wandless, S.L. Schreiber, G.R. Crabtree, Controlling signal transduction with synthetic ligands, Science 1993, 262, 1019-1024.
4. P.J. Belshaw, S.N. Ho, G.R. Crabtree, S.L. Schreiber, Controlling protein association and subcellular localization with a synthetic ligand that induces heterodimerization of proteins, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 4604-4607.
5. S.N. Ho, S.R. Biggar, D.M. Spencer, S.L. Schreiber, G.R. Crabtree, Dimeric ligands define a role for transcriptional activation domains in reinitiation, Nature 1996, 382, 822-826.
6. V.M. Rivera, T. Clackson, S. Natesan, R. Pollock, J.F. Amara, T. Keenan, S.R. Magari, T. Phillips, N.L. Courage, F. Cerasoli Jr, D.A. Holt, M. Gilman, A humanized system for pharmacologic control of gene expression, Nat. Med. 1996, 2, 1028-1032.
7. J. Choi, J. Chen, S.L. Schreiber, J. Clardy, Structure of the FKBP12-rapamycin complex interacting with the binding domain of human FRAP, Science 1996, 273, 239-242.
8. L.A. Banaszynski, T.J. Wandless, Conditional control of protein function, Chem. Biol. 2006, 13, 11-21.
9. A complete list of publications describing use of chemical dimerization technologies can be found at http://www.ariad.com/regulationkits.
10. S.T. Diver, S.L. Schreiber, Single-step syntheses of cell permeable protein dimerizers that activate signal transduction and gene expression, J. Am. Chem. Soc. 1997, 119, 5106-5109.
11. J.F. Amara, T. Clackson, V.M. Rivera, T. Guo, T. Keenan, S. Natesan, R. Pollock, W. Yang, N.L. Courage, D.A. Holt, M. Gilman, A versatile synthetic dimerizer for the regulation of protein-protein interactions, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 10618-10623.
12. M.A. Farrar, J. Alberola-Ila, R.M. Perlmutter, Activation of the Raf-1 kinase cascade by coumermycin-induced dimerization, Nature 1996, 383, 178-181.
13. J. Chen, X.F. Zheng, E.J. Brown, S.L. Schreiber, Identification of an 11-kDa FKBP12-rapamycin-binding domain within the 289-kDa FKBP12-rapamycin-associated protein and characterization of a critical serine residue, Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 4947-4951.
14. E.J. Licitra, J.O. Liu, A three-hybrid system for detecting small ligand-protein receptor interactions, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 12817-12821.
15. H. Lin, W.M. Abida, R.T. Sauer, V.W. Cornish, Dexamethasone-methotrexate: an efficient chemical inducer of protein dimerization in vivo, J. Am. Chem. Soc. 2000, 122, 4247-4248.
16. S.S. Muddana, B.R. Peterson, Facile synthesis of CIDs: biotinylated estrone oximes efficiently heterodimerize estrogen receptor and streptavidin proteins in yeast three hybrid systems, Org. Lett. 2004, 6, 1409-1412.
17. W. Yang, L.W. Rozamus, S. Narula, C.T. Rollins, R. Yuan, L.J. Andrade, M.K. Ram, T.B. Phillips, M.R. van Schravendijk, D. Dalgarno, T. Clackson, D.A. Holt, Investigating protein-ligand interactions with a mutant FKBP possessing a designed specificity pocket, J. Med. Chem. 2000, 43, 1135-1142.
18. T. Clackson, Redesigning small molecule-protein interfaces, Curr. Opin. Struct. Biol. 1998, 8, 451-458.
19. T. Clackson, W. Yang, L.W. Rozamus, M. Hatada, J.F. Amara, C.T. Rollins, L.F. Stevenson, S.R. Magari, S.A. Wood, N.L. Courage, X. Lu, F. Cerasoli Jr, M. Gilman, D.A. Holt, Redesigning an FKBP-ligand interface to generate chemical dimerizers with novel specificity, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 10437-10442.
20. J.D. Iuliucci, S.D. Oliver, S. Morley, C. Ward, I. Ward, D. Dalgarno, T. Clackson, H.J. Berger, Intravenous safety and pharmacokinetics of a novel dimerizer drug, AP1903, in healthy volunteers, J. Clin. Pharmacol. 2001, 41, 870-879.
21. R. Pollock, R. Issner, K. Zoller, S. Natesan, V.M. Rivera, T. Clackson, Delivery of a stringent dimerizer-regulated gene expression system in a single retroviral vector, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 13221-13226.
22. S.D. Liberles, S.T. Diver, D.J. Austin, S.L. Schreiber, Inducible gene expression and protein translocation using nontoxic ligands identified by a mammalian three-hybrid screen, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 7825-7830.
23. K. Stankunas, J.H. Bayle, J.E. Gestwicki, Y.M. Lin, T.J. Wandless, G.R. Crabtree, Conditional protein alleles using knockin mice and a chemical inducer of dimerization, Mol. Cell 2003, 12, 1615-1624.
24. J.H. Bayle, J.S. Grimley, K. Stankunas, J.E. Gestwicki, T.J. Wandless, G.R. Crabtree, Rapamycin analogs with differential binding specificity permit orthogonal control of protein activity, Chem. Biol. 2006, 13, 99-107.
25. R. Pollock, T. Clackson, Dimerizer-regulated gene expression, Curr. Opin. Biotechnol. 2002, 13, 459-467.
26. W. Yang, T.P. Keenan, L.W. Rozamus, X. Wang, V.M. Rivera, C.T. Rollins, T. Clackson, D.A. Holt, Regulation of gene expression by synthetic dimerizers with novel specificity, Bioorg. Med. Chem. Lett. 2003, 13, 3181-3184.
27. C.T. Rollins, V.M. Rivera, D.N. Woolfson, T. Keenan, M. Hatada, S.E. Adams, L.J. Andrade, D. Yaeger, M.R. van Schravendijk, D.A. Holt, M. Gilman, T. Clackson, A ligand-reversible dimerization system for controlling protein-protein interactions, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 7096-7101.
28. M. Muzio, B.R. Stockwell, H.R. Stennicke, G.S. Salvesen, V.M. Dixit, An induced proximity model for caspase-8 activation, J. Biol. Chem. 1998, 273, 2926-2930.
29. K.M. Smith, R.A. Van Etten, Activation of c-Abl kinase activity and transformation by a chemical inducer of dimerization, J. Biol. Chem. 2001, 276, 24372-24379.
30. L. Zhan, B. Xiang, S.K. Muthuswamy, Controlled activation of ErbB1/ErbB2 heterodimers promote invasion of three-dimensional organized epithelia in an ErbB1-dependent manner: implications for progression of ErbB2-overexpressing tumors, Cancer Res. 2006, 66, 5201-5208.
31. X. Li, D.A. Steeber, M.L.K. Tang, M.A. Farrar, R.M. Perlmutter, T.F. Tedder, Regulation of L-selectin-mediated rolling through receptor dimerization, J. Exp. Med. 1998, 188, 1385-1390.
32. S.L. Ameres, L. Drueppel, K. Pfleiderer, A. Schmidt, W. Hillen, C. Berens, Inducible DNA-loop formation blocks transcriptional activation by an SV40 enhancer, EMBO J. 2005, 24, 358-367.
33. N. Jullien, F. Sampieri, A. Enjalbert, J.P. Herman, Regulation of Cre recombinase by ligand-induced complementation of inactive fragments, Nucleic Acids Res. 2003, 31, e131.
34. B.R. Graveley, Small molecule control of pre-mRNA splicing, RNA 2005, 11, 355-358.
35. H.D. Mootz, T.W. Muir, Protein splicing triggered by a small molecule, J. Am. Chem. Soc. 2002, 124(31), 9044-9045.
36. J.J. Kohler, C.R. Bertozzi, Regulating cell surface glycosylation by small molecule control of enzyme localization, Chem. Biol. 2003, 10, 1303-1311.
37. K.W. Freeman, B.E. Welm, R.D. Gangula, J.M. Rosen, M. Ittmann, N.M. Greenberg, D.M. Spencer, Inducible prostate intraepithelial neoplasia with reversible hyperplasia in conditional FGFR1-expressing mice, Cancer Res. 2003, 63, 8256-8263.
38. V.O. Mallet, C. Mitchell, J.E. Guidotti, P. Jaffray, M. Fabre, D. Spencer, D. Arnoult, A. Kahn, H. Gilgenkrantz,
39.
40.
41.
42.
43.
44.
45.
46.
Conditional cell ablation by tight control of caspase-3 dimerization in transgenic mice, Nat. Biotechnol. 2002, 20,1234-1239. S.H. Burnett, E.J. Kershen, J. Zhang, L. Zeng, S.C. Straley, A.M. Kaplan, D.A. Cohen, Conditional macrophage ablation in transgenic mice expressing a Fas-based suicide gene, J. Leukocyte Biol. 2004, 75, 612-623. T. Neff, C.A. Blau, Pharmacologically regulated cell therapy, Blood 2001, 97, 2535-2540. L. Jin, H. Zeng, S. Chien, K.G. Otto, R.E. Richard, D.W. Emery, A.C. Blau, In vivo selection using a cell-growth switch, Nat. Genet. 2000, 26, 64-66. R.E. Richard, R.A. De Claro, J. Yan, S. Chien, H. Von Recum, J. Morris, H.P. Kiem, D.C. Dalgarno, S. Heimfeld, T. Clackson, R. Andrews, C.A. Blau, Differences in F36VMpl-based in vivo selection among large animal models, Mol. Ther. 2004, 10, 730-740. B.A. Hanks, J. Jiang, R.A. Singh, W. Song, M. Barry, M.H. Huls, K.M. Slawin, D.M. Spencer, Re-engineered CD40 receptor enables potent pharmacological activation of dendritic-cell cancer vaccines in vivo, Nat. Med. 2005, 11, 130-137. C. Berger, C.A. Blau, M.L. Huang, J.D. Iuliucci, D.C. Dalgarno, J. Gaschet, S. Heimfeld, T. Clackson, S.R. Riddell, Pharmacologically regulated Fas-mediated death of adoptively transferred T cells in a nonhuman primate model, Blood 2004, 103(4), 1261-1269. D.C. Thomis, S. Marktel, C. Bonini, C. Traversari, M. Gilman, C. Bordignon, T. Clackson, A Fas-based suicide switch in human T cells for the treatment of graft-versus-host disease, Blood 2001, 97,1249-1257. K.C. Straathof, M.A. Pule, P. Yotnda, G. Dotti, E.F. Vanin, M.K. Brenner, H.E. Heslop, D.M. Spencer, C.M. Rooney, An inducible caspase 9 safety switch for T-cell therapy, Blood 2005, 105,4247-4254.
References I 2 4 9 47.
48.
49.
50.
51.
52.
53.
T. Clackson, Regulated gene 54. L.M. Sanftner, V.M. Rivera, B.M. expression systems, Gene Ther. 2000, Suzuki, L. Feng, L. Berk, S. Zhou, J.R. 7, 120-125. Forsayeth, T. Clackson, J. Cunningham, Dimerizer regulation H. Chong, A. Ruchatz, T. Clackson, V.M. Rivera, R.G. Vile, A system for of AADC expression and behavioral small-molecule control of response in AAV-transduced 6-OHDA conditionally replication-competent lesioned rats, Mol. Ther. 2006, 13, adenoviral vectors, Mol. Ther. 2002, 5, 167- 174. 195-203. 55. V.M. Rivera, X. Wang, S. Wardwell, R. Pollock, M. Giel, K. Linher, N.L. Courage, A. Volchuk, T. Keenan, T. Clackson, Regulation of D.A. Holt, M. Gilman, L. Orci, endogenous gene expression with a F. Cerasoli Jr, J.E. Rothman, small-molecule dimerizer, Nat. T. Clackson, Regulation of protein secretion through controlled Biotechnol. 2002, 20, 729-733. aggregation in the endoplasmic X. Ye, V.M. Rivera, P. Zoltick, reticulum, Science 2000, 287,826-830. F. Cerasoli Jr, M.A. Schnell, G. Gao, J.V. Hughes, M. Gilman, J.M. Wilson, 56. A. Volchuk, M. Amherdt, Regulated delivery of therapeutic M. Ravazzola, B. Brugger, V.M. proteins after in vivo somatic cell gene Rivera, T. Clackson, A. Perrelet, T.H. Sollner, J.E. Rothman, L. Orci, transfer, Science 1999, 283, 88-91. Megavesicles implicated in the rapid V.M. Rivera, G.P. Gao, R.L. Grant, transport of intracisternal aggregates M.A. Schnell, P.W. Zoltick, L.W. across the Golgi stack, Cell 2000, 102, Rozamus, T. Clackson, J.M. Wilson, Long-term pharmacologically 335- 348. regulated expression of erythropoietin 57. J.E. Gestwicki, G.R. Crabtree, I.A. Graef, Harnessing chaperones to in primates following AAV-mediated generate small-molecule inhibitors of gene transfer, Blood 2005, 105, amyloid beta aggregation, Science 1424-1430. 2004,306,865-869. A. Auricchio, G.P. Gao, Q.C. Yu, 58. C.Y. Majmudar, A.K. Mapp, Chemical S. Raper, V.M. Rivera, T. Clackson, approaches to transcriptional J.M. 
Wilson, Constitutive and regulation, Curr. Opin. Chem. Biol. regulated expression of processed 2005, 9,467-474. insulin following in vivo hepatic gene transfer, Gene Ther. 2002, 9, 963-971. 59. S.A. Qureshi, R.M. Kim, Z. Konteatis, D.E. Biazzo, H. Motamedi, A. Auricchio, V. Rivera, T. Clackson, R. Rodrigues, J.A. Boice, J.R. Calaycay, E. O’Connor, A. Maguire, M.A. Bednarek, P. Griffin, Y.D. Gao, M. Tolentino, J. Bennett, J. Wilson, K. Chapman, D.F. Mark, Mimicry of Pharmacological regulation of protein erythropoietin by a nonpeptide expression from adeno-associated viral molecule, Proc. Natl. Acad. Sci. U.S.A. vectors in the eye, Mol. Ther. 2002, 6, 1999, 96,12156-12161. 238-242.
4.3 Protein Secondary Structure Mimetics as Modulators of Protein-Protein and Protein-Ligand Interactions
Hang Yin and Andrew D. Hamilton
Outlook
The development of low-molecular-weight agents that modulate protein-protein interactions has been regarded as a difficult goal because of the relatively large and featureless protein interfacial surfaces involved [1-3]. Conventional methods for identifying inhibitors of protein-protein interactions generally involve the preparation and screening of large chemical libraries to discover lead compounds [4]. Despite significant advances in high-throughput methods, screening a large number of compounds cannot guarantee the delivery of potential drug candidates with the necessary potency and selectivity. Structure-based design is an area of great current interest and represents a much-considered alternative to conventional methods. In this chapter, we review some representative studies that use synthetic agents mimicking protein secondary structures in drug discovery, in particular to target protein-protein and protein-ligand interactions. These studies have expanded the horizon of drug design, strengthened our understanding of protein-protein and protein-ligand interactions, and offered an economical alternative to conventional screening methods.
4.3.1 Introduction
Modulating protein-protein interactions using synthetic compounds is a highly active field in medicinal chemistry. Conventional targets for small molecule agents are usually enzyme active sites within the interior of proteins because: (a) the enzyme recognition sites are usually well-defined clefts or cavities within the protein, with multiple points of contact often leading to high affinity, (b) hydrogen bonding, salt bridges, and electrostatic interactions play critical roles in the recognition of small molecules within the cavities, so inhibitors containing complementary hydrogen-bond donors or acceptors often work well, (c) native enzyme substrates can provide good models for inhibitor design, and (d) the assay methods to test these enzyme inhibitors are well established and readily available. In contrast, the development of synthetic agents that modulate protein-protein interactions is much more demanding even though it is of great therapeutic value. In particular, approaches for the disruption of protein-protein interactions are made more difficult because: (a) large and mobile protein surfaces are involved in protein-protein interactions, (b) natural protein-binding partners are usually not good models for small molecule antagonist design as the binding regions are often discontiguous and relatively featureless, (c) few “druglike” small molecules have been identified from library screening as effective disrupters of large surface area contacts, and (d) biological assays that evaluate the functional consequences of disrupting protein-protein interactions are less readily available. In spite of these daunting challenges, several successful approaches have appeared in recent years using small molecule agents to mediate protein-protein interactions. General methodologies, such as virtual and fragment screening, tethering techniques, and computer-aided inhibitor design, have been established and applied in drug discovery. The rational design of synthetic inhibitors that mimic protein secondary structural domains is an active area of research in the development of protein-protein disrupters. Such structural mimetics of α-helices and β-turns or strands are anticipated to maintain the biological functions of their protein progenitors and should possess biological activity.
4.3.2 History and Development
The rational design of low-molecular-weight inhibitors that disrupt protein-protein interactions is challenging because of their large interfaces. Often, as much as 1600 Å² of interfacial area, with 10 to 30 amino acid residues (170 atoms) from each protein, is buried upon complex formation [1]. To compete effectively with such a vast binding surface using low-molecular-weight agents is a daunting task. Despite this, as early as 1925 it had been recognized that morphine competes with peptide ligands in binding to protein receptors [5]. In 1980, Farmer, with great foresight, proposed the use of cyclohexane as a scaffold to project functionality as a mimetic of protein secondary structures [6]. Moreover, several groups reported, in the late 1980s, nonpeptide agents that mimic β-turns or strands, and this area has recently been summarized by Fairlie and Loughlin [7]. In a milestone analysis, the energetics for human growth hormone (hGH) binding to the extracellular domain of its receptor (hGHbp) was studied [8], leading to the conclusion that the critical binding region of one protein partner might be reduced to a small domain and therefore mimicked by relatively simple molecules. By conducting alanine scanning of the interfacial residues, Clackson and Wells found that a small and complementary set of these residues, the “hot spot”, accounts for most of the free energy change in the complex formation. They showed that the hGHbp residues Trp104 and Trp169 (Fig. 4.3-1) dominate the binding interface, with each contributing more than 4.5 kcal mol⁻¹ toward a total binding energy of −12.3 kcal mol⁻¹ for the complex formation. In a similar manner, Asp171, Lys172, and Thr175 of hGH make substantial contributions to the binding [9]. In contrast, half of the 31 interfacial residues do not make significant contributions.

Fig. 4.3-1 X-ray crystal structure of the hGH (purple)/hGHbp (cyan) complex. Side chains of the critical amino acid residues (hot spots) are shown in stick representation.

Some of the earliest work on protein surface mimetics came, in the early 1990s, from Hirschmann, Nicolaou, and Smith, who reported a series of nonpeptide agents that mimic β-strands and β-turns. These compounds were used to develop inhibitors of several protein targets, such as HIV protease and somatostatin (SRIF) receptors [10, 11]. In an early example of synthetic mimics of α-helices, Horwell et al. showed that 1,6-disubstituted indanes present functionalities in a similar spatial arrangement to the i and i + 1 residues of an α-helix [12]. However, these mimics do not cover a surface area large enough to serve as adequate α-helix mimetics. In an attempt to improve on this, Kahne and coworkers have reported an α-helix mimic, based on an oligosaccharide scaffold, which binds the minor groove of DNA with selectivity over RNA [13]. Similarly, Hamilton et al. have recently reported terphenyl, oligoamide, and terephthalamide derivatives as structural and functional mimics of extended regions of α-helices and have confirmed their binding to a series of protein targets [14-16].

Several reviews have provided insights into the key issues involved in identifying disrupters of protein-protein interactions. Stites has presented a thorough discussion of the thermodynamic aspects of protein-protein association and the relative importance of enthalpy, entropy, and heat capacity effects in stabilizing complexation [1]. Cochran has summarized the early development of synthetic antagonists of protein-protein interactions, and a number of recent reviews have brought the field up to date [1, 3, 4, 17]. Most recently, Hamilton et al. have discussed the strategies for designing synthetic agents to target protein-protein interactions [18].
4.3.3 General Considerations
Conventional drug discovery often starts by screening a large and diverse chemical library, from which lead compounds can be identified using biochemical and cell-based evaluation methods. The subsequent steps involve an iterative loop of structure determination, modeling, and lead optimization. In many cases, millions of compounds in the preliminary screening, dozens of high-resolution X-ray structures of a drug target, and months of collaborative research are necessary to achieve the potency, selectivity, and pharmacokinetic and toxicological properties required of a preclinical drug candidate. Rational inhibitor design offers a compelling alternative for the identification of protein-protein disrupters as it is based on structural knowledge of the interface. In particular, synthetic scaffolds that mimic the key elements of a protein surface can potentially lead to small molecules with the full activity of a protein domain, a fraction of the molecular weight, and no peptide bonds. Furthermore, lead compounds derived from rational design can be readily optimized by structure-activity relationship (SAR) studies. In general, structure-based drug design treats the backbone of the protein as a relatively rigid entity. Once the structure of a complex of the protein with a representative ligand has been solved experimentally, it can be used as a template, onto which atoms or functional groups can be added to the ligand if free space is available within the binding pocket. In reality, protein side chains within the binding pocket may move to accommodate a ligand and, in some cases, there may even be limited movement of the polypeptide backbone. Moreover, bound solvent, rather than the protein itself, may define the surface of the binding pocket and thus limit the space available for the addition of substituents.
Before designing small molecule agents that target certain protein-protein interfaces, it is helpful to consider the characteristics of a general protein-protein complex. The association constant, which is determined by the free energy difference (ΔG) between the associated and unassociated states of the proteins, is the parameter of the utmost importance since it determines at what concentrations the protein complex is formed. However, the changes in enthalpy, entropy, and heat capacity all provide useful insights into the nature of the complexation and the interacting sites. In his review, Stites listed the thermodynamic characteristics for 43 protein-protein and 26 protein-peptide interactions, most of which were determined by isothermal titration calorimetry. The range of ΔG is −7.0 to −17.2 kcal mol⁻¹ for protein-protein interactions and −5.3 to −11.7 kcal mol⁻¹ for protein-peptide interactions. The ranges of ΔH and ΔS are +12.6 to −66.7 kcal mol⁻¹ and +78.6 to −188.4 cal mol⁻¹ K⁻¹ for protein-protein interactions and +19.9 to −41.9 kcal mol⁻¹ and +95.7 to −109 cal mol⁻¹ K⁻¹ for protein-peptide interactions. The values of heat capacity (ΔCp), which can be correlated to the amount of polar and nonpolar surface areas buried upon complex formation, range from 2 to −767 and −100 to −1200 cal mol⁻¹ K⁻¹ for protein-peptide interactions. The average ΔG value for protein-protein interactions is −10.40 kcal mol⁻¹ with a standard deviation of 2.49 kcal mol⁻¹. The average ΔH value is −8.60 ± 13.63 kcal mol⁻¹, and that of ΔS is 6.12 ± 43.68 cal mol⁻¹ K⁻¹. Protein-protein interactions have an average ΔCp of −333 ± 202 cal mol⁻¹ K⁻¹. The most important conclusion to be drawn from this analysis is that the thermodynamic driving force for protein-protein interactions is highly variable, ranging from strongly enthalpically to strongly entropically driven. Stites also concluded that hydrophobic interactions generally provide the key contact forces for protein-protein complexation, though other alternatives, such as electrostatic effects, can also play a dominant role [19].
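The relations behind these numbers can be checked with a short calculation. The sketch below is not taken from Stites' review: it assumes a temperature of 298.15 K (the tabulated measurements were not necessarily made at a single temperature) and a 1 M standard state, and the helper names `gibbs_from_enthalpy_entropy` and `kd_from_dg` are hypothetical. It uses ΔG = ΔH − TΔS to compare the quoted average ΔH and ΔS with the quoted average ΔG, and ΔG = RT ln Kd to convert the quoted ΔG range for protein-protein interactions into dissociation constants.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
T_ASSUMED = 298.15    # K; an assumption, not stated in the text


def gibbs_from_enthalpy_entropy(dh_kcal, ds_cal, temp=T_ASSUMED):
    """Return dG (kcal/mol) from dH (kcal/mol) and dS (cal/mol/K)."""
    return dh_kcal - temp * ds_cal / 1000.0


def kd_from_dg(dg_kcal, temp=T_ASSUMED):
    """Return the dissociation constant (M) for a binding free energy dG (kcal/mol)."""
    return math.exp(dg_kcal / (R_KCAL * temp))


# Average dH and dS quoted for protein-protein interactions
dg_estimate = gibbs_from_enthalpy_entropy(-8.60, 6.12)

# Weakest and tightest ends of the quoted protein-protein dG range
kd_weak = kd_from_dg(-7.0)
kd_tight = kd_from_dg(-17.2)

print(f"dG from average dH and dS: {dg_estimate:.2f} kcal/mol")
print(f"Kd at dG = -7.0 kcal/mol:  {kd_weak:.2e} M")
print(f"Kd at dG = -17.2 kcal/mol: {kd_tight:.2e} M")
```

Under these assumptions the estimate from the averages lands close to the quoted mean ΔG of −10.40 kcal mol⁻¹, and the quoted ΔG range corresponds to dissociation constants running from the micromolar to the sub-picomolar regime.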
The association of proteins generally follows a two-step mechanism, with the first being a diffusion-controlled association resulting in a loose complex and the second involving specific docking of complementary surfaces that yields the high affinity complex [20]. A common feature of associating proteins is that the on-rate for interaction shows strong dependence on ionic strength, whereas the off-rate is relatively insensitive. The study of the association of the bacterial ribonuclease barnase and its polypeptide inhibitor barstar, which is driven by strong complementary electrostatic forces, shed light on the influence of electrostatic forces on the structure of the activated complex [21]. Fersht and Schreiber probed the interaction of barnase and barstar at various ionic strengths and found that at low ionic strength, all proximal charge pairs form contacts. Increasing the ionic strength, which masks the electrostatic forces, induced a partial loss of the charge-charge interactions. However, the barnase-barstar interface still aligned itself correctly [22].

Extensive work has been done on the amino acid composition at protein-protein interfaces, which provides useful information for inhibitor design. Bogan et al. examined 2325 alanine mutants for which changes in free energy of binding have been measured and showed that the energetic contributions of the individual side chains did not correlate with their buried surfaces [23]. In several cases, a set of energetically unimportant contacts surrounded the hot spot, seeming to occlude bulk solvent in the manner of an O-ring. Certain amino acid residues, in particular tryptophan (21%), arginine (13%), and tyrosine (12%), appear more frequently in hot spots (residues that contribute more than 2 kcal mol⁻¹ to a binding interaction) than others, such as leucine, methionine, serine, threonine, and valine, each of which accounts for less than 3% of the overall hot spot residues [24]. Tryptophan, arginine, and tyrosine residues are also found more frequently in the protein interfaces, with 3.91-, 2.47-, and 2.29-fold enrichment, respectively, in hot spot areas. An enrichment of tyrosine and tryptophan as well as a discrimination against valine, isoleucine, and leucine has also been reported in antibody complementarity-determining region (CDR) sequences [25]. Padlan et al. proposed that the enrichment of these aromatic amino acid residues is due to their ability to participate in hydrophobic contacts without large entropic penalty, as they have fewer rotatable bonds.

Recent developments in bioinformatics have provided insights into the analysis of protein-protein interfaces and have helped in the detection of hot spots. A wealth of data on alanine mutations in various protein-protein complexes is available (www.asedb.org) and has assisted in the design of small molecules to modulate their interactions [26]. Table 4.3-1 lists the protein-protein interactions whose alanine scanning energetic data are currently available in the ASEdb database. Alternatives for detecting hot spot regions include computational tools that generate combinatorial libraries of functional epitopes and identify recurring sets of residues in the epitope [27]. The spatial arrangement of key structural motifs at protein-protein interfaces has been efficiently detected by this method.
Ben-Tal and coworkers have developed an algorithm, Rate4Site, and a web-server Consurf (consurf.tau.ac.il) [28] for identification of functional interfaces based on the evolutionary relations among homologous proteins, as reflected in phylogenetic trees [29]. Using the tree topology and branch lengths corresponding to the evolutionary relationships between two proteins, the algorithm accurately identified a homodimer interface of a hypothetical protein Mj0577 that was also detected in an X-ray crystallographic analysis.
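The hot-spot figures quoted in this chapter lend themselves to a quick arithmetic check. The sketch below uses only numbers given in the text, as magnitudes: the 2 kcal mol⁻¹ hot-spot threshold, the lower bound of 4.5 kcal mol⁻¹ for each of Trp104 and Trp169 of hGHbp, and the total binding energy of 12.3 kcal mol⁻¹. The function and variable names are illustrative, and treating per-residue contributions as simply additive is a simplifying assumption.

```python
HOT_SPOT_THRESHOLD = 2.0     # kcal/mol, hot-spot criterion quoted in the text
TOTAL_BINDING_ENERGY = 12.3  # kcal/mol, magnitude for the hGH/hGHbp complex

# Lower bounds quoted for the two dominant hGHbp residues (kcal/mol)
hghbp_lower_bounds = {"Trp104": 4.5, "Trp169": 4.5}


def is_hot_spot(contribution, threshold=HOT_SPOT_THRESHOLD):
    """A residue counts as a hot spot if it contributes more than the threshold."""
    return contribution > threshold


hot_spots = [name for name, value in hghbp_lower_bounds.items() if is_hot_spot(value)]

# Minimum share of the total binding energy carried by these two residues,
# under the additivity assumption stated above
min_fraction = sum(hghbp_lower_bounds.values()) / TOTAL_BINDING_ENERGY

print("Hot spots:", hot_spots)
print(f"Minimum share of binding energy: {min_fraction:.0%}")
```

On these figures the two tryptophans alone account for at least about 73% of the binding energy, which is the sense in which a small set of residues can dominate a large interface.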
4.3.4 Applications and Practical Examples
A major problem with peptide-based modulators of protein-protein interactions is that they are vulnerable to proteolytic cleavage and thus have poor bioavailability. Different strategies have been used to overcome this problem. For example, peptides in which L-amino acids at potential protease cleavage sites are replaced by D-amino acids or constrained analogs have improved half-lives in cellular assays. However, these methods have serious limitations as the unnatural amino acids and conformational constraints sometimes interfere with the complexation process. Furthermore, it has been suggested that the poor oral bioavailability of peptides is not solely due to their susceptibility to cleavage by peptidases, as the peptide bond itself contributes, at least partially, to the problem [30]. Such limitations make the development of nonpeptide agents that mediate protein-protein interactions a matter of much interest and therapeutic value.

Table 4.3-1 Protein-protein interactions currently listed in the ASEdb database

Ab hu4D5-5/p185HER2
Agitoxin/shaker
Angiogenin/RNase inhibitor
Barnase/barstar
bFGF/FGFR1b
BMP type IA receptor/BMP-4
Bovine profilin I/rabbit actin
BPTI/chymotrypsin
BPTI/trypsin
CD2/CD48
CD4/gp120
Charybdotoxin/shaker
Complement C1q/IgG2b
D1.3/E5.2
D1.3/HEL
Dendrotoxin K/K+ channels
Erabutoxin A/AChR
Erabutoxin A/Ma2-3
Factor VII/tissue factor
HEL/HYHEL-10
hG-CSF/hG-CSFbp
hGH/MAb (1-21)
hGHbp/MAb12B8
hGHbp/MAb13E1
hGHbp/MAb263
hGHbp/MAb3B7
hGHbp/MAb3D9
hIL-18 binding protein/hIL-18
HYHEL-10/HEL
IGF-1/IGF-1R
IL-2 (human)/IL-2R
IL-2 (murine)/IL-2RB
IL-4/IL4-BP
IL-6/IL-6R
IL-6/MAb8
IL-8/IL-8R
IL-8/IL-8RA
IL4(IL4bp)/γ-c
Im2/E9 DNase
κ-Conotoxin PVIIA/shaker K+ channel
Kistrin/GP IIb-IIIa
MAb A4.6.1/VEGF
mIL-2/mIL-2Ra
NmmI/nAChR
NT-3/p75
NT-3/trkC
Protein A/IgG1
RNase inhibitor/angiogenin
RNase inhibitor/RNase A
scTCR Vb/SEC3-1A4
SEC3/TCR Vb
Shaker/agitoxin
Shaker/CTX
sHIR/insulin
Tissue factor/Fab 5G9
Tissue factor/factor VIIa
VEGF/KDR
VEGF/MAb 3.2E3.1.1
VEGF/MAb A4.6.1
yCaM/calcineurin
4.3.4.1 Peptidomimetics of β-Turns/Strands
Hirschmann, Nicolaou, and Smith have pioneered the development of synthetic agents that mimic β-strand and β-turn conformations. As an early example, Hirschmann and Nicolaou reported a mimetic of the cyclic peptide hormone somatostatin (SRIF) using a β-D-glucose scaffold [10]. SRIF is a cyclic tetradecapeptide that inhibits the release of growth hormone (GH) [31].
Fig. 4.3-2 Structure of β-D-glucose-based peptidomimetics of SRIF (compounds 1 and 2).
Previous studies had shown that cyclic hexapeptide 1 was a potent agonist of SRIF [32], due to the dipeptide motif of Phe-Pro, which enforces a β-turn conformation and the correct positioning of the remaining four side chains. In addition, the aromatic side chains of the Phe-Pro dipeptide provide favorable hydrophobic interactions with the SRIF receptor. On the basis of this peptide agonist of SRIF, compound 2 was designed with the critical side chains of 1 projected on a β-D-glucose scaffold (Fig. 4.3-2). β-D-Glucose is a good design for a β-turn mimetic because: (a) the pyran ring imposes an appropriate projection of the side chains, and (b) the glucose backbone is relatively rigid. The shape and substitution pattern of β-D-glucose was found to best present the Trp, Lys, and Phe side chains. A radiolabeled binding assay showed that 2 completely displaced a peptide ligand, ¹²⁵I-CGP 23996, from the SRIF receptor on membranes from AtT-20 cell lines with an IC50 of 1.9 μM. Binding studies using cerebral cortex and pituitary membrane cells showed similar results. Taken together, this study supported the validity of using nonpeptide scaffolds to mimic protein secondary structures that are of biological interest. In a follow-up study, Smith and Hirschmann elaborated a pyrrolinone-based mimetic of the β-strand/β-sheet conformations [33, 34], in which all of the key recognition features (i.e., side chains and hydrogen-bond donors/acceptors) are faithfully represented within a low-molecular-weight nonpeptide analog 4 (Fig. 4.3-3). This design has been applied to the development of antagonists of HIV-1 protease and more recently to mimics of a major histocompatibility complex (MHC) class II protein substrate [34, 35]. Computational modeling using the Macromodel program suggested that 3,5-linked pyrrolin-4-ones can structurally mimic a short peptide in a β-strand conformation.
In a computer-simulated conformational search, the pyrrolinone rings fix the dihedral angles analogous to φ, ψ, and ω in a peptide (Fig. 4.3-3). This favored conformation is due to the hindrance of the gauche interaction between the side chain substituents and their neighboring pyrrolinone rings. The side chains appended at the 5-positions of pyrrolinone take up an orientation axial to the heterocyclic ring. Comparison of peptide 3 with the mimetic 4 suggested that the disposition of the vinylogous amide carbonyls in 4 closely reproduces the orientation of the peptide carbonyls in 3. By this means, compound 4 maintains the hydrogen-bond acceptors of the native β-strand using the vinylogous amide nitrogen. Despite the presence of the vinylogous substitution, pyrrolinone -NH groups are comparable to amide groups in basicity and may further stabilize the requisite β-strand and β-sheet conformations through intra- and intermolecular hydrogen bonding, respectively. As a test of this β-strand mimetic design, Hirschmann and Smith selected a fragment of equine angiotensinogen, tetrapeptide methyl ester 3, as the initial target. Least-squares comparison showed good spatial agreement between the optimized conformation of 4 and the X-ray crystal structure of 3. The X-ray crystal structure of 4 confirmed that this mimetic adopts a β-strand conformation in the solid state. Moreover, the side chain trajectories and carbonyl orientations showed spatial projections similar to those of the tetrapeptide, affirming that 4 is a good structural mimetic of 3.

Fig. 4.3-3 Polypyrrolinone-based β-turn peptidomimetic 4 (shown with peptide 3).

To evaluate the biological applicability of this design, Smith and Hirschmann developed HIV-1 protease inhibitors based on the polypyrrolinone scaffold. Previous studies have shown that many binding interactions are conserved in the HIV-1 protease/inhibitor complex formation [36]. β-Strand peptide inhibitors, such as 5 and JG-365 (Ac-Ser-Leu-Asn-Phe-Hea-Pro-Ile-Val-OMe, Hea = hydroxyethylamine [CH(OH)CH2N]), bind in an active site on the HIV-1 protease surface with their side chains inserting into hydrophobic pockets (Fig. 4.3-4). The inhibitory effects of the pyrrolinone derivatives were evaluated using enzyme inhibition and cellular activation assays. Compound 6 (Fig. 4.3-5) showed an IC50 of 10 nM, compared to 0.6 nM for the related peptide inhibitor 5 (L682,679).
However, the synthetic agent 6 showed better cell transport capacity. In a cellular antiviral assay, 5 and 6 showed CIC95 values (the concentration that inhibits 95% of virus multiplication in the cellular cultures) of 6.0 and 1.5 μM, respectively. Smith and Hirschmann proposed that the improved cellular uptake properties of polypyrrolinones are due to a reduction in the inhibitor solvation. Solvation is an impediment to transport because extraction of a molecule into a lipid bilayer from an aqueous phase is thermodynamically disfavored [37]. The polypyrrolinone compounds can form intramolecular hydrogen bonds, which reduce the number of solvating water molecules by two and favor the entry of the mimetics into the cell membrane.

Fig. 4.3-4 Complex of the HIV-1 protease and β-strand peptide inhibitor JG-365.

Fig. 4.3-5 HIV-1 protease inhibitors 5 (L682,679) and 6.

Smith and Hirschmann’s studies opened a new field of using de novo designed synthetic scaffolds to mimic relatively large protein secondary structures. While more structural studies, such as X-ray and NMR analyses, are needed to confirm whether these compounds recognize their protein targets in the same manner as their peptide models, the concept of using small molecules to project critical functionalities to target proteins is established. Although many of the β-strand mimetic designs were used only to modulate protein-ligand interactions, the potential application of this strategy in other biological processes is clear.
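The trade-off between enzyme inhibition and cellular activity described for compounds 5 and 6 can be made explicit with a few ratios. The sketch below uses the IC50 and CIC95 values quoted in the text, reading the cellular CIC95 values as micromolar concentrations (an assumption about the units); the dictionary layout and variable names are illustrative only.

```python
# Potencies expressed in nM so that the two assays can be compared directly
compounds = {
    "5": {"ic50_nM": 0.6, "cic95_nM": 6.0e3},   # peptide inhibitor L682,679
    "6": {"ic50_nM": 10.0, "cic95_nM": 1.5e3},  # polypyrrolinone mimetic
}

# How much weaker 6 is than 5 against the isolated enzyme
enzyme_fold = compounds["6"]["ic50_nM"] / compounds["5"]["ic50_nM"]

# How much more potent 6 is than 5 in the cellular antiviral assay
cell_fold = compounds["5"]["cic95_nM"] / compounds["6"]["cic95_nM"]

# Ratio of cellular to enzymatic potency; a smaller ratio means less
# activity is lost on going from the enzyme assay to the cell assay
cell_to_enzyme = {
    name: values["cic95_nM"] / values["ic50_nM"]
    for name, values in compounds.items()
}

print(f"6 is {enzyme_fold:.1f}-fold weaker than 5 against the enzyme")
print(f"6 is {cell_fold:.1f}-fold more potent than 5 in cells")
for name, ratio in cell_to_enzyme.items():
    print(f"compound {name}: CIC95/IC50 = {ratio:.0f}")
```

On this reading, the cell-to-enzyme ratio drops from about 10,000 for peptide 5 to about 150 for pyrrolinone 6, which is consistent with the suggestion that the mimetic crosses the membrane more readily.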
4.3.4.2 Terphenyl-based Helical Mimetics that Disrupt the Bcl-xL/Bak Interaction
α-Helices are another major protein secondary structure found in nature. About 40% of all amino acids in natural proteins take up α-helical conformations. A typical α-helix rises 5.4 Å per turn, or 1.5 Å per residue (Fig. 4.3-6(a)). The amino acid residues at the i, i + 3, i + 4, and i + 7 positions are aligned on the same face of the helical backbone and often combine in the recognition of a complementary surface. α-Helices play key roles in numerous protein-protein, protein-DNA, and protein-RNA interactions, making them an attractive target for the design of small molecule agents that mimic both their structures and functions [38]. In recent years, major strides have been made in this field, evolving from strategies based on induced helix stabilization to the recent advent of helix proteomimetics, molecules that mimic the surface functionalities presented by α-helical secondary structures [2, 39]. Hamilton et al. have reported a series of synthetic agents based on a terphenyl scaffold that mimic the helical region of the Bak peptide. The terphenyl derivatives (Fig. 4.3-6(b)), substituted with alkyl or aryl side chains at the 3,2',2''-positions, project these side chains in a fashion similar to the arrangement of the i, i + 4, and i + 7 residues on an α-helical backbone.
Fig. 4.3-6 (a) Surface displacement of residues on an α-helix surface. (b) Terphenyl-based α-helical mimetics.
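The statement that residues i, i + 3, i + 4, and i + 7 share one face of the helix follows directly from the quoted geometry. The sketch below assumes an ideal helix with the 5.4 Å pitch and 1.5 Å rise given in the text, which corresponds to 3.6 residues per turn, or 100° of rotation per residue; the function name is illustrative only.

```python
RISE_PER_RESIDUE = 1.5                         # Å per residue, from the text
PITCH = 5.4                                    # Å per turn, from the text
RESIDUES_PER_TURN = PITCH / RISE_PER_RESIDUE   # about 3.6
DEG_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN    # about 100 degrees


def helix_position(offset):
    """Return (angle, rise) of residue i + offset relative to residue i.

    The angle is reported in degrees within (-180, 180]; the rise is in Å.
    """
    angle = (offset * DEG_PER_RESIDUE) % 360.0
    if angle > 180.0:
        angle -= 360.0
    return angle, offset * RISE_PER_RESIDUE


for k in (0, 1, 2, 3, 4, 7):
    angle, rise = helix_position(k)
    print(f"i + {k}: {angle:7.1f} deg around the axis, {rise:4.1f} Å along it")
```

In this idealized picture, the i, i + 3, i + 4, and i + 7 residues all fall within about 60° of residue i, whereas i + 1 and i + 2 point away from that face. The i, i + 4, and i + 7 positions mimicked by the terphenyl scaffold span an even narrower arc.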
To test this general design, Hamilton and coworkers developed α-helix mimetics of the Bak protein, which binds into a shallow hydrophobic cleft on the surface of Bcl-xL. Bak and Bcl-xL are members of the B-cell lymphoma 2 (Bcl-2) protein family, which plays an important role in the apoptotic pathway [40]. This protein family can be divided into two subgroups: the proapoptotic and the prosurvival subfamilies. The proapoptotic subfamily proteins, such as Bak, Bad, and Bax, share a minimal helical homologous region, the BH3 domain, which is responsible for mediation of apoptosis through heterodimerization with the prosurvival Bcl-2 family members [41]. Overexpression of the prosurvival proteins, such as Bcl-2 and Bcl-xL, can inhibit the potency of many currently available anticancer drugs by blocking the apoptotic pathway [42]. A current strategy for modulating apoptosis is to target the Bak-recognition site on Bcl-xL and thereby disrupt the protein-protein contact. The structure of the Bcl-xL/Bak complex determined by NMR spectroscopy showed that a helical region of Bak (amino acids 72 to 87) binds to a hydrophobic cleft on the surface of Bcl-xL (Kd = 340 nM) [43]. Furthermore, the crucial residues for binding, shown by alanine scanning, are Val74, Leu78, Ile81, and Ile85, which project at the i, i + 4, i + 7, and i + 11 positions along one face of the Bak helix. The design of agents that directly mimic the death-promoting BH3 domain of the proapoptotic subfamily of Bcl-2 proteins is of much current interest as they can potentially provide drugs that control apoptosis [44]. A series of terphenyl derivatives with different side chains was prepared as structural mimetics of the Bak peptide using a modular and convergent synthesis. We used a fluorescence polarization assay to monitor the interaction between the inhibitor and the target protein. Some of the structure-activity results are listed in Table 4.3-2.
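The i, i + 4, i + 7, i + 11 spacing of the hot-spot residues can be checked by indexing into the Bak BH3 sequence. The sketch below is illustrative: the 16-residue sequence is an assumption, taken to be GQVGRQLAIIGDDINR for residues 72 to 87 (the labeled peptide used in the polarization assay, with ambiguous characters read as glycine), and the helper name is hypothetical.

```python
# Assumed Bak BH3 sequence for residues 72-87 (see the lead-in for the caveat).
BAK_BH3 = "GQVGRQLAIIGDDINR"
FIRST_RESIDUE = 72


def residue_at(position):
    """One-letter code of the residue at the given Bak sequence position."""
    return BAK_BH3[position - FIRST_RESIDUE]


anchor = 74  # Val74 is taken as position i
hot_spots = {anchor + k: residue_at(anchor + k) for k in (0, 4, 7, 11)}
print(hot_spots)  # {74: 'V', 78: 'L', 81: 'I', 85: 'I'}
```

With this sequence, the four offsets land exactly on the Val, Leu, Ile, and Ile residues identified by alanine scanning.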
Terphenyl 7, with two carboxyl groups and a substituent sequence of isobutyl, 1-naphthylmethylene, isobutyl groups in the 3,2',2''-positions, was identified as a potent inhibitor (Kd = 114 nM) of the Bak/Bcl-xL complexation. The binding specificity was confirmed by scrambling the sequence of the substituents, as in isomer 12, which caused a 25-fold loss in affinity. The importance of the side chains was confirmed by terphenyl 13, which lacks the ability to disrupt Bak binding to Bcl-xL, ruling out the possibility of nonspecific binding by the terphenyl backbone. ¹⁵N-HSQC NMR experiments with 7 indicated that the terphenyl derivatives target the same hydrophobic cleft on Bcl-xL as the Bak peptide (shown in blue, Fig. 4.3-7). Residues A89, L99, L108, T109, S110, Q111, I114, Q125, L130, F131, W137, G138, R139, I140, A142, S145, and F146 (shown in magenta in Fig. 4.3-7) showed significant chemical shift changes on addition of the synthetic inhibitor 7. Some other residues, including G94, L112, S122, G134, K157, E158, and M159 (shown in yellow in Fig. 4.3-7) showed moderate chemical shift changes under the same conditions. All these affected residues lie near the shallow cleft on the protein surface into which the Bak BH3 helix binds. The targeted residues V74, L78, and I81 of Bak BH3 are within 4 Å of residues F97, R102, L108, L130, I140, A142, and F146 of Bcl-xL, most of which showed significant chemical shift changes (F97 overlapped with NS), confirming that 7 and Bak BH3 target the same area on the exterior surface of Bcl-xL. Overlay of 7 and the Bak BH3 peptide suggested that the terphenyl indeed adopts a staggered conformation, mimicking the cylindrical shape of the helix with the substituents making a series of hydrophobic contacts with the protein surface. Further studies using human embryonic kidney 293 (HEK293) cells have shown that terphenyl 7 disrupts Bak/Bcl-xL binding in whole cells [16]. HEK293 cells transfected with both HA-Bcl-xL and flag-Bax, an analog of Bak, were treated with terphenyl derivatives. After 24-h incubation, the cells were harvested and lysed. HA-tagged Bcl-xL was collected via immunoprecipitation with HA antibody. The resulting mixture was loaded onto a 12.5% SDS-PAGE gel, and the proteins were transferred to nitrocellulose for western blot analysis. The presence of Bax protein was probed with antiflag antibody. The inhibitory potencies of the terphenyl compounds were determined by measuring the relative intensity of the Bax protein bound to Bcl-xL. We found that 51% of the Bak/Bcl-xL interaction was disrupted in HEK293 cells treated with terphenyl 7, indicating that certain terphenyls are competitive with the full-length protein-protein interaction in a cellular environment.
Table 4.3-2 Results of the fluorescence polarization assay for the terphenyl-based Bak mimetics. (The structure drawing and most of the table body did not survive extraction. Recoverable entries: compound 11, 2.73; compound 12, 2.70; compound 13, >30.0. The substituent labels H, Bn, iBu, and CO2H survive but cannot be assigned to positions.)
Polarization measurements were recorded on titration of inhibitors at varying concentrations in a solution of 15 nM labeled Bak peptide (Fl-GQVGRQLAIIGDDINR-CONH2) and 184 nM Bcl-xL (25 °C, 1.0 mM PBS, pH 7.4).
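The assay conditions quoted for the fluorescence polarization experiment (15 nM labeled Bak peptide, 184 nM Bcl-xL) determine how much of the labeled peptide is bound before any inhibitor is added. The sketch below solves the exact 1:1 binding equation for that starting point. It assumes, purely for illustration, that the labeled peptide binds with the same Kd of 340 nM reported for the Bak helix; the function name is hypothetical.

```python
import math


def complex_concentration(protein_total, ligand_total, kd):
    """Equilibrium concentration of a 1:1 complex (all values in the same units).

    Solves kd = (P - x)(L - x) / x for x and returns the physically valid root.
    """
    s = protein_total + ligand_total + kd
    return (s - math.sqrt(s * s - 4.0 * protein_total * ligand_total)) / 2.0


PROTEIN_NM = 184.0   # Bcl-xL, from the assay conditions
TRACER_NM = 15.0     # labeled Bak peptide, from the assay conditions
KD_NM = 340.0        # assumed equal to the Kd quoted for the Bak helix

bound = complex_concentration(PROTEIN_NM, TRACER_NM, KD_NM)
fraction_bound = bound / TRACER_NM
print(f"labeled peptide bound: {bound:.2f} nM ({fraction_bound:.0%} of total)")
```

Under these assumptions roughly a third of the labeled peptide starts out bound, so a competing terphenyl lowers the polarization signal from that baseline toward the value of the free peptide.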
Fig. 4.3-7 Results of the ¹⁵N-HSQC and computational docking experiments of 7 binding to Bcl-xL. The residues that showed significant chemical shift changes in the presence of 7 are shown in yellow. The highest ranked binding mode of inhibitor 7 predicted from a computational docking simulation (Autodock 3.0) has been superimposed on the helical Bak BH3 domain for comparison.
A critical issue in the design of small molecule α-helix mimetics is the selectivity of these compounds among different helix-binding proteins, as lack of specificity might lead to damage to normal cells [45]. Nature frequently uses secondary structure modules, such as α-helices, to recognize different protein targets and achieves high specificity through spatial and charge complementarity [17]. As an example, the tumor suppressor protein p53 selectively binds, with its helical N-terminal domain, to the regulatory protein HDM2 over other oncogenic proteins, such as Bcl-xL and Bcl-2, which both complex with the α-helical Bak BH3 domain [46]. Comparison of terphenyl isomers 7 and 10, with 1- and 2-naphthylmethylene side chains, respectively, on the middle phenyl rings, showed that terphenyl derivatives can selectively bind to different helix-binding proteins (Table 4.3-3) [15, 16]. Terphenyl 7 binds to Bcl-xL more than 10-fold more strongly than 10, whereas terphenyl 10 specifically disrupts the HDM2/p53 complexation, possibly due to the deeper pocket in HDM2 for W23 at the i + 4 position compared to the L78 pocket of Bcl-xL or Bcl-2. These results confirm the generality of the terphenyl scaffold as a mimic of the side chain induced selectivity of α-helices and provide a useful tool for the rational design of protein-binding agents.
Table 4.3-3 Comparison of terphenyl derivatives 7 and 10 in inhibition of different protein-protein complexes. Values are Ki (µM).

Compound   HDM2/p53   Bcl-xL/Bak   Bcl-2/Bak
7          25.7       0.114        0.121
10         0.182      2.50         15.0
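The selectivity discussed for terphenyls 7 and 10 can be expressed as ratios of the Ki values in Table 4.3-3. The sketch below is a minimal illustration: it assumes the reading of the table in which compound 7 has Ki values of 25.7, 0.114, and 0.121 µM and compound 10 has 0.182, 2.50, and 15.0 µM against HDM2/p53, Bcl-xL/Bak, and Bcl-2/Bak, respectively, and the function name is hypothetical.

```python
# Ki values in µM, as read from Table 4.3-3 (the column assignment is an assumption).
KI_UM = {
    "7": {"HDM2/p53": 25.7, "Bcl-xL/Bak": 0.114, "Bcl-2/Bak": 0.121},
    "10": {"HDM2/p53": 0.182, "Bcl-xL/Bak": 2.50, "Bcl-2/Bak": 15.0},
}


def selectivity(compound, target, off_target):
    """Fold preference of a compound for `target` over `off_target` (Ki ratio)."""
    return KI_UM[compound][off_target] / KI_UM[compound][target]


print(f"7:  Bcl-xL/Bak over HDM2/p53   = {selectivity('7', 'Bcl-xL/Bak', 'HDM2/p53'):.0f}-fold")
print(f"10: HDM2/p53 over Bcl-xL/Bak   = {selectivity('10', 'HDM2/p53', 'Bcl-xL/Bak'):.1f}-fold")
print(f"10: HDM2/p53 over Bcl-2/Bak    = {selectivity('10', 'HDM2/p53', 'Bcl-2/Bak'):.1f}-fold")

# Relative affinity of the two isomers for the same target (Bcl-xL/Bak).
isomer_ratio = KI_UM["10"]["Bcl-xL/Bak"] / KI_UM["7"]["Bcl-xL/Bak"]
print(f"7 binds Bcl-xL about {isomer_ratio:.0f}-fold more tightly than 10")
```

On this reading, moving the naphthylmethylene attachment from the 1- to the 2-position switches the preferred target, consistent with the "more than 10-fold" difference noted in the text.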
4.3.5 Future Developments
The future development of structure-based drug design depends heavily on progress in computational techniques. In a recent review, Jorgensen has pointed out that despite widespread suspicion, computer-aided drug design has become a useful tool in generating focused libraries [47]. The recently developed computer program BOMB is among the first software packages that can assist in the design of inhibitors for a specific protein target, from scratch, on the basis of the available structural information. Even though these approaches are in their infancy, their accuracy and credibility will improve as more parameters, such as solvent effects, ionic strength, and surface mobility, are taken into account. It is unlikely that dramatic improvements in current sampling algorithms and scoring functions will occur in the near future; thus, advancement of the field will likely come from better understanding of how to apply existing technologies. The techniques applied to the identification of potential inhibitors of protein-protein interactions have been another evolving area. NMR-based screening methods that focus either on the protein receptor or the ligand have been used in pharmaceutical research, although they can still be lengthy processes [48]. Structure-based NMR screening and fragment combination strategies are particularly effective for discovering novel leads that target a different area on a protein surface. Furthermore, Mrksich et al. have described a strategy using matrix-assisted laser-desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) to screen large libraries of low-molecular-weight compounds [49]. The major advantage of MS is that it avoids the requirement of analyte labeling. Mrksich and coworkers used self-assembled monolayers (SAM) that are engineered to measure enzyme activities and MALDI-TOF to detect lead compounds.
Currently, this approach has been used only in identifying small molecule agents that inhibit enzyme activity. MS will certainly be applied more broadly to detect inhibitors for protein-protein interactions as an efficient alternative to the conventional fluorescence-based screening methods. Fragment-based lead discovery has drawn much attention as a novel discovery strategy. By screening a relatively small number of fragment units, functional groups can be found to recognize subpockets within an active site. This approach is especially useful with protein targets that have more than one binding pocket, each of which might contribute separately to the complex formation. Furthermore, smaller molecules offer better starting points for drug discovery because they can be readily assembled into larger compounds. Wells et al. have reported a powerful technique for identifying antagonists of protein-protein interactions with only medium to low potency (micromolar to millimolar) by using a dynamically interconverting thiol-tethered library [50]. This method has a great advantage in searching for inhibitors that target a mobile protein surface. Kodadek et al. have developed a general methodology that is effective in searching for a second binding site on the protein surface. A library of combinatorial oligomeric compounds is attached to a low-affinity anchor compound that can recognize the target protein. The resulting library is then screened under conditions too demanding for the lead to support robust binding to the protein target. Using MDM2 as a model, they have identified relatively potent chimeric compounds that simultaneously recognize multiple binding sites on the protein surface [51].
4.3.6 Conclusion
Several examples of rationally designed protein secondary structure mimetics that modulate protein-protein and protein-ligand interactions have appeared in recent years. These studies showed that the strategy of mimicking protein secondary structures in small molecules provides an alternative to conventional library screening in drug discovery. To further accelerate progress in this area, we need more in-depth understanding of the receptor-ligand complexation, which requires a collaborative effort in organic syntheses, structural analyses, computational simulations, and biological evaluation.
Acknowledgments
We thank the National Institutes of Health (GM69850) for financial support of this work.
References
1. W.E. Stites, Protein-protein interactions: interface structure, binding thermodynamics, and mutational analysis, Chem. Rev. 1997, 97, 1233-1250.
2. M.W. Peczuh, A.D. Hamilton, Peptide and protein recognition by designed molecules, Chem. Rev. 2000, 100, 2479-2493.
3. P.L. Toogood, Inhibition of protein-protein association by small molecules: approaches and progress, J. Med. Chem. 2002, 45, 1543-1558.
4. A.G. Cochran, Antagonists of protein-protein interactions, Chem. Biol. 2000, 7, R85-R94.
5. J.M. Gulland, R. Robinson, The constitution of codeine and thebaine, Mem. Proc. Manch. Lit. Phil. Soc. 1925, 69, 79.
6. P.S. Farmer, in Drug Design, (Ed.: E.J. Ariens), Vol. X, Academic, New York, 1980, pp. 119.
7. W.A. Loughlin, J.D. Tyndall, M.P. Glenn, D.P. Fairlie, Beta-strand mimetics, Chem. Rev. 2004, 104, 6085-6118.
8. T. Clackson, J.A. Wells, A hot-spot of binding-energy in a hormone-receptor interface, Science 1995, 267, 383-386.
9. B.C. Cunningham, J.A. Wells, Comparison of a structural and a functional epitope, J. Mol. Biol. 1993, 234, 554-563.
10. R. Hirschmann, K.C. Nicolaou, S. Pietranico, J. Salvino, E.M. Leahy, P.A. Sprengeler, G. Furst, A.B. Smith, C.D. Strader, M.A. Cascieri, M.R. Candelore, C. Donaldson, W. Vale, L. Maechler, Nonpeptidal peptidomimetics with a beta-D-glucose scaffolding - a partial somatostatin agonist bearing a close structural relationship to a potent, selective substance-P antagonist, J. Am. Chem. Soc. 1992, 114, 9217-9218.
11. A.B. Smith, R. Hirschmann, A. Pasternak, R. Akaishi, M.C. Guzman, D.R. Jones, T.P. Keenan, P.A. Sprengeler, P.L. Darke, E.A. Emini, M.K. Holloway, W.A. Schleif, Design and synthesis of peptidomimetic inhibitors of HIV-1 protease and renin - evidence for improved transport, J. Med. Chem. 1994, 37, 215-218.
12. D. Horwell, M. Pritchard, J. Raphy, G. Ratcliffe, 'Targeted' molecular diversity: design and development of non-peptide antagonists for cholecystokinin and tachykinin receptors, Immunopharmacology 1996, 33, 68-72; D.C. Horwell, W. Howson, G.S. Ratcliffe, H.M.G. Willems, The design of dipeptide helical mimetics: the synthesis, tachykinin receptor affinity and conformational analysis of 1,1,6-trisubstituted indanes, Bioorg. Med. Chem. 1996, 4, 33-42.
13. H. Xuereb, M. Maletic, J. Gildersleeve, I. Pelczer, D. Kahne, Design of an oligosaccharide scaffold that binds in the minor groove of DNA, J. Am. Chem. Soc. 2000, 122, 1883-1890.
14. B.P. Orner, J.T. Ernst, A.D. Hamilton, Toward proteomimetics: terphenyl derivatives as structural and functional mimics of extended regions of an alpha-helix, J. Am. Chem. Soc. 2001, 123, 5382-5383; J.T. Ernst, O. Kutzki, A.K. Debnath, S. Jiang, H. Lu, A.D. Hamilton, Design of a protein surface antagonist based on alpha-helix mimicry: inhibition of gp41 assembly and viral fusion, Angew. Chem. Int. Ed. Engl. 2001, 41, 278-282; O. Kutzki, H.S. Park, J.T. Ernst, B.P. Orner, H. Yin, A.D. Hamilton, Development of a potent Bcl-XL antagonist based on alpha-helix mimicry, J. Am. Chem. Soc. 2002, 124, 11838-11839; J.T. Ernst, J. Becerril, H.S. Park, H. Yin, A.D. Hamilton, Design and application of an alpha-helix-mimetic scaffold based on an oligoamide-foldamer strategy: antagonism of the Bak BH3/Bcl-XL complex, Angew. Chem. Int. Ed. Engl. 2003, 42, 535-550; H. Yin, A.D. Hamilton, Terephthalamide derivatives as mimetics of the helical region of Bak peptide target Bcl-XL protein, Bioorg. Med. Chem. Lett. 2004, 14, 1375-1379; H. Yin, G.I. Lee, K.A. Sedey, J.M. Rodriguez, H.G. Wang, S.M. Sebti, A.D. Hamilton, Terephthalamide derivatives as mimetics of helical peptides: disruption of the Bcl-XL/Bak interaction, J. Am. Chem. Soc. 2005, 127, in press.
15. H. Yin, G.I. Lee, H.S. Park, G.A. Payne, J.M. Rodriguez, S.M. Sebti, A.D. Hamilton, Terphenyl-based helical mimetics that disrupt the p53/HDM2 interaction, Angew. Chem. Int. Ed. Engl. 2005, 44, 2704-2707.
16. H. Yin, G.I. Lee, K.A. Sedey, O. Kutzki, H.S. Park, B.P. Orner, J.T. Ernst, H.G. Wang, S.M. Sebti, A.D. Hamilton, Terphenyl-based Bak BH3 alpha-helical proteomimetics as low-molecular-weight antagonists of Bcl-XL, J. Am. Chem. Soc. 2005, 127, 10191-10196.
17. T. Berg, Modulation of protein-protein interactions with small organic molecules, Angew. Chem. Int. Ed. Engl. 2003, 42, 2462-2481; D.L. Boger, J. Desharnais, K. Capps, Solution-phase combinatorial libraries: modulating cellular signaling by targeting protein-protein or protein-DNA interactions, Angew. Chem. Int. Ed. Engl. 2003, 42, 4138-4176; D.L. Boger, Solution-phase synthesis of combinatorial libraries designed to modulate protein-protein or protein-DNA interactions, Bioorg. Med. Chem. 2003, 11, 1607-1613; A.G. Cochran, Protein-protein interfaces: mimics and inhibitors, Curr. Opin. Chem. Biol. 2001, 5, 654-659; T.R. Gadek, J.B. Nicholas, Small molecule antagonists of proteins, Biochem. Pharmacol. 2003, 65, 1-8; A.V. Veselovsky, Y.D. Ivanov, A.S. Ivanov, A.I. Archakov, P. Lewi, P. Janssen, Protein-protein interactions: mechanisms and modification by drugs, J. Mol. Recognit. 2002, 15, 405-422; M.R. Arkin, J.A. Wells, Small-molecule inhibitors of protein-protein interactions: progressing towards the dream, Nat. Rev. Drug Discov. 2004, 3, 301-317.
18. H. Yin, A.D. Hamilton, Strategies for targeting protein-protein interactions using synthetic agents, Angew. Chem. Int. Ed. Engl. 2005, 44, 4130-4163.
19. G.C. Kresheck, L.B. Vitello, J.E. Erman, Calorimetric studies on the interaction of horse ferricytochrome-C and yeast cytochrome-C peroxidase, Biochemistry 1995, 34, 8398-8405.
20. H. Wendt, L. Leder, H. Harma, I. Jelesarov, A. Baici, H.R. Bosshard, Very rapid, ionic strength-dependent association and folding of a heterodimeric leucine zipper, Biochemistry 1997, 36, 204-213.
21. C. Frisch, G. Schreiber, C.M. Johnson, A.R. Fersht, Thermodynamics of the interaction of barnase and barstar: changes in free energy versus changes in enthalpy on mutation, J. Mol. Biol. 1997, 267, 696-706.
22. C. Frisch, A.R. Fersht, G. Schreiber, Experimental assignment of the structure of the transition state for the association of barnase and barstar, J. Mol. Biol. 2001, 308, 69-77.
23. A.A. Bogan, K.S. Thorn, Anatomy of hot spots in protein interfaces, J. Mol. Biol. 1998, 280, 1-9.
24. B.Y. Ma, T. Elkayam, H. Wolfson, R. Nussinov, Protein-protein interactions: structurally conserved residues distinguish between binding sites and exposed protein surfaces, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 5772-5777.
25. E.A. Padlan, On the nature of antibody combining sites - unusual structural features that may confer on these sites an enhanced capacity for binding ligands, Proteins Struct. Funct. Genet. 1990, 7, 112-124.
26. K.S. Thorn, A.A. Bogan, ASEdb: a database of alanine mutations and their effects on the free energy of binding in protein interactions, Bioinformatics 2001, 17, 284-285.
27. N. Leibowitz, Z.Y. Fligelman, R. Nussinov, H.J. Wolfson, Automated multiple structure alignment and detection of a common substructural motif, Proteins Struct. Funct. Genet. 2001, 43, 235-245; B.Y. Ma, H.J. Wolfson, R. Nussinov, Protein functional epitopes: hot spots, dynamics and combinatorial libraries, Curr. Opin. Struct. Biol. 2001, 11, 364-369.
28. F. Glaser, T. Pupko, I. Paz, R.E. Bell, D. Bechor-Shental, E. Martz, N. Ben-Tal, ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information, Bioinformatics 2003, 19, 163-164.
29. R.E. Bell, N. Ben-Tal, In silico identification of functional protein interfaces, Comp. Funct. Genom. 2003, 4, 420-423.
30. R. Hirschmann, Medicinal chemistry in the golden-age of biology - lessons from steroid and peptide research, Angew. Chem. Int. Ed. Engl. 1991, 30, 1278-1301.
31. P. Brazeau, W. Vale, R. Burgus, R. Guillemin, Isolation of somatostatin (a somatotropin-release-inhibiting factor) of ovine hypothalamic origin, Can. J. Biochem. 1974, 52, 1067-1072.
32. P. Brazeau, W. Vale, R. Burgus, N. Ling, M. Butcher, J. Rivier, R. Guillemin, Hypothalamic polypeptide that inhibits secretion of immunoreactive pituitary growth-hormone, Science 1973, 179, 77-79.
33. A.B. Smith, W.Y. Wang, P.A. Sprengeler, R. Hirschmann, Design, synthesis, and solution structure of a pyrrolinone-based beta-turn peptidomimetic, J. Am. Chem. Soc. 2000, 122, 11037-11038; A.B. Smith, H. Liu, R. Hirschmann, A second generation synthesis of polypyrrolinone nonpeptidomimetics: prelude to the synthesis of polypyrrolinones on solid support, Org. Lett. 2000, 2, 2037-2040; A.B. Smith, T.P. Keenan, R.C. Holcomb, P.A. Sprengeler, M.C. Guzman, J.L. Wood, P.J. Carroll, R. Hirschmann, Design, synthesis, and crystal-structure of a pyrrolinone-based peptidomimetic possessing the conformation of a beta-strand - potential application to the design of novel inhibitors of proteolytic-enzymes, J. Am. Chem. Soc. 1992, 114, 10672-10674; A.B. Smith, L.D. Cantin, A. Pasternak, L. Guise-Zawacki, W.Q. Yao, A.K. Charnley, J. Barbosa, P.A. Sprengeler, R. Hirschmann, S. Munshi, D.B. Olsen, W.A. Schleif, L.C. Kuo, Design, synthesis, and biological evaluation of monopyrrolinone-based HIV-1 protease inhibitors, J. Med. Chem. 2003, 46, 1831-1844; A.B. Smith, M.C. Guzman, P.A. Sprengeler, T.P. Keenan, R.C. Holcomb, J.L. Wood, P.J. Carroll, R. Hirschmann, De-novo design, synthesis, and X-ray crystal-structures of pyrrolinone-based beta-strand peptidomimetics, J. Am. Chem. Soc. 1994, 116, 9947-9962.
34. A.B. Smith, A.B. Benowitz, P.A. Sprengeler, J. Barbosa, M.C. Guzman, R. Hirschmann, E.J. Schweiger, D.R. Bolin, Z. Nagy, R.M. Campbell, D.C. Cox, G.L. Olson, Design and synthesis of a competent pyrrolinone-peptide hybrid ligand for the class II major histocompatibility complex protein HLA-DR1, J. Am. Chem. Soc. 1999, 121, 9286-9298.
35. A.B. Smith, R. Hirschmann, A. Pasternak, W.Q. Yao, P.A. Sprengeler, M.K. Holloway, L.C. Kuo, Z.G. Chen, P.L. Darke, W.A. Schleif, An orally bioavailable pyrrolinone inhibitor of HIV-1 protease: computational analysis and X-ray crystal structure of the enzyme complex, J. Med. Chem. 1997, 40, 2440-2444; P.V. Murphy, J.L. O'Brien, L.J. Gorey-Feret, A.B. Smith, Synthesis of novel HIV-1 protease inhibitors based on carbohydrate scaffolds, Tetrahedron 2003, 59, 2259-2271; P.V. Murphy, J.L. O'Brien, L.J. Gorey-Feret, A.B. Smith, Structure-based design and synthesis of HIV-1 protease inhibitors employing beta-D-mannopyranoside scaffolds, Bioorg. Med. Chem. Lett. 2002, 12, 1763-1766.
36. J.R. Huff, HIV protease - a novel chemotherapeutic target for AIDS, J. Med. Chem. 1991, 34, 2305-2314; A.L. Swain, M.M. Miller, J. Green, D.H. Rich, J. Schneider, S.B.H. Kent, A. Wlodawer, X-ray crystallographic structure of a complex between a synthetic protease of human immunodeficiency virus-1 and a substrate-based hydroxyethylamine inhibitor, Proc. Natl. Acad. Sci. U.S.A. 1990, 87, 8805-8809.
37. W.D. Stein, The Movement of Molecules across Cell Membranes, Academic, New York, 1967, pp. 65-125.
38. D.P. Fairlie, M.L. West, A.K. Wong, Towards protein surface mimetics, Curr. Med. Chem. 1998, 5, 29-62.
39. L.D. Walensky, A.L. Kung, I. Escher, T.J. Malia, S. Barbuto, R.D. Wright, G. Wagner, G.L. Verdine, S.J. Korsmeyer, Activation of apoptosis in vivo by a hydrocarbon-stapled BH3 helix, Science 2004, 305, 1466-1470.
40. J.M. Adams, S. Cory, The Bcl-2 protein family: arbiters of cell survival, Science 1998, 281, 1322-1326; J.C. Reed, Double identity for proteins of the Bcl-2 family, Nature 1997, 387, 773-776.
41. D.T. Chao, S.J. Korsmeyer, Bcl-2 family: regulators of cell death, Annu. Rev. Immunol. 1998, 16, 395-419.
42. A. Strasser, D.C.S. Huang, D.L. Vaux, The role of the Bcl-2/Ced-9 gene family in cancer and general implications of defects in cell death control for tumourigenesis and resistance to chemotherapy, Biochim. Biophys. Acta Rev. Cancer 1997, 1333, F151-F178.
43. M. Sattler, H. Liang, D. Nettesheim, R.P. Meadows, J.E. Harlan, M. Eberstadt, H.S. Yoon, S.B. Shuker, B.S. Chang, A.J. Minn, C.B. Thompson, S.W. Fesik, Structure of Bcl-XL-Bak peptide complex: recognition between regulators of apoptosis, Science 1997, 275, 983-986.
44. J.M. Adams, S. Cory, Life-or-death decisions by the Bcl-2 protein family, Trends Biochem. Sci. 2001, 26, 61-66.
45. J.W. Harbour, T.G. Murray, in Ophthalmic Surgery: Principles and Techniques, (Ed.: D. Albert), Blackwell Publishers, Malden, 1998, pp. 682-705.
46. J.W. Harbour, L. Worley, D.D. Ma, M. Cohen, Transducible peptide therapy for uveal melanoma and retinoblastoma, Arch. Ophthalmol. 2002, 120, 1341-1346.
47. W.L. Jorgensen, The many roles of computation in drug discovery, Science 2004, 303, 1813-1818.
48. C.A. Lepre, J.M. Moore, J.W. Peng, Theory and applications of NMR-based screening in pharmaceutical research, Chem. Rev. 2004, 104, 3641-3675.
49. D.H. Min, W.J. Tang, M. Mrksich, Chemical screening by mass spectrometry to identify inhibitors of anthrax lethal factor, Nat. Biotechnol. 2004, 22, 717-723.
50. D.A. Erlanson, A.C. Braisted, D.R. Raphael, M. Randal, R.M. Stroud, E.M. Gordon, J.A. Wells, Site-directed ligand discovery, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 9367-9372.
51. M.M. Reddy, K. Bachhawat-Sikder, T. Kodadek, Transformation of low-affinity lead compounds into high-affinity protein capture agents, Chem. Biol. 2004, 11, 1127-1137.
Chemical Biology. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
5 Expanding the Genetic Code
5.1 Synthetic Expansion of the Central Dogma
Masahiko Sisido
Outlook
The protein biosynthetic system has been expanded to incorporate a variety of nonnatural amino acids. The expansion includes nonenzymatic attachment of a nonnatural amino acid to a specific tRNA, design of orthogonal tRNAs that cannot be aminoacylated by any of the endogenous aminoacyl-tRNA synthetases, examination of whether the elongation factor (EF-Tu) accepts a wide variety of nonnatural amino acids, extension of the codon/anticodon pairs for assigning the positions of nonnatural amino acids, and finally expansion of the ribosomal system to accept nonnatural amino acids. The extent of the expansion required at each step depends on the type of nonnatural amino acid. For amino acids whose structures resemble some of the naturally occurring ones, relatively small alterations to the relevant biomolecules may be sufficient. For large nonnatural amino acids that carry specialty side groups, however, further modifications of the biomolecules are required, and sometimes even the creation of totally artificial "bio"molecules is needed. The author will refer to the small expansion that requires only minor modification within the framework of conventional protein engineering as the biological expansion. On the other hand, if the expansion requires introduction of a synthetic component, it may be called chemical or synthetic expansion. In this chapter, we are inclined to describe the chemical expansion more than the biological one, because our final goal is to introduce chemical functions into living organisms by the incorporation of nonnatural amino acids that often have large specialty side groups. Of course, the above distinction is tentative and there is no clear boundary between the two. The technology of nonnatural mutagenesis is finding a wide range of applications in fluorescence labeling for proteome analysis, synthesis of phosphorylated or glycosylated proteins as medicinal tools, and so on. Furthermore, synthesis of mutant proteins that contain specialty amino acids in living cells will open a way toward "synthetic microorganisms" that function differently from the existing organisms.
5.1.1 Introduction
Progress in synthetic chemistry during the last century was remarkable. Chemists with state-of-the-art knowledge and technique can produce almost any compound that can exist in nature. Moreover, they can fabricate compounds into membranes, vesicles, and other supramolecular assemblies by using secondary forces, such as hydrogen bonds, electrostatic forces, and hydrophobic interactions. The question then arises whether chemists can create a living organism. Creation of a living organism is not an unrealistic target, because the essential mechanisms of major reactions in living cells and the important structures of biomolecules that function inside the cells have been clarified during the last 30 years. It may be possible, at least in theory, to put all components of the DNA replicating system and the protein synthesizing system inside an artificial liposome together with the relevant monomers to create a minimal prototype of a self-replicating system. The greatest advantage of the synthetic approach is, however, not a simple reconstitution of existing living organisms, but expansion or alteration of the existing systems by introducing analogs and surrogates of biomolecules. Analogs of biomolecules are artificial compounds that resemble existing biomolecules and function as they do in living organisms. Nonnatural amino acids and nonnatural nucleic bases, described in this chapter, are typical analogs. Surrogates are also artificial molecules, but they have structures different from those of existing biomolecules while functioning similarly to, or as alternatives to, some of them. Peptide nucleic acid (PNA) is a typical surrogate that emulates the hybridization behavior of DNAs and RNAs. By introducing analogs and surrogates into biochemical systems, we can alter or expand biochemical functions to create novel functions that have not been observed in existing organisms.
In particular, expansion of the protein biosynthesizing system to include a variety of nonnatural amino acids is the subject of this chapter. The introduction of a 21st amino acid and further nonnatural amino acids requires expansion of every step in protein synthesis (central dogma), as illustrated in Fig. 5.1-1 [1-4].
1. Synthesis of nonnatural amino acids of desired functions.
2. Preparation of an orthogonal tRNA that cannot be aminoacylated by any aminoacyl-tRNA synthetases (aaRSs) in the biochemical system. The orthogonal tRNA, once it has been aminoacylated with a nonnatural amino acid, must work like other aminoacylated tRNAs.
3. Aminoacylation of the orthogonal tRNA by a nonnatural amino acid. For in vivo synthesis of nonnatural mutant proteins, the aminoacylation must be tRNA specific, that is, it must take place only on a particular orthogonal tRNA even in the presence of different types of tRNAs.
4. Modification of an elongation factor for translation (EF-Tu) to accept aminoacyl-tRNAs carrying nonnatural amino acids and to bring them into the A site of the ribosome.
5. Expansion of the codon/anticodon pairs to assign positions of nonnatural amino acids in proteins.
6. Modification of the ribosome system to accept nonnatural amino acids.
Steps 4 and 6 may not be serious obstacles, since both EF-Tu and the ribosome are tolerant enough to accept all 20 naturally occurring amino acids, and this tolerance may hold for some nonnatural amino acids also. However, if we want to incorporate large nonnatural amino acids whose side chain structures are very different from the naturally occurring ones, we cannot presume the tolerance of EF-Tu and the ribosome. In these cases, we will also have to expand them.
Fig. 5.1-1 Mechanism of protein synthesis (central dogma) and its expansion to include nonnatural amino acids.
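How an expanded codon/anticodon assignment (step 5) changes the outcome of translation can be sketched with a toy translator. Everything specific here is an assumption for illustration: the codon reassigned to the nonnatural amino acid is taken to be the amber stop codon UAG, which is one common choice but is not specified in this section; the codon table covers only the codons in the toy message; and `aa*` is just a label for the nonnatural residue carried by the orthogonal tRNA.

```python
# Minimal codon table covering only the codons used in the toy message below.
# None marks a stop codon.
STANDARD_CODE = {"AUG": "Met", "GCU": "Ala", "AAA": "Lys", "UAG": None, "UAA": None}


def translate(mrna, expanded_code=None):
    """Translate an mRNA string codon by codon.

    `expanded_code` maps a codon to the residue delivered by an aminoacylated
    orthogonal tRNA; it takes precedence over the standard assignment.
    """
    expanded_code = expanded_code or {}
    peptide = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3]
        if codon in expanded_code:
            peptide.append(expanded_code[codon])
            continue
        residue = STANDARD_CODE[codon]
        if residue is None:      # unassigned stop codon terminates translation
            break
        peptide.append(residue)
    return peptide


message = "AUGGCUUAGAAAUAA"
print(translate(message))                   # ['Met', 'Ala']
print(translate(message, {"UAG": "aa*"}))   # ['Met', 'Ala', 'aa*', 'Lys']
```

Without the extra assignment, translation stops at UAG; with it, the nonnatural residue is placed at that position and synthesis continues to the next stop codon. The sketch ignores steps 3, 4, and 6, that is, it takes for granted that the orthogonal tRNA is charged and that EF-Tu and the ribosome accept it.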
5 Expanding the Genetic Code
5.1.2 Aminoacylation of tRNA with Nonnatural Amino Acids
5.1.2.1 Hecht Method for Chemical Aminoacylation of Isolated tRNAs
Since the enzymes for tRNA aminoacylation (aaRSs) show high specificity for a particular amino acid and a particular tRNA, it is difficult, if not impossible, to obtain mutants that accept a specific nonnatural amino acid (aa*) and do not accept any naturally occurring one. The aminoacylation with nonnatural amino acids, therefore, has to be carried out nonenzymatically. Nonenzymatic aminoacylation was pioneered by Hecht and coworkers [5] (Fig. 5.1-2). They synthesized a 2'(3')-aminoacylated mixed dinucleotide, pCpA-aa*, and ligated it to a tRNA that lacks the pCpA unit at the 3' end. Later, the pCpA dinucleotide was replaced by a pdCpA unit to simplify the synthesis. The Hecht method is applicable to any type of amino acid and any type of tRNA, with relatively high yields, and it is at present the most widely used method for aminoacylation of isolated tRNA in vitro. However, it has several drawbacks. First, large-scale synthesis of pdCpA is difficult, although a few milligrams can be obtained by the solid-phase method. Second, for coupling with an N-protected amino acid, pdCpA must be solubilized in dimethylformamide as its tetrabutylammonium salt. This process is sometimes tricky, although the problem can be avoided by using cationic micelles as the reaction medium [6]. Third, ligation of pdCpA-aa* to tRNA(-CA) by T4 RNA ligase must compete with formation of a cyclic tRNA as a by-product. Unfortunately, the cyclic tRNA works as an inhibitor of protein synthesis [7]. Finally, the Hecht method is not tRNA selective, so it cannot be used to aminoacylate one specific tRNA in a mixture of tRNAs, either in vitro or in vivo.
Nonenzymatic aminoacylation has also been attempted by simpler procedures. Krzyzaniak et al. reported that aminoacylation took place when a solution of amino acid and tRNA was incubated under pressures as high as 6000 bar [8]. However, they have not confirmed whether the aminoacylated tRNA really works in vitro or in vivo.
Fig. 5.1-2 Hecht method for chemical aminoacylation of tRNA with a nonnatural amino acid.
5.1.2.2 Micelle-mediated Aminoacylation
Very recently, the author found that cationic micelles mediate the aminoacylation of tRNAs with an N-protected amino acid activated ester under ultrasonic irradiation (Fig. 5.1-3) [9]. A cationic micelle, such as a CTACl micelle, solubilizes the hydrophobic N-pentenoyl amino acid cyanomethyl ester inside its hydrophobic core, whereas the negatively charged tRNA molecules are concentrated on the positively charged micelle surface. The two components are thus separated inside and outside the micelle and do not react with each other while the mixture stands still. When the mixture is ultrasonicated, the micellar structure is presumably perturbed and the reaction takes place. For example, when 5 mM N-pentenoyl-L-2-naphthylalanine cyanomethyl ester and 0.01 mM tRNA were sonicated in 90 mM imidazole buffer (pH 7.5) containing 18 mM CTACl, up to 75% yield of the aminoacylated tRNA was achieved within 10 minutes. Product analysis indicated that about 70% of the aminoacylation occurs at the 2' or 3' OH group of the 3' end and that no acylation of the amino groups of the nucleobases occurs. This high regioselectivity is surprising, because there are 77 OH groups in the tRNA and most of them are exposed to the solvent. The remaining 30% of the aminoacylation occurs at the OH groups of other nucleotide units.
Fig. 5.1-3 Micelle-mediated aminoacylation under ultrasonic agitation.
Fortunately, the incorrectly aminoacylated tRNAs did not seriously inhibit protein synthesis, presumably because they cannot bind to EF-Tu and cannot enter the A site of the ribosome. Indeed, when the crude aminoacyl-tRNA was added to an Escherichia coli in vitro protein-biosynthesizing system, a mutant protein containing 2-naphthylalanine was obtained. The success of micellar aminoacylation suggests that tRNA aminoacylation is inherently specific to the 2'(3')-OH group, presumably because of the high reactivity of the 2',3'-cis-diol group. A drawback of the micellar aminoacylation is that a small amount of the cationic detergent remains attached to the negatively charged tRNA, which may reduce the protein yield to some extent.
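The figures quoted for the micellar reaction can be combined in a short calculation. The Python sketch below is illustrative only. It assumes that the 75% yield counts every acylated tRNA, that the 70% regioselectivity applies to that whole population (one acylation per tRNA), and that a non-selective reaction would hit all 77 hydroxyl groups with equal probability. None of these assumptions is stated in the text, and the function and constant names are hypothetical.

```python
# Rough arithmetic on the micelle-mediated aminoacylation figures quoted above.
# The input numbers come from the text; how they are combined is an assumption.

ESTER_MM = 5.0             # activated ester concentration, mM
TRNA_MM = 0.01             # tRNA concentration, mM
TOTAL_YIELD = 0.75         # upper value of the reported aminoacylation yield
FRACTION_AT_3_END = 0.70   # share of acylation at the 3'-terminal 2'/3'-OH
TOTAL_OH_GROUPS = 77       # hydroxyl groups per tRNA, as stated in the text
TERMINAL_OH_GROUPS = 2     # the 2'- and 3'-OH of the 3'-terminal nucleotide


def micelle_summary():
    """Derive a few quantities from the reported reaction conditions."""
    # Molar excess of activated ester over tRNA.
    ester_excess = ESTER_MM / TRNA_MM
    # Assumption: the 70% regioselectivity applies to all acylated tRNA.
    correct_fraction = TOTAL_YIELD * FRACTION_AT_3_END
    correct_um = correct_fraction * TRNA_MM * 1000.0  # mM -> micromolar
    # Assumption: a non-selective reaction would hit every OH group equally.
    random_fraction = TERMINAL_OH_GROUPS / TOTAL_OH_GROUPS
    enrichment = FRACTION_AT_3_END / random_fraction
    return {
        "ester_excess": ester_excess,
        "correct_fraction": correct_fraction,
        "correct_um": correct_um,
        "random_fraction": random_fraction,
        "enrichment": enrichment,
    }


if __name__ == "__main__":
    for name, value in micelle_summary().items():
        print(f"{name}: {value:.3g}")
```

Under these assumptions, roughly half of the input tRNA (about 5 of 10 micromolar) would carry the amino acid at the correct position, and the 3'-terminal hydroxyls would be favored about 27-fold over a uniform distribution.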
5.1.2.3 Ribozyme-mediated Aminoacylation
Suga and coworkers took on the challenge of creating a surrogate of aaRS with their ribozyme technique (Fig. 5.1-4) [10-13]. Inspired by the fact that tRNAs are biosynthesized through cleavage of 5' flanking sequences, they attached a random RNA sequence to the 5' end of a tRNA to obtain a library of extended tRNAs. From the library, they selected those that undergo self-aminoacylation with a biotinylated amino acid cyanomethyl ester. The identified RNA sequence worked as an artificial aaRS even after it was cleaved off from the original tRNA. Because the ribozyme is flexible enough to aminoacylate a wide variety of tRNAs that share the common ACCA 3' end with a variety of p-substituted phenylalanine derivatives, it was named a flexizyme. After optimization and minimization of the RNA sequence, the flexizyme was loaded onto a gel column. The flexizyme column aminoacylates tRNAs with a variety of p-substituted phenylalanine cyanomethyl esters simply by passing a tRNA together with an amino acid cyanomethyl ester through the column [14-16]. The aminoacylated tRNA has been shown to work in an E. coli in vitro system to introduce the p-substituted phenylalanine derivatives into proteins. Recently, the flexizyme has been given tRNA specificity by extending its 3' end with a chain complementary to a specific tRNA [17].
Fig. 5.1-4 Ribozyme-mediated aminoacylation.
5.1.2.4 PNA-assisted Aminoacylation
Recently, the author's group developed another aminoacylation method that uses PNA [18] as a tRNA-recognizing molecule (Fig. 5.1-5) [19]. An amino acid thioester was linked through a spacer to a 9-mer PNA that is complementary to the 3' region of a tRNA. When the PNA hybridizes with the tRNA, the amino acid thioester comes close to the 3' OH group of the tRNA, provided the spacer chain is properly designed. The PNA must bind to a specific tRNA, but not too tightly; otherwise it will remain attached after the aminoacylation and retard or even inhibit protein synthesis. In the case of yeast phenylalanine tRNA, the 9-mer PNA was the best choice, but the chain length has to be optimized for other tRNAs. Addition of an equimolar amount of the aa*-Ssp-PNA conjugate to yeast phenylalanine tRNA gave a 40-50% yield of aminoacylation. The PNA-assisted aminoacylation was specific to the target tRNA, whose 3' region is complementary to the PNA, in an E. coli S30 in vitro protein-synthesizing system that contained a variety of endogenous tRNAs. When we put a 2-naphthylalanine thioester-spacer-PNA conjugate together with an orthogonalized yeast phenylalanine tRNA into the S30 system, the nonnatural amino acid was successfully incorporated into the target protein. The PNA-assisted aminoacylation/in vitro translation system is currently the simplest way to obtain nonnatural mutants, provided that the relevant conjugate is available. Since this is a chemical expansion of the aminoacylation process, it should be applicable to a wide variety of nonnatural amino acids and different tRNAs. The PNA-assisted aminoacylation is specific to a complementary tRNA and is potentially effective in a living cell. The only obstacle to in vivo aminoacylation is that the Nielsen-type PNA does not easily penetrate cell membranes. Efforts to design different types of PNAs that can penetrate cell membranes are in progress [20, 21].
Fig. 5.1-5 PNA-mediated aminoacylation.
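The recognition principle described above, a short PNA complementary to the 3' region of the target tRNA, can be sketched in Python as follows. This is a minimal illustration, not the authors' design procedure. The function name, the `offset_from_3_end` parameter, and the example sequence are hypothetical, and the text does not specify exactly which nucleotides the 9-mer covers. The sketch only computes Watson-Crick complements and says nothing about spacer design or binding strength.

```python
# Sketch: derive a PNA base sequence complementary to the 3' region of a tRNA.
# PNA carries the DNA bases A, C, G, T, so U in the RNA target pairs with A
# and A in the RNA target pairs with T.

RNA_TO_PNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def design_pna(trna_rna, length=9, offset_from_3_end=0):
    """Return (target, pna) for a window at the 3' region of `trna_rna`.

    trna_rna           RNA sequence written 5'->3'
    length             number of PNA units (9 in the text for yeast tRNA-Phe)
    offset_from_3_end  hypothetical parameter: nucleotides left uncovered
                       at the 3' terminus
    The PNA bases are listed antiparallel to the target, starting with the
    base that pairs with the 3'-most nucleotide of the target window.
    """
    seq = trna_rna.upper()
    unknown = set(seq) - set(RNA_TO_PNA)
    if unknown:
        raise ValueError(f"unexpected symbols in RNA sequence: {sorted(unknown)}")
    if offset_from_3_end < 0 or length <= 0:
        raise ValueError("length must be positive and offset non-negative")
    end = len(seq) - offset_from_3_end
    start = end - length
    if start < 0:
        raise ValueError("window does not fit inside the tRNA sequence")
    target = seq[start:end]
    pna = "".join(RNA_TO_PNA[base] for base in reversed(target))
    return target, pna


if __name__ == "__main__":
    # Placeholder sequence for illustration only; not a real tRNA.
    example = "GGGAAACCCUUUACCA"
    print(design_pna(example))
    print(design_pna(example, offset_from_3_end=4))
```

Keeping the length as a parameter reflects the remark in the text that the PNA chain length has to be optimized for each tRNA.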
5.1.2.5 Directed Evolution of Existing aaRS/tRNA Pair to Accept Nonnatural Amino Acids
An alternative approach to nonnatural aminoacylation is to alter the substrate specificity of existing aaRSs. This is not an easy task, since aaRSs show rigorous specificity for a particular amino acid and a particular tRNA, and link the former specifically to the 3'- or 2'-OH group of the latter. This rigorous specificity is what maintains the fidelity of the translation process. Schultz and coworkers, however, constructed a sophisticated selection scheme to find a mutant aaRS that aminoacylates a particular tRNA with a specific nonnatural amino acid, but not with any of the natural amino acids [22, 23]. They started from a TyrRS/tRNA pair of Methanococcus jannaschii and mutated the tRNA so that it is not charged with any natural amino acid by the endogenous aaRSs in the E. coli system (Fig. 5.1-6). The mutated tRNA/TyrRS pair worked as an orthogonal aaRS/tRNA pair in the E. coli system, independently of the endogenous aaRS/tRNA pairs [22]. Next, they mutated the TyrRS so that it does not accept Tyr or any other natural amino acid (Fig. 5.1-7), but accepts only O-methyltyrosine (Fig. 5.1-8) [23]. They introduced the orthogonal tRNA/aaRS pair into E. coli and obtained the first living cell that incorporates O-methyltyrosine as a 21st amino acid into a protein (Fig. 5.1-9). By using a similar procedure, they introduced various nonnatural amino acids into living cells [24-26]. Later, they combined the orthogonal tRNA/aaRS pair with an enzyme that synthesizes p-aminophenylalanine from basic carbon sources [27]. This is the first example of a cell that creates a 21st amino acid by itself and lives with it. Yokoyama and coworkers used a similar approach to find an orthogonal aaRS/tRNA pair that works in mammalian cells, and used the pair to incorporate iodotyrosine into proteins [28, 29]. An in vivo system that produces proteins carrying iodine atoms at specific positions will find applications in the large-scale production of heavy-atom-labeled proteins for X-ray analysis.
The elegant approaches of Schultz and Yokoyama are, however, typical examples of biological expansion. It is not surprising, therefore, that their screening processes have so far produced aaRS/tRNA pairs only for amino acids that are not far from the naturally occurring ones. It seems difficult, if not impossible, to identify by such screening aaRS/tRNA pairs that can introduce large-sized amino acids. Since nonnatural amino acids with specialty functions, such as fluorescence or electron-donating and electron-accepting functions, often carry large side groups, a more widely applicable method for aminoacylation is needed.
Fig. 5.1-6 Selection of tRNAs that are not aminoacylated by any of the aaRSs in E. coli.
Fig. 5.1-7 Negative selection for eliminating TyrRS mutants that aminoacylate the orthogonal tRNA with Tyr or any of the natural amino acids in E. coli.
Fig. 5.1-8 Positive selection for picking up TyrRS mutants that aminoacylate the orthogonal tRNA with O-methyltyrosine.
Fig. 5.1-9 Expanded living organism that produces proteins including a nonnatural amino acid as the 21st one.
At this moment, aminoacylation of tRNA with a nonnatural amino acid is still the bottleneck step for nonnatural mutagenesis, both in vitro and in vivo. The Hecht method is applicable to almost any type of amino acid, but can be used only for isolated tRNAs in a test tube. Further, the aminoacylation step of pdCpA is sometimes tricky. For aminoacylation in a test tube, the micelle-mediated method is easier than the Hecht method, at least for some types of amino acids. The ribozyme technique of Suga is applicable to a variety of p-substituted phenylalanines and to a wide variety of tRNAs.
The ribozyme technique is, at present, the simplest and most dependable method of aminoacylation for isolated tRNAs. It has not, however, been applied to in vivo systems or to large-sized amino acids. Our PNA-assisted aminoacylation method may also be applicable to a wide variety of amino acids and tRNAs. Since the PNA-assisted aminoacylation is tRNA selective, it is a potential amino acid donor in living cells. The orthogonal tRNA/aaRS pairs reported by Schultz and by Yokoyama are effective for some nonnatural amino acids with small side groups, but they have not so far been applied to large-sized amino acids.
5.1.3 Other Biomolecules That Must Be Optimized for Nonnatural Amino Acids
5.1.3.1 Orthogonal tRNAs
As pointed out above, the tRNA to be used as a carrier of a nonnatural amino acid must not be aminoacylated by any aaRS in the system, but once it is aminoacylated with a nonnatural amino acid by any means, it must work as efficiently as an ordinary aminoacyl-tRNA. In Schultz's approach, the orthogonal tRNA has to be selected as part of an orthogonal tRNA/aaRS pair. This imposes tough restrictions on the tRNA structure and makes it difficult to identify rigorously orthogonal and highly efficient tRNAs for a nonnatural amino acid. When the aminoacylation is carried out on isolated tRNAs, or on a specific tRNA with a ribozyme or an amino acid-PNA conjugate, the orthogonality condition has to be satisfied only against the aaRSs in the system. Namely, the tRNA must be protected from the attack of endogenous aaRSs, but does not have to be a specific and efficient substrate of an engineered aaRS for a nonnatural amino acid. Under these relaxed conditions, we have found several orthogonal tRNAs that efficiently deliver a nonnatural amino acid to the E. coli ribosomal system [30]. We started with tRNAs having nonstandard secondary structures, such as those found in mitochondria and other species, and made small changes to their stem structures. The tRNAs were examined for their ability to introduce exclusively a nonnatural amino acid into a protein in an E. coli in vitro protein-synthesizing system. The nonstandard tRNAs that carry a CCCG four-base anticodon were completely protected from attack by the endogenous aaRSs in the E. coli system. Fortunately, some of these nonstandard tRNAs, when chemically aminoacylated with p-nitrophenylalanine, very efficiently decoded a CGGG four-base codon on the streptavidin mRNA to introduce the nonnatural amino acid. The results indicate that tRNAs of nonstandard structure make a good starting point for finding orthogonal tRNAs as carriers of nonnatural amino acids. Some of the orthogonal tRNAs that have been found to work efficiently as carriers of nonnatural amino acids in the E. coli system are listed in Fig. 5.1-10.
Fig. 5.1-10 Orthogonal tRNAs that are not aminoacylated with any of the natural amino acids in E. coli, but can bring a nonnatural amino acid efficiently into the ribosome A site. (Panel labels: Suga; Schultz; Schultz with yeast Phe acceptor stem; bovine mt tRNA-Ser.)
5.1.3.2 Adaptability of EF-Tu to Aminoacyl-tRNAs Carrying a Wide Variety of Nonnatural Amino Acids
Aminoacyl-tRNAs that carry nonnatural amino acids enter the A site of the ribosome with the aid of a protein called elongation factor Tu (EF-Tu). Only a single type of EF-Tu molecule exists in E. coli, and it delivers all types of aminoacyl-tRNAs to the ribosomal A site. The EF-Tu molecule must therefore be adaptable enough to bind a wide range of aminoacyl-tRNAs, presumably including those with some nonnatural amino acids. Our preliminary experiments indicate that E. coli EF-Tu binds yeast phenylalanine tRNA carrying a variety of nonnatural amino acids, although with reduced affinities [31]. Aminoacyl-tRNAs carrying bulky nonnatural amino acids, such as 1-pyrenylalanine, bind very weakly to EF-Tu. Although the binding affinity for EF-Tu may not be directly proportional to the incorporation efficiency, it is clear that insufficient binding to EF-Tu leads to unsuccessful incorporation of the nonnatural amino acid. The design and synthesis of engineered EF-Tus that bind a wider range of aminoacyl-tRNAs carrying bulky nonnatural amino acids are now in progress.
5.1.3.3 Adaptability of Ribosome to Wide Variety of Nonnatural Amino Acids
Since peptide bonds are formed in the ribosome, expanding it to accept a wide range of nonnatural amino acids will be the final target. It is somewhat surprising that amino acids that carry large side groups, like those shown in Fig. 5.1-11 (left), have been incorporated into proteins in fairly high yields in E. coli and other biosynthesizing systems [32]. This indicates that the ribosomes of various species are very tolerant of a wide variety of amino acids, even beyond the naturally occurring ones. At the same time, however, some nonnatural amino acids are rigorously rejected by the ribosome, although their side groups are not very bulky [32]. Some examples are shown in Fig. 5.1-11 (right). Typically, D-amino acids are rigorously rejected by the E. coli ribosome [33, 34]. Similarly, our recent experiments suggest that 9-anthrylalanine is rigorously rejected [32], even though yeast Phe tRNA chemically aminoacylated with 9-anthrylalanine binds to EF-Tu with only somewhat reduced affinity [31]. The adaptability of the E. coli ribosome has been investigated by using puromycin analogs that carry a variety of nonnatural amino acids [35]. Since puromycin is known to bind to the ribosomal A site without the assistance of EF-Tu, the extent to which the puromycin analogs inhibit translation can be a direct measure of the adaptability of the A site to a variety of nonnatural amino acids. The inhibition efficiency indicated that some aromatic amino acids that carry widely expanded side groups, like 9-anthrylalanine and 1-pyrenylalanine, are not accepted by the A site. Recently, Roberts and coworkers also showed that analogs carrying D-amino acids or β-amino acids bind little to the A site, although they do not carry very large side groups [36]. These facts suggest that the inner structure of the A site is critical in rejecting some types of amino acids, and that even small modifications of its structure will expand its amino acid adaptability significantly. Indeed, Hecht and coworkers showed that an E. coli ribosome whose 23S rRNA has a UGGCA sequence instead of GAUAA in the region 2447-2451 accepts D-amino acids to some extent [37]. Elaboration of the ribosome structure will open a way to synthesize proteins that contain a much wider variety of nonnatural amino acids.
Fig. 5.1-11 Relatively large-sized nonnatural amino acids that are efficiently incorporated into proteins and small-sized ones that cannot be incorporated into proteins. (Panel labels: relatively large amino acids that are allowed by the E. coli ribosome; relatively small amino acids that are rejected by the E. coli ribosome; D-amino acids.)
5.1.4 Expansion of the Genetic Codes
5.1.4.1 Amber and Other Stop Codons
The second key step in expanding the biosynthesizing system to introduce nonnatural amino acids is the expansion of the genetic code. Schultz [38] and Chamberlin [39] first assigned an amber (UAG) stop codon to a nonnatural amino acid (aa*). By adding an aa*-tRNA with a CUA anticodon as a suppressor of the amber codon, they successfully introduced the nonnatural amino acid at that position. Since then, the amber suppression method has been employed by a number of researchers. This method is advantageous in that unsuccessful decoding of the UAG codon automatically leads to truncation of the protein synthesis. No full-length protein that erroneously contains one of the 20 naturally occurring amino acids is produced, provided that the tRNA is rigorously orthogonal. One drawback of the stop-codon suppression method is that only one or two of the three stop codons (UAG, UAA, UGA) can be assigned to nonnatural amino acids and, therefore, only one or two different nonnatural amino acids can be incorporated into a single protein. This restricts the application of nonnatural mutagenesis. It is not self-evident that the amber suppression method can be used in living cells, because some essential proteins may not be synthesized properly in the presence of a large amount of the aminoacylated suppressor tRNA. However, the amber suppression method has been reported to work successfully in Xenopus oocytes [40, 41], E. coli [23-25], and mammalian cells [28, 42-44].
5.1.4.2 Four-base Codons
We have demonstrated that several four-base codons, such as CGGG and AGGU, can be used independently within the framework of the existing three-base codon system [45, 46]. The idea of the four-base codon was inspired by naturally occurring frame-shift suppression. An undesired frame shift that originates from the insertion of one nucleotide can be suppressed by a frame-shift suppressor tRNA that contains a four-base anticodon. In the same way, some four-base codons can be successfully decoded by artificial frame-shift suppressor tRNAs that contain the complementary four-base anticodons. If a four-base codon is instead read as the corresponding three-base codon, an undesired frame shift occurs, which often leads to a stop codon downstream (Fig. 5.1-12). Therefore, with the four-base codon method, as with the amber method, the full-length protein is exclusively the one that contains the nonnatural amino acid at that position, whereas undesired decoding as a three-base codon gives a truncated protein. The probability of the undesired three-base decoding can be reduced by choosing rare codons as the first three bases of the four-base codons.
Fig. 5.1-12 Principle of the four-base codon strategy.
The most remarkable advantage of four-base codons over the amber codon is that two or more different nonnatural amino acids can be incorporated into a single protein [47, 48]. We have identified five different four-base codons that work independently in the E. coli system, namely AGGU, CGGG, GGGU, CUCU, and CCCU [46]. Similarly, CGGU(CGCU), CCCU, CUCU(CUAU), and GGGU work efficiently in rabbit reticulocyte lysate [49]. Since they are independent of and orthogonal to each other, we can introduce, in theory, up to five different nonnatural amino acids into a single protein in the E. coli system, and up to four in the rabbit system. In practice, however, because of the reduced incorporation efficiencies of nonnatural amino acids, the maximum number of nonnatural amino acids in a single protein is limited to three at this moment. Multiple incorporation has been demonstrated by introducing a fluorophore-quencher pair into a single streptavidin molecule [48]. Four-base codons can also be used in conjunction with stop codons for multiple incorporations [50, 51]. It has been argued that extending the lengths of codons and anticodons might cause steric overcrowding between the tRNAs in the ribosomal A site and P site. Overcrowding in the ribosome is, however, avoided by a bend in the mRNA chain at the junction between the A and P sites [52]. Because of this bend, the main bodies of the two tRNAs are well separated, while the two anticodons, as well as the amino acid and the peptide C-terminus, are close to each other. Indeed, even five-base codons [53] and a tandem four-base codon [54] have been reported to be successful. Like the amber codon method, the four-base codon method has been shown to work in living cells [55].
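The logic of the four-base codon strategy can be illustrated with a small Python sketch. It is a toy model, not a description of ribosomal kinetics. The competition between four-base and three-base decoding is reduced to a single boolean (`suppressor_wins`), the example mRNA is invented for illustration, and the function name is hypothetical. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, bases ordered U, C, A, G; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna, four_base_codons, suppressor_wins=True):
    """Translate `mrna` from its first base.

    four_base_codons  dict mapping a four-base codon to a label, e.g. {"CGGG": "aa*"}
    suppressor_wins   simplification: True if the four-base anticodon tRNA always
                      decodes its codon, False if the first three bases are read
                      by an ordinary tRNA instead
    Returns (peptide, terminated), where `terminated` tells whether a stop
    codon was reached.
    """
    peptide = []
    pos = 0
    while pos + 3 <= len(mrna):
        quad = mrna[pos:pos + 4]
        if suppressor_wins and quad in four_base_codons:
            peptide.append(four_base_codons[quad])
            pos += 4          # reading frame is maintained after the 4-base step
            continue
        aa = CODON_TABLE[mrna[pos:pos + 3]]
        if aa == "*":
            return peptide, True
        peptide.append(aa)
        pos += 3
    return peptide, False


if __name__ == "__main__":
    # Invented example: AUG GCA CGGG CUU GAC AAG UAA
    mrna = "AUGGCACGGGCUUGACAAGUAA"
    codons = {"CGGG": "aa*"}
    print(translate(mrna, codons, suppressor_wins=True))
    print(translate(mrna, codons, suppressor_wins=False))
```

In this example, four-base decoding gives the full-length product containing aa*, whereas reading CGG as an ordinary arginine codon shifts the frame and exposes a UGA stop codon two codons later, giving a truncated peptide without aa*.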
5.1.4.3 "Synthetic Codons" That Contain Nonnatural Nucleobases
Nonnatural nucleobases are another important and challenging area of chemical biology. Benner reported that the isoC-isoG pair works as an orthogonal base pair in addition to the existing A-T and G-C pairs (Fig. 5.1-13) [56]. A "synthetic codon/anticodon pair" such as isoCAG/CUisoG has actually been used to assign a nonnatural amino acid in an E. coli in vitro system [57]. Hirao and Yokoyama reported that a y-s pair also works as an orthogonal base pair. The y-s pair is advantageous because "s" in DNA can be transcribed to "y" in mRNA with sufficiently high fidelity in the presence of yTP. The resulting synthetic codon yAG was successfully translated by a tRNA containing the corresponding synthetic anticodon CUs [58, 59]. Unfortunately, transcription of "y" in DNA to "s" in RNA was not accurate enough, and the tRNA with the CUs anticodon had to be synthesized chemically. Recently, they reported an improved nonnatural base pair, the s-z pair, to solve this problem [60]. Nonnatural base pairs have also been explored by Schultz's group, using hydrophobic interactions as the sole forces for base pairing [61].
Fig. 5.1-13 Nonnatural base pairs that are orthogonal to the A-T and G-C pairs. (Panel labels: isoC-isoG, Benner; two further pairs, Hirao and Yokoyama.)
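The way an extra base pair creates new codon/anticodon assignments can be sketched in Python. This is a deliberately idealized model: pairing is strictly all-or-nothing, wobble pairing is ignored, and the nonnatural pairs are assumed to be perfectly orthogonal to the natural ones. The names `PAIRS` and `decodes` are hypothetical. Codons and anticodons are lists of base symbols written 5' to 3', because symbols such as isoC span several characters.

```python
# Allowed base pairs in RNA decoding, extended with the nonnatural pairs
# mentioned in the text (isoC-isoG and y-s). Idealized: no wobble pairing.
PAIRS = {("A", "U"), ("G", "C"), ("isoC", "isoG"), ("y", "s")}


def is_pair(base1, base2):
    """True if the two bases form one of the allowed pairs, in either order."""
    return (base1, base2) in PAIRS or (base2, base1) in PAIRS


def decodes(codon, anticodon):
    """True if `anticodon` pairs with `codon` position by position.

    Both are lists of base symbols written 5'->3'. Pairing is antiparallel,
    so the first codon base faces the last anticodon base.
    """
    if len(codon) != len(anticodon):
        return False
    return all(is_pair(c, a) for c, a in zip(codon, reversed(anticodon)))


if __name__ == "__main__":
    print(decodes(["isoC", "A", "G"], ["C", "U", "isoG"]))  # synthetic pair from the text
    print(decodes(["y", "A", "G"], ["C", "U", "s"]))        # yAG read by the CUs anticodon
    print(decodes(["y", "A", "G"], ["C", "U", "A"]))        # amber suppressor vs. yAG
    print(decodes(list("CGGG"), list("CCCG")))              # four-base codon/anticodon
```

The same check covers the four-base codons of the preceding section, since a CCCG anticodon is simply the antiparallel complement of the CGGG codon.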
5.1.5 In vivo Synthesis of Nonnatural Mutants
So far, nonnatural mutants have been synthesized mostly in cell-free in vitro protein-synthesizing systems, mainly because chemical aminoacylation had to be carried out on isolated tRNAs in a test tube. In vivo synthesis of nonnatural mutant proteins is advantageous because it produces a much larger amount of mutant protein and, through selective fluorescent labeling of target proteins in vivo, provides an opportunity for in vivo testing of drugs and other small molecules. For in vivo synthesis of nonnatural mutants, the aminoacylation has to be carried out on a specific tRNA with a specific nonnatural amino acid. At this moment, in vivo aminoacylation has been successfully carried out only by engineered aaRSs that have been selected to accept a specific nonnatural amino acid [23-29]. As mentioned above, however, the engineered aaRSs have been successful only for small-sized amino acids, and no successful result has been reported for large-sized amino acids such as fluorescent ones.
Although ribozyme- and PNA-assisted aminoacylation are potentially tRNA specific and would work as aminoacylating agents in vivo, their application in living cells has not yet been reported. Import of aminoacyl-tRNA into living cells is another approach to the in vivo production of nonnatural mutant proteins. Dougherty and coworkers microinjected [41] or electroporated [44] an aminoacyl-tRNA/mRNA pair into Xenopus oocytes to synthesize a fluorescently labeled acetylcholine receptor. The microinjection method is applicable to any type of tRNA and amino acid, but the number of cells that can be treated at one time is very limited. RajBhandary and coworkers [42, 43] showed that aminoacyl-tRNAs can be imported safely by the use of transfection reagents (Fig. 5.1-14). By importing two types of tRNAs, one suppressing the amber (UAG) codon and the other suppressing the ocher codon, preaminoacylated with different amino acids, they successfully obtained a multiply mutated protein in a mammalian cell. The transfection method is also applicable to any type of tRNA and amino acid and to a wide variety of cells. A possible drawback of this method is the short lifetime of aminoacyl-tRNAs, which is often less than an hour in the neutral pH range, whereas most transfection reagents form endosomes that are stable in the cytoplasm for a few hours or even a day. Fortunately, however, since the pH inside the endosome is estimated to be about 4, where hydrolysis of the aminoacyl ester bond is slower, a significant amount of aminoacyl-tRNA will still remain when the endosome breaks down. Nevertheless, for the transfection method to be efficient, the endosomes must be broken in the cytoplasm as quickly as possible or, alternatively, another technique that leads to direct penetration of aminoacyl-tRNA must be developed.
Fig. 5.1-14 Import of tRNA aminoacylated with nonnatural amino acids into a living cell through endocytosis.
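The trade-off between the lifetime of the aminoacyl-tRNA and the residence time in the endosome can be made concrete with a simple first-order decay model. The Python sketch below is only illustrative. It assumes that hydrolysis follows first-order kinetics, it treats the "less than an hour" lifetime as a half-life of one hour, and the half-life used for pH 4 is a hypothetical placeholder, since the text gives no value.

```python
# First-order decay of aminoacyl-tRNA inside an endosome (illustrative sketch).
# The half-lives below are placeholders, not measured values from the text.

NEUTRAL_HALF_LIFE_H = 1.0   # assumption: "less than an hour" taken as a 1 h half-life
ACIDIC_HALF_LIFE_H = 10.0   # hypothetical half-life at pH ~4; not given in the text


def fraction_remaining(hours, half_life_hours):
    """Fraction of aminoacyl-tRNA still intact after `hours` of hydrolysis."""
    if hours < 0 or half_life_hours <= 0:
        raise ValueError("hours must be >= 0 and half_life_hours must be > 0")
    return 0.5 ** (hours / half_life_hours)


if __name__ == "__main__":
    for t in (1, 3, 6):
        neutral = fraction_remaining(t, NEUTRAL_HALF_LIFE_H)
        acidic = fraction_remaining(t, ACIDIC_HALF_LIFE_H)
        print(f"{t} h: neutral pH {neutral:.3f}, acidic endosome {acidic:.3f}")
```

With a one-hour half-life only one eighth of the aminoacyl-tRNA would survive three hours, which is why the stabilizing effect of the acidic endosome, and fast release from it, matter for the transfection method.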
5.1.6 Application of Nonnatural Mutagenesis - Fluorescence Labeling
Nonnatural mutagenesis has been finding applications in probing protein functions and structures, in glycosylation [62-64] and phosphorylation [65] as alternative routes to posttranslational modification, in controlling protein functions by external factors such as photoirradiation, and so on. Since the amount of mutant protein produced in a conventional in vitro system is usually less than a microgram, fluorescence labeling seems the most practical and promising application. Position-specific fluorescence labeling is a key step in a vast range of biochemical fields, including in vitro and in vivo proteome analysis and protein network analysis, in vitro and in vivo conformational analysis, and single-molecule spectroscopic analysis. A variety of fluorescent amino acids have been synthesized and examined for their incorporation into proteins. The fluorescent amino acids that have excitation wavelengths longer than 350 nm and have been successfully incorporated into proteins are listed in Fig. 5.1-15 [66-73]. When polarity-sensitive fluorescent amino acids, such as 1, 2, 4, 5, and 6, were incorporated into antibodies, receptors, and enzymes, the mutants worked as sensors for the antigens, ligands, and substrates or inhibitors. For the fluorescently labeled proteins to be sensitive enough, however, the fluorophore must be located at a position where binding of the low-molecular-weight compound causes a polarity change around the fluorophore, while, at the same time, the body of the fluorophore must not disturb the binding. In short, the fluorophore must be located not too close to, but not too far from, the binding site. Only position-specific incorporation of fluorescent amino acids can satisfy these conflicting conditions.
When an acridonylalanine (acdAla) was incorporated at different positions of a camel single-chain antibody against hen lysozyme, the Tyr106acdAla mutant responded sensitively to the binding of nanomolar concentrations of the antigen, whereas the Trp123acdAla mutant was insensitive to the binding (Fig. 5.1-16) [71]. When the same fluorescent amino acid was incorporated into streptavidin, some mutants responded even to picomolar quantities of biotin [71]. The lower limit of the detectable concentration is determined not by the fluorescence sensitivity, but by the dissociation constant of the protein-small molecule interaction. Incorporation of two different fluorescent amino acids into a single protein can expand the scope of fluorescence analysis from the simple quenching analysis described above to detailed studies of the conformational changes associated with folding processes. Fluorescence resonance energy transfer (FRET) is often the method of choice [53] because it rests on a firm theoretical background and has been shown experimentally to obey Förster's 1/r^6 distance dependence, provided that the orientation factor has been averaged out [74]. The only restriction at present is that the types of fluorescent amino acids available as energy donors and energy acceptors are very limited, as listed in Fig. 5.1-15.
Fig. 5.1-15 Nonnatural amino acids carrying fluorescent groups that have been incorporated into proteins with high efficiency (structures 1-7).
Fig. 5.1-16 Detection of antigen molecule by a fluorescently labeled antibody.
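The 1/r^6 distance dependence mentioned above is usually written as E = 1 / (1 + (r/R0)^6), where E is the transfer efficiency, r the donor-acceptor distance, and R0 the Förster distance at which E is 50%. The Python sketch below evaluates this relation and its inverse. The function names are hypothetical, and the R0 of 40 Å in the example is an arbitrary illustrative value, not a property of any fluorophore pair in Fig. 5.1-15. An averaged orientation factor is assumed, as in the text.

```python
def fret_efficiency(r, r0):
    """Transfer efficiency for donor-acceptor distance `r` and Forster distance `r0`.

    Both distances must use the same unit (for example angstroms).
    """
    if r < 0 or r0 <= 0:
        raise ValueError("r must be >= 0 and r0 must be > 0")
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_from_efficiency(efficiency, r0):
    """Invert the Forster relation: distance that gives the observed efficiency."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    R0 = 40.0  # hypothetical Forster distance in angstroms
    for r in (20.0, 40.0, 60.0, 80.0):
        print(f"r = {r:5.1f} A  ->  E = {fret_efficiency(r, R0):.3f}")
```

Because of the sixth-power dependence, the efficiency falls from about 98% at half the Förster distance to under 2% at twice that distance, which is what makes a donor-acceptor pair placed at two specific positions a sensitive reporter of conformational change.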
5.1.7 Future Development and Conclusion
The basic strategy of nonnatural mutagenesis was first reported more than 15 years ago as a promising technology for structural and functional analyses of proteins in vitro and in vivo and for creating proteins with specialty functions. However, it has remained a special method used by only a limited number of researchers, mainly because of the lack of an easy way of aminoacylation and the lack of appropriate nonnatural amino acids for useful applications. Fortunately, facile and dependable methods for aminoacylation are now available, and several recently reported nonnatural amino acids appear to be really useful for fluorescence labeling, glycosylation, phosphorylation, and other applications. Commercialization of the reagents for aminoacylation and of nonnatural amino acids carrying specialty side groups will further accelerate the spread of this method. Nonnatural mutagenesis is a unique method that enables position-specific labeling with a variety of functional groups. Further, the labeling can be done even in living cells; no alternative technique can do this. Wide application of this method will open a new area in protein research in general and, especially, in drug discovery and protein network analysis.
Acknowledgments
Recent experimental results from our laboratory described in this chapter were obtained with support from a Grant-in-Aid for Scientific Research from the Ministry of Education, Science, Sports, and Culture, Japan (No. 15101008).
References
1. T. Hohsaka, M. Sisido, Incorporation of non-natural amino acids into proteins, Curr. Opin. Chem. Biol. 2002, 6, 809-815.
2. M. Sisido, Proteins containing nonnatural amino acids, in Biopolymers, Vol. 8 (Eds.: A. Steinbüchel, S.R. Fahnestock), Chapter 2, Wiley-VCH, Weinheim, Germany, 2002, pp. 26-49.
3. M. Sisido, Synthetic expansion of the central dogma: chemical aminoacylation, 4-base codons and nonnatural mutagenesis, in Peptide Revolution: Genomics, Proteomics & Therapeutics, Proceedings of the Eighteenth American Peptide Symposium (Eds.: M. Chorev, T.K. Sawyer), American Peptide Society, Cardiff, CA, USA, 2004, pp. 294-300.
4. C. Köhrer, U.L. RajBhandary, Proteins with one or more unnatural amino acids, in The Aminoacyl-tRNA Synthetases (Eds.: M. Ibba, C. Francklyn, S. Cusack), Landes Bioscience, Georgetown, Texas, USA, 2005.
5. T.G. Heckler, L.H. Chang, Y. Zama, T. Naka, M.S. Chorghade, S.M. Hecht, T4 RNA ligase mediated preparation of novel “chemically misacylated” tRNAPhes, Biochemistry 1984, 23, 1468-1473.
6. K. Ninomiya, T. Kurita, T. Hohsaka, M. Sisido, Facile aminoacylation of pdCpA dinucleotide with a nonnatural amino acid in cationic micelle, Chem. Commun. 2004, 2242-2243.
7. K. Yamanaka, H. Nakata, T. Hohsaka, M. Sisido, Efficient synthesis of nonnatural mutants in E. coli in vitro protein synthesizing system, J. Biosci. Bioeng. 2004, 97, 395-399.
8. A. Krzyzaniak, P. Salanski, J. Jurczak, T. Twardowski, J. Barciszewski, tRNA aminoacylated at high pressure is correct substrate for protein biosynthesis, Biochem. Mol. Biol. Int. 1998, 45, 489-500.
9. N. Hashimoto, K. Ninomiya, T. Endo, M. Sisido, Simple and quick chemical aminoacylation of tRNA in cationic micellar solution under ultrasonic agitation, Chem. Commun. 2005, 4321-4323.
10. N. Lee, Y. Bessho, K. Wei, J.W. Szostak, H. Suga, Ribozyme-catalyzed tRNA aminoacylation, Nat. Struct. Biol. 2000, 7, 28-34.
11. H. Saito, H. Suga, A ribozyme aminoacylates exclusively on the 3'-hydroxyl group of the 3'-terminus of tRNA, J. Am. Chem. Soc. 2001, 123, 7178-7179.
12. Y. Bessho, D.R.W. Hodgson, H. Suga, A tRNA aminoacylation system for non-natural amino acids based on a programmable ribozyme, Nat. Biotechnol. 2002, 20, 723-728.
13. H. Saito, D. Kourouklis, H. Suga, An in vitro evolved precursor tRNA with aminoacylation activity, EMBO J. 2001, 20, 1797-1806.
14. H. Murakami, N.J. Bonzagni, H. Suga, Aminoacyl-tRNA synthesis by a resin-immobilized ribozyme, J. Am. Chem. Soc. 2002, 124, 6834-6835.
15. H. Murakami, H. Saito, H. Suga, A versatile tRNA aminoacylation catalyst based on RNA, Chem. Biol. 2003, 10, 655-662.
16. H. Murakami, D. Kourouklis, H. Suga, Using a solid-phase ribozyme aminoacylation system to reprogram the genetic code, Chem. Biol. 2003, 10, 1077-1084.
17. H. Saito, H. Murakami, K. Shiba, K. Ramaswamy, H. Suga, Designer ribozymes: programming the tRNA specificity into flexizyme, J. Am. Chem. Soc. 2004, 126, 11454-11455.
18. P.E. Nielsen, M. Egholm, R.H. Berg, O. Buchardt, Sequence selective recognition of DNA by strand displacement with a thymine-substituted polyamide, Science 1991, 254, 1497-1500.
19. K. Ninomiya, T. Minohata, M. Nishimura, M. Sisido, In situ chemical aminoacylation with amino acid thioesters linked to a peptide nucleic acid, J. Am. Chem. Soc. 2004, 126, 15984-15989.
20. M. Kitamatsu, M. Shigeyasu, T. Okada, M. Sisido, Oxy-peptide nucleic acid with a pyrrolidine ring that is configurationally optimized for hybridization with DNA, Chem. Commun. 2004, 1208-1209.
21. M. Kitamatsu, M. Shigeyasu, M. Saitoh, M. Sisido, Configurational preference of pyrrolidine-based oxy-peptide nucleic acids as hybridization counterparts with DNA and RNA, Biopolymers Pept. Sci. 2006, 84, 267-273.
22. L. Wang, P.G. Schultz, A general approach for the generation of orthogonal tRNAs, Chem. Biol. 2001, 8, 883-890.
23. L. Wang, A. Brock, B. Herberich, P.G. Schultz, Expanding the genetic code of Escherichia coli, Science 2001, 292, 498-500.
24. L. Wang, A. Brock, P.G. Schultz, Adding L-3-(2-naphthyl)alanine to the genetic code of E. coli, J. Am. Chem. Soc. 2002, 124, 1836-1837.
25. J.W. Chin, S.W. Santoro, A.B. Martin, D.S. King, L. Wang, P.G. Schultz, Addition of p-azido-L-phenylalanine to the genetic code of Escherichia coli, J. Am. Chem. Soc. 2002, 124, 9026-9027.
26. J.W. Chin, T.A. Cropp, J.C. Anderson, M. Mukherji, Z. Zhang, P.G. Schultz, An expanded eukaryotic genetic code, Science 2003, 301, 964-967.
27. R.A. Mehl, J.C. Anderson, S.W. Santoro, L. Wang, A.B. Martin, D.S. King, D.M. Horn, P.G. Schultz, Generation of a bacterium with a 21 amino acid genetic code, J. Am. Chem. Soc. 2003, 125, 935-939.
28. D. Kiga, K. Sakamoto, K. Kodama, T. Kigawa, T. Matsuda, T. Yabuki, M. Shirouzu, Y. Harada, H. Nakayama, K. Takio, Y. Hasegawa, Y. Endo, I. Hirao, S. Yokoyama, An engineered Escherichia coli tyrosyl-tRNA synthetase for site-specific incorporation of an unnatural amino acid into proteins in eukaryotic translation and its application in a wheat germ cell-free system, Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 9715-9720.
29. K. Sakamoto, A. Hayashi, A. Sakamoto, D. Kiga, H. Nakayama, A. Soma, T. Kobayashi, M. Kitabatake, K. Takio, K. Saito, M. Shirouzu, I. Hirao, S. Yokoyama, Site-specific incorporation of an unnatural amino acid into proteins in mammalian cells, Nucleic Acids Res. 2002, 30, 4692-4699.
30. T. Manabe, T. Ohtsuki, M. Sisido, Design and synthesis of orthogonal tRNAs of nonstandard structures as carriers of nonnatural amino acids in E. coli in vitro protein synthesizing system, in preparation.
31. H. Nakata, T. Ohtsuki, M. Sisido, in preparation.
32. T. Hohsaka, D. Kajihara, Y. Ashizuka, H. Murakami, M. Sisido, Efficient incorporation of nonnatural amino acids with large aromatic groups into streptavidin in in vitro protein synthesizing systems, J. Am. Chem. Soc. 1999, 121, 34-40.
33. J.R. Roesser, C. Xu, R.C. Payne, C.K. Surratt, S.M. Hecht, Preparation of misacylated aminoacyl-tRNAPhes useful as probes of the ribosomal acceptor site, Biochemistry 1989, 28, 5185-5195.
34. J.D. Bain, E.S. Diala, C.G. Glabe, D.A. Wacker, M.H. Lyttle, T.A. Dix, A.R. Chamberlin, Site-specific incorporation of nonnatural residues during in vitro protein biosynthesis with semi-synthetic aminoacyl-tRNAs, Biochemistry 1991, 30, 5411-5421.
35. T. Hohsaka, K. Sato, M. Sisido, K. Takai, S. Yokoyama, Adaptability of nonnatural aromatic amino acids to the active center of E. coli ribosomal A site, FEBS Lett. 1993, 335, 47-50.
36. S.R. Starck, X. Qi, B.N. Olsen, R.W. Roberts, The puromycin route to assess stereo- and regiochemical constraints on peptide bond formation in eukaryotic ribosomes, J. Am. Chem. Soc. 2003, 125, 8090-8091.
37. L.M. Dedkova, N.E. Fahmi, S.Y. Golovine, S.M. Hecht, Enhanced D-amino acid incorporation into protein by modified ribosomes, J. Am. Chem. Soc. 2003, 125, 6616-6617.
38. C.J. Noren, S.J. Anthony-Cahill, M.C. Griffith, P.G. Schultz, A general method for site-specific incorporation of unnatural amino acids into proteins, Science 1989, 244, 182-188.
39. J.D. Bain, C.G. Glabe, T.A. Dix, A.R. Chamberlin, E.S. Diala, Biosynthetic site-specific incorporation of a non-natural amino acid into a polypeptide, J. Am. Chem. Soc. 1989, 111, 8013-8014.
40. M.W. Nowak, P.C. Kearney, J.R. Sampson, M.E. Saks, C.G. Labarca, S.K. Silverman, W. Zhong, J. Thorson, J.N. Abelson, N. Davidson, P.G. Schultz, D.A. Dougherty, Nicotinic receptor binding site probed with unnatural amino acid incorporation in intact cells, Science 1995, 268, 439-442.
41. D.A. Dougherty, Unnatural amino acids as probes of protein structure and function, Curr. Opin. Chem. Biol. 2000, 4, 645-652.
42. C. Köhrer, L. Xie, S. Kellerer, U. Varshney, U.L. RajBhandary, Import of amber and ochre suppressor tRNAs into mammalian cells: a general approach to site-specific insertion of amino acid analogues into proteins, Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 14310-14315.
43. C. Köhrer, J.-H. Yoo, M. Bennett, J. Schack, U.L. RajBhandary, A possible approach to site-specific insertion of two different unnatural amino acids into proteins in mammalian cells via nonsense suppression, Chem. Biol. 2003, 10, 1095-1102.
44. S.L. Monahan, H.A. Lester, D.A. Dougherty, Site-specific incorporation of unnatural amino acids into receptors expressed in mammalian cells, Chem. Biol. 2003, 10, 573-580.
45. T. Hohsaka, Y. Ashizuka, H. Murakami, M. Sisido, Incorporation of nonnatural amino acids into streptavidin through in vitro frame-shift suppression, J. Am. Chem. Soc. 1996, 118, 9778-9779.
46. T. Hohsaka, Y. Ashizuka, H. Taira, H. Murakami, M. Sisido, Incorporation of nonnatural amino acids into proteins by using various four-base codons in an Escherichia coli in vitro translation system, Biochemistry 2001, 40, 11060-11064.
47. T. Hohsaka, Y. Ashizuka, H. Sasaki, H. Murakami, M. Sisido, Incorporation of two different nonnatural amino acids independently into a single protein through extension of the genetic code, J. Am. Chem. Soc. 1999, 121, 12194-12195.
48. M. Taki, T. Hohsaka, H. Murakami, K. Taira, M. Sisido, Position-specific incorporation of a fluorophore-quencher pair into a single streptavidin through orthogonal four-base codon/anticodon pairs, J. Am. Chem. Soc. 2002, 124, 14586-14589.
49. H. Taira, M. Fukushima, T. Hohsaka, M. Sisido, Four-base codon-mediated incorporation of nonnatural amino acids into proteins in a eukaryotic cell-free translation system, J. Biosci. Bioeng. 2005, 99, 473-476.
50. R.D. Anderson, J. Zhou, S.M. Hecht, Fluorescence resonance energy transfer between unnatural amino acids in a structurally modified dihydrofolate reductase, J. Am. Chem. Soc. 2002, 124, 9674-9675.
51. S.W. Santoro, J.C. Anderson, V. Lakshman, P.G. Schultz, An archaebacteria-derived glutamyl-tRNA synthetase and tRNA pair for unnatural amino acid mutagenesis of proteins in Escherichia coli, Nucleic Acids Res. 2003, 31, 6700-6709.
52. M.M. Yusupov, G.Z. Yusupova, A. Baucom, K. Lieberman, T.N. Earnest, J.H.D. Cate, H.F. Noller, Crystal structure of the ribosome at 5.5 Å resolution, Science 2001, 292, 883-896.
53. T. Hohsaka, Y. Ashizuka, H. Murakami, M. Sisido, Five-base codons for incorporation of nonnatural amino acids into proteins, Nucleic Acids Res. 2001, 29, 3646-3651.
54. B. Moore, C.C. Nelson, B.C. Persson, R.F. Gesteland, J.F. Atkins, Decoding of tandem quadruplets by adjacent tRNAs with eight-base anticodon loops, Nucleic Acids Res. 2000, 28, 3615-3624.
55. J.C. Anderson, N. Wu, S.W. Santoro, V. Lakshman, D.S. King, P.G. Schultz, An expanded genetic code with a functional quadruplet codon, Proc. Natl. Acad. Sci. U. S. A. 2004, 101, 7566-7571.
56. C. Switzer, S.E. Moroney, S.A. Benner, Enzymatic incorporation of a new base pair into DNA and RNA, J. Am. Chem. Soc. 1989, 111, 8322-8323.
57. J.D. Bain, C. Switzer, A.R. Chamberlin, S.A. Benner, Ribosome-mediated incorporation of a non-standard amino acid into a peptide through expansion of the genetic code, Nature 1992, 356, 537-539.
58. I. Hirao, T. Ohtsuki, T. Fujiwara, T. Mitsui, T. Yokogawa, T. Okuni, H. Nakayama, K. Takio, T. Yabuki, T. Kigawa, K. Kodama, T. Yokogawa, K. Nishikawa, S. Yokoyama, An unnatural base pair for incorporating amino acid analogs into proteins, Nat. Biotechnol. 2002, 20, 177-182.
59. T. Ohtsuki, M. Kimoto, M. Ishikawa, T. Mitsui, I. Hirao, S. Yokoyama, Unnatural base pairs for specific transcription, Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 4922-4925.
60. I. Hirao, Y. Harada, M. Kimoto, T. Mitsui, T. Fujiwara, S. Yokoyama, A two-unnatural-base-pair system toward the expansion of the genetic code, J. Am. Chem. Soc. 2004, 126, 13298-13305.
61. Y. Wu, A.K. Ogawa, M. Berger, P.G. Schultz, Efforts toward expansion of the genetic alphabet: optimization of interbase hydrophobic interactions, J. Am. Chem. Soc. 2000, 122, 7621-7632.
62. H. Liu, L. Wang, A. Brock, C.-H. Wong, P.G. Schultz, A method for the generation of glycoprotein mimetics, J. Am. Chem. Soc. 2003, 125, 1702-1703.
63. S.V. Mamaev, A.L. Laikhter, T. Arslan, S.M. Hecht, Firefly luciferase: alteration of the color of emitted light resulting from substitutions at position 286, J. Am. Chem. Soc. 1996, 118, 7243-7244.
64. S. Manabe, K. Sakamoto, Y. Nakahara, M. Sisido, T. Hohsaka, Y. Ito, Preparation of glycosylated amino acid derivatives for glycoprotein synthesis by in vitro translation system, Bioorg. Med. Chem. 2002, 10, 573-581.
65. D.M. Rothman, E.J. Peterson, M.E. Vazquez, G.S. Brandt, D.A. Dougherty, B. Imperiali, Caged phosphoproteins, J. Am. Chem. Soc. 2005, 127, 846-847.
66. H. Murakami, T. Hohsaka, Y. Ashizuka, K. Hashimoto, M. Sisido, Site-directed incorporation of fluorescent nonnatural amino acids into streptavidin for highly sensitive detection of biotin, Biomacromolecules 2000, 1, 118-125.
67. T. Hohsaka, N. Muranaka, C. Komiyama, K. Matsui, S. Takaura, R. Abe, H. Murakami, M. Sisido, Position-specific incorporation of dansylated nonnatural amino acids into streptavidin by using a four-base codon, FEBS Lett. 2004, 560, 173-177.
68. H. Hamada, N. Kameshima, A. Szymanska, K. Wegner, L. Lankiewicz, H. Shinohara, M. Taki, M. Sisido, Position-specific incorporation of a highly photodurable and blue-laser excitable fluorescent amino acid into proteins for fluorescence sensing, Bioorg. Med. Chem. 2005, 13, 3379-3384.
69. V.W. Cornish, D.R. Benson, C.A. Altenbach, K. Hideg, W.L. Hubbell, P.G. Schultz, Site-specific incorporation of biophysical probes into proteins, Proc. Natl. Acad. Sci. U. S. A. 1994, 91, 2910-2914.
70. G. Turcatti, K. Nemeth, M.D. Edgerton, U. Meseth, F. Talabot, M. Peitsch, J. Knowles, H. Vogel, A. Chollet, Probing the structure and function of the tachykinin neurokinin-2 receptor through biosynthetic incorporation of fluorescent amino acids at specific sites, J. Biol. Chem. 1996, 271, 19991-19998.
71. L.E. Steward, C.S. Collins, M.A. Gilmore, J.E. Carlson, J.B. Alexander Ross, A.R. Chamberlin, In vitro site-specific incorporation of fluorescent probes into β-galactosidase, J. Am. Chem. Soc. 1997, 119, 6-11.
72. C.F.W. Becker, C.L. Hunter, R.P. Seidel, S.B.H. Kent, R.S. Goody, M.A. Engelhard, A sensitive fluorescence monitor for the detection of activated Ras: total chemical synthesis of site-specifically labeled Ras binding domain of c-Raf1 immobilized on a surface, Chem. Biol. 2001, 8, 243-252.
73. B.E. Cohen, T.B. McAnaney, E.S. Park, Y.N. Jan, S.G. Boxer, L.Y. Jan, Probing protein electrostatics with a synthetic fluorescent amino acid, Science 2002, 296, 1700-1703.
74. M. Kuragaki, M. Sisido, Long-distance singlet energy transfer along α-helical polypeptide chains, J. Phys. Chem. 1996, 100, 16019-16025.
PART III Discovering Small Molecule Probes for Biological Mechanisms
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
6 Forward Chemical Genetics
Stephen J. Haggarty and Stuart L. Schreiber
Outlook
This chapter will review important historical and conceptual developments in the use of chemical genetics to discover small-molecule probes of biological mechanisms. The main focus will be on using “forward” chemical genetics (phenotype-based discovery of biologically active small molecules) to dissect the functions of genes. We will compare this approach to its classical genetic counterpart and to “reverse” chemical genetics (gene product-based discovery of small molecules). We will summarize recent technical advances that facilitate the discovery process - most notably high-throughput phenotypic assays that measure cell-state changes on the basis of the recognition of epitopes by antibodies, messenger ribonucleic acid (mRNA) expression levels, and fluorescence imaging of individual cells and cell populations. As practical examples of the application of forward chemical genetics, we will discuss the ongoing development of a “molecular tool box” for the study of the cell cycle and chromatin remodeling, which has both basic- and clinical-research applications. Beyond these specific examples, and by way of analogy to the creation of genetic maps using classical genetics, we will generalize from the use of an individual chemical-genetic screen to find an active compound to the systematic use of chemical genetics to map “chemical space” using phenotypic descriptors. Lastly, we will discuss possible future developments in the field of chemical genetics.
6.1 Introduction
It is sometimes thought that the Neurospora work was responsible for the “one gene-one enzyme” hypothesis - the concept that genes in general have single primary functions, aside from serving an essential role in their own replication, and that in many cases this function is to direct specificities of enzymatically active proteins. The fact is that it was the other way around - the hypothesis was clearly responsible for the new approach.
George Wells Beadle, Nobel Prize in Physiology or Medicine, 1958
Since the time of Gregor Mendel (1822-1884) and the discovery of “heritable factors” [1], which are now referred to as genes, classical genetics, and more recently molecular genetics, have become the dominant experimental paradigm for understanding biological systems [2]. An attractive feature of the genetic approach is its adherence to the logic that to understand a system you should perturb it and observe the consequences. Another important feature is its generality: genetics provides an experimental approach that is applicable to the dissection of almost all biological systems, provided that the systems can reproduce and heritable mutations in genes can be made. Despite the successes of classical genetics and knowledge of the complete sequence of deoxyribonucleic acid (DNA) that comprises the human genome [3], the functions of the majority of genes and other regulatory elements within the genome remain as enigmatic as they were at the time of Mendel. In fact, many recent studies analyzing what constitutes a “gene”, as well as studies on the regulatory roles of ribonucleic acids (RNA), challenge many of the tenets of the central dogma (DNA-to-RNA-to-protein). Moreover, while knowledge of the complete human-genome sequence provides a foundation for understanding disease biology, even for the majority of single-gene Mendelian disorders (e.g., Huntington’s disease, cystic fibrosis), knowledge of the genetic variation that causes the disease is only the first step toward an understanding of disease pathogenesis and the development of therapeutic treatments. Furthermore, it is now widely recognized that many common human diseases, including cancer, schizophrenia, and diabetes, have a strong genetic component, but the heritability of these diseases is “complex” in terms of the number of alleles (variants of genes) that contribute to the final outcome and susceptibility.
As a result of these challenges, there exist only a handful of medical treatments based on an understanding of the molecular etiology of a particular disease, and very few treatments that take into account an individual’s genetic history. Therefore, there exists a great need to expand the “molecular toolkit” available to both research scientists and clinicians, and the field of chemical biology is well poised to contribute toward this task. As quoted above, George W. Beadle, in his acceptance speech for the Nobel Prize in Physiology or Medicine in 1958 (shared with Edward L. Tatum “for their discovery that genes act by regulating definite chemical events” using the red bread mold Neurospora crassa, and with Joshua Lederberg “for his discoveries concerning genetic recombination and the organization of the genetic material of bacteria”), noted that the desire to test new hypotheses in science can be the genesis of new approaches that are transformative to the existing scientific paradigm, rather than the other way around.
With this notion in mind, and with the aim of deciphering the functions of the genomes of humans and model organisms, chemical genetics provides an approach both to discover and to dissect the functions of gene products encoded within a genome using biologically active small molecules (Fig. 6-1) [4-11]. By directly targeting gene products, mostly proteins, rather than mutating an organism’s genetic material, this approach differs from classical genetics. However, as discussed in this chapter and elsewhere in this book, the overall logic of chemical genetics and many of its principles are similar to those of classical genetics. Given the temporal control offered by small molecules, and the ability to use combinations of small-molecule modulators, chemical genetics promises to complement the use of pure genetic analysis to study a wide range of biological systems and mechanisms. In this regard, it is possible that many of the hypotheses that can be tested using chemical genetics will ultimately play a transformative role in the coming years, much like Beadle and Tatum’s efforts over a half-century ago.
Fig. 6-1 Classical genetics versus chemical genetics. Chemical genetics aims to target gene products using small molecules rather than to target the genes themselves by mutating an organism’s genetic material.
To be effective as probes of biological mechanisms, and to function as therapeutic agents in the clinical setting, small molecules must modulate biological states by perturbing cellular networks through interactions with macromolecules. The challenge of doing this effectively is highlighted by emerging models from genome- and proteome-wide interaction studies [11-15]. These models have revealed the highly interconnected nature of the underlying networks of biochemical and genetic interactions, in which the nodes are proteins or genes and the edges represent physical or genetic interactions. Here, the observation that biological systems are robust to random perturbations but highly susceptible to the targeted perturbation of highly connected nodes means that not all gene products involved in a particular cellular process are equally important to the fidelity or robustness of the process [11, 14]. As such, contrary to the original tenets of Beadle and Tatum’s “one gene-one enzyme” hypothesis, many gene products are not enzymes and many gene products have multiple functions, some of which are redundant in that other gene products can compensate for their absence. Thus, because of the connectivity of biological networks, while targeting a highly connected node may produce a desired phenotype, doing so may also result in untoward effects due to modulation of functionally connected nodes that are neither directly relevant to nor needed for the desired phenotypic outcome. The development of experimental methods to uncover and selectively modulate the functions of individual nodes (mostly representing proteins) in such networks is the central aim of functional genomics in general, and of chemical genetics in particular [4-11].
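The contrast between perturbing a peripheral node and perturbing a highly connected node can be illustrated with a toy sketch. The network below is entirely hypothetical (one hub linked to six peripheral nodes, plus two peripheral-peripheral links) and is not drawn from any of the cited interaction data sets; the function names are likewise illustrative. The size of the largest connected component serves as a crude stand-in for how much of the network remains functionally linked after a node is removed.

```python
from collections import deque


def build_toy_network():
    """Hypothetical network: one hub, six peripheral nodes, two extra links."""
    edges = [("hub", f"p{i}") for i in range(1, 7)] + [("p1", "p2"), ("p3", "p4")]
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def largest_component(adjacency, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over the remaining nodes
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


if __name__ == "__main__":
    net = build_toy_network()
    print("intact network:       ", largest_component(net))
    print("peripheral p5 removed:", largest_component(net, {"p5"}))
    print("hub removed:          ", largest_component(net, {"hub"}))
```

In this toy case, removing any single peripheral node leaves the other six nodes connected, whereas removing the hub fragments the network into pieces of at most two nodes. Real interaction networks are far larger and less star-like, so the sketch conveys only the qualitative point made in the text.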
6.2 History/Development
Throughout history, small molecules have played an important role in many basic discoveries in science and have provided medicinally useful agents for the treatment of disease for millennia. Although it is difficult to define precisely what constitutes a “small molecule” as compared to other molecules in general, it is instructive to examine examples (Fig. 6-2). In general, small molecules are composed of stable arrangements of the atoms carbon, hydrogen, oxygen, nitrogen, sulfur, and phosphorus - the same constituents of the amino acids, nucleic acids (DNA and RNA), carbohydrates, lipids, and other chemicals that form the macromolecular building blocks of life itself. Unlike the macromolecules DNA, RNA, and protein, small molecules are generally of lower molecular weight and are usually not composed of polymeric, repeating subunits. A few important examples of small molecules include (Fig. 6-2): penicillin (1), an antibiotic discovered by Alexander Fleming; thiamine (vitamin B1) (2), used by George W. Beadle and Edward L. Tatum to rescue auxotrophic mutants of N. crassa; geldanamycin (3), a natural product that targets HSP90, resulting in aberrant protein folding and suppression of oncogenic mutations that occur in certain cancers; dopamine (4), an important excitatory neurotransmitter that mediates many aspects of human behavior and cognition; haloperidol (5), an antipsychotic used to treat schizophrenia that targets a family of neurotransmitter receptors, including the dopamine D2 receptor; colchicine (6), first used by the Egyptians over 35 centuries ago for the treatment of what is now recognized as cancer, and later used to discover tubulin, a major component of the cytoskeleton; rapamycin (7), a natural product with anticancer properties first isolated from the bacterium Streptomyces and later used to discover mammalian FKBP12-rapamycin-associated protein (FRAP)/mammalian target of rapamycin (mTOR); latrunculin (8), a natural product isolated from a marine sponge that causes destabilization of the actin cytoskeleton; and caffeine (9), a naturally occurring methylxanthine found in coffee and tea, which has several cellular actions, including the inhibition of cyclic nucleotide phosphodiesterases.
Fig. 6-2 Examples of biologically active small molecules whose structural complexity, protein targets, and consequent observable phenotypes are different. (1) Penicillin G, an antibiotic; (2) thiamine (vitamin B1), a metabolite that is an enzyme cofactor; (3) geldanamycin, an inhibitor of heat-shock protein 90 (HSP90); (4) dopamine, a neurotransmitter; (5) haloperidol, a central nervous system depressant and sedative; (6) colchicine, an inhibitor of mitosis that causes microtubule destabilization; (7) rapamycin, an anticancer agent that inhibits TOR proteins when complexed to FKBP12; (8) latrunculin B, a destabilizer of actin microfilaments; (9) caffeine, a central nervous system stimulant that targets proteins including cyclic nucleotide phosphodiesterases.
Indeed, many aspects of biological research rely on the use of small molecules: from using antibiotics (e.g., ampicillin) to select for the transformation of Escherichia coli with a recombinant DNA plasmid, to the vitamin constituents (e.g., vitamin B6) of the basic media used to culture mammalian cells, to the inhibition of proteases (e.g., leupeptin) and phosphatases (e.g., pervanadate) during biochemical purification of proteins. Besides these routine uses in biology, biologically active small molecules are widely used as imaging reagents in basic research and clinical diagnosis (e.g., fluorodeoxyglucose positron-emission tomography (FDG-PET)). They play essential roles in newly developed technologies such as somatic cell nuclear transfer (e.g., A23187, a calcium ionophore), and many small molecules are produced in mammalian cells using endogenous metabolic pathways (e.g., the opiate analgesic morphine).
By using small-molecule libraries in appropriate cell-based assays, the functions of a growing number of novel gene products and biologically active small molecules from both natural sources and laboratory syntheses have been discovered (Table 6-1). Many of these small molecules cause a loss of function of their cognate targets, including kinases and phosphatases, deacetylases and acetyltransferases, membrane receptors, proteases, isoprenyl transferases, and polymerases; to a lesser extent, small molecules that cause a gain of function have also been discovered or invented. An important example of using chemical genetics to characterize a signaling pathway from the cell membrane to the nucleus is the discovery of the common targets of the immunosuppressant drugs cyclosporine A (CsA) and FK506 (reviewed in Refs 16, 17). Prior to this discovery, CsA was known to inhibit the production of IL-2, a T-cell-derived cytokine that mediates the immune response leading to rejection of transplanted organs in humans, although its mechanism of action was unknown.
Scientists looking to discover new immunosuppressants first isolated FK506 from the fermentation broth of Streptomyces tsukubaensis after discovering that an extract of this organism could also block IL-2 secretion [18]. Since FK506 was a potent immunosuppressant with activity at concentrations several hundredfold lower than CsA, scientists became interested in identifying the cellular receptors or targets of both CsA and FK506, leading first to the recognition that they target separate “immunophilins”, cyclophilin and FK506 binding protein-12 (FKBP12) [19]. Further investigation led to the recognition that the complexes cyclophilin-CsA and FKBP12-FK506 competitively bind and inhibit the Ca2+- and calmodulin-dependent phosphatase calcineurin [20]. Collectively, these studies revealed a previously unknown family of evolutionarily conserved gene products (the immunophilins), revealed a biological function of calcineurin, identified and characterized new biologically active small molecules, provided an important example of using synthetic chemistry to manipulate an important class of small molecules to identify their cellular targets using affinity chromatography, and expanded the repertoire and medical understanding of immunosuppressant drugs. Since the time of these discoveries, calcineurin has been recognized as an important mediator of the T-cell signal transduction pathway, regulating transcription factors such as the nuclear factor of activated T cells (NF-AT), which are involved in the expression of a number of important genes involved in T-cell-receptor activation, including IL-2; calcineurin has also been shown to be an important regulator of the nervous and cardiovascular systems [21].
Table 6-1

Small molecule | Assay format | Key phenotype | Target

Cytoskeleton and cell division
Colchicine | Cultured cells | Perturbs mitosis | Tubulin
Taxol | Cultured cells | Perturbs mitosis | Tubulin
Hesperadin | Cultured cells | Perturbs mitosis | Aurora kinases
Latrunculin | Cultured cells | Perturbs mitosis | Actin
Synstab A | Cultured cells | Perturbs mitosis | Tubulin
Depol-2b | Cultured cells | Perturbs mitosis | Tubulin
Y-27632 | Smooth muscle tissue | Inhibits smooth muscle contraction | p160ROCK
Wiskostatin | Xenopus extract | Inhibits actin polymerization | N-WASP
Monastrol | Cultured cells | Induces monopolar spindles | Mitotic kinesin Eg5
Diminutol | Xenopus extract | Induces a small mitotic spindle | NADP-dependent quinone oxidoreductase
Tubacin | Cultured cells | Increases tubulin acetylation | Histone deacetylase 6
Myoseverin | Cultured cells | Depolymerizes microtubules | Tubulin
Isogranulatimide | Cultured cells | Bypasses DNA damage induced G2 checkpoint | Chk1
Suptopins | Cultured cells | Bypasses chromatid catenation induced G2 checkpoint | Unknown
Erastin | Cultured cells | Synthetic lethal with transforming oncogenes | Unknown
Macbecin II | Cultured cells | Synthetic lethal with RNAi of Tsc2 | Unknown
Dihydromotuporamine C | Cultured cells | Prevents cell invasion | Sphingolipid metabolism

Chromatin remodeling
Trapoxin B | Cultured cells | Reversal of transformed phenotype; histone acetylation | Histone deacetylases
Depudecin | Cultured cells | Reversal of transformed phenotype; histone acetylation | Histone deacetylases
Trichostatin A | Cultured cells | Reversal of transformed phenotype; histone acetylation | Histone deacetylases
ITSA1 | Cultured cells | Bypasses cell-cycle arrest by trichostatin A | Unknown

Protein synthesis, folding, trafficking, and secretion
Geldanamycin | | |
Leptomycin B | Antiviral/antifungal | Inhibits nuclear export | Crm1
Multiple inhibitors | In vitro translation extract | Inhibition of translation initiation and elongation | RNA and varied
Multiple inhibitors | Cultured cells | Inhibit FOXO1a nuclear export | Varied
Brefeldin A | Antiviral/antifungal | Blocks ER-to-Golgi transport | Arf1
Exo1 | Cultured cells | Blocks ER-to-Golgi transport | Unknown
Exo2 | Cultured cells | Blocks ER-to-Golgi transport | Unknown
Multiple sulfonamides | Cultured cells | Block Golgi-to-cell-membrane transport | Unknown
Sortins | Cultured cells | Induce secretion | Unknown

Ubiquitin-proteasome pathway
Lactacystin | Cultured cells | Neurite induction and protease inhibition | Proteasome
Ubistatin | Xenopus extract | Inhibits ubiquitin-dependent proteolysis | Multiubiquitin chain

Signaling pathway
Cyclopamine | Cultured cells | Inhibits hedgehog signaling | Smoothened
Cyclosporin | Cultured cells | Inhibits T-cell signaling | Cyclophilin and calcineurin
FK506 | Cultured cells | Inhibits T-cell signaling | FKBP12 and calcineurin
Rapamycin | Cultured cells | Inhibits T-cell signaling | FKBP12 and TOR kinase
Fumagillin | Cultured cells | Inhibits endothelial cell proliferation | Methionine aminopeptidase
SMIR4 | Cultured cells | Suppresses rapamycin | Nir1p (Ybr077cp)
Purmorphamine | Cultured cells | Induces osteogenesis | Hedgehog signaling agonist
TWS119 | Cultured cells | Induces neurogenesis | Glycogen synthase kinase-3β
Cardiogenol | Cultured cells | Induces cardiomyogenesis | Unknown
Concentramide | Zebrafish embryos | Disrupts heart patterning | Unknown
GS4012 | Zebrafish embryos | Suppresses cardiac defect | Upregulates VEGF levels
Although many important individual discoveries, like the role of calcineurin in T-cell signaling, have been made using chemical genetics (Table 6-1), one of the limiting factors in making such discoveries is the gap between the fields of chemistry and biology. In an effort to bridge the differences between these fields, a notable “cross-talk” article entitled Toward a Pharmacological Genetics in the inaugural issue of the journal Chemistry & Biology in 1994 cited many of the advantages of using small molecules to study biological systems and the need for increased interaction among chemists and biologists [4]. Over a decade later, many of the ideas discussed in this article continue to be favored topics of discussion and provide challenges that the field of chemical biology as a whole continues to face. Besides the development of high-throughput phenotypic assays for screening large collections of small molecules, which has enabled forward chemical-genetic approaches, and high-throughput binding and enzymatic assays, which have enabled reverse chemical-genetic approaches, chemical genetics has evolved to emulate classical genetics in a number of other ways: (a) the development of high-throughput phenotypic assays compatible with performing screens of large collections of chemicals, (b) the use of chemical-genetic modifier (suppressor and enhancer) screens to reveal connections between pathways and networks as well as epistatic relationships between gene products, (c) the use of synthetic-lethal (and synthetic-viable) screening to reveal redundant elements of pathways and networks, and (d) the creation of “chemical-genetic maps” that position chemicals in a multidimensional space formed from phenotypic or computed descriptors. It is the objective of this chapter to discuss these topics and to provide examples in which the approach of chemical genetics has been successful in discovering small-molecule probes for biological mechanisms.
6.3 General Considerations
6.3.1 Small Molecules as a Means to Perturb Biological Systems Conditionally
Although chemical genetics is modeled after classical genetics, especially with respect to the use of phenotype-based screening (the word phenotype is derived from Greek phaino-, from phainein, meaning to show or be observable), it differs from classical genetics in the use of small molecules, rather than mutations, to perturb the function(s) of gene products [4-11]. Thus, chemical genetics applies the principles and logic of genetics, but the analyses focus on proteins rather than genes. Several features of small molecules render them ideal for use with complex biological systems and for complementing classical genetic analysis and methods based on RNA interference (RNAi). These features include the ability to offer nearly instantaneous temporal control,
the ability to use combinations of small-molecule modulators, the ability to disrupt protein-protein interactions, the ability to cause both gain and loss of individual functions, and the ability to modulate individual functions of multifunctional proteins. Since small molecules can specifically alter the function of a gene product from all copies of a gene (assuming there are no functional differences between the alleles), a small molecule can be used analogously to an inducible dominant or homozygous recessive mutation in diploid genetic systems that possess two copies or alleles of each gene. This circumvents the difficulty of generating these types of mutations in the case of mammalian systems. Also, just as mutation sites can identify functionally relevant coding sequences of genes, small molecules can identify functionally relevant amino acid residues of proteins, on the basis of their mechanism of interaction. Unlike most mutagenic methods, the use of small molecules will not generally produce heritable alterations in genes. Since a small molecule can generally be added and removed from an experiment at will, the perturbations induced by small molecules are generally conditional and reversible. Large numbers of small molecules, and not mutations, are required to perturb the complete complement of cellular gene products. Determining which gene product is altered in a genetic assay requires mapping of a mutation or sequencing of a gene as opposed to identifying the protein(s) targeted by a small molecule, that is, the “target identification problem” (see below). Although the focus of the chemical genetics described in this chapter is that of the screening of small organic molecules, other exogenously added chemicals, such as DNA sequences that encode an amino acid or nucleic acid polymer or other compositions of matter that may alter the state of a biological system, are also of interest.
In particular, the use of RNAi and related phenomena now provides powerful reverse genetic approaches for functional genomics [22, 23]. However, while RNAi can provide selectivity (assuming that the probe is appropriately designed and validated for the system being tested), RNAi probes must first be synthesized using the knowledge of gene sequence, and their effects are limited to loss or reduction of function of gene products. Furthermore, the inability of RNAi to selectively target individual functions of proteins or to directly disrupt protein-protein interactions, together with its extended temporal scale, limits the generality and applicability of this strategy for modulating gene-product function. Ultimately, however, the combination of different forms of perturbations will be an important means of elucidating pathways and targets.
6.3.2 Forward and Reverse Chemical Genetics
Overall, the use of genetic approaches can be subdivided into “forward genetics”, which involves the use of phenotype-based screening, and “reverse genetics”, which involves studying the phenotypic consequences of mutations in a known gene (Table 6-2). The use of forward genetics entails determining the phenotypic consequences of mutations in genes and identifying the gene product that produces a heritable phenotypic change when mutated. By starting with a phenotype of interest and working toward an altered gene sequence, the forward genetic approach allows the ordering of gene products into functional pathways and the analysis of the interactions between other gene products and pathways (epistasis). Although initially developed for the study of how genes control inheritance by establishing a connection between changes in genotype and changes in phenotype, a forward genetic approach allows the identification of novel gene products involved in almost any biological process of interest. Since the pioneering work of Mendel, a number of genetically tractable model organisms have become widely used, including: Drosophila melanogaster (fruit fly), Caenorhabditis elegans (nematode worm), Saccharomyces cerevisiae (budding yeast), Arabidopsis thaliana (plant), and even complex vertebrates such as Danio rerio (zebrafish) and Mus musculus (mouse) [24-28]. Each of these provides a number of strengths and weaknesses for elucidation of genotype-phenotype relationships. Like its genetic counterpart, “forward” chemical genetics relies on a phenotype of interest to guide the selection of biologically active small molecules that modulate a particular biological system or mechanism (Fig. 6-3) [5-7]. Overall, this approach entails a three-step process that
Table 6-2

Forward genetics (from phenotype to gene/protein)
Classical genetic approach | Chemical-genetic approach
Random mutagenesis (e.g., irradiating cells) | Add library of small molecules to a biological system (extracts, cells, whole organisms)
Select mutants with the phenotype of interest | Select small molecules that produce the phenotype of interest
Identify the mutated genes by genetic mapping and sequencing | Identify the protein(s) and pathways with which the small molecules interact

Reverse genetics (from gene/protein to phenotype)
Classical genetic approach | Chemical-genetic approach
Mutate single gene of interest in cells or whole organisms (e.g., knockout mouse) | Use a purified protein to screen a collection of small molecules for binders or modulators of function
Generate cells or animals with mutant gene | Add the molecules that bind to the protein of interest to cells or whole organisms
Observe phenotype(s) | Observe phenotype(s)
Fig. 6-3 Forward versus reverse chemical genetics. While forward chemical genetics relies on a phenotype of interest to guide the selection of biologically active small molecules, reverse chemical genetics uses a protein of interest to identify small molecules that can be used to probe the function of the selected protein. Both approaches require the use of small molecules and phenotypic assays but differ in the starting points of discovery.
begins with the development of a phenotypic assay to measure a biological property or mechanism of interest, followed by the screening of small-molecule libraries for compounds that induce a change in the desired phenotypic property or mechanism. After identifying active compounds, the third, and often most challenging, step involves the identification of interacting protein targets and genetic pathways. Thus, by starting with a phenotype of interest and working toward identifying the protein whose function is altered (rather than an altered gene sequence), the forward chemical-genetic approach still allows the ordering of gene products into functional pathways and the analysis of the interactions between other gene products and pathways (epistasis). In addition to identifying functions of gene products, by using phenotypic variation as a means to study biologically active small molecules, the forward chemical-genetic approach allows the ordering of biologically active small molecules into functional pathways irrespective of knowledge of their targets and mechanism of action. By analogy to the study of “genotype-phenotype” relations, these efforts contribute toward an understanding of “chemotype-phenotype” relations, including quantitative structure-activity relationship (QSAR) modeling, which attempts to explain the chemical properties of small molecules that produce molecular recognition events that lead to specific phenotypes. As discussed below, a greater understanding of the relationship between chemotype and phenotype may come about through efforts similar to that of the mapping of genetic mutations.
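One way to picture the ordering of small molecules by phenotype alone, without knowledge of their targets, is to compare their phenotypic profiles directly. The following is a minimal sketch under stated assumptions: the compound names, the four assay scores per compound, and the choice of Pearson correlation as the similarity measure are all hypothetical and are not taken from the studies cited in this chapter.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length phenotype profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def most_similar(profiles, query):
    """Rank all other compounds by profile correlation with `query`."""
    scores = [
        (name, pearson(profiles[query], vec))
        for name, vec in profiles.items()
        if name != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: each compound is scored in the same four phenotypic assays.
profiles = {
    "compound_A": [0.9, 0.1, 0.8, 0.2],
    "compound_B": [0.8, 0.2, 0.9, 0.1],
    "compound_C": [0.1, 0.9, 0.2, 0.8],
}

ranking = most_similar(profiles, "compound_A")
for name, score in ranking:
    print(name, round(score, 2))
```

In this toy data set, compound_B tracks compound_A closely while compound_C is anticorrelated, so the two would be grouped separately even though no target is known for any of them.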
Fig. 6-4 Phenotypic assays for chemical genetics. (a) Types of assays that have been used for chemical-genetic screening. (b) Example of a cell-based assay involving phospho-specific antibody-based determination of a cell state [31]. A cytoblot involves growing cells on the bottom of a well, fixing the cells, and probing the cells for the presence of a particular antigen using a specific primary antibody in solution. A secondary antibody covalently linked to horseradish peroxidase is added and the presence of the entire complex is detected through the chemiluminescent reaction caused by addition of luminol and hydrogen peroxide.
be low such that methods of analysis can readily identify which molecules are active. Ideally, instead of using visual observations or considering a binary descriptor of “0” or “1”, the assay being used is quantitative in nature, providing a continuous-valued measure of activity that can be recorded electronically using plate readers designed to measure changes in absorbance, fluorescence, and luminescence. High-throughput (10 000-200 000 compounds per day) phenotypic assays involving the measurement of changes in calcium levels or second messengers, like cyclic adenosine monophosphate (cAMP), in cultured cells have been possible using “fluorescence imaging plate readers” (FLIPRs) for many years. However, almost exclusively, these assays have been performed in the context of the development of drugs directly targeting specific cell surface receptors, including the large family of G-protein-coupled receptors (GPCRs), whose expression has been engineered to occur in a particular cell line that is readily amenable to high-throughput screening. While these assays have produced many biologically active small molecules that work as either receptor agonists
or antagonists, some of which are therapeutically used drugs, the focused nature of the screens means that they have not been used to purposefully target the full diversity of possible biological mechanisms. Another assay type that has been widely used is that of using a “reporter gene”, which acts as an easy-to-measure surrogate for a gene product of interest. Such reporter genes contain one or more specific gene regulatory elements that often bind transcription factors whose function is directly linked to a pathway of interest (e.g., cAMP response element binding (CREB) protein), the reporter gene sequence itself (e.g., luciferase or β-galactosidase), and other sequences required for the formation of functional mRNA. Once the reporter construct is introduced into the cells, a direct assay of the reporter protein’s enzymatic activity provides a means to monitor the upstream signaling pathways, as well as other factors affecting mRNA stability and protein turnover. Through the use of gene expression-based high-throughput screening (GE-HTS), in which a gene-expression signature is used as a surrogate for cellular states, it is now possible to multiplex the number of reporters that are used, although the concept of coupling phenotypic changes in response to small molecules interacting with protein to changes in mRNA is the same [32]. Once a signature consisting of a small set of genes is obtained, this approach provides a general method of screening applicable to many cell types and biological mechanisms. By not having to introduce a reporter gene construct and instead relying on expression of genes from endogenous promoters and read-outs based on hybridization of specific transcripts, these assays have the advantage of examining gene expression under the influence of its natural chromatin and chromosomal context.
In the limit of using a full genome’s level of mRNA expression patterns as a phenotype, even with the coexpression patterns of many genes, this approach to forward chemical-genetic screening provides a truly high information content read-out of cell states [33]. However, since mRNA levels are not always directly related to protein levels and cannot directly reflect the posttranslational state or localization of proteins in cells, much effort has been put forth to develop assays that can measure additional biological mechanisms. One common mechanism of biological regulation that cannot be measured directly by reporter gene or FLIPR assays involves the reversible, covalent modification of proteins. Many posttranslational modifications, including protein glycosylation, methylation, lipidation, isoprenylation, ubiquitination, phosphorylation, and acetylation, have been found to be integral components of the signal transduction mechanisms operating to transfer information in and between cells. By rapidly and reversibly altering the chemical properties of gene products in a manner dependent on and capable of influencing subcellular localization and the interaction with other protein partners, such intracellular chemistry provides a means to both observe and modulate biological systems. To assess the intracellular pathways regulating posttranslational modifications using forward chemical genetics, a number of assays have been developed that allow screening of small-molecule libraries for modulators of such
modifications. One nonradioactive format, called the cytoblot, is capable of
detecting posttranslational events using an appropriate antibody (Fig. 6-4(b)) [31]. Unlike a reporter gene assay, this assay does not require engineering of the cellular system; instead, it takes advantage of the ability of cells to produce proteins and allows proteins to be analyzed in their endogenous context without overexpression. This format therefore facilitates the assaying of transformed or primary cell lines that are from different tissue types or from different genetic backgrounds. Two emerging technological developments, which when combined promise to play an important role in forward chemical genetics, are optical imaging and automated microscopy [34]. Through the use of appropriate fluorescent dyes, antibodies, and genetically encoded probes, such as the green fluorescent protein (GFP), these techniques allow the resolution of individual cells and subcellular organelles within cultured cells in multiwell plates (Fig. 6-5). The term “high-content” is often used to refer to the high information content of these types of assays, which follows from their ability
Fig. 6-5 Example of a high-content image-based screen for small molecules that alter neural stem-cell differentiation. Unlike homogeneous, plate-reader based assays, multiple cell types, and phenotypes can be quantified from a single image using image segmentation and computational analysis.
to extract a variety of features from images. Thus, instead of considering either a binary descriptor or a continuous-valued measure of activity that is produced from the entire content of a well, as is often obtained from using visual inspection or a plate reader, these assays can quantify phenotypes in individual cells, as well as provide a population average. Since routine imaging allows the use of multiple (3-4) fluorophores with different excitation and emission properties, ratiometric and multiplexed measurements can be made. For example, by considering a binary measurement of intensity alone, and not the morphology of cells, for three separate colors (red, blue, green) there are a total of 2³ = 8 possible ratiometric measurements per well. Furthermore, besides overall intensity, image segmentation allows the features of only a subset of objects in a well to be quantified separately from others. As a result, complex mixtures of cell types can be assayed simultaneously to perform a multiplexed assay to provide a more physiologically relevant environment. Figure 6-5 shows an example of an image-based screen to look for small molecules that modulate the differentiation of mouse neuronal stem cells into the three principal cell types of the brain: astrocytes, oligodendrocytes, and neurons. The following three examples highlight the usefulness of image-based screening for chemical genetics. Example 1: Perlman and colleagues performed a fully automated, image-based, centrosome-duplication assay that measured the size of centrosomes in individual cells [35]. Using this assay, they performed a series of chemical-genetic modifier screens (see below) looking for suppressors and enhancers of hydroxyurea, a compound that was known to induce centrosome duplication.
Out of a collection of known biologically active compounds, this assay revealed that compounds targeting microtubules and protein synthesis blocked centrosome duplication, while certain paralog-specific protein kinase C inhibitors and retinoic acid receptor agonists increased it. Then, using a library of uncharacterized small molecules, they were able to identify five novel centrosome-duplication inhibitors that do not target microtubule dynamics or protein synthesis. Example 2: In a phenotypic screen for inhibitors of the secretory pathway (endoplasmic reticulum - Golgi apparatus - cell membrane), Feng and colleagues identified several structural classes of small molecules that perturb membrane trafficking [36]. Through more in-depth analysis [37], one class of sulfonamide-containing molecules was shown to inhibit the ATPase activity of the vacuolar ATPase and others were shown to act by a mechanism distinct from that of the natural product brefeldin A, which inhibits Arf1 GTPase by stabilizing it in its inactive GDP-bound state. Example 3: Using a visual, image-based phenotypic screen that measured the subcellular localization of GFP-tagged FOXO1a, a screen for inhibitors of FOXO1a nuclear export in the absence of the PTEN phosphatase was performed by Kau and colleagues [38]. These studies led to the discovery
of general inhibitors of nucleocytoplasmic transport, which, like the natural product leptomycin, directly inhibited the nuclear export factor CRM1. Besides this class of compounds, a number of other compounds inhibiting PI3K/Akt signaling were discovered, which included multiple antagonists of calmodulin signaling and psammaplysene A [39], a natural product isolated from marine extracts. Given the importance of the PI3K/PTEN/Akt signal transduction pathway in a variety of cancers, and the ability of FOXO1a targeted to the nucleus to reverse tumorigenicity of PTEN null cells, these small molecules and their targets may provide a new generation of therapeutic agents.
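The count of eight combined read-outs quoted earlier in this section for three binary color channels can be checked by direct enumeration. The sketch below is illustrative only: the channel names follow the text, while the 0/1 coding (signal below or above some threshold) is an assumption made for the example.

```python
from itertools import product

# The three colors named in the text; each is scored as a binary call,
# 0 = signal below threshold, 1 = signal above threshold (assumed coding).
channels = ("red", "blue", "green")

# Every combination of binary calls across the three channels.
states = list(product((0, 1), repeat=len(channels)))

for state in states:
    print(dict(zip(channels, state)))

print(len(states))  # equals 2 ** 3
```

Adding a fourth fluorophore under the same binary coding would double the number of distinguishable states, which is one way to see why multiplexed imaging raises the information content of a single well.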
6.3.4 Nonheritable and Combinations of Perturbations
One of the significant differences between chemical genetics and classical genetics is that the possible perturbations are not limited to those that can be made by making heritable changes in discrete factors, such as a gene. In addition, unlike a genetic perturbation that needs to be recreated if one wants to study a new organism or the mutation in a different genetic background, many small molecules are active in multiple biological systems. In fact, if a small molecule can be found to have a similar phenotype in a genetically tractable organism, such as S. cerevisiae or C. elegans, then exploiting the evolutionary conservation of biological systems provides a means to assist in the identification of the targets of the small molecules. As a result of the ease of being able to add different small molecules to an experimental system, as compared to the difficulty of making extensive double or other combinations of genetic mutants, it is possible to exploit the combinatorics of possible perturbations to discover combinations of small molecules or other perturbations that produce a desired phenotype [39]. For example, if we consider a chemical library composed of N small molecules that are to be tested at C concentrations, there are: C × N possible single treatments, C × N (C × N - 1)/2 possible unique combinations, and (C × N)² possible combinations (if the order of addition of the small molecules is relevant). Thus, even for a small collection of compounds (N = 100) tested at three concentrations (C = 3) there are 44 850 possible unique pairwise combinations of treatments. However, the diversity of the resulting perturbations might be less than optimal for discovering new probes, as it would be expected that many of the different combinations would be functionally similar.
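The arithmetic in the preceding paragraph can be reproduced in a few lines. This is a minimal sketch; the function name is hypothetical, and the ordered count is taken as the square of the number of single treatments, which is an assumed reading of the third expression in the text.

```python
def treatment_counts(n_compounds, n_concentrations):
    """Count single, unordered-pair, and ordered-pair treatments."""
    singles = n_compounds * n_concentrations        # C x N
    unordered_pairs = singles * (singles - 1) // 2  # C x N (C x N - 1) / 2
    ordered_pairs = singles ** 2                    # assumed reading: (C x N) squared
    return singles, unordered_pairs, ordered_pairs


# Worked example from the text: N = 100 compounds at C = 3 concentrations.
singles, unordered_pairs, ordered_pairs = treatment_counts(100, 3)
print(singles)          # 300
print(unordered_pairs)  # 44850
print(ordered_pairs)    # 90000
```

Because the pair count grows with the square of the number of single treatments, doubling either the library size or the number of concentrations roughly quadruples the size of an “all against all” screen.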
Alternatively, instead of performing an “all against all” screen, it is possible to select specific small molecules of interest and purposefully perform what is referred to as a “chemical-genetic modifier” screen to look for suppressors and enhancers of the phenotypic effect of the small molecule of interest (Fig. 6-6). In classical genetics, suppressor and enhancer screens are used to identify genes that, when mutated, suppress or enhance a previously identified phenotype of interest. The advantage of such screening, as compared to
Fig. 6-6 Chemical-genetic modifier screens. (a) By putting cells in a defined cell state, it is possible to identify small-molecule suppressors and enhancers. (b) Examples of data collected from a screen for chemical-genetic modifiers using a growth assay in budding yeast (data from Harvard University, MCB100 Experimental Biology course). Each row corresponds to a small molecule from a chemical library and each column a different small-molecule modifier that puts the yeast into a different cell state. The level of red and green is indicative of the observed growth measured by optical density of wells. Certain compounds allow the yeast to grow, whereas others prevent growth.
using a wild-type (WT) genetic background, is in the sensitization of the pathway to further perturbation, which often renders the mutations identified more relevant to the pathway of interest. In the end, like the synthesis of diverse compounds via two-component coupling reactions, the sparse sampling of a larger matrix of possible combinations via chemical-genetic modifier screens may prove beneficial for identifying novel small-molecule probes of biological mechanisms. Examples of chemical-genetic modifier screens that have been performed include the identification of suppressors of (a) the histone deacetylase inhibitor trichostatin A [40], (b) ICRF-193 [41], a topoisomerase II inhibitor that causes a G2-checkpoint arrest, (c) rapamycin [42], an inhibitor of TOR proteins, (d) FK506 and its effect on calcineurin’s regulation of salt stress [43], and (e) hydroxyurea’s effect on centrosome duplication [35]. Suppressors and enhancers have also been identified for a variety of other small molecules, including the motor protein kinesin-5 inhibitor monastrol, the microtubule destabilizer nocodazole, the microtubule stabilizer taxol, the actin destabilizer latrunculin, the protein translation inhibitor cycloheximide, and the calmodulin inhibitor W7 (S.J.H. and S.L.S., unpublished data).
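The logic of a modifier screen can be sketched as a comparison of three read-outs: untreated cells, cells in the sensitized state, and cells in the sensitized state plus a library compound. The function below is a hypothetical illustration, not the analysis used in the screens cited above; the tolerance value and the optical-density numbers are invented for the example.

```python
def classify_modifier(wild_type, sensitized, with_compound, tolerance=0.1):
    """Label a library compound by how it shifts a sensitized phenotype.

    wild_type:     read-out of untreated cells (e.g., optical density)
    sensitized:    read-out with the small molecule of interest alone
    with_compound: read-out with that molecule plus the library compound
    tolerance:     hypothetical noise band, as a fraction of the effect size
    """
    effect = sensitized - wild_type  # phenotype caused by the molecule of interest
    if effect == 0:
        raise ValueError("no phenotype to suppress or enhance")
    shift = (with_compound - sensitized) / effect
    if shift <= -tolerance:
        return "suppressor"  # read-out moves back toward wild type
    if shift >= tolerance:
        return "enhancer"    # read-out moves further from wild type
    return "no effect"


# Hypothetical growth data: untreated yeast reach OD 1.0, and a
# growth-inhibiting molecule of interest alone lowers this to OD 0.3.
print(classify_modifier(1.0, 0.3, 0.8))   # suppressor
print(classify_modifier(1.0, 0.3, 0.1))   # enhancer
print(classify_modifier(1.0, 0.3, 0.32))  # no effect
```

Expressing the shift relative to the size of the original effect means the same rule applies whether the molecule of interest raises or lowers the read-out.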
6.3.5 Multiparametric Considerations: Dose and Time
From first principles, other important considerations for determining the phenotypic effect of small molecules are those of the concentration and the length of treatment, which are collectively referred to as dosage effects. Not unlike the challenges faced by geneticists who induce multiple different alleles by mutagenesis and determine which mutations are hypomorphic (reduction of function), hypermorphic (gain of function), or a complete null allele (no function), chemical biologists studying small molecules that show different phenotypes at different concentrations have to determine whether the molecule is interacting with multiple protein targets with different thresholds of activity, or with a single target that induces different phenotypes with different levels of modulation. Depending on the resolution of the assay being used to screen the small molecules and to assess their phenotypic effects, there may be a threshold for the length of treatment with a small molecule, which can also be affected by the concentration. For example, measuring the effects of a small molecule on the progression of mammalian cells through the cell cycle requires a few hours of treatment, but cellular processes such as the synaptic vesicle cycle require only a few seconds. As discussed below, these along with other parameters are beginning to be addressed upfront as part of “multidimensional” screening efforts.
6.3.6 Sources of Phenotypic Variation: Genetic versus Chemical Diversity
In many ways, the ongoing development of improved collections of small molecule perturbagens (SMPs) for forward chemical genetics is reminiscent of the development of improved methods for mutagenesis in classical genetics. Before it was realized that the genetic material was a molecule, early geneticists, such as Thomas H. Morgan, who was awarded the Nobel prize in physiology or medicine in 1933 “for his discoveries concerning the role played by the chromosome in heredity”, had to rely on spontaneous mutants as their source of genetic variation, thus limiting the power of forward genetics. A great leap forward was made in 1927 when Hermann J. Muller, a student of Thomas H. Morgan, discovered that heritable mutations in Drosophila could be induced. For “the discovery of the production of mutations by means of X-ray irradiation” Hermann J. Muller was recognized in 1946 with the Nobel prize in physiology or medicine. This finding meant that for the first time it was possible to access a wide swath of genetic variation and associated diversity of phenotypes. With the advent of chemical mutagens, such as ethylnitrosourea, capable of inducing point mutations (changes in single base pairs), many different types of alleles could be induced, including both loss-of-function and gain-of-function mutations. While the early practitioners of
genetics would likely never have anticipated such developments, the advent of further improved methods for genome manipulation, including gene disruptions due to insertion of transposable elements, gene trap vectors, and homologous recombination, now allows a wide spectrum of genetic variation to be studied. The serendipitous discovery of small molecules “spontaneously” produced by natural sources, such as cultured bacteria and marine sponges, has been a long-standing source of bioactive small molecules [44, 45]. Like the discovery of X rays and other agents that can induce phenotypic variation, chemical biologists are becoming increasingly adept at making small molecules that are suitable for use in forward and reverse chemical-genetic studies [6, 46-49]. These methods include the use of DNA template-mediated, and target- and diversity-oriented organic synthesis, peptide and carbohydrate synthesis, and enzyme-mediated synthesis, the latter of which enables in vitro evolution, protein engineering, and even nonnatural amino acids to be incorporated into polypeptides. The collective aim is to provide increasingly complex and effective small-molecule modulators of biological processes by developing efficient (three- to five-step) syntheses of collections of small molecules having rich skeletal and stereochemical diversity. Such synthetic strategies are not directed toward any one molecular target, as occurs in target-oriented synthesis; instead, the efforts are ultimately aimed at being able to target all molecular components of the networks regulating biological processes [6, 46]. An important conceptual development in chemical library synthesis has been the recognition of the importance of not only creating diversity (so as to increase the likelihood of finding an active small molecule) but also retaining the potential to site- and stereoselectively attach appendages to the small molecule during a postscreening optimization stage.
Such chemical handles not only facilitate the addition of functionalities that increase the potency or selectivity of the small molecule but, equally important, can also be used to facilitate the identification of interacting target proteins and pathways (see below). With access to such idealized collections of small molecules, the challenge for the field of chemical biology includes: (a) determining which of these molecules have specific effects on biological systems (at various levels of resolution from proteins to whole organisms), (b) determining the structural and physicochemical properties of molecules that specify associated biological activities, and ultimately (c) directing future synthetic efforts along particular pathways in the synthetic network to effectively produce small molecules that modulate biological systems in any desired manner.
6.3.7 The “Target Identification” Problem
Like its classical genetic counterpart, an important aspect of forward chemical genetics is the reliance on the ability of biological systems to reveal a set of possible targets that when perturbed creates a desired phenotype [4-7, 10]. However, reliance on phenotype alone to select active small molecules requires that the exact nature of the molecular interactions that give rise to the phenotype be further investigated, usually by lower-throughput methods. This situation differs from efforts directed toward target validation through indirect means, such as loss of function caused by gene targeting, overexpression, or reduction in expression by RNAi. By considering the effects of small molecules on intact biological networks as part of the initial discovery process, the logic of forward chemical genetics is a reversal of the logic of most of the current efforts in drug discovery. Current drug discovery often picks a specific molecular target based on indirect means of target validation, and then optimizes the interactions of small molecules with a network of main- and side-chain interactions from an individual polypeptide in vitro or in silico. Since the eventual aim of the drug discovery approach is to use the small molecule in the context of intact living systems, the full spectrum of phenotypic effects is later explored only for a few select compounds. As such, there exists a paucity of information about the phenotypic effects of large collections of small molecules. Such information would help enable the design of new probes and generations of small-molecule therapeutics. Besides the identification of the targets of the immunosuppressant compounds CsA and FK506 described above, there are a growing number of successful examples of identifying the targets of small molecules found in forward chemical-genetic screens (Table 6-2) [50]. However, as was true for early geneticists who used random mutagenesis to introduce genetic variation and then faced the challenge of identifying where in the genome the mutation was, the most challenging aspect of forward chemical genetics, and the rate-limiting step in the discovery cycle, is the identification of the target of the small-molecule perturbation.
To be successful in targeting the myriad possible gene products that might result in a desired phenotypic effect, chemical genetics requires access to diverse small molecules that incorporate structural features to assist in target identification and resynthesis. One method of target identification that requires modification of the small molecule, which was the approach taken to identify the cellular targets of CsA and FK506, involves the fractionation of cellular extracts with an affinity matrix covalently modified with the biologically active small molecule. A classic example of this approach is the identification of the target of the microbially derived cyclotetrapeptide trapoxin B (Fig. 6-7) [51]. Like trichostatin A and butyrate [52], trapoxin B was known at the time to cause both reversion of oncogene-transformed fibroblast cells and the accumulation of acetylated histones [51]. However, unlike trichostatin A and butyrate, trapoxin B was found to be an irreversible inhibitor of the deacetylation of histones, and its cellular and in vitro activity were dependent on the presence of the epoxide functionality [51].
Fig. 6-7 Target identification of an inhibitor of histone deacetylation. (a) Cap-linker-chelator model of HDAC inhibitors and structures of trichostatin A and trapoxin B. (b) Histone acetyltransferase (HAT) activity opposes that of HDAC activity. (c) Synthesis of K-trap affinity matrix that led to the identification by affinity chromatography of HDAC1 [53]. (d) Crystal structure of trichostatin A in an HDAC-like protein revealing chelation by the hydroxamate of a metal atom important to the hydrolytic activity of the enzyme [55].
Since trapoxin by itself was not directly amenable to modification to facilitate target identification, Taunton and colleagues used a total of 20 steps from commercially available starting material to replace one of the amino acid moieties (phenylalanine) in the cyclic ring with a lysine group, affording a modified trapoxin B, named K-trap, which could be directly attached to a solid support (Affi-Gel 10) [53]. After first using subcellular fractionation and anion exchange chromatography to reduce the complexity of the proteome of human cells, the K-trap affinity matrix isolated two nuclear proteins that copurified with histone deacetylase activity [54]. Using peptide microsequencing, a complementary DNA (cDNA) encoding the histone deacetylase catalytic subunit 1 (HDAC1) was identified, which showed sequence similarity to Rpd3p, a known transcriptional regulator in yeast [54]. Since the discovery of HDAC1, the family of HDAC-related enzymes has grown to include a total of 11 paralogs, and is now the subject of both research and clinical investigation. As reviewed recently, these proteins have emerged as multifunctional nodes involved in many cellular processes including cell-cycle progression, cellular differentiation, transcriptional regulation, cytoskeletal
dynamics, and protein trafficking [55, 56]. Histone hyperacetylation induced
by HDAC inhibitors, such as trichostatin A and trapoxin B, correlates with gene expression, cell-cycle arrest, cell differentiation, and cell death depending on the cell type, the duration of treatment, and the concentration used. As a result, there is a growing interest in developing means to modulate HDAC activity, both as research tools and as therapeutic agents. HDAC inhibitors have been proposed for treatment of cancer as well as neurodegenerative disorders associated with mutations in polyglutamine-encoding tracts [57]. In addition, agents already used clinically for other purposes, such as valproate (which is used for the treatment of epilepsy and bipolar disorder, and as an adjuvant therapy for schizophrenia), inhibit HDACs and cause histone hyperacetylation in cultured cells [58]. Further research aimed at elucidating a functional role for acetylation of proteins other than histones is necessary to understand better the physiological targets of protein deacetylases and the mechanisms by which HDAC inhibitors mediate their spectrum of phenotypic effects (see below for an example of identifying inhibitors of protein deacetylases with selectivity patterns different from that of trichostatin A). A second method of target identification involves preparing radiolabeled derivatives of the small molecule and determining the molecular targets that are labeled, perhaps covalently, by these radioactive probes. Ideally, covalent labeling allows the small molecule-protein complex to be isolated under the denaturing conditions of sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), or detected by mass spectrometry as an altered mass of a given peptide or protein. An excellent example of this approach is the identification of the target of the steroidal alkaloid cyclopamine by Chen and colleagues (Fig. 6-8) [59].
Cyclopamine had been known for many years to possess both teratogenic and antitumor activities, and prior to their work had been shown to inhibit the Hedgehog signaling pathway in vertebrate cells and organisms, but through unknown mechanisms. By synthesizing a 125-iodine-labeled photoaffinity derivative (125I-PA-cyclopamine), they were able, on light activation and consequent cross-linking, to detect labeling of Smoothened, a seven-transmembrane protein whose activity is regulated by the Hedgehog receptor Patched, when expressed in COS-1 cells [59, 60]. In further support of the target being Smoothened, a fluorescent (BODIPY)-cyclopamine derivative was synthesized, and this probe fluorescently labeled the membrane region of cells that express the Smoothened target in a manner that could be competed using cyclopamine itself [59, 60]. A third method of target identification uses a “three-hybrid” transcriptional activation system that anchors a derivative of the active ligand for display against a library of cDNAs fused to a transcriptional activation domain [61]. A fourth method involves the use of mRNA expression analysis to identify targets and associate patterns of gene expression with specific perturbations [33, 62]. A fifth method involves the display of target proteins on phage [63].
Fig. 6-8 Target identification of an inhibitor of Hedgehog signaling. (a) Structure of the alkaloid cyclopamine. (b) Photoaffinity and radioactive derivative of cyclopamine [59, 60]. (c) Fluorescently labeled derivative of cyclopamine [59, 60].
Lastly, with the recent advent of microarray technology and the development of increasingly large collections of recombinant proteins from a variety of organisms, including humans, it has become possible to search for the protein targets of a small molecule in a high-throughput manner using protein microarrays (Fig. 6-9) [42, 64]. This approach, in conjunction with libraries of small molecules that can be easily modified to include a fluorescent label, provides a very promising path forward for target identification. In addition to these biochemical methods, genetic mutations that render a cell or organism resistant to the effects of a small molecule have also been used to identify the target of small molecules and other components of the interacting pathway. Now, with the advent of collections of genome-wide deletion strains in S. cerevisiae, and related knock-down collections created using RNAi, the loss of function of genes and the matching of mutants with similar phenotypes is being used to suggest candidate targets for further testing [10, 65-68]. Another approach uses multicopy gene suppression, in which cells expressing a genomic library are screened for sensitivity or resistance to a particular small molecule [69]. While the success of biochemical approaches depends on both the specificity of the compound and its affinity, the success of genetic approaches depends on both the specificity of the compound and the availability of existing mutant phenotypes to match the observed phenotypic defects or to discover an interacting mutation. Technical developments in both biochemical and genetic methods, along with the use of computational science described below, will continue to provide improved solutions for target identification in the years to come.
6.3.8 Relationship between Network Connectivity and Discovery of Small-molecule Probes
Fig. 6-9 Target identification of a suppressor of rapamycin [42]. (a) SMIR4, a suppressor of rapamycin identified using a chemical-genetic modifier screen. (b) Identification of gene products that interact with biotin-SMIR4 using a yeast protein microarray [42].
A question raised by chemical-genetic screens is why some proteins are targeted by small molecules more frequently than others. For example, an antimitotic screen performed by Haggarty and colleagues [70] yielded over 80 small molecules that directly targeted tubulin, whereas two structurally distinct small molecules that arrested cells in mitosis without targeting tubulin were later shown to target the motor protein kinesin-5 (monastrol [71] and HR22C16 [72]). Similarly, DeBonis and colleagues, by screening growth inhibitory compounds obtained from the National Cancer Institute collection, identified S-trityl-L-cysteine, gossypol, flexeril, and two phenothiazines as kinesin-5 inhibitors [73]. Kau and colleagues, in a screen for inhibitors of FOXO1a nuclear export, found many general inhibitors of nucleocytoplasmic transport, which, like the natural product leptomycin, directly bind the nuclear export factor CRM1 [38]. In addition, multiple antagonists of calmodulin signaling were identified [38]. Are some proteins simply more susceptible to modulation by small molecules or do biases exist in the way that targets are identified? One explanation for these observations is provided by emerging models of the global organization of cellular networks in which gene products are modeled as nodes and the functions of genes are represented by edges [11-15]. In these
models, where protein and genetic interaction networks are robust and have a power-law distribution of edges, if a random perturbation results in a change in phenotype, then the perturbation is more likely to target a highly connected node (a node with many edges) than a node with a low degree of connectivity. The relevance of these network properties can be illustrated by the following experiment designed to simulate the act of screening small molecules in a cell-based assay. Consider four nodes (modeling proteins) with degrees of one, two, three, and four, respectively, where each edge models a function of a protein, such that the total number of edges equals 10. If these nodes are randomly sampled by picking an edge (simulating a molecular recognition event in which a small molecule modulates a protein function), then even though each node would have a 25% chance of being picked if the nodes themselves were sampled uniformly, 70% of the time nodes of a degree equal to or greater than three will be selected (assuming replacement after each selection). This preferential selection of highly connected nodes is due to the increased probability of interacting with a node with many edges. Thus, if we consider that biological systems have evolved over time, and that many gene products have been formed by reusing protein domains (e.g., immunoglobulin or GTP-binding domains) and by gene duplications, then identifying small molecules with similar phenotypic effects in evolutionarily distant organisms may provide a method for mapping the chemical properties of highly connected and, therefore, functionally important nodes in biological networks.
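The four-node thought experiment above can be checked with a few lines of code. The following is a minimal sketch, assuming that every edge is equally likely to be hit; the node names, the number of draws, and the random seed are illustrative choices and are not taken from the text.

```python
import random
from fractions import Fraction

# Four hypothetical nodes (proteins) with degrees 1, 2, 3 and 4 (10 edges in total).
DEGREES = {"node_1": 1, "node_2": 2, "node_3": 3, "node_4": 4}


def exact_hit_probability(degrees, min_degree):
    """Probability that a uniformly chosen edge belongs to a node of degree >= min_degree."""
    total = sum(degrees.values())
    hits = sum(d for d in degrees.values() if d >= min_degree)
    return Fraction(hits, total)


def simulate_screen(degrees, n_draws, seed=0):
    """Sample edges uniformly, with replacement, and count how often each node is hit."""
    rng = random.Random(seed)
    # One list entry per edge, labeled with the node that owns it.
    edges = [node for node, d in degrees.items() for _ in range(d)]
    counts = dict.fromkeys(degrees, 0)
    for _ in range(n_draws):
        counts[rng.choice(edges)] += 1
    return counts


N_DRAWS = 100_000
p_exact = exact_hit_probability(DEGREES, min_degree=3)  # (3 + 4) / 10
counts = simulate_screen(DEGREES, n_draws=N_DRAWS)
p_simulated = (counts["node_3"] + counts["node_4"]) / N_DRAWS
print(f"exact: {float(p_exact):.2f}, simulated: {p_simulated:.3f}")
```

The exact calculation gives 7/10, in agreement with the 70% quoted above, and the simulated frequency should fall close to that value.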
In support of this, many small molecules target functionally important nodes in mammalian cells and have similar biochemical interactions and phenotypic effects in organisms such as S. cerevisiae. These include rapamycin (inhibitor of TOR proteins), FK506 (calcineurin phosphatase inhibitor), trichostatin A (histone deacetylase inhibitor), colchicine/nocodazole (microtubule destabilizers), taxol (microtubule stabilizer), latrunculin B (actin microfilament destabilizer), brefeldin A (inhibits ADP ribosylation), etoposide/camptothecin (topoisomerase inhibitors), wortmannin (phosphatidylinositol kinase inhibitor), staurosporine (protein kinase C inhibitor), UCN-01 (Chk1/2 inhibitor), caffeine (ATM/ATR kinase inhibitor), and roscovitine (cyclin-dependent kinase inhibitor). Testing the hypothesis that there exists a correlation between the connectivity of proteins in a biological network and the likelihood of finding a modulating small molecule by screening will require further characterization of the targets of biologically active small molecules in multiple biological systems, and the analysis of the connectivity of these targets in the relevant biological network.
6.3.9 Computational Framework for Forward Chemical Genetics: Legacy of Morgan and Sturtevant
On testing a set of small molecules in a chemical-genetic screen, it is natural to ask how the same small molecules, or ones that are close structural analogs, performed in other related or unrelated chemical-genetic screens. As a result of numerous such screens now available in the public domain, the resulting datasets allow this question to be answered, but the size and complexity (in terms of the number of possible comparisons between objects) of the datasets require the use of computational tools that are designed for visualization and pattern recognition in high-dimensional spaces. The need to develop a suitable computational framework is reminiscent of the need of classical geneticists close to a century ago to develop an analytical framework to guide the then nascent field. At that time, geneticists such as Thomas H. Morgan and his graduate student Alfred H. Sturtevant were struggling to understand the nature of Mendelian genes and trying to interpret a growing amount of observational data on heritable variation collected using forward genetic screens in the fruit fly Drosophila [2]. Particularly puzzling was the pattern of inheritance of combinations of traits that did not sort independently during meiosis as predicted by Mendel’s second law (law of independent assortment) [1]. After many years of collecting mutants and analyzing data, Morgan and Sturtevant recognized that the “. . .frequency of crossing over (recombination) furnish[ed] evidence of the linear order of the elements (genes) in each linkage group and of the relative position of the elements (genes) with respect to each other” [2]. Accordingly, mutant genes (or allelic variation) could be “mapped” as a point in a one-dimensional space using the metric (measured in centiMorgans) of 1% recombination equal to one map unit. By making overlapping distance measurements, it was discovered that a genetic map corresponding to the relative arrangement of genes in the linear space could be constructed.
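As an illustration of how overlapping distance measurements determine a linear order, the following minimal sketch places three genes on a one-dimensional map. The gene names and recombination frequencies are hypothetical, and the sketch assumes that map distances are exactly additive, which real recombination data only approximate (for example, because of double crossovers).

```python
from itertools import permutations

# Hypothetical pairwise recombination frequencies in percent (1% = 1 map unit = 1 cM).
RECOMBINATION = {
    frozenset({"a", "b"}): 5.0,
    frozenset({"b", "c"}): 12.0,
    frozenset({"a", "c"}): 17.0,
}


def distance(g1, g2):
    """Map distance between two genes, independent of the order in which they are given."""
    return RECOMBINATION[frozenset({g1, g2})]


def order_genes(genes):
    """Pick the linear order whose adjacent distances best add up to the end-to-end distance."""
    def misfit(order):
        adjacent = sum(distance(x, y) for x, y in zip(order, order[1:]))
        return abs(adjacent - distance(order[0], order[-1]))
    return min(permutations(genes), key=misfit)


def map_positions(order):
    """Cumulative position of each gene along the map, starting from 0 at the first gene."""
    positions = {order[0]: 0.0}
    for prev, gene in zip(order, order[1:]):
        positions[gene] = positions[prev] + distance(prev, gene)
    return positions


gene_order = order_genes(["a", "b", "c"])
positions = map_positions(gene_order)
print(gene_order, positions)
```

Because the largest distance (17 map units) separates genes a and c, gene b must lie between them, which is the same reasoning that underlies a three-point map.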
From these genetic maps, it became apparent that the deviation observed from Mendel’s law of independent assortment could be explained by “linkage” of genes due to their location within a similar position in the space representing the underlying DNA sequence [2]. Although not obvious at the onset of Morgan and Sturtevant’s studies, the maps of these genetic spaces are now known to correspond physically to the arrangement of genes within a linear and continuous sequence of the DNA constituting a chromosome. In the end, the recognition that genes could be arranged as a linear series provided the conceptual foundation for the eventual sequencing of the complete genomes of humans and other model organisms [3].
6.3.10 Mapping of Chemical Space Using Forward Chemical Genetics
By analogy to the framework for classical genetics developed by Morgan and colleagues, the development of an experimentally driven, computational framework for chemical genetics, which allows the “mapping” of the functional units (chemicals) that can induce variation in biological systems, holds the potential to revolutionize the discovery of small-molecule probes for basic research and, potentially, the discovery of novel therapeutic targets and agents [74-76]. But how can biologically active small molecules be “mapped” as points (loci) in a space? If they can be mapped, what would the global properties of this space look like and, moreover, what might the global properties of such a space reveal about the nature of the interaction of small molecules with biological systems? While it is much too early to have a full answer to these questions, a number of ideas have emerged as to how the “mapping” of small molecules using biological descriptors might be approached. Unlike genes, which are physically located at a locus on a chromosome based on their linkage to other sequences of DNA (although they may move owing to transpositions and recombination events), small molecules that induce phenotypic variation in biological systems are themselves not physically located in a space. Thus, if small molecules are to be mapped to a common space, then the space must be considered an “abstract space” in the sense that it is mathematically derived [74-76]. This abstract space, which we will refer to as “chemical space”, is formed by multiple dimensions, or axes, such that the relative distance between small molecules represented by points becomes a measure of their structural or functional similarity. The notion is that certain regions in this space correspond to small molecules that have similar structure or function. According to such a framework, the corresponding data structure for analyzing chemical space is most often that of a two-dimensional array, or matrix, denoted by S, consisting of an ordered array of n columns and m rows (Fig. 6-10). Each column ($\mathbf{y}_j$) in S corresponds to a descriptor, and is denoted by a boldface, lowercase letter subscripted j (where j = 1 to n). Each row ($\mathbf{x}_i$) in S corresponds to a chemical, and is denoted by a boldface, lowercase letter subscripted i (where i = 1 to m).
Accordingly, an element of S in row i and column j encodes the value of descriptor j for chemical i. This allows the elements of S to be considered as coordinates in a multidimensional space spanned by the descriptor axes, which, in turn, allows each chemical to be represented as a vector whose magnitude and direction are given by the corresponding values in S, $\mathbf{x}_i = [e_1, e_2, \ldots, e_n]$, where $e_j$ is the entry of row i for descriptor j. In this matrix-based representation of chemical space, the relative distance between chemicals $\mathbf{x}_i$ becomes a measure of their similarity with respect to the particular descriptors considered. As depicted in Fig. 6-10, when considering the dimensions or axes of chemical space there are two fundamentally different classes of descriptors that are used: computed and measured [74-76]. These classes differ insofar as the former are generally calculated using a computer and various algorithms designed to determine the value of a specified mathematical function [77, 78], whereas the latter involve the observation of the effect of a given small molecule on, for example, the function of a gene product (nucleic acids, proteins) or metabolite (carbohydrate, lipid, other organic molecules) [79, 80]. Recognizing the distinction between chemical spaces derived from computed descriptors and those derived from measured descriptors is of fundamental importance. While the former are unambiguously definable, the latter involve the process of observation, and as such involve noise inherent to the process of measurement. Measured phenotypic descriptors are also subject to the influence of a variety of other variables, including the dose of the chemical, length of treatment, and the genotype of the biological system.
Fig. 6-10 Mapping chemical space [76]. Principal component models of chemical space are shown for 480 small molecules analyzed using 24 computed molecular descriptors and 60 measured phenotypic descriptors derived from a cell-based assay of cell proliferation. By considering the elements of S as coordinates, small molecules can be modeled as vectors, $\mathbf{x}_i = [e_1, e_2, \ldots, e_n]$, in an n-dimensional vector space. By defining the Euclidean distance D between two vectors (e.g., $\mathbf{x}_1$ and $\mathbf{x}_2$) in this vector space to be $D_{12} = \sqrt{\sum (\mathbf{x}_1 - \mathbf{x}_2)^2}$, where the sum runs over the n vector components, the space of chemical-genetic observation can be considered as a metric space. This means the relative distance D between chemicals $\mathbf{x}_i$ is informative with respect to similarity between the particular descriptors considered. Accordingly, small molecules $\mathbf{x}_i$ can be considered to be functionally similar if they are closely positioned (i.e., within a specified radius) in the underlying descriptor space. Since similarity between small molecules is determined by the pattern of interaction with biological systems, the corresponding distance metric D complements the definition of similarity obtained from calculated molecular descriptors based on chemical structure. Furthermore, since similarity in cell-based assays results from patterns of small molecules interacting with expressed gene products, the corresponding distance metric D complements the definition of similarity obtained from DNA sequence or gene-expression analysis.
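The matrix representation and the Euclidean distance described above can be sketched in a few lines. This is a minimal illustration only: the descriptor values in S, the number of chemicals and descriptors, and the radius are hypothetical and do not correspond to the data of Fig. 6-10.

```python
import math

# Hypothetical descriptor matrix S: m = 3 chemicals (rows), n = 4 descriptors (columns).
S = [
    [1.0, 0.0, 2.0, 1.0],  # x_1
    [1.0, 0.0, 2.0, 1.0],  # x_2, same descriptor profile as x_1
    [4.0, 4.0, 2.0, 1.0],  # x_3
]


def euclidean_distance(x_a, x_b):
    """Square root of the summed squared differences over all descriptor axes."""
    if len(x_a) != len(x_b):
        raise ValueError("vectors must have the same number of descriptors")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_a, x_b)))


def distance_matrix(matrix):
    """All pairwise distances between the rows (chemicals) of the matrix."""
    return [[euclidean_distance(r1, r2) for r2 in matrix] for r1 in matrix]


def neighbors_within(matrix, index, radius):
    """Indices of the chemicals lying within `radius` of the chemical at `index`."""
    return [
        j for j, row in enumerate(matrix)
        if j != index and euclidean_distance(matrix[index], row) <= radius
    ]


D = distance_matrix(S)
print(D[0][1], D[0][2])             # distances x_1 to x_2 and x_1 to x_3
print(neighbors_within(S, 0, 1.0))  # chemicals within radius 1.0 of x_1
```

In practice the descriptors would usually be standardized before distances are computed, since axes measured on larger scales otherwise dominate the result.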
Most representations of the structure of small molecules are themselves graphical models of chemicals embedded in a three-dimensional space and projected onto the two-dimensional plane of the paper (or screen) [81]. While such models are useful for visualization purposes, for computational purposes small molecules are best represented more abstractly in the form of an adjacency matrix. This adjacency matrix encodes the connectivity of a graph in which atoms are the nodes and bonds are the edges between nodes (Fig. 6-11). Once represented in this manner, the structure of a small molecule can be analyzed using various graph- and information-theoretic descriptors to quantify topological properties, along with physicochemical properties, such as the molecular weight and estimations of the partition coefficient between octanol and water (cLogP) [74, 75, 81]. This format enables a quantitative definition of molecular “similarity”, and provides a means to create a map representing the relative position of small molecules in a space formed from their descriptors (see below) [77, 78]. One challenge with using molecular descriptors to create maps of chemical space that can both locally and globally predict biological activity is that a given chemical can exist as a variety of structures corresponding to various protonation, tautomeric, and stereochemical states depending on the molecule’s environment [44, 78]. Another challenge is the ability of enzymes to metabolize small molecules into what might be either an active or inactive component. Together, these and other factors contribute to the difficulty of predicting the function of a small molecule, particularly in the context of an intact living system as complex as the human body.
Nonetheless, since chemical space can be explicitly defined using specific algorithms to compute molecular descriptors, it seems reasonable to expect that a universally agreed upon set or perhaps biological, mechanism-specific sets of molecular descriptors will be useful for creating maps of chemical space.
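As a concrete illustration of the adjacency-matrix representation and of simple computed descriptors, the following minimal sketch encodes the heavy atoms of acetic acid, a molecule chosen here purely as an example and not taken from the text. Hydrogens are left implicit and inferred from standard valences, which is an assumption that holds only for simple neutral molecules.

```python
# Heavy atoms of acetic acid (CH3-COOH); hydrogen atoms are not nodes in the graph.
ATOMS = ["C", "C", "O", "O"]  # methyl C, carboxyl C, carbonyl O, hydroxyl O

# Entry [i][j] holds the bond order between atoms i and j (0 = not bonded).
ADJACENCY = [
    [0, 1, 0, 0],
    [1, 0, 2, 1],
    [0, 2, 0, 0],
    [0, 1, 0, 0],
]

VALENCE = {"C": 4, "O": 2}  # standard valences, assumed for neutral atoms
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999}


def node_degrees(adjacency):
    """Number of heavy-atom neighbors of each atom (a simple topological descriptor)."""
    return [sum(1 for order in row if order > 0) for row in adjacency]


def implicit_hydrogens(atoms, adjacency):
    """Hydrogens needed to fill each atom's valence after counting its bond orders."""
    return [VALENCE[a] - sum(row) for a, row in zip(atoms, adjacency)]


def molecular_weight(atoms, adjacency):
    """Molecular weight from the heavy atoms plus the implicit hydrogens."""
    heavy = sum(ATOMIC_MASS[a] for a in atoms)
    hydrogens = sum(implicit_hydrogens(atoms, adjacency))
    return heavy + hydrogens * ATOMIC_MASS["H"]


degrees = node_degrees(ADJACENCY)
hydrogens = implicit_hydrogens(ATOMS, ADJACENCY)
weight = molecular_weight(ATOMS, ADJACENCY)
print(degrees, hydrogens, round(weight, 3))
```

Descriptors such as cLogP require fragment- or atom-based estimation schemes and are not attempted in this sketch.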
Fig. 6-11 Small molecules as chemical graphs [81]. Representation of the structure of small molecules as graphs encoded by an adjacency matrix that specifies the type of node (atom), the type of edge (bond), and the connectivity of nodes. Hydrogen atoms are not considered as nodes in the graph.
In contrast to computed molecular descriptors, observed or phenotypic descriptors involve measurements of the effects of a small molecule on a biological system. Accordingly, phenotypic descriptors provide the opportunity to classify chemical structures by creating maps of chemical space according to biologically relevant descriptors (Fig. 6-12) [74-76]. Given the wide range of observable properties of biological systems, the challenge for mapping chemical space using chemical genetics is to determine the most relevant phenotypic descriptors and to measure them in a sufficiently high-throughput manner, which in turn may depend on the biological system and process being studied. Ultimately, it is the relationships between the positions of small molecules in different chemical spaces that will allow researchers to understand the chemotype-phenotype mapping at increasing resolutions (Fig. 6-13).
6.3.11 Dimensionality Reduction and Visualization of Chemical Space
Given a multidimensional matrix of data derived from chemical-genetic screens and computed molecular descriptors (Fig. 6-10), meaningful visual and compact representations are required to allow for data exploration and to facilitate subsequent modeling efforts aimed at understanding the relationships between objects (small molecules and assays). To solve related problems in other fields of study, a variety of “dimensionality-reduction” and pattern-finding techniques have been developed [77-79]. Although differences exist in the specific algorithms, the techniques share the common goals of extracting trends and information not otherwise apparent from manual inspection and of providing a more compact representation or model of the data. In doing so, dimensionality-reduction and pattern-finding techniques allow for the creation of higher-level representations of the information inherent in the lower-level relational data within a large data matrix.
Fig. 6-12 Mapping chemical space using multidimensional phenotypic descriptors. Phenotypic data from multiple assays are arranged in a chemical-genetic data array and computational methods are used to select small molecules for further characterization. Clustering and the construction of chemical-genetic networks provide methods for visualization of high-dimensional observation spaces and pattern finding.
Fig. 6-13 Overview of chemical space. On the left, chemicals are positioned in space using computed molecular descriptors. On the right, chemicals are positioned in space using measured phenotypic descriptors of biological activity.
In general, two types of such “learning” techniques are used: supervised and unsupervised. In supervised learning, a set of labeled or known data is used to classify the rest of an unknown dataset. Alternatively, in unsupervised learning the goal is to discover a “natural” grouping of objects without knowledge of any class labels. One method of unsupervised learning that has proved useful for analyzing data from chemical-genetic screens is called clustering. This method attempts to cluster objects into sets that are somehow related on the basis of a set
of descriptors. For example, consider a model dataset consisting of seven
SMPs (SMP-1to -7) and a control treatment (e.g., only organic solvent), which are subject to an array of five, chemical-genetic screens consisting of three cell-based assays measuring: (a) neurite extension, (b) neuron viability, and (c) synapse formation, and two in vitro assays with cell extracts to measure the polymerization of: (d) actin, and (e) tubulin (Fig. 6-14(a)).In the resulting data matrix, a value of “1” encodes the observation that the SMPs were active in the assay and otherwise a value of “0” is used. Even with such a small dataset, which uses a binary rather than a continuous valued measure, the challenge of defining the major activity patterns and the compounds that are similar to each other becomes apparent. What exactly does “similar” mean and how is it computed? Although for binary data other distance, metrics are in general more appropriate (e.g., Tanimoto metrics), for simplicity we can compute the standardized (to the mean and standard deviation of the distribution) Pearson correlation matrix, which contains the correlation coefficients between each of the five assays. These data can then be used to cluster the chemicals based on their correlation as a metric of similarity. The groupings depicted in Fig. 6-14(b)
Fig. 6-14 Cluster analysis of multidimensional, chemical-genetic data. (a) Example of seven small-molecule perturbagens (SMP-1 to -7) and their activity in five phenotypic assays (neurite extension, neuron viability, synapse formation, actin, and tubulin). A value of “1” indicates activity and a value of “0” indicates that the compound was inactive. (b) Dendrogram showing clustering of the small molecules. (c) Dendrogram showing clustering of the assays.
reflect the fact that, of the seven SMPs, some had identical patterns of activity (analogous to mutations mapping to the same region of a chromosome), while others showed varying levels of common activity (analogous to mutations mapping to different regions of a chromosome). Likewise, by transposing the data matrix and considering the small molecules as descriptors for the phenotypic assays, it becomes possible to use the information encoded in the pattern of interaction of small molecules with biological systems to classify the assay measurements instead of the small molecules (Fig. 6-14(c)). Just as for the small molecules, the resulting data create a high-dimensional, information-rich signature of the biological system being probed, which in turn can be used for pattern recognition and classification. The activity patterns from small-molecule descriptors can provide a measure of the diversity of particular cell types or cell states when subject to additional perturbations, such as those provided by natural genetic variation and chemical-genetic modifiers. When characterizing different genotypes, the generation of these “perturbation profiles”, by analogy to mRNA profiling, has been referred to as chemical-genomic profiling (see below) [82]. The nature of these profiles can shed light on the underlying chemical differences between cell states, and may eventually be useful as cellular network-based diagnostics to complement the traditional use of DNA sequence analysis. However, to date only a few studies have purposefully used the patterns of activities of small molecules to classify biological systems. Besides clustering, which has been widely used to group small molecules into various structural and activity classes, another method of dimensionality reduction for multidimensional chemical-genetic screening is principal component analysis (PCA).
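Returning to the clustering of the model dataset, the grouping of small molecules and, after transposition, of assays can be sketched in a few lines. This is a minimal illustration rather than the procedure used to produce Fig. 6-14: the 0/1 profiles below are hypothetical values chosen only so that two pairs of SMPs share identical patterns, the Tanimoto coefficient stands in for the similarity measure, and the single-linkage rule, the thresholds, and all function names are assumptions made for the example.

```python
from itertools import combinations

ASSAYS = ["neurite extension", "neuron viability", "synapse formation",
          "actin polymerization", "tubulin polymerization"]

# Hypothetical 0/1 activity profiles (NOT the values of Fig. 6-14(a)).
PROFILES = {
    "SMP-1":   (1, 1, 1, 1, 0),
    "SMP-2":   (1, 1, 0, 0, 1),
    "SMP-3":   (1, 1, 1, 1, 0),
    "SMP-4":   (0, 1, 0, 0, 1),
    "SMP-5":   (1, 0, 1, 0, 0),
    "SMP-6":   (0, 0, 0, 1, 0),
    "SMP-7":   (1, 0, 1, 0, 0),
    "Control": (0, 0, 0, 0, 0),
}


def tanimoto(a, b):
    """Tanimoto similarity of two binary profiles: shared actives / total actives."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention assumed here: two all-inactive profiles count as identical.
    return both / either if either else 1.0


def single_linkage(profiles, threshold):
    """Merge clusters while any cross-cluster pair has similarity >= threshold."""
    clusters = [{name} for name in profiles]
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            if any(tanimoto(profiles[a], profiles[b]) >= threshold
                   for a in c1 for b in c2):
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break  # restart the scan on the updated cluster list
    return sorted(sorted(c) for c in clusters)


def transpose(profiles, assays):
    """Swap roles: each assay is now described by its pattern across treatments."""
    return dict(zip(assays, zip(*profiles.values())))


if __name__ == "__main__":
    print(single_linkage(PROFILES, threshold=1.0))  # identical profiles only
    print(single_linkage(PROFILES, threshold=0.5))  # looser grouping
    print(single_linkage(transpose(PROFILES, ASSAYS), threshold=0.5))
```

Lowering the threshold step by step and recording which clusters merge at each level gives the information a dendrogram displays; a dedicated hierarchical-clustering routine would normally be used for real screening data.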
Unlike clustering, PCA does not assign small molecules to discrete groups by imposing a particular structure on the data (i.e., to form clusters). Instead, to analyze the diversity of small molecules, PCA applies a linear transformation to the original system of axes formed by the n dimensions of the data matrix, where n is the number of descriptors. This transformation takes the form of a Euclidean distance-preserving rotation, the directions of which are determined by computing a set of eigenvectors and corresponding eigenvalues of a diversity matrix created by computing a standardized covariance matrix (i.e., Pearson correlation coefficients). The resulting eigenvectors provide a new set of linearly independent, orthogonal axes, called factors or principal components, each of which accounts for successive directions in the n-dimensional ellipsoid spanning the multivariate distribution of the original data. The corresponding eigenvalues account for progressively smaller fractions of the total variance in the original data. Accordingly, PCA creates a global model that minimizes the information lost on projection into a space of reduced dimensionality, and is thus well suited for exploring complex activity patterns and datasets that do not have a clustered structure. Besides allowing for visualization of multidimensional data, PCA has a practical application for data analysis, as the reduced number of dimensions simplifies subsequent computations that may be memory- and time-intensive. While PCA
provides a readily computable, linear dimensionality reduction affording linear
combinations of descriptors that allow for the maximum amount of variance to be described by a minimum set of descriptors, a number of algorithms with improved outcome have been described, and others will undoubtedly be developed in the years to come. Following the example of the model dataset shown in Fig. 6-14(a), to perform PCA the correlation matrix is first computed to reveal the relationships between the descriptors being considered. From the correlation matrix, the eigenvalues and corresponding eigenvectors are computed (Fig. 6-15(a)). The eigenvalues indicate the quality of the dimensionality reduction from the original multidimensional space. For ideal representations, the first two or three eigenvalues will correspond to a high percentage of the variance. Each eigenvalue corresponds to a factor (a linear combination of the initial descriptors that is uncorrelated with the other factors), and each factor corresponds to one dimension in the new space. In this example (Fig. 6-15(a)), the first eigenvalue equals 2.43 and represents 48.5% of the total variability. This means that if we were to represent the data on only one axis we would still be able to see 48.5% of the total variability of the data. The “cumulative %” calculated from the eigenvalues provides an idea of the global variability represented when using the axes of interest. Using the corresponding eigenvectors to create a new rotated set of axes, the SMPs can be seen distributed throughout the resulting assay measurement space, with the distance between them in the reduced space (here three of the five original dimensions) a measure of their similarity (Fig. 6-15(b)). Thus, as with the cluster analysis, we conclude that the compounds in each of the pairs SMP-1 and -3, and SMP-5 and -7, are the same, with the distance between the other compounds a measure of their functional differences.
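The link between the first eigenvalue and the quoted percentage can be checked by hand. A correlation matrix R of n standardized descriptors has ones on its diagonal, so its eigenvalues sum to its trace, n. Assuming that all five assay descriptors enter the correlation matrix (n = 5), the fraction of variance carried by the first factor is

```latex
\frac{\lambda_1}{\sum_{i=1}^{5} \lambda_i}
  = \frac{\lambda_1}{\operatorname{tr}(R)}
  = \frac{2.43}{5}
  \approx 0.486
```

which agrees with the reported 48.5% to within the rounding of the eigenvalue. The “cumulative %” for the first k factors is obtained in the same way, by summing the first k eigenvalues before dividing by n.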
As the size of the dataset and the complexity of the activity patterns increase, methods of analysis like PCA become invaluable tools for discerning the global activity patterns and relationships between objects on the axes [80].
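The PCA steps described in this section (standardize, form the correlation matrix, extract eigenvalues and eigenvectors, project) can be sketched as follows. This is an illustrative sketch under stated assumptions, not a reproduction of Fig. 6-15: the 0/1 matrix is hypothetical, only the leading component is extracted, and power iteration is used in place of a full eigendecomposition so that the example needs nothing beyond the standard library. All function names are invented for the example.

```python
import math

# Hypothetical 0/1 activity matrix: rows are treatments, columns are five assays.
# These values are illustrative and are NOT the data of Fig. 6-14(a).
DATA = [
    [1, 1, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]


def standardize(data):
    """Center each column on its mean and scale it to unit standard deviation."""
    n = len(data)
    cols = list(zip(*data))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    # Assumes no column is constant (a constant column would give sd == 0).
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in data]


def correlation_matrix(data):
    """Pearson correlation matrix of the columns (covariance of standardized data)."""
    z = standardize(data)
    n, p = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / n for j in range(p)] for i in range(p)]


def matvec(m, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def leading_component(corr, iterations=2000):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    v = [float(i + 1) for i in range(len(corr))]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = matvec(corr, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(corr, v)))  # Rayleigh quotient
    return eigenvalue, v


def explained_fraction(corr, eigenvalue):
    """Share of total variance carried by one component (trace = number of descriptors)."""
    return eigenvalue / sum(corr[i][i] for i in range(len(corr)))


if __name__ == "__main__":
    R = correlation_matrix(DATA)
    lam, axis = leading_component(R)
    # Factor scores: projection of each standardized row onto the first axis.
    scores = [sum(z * a for z, a in zip(row, axis)) for row in standardize(DATA)]
    print(round(lam, 3), round(explained_fraction(R, lam), 3))
    print([round(s, 3) for s in scores])
```

Treatments with identical activity profiles receive identical factor scores, which is why such pairs coincide in the reduced space. For real datasets a numerical library routine for symmetric eigenproblems would return all eigenvalues and eigenvectors at once.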
6.3.12 Discrete Methods of Analysis of Forward Chemical-genetic Data
Given a multidimensional matrix of data derived from chemical-genetic screens, it is also possible to use computational tools derived from the field of discrete mathematics and principles, again, borrowed from graph theory [81]. For example, through multiple screens biologically active small molecules can be linked together into a network of chemical-genetic interactions, which can be represented by the graph G = (V, E), where V represents either small molecules or assays and E represents edges indicating the activity of a small molecule in a given assay (Fig. 6-16). To determine that a small molecule is active, a threshold or a statistical measure based on a control distribution of inactive or control compounds can be used. Ultimately, the topology of the
Fig. 6-15 Principal component analysis of multidimensional, chemical-genetic data. (a) Eigenvalues and associated variance, and eigenvectors and associated factor scores computed from the data in Fig. 6-14(a). The matrix of eigenvectors defines a coordinate transform (rotation) that best decorrelates the data into orthogonal linear subspaces. (b) Resulting three-dimensional chemical space created from using the first factors (principal components) as axes.
Fig. 6-16 A chemical-genetic network representing a graph G = (V, E) (data from [82]). Each node (V; circles) represents a biologically active small molecule or a phenotypic assay and each edge (E; line) represents an observed biological activity. Shown here is an undirected, unweighted, bipartite graph with a total of 426 nodes (V) and 1107 edges (E) between small-molecule nodes (colored red or yellow for active; gray for inactive; total of 352) and assay nodes (colored blue; total of 74 in 7 organisms). This “energy-minimized” representation was computed using Pajek v0.72 (see http://vlado.fmf.uni-lj.si/pub/networks/pajek/).
chemical-genetic network for a particular biological system will be determined
by the selectivity of the small molecules and constrained by the properties of the underlying biological networks being studied. This graph-theoretic framework is well suited for visualizing the results of performing chemical-genetic modifier screens iteratively on any of the active products of an assay. Here, each node represents a biologically active small molecule (e.g., an enhancer or a suppressor) that is linked (represented by an edge) to new nodes (small molecules with different functions) through different phenotypic assays. The result is reminiscent of the use of pairs of complexity-generating reactions with an essential product-substrate relationship along a synthetic pathway to create structurally complex and diverse compounds. In this case, each node in the corresponding network represents a discrete chemical entity that can be linked (represented by an edge) to new nodes (small molecules with different structures) through synthetic transformations. Thus, the recognition of “product-substrate” relationships is useful for both the design of diverse collections of small molecules and the exploration of the diversity of biological mechanisms.
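The construction of the bipartite graph G = (V, E) described in this section can be sketched as follows. The example is a minimal illustration under stated assumptions: activity is called when a readout exceeds the control mean by k sample standard deviations (k = 3 is an arbitrary choice, and only increases are scored), and the assay names, readout values, and function names are hypothetical.

```python
from statistics import mean, stdev


def activity_threshold(control_values, k=3.0):
    """Cutoff = control mean + k sample standard deviations (k = 3 is an assumed choice)."""
    return mean(control_values) + k * stdev(control_values)


def build_network(readouts, controls, k=3.0):
    """Return (small-molecule nodes, assay nodes, edges) of the bipartite graph G = (V, E)."""
    edges = set()
    for assay, per_molecule in readouts.items():
        cutoff = activity_threshold(controls[assay], k)
        for molecule, value in per_molecule.items():
            if value > cutoff:  # one-sided call: only increases count as activity
                edges.add((molecule, assay))
    molecules = {m for per_molecule in readouts.values() for m in per_molecule}
    return molecules, set(readouts), edges


def degree(edges, node):
    """Number of edges touching a node (assays hit by a molecule, or molecules active in an assay)."""
    return sum(1 for molecule, assay in edges if node in (molecule, assay))


# Hypothetical readouts in arbitrary units; names and numbers are illustrative only.
CONTROLS = {"assay-A": [1.0, 1.1, 0.9, 1.0], "assay-B": [2.0, 2.2, 1.8, 2.0]}
READOUTS = {
    "assay-A": {"SMP-1": 3.0, "SMP-2": 1.05, "SMP-3": 2.5},
    "assay-B": {"SMP-1": 4.0, "SMP-2": 2.1, "SMP-3": 2.0},
}

if __name__ == "__main__":
    molecules, assays, edges = build_network(READOUTS, CONTROLS)
    print(sorted(edges))
    print({node: degree(edges, node) for node in sorted(molecules | assays)})
```

The degree of a small-molecule node is one simple readout of its selectivity across the assays, while inactive compounds appear as isolated nodes, as with the gray nodes in Fig. 6-16.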
6.4 Applications and Practical Examples
One of the most useful applications of chemical genetics is to reveal the gene products that function in pathways or processes in an unbiased manner. In this section we will describe two practical examples. We will then end with another example of applying collections of small molecules discovered using chemical genetics to study the phenotypic differences of cells with different genotypes in an unbiased, global manner (chemical-genomic profiling).
6.4.1 Example 1: Mitosis and Spindle Assembly
Since Pernice’s description in 1889 of the effects of colchicine, small molecules have played essential roles in dissecting the molecular mechanisms involved in chromosome segregation during mitosis (Fig. 6-2) [83], and later in the discovery of tubulin as the cellular target. Owing to the clinical efficacy of inhibitors of mitosis as antitumor agents, such as paclitaxel (Taxol) [84], which was originally discovered by the National Cancer Institute’s plant natural-product screening program in the early 1960s [85], numerous chemical-genetic screens for inhibitors of mitosis have been performed. Most of these screens have used natural-product extracts as a source of chemical diversity [83]. In an attempt to discover new inhibitors of mitosis from a synthetic library that worked in ways both similar to and different from existing small molecules, Haggarty and colleagues used a collection of 16 320 compounds and both
Fig. 6-17 Forward chemical-genetic screen for inhibitors of mitosis (data from Ref. 73). (a) Overview of mitotic cell cycle. (b) Example of data from one 384-well plate from the cytoblot primary screen with increased TG-3 mAb reactivity indicative of an increased mitotic index. (c) Summary of compound activity from the initial cell-based and in vitro tubulin polymerization assay. (d) Examples of a compound that destabilized microtubules (deploy-2b) and a compound that stabilized microtubules (synstab A).
phenotypic and biochemical assays (Fig. 6-17(a)) [70]. As an initial filter, the compounds were screened using a high-throughput cytoblot assay, in which an antibody is used to detect a posttranslational modification characteristic of the process of interest [31]. This assay used TG-3, a monoclonal antibody (mAb) that recognizes a phosphorylated form of the protein nucleolin formed in mitosis, to report indirectly on the progress of cells through mitosis [86]. Accordingly, small molecules that increase the reactivity of this mAb in cells are likely to have arrested cells in the mitotic state. Since many compounds that were previously shown to arrest cells in mitosis directly affect the polymerization of α- and β-tubulin (the heterodimeric subunits of microtubules), and thereby alter the microtubule dynamics of the mitotic spindle, compounds that scored positive in this initial assay were subsequently tested in an in vitro tubulin polymerization assay. Finally, to classify compounds further based on their phenotypic effects, fluorescence microscopy was used to visualize
the distribution of microtubules, actin, and chromatin in cells treated with compounds of interest. Two rounds of screening 16 320 compounds at ~20-50 µM resulted in the identification of 139 compounds that increased the number of cells in mitosis (Fig. 6-17(b)) [70]. Fifty-two of these compounds destabilized and one compound, named synstab A (for synthetic stabilizer), stabilized microtubules through a direct interaction with tubulin. Although the discovery of small-molecule inhibitors of protein-protein interactions is in general demanding, approximately 0.3% of the compounds screened (53 of 16 320) were found to be direct inhibitors of α/β-tubulin interactions in this study, which illustrates the use of phenotypic screening to identify the components in a pathway that are most easily targeted by small molecules. It also suggests that the toxicity associated with many compounds may be due to their ability to destabilize microtubules. To determine the mechanism of action of the remaining 86 compounds, each was tested in a TG-3 cytoblot assay using cells that had previously been arrested in interphase by the histone deacetylase inhibitor trichostatin A or the topoisomerase II inhibitor ICRF-193 (Fig. 6-18). Under these conditions, none of the compounds allowed cells to accumulate in mitosis, indicating that they require active cell-cycle progression for an increase of reactivity with the TG-3 mAb. Subsequent cellular studies revealed that many of these small molecules cause an altered stability of microtubules in cells in interphase, suggesting that they also targeted tubulin (Fig. 6-17(c)). The common occurrence of compounds targeting microtubules recapitulated what has been observed in natural-product screening, where the sensitivity of cells to perturbation of the mitotic spindle was first observed [83]. This screen, however, identified for the first time compounds that affect the mitotic machinery without directly targeting microtubules. As discussed in Mayer et al.
[71], one of these compounds, named monastrol, causes a unique monopolar phenotype and specifically inhibits the motor protein kinesin-5 (Fig. 6-18). This provided evidence for the first time of a means to perturb the mitotic spindle without directly targeting tubulin. Subsequently, monastrol has been a useful tool for dissecting the molecular mechanisms underlying spindle assembly [87]. Second-generation, more potent kinesin-5 inhibitors have now been discovered and are beginning to be tested in tumor models.
6.4.2 Example 2: Protein Acetylation
To expand further the molecular toolbox available for studying intracellular protein acetylation [88], a number of chemical-genetic screens have been performed. To identify probes of the mechanism through which HDAC inhibitors cause cell-cycle arrest and affect histone acetylation, a “cytoblot” cell-based screen was used to identify small-molecule suppressors of trichostatin A, named the ITSAs (for inhibitor of trichostatin A) (Fig. 6-19) [40].
Fig. 6-18 New activities in chemical space and the target of monastrol. (a) Three-dimensional representation of chemical space showing the position of 15 120 small molecules (colored balls) in a molecular descriptor space derived from the first three principal component axes (W1 W3) obtained from the analysis of the corresponding structural and physicochemical descriptors (data from Refs 40, 41, 70, 80). Inset shows 132 biologically active small molecules colored based on phenotypic data from cell-based assays for suppressors of the topoisomerase inhibitor ICRF-193 (red), suppressors of the histone deacetylase inhibitor trichostatin A (green), and antimitotics (blue). In all, there were 20 suppressors of ICRF-193, 21 suppressors of trichostatin A (ITSAs), 89 antimitotics, and 2 small molecules that scored in both the antimitotic and trichostatin A suppressor screens. Monastrol's location was as shown. Testing of over 30 structurally similar analogs revealed no other active compounds [71]. (b) Cocrystal structure of monastrol with the motor domain of human KSP (Eg5) showing that monastrol confers inhibition by creating an “induced fit” to a pocket away from the adenosine triphosphate and magnesium binding site within the catalytic center (data from Ref. 87).
Besides counteracting the cell-cycle arrest phenotype of trichostatin A, the ITSAs counteract trichostatin-induced histone acetylation and transcriptional activation. Some of these ITSAs are active as suppressors of trichostatin A in zebrafish and yeast, suggesting that they target an evolutionarily conserved component of chromatin remodeling. As such, suppressors of HDAC inhibitors, such as the ITSAs, may prove to be valuable probes of many biological processes involving protein acetylation. In addition to butyrate, trichostatin A, and trapoxin B, other small-molecule inhibitors of protein deacetylation have been identified from both natural and
Fig. 6-19 Chemical-genetic modifiers of trichostatin A (data from Ref. 40). Trichostatin A causes cell-cycle arrest, which is correlated with an increase in histone acetylation and altered chromatin remodeling. The “ITSAs” (for inhibitor of trichostatin A) suppress the ability of trichostatin A to arrest the cell cycle.
synthetic sources [55]. For example, over 600 small-molecule inhibitors of protein deacetylation were identified (Fig. 6-20) [80] using a panel of cell-based assays, in which antibodies recognize histone and α-tubulin acetylation on specific lysine residues, and a library of over 7200 small molecules derived from a diversity-oriented synthesis that included “biasing” elements to target the compounds toward the family of HDACs [89]. Following the decoding of chemical tags and resynthesis, one inhibitory molecule (tubacin) was shown to be selective for α-tubulin deacetylation and another (histacin) for histone deacetylation (Fig. 6-21) [80]. Tubacin was found not to affect the level of histone acetylation, gene-expression patterns, or cell-cycle progression. Using immunoprecipitated, recombinant enzyme, it was determined that the class II histone deacetylase 6 (HDAC6) is the intracellular target of tubacin [90]. Through the combined use of tubacin and catalytically inactive point mutations in each of the two catalytic domains of HDAC6, it was shown that only one of the two catalytic domains of HDAC6 possesses tubulin deacetylase activity, and that only that domain’s deacetylase activity could be inhibited by tubacin. Collectively, the small molecules identified as suppressors of trichostatin A (ITSAs) and the selective inhibitors of protein deacetylation should facilitate the dissection of the role of acetylation in a variety of cell-biological processes (Fig. 6-22) [40, 90].
6.4.3 Example 3: Chemical-genomic Profiling
With increasing appreciation of the contribution of genotype to the outcome of therapeutic treatments, efforts in drug discovery are moving more toward
Fig. 6-20 Forward chemical-genetic screen for inhibitors of protein deacetylation (data from Ref. 80). (a) Overview of cell-based screens of the 1,3-dioxane-based, diversity-oriented synthesis-derived library using antibodies to measure tubulin and histone acetylation. (b) Relative position of selected active compounds in a three-dimensional principal component model computed from five cell-based assay descriptors. AcTubulin-selective (red), AcLysine-selective (green), and most potent (blue). (c) Chemical-genetic network from screening data after applying the Fruchterman-Reingold “energy” minimization algorithm (http://vlado.fmf.uni-lj.si/pub/networks/pajek/). Nodes represent either assays or small molecules according to the indicated colors. Edges (black lines) connect bioactive small molecules to the corresponding assay.
“personalized medicine” based on an individual’s genetic makeup. As a result, there is much interest in characterizing the genetic differences between cells using profiling experiments, where genome-wide measurements yield rich fingerprints for comparison and interpretation. While differential labeling of mRNA or protein samples and their analyses on microarrays and two-dimensional gels, respectively, are facilitating global views of biological networks, they do so by ultimately analyzing intrinsic molecular features of gene products strictly in an observational manner. In contrast, a new type of profiling experiment has been developed that measures the response of genetically similar but not identical cells to individual or pairwise combinations of biologically active small molecules; it is referred to as chemical-genomic profiling (Fig. 6-23(a)). Using this method of profiling, the ability of combinations of small molecules to interact antagonistically or synergistically provides a chemical tool to resolve differences between biological networks. Because the outcomes of this method of profiling are dependent on the interaction of small molecules in the context of an intact genetic network (i.e., perturbations),
Fig. 6-21 Selective inhibitors of α-tubulin (tubacin) and histone deacetylation (histacin) identified by chemical-genetic screening [90].
this method differs fundamentally from profiling methods based on DNA sequence or mRNA/protein expression patterns (i.e., observations). For example, chemical-genomic profiling was performed using a WT strain of the budding yeast S. cerevisiae along with nine otherwise isogenic deletion strains, each missing a component of the cell polarity network [82]. As a model phenotype relevant to the function of the deleted genes, cell-cycle progression was used. To obtain a chemical-genomic profile, a two-dimensional matrix of all possible pairwise combinations of 24 small molecules, each with a different structure, was expanded in a third dimension by using the WT and nine deletion strains. In total, 5760 assay measurements were obtained (Fig. 6-23(b)). Besides a set of 4 known biologically active small molecules, 20 additional biologically active small molecules were used that had been discovered in yeast chemical-genetic modifier and synthetic-lethal screens. Given that many of these modulators have unknown targets and mechanisms of action, they were referred to as SMPs, for “small-molecule perturbagens”. After analyzing the growth in each well, the data were encoded in the form of a binary adjacency matrix, A, with one row and one column for each of the 24 small molecules. A value of 0 was used to indicate no observable effect on growth, and a value of 1 was used to indicate no growth or reduced growth in both replicates. Each adjacency matrix was then used to construct
Fig. 6-22 Molecular tools for the dissection of intracellular protein acetylation [40, 80].
a discrete model in the form of a graph G = (V, E) composed of V nodes, one for each small molecule, and E edges connecting nodes representing small molecules whose combination resulted in a value of 1 in the adjacency matrix A. The results obtained revealed that the structure of the genetic network determines the structure of the chemical-genetic network, with none of the deletion strain networks being identical to each other or to the WT network (Fig. 6-23(c)). Given a graphical representation of the phenotypic differences, graph-theoretic descriptors that are analogous to the molecular descriptors used for the quantitative analysis and comparison of the structures of small molecules were computed for each of the 10 chemical-genetic networks. Collectively, the numerical values of the descriptors yielded a topological fingerprint of each chemical-genetic network; standard clustering and dimensionality-reduction algorithms were used to reveal global similarities/differences of the observed chemical-genetic networks. Besides aiding the characterization of molecular diversity and annotation of chemical space, the results suggest that chemical-genomic profiling may serve as a tool for the characterization of perturbations in biological networks or of the networks themselves (e.g., as a diagnostic tool). These capabilities may lead to new approaches to discern the molecular
Fig. 6-23 Chemical-genomic profiling (data from Ref. 82). (a) 276 unique combinations and 24 single treatments of “small-molecule perturbagens” (SMPs) were assayed for an effect on the cell cycle of budding yeast. Each of the 10 strains profiled had a different genotype, yielding a three-dimensional matrix of 24 x 24 x 10 observations. (b) Structures of 23 small molecules (other than dimethylsulfoxide) used to profile 10 yeast genotypes in a three-dimensional matrix. (c) Twenty-four node networks derived from the mapping of a matrix of 24 x 24 combinations of small molecules against a set of 10 strains of the budding yeast. Graphs were visualized using Pajek v0.72 and “energy” minimizations performed using the Fruchterman-Reingold algorithm (http://vlado.fmf.uni-lj.si/pub/networks/pajek/). None of the 10 chemical-genetic networks were identical, indicating that the structure of the genetic network determines the structure of the chemical-genetic network.
etiology of complex phenotypes, including those involved in human disease, that, in the case of quantitative traits, emerge as a result of the additive effects of multiple alleles.
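The bookkeeping behind the profiling experiment, and the step from adjacency matrix to topological fingerprint, can be sketched as follows. The counts (24 small molecules, 276 pairwise combinations, 10 strains, 5760 measurements) come from the text; everything else is an assumption made for illustration: the two hit lists are hypothetical, the descriptors shown are simple stand-ins for the graph-theoretic descriptors of Ref. 82, single treatments on the matrix diagonal are ignored when forming edges, and all function names are invented.

```python
from itertools import combinations
from math import comb

N_MOLECULES, N_STRAINS = 24, 10


def adjacency_from_hits(n, hit_pairs):
    """Symmetric 0/1 matrix A; A[i][j] = 1 if the combination (i, j) reduced growth."""
    a = [[0] * n for _ in range(n)]
    for i, j in hit_pairs:
        a[i][j] = a[j][i] = 1
    return a


def edges(a):
    """Edge set E of the graph G = (V, E): unordered pairs of distinct molecules."""
    return {(i, j) for i, j in combinations(range(len(a)), 2) if a[i][j]}


def descriptors(a):
    """A few simple graph descriptors used here as a topological fingerprint."""
    n = len(a)
    e = edges(a)
    deg = [sum(1 for pair in e if v in pair) for v in range(n)]
    return {
        "edges": len(e),
        "density": len(e) / comb(n, 2),
        "max_degree": max(deg),
        "isolated_nodes": deg.count(0),
    }


def jaccard_distance(a, b):
    """1 - |Ea & Eb| / |Ea | Eb|; a value of 0 means the two edge sets are identical."""
    ea, eb = edges(a), edges(b)
    union = ea | eb
    return 1 - len(ea & eb) / len(union) if union else 0.0


# Hypothetical hit lists for two strains (indices are arbitrary molecule labels).
WT = adjacency_from_hits(N_MOLECULES, [(0, 1), (0, 2), (3, 4)])
MUTANT = adjacency_from_hits(N_MOLECULES, [(0, 1), (3, 4), (5, 6), (5, 7)])

if __name__ == "__main__":
    print(comb(N_MOLECULES, 2))                    # unique pairwise combinations
    print(N_MOLECULES * N_MOLECULES * N_STRAINS)   # size of the 24 x 24 x 10 matrix
    print(descriptors(WT), descriptors(MUTANT))
    print(jaccard_distance(WT, MUTANT))
```

Stacking the descriptor values for each of the 10 networks gives a small matrix of fingerprints, to which standard clustering or PCA can then be applied.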
6.5 Future Development
For chemical genetics to truly compete with classical genetics, and for it to function as a general approach to dissecting biological mechanisms, there
needs to be continued development and refinement of the techniques for screening and assessing complex patterns of phenotypic changes. Besides the specific examples of identifying inhibitors of mitosis and modulators of protein deacetylation described above, it is worth noting the remarkable ability of antibodies to detect posttranslational modifications of proteins and other biosynthetic events that occur intracellularly at a single-cell level. Antibodies differ from small molecules in their size, composition, and origin, as they are immunoglobulins composed of both heavy and light chains, which are secreted by immune system cells. Their ability to recognize epitopes as small as a single acetyl group within the context of chromatin, or a single phosphate group on a protein within the cytoplasm of cells, speaks to their specificity and power as markers of phenotypes. The development of an expanded collection of cell-state selective antibodies, and improved methods for multiplexing probes in parallel or in series, would have widespread utility for chemical genetics as part of cytoblot and image-based screens. Similarly, further development of genetically encoded probes that allow for imaging of signaling events and cellular processes in live cells in real time will open up previously unexplored areas of cellular biology. In particular, the use of genetically encoded probes targeted to specific cell populations will be useful for creating more complex and physiologically relevant assays, particularly in animal models. By aiming to provide information-rich profiles of chemical and biological systems, chemical genetics should provide a framework for a number of lines of deeper inquiry that will continue to challenge chemical biologists for many years to come. One line of inquiry will be to investigate the cellular mechanisms in terms of interaction(s) with a molecular target, a cell, and an entire organism.
A prerequisite for many of these studies and the understanding of such chemotype-phenotype relations will be the discovery of specific molecular targets of small molecules using proteome-wide approaches (Fig. 6-9). With targets in hand, these efforts can be merged with structural biology efforts to look at atomic resolution interactions, and an examination of the degree to which specificity for targets influences the observed phenotypic effects. With the use of phenotypic descriptors derived from cell-based assays, a second line of inquiry will be to determine how well traditional statistical approaches involving linear and nonlinear regression can derive structure-activity relationships, or whether alternative approaches, for example, based on creating discrete graphical networks, are required (Fig. 6-16). There also remains a paucity of studies addressing more general properties of bioactive molecules, independent of those that are developed into drugs. Furthermore, with the development of numerous natural-product-like small molecules that are entering the realm of screening, and the noted differences between many natural products and drugs, it remains to be seen whether a strict adherence to rules, such as those developed by Lipinski based on analyzing known drugs, continues to hold up as the best predictor of biological activity for probe development and therapeutic drug discovery.
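To make concrete what “strict adherence” to Lipinski-type rules means in practice, the following sketch counts violations of the commonly cited rule-of-five thresholds (molecular weight above 500, calculated logP above 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). It is an illustration only: the descriptor values for the two compounds are hypothetical, the class and function names are invented, and in practice the descriptors would be computed from structures with a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float       # molecular weight in daltons
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int      # count of hydrogen-bond donors
    h_bond_acceptors: int   # count of hydrogen-bond acceptors


def rule_of_five_violations(d):
    """Count how many of the four commonly cited thresholds a compound exceeds."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


# Hypothetical descriptor values, for illustration only.
DRUG_LIKE = Descriptors(mol_weight=350.4, logp=2.1, h_bond_donors=2, h_bond_acceptors=5)
NATURAL_PRODUCT_LIKE = Descriptors(mol_weight=820.0, logp=6.2, h_bond_donors=7, h_bond_acceptors=14)

if __name__ == "__main__":
    for name, d in [("drug-like", DRUG_LIKE), ("natural-product-like", NATURAL_PRODUCT_LIKE)]:
        print(name, rule_of_five_violations(d))
```

A filter of this kind would discard the second compound outright, which is the concern raised above for natural-product-like screening collections.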
Lastly, it may be possible to search for a “molecular recognition code(s)” that ultimately determines the mapping, both locally and globally, between molecules in multidimensional molecular descriptor spaces and multidimensional phenotypic descriptor spaces (Fig. 6-10). These codes may be considered at a variety of levels, including more general categories that allow the prediction of properties relevant to the interaction with different subcellular structures (e.g., the mitochondria or cytoskeleton) or different biological systems (e.g., the xenobiotic transformation systems involved in drug metabolism). Knowledge of such codes would, as did knowledge of the genetic code, usher in a new era of research and medical advances that would allow the systematic modulation of gene-product function. Besides these lines of inquiry, there are a number of “grand challenges” for chemical genetics (Fig. 6-24). One of these grand challenges will undoubtedly be to assay, in a high-throughput multiplexed manner, in real time, in live cells, the signal transduction events leading from an extracellular stimulus to the intracellular signaling events that lead to a change in chromatin structure, changes in gene expression, protein translation, and consequent biological response. To return to its roots, perhaps the ideal model pathway for developing this capability will be that of T-cell-receptor activation in lymphocytes, leading to the activation of calcineurin, changes in chromatin remodeling at NF-AT target genes, and the resulting secretion of IL-2, which were elucidated in part as described above using CsA and FK506. Here, assays exist for many of the steps in the pathway, although not yet in a form that allows the interrogation of live cells and the measurement of changes in real time. For the latter reasons, such an effort will require further advances in the use
Fig. 6-24 Future challenges for forward chemical-genetic discovery of probes of biological mechanisms.
of genetically encoded or small-molecule fluorescent probes, and automated imaging. A second “grand challenge” will be to test the hypothesis that there exists a correlation between the connectivity of proteins in the underlying biological network and the likelihood of finding a cognate small molecule by chemical-genetic screening. As explained above, this will require substantial development in the future of improved methods for target identification and understanding the overall topology of biological networks (genetic and biochemical). A final “grand challenge” that, ideally, would be incorporated into a scheme for assaying the effects of small molecules from the cell surface to the nucleus as described above, would be to use immortalized human cell lines, or even differentiated human stem-cell lines, that have been fully genotyped and are known to provide a comprehensive sample of the major patterns of genetic diversity for screening. With this set of cell lines as a reference set, it could then be possible to determine whether individual or combinations of “SMPs” can reveal phenotypic consequences of otherwise cryptic allelic differences that act in concert to create complex, non-Mendelian traits associated with human disease. Should this be possible, then chemical genetics will truly have proven its merit and contributed to our understanding of genotype-phenotype relationships.
6.6 Conclusion
Indeed, the vista of the biochemist is one with an infinite horizon. And yet, this program of explaining the simple through the complex smacks suspiciously of the program of explaining atoms in terms of complex mechanical models. It looks sane until the paradoxes crop up and come into sharper focus. In Biology we are not yet at the point where we are presented with clear paradoxes and this will not happen until the analysis of the behavior of living cells has been carried into far greater detail. This analysis should be done on the living cell’s own terms and the theories should be formulated without fear of contradicting molecular physics.
Max Delbrück, Nobel Prize in Physiology or Medicine, 1969
Mendel’s rules for considering the discreteness and combinatorics of inherited traits provided a foundation for classical genetics that has continued to provide insight into genotype-phenotype relations and the nature of heredity for more than a century [1]. By using small molecules to perturb biological systems conditionally at the level of gene products, rather than at the level of genes themselves, chemical genetics promises to complement the use of classical genetic analysis to study a wide range of biological mechanisms and systems [5-10]. Because of the confluence of recent technical and conceptual developments, the field of chemical biology in general, and chemical genetics in particular, is well poised to translate the discoveries made by genomic and proteomic studies into tools and technologies that will be transformative in basic and biomedical research. While earlier advances in the field have
previously come from molecular biology, chemical synthesis, and materials science, future advances will require integration of the information derived from computational studies of molecular structure and observational studies of molecular function into global models that are both explanatory and predictive. To this end, the analysis of multidimensional data derived from chemical genetics, using dimensionality-reduction and pattern-finding techniques, is beginning to provide a computational framework for mapping multidimensional chemical descriptor spaces [74-77, 91]. Overall, these techniques allow for the creation of higher-level representations of the information inherent in the lower-level relational data encoded within matrices of data. The systematic screening of small molecules in minimally redundant, cell- and organism-based assays, which cover a wide range of biological phenotypes relevant to basic and clinical research, will enable accurate maps of chemical space to be constructed, which can be compared to those derived from computed molecular descriptors. Here, the use of global methods of analysis, when coupled with local methods aimed at validating and elucidating the mechanisms of action of reference markers (landmarks) in these spaces, should allow, over time, for increasingly higher resolution maps to be created, analogous to the progression of genetic maps over the past century. As evidenced by the efforts toward the development of ChemBank [92], Blueprint’s Small-Molecule Interaction Database (SMID) [93], and the PubChem Database [94], the importance to chemical biology of computational science, and of open access to information on small-molecule activities and structures, is growing rapidly and will continue to do so in the future.
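As a rough illustration of the dimensionality-reduction step mentioned above, the following minimal sketch projects a compound-by-assay score matrix onto its leading principal components. It assumes principal component analysis computed by singular value decomposition as one representative technique; the function name `pca_project` and the score values are hypothetical and are not taken from the cited studies.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project rows of `data` (compounds x assays) onto the top principal components."""
    centered = data - data.mean(axis=0)           # center each assay column
    # SVD of the centered matrix; the rows of vt are the principal axes
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:n_components].T       # low-dimensional coordinates per compound
    explained = (s ** 2) / np.sum(s ** 2)         # fraction of variance per component
    return coords, explained[:n_components]

# Hypothetical scores for 6 compounds in 4 assays (illustrative values only)
scores = np.array([
    [2.1, 1.9, 0.1, 0.0],
    [2.0, 2.2, 0.0, 0.2],
    [1.8, 2.0, 0.2, 0.1],
    [0.1, 0.0, 1.9, 2.1],
    [0.2, 0.1, 2.2, 2.0],
    [0.0, 0.2, 2.0, 1.8],
])
coords, explained = pca_project(scores)
```

In this toy matrix the first three compounds and the last three fall on opposite sides of the first component, which is the kind of higher-level grouping such maps are meant to expose.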
Through continued refinement and development of new techniques, particularly for target identification and understanding the influence of genotype on biological activity of small molecules [95], it should be possible to annotate genomes, not only by sequence analysis but also functionally using the language of organic chemistry. Should it prove possible to use individual or combinations of SMPs to reveal phenotypic consequences of otherwise cryptic allelic differences that act in concert to create complex, non-Mendelian traits, chemical genetics will truly have earned its name. As summarized by Max Delbrück, who originally trained as a physicist under Niels Bohr, the vista of the chemical biologist indeed “is one with an infinite horizon”. For this reason, the use of forward chemical genetics to discover small-molecule probes for biological mechanisms will likely continue to flourish in the years to come.
Acknowledgments
Members of the Schreiber Lab, the Broad Institute’s Chemical Biology Program, and Michel Roberge are thanked for sharing their insight and passion for chemical genetics. We apologize to our colleagues whose work we were unable to cite for reasons of space constraints.
References
1. G. Mendel, Experiments in Plant Hybridization, Harvard University Press, Cambridge, 1963.
2. T.H. Morgan, A.H. Sturtevant, H.J. Muller, C.B. Bridges, The Mechanism of Mendelian Heredity, Henry Holt and Company, New York, 1915.
3. E.S. Lander, et al., Initial sequencing and analysis of the human genome, Nature 2001, 409, 860-921.
4. T.J. Mitchison, Towards a pharmacological genetics, Chem. Biol. 1994, 1, 3-6.
5. B.R. Stockwell, Chemical genetics: ligand-based discovery of gene function, Nat. Rev. Genet. 2000, 1, 116-125.
6. S.L. Schreiber, The small-molecule approach to biology: chemical genetics and diversity-oriented organic synthesis make possible the systematic exploration of biology, Chem. Eng. News 2003, 81, 51-61.
7. K.M. Specht, K.M. Shokat, The emerging power of chemical genetics, Curr. Opin. Cell Biol. 2002, 14, 155-159.
8. S.L. Schreiber, Chemical genetics resulting from a passion for synthetic organic chemistry, Bioorg. Med. Chem. 1998, 6, 1127-1152.
9. B.R. Stockwell, Exploring biology with small organic molecules, Nature 2004, 432, 846-854.
10. S. Shang, D.S. Tan, Advancing chemistry and biology through diversity-oriented synthesis of natural product-like libraries, Curr. Opin. Chem. Biol. 2005, 9, 248-258.
11. J.R. Sharom, D.S. Bellows, M. Tyers, From large networks to small molecules, Curr. Opin. Chem. Biol. 2004, 8, 81-90.
12. H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, A.L. Barabasi, The large scale organization of metabolic networks, Nature 2000, 407, 651-654.
13. S. Maslov, K. Sneppen, Specificity and stability in topology of protein networks, Science 2002, 296, 910-913.
14. R. Albert, H. Jeong, A.L. Barabasi, Error and attack tolerance of complex networks, Nature 2000, 406, 378-382.
15. T.I. Lee, N.J. Rinaldi, F. Robert, D.T. Odom, Z. Bar-Joseph, G.K. Gerber, N.M. Hannett, C.T. Harbison, C.M. Thompson, I. Simon, J. Zeitlinger, E.G. Jennings, H.L. Murray, D.B. Gordon, B. Ren, J.J. Wyrick, J.B. Tagne, T.L. Volkert, E. Fraenkel, D.K. Gifford, R.A. Young, Transcriptional regulatory networks in Saccharomyces cerevisiae, Science 2002, 298, 799-804.
16. S.L. Schreiber, Chemistry and biology of the immunophilins and their immunosuppressive ligands, Science 1991, 251, 283-287.
17. S. Ho, N. Clipstone, L. Timmermann, J. Northrop, I. Graef, D. Fiorentino, J. Nourse, G.R. Crabtree, The mechanism of action of cyclosporin A and FK506, Clin. Immunol. Immunopathol. 1996, 80, S40-S45.
18. T. Kino, H. Hatanaka, M. Hashimoto, M. Nishiyama, T. Goto, M. Okuhara, M. Kohsaka, H. Aoki, H. Imanaka, FK-506, a novel immunosuppressant isolated from a Streptomyces. I. Fermentation, isolation, and physico-chemical and biological characteristics, J. Antibiot. 1987, 40, 1249-1255.
19. M.W. Harding, A. Galat, D.E. Uehling, S.L. Schreiber, A receptor for the immunosuppressant FK506 is a cis-trans peptidyl-prolyl isomerase, Nature 1989, 341, 758-760.
20. J. Liu, J.D. Farmer Jr, W.S. Lane, J. Friedman, I. Weissman, S.L. Schreiber, Calcineurin is a common target of cyclophilin-cyclosporin A and FKBP-FK506 complexes, Cell 1991, 66, 807-815.
21. J. Aramburu, J. Heitman, G.R. Crabtree, Calcineurin: a central controller of signalling in eukaryotes, EMBO Rep. 2004, 5, 343-348.
22. G.J. Hannon, J.J. Rossi, Unlocking the potential of the human genome with RNA interference, Nature 2004, 431, 371-378.
23. C.C. Mello, D. Conte Jr, Revealing the world of RNA interference, Nature 2004, 431, 338-342.
24. L.H. Hartwell, Twenty-five years of cell cycle genetics, Genetics 1991, 4, 975-980.
25. M.M. Metzstein, G.M. Stanfield, H.R. Horvitz, Genetics of programmed cell death in C. elegans: past, present and future, Trends Genet. 1998, 14, 410-416.
26. C. Nusslein-Volhard, E. Wieschaus, Mutations affecting segment number and polarity in Drosophila, Nature 1980, 287, 795-801.
27. M.C. Mullins, M. Hammerschmidt, P. Haffter, C. Nusslein-Volhard, Large-scale mutagenesis in the zebrafish: in search of genes controlling development in a vertebrate, Curr. Biol. 1994, 4, 189-201.
28. P.M. Nolan, J. Peters, M. Strivens, D. Rogers, J. Hagan, N. Spurr, I.C. Gray, L. Vizor, D. Brooker, E. Whitehill, R. Washbourne, T. Hough, S. Greenaway, M. Hewitt, X. Liu, S. McCormack, K. Pickford, R. Selley, C. Wells, Z. Tymowska-Lalanne, P. Roby, P. Glenister, C. Thornton, C. Thaung, J.A. Stevenson, R. Arkell, P. Mburu, R. Hardisty, A. Kiernan, A. Erven, K.P. Steel, S. Voegeling, J.L. Guenet, C. Nickols, R. Sadri, M. Nasse, A. Isaacs, K. Davies, M. Browne, E.M. Fisher, J. Martin, S. Rastan, S.D. Brown, J. Hunter, A systematic, genome-wide, phenotype-driven mutagenesis programme for gene function studies in the mouse, Nat. Genet. 2000, 25, 440-443.
29. R.T. Peterson, B.A. Link, J.E. Dowling, S.L. Schreiber, Small molecule developmental screens reveal the logic and timing of vertebrate development, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 12965-12969.
30. S.N. Bailey, D.M. Sabatini, B.R. Stockwell, Microarrays of small molecules embedded in biodegradable polymers for use in mammalian cell-based screens, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 16144-16149.
31. B.R. Stockwell, S.J. Haggarty, S.L. Schreiber, High-throughput screening of small molecules in miniaturized mammalian cell-based assays involving post-translational modifications, Chem. Biol. 1999, 6, 71-83.
32. K. Stegmaier, K.N. Ross, S.A. Colavito, S. O’Malley, B.R. Stockwell, T.R. Golub, Gene expression-based high-throughput screening (GE-HTS) and application to leukemia differentiation, Nat. Genet. 2004, 36, 257-263.
33. T.R. Hughes, M.J. Marton, A.R. Jones, C.J. Roberts, R. Stoughton, C.D. Armour, H.A. Bennett, E. Coffey, H. Dai, Y.D. He, M.J. Kidd, A.M. King, M.R. Meyer, D. Slade, P.Y. Lum, S.B. Stepaniants, D.D. Shoemaker, D. Gachotte, K. Chakraburtty, J. Simon, M. Bard, S.H. Friend, Functional discovery via a compendium of expression profiles, Cell 2000, 102, 109-126.
34. T.J. Mitchison, Small-molecule screening and profiling by using automated microscopy, Chembiochem 2004, 29, 33-39.
35. Z.E. Perlman, T.J. Mitchison, T.U. Mayer, High-content screening and profiling of drug activity in an automated centrosome-duplication assay, Chembiochem 2005, 6, 145-151.
36. Y. Feng, S. Yu, T.K. Lasell, A.P. Jadhav, E. Macia, P. Chardin, P. Melancon, M. Roth, T. Mitchison, T. Kirchhausen, Exo1: a new chemical inhibitor of the exocytic pathway, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 6469-6474.
37. T.J. Nieland, Y. Feng, J.X. Brown, T.D. Chuang, P.D. Buckett, J. Wang, X.S. Xie, T.E. McGraw, T. Kirchhausen, M. Wessling-Resnick, Chemical genetic screening identifies sulfonamides that raise organellar pH and interfere with membrane traffic, Traffic 2004, 5, 478-492.
38. T.R. Kau, F. Schroeder, S. Ramaswamy, C.L. Wojciechowski, J.J. Zhao, T.M. Roberts, J. Clardy, W.R. Sellers, P.A. Silver, A chemical genetic screen identifies inhibitors of regulated nuclear export of a Forkhead transcription factor in PTEN-deficient tumor cells, Cancer Cell 2003, 4, 463-476.
39. F.C. Schroeder, T.R. Kau, P.A. Silver, J. Clardy, The psammaplysenes, specific inhibitors of FOXO1a nuclear export, J. Nat. Prod. 2005, 68, 574-576.
40. K.M. Koeller, S.J. Haggarty, B.D. Perkins, I. Leykin, J.C. Wong, M.C. Kao, S.L. Schreiber, Chemical genetic modifier screens: small molecule trichostatin suppressors as probes of intracellular histone and tubulin acetylation, Chem. Biol. 2003, 10, 397-410.
41. S.J. Haggarty, K.M. Koeller, T.R. Kau, P.A. Silver, M. Roberge, S.L. Schreiber, Small molecule modulation of the human chromatid decatenation checkpoint, Chem. Biol. 2003, 10, 1267-1279.
42. J. Huang, H. Zhu, S.J. Haggarty, D.R. Spring, H. Hwang, F. Jin, M. Snyder, S.L. Schreiber, Finding new components of the target of rapamycin (TOR) signaling network through chemical genetics and proteome chips, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 16594-16599.
43. R.A. Butcher, S.L. Schreiber, A small molecule suppressor of FK506 that targets the mitochondria and modulates ionic balance in Saccharomyces cerevisiae, Chem. Biol. 2003, 10, 521-531.
44. J. Clardy, C. Walsh, Lessons from natural molecules, Nature 2004, 432, 829-837.
45. J. Handelsman, M.R. Rondon, S.F. Brady, J. Clardy, R.M. Goodman, Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products, Chem. Biol. 1998, 5, 245-249.
46. S.L. Schreiber, Target-oriented and diversity-oriented organic synthesis in drug discovery, Science 2000, 287, 1964-1969.
47. D.S. Tan, M.A. Foley, M.D. Shair, S.L. Schreiber, Stereoselective synthesis of over two million compounds having structural features both reminiscent of natural products and compatible with miniaturized cell-based assays, J. Am. Chem. Soc. 1998, 120, 8565-8566.
48. H.E. Blackwell, L. Perez, R.A. Stavenger, J.A. Tallarico, E. Cope Eatough, M.A. Foley, S.L. Schreiber, A one-bead, one-stock solution approach to chemical genetics: part 1, Chem. Biol. 2001, 8, 1167-1182.
49. P.A. Clemons, A.N. Koehler, B.K. Wagner, T.G. Sprigings, D.R. Spring, R.W. King, S.L. Schreiber, M.A. Foley, A one-bead, one-stock solution approach to chemical genetics: part 2, Chem. Biol. 2001, 8, 1183-1195.
50. G.P. Tochtrop, R.W. King, Target identification strategies in chemical genetics, Comb. Chem. High Throughput Screen. 2004, 7, 677-688.
51. M. Kijima, M. Yoshida, K. Sugita, S. Horinouchi, T. Beppu, Trapoxin, an antitumor cyclic tetrapeptide, is an irreversible inhibitor of mammalian histone deacetylase, J. Biol. Chem. 1993, 268, 22429-22435.
52. M. Yoshida, M. Kijima, M. Akita, T. Beppu, Potent and specific inhibition of mammalian histone deacetylase both in vivo and in vitro by trichostatin A, J. Biol. Chem. 1990, 265, 17174-17179.
53. J. Taunton, J.L. Collins, S.L. Schreiber, Synthesis of natural and modified trapoxins, useful reagents for exploring histone deacetylase function, J. Am. Chem. Soc. 1996, 118, 10412-10422.
54. J. Taunton, C.A. Hassig, S.L. Schreiber, A mammalian histone deacetylase related to the yeast transcriptional regulator Rpd3p, Science 1996, 272, 408-411.
55. M.S. Finnin, J.R. Donigian, A. Cohen, V.M. Richon, R.A. Rifkind, P.A. Marks, R. Breslow, N.P. Pavletich, Structures of a histone deacetylase homologue bound to the TSA and SAHA inhibitors, Nature 1999, 401, 188-193.
56. C.M. Grozinger, S.L. Schreiber, Deacetylase enzymes: biological functions and the use of small-molecule inhibitors, Chem. Biol. 2002, 9, 3-16.
57. B. Langley, J.M. Gensert, M.F. Beal, R.R. Ratan, Remodeling chromatin and stress resistance in the central nervous system: histone deacetylase inhibitors as novel and broadly effective neuroprotective agents, Curr. Drug Targets CNS Neurol. Disord. 2005, 4, 41-50.
58. C.J. Phiel, F. Zhang, E.Y. Huang, M.G. Guenther, M.A. Lazar, P.S. Klein, Histone deacetylase is a direct target of valproic acid, a potent anticonvulsant, mood stabilizer, and teratogen, J. Biol. Chem. 2001, 276, 36734-36741.
59. J.K. Chen, J. Taipale, M.K. Cooper, P.A. Beachy, Inhibition of hedgehog signaling by direct binding of cyclopamine to smoothened, Genes Dev. 2002, 16, 2743-2748.
60. J.K. Chen, J. Taipale, K.E. Young, T. Maiti, P.A. Beachy, Small molecule modulation of smoothened activity, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 14071-14076.
61. E.J. Licitra, J.O. Liu, A three-hybrid system for detecting small ligand-protein receptor interactions, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 12817-12821.
62. M.J. Marton, J.L. DeRisi, H.A. Bennett, V.R. Iyer, M.R. Meyer, C.J. Roberts, R. Stoughton, J. Burchard, D. Slade, H. Dai, D.E. Bassett Jr, L.H. Hartwell, P.O. Brown, S.H. Friend, Drug target validation and identification of secondary drug target effects using DNA microarrays, Nat. Med. 1998, 4, 1293-1301.
63. P.P. Sche, K.M. McKenzie, J.D. White, D.J. Austin, Display cloning: functional identification of natural product receptors using cDNA-phage display, Chem. Biol. 1999, 6, 707-716.
64. J. Labaer, N. Ramachandran, Protein microarrays as tools for functional proteomics, Curr. Opin. Chem. Biol. 2005, 9, 14-19.
65. H. Luesch, T.Y. Wu, P. Ren, N.S. Gray, P.G. Schultz, F.A. Supek, Genome-wide overexpression screen in yeast for small-molecule target identification, Chem. Biol. 2005, 12, 55-63.
66. T. Hughes, B. Andrews, C. Boone, Old drugs, new tricks: using genetically sensitized yeast to reveal drug targets, Cell 2004, 116, 5-7.
67. P.Y. Lum, C.D. Armour, S.B. Stepaniants, G. Cavet, M.K. Wolf, J.S. Butler, J.C. Hinshaw, P. Garnier, G.D. Prestwich, A. Leonardson, P. Garrett-Engele, C.M. Rush, M. Bard, G. Schimmack, J.W. Phillips, C.J. Roberts, D.D. Shoemaker, Discovering modes of action for therapeutic compounds using a genome-wide screen of yeast heterozygotes, Cell 2004, 116, 121-137.
68. G. Giaever, P. Flaherty, J. Kumm, M. Proctor, C. Nislow, D.F. Jaramillo, A.M. Chu, M.I. Jordan, A.P. Arkin, R.W. Davis, Chemogenomic profiling: identifying the functional interactions of small molecules in yeast, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 793-798.
69. H. Luesch, T.Y. Wu, P. Ren, N.S. Gray, P.G. Schultz, F. Supek, A genome-wide overexpression screen in yeast for small-molecule target identification, Chem. Biol. 2005, 12, 55-63.
70. S.J. Haggarty, T.U. Mayer, D.T. Miyamoto, R. Fathi, R.W. King, T.J. Mitchison, S.L. Schreiber, Dissecting cellular processes using small molecules: identification of colchicine-like, taxol-like, and other small molecules that perturb mitosis, Chem. Biol. 2000, 7, 275-286.
71. T.U. Mayer, T.M. Kapoor, S.J. Haggarty, R.W. King, S.L. Schreiber, T.J. Mitchison, Small molecule inhibitor of mitotic spindle bipolarity identified in a phenotype-based screen, Science 1999, 286, 971-974.
72. S. Hotha, J.C. Yarrow, J.G. Yang, S. Garrett, K.V. Renduchintala, T.U. Mayer, T.M. Kapoor, HR22C16: a potent small-molecule probe for the dynamics of cell division, Angew. Chem. Int. Ed. Engl. 2003, 42, 2379-2382.
73. S. DeBonis, D.A. Skoufias, L. Lebeau, R. Lopez, G. Robin, R.L. Margolis, R.H. Wade, F. Kozielski, In vitro screening for inhibitors of the human mitotic kinesin Eg5 with antimitotic and antitumor activities, Mol. Cancer Ther. 2004, 3, 1079-1090.
74. C.M. Dobson, Chemical space and biology, Nature 2004, 432, 824-828.
75. C. Lipinski, A. Hopkins, Navigating chemical space for biology and medicine, Nature 2004, 432, 855-861.
76. S.J. Haggarty, The principle of complementarity: chemical versus biological space, Curr. Opin. Chem. Biol. 2005, 9, 296-303.
77. D.K. Agrafiotis, Multiobjective optimization of combinatorial libraries, Mol. Divers. 2002, 5, 209-230.
78. D.K. Agrafiotis, V.S. Lobanov, F.R. Salemme, Combinatorial informatics in the post-genomics era, Nat. Rev. Drug Discov. 2002, 1, 337-346.
79. J.N. Weinstein, T.G. Myers, P.M. O’Connor, S.H. Friend, A.J. Fornace, K.W. Kohn, T. Fojo, S.E. Bates, L.V. Rubinstein, N.L. Anderson, J.K. Buolamwini, W.W. van Osdol, A.P. Monks, D.A. Scudiero, E.A. Sausville, D.W. Zaharevitz, B. Bunow, V.N. Viswanadhan, G.S. Johnson, R.E. Wittes, K.D. Paull Jr, An information-intensive approach to the molecular pharmacology of cancer, Science 1997, 275, 343-349.
80. S.J. Haggarty, K.M. Koeller, J.C. Wong, R.A. Butcher, S.L. Schreiber, Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays, Chem. Biol. 2003, 10, 383-396.
81. A.T. Balaban, Chemical Applications of Graph Theory, Academic Press, London, 1976.
82. S.J. Haggarty, P.A. Clemons, S.L. Schreiber, Chemical genomic profiling of biological networks using graph theory and combinations of small molecule perturbations, J. Am. Chem. Soc. 2003, 125, 10543-10545.
83. E. Hamel, Antimitotic natural products and their interactions with tubulin, Med. Res. Rev. 1996, 16, 207-231.
84. P.B. Schiff, J. Fant, S.B. Horwitz, Promotion of microtubule assembly in vitro by taxol, Nature 1979, 277, 665-667.
85. M.C. Wani, H.L. Taylor, M.E. Wall, P. Coggon, A.T. McPhail, Plant antitumor agents. VI. The isolation and structure of taxol, a novel antileukemic and antitumor agent from Taxus brevifolia, J. Am. Chem. Soc. 1971, 93, 2325-2327.
86. M. Roberge, B. Cinel, H.J. Anderson, L. Lim, X. Jiang, L. Xu, C.M. Bigg, M.T. Kelly, R.J. Andersen, Cell-based screen for antimitotic agents and identification of analogues of rhizoxin, eleutherobin, and paclitaxel in natural extracts, Cancer Res. 2000, 60, 5052-5058.
87. Y. Yan, V. Sardana, B. Xu, C. Homnick, W. Halczenko, C.A. Buser, M. Schaber, G.D. Hartman, H.E. Huber, L.C. Kuo, Inhibition of a mitotic motor protein: where, how, and conformational consequences, J. Mol. Biol. 2004, 335, 547-554.
88. T. Kouzarides, Acetylation: a regulatory modification to rival phosphorylation? EMBO J. 2000, 19, 1176-1179.
89. S.M. Sternson, J.C. Wong, C.M. Grozinger, S.L. Schreiber, Synthesis of 7200 small molecules based on a substructural analysis of the histone deacetylase inhibitors trichostatin and trapoxin, Org. Lett. 2001, 3, 4230-4242.
90. S.J. Haggarty, K.M. Koeller, J.C. Wong, C.M. Grozinger, S.L. Schreiber, Domain-selective small molecule inhibitor of HDAC6-mediated tubulin deacetylation, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 4389-4394.
91. S.J. Haggarty, P.A. Clemons, J.C. Wong, S.L. Schreiber, Mapping chemical space using molecular descriptors and chemical genetics: deacetylase inhibitors, Comb. Chem. High Throughput Screen. 2004, 7, 669-676.
92. ChemBank, 2006; http://www.broad.harvard.edu/chembio.
93. Blueprint’s Small-Molecule Interaction Database (SMID), 2006; http://smid.blueprint.org.
94. PubChem, 2006; http://pubchem.ncbi.nlm.nih.gov/.
95. A.B. Parsons, R. Geyer, T.R. Hughes, C. Boone, Yeast genomics and proteomics in drug discovery and target validation, Prog. Cell Cycle Res. 2003, 5, 159-166.
Chemical Biology. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
7 Reverse Chemical Genetics Revisited
7.1 Reverse Chemical Genetics - An Important Strategy for the Study of Protein Function in Chemical Biology and Drug Discovery
Rolf Breinbauer, Alexander Hillisch, and Herbert Waldmann
7.1.1 Introduction
Drug discovery has seen several paradigm shifts over the last two decades. Several new techniques have been introduced to widen what was believed to be the bottleneck of this endeavor at the given time. Although many of these techniques did not keep their initial promise, there is no doubt that high-throughput screening (HTS) and protein structure-based drug design have contributed enormously to the process of developing new high-affinity protein binders and have made it more efficient. The sequencing of whole genomes has provided numerous new potential drug targets. Unfortunately, the undisputed value of these techniques has not (yet) led to an increase in the number of new chemical entities entering the market. Spectacular cases of costly failures of drug candidates in late-stage clinical trials or, even worse, the withdrawal owing to unanticipated side effects of several drugs (e.g., COX-2 inhibitors) that had benefited millions of patients have reminded us that the biological systems with which we are dealing are extremely complex. Target validation has become the critical factor in drug discovery. Consequently, all methods that contribute to a deeper understanding of biological systems, ranging from protein function within a cell to the complex interplay within multicellular organisms, will gain importance in the future. Systems biology, although still in its infancy, might be one approach to achieve this goal.
The pharmacological approach, in which protein function is modulated by small molecules, has played a prominent role in the study of biological systems. Compared to other and complementary approaches, such as DNA knockouts, antisense, or RNA interference, it has several advantages. The most significant among them is the fact that small molecules probe biological systems at the level of proteins. This aspect is shared only by antibodies, which are usually limited to the interaction with extracellular proteins (Scheme 7.1-1). In an analogy to related terms in mutation genetics, Schreiber and Mitchison defined “forward chemical genetics” as the probing of biological systems with small molecules and observing changes in phenotypes or biomarkers. On the other hand, in “reverse chemical genetics” a small-molecule probe with validated affinity to a defined protein is used as a tool to study the biological function of this particular protein within its natural context [1-3].
Scheme 7.1-1 Probing of biological systems on different levels of hierarchy: DNA (gene knockout, DNA binders), mRNA (antisense, RNAi), and proteins (small molecules).
7.1.2 History/Development
The concept of reverse chemical genetics has been applied ever since natural product probes were discovered as research tools in biology. In experiments on the salivary gland of the cat, J. N. Langley showed in 1878 the mutually antagonistic effect of pilocarpine and atropine. He observed a similar relationship between nicotine and curare in his study of the contraction of muscle cells. These results inspired him to formulate the “receptor theory” of drug/target interaction, which has become the main pillar of pharmacology [4]. Once it was realized that the toxicity of colchicine, the poison of meadow saffron, originates from its ability to induce cell cycle arrest, biologists exploited this property to create this condition intentionally and study the biological consequences. The use of microtubule poisons has enabled numerous important discoveries, such as the determination of the correct number of diploid chromosomes in humans or the demonstration of the role of microtubules in cell migration, tumor invasion, or anchoring of the Golgi complex at the microtubule-organizing center [5]. Many other such probes have been identified and, as shown in Table 7.1-1, the number of references containing their name may serve as an indicator of their impact on biological studies.
It is not only secondary metabolites functioning as natural poisons that have stimulated small-molecule-fueled research into protein function. In 1914, Henry Dale classified cholinergic receptors as being either nicotinic or muscarinic on the basis of whether nicotine or muscarine stimulated a response [6]. Similarly, Raymond Ahlquist explained the different pharmacological actions of drugs on smooth muscle by the existence of two types of adrenoceptors. Noradrenaline was an α-receptor agonist (causing smooth muscle to contract), whereas isoprenaline was a β-receptor agonist (causing smooth muscle to relax). Adrenaline, which is a mixed α/β-receptor agonist, exhibits both activities, varying with the site of action (Fig. 7.1-1) [6, 7]. Today, 60 years later, these receptors have been recognized to be membrane-located G-protein-coupled receptors (GPCRs), for which several subtypes (α1-2, β1-3) and even sub-subtypes have been identified. These receptors represent some of the most important drug targets addressed by current medications. Very selective inhibitors have been identified and developed as drugs. For example, selective β-antagonists (“β-blockers”) have saved millions of lives and have reached blockbuster status. James Black, who was one of the most important contributors to the development of the β-blockers, applied the lessons learnt there to the development of the most successful drug of the 1980s. He and others interpreted the observation that alkyl-substituted histamine analogs did not exhibit equal activity on histamine receptors in different tissues as a result of the existence of more than one histamine receptor. Indeed, it could be shown that the classical antihistamines used in the treatment of inflammation affected the so-called histamine H1-receptor, whereas in the stomach a new type of receptor, named the histamine H2-receptor, was involved in the release of gastric acid. Refinement of the early antihistamine compounds led to the development of the selective H2-receptor antagonist cimetidine, which revolutionized the treatment of ulcers (Fig. 7.1-2) [6].
Fig. 7.1-1 Agonists of α- and β-adrenergic receptors: adrenaline (α/β-agonist), noradrenaline (α-agonist), and isoprenaline (β-agonist).
Until the early 1980s small molecules played an important role in the discovery of new proteins. Tissue-dependent differences in the responses to drug candidates often indicated that several subtypes of a receptor might exist, stimulating research in this direction. On the other hand, clinical observations of the side effects of the drugs used revealed that other protein targets were affected as well. By variation of the structure, such a side effect could be optimized to yield a new drug against a different disease. This approach is highlighted by the classic example of the development of sulfonamide antibiotics into diuretics targeting carbonic anhydrases, enzymes which had been characterized just a few years before [8, 9]. For a long time, the search for proteins was guided by the proposition that an observation made within a biological experiment could be best explained if a corresponding protein existed. This meant that in many cases essential features of a protein's function were known before it was identified. In contrast, today, with the emergence of new techniques in molecular biology, the dominant scenario is one in which new genes and proteins are found for which no experimental evidence of function is available [10]. Sequence comparisons using bioinformatics tools often allow educated guesses about their potential functions, by proposing functional relationships with proteins of similar sequence. While sequencing of a gene or a protein had previously been a multiyear effort, nowadays it is routinely performed and offered by service groups within large research institutions or by commercial companies.
The still pending functional assignment of the many newly sequenced proteins will benefit from a renaissance in the use of small-molecule probes.
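The idea of guessing a function from sequence similarity can be sketched in a few lines. The example below is a deliberately simplified stand-in for real alignment-based tools: it assumes shared k-mer content (Jaccard similarity) as the similarity measure, and the sequences, protein names, and function labels are all hypothetical.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def guess_function(query, annotated, k=3):
    """Return (name, function, similarity) of the most similar annotated sequence."""
    query_kmers = kmers(query, k)
    best = max(annotated, key=lambda name: jaccard(query_kmers, kmers(annotated[name][0], k)))
    seq, function = annotated[best]
    return best, function, jaccard(query_kmers, kmers(seq, k))

# Hypothetical toy sequences and labels, invented for illustration
annotated = {
    "protA": ("MKTAYIAKQRQISFVKSHFSRQ", "kinase-like"),
    "protB": ("GSSGSSGLLPPEEDDWWCCHHNN", "deacetylase-like"),
}
query = "MKTAYIAKQRQISFVKAHFSRQ"  # differs from protA at a single position
name, function, similarity = guess_function(query, annotated)
```

Such a transferred label is only a hypothesis about function, which is exactly where a small-molecule probe can supply the experimental test.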
Fig. 7.1-2 Development of cimetidine as an H2-selective antagonist for the treatment of ulcers, starting from histamine (agonist).
7.1.3 General Considerations
The key element of any reverse chemical genetics approach is the access to a small molecule, which modulates protein function by binding to the target protein [11]. Such molecules can be identified using two different approaches (1) HTS of large compound collections and (2) computer-aided design of compounds on the basis of the structure of the target protein, directed synthesis, and biological testing of selected compounds. 1. High-throughput screening: HTS is used to test large numbers of compounds for their ability to affect the activity of target proteins. Today, entire in-house compound libraries with millions of compounds can be screened with a throughput of 10000 (HTS) up to 100000 compounds per day (ultra high-throughput screening, uHTS) using robust test assays [12, 131. Homogeneous “mix and measure” assays are preferred for HTS as they avoid filtration, separation, and wash steps that can be time consuming and difficult to automate. Assays for HTS can be grouped into two categories: so-called solution-based biochemical assays and cell-based assays [ 14, 151. The former are based on radioactive (scintillation proximity assay, SPA), fluorescence (fluorescence resonance energy transfer, FRET, fluorescence polarization, FP, homogeneous time resolved fluorescence, HTRF, and fluorescence correlation spectroscopy, FCS), calorimetric and surface plasmon resonance (SPR, e.g., BiaCore) detection methods to quantify the interaction of test compounds with biological target molecules. SPAS in HTS have largely replaced heterogeneous assays that make use of radiolabeled ligands with subsequent filtration steps to measure high-affinity binding to receptors. 
Cell-based assays include (a) second messenger assays that monitor signal transduction, (b) reporter gene assays that monitor cellular responses at the transcriptional/translational level, (c) cell proliferation assays that detect induction or inhibition of cell growth, and (d) phenotypic assays that monitor changes in cell morphology or related parameters. Once a robust test assay has been set up, the choice of suitable compound libraries is the next key step. An excellent source of selective small molecule probes is the natural product pool. In an evolutionary process lasting millions of years, nature has come up with molecular structures that offer an evolutionary advantage to the species that makes the effort to synthesize these molecules. In most cases, these molecules are used to defend against enemies or to paralyze or kill prey. It is in the nature of these processes that such molecular weapons act most efficiently if they interfere with important biological processes of the target species, meaning that biologically relevant protein targets are addressed. A disadvantage of natural compounds is their often complex structure and the associated low synthetic accessibility. However, as has been outlined in Section 7.1.2, natural products were the first small molecule probes used in biological studies and continue to be of significant importance (vide infra). Recently, the combination of chemoinformatics, bioinformatics, and the chemistry of natural products has led to the insight that natural products can be regarded as evolutionarily selected starting points in chemical space and to the establishment of "natural product guided compound library development" [16, 17]. Historically grown libraries of synthetic compounds or compounds from combinatorial chemistry approaches are usually the first choice in the pharmaceutical industry for HTS. Every large pharmaceutical company and an increasing number of startup companies and research institutions now have access to a collection of these compounds. These collections have been built by in-house synthetic efforts, purchased from commercial vendors, or obtained by the synthesis of compound libraries using combinatorial methods [18].
2. Computer-assisted drug design: Small molecule probes can also be identified or designed from scratch using computational tools that exploit knowledge of pharmacophores or the protein structure as a guiding principle. Computational tools encompass 3D-pharmacophore searches and high-throughput docking [17, 19]. In 3D-database searching, structures of compounds from virtual or physically existing libraries are screened to identify compounds that fulfill a certain spatial arrangement of functional groups (a pharmacophore). High-throughput docking involves the in silico docking of small molecules into binding sites of target proteins with known or predicted structure. Empirical scoring functions are used to evaluate the steric and electrostatic complementarity (the fit) between the compounds and the target protein. The highest ranked compounds are then suggested for biological testing. These software tools are attractive and cost-effective approaches to generate chemical lead structures virtually, before committing to expensive synthetic chemistry. Furthermore, they allow rapid and thorough understanding of the relationship between chemical structure and biological function.
Depending on the software used, the virtual screening of small molecules normally takes less than a minute per chemical structure per computer processor (CPU) [17]. Clusters of CPUs allow a high degree of parallelization: with 100 CPUs working in parallel, the throughput exceeds even that of current uHTS technologies. The main advantage is that the method does not depend on the physical availability of compounds, meaning that not only in-house libraries but also external or virtual libraries can be searched. Applying scoring functions to the resulting data sets facilitates informed decisions about which chemical structures have the potential to exhibit the desired biological activity. On the other hand, high-throughput docking can only be applied to protein targets for which structural information from X-ray crystallography, nuclear magnetic resonance (NMR), or homology models is available.
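The throughput comparison above can be checked with simple arithmetic. The following is a minimal sketch, assuming the upper bound of one minute per structure per CPU and ideal (linear) scaling across CPUs; the function and constant names are illustrative and not taken from any docking package.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Upper uHTS figure quoted in the text (compounds per day).
UHTS_COMPOUNDS_PER_DAY = 100_000


def structures_per_day(seconds_per_structure: float, n_cpus: int) -> float:
    """Structures evaluated per day, assuming perfectly linear scaling over CPUs."""
    if seconds_per_structure <= 0 or n_cpus <= 0:
        raise ValueError("time per structure and CPU count must be positive")
    return n_cpus * SECONDS_PER_DAY / seconds_per_structure


# Assumption: 60 s per structure is the worst case ("less than a minute").
single_cpu = structures_per_day(60.0, 1)
cluster = structures_per_day(60.0, 100)

print(f"1 CPU:    {single_cpu:,.0f} structures/day")
print(f"100 CPUs: {cluster:,.0f} structures/day")
print("exceeds uHTS figure:", cluster > UHTS_COMPOUNDS_PER_DAY)
```

Real clusters rarely scale perfectly, so the result should be read as an optimistic estimate rather than a benchmark.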
Once a hit compound has been identified, its specificity for the protein target has to be determined. Ideally, the small molecule should exhibit perfect selectivity toward the protein of interest. In reality, it is more likely that none of the small molecule probes used today fulfills this requirement. Compounds that had previously been thought to be specific have turned out to hit more protein targets once they were subjected to screens against other protein targets. In light of new technological opportunities, and prompted by the failure of drugs in clinical trials or practice due to off-target activity, efforts have been initiated to reinvestigate the biological activity of existing drugs or interesting chemical compounds and to annotate their activity against as many proteins as are available. An example of a pioneering effort in this direction is the proteomic analysis of the selectivity of kinase inhibitors by the groups of Meijer, Daub, and Lockhart [20-23]. As the development of protein assays progresses rapidly and leads to improvements in the quality and quantity of information and a significant increase in the scope of screened protein targets, the door to full annotation of chemical compounds has been opened. Screening the hit compound against many protein targets has become imperative for two reasons: first, a lack of selectivity might be addressed by preparing a second-generation compound library using the methods described above; second, if this process does not lead to further improvement, knowledge about the off-target promiscuity of a small molecule probe will allow a careful and critical interpretation of the results of the biological studies carried out with this probe (Scheme 7.1-2). The small molecule probe selected by the process detailed above is then used as a tool in a series of biological studies, exploiting the whole repertoire of modern molecular and cell biology, such as genomic or proteomic profiling, imaging techniques, or functional readouts [24]. Other techniques that are used for the assignment of gene function involve the preparation of DNA mutants or gene knockouts, the application of gene silencing via antisense probes, or RNA interference [25]. As shown in Scheme 7.1-1, biological systems are probed with these strategies at the level of genetic information or transcriptional expression.
Consequently, the main advantage of these genetic techniques is the pronounced, in many cases even absolute, specificity with which they allow the probing of biological systems (Table 7.1-2). On the other hand, reverse chemical genetics has several unique advantages that complement these genetic techniques [26, 27]:
Table 7.1-2 Comparison of different strategies to probe biological systems
+: positive, -: negative, 0: neutral, x: not relevant

Property                         Small molecule
Rate of action                   +++
Specificity                      +++
Tunability                       ++
Cost of individual experiment    ++
Time to set up experiment        ++
Reversibility                    +++
Developmental studies            0

Gene knockout (values as extracted; assignment to rows uncertain): x, ++, -
RNA interference (values as extracted; assignment to rows uncertain): -, ++, 0, -, -, +, +
7 Reverse Chemical Genetics Revisited
Scheme 7.1-2 Flow scheme for a reverse chemical genetics approach.
• The effect of small molecules is rapid (high temporal control of the experiment).
• The concentration of small molecules can in many cases be spatially controlled and monitored.
• The effect is tunable. By varying the concentration, different degrees of phenotype expression can be created.
• In most cases the biological effect is reversible (due to metabolism or excretion), which allows transient study of protein function.
• The effect is conditional. It can be initiated at any stage during the development of an organism. In contrast, a gene knockout that is lethal for embryonic development cannot be studied in an adult organism.
• Knockout studies cannot differentiate between different protein forms that result from the same gene. Small molecules should, in principle, be able to distinguish between the different forms.
• Small molecules can even stabilize protein structures in different conformations (agonists and antagonists, respectively), allowing gain-of-function as well as loss-of-function studies to be performed.
• As ligand-binding sites of a protein exhibit in many cases a very high structural similarity in different species, the same small molecule probe can be used for studies in different species, whereas any genetic experiment would have to be adapted to the different genetic repertoire.
• The effect can be studied by anyone who has access to the small molecule probe (simple reproducibility).
Recently, several techniques have been introduced that combine the experimental advantages of chemical probes with the specificity of genetic methods. Conklin et al. have established the "receptors activated solely by synthetic ligands" (RASSL) approach for the study of G-proteins in vivo. In one example they removed the third extracellular loop of the κ opioid receptor (KOR), which reduced the binding affinity of the natural endogenous peptide ligand dynorphin to <0.05%, while maintaining affinity for small molecule κ agonists that have a different binding pocket close to the transmembrane region [28]. The human genome encodes >500 kinases, many of them playing important roles in key processes such as cell signaling and cell division. Although all kinases have an ATP-binding pocket, which qualifies them for small molecule binding, the structural similarity of these ligand-binding sites renders specificity almost impossible. Shokat et al. have developed an elegant approach that allows for the allele-specific chemical intervention of kinases. A promiscuous kinase inhibitor was modified with a bulky substituent, which prevented binding to the regular ATP-binding sites of native kinases. Almost all kinases exhibit a hydrophobic residue at the ATP-binding site, which functions as the "gatekeeper". Mutational replacement of the gatekeeper residue by Gly does not affect the regular activity of the kinase, but opens it to intervention by the bulky inhibitor, which interacts only with sensitized kinases. Shokat et al. used this technique, for instance, to show that there are significant phenotypic differences between the rapid loss of activity by inhibition and the deletion of the genomic copy of the cyclin-dependent kinase Pho85 [29, 30].
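The tunability noted in the list of advantages above can be illustrated with the simplest possible binding model. The sketch below assumes one-site equilibrium binding with the ligand in excess, so that the fraction of target occupied is C / (C + Kd); the Kd value is hypothetical, and occupancy need not translate linearly into the strength of a phenotype.

```python
def fractional_occupancy(conc_nM: float, kd_nM: float) -> float:
    """Fraction of target protein bound at equilibrium (one-site model)."""
    if conc_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return conc_nM / (conc_nM + kd_nM)


# Hypothetical probe with Kd = 100 nM (illustrative value, not from the text).
KD_NM = 100.0

for conc in (0.0, 10.0, 100.0, 900.0, 10_000.0):
    occupancy = fractional_occupancy(conc, KD_NM)
    print(f"{conc:>8.0f} nM -> {occupancy:5.1%} of target occupied")
```

Lowering the concentration, or washing the compound out, moves the system back down the same curve, which is the sense in which the effect is both graded and reversible; a gene knockout offers only the two end points.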
7.1.4 Applications and Practical Examples
Since a comprehensive description of all reverse chemical genetics investigations carried out to date is beyond the scope of this chapter, we highlight seven case studies that exemplify key elements of this approach. Many other important contributions, such as the seminal work of the Schreiber group in revealing the chemical biology of immunophilins and histone deacetylases, and the preparation of subtype-selective agonists of the somatostatin receptor through combinatorial chemistry by researchers from Merck, are listed in Table 7.1-3. A recently published review article describes forward and reverse chemical genetics related to cell division, the cytoskeleton, protein trafficking, and the ubiquitin-proteasome pathway [31].
Case Study 1: Isotype-Selective Small Molecule Probes for Orphan Nuclear Receptors (GW4064 and Farnesoid X Receptor)
To date, 48 nuclear receptors have been identified in the human genome. Each of these receptors contains the signature DNA-binding and/or ligand-binding domain (LBD). However, only 12 receptors bind to the classical steroid and retinoid hormones, and the remaining 36 have been designated as orphan nuclear receptors. Researchers from GlaxoSmithKline Inc. used HTS of natural compound and combinatorial chemistry libraries to deorphanize selected members of the nuclear receptor family [49, 50]. The farnesoid X receptor (FXR) has been shown to be weakly activated by farnesol. However, this effect is only indirect since farnesol does not bind to the receptor. Screening of a collection of naturally occurring steroids revealed that FXR is a receptor for bile acids, with

Table 7.1-3 Selected examples for reverse chemical genetics

Small molecule probes                Comments
Cytochalasin, latrunculin            Inactivates actin (cytoskeleton)
Cyclosporin, FK506, rapamycin        Calcineurin, FRAP, TOR pathway (signal transduction)
Trichostatin A, tubacin, histacin    Histone deacetylase (gene expression)
Uretupamine                          Ure2p (glucose signaling)
MT1-2 agonists and antagonists       Melatonin receptors (cell signaling)
Kinase inhibitors                    Raf/MAP kinase pathway (cell signaling)
SST1-5 selective agonists            Somatostatin receptors (cell signaling)
Src-kinase inhibitors                Maturation of T-cell contacts
SAG                                  Smo protein (Hedgehog signaling)
Monastrol
Kinase inhibitors                    Aurora kinases (cell division)
Tunicamycin                          Glycoprotein biosynthesis
... experiments have been aiding in gaining insight into estrogen signaling, additional information on the function of ERα and ERβ was provided by the application of isotype-selective ER agonists. These compounds include the ERα-selective agonist propyl pyrazole triol (PPT) [55], the ERβ-selective agonist diarylpropionitrile (DPN) [56], and the benzoxazole derivative ERB-041 [57]. On the basis of the crystal structure of the ERα-LBD and a homology model of the ERβ-LBD (59% sequence identity to ERα) [58], Hillisch et al. designed steroidal ligands that exploited the differences in size and flexibility between the two ligand-binding cavities (Fig. 7.1-4). Computer-aided drug design methods were used to dock compounds into the binding pockets. Compounds predicted to bind preferentially to either ERα or ERβ were synthesized and tested in vitro. This approach led directly to potent ligands with high ER isotype selectivity (200-250-fold).
To unravel the physiological roles of each of the two receptors, in vivo experiments with rats were conducted using the ERα- and ERβ-selective agonists in comparison to the natural ligand, 17β-estradiol. The compounds were administered to Wistar rats using osmotic pumps to overcome pharmacokinetic deficiencies of these tool compounds. A specifically developed, highly sensitive radioimmunoassay (RIA) allowed the detection and quantification of the compounds in systemic circulation [59]. The ERα agonist 16α-LE2 was shown to be responsible for most of the known estrogenic effects such as induction of uterine growth, and bone-protective, pituitary, and liver effects. In addition, the compound showed positive effects on blood vessels in ovariectomized spontaneously hypertensive rats, on endothelium-dependent NO-mediated vasorelaxation, and on eNOS (endothelial nitric oxide synthase) expression [59]. The ERβ agonist 8β-VE2 was shown to stimulate early folliculogenesis, decrease follicular atresia, induce ovarian gene expression, and stimulate late follicular growth, accompanied by an increase in the number of ovulated oocytes in hypophysectomized rats and gonadotropin-releasing hormone antagonist-treated mice [60]. Affymetrix analysis revealed the expression of a considerable number of genes to be strongly modulated in the ovary by treatment of juvenile rats with the natural hormone estradiol (E2) and the tool compound 8β-VE2, among these cellular retinoic acid binding protein II (CRABP-II), α-L-fucosidase (ALFUC), calcium-binding protein (CaBP), prostacyclin synthase (PGIS), and inhibin α. These experiments revealed several new aspects of estrogen signaling and stimulated further research. Use of the ERβ agonist might provide clinicians with a new option for tailoring classical ovarian stimulation protocols. These studies show that it is possible to design highly selective compounds if structure information on all relevant homologs of the target is available, and that the designed tool compounds contribute essentially to the elucidation of the physiological roles of the target protein.
Fig. 7.1-4 Isotype-selective probes for ERα and ERβ. Reprinted with permission from The Endocrine Society [58].
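A selectivity figure such as the 200-250-fold quoted in this case study is a ratio of affinities (or potencies) for the off-target and the target isotype. The sketch below shows that calculation and a simple ranking of ligands by it; the ligand names and affinity values are hypothetical placeholders chosen only to illustrate the arithmetic, not measured data for any compound discussed here.

```python
def fold_selectivity(off_target_nM: float, target_nM: float) -> float:
    """Ratio of off-target to target affinity constants (higher = more selective)."""
    if off_target_nM <= 0 or target_nM <= 0:
        raise ValueError("affinity constants must be positive")
    return off_target_nM / target_nM


def rank_by_selectivity(ligands, target, off_target):
    """Return (name, fold selectivity) pairs, most selective ligand first."""
    scored = [
        (name, fold_selectivity(aff[off_target], aff[target]))
        for name, aff in ligands.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical dissociation constants in nM (lower = tighter binding).
ligands = {
    "ligand_A": {"ERalpha": 1.0, "ERbeta": 220.0},
    "ligand_B": {"ERalpha": 5.0, "ERbeta": 10.0},
}

for name, fold in rank_by_selectivity(ligands, target="ERalpha", off_target="ERbeta"):
    print(f"{name}: {fold:.0f}-fold selective for ERalpha over ERbeta")
```

The same ratio, computed across a whole panel of proteins rather than one pair, is what the selectivity profiling described in Section 7.1.3 produces.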
Case Study 3: Deorphanizing Receptors by Reverse Pharmacology (Orexins and GPCRs)
The sequencing of the human genome has resulted in the identification of 300-400 nonolfactory GPCRs, for most of which an endogenous ligand has not yet been identified ("orphan receptors"). GPCRs respond to a variety of signals, including photons, biogenic amines, lipids, or peptides. The biological activity of all known small regulatory peptides (small peptide hormones and neuropeptides) is associated with their action on GPCRs. It is believed that peptides are the unidentified signaling molecules of most orphan GPCRs. To understand the biological significance of the many GPCRs in the human genome, deorphanization is a goal of utmost importance. Sakurai et al. have demonstrated that "reverse pharmacology" is a powerful strategy to accomplish this task. After generating over 50 transfectant cell lines, each expressing a distinct orphan GPCR, they challenged the cells with HPLC (high performance liquid chromatography) fractions of extracts derived from different tissues and monitored a number of signal transduction readouts for G-protein activation. In such an experiment, they observed interesting initial activity in an extract from rat brain. Several rounds of reverse-phase HPLC purification revealed a 33-amino-acid peptide as the active substance, which received the name orexin-A. The corresponding receptor received the name orexin receptor (Greek: orexis = appetite). Further investigations showed that two substances, orexin-A and orexin-B, exist, both exhibiting intramolecular disulfide bridges, which activate two receptors, A and B, that are found mainly in the brain [61]. A combination of chemical, genetic, and physiological studies revealed that these peptides stimulate food consumption and that their production is influenced by the nutritional state of a test animal. The discovery of orexin deficiency in narcoleptic patients showed that orexins play an important role in the regulation of sleep and wakefulness [62]. The strategy of "reverse pharmacology" has turned out to be a generally applicable and productive approach for the deorphanization of GPCRs [63]. For example, it has been used for the functional annotation of the receptors Drostar-1 and Drostar-2, for which a role in visual information processing has been identified [64].
Case Study 4: Isoform-Selective Inhibitor Made by Combinatorial Chemistry Unravels the Roles of Isoforms In Vivo (Granzymes A and B)
Natural killer (NK) cells and cytotoxic T lymphocytes (CTL) are the primary line of defense against viruses and other intracellular pathogens in the immune system. The cytotoxic lymphocytes recognize infected host cells and kill them with the help of the pore-forming protein perforin and by proteolytic events carried out by members of the granzyme family of serine proteases. Although an essential component of immunity under normal conditions, aberrant cytotoxic lymphocyte activity has been associated with autoimmune disorders such as rheumatoid arthritis, diabetes, or allograft rejection [65]. Craik and Mahrus applied a reverse chemical genetics approach to reveal the role of the most important granzymes, A and B, in cell lysis, as two classical approaches of cell biology had led to contradictory results: cytotoxic lymphocytes from knockout mice (lacking either granzyme A, granzyme B, or both) behave relatively normally in their ability to lyse target cells. On the other hand, a reconstituted system in which target cells are treated with sublytic levels of perforin and either granzyme A or granzyme B leads to efficient cell lysis. This discord in findings could result from the well-known limitations of these two approaches: the results from genetic deletion studies are known to be obscured by compensation effects of similar genes, whereas in reconstituted systems the concentrations and mode of delivery of the agents can be nonphysiological. Craik and Mahrus used a positional scanning approach to prepare two isozyme-specific phosphonate inhibitors as affinity labels of granzymes A and B (Fig. 7.1-5). Both inhibitors were tested against a panel of all known human granzymes (A, B, H, K, and M) and exhibited activity only against their target protein. Use of these activity-based probes in cytotoxicity assays then allowed dissection of the contribution of granzymes A and B to lysis of target cells by NK cells.
Granzyme B functions as a major effector of target cell lysis, whereas granzyme A is only a minor effector in the same process. The difference between the outcome of the reverse chemical genetics approach and the above-mentioned conventional experiments might be a consequence of the fact that in pharmacological studies high temporal control circumvents compensation, and also because no alterations are made to the concentrations and mode of delivery of granzymes and perforin.
Fig. 7.1-5 Isozyme-selective probes for reverse chemical genetics of granzymes A and B: probe A (granzyme A-selective) and probe B (granzyme B-selective).
Case Study 5: Design of an Inhibitor of a Protein to Study Protein Function in a Cell (Raspalin 3 and APT1)
The observation that the Ras proteins are critically involved in the development of cancer has spurred substantial interest in developing new classes of antitumor drugs on the basis of interference with the impaired signal transducing activities of Ras. The Ras proteins belong to the class of proteins whose biological activity is dependent on lipid modification. In the normal and oncogenic state, the H- and N-Ras isoforms are anchored to the plasma membrane by means of S-farnesylation and S-palmitoylation at their C-terminus, which are required to exert their full biological activity. While the enzyme farnesyltransferase is known and has become a drug target for the treatment of tumors carrying a mutation in the Ras oncogene, the enzyme responsible for the palmitoylation of Ras and other G-proteins has not been identified so far. The only known "bona fide player" in Ras-palmitoylation was acyl protein thioesterase 1 (APT1), which depalmitoylates H-Ras and other lipidated proteins [66]. However, its relevance to Ras biology was unclear. In an attempt to elucidate the biological role of APT1, the groups of Giannis, Kuhlmann, and Waldmann followed a chemical genetics approach, that is, they developed a potent inhibitor of APT1 to perform a chemical knockout of the protein in cellular assays and to study the subsequent response of the biological system. Peptidomimetics that imitate the C-terminus of the H-Ras protein and embody different lipidation patterns, in particular a nonhydrolyzable sulfonamide as an analog of the palmitic acid thioester, were designed and investigated as inhibitors of APT1, among which Raspalin 3 emerged as the most useful inhibitor (Fig. 7.1-6) [67].
Fig. 7.1-6 Raspalin 3, an inhibitor of APT1 (IC50 = 148 nM).
Raspalin 3 was then used in experiments employing the neuronal precursor cell line PC12, in which semisynthetic Ras proteins modified with fluorescent probes played a major role (Fig. 7.1-7). Cell-biological experiments with these protein conjugates had shown that if a farnesylated yet still palmitoylatable Ras protein (that is, one with a free and palmitoylatable cysteine-SH) was microinjected into PC12 cells, the cellular machinery would carry out the palmitoylation, resulting in localization of the protein at the plasma membrane and neurite outgrowth from the cells. It was to be expected that APT1, through depalmitoylation, should antagonize this process, leading to reduced neurite outgrowth. Consequently, inhibition of the depalmitoylating thioesterase by the newly designed inhibitors should lead to an increase in neurite formation. However, whether microinjected or added to the culture medium, an APT1 inhibitor surprisingly resulted in reduced formation of neurites. Thus, this compound did not behave as an inhibitor of Ras-depalmitoylation but rather as an inhibitor of Ras-palmitoylation. This finding was backed up by employing a different semisynthetic Ras protein that is biologically active yet neither palmitoylatable nor depalmitoylatable (it embodies a stable hexadecyl thioether instead of a labile palmitic acid thioester and was synthesized employing the methods described above).
Use of yet another semisynthetic Ras protein that is palmitoylatable and additionally fluorescently labeled in the PC12 cell assay, and inspection of the cells by confocal laser fluorescence microscopy, showed that (as expected if palmitoylation and not depalmitoylation was inhibited) in the presence of the inhibitor the Ras protein is no longer localized to the plasma membrane but rather accumulates in intracellular membranes (Fig. 7.1-8). Taken together, these findings indicated that APT1 may be involved in mediating both Ras-depalmitoylation and Ras-palmitoylation.
Fig. 7.1-7 Reduction of PC12 cell differentiation rate by Raspalin in the PC12 differentiation assay.
Fig. 7.1-8 Inhibition of plasma membrane localization of fluorescently labeled Ras protein by Raspalin 3. Localization of the fluorescent lipoprotein was monitored 7 h after microinjection by confocal microscopy. Although Ras protein alone shows a distinct staining of the plasma membrane (a), coinjection of 2 µM inhibitor Raspalin 3 results in an accumulation of the lipoprotein in cytoplasmic structures, which is typical for nonpalmitoylatable Ras constructs (b).
Case Study 6: Rationally Designed Isoform-Selective Inhibitor Exhibiting a New Clinical Aspect of the Protein Target (Viagra and PDE5)
Cyclic guanosine monophosphate (cGMP) is the ubiquitous second messenger for GPCRs activated by endogenous substances such as nitric oxide (NO) and atrial natriuretic factor (ANF). Intracellular levels of cGMP are controlled by cyclic nucleotide cyclases (synthesis of cGMP from GTP) and phosphodiesterases (PDE) (hydrolysis of cGMP to inactive GMP). Among at least seven families of PDEs, PDE5 is a calcium/calmodulin-insensitive cGMP PDE, occurring in the lung, platelets, and various forms of smooth muscle. A research team at Pfizer/UK was of the opinion that a selective PDE5 inhibitor would preserve tissue levels of cGMP and hence would potentiate the vasodilator and natriuretic effects of ANF. Therefore, such a PDE5 inhibitor would show potential for the treatment of hypertension and other cardiovascular indications [68]. Starting from an unselective lead substance, a medicinal chemistry approach led to sildenafil, which showed, at that time, an unprecedented selectivity over other PDE isoenzymes (Fig. 7.1-9). Despite encouraging results in the laboratory, the clinical results in coronary heart disease were disappointing. Surprisingly, several participants in a trial of sildenafil on 30 men in Merthyr Tydfil/Wales refused to return their unused tablets when the trial was stopped. On questioning by the physician in charge, it emerged that the patients had discovered that PDE5 is the predominant cGMP-hydrolyzing activity in the cytosolic fraction from human corpus cavernosum [6]. As penile erection is mediated by NO and thus cGMP, sildenafil improves erection by enhancing relaxation of the corpus cavernosal smooth muscle (Scheme 7.1-3). Sildenafil (Viagra™) revolutionized the treatment of male erectile dysfunction and became a blockbuster drug in the market. Follow-up drugs exhibit even higher potency and isozyme selectivity, potentially reducing some of the unwanted side effects of sildenafil.
Fig. 7.1-9 Structure and isozyme selectivity of sildenafil (Viagra™).
Scheme 7.1-3 NO-signaling pathway interfered by sildenafil. Scheme labels: NO; GTP, cGMP, GMP; smooth muscle relaxation; sildenafil.
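The reasoning behind PDE5 inhibition, that blocking hydrolysis preserves cGMP levels, can be made concrete with a toy steady-state model. This is a deliberately simplified sketch and not a model taken from the chapter: it assumes constant cGMP synthesis, first-order hydrolysis attributed entirely to PDE5, and a fractional inhibition of I / (I + IC50). All parameter values and names are hypothetical.

```python
def fraction_inhibited(inhibitor_nM: float, ic50_nM: float) -> float:
    """Fraction of enzyme activity blocked, assuming a simple hyperbolic response."""
    if inhibitor_nM < 0 or ic50_nM <= 0:
        raise ValueError("inhibitor must be >= 0 and IC50 must be > 0")
    return inhibitor_nM / (inhibitor_nM + ic50_nM)


def steady_state_cgmp(v_syn: float, k_deg: float, inhibited: float) -> float:
    """Steady state of d[cGMP]/dt = v_syn - k_deg * (1 - inhibited) * [cGMP]."""
    remaining = 1.0 - inhibited
    if v_syn < 0 or k_deg <= 0 or remaining <= 0:
        raise ValueError("need v_syn >= 0, k_deg > 0, and incomplete inhibition")
    return v_syn / (k_deg * remaining)


# Hypothetical parameters in arbitrary units; IC50 is a placeholder value.
V_SYN, K_DEG, IC50_NM = 1.0, 0.5, 10.0

for dose in (0.0, 10.0, 90.0):
    blocked = fraction_inhibited(dose, IC50_NM)
    level = steady_state_cgmp(V_SYN, K_DEG, blocked)
    print(f"inhibitor {dose:5.1f} nM: {blocked:4.0%} blocked, cGMP = {level:.1f}")
```

In this picture the inhibitor amplifies whatever cGMP signal NO has already triggered rather than creating one, which matches the placement of sildenafil on the hydrolysis step in Scheme 7.1-3.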
Case Study 7: Natural Products Allow the Characterization of Different Binding Sites within a Family of Proteins (Conotoxins and Nicotinic Acetylcholine Receptors)
As mentioned above, the classic experiments by Langley with the nicotinic acetylcholine receptor (nAChR) at the neuromuscular junction led to the formulation of the receptor concept. nAChRs are ligand-gated ion channels belonging to the Cys-loop receptor superfamily, which allow the passage of potassium, sodium, or calcium ions across the synaptic membrane. Two classes of nAChRs exist, neuromuscular and neuronal, each being composed of five subunits that can form heteropentameric or homopentameric membrane-bound channel structures [69-71]. While the identification and pharmacological distinction of nAChR subtypes at the neuromuscular endplate (responsible for muscle contraction) and in sympathetic and parasympathetic ganglia (mediating neurotransmission) were accomplished earlier, the investigation of neuronal nAChRs in the brain has proven more elusive. The basic framework of neuronal nAChRs takes the form α2β3, whose extraordinary variety and complexity results from the fact that so far α2-α7, α9, α10, and β2-β4 subunits have been cloned from neuronal and sensory mammalian tissues. Diseases like Alzheimer's, Parkinson's, epilepsy, and schizophrenia, as well as nicotine addiction, have been shown to be connected to specific subclasses of nAChRs, which creates an urgent need to understand these potential targets for pharmaceutical intervention [70]. The venom of the Conus genus of marine snails contains a family of toxins that includes oligopeptides that are highly selective at blocking nAChRs by binding to acetylcholine binding pockets between specific subunit pairs. The so-called α-conotoxins range in size between 12 and 19 amino acids and use disulfide bonds to maintain their three-dimensional shape. Although only a fraction of α-conotoxins has been isolated from snail venom so far, the small proportion of toxins whose biological activity has been annotated has proven to be a bounty of selective tools for the study of both neuromuscular and neuronal nAChRs (Table 7.1-4) [70].
The conotoxins have not only proven invaluable for the chemical biological study of nAChRs, but some of them have also been developed for the treatment of neurological conditions and are in advanced stages of clinical trials [72]. Just recently, Elan Pharmaceuticals introduced ziconotide (Prialt™), the synthetic equivalent of the ω-conotoxin MVIIA, to the market as a novel nonopioid drug for the treatment of severe chronic pain. Ziconotide acts by potently and selectively blocking neuronal N-type voltage-sensing calcium channels, thereby inhibiting the activity of a subset of neurons, including pain-sensing primary nociceptors [73].
7.1.5 Future Developments
Although the pharmacological approach of target validation is almost as old as the idea of target receptors, a series of recent breakthroughs in method developments in chemistry, biochemistry, bioinformatics, cheminformatics, biology, and pharmacology will boost reverse chemical genetics to new heights.
7. I
The Study ofprotein Function in Chemical Biology and Drug Discovery
Table 7.1-4 Sequences and mammalian subunit specificities of neuronal α-conotoxins [70]

Name    Subunit specificity    Sequence
MII     α6β2, α3β2             Gly-Cys-Cys-Ser-Asn-Pro-Val-Cys-His-Leu-Glu-His-Ser-Asn-Leu-Cys-NH2
AuIA    α3β4                   Gly-Cys-Cys-Ser-Tyr-Pro-Pro-Cys-Phe-Ala-Thr-Asn-Ser-Asp-Tyr-Cys-NH2
AuIC    α3β4                   Gly-Cys-Cys-Ser-Tyr-Pro-Pro-Cys-Phe-Ala-Thr-Asn-Ser-Gly-Tyr-Cys-NH2
PnIA    α3β2                   Gly-Cys-Cys-Ser-Leu-Pro-Pro-Cys-Ala-Ala-Asn-Asn-Pro-Asp-Tyr[a]-Cys-NH2
PnIB    α7                     Gly-Cys-Cys-Ser-Leu-Pro-Pro-Cys-Ala-Leu-Ser-Asn-Pro-Asp-Tyr[a]-Cys-NH2
EpI     α3β2, α3β4, α7         Gly-Cys-Cys-Ser-Asp-Pro-Arg-Cys-Asn-Met-Asn-Asn-Pro-Asp-Tyr[a]-Cys-NH2
AnIA    α3β2                   Cys-Cys-Ser-His-Pro-Ala-Cys-Ala-Ala-Asn-Asn-Gln-Asp-Tyr[a]-Cys-NH2
AnIB    α3β2                   Gly-Gly-Cys-Cys-Ser-His-Pro-Ala-Cys-Ala-Ala-Asn-Asn-Gln-Asp-Tyr[a]-Cys-NH2
AnIC    α3β2                   Gly-Gly-Cys-Cys-Ser-His-Pro-Ala-Cys-Phe-Ala-Ser-Asn-Pro-Asp-Tyr[a]-Cys-NH2
GIC     α3β2, α6β2β3           Gly-Cys-Cys-Ser-His-Pro-Ala-Cys-Ala-Gly-Asn-Asn-Gln-His-Ile-Cys-NH2
GID     α3β2, α7               Ile-Arg-Asp-Gla[b]-Cys-Cys-Ser-Asn-Pro-Ala-Cys-Arg-Val-Asn-Asn-Hyp-His-Val-Cys
Vc1.1   α3β4                   Gly-Cys-Cys-Ser-Asp-Pro-Arg-Cys-Asn-Tyr-Asp-His-Pro-Glu-Ile-Cys-NH2
PIA     α6/α3β2β3              Arg-Asp-Pro-Cys-Cys-Ser-Asn-Pro-Val-Cys-Thr-Val-His-Asn-Pro-Glu-Ile-Cys-NH2
AuIB    α3β4                   Gly-Cys-Cys-Ser-Tyr-Pro-Pro-Cys-Phe-Ala-Thr-Asn-Pro-Asp-Cys-NH2
ImI     α7                     Gly-Cys-Cys-Ser-Asp-Pro-Arg-Cys-Ala-Trp-Arg-Cys-NH2
ImII    α7                     Ala-Cys-Cys-Ser-Asp-Arg-Arg-Cys-Arg-Trp-Arg-Cys-NH2
ImIII   n.d. (not determined)  Tyr-Cys-Cys-His-Arg-Gly-Pro-Cys-Met-Val-Trp-Cys-NH2
BuIA    α6/α3β2, α6/α3β4       Gly-Cys-Cys-Ser-Thr-Pro-Pro-Cys-Ala-Val-Leu-Tyr-Cys-NH2

Disulfide bonds link the first and third and the second and fourth cysteine residues of each sequence (set as bold and underlined pairs in the printed table). [a] Sulfotyrosine. [b] Carboxyglutamate.
We think that the following developments will shape the future of the field to a major extent:
1. The completion of the sequencing of the human genome has provided a global map of the potential landscape of efforts in reverse chemical genetics. At present, a qualified estimate of the total number of genes or gene products is available, and most proteins are available at least as expressed sequence tag (EST) sequence data. Future efforts in sequencing and single nucleotide polymorphism (SNP) analysis of subpopulations, defined by health or disease status, genetic heritage, ethnic background, etc., will increase the resolution of sequence data and information.
2. The large-scale efforts in biochemistry and biology using the whole repertoire of classical mutation genetics, antisense, RNAi, cell-biological methods, etc. will continue and support the exponential growth of biological understanding of cells and organisms.
3. The now fruit-bearing structural genomics initiatives will increase the number of available protein structures that can be exploited for the rational design of small-molecule ligands, as detailed above. Unfortunately, for a series of important target protein classes such as GPCRs and ion channels, only a very limited number of experimentally solved protein structures are available. Hopefully, new protein expression techniques and crystallization procedures will eliminate this bottleneck in the near future. Homology modeling techniques have improved substantially in recent years, and they provide a way to bridge the time gap until experimentally derived structure information on target proteins becomes available [74].
4. Combinatorial chemistry, parallel synthesis, and solid-phase synthesis will continue to become more efficient and productive tools for the synthesis of compound libraries. Despite their still incomplete status, rationales about library diversity, drug-likeness, promiscuity of functional groups or structural elements, metabolic stability, bioavailability, etc. will become increasingly important guiding principles for library design. The growing accessibility of building blocks and an increasing number of different scaffolds will allow the creation of chemical compounds of a new quantity and quality, which can be subjected to biological screening in protein-binding assays or phenotypic forward genetic screens.
5. An increasing number of available protein-binding assays, functional cell-based assays, and methods of chemical proteomics (affinity chromatography, three-hybrid assays, pull-down assays) will allow for a better assignment of the specificity and selectivity of a hit compound. It would be desirable for the data collected during these screening programs to be translated into an understanding of the correlation between chemical structure and protein-binding capability. New cheminformatic approaches will support this effort.
6. With the more specific chemical probes identified through the processes outlined in points 1-5, more instructive functional analyses of cells and organisms can be carried out, taking advantage of new methodologies for describing the physiological state of a biological system, such as DNA-chip analysis, imaging techniques, RT-PCR, proteomics, phenotypic assays using antibodies, and many more [75-77].
7. The holistic approach of systems biology is assisted by large-scale computing that can deal with the complexity of biological networks and experimental data. Once it is possible to compute the global response of a biological system to a perturbation or external intervention, the system can be regarded as understood, and this might accelerate the search for new pharmacological targets tremendously [78].
Although these techniques will certainly bear fruit, the difficulty and complexity of the task should not be underestimated. Research carried out at the interface of chemistry and biology over the last two decades has taught one important lesson: every increase in our understanding of processes at the cellular or organismic level goes hand in hand with the realization that nature is much more complicated than most might have anticipated. What once were signal pathways have turned into signal networks, which show an almost brain-like plasticity that is currently beyond our understanding.
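The difference between a linear signal pathway and a signal network with redundant branches can be made concrete with a toy reachability check. This is a sketch under stated assumptions only: the network, its node names, and the function `signal_reaches` are hypothetical and do not describe any real signaling system.

```python
from collections import deque

# Hypothetical toy network: each key signals to the nodes in its list.
# Two parallel kinases provide redundant routes from receptor to effector.
NETWORK = {
    "receptor": ["kinase_A", "kinase_B"],
    "kinase_A": ["effector"],
    "kinase_B": ["effector"],
    "effector": ["response"],
    "response": [],
}


def signal_reaches(network, source, target, blocked=frozenset()):
    """Breadth-first search that treats blocked (inhibited) nodes as removed."""
    if source in blocked or target in blocked:
        return False
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in network.get(node, []):
            if nxt not in blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(signal_reaches(NETWORK, "receptor", "response"))
print(signal_reaches(NETWORK, "receptor", "response", {"kinase_A"}))
print(signal_reaches(NETWORK, "receptor", "response", {"kinase_A", "kinase_B"}))
```

In this toy case, blocking either kinase alone leaves the signal intact because the other branch compensates, while blocking both interrupts it. Real networks add feedback, kinetics, and dosage effects that a reachability test does not capture.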
Recent results indicate that “dirty” drugs (i.e., drugs acting on several protein targets at the same time) [79] used in the treatment of central nervous system (CNS) disorders are more effective and cause fewer side effects than “clean” drugs [80]. A similar effect, in which a synergistic interplay between kinases plays a role, has been proposed for cancer drugs [81]. Manipulating a network with multiple redundant backup lines requires the orchestrated shutdown of a signal across multiple interactions, and most likely not the knockout of a single node (i.e., a single protein). This will lead to new rules for drug discovery. Whether randomly created or intentionally designed unselective drugs, or mixtures of selective drugs, will be the ideal remedies against those diseases is a question that remains to be answered.
7.1.6 Conclusion
Reverse chemical genetics is one of several necessary tools in target validation. Among these tools, it plays a particularly prominent role because full control over the biological function of a protein is the key to its complete understanding in a physiological context. Unfortunately, it will not be easy to achieve this ultimate goal, as it will be very difficult to develop chemical probes with complete selectivity and specificity. Nevertheless, even an approximation to this goal will be rewarded with a major gain in insight into and understanding of biological systems.
Acknowledgments
R. B. and H. W. thank the Max-Planck-Society, the Deutsche Forschungsgemeinschaft, the Fonds der Chemischen Industrie, and the University of Dortmund for continuous and generous financial support of their research.
References
1. S.L. Schreiber, Chemical genetics resulting from a passion for synthetic organic chemistry, Bioorg. Med. Chem. 1998, 6, 1127-1152.
2. T.J. Mitchison, Towards a pharmacological genetics, Chem. Biol. 1994, 1, 3-6.
3. H.E. Blackwell, Y. Zhao, Chemical genetic approaches to plant biology, Plant Physiol. 2003, 133, 448-455.
4. A.H. Maehle, C.-R. Prüll, R.F. Halliwell, The emergence of the drug receptor theory, Nat. Rev. Drug Discov. 2002, 1, 637-641.
5. J.R. Peterson, T.J. Mitchison, Small molecules, big impact: a history of chemical inhibitors and the cytoskeleton, Chem. Biol. 2002, 9, 1275-1285.
6. W. Sneader, Drug Discovery: A History, Wiley, Chichester, 2005.
7. R.P. Ahlquist, A study of the adrenotropic receptors, Am. J. Physiol. 1948, 153, 586-600.
8. C.G. Wermuth, Selective optimization of side activities: another way of drug discovery, J. Med. Chem. 2004, 47, 1303-1314.
9. J. Drews, Drug discovery: a historical perspective, Science 2000, 287, 1960-1964.
10. B.R. Bochner, New technologies to assess genotype-phenotype relationships, Nat. Rev. Genet. 2003, 4, 309-314.
11. M. Bredel, E. Jacoby, Chemogenomics: an emerging strategy for rapid target and drug discovery, Nat. Rev. Genet. 2004, 5, 262-275.
12. R.P. Hertzberg, A.J. Pope, High-throughput screening: technology for the 21st century, Curr. Opin. Chem. Biol. 2000, 4, 445-451.
13. J. Wolcke, D. Ullmann, Miniaturized HTS technologies - uHTS, Drug Discov. Today 2001, 6, 637-646.
14. S.A. Sundberg, High-throughput and ultra-high-throughput screening: solution- and cell-based approaches, Curr. Opin. Biotechnol. 2000, 11, 47-53.
15. L. Silverman, R. Campbell, J.R. Broach, New assay technologies for high-throughput screening, Curr. Opin. Chem. Biol. 1998, 2, 397-403.
16. R. Breinbauer, I.R. Vetter, H. Waldmann, From protein domains to drug candidates - natural products as guiding principles in the design and synthesis of compound libraries, Angew. Chem. 2002, 114, 3002-3115; Angew. Chem. Int. Ed. Engl. 2002, 41, 2879-2890.
17. G. Schneider, H.J. Bohm, Virtual screening and fast automated docking methods, Drug Discov. Today 2002, 7, 64-70.
18. Glaxo Wellcome, Redesigning drug discovery, Nature 1996, 384 (Suppl-5).
19. L.M. Toledo-Sherman, D. Chen, High-throughput virtual screening for drug discovery in parallel, Curr. Opin. Drug Discov. Devel. 2002, 5, 414-421.
20. M. Knockaert, N. Gray, E. Damiens, Y.-T. Chang, P. Grellier, K. Grant, D. Fergusson, J. Mottram, M. Soete, J.-F. Dubremetz, K. Le Roch, C. Doerig, P.G. Schultz, L. Meijer, Intracellular targets of cyclin-dependent kinase inhibitors: identification by affinity chromatography using immobilised inhibitors, Chem. Biol. 2000, 7, 411-422.
21. J. Wissing, K. Godl, D. Brehmer, S. Blencke, M. Weber, P. Habenberger, M. Stein-Gerlach, A. Missio, M. Cotton, S. Muller, H. Daub, Chemical proteomic analysis reveals alternative modes of action for pyrido[2,3-d]pyrimidine kinase inhibitors, Mol. Cell. Proteomics 2004, 3, 1181-1193.
22. D. Brehmer, Z. Greff, K. Godl, S. Blencke, A. Kurtenback, M. Weber, S. Muller, B. Klebl, M. Cotton, G. Keri, J. Wissing, H. Daub, Cellular targets of gefitinib, Cancer Res. 2005, 65, 379-382.
23. M.A. Fabian, W.H. Biggs III, D.K. Treiber, C.E. Atteridge, M.D. Azimioara, M.G. Benedetti, T.A. Carter, P. Ciceri, P.T. Edeen, M. Floyd, J.M. Ford, M. Galvin, J.L. Gerlach, R.M. Grotzfeld, S. Herrgard, D.E. Insko, M.A. Insko, A.G. Lai, J.-M. Lelias, S.A. Mehta, Z.V. Milanov, A.M. Velasco, L.M. Wodiscka, H.K. Patel, P.P. Zarrinkar, D.J. Lockhart, A small molecule-kinase interaction map for clinical kinase inhibitors, Nat. Biotechnol. 2005, 23, 329-336.
24. R.A. Butcher, S.L. Schreiber, Using genome-wide transcriptional profiling to elucidate small-molecule mechanism, Curr. Opin. Chem. Biol. 2005, 9, 25-30.
25. M.D. Adams, J.J. Sekelsky, From sequence to phenotype: reverse genetics in Drosophila melanogaster, Nat. Rev. Genet. 2002, 3, 189-198.
26. T.U. Mayer, Chemical genetics: tailoring tools for cell biology, Trends Cell Biol. 2003, 13, 270-277.
27. B.R. Stockwell, Chemical genetics: ligand-based discovery of gene function, Nat. Rev. Genet. 2000, 1, 116-125.
28. K. Scearce-Levie, P. Coward, C.H. Redfern, B.R. Conklin, Tools for dissecting signaling pathways in vivo: receptors activated solely by synthetic ligands, Meth. Enzymol. 2002, 343, 232-248.
29. K. Shokat, M. Vellaca, Novel chemical genetic approaches to the discovery of signal transduction inhibitors, Drug Discov. Today 2002, 7, 872-879.
30. A.S. Carroll, A.C. Bishop, J.L. DeRisi, K.M. Shokat, E.K. O'Shea, Chemical inhibition of the Pho85 cyclin-dependent kinase reveals a role in the environmental stress response, Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 12578-12583.
31. N.A. Hathaway, R.W. King, Dissecting cell biology with chemical scalpels, Curr. Opin. Cell Biol. 2005, 17, 12-19.
32. M.-A. Bjornsti, P.J. Houghton, The TOR pathway: a target for cancer therapy, Nat. Rev. Cancer 2004, 4, 335-348.
33. S.L. Schreiber, Immunophilin-sensitive phosphatase action in cell signaling pathways, Cell 1992, 70, 365-368.
34. C.M. Grozinger, S.L. Schreiber, Deacetylase enzymes: biological functions and the use of small-molecule inhibitors, Chem. Biol. 2002, 9, 3-16.
35. S.J. Haggerty, K.M. Koeller, J.C. Wong, C.M. Grozinger, S.L. Schreiber, Domain-selective small-molecule inhibitor of histone deacetylase 6 (HDAC6)-mediated tubulin deacetylation, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 4389-4394.
36. S.J. Haggerty, K.M. Koeller, J.C. Wong, R.A. Butcher, S.L. Schreiber, Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays, Chem. Biol. 2003, 10, 383-396.
37. F.G. Kuruvilla, A.F. Shamji, S.M. Sternson, P.J. Hergenrother, S.L. Schreiber, Dissecting glucose signalling with diversity-oriented synthesis and small-molecule microarrays, Nature 2002, 416, 653-657.
38. J.A. Boutin, V. Audinot, G. Ferry, P. Delagrange, Molecular tools to study melatonin pathways and actions, Trends Pharmacol. Sci. 2005, 26, 412-419.
39. J.S. Sebolt-Leopold, R. Herrera, Targeting the mitogen-activated protein kinase cascade to treat cancer, Nat. Rev. Cancer 2004, 4, 937-947.
40. J.S. Sebolt-Leopold, D.T. Dudley, R. Herrera, K. van Becelaere, A. Wiland, R.C. Gowan, H. Tecle, S.D. Barrett, A. Bridges, S. Przybranowski, W.R. Leopold, A.R. Saltiel, Blockade of the MAP kinase pathway suppresses growth of colon tumors in vivo, Nat. Med. 1999, 5, 810-816.
41. S.P. Rohrer, E.T. Birzin, R.T. Mosley, S.C. Berk, S.M. Hutchins, D.-M. Shen, Y. Xiong, E.C. Hayes, R.M. Parmar, F. Foor, S.W. Mitra, S.J. Degrado, M. Shu, J.M. Klopp, S.-J. Cai, A. Blake, W.W.S. Chan, A. Pasternak, L. Yang, A.A. Patchett, R.G. Smith, K.T. Chapman, J.M. Schaeffer, Rapid identification of subtype-selective agonists of the somatostatin receptor through combinatorial chemistry, Science 1998, 282, 737-740.
42. S.P. Rohrer, J.M. Schaeffer, Identification and characterization of subtype selective somatostatin receptor agonists, J. Physiol. 2000, 94, 211-215.
43. K.L. Geris, B. De Groef, S.P. Rohrer, S. Geelissen, E.R. Kuhn, V.M. Darras, Identification of somatostatin receptors controlling growth hormone and thyrotropin secretion in the chicken using receptor subtype-specific agonists, J. Endocrinol. 2003, 177, 279-286.
44. M. Pawlikowski, G. Melen-Mucha, Somatostatin analogs - from new molecules to new applications, Curr. Opin. Pharmacol. 2004, 4, 608-613.
45. K. Kohler, A.C. Lellouch, S. Vollmer, O. Stoevesandt, A. Hoff, L. Peters, H. Rogl, B. Malissen, R. Brock, Chemical inhibitors when timing is critical: a pharmacological concept for the maturation of T cell contacts, Chembiochem 2005, 6, 152-161.
46. J.K. Chen, J. Taipale, K.E. Young, T. Maiti, P.A. Beachy, Small molecule modulation of Smoothened activity, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 14071-14076.
47. M.A. Lampson, K. Renduchitala, A. Khodjakov, T.M. Kapoor, Correcting improper chromosome-spindle attachments during cell division, Nat. Cell Biol. 2004, 6, 232-237.
48. W. McDowell, R.T. Schwarz, Dissecting glycoprotein biosynthesis by the use of specific inhibitors, Biochimie 1998, 70, 1535-1549.
49. T. Willson, Chemical genomics of orphan nuclear receptors, in Ernst Schering Research Foundation Workshop 42: Small Molecule-Protein Interactions (Eds.: H. Waldmann, M. Koppitz), Springer, Berlin, 2003, pp. 29-42.
50. S.A. Kliewer, J.M. Lehmann, T.M. Willson, Orphan nuclear receptors: shifting endocrinology into reverse, Science 1999, 284, 757-760.
51. D.J. Parks, S.G. Blanchard, R.K. Bledsoe, G. Chandra, T.G. Consler, S.A. Kliewer, J.B. Stimmel, T.M. Willson, A.M. Zavacki, D.D. Moore, J.M. Lehmann, Bile acids: natural ligands for an orphan nuclear receptor, Science 1999, 284, 1365-1368.
52. A.M. Zavacki, J.M. Lehmann, W. Seol, T.M. Willson, S.A. Kliewer, D.D. Moore, Activation of the orphan receptor RIP14 by retinoids, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 7909-7914.
53. P.R. Maloney, D.J. Parks, C.D. Haffner, A.M. Fivush, G. Chandra, K.D. Plunket, K.L. Creech, L.B. Moore, J.G. Wilson, M.C. Lewis, S.A. Jones, T.M. Willson, Identification of a chemical tool for the orphan nuclear receptor FXR, J. Med. Chem. 2000, 43, 2971-2974.
54. B. Goodwin, S.A. Jones, P.R. Price, M.A. Watson, D.D. McKee, L.B. Moore, C. Galardi, J.G. Wilson, M.C. Lewis, M.E. Roth, P.R. Maloney, T.M. Willson, S.A. Kliewer, A regulatory cascade of the nuclear receptors FXR, SHP-1, and LRH-1 represses bile acid biosynthesis, Mol. Cell 2000, 6, 517-526.
55. S.R. Stauffer, C.J. Coletta, R. Tedesco, G. Nishiguchi, K. Carlson, J. Sun, B.S. Katzenellenbogen, J.A. Katzenellenbogen, Pyrazole ligands: structure-affinity/activity relationships and estrogen receptor-alpha-selective agonists, J. Med. Chem. 2000, 43, 4934-4947.
56. M.J. Meyers, J. Sun, K.E. Carlson, G.A. Marriner, B.S. Katzenellenbogen, J.A. Katzenellenbogen, Estrogen receptor-beta potency-selective ligands: structure-activity relationship studies of diarylpropionitriles and their acetylene and polar analogues, J. Med. Chem. 2001, 44, 4230-4251.
57. H.A. Harris, L.M. Albert, Y. Leathurby, M.S. Malamas, R.E. Mewshaw, C.P. Miller, Y.P. Kharade, J. Marzolf, B.S. Komm, R.C. Winnek, D.E. Frail, R.A. Henderson, Y. Zhu, J.C. Keith Jr, Evaluation of an estrogen receptor-beta agonist in animal models of human disease, Endocrinology 2003, 144, 4241-4249.
58. A. Hillisch, O. Peters, D. Kosemund, G. Muller, A. Walter, B. Schneider, G. Reddersen, W. Elger, K.-H. Fritzemeier, Dissecting physiological roles of estrogen receptor alpha and beta with potent selective ligands from structure-based design, Mol. Endocrinol. 2004, 18, 1599-1609.
59. J. Widder, T. Pelzer, C. Poser-Klein, K. Hu, V. Jazbutyte, K.H. Fritzemeier, C. Hegele-Hartung, L. Neyses, J. Bauersachs, Improvement of endothelial dysfunction by selective estrogen receptor-alpha stimulation in ovariectomized SHR, Hypertension 2003, 42, 991-996.
60. C. Hegele-Hartung, P. Siebel, O. Peters, D. Kosemund, G. Müller, A. Hillisch, A. Walter, J. Kraetzschmar, K.-H. Fritzemeier, Impact of isotype-selective estrogen receptor agonists on ovarian function, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 5129-5134.
61. T. Sakurai, A. Amemiya, M. Ishii, I. Matsuzaki, R.M. Chemelli, H. Tanaka, S.C. Williams, J.A. Richardson, G.P. Kozlowski, S. Wilson, J.R.S. Arch, R.E. Buckingham, A.C. Haynes, S.A. Carr, R.S. Annan, D.E. McNulty, W.S. Liu, J.A. Terrett, N.A. Elshourbagy, D.J. Bergsma, M. Yanagisawa, Orexins and orexin receptors: a family of hypothalamic neuropeptides and G protein-coupled receptors that regulate feeding behaviour, Cell 1998, 92, 573-585.
62. T. Sakurai, Reverse pharmacology of orexin: from an orphan GPCR to integrative physiology, Regul. Pept. 2005, 126, 3-10.
63. S. Katugampola, A. Davenport, Emerging roles for orphan G-protein coupled receptors in the cardiovascular system, Trends Pharmacol. Sci. 2003, 24, 30-35.
64. H.J. Kreinkampf, H.J. Larusson, I. Witte, T. Roeder, N. Birgül, H.-H. Honck, S. Harder, G. Ellinghausen, F. Buck, D. Richter, Functional annotation of two orphan G-protein-coupled receptors, drostar-1 and -2 from Drosophila melanogaster and their ligands by reverse pharmacology, J. Biol. Chem. 2002, 277, 39937-39943.
65. S. Mahrus, C.S. Craik, Selective chemical functional probes of granzymes A and B reveal granzyme B is a major effector of natural killer cell-mediated lysis of target cells, Chem. Biol. 2005, 12, 567-577.
66. J.A. Duncan, A.G. Gilman, A cytoplasmic acyl-protein thioesterase that removes palmitate from G protein alpha subunits and p21(RAS), J. Biol. Chem. 1998, 273, 15830-15837.
67. P. Deck, D. Pendzialek, M. Biel, M. Wagner, B. Popkirova, B. Ludolph, G. Kragol, J. Kuhlmann, A. Giannis, H. Waldmann, Development and biological evaluation of acyl protein thioesterase 1 (APT1) inhibitors, Angew. Chem. 2005, 117, 5055-5060; Angew. Chem. Int. Ed. Engl. 2005, 44, 4975-4980.
68. N.K. Terrett, A.S. Bell, D. Brown, P. Ellis, Sildenafil (Viagra™), a potent and selective inhibitor of type 5 cGMP phosphodiesterase with utility for the treatment of male erectile dysfunction, Bioorg. Med. Chem. Lett. 1996, 6, 1819-1824.
69. A. Nicke, S. Wonnacott, R.J. Lewis, α-Conotoxins as tools for the elucidation of structure and function of neuronal nicotinic acetylcholine receptor subtypes, Eur. J. Biochem. 2004, 271, 2305-2319.
70. R.W. James, α-Conotoxins as selective probes for nicotinic acetylcholine, Curr. Opin. Pharmacol. 2005, 5, 280-292.
71. R.C. Hogg, M. Raggenass, D. Bertrand, Nicotinic acetylcholine receptors: from structure to brain function, Rev. Physiol. Biochem. Pharmacol. 2003, 147, 1-46.
72. B.G. Livett, K.R. Gayler, Z. Khalil, Drugs from the sea: conopeptides as potential therapeutics, Curr. Med. Chem. 2004, 11, 1715-1723.
73. G.P. Miljanich, Ziconotide: neuronal calcium channel blocker for treating severe chronic pain, Curr. Med. Chem. 2004, 11, 3029-3040.
74. A. Hillisch, L.F. Pineda, R. Hilgenfeld, Utility of homology models in the drug discovery process, Drug Discov. Today 2004, 9, 659-669.
75. D.E. Root, S.P. Flaherty, B.P. Kelley, B.R. Stockwell, Biological mechanism profiling using an annotated compound library, Chem. Biol. 2003, 10, 881-892.
76. Z.E. Perlman, M.D. Slack, Y. Feng, T.J. Mitchison, L.F. Wu, S.J. Altschuler, Multidimensional drug profiling by automated microscopy, Science 2004, 306, 1194-1198.
77. Z.E. Perlman, T.J. Mitchison, T.U. Mayer, High-content screening and profiling of drug activity in an automated centrosome-duplication assay, Chembiochem 2005, 6, 145-151.
78. E.C. Butcher, E.L. Berg, E.J. Kunkel, Systems biology in drug discovery, Nat. Biotechnol. 2004, 22, 1253-1259.
79. R. Morphy, C. Kay, Z. Rankovic, From magic bullets to designed multiple ligands, Drug Discov. Today 2004, 9, 641-651.
80. B.L. Roth, D.J. Sheffer, W.K. Kroeze, Magic shotguns versus magic bullets: selectively non-selective drugs for mood disorders and schizophrenia, Nat. Rev. Drug Discov. 2004, 3, 353-359.
81. C. Kung, D.M. Kenski, S.H. Dickerson, R.W. Howson, L.F. Kuyper, H.D. Madhani, K.M. Shokat, Chemical genomic profiling to identify intracellular targets of a multiplex kinase inhibitor, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 3587-3592.
7.2 Chemical Biology and Enzymology: Protein Phosphorylation as a Case Study
Philip A. Cole
Outlook
This chapter discusses two chemical technologies used to evaluate protein kinase structure and function. The site-specific introduction of phosphonate analogs of phosphoamino acids into proteins by protein semisynthesis has allowed for unique insights into the regulation of protein tyrosine phosphatases (PTPs) and melatonin production. Mechanistically designed peptide- and protein-based bisubstrate analogs of protein kinases have been demonstrated to be selective, high-affinity ligands for both tyrosine and serine/threonine kinases. These compounds can be useful structural as well as functional proteomic tools. By complementing well-established methods used in protein kinase analysis, phosphonate incorporation into proteins and bisubstrate analogs show promise in sorting out cell-signaling pathways. More broadly, this chapter attempts to convey the enormous opportunities for modern chemical intervention in the study of enzymes in the postgenomic era.
7.2.1 Overview
The discovery of enzymes as protein-based catalysts for chemical reactions in living organisms represents a milestone in our understanding of life and in our development of cures in post-nineteenth-century medicine. While we now know that not all proteins are enzymes, the study of enzymes in a range of venues is still a central focus of modern biomedical research. Historians of science point out that it has been a combination of the discovery and development of new technologies and their experimental exploitation that has led to new scientific concepts. Over the course of the twentieth century, the application of novel technologies provided fundamental advances in our understanding of enzyme mechanism and function. In the early years of enzymology, extensive use of chemically modified substrates (including isotopic labels), group-modifying reagents to target specific amino acid side chains, and varied reaction conditions (salt, pH, viscosity) led to relatively simple, but surprisingly accurate, models of how enzymes work. Later in the twentieth century, the revolution in structural biochemistry, beginning with the first X-ray structure of an enzyme (lysozyme) bound to a substrate analog in 1965, has been critical to elucidating catalytic mechanisms and substrate selectivity [1]. Other biophysical techniques, especially NMR spectroscopy, mass spectrometry, and fluorescence spectroscopy, have, in parallel, led to key
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
In 1994, the method of native chemical ligation was developed, which allows for the efficient linking of large peptide segments with amide bonds [7]. The native chemical ligation strategy is based on Wieland’s chemoselective reaction between an N-terminal Cys of one peptide and a C-terminal thioester of another. In 1996, this methodology was extended to protein semisynthesis by generating N-terminal cysteines in recombinant protein fragments via proteolysis [8]. An even more practical advance was achieved when recombinant protein fragments containing thioesters were generated by exploiting nature’s inteins [9, 10]. These thioesters can be linked to peptides containing an N-terminal cysteine in a process that has been called expressed protein ligation (EPL) (Fig. 7.2-1). This technology has been particularly useful in the study of enzyme recognition, mechanism, and regulation. EPL is most efficiently applied when the region of the protein under study is near the C-terminus, such that the chemical modification can be introduced within the synthetic peptide carrying the N-terminal cysteine.
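The sequence-level logic of this strategy (a recombinant fragment ending in a thioester, joined to a synthetic peptide that must begin with Cys) can be sketched in a few lines. This is an illustration only: the function names, the example sequence, and the `max_peptide_len` parameter are hypothetical, and the practical length limit for a synthetic peptide is left to the caller rather than asserted here.

```python
def expressed_protein_ligation(thioester_fragment: str, synthetic_peptide: str) -> str:
    """Join two one-letter sequences if the ligation rule is satisfied.

    The recombinant fragment is assumed to carry a C-terminal thioester;
    the synthetic peptide must start with an N-terminal cysteine.
    """
    if not thioester_fragment:
        raise ValueError("recombinant thioester fragment is empty")
    if not synthetic_peptide.startswith("C"):
        raise ValueError("synthetic peptide must begin with an N-terminal Cys")
    return thioester_fragment + synthetic_peptide


def candidate_junctions(protein: str, site: int, max_peptide_len: int) -> list[int]:
    """Return 1-based Cys positions that could serve as ligation junctions.

    A position qualifies if the peptide running from that Cys to the
    C-terminus contains the modification site and is no longer than
    max_peptide_len (a hypothetical, user-chosen synthesis limit).
    """
    junctions = []
    for pos, residue in enumerate(protein, start=1):
        peptide_len = len(protein) - pos + 1
        if residue == "C" and pos <= site and peptide_len <= max_peptide_len:
            junctions.append(pos)
    return junctions


# Hypothetical 16-residue protein with a Tyr at position 13 to be modified.
protein = "MAGSLKDECTRSYPGK"
print(candidate_junctions(protein, site=13, max_peptide_len=10))
print(expressed_protein_ligation(protein[:8], protein[8:]) == protein)
```

The junction search reflects why a study site near the C-terminus is convenient: the closer the site is to the end of the protein, the shorter the synthetic peptide that has to carry it.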
7.2.2 The Enzymology of Posttranslational Modifications of Proteins
Whereas the field of enzymology has primarily concerned small-molecule metabolic pathways over the past 80 years, there is a growing interest in enzyme structure and function as they relate to protein posttranslational modifications. It is now believed that posttranslational modification pathways rank high in the hierarchy governing cell growth and differentiation in health and disease. Modifications under particularly intensive investigation include proteolysis, phosphorylation, acetylation, methylation, ubiquitination, glycosylation, and carboxylation [11]. Current understanding of these processes, in general, is rather primitive. Many of the chemical tags produced by posttranslational modifying enzymes are reversible and tightly regulated by cellular machinery. Reconstructing these enzyme pathways is especially challenging since protein substrates are abundant and varied in the cell, creating an almost infinite number of potential sites of modification. It is in addressing problems in the posttranslational modification arena that the experimental arsenal of biochemists is sorely tested.
Fig. 7.2-1 Method of expressed protein ligation. Thiophenol can be substituted by MESNA (mercaptoethylsulfonate).
7.2.2.1 Protein Kinases and Phosphatases
Among enzyme superfamilies, protein kinases and protein phosphatases (Fig. 7.2-2) have occupied a preeminent position in biomedical research because of both their relatively large size and their involvement in a myriad of cell regulatory and disease processes. It is estimated that the human genome encodes 500 protein kinases, about 80% serine/threonine selective and the remaining 20% tyrosine selective [12]. There are about 100 protein tyrosine phosphatases (PTPs), which include classical as well as dual-specificity enzymes [13]. Understanding the function and regulation of these enzymes is a daunting task because of their large numbers as well as the numerous potential cellular substrates and complex signaling networks in which they participate. Peptide substrates and in vitro kinase assays are often unable to replicate the specificity of in vivo phosphorylation events [14]. Protein kinase inhibitors developed so far lack the specificity necessary to pinpoint kinase function. Genetic knockouts, coimmunoprecipitation studies, two-hybrid screens, site-directed mutagenesis, and other classical molecular biological techniques have been of enormous help in analyzing protein kinases and their functions, but even these can be imprecise tools. Kinase-substrate interactions are often very weak with regard to ground-state binding. Thus, standard protein-protein interaction techniques can lack the sensitivity necessary to identify kinase-substrate relationships. Gene deletions, even conditional and tissue-specific knockouts, are unable to provide the temporal resolution needed to follow the rapid phosphorylation events catalyzed by kinases. While mutagenesis can be effective in analyzing the role of phosphorylation events, the genetically encoded amino acids fall short in mimicking phosphoserine and especially phosphotyrosine function. Since the early 1990s, chemical biologists have designed several powerful approaches to augment our ability to analyze phosphorylation networks and functions [15-18]. We will discuss the development of two of these approaches, their scopes and limitations, and highlight several applications.
Fig. 7.2-2 Reversible protein phosphorylation. A protein kinase converts ROH to ROPO₃²⁻; a protein phosphatase reverses the reaction.
7.2.2.1.1 Phosphonates as Probes of Kinase Function
As described earlier, the ability to site-specifically replace one amino acid with another genetically encoded residue provides an extraordinary means of analyzing protein structure and function. One area where it is often applied is the assessment of the role of side-chain phosphorylation. Typically, two classes of mutants are made: those that prevent modification (nonphosphorylatable) and those that act as constitutive (nonhydrolyzable) phosphorylation mimics. For the former, the phosphorylatable residues Ser and Thr are replaced with Ala, and Tyr with Phe (Fig. 7.2-3). These are reasonably successful in many cases, although they can be misleading because they lack the hydrogen-bonding and polarity characteristics of the authentic residues [19]. More difficult is the substitution of a phosphoamino acid with one of the 20 encoded residues. Phosphoserine/threonine is commonly replaced with Asp or Glu residues (Fig. 7.2-4). However, Asp and Glu are deficient in several respects. First, Asp and Glu are considerably smaller than phosphoserine/threonine. Second, Asp and Glu side chains have only two oxygen atoms available for receiving hydrogen bonds and can only be monoanionic, unlike the typical dianionic form of phosphate. Third, the pKa values of Asp and Glu are considerably higher than that of the phosphate monoanion; indeed, Asp and Glu carboxylates can sometimes be found in the neutral form. Thus, interpreting results with Asp and Glu substitutions can be difficult. For phosphotyrosine, there are no really suitable replacements among the 20 natural amino acids. Recognition of the lack of similarity between the phosphoamino acids and the natural residue mimics has led investigators to design synthetic analogs. Among these, the phosphonates are probably the closest mimics and have been the most popular alternatives [20]. In these analogs, the bridging oxygen between phosphorus and carbon is replaced by a methylene or a difluoromethylene (Fig. 7.2-5).
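The two classes of genetically encoded mutants described above follow a simple substitution rule, which can be sketched as a lookup. This is a minimal illustration, assuming one-letter sequences and 1-based positions; the function names and the example sequence are hypothetical.

```python
# Substitution rules taken from the text: Ser/Thr -> Ala and Tyr -> Phe block
# phosphorylation; phosphoSer/Thr are commonly mimicked by Asp or Glu.
# No encoded residue is listed for phosphoTyr, so Tyr has no mimic entry.
NONPHOSPHORYLATABLE = {"S": "A", "T": "A", "Y": "F"}
PHOSPHOMIMETIC = {"S": ("D", "E"), "T": ("D", "E")}


def substitute(seq: str, position: int, new_residue: str) -> str:
    """Return seq with the residue at a 1-based position replaced."""
    index = position - 1
    return seq[:index] + new_residue + seq[index + 1:]


def design_mutants(seq: str, position: int) -> dict[str, str]:
    """Map mutant names (e.g. 'S4A') to mutated sequences for one site."""
    residue = seq[position - 1]
    if residue not in NONPHOSPHORYLATABLE:
        raise ValueError(f"{residue}{position} is not Ser, Thr, or Tyr")
    blocked = NONPHOSPHORYLATABLE[residue]
    mutants = {f"{residue}{position}{blocked}": substitute(seq, position, blocked)}
    for mimic in PHOSPHOMIMETIC.get(residue, ()):
        mutants[f"{residue}{position}{mimic}"] = substitute(seq, position, mimic)
    return mutants


example = "MKRSLEYG"  # hypothetical sequence with Ser at 4 and Tyr at 7
print(design_mutants(example, 4))
print(design_mutants(example, 7))
```

For the serine site the sketch returns one nonphosphorylatable mutant and two candidate mimics, whereas the tyrosine site yields only the Phe mutant, which mirrors the gap that synthetic phosphonate analogs are meant to fill.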
Fig. 7.2-3 Amino acid residues targeted by eukaryotic protein kinases and their nonphosphorylatable analogs. [Structures shown: Ser, Thr, Tyr, Ala, Phe.]
Fig. 7.2-4 Phosphorylated amino acid residues and genetically encoded mimics. [Structures shown: PhosphoSer, PhosphoThr, Asp, Glu.]
Fig. 7.2-5 Phosphonate mimics of phosphorylated amino acids. [Structures shown: Pma, F2Pma, Pmp, F2Pmp.]
While the bond distances and angles are
slightly different from an ester linkage, they are fairly close approximations. The relative merits of fluoro versus hydrogen substitution in the bridging methylene have also been described [21]. While the CF2 is slightly larger than CH2 and sterically bulkier than a single oxygen atom, CF2, like oxygen, has the potential to be a hydrogen bond acceptor via the fluorine lone pairs. Perhaps more importantly, it confers a more physiologic pKa for the nonbridging phosphate oxygens, encouraging the dianionic form at neutral pH. From a practical perspective, the CF2 group can be exploited as a specific and sensitive probe in NMR studies, although this has not been performed routinely. Early work on the use of phenylalanine phosphonates in synthetic peptides as SH2 domain ligands and phosphotyrosine phosphatase inhibitors proved the efficacy of these agents in medicinal chemistry [20, 22]. Incorporation of phosphonomethylene alanine (Pma) and phosphonomethylene phenylalanine (Pmp) using nonsense-mediated suppression has also been shown to be feasible using in vitro translation [5], but this has not been used for practical applications, perhaps because of scale-up challenges. Pma and Pmp have not yet been used in vivo in nonsense suppression, presumably because of the limited cell permeability of the amino acids. Protein semisynthesis and, in particular, EPL can provide a straightforward route to phosphonate incorporation. Indeed, these techniques have proved valuable for site-specific incorporation of the standard phosphoamino acids, which have been effectively used in structural and enzymatic analyses [9, 23]. EPL is most efficiently used when the phosphate modification is within 50 amino acids of the C-terminus of the desired protein or protein fragment. The next simplest case for protein semisynthesis occurs when the modification of interest is near the N-terminus and is installed in a peptide carrying a C-terminal thioester. Because preparing complex peptides that carry thioesters is somewhat more challenging, this strategy can be a bit more cumbersome than EPL. However, phosphonates have now been incorporated using both strategies, and in the following text we describe applications of these approaches in investigations of PTPs and serotonin N-acetyltransferase.
7.2.2.1.2 Protein Tyrosine Phosphatases as Substrates of Kinases
The PTPase family consists of about 100 members, including both classical and dual-specificity (Ser/Tyr) enzymes that hydrolyze phosphoproteins and, sometimes, phospholipids [13]. Like protein kinases, they are usually multidomain enzymes and are subject to a range of regulatory events. Somewhat paradoxically, many PTPases are themselves substrates for protein tyrosine kinases [24]. However, working out the function of these phosphorylation events has been a challenging task. As one might expect, these phosphorylated PTPase forms are quite unstable and readily undergo presumed autodephosphorylation, complicating biochemical analysis. Some investigators have attempted to use thiophosphorylation catalyzed by protein kinases, but achieving high stoichiometry and site specificity is very difficult; moreover, thiophosphates are still susceptible to enzymatic hydrolysis, albeit more slowly [25]. Here, phosphonate analog incorporation is an attractive solution.
7.2.2.1.3 SHP-1 and SHP-2
Fig. 7.2-6 Domain architecture of protein tyrosine phosphatases SHP-1 and SHP-2. The highlighted tyrosine residues are modified by protein tyrosine kinases. [Domains shown from N- to C-terminus: N-SH2, C-SH2, PTPase.]
Examples of tyrosine phosphatases that are subject to tyrosine phosphorylation include SHP-1 and SHP-2 [26]. These SH2 domain-containing tyrosine phosphatases share the domain architecture shown in Fig. 7.2-6: two tandem N-terminal SH2 domains followed by a catalytic domain and a C-terminal tail that is tyrosine phosphorylated. They are quite homologous overall in terms of amino acid sequence but show significant functional differences. SHP-2 is ubiquitously expressed and implicated as a positive effector of growth factor receptor tyrosine kinase signaling through MAP kinases [26]. Noonan syndrome, a genetic disease involving multiple developmental abnormalities, is often caused by mutations in SHP-2 [26]. SHP-1 is most prominently expressed in cells of hematopoietic lineage [26]. In contrast to SHP-2, SHP-1 is generally regarded as a negative regulator of MAP kinase signaling [26]. Mutations of SHP-1 in mice lead to pulmonary fibrosis through unclear mechanisms [26]. SHP-1 and SHP-2 show similar three-dimensional structures, which are noteworthy for a large surface of interaction between the N-terminal SH2 domain and the catalytic domain [26]. Enzymatic studies show that this interaction, which can be disrupted by point mutations or SH2 engagement by
trans-phosphotyrosine peptide ligands, is quite repressive for catalytic activity [26]. Removal of the SH2 domains activates the phosphatase activity of SHP-1 and SHP-2 by 10-fold or more, and the binding of the SH2 domains by phosphotyrosine ligands can be comparably stimulating [26].
7.2.2.1.4 Phosphonates as Probes of SHP-1 and SHP-2 Regulation
Several groups have shown that SHP-2 and SHP-1 are C-terminally phosphorylated on two tyrosine residues, but the function of these phosphorylation events is controversial. One model is that these phosphorylation events may recruit SH2 domain-containing adaptor proteins such as Grb2. Another model is that they may modulate the activity of the enzymes. To address these problems, EPL was employed to incorporate the phosphonate analogs Pmp or F2Pmp at the sites of modification. Semisynthetic proteins containing one or two phosphonates at the physiologic sites were prepared [24, 27, 28]. In the case of SHP-2, each of the phosphonate replacements was responsible for a two- to threefold stimulation of phosphatase activity [24]. It should be noted that F2Pmp was associated with about 1.5-fold greater activation than the corresponding Pmp substitution [27]. Moreover, the two Pmps, when present together, showed nearly additive effects, suggesting concerted mechanistic models [27]. Partial proteolysis studies along with site-directed mutagenesis experiments revealed that Y-542 was likely interacting with the N-terminal SH2 domain and Y-580 with the C-terminal SH2 domain [24, 27], each in an intramolecular fashion (Fig. 7.2-7). Not surprisingly, the corresponding phosphotyrosine groups were "protected" from intermolecular phosphatase activity by these SH2 interactions [27].
Fig. 7.2-7 Model for structural regulation of SHP-2 by tyrosine phosphorylation. [States shown: unphosphorylated SHP-2, 542-phosphorylated, and 580-phosphorylated, interconverted by a protein tyrosine kinase.]
While the activation by Pmp-542 resulting from intramolecular engagement of the N-SH2 domain could be readily rationalized from the X-ray structure, the effects of the C-SH2 interaction with Pmp-580 were less easily understood and were presumably related to an indirect effect on conformation. To evaluate the relevance of these findings to in vivo signaling, cellular microinjection studies were undertaken [24]. It should be pointed out that a practical shortcoming of in vitro semisynthesis of an engineered protein is the need to rely on relatively cumbersome techniques, such as microinjection, to
study its intracellular effects and behavior. Nevertheless, the microinjection method for the introduction of semisynthetic, modified SHP-2 proteins proved feasible and permitted an analysis of the effects of Pmp-542 modification on protein stability and MAP kinase activation [24]. The effects on MAP kinase activation were monitored indirectly via a serum response element reporter. Immunocytochemical analysis revealed that the Pmp-542-containing SHP-2 showed a significant relative activation of MAP kinase compared with Tyr542-containing SHP-2, whereas both proteins showed similar stabilities in the cell. This provided compelling data that the tyrosine phosphorylation of SHP-2 could specifically stimulate signaling in an important cellular pathway, and this finding has subsequently been confirmed and extended in other studies [29]. In experiments on SHP-1, related but nonidentical effects of tail phosphonates were observed [28]. While up to an eightfold enhancement of catalytic activity by F2Pmp substitution at Tyr536 was detected, only a 1.6-fold stimulation of phosphatase action by substitution at Tyr564 was found [28]. Mutagenesis revealed that these effects were mediated by intramolecular interactions with the N-SH2 and C-SH2 domains, respectively, analogous to the behavior of SHP-2 [28]. Interestingly, unlike SHP-2, these phosphonylated residues were quite accessible to Grb2 interaction, indicating that the intramolecular interactions were less energetically favorable than in the SHP-2 case [24, 28]. These studies reveal the value of studying the detailed molecular energetics of posttranslational effects on individual protein homologs.
7.2.2.2 Regulation of Serotonin N-acetyltransferase by Phosphorylation
Serotonin N-acetyltransferase (arylalkylamine N-acetyltransferase, AANAT) catalyzes the penultimate and regulated step in the pineal gland biosynthesis of melatonin, the critical circadian rhythm hormone (Fig. 7.2-8) [30]. It has been known for over 30 years that the rhythm of melatonin production is driven by a rise and fall of AANAT, which is highest at night and falls during the day [30]. Moreover, when mammals, including humans, are exposed to light in the middle of the night, a rapid decrease in AANAT follows [30]. Over the last few years, phosphorylation of AANAT has been proposed to contribute to this regulatory process. In the current model, AANAT can be phosphorylated on Thr32 and Ser205 by protein kinase A (PKA), which is, in turn, under the regulation of the adrenergic G-protein-coupled receptor [31]. Upon phosphorylation, recruitment of 14-3-3 is believed to occur, which might somehow shield AANAT from proteolytic degradation (Fig. 7.2-9).
Fig. 7.2-8 Biosynthetic pathway to melatonin from tryptophan. [Compounds labeled: L-tryptophan, serotonin (5-hydroxytryptamine), N-acetylserotonin, melatonin. Enzymes labeled: tryptophan hydroxylase, aromatic amino acid decarboxylase, serotonin N-acetyltransferase, hydroxyindole O-methyltransferase.]
Fig. 7.2-9 Proposed model for the regulation of serotonin N-acetyltransferase (AANAT) by phosphorylation. [Labels: "Destruction", "Protection", dimer.]
7.2.2.2.1 Phosphonates as Probes of Serotonin N-acetyltransferase Regulation
A prediction of the kinase regulatory model for melatonin rhythm is that AANAT incorporating phosphate mimics at the PKA phosphorylation sites should show resistance to proteolysis and increased cellular stability [32, 33]. The usual Ser/Thr to Glu mutations were considered unlikely to be a promising strategy on the basis of the structural features of the 14-3-3-phosphoprotein interaction [32]. The structure of the phosphoAANAT-14-3-3 complex reveals that each of the three nonbridging phosphate oxygens is involved in hydrogen-bonding interactions with 14-3-3 residues [47]. Thus, phosphonate-containing AANATs were prepared by native chemical ligation (Thr32 replacement) and EPL (Ser205) [32, 33]. These studies used Pma (Thr32) and F2Pma (Ser205). The corresponding Glu32 AANAT was generated for use in 14-3-3 binding analysis [32]. As expected, the
Pma-32 and PhosThr32 AANAT proteins showed strong (and similar) affinity for 14-3-3, whereas the Ala and Glu AANAT proteins showed minimal binding to 14-3-3 under these conditions [32]. Likewise, F2Pma-205 and PhosSer205 AANAT showed 14-3-3 binding affinities similar to each other but enhanced relative to Ser205 AANAT. The stabilities of semisynthetic AANATs were explored in Chinese hamster ovary (CHO) cells using microinjection methods [32, 33]. This cell type, while not identical to the natural pinealocytes, has been shown to recapitulate many of the features of AANAT regulation and has thus been used as a model system [34]. Immunocytochemistry showed that nonphosphorylated AANAT injected into CHO cells is readily observed minutes after microinjection but has mostly disappeared by 1 h [32]. Stabilities were low and similar for PhosThr32- and Glu32-containing AANATs. Strikingly, Pma-32 AANAT is greatly stabilized compared to each of these other proteins, indicating a direct role for this phosphorylation event in stimulating melatonin production [32]. It is noteworthy that PhosThr32 AANAT showed diminished stability compared to Pma-32 AANAT, which suggests that phosphatases play a critical role in rapidly reversing the effects of cellular phosphorylation. The importance of 14-3-3 in AANAT regulation was revealed by demonstrating that PhosThr32 AANAT, but not Glu32 AANAT, was significantly stabilized by concomitant microinjection of the 14-3-3 adaptor protein [32]. Related findings were obtained for the Ser205-modified protein by comparing the stabilities of F2Pma-205 and Ser205 AANAT [33]. Thus, phosphonate analogs have been effectively utilized to clarify the basis of AANAT and melatonin regulation.
7.2.2.3 Bisubstrate Analogs as Protein Kinase Inhibitors
For the past 20 years, investigators have recognized the need for selective protein kinase inhibitors as research tools [35]. Such tools can be used to examine the function of a particular kinase in cell lysates, cell culture, or in vivo. They can be used to aid in structural studies and other biophysical analyses. Numerous natural products and synthetic scaffolds have been employed for this purpose [35]. Most efforts that have led to potent protein kinase inhibitors have exploited the ATP-binding site [35]. The advantage of this site is that it is relatively hydrophobic, deep, and contains hydrogen bond donors/acceptors, which allow for enhanced affinity. Molecules that target the ATP site are often cell permeable and can show favorable pharmacokinetic properties. However, the ATP-binding site is relatively conserved among protein kinases, making specificity difficult to achieve. Because protein kinases, by definition, must always bind a protein substrate prior to phosphorylation, compounds that disrupt this interaction would also be useful kinase inhibitors. The advantage of protein substrate sites is that they often display relatively specific interactions with their individual targets, necessary for achieving their precise biological functions [36]. However, the kinase interactions with protein targets are often of modest affinity, reflecting the shallow interaction surfaces involved. Aside from a few notable exceptions, often inspired by naturally occurring protein kinase inhibitor peptide sequences [37], protein substrate site inhibitors have not yet proved to be highly efficacious. An approach to inhibitors that has the potential to improve both potency and specificity involves the covalent linking of nucleotide and peptide site ligands. Often termed bisubstrate analogs, these compounds can, in principle, achieve binding energies that are equal to or greater than the sum of the binding energies of the individual ligands [38]. In the case of protein kinases, much of the potency can be expected to be derived from the nucleotide-binding site, whereas the specificity should relate to the more divergent protein substrate-binding site.
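The statement that a bisubstrate analog can match the sum of the binding energies of its parts has a simple quantitative consequence: if free energies add, dissociation constants multiply. The sketch below illustrates this under stated assumptions: a 1 M standard state, a temperature of 298.15 K, and hypothetical Kd values for the two fragments. The function names are illustrative, and linker strain or entropic effects, which can push the real value in either direction, are ignored.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol, 1 M standard state) from a Kd in mol/L."""
    return R_KCAL * temp_k * math.log(kd_molar)


def ideal_bisubstrate_kd(kd_a: float, kd_b: float) -> float:
    """Kd (mol/L) expected if the linked ligand's binding energy is exactly the sum of the parts."""
    # dG_AB = dG_A + dG_B  implies  Kd_AB = Kd_A * Kd_B / (1 M)
    return kd_a * kd_b


if __name__ == "__main__":
    # ASSUMPTION: hypothetical fragment affinities, chosen only for illustration.
    kd_nucleotide = 1e-5  # 10 uM nucleotide-site ligand
    kd_peptide = 1e-4     # 100 uM peptide-site ligand
    kd_linked = ideal_bisubstrate_kd(kd_nucleotide, kd_peptide)
    print(f"dG nucleotide: {binding_free_energy(kd_nucleotide):.2f} kcal/mol")
    print(f"dG peptide:    {binding_free_energy(kd_peptide):.2f} kcal/mol")
    print(f"ideal linked Kd: {kd_linked:.1e} M")
```

The idealized product is an upper bound on what simple additivity can deliver and depends on the choice of standard state, so it is best read as a reference point for judging how much of the available binding energy a given linker design actually captures.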
A critical element in the design of such protein kinase-bisubstrate analog inhibitors relates to the choice of the linker. To underscore this point, an early effort to prepare a potent protein kinase A bisubstrate inhibitor resulted in a relatively weak compound [39]. In this design, the consensus peptide substrate kemptide was directly linked via its Ser oxygen to the γ-phosphate of ATP, generating 1 (Fig. 7.2-10). Bisubstrate analog 1 showed an approximate Ki of 125 µM and was slightly weaker in affinity than ATP itself [39].
7.2.2.3.1 Bisubstrate Tyrosine Kinase Inhibitors Designed for Dissociative Mechanisms
Fig. 7.2-10 Bisubstrate analogs for protein kinases. [Structures 1-3. For 1: R1 = NH2-Leu-Arg-Arg-Ala-, R2 = -Leu-Gly-CO2H. For 2: R1 = AcNH-Lys-Lys-Lys-Leu-Pro-Ala-Thr-Gly-Asp-, R2 = -Met-Asn-Met-Ser-Pro-Val-Gly-Asp-CO2H.]
Finding effective linkers for bisubstrate analogs could, in principle, be based on combinatorial chemistry or rational design principles. Since compounds synthesized to mimic the transition state are often potent enzyme inhibitors, a consideration of enzyme mechanism might be helpful in linker design. In this regard, a preponderance of evidence including enzyme model reactions, linear free-energy relationships, pH-rate profiles, and X-ray crystal
structures suggests that protein kinases catalyze phosphoryl transfer via a dissociative transition state [18]. In such a transition state, the entering group (Ser/Thr/Tyr) forms little or no bond with the attacked phosphorus before near-complete severing of the bond between the phosphorus and the leaving group (ADP). This mechanism relies on the high reactivity of the electrophilic metaphosphate-like species. Mildvan has suggested that the reaction coordinate distance between the ATP and Ser or Tyr might be 5 Å prior to the development of a dissociative transition state [40]. A bisubstrate analog 2 for the insulin receptor kinase (IRK) was developed with this framework in mind, in which an acetyl spacer was inserted between ATPγS and an IRK peptide substrate [41]. Because pH-rate studies had suggested that proton removal from the substrate Tyr occurs late [18], a Tyr isostere was chosen in which the Tyr oxygen was replaced with a nitrogen atom. This anilino nitrogen could comprise part of the linker but still serve as a hydrogen bond donor to the highly conserved catalytic-loop Asp carboxylate. The extended distance from the anilino nitrogen to the γ-phosphorus was estimated to be 5.7 Å, slightly longer than the 5 Å reaction coordinate distance predicted for a dissociative transition state. The synthesis of this compound was efficiently achieved by exploiting a chemoselective ligation between ATPγS and the readily prepared bromoacetanilido peptide [41]. While these peptide-ATP conjugates are acid labile, they are quite stable under physiologic buffer conditions. In accordance with the design, compound 2 was shown to be a potent IRK inhibitor with a Ki of 370 nM, competitive versus both ATP and peptide substrate [41]. This potency was nearly equivalent to that expected for summing the binding energies of the individual ligands ATPγS and the insulin receptor peptide substrate. Deletion of the peptide moiety (as in compound 3, Fig. 7.2-10) led to a much weaker inhibitor, comparable to the potency of ATPγS itself
[41]. An X-ray crystal structure of the IRK-bisubstrate analog complex (Fig. 7.2-11) indicated that several of the design principles were validated [41]. Thus, the nucleotide- and peptide-binding sites on the IRK were dually occupied by the inhibitor, the distance between the anilino nitrogen and the γ-phosphate was about 5 Å, and a hydrogen bond between the anilino nitrogen and the catalytic Asp was maintained. Surprisingly, the acetyl linker carbonyl was found to be a ligand for the active site Mg, replacing a water molecule observed in the ternary complex structure. The structural basis for potent inhibition has also been probed by preparing and testing a series of closely related analogs of 2 as IRK inhibitors (Fig. 7.2-12) [42]. Among these, replacement of the anilino nitrogen with a more native oxygen atom (compound 4) introduced an 80-fold penalty in binding affinity [42]. This gave further credence to the relative importance of the hydrogen bond between the anilino nitrogen and Asp. Also deleterious to potency were alterations in the spacer length by methylene insertion (compound 5) or phosphate removal (compound 6), which cost 18-fold and more than 200-fold penalties, respectively [42]. These observations underscore the value of targeting the precise reaction coordinate distance with the designed inhibitor. One unanticipated dividend of the structure of the complex between the IRK and 2 was the more detailed information on the molecular recognition underlying the peptide moiety-kinase interaction [42]. Many more contacts between the enzyme and peptide moiety were seen in this structure than in the ternary complex, where the peptide was largely disordered [43]. In hindsight, this can be understood as reflecting the greater stability of the bisubstrate complex. As expected, substitution or deletion of key amino acids observed in the structure led to reduced affinity, in the range of 5- to 10-fold per modification [42]. These results indicate that bisubstrate analogs combined with X-ray crystallographic analysis have the potential to enhance the understanding of peptide recognition by kinases.
Fig. 7.2-11 Cocrystal structure of bisubstrate analog 2 bound to the insulin receptor kinase (IRK) domain [41]. IRK is shown in molecular surface representation with atoms of the N-terminal lobe colored blue and atoms of the C-terminal lobe colored gray. The molecular surface is semitransparent and shows the ATP moiety of compound 2. Compound 2 is shown in a ball-and-stick representation with nitrogen atoms colored blue, oxygen atoms colored red, sulfur atoms colored green, and phosphorus atoms colored black. Carbon atoms of the peptide moiety are colored yellow, and carbon atoms of the ATP moiety and linker are colored orange.
Fig. 7.2-12 Bisubstrate analog inhibitors of the insulin receptor kinase with varying linkers. [R1 = AcNH-Lys-Lys-Lys-Leu-Pro-Ala-Thr-Gly-Asp-, R2 = -Met-Asn-Met-Ser-Pro-Val-Gly-Asp-CO2H.]
7.2.2.3.2 Bisubstrate Analog Designed for a Serine/Threonine Kinase
The favorable results in the case of the insulin receptor tyrosine kinase prompted the application of the bisubstrate analog approach to a serine/threonine kinase [44]. Protein kinase A was selected because it had been previously targeted with the directly linked ATP-kemptide conjugate 1 as described above [39]. In this case, aminoalanine was used as a surrogate for serine, and the bromoacetamide was readily coupled with ATPγS, affording compound 7 (Fig. 7.2-13) [44]. The ATPγS-acetyl-kemptide conjugate 7 proved to be a moderately potent inhibitor of protein kinase A with a Ki of 3 µM [44]. Interestingly, this compound was a competitive inhibitor against ATP but noncompetitive against peptide [44]. This pattern of inhibition can be attributed to the previously established preferred order of binding, in which ATP binds prior to peptide [44]. Bisubstrate analog 7 was about 40-fold more potent than the original ATP-kemptide conjugate 1, consistent with the importance of spacer length. Bisubstrate analog 7 showed very weak ability to block protein kinase C, which is noteworthy because of the overlapping peptide substrate specificity of these two enzymes [44]. While its structural basis is not yet understood, this selectivity highlights the potential of using the bivalent approach to individually target closely related protein kinases.
7.2.2.3.3 Protein-ATP Conjugates as Kinase Ligands Prepared by Expressed Protein Ligation
Fig. 7.2-13 Synthetic scheme for the generation of a protein kinase A selective bisubstrate analog inhibitor 7 based on a dissociative transition state. [Reagents shown: (Ph3P)4Pd(0); Et2NCS2H, Et3N; bromoacetic acid, DIC; TFA, H2O, CH2Cl2, thioanisole. R1 = AcNH-Leu-Arg-Arg-Ala-, R2 = -Leu-Gly-CO2H, R3 = AcNH-Leu-Arg(Pmc)-Arg(Pmc)-Ala-, R4 = -Leu-Gly-CO2-Wang resin.]
Fig. 7.2-14 A Src-ATPγS conjugate as a high-affinity Csk ligand produced by expressed protein ligation.
Many protein kinases are rather inefficient at catalyzing the phosphorylation of short synthetic peptides but are highly effective at attaching a phosphate to full-length protein substrates. In general, the molecular basis for this specificity is not understood. A classical example of this behavior is the phosphorylation of the tail tyrosine residue of Src by the protein tyrosine kinase Csk [45]. This phosphorylation event is known to be important because it downregulates the Src kinase activity by inducing a complex conformational change in the Src protein [45]. It has been demonstrated that C-terminal tyrosine-containing peptides derived from Src are very poor Csk substrates in vitro [45]. Nevertheless, recombinant Src protein that includes at least the
Src catalytic domain and C-terminal tail is an excellent in vitro substrate, about 1000-fold better than peptides [45]. It is noteworthy that the ground-state interaction between Csk and Src is quite weak (Kd > 50 µM) even though the apparent Src Km is in the 2-4 µM range [45]. A high-resolution cocrystal structure of the Csk-Src complex that might provide insights into the molecular recognition has not yet been obtained. In order to generate a high-affinity Src-related ligand for Csk that might aid structural studies, a bivalent Src conjugate was prepared in which an ATPγS linkage was introduced into the Src tail [46]. Because the target molecule contains a protein of greater than 300 amino acids, total chemical synthesis was an unrealistic option. However, using EPL, the ATPγS-acetanilide function was readily introduced into the Src tail (Fig. 7.2-14) [46]. As expected, this produced a potent (sub-micromolar) ligand for Csk [46]. Specificity of this Src-ATP conjugate for Csk was shown using a pull-down experiment from cell extracts [46]. These studies also point to the use of both peptide- and protein-ATP conjugates in proteomic analysis.
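The fold differences in inhibitor potency quoted in the bisubstrate analog sections can be put on a common energy scale with the relation ΔΔG = RT ln(fold change). The following is a minimal sketch using the fold values and Ki values reported in the text; the temperature of 298.15 K and the function name are assumptions made for illustration, and the "more than 200-fold" value is treated as a lower bound.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold(fold_change: float, temp_k: float = 298.15) -> float:
    """Free energy difference (kcal/mol) corresponding to a fold change in Ki or Kd."""
    return R_KCAL * temp_k * math.log(fold_change)


# Fold penalties and Ki values as reported in the text; labels are descriptive only.
REPORTED_FOLD_CHANGES = [
    ("compound 4 vs 2 (anilino N replaced by O)", 80.0),
    ("compound 5 vs 2 (methylene insertion)", 18.0),
    ("compound 6 vs 2 (phosphate removal, lower bound)", 200.0),
    ("compound 1 vs 7 (Ki 125 uM vs 3 uM)", 125.0 / 3.0),
]

if __name__ == "__main__":
    for label, fold in REPORTED_FOLD_CHANGES:
        print(f"{label}: {fold:.0f}-fold, ddG about {ddg_from_fold(fold):.1f} kcal/mol")
```

On this scale the 80-fold penalty for replacing the anilino nitrogen corresponds to about 2.6 kcal/mol at 25 °C, a magnitude in the range commonly associated with the loss of a hydrogen bond, although such an assignment is an interpretation rather than a measurement.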
References
1. L.N. Johnson, D.C. Phillips, Nature 1965, 206, 761-763.
2. C.T. Walsh, Enzymatic Reaction Mechanisms, W.H. Freeman, 1978, New York, NY.
3. G. Winter, A.R. Fersht, A.J. Wilkinson, M. Zoller, M. Smith, Nature 1982, 299, 756-758.
4. T.W. Muir, S.B. Kent, Curr. Opin. Biotechnol. 1993, 4, 420-427.
5. L. Wang, P.G. Schultz, Angew. Chem., Int. Ed. Engl. 2004, 44, 34-66.
6. C.J. Wallace, Curr. Opin. Biotechnol. 1995, 6, 403-410.
7. P.E. Dawson, T.W. Muir, I. Clark-Lewis, S.B. Kent, Science 1994, 266, 776-779.
8. D.A. Erlanson, M. Chytil, G.L. Verdine, Chem. Biol. 1996, 3, 981-991.
9. T.W. Muir, D. Sondhi, P.A. Cole, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 6705-6710.
10. T.C. Evans Jr, J. Benner, M.Q. Xu, Protein Sci. 1998, 7, 2256-2264.
11. C.T. Walsh, Posttranslational Modification of Proteins: Expanding Nature's Inventory, Roberts & Co, 2005, Greenwood Village, CO.
12. G. Manning, D.B. Whyte, R. Martinez, T. Hunter, S. Sudarsanam, Science 2002, 298, 1912-1934.
13. A. Alonso, J. Sasin, N. Bottini, I. Friedberg, A. Osterman, A. Godzik, T. Hunter, J. Dixon, T. Mustelin, Cell 2004, 117, 699-711.
14. K.M. Shokat, Chem. Biol. 1995, 2, 509-514.
15. M.A. Shogren-Knaak, P.J. Alaimo, K.M. Shokat, Annu. Rev. Cell Dev. Biol. 2001, 17, 405-433.
16. S.A. Johnson, T. Hunter, Nat. Methods 2005, 2, 17-25.
17. D.M. Williams, P.A. Cole, Trends Biochem. Sci. 2001, 26, 271-273.
18. P.A. Cole, A.D. Courtney, K. Shen, Z. Zhang, Y. Qiao, W. Lu, D.M. Williams, Acc. Chem. Res. 2003, 36, 444-452.
19. D. Wang, P.A. Cole, J. Am. Chem. Soc. 2001, 123, 8883-8887.
20. S.M. Domchek, K.R. Auger, S. Chatterjee, T.R. Burke Jr, S.E. Shoelson, Biochemistry 1992, 31, 9865-9870.
21. L. Chen, L. Wu, A. Otaka, M.S. Smyth, P.P. Roller, T.R. Burke Jr, J. den Hertog, Z.Y. Zhang, Biochem. Biophys. Res. Commun. 1995, 216, 976-984.
22. T.R. Burke Jr, Z.J. Yao, D.G. Liu, J. Voigt, Y. Gao, Biopolymers 2001, 60, 32-44.
23. J.W. Wu, M. Hu, J. Chai, J. Seoane, M. Huse, C. Li, D.J. Rigotti, S. Kyin, T.W. Muir, R. Fairman, J. Massague, Y. Shi, Mol. Cell. 2001, 8, 1277-1289.
24. W. Lu, D. Gong, D. Bar-Sagi, P.A. Cole, Mol. Cell. 2001, 8, 759-769.
25. H. Cho, R. Krishnaraj, M. Itoh, E. Kitas, W. Bannwarth, H. Saito, C.T. Walsh, Protein Sci. 1993, 2, 977-984.
26. B.G. Neel, H. Gu, L. Pao, Trends Biochem. Sci. 2003, 28, 284-293.
27. W. Lu, K. Shen, P.A. Cole, Biochemistry 2003, 42, 5461-5468.
28. Z. Zhang, K. Shen, W. Lu, P.A. Cole, J. Biol. Chem. 2003, 278, 4668-4674.
29. T. Araki, H. Nawa, B.G. Neel, J. Biol. Chem. 2003, 278, 41677-41684.
30. S. Ganguly, S.L. Coon, D.C. Klein, Cell Tissue Res. 2002, 309, 127-137.
31. S. Ganguly, J.L. Weller, A. Ho, P. Chemineau, B. Malpaux, D.C. Klein, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 1222-1227.
32. W. Zheng, Z. Zhang, S. Ganguly, J.L. Weller, D.C. Klein, P.A. Cole, Nat. Struct. Biol. 2003, 10, 1054-1057.
33. W. Zheng, D. Schwarzer, A. LeBeau, J.L. Weller, D.C. Klein, P.A. Cole, J. Biol. Chem. 2005, 280, 10462-10467.
34. G. Ferry, J. Mozo, C. Ubeaud, S. Berger, M. Bertrand, A. Try, P. Beauverger, C. Mesangeau, P. Delagrange, J.A. Boutin, Cell. Mol. Life Sci. 2002, 59, 1395-1405.
35. P. Cohen, Nat. Rev. Drug Discov. 2002, 1, 309-315.
36. D.S. Lawrence, J. Niu, Pharmacol. Ther. 1998, 77, 81-114.
37. J.H. Lee, S.K. Nandy, D.S. Lawrence, J. Am. Chem. Soc. 2004, 126, 3394-3395.
38. K. Parang, P.A. Cole, Pharmacol. Ther. 2002, 93, 145-157.
39. D. Medzihradszky, S.L. Chen, G.L. Kenyon, B.W. Gibson, J. Am. Chem. Soc. 1994, 116, 9413-9419.
40. A.S. Mildvan, Proteins 1997, 29, 401-416.
41. K. Parang, J.H. Till, A.J. Ablooglu, R.A. Kohanski, S.R. Hubbard, P.A. Cole, Nat. Struct. Biol. 2001, 8, 37-41.
42. A.C. Hines, K. Parang, R.A. Kohanski, S.R. Hubbard, P.A. Cole, Bioorg. Chem. 2005, 33, 285-297.
43. S.R. Hubbard, EMBO J. 1997, 16, 5572-5581.
44. A.C. Hines, P.A. Cole, Bioorg. Med. Chem. Lett. 2004, 14, 2951-2954.
45. P.A. Cole, K. Shen, Y. Qiao, D. Wang, Curr. Opin. Chem. Biol. 2003, 7, 580-585.
46. K. Shen, P.A. Cole, J. Am. Chem. Soc. 2003, 125, 16172-16173.
47. T. Obsil, R. Ghirlando, D.C. Klein, S. Ganguly, F. Dyda, Cell 2001, 105, 257-267.
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN 978-3-527-31150-7
7.3 Chemical Strategies for Activity-based Proteomics
Nadim Jessani and Benjamin F. Cravatt
Outlook
The assignment of molecular and cellular functions to the numerous protein products encoded by prokaryotic and eukaryotic genomes presents a major challenge to the field of proteomics. To address this need for higher order functional proteomic strategies, a chemical proteomic method referred to as activity-based protein profiling (ABPP) was introduced, in which active site-directed small-molecule probes are employed to measure protein activity rather than abundance. By covalently labeling the active sites of enzyme superfamilies, ABPP provides a direct readout of global changes occurring in the functional state of enzyme families present in samples of high biological complexity. The goal of this chapter is to detail the need for such activity-based methods and to describe the development and application of ABPP by highlighting several studies that have established the utility of this chemical proteomic method as a powerful strategy for the discovery and functional analysis of complex biological proteomes, as well as their individual constituents.
7.3.1 Introduction
The molecular information provided by the availability of complete genome sequences for numerous prokaryotic and eukaryotic organisms has granted biomedical researchers an unprecedented opportunity to understand better the molecular basis of life in its many forms. To accelerate this process, global experimental approaches, such as genomics [1] and proteomics [2], have been introduced to characterize genes and proteins collectively, rather than individually. Most genomic and proteomic methods, however, rely on measurements of mRNA and protein abundance as indirect estimates of protein function, a potentially risky assumption considering that most proteins are regulated by posttranslational events in vivo [3]. Considering that proteins mediate nearly all biochemical events underlying cell and organismal physiology and pathophysiology, the need to develop general methods to measure levels and activities of these biomolecules directly in cell and tissue proteomes is apparent. Indeed, the ability to profile classes of proteins based on their activity would greatly accelerate assignment of protein function and identification of new biomarkers and therapeutic targets for the diagnosis and treatment of human disease. To address this need for higher order functional proteomics methods, a chemical proteomic strategy referred to as activity-based protein profiling (ABPP) [4, 5] emerged, which utilizes active site-directed chemical probes that measure protein activity rather than abundance to profile the functional state of enzyme families directly in complex proteomes. By providing a covalent link between labeled proteins and a chemical tag, ABPP permits the consolidated detection, isolation, and identification of active enzymes directly from samples of high biological complexity.
7.3.2 History/Development
7.3.2.1 Global Approaches for Biological Research in the Postgenome Era
A fundamental goal of biological research is to understand the complex roles that enzymes play in physiological and pathological processes and to use this knowledge to decipher the molecular correlates of health and disease. Until recently, this process of discovery principally entailed an iterative cycle of identifying, isolating, and functionally characterizing proteins and genes associated with a particular molecular or cellular event. However, with the dawn of complete genome sequence availability for numerous prokaryotic and eukaryotic organisms, the scientific community experienced a paradigm shift that transformed the most basic methods of experimentation. From this, several global experimental approaches evolved to meet the emerging challenge and opportunity of characterizing genes and/or proteins collectively, rather than individually. These approaches included genomics [1], the analysis of a cell’s complete transcript repertoire (transcriptome), and proteomics [2], the analysis of a cell’s complete protein repertoire (proteome). Indeed, genomics, or “functional” genomics, evolved rapidly as a field, with gene microarray studies nearing the goal of quantitatively comparing in a single experiment the complete transcriptomes of two test samples. Such studies have provided valuable insights into the global gene expression patterns of many pathologies, such as cancer [6] and inflammation [7]. However, inherent to most genomics approaches is their reliance on mRNA transcript levels as an indirect measure of protein quantity and function.
To grant biochemical and cell biological meaning to genomic data, one must accept that dynamics in mRNA expression correlate with similar changes in protein levels and activity, a potentially problematic assumption given the numerous posttranscriptional and posttranslational events known to regulate protein expression and function [3]. Furthermore, although transcript profiling has become a standard tool in biomedical research, the global characterization of biological samples at the level of the proteome will likely be critical for the identification of new diagnostic markers and drug targets. While proteomics as a field has rapidly evolved to meet these challenges, standard approaches are often restricted to detecting changes in protein abundance and therefore do not take into account the numerous posttranslational events that regulate protein activity. Thus, the need for proteomic methods that measure activity rather than abundance to complement conventional genomic and proteomic strategies has become apparent.
7.3.2.2 Chemical Strategies for Functional Proteomics
Given the success of genome sequencing projects, biological research has been launched into a new era where focus has shifted from the identification of novel genes to the functional characterization of gene products. Considering that the number of unique human genes appears to exceed 25000, the daunting task of assigning molecular, cellular, and physiological function to the protein products encoded by these genes awaits postgenomic researchers. To accelerate this process, and as a complement to genomics, one of the major goals of proteomics is the development and application of methods for the parallel analysis of large numbers of proteins [2]. However, the technical challenges associated with proteomic studies greatly exceed those faced by genomics [8]. For example, while gene microarrays can exploit the inherent specificity of complementary oligonucleotide hybridization to analyze vast numbers of distinct mRNA transcripts in parallel, proteins lack such high-specificity binding partners for use as selective probes. In addition, molecular amplification strategies such as PCR (polymerase chain reaction) exist for nucleic acids but not for proteins, which restricts the ability to analyze samples where only minimal or limited quantities of cellular material are available (e.g., single cell analysis or clinical specimens). Moreover, while nucleic acids generally display similar biochemical properties, proteins exhibit a wide range of distinct biochemical properties and cannot be treated as experimentally equivalent. These properties include membrane association, hetero- and homo-oligomerization, and a host of posttranslational modifications, meaning that no single experimental protocol is suitable for the characterization of all proteins. Given these technical challenges, the development of complementary analytical strategies must maximize the information content extractable from proteomic samples.
Such proteomic strategies included efforts to characterize both protein expression and protein function on a global scale. The most mature current method for analyzing protein expression patterns utilizes two-dimensional electrophoresis (2DE) for the separation of proteins coupled with protein staining and mass spectrometry (MS) for protein detection and identification, respectively [9]. Although 2DE-MS methods permit the consolidated analysis of the relative expression levels of many proteins across multiple proteomic samples, these approaches suffer from an inability to resolve several important protein classes, including low abundance and membrane-associated proteins [10]. To address these shortcomings, several powerful MS-based strategies for the gel-free analysis of proteomes have emerged, including isotope-coded affinity tagging (ICAT) for quantitative proteomics [11] and multidimensional protein identification technology (MudPIT) for comprehensive proteomics [12]. ICAT, for example, utilizes chemical labeling reagents, referred to as isotope-coded affinity tags, to enable the comparative analysis of protein expression levels by liquid chromatography (separation) and tandem MS (detection), thereby circumventing several limitations of gel-based methods and providing improved access to membrane-associated and low abundance proteins [13]. Nonetheless, these methods, like 2DE-MS, still focus on measuring changes in protein abundance and, therefore, provide only an indirect estimate of dynamics in protein function. Indeed, several important forms of posttranslational regulation, including protein-protein and protein-small molecule interactions [3], may elude detection by abundance-based proteomic methods. To facilitate the analysis of protein function, several proteomic methods have been introduced to characterize the activity of proteins on a global scale. These include large-scale yeast two-hybrid screens [14] and epitope-tagging immunoprecipitation experiments [15, 16], which aim to construct comprehensive maps of protein-protein interactions, and protein microarrays [17, 18], which aim to provide an assay platform for the rapid assessment of protein activities. Although these methods have the advantage of assigning specific molecular functions to individual protein products, they typically rely on the recombinant expression of proteins in artificial environments, and therefore do not directly assess the functional state of these biomolecules in their native settings. It was to address this need for higher order functional proteomic methods that ABPP emerged as a strategy to measure protein activity rather than abundance (Fig. 7.3-1).
In contrast to conventional proteomic strategies, which aim to catalogue the entire complement of protein products in a given sample, ABPP is designed to address the proteome at the level of discrete enzyme families, providing a way to distinguish, for example, active enzymes from their inactive zymogen [19] and/or inhibitor-bound forms [20].
Fig. 7.3-1 Overview of genomic and proteomic methods. Standard genomic and proteomic approaches measure changes in mRNA and protein abundance, respectively. In contrast, activity-based protein profiling (ABPP) applies active site-directed chemical probes to measure dynamics in enzyme activities, directly in the context of whole proteomes and living systems. (Diagram labels: DNA, RNA, protein, protein activity; microarrays for genomics, MudPIT for proteomics, chemical probes for ABPP.)
7.3.3 General Considerations
7.3.3.1 Activity-based Protein Profiling (ABPP) - A Chemical Strategy for the Global Profiling of Enzyme Activities in Complex Proteomes
7.3.3.1.1 The Need for Activity-based Proteomic Methods
As described above, genomic and proteomic approaches assess protein function indirectly, by measuring changes in mRNA and protein level, respectively. A proponent of these strategies might reasonably argue that alterations in transcript and protein level will generally correlate well with changes in protein function. However, several enzyme families clearly represent important exceptions to this premise. For example, most proteases are produced as inactive precursors (zymogens), and upon activation are often bound by a complex array of endogenous inhibitors that serve as critical posttranslational regulators of their catalytic activities in vivo [3, 21]. Thus, a change in the level of a given protease may or may not have functional impact depending on whether the enzyme is processed and/or its abundance exceeds the level of its endogenous inhibitors (Fig. 7.3-2).
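The dependence of net protease activity on processing and inhibitor levels can be made concrete with a toy calculation. The sketch below is an illustration only: it assumes that each inhibitor molecule titrates one activated protease molecule stoichiometrically, and the function name, parameters, and numbers are hypothetical rather than taken from the studies cited here.

```python
def free_active_protease(total_protease, fraction_activated, inhibitor):
    """Return the amount of active, uninhibited protease (arbitrary units).

    Assumption (not from the source): inhibitors titrate activated protease
    one-to-one, so only the excess over inhibitor remains catalytically free.
    """
    if not 0.0 <= fraction_activated <= 1.0:
        raise ValueError("fraction_activated must lie between 0 and 1")
    activated = total_protease * fraction_activated
    return max(0.0, activated - inhibitor)


# Hypothetical samples A and B: protease abundance doubles in B, yet neither
# sample has free activity because inhibitor remains in excess.
sample_a = free_active_protease(total_protease=10.0, fraction_activated=0.5, inhibitor=8.0)
sample_b = free_active_protease(total_protease=20.0, fraction_activated=0.5, inhibitor=12.0)

# Hypothetical sample C: same abundance as A, but less inhibitor, so activity appears.
sample_c = free_active_protease(total_protease=10.0, fraction_activated=0.5, inhibitor=2.0)

print(sample_a, sample_b, sample_c)
```

Under these assumptions an abundance measurement would rank sample B highest, whereas only sample C contains free active enzyme, which is the distinction an activity-based readout is meant to capture.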
Fig. 7.3-2 Schematic of representative protease posttranslational regulation mechanisms. Multiple levels of posttranscriptional and posttranslational regulation of protease expression levels and function, including production as inactive zymogens, compartmentalization/secretion, and inhibition by endogenous protein-binding partners. (Diagram labels: protease gene, transcription, protease mRNA, translation, inactive zymogen, secretion, activation, active protease, endogenous inhibitors, degradation, ECM.)
Chemical probes that can react with proteases in an activity-dependent manner offer a powerful means to distinguish in a given proteome those enzymes that are active (free) from those that are inactive (zymogens; inhibitor-bound), thereby providing a readout of net proteolytic activity. Notably, several other enzyme families, including kinases [22] and phosphatases [23], also undergo dramatic changes in their activities in the absence of alterations in abundance, indicating that numerous classes of enzymes are relevant targets for ABPP. Moreover, because labeling by ABPP probes is based on conserved features contained within enzyme active sites (rather than abundance), these reagents provide a means to access low abundance proteins contained within samples of high complexity, thus addressing the large dynamic range of protein expression displayed by most proteomes [24].
7.3.3.1.2 The Design of Chemical ABPP Probes for Functional Proteomics
In the appraisal of strategies for ABPP that focus on protein function rather than abundance, it is important to consider how the cell regulates protein activity. In the case of enzymes, most posttranslational regulatory mechanisms share a common feature in that they perturb, either structurally or sterically, the active sites of these proteins [3]. Accordingly, it was hypothesized that chemical probes capable of directly reporting on the integrity of enzyme active sites might serve as effective activity-based profiling tools capable of parallel monitoring of many enzymes directly within the proteomes in which they are naturally expressed. Such “activity-based” probes can be defined as chemical reagents that meet the following criteria:
1. React with a broad range of enzymes from a particular class (or classes) directly in complex proteomes.
2. React with these enzymes in a manner that correlates with their catalytic activities.
3. Display minimal cross-reactivity with other undesired protein classes.
4. Possess a chemical tag for the rapid detection and isolation of reactive enzymes.
An activity-based probe meeting these requirements could, in principle, enable the comparative measurement and molecular identification of all the active members of a given enzyme class present in one or more proteomes. Importantly, these enzyme activity profiles can be read out in a variety of formats including gels [20,25], microarrays [26], liquid chromatography-mass spectrometry (LC-MS) [27], and capillary electrophoresis [28] (Fig. 7.3-3).
7.3.3.1.3 The General Structure of Activity-based Probes: Directed versus Nondirected Strategies
An activity-based chemical probe consists of at least two general elements: (a) a reactive group (RG) that binds and covalently modifies the active sites of a broad range of enzymes from a particular enzyme class (or classes), and (b) one or more chemical tags, such as biotin and/or a fluorophore, for the consolidated detection and isolation of probe-labeled enzymes from complex proteomes. RG elements of moderate reactivity and electrophilicity were selected, thereby priming them to preferentially modify enzyme active sites that offer a binding pocket enriched in nucleophilic residues important for catalysis. Finally, in certain cases a third structural element may also be introduced into the probe design in the form of a binding group (BG) intended to direct RGs to different enzyme active sites present in the proteome.
Fig. 7.3-3 General strategy for activity-based protein profiling (ABPP). Proteomes are treated with chemical probes that label active enzymes of a particular class (or classes) in a manner that allows for their detection, isolation, and identification. Active enzymes are denoted by open/unshaded active sites, with their inactive counterparts (e.g., zymogen or inhibitor-bound forms) shaded in black. RG - reactive group, BG - binding group, tag - biotin and/or fluorophore. Probe-labeled proteomes can be analyzed via several different platforms, including gel [20] or microarray [26] analysis of probe-labeled enzymes, or capillary electrophoresis (CE) [28] and liquid chromatography-mass spectrometry (LC-MS) [27] analysis of probe-labeled tryptic peptides.
Directed ABPP - Probe Design for Enzyme Classes Possessing Cognate Affinity Labels
Initial strategies for ABPP focused on the design and application of chemical probes that targeted specific classes of enzymes. In this approach, well-characterized affinity labels were incorporated as the RG to direct probe reactivity toward enzymes sharing a similar catalytic mechanism and/or substrate specificity. The design of first-generation serine hydrolase (SH)-directed ABPP probes, for example, exploited the irreversible inhibition that fluorophosphonate (FP) compounds exhibit toward the majority of the members of this enzyme superfamily (Fig. 7.3-4). To date, these directed ABPP efforts have generated probes that profile numerous enzyme classes, including members of all major families of proteases (serine [4,19], cysteine [29-32], metallo [33,34], aspartyl [35], proteasomal [36,37]), as well as select phosphatases [38,39], kinases [40,41], and glycosidases [42]. Some specific examples of directed ABPP probes include: (a) biotinylated/fluorophore-tagged FPs that target the SH superfamily [4,19], (b) biotinylated electrophilic ketones that target the caspase class of cysteine proteases [30], and (c) biotinylated/fluorophore-tagged variants of the natural product E-64 that target the papain class of cysteine proteases [29]. In many of these cases, the chemical probes have been shown to label their enzyme targets in an activity-dependent manner directly within complex proteomes, distinguishing, for example, active enzymes from inactive zymogen or inhibitor-bound forms [4,19,20].
Fig. 7.3-4 Fluorophosphonate labeling of serine hydrolase (SH) active sites. As a result of a shared catalytic mechanism, nearly all SHs are potently and irreversibly inhibited by fluorophosphonates (FPs). Reactivity of FPs depends on SHs being catalytically active, which enables FP reagents coupled with reporter tags to serve as activity-based probes for this large enzyme family.
Nondirected ABPP - Probe Design for Enzyme Classes Lacking Cognate Affinity Labels
From these examples of directed approaches for ABPP it may be extrapolated that, for enzyme classes with known covalent inhibitors, the design of activity-based proteomic probes is, at least in concept, straightforward. However, covalent inhibitors do not yet exist for the majority of proteins in the proteome; therefore, an alternative strategy is needed to discover active site-directed profiling reagents for proteins lacking known affinity labels. With this goal in mind, a combinatorial, or “nondirected”, strategy for ABPP was introduced in which libraries of candidate probes with fixed RGs and variable BGs are synthesized and screened against complex proteomes to identify “specific” protein labeling events, which are defined as those that occur in native, but not heat-denatured, proteomes [43,44]. Probe-protein reactions that are heat-sensitive were predicted to occur in structured, small molecule-binding sites that would often determine the biological activity of the proteins (e.g., the active site of an enzyme or ligand-binding pocket of a receptor). In contrast, proteins reacting with probes in a heat-insensitive manner would be considered “nonspecific” targets, as these labeling events could occur with either native or denatured versions of the proteins. This type of general screen to distinguish specific from nonspecific labeling was deemed particularly important for
nondirected ABPP, which utilizes probes that, unlike directed reagents, lack well-established selectivity for a given class of enzymes. Screening libraries of probes against individual proteomes also provided a complementary method to detect specifically labeled proteins, which were expected to show selectivity for a select number of probes on the basis of the structure of their respective BGs and should therefore be discernible from proteins that reacted indiscriminately (i.e., nonspecifically) with the probe library. The utility of nondirected methods for ABPP was initially demonstrated with a modest-sized library of sulfonate ester (SE) probes bearing varying alkyl/aryl BGs that was generated and screened against a collection of tissue and cell line proteomes [43,44]. The SE-group was selected as the library’s RG based on a general survey of the literature, which revealed that a large range of enzyme classes, including proteases, kinases, and phosphatases, are susceptible to covalent inactivation by natural products and/or synthetic inhibitors that possess carbon electrophiles. Accordingly, it was hypothesized that ABPP probes incorporating a carbon electrophile RG may prove capable of profiling enzymes not only within but also across mechanistically distinct classes. Consistent with this premise, several heat-sensitive protein targets of the sulfonate library were identified and found to represent members of at least nine different enzyme classes (Table 7.3-1). Interestingly, each enzyme target displayed a unique reactivity profile with the SE probe library, indicating that the structure of the variable BG strongly influenced probe-protein interactions. Several lines of evidence supported that the sulfonate probes labeled the active sites of their enzyme targets. 
For example, the addition of cofactors and/or substrates was found to inhibit the labeling of several enzymes, while the reactivity of others was either positively or negatively affected by known allosteric regulators of catalytic activity [43, 44]. Notably, for one enzyme target, aldehyde dehydrogenase-1 (ALDH-1), sulfonate probes were shown to act as time-dependent inactivators of catalytic activity [43, 44]. Finally, advanced LC-MS platforms for ABPP have revealed that, in nearly all cases, SE probes label their enzyme targets on conserved active site residues [27]. While these original studies demonstrated that nondirected strategies can in fact deliver bona fide activity-based probes for enzyme families not yet accessible by directed methods, one major drawback remained: the limited structural diversity of the SE library, a factor proposed to be responsible for the modest differences in the proteome reactivity profiles observed for these probes. To test the hypothesis that exploring further proteome space would require a more structurally diverse library of electrophilic agents, one such library was developed in which an α-chloroacetamide (α-CA) RG was coupled to a variable dipeptide BG that would enable the intrinsic diversity of amino acid functional groups to be exploited for probe binding to additional enzyme families [45]. In addition to its tempered electrophilicity (stable under many synthetic chemistry conditions), the α-CA group is small in size, therefore limiting the likelihood
of unduly influencing noncovalent probe-protein interactions driven by the dipeptide BG. Furthermore, given the precedent for other carbon electrophile RGs, such as the SEs [43, 44] and epoxides [29], to label a range of active site residues, it was proposed that the inherent reactivity of the α-CA probe library would not be strongly biased toward a specific enzyme class. Indeed, initial studies identified more than 10 different classes of enzymes targeted by a representative “optimal set” of α-CA dipeptide library members, most of which were not labeled by previously developed ABPP probes, including several obesity-associated enzyme activities, and proteins involved in lipid metabolism and gluconeogenesis (Table 7.3-1). Collectively, these studies reveal that, through the use of both directed and nondirected strategies, activity-based probes compatible with whole proteome analysis can be generated for numerous enzyme classes. While comparing directed and nondirected approaches for ABPP, it is perhaps most interesting to note the striking nonoverlap between enzyme targets profiled by each method (Table 7.3-1). Indeed, none of the SE-labeled enzymes identified to date represent known targets of directed ABPP probes. This finding suggests that the amount of “active site space” in the proteome accessible to chemical profiling is still far from saturation.
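The heat-sensitivity criterion used in the nondirected screens described above can be expressed as a simple classification rule. The following is a minimal sketch, assuming quantified labeling intensities are available for each probe-protein pair in native and heat-denatured proteomes; the data structure, the thresholds `min_signal` and `min_ratio`, and all example values are hypothetical and are not taken from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class LabelingEvent:
    protein: str
    probe: str
    native_signal: float      # labeling intensity in the native proteome
    denatured_signal: float   # labeling intensity after heat denaturation


def classify(event, min_signal=1.0, min_ratio=3.0):
    """Classify a labeling event as 'specific', 'nonspecific', or 'unlabeled'.

    min_signal and min_ratio are illustrative thresholds, not published values.
    """
    if event.native_signal < min_signal:
        return "unlabeled"
    # Heat-sensitive labeling: present in native, largely lost on denaturation.
    if event.native_signal >= min_ratio * event.denatured_signal:
        return "specific"
    return "nonspecific"


def specific_targets(events, **thresholds):
    """Map each specifically labeled protein to the sorted probes that label it."""
    targets = {}
    for event in events:
        if classify(event, **thresholds) == "specific":
            targets.setdefault(event.protein, set()).add(event.probe)
    return {protein: sorted(probes) for protein, probes in targets.items()}


events = [
    LabelingEvent("enzyme_1", "probe_a", 12.0, 0.5),
    LabelingEvent("enzyme_1", "probe_b", 9.0, 1.0),
    LabelingEvent("protein_2", "probe_a", 8.0, 7.5),   # heat-insensitive labeling
    LabelingEvent("protein_3", "probe_c", 0.2, 0.1),   # below the detection threshold
]
result = specific_targets(events)
print(result)
```

Grouping the specific events by protein also gives a crude per-protein probe reactivity profile, which is the kind of information used in the text to separate selectively labeled proteins from those that react indiscriminately with the library.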
7.3.4 Applications and Practical Examples
7.3.4.1 Biological Applications: Comparative and Competitive ABPP
Methods for ABPP have matured rapidly since their introduction in the late 1990s, providing a new avenue for identifying novel disease-associated enzymes (target discovery) and chemical inhibitors thereof (inhibitor discovery). In addition to highlighting the biological utility of activity-based proteomic methods to provide information content not readily achieved by other expression-based techniques, the studies presented in this section are aimed at demonstrating the benefit of parsing the proteome into tractable functional units (activity states of given enzyme classes), for the discovery of disease-related enzymes, as well as lead inhibitors that target these enzymes.
7.3.4.1.1 Comparative Profiling for the Discovery of Enzyme Activities Associated with Discrete Physiological and Pathological States
The identification of enzymes selectively expressed by tumor cells and tissues may provide a rich source of new biomarkers and targets for the diagnosis and treatment of cancer. In one such effort, the activity, subcellular distribution, and glycosylation state of members of the SH superfamily of enzymes were quantitatively profiled across a panel of human cancer cell lines [20]. The SHs represent one of the largest and most diverse enzyme classes in higher eukaryotic proteomes, consisting of proteases, lipases, esterases,
and amidases, that collectively constitute approximately 1% of the predicted
protein products encoded by the human genome. By profiling the secreted, membrane-associated, and soluble cellular fractions derived from human breast carcinoma and melanoma lines, this study led to the identification of SH activities that distinguished cancer lines according to their respective tissue of origin. Interestingly, nearly all of these activities were downregulated in the most invasive cancer lines analyzed, which instead upregulated a distinct set of secreted and membrane-associated SH activities. In contrast to the diverse patterns of enzyme activity observed in the secreted and membrane proteomes of cancer cells, their soluble proteomes appeared quite similar, with few enzyme activities exhibiting restricted patterns of distribution. These findings suggest that, at least for the SH superfamily, the membrane and secreted proteomes are enriched in enzyme activities that depict cellular phenotype, highlighting the value of methods, like ABPP, that can analyze technically challenging proteomic fractions (e.g., secreted, membrane, glycosylated, and low abundance proteins). More generally, these results suggest that invasive cancer cells share discrete proteomic signatures that are more reflective of their biological phenotype than their cellular heritage, highlighting that a common set of enzymes may support the progression of tumors from a variety of origins and thus represent attractive targets for the diagnosis and treatment of cancer. This comparative ABPP analysis was subsequently extended to a more sophisticated in vivo model of human cancer: breast cancer xenografts grown in immunodeficient mice [46].
The mixed species nature of the xenograft model enabled the discrimination of active enzymes that were tumor-associated (human) or host-derived (mouse), resulting in the identification of several different classes of activities, including: carcinoma enzyme activities expressed selectively in culture or in xenograft tumors, as well as host stromal activities that either infiltrated or were excluded from xenograft tumors. Interestingly, cell lines derived from xenograft tumors exhibited profound differences in their enzyme activity profiles, as compared to the parental line, which correlated with increased tumor growth rates and metastasis upon reintroduction into mice. In particular, xenograft-derived breast cancer cells exhibited dramatic elevations in secreted protease activities (urokinase and tissue-type plasminogen activator), as well as the downregulation of key glycolytic enzymes (phosphofructokinase). These findings suggest that the behavior of human cancer cell lines grown in vivo may vary considerably from their characteristics in culture, and that the in vivo microenvironment of the mouse mammary fat pad cultivates the growth of human breast cancer cells with altered enzyme activity profiles and elevated tumorigenic properties. The benefit of addressing the proteome at the level of distinct enzyme classes, as well as the versatility of ABPP reagents, is highlighted in a third example of comparative ABPP profiling. In this study, carried out by Greenbaum and colleagues, activity-based probes were applied to characterize the functional role of the papain subclass of cysteine proteases in the Plasmodium falciparum life cycle [47]. While cysteine proteases are known to be essential for the
survival of several human parasites, the specific roles played by these enzymes during the complex life cycle of P. falciparum remain ill defined. ABPP of P. falciparum proteomes isolated at various stages of the parasite life cycle identified a specific cysteine protease, falcipain 1, that was upregulated during the invasive merozoite stage of growth. Falcipain 1-selective inhibitors were then identified by screening epoxide-based chemical libraries for compounds that blocked probe labeling of this enzyme in complex proteomes. These inhibitors were subsequently demonstrated to inhibit parasite invasion of host erythrocytes, with no detectable effect on other parasite processes (as opposed to the general papain family protease inhibitor, E-64, which produced multiple aberrations and, ultimately, developmental arrest). Importantly, this ABPP analysis of falcipain 1 function and inhibition was carried out directly in whole parasite lysates, circumventing the need for technically difficult gene ablation experiments and/or recombinant enzyme expressions that often serve as the basis for such studies.
7.3.4.1.2 Competitive ABPP for Discovering Potent and Selective Reversible Enzyme Inhibitors
While activity-based probes can serve as powerful tools for the discovery of enzyme activities associated with discrete (patho)physiological function, the target promiscuity displayed by these profiling agents limits their utility for defining the biological function of individual enzymes, which often depends on the development of specific reagents to perturb the protein function of defined members contained within large enzyme classes. However, as illustrated in the study by Greenbaum and colleagues [47, 48], ABPP can in fact be effectively applied to identify irreversible inhibitors that, for certain enzyme classes like cysteine proteases, achieve sufficient selectivity to serve as useful pharmacological agents in vivo. Since, for many enzyme classes, irreversible inhibitors display poor target selectivity due to their inherent reactivity, it was also necessary to adapt the ABPP method to serve as an effective primary screen for reversible enzyme inhibitors. Toward this end, Leung and colleagues devised a competitive screening strategy to evaluate the activity of libraries of candidate reversible inhibitors, in this case against SH activities expressed in mouse tissue proteomes [49]. In this study, proteomes were incubated with a library of candidate inhibitors and an SH-directed probe for a restricted period of time during which most enzymes had not yet reacted to completion with the probe. Under such kinetically controlled conditions, the binding of competitive reversible inhibitors to specific enzymes was detected as a reduction in probe labeling (Fig. 7.3-5). By performing this screen in mouse brain and heart proteomes using varying inhibitor concentrations, both potencies (IC50 values) and selectivities of inhibitors were determined concurrently. Importantly, the IC50 values measured by ABPP closely matched Ki values determined by standard substrate assays.
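To illustrate how potency and selectivity might be read out together from a competitive ABPP experiment, the sketch below estimates IC50 values from residual probe labeling at several inhibitor concentrations. It is a simplified illustration, not the analysis used in the cited study: it assumes a one-site dose-response model, f = 1 / (1 + [I]/IC50), where f is labeling relative to a no-inhibitor control, and it does not model probe concentration or incubation time. The enzyme names and intensity values are hypothetical.

```python
import math


def estimate_ic50(concentrations, residual_labeling):
    """Estimate IC50 from residual probe labeling (fraction of no-inhibitor control).

    Assumes a one-site model, f = 1 / (1 + [I] / IC50), inverts it at every
    data point, and averages the per-point estimates on a log scale. Points
    with f outside (0, 1) carry no information under this model and are skipped.
    """
    log_estimates = []
    for conc, frac in zip(concentrations, residual_labeling):
        if conc > 0 and 0.0 < frac < 1.0:
            log_estimates.append(math.log(conc * frac / (1.0 - frac)))
    if not log_estimates:
        raise ValueError("no informative data points")
    return math.exp(sum(log_estimates) / len(log_estimates))


def selectivity(ic50_by_enzyme, target):
    """Ratio of the lowest off-target IC50 to the target IC50 (higher is more selective)."""
    off_target = [value for name, value in ic50_by_enzyme.items() if name != target]
    return min(off_target) / ic50_by_enzyme[target]


# Hypothetical band intensities, normalized to a no-inhibitor control, for two
# enzymes labeled in the same proteome (inhibitor concentrations in nM).
concs = [1.0, 10.0, 100.0, 1000.0]
profiles = {
    "enzyme_x": [0.91, 0.50, 0.09, 0.01],
    "enzyme_y": [1.00, 0.99, 0.91, 0.50],
}
ic50s = {name: estimate_ic50(concs, fracs) for name, fracs in profiles.items()}
print(ic50s, selectivity(ic50s, "enzyme_x"))
```

Because every probe-labeled enzyme in the proteome yields its own dose-response profile from the same set of samples, a single experiment of this kind returns both a potency estimate for the intended target and a selectivity ratio against the other enzymes.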
Analysis of resulting data sets demonstrated that inhibitors selective for individual SHs could be readily
Fig. 7.3-5 Inhibitor discovery by ABPP. The potency and selectivity of inhibitors can be profiled in parallel by performing competitive ABPP reactions in proteomes. Complex proteomes are treated with a reversible inhibitor library and an activity-based probe, and subsequently analyzed to identify enzymes sensitive to individual inhibitors (reflected by a reduction in intensity of probe labeling). Active enzymes are denoted by open/unshaded active sites, with their inhibitor-bound counterparts shaded in color.
distinguished from compounds that displayed comparable or greater activity toward multiple enzymes. Notably, inhibitors were discovered for both known enzymes of therapeutic interest (e.g., fatty acid amide hydrolase) and novel enzymes that lack known substrates. A further advantage of inhibitor screening by ABPP is that these analyses can be carried out directly in native proteomes without the need for recombinant expression or purification of proteins. Finally, because inhibitors are tested against numerous enzymes in parallel within the context of their native proteomes, promiscuous agents can be readily triaged in favor of equally potent compounds that display high target selectivity. Inhibitor screening by ABPP has also facilitated the design of selective covalent agents for several proteases, including papain-directed ABPP probes that have been used as in vivo imaging tools for characterizing cathepsin protease activity in mouse models of human multistage tumorigenesis [50]. This study culminated in the detection of a pronounced upregulation of cathepsin activity associated with the angiogenic vasculature and invasive fronts of pancreatic and uterine cervical carcinomas, and distinguished the activities derived from differential expression in immune, endothelial, and cancer cells. Consistent with these findings, pharmacological inhibition of protease activity with a broad-spectrum cathepsin inhibitor at defined stages of tumorigenesis impaired angiogenic switching in progenitor lesions, as well as tumor growth, tumor vascularity, and invasion in the pancreatic model.
7.3.4.1.3
ABPP Strategies for the in vivo Analysis of Enzyme Activities
The in vivo imaging studies carried out with cysteine protease-directed probes [50] underscored the need for a generally applicable methodology for in vivo analysis of enzyme activities. Indeed, as exemplified by many protease families, most enzymes are subject to multiple mechanisms for
tightly regulating their activity within the cell, including spatial and temporal expression, binding to small-molecule or protein cofactors, and posttranslational modification. Furthermore, since the physical disruption of cells and tissues may alter the concentrations of endogenous activators/inactivators of enzymes, as well as their respective subcellular distributions, in vitro proteomic preparations can only, at best, approximate the dynamic functional state of proteins within the physiologically relevant environment of the living cell or organism. A general method for performing ABPP in vivo required that this strategy be transformed into a “tag-free” method, as most reporter groups (e.g., biotin and fluorophores) inhibit the cell permeability and distribution of probes. To address this issue, bio-orthogonal chemical reactions were sought to enable ligation of reporter tags onto proteins after covalent labeling by ABPP probes. In one example, conjugation of the reporter group to the probe following proteome labeling was accomplished by engineering into these reagents a pair of biologically inert coupling partners, the alkyne and azide, which can react to form a stable triazole product via the Huisgen 1,3-dipolar cycloaddition reaction [51, 52]. The key to the success of this strategy was the recent description by Sharpless and colleagues of a Cu(I)-catalyzed, stepwise version of the azide-alkyne cycloaddition reaction, which can be carried out under mild conditions to produce high yields of product in rapid reaction times (“click chemistry” [53]). Click chemistry-based ABPP has been applied to living cells and organisms, leading to the discovery of enzymes that are selectively labeled in vivo but not in vitro [52]. A second bio-orthogonal reaction, the Staudinger ligation, has also been applied to profile proteasomal subunits labeled in situ with azide-modified probes [37].
Collectively, these studies emphasize the importance of performing ABPP in vivo and underscore the value of bio-orthogonal chemical reactions to achieve this goal.
7.3.4.2
Expanding the Scope of ABPP
7.3.4.2.1
Activity-based Probes for the Proteomic Profiling of Metalloproteases
So far we have described the development of ABPP probes derived from a combination of two complementary approaches, namely directed and nondirected ABPP, where covalent modification of enzyme active sites was achieved by electrophilic labeling of complementary nucleophilic residues. What about enzyme families that do not utilize an enzyme-bound nucleophile for catalysis? The metalloprotease family of enzymes, for instance, plays key roles in many physiological and pathological processes, including tissue remodeling, peptide hormone signaling, and cancer, and is also regulated by myriad posttranslational events [54], thus making it an attractive target for ABPP. However, unlike previous enzyme families targeted by ABPP, metalloproteases (MPs) do not use a protein-bound nucleophile, but rather a zinc-activated water molecule.
To address this important challenge, a novel approach to ABPP probe design was undertaken, in which a zinc-chelating group (hydroxamate) and a photocrosslinking group (benzophenone) were incorporated to promote selective binding and modification of MP active sites, respectively [33, 34] (see Table 7.3-1 for probe structure). Some of these hydroxamate-benzophenone (HxBP) probes were shown to serve as bona fide activity-based probes for several matrix metalloproteases (MMPs), including MMP-2, MMP-7, and MMP-9, labeling the active forms of these proteases but not their zymogen or inhibitor-bound variants [33]. Interestingly, competitive profiling experiments carried out with HxBP probes uncovered several MPs in tissue proteomes that constituted “off-target” sites of action for the MMP-directed inhibitor GM6001. Notably, none of these enzymes shared any sequence homology with MMPs, indicating that GM6001 (a compound currently in clinical trials) inhibits several MPs outside its intended target family (MMPs) and, more generally, that these off-target sites may be partially responsible for the repeated failure of MMP inhibitors developed for clinical use. These findings also emphasize that enzymes can share considerable active site homology without showing sequence relatedness, and they underscore the value of ABPP for the discovery of such unanticipated sites of action for inhibitors and drugs.
7.3.4.2.2
Class Assignment of Sequence-unrelated Members of Enzyme Superfamilies
As a corollary to the notion that enzyme superfamilies comprise members that share a common catalytic mechanism, but not necessarily sequence or structural homology, recent studies have shown that directed ABPP probes, which typically target a large set of mechanistically related enzymes (e.g., SHs, metalloproteases), can also facilitate the identification of unannotated members of enzyme superfamilies [55, 56]. Typically, probe-labeled activities identified by ABPP can be readily assigned to a superfamily on the basis of database (BLAST) searches, which identify conserved sequence elements shared by members of a particular enzyme class. For instance, in the analysis of the human cancer cell lines described earlier, numerous FP-labeled protease, lipase, and esterase activities were identified in this manner. However, one FP target identified in this study, sialic acid 9-O-acetylesterase (SAE), which was selectively expressed in melanoma cell lines, shared no sequence homology with SHs or any other known enzyme class. Thus, to determine whether SAE was, in fact, a member of the SH superfamily, experiments were carried out to determine the site of FP probe labeling, which was identified as a serine residue that is completely conserved among all SAE family members [55]. Mutagenesis of this residue to alanine produced an SAE variant that exhibited negligible FP labeling and enzyme activity, indicating that SAE and its sequence homologs constitute a novel branch of the SH superfamily. More generally, these findings suggest that ABPP can uncover cryptic members of enzyme classes that have eluded
classification based on sequence comparisons. This is an important finding given the large numbers of unannotated proteins that have emerged from recent eukaryotic and prokaryotic genome sequencing projects; “orphan” or cryptic members of many enzyme classes likely still exist in these proteomes.
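The observation that the labeled serine is conserved across all SAE family members suggests a simple computational companion to the labeling experiment: scanning an alignment of homologs for positions where the candidate nucleophile is invariant. The sketch below is only an illustration of that idea and is not the method used in [55], where the labeling site was determined experimentally. The function name and the toy sequences are hypothetical and do not represent real SAE sequences.

```python
def conserved_columns(aligned_seqs, residue="S"):
    """Return 0-based alignment columns where every sequence has `residue`.

    aligned_seqs -- sequences already aligned to equal length
                    (gap characters such as '-' count as mismatches)
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [col for col in range(length)
            if all(seq[col] == residue for seq in aligned_seqs)]

# Toy alignment, invented for illustration only.
TOY_ALIGNMENT = [
    "MKGDSAGT",
    "MRGDSLGT",
    "MK-DSAGS",
]

if __name__ == "__main__":
    # Columns holding an invariant serine are candidate catalytic nucleophiles
    # that could then be tested by site-directed mutagenesis.
    print(conserved_columns(TOY_ALIGNMENT))
```

A shortlist produced this way only narrows the candidates; as in the SAE study, probe-labeling site mapping and mutagenesis remain the evidence that a given serine is catalytic.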
7.3.5
Future Development
The discipline of chemistry is perhaps uniquely suited to provide powerful new tools and methods for the functional analysis of the proteome. As has been highlighted in this chapter, chemical approaches for ABPP have, over the past few years, enjoyed an intense phase of technical innovation, during which these strategies have advanced our understanding of the role that enzymes play in complex physiological and pathological processes. Looking forward, researchers interested in broadening the scope and impact of ABPP are faced with several conceptual and experimental challenges. First, active site-directed chemical probes, which constitute the fundamental currency of ABPP, have, to date, only been developed for a modest portion of the proteome. The successful generation of proteomic-compatible profiling reagents for additional enzyme (and protein) classes will likely require the synthesis of more structurally diverse libraries of candidate probes, which may be either directed (e.g., possess reactive and/or BGs that bias probe affinity for certain enzyme classes) or nondirected in nature. Enticing forays have already been made into “high-priority” enzyme families, like kinases [40, 41] and phosphatases [38, 39], suggesting that most, if not all, enzyme classes should be amenable to active site profiling in whole proteomes. In the development of new active site-directed proteomic probes, it is also important to consider the fidelity with which these reagents will report on changes in protein activity. For certain probes, like the FPs, which react with conserved catalytic residues in the active sites of their enzyme targets, probe labeling has been shown to provide an excellent readout of catalytic activity. However, it is likely that other probes may be discovered that modify enzyme active sites on noncatalytic residues, akin to the manner in which microcystin labels a noncatalytic cysteine residue in serine/threonine phosphatases [57].
Although such active site-directed labeling events would not be considered purely activity-based in a mechanistic sense, they can still be informative from a biological perspective: if, as is commonly the case, enzyme activity is regulated in vivo by steric blockade of the active site (by autoinhibitory domains or protein/small-molecule binding partners, for example) [3], then any probe that is sensitive to these molecular interactions should effectively report on the functional state of enzymes in complex proteomes. More generally, these issues highlight the importance of understanding the molecular basis for individual probe-enzyme reactions, especially those originating from nondirected ABPP
efforts, where the parameters that dictate probe binding/labeling are not always
obvious. Finally, as the proteome coverage of ABPP continues to grow, it is becoming clear that this strategy would benefit from improved methods for the qualitative and quantitative analysis of probe-labeled samples. Currently, most probe-labeled proteomes are analyzed by 1DE or 2DE, which exhibit limited resolving power, especially for large protein families with members of similar molecular mass. Future efforts to merge ABPP with gel-free proteomic platforms (e.g., LC-MS [27], capillary electrophoresis [28]) may provide a complementary strategy for resolving large numbers of probe-labeled enzyme activities. The enhanced resolution offered by gel-free methods may permit the multiplexing of ABPP probes, such that proteomes of limited quantity could be analyzed simultaneously with a collection of probes. Adapting ABPP for direct LC-MS analysis should also permit comparative quantitation of probe-labeled proteomes by isotope-coded mass tagging [11]. Still, it is important to emphasize that, although such LC-MS platforms will surely exhibit superior resolving power compared to 1DE gel-based methods for analyzing probe-labeled proteomes, the 1DE approach does possess the advantage of much higher throughput (i.e., dozens of proteomes can be compared on a single gel). Thus, the choice of whether to employ gel-based or gel-free strategies (or both) for the analysis of ABPP experiments will likely depend on the scientific problem under examination, with the former strategy being more suitable for the rapid comparison of large numbers of proteomes and the latter approach being superior for the in-depth analysis of a restricted set of samples. In either case, continued efforts to advance both the chemical and technical components of ABPP should foster the development of an increasingly robust and sensitive platform for the functional analysis of both the proteome and its individual constituents.
7.3.6
Conclusions
The field of proteomics aims to develop new tools and methods for the functional characterization of proteins on a global scale. The daunting size and diversity of eukaryotic proteomes, however, have inspired efforts to approach this goal by developing technologies that address the proteome as tractable functional units, for example by profiling the activity state of specific enzyme classes. In this chapter, we have attempted to illustrate how ABPP offers a powerful strategy to directly access higher order biological information to assist in elucidating the function of proteins in complex cell and organismal systems. Ultimately, the general and systematic application of ABPP will likely require the advent of integrated platforms for the design, synthesis, and analysis of chemical probes that target a large diversity of enzyme classes. However, as outlined here, the success of ABPP studies carried out thus far suggests
that this goal may in fact be attainable. This is highlighted by the impressive number of enzyme classes for which activity-based probes have already been developed as a result of both directed and nondirected approaches, as well as the insights that have been gained by applying ABPP to complex biological systems, ranging from cancer cells and tumors to invasive malarial parasites to mouse models of obesity. More broadly, this chapter has attempted to emphasize the potential of ABPP to identify new diagnostic markers and therapeutic targets for human disease. Through the integration of the comparative and competitive profiling platforms that have been described here, ABPP provides a powerful new avenue for the parallel discovery of disease-associated enzymes (target discovery) and chemical inhibitors thereof (inhibitor discovery), thus complementing the studies being carried out within other realms of chemical biology and providing valuable tools and insight that can be beneficial across multiple disciplines, extending from the lab to the clinic. Indeed, it has been recently stated that chemical biology, as a whole, has as one of its grand challenges the charge of identifying small-molecule modulators for each individual function of all human proteins [58], which would address the large gap that currently exists between basic and clinical research. We anticipate that ABPP will play an important role in achieving this goal.
Acknowledgments
The authors would like to acknowledge the support of the National Institutes of Health [CA087660 (B.F.C.)], the California Breast Cancer Research Foundation (N.J. and B.F.C.), and the Skaggs Institute for Chemical Biology.
References
1. P.O. Brown, D. Botstein, Exploring the new world of the genome with DNA microarrays, Nat. Genet. 1999, 21, 33.
2. S.D. Patterson, R. Aebersold, Proteomics: the first decade and beyond, Nat. Genet. 2003, 33, 311.
3. B. Kobe, B.E. Kemp, Active site-directed protein regulation, Nature 1999, 402, 373.
4. Y. Liu, M.P. Patricelli, B.F. Cravatt, Activity-based protein profiling: the serine hydrolases, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 14694.
5. N. Jessani, B.F. Cravatt, The development and application of methods for activity-based protein profiling, Curr. Opin. Chem. Biol. 2004, 8, 54.
6. L.J. van’t Veer, H. Dai, M.J. van de Vijver, Y.D. He, A.A. Hart, M. Mao, H.L. Peterse, K. van der Kooy, M.J. Marton, A.T. Witteveen, G.J. Schreiber, R.M. Kerkhoven, C. Roberts, P.S. Linsley, R. Bernards, S.H. Friend, Gene expression profiling predicts clinical outcome of breast cancer, Nature 2002, 415, 530.
7. R.A. Heller, M. Schena, A. Chai, D. Shalon, T. Bedilion, J. Gilmore, D.E. Woolley, R.W. Davis, Discovery and analysis of inflammatory disease-related genes using cDNA microarrays, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 2150.
8. T. Kodadek, Protein microarrays: prospects and problems, Chem. Biol. 2001, 8, 105.
9. W.F. Patton, B. Schulenberg, T.H. Steinberg, Two-dimensional electrophoresis: better than a poke in the ICAT? Curr. Opin. Biotechnol. 2002, 13, 321.
10. V. Santoni, M. Molloy, T. Rabilloud, Membrane proteins and proteomics: un amour impossible? Electrophoresis 2000, 21, 1054.
11. S.P. Gygi, B. Rist, S.A. Gerber, F. Turecek, M.H. Gelb, R. Aebersold, Quantitative analysis of complex protein mixtures using isotope-coded affinity tags, Nat. Biotechnol. 1999, 17, 994.
12. M.P. Washburn, D. Wolters, J.R. Yates III, Large-scale analysis of the yeast proteome by multidimensional protein identification technology, Nat. Biotechnol. 2001, 19, 242.
13. D.K. Han, J. Eng, H. Zhou, R. Aebersold, Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry, Nat. Biotechnol. 2001, 19, 946.
14. T. Ito, T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, Y. Sakaki, A comprehensive two-hybrid analysis to explore the yeast protein interactome, Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 4569.
15. A.C. Gavin, M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J.M. Rick, A.M. Michon, C.M. Cruciat, M. Remor, C. Hofert, M. Schelder, M. Brajenovic, H. Ruffner, A. Merino, K. Klein, M. Hudak, D. Dickson, T. Rudi, V. Gnau, A. Bauch, S. Bastuck, B. Huhse, C. Leutwein, M.A. Heurtier, R.R. Copley, A. Edelmann, E. Querfurth, V. Rybin, G. Drewes, M. Raida, T. Bouwmeester, P. Bork, B. Seraphin, B. Kuster, G. Neubauer, G. Superti-Furga, Functional organization of the yeast proteome by systematic analysis of protein complexes, Nature 2002, 415, 141.
16. Y. Ho, A. Gruhler, A. Heilbut, G.D. Bader, L. Moore, S.L. Adams, A. Millar, P. Taylor, K. Bennett, K. Boutilier, L. Yang, C. Wolting, I. Donaldson, S. Schandorff, J. Shewnarane, M. Vo, J. Taggart, M. Goudreault, B. Muskat, C. Alfarano, D. Dewar, Z. Lin, K. Michalickova, A.R. Willems, H. Sassi, P.A. Nielsen, K.J. Rasmussen, J.R. Andersen, L.E. Johansen, L.H. Hansen, H. Jespersen, A. Podtelejnikov, E. Nielsen, J. Crawford, V. Poulsen, B.D. Sorensen, J. Matthiesen, R.C. Hendrickson, F. Gleeson, T. Pawson, M.F. Moran, D. Durocher, M. Mann, C.W. Hogue, D. Figeys, M. Tyers, Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry, Nature 2002, 415, 180.
17. G. MacBeath, S. Schreiber, Printing proteins as microarrays for high-throughput function determination, Science 2000, 289, 1760.
18. H. Zhu, M. Bilgin, R. Bangham, D. Hall, A. Casamayor, P. Bertone, N. Lan, R. Jansen, S. Bidlingmaier, T. Houfek, T. Mitchell, P. Miller, R.A. Dean, M. Gerstein, M. Snyder, Global analysis of protein activities using proteome chips, Science 2001, 293, 2101.
19. D. Kidd, Y. Liu, B.F. Cravatt, Profiling serine hydrolase activities in complex proteomes, Biochemistry 2001, 40, 4005.
20. N. Jessani, Y. Liu, M. Humphrey, B.F. Cravatt, Enzyme activity profiles of the secreted and membrane proteome that depict cancer invasiveness, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 10335.
21. Y.A. DeClerck, S. Imren, A.M.P. Montgomery, B.M. Mueller, R.A. Reisfeld, W.E. Laug, Proteases and protease inhibitors in tumor progression, Adv. Exp. Med. Biol. 1997, 425, 239.
22. M. Huse, J. Kuriyan, The conformational plasticity of protein kinases, Cell 2002, 109, 275.
23. H. Shirato, H. Shima, G. Sakashita, T. Nakano, M. Ito, E.Y. Lee, K. Kikuchi, Identification and characterization of a novel protein inhibitor of type 1 protein phosphatase, Biochemistry 2000, 39, 13848.
24. G.L. Corthals, V.C. Wasinger, D.F. Hochstrasser, J.C. Sanchez, The dynamic range of protein expression: a challenge for proteomic research, Electrophoresis 2000, 21, 1104.
25. D. Greenbaum, A. Baruch, L. Hayrapetian, Z. Darula, A. Burlingame, K.F. Medzihradszky, M. Bogyo, Chemical approaches for functionally probing the proteome, Mol. Cell. Proteomics 2002, 1, 60.
26. S.A. Sieber, T.S. Mondala, S.R. Head, B.F. Cravatt, Microarray platform for profiling enzyme activities in complex proteomes, J. Am. Chem. Soc. 2004, 126, 15640.
27. G.C. Adam, J.J. Burbaum, J.W. Kozarich, M.P. Patricelli, B.F. Cravatt, Mapping enzyme active sites in complex proteomes, J. Am. Chem. Soc. 2004, 126, 1363.
28. E.S. Okerberg, J. Wu, B. Zhang, B. Samii, K. Blackford, D.T. Winn, K.R. Shreder, J.J. Burbaum, M.P. Patricelli, High-resolution functional proteomics by active-site peptide profiling, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 4996.
29. D. Greenbaum, K.F. Medzihradszky, A. Burlingame, M. Bogyo, Epoxide electrophiles as activity-dependent cysteine protease profiling and discovery tools, Chem. Biol. 2000, 7, 569.
30. L. Faleiro, R. Kobayashi, H. Fearnhead, Y. Lazebnik, Multiple species of CPP32 and Mch2 are the major active caspases present in apoptotic cells, EMBO J. 1997, 16, 2271.
31. A. Borodovsky, H. Ovaa, N. Kolli, T. Gan-Erdene, K.D. Wilkinson, H.L. Ploegh, B.M. Kessler, Chemistry-based functional proteomics reveals novel members of the deubiquitinating enzyme family, Chem. Biol. 2002, 9, 1149.
32. D. Kato, K.M. Boatright, A.B. Berger, T. Nazif, G. Blum, C. Ryan, K. Chehade, G.S. Salvesen, M. Bogyo, Activity-based probes that target diverse cysteine protease families, Nat. Chem. Biol. 2005, 1, 33.
33. A. Saghatelian, N. Jessani, A. Joseph, M. Humphrey, B.F. Cravatt, Activity-based probes for the proteomic profiling of metalloproteases, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 10000.
34. E.W. Chan, S. Chattopadhaya, R.C. Panicker, X. Huang, S.Q. Yao, Developing photoactive affinity probes for proteomic profiling: hydroxamate-based probes for metalloproteases, J. Am. Chem. Soc. 2004, 126, 14435.
35. Y.M. Li, M. Xu, M.T. Lai, Q. Huang, J.L. Castro, J. DiMuzio-Mower, T. Harrison, C. Lellis, A. Nadin, J.G. Neduvelil, R.B. Register, M.K. Sardana, M.S. Shearman, A.L. Smith, X.P. Shi, K.C. Yin, J.A. Shafer, S.J. Gardell, Photoactivated gamma-secretase inhibitors directed to the active site covalently label presenilin 1, Nature 2000, 405, 689.
36. M. Groll, T. Nazif, R. Huber, M. Bogyo, Probing structural determinants distal to the site of hydrolysis that control substrate specificity of the 20S proteasome, Chem. Biol. 2002, 9, 655.
37. H. Ovaa, P.F. Van Swieten, B.M. Kessler, M.A. Leeuwenburgh, E. Fiebiger, A.M. Van Den Nieuwendijk, P.J. Galardy, G.A. Van Der Marel, H.L. Ploegh, H.S. Overkleeft, Chemistry in living cells: detection of active proteasomes by a two-step labeling strategy, Angew. Chem., Int. Ed. Engl. 2003, 42, 3626.
38. S. Kumar, B. Zhou, F. Liang, W.Q. Wang, Z. Huang, Z.Y. Zhang, Activity-based probes for protein tyrosine phosphatases, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 7943.
39. K.R. Shreder, Y. Liu, T. Nomanbhoy, S.R. Fuller, M.S. Wong, W.Z. Gai, J. Wu, P.S. Leventhal, J.R. Lill, S. Corral, Design and synthesis of AX7574: a microcystin-derived, fluorescent probe for serine/threonine phosphatases, Bioconjugate Chem. 2004, 15, 790.
40. Y. Liu, K.R. Shreder, W. Gai, S. Corral, D.K. Ferris, J.S. Rosenblum, Wortmannin, a widely used phosphoinositide 3-kinase inhibitor, also potently inhibits mammalian polo-like kinase, Chem. Biol. 2005, 280, 99.
41. M.C. Yee, S.C. Fas, M.M. Stohlmeyer, T.J. Wandless, K.A. Cimprich, A cell-permeable activity-based probe for protein and lipid kinases, J. Biol. Chem. 2005, 280(32), 29053-9.
42. D.J. Vocadlo, C.R. Bertozzi, A strategy for functional proteomic analysis of glycosidase activity from cell lysates, Angew. Chem., Int. Ed. Engl. 2004, 43, 5338.
43. G.C. Adam, B.F. Cravatt, E.J. Sorensen, Profiling the specific reactivity of the proteome with non-directed activity-based probes, Chem. Biol. 2001, 8, 81.
44. G.C. Adam, E.J. Sorensen, B.F. Cravatt, Proteomic profiling of mechanistically distinct enzyme classes using a common chemotype, Nat. Biotechnol. 2002, 20, 805.
45. K.T. Barglow, B.F. Cravatt, Discovering disease-associated enzymes by proteome reactivity profiling, Chem. Biol. 2004, 11, 1523.
46. N. Jessani, M. Humphrey, W.H. McDonald, S. Niessen, K. Masuda, B. Gangadharan, J.R. Yates III, B.M. Mueller, B.F. Cravatt, Carcinoma and stromal enzyme activity profiles associated with breast tumor growth in vivo, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 13756.
47. D.C. Greenbaum, A. Baruch, M. Grainger, Z. Bozdech, K.F. Medzihradszky, J. Engel, J. DeRisi, A.A. Holder, M. Bogyo, A role for the protease falcipain 1 in host cell invasion by the human malaria parasite, Science 2002, 298, 2002.
48. D.C. Greenbaum, W.D. Arnold, F. Lu, L. Hayrapetian, A. Baruch, J. Krumrine, S. Toba, K. Chehade, D. Bromme, I.D. Kuntz, M. Bogyo, Small molecule affinity fingerprinting. A tool for enzyme family subclassification, target identification, and inhibitor design, Chem. Biol. 2002, 9, 1085.
49. D. Leung, C. Hardouin, D.L. Boger, B.F. Cravatt, Discovering potent and selective reversible inhibitors of enzymes in complex proteomes, Nat. Biotechnol. 2003, 21, 687.
50. J.A. Joyce, A. Baruch, K. Chehade, N. Meyer-Morse, E. Giraudo, F.Y. Tsai, D.C. Greenbaum, J.H. Hager, M. Bogyo, D. Hanahan, Cathepsin cysteine proteases are effectors of invasive growth and angiogenesis during multistage tumorigenesis, Cancer Cell 2004, 5, 443.
51. A.E. Speers, G.C. Adam, B.F. Cravatt, Activity-based protein profiling in vivo using a copper(I)-catalyzed azide-alkyne [3 + 2] cycloaddition, J. Am. Chem. Soc. 2003, 125, 4686.
52. A.E. Speers, B.F. Cravatt, Profiling enzyme activities in vivo using click chemistry methods, Chem. Biol. 2004, 11, 535.
53. H.C. Kolb, K.B. Sharpless, The growing impact of click chemistry on drug discovery, Drug Discov. Today 2003, 8, 1128.
54. C. Chang, Z. Werb, The many faces of metalloproteases: cell growth, invasion, angiogenesis, and metastasis, Trends Cell Biol. 2001, 11, S37.
55. N. Jessani, J.A. Young, S.L. Diaz, M.P. Patricelli, A. Varki, B.F. Cravatt, Class assignment of sequence-unrelated members of enzyme superfamilies by activity-based protein profiling, Angew. Chem., Int. Ed. Engl. 2005, 44, 2400.
56. S.M. Baxter, J.S. Rosenblum, S. Knutson, M.R. Nelson, J.S. Montimurro, J.A. Di Gennaro, J.A. Speir, J.J. Burbaum, J.S. Fetrow, Synergistic computational and experimental proteomics approaches for more accurate detection of active serine hydrolases in yeast, Mol. Cell. Proteomics 2004, 3, 209.
57. M. Runnegar, N. Berndt, S.M. Kong, E.Y. Lee, L. Zhang, In vivo and in vitro binding of microcystin to protein phosphatases 1 and 2A, Biochem. Biophys. Res. Commun. 1995, 216, 162.
58. S.L. Schreiber, Small molecules: the missing link in the central dogma, Nat. Chem. Biol. 2005, 1, 64.
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
8 Tags and Probes for Chemical Biology
8.1 The Biarsenical-tetracysteine Protein Tag: Chemistry and Biological Applications
Stephen R. Adams
Outlook
The biarsenical-tetracysteine method was first introduced more than 7 years ago, and further refinements and development of novel applications are still appearing. Within the last few years, biologists have started to exploit the unique features of this system for probing protein trafficking, turnover, localization, and dynamics. This review aims to describe the conception and development of this protein tag and its applications in the biological sciences.
8.1.1 Introduction
The ability to label proteins with green fluorescent protein (GFP) in living cells has been a major research advance in cell biology in the last decade [1]. In response to this success, chemical biologists have devised an ever-increasing variety of alternative methods to provide a wider range of fluorescent colors and other useful functionalities than those available from GFP and its variants. One of the key features of GFP is that it can be genetically encoded; that is, the DNA of the GFP gene can be fused to the DNA of any desired protein by standard molecular biology techniques and then the chimeric protein can be expressed in cells, tissues, or transgenic animals [2]. All the chemical biological methods incorporate this major stratagem but differ from GFP in that the genetically encoded peptide or protein sequence does not become autofluorescent (like GFP) but acts as a specific receptor for derivatives of fluorophores that can be added exogenously to the expressing cells. The size and structure of this receptor can be quite varied, from proteins or enzymes the size of GFP (~240 amino acids), such as O6-alkylguanine-DNA alkyltransferase (AGT) [3-5] and single-chain antibodies [6], to peptide epitopes as small as 6-20 amino acids [7-9] (Fig. 8.1-1). Binding of the fluorophore derivative to the receptor can be through covalent or ionic bonds or through noncovalent interactions, and may be reversible or irreversible. This review will discuss a method that uses a genetically encoded peptide epitope: a tetracysteine-containing sequence that forms a high-affinity yet reversible covalent complex with biarsenical fluorophores [7, 8, 10]. This was one of the first chemical biological methods for tagging proteins to be introduced and has been particularly useful in applications where GFP is (so far) less capable, such as protein turnover [11, 12], correlated fluorescence and electron microscopy [11], and chromophore-assisted light inactivation (CALI) [13, 14]. It has also been shown to have advantages over the conventional chemical labeling of proteins in vitro, as an affinity-purification handle [8, 15], and as a fluorescence anisotropy probe of protein dynamics [8, 16, 17]. New examples of applications of this method, in progress or recently published, include targeting fluorescent calcium sensors to channels inside living cells and replacement of cyan fluorescent protein (CFP) in fluorescence resonance energy transfer (FRET) sensors of G-protein coupled receptor (GPCR) activation [18].
Fig. 8.1-1 A comparison of the relative sizes of GFP and the biarsenical-tetracysteine complex. The atoms comprising the chromophores are shown in color with the peptide backbone depicted in green.
8.1 The Biarsenical-tetracysteine Protein Tag: Chemistry and Biological Applications
8.1.2 History and Design Concepts of the Tetracysteine-biarsenical System
Forming a high-affinity interaction with a peptide as short as 6-20 amino acids generally requires covalent bonds (a notable exception is the fluorettes for Texas Red [19]). The thiolate of cysteine is one of the most reactive chemical groups in proteins, and its comparative rarity in intracellular proteins offers some hope of specificity. Well-known reactants of protein cysteines include arsenite ion and phenylarsenoxides, both of which contain the arsenic(III) atom. Importantly, these form only weak complexes (about millimolar affinity) with single cysteines (such as those in glutathione, which is present at 5-10 mM in cytoplasm) but bind with micromolar affinity to closely spaced pairs of cysteines. The reaction of such vicinal thiols in cells with arsenic is well described, as is their regeneration by small dithiols such as 1,2-ethanedithiol (EDT), which form more stable, five-membered ring chelates with the arsenic (Scheme 8.1-1). The concept was to design a high-affinity ligand containing two arsenic groups (a biarsenical) that binds four appropriately spaced cysteines (a tetracysteine), forming a complex that would be stable to such dithiol antidotes. The antidote would thereby prevent binding of the ligand to endogenous vicinal cysteines or thiols, which would otherwise lead to nonspecific or background labeling and toxicity. The first such molecule was 4',5'-bis(dithioarsolanyl)fluorescein (FlAsH), which binds with picomolar affinity to peptides or proteins containing appropriately spaced tetracysteines with the general sequence Cys-Cys-Xaa-Xaa-Cys-Cys, in which Xaa is an amino acid other than cysteine [7]. Such tetracysteine motifs are very rare in naturally occurring proteins, so only the tagged protein is labeled with FlAsH. When FlAsH is bound to two equivalents of EDT, forming FlAsH-EDT2, its fluorescence is almost completely quenched; but on reaction with a tetracysteine peptide a strongly fluorescent complex is formed (Fig. 8.1-2). This feature is particularly useful when labeling cells expressing tetracysteine-tagged proteins because, unlike most alternative labeling methods, unbound dye does not have to be fully removed by washing to generate contrast. Even so, nonspecific binding of FlAsH to thiols and hydrophobic sites can generate some background signal that limits the sensitivity of this method compared to GFP [8, 10, 20].
Scheme 8.1-1 The regeneration of protein lipoate cofactors and enzyme thiols bound to arsenic by reaction with small dithiols. X = H: 1,2-ethanedithiol (EDT); X = CH2OH: British anti-Lewisite (BAL).
Fig. 8.1-2 Fluorescence enhancement of FlAsH-EDT2 on binding a tetracysteine peptide.
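The selectivity argument in this section (millimolar affinity for single cysteines, micromolar for cysteine pairs, picomolar for the tetracysteine tag) can be made concrete with a simple single-site binding isotherm. The sketch below is only illustrative: it assumes 1:1 equilibrium binding with no ligand depletion, ignores competition from EDT or glutathione, and uses a hypothetical 1 µM ligand concentration together with rounded dissociation constants (1 mM, 1 µM, 1 pM) as stand-ins for the orders of magnitude quoted in the text rather than measured values.

```python
def fraction_bound(ligand_molar: float, kd_molar: float) -> float:
    """Fraction of sites occupied for simple 1:1 binding at equilibrium."""
    return ligand_molar / (ligand_molar + kd_molar)

# Illustrative, rounded values (assumptions, not measurements).
LIGAND = 1e-6  # 1 uM free arsenical (hypothetical)
SITES = {
    "single cysteine (~mM)": 1e-3,
    "vicinal cysteine pair (~uM)": 1e-6,
    "tetracysteine tag (~pM)": 1e-12,
}

for name, kd in SITES.items():
    print(f"{name:30s} fraction bound = {fraction_bound(LIGAND, kd):.6f}")
```

Under these assumptions a single cysteine is occupied about 0.1% of the time, a cysteine pair about half the time, and the tetracysteine tag essentially always, which is the window a dithiol antidote is meant to widen further.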
8.1.3 General Considerations
8.1.3.1 The Chemistry of Biarsenicals
FlAsH-EDT2 is synthesized by the palladium acetate-catalyzed transmetallation of fluorescein 4',5'-bis-mercuric acetate (or trifluoroacetate) by arsenic trichloride in polar aprotic solvents such as N-methylpyrrolidinone [10]. Rather than isolating the resulting unstable dichlorophenylarsine intermediate, EDT is added to generate FlAsH-EDT2, which can be purified in modest overall yield by chromatography on silica gel (Scheme 8.1-2). FlAsH-EDT2 can be hydrophobic like its parent fluorescein (e.g., soluble in toluene) or hydrophilic (soluble in aqueous neutral buffer) because of a reversible lactone-quinone tautomerization (Scheme 8.1-2). FlAsH is therefore permeable across cell membranes but can still reach a sufficiently high concentration in the cytoplasm to give a rapid reaction with a tetracysteine-tagged protein.
Scheme 8.1-2 The synthesis of FlAsH-EDT2 and FlAsHO. [Reaction scheme not reproduced. Recoverable labels: fluorescein; HgO, TFA; fluorescein 4',5'-bismercuric trifluoroacetate; AsCl3, Pd(OAc)2; 2 EDT; FlAsH-EDT2, free acid, lactone tautomer, colorless; FlAsH-EDT2, dianion, quinone tautomer, colored, nonfluorescent; 1. Hg2+, 2. -2H+; FlAsHO, dianion, quinone tautomer, colored, weakly fluorescent.]
Biarsenicals sharing the dihydroxyxanthene skeleton of FlAsH (Scheme 8.1-3) can be synthesized analogously (Scheme 8.1-2). Mercuration of the parent dye usually occurs quite cleanly using mercuric trifluoroacetate in trifluoroacetic acid as a solvent; using mercuric acetate in acetic acid can lead to a mixture of substituted products that are difficult to separate. ReAsH, the corresponding derivative of the red-fluorescent dye resorufin [8], is the most important biarsenical besides FlAsH as it acts as a photosensitizer as well as a fluorophore [8]. A blue fluorescent biarsenical [8], CHoXAsH-EDT2, completes the range of colors available, although it is more prone to photobleaching than FlAsH or ReAsH. Biarsenicals substituted with halogens, carboxylic acids, amines, sulfonic acids, and so on can be synthesized and are useful in adding other functionalities or reactivities [8]. For example, carboxy- or amino-FlAsH can be used to attach the biarsenical to a solid support for affinity chromatography of tetracysteine-tagged proteins [8, 15], and both are useful intermediates in the synthesis of more complex biarsenicals such as environmentally sensitive fluorophores [21] and calcium indicators (unpublished [22]). The sulfo derivative renders the biarsenical membrane impermeable, allowing the labeling of extracellular or membrane proteins with no intracellular staining [8]. Adding halogens such as the chloro substituents in CHoXAsH-EDT2 decreases the pH sensitivity of the dye in the physiological range, whereas adding bromine substituents to FlAsH or ReAsH increases the photosensitizing properties via the heavy atom effect. Replacing the oxygen bridge of the xanthone with sulfur has a similar effect, but almost completely quenches the fluorescence [8].
Scheme 8.1-3 Biarsenical ligands for tetracysteine tags. [Structures not reproduced. Ligands shown and their uses: CHoXAsH-EDT2, blue fluorescence, FRET to GFP, YFP; FlAsH-EDT2, green fluorescence, FRET from CFP; ReAsH-EDT2, red fluorescence, FRET from GFP, YFP; BrAsH-EDT2, photoconversion of diaminobenzidine for electron microscopy; BarNile-EDT2, environment-sensitive fluorescence; SulfoFlAsH-EDT2, membrane-impermeant ligand for extracellular proteins; immobilized FlAsH-EDT2, affinity chromatography; Calcium green FlAsH-EDT2, low-affinity fluorescent Ca2+ indicator.]
Biarsenicals in which one or both of the phenolic groups are replaced with amino substituents, giving rhodol or rhodamine biarsenicals, can also be synthesized. An amino derivative of Nile Red, a naphthorhodol, can be converted by the usual method to give an environmentally sensitive biarsenical fluorophore (Scheme 8.1-3) [23]. Biarsenical derivatives of tetramethylrhodamine or rhodamine B have also been made [8]; the usual mercuration conditions gave no reaction, but reaction of the free base in nonpolar solvents was successful. However, although both rhodamine biarsenicals bind to tetracysteine peptides, the complexes were neither fluorescent nor colored, suggesting that the rhodamine is in the lactone form. This is presumably because steric hindrance between the arsenic-dicysteine group and the N,N-dialkyl group forces the nitrogen lone pairs out of conjugation and destabilizes the quinone tautomer. Screening a library of tetracysteine variants failed to find any sequences that formed fluorescent complexes with these biarsenicals, appropriately named TrAsH and RbAsH (unpublished results). Rhodamines lacking alkyl substituents have proven much harder to synthesize so far and would also fluoresce in the green, like FlAsH; however, their improved resistance to photobleaching might make them valuable as labels for single-molecule studies. Biarsenical derivatives of other fluorophores emitting at longer wavelengths would also be useful, particularly those based on nonxanthene skeletons such as cyanines. Naphthofluorescein biarsenicals have been synthesized but become less fluorescent on binding cysteine-containing peptides (unpublished results). Cyanines containing two pendant phenylarsenoxides attached by flexible linkers have been reported, but their properties have not yet been described in sufficient detail [24]. The loss of rigidity between the two arsenic atoms may lead to a greatly reduced affinity. In general, the increased distance between the arsenic groups in biarsenicals based on such fluorophores would probably require tetracysteine motifs with more intervening residues between the cysteines, namely CC(X)3-5CC, which might result in a labeling system orthogonal to the existing biarsenical-tetracysteine system.
8.1.3.2 The Tetracysteine Motif
The first tetracysteine peptide to be designed (AcWEAAAREACCRECCARA-NH2) was based upon an α-helical peptide with four cysteines incorporated such that they form a patch on one face of the helix. Amino acids such as alanine were included to increase the proportion of helical character in such a short peptide (17 amino acids), and salt bridge-forming arginine and glutamate pairs were included at the i, i + 4 positions. A number of biarsenicals were synthesized in which two phenylarsenoxides were connected by linkers of various lengths or on disubstituted ferrocenes, but they all proved unstable to even stoichiometric amounts of dithiol antidotes. Only the complex formed with the rigidly spaced arsenic groups of FlAsH was sufficiently stable at the low concentrations of EDT required to prevent promiscuous binding to other cellular thiols. A helical structure of the complex was proposed because a model monoarsenical increased helicity in dicysteine peptides according to circular dichroism measurements. This could not be shown to be the case with the FlAsH-tetracysteine complex because of obscuring absorbance changes from FlAsH at these wavelengths (Griffin, unpublished). As a test of the proposed helical structure of the complex, we designed a peptide containing the helix-breaking residues proline and glycine interposed between the pairs of dicysteine residues [8]. Surprisingly, FlAsH readily bound to this peptide with higher affinity (Table 8.1-1), forming a more fluorescent complex. Preliminary NMR studies with a short peptide, AcWDCCAECCK-NH2, indicated through-space interactions consistent with a hairpin-like structure, with the turn centered at the residues between the cysteines (unpublished results with D. Wemmer). A more conclusive answer must await either more complete NMR work or a crystal structure. Confirmation of the increased affinity of FlAsH for the motif CCPGCC came from detailed kinetic studies of the on- and off-rates for the complexes [8].
Table 8.1-1 Fluorescent properties and stability of FlAsH-tetracysteine complexes.

Peptide sequence            X in CC(X)CC   Fluorescence quantum yield of complex   Apparent Kd in 20 mM monothiol (pM)
AcWDCCCCK-NH2               (none)         0.14                                    1800
AcWDCCACCK-NH2              A              0.6                                     67
AcWDCCGCCK-NH2              G              0.55                                    100
AcWDCCPCCK-NH2              P              0.6                                     150
AcWEAAAREACCRECCARA-NH2     RE             0.5                                     70
AcWDCCAECCK-NH2             AE             0.58                                    72
AcWDCCSECCK-NH2             SE             0.58                                    42
AcWDCCDECCK-NH2             DE             0.5                                     41
AcWDCCPGCCK-NH2             PG             0.67                                    4
AcWDCCGPCCK-NH2             GP             0.44                                    72
AcWDCCDEACCK-NH2            DEA            0.23                                    92 000

The reaction of FlAsH-EDT2 with tetracysteines involves at least two steps, as each EDT has to dissociate before binding the cysteines. To simplify the reaction, we looked at the reaction of FlAsH lacking EDT, in which the arsenics are present as arsenoxides (Scheme 8.1-2). Reaction of FlAsH bis-arsenoxide (generated by addition of two equivalents of Hg2+) with a tetracysteine peptide can be followed by an increase in fluorescence and occurs rapidly, with rates of 10^6 M^-1 s^-1. To determine the off-rates of the complex, we used an exchange reaction with a large excess of the red biarsenical ReAsH, also present as the arsenoxide. To accelerate this extremely slow reaction and also to mimic intracellular thiol concentrations, the off-rates were determined in the presence of varying amounts of a monothiol, 2-mercaptoethanesulfonate (MES). Even with 20 mM MES, exchange with the AcWDCCPGCCK-NH2 complex still took several weeks to reach completion, indicating affinity constants of the complex in the low picomolar range. Devising a method of measuring the complex stability allowed some testing of the optimal spacing of the four cysteines in the tetracysteine peptide (Table 8.1-1). Sequences with a spacing of one or two amino acids turned out to be considerably more stable than the ones with zero or three residues, with the latter having off-rates up to 3 orders of magnitude greater. Additionally, the fluorescence quantum yields of the complexes roughly paralleled their stability, with the CCPGCC sequence yielding a value of >0.6, comparable with that of GFP. Studies with model peptides were also able to suggest the orientation of FlAsH binding to a tetracysteine peptide [8]. Each arsenic atom can bind to adjacent cysteines, namely, at the i, i + 1 positions, or to alternating cysteines, i, i + 4. Of the two peptides AcWEAAARECCARA-NH2 and AcWEACARECAARA-NH2, only the first reacted with a fluorescein monoarsenical to give binding kinetics and fluorescent products similar to those of FlAsH with a tetracysteine peptide, indicating that FlAsH prefers the i, i + 1 configuration. HPLC (high-performance liquid chromatography) of the reaction products of FlAsH with different peptides indicates that a number of isomeric products can be formed, showing that such different binding configurations are possible. However, with all the peptides containing the CCXXCC motif, only two products were formed, with identical fluorescence quantum yields, suggesting conformational isomers involving the hindered benzoic acid group. These isomers interconvert at pH 7 and are only isolatable under the acidic conditions of the HPLC separation. Indeed, ReAsH, which has no benzoic acid substituent, forms only a single product with these tetracysteine peptides.
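As a rough consistency check on these numbers, an apparent dissociation constant can be converted to an off-rate and a half-life if one assumes simple reversible 1:1 binding, for which Kd = k_off / k_on. The sketch below combines the on-rate of 10^6 M^-1 s^-1 quoted in the text with apparent Kd values from Table 8.1-1. Combining these two measurements, which were made under different conditions, is an assumption made purely for illustration, and the function names are hypothetical.

```python
import math

K_ON = 1e6  # M^-1 s^-1, on-rate quoted in the text for the bis-arsenoxide

def off_rate(kd_molar: float, k_on: float = K_ON) -> float:
    """k_off in s^-1, assuming simple 1:1 binding (Kd = k_off / k_on)."""
    return kd_molar * k_on

def half_life_hours(k_off: float) -> float:
    """Half-life of the complex for first-order dissociation, in hours."""
    return math.log(2) / k_off / 3600.0

# Apparent Kd values from Table 8.1-1, converted from pM to M.
apparent_kd = {"CCPGCC": 4e-12, "CCCC": 1800e-12, "CCDEACC": 92000e-12}

for motif, kd in apparent_kd.items():
    k_off = off_rate(kd)
    print(f"{motif:8s} k_off = {k_off:.1e} s^-1, t1/2 = {half_life_hours(k_off):.3g} h")
```

Under these assumptions the CCPGCC complex has a half-life of roughly two days, so an exchange reaction would need many days to approach completion, the same order of magnitude as the several weeks reported, whereas the zero- and three-residue spacings dissociate within minutes.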
8.1.3.3 Optimizing the Tetracysteine Sequence with Peptide Libraries
Despite the picomolar affinity of FlAsH for peptides containing the CCPGCC motif, further improvements seemed possible, as a fluorescein monoarsenical bound to a dicysteine peptide with an affinity of 100 nM, simplistically suggesting that an affinity of ≈10 fM was theoretically possible. A higher affinity ligand is desirable to increase the specificity of labeling with FlAsH in living cells. Under typical labeling conditions, low concentrations (10 µM) of the EDT antidote must be used to ensure that labeling occurs within a reasonable time frame, as high EDT concentrations reverse the reaction with tetracysteines. Under these less stringent conditions, nonspecific binding of FlAsH to cellular thiols occurs, leading to some fluorescent background staining. This staining is almost completely removable by washing with a high (millimolar) concentration of antidotes such as British anti-Lewisite (BAL), but this completely reverses the desirable staining of tetracysteine-tagged proteins [8]. Optimized tetracysteine sequences that are stable in such dithiol concentrations would greatly enhance the specificity of the biarsenical-tetracysteine method. A library approach was required to sample enough peptides to find optimal sequences. We used a retrovirally transduced library of tetracysteine peptides fused to GFP in mammalian cells, which could be labeled with ReAsH and screened for high affinity (resistance to dithiol) and enhanced fluorescence with a fluorescence-activated cell sorter (FACS) [22, 25]. Advantages of this approach include the reducing cellular environment, which maintains the thiol form of the cysteines required for reaction with biarsenicals (phage libraries containing cysteines are usually avoided because of the formation of disulfides). Peptides are screened in the environment in which they will be used, and any peptides that are toxic or express poorly will be selected against.
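The "simplistic" estimate of roughly 10 fM quoted above follows from treating the two arsenic-dicysteine interactions as independent, so that their binding free energies add and their dissociation constants (expressed relative to a 1 M standard state) multiply. A minimal sketch of that arithmetic is given below; it deliberately ignores linker strain, cooperativity, and effective-concentration effects, and the temperature is an assumed value that cancels out of the final result.

```python
import math

R = 8.314   # gas constant, J mol^-1 K^-1
T = 298.15  # K, assumed temperature (cancels in the final Kd)

def delta_g(kd_molar: float) -> float:
    """Standard binding free energy (J/mol) for a given Kd, 1 M standard state."""
    return R * T * math.log(kd_molar)

kd_mono = 100e-9                      # monoarsenical-dicysteine Kd from the text
dg_total = 2 * delta_g(kd_mono)       # two independent, additive interactions
kd_bi = math.exp(dg_total / (R * T))  # convert back to a dissociation constant

print(f"estimated biarsenical Kd = {kd_bi * 1e15:.1f} fM")
```

The measured low-picomolar affinity is about three orders of magnitude weaker than this idealized limit, which is the gap the library screen described in this section set out to narrow.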
Inclusion of GFP as a reference fluorophore allows measurement of ReAsH binding through FRET, while expression levels of the peptide can be assessed by emission from GFP. This method also selects for an optimal GFP-tetracysteine combination (i.e., no unfavorable effect of the peptide on the folding efficiency and fluorescence of GFP) for biological applications that use GFP fluorescence but require ReAsH for additional functionalities such as pulse-chase labeling, photoconversion, or CALI. Using a retroviral vector allows generation of libraries with high complexity (>10^8 different peptide sequences), and integration into the cell's DNA forms stable cell lines expressing single variants at high expression levels. Recovery of the peptide sequence can be achieved by polymerase chain reaction (PCR). Finally, FACS permits comparatively quick sorting of cells (about 10 million cells/day) on the basis of their fluorescence properties at several different wavelengths (i.e., FRET or ReAsH fluorescence), with either pooling or single-cell collection options available. Two libraries were constructed: the first to test the feasibility of the approach and optimize the residues between the cysteines and some flanking residues, and the second to incorporate the intervening residues into a larger library of flanking residues. The first retroviral library, RR1, which included all amino acids other than cysteine as a C-terminal fusion peptide to GFP, was used to infect CHO cells so that on average only one sequence was expressed per cell. Initial FACS analysis of the ReAsH-stained cells showed, as expected, a range of GFP to ReAsH fluorescence ratios, with a few cells showing increased ReAsH fluorescence. Those cells were collected and expanded, and then restained and sorted at higher stringency for binding by increasing the dithiol concentration during the washing step. After three cycles of increasing EDT washes and sorts, single cells were sorted, and the resulting 10 clones were sequenced. The clonal cell lines, which included several duplicates, were then assayed on a multiplate reader or a fluorescence microscope for ReAsH binding when titrated with EDT. The most resistant clones contained the CCPGCC motif, with the highest affinity sequence being MPCCPGCCGC, which showed a twofold improvement in its resistance to EDT compared to our previous best peptide, AEAAARECCPGCCARA. The additional cysteine, when mutated to serine, showed no change in EDT resistance, indicating that ReAsH still bound to the invariant cysteines.
Encouraged by this confirmatory result, the second library, RR2, consisting of XXXCCPGCCXXX (where X is any amino acid other than cysteine) fused to the N-terminus of GFP, was used to produce 30 million transduced HEK 293 cells. This time the top 10-15% of cells in the FRET-to-ReAsH ratios were collected by FACS, so that library diversity was not lost through overly restrictive sorts, given the high degree of noise in such single-cell measurements. Ten subsequent sorts at ever-increasing BAL wash conditions enriched the population in high binders and improved fluorescence until the cells were sorted into two pools for either high FRET or high ReAsH fluorescence. Dithiol titration of these cells indicated that in many lines high ReAsH fluorescence resulted merely from GFP overexpression, whereas the high-FRET lines had the desired properties. Cloning and sequencing some of these lines gave two peptide sequences (HRWCCPGCCKTF and FLNCCPGCCMEP) with greatly improved BAL resistance (up to 10-fold improvement) and increased ReAsH fluorescence (a twofold increase in brightness). The two sequences showed little consensus for acidic or basic residues at any position but did include a surprisingly high number of aromatic residues, including fluorescence quenchers such as tryptophan.
These improved properties were maintained when the peptides were fused to cellular proteins such as actin and gave increased staining specificity compared to earlier tetracysteine sequences. The improvements were independent of GFP. Because GFP was coselected for in the screen, a dependence on GFP was a distinct possibility; in fact, a peptide that precipitated GFP when bound to ReAsH, and thereby mimicked high affinity, was one of the final sorted clones. Actin tagged with the peptides alone showed high-contrast staining in the presence of up to 1 mM BAL. Comparison of the staining sensitivity of these new peptides with that of GFP showed that they were about 15-fold less sensitive, with detection limits in the low micromolar range for diffuse cytoplasmic staining, a sensitivity about 10-fold higher than that of our previous best peptide.
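The sampling considerations behind the second library (RR2) can be sketched numerically. The calculation below assumes six randomized positions with 19 allowed amino acids each (no cysteine), uniform representation of every sequence, and one sequence per transduced cell. These are simplifying assumptions for illustration, not statements about how the actual library was distributed, and the function names are hypothetical. With them, the expected fraction of distinct sequences present among N cells is 1 - exp(-N/L) for a library of L members.

```python
import math

def library_size(positions: int, alphabet: int = 19) -> int:
    """Number of distinct peptides with `positions` randomized residues."""
    return alphabet ** positions

def expected_coverage(n_cells: float, n_sequences: int) -> float:
    """Expected fraction of sequences sampled at least once (Poisson approximation)."""
    return 1.0 - math.exp(-n_cells / n_sequences)

rr2 = library_size(6)               # XXXCCPGCCXXX, X = any residue except Cys
cov = expected_coverage(30e6, rr2)  # 30 million transduced cells (from the text)

print(f"RR2 theoretical size: {rr2:,}")
print(f"expected coverage with 30 million cells: {cov:.0%}")
```

Under these assumptions the theoretical library holds about 47 million sequences and a little under half of them would be represented among 30 million cells, which helps explain why permissive early sorts that preserve diversity were preferred.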
8.1.3.4 Toxicity and Antidotes
Toxicity is often an initial concern for potential users of the biarsenical-tetracysteine system for labeling living cells. However, the typical staining conditions used for FlAsH and ReAsH (50 nM-2.5 µM biarsenical and 10 µM EDT), with incubations of a few hours, have shown no general effects on cell viability or signaling pathways and no mitochondrial toxicity (mitochondria are the site of several proteins containing lipoic acid cofactors sensitive to arsenic - see Scheme 8.1-1) [7]. Inclusion of the usual 10 µM EDT during staining does not even appear necessary according to a recent labeling protocol [26]. Cells can be stained overnight (usually with lower concentrations of dye) with no apparent negative effects. Cells stained with FlAsH or ReAsH have been followed through cell division by fluorescence microscopy (unpublished results) and have been clonally expanded following single-cell sorting, suggesting little if any long-term effect from labeling [25]. Sensitive primary cells such as cultured neurons tolerate staining with biarsenicals [12], as do more robust tissue culture cell lines. Reports of staining of transgenic flies do not mention any adverse effects, although the desired labeling was ultimately unsuccessful as FlAsH was preferentially absorbed by fat bodies, presumably because of its high lipophilicity [13]. A number of dithiol arsenic antidotes are useful with the biarsenical-tetracysteine method. EDT is the simplest and most hydrophobic and has the weakest affinity for biarsenicals, which makes it the most suitable antidote for use during labeling. Solutions of FlAsH-EDT2 can lose the bound EDT through a process of hydrolysis followed by irreversible oxidation, so labeling cells with a 10-fold excess of EDT usually improves the staining. FlAsH lacking one or both bound EDT is less membrane permeable but readily reacts with free EDT at physiological pH.
Removing nonspecific staining is best achieved with BAL as it is about 2.5-fold more efficacious than EDT [8]; it is also less toxic to cells (it is still used clinically for treating acute arsenic and heavy metal poisoning, especially for organoarsenic compounds), more soluble in cell culture solutions, and has a considerably less offensive odor. However, using it (at appropriately low concentrations) during cell staining can prevent labeling, presumably because the FlAsH-BAL2 that is formed is considerably less membrane permeable (unpublished results). The high concentrations of dithiol used in destaining, particularly with the optimized, high-affinity tetracysteines, could cause cellular toxicity. BAL has been reported to decrease cell viability when present continuously at concentrations over 1 mM [27]; however, the brief incubation of a few minutes required for the destaining of biarsenical-labeled cells seems much less toxic. BAL and EDT are weak reducing agents compared to such dithiols as dithiothreitol (DTT) and probably do not significantly alter the redox potential of the cytoplasm (which contains 5-10 mM reduced thiol as glutathione). They do not appear capable of reducing oxidizing environments such as the endoplasmic reticulum and Golgi, as the reduction of oxidized tetracysteine-tagged proteins in such compartments requires the addition of more potent reducing agents such as phosphines (unpublished results). Aromatic dithiols such as 1,2-benzenedithiol and 3,4-toluenedithiol also quench the fluorescence of FlAsH when bound to the arsenic groups (unpublished results). Their increased lipophilicity and weaker binding to biarsenicals compared to EDT make them faster but less specific reagents for labeling tetracysteines in cells. So far, all attempts to make dithiol analogs that have a higher affinity for biarsenicals than EDT (to reduce nonspecific staining) but with a low molecular weight and suitable polarity have been unsuccessful.
8.1.3.5 Comparison with Other Small-molecule Labeling Systems
The revolutionary effect of GFP on cell biological research has inspired numerous, often ingenious, alternatives requiring the interplay of chemistry and biology [3-6, 9, 19, 28, 29]. All these methods, including the biarsenical-tetracysteine method, generally require the addition of an exogenous ligand, usually labeled with a fluorophore, that has to penetrate cell membranes (for the more useful intracellular applications) and bind specifically, after which the excess has to be removed by washing. The difficulty in achieving these steps with high efficiency and speed for a range of different conjugated fluorophores and labels is the major limitation of all these methods. By comparison, autofluorescent proteins do not require any additional components other than molecular oxygen and so are far easier to use experimentally. In general, the more complex the method is (i.e., the greater the number of components such as receptor, ligand/substrate, modifying enzyme, etc.), the more limited the applications. Of course, GFP and related proteins have limitations: size of tag, colors available, no readouts other than fluorescence; yet these have proven to be less of a problem than might have been expected, and active research has made steady progress in improving their properties and extending applications. To achieve widespread use, alternative methods need to offer new functionalities such as sensors, caged molecules, and so on, and additional readout modes such as MRI (magnetic resonance imaging) and PET (positron emission tomography), in addition to improvements in their practicability in simple tissue culture cells and eventually intact organisms. The development of such new molecules will require chemical ingenuity coupled with expertise and knowledge of biological systems. A detailed comparison between genetically encoded tags for proteins is beyond the scope of this review; the topic has been reviewed quite extensively [30-32] and elsewhere in this book (Chapter 8.2). Ultimately, such comparisons must include an assessment of the biological discoveries made with these methods, as has been attempted in this review for one of them, the biarsenical-tetracysteine system. Many newer methods have not yet had the time to progress beyond proof-of-concept experiments and await testing in the more complex and challenging experiments that are often required for advances in biological knowledge.
8.1.4 Practical Applications of the Biarsenical-tetracysteine System
8.1.4.1 A Small, Genetically Encoded Fluorescence Tag
The small size of the biarsenical-tetracysteine tag has made it useful in biological studies that require a genetically encoded fluorescent tag but in which GFP is not tolerated or has deleterious effects because of its size. FlAsH binding occurs rapidly and does not require any protein secondary structure to generate fluorescence (unlike GFP and its variants, which can take minutes to days to become fully fluorescent), and it should therefore be a more faithful reporter of the initiation of protein synthesis. The following examples include studies in viruses, bacteria, yeast, and mammalian cells and indicate the broad applicability of the biarsenical-tetracysteine system.
8.1.4.1.1 Trafficking of Viral Coat Proteins
The biarsenical-tetracysteine tag has been used successfully to monitor the targeting and trafficking of the Ebola virus coat protein VP40 to lipid rafts in mammalian cells [33]. Tagging this protein with GFP results in a failure to efficiently form virus-like particles (VLP). However, VP40 carrying a 17 amino acid N-terminal tetracysteine tag (WEAAAREACCRECCARA), when coexpressed with the viral glycoprotein, gave efficient release of filamentous viral particles that could be visualized by electron microscopy. HEK 293T cells transfected with tagged VP40 and stained with FlAsH showed plasma membrane (PM) localization, but with an unexpected fraction in intracellular globular or tubular compartments. FlAsH staining correlated well with fixed cells labeled with anti-GP40 antibody. Live cell imaging of VP40 and its truncation mutants, coupled with biochemical fractionation experiments, suggests that the protein concentrates in lipid rafts at the PM and that its oligomerization results in viral budding.
8.1.4.1.2 Infection of Mammalian Cells by Yersinia Bacteria
The small size of the biarsenical-tetracysteine tag was also used advantageously to investigate the role of specific proteins of the pathogenic bacteria Yersinia in the infection of mammalian cells [34]. This family of bacteria, which includes Yersinia pestis, responsible for the plague, infects cells by injecting various Yops (Yersinia outer proteins) through a thin needle complex. Two of the 14 Yops present in Yersinia enterocolitica, YscM1 and YscM2, which are known to regulate the expression of these proteins in the bacteria, were tagged with a 16 amino acid tetracysteine tag. These were used to directly detect secretion of these proteins into HeLa cells following infection, washing, fixation, and labeling with FlAsH. The tag did not appear to affect protein function, as the results confirmed biochemical differential fractionation studies showing that YscM2 was injected into the cytoplasm. Using a larger tag, such as GFP, might hinder secretion of these proteins through the narrow pore of the needle complex.
8.1.4.1.3 Comparing GFP and Tetracysteine Tags on β-tubulin in Yeast
GFP as a fusion tag is surprisingly well tolerated in living cells and even organisms despite its size. For example, in yeast, when GFP was systematically fused to the C-termini of all the open reading frames in their chromosomal location, over 75% of the resulting proteins were expressed at levels that permitted subcellular localization of the protein [35]. Recently, the effect of tagging β-tubulin with GFP or with much smaller one, two, or three tandem tetracysteine tags (10, 20, or 29 amino acids, respectively) has been compared as chromosomal insertions in yeast strains [36]. Generally, the larger the tag, the greater the effect on yeast viability. For example, one or two tetracysteine tags were viable in haploid yeast (in which only the tagged protein is expressed), whereas three tetracysteine tags or GFP could only be tolerated in diploid heterozygous yeast (with an unmodified tubulin also present), although both could be incorporated into microtubules. However, even one tetracysteine tag had a discernible effect on spore viability and showed a subtle growth defect at elevated temperatures. All the yeast strains expressing tetracysteine-tagged β-tubulin could be labeled with FlAsH after overnight staining of growing cells and gave the expected microtubule staining with remarkably low nonspecific background. However, β-tubulin with multiple tetracysteine tags was no more fluorescent than the singly tagged version, although it was more resistant to photobleaching. This is reminiscent of fluorophore-labeled antibodies that are quenched at high dye-to-protein ratios and may be caused by dye-dye stacking interactions.
8.1.4.1.4 Replacing CFP with FlAsH in a FRET Sensor of G-protein Coupled Receptor (GPCR) Activation
One of the most successful and novel applications of GFP has been in the design of FRET sensors of biochemical pathways [1]. Two-color mutants of
GFP, typically CFP and YFP (yellow fluorescent protein), which are capable of FRET, are fused with intervening proteins or protein domains. Biochemical events that modify these domains, for example, phosphorylation or the binding of a small ion or molecule such as Ca2+ or cAMP, alter the distance and/or the orientation of CFP and YFP relative to each other, changing the degree of FRET occurring. The resulting ratiometric fluorescence signals allow monitoring of such pathways by the imaging of single cells, tissues, and organisms. Recently, Lohse's group has replaced YFP with FlAsH in two FRET sensors of GPCR activation with two advantageous consequences [18]. These constructs consist of a C-terminal CFP and a tetracysteine site inserted in the third intracellular loop of either the A2-adenosine receptor or the α2-adrenergic receptor. FlAsH labeling of cells expressing these chimeras showed FRET from CFP to FlAsH that was modulated by the binding of the receptors to their natural ligands, adenosine and norepinephrine respectively. Similar sensors containing CFP inserted at this loop (with a C-terminal YFP) also respond to receptor activation, but their FRET changes are small (a few percent change in the FRET ratio). Moreover, the presence of CFP greatly impairs G-protein coupling and therefore the downstream signaling of the receptors [37]. In contrast, the FlAsH versions give much larger changes in FRET (three- to fivefold), permitting single-cell imaging of adrenergic receptor activation. Importantly, for the adenosine receptor, downstream activation of adenyl cyclase was unaltered compared to the wild-type receptor, indicating the less perturbative effect of the FlAsH-tetracysteine tag compared to the much larger fluorescent proteins. Such improvements in these fluorescent reporters should be very useful in the development of optical assays for GPCRs.
8.1.4.1.5
Release of Mitochondrial Cytochrome c during Apoptosis
One of the key events in programmed cell death or apoptosis is the release of cytochrome c from the mitochondrial matrix into the cytoplasm, where it triggers the formation of the apoptosome, resulting in caspase activation and eventual cell death. What controls the release of cytochrome c and what extra steps are required (if any) following loss of the mitochondrial transmembrane potential (ΔΨm) are current areas of active research. Single-cell studies of cytochrome c release have used cytochrome c-GFP as a fluorescent marker in conjunction with a fluorescent indicator of ΔΨm and revealed that release is sudden, rapid, and complete before any changes in ΔΨm [38]. However, concerns have been voiced about the size of GFP (~30 kDa) relative to cytochrome c (10.5 kDa) and whether this assay faithfully reports the behavior of the endogenous polypeptide. Recent work [39] has taken advantage of the small size of the biarsenical-tetracysteine system to construct a much smaller fluorescent reporter (13.3 kDa). Interestingly, this construct, when labeled with FlAsH in single living cells, was targeted to the mitochondria and was released in a similar manner and with similar kinetics to the GFP reporter, again preceding decreases in ΔΨm. Additionally, the ReAsH-labeled reporter could
be monitored simultaneously with the GFP reporter in the same cell and showed identical kinetics, demonstrating that the pores generated during mitochondrial permeabilization do not hinder diffusion of large proteins.
8.1.4.1.6
An Assay for Targeted Nucleic Acid Repair for Gene Therapy in Yeast
The biarsenical-tetracysteine system has also been used to develop an in vivo assay of nucleic acid repair, targeted by DNA-RNA hybrid oligonucleotides [40, 41]. These double-stranded hairpin-capped molecules form a double-D loop structure on hybridization with the targeted chromosomal gene, initiating repair, and have been used to correct mutated genes in several animal models. As a model system to investigate the proteins involved in such repair and the conversion efficiency, yeast was transfected with a plasmid that expresses a mutated neomycin marker containing an internal stop codon, TAG, and carrying a C-terminal 19 amino acid tetracysteine tag. Expression of the protein results in premature termination, so no fluorescence is seen when the cells are labeled with FlAsH. Repair oligonucleotides can be coelectroporated into the cells and, if repair is successful, the G is converted to C and a complete protein is produced, resulting in green fluorescence after labeling with FlAsH. The biarsenical-tetracysteine system is advantageous over GFP because the dyes bind and generate fluorescence rapidly with no requirement for protein folding, allowing the rate of conversion by different oligonucleotides to be compared. Cellular inheritance of the repair can also be demonstrated by washing out the label, expanding individual cells, and then relabeling; with GFP, long-lived molecules might yield false positives.
8.1.4.1.7
Monitoring Protein Synthesis and Folding in Bacteria with FlAsH
To generate an indicator of protein folding, Gierasch and coworkers inserted a tetracysteine motif into a surface-exposed loop of a mammalian cellular retinoic acid binding protein (CRABP I) and expressed it in E. coli [42]. A CCGPCC site was created by mutating two adjacent amino acids to cysteines around an existing GP followed by insertion of two additional cysteines. The tetracysteine insertion had a minimal effect on folding compared to the wild type. Unexpectedly, the FlAsH-labeled protein increased in fluorescence on folding, allowing the measurement of the free energy of unfolding and the urea-dependent denaturation of CRABP in the crowded cellular environment of living E. coli bacteria. Preloading the cells with FlAsH permitted the real-time monitoring of protein folding and of the formation of inclusion bodies by a slow-folding mutant, which show up as bright fluorescent puncta at the poles of the cells. Although other assays of protein folding that use GFP have been proposed, this method probes the protein of interest directly. As FlAsH binding can occur as soon as the protein is synthesized, it also potentially reports much faster than GFP, which can take up to tens of minutes to fold and fluoresce.
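The way a folding-dependent fluorescence signal can yield a free energy of unfolding may be illustrated with a minimal sketch. It assumes a two-state folding equilibrium and the standard linear extrapolation model, in which the unfolding free energy falls linearly with urea concentration; these modeling choices, the function names, and all numerical values are illustrative assumptions and are not taken from the study cited above.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_folded(urea_molar, dg_water, m_value, temp_kelvin=298.15):
    """Two-state fraction folded at a given urea concentration.

    Linear extrapolation model (an assumption of this sketch):
        dG_unfold([urea]) = dg_water - m_value * [urea]
    """
    dg_unfold = dg_water - m_value * urea_molar
    k_unfold = math.exp(-dg_unfold / (R_KCAL * temp_kelvin))
    return 1.0 / (1.0 + k_unfold)


def observed_fluorescence(urea_molar, f_folded, f_unfolded, dg_water, m_value):
    """Signal as a population-weighted mix of folded and unfolded intensities."""
    ff = fraction_folded(urea_molar, dg_water, m_value)
    return ff * f_folded + (1.0 - ff) * f_unfolded


# Hypothetical parameters, chosen only to make the curve easy to read.
DG_WATER = 6.0   # kcal/mol, unfolding free energy without urea
M_VALUE = 1.5    # kcal/(mol*M), urea dependence of the free energy
MIDPOINT = DG_WATER / M_VALUE  # urea concentration at which half is unfolded

if __name__ == "__main__":
    for urea in (0.0, 2.0, MIDPOINT, 6.0, 8.0):
        signal = observed_fluorescence(urea, 1.0, 0.2, DG_WATER, M_VALUE)
        print(f"{urea:4.1f} M urea: relative fluorescence {signal:.3f}")
```

In practice the two parameters would be obtained by fitting such a curve to the measured fluorescence at each urea concentration; the sketch only shows the forward model.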
8.1.4.2
Multicolor Pulse-chase Labeling
One notable feature of the biarsenical-tetracysteine system is the high stability of the complex once formed, with dissociation taking up to weeks in vitro [8]. This property can be exploited in a two-color pulse chase of protein turnover in living cells, using the sequential addition of FlAsH and ReAsH to label old and newly synthesized proteins. The time course of protein turnover can be determined simply by varying the time interval between removal of the first label and the addition of the second label. Unlike traditional biochemical methods of pulse chase that use incorporation of radioactive amino acids, the two-color method allows continuous imaging of single cells and can reveal additional information about the subcellular localization of protein turnover.
8.1.4.2.1
Turnover of Connexin43 in Gap Junctions
The two-color pulse chase was developed to examine the turnover of gap junctions in HeLa cells [11]. Gap junctions are channels that connect adjacent cells and are permeable to small molecules and ions. They are composed of connexins, a family of transmembrane proteins with molecular weights ranging from 25 to 50 kDa. A functional channel is composed of 12 connexin subunits, with each adjacent cell contributing a hexameric hemichannel. Large semicrystalline clusters of these channels form gap junctional plaques. Specific FlAsH or ReAsH staining of plaques in HeLa cells could be achieved by expressing connexin43 tagged with a C-terminal EAAAREACCRECCARA tetracysteine motif (Fig. 8.1-3). Old protein was first stained with FlAsH, the unbound dye being removed by washing with a low concentration of EDT, and subsequent freshly synthesized connexin43 was visualized by staining with ReAsH. When the period between FlAsH and ReAsH staining was 24 h, green fluorescence was no longer visible in the gap junctional plaques but only in cytoplasmic vesicles, indicating complete turnover of connexin43 in that time period. However, decreasing the staining interval to about 4 h (to match the expected half-life of connexin43) gave striking images of plaques with green centers surrounded by a red rim (Fig. 8.1-4), indicating that freshly synthesized protein is added to the outside of the plaque while old protein is removed from its center. The cytoplasm contained abundant separate green and red vesicles of labeled protein trafficking from and to the plaque respectively.
8.1.4.2.2
Activity Dependent Turnover and Trafficking of Glutamate Receptors in Neurons
The role of AMPA glutamate receptor trafficking in the neural plasticity required for learning and memory is currently an area of active research in neurobiology. Malenka and coworkers tagged the carboxy termini of two AMPA receptor subunits, GluR1 and GluR2, with the tetracysteine motif EAAAREACCRECCARA and used two-color pulse-chase labeling to look at their trafficking and localized synthesis in living cultured hippocampal neurons by confocal microscopy [12]. The tagged proteins following staining gave the
Fig. 8.1-3 Specificity of FlAsH staining in HeLa cells expressing Cx43-tetracysteine. A gap junction plaque between two transfected cells is marked with an arrow. (a) FlAsH fluorescence, (b) staining with a Cx43-specific antibody, (c) overlay of these channels combined with a propidium iodide stain (blue) to indicate nuclei.
Fig. 8.1-4 Two-color pulse chase of connexin43-tetracysteine in HeLa cells. See text for details.
expected localization to synaptic puncta with no detectable signs of toxicity caused by the procedure. Labeling existing GluR with ReAsH, and freshly synthesized protein with FlAsH, gave clear evidence of an increase in the rate of synthesis of GluR1 but not GluR2 under pharmacological conditions (activity blockade) known to enhance synaptic strength in such cultures. An important control was that adding the protein synthesis inhibitor cycloheximide,
immediately after ReAsH labeling, greatly reduced subsequent FlAsH staining, indicating that ReAsH had saturated all preexisting tetracysteine-tagged GluR. That both GluR subunits were locally synthesized in dendrites rather than the cell body was demonstrated by physically isolating these regions by transection and showing the appearance of freshly synthesized protein at the synapses. Again, activity blockade only increased the localized synthesis of GluR1 and not GluR2 in transected neurons. The application of this technique to other synaptic proteins synthesized in response to activity may permit elucidation of the molecular mechanisms involved in synaptic plasticity.
8.1.4.2.3
Probing the Intracellular Site of Synthesis of the HIV-1 Gag Protein
Recently, the two-color pulse chase has been used to image the dynamics of recently synthesized Gag, a primary structural protein of human immunodeficiency virus type 1 (HIV-1), in living HeLa, Mel JuSo, and Jurkat T cells [43]. The biarsenical-tetracysteine system was used for its small size and because binding of the dye is independent of localized secondary structure, unlike GFP, which only generates fluorescence after folding (various mutants have half-lives of 30 min-4 h). Gag was tagged with a C-terminal improved sequence (GSMPCCPGCCGC) derived from the first peptide library screen described above, and gave good FlAsH staining in these cell types that colocalized with anti-Gag antibody staining. Deconvolution microscopy revealed that Gag-TC (tetracysteine) localized primarily to discrete areas (possibly lipid rafts) of the plasma membrane (PM), even when using two-color pulse chase to detect recently synthesized protein (~30 min), suggesting that Gag is synthesized close to the PM. Gag-tetracysteine and a similar construct containing an extended linker were compatible with forming virus-like particles (VLPs) when cells were transfected with a plasmid containing the complete HIV-1 genome. These lower expressing viral plasmids also gave good plasma membrane staining, although the construct with a longer linker showed more intracellular vesicular staining that colocalized with markers for the protein degradation pathway. The importance of posttranslational myristoylation of Gag for correct targeting was demonstrated, as mutations at this site gave diffuse cytoplasmic fluorescence with no plasma membrane or organellar fluorescence. In contrast, mutations in the L-domain required for efficient budding from the PM had no effect on Gag localization.
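As a rough illustration of how the staining interval in the two-color pulse chase of Section 8.1.4.2 relates to protein half-life, the sketch below assumes that the first-labeled (old) protein is lost by simple first-order turnover. That kinetic model and the function name are assumptions made for illustration; only the approximate 4 h half-life of connexin43 and the 4 h and 24 h intervals come from the text.

```python
def fraction_old_remaining(interval_h, half_life_h):
    """Fraction of first-labeled protein still present after the chase interval.

    Assumes single-exponential (first-order) turnover, which is a
    simplification adopted for this sketch.
    """
    return 0.5 ** (interval_h / half_life_h)


# Approximate connexin43 half-life quoted in Section 8.1.4.2.1.
CX43_HALF_LIFE_H = 4.0

if __name__ == "__main__":
    for interval_h in (4.0, 24.0):
        old = fraction_old_remaining(interval_h, CX43_HALF_LIFE_H)
        print(f"{interval_h:4.0f} h between labels: "
              f"{old:.1%} of old protein remains")
```

Under this assumption about half of the old protein remains after a 4 h interval, whereas under 2% remains after 24 h, in line with the two-color plaques seen at 4 h and the absence of green plaque staining at 24 h.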
8.1.4.3 Environment-sensitive Fluorescent Biarsenicals
The fluorescence of FlAsH and ReAsH is, like that of the parent fluorophores fluorescein and resorufin, relatively insensitive to the local protein environment surrounding the tetracysteine. Such insensitivity is useful in quantitative studies such as protein localization or trafficking in cells, where the protein monitored may experience different local environments. However, environment-sensitive fluorophores are very useful probes of changes in
protein conformation, and the synthesis of such biarsenical derivatives has permitted their use in in vitro studies of purified proteins or in living cells [21, 23]. Umezawa and coworkers have pioneered this application of the biarsenical-tetracysteine system with the synthesis of BarNile-EDT2 and mansyl FlAsH-EDT2. Their first strategy was to add two arsenoxide groups to the 9-amino derivative of Nile Red, a well-known environment-sensitive fluorophore containing the phenoxazine ring system of ReAsH. The 9-amino group was chosen rather than the diethylamino group of Nile Red because of potential steric hindrance with the dithioarsolanyl substituent, an effect seen with the biarsenical derivative of rhodamines (see Section 8.1.3.1). BarNile specifically bound to a tetracysteine peptide fused to GST (glutathione S-transferase) and to calmodulin (CaM)-tetracysteine in vitro, and the latter construct gave a small (10%) increase in fluorescence in living cells when Ca2+ was added. CaM undergoes a large conformational change upon metal binding. Larger fluorescence increases of up to almost 40% were achieved with mansyl-FlAsH (an amino derivative of FlAsH conjugated to the environment-sensitive mansyl fluorophore), possibly through a photoinduced electron transfer (PET) process. CaM labeled with FlAsH at the N-terminal helix has been reported to give a 12% decrease in fluorescence on binding Ca2+ [16], but the same construct with BarNile gave negligible change. These results indicate the difficulty in predicting the optimal site for an environment-sensitive fluorophore even in well-studied proteins, such as CaM, and the difficulties in interpreting any spectral changes.
8.1.4.4
Fluorescence Anisotropy of the FlAsH-tetracysteine Complex
The four arsenic-sulfur bonds present in the FlAsH-tetracysteine complex rigidly lock the fluorophore to the peptide so that any rotational motion reflects that of the peptide or protein rather than the dye. This is in direct contrast to conventional coupling chemistries used to modify proteins, in which the fluorophore is attached by a rotatable single bond to the flexible side chain of an amino acid.
8.1.4.4.1
Protein Dynamics of Calmodulin on Ca2+ Activation
This useful feature of the biarsenical-tetracysteine labeling system was demonstrated [8, 16] by the high fluorescence anisotropy values found for CaM labeled with FlAsH at the N-terminal helix, mutated to give the motif CCEQCC. By comparison, CaM labeled with FITC (fluorescein isothiocyanate) at a lysine residue had low anisotropy values in nonviscous aqueous solutions, reflecting decoupling of the mobility of the fluorophore from that of the protein. Increases in the steady-state anisotropy of the FlAsH-labeled CaM on Ca2+ binding revealed that this helical region rotates somewhat freely of the remaining protein until rigidified by the Ca2+-induced conformational change [16]. The ninefold slower labeling of Ca2+-bound CaM by FlAsH was consistent with this, if FlAsH preferentially binds disordered structures.
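The contrast between a rigidly locked and a freely rotating dye can be made concrete with the textbook definitions of steady-state anisotropy and the Perrin equation. The sketch below is illustrative only: the function names, the instrument correction factor `g_factor`, and all numerical values (lifetime, limiting anisotropy, rotational correlation times) are assumptions, not data from the studies cited above.

```python
def steady_state_anisotropy(i_parallel, i_perpendicular, g_factor=1.0):
    """r = (I_par - G * I_perp) / (I_par + 2 * G * I_perp).

    g_factor corrects for unequal detection of the two polarizations.
    """
    corrected_perp = g_factor * i_perpendicular
    return (i_parallel - corrected_perp) / (i_parallel + 2.0 * corrected_perp)


def perrin_anisotropy(r0, lifetime_ns, rot_corr_time_ns):
    """Perrin equation: r = r0 / (1 + tau / theta).

    r0 is the limiting anisotropy, tau the fluorescence lifetime, and
    theta the rotational correlation time of whatever carries the dye.
    """
    return r0 / (1.0 + lifetime_ns / rot_corr_time_ns)


# Hypothetical values for illustration only.
R0 = 0.4             # theoretical maximum for collinear dipoles
LIFETIME_NS = 4.0    # assumed lifetime of a fluorescein-like dye

if __name__ == "__main__":
    # Dye on a flexible linker: fast local rotation dominates.
    print("flexible linker:", round(perrin_anisotropy(R0, LIFETIME_NS, 0.2), 3))
    # Dye locked to a small protein: rotation follows the protein.
    print("rigidly locked :", round(perrin_anisotropy(R0, LIFETIME_NS, 7.0), 3))
```

With these assumed numbers the locked dye reports an anisotropy more than tenfold higher than the flexibly attached one, which is the qualitative behavior described for FlAsH-labeled versus FITC-labeled CaM.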
8.1.4.4.2
Monitoring Proteolysis with Biarsenical-Tetracysteines
FlAsH-tetracysteine anisotropy was shown to be useful in monitoring the rates and specificity of proteases in cleaving affinity-purification tags from expressed proteins [44]. The initial high values of anisotropy and the subsequent large decreases measurable on cleavage of a FlAsH-labeled 3-4 kDa peptide fragment from the target protein permitted parallel real-time measurements in a plate-reader format. Strategies involving a CCPGCC motif either adjacent to N-terminal His6 and S-tag affinity sites or inserted in multidomain proteins were both successful. Optimization of protease cleavage sites and the ability to easily monitor completion of the reaction are important for large-scale, automated purification of proteins for structure analysis. Alternative methods using HPLC, capillary electrophoresis, or polyacrylamide gel electrophoresis (PAGE) are discontinuous and time-consuming.
8.1.4.4.3
Determining the Structure of the Phospholamban Pentamer by Homo FRET
The high stability of the biarsenical-tetracysteine complex and the sensitivity of its fluorescence polarization to localized protein dynamics have recently been used to probe the structure of the pentameric oligomer of phospholamban [17], a key regulator of contractility in the heart. Tetracysteine sites were formed at three internal sites within the α-helical region involved in oligomerization by mutation of existing amino acids at positions 5, 6, 9, 10; 23, 24, 27, 28; or 41, 42, 45, 46 to give sequences containing CCLTCC, CCRQCC, and CCLLCC motifs respectively. A fourth construct with an N-terminal tag used the hairpin-favoring sequence MCCPGCCMDK. All these sites could be labeled with biarsenicals apart from the CCLLCC site in the transmembrane region of phospholamban. The latter site was probably resistant to labeling because of suppression of cysteine ionization in the membrane. Labeling of these mutant phospholambans with a mixture of FlAsH and ReAsH, followed by separation of the pentamer from the monomer by gel electrophoresis, revealed the presence of intraoligomer FRET occurring in the gel. Similar FRET could be measured by confocal microscopy in living Sf21 insect cells expressing the constructs, again by labeling with both FlAsH and ReAsH. The uncertainty in the relative stoichiometries of labeling with FlAsH and ReAsH prevents determination of the distance between the tetracysteine sites in each oligomer. However, homoFRET measurements using in-gel fluorescence anisotropy with FlAsH-labeled phospholamban did allow such distances to be calculated. Surprisingly, the amount of FRET decreased as the labeling site was moved away from the transmembrane region toward the N-terminus and was consistent with the pentamer having a helical pinwheel conformation in which each N-terminus is slightly bent back toward the membrane.
This work provides strong evidence that FlAsH can bind tightly to α-helices containing tetracysteine sites without inducing a hairpin turn. The leucine/isoleucine zipper-stabilized quaternary conformation of phospholamban in these helices presumably disfavors hairpin formation.
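The link between a measured FRET efficiency and a distance can be sketched with the Förster relation for a single donor-acceptor pair. This is a deliberate simplification: homoFRET within a pentamer involves several identical fluorophores and is read out through anisotropy, so the cited analysis is more involved. The function names and the Förster radius used below are hypothetical.

```python
def fret_efficiency(distance_nm, forster_radius_nm):
    """E = 1 / (1 + (r / R0)^6) for one donor and one acceptor."""
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


def distance_from_efficiency(efficiency, forster_radius_nm):
    """Invert the Foerster relation: r = R0 * (1/E - 1)^(1/6)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return forster_radius_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# Hypothetical Foerster radius, for illustration only.
R0_NM = 5.0

if __name__ == "__main__":
    for r_nm in (3.0, 5.0, 7.0):
        e = fret_efficiency(r_nm, R0_NM)
        print(f"r = {r_nm:.1f} nm -> E = {e:.3f}")
```

The steep sixth-power dependence is what makes the efficiency a sensitive ruler near the Förster radius, and also why uncertain labeling stoichiometry makes heteroFRET distances unreliable in this system.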
8.1.4.5
Single-molecule Studies Using Biarsenical-tetracysteines
There has been considerable interest in single-molecule experiments in the last 5-10 years as they often give different and unique insights from ensemble-averaged measurements [45]. Fluorescence techniques give the required sensitivity but require very photostable fluorophores such as rhodamines or cyanines. As the biarsenicals currently available are based on the less photoresistant fluorophores fluorescein and resorufin, single-molecule studies using tetracysteine labeling are somewhat limited. If more photostable biarsenicals can be synthesized, their potential for specifically labeling proteins in cells and complex mixtures would be of considerable use in single-molecule experiments. Despite these current limitations, the use of both FlAsH and ReAsH has been demonstrated to be feasible for some applications.
8.1.4.5.1
Single-molecule Fluorescence Anisotropy Measurements of Calmodulin
Protein motions in single FlAsH-labeled CaM molecules tethered to glass slides have been measured by anisotropy using time-correlated single-photon counting in a confocal microscope [46]. Average anisotropy values were similar to bulk measurements but showed wide variability from molecule to molecule. Decay rates indicated that rapid protein motions occur in the N-terminal domain on a nanosecond timescale, but limited signal-to-noise levels precluded detailed analysis. Comparable experiments with CaM labeled with Texas Red failed to detect such motions because of faster dye rotation, independent of the protein motions.
8.1.4.5.2
Nanometer Localization of Single ReAsH-tetracysteine Complexes
Selvin and coworkers have shown that a single ReAsH molecule (bound to CaM-tetracysteine on the surface of a glass slide) can be localized with a precision of 5 nm in less than a second using total internal reflection microscopy and imaging the emitted photons with a CCD (charge-coupled device) camera [47]. This technique requires collecting thousands of emitted photons per molecule and determining the center of their distribution, and has recently been used by this group to measure the step size of the molecular motors myosin V and kinesin, using conventional single-molecule fluorophores such as Cy3 and GFP [48-50]. ReAsH was shown to produce more photons than GFP before photobleaching (but fewer than Cy3), and it could be used to follow the 25 or 40 nm stepping movements with the variable stepping rate shown by molecular motors. This study opens up the possibility of using the biarsenical-tetracysteine system to measure the movements of such proteins inside living cells.
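Why collecting thousands of photons allows nanometer localization can be shown with a small simulation. The sketch below assumes an idealized Gaussian point-spread function with no background, pixelation, or camera noise, so the resulting precision (the spread divided by the square root of the photon count) is a lower bound rather than a prediction for a real instrument. The function names and the numerical values are assumptions chosen for illustration.

```python
import math
import random


def ideal_precision_nm(psf_sd_nm, n_photons):
    """Standard error of the centroid for an ideal Gaussian spot."""
    return psf_sd_nm / math.sqrt(n_photons)


def simulate_centroid(true_x_nm, true_y_nm, psf_sd_nm, n_photons, rng):
    """Draw photon positions from a 2D Gaussian and return their mean."""
    sum_x = 0.0
    sum_y = 0.0
    for _ in range(n_photons):
        sum_x += rng.gauss(true_x_nm, psf_sd_nm)
        sum_y += rng.gauss(true_y_nm, psf_sd_nm)
    return sum_x / n_photons, sum_y / n_photons


# Hypothetical values: a spot about 125 nm wide (standard deviation)
# sampled with 2500 detected photons.
PSF_SD_NM = 125.0
N_PHOTONS = 2500

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    x_nm, y_nm = simulate_centroid(0.0, 0.0, PSF_SD_NM, N_PHOTONS, rng)
    print(f"ideal precision: {ideal_precision_nm(PSF_SD_NM, N_PHOTONS):.2f} nm")
    print(f"centroid error : {math.hypot(x_nm, y_nm):.2f} nm")
```

Although each photon lands anywhere within a spot hundreds of nanometers across, the mean of many photons is determined far more tightly, which is the principle behind the localization method described above.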
8.1.4.6
Photoinduced Generation of Singlet Oxygen by Biarsenicals
Unlike GFP, FlAsH and ReAsH can generate significant amounts of reactive oxygen species such as singlet oxygen when strongly illuminated. The physical processes responsible for this are illustrated in Scheme 8.1-4. Excitation of the biarsenical dye promotes it to its first excited singlet state, which usually decays rapidly to the ground state in a few nanoseconds by emitting a photon, resulting in fluorescence. Occasionally, intersystem crossing from the excited singlet state results in the formation of the comparatively long-lived (microseconds) triplet state, which can sensitize molecular oxygen (whose ground state is a triplet), producing highly reactive singlet oxygen and the ground state of the biarsenical dye. Strong illumination can therefore catalytically generate many hundreds to tens of thousands of singlet oxygen molecules per dye molecule. The cycle is stopped by destruction of the dye (photobleaching), often on reaction with singlet oxygen or its by-products. Most fluorophores (and chromophores) are capable of generating singlet oxygen, especially those with increased rates of intersystem crossing (or high triplet quantum yield) resulting from incorporation of atoms of high atomic weight such as bromine, iodine, or sulfur. However, the fluorophore of GFP does not seem to be an efficient generator of singlet oxygen, probably because it is deeply buried within the interior of the β-barrel structure (Fig. 8.1-1). Oxygen diffusion to and singlet oxygen diffusion away from the chromophore are probably strongly hindered. Singlet oxygen's high reactivity (with a half-life in aqueous solutions of a few microseconds) results in highly localized zones of this species around the generating fluorophore. This property has been exploited with the biarsenical-tetracysteine system to locally inactivate proteins (CALI) or to generate localized precipitates that permit visualization of the tagged protein by electron microscopy (diaminobenzidine (DAB) photoconversion).
Scheme 8.1-4 The photogeneration of singlet oxygen by the ReAsH-tetracysteine complex and its use for CALI and photoconversion of diaminobenzidine (DAB).
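The catalytic character of the cycle in Scheme 8.1-4 can be expressed as a simple ratio of quantum yields. The sketch below assumes that each excitation independently produces singlet oxygen with probability `phi_delta` and destroys the dye with probability `phi_bleach`; the function name and both numerical values are hypothetical and serve only to show the order-of-magnitude reasoning.

```python
def singlet_oxygen_per_dye(phi_delta, phi_bleach):
    """Expected singlet oxygen molecules produced before the dye bleaches.

    A dye survives on average 1 / phi_bleach excitations, and each
    excitation yields singlet oxygen with probability phi_delta.
    """
    if phi_bleach <= 0.0:
        raise ValueError("phi_bleach must be positive")
    return phi_delta / phi_bleach


# Hypothetical quantum yields, for illustration only.
PHI_DELTA = 0.02     # singlet oxygen generated per excitation
PHI_BLEACH = 1.0e-5  # probability of photobleaching per excitation

if __name__ == "__main__":
    base = singlet_oxygen_per_dye(PHI_DELTA, PHI_BLEACH)
    # A dye that is fivefold more resistant to photobleaching.
    stable = singlet_oxygen_per_dye(PHI_DELTA, PHI_BLEACH / 5.0)
    print(f"baseline dye      : about {base:.0f} singlet oxygen per dye")
    print(f"5x more photostable: about {stable:.0f} singlet oxygen per dye")
```

The ratio makes explicit that the total output depends as much on photostability as on the efficiency of singlet oxygen generation.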
8.1.4.6.1
Chromophore or Fluorophore Assisted Laser or Light Inactivation (CALI or FALI)
Jay and coworkers used illumination of antibodies that had been labeled with the photosensitizer malachite green or fluorescein and bound to the target protein to acutely inactivate the protein in living cells with high specificity [51]. The requirements for antibodies that do not block the biological functions of the protein before CALI, their microinjection into cells, and laser excitation are all limitations that can be avoided using tetracysteine-tagged proteins labeled with biarsenicals. Independently, Davis and coworkers and our group have demonstrated the high efficiency and specificity of FlAsH- and ReAsH-mediated CALI in living cells [13, 14, 52]. CALI offers advantages over alternative methods (such as transgenic knockouts, siRNA, and pharmacological methods) in that the inactivation can be acute (within a few seconds), it is generally applicable provided the tetracysteine tag is tolerated, and it can be spatially targeted to specific cells or even subcellular regions of a single cell.
FlAsH-FALI Inactivation of Synaptotagmin
Davis and coworkers used FlAsH-mediated CALI (or FlAsH-FALI) to investigate the function of synaptotagmin, a protein involved in transmitter vesicle release at synapses and neuromuscular junctions in animals [13]. Drosophila synaptotagmin I (Syt I) tagged with an AEAAARECCRECCARA motif at the C-terminus and expressed in fly larvae of Syt I null mutants had no detectable deleterious biological effects. Specific labeling of the expressed protein in dissected larvae with FlAsH could be achieved with no noticeable background staining or perturbation of normal transmitter release monitored by patch-clamp recordings. A few seconds of illumination with a mercury lamp in an epifluorescence microscope produced Syt I inactivation without affecting other proteins necessary for vesicle fusion. Surprisingly, almost the same degree of inactivation was achieved when Syt I was overexpressed fourfold in wild-type flies, suggesting that a dominant negative effect is possible. In further studies [52], CALI of Syt I combined with simultaneous imaging of endocytosis of released vesicles with a GFP-based pH-sensitive reporter, synapto-pHluorin, revealed an additional role for Syt I in endocytosis of released vesicles.
ReAsH-mediated CALI of Connexin43 and L-type Calcium Channels
The high efficiency of ReAsH in generating reactive oxygen species (for the photoconversion of DAB in correlated electron microscopy, see below) compared to that seen with FlAsH suggested the use of ReAsH for CALI. Oded Tour of our group measured the inactivation of ReAsH-labeled, tetracysteine-tagged gap junctions using a connexin43 construct [14]. Illumination of gap junctions in living cells resulted in a rapid (few seconds), light-dependent decrease in their electrical coupling, measured by double patch-clamp
recordings, indicating channel inactivation. FlAsH-labeled gap junctions inactivated substantially more slowly, with GFP and monomeric red-fluorescent protein (mRFP) tagged constructs showing minimal CALI. As 12 connexins are required to form one channel, each of which is labeled with ReAsH and capable of generating reactive oxygen species, we tested the ability of CALI to inactivate the α1C L-type Ca2+ channel, which contains a single pore-forming subunit. Using an N-terminal tag containing two tandem, high-affinity CCPGCC motifs, light-dependent inactivation of the ReAsH-labeled channel could be monitored by whole-cell patch-clamp recordings. Importantly, the low expression levels of these channels in HEK 293 cells and the nonspecific ReAsH staining prevented visualization of the labeling by fluorescence microscopy but still permitted specific CALI. The inhibitory effects of scavengers such as azide and imidazole, and the enhancement by D2O and by increased concentrations of O2, were evidence for the role of singlet oxygen in CALI.
8.1.4.6.2
ReAsH-mediated Photoconversion of Diaminobenzidine for Correlated Fluorescence and Electron Microscopy (EM)
Imaging of fluorescently labeled proteins or molecules in cells is a powerful technique, but its limited resolution often prevents precise subcellular localization and requires the use of electron microscopy. Fluorescent labels do not intrinsically show contrast in the electron microscope, but some can photosensitize the polymerization of DAB to generate a localized precipitate that can be stained with electron-dense metals. Cells expressing tetracysteine-tagged proteins stained with ReAsH can efficiently photoconvert DAB, unlike those labeled with FlAsH or GFP, and this approach has distinct advantages over alternative EM methods such as immunogold labeling. The large size of antibodies prevents penetration into fixed cells, resulting in low labeling efficiencies, and requires membrane permeabilization with detergents and low concentrations of fixatives that result in poor preservation of ultrastructure. For photoconversion of tetracysteine-tagged proteins, only the small membrane-permeable molecules ReAsH, DAB, and O2 are required, all of which are compatible with fixation protocols that result in excellent cellular preservation. The technique was demonstrated with gap junctions of connexin43-tetracysteine that had been pulse chased with FlAsH and ReAsH [11] as described above in Section 8.1.4.2.1. Fixation of the cells and strong illumination of ReAsH in the presence of DAB and O2 gave localized deposition of precipitates that could be stained with osmium tetroxide. Electron microscopy revealed electron-dense material only at cell regions that had been stained with ReAsH (Fig. 8.1-5). Higher magnification of cross sections of gap junction plaques showed structures with the appropriate dimensions of individual connexons. Photoconverted vesicles of characteristic size trafficking to or from the plaque could also be visualized. The sensitivity of this method still requires improvement before being generally applicable to all proteins, as it requires high concentrations of
Fig. 8.1-5 Correlated fluorescence and electron microscopy of ReAsH-labeled connexin43-tetracysteine in HeLa cells. (a) Fluorescence confocal image of a gap junction plaque after FlAsH (green) and ReAsH (red) two-color pulse-chase labeling. (b) The corresponding electron micrograph with photoconverted DAB staining indicated with arrows. (c) Higher magnification micrograph of boxed region in (b).
localized protein such as those present in the gap junction. Attempts to improve the ability of the biarsenical to generate singlet oxygen have so far been unsuccessful. Adding the heavy atom substituents of classical photosensitizers (e.g., Rose Bengal and eosin), such as bromo groups, and replacing the xanthene or phenoxazine ring oxygen with a sulfur atom decrease fluorescence and increase photobleaching and hydrophobicity, which decreases the specific labeling in live cells [8]. A more promising strategy may be to increase the photostability of the biarsenical, permitting more singlet oxygen to be produced before photodestruction. This is probably why ReAsH is far superior to FlAsH in photoconverting DAB, as it is about fivefold more resistant to photobleaching. The ideal molecule will therefore have modest fluorescence and triplet quantum yields coupled with high photostability. Nevertheless, ReAsH-mediated photoconversion has proved practical for proteins other than connexin43, including actin, Golgi transmembrane proteins, and DNA-binding proteins (unpublished results from the Ellisman and Tsien labs).
8.1.4.7
Affinity Purification of Tetracysteine-tagged Proteins
The picomolar affinity of biarsenicals for a tetracysteine peptide and its quick reversal by millimolar concentrations of dithiols make it a useful system
for affinity purification of recombinant proteins. Two similar approaches with comparable results have been described; they differ only in the chemistry used to immobilize the biarsenical on the solid support. Vale and coworkers prepared, in four steps, a β-alanyl derivative of FlAsH with an aliphatic amino substituent, suitable for reaction with carboxy-sepharose gel activated as its N-hydroxysuccinimide ester [15]. Our approach [8] couples an N-hydroxysuccinimidyl (NHS) ester of 5-carboxyFlAsH (prepared from carboxyfluorescein by the usual steps of mercuration and transmetallation) with an amino-agarose; it has fewer steps, gives higher overall yields, and involves only simple column separations. Despite these minor differences, both supports specifically bind tetracysteine-tagged proteins from bacterial or mammalian lysates with reasonable efficiency. However, care has to be taken to prevent oxidation of the tetracysteine thiols, which would prevent binding to the support, by including reducing agents such as DTT, monothiols, or phosphines. The absence of endogenous tetracysteine-containing proteins in mammalian and bacterial extracts allows contaminating proteins to be removed with a simple wash containing low concentrations of EDT or BAL. The tetracysteine-tagged protein can then be eluted with a high concentration of a highly water-soluble dithiol antidote such as 2,3-dimercaptopropanesulfonate (DMPS) or with DTT. The protein purity obtainable usually exceeds that achievable using the His6 tag.
8.1.4.8 SDS-polyacrylamide Gel Electrophoresis (PAGE) Analysis
The biarsenical-tetracysteine complex is also stable under the denaturing conditions used in SDS-PAGE, which provides a quick method to check the specificity of biarsenical staining in live cells, crude assays, or purified proteins [8, 53]. The fluorescent complex can be visualized using a light box and appropriate filters before staining for protein with Coomassie Blue or similar reagents. With FlAsH, the method has been reported to detect less than 1 pmol per band with UV excitation (Invitrogen), and the sensitivity would probably be higher using blue light. The high concentrations of thiols such as 2-mercaptoethanol or DTT used in some sample buffers can disrupt the complex, so it is advisable to use phosphines or low concentrations of monothiols or DTT as reducing agents [8]. Any unbound FlAsH runs with the tracking dye front as a brightly fluorescent band, presumably because of loss of EDT during sample preparation or electrophoresis.
8.1.5 Future Developments and Applications
Work in progress on future applications of the tetracysteine-biarsenical system includes targeting fluorescent indicators, such as Ca2+ sensors, to voltage-gated Ca2+ channels on the plasma membrane (unpublished results [22]).
Unexpected localized hotspots of channel activity have already been visualized
with this approach in tissue culture cells; future work in more biologically interesting cell types such as neurons is likely to be informative about the spatial and temporal dynamics of Ca2+ signaling pathways. Imaging modes other than fluorescence are currently being explored, such as luminescence, which in conjunction with time-resolved imaging could lead to more sensitive detection of tagged proteins in cells. Transgenic animals expressing tetracysteine-tagged proteins have already been described, but new protocols or biarsenical derivatives for labeling live animals and tissues will probably be required. Nonspecific staining will probably be the major limitation in these applications, and better antidotes will be needed to either prevent or remove such background. This will be aided by more than 60 years of research into more effective dithiol antidotes to combat the enormous human health problems resulting from arsenic-contaminated water supplies in many parts of the world today. Finally, this method has considerable potential as a general approach for site-specific labeling of crude or purified proteins in vitro with any desired probe, for example, for phosphorescence and fluorescence anisotropy, FRET, NMR, EPR (electron paramagnetic resonance), and so on, by simple conjugation to a biarsenical. Predicting where tetracysteines can be inserted and labeled in proteins may become possible once the three-dimensional structure of the complex is determined, which will lead to more applications, both in vitro and in living cells.
8.1.6 Conclusions
Designing protein tags for use in living cells requires chemistry that is compatible with the complex biochemical milieu and proceeds with high reactivity and selectivity. The biarsenical-tetracysteine method was one of the first such methods to be developed, and the lessons learnt during the process and from its application to biological questions should be of general interest to chemical biologists.
Acknowledgments
I would like to thank all my coworkers over the years in the Tsien and Ellisman labs who have contributed to the development of the biarsenical-tetracysteine method, particularly Roger Tsien for devising the original concepts and for continual input into their improvement.
References
1. J. Zhang, R. Campbell, A. Ting, R. Tsien, Creating new fluorescent probes for cell biology, Nat. Rev. Mol. Cell Biol. 2002, 12, 906-918.
2. R. Tsien, The green fluorescent protein, Annu. Rev. Biochem. 1998, 67, 509-544.
3. A. Keppler, S. Gendreizig, T. Gronemeyer, H. Pick, H. Vogel, K. Johnsson, A general method for the covalent labeling of fusion proteins with small molecules in vivo, Nat. Biotechnol. 2003, 21, 86-89.
4. A. Juillerat, T. Gronemeyer, A. Keppler, S. Gendreizig, H. Pick, H. Vogel, K. Johnsson, Directed evolution of O6-alkylguanine-DNA alkyltransferase for efficient labeling of fusion proteins with small molecules in vivo, Chem. Biol. 2003, 10, 313-317.
5. A. Keppler, H. Pick, C. Arrivoli, H. Vogel, K. Johnsson, Labeling of fusion proteins with synthetic fluorophores in live cells, Proc. Natl. Acad. Sci. 2004, 101, 9955-9959.
6. J. Farinas, A. Verkman, Receptor-mediated targeting of fluorescent probes in living cells, J. Biol. Chem. 1999, 274, 7603-7606.
7. B. Griffin, S. Adams, R. Tsien, Specific covalent labeling of recombinant protein molecules inside live cells, Science 1998, 281, 269-272.
8. S. Adams, R. Campbell, L. Gross, B. Martin, G. Walkup, Y. Yao, J. Llopis, R. Tsien, New biarsenical ligands and tetracysteine motifs for protein labeling in vitro and in vivo: synthesis and biological applications, J. Am. Chem. Soc. 2002, 124, 6063-6076.
9. I. Chen, M. Howarth, W. Lin, A. Ting, Site-specific labeling of cell surface proteins with biophysical probes using biotin ligase, Nat. Methods 2005, 2, 99-104.
10. B. Griffin, S. Adams, J. Jones, R. Tsien, Fluorescent labeling of recombinant proteins in living cells with FlAsH, Methods Enzymol. 2000, 327, 565-578.
11. G. Gaietta, T. Deerinck, S. Adams, J. Bouwer, O. Tour, D. Laird, G. Sosinsky, R. Tsien, M. Ellisman, Multicolor and electron microscopic imaging of connexin trafficking, Science 2002, 296, 503-507.
12. W. Ju, W. Morishita, J. Tsui, G. Gaietta, T. Deerinck, S. Adams, C. Garner, R. Tsien, M. Ellisman, R. Malenka, Activity-dependent regulation of dendritic synthesis and trafficking of AMPA receptors, Nat. Neurosci. 2004, 7, 244-253.
13. K. Marek, G. Davis, Transgenically encoded protein photoinactivation (FlAsH-FALI): acute inactivation of synaptotagmin I, Neuron 2002, 36, 805-813.
14. O. Tour, R. Meijer, D. Zacharias, S. Adams, R. Tsien, Genetically targeted chromophore-assisted light inactivation, Nat. Biotechnol. 2003, 21, 1505-1508.
15. K. Thorn, N. Naber, M. Matuska, R. Vale, R. Cooke, A novel method of affinity-purifying proteins using a bis-arsenical fluorescein, Protein Sci. 2000, 9, 213-217.
16. B. Chen, M. Mayer, L. Markillie, D. Stenoien, T. Squier, Dynamic motion of helix A in the amino-terminal domain of calmodulin is stabilized upon calcium activation, Biochemistry 2005, 44, 905-914.
17. S. Robia, N. Flohr, D. Thomas, Phospholamban pentamer quaternary conformation determined by in-gel fluorescence anisotropy, Biochemistry 2005, 44, 4302-4311.
18. C. Hoffmann, G. Gaietta, M. Bunemann, S. Adams, S. Oberdorff-Maass, B. Behr, J. Vilardaga, R. Tsien, M. Ellisman, M. Lohse, A FlAsH-based FRET approach to determine G protein-coupled receptor activation in living cells, Nat. Methods 2005, 2, 171-176.
19. K. Marks, M. Rosinov, G. Nolan, In vivo targeting of organic calcium sensors via genetically selected peptides, Chem. Biol. 2004, 11, 347-356.
20. K. Stroffekova, C. Proenza, K. Beam, The protein-labeling reagent FLASH-EDT2 binds not only to CCXXCC motifs but also non-specifically to endogenous cysteine-rich proteins, Pflugers Arch. 2001, 442, 859-866.
21. J. Nakanishi, M. Maeda, Y. Umezawa, A new protein conformation indicator based on biarsenical fluorescein with an extended benzoic acid moiety, Anal. Sci. 2004, 20, 273-278.
22. R. Tsien, Building and breeding molecules to spy on cells and tumors, FEBS Lett. 2005, 579, 927-932.
23. J. Nakanishi, T. Nakajima, M. Sato, T. Ozawa, K. Tohda, Y. Umezawa, Imaging of conformational changes of proteins with a new environment-sensitive fluorescent probe designed for site-specific labeling of recombinant proteins in live cells, Anal. Chem. 2001, 73, 2920-2928.
24. R. Ebright, Y. Ebright, Reagents and procedures for high-specificity labeling, U. S. Pat. Appl. 2004, US 0019104 A1.
25. B. Martin, B. Giepmans, S. Adams, R. Tsien, Mammalian cell-based optimization of the biarsenical-binding tetracysteine motif for improved fluorescence and affinity, Nat. Biotechnol. 2005, 23, 1308-1314.
26. Instruction Manual for Lumio In-Cell Labeling Kits, Invitrogen Life Technologies, Carlsbad, 2003.
27. V. Boyd, J. Harbell, R. O'Connor, E. McGown, 2,3-Dithioerythritol, a possible new arsenic antidote, Chem. Res. Toxicol. 1989, 2, 301-306.
28. L. Miller, J. Sable, P. Goelet, M. Sheetz, V. Cornish, Methotrexate conjugates: a molecular in vivo protein tag, Angew. Chem. Int. Ed. Engl. 2004, 43, 1672-1675.
29. E. Guignet, R. Hovius, H. Vogel, Reversible site-selective labeling of membrane proteins in live cells, Nat. Biotechnol. 2004, 22, 440-444.
30. N. Johnsson, K. Johnsson, A fusion of disciplines: chemical approaches to exploit fusion proteins for functional genomics, Chembiochem 2003, 4, 803-810.
31. I. Chen, A. Ting, Site-specific labeling of proteins with small molecules in live cells, Curr. Opin. Biotechnol. 2005, 16, 35-40.
32. L. Miller, V. Cornish, Selective chemical labeling of proteins in living cells, Curr. Opin. Chem. Biol. 2005, 9, 56-61.
33. R. Panchal, G. Ruthel, T. Kenny, G. Kallstrom, D. Lane, S. Badie, L. Li, S. Bavari, M. Aman, In vivo oligomerization and raft localization of Ebola virus protein VP40 during vesicular budding, Proc. Natl. Acad. Sci. 2003, 100, 15936-15941.
34. E. Cambronne, J. Sorg, O. Schneewind, Binding of SycH chaperone to YscM1 and YscM2 activates effector yop expression in Yersinia enterocolitica, J. Bacteriol. 2004, 186, 829-841.
35. W. Huh, I. Falvo, L. Gerke, A. Carroll, R. Howson, J. Weissman, E. O'Shea, Global analysis of protein localization in budding yeast, Nature 2003, 425, 686-691.
36. M. Andresen, R. Schmitz-Salue, S. Jakobs, Short tetracysteine tags to beta-tubulin demonstrate the significance of small labels for live cell imaging, Mol. Biol. Cell 2004, 15, 5616-5622.
37. J. Vilardaga, M. Bunemann, C. Krasel, M. Castro, M. Lohse, Measurement of the millisecond activation switch of G protein-coupled receptors in living cells, Nat. Biotechnol. 2003, 21, 807-812.
38. J. Goldstein, N. Waterhouse, P. Juin, G. Evan, D. Green, The coordinate release of cytochrome c during apoptosis is rapid, complete and kinetically invariant, Nat. Cell Biol. 2000, 2, 156-162.
39. J. Goldstein, C. Muñoz-Pinedo, J.-E. Ricci, S. Adams, A. Kelekar, M. Schuler, R. Tsien, D. Green, Cytochrome c is released during apoptosis in a single step, Cell Death Differ. 2005, 12, 453-462.
40. M. Rice, K. Czymmek, E. Kmiec, The potential of nucleic acid repair in functional genomics, Nat. Biotechnol. 2001, 19, 321-326.
41. M. Rice, M. Bruner, K. Czymmek, E. Kmiec, In vitro and in vivo nucleotide exchange directed by chimeric RNA/DNA oligonucleotides in Saccharomyces cerevisiae, Mol. Microbiol. 2001, 40, 857-868.
42. Z. Ignatova, L. Gierasch, Monitoring protein stability and aggregation in vivo by real-time fluorescent labeling, Proc. Natl. Acad. Sci. 2004, 101, 523-528.
43. L. Rudner, S. Nydegger, L. Coren, K. Nagashima, M. Thali, D. Ott, Dynamic fluorescent imaging of human immunodeficiency virus type 1 gag in live cells by biarsenical labeling, J. Virol. 2005, 79, 4055-4065.
44. P. Blommel, B. Fox, Fluorescence anisotropy assay for proteolysis of specifically labeled fusion proteins, Anal. Biochem. 2005, 336, 75-86.
45. S. Weiss, Measuring conformational dynamics of biomolecules by single molecule fluorescence spectroscopy, Nat. Struct. Biol. 2000, 7, 724-729.
46. X. Tan, D. Hu, T. Squier, H. Lu, Probing nanosecond protein motions of calmodulin by single-molecule fluorescence anisotropy, Appl. Phys. Lett. 2004, 85, 2420-2422.
47. H. Park, G. Hanson, S. Duff, P. Selvin, Nanometre localization of single ReAsH molecules, J. Microsc. 2004, 216, 199-205.
48. A. Yildiz, J. Forkey, S. McKinney, T. Ha, Y. Goldman, P. Selvin, Myosin V walks hand-over-hand: single fluorophore imaging with 1.5-nm localization, Science 2003, 300, 2061-2065.
49. A. Yildiz, M. Tomishige, R. Vale, P. Selvin, Kinesin walks hand-over-hand, Science 2004, 303, 676-678.
50. G. Snyder, T. Sakamoto, J. Hammer, J. Sellers, P. Selvin, Nanometer localization of single green fluorescent proteins: evidence that myosin V walks hand-over-hand via telemark configuration, Biophys. J. 2004, 87, 1776-1783.
51. F. Wang, D. Jay, Chromophore-assisted laser inactivation (CALI): probing protein function in situ with a high degree of spatial and temporal resolution, Trends Cell Biol. 1996, 6, 442-445.
52. K. Poskanzer, K. Marek, S. Sweeney, G. Davis, Synaptotagmin I is necessary for compensatory synaptic vesicle endocytosis in vivo, Nature 2003, 426, 559-563.
53. G. Feldman, R. Bogoev, J. Shevirov, A. Sartiel, I. Margalit, Detection of tetracysteine-tagged proteins using a biarsenical fluorescein derivative through dry microplate array gel electrophoresis, Electrophoresis 2004, 25, 2447-2451.
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
8.2 Chemical Approaches to Exploit Fusion Proteins for Functional Studies
Anke Arnold, India Sielaff, Nils Johnsson, and Kai Johnsson
Outlook
Contemporary approaches to study protein function often rely on the expression of the protein of interest as a fusion protein with an additional polypeptide or tag, whose role is to aid in the purification, detection, or functional characterization of the corresponding fusion protein. Recently, the role of these polypeptides has been extended to mediate the labeling of the protein of interest with chemically diverse compounds to monitor and manipulate protein function in both living cells and in vitro. To highlight the potential and limitations of this approach we discuss in this chapter two methods developed in our laboratories for the specific and covalent labeling of fusion proteins in living cells and in vitro.
8.2.1 Introduction
Proteins participate in almost all biological processes, and a detailed understanding of protein function and mechanism is therefore a prerequisite for understanding biological processes on a molecular level. The function of a protein is affected by temporal control of its expression, its localization and posttranslational modifications, its chemical microenvironment, and its interactions with other biomolecules. Because of the enormous complexity of this problem and the large number of different proteins that are expressed in any given cell, cell biologists and protein chemists are struggling to invent more efficient and generally applicable methods for studying proteins. By far the most successful strategy to meet this challenge has been the use of fusion proteins. Here, the protein of interest is genetically engineered to contain an additional sequence at either its N- or C-terminus. This so-called tag equips the resulting fusion with a unique property that can be exploited to study certain activities of the protein. Tags of fusion proteins currently have two main applications: their use in purification schemes and as tools to explore the basic cellular properties of the protein. Examples of tags used in purification schemes are the polyhistidine tag, recognized by immobilized metals, and glutathione S-transferase, recognized by immobilized glutathione [1, 2]. Most applications of fusion proteins to examine biological functions in live cells involve autofluorescent proteins, the green fluorescent protein (GFP) being the most prominent example [3, 4]. Coupled with sensitive imaging techniques, the behavior of autofluorescent fusion
proteins can be observed in real time, thereby giving new insights into the dynamic distribution and localization of proteins in the cell. Other prominent fusion protein-based approaches to study protein function in live cells include the yeast two-hybrid system and the split-ubiquitin sensor, two methods that allow the characterization and identification of protein-protein interactions [5, 6]. One impressive proof of the enormous utility of fusion proteins can be found in the efforts to fuse all open reading frames of a given organism to an appropriate tag and to exploit the properties of the tag to investigate certain aspects of the biological function of the corresponding fusion protein library. Examples include efforts to construct genome-wide protein interaction maps using the two-hybrid system, to gain an inventory of all cellular protein complexes using affinity tags, to observe the intracellular localization of all proteins in the cell via GFP fusions, to map protein-protein interactions among 705 integral membrane proteins of the yeast Saccharomyces cerevisiae, or to display the entire proteome of an organism as a protein microarray using a polyhistidine tag and nickel-coated glass slides [7-12]. As impressive as these efforts were, the genome-wide application of fusion proteins dramatically revealed two shortcomings of the currently available tags: their limitation to properties that can be genetically encoded and the restriction of each fusion tag to one particular type of functional assay. The latter point is acceptable for studies of individual proteins but more bothersome for genome-wide approaches. In recent years, a new approach to exploit fusion proteins in functional proteomics has been developed that addresses these limitations. It is based on tag-mediated labeling of fusion proteins, either in vitro or in live cells, with synthetic molecules that transfer a unique and specific property to the fusion protein [13].
8.2.2 General Considerations
The labeling of proteins with small molecules that can serve as spectroscopic probes or cross-linkers is one of the cornerstones of protein chemistry. However, the lack of specificity of the underlying chemistry used in traditional protein labeling makes its application in the living cell or in complex protein mixtures impossible. Currently, there exist two different strategies to equip proteins with synthetic probes to monitor and manipulate protein function: the incorporation of unnatural amino acids, pioneered by the group of Schultz [14], and the use of protein tags to mediate exclusive labeling with synthetic molecules, which will be the focus of this article. To be of general use, the mechanism of labeling must be sufficiently promiscuous with respect to the synthetic molecule so that different functionalities can be attached to the tag, but at the same time highly specific with respect to the protein tag so that only the fusion protein is labeled with the synthetic molecule. Currently used approaches for the labeling of fusion proteins with small molecules or
ligands that carry the desired functionality can be classified into three groups: (a) intein-based labeling of proteins with small molecules; (b) tags that bind to a small molecule through noncovalent interactions; and (c) tags that bind to a small molecule through covalent bond formation. Intein-based approaches are a powerful method for the derivatization and semisynthesis of proteins in vitro, and applications of this approach will be discussed in detail in a different chapter of this book [15, 16]. Concerning the labeling of proteins with small synthetic probes in live cells, an approach based on trans-splicing inteins has been developed by the group of Tom Muir [17]. This approach is very elegant, as the intein tag removes itself in the process of labeling; however, its general applicability remains to be shown. A list of the approaches developed so far for the noncovalent or covalent labeling of fusion proteins is shown in Table 8.2-1. Concerning the labeling of fusion proteins via tags that interact noncovalently with small molecules, a variety of different approaches have been developed over the last few years. These include antibodies binding to haptens, streptavidin binding to biotin derivatives, dihydrofolate reductase (DHFR) binding to methotrexate (Mtx) or trimethoprim derivatives, FKBP12 mutants binding to a synthetic ligand, and short peptides binding to derivatized α-bungarotoxin or to Texas red derivatives [18-24]. These tags have been successfully used for the labeling of fusion proteins with fluorophores and other probes in live cells. A good example is the study of the trafficking of the α-amino-3-hydroxy-5-methyl-4-isoxazole-propionate (AMPA) receptor [24]. In this study, the AMPA receptor was expressed as a fusion protein with an α-bungarotoxin-binding peptide and subsequently labeled with fluorescent, radioactive, or biotinylated α-bungarotoxin derivatives.
Using this approach, the total receptor expression, surface expression, internalization, and insertion of receptors into the plasma membrane could be visualized and quantified in fixed or live cells. A possible limitation of tags labeled through noncovalent interactions is the reversibility of the labeling. This feature is disadvantageous for applications such as pulse-chase type labeling experiments, long-term studies, and the detection of the labeled protein under denaturing conditions. The remaining part of the chapter is therefore dedicated to discussing approaches for covalent labeling of fusion proteins in more detail. The first tag allowing covalent labeling of fusion proteins in vitro and in live cells was the tetracysteine tag, which specifically binds to biarsenical compounds such as FlAsH, a biarsenical fluorescein derivative [25, 33]. The two main advantages of the tetracysteine tag are its relatively small size, which can be as small as 6 amino acids (CCPGCC in one-letter code), and the possibility of using different fluorophores. Potential disadvantages of the approach are the reported unspecific binding of the biarsenical fluorophores and the need to coincubate with dithiols such as 1,2-ethanedithiol to minimize this unspecific binding. However, the Tsien group recently reported sequences with increased affinity toward biarsenical compounds that should enhance the performance of the approach in live cells [34]. The use of the tetracysteine tag is discussed in detail by Steve Adams in another chapter of this book.
Table 8.2-1 Tags used for the selective labeling of fusion proteins with synthetic molecules

Tag | Size[a] | Label | Required additives | Type of linkage | Applications | Comments | References
Tetracysteine tag | 6-12 | Biarsenical fluorophores | Dithiols to suppress unspecific binding | Covalent and reversible | Intracellular | Cell surface applications require reduction of disulfide bonds | [25]
AGT | 182 | Benzylguanine derivatives | None | Covalent and irreversible | Intracellular, cell surface | - | [26]
N-terminal Cys | >1[b] | Thioester derivatives | None | Covalent and irreversible | Intracellular, cell surface | Limited specificity of labeling; slow reaction | [27]
His tag | 6 | NTA derivatives | None | Noncovalent and reversible | Cell surface | |
Texas red binders | 38, 42 | Texas red derivatives | None | Noncovalent and reversible | | |
α-Bungarotoxin-binding peptide | 13 | α-Bungarotoxin derivatives (74 a.a.) | None | Noncovalent and reversible | | |
DHFR | 157 | Methotrexate/trimethoprim derivatives | None | Noncovalent and reversible | Intracellular | |
FKBP12 mutant | 108 | Synthetic ligand | None | Noncovalent and reversible | Intracellular | |
Streptavidin | >125 | Biotin derivatives | None | Noncovalent and reversible | Intracellular | |
scFv | ~250 | Hapten derivatives | None | Noncovalent and reversible | Intracellular, cell surface | |
CP | >77 | CoA derivatives | Phosphopantetheine transferase | Covalent and irreversible | Cell surface | |
Biotin acceptor peptides | 15 | Biotin | Biotin ligase, ATP | Covalent and irreversible | Intracellular, cell surface | Allows use of derivatized streptavidins to label cell surface proteins | [31]
Biotin acceptor peptides | 15 | Keto isostere of biotin | Biotin ligase, ATP, hydrazide derivatives | Covalent and reversible | Cell surface | Two-step labeling required | [32]

[a] Size is given in amino acids. AGT: O6-alkylguanine-DNA alkyltransferase; DHFR: dihydrofolate reductase; NTA: nitrilotriacetic acid; CP: carrier protein.
[b] Requires expression as N-terminal fusion with ubiquitin or intein.
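The argument that reversible labeling is a drawback for pulse-chase experiments, long-term studies, and detection under denaturing conditions can be made concrete by filtering the tags of Table 8.2-1 by linkage type. The following is a minimal sketch, not part of the original chapter: the `Tag` class and the `irreversible_tags` helper are hypothetical names introduced for illustration, and only the tag names and linkage types are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """One row of Table 8.2-1, reduced to name and linkage type (illustrative only)."""
    name: str
    covalent: bool
    reversible: bool


# Linkage types as listed in Table 8.2-1; all other columns are omitted.
TAGS = [
    Tag("Tetracysteine tag", covalent=True, reversible=True),
    Tag("AGT", covalent=True, reversible=False),
    Tag("N-terminal Cys", covalent=True, reversible=False),
    Tag("His tag", covalent=False, reversible=True),
    Tag("Texas red binders", covalent=False, reversible=True),
    Tag("alpha-Bungarotoxin-binding peptide", covalent=False, reversible=True),
    Tag("DHFR", covalent=False, reversible=True),
    Tag("FKBP12 mutant", covalent=False, reversible=True),
    Tag("Streptavidin", covalent=False, reversible=True),
    Tag("scFv", covalent=False, reversible=True),
    Tag("CP", covalent=True, reversible=False),
    Tag("Biotin acceptor peptide (biotin)", covalent=True, reversible=False),
    Tag("Biotin acceptor peptide (keto isostere)", covalent=True, reversible=True),
]


def irreversible_tags(tags):
    """Return the names of tags whose label is attached covalently and irreversibly."""
    return [t.name for t in tags if t.covalent and not t.reversible]


if __name__ == "__main__":
    for name in irreversible_tags(TAGS):
        print(name)
```

Such a filter only captures the linkage column; in practice the choice of tag would also depend on size, required additives, and whether intracellular or cell surface labeling is needed.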
We have developed two general approaches for the covalent labeling of fusion proteins with chemically diverse compounds. The first approach allows the labeling of fusion proteins of human O6-alkylguanine-DNA alkyltransferase (AGT) with synthetic molecules [26]. The labeling is based on the irreversible and specific reaction of AGT with O6-benzylguanine (BG) derivatives, leading to the irreversible transfer of a synthetic probe to a reactive cysteine residue of AGT. The second approach is based on the expression of the protein of interest as a fusion with acyl carrier protein (ACP) and the specific ligation of chemically diverse compounds to this fusion protein using a phosphopantetheine transferase (PPTase), an approach that is particularly well suited for the labeling of cell surface proteins [30]. Using these two technologies as representative examples, we discuss in the following text the potential and limitations of such a chemical approach to exploit fusion proteins for functional studies. As the tags are only the links between the protein of interest and the synthetic molecule, at least some of the experiments described in the following section could have been performed with different tags but similar synthetic molecules.
8.2.3 Applications and Practical Examples
8.2.3.1 Labeling of AGT Fusion Proteins
AGT is a DNA repair enzyme that reverts lesions resulting from the O6-alkylation of guanine [35]. DNA repair is achieved by irreversibly transferring the alkyl group to a reactive cysteine of AGT. The mechanism is unusual in that alkylated AGT is not regenerated after repair but is degraded at some point after alkyl transfer. Taking advantage of the observation that human AGT reacts not only with alkylated guanine incorporated in DNA but also with the free base BG, we demonstrated that BG derivatives carrying various labels at the 4-position of the benzyl ring can be used for the specific labeling of AGT fusion proteins, both in living cells and in vitro (Fig. 8.2-1a) [26]. Importantly, the labeling is highly specific for AGT fusion proteins, as BG derivatives do not show any appreciable reactivity toward other proteins or simple nucleophiles. Also important for practical applications is the ease with which BG derivatives can be synthesized, which has already resulted in a large number of derivatives (Fig. 8.2-1b). Wild-type human AGT is a monomeric protein of 207 residues that, due to its affinity for DNA, is located in the nucleus of the cell [36]. To remove this undesired feature, we engineered AGT so that its affinity for DNA was suppressed and at the same time its activity against BG derivatives was increased by a factor of about 50 [37, 38]. When expressed in mammalian cells, these mutants show cellular distributions similar to GFP, and the increased activity of AGT against BG derivatives also translates into more efficient labeling of AGT fusion proteins [39]. Another potential limitation of
Fig. 8.2-1 (a) General scheme for labeling of AGT fusion proteins using BG derivatives. (b) BG derivatives described in this work; compounds are listed with their abbreviations and the name of the label transferred to AGT fusion proteins. Compounds shown include BGBT (biotin), BGMtx (methotrexate), and BGBD (biotin and dinitrophenol).
the approach is interference from endogenous wild-type O6-alkylguanine-DNA alkyltransferase (wtAGT). In contrast to AGTs from yeast or Escherichia coli that do not react with BG derivatives, mammalian wtAGTs can react with BG derivatives. Although the activities of the currently used mutants are at least 50-fold higher than wtAGT, the mammalian wtAGTs might still lead to some unwanted background labeling. To exclude such an undesired labeling of endogenous wtAGT, the specific labeling of AGT fusion proteins has
initially been restricted to AGT-deficient mammalian cell lines. To address this limitation in a more general way, we have synthesized an inhibitor of wtAGT and have generated AGT mutants that are resistant to this inhibitor. This scheme allows wtAGT to be inactivated while labeling of the AGT mutant remains possible [40]. The ability to specifically label AGT fusion proteins in the presence of endogenous AGT after a brief preincubation of the cells with a small molecule significantly broadens the scope of application of AGT fusion proteins in living cells. One important application of AGT fusion proteins is fluorescence labeling. A variety of different fluorophores have been coupled to BG and, depending on cell permeability, these molecules can be used for the labeling of AGT fusion proteins in live cells [39]. Up to now, the BG derivatives of fluorescein, Oregon green, rhodamine green, tetramethylrhodamine, and SNARF-1 have been used for fluorescence labeling within live cells, whereas the conjugates of BG with fluorophores such as Cy3 and Cy5 proved to be cell impermeable and are therefore more suitable for applications on cell surfaces and in vitro (Fig. 8.2-1b). The possibility of choosing the nature of the fluorophore and the time point of labeling opens up a number of interesting applications. One such application is sequential labeling of AGT fusion proteins to distinguish older copies from newer copies of the same protein within one cell through multicolor analysis.
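To get a feeling for how much background labeling of endogenous wtAGT a 50-fold difference in activity might still permit, the competition can be sketched with a simple kinetic model. This is an illustrative calculation only, under assumptions that are not taken from the chapter: the BG derivative is in excess, so both proteins are labeled with pseudo-first-order kinetics, both are equally accessible, and the "50-fold" figure is read as a ratio of rate constants. Relative protein abundance is ignored, and the function names are hypothetical.

```python
import math


def fraction_labeled(relative_rate, exposure):
    """Fraction of a protein labeled after a given exposure.

    Assumes irreversible, pseudo-first-order labeling:
    fraction = 1 - exp(-relative_rate * exposure),
    where exposure lumps together rate constant, BG concentration, and time.
    """
    return 1.0 - math.exp(-relative_rate * exposure)


def background_fraction(target_fraction, fold_selectivity):
    """Fraction of wtAGT labeled when the mutant reaches target_fraction.

    The mutant is given a relative rate of 1; wtAGT reacts
    fold_selectivity times more slowly (an assumption of this sketch).
    """
    # Exposure needed for the mutant to reach the requested labeling degree.
    exposure = -math.log(1.0 - target_fraction)
    return fraction_labeled(1.0 / fold_selectivity, exposure)


if __name__ == "__main__":
    bg = background_fraction(target_fraction=0.95, fold_selectivity=50)
    print(f"wtAGT labeled when mutant is 95% labeled: {bg:.1%}")
```

Under these assumptions, labeling the mutant to 95% would label roughly 6% of the wtAGT molecules, which illustrates why the text treats endogenous wtAGT as a possible source of background rather than a negligible one.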
To demonstrate the feasibility of multicolor imaging of AGT fusion proteins, we analyzed the translocation of a fusion of the temperature-sensitive glycoprotein of vesicular stomatitis virus with O6-alkylguanine-DNA alkyltransferase (tsVSVG-AGT) [39]. The tsVSVG is a membrane protein that is transported via the secretory pathway to the plasma membrane at permissive temperatures, whereas at the restrictive temperature of 40 °C the protein reversibly misfolds and is retained in the endoplasmic reticulum (ER) [41]. AGT was fused to the cytoplasmic C-terminus of tsVSVG (tsVSVG-AGT), and Chinese hamster ovary (CHO) cells transiently expressing tsVSVG-AGT at the permissive temperature of 34 °C show efficient transport of the fusion protein to the plasma membrane, as demonstrated by labeling with fluorescein. We then tried to discriminate between tsVSVG-AGT populations expressed before and after a temperature shift through sequential labeling with different fluorophores. In these experiments, CHO cells transiently expressing tsVSVG-AGT were incubated for 20 h at 34 °C and subsequently labeled with fluorescein. The temperature of the medium was then shifted to 40 °C for 75 min, and the tsVSVG-AGT synthesized at this temperature was labeled with SNARF-1. Subsequent fluorescence imaging of the cells demonstrated that fluorescein-labeled tsVSVG-AGT was located predominantly in the plasma membrane, whereas SNARF-1-labeled tsVSVG-AGT was located predominantly in internal membrane structures, most likely perinuclear ER or Golgi (Fig. 8.2-2). The data clearly demonstrate that within live cells, older and newer copies of AGT fusion proteins can be discriminated by sequential labeling with different fluorophores.
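The logic of this sequential labeling experiment can be sketched in a few lines of Python. This is only an illustrative model under a simplifying assumption that is not stated in the text: each labeling step is treated as quantitative, so a protein copy carries the first dye applied after it was synthesized. The names `PULSES` and `dye_for_copy` are hypothetical; the times and dyes are those given above.

```python
from bisect import bisect_left

# Labeling pulses from the experiment above: (time of pulse in minutes, dye).
# 20 h at 34 °C, then fluorescein; a further 75 min at 40 °C, then SNARF-1.
PULSES = [(20 * 60, "fluorescein"), (20 * 60 + 75, "SNARF-1")]


def dye_for_copy(synthesis_time_min, pulses=PULSES):
    """Return the dye carried by a protein copy synthesized at the given time.

    Assumption (ours): every pulse labels all still-unlabeled AGT
    quantitatively, so a copy receives the first dye applied at or after
    its synthesis. Copies made after the last pulse stay unlabeled (None).
    """
    times = [t for t, _ in pulses]
    # Index of the first pulse that happens at or after synthesis.
    i = bisect_left(times, synthesis_time_min)
    return pulses[i][1] if i < len(pulses) else None
```

In this model a copy made 10 h into the experiment carries fluorescein, while one made during the 75 min at 40 °C carries SNARF-1, which is the distinction the two imaging channels report.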
Fig. 8.2-2 Multicolor analysis of tsVSVG-AGT. (a-c) Sequential labeling of tsVSVG-AGT: labeling with fluorescein at the permissive temperature (34 °C) and with SNARF-1 at the nonpermissive temperature (40 °C). (a) Overlay of transmission and fluorescence micrographs. (b) Fluorescence channel for fluorescein-labeled tsVSVG-AGT (ex. 488 nm/em. 505-530 nm). (c) Fluorescence channel for SNARF-1-labeled tsVSVG-AGT (ex. 543 nm/em. >650 nm).
Fluorescence resonance energy transfer (FRET) measurements between pairs of (auto)fluorescent proteins are an important tool to investigate the spatial distance between proteins in a time-resolved manner [42]. Furthermore, various reporter systems to monitor cellular parameters are based on intramolecular FRET between two autofluorescent proteins fused in a single polypeptide chain [3]. To demonstrate the use of AGT fusion proteins in FRET applications, we constructed a fusion protein of O6-alkylguanine-DNA alkyltransferase, enhanced green fluorescent protein (EGFP), and a nuclear localization sequence (AGT-EGFP-NLS3) [39]. Here, EGFP would be the donor (excited at 488 nm) and SNARF-1-labeled AGT would be the acceptor for intramolecular FRET. The broad absorbance of SNARF-1-labeled AGT in the region from 500 to 580 nm at pH 7.4 makes it an ideal acceptor in such experiments. Labeling of AGT-EGFP-NLS3 with 5 µM benzylguanine-SNARF (BGSF) for 1 h led to a 98% decrease in EGFP emission at 505 to 530 nm, indicating both efficient FRET and efficient labeling of AGT-EGFP-NLS3 with SNARF-1. As SNARF-1 can also be excited to some extent at 488 nm, we assumed that the observed emission above 650 nm is due to both direct laser excitation and FRET.
Labeling of AGT fusion proteins is not restricted to monitoring protein function; it can also be used to manipulate it. One example of this approach is induced protein dimerization through covalent labeling of AGT fusion proteins [43]. The dynamic dimerization and dissociation of pairs of proteins plays an important role in various biological processes. As a chemical approach to study processes that depend on protein dimerization, the teams of Schreiber and Crabtree have introduced "chemical inducers of dimerization" (CIDs) [44]. CIDs are cell-permeable molecules that can bind simultaneously to two different proteins, thereby inducing their dimerization. Various biological processes have
been controlled and studied with this approach, including signal transduction and control of transcription in eukaryotic and prokaryotic cells. Previously used CIDs relied on noncovalent interactions, and we extended the approach through covalent labeling of AGT fusion proteins with ligands capable of interacting with other proteins. As a first ligand we chose Mtx. Mtx is a tight-binding inhibitor of DHFR, and heterodimers of Mtx and dexamethasone, a ligand of the glucocorticoid receptor (GR), have been used as CIDs to control transcription in yeast [45]. In this so-called three-hybrid system, a DNA-binding domain and a transcriptional activation domain were expressed as DHFR and GR fusion proteins, respectively, and transcription was initiated through the addition of the CID. On the basis of these studies, we synthesized an O6-benzylguanine-methotrexate (BGMtx) heterodimer as CID (Figs. 8.2-1(b) and 8.2-3) [43]. To use BGMtx as CID in a three-hybrid system, we constructed fusion proteins of AGT with the DNA-binding domain LexA and of DHFR with the transcriptional activation domain B42 (Fig. 8.2-3). The in vivo labeling of the AGT fusion protein with Mtx using BGMtx then induced the dimerization of the AGT and DHFR fusion proteins, leading to stimulation of transcription of a reporter gene. Pairs of plasmids encoding LexA and B42 fusion proteins were transformed into the yeast strain L40, in which the dimerization of LexA and B42 fusion proteins leads to transcription of the reporter genes HIS3 and lacZ. Growing these yeast strains in the presence of BGMtx then complemented the histidine auxotrophy of the yeast and also induced the expression of β-galactosidase. These experiments clearly showed that BG derivatives can be used as CIDs to control transcription in yeast and, more generally, also demonstrated how AGT fusion proteins can be used to control protein dimerization in vivo.
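The design of the three-hybrid readout described above can be summarized as a small truth table. The sketch below is an idealized Boolean model, which is our simplification and not a description of the authors' analysis: it assumes the reporter genes are transcribed only when the LexA-AGT fusion, the B42-DHFR fusion, and BGMtx are all present. The function names are hypothetical.

```python
from itertools import product


def reporter_on(lexa_agt, b42_dhfr, bgmtx):
    """Idealized three-hybrid readout (Boolean model, our assumption).

    HIS3/lacZ transcription requires the DNA-binding fusion (LexA-AGT),
    the activation-domain fusion (B42-DHFR), and the dimerizer BGMtx
    that covalently labels AGT with Mtx and so recruits DHFR.
    """
    return lexa_agt and b42_dhfr and bgmtx


def truth_table():
    """All eight component combinations together with the expected readout."""
    return [(a, d, c, reporter_on(a, d, c))
            for a, d, c in product([False, True], repeat=3)]
```

Listing the table makes the relevant controls explicit: omitting either fusion protein or the dimerizer should leave the reporter off in this model.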
Fig. 8.2-3 AGT-based three-hybrid system.
The previously described experiments focused on labeling of AGT fusion proteins in live cells, but a number of interesting in vitro applications are also possible. One important application is the covalent immobilization of AGT fusion proteins. We were able to show that by linking BG via a flexible linker to a bioinert surface, AGT fusion proteins can be specifically and covalently immobilized (Fig. 8.2-4) [46]. Importantly, AGT fusion proteins generally retain their function after immobilization and, because of the specificity of the reaction, can also be immobilized directly from cell extracts without prior purification of the fusion protein. These features make AGT fusion proteins particularly well suited for the generation of protein microarrays.
Protein microarrays are regarded as one key research tool in proteomics [47]. They are generated by arraying a repertoire of different proteins on a solid support at high spatial density for the subsequent characterization of the immobilized proteins. While the continuous identification of proteins with unknown function fuels a need for high-throughput technologies for their characterization, protein function microarrays have so far been used in relatively few laboratories. The reasons for this are the technological hurdles associated with their generation, in particular the parallel expression and purification of large numbers of proteins and their subsequent immobilization on the microarray in a functional state. AGT fusion proteins are attractive candidates for applications on protein microarrays for two main reasons. Firstly, the specificity of the reaction between AGT and BG should allow a direct covalent immobilization of fusion proteins from complex mixtures. This is particularly important when large numbers of proteins must be processed in parallel.
Secondly, the ability both to immobilize and to fluorescently label AGT fusion proteins should facilitate rapid screening for protein-protein interactions by generating a defined array of AGT fusion proteins and subsequently probing the microarray with a fluorescence-labeled AGT fusion protein (Fig. 8.2-5(a)). Such a strategy requires that both
Fig. 8.2-4 General scheme for immobilization of AGT fusion proteins.
Fig. 8.2-5 (a) Use of AGT-based protein microarrays to screen for protein-protein interactions. (b) Purified AGT-FKBP and AGT-FRB (both 1 µM) were immobilized in arrays of 8 x 8 spots each on a BG-covered glass slide. The slide was then incubated with a solution containing Cy3-labeled AGT-FKBP (100 pM), Cy5-labeled AGT-FRB (100 pM), and rapamycin (100 nM) and, after washing, analyzed for fluorescence: (1) detection of Cy3, (2) detection of Cy5 on the same microarray as in (1), (3) overlay of (1) and (2). (c) Same experiments as in (b) but using cell lysates of E. coli BL21 (DE3) expressing either AGT-FKBP or AGT-FRB for spotting.
the labeling of the protein and its immobilization are practically irreversible, a requirement that is not fulfilled by low-affinity tags such as the His tag. The generation of AGT-based protein microarrays requires the display of BG on otherwise bioinert glass slides. We have previously shown that surfaces covered either with carboxymethylated dextran or with polymer brushes of poly(oligo(ethylene glycol) methacrylate) (POEGMA) and displaying BG are sufficiently bioinert for the selective immobilization of AGT fusion proteins [46, 48]. Building on these results, we used glass slides covered either
with carboxylated hydrogel or POEGMA. To demonstrate the use of AGT-based protein microarrays for the analysis of protein-protein interactions, we first analyzed the rapamycin-dependent heterodimerization of FK506-binding protein (FKBP) and the binding domain of FKBP rapamycin-associated protein (FRB). FKBP and FRB were expressed in E. coli with AGT fused to their N-termini, yielding AGT-FKBP and AGT-FRB. Each of these proteins had a His tag fused to the N-terminus of AGT for purification. In the initial experiments, AGT-FKBP and AGT-FRB were expressed and purified via their His tag, and the purified proteins were arrayed on glass slides displaying BG. In separate experiments, AGT-FKBP was labeled with Cy3 and AGT-FRB was labeled with Cy5, using appropriate BG derivatives. The protein microarray displaying AGT-FKBP and AGT-FRB was then simultaneously incubated with fluorescence-labeled AGT-FKBP and AGT-FRB in the presence of rapamycin. Analysis of the microarray clearly showed that Cy3-labeled FKBP interacts only with immobilized FRB and Cy5-labeled FRB interacts only with immobilized FKBP, demonstrating that both AGT fusion proteins remain functional after immobilization or labeling (Fig. 8.2-5(b)).
As mentioned above, the specificity of the reaction of AGT fusion proteins with BG derivatives should allow for a direct arraying of AGT fusion proteins from cell extracts. We therefore repeated the above experiments by spotting extracts of E. coli expressing either AGT-FKBP or AGT-FRB. The resulting microarrays were incubated with Cy3-labeled FKBP and Cy5-labeled FRB in the presence of rapamycin. This recapitulated the specific interaction between FKBP and FRB and demonstrated that purification of the fusion protein prior to immobilization was not necessary (Fig. 8.2-5(c)). The possibility of using sets of AGT fusion proteins to rapidly screen for mutual protein-protein interactions is certainly one of the major applications we envision for AGT-based protein microarrays.
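The readout expected from the FKBP/FRB microarray can be written down as a small lookup. The sketch below is an idealized model of our own, assuming that each labeled probe binds only its rapamycin-dependent partner, that no binding occurs without rapamycin, and that there is no background; the names `PARTNER`, `PROBE_DYE`, and `expected_signal` are hypothetical.

```python
# Rapamycin-dependent binding partner of each protein (from the text above).
PARTNER = {"FKBP": "FRB", "FRB": "FKBP"}

# Dye carried by each soluble, labeled probe in the experiment.
PROBE_DYE = {"FKBP": "Cy3", "FRB": "Cy5"}


def expected_signal(immobilized, rapamycin=True):
    """Dyes expected on a spot of the given immobilized AGT fusion.

    Idealized model (our assumption): a probe binds only its partner,
    and only when rapamycin is present; background is ignored.
    """
    if not rapamycin:
        return set()
    # A probe lights up the spot if the immobilized protein is its partner.
    return {dye for probe, dye in PROBE_DYE.items()
            if PARTNER[probe] == immobilized}
```

In this model FRB spots appear only in the Cy3 channel and FKBP spots only in the Cy5 channel, which is the pattern reported for both purified proteins and cell extracts.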
Furthermore, we were able to show that small molecule-protein interactions as well as posttranslational modifications can be detected on AGT-based protein microarrays. As already mentioned, the possibility of labeling AGT fusion proteins with fluorophores is also an important feature for functional studies. To facilitate the purification of fluorescence-labeled AGT fusion proteins, we developed a synthesis of BG derivatives that allows for the labeling of AGT fusion proteins with bifunctional synthetic probes that combine, for example, a fluorophore and an affinity label (Fig. 8.2-1(b)) [49]. The affinity label allows for the isolation of fluorescence-labeled AGT fusion proteins, and the bifunctional substrates could become useful tools for various applications in functional proteomics. Together, these features should make AGT-based protein microarrays a powerful tool for functional proteomics.
8.2.3.2 Labeling of CP-fusion Proteins as a Tool to Study Cell Surface Proteins
The cell surface plays a key role in a variety of complex biological processes ranging from signal transduction to cell-cell and host-pathogen interactions. Proteins that act as receptors, channels, transporters, or enzymes that build
and remodel the extracellular matrix play the most prominent role in these activities. The detailed in vivo characterization of these proteins is therefore an important prerequisite for understanding the biology of the cell surface in molecular terms. As the surfaces of cultured cells are freely accessible to chemical treatment, the labeling of their proteins with synthetic molecules appears to be an attractive strategy to equip them with probes that allow for their functional characterization [50]. The tetracysteine tag and AGT are two examples of tags that were designed primarily for the covalent modification of intracellular proteins. Consequently, these protein tags are not necessarily suitable for applications in the oxidizing environment of the cell surface. For example, the application of the tetracysteine tag on cell surfaces requires the reduction of the otherwise oxidized and unreactive cysteines of the tag using membrane-impermeable reductants such as 2-mercaptoethanesulfonate and tris(carboxyethyl)phosphine [51]. Since this treatment will also reduce the disulfide bridges of most cell surface proteins, it will inevitably perturb many of their activities. The labeling of AGT fusion proteins, on the other hand, relies on the alkylation of the reactive cysteine of AGT. While we have previously shown that AGT mutants with increased stability toward oxidizing conditions can be displayed in an active form on cell surfaces or viral particles, the requirement for a reactive cysteine nevertheless makes AGT fusion proteins to some extent sensitive to the oxidative environment of cell surfaces.
The noncovalent labeling of cell surface proteins can alternatively be achieved by expressing them with an oligohistidine tag and incubating the corresponding cells with probes comprising a chromophore together with a metal-ion-chelating nitrilotriacetate (NTA) moiety [28]. This moiety binds reversibly to the oligohistidine sequences that are displayed by the fusion proteins.
The feasibility of the approach has been demonstrated by binding NTA-chromophore conjugates to oligohistidine fusion proteins of a ligand-gated ion channel and a G protein-coupled receptor (GPCR). Possible drawbacks of the approach are the modest stability of the complex and unspecific binding of the NTA derivative to other proteins. As already mentioned, an alternative strategy is based on the expression of a cell surface protein as a fusion protein with an α-bungarotoxin-binding peptide and the incubation of cells expressing this protein with covalently derivatized α-bungarotoxin [24]. This labeling is of higher specificity than the His tag-based labeling, but also suffers from the fact that it is noncovalent and hence reversible.
We have recently developed a novel labeling strategy for cell surface proteins, which promises to overcome some of the limitations of these approaches [30]. Here, the protein of interest is fused to a carrier protein (CP) and the corresponding fusion protein is then specifically labeled with CoA derivatives through a posttranslational modification catalyzed by a phosphopantetheine transferase (PPTase). CPs are integral components of various primary and secondary metabolic pathways, including fatty acid synthesis (FAS), nonribosomal peptide synthesis (NRPS), polyketide synthesis (PKS), and lysine biosynthesis. All CPs harbor
a phosphopantetheine (Ppant) as a covalently attached prosthetic group
(Fig. 8.2-6(a)) [52]. The Ppant serves as the attachment site for the building blocks and intermediates of different pathways. The different substrates are coupled as acyl thioesters to the free SH group of Ppant. Depending on the structure of the bound substrate, CPs are named acyl carrier proteins (ACPs), peptidyl carrier proteins (PCPs), or aryl carrier proteins (ArCPs). The covalent attachment of Ppant to the CP is catalyzed by a group of enzymes named phosphopantetheine transferases (PPTases) [52]. PPTases use CoA as the source for Ppant and attach it as a phosphodiester to an invariant serine residue of the CP (Fig. 8.2-6(a)). Representative examples of PPTases are acyl-carrier protein synthase (AcpS) from E. coli, which modifies ACPs, and Sfp from Bacillus subtilis, which accepts PCPs from NRPS as substrates but also ACPs of FAS and PKS [52]. The broad substrate specificity of Sfp stands in contrast to that of AcpS, which transfers the Ppant only to ACPs, but not to the PCPs of the enterobactin synthetase EntF from E. coli or to other PCPs. Structural and biochemical studies have revealed that the β-mercaptoethylamine group of CoA does not participate in the recognition of CoA by PPTases and that thiol-modified CoA derivatives can be employed for the labeling of CPs [53]. This tolerance toward modification of the β-mercaptoethylamine of CoA has been exploited to achieve specific labeling of CP-fusion proteins on the surface of eukaryotic cells (Fig. 8.2-6(b)) [30, 50].
In initial applications, we chose the ACP/PPTase pair from E. coli. ACP from E. coli is a small protein of only 77 residues that folds into a compact structure composed of four α-helices, a fold shared by other CPs. The Ppant derivative is attached to Ser36 of ACP. The protein contains no cysteines, thus avoiding a potential misfolding of secreted ACP fusion proteins due to unwanted oxidations. When tested in vitro, ACP from E.
coli is readily modified by CoA derivatives, and the rate of reaction does not show a significant dependence on the nature of the label. At concentrations of 0.2 µM AcpS, 1 µM ACP, and 5 µM of the CoA derivative, a typical labeling experiment is complete within 10 min and the reaction is nearly quantitative. The ACP-Sag1p fusion protein serves as a representative example for the modification of a protein on the surface of the yeast S. cerevisiae. Sag1p is the α-agglutinin of yeast cells and is covalently attached to the β-1,6-glucan of the cell wall via its modified glycosylphosphatidylinositol anchor [54]. For the construction of the fusion protein we replaced the natural signal sequence of Sag1p with the signal sequence of the α-factor followed by the coding sequence of the bacterial ACP. The combined addition of CoA-Cy3 and AcpS resulted in the specific labeling of yeast cells expressing ACP-Sag1p (Fig. 8.2-7(a)). The observed specificity and efficiency of labeling can be rationalized by two properties of the system. First, the cell surface separates the cell-impermeable CoA derivatives and the added PPTase from host PPTases, host ACPs, and underivatized CoA, thereby suppressing unwanted side reactions such as the labeling of internal CPs. Second, bacterial ACPs
Fig. 8.2-6 CP-based labeling of fusion proteins. (a) Phosphopantetheinylation of CPs. (b) Labeling of CP (cell surface) fusion proteins. (c) CoA derivatives described in this article.
Fig. 8.2-7 Fluorescence labeling of ACP fusion proteins on cell surfaces. (a) Fluorescence micrographs of yeast cells expressing ACP-Sag1p. Cells shown in (a) were labeled with CoA-Cy3. (b, c) Labeling of HEK293 cells transiently coexpressing ACP-NK1 and enhanced green fluorescent protein fused to a nuclear localization sequence (EGFP-NLS3). The nuclear green fluorescence identifies the transfected cells. Confocal micrographs show overlays of fluorescence and transmission channels. (b) Labeling with Cy3 using CoA-Cy3. (c) Labeling with Cy5 using CoA-Cy5.
are not efficient substrates of eukaryotic PPTases. This feature minimizes unwanted phosphopantetheinylation of the fusion protein before it escapes from the cytosol into the secretory pathway. In addition to ACP-Sag1p, we
have previously shown that ACP C-terminally attached to the α-agglutinin receptor Aga2p (Aga2p-ACP) can be effectively labeled on the surface of yeast. Together, these experiments demonstrate the flexibility of the ACP tag with respect to different orientations in the fusion protein. As the N- and C-termini of ACP reside on the same side of the protein and are proximal to each other, it should be possible to insert ACP into the loops of cell surface proteins without dramatically perturbing the structures of the host and the guest protein.
ACP fusion proteins can also be specifically labeled on the surfaces of mammalian cells. For example, ACP was attached to the exoplasmic N-terminus of the human GPCR neurokinin 1 (NK1) [30]. GPCRs represent an important class of therapeutic targets, and the specific labeling of these proteins with spectroscopic probes on live cells makes the technique an interesting starting point for the development of functional cell-based assays [55]. As observed for yeast, HEK293 cells transiently expressing ACP-NK1 could be labeled with different fluorophores or affinity labels, whereas nontransfected cells were not labeled to any significant extent (Fig. 8.2-7(b)). Furthermore, PCP fusion proteins can be labeled specifically in vitro and on the surface of bacteriophage M13, further extending the number of hosts shown to be able to display active CP-fusion proteins [56, 57].
Besides tolerating a wide range of labels, CP fusions of cell surface proteins can be used for studying the dynamics of their distribution on and in the cell. Specifically, the membrane impermeability of PPTases and CoA derivatives limits the labeling to proteins that are already displayed on the cell surface during the incubation and leaves unlabeled those proteins that are either still in the secretory pathway or already internalized.
This feature allows monitoring of the subsequent movement of the fusion protein from the plasma membrane to other cellular locations. Furthermore, the controlled addition of enzyme and substrate and their rapid removal permits a precise timing of the labeling. Thus, labeling reactions with different fluorophores at different times could be used to discriminate between different generations of CP-fusion proteins in individual cells. As a proof of principle, we have recently performed pulse-chase experiments in which three different generations of ACP-Sag1p were labeled with different fluorophores on yeast cells, allowing for a stunning visualization of localized cell wall growth. Of course, speed and high efficiency of the labeling are important prerequisites for these applications. Our previous measurements have indicated that the kinetics of the labeling of ACP fusion proteins on cell surfaces are comparable to those measured for the purified ACP. Consequently, labeling can be quantitative within a period of about 10 min, provided that sufficiently high substrate and PPTase concentrations are used.
Another general approach for the labeling of fusion proteins, which conceptually resembles the labeling of CP-fusion proteins, is the biotinylation of so-called acceptor peptides by biotin ligase [31, 58]. The biotinylation of fusion proteins by itself is a valuable modification, as numerous streptavidin- and avidin-based probes and materials are commercially available. However,
streptavidin is a tetramer of 53 kD, which can be problematic for a number of applications. The versatility of the approach would therefore be significantly broadened if the synthetic probe could be directly attached to the biotin. Recently, Ting and coworkers have demonstrated that the biotin ligase BirA from E. coli also accepts a ketone isostere of biotin as a substrate for the labeling of a protein fused to an acceptor peptide [32]. The introduction of a keto functionality into fusion proteins allows their subsequent labeling with hydrazides linked to biophysical probes. This two-step labeling approach is particularly attractive for the labeling of fusion proteins on the cell surface or for the labeling of proteins in vitro. Its main advantages are the small size of the tag (15 amino acids) and the ease with which hydrazide derivatives can be synthesized. However, the formation of the hydrazone has two problematic features. Firstly, hydrazone formation is a slow process at pH 7 and, secondly, its formation is reversible.
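As a rough numerical check on the in vitro ACP labeling conditions quoted earlier in this section (0.2 µM AcpS, 1 µM ACP, 5 µM CoA derivative, complete within 10 min), the short Python sketch below works out what those numbers imply. The turnover count and the CoA excess follow directly from the stated concentrations. The half-life estimate additionally assumes simple first-order kinetics and takes "nearly quantitative" to mean 99% labeled; both are our assumptions, not statements from the text, and all names are hypothetical.

```python
import math

# Conditions quoted in the text for in vitro labeling of E. coli ACP.
ACPS_UM, ACP_UM, COA_UM = 0.2, 1.0, 5.0   # concentrations in µM
TIME_MIN = 10.0                            # time to (near) completion

# Each AcpS molecule must modify this many ACP molecules on average.
turnovers_per_enzyme = ACP_UM / ACPS_UM

# Lower bound on the average turnover rate implied by the 10 min figure.
min_turnover_per_min = turnovers_per_enzyme / TIME_MIN

# CoA derivative left over if every ACP molecule is labeled exactly once.
coa_left_um = COA_UM - ACP_UM


def first_order_half_life(fraction_done, time_min):
    """Half-life if labeling followed simple first-order kinetics.

    Assumption (ours, not from the text): labeled fraction = 1 - exp(-k*t).
    """
    k = -math.log(1.0 - fraction_done) / time_min
    return math.log(2.0) / k
```

Under these assumptions each enzyme molecule turns over five times, at an average of at least 0.5 turnovers per minute, and `first_order_half_life(0.99, TIME_MIN)` comes out at roughly 1.5 min.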
8.2.4 Conclusions and Future Developments
The labeling of AGT and CP-fusion proteins discussed here demonstrates the two main advantages of tag-mediated labeling of fusion proteins. Firstly, proteins can be equipped with functionalities that cannot be genetically encoded. This can be achieved in live cells or in vitro, and possible functionalities range from synthetic fluorophores to ligands that mediate the interaction with other proteins. Secondly, a single fusion protein can be used for a variety of different applications. This second point applies, in particular, to AGT fusion proteins, which can be used for pulse-chase experiments in live cells or for the generation of protein microarrays. Together, these properties make such fusion proteins powerful tools for functional proteomics, and we are convinced that we will see many applications of these and other related technologies in the near future.
What kind of further technological developments can be expected in this area of research? An obvious extension of the previous work is the specific labeling of fusion proteins in multicellular organisms such as Drosophila melanogaster, Caenorhabditis elegans, or mice. Another important development would allow the specific and simultaneous labeling of multiple fusion proteins with different (fluorescent) probes to monitor multiple parameters and proteins simultaneously in one cell. Such multicolor imaging could be achieved either by using different labeling approaches, such as AGT and the tetracysteine tag, or by generating mutants of one tag with so-called orthogonal substrate specificities. As previous experiments have shown, AGT appears to be an ideal candidate for the latter strategy. Furthermore, the active transport of membrane-impermeable compounds for labeling experiments in live cells would significantly extend the general applicability of the approach. Here, the recently described arginine transporters are attractive candidates to achieve
this goal [59, 60]. Research in these directions is currently pursued in a number of laboratories.
Acknowledgments
I.S. was supported by a stipend of the Fonds der chemischen Industrie. We thank the Swiss National Science Foundation, the EPFL, and the Human Frontier Science Program for generous support.
References
1. J.A. Bornhorst, J.J. Falke, Methods Enzymol. 2000, 326, 245-254.
2. D.B. Smith, Methods Enzymol. 2000, 326, 254-270.
3. J. Zhang, R.E. Campbell, A.Y. Ting, R.Y. Tsien, Nat. Rev. Mol. Cell Biol. 2002, 3, 906-918.
4. J. Lippincott-Schwartz, G.H. Patterson, Science 2003, 300, 87-91.
5. S. Fields, O. Song, Nature 1989, 340, 245-246.
6. N. Johnsson, A. Varshavsky, Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 10340-10344.
7. M.R. Martzen, S.M. McCraith, S.L. Spinelli, F.M. Torres, S. Fields, E.J. Grayhack, E.M. Phizicky, Science 1999, 286, 1153-1155.
8. H. Zhu, M. Bilgin, R. Bangham, D. Hall, A. Casamayor, P. Bertone, N. Lan, R. Jansen, S. Bidlingmaier, T. Houfek, T. Mitchell, P. Miller, R.A. Dean, M. Gerstein, M. Snyder, Science 2001, 293, 2101-2105.
9. A.C. Gavin, M. Bosche, R. Krause, P. Grandi, M. Marzioch, A. Bauer, J. Schultz, J.M. Rick, A.M. Michon, C.M. Cruciat, M. Remor, C. Hofert, M. Schelder, M. Brajenovic, H. Ruffner, A. Merino, K. Klein, M. Hudak, D. Dickson, T. Rudi, V. Gnau, A. Bauch, S. Bastuck, B. Huhse, C. Leutwein, M.A. Heurtier, R.R. Copley, A. Edelmann, E. Querfurth, V. Rybin, G. Drewes, M. Raida, T. Bouwmeester, P. Bork, B. Seraphin, B. Kuster, G. Neubauer, G. Superti-Furga, Nature 2002, 415, 141-147.
10. P. Uetz, L. Giot, G. Cagney, T.A. Mansfield, R.S. Judson, J.R. Knight, D. Lockshon, V. Narayan, M. Srinivasan, P. Pochart, A. Qureshi-Emili, Y. Li, B. Godwin, D. Conover, T. Kalbfleisch, G. Vijayadamodar, M. Yang, M. Johnston, S. Fields, J.M. Rothberg, Nature 2000, 403, 623-627.
11. T. Ito, T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, Y. Sakaki, Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 4569-4574.
12. J.P. Miller, R.S. Lo, A. Ben-Hur, C. Desmarais, I. Stagljar, W. Stafford Noble, S. Fields, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 12123-12128.
13. N. Johnsson, K. Johnsson, ChemBioChem 2003, 4, 803-810.
14. L. Wang, P.G. Schultz, Angew. Chem., Int. Ed. Engl. 2004, 44, 34-66.
15. C.J. Noren, J. Wang, F.B. Perler, Angew. Chem., Int. Ed. Engl. 2000, 39, 450-466.
16. T.W. Muir, Annu. Rev. Biochem. 2003, 72, 249-289.
17. I. Giriat, T.W. Muir, J. Am. Chem. Soc. 2003, 125, 7180-7181.
18. J. Farinas, A.S. Verkman, J. Biol. Chem. 1999, 274, 7603-7606.
19. M.M. Wu, J. Llopis, S. Adams, J.M. McCaffery, M.S. Kulomaa, T.E. Machen, H.P. Moore, R.Y. Tsien, Chem. Biol. 2000, 7, 197-209.
20. D.I. Israel, R.J. Kaufman, Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 4290-4294.
21. L.W. Miller, J. Sable, P. Goelet, M.P. Sheetz, V.W. Cornish, Angew. Chem., Int. Ed. Engl. 2004, 43, 1672-1675.
22. L.W. Miller, Y. Cai, M.P. Sheetz, V.W. Cornish, Nat. Methods 2005, 2, 255-257.
23. K.M. Marks, P.D. Braun, G.P. Nolan, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 9982-9987.
24. Y. Sekine-Aizawa, R.L. Huganir, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 17114-17119.
25. B.A. Griffin, S.R. Adams, R.Y. Tsien, Science 1998, 281, 269-272.
26. A. Keppler, S. Gendreizig, T. Gronemeyer, H. Pick, H. Vogel, K. Johnsson, Nat. Biotechnol. 2003, 21, 86-89.
27. D.S. Yeo, R. Srinivasan, M. Uttamchandani, G.Y. Chen, Q. Zhu, S.Q. Yao, Chem. Commun. (Camb.) 2003, 23, 2870-2871.
28. E.G. Guignet, R. Hovius, H. Vogel, Nat. Biotechnol. 2004, 22, 440-444.
29. K.M. Marks, M. Rosinov, G.P. Nolan, Chem. Biol. 2004, 11, 347-356.
30. N. George, H. Pick, H. Vogel, N. Johnsson, K. Johnsson, J. Am. Chem. Soc. 2004, 126, 8896-8897.
31. J.E. Cronan Jr, J. Biol. Chem. 1990, 265, 10327-10333.
32. I. Chen, M. Howarth, W. Lin, A.Y. Ting, Nat. Methods 2005, 2, 99-104.
33. G. Gaietta, T.J. Deerinck, S.R. Adams, J. Bouwer, O. Tour, D.W. Laird, G.E. Sosinsky, R.Y. Tsien, M.H. Ellisman, Science 2002, 296, 503-507.
34. R.Y. Tsien, FEBS Lett. 2005, 579, 927-932.
35. A.E. Pegg, Mutat. Res. 2000, 462, 83-100.
36. A. Lim, B.F. Li, EMBO J. 1996, 15, 4050-4060.
37. A. Juillerat, T. Gronemeyer, A. Keppler, S. Gendreizig, H. Pick, H. Vogel, K. Johnsson, Chem. Biol. 2003, 10, 313-317.
38. A. Juillerat, C. Heinis, I. Sielaff, J. Barnikow, H. Jaccard, B. Kunz, A. Terskikh, K. Johnsson, ChemBioChem 2005, 6, 1263-1269.
39. A. Keppler, H. Pick, C. Arrivoli, H. Vogel, K. Johnsson, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 9955-9959.
40. A. Juillerat, C. Heinis, I. Sielaff, J. Barnikow, H. Jaccard, B. Kunz, A. Terskikh, K. Johnsson, ChemBioChem 2005, 6, 1263-1269.
41. J.E. Bergmann, S.J. Singer, J. Cell Biol. 1983, 97, 1777-1787.
42. Y. Chen, J.D. Mills, A. Periasamy, Differentiation 2003, 71, 528-541.
43. S. Gendreizig, M. Kindermann, K. Johnsson, J. Am. Chem. Soc. 2003, 125, 14970-14971.
44. D.M. Spencer, T.J. Wandless, S.L. Schreiber, G.R. Crabtree, Science 1993, 262, 1019-1024.
45. H.N. Lin, W.M. Abida, R.T. Sauer, V.W. Cornish, J. Am. Chem. Soc. 2000, 122, 4247-4248.
46. M. Kindermann, N. George, N. Johnsson, K. Johnsson, J. Am. Chem. Soc. 2003, 125, 7810-7811.
47. J. LaBaer, N. Ramachandran, Curr. Opin. Chem. Biol. 2005, 9, 14-19.
48. S. Tugulu, A. Arnold, I. Sielaff, K. Johnsson, H.A. Klok, Biomacromolecules 2005, 6, 1602-1607.
49. M. Kindermann, I. Sielaff, K. Johnsson, Bioorg. Med. Chem. Lett. 2004, 14, 2725-2728.
50. N. Johnsson, N. George, K. Johnsson, ChemBioChem 2005, 6, 47-52.
51. S.R. Adams, R.E. Campbell, L.A. Gross, B.R. Martin, G.K. Walkup, Y. Yao, J. Llopis, R.Y. Tsien, J. Am. Chem. Soc. 2002, 124, 6063-6076.
52. R.H. Lambalot, A.M. Gehring, R.S. Flugel, P. Zuber, M. LaCelle, M.A. Marahiel, R. Reid, C. Khosla, C.T. Walsh, Chem. Biol. 1996, 3, 923-936.
53. J.J. La Clair, T.L. Foley, T.R. Schegg, C.M. Regan, M.D. Burkart, Chem. Biol. 2004, 11, 195-201.
54. P.N. Lipke, J. Kurjan, Microbiol. Rev. 1992, 56, 180-194.
55. J. Drews, Science 2000, 287, 1960-1964.
56. J. Yin, F. Liu, X. Li, C.T. Walsh, J. Am. Chem. Soc. 2004, 126, 7754-7755.
57. J. Yin, F. Liu, M. Schinke, C. Daly, C.T. Walsh, J. Am. Chem. Soc. 2004, 126, 13570-13571.
58. D. Beckett, E. Kovaleva, P.J. Schatz, Protein Sci. 1999, 8, 921-929.
59. D. Derossi, G. Chassaing, A. Prochiantz, Trends Cell Biol. 1998, 8, 84-87.
60. J.B. Rothbard, E. Kreider, C.L. VanDeusen, L. Wright, B.L. Wylie, P.A. Wender, J. Med. Chem. 2002, 45, 3612-3618.
PART IV Expanding the Scope of Chemical Synthesis
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim ISBN: 978-3-527-31150-7
9 Diversity-oriented Synthesis
9.1 Diversity-oriented Synthesis
Derek S. Tan
Outlook
Diversity-oriented synthesis (DOS) involves the synthesis of combinatorial libraries of diverse small molecules for biological screening. Rather than being directed toward a single biological target, DOS libraries can be used to identify novel ligands for a variety of targets. These ligands can then be used as powerful probes to investigate biological processes. This chapter discusses the origins of DOS, key enabling technologies, and library design strategies. Several recent examples of novel chemical probes identified from DOS libraries are also described.
9.1.1 Introduction
Small molecules are extremely powerful tools for studying biological systems [1]. They allow rapid and conditional modulation of biological functions, often in a reversible, dose-dependent manner. Moreover, they can modulate individual functions of multifunctional targets and distinguish between different posttranslational modification and conformational states of proteins. These features make the chemical genetic, or pharmacological, approach a valuable complement to genetic and RNA interference-based methods, particularly for dissecting complex, dynamic biological processes. Small molecules can also be used to illuminate new potential therapeutic targets and provide a very direct means of validating these targets in model systems.
However, the identification of new, highly specific small molecule probes remains a major challenge in chemical biology. Structure- or mechanism-based rational design is sometimes feasible when a single protein target and a natural ligand are known. Conversely, high-throughput screening (HTS) of small molecule libraries provides a practical and effective solution for individual targets that may be less well characterized and for systems that involve multiple targets. Diversity-oriented synthesis (DOS) has emerged as a valuable approach to generate combinatorial libraries for use in these screens, particularly novel libraries that explore untapped or underrepresented regions of biologically relevant chemical structure space [2]. Efforts in DOS have led to the discovery of powerful new biological probes and have also spurred continuing advances in synthetic organic chemistry.
9.1.2 History/Development
Early efforts in DOS were reported in the 1990s. However, several key synthetic technologies that were developed earlier laid the foundations for DOS. Foremost among these are (1) solid-phase synthesis and related separation techniques and (2) combinatorial synthesis.
9.1.2.1 Solid-phase Synthesis and Related Separation Techniques
9.1.2.1.1 Solid-phase Synthesis
Solid-phase peptide synthesis was developed by Merrifield in the early 1960s [3]. Subsequently, solid-phase strategies were developed for the synthesis of oligonucleotides, as well as for nonbiopolymer small molecules, such as synthetic drugs and natural products. In a solid-phase synthesis, the starting material (e.g., first residue or molecular scaffold) is attached to an insoluble solid support, such as a polymer or glass bead, via a chemical “linker” that can be cleaved under specific, orthogonal reaction conditions [4] (Fig. 9.1-1(a)). The support-bound substrate is then exposed to solutions of reagents to effect building block coupling reactions or other chemical transformations. The support-bound product is then separated from the excess reagents and reaction by-products simply by rinsing the solid supports with appropriate solvents. This allows stoichiometric excesses and multiple couplings to be used to drive these reactions to completion. This process is repeated at each step in the synthesis. Finally, the product is cleaved from the solid supports and purified as necessary, using standard techniques such as column chromatography. Thus, solid-phase synthesis provides a rapid, convenient, and often automatable means to isolate synthetic intermediates that circumvents the need for tedious purifications at each step of a multistep synthesis.
9.1.2.1.2 Precipitation Tags and Fluorous Tags
A number of other related strategies have been developed more recently to facilitate the recovery and handling of synthetic intermediates [5]. The heterogeneous nature of solid-phase reactions sometimes causes problems related to reaction kinetics, due to the slow diffusion of reagents into the polymer matrix. Monitoring reaction progress also presents a new challenge. It generally requires either cleavage of an aliquot of the solid support, which introduces an additional chemical reaction that may alter or degrade the reaction product, or recourse to “on-bead” spectroscopic analysis, which often suffers from poor sensitivity and resolution. Thus, in one alternative technique, the starting material is linked to a “precipitation tag” or “phase switch” that is soluble under most reaction conditions (Fig. 9.1-1(b)). Non-cross-linked polymers such as poly(ethylene glycol) and polystyrene have been used for this purpose, as well as individual functional groups. This allows standard homogeneous reaction conditions and analytical techniques to be used during the synthesis. The reaction product is then precipitated from the reaction mixture by adding a solvent or reagent that induces phase switching of the tag. The precipitated product is rinsed with the appropriate solvents to remove excess reagents and reaction by-products. Ideally, the product can then be resolubilized to allow subsequent reactions in the synthetic sequence.
Along similar lines, polyfluorocarbon chains have been used as “fluorous tags” that are soluble in both organic and fluorocarbon solvents [59] (Fig. 9.1-1(c)). This again allows homogeneous reaction conditions to be employed. The tagged reaction products are easily separated from the reagents and reaction by-products by extraction of the reaction mixture with an immiscible fluorocarbon solvent or by passage over a column of fluorinated silica gel. The recovered products can then be subjected to subsequent downstream reactions.
9.1.2.1.3 Solid-supported Reagents and Scavengers
As a variation of this theme, reagents, instead of reaction substrates, can also be attached to solid supports or otherwise tagged to facilitate their separation from reaction mixtures [6]. This approach is particularly useful for reagents or reaction by-products that are difficult to remove by extraction or silica gel chromatography. Carbodiimide coupling reagents and their urea by-products are excellent examples (Fig. 9.1-1(d)). Furthermore, solid supports carrying reactive functional groups can be used as “scavengers” to trap excess reagents and reaction by-products [7]. This allows completely homogeneous reaction conditions to be used, followed by direct addition of or passage over a column of the appropriate scavenger-bearing supports. Thus, for example, excess isocyanates are readily trapped with solid supports having primary amine functionalities (Fig. 9.1-1(e)).
Fig. 9.1-1 Separation techniques used in diversity-oriented synthesis. (a) Solid-phase synthesis allows reaction products to be separated easily from excess reagents and reaction by-products by rinsing the solid supports (yellow) with appropriate solvents. At the end of the synthesis, the products are usually cleaved from the solid support for screening. (b) Precipitation tags or phase switches (red) allow homogeneous reaction conditions, but can then be precipitated (blue) from the crude reaction mixture by addition of the appropriate solvent or reagent. The reaction products are again easily separated from excess reagents and reaction by-products. Ideally, the precipitation tag can then be resolubilized for subsequent reactions. (c) Fluorous tags (purple) are soluble in organic solvents, again allowing homogeneous reaction conditions to be used. The reaction products can then be separated from excess reagents and reaction by-products by extraction with an immiscible fluorocarbon solvent, or by passage over a column of fluorinated silica gel (not shown). (d) Solid-supported reagents, such as the carbodiimide shown, are used with substrates in solution to facilitate removal of reaction by-products that may be difficult to separate using traditional extraction or chromatographic techniques. (e) Solid-supported scavengers, such as the amine shown, are used to remove excess reagents or reaction by-products from solution phase reactions.
9.1.2.2 Combinatorial Synthesis
9.1.2.2.1 The Power of Combinatorialization
Combinatorial synthesis actually traces its origins to biological processes. For example, the genetic recombination processes at the heart of the immune response involve mixing and matching of various gene segments to produce libraries of antibodies and cell surface receptors. Similarly, combinatorial chemistry involves systematic mixing and matching of various chemical building blocks and transformations to generate libraries of small molecules. Importantly, solid-phase synthesis allows convenient handling and distribution of synthetic intermediates to facilitate this combinatorialization process. This feature was initially leveraged to generate combinatorial peptide libraries. Subsequently, as with solid-phase synthesis, combinatorial chemistry has been extended to the synthesis of oligonucleotide and nonbiopolymer small molecule libraries. The power of combinatorialization in systematically generating large numbers of compounds is readily demonstrated by the following example: Consider that there are 20 naturally occurring amino acids in humans. This seems a rather small number in comparison to the tremendous diversity of protein structures and functions that result by simply coupling these 20 monomers. However, when one considers the exponentially increasing number of combinatorial possibilities in the progressively longer polypeptide chains below, the source of this diversity becomes clear. (For the sake of simplicity, additional possibilities arising from the disulfide formation between cysteine residues are omitted from this analysis.)
Amino acid: 20 = 20^1 = 20 monomers
Dipeptide: 20 × 20 = 20^2 = 400 combinations
Tripeptide: 20 × 20 × 20 = 20^3 = 8000 combinations
Tetrapeptide: 20 × 20 × 20 × 20 = 20^4 = 160 000 combinations
Decapeptide (10-mer): 20 × 20 × 20 × 20 ... = 20^10 = 1.02 × 10^13 combinations
Icosapeptide (20-mer): 20 × 20 × 20 × 20 ...... = 20^20 = 1.05 × 10^26 combinations
Hectapeptide (100-mer): 20 × 20 × 20 × 20 ......... = 20^100 = 1.27 × 10^130 combinations
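The counts tabulated above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `sequence_count` is hypothetical, and the calculation assumes, as the text does, that every position in the chain can hold any of the 20 amino acids independently.

```python
def sequence_count(n_building_blocks: int, length: int) -> int:
    """Number of distinct linear sequences of `length` residues when each
    position can hold any one of `n_building_blocks` building blocks."""
    return n_building_blocks ** length


if __name__ == "__main__":
    chains = [
        ("amino acid", 1),
        ("dipeptide", 2),
        ("tripeptide", 3),
        ("tetrapeptide", 4),
        ("decapeptide", 10),
        ("icosapeptide", 20),
        ("hectapeptide", 100),
    ]
    for name, length in chains:
        count = sequence_count(20, length)
        # Small counts are printed exactly; large ones in scientific notation.
        shown = f"{count:,}" if count < 10**6 else f"{count:.2e}"
        print(f"{name:>13} ({length:>3}-mer): 20^{length} = {shown}")
```

Python integers have arbitrary precision, so even 20^100 is computed exactly before being rounded for display.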
The separation techniques described in Section 9.1.2.1 allow rapid parallel processing of synthetic reactions, a critical requirement for the synthesis of combinatorial libraries that may contain hundreds, thousands, or even millions of members. Several synthetic formats are discussed below, using two examples. First, we will consider a library of all the possible tetrapeptides formed using any of the four amino acids: aspartate (D), histidine (H), lysine (K), and threonine (T). There are 4 × 4 × 4 × 4 = 4^4 = 256 combinations. Second, we will consider a library of all the possible tetrapeptides formed using any of the 20 natural amino acids. There are 160 000 combinations as indicated above.
9.1.2.2.2 Mixture Synthesis
Conceptually, the most straightforward approach to combinatorial synthesis is to simply use all the possible building blocks in a one-pot reaction at each step of the synthesis. Using this approach, the 256-member library described above could be synthesized using only four coupling reactions, in which all four possible amino acid building blocks are used in each reaction (Fig. 9.1-2(a)). (For simplicity, the requisite intermediate N-deprotection reactions and the final cleavage step are omitted from this analysis. Further, the relative stoichiometries of the amino acids would need to be adjusted for differential reactivities.) Interestingly, the 160 000-member library can also be synthesized using only four coupling reactions, in which all 20 amino acids are used at each step. Although this approach is extremely efficient synthetically, it creates a daunting problem in terms of subsequent biological evaluation of the libraries, since the "product" of the synthesis is a complex mixture of all library members. The library members cannot be screened individually, and so the only hope of identifying active compounds is through in vitro binding assays. Recourse to Edman sequencing provides some possibilities when using peptide libraries; however, identifying active compounds from libraries of nonbiopolymer small molecules represents a major, if not impossible, analytical challenge. It is worth noting, however, that processes based on selection, rather than screening, would allow amplification of the active library members to facilitate their eventual identification using standard analytical techniques. This is the basis of the phage display selection protocols used for peptide libraries. Recent progress in the DNA-templated synthesis of small molecules suggests that one day this may be possible for nonbiopolymer small molecule libraries as well [8].
9.1.2.2.3 Parallel Synthesis
The analytical problems caused by mixture synthesis can be overcome by using parallel synthesis. In this approach, each library member is synthesized in a separate reaction vessel and, thus, can later be screened individually. Key early efforts in this area were taken separately by Geysen and Houghten [9, 10]. The efficiency of separation techniques such as solid-phase synthesis allows the parallel synthesis of relatively small libraries to be accomplished quite readily. For example, the 256-member library above could be synthesized using 256 × 4 = 1024 coupling reactions, which could probably be accomplished in a week or two (Fig. 9.1-2(b)). Additional efficiencies can be achieved by carrying out the earlier coupling reactions in larger combined batches, which are then progressively split into smaller batches as the synthesis proceeds. In this fashion, the synthesis can be accomplished using 4 + 16 + 64 + 256 = 340 coupling reactions. Even so, manual parallel synthesis rapidly becomes unmanageable for larger libraries and recourse to expensive robotics becomes necessary. For example, the 160 000-member library above would require 160 000 × 4 = 640 000 coupling reactions under totally parallel conditions, and 20 + 400 + 8000 + 160 000 = 168 420 coupling reactions using partially batched protocols.
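The coupling-reaction counts quoted for mixture synthesis and for the two parallel formats follow from simple formulas, sketched below. The function names are hypothetical, and, as in the text, only coupling steps are counted (deprotection and cleavage are ignored).

```python
def mixture_couplings(n_blocks: int, n_steps: int) -> int:
    """One-pot mixture synthesis: a single coupling reaction per step."""
    return n_steps


def parallel_couplings(n_blocks: int, n_steps: int) -> int:
    """Fully parallel synthesis: every library member is built in its own
    vessel, so each of the n_blocks**n_steps members needs n_steps couplings."""
    return n_blocks ** n_steps * n_steps


def batched_parallel_couplings(n_blocks: int, n_steps: int) -> int:
    """Partially batched parallel synthesis: step i is run once for each of
    the n_blocks**i distinct intermediates formed at that step."""
    return sum(n_blocks ** i for i in range(1, n_steps + 1))


if __name__ == "__main__":
    for n_blocks in (4, 20):
        print(
            f"{n_blocks} building blocks, tetrapeptides:",
            mixture_couplings(n_blocks, 4), "(mixture),",
            parallel_couplings(n_blocks, 4), "(parallel),",
            batched_parallel_couplings(n_blocks, 4), "(batched parallel)",
        )
```

Batching reduces the work by roughly a factor of the chain length, but the count still grows with the size of the library, which is why larger libraries quickly outrun manual parallel synthesis.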
9.1.2.2.4 Split-Pool Synthesis
A third alternative approach, split-pool synthesis, combines much of the synthetic efficiency of mixture synthesis with the ease of screening provided by parallel synthesis. This method was developed separately by Furka and Lam [11, 12]. At each step in the synthesis, each individual building block is coupled to the solid supports in a separate reaction vessel. The solid supports are then combined (pooled), mixed, and redistributed (split) into new reaction vessels, one for each building block to be coupled in the next step. The resulting stochastic distribution of solid supports provides roughly equal numbers of the different synthetic intermediates in each of the subsequent reactions. Typically, enough solid supports are used to yield at least three “copies” of the library to increase the probability that each desired library member will be represented at least once. Thus, the 256-member library above could be synthesized using only 4 × 4 = 16 coupling reactions (Fig. 9.1-2(c)) and the 160 000-member library could be produced with only 20 × 4 = 80 coupling reactions! The split-pool process does provide a mixture of solid supports at the end of the synthesis. However, a critical feature of split-pool synthesis is that each solid support has been exposed to only a single reaction sequence. Thus, assuming ideal reaction efficiency, each solid support carries only a single library member, which can be cleaved and assayed individually. One drawback of split-pool synthesis is that the pooling steps obscure the precise identity of each individual library member. Thus, either recursive deconvolution or encoding strategies must be used to determine the identities of active library members.
Fig. 9.1-2 Synthetic strategies used to generate combinatorial libraries. Several approaches to a 256-member library of tetrapeptides, composed of the four amino acids aspartate (D), histidine (H), lysine (K), and threonine (T), are shown. For simplicity, only the coupling reactions are considered in these analyses. (a) Mixture synthesis involves simultaneous coupling of all building blocks in one-pot reactions. The synthesis requires only four coupling steps, but provides a complex mixture of 256 products, complicating screening and identification of active library members. (b) In parallel synthesis, each library member (3 out of 256 are shown) is synthesized in a separate reaction vessel, allowing each to be cleaved, purified, and screened individually. Since each tetrapeptide requires four coupling steps, the overall synthesis requires 256 × 4 = 1024 coupling reactions. For larger libraries, recourse to robotics may be necessary. (c) In split-pool synthesis, each building block is coupled to the solid supports in a separate reaction vessel. The solid supports are then pooled, mixed, and split into new reaction vessels to provide a stochastic distribution of substrates for the next reaction. Generally, at least three copies of a library are synthesized to maximize the probability that each putative library member is represented at least once. Since there are four coupling reactions required at each step, the overall synthesis requires 4 × 4 = 16 coupling reactions. Importantly, each solid support has been exposed to only a single synthetic sequence, and hence carries only a single library member. Encoding the solid supports with orthogonal chemistry or physical methods (e.g., TAGT) allows the history of each bead to be reconstructed to identify active library members. (d) Recursive deconvolution can also be used to identify active library members. In the first round, the last set of reaction products is not repooled, but is screened separately so that, for a given active library member (*), the identity of the final (N-terminal) building block is known. Using this information, progressively smaller sublibraries are made with an increasing number of fixed building blocks until the identities of all the building blocks have been determined.
9.1.2.2.5 Recursive Deconvolution
One approach to identifying individual members of split-pool libraries is recursive deconvolution, which involves resynthesis and screening with progressively smaller sublibraries, based on the initial screening results of the primary library [13]. So, in the example of the 256-member library above, the products of the fourth coupling step are not pooled prior to distribution of the individual solid supports for the final cleavage step (Fig. 9.1-2(d)). Thus, the identity of the fourth (N-terminal) residue is known for each compound being screened. Importantly, the compounds must still be cleaved and screened individually to avoid the complications of compound mixtures. Once an active library member is found, a smaller sublibrary of 4^3 = 64 tetrapeptides is synthesized in which the N-terminal residue is fixed according to the active library member. In this case, the products of the third coupling step are not pooled prior to coupling of the fourth residue and cleavage. Thus, the identities of the third and fourth residues are known for each compound being screened. This process is continued with a sublibrary of 4^2 = 16 tetrapeptides having fixed third and fourth residues, then a final sublibrary of four tetrapeptides having fixed second, third, and fourth residues, until a single active library member is identified. The process is analogous for the 160 000-member library.
9.1.2.2.6 Encoding
Recursive deconvolution is effective but time consuming. An alternative approach to identifying individual members of split-pool libraries is to use a physical or chemical method to encode the building block coupled in each reaction vessel [13]. This can be accomplished by attaching an inert chemical “tag” to the solid supports using orthogonal reactions (Fig. 9.1-2(c)). Once an active library member is found, its identity can be determined by decoding the tags from the corresponding solid support. Notably, the tags do not identify the structure of the product directly, but instead provide a history of reaction conditions to which the solid support has been exposed. This reaction sequence must then be repeated to determine the structure of the library member using standard analytical techniques. Another approach involves physical tagging of the solid supports with “barcoding” devices. This can be accomplished by direct modifications to the solid supports or by enclosing the solid supports in a small permeable reaction vessel along with the tag. A variety of tags have been used for this purpose, ranging from Houghten’s original “tea bags” [14], to colored plastic pegs, to fluorescent colloids, to radiofrequency transponders, to laser-etched two-dimensional barcodes. In some cases, if the barcodes are assigned at the very beginning of the synthesis and are then read prior to each split step, an exactly even distribution of synthetic intermediates can be accomplished, allowing synthesis of exactly one copy of the library. Again, the tag only tracks the history of reactions to which the solid support has been exposed.
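The split-pool procedure and the idea of an encoded reaction history can be illustrated with a small simulation. This is a simplified sketch, not a model of any published protocol: the function `split_pool` and its parameters are hypothetical, each bead is represented only by the list of tags it has accumulated, and the "split" is idealized as a random shuffle followed by division into exactly equal portions.

```python
import random


def split_pool(building_blocks, n_steps, copies=3, seed=0):
    """Simulate split-pool synthesis on encoded beads.

    Returns (histories, couplings): one tag history per bead, written in
    coupling order, and the number of coupling reactions (vessels) run.
    """
    rng = random.Random(seed)
    n_blocks = len(building_blocks)
    # One empty tag record per bead; `copies` beads per possible member.
    beads = [[] for _ in range(copies * n_blocks ** n_steps)]
    couplings = 0
    for _ in range(n_steps):
        rng.shuffle(beads)                    # pool and mix
        for i, block in enumerate(building_blocks):
            for bead in beads[i::n_blocks]:   # split: one vessel per block
                bead.append(block)            # couple block, record its tag
            couplings += 1
    return ["".join(tags) for tags in beads], couplings


if __name__ == "__main__":
    histories, couplings = split_pool("DHKT", n_steps=4)
    print(couplings, "coupling reactions on", len(histories), "beads")
    print(len(set(histories)), "of", 4 ** 4, "possible histories represented")
```

Each bead carries exactly one history, mirroring the one-bead, one-compound property described above, and reading that history back is the decoding step. Because the distribution is stochastic, the number of distinct histories actually present can fall short of the full library, which is the reason for synthesizing several copies.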
9.1.2.3 Early Efforts in Diversity-oriented Synthesis
Although the term “diversity-oriented synthesis” had not yet been coined, early efforts in the synthesis of diverse, nonoligomeric small molecule libraries were taken by several labs in the 1990s. Ellman and Hobbs DeWitt separately developed libraries built around benzodiazepines, privileged scaffolds that had been shown to bind a variety of biological targets. Ellman synthesized a library of 192 benzodiazepines by parallel solid-phase synthesis using 2 × 12 × 8 building blocks with a solid-phase cyclization reaction as the key step [15] (Fig. 9.1-3(a)). Hobbs DeWitt synthesized a library of 40 benzodiazepines by parallel solid-phase synthesis using 5 × 8 building blocks [16] (Fig. 9.1-3(b)). In this case, the key cyclization reaction occurred concurrently with cleavage from the solid support, providing products with >90% purity, which could be used directly in biological screening.
Schreiber’s early efforts in this area were focused on libraries of compounds having structural features reminiscent of rigid, complex, stereochemically rich natural products. In a key early example, solid-phase split-pool synthesis was used to generate a combinatorial library of over two million complex, polycyclic compounds derived from shikimic acid [17]. A stereoselective tandem acylation-nitrone cycloaddition was used to generate 18 tetracyclic scaffolds, to which 30 alkynes were coupled using a Sonogashira reaction, 62 amines were coupled via γ-lactone aminolysis, and 62 carboxylic acids were coupled by alcohol esterification (Fig. 9.1-3(c)). In addition, a portion of the solid supports was left unreacted at each of the last three steps to generate a “skip codon” that further increased the diversity of the library.
Bartlett proposed some early guidelines for library synthesis: (a) the sequence should involve a small number of steps, (b) no more than one variable should be introduced in any step, (c) starting materials should be readily obtained with a diverse selection of substituents, and (d) cyclic, nonoligomeric structures represent the most interesting targets [18]. Furthermore, Armstrong did some early work toward libraries composed of multiple scaffolds, derived from common synthetic intermediates [19, 20]. In one case, Ugi multicomponent coupling reaction products were converted to various linear and cyclic derivatives (Fig. 9.1-4(a)). In another, squaric acid was proposed as a precursor that could be converted to multiple cyclic and polycyclic products (Fig. 9.1-4(b)) and several such transformations were demonstrated.
DOS has evolved significantly since these early efforts, particularly in the areas of library design and synthetic planning. DOS has also proven to be a fertile ground for the development of new chemistry. These topics are discussed in greater detail in Section 9.1.3 below. DOS libraries have also provided a variety of powerful new probes to dissect biological systems. Recent examples are discussed in Section 9.1.4 below.
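The size of the shikimic acid-derived library can be checked from the building block numbers given above together with the note in the caption to Fig. 9.1-3 that products of the aminolysis “skip codon” are not substrates for the final acylation. The sketch below is one reading of that bookkeeping; the function name and the step-by-step breakdown are assumptions made for illustration.

```python
def shikimic_library_size(scaffolds=18, alkynes=30, amines=62, acids=62):
    """Count library members, treating each 'skip codon' (no reaction at a
    step) as one extra option, except that unreacted gamma-lactones cannot
    undergo the final alcohol esterification."""
    # Sonogashira step: each scaffold carries one of the alkynes or none.
    after_sonogashira = scaffolds * (alkynes + 1)
    # Aminolysis skipped: the lactone stays closed, so no alcohol is freed
    # and these products end here.
    unreacted_lactones = after_sonogashira
    # Aminolysis performed: one product per amine, each with a free alcohol.
    aminolysed = after_sonogashira * amines
    # Esterification step: one of the acids, or skipped.
    esterification_products = aminolysed * (acids + 1)
    return esterification_products + unreacted_lactones


if __name__ == "__main__":
    print(f"{shikimic_library_size():,} library members")
```

With the default values this reproduces the 558 separately counted tetracyclic products (18 × 31) and the overall figure of 2 180 106 members quoted in the caption.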
9.1.2.4 Related Alternative Strategies
Several related approaches that are complementary to the synthesis and screening of combinatorial libraries deserve mention. Many of these can be grouped under the broad heading of fragment-based ligand discovery [21]. This involves identification of two or more low-molecular-weight “fragments” that bind to an individual protein target of interest. Importantly, the individual fragments can bind with very low affinities (micromolar to millimolar), but once they are covalently linked, through either deliberate laboratory synthesis or in situ target-directed coupling, high-affinity (nanomolar) ligands can be obtained. These fragment-based approaches have proven to be an effective means to identify new ligands, although they require selection of an individual biological target and are currently limited to biochemical screening methods. In addition to traditional “wet” screening, in silico “virtual” screening has also been used to identify new ligands or ligand fragments [22, 23].
Fig. 9.1-3 Early efforts in diversity-oriented synthesis of combinatorial libraries. (a) Ellman’s solid-phase parallel synthesis of a 192-member library built around a benzodiazepine scaffold, a privileged structure found in many synthetic drugs [15]. (b) Hobbs DeWitt’s solid-phase parallel synthesis of a 40-member library of benzodiazepines featuring a cyclorelease strategy [16]. (c) Schreiber’s solid-phase split-pool synthesis of a 2 180 106-member library of rigid, complex, stereochemically rich, natural product-like polycyclics featuring a stereoselective tandem acylation-nitrone cycloaddition reaction [17]. The library size calculations are adjusted for the fact that the aminolysis “skip codon” (unreacted γ-lactone) leaves 558 tetracyclic products that are not substrates for the final alcohol acylation step and, thus, must be counted separately.
Computational algorithms are used to “dock” potential binders to an experimentally determined protein structure, or to a homology model based on a similar protein. The virtual hits are then purchased or synthesized and binding is confirmed in traditional wet experiments. This approach can be more cost-effective than wet screening and has successfully produced a number of new ligands. However, it again requires selection of an individual biological target and is also dependent on the availability of structural information about that target.
Fig. 9.1-4 Early efforts toward multiscaffold libraries. (a) Armstrong converted an Ugi multicomponent reaction product to several linear and cyclic derivatives [19] and (b) proposed squaric acid as a versatile precursor to various cyclic scaffolds, demonstrating several such reactions [20].
9.1.3 General Considerations
Current efforts in DOS are focused in three areas. First, a variety of library design strategies are being explored to generate libraries that will provide new biologically active molecules to probe a wide range of targets. Second, new synthetic strategies are being developed to generate structural diversity in a flexible, efficient fashion. Third, new chemical methodologies are being developed to meet the stringent demands of DOS on reaction efficiency and selectivity.
9.1.3.1 DOS Library Design Strategies
9.1.3.1.1 Chemical and Biological Space
Chemical structure space [24], the complete set of all possible small molecules, has been variously calculated to contain 10^30-10^200 structures, depending on the algorithms used and the upper limits placed on molecule size. Clearly, it is impossible to synthesize all the possible small molecules. Moreover, even the largest industrial screening campaigns are limited to ≈ 10^6 compounds, a practically infinitesimal fraction of the total possibilities. Fortunately, however, one can expect that only a fraction of that space will comprise molecules that are stable and soluble in aqueous media, have appropriate functional groups to interact with biological targets such as proteins and nucleic acids, and have sufficient structural complexity to do so with useful levels of specificity. Additional structural constraints are imposed when cell permeability or bioavailability in whole organisms is considered. Thus, a key goal in DOS is to design combinatorial libraries that target the biologically relevant regions of chemical structure space. To address this issue, most DOS library design strategies leverage information about known biologically active small molecules to generate compounds that will target these regions in a similar manner. These can be based on synthetic drugs or natural products and both classes are attractive, complementary starting points for DOS library design.
9.1.3.1.2 Drug-like Libraries
Synthetic drugs are often built around nitrogen-containing heteroaromatic scaffolds that are of appropriate size to bind in the active site pockets of biological targets such as enzymes and G protein-coupled receptors (Fig. 9.1-5(a)). They tend to have few or no stereogenic centers, which greatly simplifies their synthesis. Some of these scaffolds have been identified as privileged structures in that they have an empirically demonstrated ability to bind multiple classes of protein targets [25, 26]. The benzodiazepine scaffold is a classic example. Although the underlying basis for this “privileged” standing is not well understood, it has been suggested that conservation of protein folds may contribute toward this [25, 27]. These common drug scaffolds often serve as the basis for DOS of “drug-like” libraries [29]. Furthermore, since synthetic drugs are most useful when orally bioavailable, these library designs often take into account physicochemical properties that have been found to correlate with this feature [30]. Notably, many of the current commercially available libraries still fail to recapitulate these properties [28]. Thus, efforts continue to develop drug-like libraries that match the properties of known synthetic drugs more closely. Recent attention has also been drawn to generating “lead-like” libraries [31]. During the drug development process, lead optimization to provide clinical candidates often results in increased molecular weight and hydrophobicity, factors that can adversely affect permeability and solubility. Thus, lead-like libraries are composed of relatively simple, low-molecular-weight, hydrophilic compounds that must be screened at relatively high concentrations but are then more suitable candidates for medicinal chemistry optimization.
9.1.3.1.3 Natural Product-like Libraries
Natural products exhibit tremendous structural diversity and often have increased size and structural complexity compared to synthetic drugs (Fig. 9.1-5(b)). They frequently contain a greater proportion of oxygen than nitrogen heteroatoms and a significant number of stereogenic centers [28]. Although clinically used natural products are sometimes not orally bioavailable, they are able to address a wider range of biological targets than synthetic drugs. For example, rather than acting as ligands that bind in a protein pocket, glycopeptide antibiotics such as vancomycin act as receptors for the C-terminal D-Ala-D-Ala of bacterial peptidoglycan precursors. Moreover, protein-protein interactions, which have historically been very difficult targets for synthetic drugs [32, 33], can often be modulated with natural products. The natural product anticancer drugs paclitaxel (Taxol) and vincristine are examples that modulate tubulin protein-protein interactions. Thus, DOS of “natural product-like” libraries is a major area of current interest. Library design strategies have been divided into three general categories, according to the degree of structural similarity to natural products [34, 35]: (a) libraries based on the core scaffold of an individual natural product, (b) libraries based on specific structural motifs that are found across a class of natural products, and (c) libraries that emulate the structural characteristics of natural products in a more general sense. Each strategy balances the degree of connection to natural product structure space against the accessibility of structural diversity that is likely required to address multiple different biological targets. Interestingly, some structural motifs originally found in natural products have subsequently been identified as privileged structures and have been used in synthetic drugs. Some examples include purines, indoles, and benzopyrans [26].
Fig. 9.1-5 Structures of synthetic drugs and natural products. (a) Representative examples of approved synthetic drugs, which are often rich in aromatic rings and nitrogen atoms: Diflucan (fluconazole), Cipro (ciprofloxacin), Paxil (paroxetine), Claritin (loratadine), and Viagra (sildenafil). (b) Representative examples of clinically used natural products, which are often rich in stereochemical features and complex ring systems: penicillin G, amphotericin B, vancomycin, vincristine, and paclitaxel (Taxol). For a recent comparison of synthetic drugs and natural products, see Ref. [28]. See also Fig. 9.1-7.
9.1.3.2 DOS Synthetic Strategies
9.1.3.2.1 New Synthetic Planning Strategies
DOS requires the development of new synthetic planning strategies. Several early ideas were outlined by Bartlett (see Section 9.1.2.3). More recently, Schreiber has advanced the concept of “forward synthetic analysis” as a more formal approach to DOS planning [36, 37]. Chemical transformations are classified as generating structural diversity and/or complexity, both of which are considered as important aims in DOS. Diversity allows a given library to address a wide range of biological targets while complexity provides compounds having useful levels of biological specificity [38]. To maximize the efficiency of a synthetic route, each diversity-generating reaction should provide products that are substrates for another such reaction. For example, in Schreiber’s two million-member library (see Section 9.1.2.3), γ-lactone aminolysis simultaneously couples a building block and unmasks a hydroxyl group for the subsequent acylation step (Fig. 9.1-3(c)). Diversity-generating reactions can be grouped into those that afford appendage, stereochemical, and skeletal diversity. Complexity-generating reactions are also desirable to convert simple substrates into complex products. In general, such products will have multiple ring systems and stereochemical features. Complexity can also be quantitated using graph theoretical and structure-based approaches [38]. Ideally, complexity-generating reactions also introduce structural diversity simultaneously, although this is not always possible. In the two million-member library example above, the tandem acylation-nitrone cycloaddition reaction generates structural complexity, while concurrently introducing a degree of diversity through the use of ortho, meta, and para substituted aromatics. Developing synthetic routes that provide skeletal diversity with multiple core scaffolds or backbone structures is an area of particular current interest. Several approaches to generating such multiscaffold libraries have been advanced.
In one straightforward strategy, Schultz synthesized a 45 140-member library from multiple heterocyclic scaffolds, each having a set of functional groups in common [39]. The scaffolds were coupled as building blocks, which were then all substrates for the same set of subsequent appendage-coupling reactions (Fig. 9.1-6(a)). In another approach, originally advanced by Armstrong (see Section 9.1.2.3), a set of substrates bearing a common functional group is exposed to different complexity-generating reactions to provide a variety of new scaffolds. For example, Schreiber has used a triene intermediate in various appendage-coupling consecutive Diels-Alder reactions to generate multiple complex polycyclic scaffolds in a 29 400-member library [40] (Fig. 9.1-6(b)). Alternatively, substrates bearing different functional groups can be exposed to a single set of reaction conditions that generates different products depending on the “programming” provided by the functional groups. In a key demonstration of this strategy, Schreiber used an oxidative-acidic reaction sequence to generate multiple scaffolds from furan precursors in a 1260-member library [41] (Fig. 9.1-6(c)). The nature of each scaffold was determined by the functionality of the furan sidechains.
Fig. 9.1-6 Approaches to skeletal diversity. (a) Schultz used multiple heterocyclic scaffolds, all having a common set of functional groups, as building blocks that could then undergo the same set of appendage-coupling reactions [39]. (b) Schreiber used consecutive stepwise Diels-Alder cycloaddition reactions with various dienophiles to generate multiple scaffolds from precursors, all having a triene functionality in common [40]. (c) Schreiber used a set of differentially functionalized furan precursors to generate multiple scaffolds under a single set of reaction conditions [41]. The nature of the scaffold was determined by the functionalization of the furan sidechains.
9.1.3.2.2 Assessing Library Diversity
A key goal of these synthetic strategies is to increase the structural diversity provided by each individual library. Since DOS libraries are not directed toward a single biological target, their utility is based on their ability to provide selective probes for multiple different targets. While this “functional diversity” can be assessed only through biological screening, “structural diversity” is often used as an intermediate metric, since it is more readily accessible and likely correlates, at least to some extent, with functional diversity. In both cases, a key tool for analyzing diversity (and similarity) is a statistical method called principal component analysis (PCA) [42]. In this process, a set of n descriptors is defined for each compound in the library. These can be structural descriptors, such as molecular weight; physicochemical descriptors, such as experimentally determined artificial membrane permeability; or biological descriptors, such as binding constants. Each compound can then be represented as a vector in n-dimensional space. Of course, for n > 3 such vectors are difficult to visualize. Thus, PCA is used to analyze the entire data set and to define new unitless axes, called principal components or eigenvectors. Each new axis is a linear combination of the original descriptors, calculated to represent as much of the variance in the dataset as possible in each successive principal component, based on correlations between the original descriptors. The new axes are orthogonal and uncorrelated. Each compound can then be replotted as a vector in readily visualized one-, two-, or three-dimensional space using its coordinates, or scores, on these new axes (Fig. 9.1-7). This representation limits the loss of information relative to the original n-dimensional dataset and allows further processing using statistical methods such as clustering or partitioning [42]. It is important to recognize that the PCA results are highly dependent on the compounds selected for analysis and the descriptors used for each compound, especially for small datasets and for those with outliers. However, PCA has been useful in comparing the molecular properties of synthetic drugs, natural products, and commercial combinatorial libraries [28] and in visualizing small molecule inhibitors of protein-protein interactions in comparison to commercial libraries [33]. Moreover, PCA has proven to be a powerful tool for analyzing biological screening data to assess the functional diversity or similarity of small molecules (see Section 9.1.4.2).
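The PCA procedure described above can be illustrated with a minimal sketch in Python using NumPy. It is not the analysis used for Fig. 9.1-7; the function name pca, the compound labels, and all descriptor values below are made up for illustration. The sketch standardizes each descriptor, diagonalizes the resulting correlation matrix, and projects each compound onto the first two principal components.

```python
import numpy as np

def pca(descriptors, n_components=2):
    """Return (scores, loadings, explained) for a compounds x descriptors table."""
    X = np.asarray(descriptors, dtype=float)
    # Standardize each descriptor (zero mean, unit variance) so that
    # descriptors with large numeric ranges, such as molecular weight,
    # do not dominate the analysis.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # The covariance of standardized data is the correlation matrix
    # of the original descriptors.
    corr = np.cov(Z, rowvar=False)
    # eigh is intended for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components]       # new axes (principal components)
    scores = Z @ loadings                      # coordinates of each compound
    explained = eigvals / eigvals.sum()        # fraction of variance per axis
    return scores, loadings, explained

# Hypothetical descriptor table (NOT real data): molecular weight, log P,
# number of hydrogen-bond donors, number of stereogenic centers.
names = ["drug-1", "drug-2", "drug-3", "natprod-1", "natprod-2", "natprod-3"]
table = [
    [310.0,  2.8,  1,  0],
    [345.0,  3.4,  2,  1],
    [290.0,  2.1,  1,  0],
    [820.0,  0.6,  8,  9],
    [905.0, -0.3, 10, 12],
    [760.0,  1.2,  7,  7],
]

scores, loadings, explained = pca(table)
for name, (pc1, pc2) in zip(names, scores):
    print(f"{name:>10s}  PC1 = {pc1:+.2f}  PC2 = {pc2:+.2f}")
print(f"variance captured by PC1 + PC2: {explained[:2].sum():.1%}")
```

Standardizing before the eigendecomposition is one common choice when descriptors carry different units; other scalings would give different axes, which reflects the caveat above that PCA results depend strongly on the compounds and descriptors chosen.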
9.1.3.3 New Chemical Methodologies for DOS
DOS has proven to be fertile ground for new advances in chemical methodology. Although synthetic techniques such as solid-phase synthesis facilitate the separation of synthetic intermediates from excess reagents and soluble reaction by-products, they do not allow separation of support-bound impurities that may arise from undesired side reactions. With traditional chromatographic purification of synthetic intermediates precluded, extraordinarily high requirements are placed on reaction efficiency and selectivity. In general, DOS routes require reactions that provide ~90% yield and stereoselectivity, lest the synthetic sequence produce such a complex mixture as to make purification of the final product impossible. Further, each reaction must be compatible with hundreds or even thousands of different substrates generated by the preceding combinatorial steps. Thus, the same ideals that have driven reaction development in traditional organic synthesis - high yield, selectivity, and generality - apply to DOS to an even greater extent and, as a result, DOS has been an important engine for new advances in synthetic organic chemistry [2, 34]. In particular, efforts in DOS have led to a variety of new stereoselective reactions and a resurgence of interest in multicomponent coupling reactions.
9.1.4 Applications and Practical Examples
Screening of DOS libraries has provided a significant number of new biological probes [1]. Several recent examples are presented below, with a particular focus on studies that have provided new biological insights [2]. In addition, many of these studies highlight a key advantage of screening synthetic combinatorial libraries, as opposed to collections of individually archived compounds. Namely, once a flexible synthetic route has been developed, a “primary” library of diverse molecules can be screened to identify initial “hit” molecules and to provide information on structure-activity relationships (SAR). Using the same synthetic route, the initial hits can then be readily optimized through the synthesis and evaluation of “secondary”, “tuning”, or “focused” libraries and individual analogs to identify compounds with improved potency, specificity, and pharmacological properties.
Fig. 9.1-7 Example of principal component analysis comparison of synthetic drugs and natural products. A set of 20 synthetic drugs, including the top 10 best-sellers in 2004, and 20 natural products was analyzed for nine molecular descriptors: molecular weight, hydrophobicity (X log P or C log P), # hydrogen-bond donors, # hydrogen-bond acceptors, # rotatable bonds, topological polar surface area [43], # stereogenic centers, # nitrogen atoms, # oxygen atoms. PCA was used to reduce the nine-dimensional vectors to two-dimensional vectors, which were then replotted as shown. The first principal component accounts for 55.1% of the original information and the first two principal components account for 84.2%. Synthetic drugs (squares, capitalized) and natural products (circles, italicized) cover distinct regions of chemical space with limited overlap; Flonase and Zocor are synthetic drugs that are analogs of natural products. Molecular descriptors were obtained from PubChem (http://pubchem.ncbi.nlm.nih.gov/) and ChemBank (http://chembank.broad.harvard.edu/) or calculated using ChemDraw/Biobyte and Molinspiration (http://www.molinspiration.com). PCA was performed with R v1.01 (http://cran.r-project.org/). Adapted from Ref. [2] with permission.
Fig. 9.1-8 Uretupamines, tubacin, and histacin. (a) Schreiber discovered uretupamine A as a function-selective suppressor of the yeast nutrient signaling protein Ure2p through HTS of a 1890-member library of natural product-like compounds [44]. Analysis of SAR led to the development of an improved analog, uretupamine B. See Fig. 9.1-9 for biological data. (b) Tubacin and histacin were discovered as paralog-selective HDAC (histone deacetylase) family inhibitors from a related 7200-member library [45]. This biased library was targeted to HDACs by capping each library member with a metal-binding functional group at the end of a long alkyl chain (YR4). Each subset of the library was screened in two cytoblot assays for histone acetylation and α-tubulin acetylation. PCA was used to replot the data to identify selective inhibitors of histone versus α-tubulin deacetylation, including histacin and tubacin. See Fig. 9.1-9 for biological data.
9.1.4.1 Uretupamines, Ure2p, and Glucose Signaling
Ure2p is a yeast signaling protein that regulates cellular responses to the quality of both carbon and nitrogen nutrients (e.g., glucose vs. acetate and ammonium vs. proline). Ure2p represses the transcription factors Nil1p and Gln3p, and differential regulation is thought to distinguish carbon- and nitrogen-nutrient-responsive signaling. Thus, these two effects cannot be separated using Ure2p knockouts (ure2Δ), while a function-selective small molecule inhibitor would be ideally suited to this task. Since the functional binding sites of Ure2p have not been identified, structure-based rational design cannot be used to identify such an inhibitor. Thus, Schreiber screened a DOS library of 1890 natural product-like compounds in a Ure2p binding assay on a small molecule microarray [44] (Figs. 9.1-8(a) and 9.1-9(a)). The initial hits were retested in a secondary cell-based reporter gene assay, leading to the identification of uretupamine A as a functional Ure2p suppressor. The availability of analogs using the established synthetic route allowed rapid development of a more potent analog, uretupamine B. Despite their relatively moderate binding affinities, uretupamines A and B (Kd = 18.1 and 7.5 µM) exhibited high specificity for targeting Ure2p-mediated effects in transcriptional profiling studies of wild-type and targetless ure2Δ knockout strains (Fig. 9.1-9(b)). Further examination of the transcriptional profiling data revealed that the uretupamines upregulated a subset of genes that are induced in response to carbon nutrient quality, including Nil1p. Although Ure2p is usually considered a nitrogen-nutrient-responsive signaling protein, this suggested that it might also be a direct target of carbon-nutrient-responsive pathways (as opposed to pathways bypassing Ure2p and acting directly upon Nil1p). Further evidence for this model was provided by transcriptional profiling experiments with the uretupamines in nil1Δ and gln3Δ strains. Ure2p was also found to be selectively dephosphorylated in response to changes in carbon, but not nitrogen nutrient quality. Thus, these studies with a function-selective small molecule probe from a DOS library shed new light on the role of Ure2p in glucose signaling.
Fig. 9.1-9 Biological data obtained using probes identified from DOS libraries. Uretupamine: (a) A small molecule microarray of library members was probed with Cy5-labeled Ure2p. The resulting fluorescent spot corresponding to Ure2p-bound uretupamine A is shown. (b) The biological effects of uretupamine A were assessed by transcriptional profiling of wild-type (PM38) and ure2Δ knockout yeast treated with 100 µM uretupamine A versus vehicle control (N,N-dimethylformamide). Uretupamine upregulates URE2-dependent genes in wild-type, but not “targetless” ure2Δ cells, indicating a high degree of specificity (right). Reprinted from Ref. [44] with permission. Tubacin and histacin: (c) Fluorescence microscopy experiments were used to evaluate the effects of trichostatin A (TSA), a pan-HDAC inhibitor, tubacin, and histacin on histone acetylation (green, top) and α-tubulin acetylation (red, bottom) in A549 cells. Nuclei are stained with Hoechst dye (blue). TSA upregulates both histone and α-tubulin acetylation while tubacin is selective for α-tubulin acetylation and histacin is selective for histone acetylation. Adapted from Refs. [45] and [46] with permission. Stem cell differentiation modulators: (d) TWS119 (1-5 µM) induces neurogenesis of mouse embryonic stem cells (D3), as demonstrated by immunofluorescence staining with the neuron-specific markers microtubule-associated protein 2(a+b) (red, top), neurofilament-M (red, bottom), and βIII-tubulin (green, top and bottom). (e) Cardiogenol C (0.25 µM) induces cardiomyogenesis of mouse embryonic stem cells (D3), as demonstrated by immunofluorescence staining with the cardiomyocyte-specific markers myosin heavy chain (green, top) and the transcription factor MEF2 (red, bottom). Cell nuclei are stained with DAPI (4’,6-diamidino-2-phenylindole) (blue, top and bottom). (f) Purmorphamine (2 µM) induces osteogenesis of mouse multipotent mesenchymal progenitor cells (C3H10T1/2) as demonstrated by histochemical staining of the osteoblast-specific marker alkaline phosphatase (red) in purmorphamine-treated (bottom), but not dimethyl sulfoxide (DMSO)-treated (top) cells. Cell nuclei are stained blue. (g) Conversely, reversine (5 µM) induces dedifferentiation of lineage-specific murine myoblasts (C2C12) to multipotent mesenchymal progenitor cells, which can then be induced to differentiate into osteoblasts or adipocytes (not shown). Histochemical staining for the osteoblast-specific marker alkaline phosphatase (red) was apparent in cells exposed to osteogenesis-inducing medium following initial dedifferentiation with reversine (bottom), but not DMSO (top). Adapted from Refs. [47-50] with permission. Fexaramine: (h) Transcriptional profiling was used to analyze the effects of various FXR agonists in human primary hepatocytes. Profiles were compared following treatment with fexaramine (10 µM), a highly specific FXR agonist; chenodeoxycholic acid (CDCA, 100 µM), the primary bile acid; and GW4064 (10 µM), another nonsteroidal FXR agonist, versus DMSO-treated controls. Genes whose expression patterns were altered by >2-fold relative to DMSO were identified and subjected to hierarchical clustering as shown. The differences between the expression profiles indicate that CDCA and GW4064 affect other signaling pathways as well as the FXR pathway. (i) Fexaramine was cocrystallized with the FXR and the binding interactions were identified. (j) This allowed construction of a structural model for the weak binding of CDCA to FXR. Reprinted from Ref. [51] with permission. IIA6B17 Myc-Max inhibitor: (k and l) IIA6B17 inhibits cell foci formation in Myc-transformed chicken embryo fibroblasts. This compound also inhibits foci formation in Jun- but not Src-transformed cells, indicating a limited degree of specificity. Adapted from Ref. [52] with permission.
9.1.4.2 Tubacin, Histacin, and the HDACs
The HDAC family of proteins plays a critical role in modulating chromatin structure and in regulating the functions of other proteins. Several HDAC inhibitors are used in clinical trials as anticancer drugs. However, these inhibitors are not selective among the multiple HDAC paralogs that have been identified. Thus, new selective inhibitors are required to separate the functions of the various HDAC family members. In particular, treatment with pan-HDAC inhibitors also results in hyperacetylation of α-tubulin, the functional implications of which are unclear. Despite the availability of protein structural information, structure-based design of selective HDAC inhibitors has proved to be challenging. Thus, Schreiber leveraged this structural information in combination with DOS to synthesize a library of 7200 dioxane-containing natural product-like molecules that were targeted to HDACs [45] (Fig. 9.1-8(b)). Each library member was capped with a metal-binding functional group at the end of a long alkyl chain, designed to bind a zinc ion at the bottom of a channel in the HDAC active site. This library was first screened using cell-based cytoblot assays that monitored levels of histone and tubulin acetylation. Statistical analysis of the screening data using PCA was then carried out to identify compounds that selectively induced histone or tubulin acetylation. These initial hits were retested in fluorescence microscopy assays to confirm these effects, leading to the identification of tubacin as a selective inducer of α-tubulin acetylation (EC50 = 2.9 µM) and histacin as a selective inducer of histone acetylation (EC50 = 34 µM) [46] (Fig. 9.1-9(c)). Tubacin proved to be a particularly valuable tool for studying HDAC6, an α-tubulin deacetylase having two catalytic domains [53, 54]. In contrast to the pan-HDAC inhibitor trichostatin A (TSA), tubacin had no effect on gene expression in transcriptional profiling experiments and did not affect cell cycle progression.
Further, tubacin-induced α-tubulin hyperacetylation did not alter microtubule dynamics, but it did inhibit cell migration. Conversely, overexpression of HDAC6 had previously been shown to increase cell motility [54]. Additional experiments indicated that HDAC6 colocalized with acetylated α-tubulin following tubacin treatment, possibly via the HDAC6 N-terminal catalytic domain, which did not have tubulin deacetylase activity. This suggested a direct role for HDAC6 in modulating the activities of other microtubule-associated proteins and implicated HDAC6 in metastasis and angiogenesis, as well as in neurodegenerative disorders such as Alzheimer’s disease. Recently, tubacin was also shown to synergize with bortezomib against multiple myeloma [60].
9.1.4.3 Stem Cell Differentiation Modulators
The ability to control the fate of stem cells has major potential therapeutic implications in areas such as cancer, neurodegenerative disease, and tissue regeneration. Small molecules that can induce differentiation (or dedifferentiation) are valuable tools for studying these processes and the underlying signaling pathways that regulate them. Schultz has identified several such molecules [55] by screening a multiscaffold DOS library of 45 140 drug-like molecules [39] (Fig. 9.1-6(a)). Cell-based phenotypic assays have been useful in identifying molecules that may act by novel mechanisms to elucidate new signaling pathways that control differentiation. Several molecules have been identified that induce differentiation of pluripotent embryonic stem cells into particular tissue-specific adult stem cells (Fig. 9.1-10). These adult stem cells have exciting therapeutic potential, but have generally been difficult to obtain by direct isolation and expansion. HTS was accomplished using pluripotent mouse carcinoma cell lines transfected with reporter genes driven by lineage-specific markers. SAR analysis and the ease of secondary tuning library synthesis again proved useful for optimizing the initial hits. Differentiation-inducing activity was further confirmed by immunostaining for additional neuronal or cardiac muscle markers in the carcinoma cell line as well as mouse embryonic stem cell lines. TWS119 was identified as a compound that induces neurogenesis (EC50 ≈ 1 µM) [47] (Fig. 9.1-9(d)) while another series of compounds, the cardiogenols, induce cardiomyogenesis (EC50 = 0.1-1.0 µM) [48] (Fig. 9.1-9(e)). Affinity chromatography experiments identified glycogen synthase kinase-3β (GSK-3β) as one target of TWS119 (Kd = 126 nM, IC50 = 30 nM), supporting a role for this protein in neuronal differentiation. Studies to identify the molecular targets of the cardiogenols are ongoing.
Another molecule, purmorphamine, was identified in a screen for molecules that induce differentiation of multipotent mesenchymal stem cells into osteoblasts (EC50 = 1 µM) [49] (Fig. 9.1-9(f)). HTS was accomplished using a fluorescence-based enzymatic assay for the bone-specific marker, alkaline phosphatase. Consistent with its osteogenic activity, purmorphamine also upregulated Cbfa1/Runx2, a master regulator of bone development, and other bone-specific markers. Subsequent transcriptional profiling experiments revealed that purmorphamine upregulates the Hedgehog signaling pathway [55]. Conversely, dedifferentiation of tissue-specific progenitor cells could provide another source of multipotent stem cells, which could then be retasked to other lineages. This would be analogous to the process of tissue regeneration observed in some amphibians. Along these lines, reversine has been identified as a compound that induces dedifferentiation of myoblasts to multipotent mesenchymal progenitor cells (complete at 5 µM) [50] (Fig. 9.1-9(g)). HTS was accomplished using a two-stage assay involving initial treatment of myoblasts with a compound to induce dedifferentiation, followed by exchange into osteogenesis-inducing medium and assaying for alkaline phosphatase expression as above, to detect osteoblast formation. The dedifferentiating capacity of reversine was further confirmed by dedifferentiation of myoblasts followed by redifferentiation to adipocytes, and by the inability of reversine to induce direct transdifferentiation of myoblasts to osteoblasts. Efforts to identify the molecular targets of reversine and to improve its potency and specificity are ongoing.
Fig. 9.1-10 Small molecule modulators of stem cell differentiation. Schultz has discovered a number of small molecules that modulate stem cell differentiation from a multiscaffold library of drug-like heterocycles [39] (see Fig. 9.1-6(a)). (a) TWS119 induces neurogenesis of mouse embryonic stem cells [47]. (b) The cardiogenols induce cardiomyogenesis of mouse embryonic stem cells [48]: cardiogenol A (R = NHPh), cardiogenol B (R = OPh), cardiogenol C (R = OMe), cardiogenol D (R = (E)-CH=CHPh). (c) Purmorphamine induces osteogenesis of mouse mesoderm fibroblast cells [49]. (d) Reversine induces dedifferentiation of lineage-specific murine myoblasts to multipotent mesenchymal progenitor cells, which can then be induced to differentiate into osteoblasts or adipocytes [50]. See Fig. 9.1-9 for biological data.
9.1.4.4 Fexaramine and the Farnesoid X Receptor
The farnesoid X receptor (FXR) is a nuclear hormone receptor implicated in the regulation of cholesterol metabolism. In response to bile acids, FXR is thought to repress genes responsible for conversion of excess cholesterol to bile acids and to induce genes involved in bile acid transport. However, bile acids are low-affinity ligands for FXR. Thus, novel high-affinity ligands would be useful probes to study the physiological functions of FXR and to evaluate its potential as a new therapeutic target. In the absence of protein structural information, Nicolaou and Evans used a reporter gene assay to screen a DOS library of 10000 compounds based on 2,2-dimethylbenzopyran, a privileged substructure that is found in numerous natural products and has also been used in synthetic drugs [51, 56]. This provided a number of moderate agonists (EC50 = 5-10 µM). Through extensive SAR analysis and the synthesis and evaluation of several secondary focused libraries, they identified fexaramine as a potent agonist (EC50 = 25 nM), which no longer contained the benzopyran substructure (Fig. 9.1-11). A fluorescence resonance energy transfer (FRET) assay was used to confirm that fexaramine induces binding of FXR and the steroid receptor coactivator (SRC-1). Fexaramine was further demonstrated to upregulate known FXR target genes in FXR-expressing cell lines. However, it did not activate a panel of other nuclear hormone receptors, indicating a high degree of specificity. The genome-wide effects of fexaramine-induced FXR activation were then evaluated in transcriptional profiling experiments (Fig. 9.1-9(h)). Strikingly, fexaramine induced a distinct transcriptional profile compared to a bile acid, indicating that the latter likely interacts with multiple signaling pathways. Moreover, new potential roles for FXR in the bilirubin biosynthetic pathway, thyroid metabolism, and amino acid transport were revealed.
Fexaramine was also cocrystallized with FXR (Fig. 9.1-9(i)) to gain structural insights into the binding interactions, allowing a model for low-affinity binding by bile acids to be proposed (Fig. 9.1-9(j)). Thus, this molecule identified from a DOS library has proven to be a valuable tool for probing FXR structure and function.
9.1.4.5 Protein-Protein and Protein-DNA Interaction Antagonists
Historically, protein-protein and protein-DNA interactions have been extremely difficult targets to address with synthetic drug-like molecules, owing in part to the large, flat, discontinuous binding surfaces often involved and the lack of endogenous small molecule ligands to use as starting points for rational design [32]. To address this important challenge, Boger has synthesized a variety of natural product-like libraries that are based on peptides, peptidomimetics, or other oligomeric natural products. Notably, efficient solution phase syntheses and mixture deconvolution protocols were developed to synthesize and screen these libraries.
Fig. 9.1-11 Fexaramine, a potent, highly specific nonsteroidal agonist of the farnesoid X receptor. Nicolaou used a 10,000-member library of compounds built around the privileged 2,2-dimethylbenzopyran substructure, which is found in a wide range of natural products, to discover lead compounds that were moderate agonists of the farnesoid X receptor (EC50 = 5-10 µM) [51, 56]. Synthesis and screening of multiple secondary libraries provided extensive SAR data, ultimately leading to the development of fexaramine as a potent agonist (EC50 = 25 nM). Fexaramine proved to be highly specific for activation of the FXR signaling pathway. See Fig. 9.1-9 for biological data.
This approach has yielded an impressive collection of molecules that inhibit both extracellular and intracellular protein-protein interactions, as well as protein-DNA interactions [57]. In one particularly interesting case, a
series of isoindoline-based compounds was identified by Vogt and Boger as inhibitors of the protein-protein interaction between the Myc and Max transcription factors [52]. Myc is aberrantly activated in a number of human cancers and acts by heterodimerization with Max via their helix-loop-helix leucine zipper domains, leading to transcription of Myc target genes. Several different DOS libraries were screened using a biochemical FRET assay, yielding four hits, including IIA6B17, from a 240-member library built around a peptidomimetic isoindoline scaffold (Fig. 9.1-12). The activity of the hits was further confirmed using enzyme-linked immunosorbent assays (ELISA) and electrophoretic mobility shift assays (EMSA) (IIA6B17 ELISA IC50 = 125 µM; EMSA IC50 = 50 µM). Two of the hits also inhibited cell focus formation in Myc-transformed chicken embryo fibroblasts (IIA6B17 IC50 = 15-20 µM) (Fig. 9.1-10(k,l)). In control experiments, IIA6B17 also inhibited focus formation in Jun-transformed cells, but not Src-transformed cells, indicating a limited degree of biological specificity. While further characterization of these inhibitors is necessary, this work demonstrated the feasibility of inhibiting transcription factor protein-protein interactions with small molecules. Such probes should be valuable tools for dissecting the roles of these transcription factors in cancer and for evaluating their potential as new therapeutic targets.
Fig. 9.1-12 IIA6B17, a small molecule inhibitor of the Myc-Max protein-protein interaction. Vogt and Boger identified IIA6B17 by screening a library built around a peptidomimetic isoindoline scaffold [52]. A biochemical FRET assay was used in the initial screen and the hits were analyzed further using ELISA, EMSA, and cell foci formation assays. See Fig. 9.1-9 for biological data. (Scheme: solution-phase synthesis from an isoindoline diester gave a 240-member library; high-throughput screening identified IIA6B17.)
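To put the IC50 values quoted above for IIA6B17 in context, the following minimal sketch estimates the fractional inhibition expected at a given compound concentration. It assumes a simple Hill-type dose-response relationship with a Hill coefficient of 1; that model and the function name are assumptions made for illustration and are not taken from the original study. The IC50 values are read in micromolar units.

```python
def fraction_inhibited(conc_um: float, ic50_um: float, hill: float = 1.0) -> float:
    """Fractional inhibition from a simple Hill-type dose-response model.

    Assumption: inhibition = c^n / (IC50^n + c^n), with n = 1 by default.
    """
    if conc_um < 0 or ic50_um <= 0:
        raise ValueError("concentration must be >= 0 and IC50 must be > 0")
    return conc_um**hill / (ic50_um**hill + conc_um**hill)


# IC50 values for IIA6B17 quoted in the text (micromolar).
IC50_UM = {"ELISA": 125.0, "EMSA": 50.0}

for assay, ic50 in IC50_UM.items():
    # Predicted inhibition at a 50 uM test concentration.
    print(assay, round(fraction_inhibited(50.0, ic50), 3))
```

By construction, the model returns 0.5 when the concentration equals the IC50. The same dose is therefore predicted to give half-maximal inhibition in the EMSA but noticeably less in the ELISA.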
9.1.5 Future Development
DOS has provided a powerful arsenal of new small molecule probes to dissect complex biological processes. It has also driven new advances in the field of synthetic organic chemistry. In the continuing evolution of this field, the current focus is on refining library design strategies so that new probes can be identified as efficiently as possible given a particular biological target or system of interest. For example, correlation of particular chemical scaffolds with specific classes of biological targets will facilitate prioritization of appropriate compounds to screen against these targets. Other targets may prove more challenging, requiring ventures into new, uncharted regions of chemical structure space. Systematic evaluation of various library design strategies across a wide range of biological assays is on the horizon under the Molecular Libraries Initiative of the National Institutes of Health [58]. Importantly, the results of these experiments will be deposited into the publicly available PubChem database (http://pubchem.ncbi.nlm.nih.gov/) to allow subsequent statistical analyses through data mining. This will provide valuable information for future efforts in library design.
9.1.6 Conclusion
DOS is a powerful new approach to identifying small molecule probes to dissect complex biological systems. Both drug-like and natural product-like libraries that target biologically relevant regions of chemical structure space have proven useful for discovering such probes. New synthetic planning strategies and new chemical methodologies have also been developed in the context of DOS. Thus, the exciting potential of DOS in chemical biology has now been demonstrated clearly. Further evolution and refinement of this field can be expected in the coming years.
Acknowledgments
Generous financial support for my laboratory has been provided by the NIH (P41 GM076267, R21 CA104685), CDMRP (CM030085), the NYSTAR James D. Watson Investigator Program, the William Randolph Hearst Fund in Experimental Therapeutics, Mr. William H. Goodwin and Mrs. Alice Goodwin and the Commonwealth Foundation for Cancer Research, and the Experimental Therapeutics Center of MSKCC.
References
1. J.S. Potuzak, S.B. Moilanen, D.S. Tan, Discovery and applications of small molecule probes for studying biological processes, Biotechnol. Genet. Eng. Rev. 2004, 21, 11-77.
2. D.S. Tan, Diversity-oriented synthesis: exploring the intersections between chemistry and biology, Nat. Chem. Biol. 2005, 1, 74-84.
3. R.B. Merrifield, Solid phase peptide synthesis. I. The synthesis of a tetrapeptide, J. Am. Chem. Soc. 1963, 85, 2149-2154.
4. F. Guillier, D. Orain, M. Bradley, Linkers and cleavage strategies in solid-phase organic synthesis and combinatorial chemistry, Chem. Rev. 2000, 100, 2091-2157.
5. C.C. Tzschucke, C. Markert, W. Bannwarth, S. Roller, A. Hebel, R. Haag, Modern separation techniques for efficient workup in organic synthesis, Angew. Chem. Int. Ed. Engl. 2002, 41, 3964-4000.
6. A. Kirschning, H. Monenschein, R. Wittenberg, Functionalized polymers-emerging versatile tools for solution-phase chemistry and automated parallel synthesis, Angew. Chem. Int. Ed. Engl. 2001, 40, 650-679.
7. J.G. Garcia, Scavenger resins in solution-phase combichem, Methods Enzymol. 2003, 369, 391-412.
8. X. Li, D.R. Liu, DNA-templated organic synthesis: Nature's strategy for controlling chemical reactivity applied to synthetic molecules, Angew. Chem. Int. Ed. Engl. 2004, 43, 4848-4870.
9. H.M. Geysen, R.H. Meloen, S.J. Barteling, Use of peptide synthesis to probe viral antigens for epitopes to a resolution of a single amino acid, Proc. Natl. Acad. Sci. U.S.A. 1984, 81, 3998-4002.
10. R.A. Houghten, General method for the rapid solid-phase synthesis of large numbers of peptides: specificity of antigen-antibody interaction at the level of individual amino acids, Proc. Natl. Acad. Sci. U.S.A. 1985, 82, 5131-5135.
11. A. Furka, F. Sebestyen, M. Asgedom, G. Dibo, General method for rapid synthesis of multicomponent peptide mixtures, Int. J. Pept. Protein Res. 1991, 37, 487-493.
12. K.S. Lam, S.E. Salmon, E.M. Hersh, V.J. Hruby, W.M. Kazmierski, R.J. Knapp, A new type of synthetic peptide library for identifying ligand-binding activity, Nature 1991, 354, 82-84.
13. R.L. Affleck, Solutions for library encoding to create collections of discrete compounds, Curr. Opin. Chem. Biol. 2001, 5, 257-263.
14. R.A. Houghten, General method for the rapid solid-phase synthesis of large numbers of peptides: specificity of antigen-antibody interaction at the level of individual amino acids, Proc. Natl. Acad. Sci. U.S.A. 1985, 82, 5131-5135.
15. J.A. Ellman, Design, synthesis, and evaluation of small-molecule libraries, Acc. Chem. Res. 1996, 29, 132-143.
16. S. Hobbs DeWitt, J.S. Kiely, C.J. Stankovic, M.C. Schroeder, D.M. Reynolds Cody, M.R. Pavia, "Diversomers": an approach to nonpeptide, nonoligomeric chemical diversity, Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 6909-6913.
17. D.S. Tan, M.A. Foley, M.D. Shair, S.L. Schreiber, Stereoselective synthesis of over two million compounds having structural features both reminiscent of natural products and compatible with miniaturized cell-based assays, J. Am. Chem. Soc. 1998, 120, 8565-8566.
18. M.A. Marx, A.-L. Grillot, C.T. Louer, K.A. Beaver, P.A. Bartlett, Synthetic design for combinatorial chemistry. Solution and polymer-supported synthesis of polycyclic lactams by intramolecular cyclization of azomethine ylides, J. Am. Chem. Soc. 1997, 119, 6153-6167.
19. T.A. Keating, R.W. Armstrong, Postcondensation modifications of Ugi four-component condensation products: 1-isocyanocyclohexene as a convertible isocyanide. Mechanism of conversion, synthesis of diverse structures, and demonstration of resin capture, J. Am. Chem. Soc. 1996, 118, 2574-2583.
20. P.A. Tempest, R.W. Armstrong, Cyclobutenedione derivatives on solid support: toward multiple core structure libraries, J. Am. Chem. Soc. 1997, 119, 7607-7608.
21. D.A. Erlanson, R.S. McDowell, T. O'Brien, Fragment-based drug discovery, J. Med. Chem. 2004, 47, 3463-3482.
22. B.K. Shoichet, Virtual screening of chemical libraries, Nature 2004, 432, 862-865.
23. D.B. Kitchen, H. Decornez, J.R. Furr, J. Bajorath, Docking and scoring in virtual screening for drug discovery: methods and applications, Nat. Rev. Drug Discov. 2004, 3, 935-949.
24. C.M. Dobson, Chemical space and biology, Nature 2004, 432, 824-828.
25. B.E. Evans, K.E. Rittle, M.G. Bock, R.M. DiPardo, R.M. Freidinger, W.L. Whitter, G.F. Lundell, D.F. Veber, P.S. Anderson, et al., Methods for drug discovery: development of potent, selective, orally effective cholecystokinin antagonists, J. Med. Chem. 1988, 31, 2235-2246.
26. R.W. DeSimone, K.S. Currie, S.A. Mitchell, J.W. Darrow, D.A. Pippin, Privileged structures: applications in drug discovery, Comb. Chem. High Throughput Screen. 2004, 7, 473-493.
27. M.A. Koch, R. Breinbauer, H. Waldmann, Protein structure similarity as guiding principle for combinatorial library design, Biol. Chem. 2003, 384, 1265-1272.
28. M. Feher, J.M. Schmidt, Property distributions: differences between drugs, natural products, and molecules from combinatorial chemistry, J. Chem. Inf. Comput. Sci. 2003, 43, 218-227.
29. D.A. Horton, G.T. Bourne, M.L. Smythe, The combinatorial synthesis of bicyclic privileged structures or privileged substructures, Chem. Rev. 2003, 103, 893-930.
30. M.S. Lajiness, M. Vieth, J. Erickson, Molecular properties that influence oral drug-like behavior, Curr. Opin. Drug Discov. Devel. 2004, 7, 470-477.
31. M.M. Hann, T.I. Oprea, Pursuing the leadlikeness concept in pharmaceutical research, Curr. Opin. Chem. Biol. 2004, 8, 255-263.
32. M.R. Arkin, J.A. Wells, Small-molecule inhibitors of protein-protein interactions: progressing towards the dream, Nat. Rev. Drug Discov. 2004, 3, 301-317.
33. L. Pagliaro, J. Felding, K. Audouze, S.J. Nielsen, R.B. Terry, C. Krog-Jensen, S. Butcher, Emerging classes of protein-protein interaction inhibitors and new tools for their development, Curr. Opin. Chem. Biol. 2004, 8, 442-449.
34. S. Shang, D.S. Tan, Advancing chemistry and biology through diversity-oriented synthesis of natural product-like libraries, Curr. Opin. Chem. Biol. 2005, 9, 248-258.
35. D.S. Tan, Current progress in natural product-like libraries for discovery screening, Comb. Chem. High Throughput Screen. 2004, 7, 631-643.
36. S.L. Schreiber, Target-oriented and diversity-oriented organic synthesis in drug discovery, Science 2000, 287, 1964-1969.
37. M.D. Burke, S.L. Schreiber, A planning strategy for diversity-oriented synthesis, Angew. Chem. Int. Ed. Engl. 2004, 43, 46-58.
38. P. Selzer, H.-J. Roth, P. Ertl, A. Schuffenhauer, Complex molecules: do they add value? Curr. Opin. Chem. Biol. 2005, 9, 310-316.
39. S. Ding, N.S. Gray, X. Wu, Q. Ding, P.G. Schultz, A combinatorial scaffold approach toward kinase-directed heterocycle libraries, J. Am. Chem. Soc. 2002, 124, 1594-1596.
40. O. Kwon, S.B. Park, S.L. Schreiber, Skeletal diversity via a branched pathway: efficient synthesis of 29,400 discrete, polycyclic compounds and their arraying into stock solutions, J. Am. Chem. Soc. 2002, 124, 13402-13404.
41. M.D. Burke, E.M. Berger, S.L. Schreiber, Generating diverse skeletons of small molecules combinatorially, Science 2003, 302, 613-618.
42. L. Xue, F.L. Stahura, J. Bajorath, Cell-based partitioning, Methods Mol. Biol. 2004, 275, 279-289.
43. P. Ertl, B. Rohde, P. Selzer, Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties, J. Med. Chem. 2000, 43, 3714-3717.
44. F.G. Kuruvilla, A.F. Shamji, S.M. Sternson, P.J. Hergenrother, S.L. Schreiber, Dissecting glucose signalling with diversity-oriented synthesis and small-molecule microarrays, Nature 2002, 416, 653-657.
45. S.J. Haggarty, K.M. Koeller, J.C. Wong, R.A. Butcher, S.L. Schreiber, Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays, Chem. Biol. 2003, 10, 383-396.
46. J.C. Wong, R. Hong, S.L. Schreiber, Structural biasing elements for in-cell histone deacetylase paralog selectivity, J. Am. Chem. Soc. 2003, 125, 5586-5587.
47. S. Ding, T.Y.H. Wu, A. Brinker, E.C. Peters, W. Hur, N.S. Gray, P.G. Schultz, Synthetic small molecules that control stem cell fate, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 7632-7637.
48. X. Wu, S. Ding, Q. Ding, N.S. Gray, P.G. Schultz, Small molecules that induce cardiomyogenesis in embryonic stem cells, J. Am. Chem. Soc. 2004, 126, 1590-1591.
49. X. Wu, S. Ding, Q. Ding, N.S. Gray, P.G. Schultz, A small molecule with osteogenesis-inducing activity in multipotent mesenchymal progenitor cells, J. Am. Chem. Soc. 2002, 124, 14520-14521.
50. S. Chen, Q. Zhang, X. Wu, P.G. Schultz, S. Ding, Dedifferentiation of lineage-committed cells by a small molecule, J. Am. Chem. Soc. 2004, 126, 410-411.
51. M. Downes, M.A. Verdecia, A.J. Roecker, R. Hughes, J.B. Hogenesch, H.R. Kast-Woelbern, M.E. Bowman, J.-L. Ferrer, A.M. Anisfeld, P.A. Edwards, J.M. Rosenfeld, J.G.A. Alvarez, J.P. Noel, K.C. Nicolaou, R.M. Evans, A chemical, genetic, and structural analysis of the nuclear bile acid receptor FXR, Mol. Cell 2003, 11, 1079-1092.
52. T. Berg, S.B. Cohen, J. Desharnais, C. Sonderegger, D.J. Maslyar, J. Goldberg, D.L. Boger, P.K. Vogt, Small-molecule antagonists of Myc/Max dimerization inhibit Myc-induced transformation of chicken embryo fibroblasts, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 3830-3835.
53. S.J. Haggarty, K.M. Koeller, J.C. Wong, C.M. Grozinger, S.L. Schreiber, Domain-selective small-molecule inhibitor of histone deacetylase 6 (HDAC6)-mediated tubulin deacetylation, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 4389-4394.
54. C. Hubbert, A. Guardiola, R. Shao, Y. Kawaguchi, A. Ito, A. Nixon, M. Yoshida, X.-F. Wang, T.-P. Yao, HDAC6 is a microtubule-associated deacetylase, Nature 2002, 417, 455-458.
55. S. Ding, P.G. Schultz, A role for chemistry in stem cell biology, Nat. Biotechnol. 2004, 22, 833-840.
56. K.C. Nicolaou, R.M. Evans, A.J. Roecker, R. Hughes, M. Downes, J.A. Pfefferkorn, Discovery and optimization of non-steroidal FXR agonists from natural product-like libraries, Org. Biomol. Chem. 2003, 1, 908-920.
57. D.L. Boger, J. Desharnais, K. Capps, Solution-phase combinatorial libraries: modulating cellular signaling by targeting protein-protein or protein-DNA interactions, Angew. Chem. Int. Ed. Engl. 2003, 42, 4138-4176.
58. C.P. Austin, L.S. Brady, T.R. Insel, F.S. Collins, Policy forum: molecular biology: NIH molecular libraries initiative, Science 2004, 306, 1138-1139.
59. W. Zhang, Fluorous technologies for solution-phase high-throughput organic synthesis, Tetrahedron 2003, 59, 4475-4489.
60. T. Hideshima, J.E. Bradner, J. Wong, D. Chauhan, P. Richardson, S.L. Schreiber, K.C. Anderson, Small-molecule inhibition of proteasome and aggresome function induces synergistic antitumor activity in multiple myeloma, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 8567-8572.
Chemical Biology. From Small Molecules to System Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN 978-3-527-31150-7
9.2 Combinatorial Biosynthesis of Polyketides and Nonribosomal Peptides
Nathan A. Schnarr and Chaitan Khosla
Outlook
The pursuit of novel biologically active molecules remains a difficult and critical challenge at the forefront of the chemistry-biology interface. Nature provides a vast array of chemical scaffolds on which to build diversity and functionality. This chapter outlines the advances in the area of chemical and genetic manipulation of the biosynthetic machinery responsible for the production of polyketide and nonribosomal peptide natural products. We hope to familiarize the reader with important developments and remaining challenges in this area as well as demonstrate the enormous potential that lies ahead for chemists, biologists, and engineers alike.
9.2.1 Introduction
As the need for new and improved pharmaceutical and material-based compounds continues to grow, it is abundantly clear that cooperation among scientists with very diverse backgrounds is essential to meet the demand. As questions become increasingly complex, we rely more heavily on nature to provide insight. This is especially true in the area of drug discovery and design, where biological systems offer a wealth of important information regarding structure-activity relationships in small molecules. In many cases, the organism has accomplished the difficult task of creating the appropriate chemical scaffold and it is left to the researcher to optimize for a particular target. Reprogramming the biosynthetic machinery responsible for assembling these molecules offers unmatched potential for production of useful natural product analogs. Polyketides and nonribosomal peptides are important classes of compounds that display a wide range of properties including antibiotic (erythromycin, vancomycin), immunosuppressant (rapamycin), and antitumor (epothilone) activities [1, 2] (Fig. 9.2-1). Although the specific building blocks that make up these structurally diverse molecules vary widely, their biosynthetic pathways remain highly conserved and readily deconstructed. Significant efforts have focused on understanding the basic processes associated with polyketide and nonribosomal peptide syntheses, resulting in several successful reprogramming attempts to create "unnatural" natural products. To understand better the rationale behind these biosynthetic manipulations, we need to first become familiar with the mechanisms involved in constructing these molecules.
Fig. 9.2-1 Polyketide and nonribosomal peptide structures described in the text. Erythromycin A is produced by a modular polyketide synthase. Actinorhodin and tetracenomycin are constructed via aromatic polyketide synthases. Surfactin A and tyrocidine A are produced through nonribosomal peptide synthetases.
Polyketides are generally separated into two common classes on the basis of the precise organization of biosynthetic enzymes [3-6]. Multimodular (type I) polyketide synthases (PKSs) consist of large polypeptides containing individual, covalently tethered modules, each responsible for a single ketide-unit elongation of the growing chain. The specific arrangement of these modules directly determines the structural and stereochemical outcome of the final product. In contrast, type II PKSs, primarily involved in biosynthesis of aromatic compounds, function through iterative cycling of the growing polyketide chain between noncovalently interacting enzymes. Product size is ultimately determined by a chain length factor (CLF) associated with the clustered enzymatic domains. Although subtle, this mechanistic distinction produces vastly different structures and each will be discussed separately.
As stated, modular PKSs function through cooperation of large, multienzyme polypeptides. Primer units, which vary widely from simple acetate/propionate to complex aromatic acids, are loaded onto the ketosynthase (KS) domain of
the first module via thioester linkage to the active-site cysteine (Fig. 9.2-2). The next sequential (downstream) acyl carrier protein (ACP) receives a specific extender unit, usually derived from malonyl- or methylmalonyl-CoA, from the appropriate acyltransferase (AT) domain. A Claisen-like decarboxylative condensation between the primer and extender units affords an ACP-bound β-ketothioester. The ultimate oxidation state and stereochemical configuration of the intermediates are determined by collaboration of optional ketoreductase (KR), dehydratase (DH), and enoyl reductase (ER) domains while docked at the ACP. Once fully processed, the extended chain is passed to the KS of the subsequent module by a transthioesterification reaction. The process is repeated, leading to the final module where the product is generally excised via hydrolysis or thioesterase (TE)-mediated macrocyclization.
Fig. 9.2-2 Proposed mechanism for polyketide formation in modular PKSs. (A) Substrate is transferred to KS from upstream ACP. (B) AT is loaded with methylmalonyl extender unit. (C) Extender unit is transferred to downstream ACP. Claisen-like condensation between diketide and extender unit produces ACP-bound triketide. KR domain reduces β-ketothioester to β-hydroxy thioester. KS - ketosynthase, AT - acyltransferase, KR - ketoreductase, ACP - acyl carrier protein.
The less clearly understood aromatic PKSs utilize a single KS(CLF)/ACP pair capable of multiple elongation reactions to construct the complete polyketide backbone. The number of elongation events is controlled by the CLF associated with the KS domain. Transthioesterification and decarboxylative condensation reactions proceed in an analogous fashion to modular systems. The ultimate topology of advanced aromatic polyketides is controlled by the precise combination of tailoring enzymes responsible for redox chemistry and cyclization pattern.
Analogous to polyketide biosynthesis, nonribosomal peptide natural products are produced by nonribosomal peptide synthetase (NRPS) assembly lines. A thioester template similar to the PKS systems is employed but with very different extender units. In place of simple malonate and substituted malonate groups, NRPSs utilize amino acids (proteinogenic and nonproteinogenic) as their aminoacyl-AMP derivatives for chain extension. Minimal NRPSs consist of an adenylation domain (A), a peptidyl carrier protein (PCP) or thiolation domain (T), and a condensation domain (C). The A domain is responsible for loading the PCP or T domain with the appropriate aminoacyl component. The condensation domain then catalyzes peptide bond formation between flanking aminoacyl-PCP/T domains. Auxiliary domains including methylation (M), epimerization (E), cyclization (Cy), and TE domains combine to control peptide topology and functionality, similar to aromatic PKS assemblies. An increasing number of "hybrid" systems containing both NRPS and PKS components are being identified.
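The role of the optional reductive domains in a modular PKS, as described above, can be sketched as a small lookup: the set of KR, DH, and ER domains present in a module determines the functional group left at the β-carbon after each elongation. The class and function names below are hypothetical, the three-module synthase is invented purely for illustration, and the sketch ignores stereochemistry and inactive domains.

```python
from dataclasses import dataclass

# Functional group left at the beta-carbon after one elongation cycle,
# keyed by the set of optional reductive domains present in the module.
BETA_CARBON_OUTCOME = {
    frozenset(): "beta-keto",
    frozenset({"KR"}): "beta-hydroxy",
    frozenset({"KR", "DH"}): "alpha,beta-unsaturated (enoyl)",
    frozenset({"KR", "DH", "ER"}): "fully reduced methylene",
}


@dataclass(frozen=True)
class Module:
    """One elongation module; KS, AT and ACP are implied, reductive domains are optional."""
    name: str
    extender: str
    reductive_domains: frozenset = frozenset()

    def beta_carbon_outcome(self) -> str:
        try:
            return BETA_CARBON_OUTCOME[self.reductive_domains]
        except KeyError:
            raise ValueError(
                f"{self.name}: unsupported domain set {sorted(self.reductive_domains)}"
            ) from None


def run_assembly_line(modules):
    """Return (module name, extender, beta-carbon outcome) for each elongation step, in order."""
    return [(m.name, m.extender, m.beta_carbon_outcome()) for m in modules]


# Hypothetical three-module synthase, for illustration only.
line = [
    Module("module 1", "methylmalonyl-CoA", frozenset({"KR"})),
    Module("module 2", "malonyl-CoA"),
    Module("module 3", "methylmalonyl-CoA", frozenset({"KR", "DH", "ER"})),
]
for step in run_assembly_line(line):
    print(step)
```

Domain sets such as DH without KR are rejected because, in this simplified picture, each reductive step acts on the product of the previous one.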
The compatibility of these hybrid NRPS-PKS systems speaks to their mechanistic similarities and offers an additional level of potential regarding genetic and chemical reprogramming.
Despite the many lingering questions concerning nonribosomal peptide and polyketide syntheses (vide infra), our current level of understanding provides numerous possibilities for combinatorial biosynthesis. It is clear that deciphering the elaborate interplay between chemistry and biology that governs the reactivity in these systems will require innovative thought and experimentation. In the simplest of terms, manipulation of polyketide and nonribosomal peptide components involves alteration of materials, tools, or both. From a chemical standpoint, modification of building blocks can ideally result in structures limited only by our imagination. Biologically, genetic control over biosynthetic machinery could allow, theoretically, for boundless reprogramming capabilities. Realistically, insight from both perspectives will be required, as enzyme selectivity and reactivity can impede combinatorial prospects.
With a basic understanding of the intricate construction of polyketides and nonribosomal peptides, we can discuss the potential for biosynthetic generation of analogous compounds. Chemical synthesis provides a powerful
approach to this end. Modification of simple reagents incorporated into these elaborate scaffolds opens possibilities for customized tailoring of structure and functionality. In addition, subjecting more advanced intermediates to specific sets of enzymes allows for additional chemical variation and control. This approach may permit circumvention of highly selective enzymes that limit processing capabilities.
Genetic manipulation of macromolecular components offers promise as an orthogonal method of analog production. In lieu of chemical synthesis, the biosynthetic machinery itself may be redirected to produce novel compounds. Numerous approaches can be considered, including physical swapping of domains or modules, addition or inactivation of tailoring enzymes, and alteration of product release. The significant challenge to these methods, thoroughly discussed later in this chapter, involves optimization of protein-protein recognition elements to achieve usable kinetic parameters for product transfer.
Combating inherent selectivity, both small molecule and macromolecular, will likely require combinations of the above methods. Subtle changes in polyketide structure may necessitate reconstruction of synthase components. Each case will provide important advances and significant obstacles. As we will see, progress toward true combinatorial biosynthesis continues to advance and with it, our understanding of polyketide and nonribosomal peptide synthesis on the whole.
9.2.2 History/Development
For the past few decades, efforts toward combinatorial biosynthesis of polyketides and nonribosomal peptides have primarily focused on determining enzyme reactivity and specificity in truncated synthases [7-12]. Given the enormous size of the intact systems, obtaining information about individual steps would prove challenging. Despite this, several successful attempts at producing full-length products have been realized. This section will highlight some of these accomplishments for each class of molecule described above.
Most of our knowledge regarding modular PKSs comes from the work on the 6-deoxyerythronolide B synthase (DEBS) that is responsible for production of the 14-membered macrolactone precursor to erythromycin [3]. This relatively small PKS is composed of three polypeptides (DEBS1, DEBS2, and DEBS3), each of which contains two distinct modules (Fig. 9.2-3). In addition, DEBS1 possesses a loading didomain, which specifically transfers the propionate group to the KS of module 1. Module 6 bears a TE domain responsible for cyclization of the full-length polyketide. Recognition domains, termed linker regions, control the precise arrangement of the individual polypeptides [13-18].
Early studies showed high selectivity for the natural propionate starter unit on the loading didomain of DEBS. Slight alterations in chain length,
acetyl or butyryl, resulted in significantly lower incorporation rates relative to propionate [19]. More complex substrates including benzoyl, phenylacetyl, and β-hydroxybutyryl displayed little to no relative loading propensity. To circumvent this obvious difficulty, a strategy was employed where the KS of module 1 was inactivated through site-directed mutagenesis. This allowed for direct incorporation of a phenyl analog of the natural diketide, resulting in production of 14-phenyl-6-deoxyerythronolide B [20].
Fig. 9.2-3 Schematic diagram of 6-deoxyerythronolide B synthase (DEBS). The synthase consists of three separate polypeptides composed of two modules each; each module is responsible for a single ketide-unit elongation of the 6-deoxyerythronolide B backbone. The terminal TE domain is responsible for cyclization and release of the fully elongated product. LDD - loading didomain, KS - ketosynthase, AT - acyltransferase, KR - ketoreductase, ER - enoyl reductase, DH - dehydratase, ACP - acyl carrier protein, TE - thioesterase.
The AT domain, responsible for selecting suitable extender units, has also been shown to possess high substrate specificity. To address the challenge of incorporating unnatural extender units, methylmalonyl-specific DEBS AT domains have been replaced with malonyl- and ethylmalonyl-specific AT domains to generate novel macrolactones [21-23]. Production of 6-desmethyl-6-ethylerythromycin A, from the ethylmalonyl-specific AT domain replacement, required increased levels of intracellular ethylmalonyl-CoA. The authors explain this to be the result of competitive loading with methylmalonyl-CoA, suggesting some level of relaxed substrate specificity in the heterologous AT domain.
Several successful attempts at altering the extent of reduction have been completed through mutagenic inactivation of KR, DH, and ER domains
[24-26]. The difficult task of adding these domains where they are absent has been accomplished through generation of hybrid modules. Santi and coworkers were able to control the ultimate oxidation state of 6-deoxyerythronolide B analogs by genetic insertion of redox-active domains from the rapamycin synthase into various DEBS modules [27]. Interestingly, some modifications resulted in incomplete reduction of intermediates, possibly due to competition between reduction and chain transfer to the downstream module. This observation underscores the delicate reactivity balance that must be addressed when combining domains and modules not naturally associated with one another.
Attempts at altering polyketide chain length have resulted in a number of abridged lactones. By repositioning the thioesterase domain in DEBS to the C-terminal end of module 5, a 12-membered macrolactone analog of 10-deoxymethynolide, the aglycon precursor to methymycin, was produced [28]. This study revealed the propensity for TE cyclization of nonnatural substrates, which has since been used to permit multiple turnover experiments using single, isolated modules. The stand-alone TE domain, however, exhibits increased selectivity relative to those fused to various modules, indicating a possible change in the mechanism [29].
In contrast to the modular systems, our understanding of aromatic PKSs remains largely undeveloped. However, this area does benefit from several high-resolution crystal and solution structures of individual domains, which provide enormous insight into enzyme specificity and mechanism [30-34]. The ability to program specific polymerization parameters promises readily accessible structure variation. By simply choosing an appropriate starter unit and polyketide length determinant, arrays of small aromatic molecules could potentially be designed.
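The ring sizes quoted for the TE-repositioning experiment described earlier in this section follow from simple carbon counting. The sketch below assumes that each extension module contributes two backbone carbons and that the TE closes the ring between C1 and the hydroxyl on the carbon derived from the starter unit carbonyl, as in 6-deoxyerythronolide B. The function name is hypothetical and the formula is not meant to cover other cyclization patterns.

```python
def macrolactone_ring_size(n_extensions: int) -> int:
    """Ring size for lactonization of C1 onto the starter-derived hydroxyl carbon.

    Assumptions: two backbone carbons per extension module; the ring contains
    C1 through C(2n + 1) plus the ester oxygen.
    """
    if n_extensions < 1:
        raise ValueError("at least one extension module is required")
    ring_carbons = 2 * n_extensions + 1   # C1 ... C(2n + 1)
    return ring_carbons + 1               # plus the ring oxygen


for label, n in (("TE after module 6 (wild-type DEBS)", 6), ("TE after module 5", 5)):
    print(f"{label}: {macrolactone_ring_size(n)}-membered ring")
```

Six extensions give the 14-membered ring of 6-deoxyerythronolide B, and five extensions give the 12-membered ring obtained when the TE is moved to module 5.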
To elucidate the precise role of the CLF, the chain length specificity in the actinorhodin (act) and tetracenomycin (tcm) PKSs was effectively altered by site-specific mutagenesis of the CLF [35]. For this, residues associated with the KS-CLF dimer interface (as determined from crystallographic data) were compared across a number of aromatic PKSs that specifically produce polyketide backbones ranging from C16 to C24. Mutation of two key residues in the CLF enabled the production of decaketide products in the typically octaketide-specific act system. Similarly, single point mutation of the wild-type tcm CLF effected conversion of a decaketide synthase to an octaketide one. Importantly, overall polyketide yields in these mutant systems were comparable to the natural synthases, indicating no significant influence on enzyme reactivity.
Some aromatic polyketides including frenolicin and R1128 are derived from nonacetate starter units, which require a unique primer module for their incorporation into the iterative portion of the PKS [11]. Tang et al. have recently combined the R1128 priming module with the actinorhodin or tetracenomycin minimal PKS in an attempt to generate novel aromatic polyketide structures [36-38]. The engineered bimodular PKS could efficiently
[Scheme labels: R1128 loading module; C16 minimal PKS; ZhuC, ZhuH, ZhuG; KR, DH, ER; Act KS-CLF; ZhuN; MAT.]
Fig. 9.2-4 Production of aromatic polyketide analogs. Combining the R1128 loading module with the act minimal PKS produces a novel biaromatic polyketide. See text for domain abbreviations; MAT - malonyl acyltransferase.
produce novel hexaketides (act), octaketides (tcm), and decaketides (pms) bearing propionyl and isobutyryl primer units in place of acetyl primers (Fig. 9.2-4). KR, aromatase, and cyclase enzymes could effectively recognize and modify these nonnative substrates, indicating that specificity arises from functional group recognition rather than polyketide chain length. This could potentially allow for generation of large libraries of related, fully processed aromatic compounds via simple, bimodular synthases.
Efforts toward reprogramming NRPSs have closely resembled those for polyketides. Through chemical modification of building blocks and rearrangement of biosynthetic scaffolds, the fundamental rules governing nonribosomal peptide synthesis are gradually being deciphered [8, 10]. Increased substrate complexity within these systems, relative to PKSs, underscores the potential for developing elaborate functionality as yet unmatched among polyketide structures. However, more sophisticated substrates often bring with them challenges concerning enzyme specificity and synthetic feasibility.
Early efforts toward novel nonribosomal peptide production focused on module replacement in the surfactin (srf) NRPS system. Marahiel and coworkers genetically replaced the leucine-incorporating A-T components of module 2 and module 7 with A-T components specific for cysteine (from the δ-aminoadipyl-cysteinyl-D-valine, ACV, synthetase) and ornithine (from gramicidin S synthetase), respectively [39, 40]. Although surfactin analogs containing the predicted amino acid alterations were identified, their yields were significantly lower than wild-type production of surfactin. This again underscores the importance of understanding the consequences of
mismatched protein-protein interfaces when engineering heterologous or hybrid synthases.
The isolated TE domain from the tyrocidine (tyc) NRPS has recently been shown to catalyze the macrocyclization of unnatural substrates to generate a variety of cyclic peptides. In conjunction with standard solid-phase peptide synthesis, Walsh and coworkers demonstrated a broad substrate tolerance for peptidyl-N-acetylcysteamine thioesters by the tyrocidine TE [41, 42]. Cyclization of peptide analogs, where individual amino acids were replaced with ethylene glycol units, was observed with high efficiency. In addition, hydroxyacid starter units were readily cyclized by the isolated TE domain to form nonribosomal peptide-derived macrolactones. More recently, Walsh and coworkers have demonstrated effective cyclization of PEGA resin-bound peptide/polyketide hybrids by the tyrocidine TE domain [43]. The pantetheine mimic used for covalent attachment of small molecules to the resin serves as an appropriate recognition element for the enzyme. As peptide macrocyclizations remain challenging in the absence of enzymatic assistance, this approach promises facile construction of previously unattainable structures.
9.2.3 General Considerations
To achieve vast chemical diversity through biosynthetic manipulation, the basic principles outlined above must be extended to generate small molecule libraries efficiently. Although seemingly straightforward, this process brings with it many difficult challenges. Fortunately, initial efforts at combinatorial biosynthesis have provided some early insight into specific requirements that researchers should bear in mind when venturing into this area. This section will outline the essential components and necessary considerations for bringing library generation to practice.
With the goal of producing many novel natural product analogs in a timely manner, the precise method of small molecule generation is a critical consideration. For in vivo production, this often means appropriate selection of the host organism. It must be readily engineered to produce compounds of interest in quantities at least high enough for facile detection and analysis. In addition, the host proteome should be well characterized and readily controlled to avoid unintentional post-PKS/NRPS tailoring that may attenuate activity. Methods involving in vitro polyketide and nonribosomal peptide production involve a similar set of considerations. High turnover numbers are essential to increase product yields and minimize the amount of enzyme required. It is important that proteins used in these experiments be readily expressible in practical quantities and exhibit broad substrate tolerance. The latter is imperative to minimize laborious purification of numerous proteins for library construction.
Fig. 9.2-5 Combinatorial library of 6-deoxyerythronolide B analogs by domain substitution. Colors correspond to specific engineered ketide units resulting from substitution of modules indicated in the legend. Figure taken from R. McDaniel, A. Thamchaipenet, C. Gustafsson, H. Fu, M. Betlach, G. Ashley, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 1846-1851.
9.2.4 Applications and Practical Examples
Thus far, we have examined several approaches toward generating natural product analogs through chemical and genetic manipulation of PKS and NRPS assembly lines. Realization of combinatorial biosynthetic methods requires extension of these basic principles to create larger libraries of compounds from known templates. The complexity of these molecules precludes traditional chemical synthesis, making biosynthetic manipulation the only viable means to access them. This section will focus on several examples of successful library generation using the techniques described above.
Manipulation of the DEBS system has led to the most impressive demonstration of combinatorial biosynthesis to date. McDaniel and coworkers have utilized specific module-swapping strategies to access a variety of 6-deoxyerythronolide B analogs with modifications at each carbon of the macrolide backbone [26]. Modules 1-6 of DEBS were systematically replaced with individual rapamycin synthase components to alter oxidation state and methylation in the final polyketide product. The study produced 60 unique structures at yields ranging from 1 to 70% of that of 6-deoxyerythronolide B (Fig. 9.2-5). However, each new compound required independent synthase engineering, which made library construction quite tedious. To circumvent this laborious process, Santi and coworkers developed a multiplasmid approach whereby genetic variations on separate plasmids could be combined to produce a variety of analogs with multiple modifications [27]. Specifically, three discrete plasmids, each encoding one DEBS polypeptide (e.g., DEBS1), were prepared and appropriate module swaps were executed for each. The modified plasmids could then be selectively combined to generate genetically altered DEBS systems. The novel synthases produced a library comparable to the single-plasmid one, but with a fraction of the effort and time (Fig. 9.2-6).
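The economy of the multiplasmid approach is combinatorial: a handful of engineered variants of each polypeptide can be mixed to give every one-per-plasmid combination. A minimal Python sketch of that bookkeeping follows. The variant labels and counts are hypothetical placeholders chosen for illustration, not the constructs reported in [27].

```python
from itertools import product

# Hypothetical variant labels for each of the three DEBS polypeptides;
# "wt" denotes the unmodified gene. These are placeholders for illustration.
plasmid_variants = {
    "DEBS1": ["wt", "variant_a", "variant_b"],
    "DEBS2": ["wt", "variant_c"],
    "DEBS3": ["wt", "variant_d", "variant_e", "variant_f"],
}


def enumerate_synthases(variants):
    """Return every combination of one variant per plasmid as a dict."""
    names = list(variants)
    return [dict(zip(names, combo)) for combo in product(*variants.values())]


synthases = enumerate_synthases(plasmid_variants)

# Plasmids that must be built versus synthases obtained by combining them.
constructs_built = sum(len(v) for v in plasmid_variants.values())
print(constructs_built)  # 9
print(len(synthases))    # 24
```

Under these assumed numbers, nine plasmids give access to 24 distinct three-plasmid combinations, whereas a single-plasmid strategy would need a separately engineered construct for each one. Whether every combination yields a functional synthase is, of course, an experimental question that this count does not address.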
The potential for combinatorial biosynthesis of aromatic polyketides has remained largely untapped. However, recent work has laid the groundwork for further exploration. Matching various initiation modules with heterologous elongation components produced a moderately sized library of small aromatic compounds [38]. For instance, coexpression of the R1128 loading module with the tcm minimal PKS generated the predicted products YT127 and YT127b derived from propionyl and isobutyryl starter units
respectively (Fig. 9.2-7). Structural variants of these compounds were readily formed by simple swapping of act with the tcm minimal PKS. In addition to the array of molecules prepared in this study, the authors suggest numerous possibilities for production of related structures through alternative bimodular combinations. In all, a library of over 100 known and predicted aromatic polyketides could be described with this methodology. More recently, similar strategies have also been applied for the engineered biosynthesis of nonacetate-primed decaketides.
Combinatorial methods in NRPS systems have been limited to chemoenzymatic strategies as described above. However, given the relative ease of modern peptide synthesis, these studies have resulted in a vast array of highly functionalized macrocycles. A particularly impressive work in this area, executed by Burkart and coworkers, involved the synthesis and subsequent cyclization of more than 300 distinct peptides [44]. In an effort to gain access to improved tyrocidine A analogs, an assortment of peptides containing both natural and nonnatural amino acids at the D-Phe 1 and D-Phe 4 positions were synthesized and cyclized by tyc TE on the solid phase. Products were assayed for antimicrobial activity, and most of the analogs tested showed improved therapeutic profiles over natural tyrocidine A. The authors mention that this methodology may ideally be used for initial discovery purposes: because the linear precursors are prepared by chemical synthesis, NRPS engineering can be kept to a minimum until promising candidates are identified.
9.2.5 Future Development
Future success in combinatorial biosynthesis will rely heavily on increased understanding of specific recognition interfaces. This includes motifs associated with both protein-substrate and protein-protein interactions. In addition, development of improved techniques for monitoring and optimizing engineered processes will be critical to test the viability of using these methods to produce novel compounds efficiently. Despite the impressive examples described above, the area of combinatorial biosynthesis is still in its infancy and will require significant attention and ingenuity to truly harness its potential.
Structure-based design of catalytically efficient synthetases will prove vital for future success in this area. As we have seen in the case of CLF engineering above, intrinsic specificity in these enzymes may be altered through manipulation of a set of key residues. However, this approach requires knowledge of three-dimensional protein structure. As little is known regarding the precise arrangement of specificity determinants in modular PKS and NRPS systems, efforts toward elucidating this information are critical to advancement. The extent to which these systems must be altered to achieve appreciable yields of natural product analogs remains to be seen. In some cases, analog production may be hindered by a single module or
Fig. 9.2-7 Aromatic polyketide library from genetically combining initiation modules (IM) with minimal aromatic PKSs. Compounds that have been reported are shown in bold. Predicted combinations are shown in plain text. KS-CLFs that have not been identified are in gray. Blue - ketoreductase (KR) requirements, red - cyclase requirements, green - other methyl transferases (MT), and additional KRs. Figure taken from Y. Tang, T.S. Lee, H.Y. Lee, C. Khosla, Tetrahedron 2004, 60, 7659-7671.
domain, whereas others may require extensive engineering. The future of combinatorial biosynthesis will rely on our collective ability to answer these questions.
Techniques to monitor individual transformations along the assembly line will offer necessary insight into analog processing. Ideally, problematic steps could be precisely identified in a high-throughput manner. Recent work by Kelleher and coworkers provides promise for realization of this goal [45-48]. In short, they have established high-resolution mass spectrometry as a tool for evaluating intact domain-bound intermediates. This allows for facile assessment of mechanism and specificity in these systems under biologically relevant conditions. The enormous technological and intellectual advances in bioanalytical chemistry promise numerous opportunities for the future of real-time monitoring and troubleshooting.
Genetic selection of organisms capable of efficiently producing natural product analogs represents a complementary approach to the structure-based design described above. Evolution of microorganisms in response to external pressures can provide an efficient means of producing novel bioactive molecules. It may be possible to produce strains whose survival relies on their ability to utilize heterologous biosynthetic machinery introduced through genetic manipulation. In this way, compounds can be selected for specific targets by simply altering the external stimuli. For instance, the discovery of antibiotics active against certain resistant bacterial strains may be realized by providing competitors with a host of chemical and biosynthetic resources, followed by high-throughput analysis of those that produce effective small molecule defenses.
9.2.6 Conclusion
Given a wealth of natural chemical scaffolds for improved drug design, our ability to generate novel pharmaceuticals requires increased understanding of the biosynthetic processes that may lead to their discovery and production. Polyketide and nonribosomal peptide assembly offers enormous potential for development of combinatorial biosynthetic methods. The structural complexity of these natural products often prohibits practical chemical synthesis, which underscores the need for alternative means of accessing them in usable quantities. Research in this area requires in-depth knowledge of chemical,
biological, and engineering principles that typify the field of chemical biology. The studies highlighted in this chapter demonstrate significant forward progress, but there is much need for motivated scientists from all disciplines to take part in the development and exploration of improved methods.
Acknowledgment
This work was supported by grants from the National Institutes of Health (CA66736 and CA77248). Nathan A. Schnarr is a recipient of an NIH postdoctoral fellowship.
References
1. D. O'Hagan, The Polyketide Metabolites, Ellis Horwood, New York, 1991.
2. D. E. Cane (Ed.), for a thematic review covering polyketide and nonribosomal peptide biosynthesis see Chem. Rev. 1997, 97(7).
3. J. Staunton, K. Weissman, Polyketide biosynthesis: a millennium review, Nat. Prod. Rep. 2001, 18, 380-416.
4. C. Khosla, Natural product biosynthesis, J. Org. Chem. 2000, 65, 8127-8133.
5. D. Cane, C. Walsh, C. Khosla, Harnessing the biosynthetic code: Combinations, permutations, and mutations, Science 1998, 282, 63-68.
6. L. Katz, G. Ashley, Translation and protein synthesis: Macrolides, Chem. Rev. 2005, 105, 499-528.
7. H. Floss, Antibiotic biosynthesis: From natural to unnatural compounds, J. Indust. Micro. Biotech. 2001, 27, 183-194.
8. C. Walsh, Combinatorial biosynthesis of antibiotics: Challenges and opportunities, ChemBioChem 2002, 3, 124-134.
9. S. Donadio, M. Sosio, Strategies for combinatorial biosynthesis with modular polyketide synthases, Comb. Chem. High Throughput Screen. 2003, 6, 489-500.
10. U. Keller, F. Schauwecker, Combinatorial biosynthesis of non-ribosomal peptides, Comb. Chem. High Throughput Screen. 2003, 6, 527-540.
11. I. Kantola, T. Kunnari, P. Mantsala, K. Ylihonko, Expanding the scope of aromatic polyketides by combinatorial biosynthesis, Comb. Chem. High Throughput Screen. 2003, 6, 501-512.
12. J. Staunton, B. Wilkinson, Combinatorial biosynthesis of polyketides and nonribosomal peptides, Curr. Opin. Chem. Biol. 2001, 5, 159-164.
13. N. Wu, S. Tsuji, D. Cane, C. Khosla, Assessing the balance between protein-protein interactions and enzyme-substrate interactions in the channeling of intermediates between polyketide synthase modules, J. Am. Chem. Soc. 2001, 27, 6465-6474.
14. S. Tsuji, N. Wu, C. Khosla, Intermodular communication in polyketide synthases: Comparing the role of protein-protein interactions to those in other multidomain proteins, Biochemistry 2001, 40, 2317-2325.
15. N. Wu, D. Cane, C. Khosla, Quantitative analysis of the relative contributions of donor acyl carrier proteins, acceptor ketosynthases, and linker regions to intermodular transfer of intermediates in hybrid polyketide synthases, Biochemistry 2002, 42, 5056-5066.
16. R. Broadhurst, D. Nietlispach, M. Wheatcroft, P. Leadlay, K. Weissman, The structure of docking domains in modular polyketide synthases, Chem. Biol. 2003, 10, 723-731.
17. S. Tsuji, D. Cane, C. Khosla, Selective protein-protein interactions direct the channeling of intermediates between polyketide synthase modules, Biochemistry 2001, 40, 2326-2331.
18. P. Kumar, Q. Li, D. Cane, C. Khosla, Intermodular communication in modular polyketide synthases: structural and mutational analysis of linker mediated protein-protein recognition, J. Am. Chem. Soc. 2003, 125, 4097-4102.
19. J. Lau, D. Cane, C. Khosla, Substrate specificity of the loading didomain of the erythromycin polyketide synthase, Biochemistry 2001, 29, 10514-10520.
20. J. Jacobsen, C. Hutchinson, D. Cane, C. Khosla, Precursor-directed biosynthesis of erythromycin analogs by an engineered polyketide synthase, Science 1997, 277, 367-369.
21. X. Ruan, A. Pereda, D. Stassi, D. Zeidner, R. Summers, M. Jackson, A. Shivakumar, S. Kakavas, M. Staver, S. Donadio, L. Katz, Acyl transferase domain substitutions in erythromycin polyketide synthase yield novel erythromycin derivatives, J. Bacteriol. 1997, 179, 6416-6425.
22. J. Lau, H. Fu, D. Cane, C. Khosla, Dissecting the role of acyl transferase domains of modular polyketide synthases in the choice and stereochemical fate of extender units, Biochemistry 1999, 38, 1643-1651.
23. D. Stassi, S. Kakavas, K. Reynolds, G. Gunawardana, S. Swanson, D. Zeidner, M. Jackson, H. Liu, A. Buko, L. Katz, Ethyl-substituted erythromycin derivatives produced by directed metabolic engineering, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 7305-7309.
24. S. Donadio, M. Staver, J. McAlpine, S. Swanson, L. Katz, Modular organization of genes required for complex polyketide biosynthesis, Science 1991, 252, 675-679.
25. S. Donadio, J. McAlpine, P. Sheldon, M. Jackson, L. Katz, An erythromycin analog produced by reprogramming of polyketide synthesis, Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 7119-7123.
26. R. McDaniel, A. Thamchaipenet, C. Gustafsson, H. Fu, M. Betlach, G. Ashley, Multiple genetic modifications of the erythromycin polyketide synthase to produce a library of novel "unnatural" natural products, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 1846-1851.
27. Q. Xue, G. Ashley, C. Hutchinson, D. Santi, A multiplasmid approach to preparing large libraries of polyketides, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 11740-11745.
28. C. Kao, G. Luo, L. Katz, D. Cane, C. Khosla, Manipulation of macrolide ring size by directed mutagenesis of a modular polyketide synthase, J. Am. Chem. Soc. 1995, 117, 9105-9106.
29. R. Gokhale, D. Hunziker, D. Cane, C. Khosla, Mechanism and specificity of the terminal thioesterase domain from the erythromycin polyketide synthase, Chem. Biol. 1999, 6, 117-125.
30. M. Crump, J. Crosby, C. Dempsey, J. Parkinson, M. Murray, D. Hopwood, T. Simpson, Solution structure of the actinorhodin polyketide synthase acyl carrier protein from Streptomyces coelicolor A3(2), Biochemistry 1997, 36, 6000-6008.
31. H. Pan, S.-C. Tsai, E. Meadows, L. Miercke, A. Keating-Clay, J. O'Connell, C. Khosla, R. Stroud, Crystal structure of the priming β-ketosynthase from the R1128 polyketide biosynthetic pathway, Structure 2002, 10, 1559-1568.
32. S. Findlow, C. Winsor, T. Simpson, J. Crosby, M. Crump, Solution structure and dynamics of oxytetracycline polyketide synthase acyl carrier protein from Streptomyces rimosus, Biochemistry 2003, 42, 8423-8433.
33. Q. Li, C. Khosla, J. Puglisi, C. Liu, Solution structure and backbone dynamics of the holo form of the frenolicin acyl carrier protein, Biochemistry 2003, 42, 4648-4657.
34. K. Watanabe, C. Khosla, R. Stroud, S.-C. Tsai, Crystal structure of an acyl-ACP dehydrogenase from the FK520 polyketide biosynthetic pathway: Insights into extender unit biosynthesis, J. Mol. Biol. 2003, 334, 435-444.
35. Y. Tang, S.-C. Tsai, C. Khosla, Polyketide chain length control by chain length factor, J. Am. Chem. Soc. 2003, 125, 12708-12709.
36. Y. Tang, T.S. Lee, C. Khosla, Engineered biosynthesis of regioselectively modified aromatic polyketides using bimodular polyketide synthases, PLoS Biol. 2004, 2, 227-238.
37. Y. Tang, T.S. Lee, S. Kobayashi, C. Khosla, Ketosynthases in the initiation and elongation modules of aromatic polyketide synthases have orthogonal acyl carrier protein specificity, Biochemistry 2003, 42, 6588-6595.
38. Y. Tang, T.S. Lee, H.Y. Lee, C. Khosla, Exploring the biosynthetic potential of bimodular aromatic polyketide synthases, Tetrahedron 2004, 60, 7659-7671.
39. T. Stachelhaus, A. Schneider, M. Marahiel, Rational design of peptide antibiotics by targeted replacement of bacterial and fungal domains, Science 1995, 269, 69-72.
40. A. Schneider, T. Stachelhaus, M. Marahiel, Targeted alteration of the substrate specificity of peptide synthetases by rational module swapping, Mol. Gen. Genet. 1998, 257, 308-318.
41. R. Kohli, J. Trauger, D. Schwarzer, M. Marahiel, C. Walsh, Generality of peptide cyclization catalyzed by isolated thioesterase domains of nonribosomal peptide synthetases, Biochemistry 2001, 40, 7099-7108.
42. J. Trauger, R. Kohli, C. Walsh, Cyclization of backbone-substituted peptides catalyzed by the thioesterase domain from the tyrocidine nonribosomal peptide synthetase, Biochemistry 2001, 40, 7092-7098.
43. R. Kohli, M. Burke, J. Tao, C. Walsh, Chemoenzymatic route to macrocyclic hybrid peptide/polyketide-like molecules, J. Am. Chem. Soc. 2003, 125, 7160-7161.
44. R. Kohli, C. Walsh, M. Burkart, Biomimetic synthesis and optimization of cyclic peptide antibiotics, Nature 2002, 418, 658-661.
45. S. McLoughlin, N. Kelleher, Kinetic and regiospecific interrogation of covalent intermediates in the nonribosomal peptide synthesis of yersiniabactin, J. Am. Chem. Soc. 2004, 126, 13265-13275.
46. L. Hicks, S. O'Connor, M. Mazur, C. Walsh, N. Kelleher, Mass spectrometric interrogation of thioester-bound intermediates in the initial stages of epothilone biosynthesis, Chem. Biol. 2004, 11, 327-335.
47. S. Garneau, P. Dorrestein, N. Kelleher, C. Walsh, Characterization of the formation of the pyrrole moiety during clorobiocin and coumermycin A1 biosynthesis, Biochemistry 2005, 44, 2770-2780.
48. G. Gatto, S. McLoughlin, N. Kelleher, C. Walsh, Elucidating the substrate specificity and condensation domain activity of FkbP, the FK520 pipecolate-incorporating enzyme, Biochemistry 2005, 44, 5993-6002.
Chemical Biology. From Small Molecules to System Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
10 Synthesis of Large Biological Molecules
10.1 Expressed Protein Ligation
Matthew R. Pratt and Tom W. Muir
Outlook
The generation of proteins containing homogeneous natural and unnatural modifications is a key component in understanding biological processes. With this goal in mind, a variety of protein-engineering approaches have been developed, including expressed protein ligation (EPL). EPL is an intein-based approach that yields chemically modified proteins from smaller synthetic and/or recombinant fragments, allowing for the construction of proteins containing a broad range, and a theoretically unlimited number, of modifications. The history and applications of this powerful protein-engineering technology are highlighted below.
10.1.1 Introduction
As the biological sciences continue forward in what is referred to as the postgenomic era, an intimate understanding of protein structure and function has become a core goal in biological study. Looking at the number of genes in the human genome, this goal appears large but within reach; however, the grand scope of this task is further complicated by the spatial and temporal dynamics of protein modification on the pre- and posttranslational levels. Seventy to ninety percent of the transcripts encoded in the human genome contain two or more exons, allowing for the alternative splicing of pre-mRNAs. In addition, one-third of all mammalian proteins are thought to be phosphorylated [1], and 1% of all gene products (~300 genes) encode for glycosyltransferases involved in the biosynthesis of carbohydrates appended
to glycoproteins and glycolipids [2]. It is becoming increasingly clear that a full understanding of the human proteome will be achieved only when the individual members have been considered in a context that includes tissue and cell-type expression, modification patterns, and how those patterns change over timescales ranging from minutes to years. Cataloging the human proteome begins with a full description of the modifications of a given protein and how they affect function, stability, structure, localization, and interactions with other molecules. This task is a very large proposition, yet it is a crucial long-term objective of biology. Indeed, many new fields including bioinformatics, chemical biology, proteomics, and structural genomics have emerged in recent years, providing new technologies with these goals clearly in mind.
Chemistry has long played a key role in the elucidation of biological processes. The strength of chemistry has been, and always will be, the synthesis of homogeneous, structurally defined materials. The extension of this strength to proteins has been a major focus of biological chemistry research, both for the understanding of native biological function and from the perspective of harnessing that function for nonbiological applications (e.g., reaction catalysis, surface chemistry). Chemical synthesis has elegantly allowed for the incorporation into proteins of unnatural or modified amino acids that would otherwise be unattainable using standard ribosomal synthesis, and has facilitated the construction of proteins possessing natural posttranslational modifications. This second feature is of importance because it is extremely difficult to obtain, by traditional recombinant methods, homogeneous preparations of posttranslationally modified proteins for structural and functional studies. The demand for specifically modified proteins has encouraged the development of a variety of protein-engineering approaches.
These techniques range from classical chemical labeling methods to more recent methodologies such as specific chemical reactions [3, 4], enzymatic labeling [5], nonsense suppression mutagenesis [6, 7], and expressed protein ligation (EPL) [8-12]. EPL involves the linking of synthetic and recombinant peptide/protein building blocks to give a final protein product. This semisynthesis is achieved using chemoselective functional groups at the appropriate ends of the fragments, allowing for their assembly to take place with complete regioselectivity in water at physiological pH. Although EPL can involve more chemical steps (e.g., peptide synthesis) than the other methods mentioned above, it has two important advantages: a theoretically unlimited number of unnatural amino acids can be incorporated, and a much broader range of modifications is possible. For these reasons, EPL has been successfully applied to a broad variety of protein-engineering problems, and this technology and its applications are highlighted below.
10.1.2 History/Development
EPL had its genesis in the convergence of chemical synthesis and protein biochemistry. The established areas of peptide and protein chemistry provided
the technical foundation, and inputs from a naturally occurring biological process, protein splicing, catalyzed the development of the technology. To see how this union led to the development of EPL, it is worth reviewing the relevant areas of protein chemistry.
10.1.2.1 Protein Semisynthesis
Protein semisynthesis was originally achieved as the process by which proteolytic or chemical cleavage fragments of natural proteins were used as the building blocks for the resynthesis of the protein [13]. For example, it has been shown that CNBr-induced cleavage fragments of certain proteins (pancreatic trypsin inhibitor and cytochrome c) [14, 15] spontaneously reform the native peptide bond between them. This spontaneous process was used to incorporate natural and unnatural amino acids into cytochrome c. More recently, the scope of protein semisynthesis has been broadened to include the site-specific modification of a natural protein. The most successful approach of this type to date has been the introduction, by standard site-directed mutagenesis, of a unique cysteine residue into the protein of interest, permitting selective derivatization of the sulfhydryl group with any number of thiol-reactive probes. This method has been used to incorporate photoactivatable cross-linkers [16], fluorophores [17], and carbohydrates [18] into proteins and has been used to prepare photocaged enzymes [19].
Another approach to protein semisynthesis involves the use of proteolytic enzymes to facilitate the regioselective ligation of peptide fragments. Carrying out reverse proteolysis involves altering the reaction conditions such that aminolysis of an acyl-enzyme intermediate is favored over hydrolysis. This is typically achieved by using high concentrations of organic solvents such as glycerol, dimethylformamide (DMF), or acetonitrile in the reaction medium. Under these conditions, the acyl-enzyme intermediate will undergo aminolysis with a second peptide fragment, giving an amide-linked product [20]. Significant progress in the area of enzyme-mediated protein ligation has been realized through enzyme active site engineering.
In an elegant example, Wells and coworkers made a double mutant of subtilisin, termed subtiligase, giving an enzyme capable of ligating peptide fragments with a high level of efficiency [21]. The Bordusa laboratory has also improved the reverse proteolysis technology by developing substrate mimetic leaving groups at the C-terminus of the N-terminal peptide-coupling partner. These peptide esters have been successful in trypsin-, V8 protease-, and chymotrypsin-catalyzed reactions [22].
10.1.2.2 Chemical Ligation
Over the last ~15 years, chemoselective ligation has emerged as a powerful technique in chemical biology, allowing mutually and exclusively reactive
partners to be joined without the need for protecting groups in an aqueous
environment. Naturally, this ligation strategy was further developed as a solution to the problems associated with classical fragment condensation reactions, which are handicapped by the necessity for protected peptide building blocks. In the area of protein engineering, Offord and Rose pioneered the use of hydrazone/oxime-forming reactions for chemically ligating synthetic and recombinant peptide fragments together [23-25]. In the early 1990s, the idea of using a chemoselective coupling reaction with fully synthetic peptides was realized in the Kent laboratory when the 99-residue HIV-1 protease was assembled from two ~50-residue unprotected peptides using a thioester-bond-forming reaction [26]. Given the simplicity and elegance of chemoselective ligation, a large amount of effort has gone into expanding the technique to include thioether-, thiazolidine-, and amide-forming reactions [27].
The next major step in establishing chemoselective ligation as a general route for protein synthesis came with the development of native chemical ligation (NCL) [28]. Using this technique, two fully unprotected peptide fragments can be reacted under neutral aqueous conditions, culminating in the formation of a native peptide bond at the ligation site (Fig. 10.1-1(a)). The first step in NCL involves the chemoselective transthioesterification reaction between one peptide containing an N-terminal cysteine residue and another peptide containing an α-thioester group. This initial reaction is followed by a spontaneous intramolecular S → N acyl shift, generating a native amide bond at the ligation junction. NCL is compatible with all naturally occurring side chain functionalities, including the sulfhydryl group of cysteine. This compatibility with cysteine is due to the reversibility of the initial transthioesterification step and allows for the presence of internal cysteine residues in both peptide sequences.
Because of its compatibility with all naturally occurring amino acids, NCL is ideally suited for protein semisynthesis. The only requirement for the recombinant protein is that it contains one of the two chemoselective reactive groups, either α-Cys or an α-thioester. Indeed, NCL has been used in a semisynthetic context through the recombinant incorporation of an α-Cys residue, providing access to natural proteins modified by synthetic molecules at their N-terminus [29]. The remaining obstacle is how to prepare recombinant protein α-thioesters, which are required if synthetic peptides are to be incorporated at the C-terminus or the middle of semisynthetic proteins. The solution to this problem fell serendipitously out of studies on the naturally occurring process known as protein splicing.
10.1.2.3
Protein Splicing
Protein splicing is a posttranslational process whereby a precursor protein undergoes a series of self-catalyzed intramolecular rearrangements that result in the removal of an internal protein segment, termed intein, and the ligation of the two flanking polypeptides, referred to as exteins (Fig. 10.1-1(b)) [30, 31]. One hundred and seventy-six members of the intein protein domain family are currently cataloged (http://www.neb.com/neb/inteins.html), being
Fig. 10.1-1 (a) Mechanism of native chemical ligation. Both polypeptides are fully unprotected, and the reaction proceeds in water at neutral pH. (b) Schematic representation of protein splicing. Intramolecular rearrangements result in the ligation of two polypeptides with the requisite removal of an internal segment.
characterized by several conserved sequence motifs. Inteins are autocatalytic and some are promiscuous for the sequences of the two flanking exteins, allowing many polypeptides to participate in protein splicing. As shown in Fig. 10.1-1(b), the first step of protein splicing involves an N → S (or N → O) acyl shift in which the N-terminal extein is transferred to the side chain of a cysteine (or Ser) residue at the immediate N-terminus of the intein. A second cysteine residue (or Ser/Thr) located at the N-terminus of the remaining extein attacks the resulting thioester, yielding a branched thioester intermediate. The branched intermediate is subsequently resolved on cyclization of the conserved asparagine residue located at the C-terminus of the intein. The intein is thus excised as a C-terminal succinimide derivative. The final step in this process involves the S → N (or O → N) acyl shift providing the spliced protein product. The final step of protein splicing closely resembles the second step of
NCL. In fact, NCL provided the chemical insight for unraveling the last step in the protein splicing mechanism [32]. Inteins have been found in proteins of species spanning eubacteria, archaea, and eukarya, suggesting that they have an ancient evolutionary origin. However, a biological role for inteins is yet to be discovered. Interestingly, the products of inteins share structural homology to autoprocessing domains, such as hedgehog proteins, present in higher eukaryotes. Furthermore, inteins are often found in gene products responsible for DNA replication or recombination, ensuring their conservation. The subject of intein distribution and evolutionary history has been discussed at length elsewhere [33]. Although the biological role of protein splicing remains a matter of inquiry, the process has been exploited extensively in the areas of biotechnology and protein chemistry. The first of these applications exploits the knowledge of the mechanism of protein splicing to produce beneficial intein mutants. A number of mutant inteins (many contain a C-terminal Asn → Ala mutation) have been designed that can achieve only the first step of protein splicing [32, 34-37]. Proteins expressed as in-frame N-terminal fusions to one of these mutant inteins can be cleaved by thiols via an intermolecular transthioesterification reaction. This system provides two things: first, it acts as a traceless chemical protease that can be exploited for the purification of recombinant proteins [34]; and more importantly, a key ingredient of NCL, protein α-thioesters, can also be prepared by this method. A second application involves the use of naturally or artificially split inteins [38-41]. These split inteins individually have no activity but when combined associate noncovalently to give a functional protein.
Protein transsplicing, as this process is generally known, provides a way of selectively ligating two different polypeptides together and represents an augmenting alternative to EPL. Indeed, transsplicing has been exploited for the generation of cyclic peptides and proteins, for detecting protein-protein interactions, and for controlling protein function, some of which will be discussed later in this chapter. Harnessing protein splicing, researchers now have the ability to generate recombinant protein α-thioesters through the thiolysis of an appropriately mutated protein-intein fusion. In principle, this means that synthetic and recombinant building blocks can be fused in a semisynthetic version of NCL. Such an approach was first reported in 1998 and has been named expressed protein ligation [8].
10.1.3
General Considerations
10.1.3.1
Generation of Thioesters
The bottleneck of EPL is the generation of peptide or protein thioesters. This has encouraged many groups to develop methods for their construction.
Fig. 10.1-2 Generation of peptide α-thioesters by Fmoc-based SPPS using sulfonamide safety catch linker resin (a), a masked thioester equivalent incorporated post-SPPS (b), and a masked thioester linker strategy (c).
Several methods for the production of peptide thioesters using solid-phase peptide synthesis (SPPS) have been fashioned. The most general strategy involves the use of tert-butyloxycarbonyl (Boc)-based peptide synthesis because the thioester is labile to the repeated base treatments required in 9-fluorenylmethoxycarbonyl (Fmoc)-based SPPS [28]. However, different technologies employing the Fmoc synthesis method have been developed because the strategy has the advantage of milder cleavage conditions, allowing for the incorporation of acid-sensitive functionality, such as phosphates and carbohydrates, not accessible through Boc chemistry. One such method is based on modifications of Kenner’s sulfonamide “safety catch” linker (Fig. 10.1-2(a)) [42]. The growing peptide chain is attached to the resin with an acid- and base-stable N-acyl sulfonamide linker. After the peptide synthesis is complete, the sulfonamide can be activated by N-alkylation using electrophiles such as iodoacetonitrile. This activated species can then be cleaved with a thiol nucleophile to generate the peptide thioester [43]. An aryl hydrazine resin
has also been reported recently, which could be utilized in a similar fashion to create peptide thioesters through thiolysis [44]. Another method involves the coupling of “masked” thioester equivalents to fully protected peptide free acids post-SPPS [45]. In one example (Fig. 10.1-2(b)), an amino acid derivative was coupled to a fully protected peptide, followed by global deprotection, to give a masked thioester intermediate. Treatment of this intermediate with exogenous thiols reduces the disulfide bond, allowing for a spontaneous rearrangement resulting in the formation of a peptide thioester. Finally, a masked thioester equivalent has recently been introduced as a linker for SPPS (Fig. 10.1-2(c)) [46]. Standard cleavage conditions allow for the isolation of the peptide-linker intermediate, which upon treatment with thiols rearranges to yield a peptide thioester. These examples, along with others, have been used successfully in NCL and EPL syntheses of peptides and proteins. As noted above, the production of recombinant protein thioesters was first achieved by the use of mutant inteins rendered incapable of resolving their
Fig. 10.1-3 Expressed protein ligation. Synthesis of recombinant protein thioesters using the IMPACT™ system. Thioesters are obtained by expressing a protein of interest as a fusion to the N-terminus of an intein. The CBD allows for purification. The thioester resulting from thiolysis can be ligated under the conditions of NCL.
thioester intermediate [32, 34-36]. This technology is commercially available as the IMPACT (intein-mediated purification with an affinity chitin binding tag) system (Fig. 10.1-3) [34]. In this system, a target protein is expressed as an N-terminal fusion of a modified intein. A chitin binding domain (CBD) from Bacillus circulans is fused to the C-terminal portion of the intein, allowing for affinity purification of the three-component fusion protein over chitin resin. Other proteins are washed away from the desired immobilized protein, followed by cleavage with an excess of thiol, yielding the protein of interest as a C-terminal thioester. Modified mini inteins, containing an Asn → Ala mutation, from the genes of Mycobacterium xenopi (Mxe GyrA), Saccharomyces cerevisiae (Sce VMA), Methanobacterium thermoautotrophicum (Mth RIR1), and Synechocystis sp. PCC6803 (Ssp DnaB) are commonly used for this process. The cleavage occurs directly at the N-terminus of the intein due to the lack of Asn cyclization. These inteins can be cleaved with various thiols such as ethanethiol, thiophenol, and 2-mercaptoethanesulfonic acid (MESNA) with great efficiency.
10.1.3.2
Protecting Groups and Sequential Ligations
Most EPL applications involve just two building blocks and thus a single ligation reaction. However, the restrictions of SPPS, which limits the length
Fig. 10.1-4 Schematic representation of sequential ligation reactions. A synthetic peptide containing an N-terminal thioproline residue can be ligated to the N-terminus of a protein containing an α-cysteine. The thioproline can then be transformed into a new α-cysteine residue poised for the next ligation reaction. Likewise, a recombinant protein’s α-cysteine residue can be masked by a prosequence cleavable by the protease factor Xa.
of a synthesized peptide to about 50 residues, require that the region of
interest in a protein be relatively close to the native N- or C-terminus. To address this issue, a sequential ligation method is necessary, and thus, protecting groups for N-terminal cysteine residues, both in synthetic peptides and recombinant proteins, are needed. The cysteine protection is necessary to prevent the peptide or protein from reacting with itself in either an intra- or intermolecular fashion. This allows for a sequential ligation strategy such that multiple (three or more) building blocks can be linked together in series. Two commonly used protecting group strategies are outlined in Fig. 10.1-4. Synthetic peptide fragments can be protected as an N-terminal thioproline residue [47], which can be removed by treatment with 0.2 M methoxylamine following a ligation reaction [48]. Recombinant proteins can contain a cryptic α-cysteine residue masked by a factor Xa cleavable prosequence [49]. The advantage of this proteolytic approach is that the protecting group sequence can be encoded at the genetic level. Thus, the prosequence can be used for both synthetic and recombinant inserts in sequential EPL reactions.
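The bookkeeping behind a sequential ligation can be made explicit with a small model. The sketch below is an illustration added here, not a protocol from the cited work: the class, constants, and function names are hypothetical, and the chemistry is reduced to two flags per fragment (the state of the N-terminus and whether a C-terminal thioester is present). It assembles fragments from the C-terminal end, as in Fig. 10.1-4, and refuses any step in which a fragment carries both a free N-terminal cysteine and a thioester.

```python
from dataclasses import dataclass

# N-terminal states used in this sketch (labels are illustrative).
FREE_CYS = "Cys"            # reactive N-terminal cysteine
THIOPROLINE = "Thz"         # masked Cys on a synthetic peptide
XA_PROSEQUENCE = "Xa-pro"   # masked Cys on a recombinant protein
OTHER = "other"             # no N-terminal cysteine


@dataclass(frozen=True)
class Fragment:
    name: str
    n_term: str
    c_thioester: bool


def deprotect(fragment):
    """Unmask a protected N-terminal cysteine (methoxylamine or factor Xa)."""
    if fragment.n_term in (THIOPROLINE, XA_PROSEQUENCE):
        return Fragment(fragment.name, FREE_CYS, fragment.c_thioester)
    return fragment


def ligate(n_fragment, c_fragment):
    """Join a thioester fragment (N-terminal side) to a free-Cys fragment."""
    if not n_fragment.c_thioester:
        raise ValueError(f"{n_fragment.name} has no C-terminal thioester")
    if c_fragment.n_term != FREE_CYS:
        raise ValueError(f"{c_fragment.name} has no free N-terminal Cys")
    if n_fragment.n_term == FREE_CYS:
        raise ValueError(f"{n_fragment.name} could react with itself")
    return Fragment(f"{n_fragment.name}-{c_fragment.name}",
                    n_fragment.n_term, c_fragment.c_thioester)


def assemble(fragments):
    """Assemble fragments listed N- to C-terminus, starting at the C-terminal end."""
    product = fragments[-1]
    for fragment in reversed(fragments[:-1]):
        product = ligate(fragment, deprotect(product))
    return product


if __name__ == "__main__":
    parts = [
        Fragment("A", OTHER, True),
        Fragment("B", THIOPROLINE, True),   # middle piece must be protected
        Fragment("C", FREE_CYS, False),
    ]
    print(assemble(parts).name)
```

Replacing the thioproline on fragment B with a free cysteine makes `ligate` raise an error, which mirrors the self-reaction that the protecting group is meant to prevent.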
10.1.3.3
Alternatives to N-terminal Cysteine
The only absolute requirement for NCL and EPL, other than an α-thioester, is a cysteine residue or a homolog at the ligation site. The natural occurrence of this amino acid is low and there is the possibility that insertion of additional cysteine residues can alter the structure and function of a given protein. Therefore, different approaches have been developed to overcome this requirement [50]. The first such approach extends NCL methodology to -X-Gly- and -Gly-X- ligation sites through the use of removable auxiliaries, an example of which is shown in Fig. 10.1-5(a) [51]. In this case, an oxyethanethiol group acts as a cysteine surrogate, allowing for the formation of a thioester intermediate capable of rearranging to give a peptide bond. The auxiliary can then be removed by reaction with Zn and acid. A second method allows for the ligation site to be extended to -X-Ala- (Fig. 10.1-5(b)) [52]. NCL is performed in the usual fashion, yielding a cysteine at the ligation site. In the following step, the Cys is converted to an Ala by desulfurization using Raney nickel and hydrogen. However, selectivity of the desulfurization reaction is impossible to achieve, prohibiting the use of this method in the case of proteins containing further Cys residues. In the final example, an entirely different chemoselective ligation, the Staudinger ligation [53], has been used to extend the NCL methodology (Fig. 10.1-5(c)) [54]. A peptide containing a C-terminal phosphinothioester is coupled to another peptide bearing an N-terminal α-azido functionality. The reaction proceeds through the formation of an iminophosphorane possessing a nucleophilic nitrogen that will react with a nearby acyl donor to form a peptide bond. This methodology has successfully extended the NCL methodology to an -X-Gly- ligation site. Further development of these and similar technologies should allow NCL to be extended to many different ligation sites in the future.
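The scope of the three alternatives can be summarized as a simple lookup. The following sketch only restates the junction rules given above; the function name, argument names, and returned labels are hypothetical, and it makes no claim about yields or about junctions not mentioned in the text.

```python
def ligation_options(n_side, c_side, other_cys_in_product=False):
    """List the approaches described in the text that fit an -n_side-c_side- junction.

    n_side: one-letter code of the residue at the C-terminus of the acyl-donor fragment.
    c_side: one-letter code of the residue at the N-terminus of the other fragment.
    other_cys_in_product: True if the final protein contains further Cys residues.
    """
    options = []
    if c_side == "C":
        options.append("native chemical ligation")
    if "G" in (n_side, c_side):
        # Auxiliary-mediated ligation covers -X-Gly- and -Gly-X- sites.
        options.append("removable auxiliary")
    if c_side == "A" and not other_cys_in_product:
        # Desulfurization is not selective, so no other Cys may be present.
        options.append("ligation at Cys, then desulfurization to Ala")
    if c_side == "G":
        options.append("Staudinger ligation")
    return options


if __name__ == "__main__":
    for junction in (("K", "C"), ("L", "G"), ("L", "A")):
        print(junction, ligation_options(*junction))
```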
Fig. 10.1-5 The extension of ligation technology past the requirement of cysteine using auxiliaries (a), desulfurization (b), and the Staudinger ligation (c).
10.1.3.4
Ligation Strategies
EPL requires, by the limitations of SPPS, that a Cys residue be located relatively close to the region of the protein where unnatural moieties will be introduced. As noted above, it is possible to reproducibly synthesize peptides of ~50 residues in length. Thus, for a protein to be completely accessible to modification by EPL, there must be a Cys residue for every 50 or fewer residues in the primary sequence. Many proteins meet this requirement and are ideal targets for EPL. However, many more proteins do not contain suitable Cys residues, and the simplest solution is to introduce one through mutation. This technique has been used successfully for the semisynthesis of several fully active proteins [9, 55-57]. In these cases, the mutation site should be chosen with care. The mutation should be chosen to be as conservative as possible in relation to primary sequence (e.g., Ala → Cys or Ser → Cys) [58] and structure (e.g., loops or linkers) [9]. Highly conserved residues from a family of related proteins should also be avoided as sites of mutation. Given the availability of straightforward site-directed mutagenesis strategies, the effect of a Cys mutation can often be evaluated prior to beginning a semisynthesis by recombinant expression of the protein containing a point mutation [59]. As noted in the above section, technologies are being developed to overcome the requirement of an N-terminal cysteine; however, the use of these methods is yet to be reported in the context of EPL. Another factor affecting the choice of where a Cys residue should be introduced for ligation is the identity of the preceding amino acid. This
residue will be at the C-terminus of the thioester fragment, and the effects of varying this amino acid on the kinetics of NCL have been studied [60]. Increasing the steric bulk of the side chain (particularly β-substitution) slows the rate of the reaction. Thus, Cys substitutions directly following bulky amino acids, especially Thr, Ile, and Val, should be avoided. A related issue is the effect of the identity of this amino acid on the efficiency of the protein-intein thiolysis step [61]. Certain residues result in premature cleavage (e.g., Asp, Asn, Glu, Gln), while others result in no cleavage at all (e.g., Pro). EPL reactions can be carried out in two different ways: thiolysis and NCL can be carried out in one pot, or the recombinant protein thioester can be isolated initially. The first method obviates the need for a purification step but somewhat limits the types of additives that can be present in the reaction mixture. However, one-pot EPL reactions have been successful in the presence of detergents, guanidinium chloride, urea, and organic solvent mixtures [11]. Thiols, such as MESNA or thiophenol, which generate reactive thioesters can be used directly in one-pot reactions. If the protein thioester is first isolated, then harsher denaturants may be used in the subsequent NCL reactions [27]. This has the advantage of increasing the solubility of the reaction partners, allowing for high concentrations (millimolar) of the polypeptides to be achieved, increasing the ligation yield. Less reactive alkyl thiols are often used for the thiolysis of proteins to be isolated, followed by in situ activation through the addition of MESNA or thiophenol in the NCL reaction.
10.1.4
Applications and Practical Examples
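Before turning to individual applications, the site-selection guidelines of Section 10.1.3.4 can be condensed into a short sketch. It is an illustration only: the function names and the example sequence are hypothetical, the 50-residue limit and the residue classes are taken directly from the text, and the code assumes that the N-terminal fragment is produced as an intein fusion so that the thiolysis caveats apply. Sequence conservation and structural context, which the text also asks the experimenter to weigh, are not modeled.

```python
SLOW_LIGATION = set("TIV")         # beta-branched residues: slow NCL
PREMATURE_CLEAVAGE = set("DNEQ")   # premature protein-intein cleavage
NO_CLEAVAGE = set("P")             # no thiolysis at all
MAX_SYNTHETIC_LENGTH = 50          # approximate SPPS limit quoted in the text


def rate_junction(preceding):
    """Rate the residue that would sit at the C-terminus of the thioester."""
    if preceding in NO_CLEAVAGE:
        return "avoid: no intein cleavage"
    if preceding in PREMATURE_CLEAVAGE:
        return "avoid: premature intein cleavage"
    if preceding in SLOW_LIGATION:
        return "avoid: slow ligation"
    return "ok"


def candidate_sites(sequence, conservative=("A", "S")):
    """Return (position, residue, kind, rating) for possible ligation sites.

    Positions are 1-based. A site is reported if the residue is a native Cys
    or a conservative Ala/Ser -> Cys candidate, and if the shorter of the two
    resulting fragments is short enough to be made by SPPS.
    """
    results = []
    n = len(sequence)
    for i in range(1, n):
        residue = sequence[i]
        if residue == "C":
            kind = "native Cys"
        elif residue in conservative:
            kind = f"{residue}->Cys mutation"
        else:
            continue
        synthetic_length = min(i, n - i)
        if synthetic_length > MAX_SYNTHETIC_LENGTH:
            continue
        results.append((i + 1, residue, kind, rate_junction(sequence[i - 1])))
    return results


if __name__ == "__main__":
    # Hypothetical ten-residue sequence, used only to exercise the rules.
    for site in candidate_sites("MKTCAPSGDC"):
        print(site)
```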
EPL has been applied to an array of proteins ranging from kinases and phosphatases, to transcription factors, polymerases, ion channels, and many others. A variety of modifications have been introduced into these proteins, allowing for studies of protein structure and function that would be difficult with other techniques. Some of these applications are highlighted below.
10.1.4.1
Introduction of Fluorescent Probes
Fluorescence spectroscopy, because of its high level of sensitivity, has long been a powerful method for studying protein behavior. Site-specific attachment of fluorophores to a unique cysteine in a protein of interest is a traditional route for the production of fluorescent proteins. In addition, the discovery of fluorescent proteins, such as the green fluorescent protein (GFP) from the jellyfish Aequorea victoria [62], has provided a genetic approach for the production of fluorescently labeled proteins. Both these methods, however,
have drawbacks. The chemical labeling of a unique cysteine is often practically difficult and the tagging of a protein with GFP appends a ~30 kDa protein, which may affect the properties of the protein of interest. The use of EPL can in principle overcome both these limitations. Typically, a fluorophore is attached to the side chain of an amino acid (e.g., the ε-amino group of lysine) in the synthetic peptide and subsequently incorporated into the protein through EPL. Several protection schemes have been developed to allow probes, such as fluorescein or tetramethylrhodamine, to be introduced into peptides using SPPS [8]. Simple derivatives of fluorophores have also been created that can participate in EPL reactions directly [63, 64]. The ability to introduce a fluorescent probe into a specific site in a protein opens up many possibilities for assaying function. The simplest of these approaches involves the monitoring of intrinsic fluorescence of the probe during the biological process under investigation. Several fluorophores are known to be sensitive to the surrounding environment, that is, their quantum yields and/or Stokes shifts are responsive to changes in the dielectric constant of the immediate surroundings. Thus, the incorporation of one of these probes near the area of a protein that will undergo a structural change or near a site of ligand binding allows direct observation of these events. In one example, Alexandrov and coworkers incorporated a dansyl probe into a semisynthetic version of a GTPase, Rab7 [65]. The fluorophore was incorporated near the C-terminus of Rab7, which has been shown to be posttranslationally prenylated by the enzyme Rab geranylgeranyl transferase (RabGGTase). This modification controls the subcellular localization, and thus the activity, of Rab7. The prenylation reaction is further modulated by the presence of Rab escort protein (REP), which is necessary for enzymatic activity.
Both steady-state and time-resolved fluorescence measurements were used to determine micromolar affinities of Rab7 for RabGGTase and REP, independent of each other. This finding supports a hypothesis that RabGGTase possesses two independent weak binding sites for Rab7 and REP. The same group used semisynthesis to obtain a crystal structure of mono-prenylated Ypt1 (a Rab homolog) bound to RabGDI, a critical GDP dissociation inhibitor involved in the regulation of Rab proteins [66]. This structure provided a basis for the ability of RabGDI to inhibit the release of nucleotide by Rab proteins. Initial binding of RabGDI to Ypt1 causes a conformational change that opens a hydrophobic cavity in RabGDI. This cavity can then accept an isoprenyl group on Ypt1, forming a soluble complex that is free to dissociate from the membrane where prenylated Rab proteins are localized. Fluorescence resonance energy transfer (FRET) is another powerful technique for the determination of structural and functional information using fluorescent proteins. FRET is a physical phenomenon in which the distance between donor and acceptor fluorophores can be determined with reasonable accuracy [67]. This phenomenon was harnessed to study the c-Crk-II signaling protein, which is a substrate of the c-Abl protein kinase [68]. Using
Fig. 10.1-6 Biosensor for c-Abl phosphorylation of c-Crk-II. c-Abl phosphorylates Tyr221 of c-Crk-II, which induces an intramolecular association with the SH2 domain. This rearrangement yields a change in the distance between the termini of the protein. This change is reported by the FRET pair tetramethylrhodamine (Rh) and fluorescein (Fl) incorporated at the N- and C-termini, respectively.
EPL, a FRET pair, tetramethylrhodamine and fluorescein, was incorporated in c-Crk-II. By judicious placement of the fluorophores within the c-Crk-II molecule, it was possible to monitor the phosphorylation state of the protein using FRET measurements (Fig. 10.1-6). In a subsequent study, an extremely sensitive dual-labeled c-Crk-II analog was developed that enabled real-time monitoring of c-Abl kinase activity, and provided a nonradioactive assay for the screening of potential inhibitors of the kinase [69].
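The distance sensitivity that makes such a biosensor work follows from the Förster relation, E = 1/(1 + (r/R0)^6), where r is the donor-acceptor separation and R0 is the separation at which transfer is 50% efficient. The sketch below is added for illustration and is not taken from the cited studies; the function names are hypothetical and the value of R0 is an assumed round number, not a measured property of the fluorescein/tetramethylrhodamine pair in c-Crk-II.

```python
def fret_efficiency(distance, r0):
    """Förster relation: E = 1 / (1 + (r / R0)^6)."""
    return 1.0 / (1.0 + (distance / r0) ** 6)


def fret_distance(efficiency, r0):
    """Invert the Förster relation to recover the donor-acceptor distance."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    R0 = 55.0  # assumed Förster radius in angstroms; illustrative value only
    for r in (30.0, 55.0, 80.0):
        print(f"r = {r:5.1f} A  ->  E = {fret_efficiency(r, R0):.2f}")
```

Because the efficiency depends on the sixth power of r/R0, a modest change in the separation of the termini near R0 produces a large change in signal, which is why fluorophore placement matters.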
10.1.4.2
Introduction of Posttranslational Modifications and Unnatural Amino Acids
As noted above, the heterogeneous and often dynamic nature of posttranslational modifications, such as phosphorylation, lipidation, and glycosylation, makes their effects on protein structure and function extremely difficult to study using traditional biological techniques. The semisynthetic nature of EPL, however, is ideally suited for the incorporation of homogeneous posttranslational modifications, as well as for the introduction of completely unnatural amino acids. In the previous section, the effect of prenylation on a Rab GTPase was shown to be necessary for not only its correct localization but also interactions with an inhibitory molecule RabGDI. Shown in Fig. 10.1-7 are some of the noncoded amino acids that have been incorporated into proteins using this approach [11]. In most cases, these amino acids were used to study some aspect of protein function that was difficult or impossible to study by other means. Glycosylation is a vital posttranslational modification involved in a variety of cellular processes including development, immune recognition, and cellular trafficking [70]. Establishing the biological consequences of specific oligosaccharides is difficult owing to glycoprotein microheterogeneity, which arises from the fact that protein glycosylation is not under direct genetic control. Because of the complex structure of oligosaccharides and the inherent incompatibilities between carbohydrate and peptide chemistry (e.g., glycan stability, protecting group compatibilities), the synthesis of homogeneous glycoproteins remains a daunting task. In a recent example, EPL was applied toward the understanding of protein glycosylation on the mucinlike glycoprotein
Fig. 10.1-7 Some of the amino acids introduced into proteins using EPL.
[Structures shown: homocysteine; selenocysteine; kynurenine; R-nipecotic acid; S-nipecotic acid; Dapa(Nβ-levulinic acid); Dapa(Nβ-benzophenone); 2-Me-Tyr; α-Me-Tyr; amino-Phe; 2,6-difluoro-Tyr; cysteine(S-geranylgeranyl); homotyrosine; NorLeu; phospho-Ser/Thr; phospho-Tyr; Tyr phosphonate; (α-GalNAc)Ser/Thr; (β-GlcNAc)Asn; and Nε-lysine derivatives (R = Nε-lysine) bearing biotin, EDTA, fluorescein, rhodamine, or dansyl.]
GlyCAM-1 [71]. GlyCAM-1 functions as a ligand for the leukocyte adhesion molecule L-selectin, which is involved in leukocyte trafficking to sites of injury and infection. GlyCAM-1 comprises two glycosylated mucin domains, separated by a central, unglycosylated domain. The mucin domains, which are characterized by clusters of oligosaccharides linked through an α-O-glycosidic bond between N-acetyl galactosamine (GalNAc) and the hydroxyl groups of Ser/Thr residues of the protein backbone, are essential for binding L-selectin. To address the question of which mucin domains are important for GlyCAM-1 function, Bertozzi and Macmillan used EPL to make three semisynthetic
Fig. 10.1-8 Semisynthesis of three different GlyCAM-1 molecules bearing different glycosylation patterns.
versions containing either or both of the mucin domains (Fig. 10.1-8). The two proteins containing only one mucin domain were synthesized using one ligation site between a synthetic glycopeptide and a recombinant protein. GlyCAM-1 containing both mucin domains was created using a three-part sequential ligation strategy with two synthetic glycopeptides and a recombinant thioester protected at the N-terminus with a factor Xa cleavage peptide. The resulting glycoproteins bearing α-GalNAc residues can then be enzymatically elaborated with further glycosyltransferases to generate the endogenous 6-sulfo sialyl Lewis x motifs required for L-selectin binding.
Transforming growth factor β (TGFβ) is a member of a large family of secreted cytokines of central importance in eukaryotic development and homeostasis [72]. The initiation of TGFβ signaling involves a ligand-induced multiple phosphorylation event of TGFβ receptor I by TGFβ receptor II (TβR-I and TβR-II, respectively). This yields an activated TβR-I, enabling it to phosphorylate members of the Smad family of transcription factors. The modification of Smads allows them to oligomerize, giving active transcription complexes that can enter the nucleus and mediate gene expression. EPL has been used elegantly to shed light on the molecular mechanisms of many of these steps in the TGFβ signaling pathway. To understand the activation of TβR-I by phosphorylation, a semisynthetic version of the receptor was produced containing three phosphoserines and one phosphothreonine [73]. Access to this homogeneous preparation of activated TβR-I allowed the mechanism of receptor activation to be studied for the first time [74]. Accordingly, phosphorylation was shown to increase the binding affinity of TβR-I for Smad2 and decrease its affinity for an inhibitor of the pathway, FKBP12.
These observations yielded a new model of receptor activation in which phosphorylation of the receptor switches it from an inhibited state into an activated form capable of binding substrate. The next step in the pathway, the effect of phosphorylation on Smad2, has also
Fig. 10.1-9 Semisynthetic Smad2 containing two phosphoserines was used to confirm the trimeric state of the active protein.
been investigated using EPL [75]. Phosphorylation occurs at the last two serine residues at the C-terminus of Smad2 during signaling. It had been shown previously that phosphorylation of Smad2 is indispensable in TGFβ signaling, but how phosphorylation affects the conformation and function of Smad was yet to be elucidated. To investigate this, a homogeneous, doubly phosphorylated version of Smad2 was synthesized. Biochemical studies on this protein indicated that phosphorylation induced trimerization of the protein. As shown in Fig. 10.1-9, this conclusion was confirmed when the crystal structure of such a trimer was determined. These investigations revealed how phosphorylation of Smad2 allows dissociation from the activated TβR-I receptor and simultaneously induces hetero-oligomerization with a key regulatory protein, Smad4. Muir and coworkers have used EPL to generate two semisynthetic versions of Smad2 to probe its transport to the nucleus. The first such protein contains two phosphates, a fluorescent probe, a fluorescence quenching molecule, and a photocleavable linker (Fig. 10.1-10) [76]. The linker acts as a bifunctional caging group, both interfering with Smad2 trimerization and quenching the fluorescence of the molecule. Thus, cleavage of this linker with light results in the formation of active protein, as well as the induction of protein fluorescence. Indeed, when examined by gel filtration, the caged protein was found to be incapable of forming trimers, but after cleavage there was a clean conversion to the trimeric state. Importantly, this was also accompanied by a ~26-fold increase in fluorescence. This caged protein is currently the focus of study for unraveling the behavior of Smad2 and the kinetics of the TGFβ signaling pathway. In a complementary system, the same group synthesized a unique version of Smad2 in which the phosphate groups on the last two serines are photocaged (Fig. 10.1-11(a)) [77].
Again, the caged protein was unable to form the obligatory trimers for signaling. However, after photoactivation the phosphates were released and oligomerization could occur. Furthermore, the semisynthetic protein was used successfully in a nuclear import assay
Fig. 10.1-10 Design of caged Smad2 based on a modified C-terminal phosphopeptide. Fluorescence and activity of Smad2 are blocked by a photocleavable caging group. Photolysis with 365 nm light causes simultaneous activation of both Smad2 and fluorescence.
demonstrating that the caged protein behaves controllably and as desired in a biological context (Fig. 10.1-11(b)).
The selectivity filter of K+ channels contains four main chain carbonyl oxygen atoms directed toward the pore. These carbonyl oxygens create four K+-binding sites in a row inside the filter. To create these binding sites, the peptide backbone has to adopt an unusual conformation in which the dihedral angles of the four-amino-acid sequence alternate between the left-handed and right-handed regions of the Ramachandran plot. One way to achieve this conformation is to use alternating L- and D-amino acids. However, in ribosome-synthesized proteins, nature uses exclusively L-amino acids, precluding the enantiomeric D-configuration of side chains. These L-amino acids strongly prefer right-handed α-helical conformations. Glycine is the only amino acid in proteins synthesized by the ribosome to comfortably reside in the left-handed α-helical region of the Ramachandran plot, and, therefore in this instance, could be acting as a surrogate D-amino acid. Muir, MacKinnon, and coworkers used EPL to construct a semisynthetic version of the K+ channel KcsA containing a D-alanine in place of the conserved glycine (Gly77) [78]. Indeed, it was demonstrated that replacement of Gly77 with D-Ala yielded a protein that exhibited complete retention of function. In contrast, substitution with L-Ala resulted in a nonfunctional channel. Therefore, it was concluded that, above all, glycine is used in the K+ channel’s selectivity filter
Fig. 10.1-11 (a) Smad2 bearing two caged phosphoserines, and its subsequent activation with light. (b) Caged Smad2 is excluded from the nucleus, while deprotected Smad2 forms trimers and accumulates in the nucleus.
to fulfill specific dihedral angle requirements, and, thus, it serves as a D-amino acid surrogate.
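The alternation argument can be made concrete with a small sketch. The Python below classifies backbone (phi, psi) pairs into coarse right-handed and left-handed helical regions and checks whether a run of residues alternates between them. The rectangular region boundaries, the function names, and the example angles are illustrative assumptions; they are not values taken from the KcsA structure or from the cited work.

```python
def helical_handedness(phi, psi):
    """Coarsely classify one (phi, psi) pair, given in degrees.

    The rectangular boxes are rough, assumed approximations of the
    right-handed and left-handed alpha-helical regions.
    """
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "right"
    if 20 <= phi <= 160 and -50 <= psi <= 120:
        return "left"
    return "other"


def alternates(angles):
    """True if consecutive residues switch between the two helical regions."""
    hands = [helical_handedness(phi, psi) for phi, psi in angles]
    if "other" in hands:
        return False
    return all(a != b for a, b in zip(hands, hands[1:]))


# Hypothetical four-residue stretch: an idealized right-handed helix pair
# (-57, -47) alternating with its mirror image (57, 47).
filter_like = [(-57, -47), (57, 47), (-57, -47), (57, 47)]
all_right_handed = [(-57, -47)] * 4

print(alternates(filter_like))       # True
print(alternates(all_right_handed))  # False
```

In this toy picture, an all-L sequence whose residues stay in the right-handed region fails the check, whereas positions that can occupy the mirror-image region (glycine or a D-amino acid, as discussed above) allow the alternating pattern.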
10.1.4.3 Introduction of Stable Isotopes
EPL has also been used successfully to develop a segmental isotopic labeling strategy designed to overcome the practical size limit for protein structure determination using nuclear magnetic resonance (NMR) spectroscopy [79]. This limit exists because of the loss of spectral resolution arising both from increased linewidths at longer rotational correlation times and from the increased number of amino acids in the protein. The first of these problems has to a large extent been overcome with the development of new NMR techniques and technology. However, standard isotopic labeling techniques involving the uniform incorporation of 13C, 15N, and 2H cannot address the problem of signal overlap for larger systems. Segmental isotopic labeling solves this problem by allowing selected portions of a protein to be enriched with NMR-active isotopes. Unlabeled regions can then be filtered out of the NMR spectrum using suitable heteronuclear correlation experiments. Therefore, segmental labeling significantly reduces the spectral complexity of large proteins, allowing for a variety of NMR experiments.
Segmental isotopic labeling has been accomplished using both protein trans-splicing and EPL. Yamazaki and coworkers used a protein trans-splicing system based on a split PI-Pfu intein to selectively 15N label the C-terminal domain of the Escherichia coli RNA polymerase α subunit [41]. EPL was first applied to this area when a single domain within the Src homology domain derived from the Abl protein tyrosine kinase was labeled with 15N [58]. In both these pioneering experiments, one-half of the protein of interest was bacterially expressed using a growth medium enriched with a 15N source. Subsequent ligation of this labeled fragment with another protein fragment, in this case unlabeled, yielded the selectively labeled protein. EPL and protein trans-splicing have been successfully applied to a variety of proteins and have yielded proteins labeled not only at either terminus but in internal segments as well [79]. For example, the mechanism of autoregulation of a bacterial σ factor was explored using EPL [80]. Autoregulation of this enzyme was proposed to occur through direct interactions between two regions of the protein. By specifically labeling one of these domains, the authors were able to use NMR to argue against a high-affinity interaction between the two regions and suggest that autoinhibition of DNA binding occurs through an indirect steric and/or electrostatic mechanism. In another example, Muir and coworkers used internal isotopic labeling to study the mechanism of intein-catalyzed protein splicing [81]. The peptide bond at the N-extein-intein junction was labeled with 13C using semisynthesis. The subsequent NMR experiments showed that this peptide bond exists in an unusual conformation, which may help catalyze the first step of protein splicing.
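The reduction in spectral complexity offered by segmental labeling can be illustrated with a rough count. The sketch below estimates how many backbone amide cross-peaks a 1H-15N correlation spectrum would show for a uniformly labeled protein versus one in which only a segment is labeled. It assumes one peak per labeled residue, none for proline (which has no backbone amide proton), and none for the chain's first residue; side-chain signals are ignored. The sequence, segment boundaries, and function name are hypothetical.

```python
def expected_amide_peaks(sequence, labeled=None):
    """Rough count of backbone amide cross-peaks for a 15N-labeled region.

    sequence: one-letter amino acid string.
    labeled:  (start, stop) half-open index range that carries the label;
              None means the whole chain is uniformly labeled.
    """
    start, stop = labeled if labeled is not None else (0, len(sequence))
    return sum(
        1
        for i in range(start, stop)
        if sequence[i] != "P" and i != 0  # skip proline and the N-terminal residue
    )


# Hypothetical 300-residue protein built from a repeating motif.
protein = "MKPLT" * 60

uniform = expected_amide_peaks(protein)                # whole chain labeled
segmental = expected_amide_peaks(protein, (200, 300))  # only the last 100 residues

print(uniform, segmental)  # 239 80
```

Under these simplifying assumptions, labeling only the C-terminal third of the hypothetical chain cuts the number of observable backbone amide peaks to roughly a third, which is the kind of simplification the text attributes to segmental labeling.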
10.1.4.4 Topology Engineering of Proteins
Protein engineering has traditionally involved the modification of amino acid side chains; however, there has been increasing interest in altering the underlying backbone and even the overall topology of a protein. Examples of such topological changes include cyclic and branched polypeptides. EPL and protein trans-splicing have both been used for the synthesis of cyclic peptides and proteins. Protein circularization is of particular interest because basic polymer theory predicts that cyclization will yield a net thermodynamic stabilization of a protein's folded state owing to reduced conformational entropy in the denatured state. Indeed, some circular proteins prepared by EPL and protein trans-splicing are more stable than their linear counterparts (e.g., GFP [82], β-lactamase [83], and dihydrofolate reductase (DHFR) [84]). Other proteins, however, such as the c-Crk-II SH3 domain [85] and pancreatic trypsin inhibitor [86], have not been found to be more stable. In both these latter examples, it is likely that unfavorable enthalpic effects (e.g., strain) offset the beneficial entropic effect resulting from circularization.
Many pharmaceutically important natural products, including antibiotics and immunosuppressants, are based on cyclic peptides. Therefore, the ability to synthesize backbone-cyclic peptides using EPL or protein trans-splicing is an enticing opportunity for drug development. For example, Payan and coworkers used a split intein approach for identifying bioactive peptides [87]. A random cyclic pentapeptide library was introduced into human B cells using a retroviral delivery system. A cell-based screen was then used to identify peptides that exhibited the ability to inhibit the IL-4 signaling pathway. These active peptides have potential as anti-inflammatory therapeutics or may serve as lead compounds for the synthesis of even more efficacious drugs.
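To give a sense of scale for a random cyclic pentapeptide library, the sketch below counts the possible sequences. It assumes that all five positions are drawn freely from the 20 standard amino acids, which ignores any residue that the intein chemistry may require at a fixed position, so the numbers are an upper-bound illustration rather than the size of the library used in [87]. Because a backbone-cyclic peptide has no defined start, sequences that are rotations of one another describe the same ring; the second function counts distinct rings using Burnside's lemma. Function names are hypothetical.

```python
from math import gcd


def linear_library_size(length, n_letters=20):
    """Number of linear sequences of the given length."""
    return n_letters ** length


def cyclic_library_size(length, n_letters=20):
    """Number of distinct head-to-tail cyclic sequences (rotations identified).

    Burnside's lemma: average, over the `length` rotations, of the number of
    sequences each rotation leaves unchanged, which is n_letters ** gcd(shift, length).
    """
    fixed = sum(n_letters ** gcd(shift, length) for shift in range(length))
    return fixed // length


print(linear_library_size(5))  # 3200000
print(cyclic_library_size(5))  # 640016
```

Even under these simplifying assumptions the library holds several hundred thousand distinct rings, which is why a cell-based selection, rather than one-by-one testing, is the natural way to search it.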
10.1.4.5 Protein Splicing in Living Cells
Although a large amount of information can be gleaned from in vitro protein characterization and semisynthesis, characterization of proteins in the context of a living cell is of extreme importance for a complete understanding of their function. Although classical genetic methods to disrupt protein function (e.g., mutagenesis, gene knockouts, and overexpression) and posttranscriptional technology such as RNAi have provided incredible insights into protein function, they have their limitations. Genetic knockouts, although exquisitely precise, can in many instances lead to a lethal phenotype for essential genes or show a limited phenotype in cases of genetic compensation. RNAi can overcome some of these limitations and has been used with great success; however, as with gene knockouts, protein levels cannot be tuned subtly and thus delicate effects of protein activity are difficult to study. Semisynthesis of proteins in living cells can to some extent surmount these problems, as it is an inducible, temporal, and tunable technology for the modulation of protein function at the posttranslational level. Muir and Giriat described the first example of protein semisynthesis in a living cell (Fig. 10.1-12) [88]. In this system, a protein of interest is expressed in cultured cells with the first half of the naturally occurring Ssp DnaE split intein (inteinN) genetically fused to its C-terminus. Then a semisynthetic polypeptide, comprising the second half of the intein (inteinC) covalently attached to a synthetic probe and a protein transduction domain (PTD) peptide, is added to the cellular media. The PTD peptide delivers the semisynthetic construct into the cells, where the inteinC can interact with its complementary half, triggering protein splicing. This yields the protein of interest linked to the probe through a native peptide bond. As a proof of principle, GFP was ligated to a short synthetic peptide based on the FLAG epitope.
Fig. 10.1-12 Principle of protein semisynthesis in living cells. The protein transduction domain (PTD) delivers the probe to the cell, which is followed by complementation of the DnaE intein halves and protein splicing.
Muir and coworkers have developed a technology to control protein splicing in a living cell. This technology, termed conditional protein splicing (CPS), relies on the FKBP/rapamycin/FRB three-hybrid heterodimerization system [89]. Fusing separate halves of a split intein to either FKBP or FRB allows the intein fragments to be brought together in response to the dimerizer molecule. Provided the juxtaposition of the intein fragments in the resulting dimer is compatible with functional complementation, this results in splicing together of the flanking extein sequences (Fig. 10.1-13(a)). This was realized through the use of an artificially split S. cerevisiae VMA intein. Two model exteins, maltose binding protein (MBP) and a polyhistidine-containing sequence (HIS), were used to explore the scope of the technology. CPS displays little to no background and produces the product within 10 min of the addition of rapamycin, indicating the advantage of the posttranslational nature of CPS for quick responses. Furthermore, the level of product formation was dose and time dependent (Fig. 10.1-13(b)) and can be attenuated with inhibitors of the three-hybrid system, such as ascomycin [90]. Because of the promiscuity of inteins for their flanking extein sequences, CPS is expected to have a certain level of generality. In fact, the only strict extein sequence requirement is the cysteine residue of the C-extein, necessary in EPL. In the most general form of CPS, a polypeptide with a novel function could be obtained by splicing together two fragments that lack function individually. This general goal can be achieved in several ways. For example, two domains of a protein that display no activity could be spliced together to give a functional protein. Alternatively, one splicing partner could be a peptide localization sequence, resulting in relocalization of the splicing product on addition of rapamycin.
Fig. 10.1-13 (a) Principle of conditional protein splicing (CPS). A split intein is reconstituted by the addition of rapamycin, which heterodimerizes FKBP and FRB, resulting in protein splicing. (b) Dose and time dependence of the CPS reaction.
Liu and coworkers have recently developed a different strategy for small-molecule-activated protein splicing [91]. In this report, an intein was inserted into a protein of interest, interrupting its function, which is restored after splicing. Simple insertion of a natural ligand-binding domain into a minimal intein destroyed the splicing activity and yielded an evolvable intein-based molecular switch that transduces binding of a small molecule into the activation of a protein of interest. Specifically, the Mycobacterium tuberculosis RecA intein was modified with the human estrogen receptor-α (ER) ligand-binding domain (LBD) (residues 304-551), which binds the small molecule 4-hydroxytamoxifen. This protein was then evolved through multiple rounds of mutation and selection in S. cerevisiae by linking the splicing to cell survival or fluorescence. Iterated cycles of mutagenesis and selection yielded inteins with strong splicing activities that depended highly on the presence of the small molecule. Insertion of one of these inteins into different unrelated proteins in living cells revealed that the technology allows for ligand-dependent protein function that is fairly rapid, dose dependent, and posttranslational. This system represents an exciting complementary technology to the CPS discussed above.
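Both conditional systems above are described as dose and time dependent. One simple way to picture such behavior is a toy model in which the fraction of intein pairs engaged by the small molecule saturates with dose, and spliced product then accumulates with first-order kinetics. The sketch below is only that: the functional form, the parameter values (`half_max_nM`, `rate_per_min`), and the function name are assumptions chosen for illustration and are not fitted to, or taken from, the cited experiments.

```python
import math


def spliced_fraction(ligand_nM, minutes, half_max_nM=10.0, rate_per_min=0.05):
    """Toy estimate of the fraction of precursor converted to spliced product.

    occupancy: saturable (hyperbolic) dependence on ligand concentration.
    progress:  first-order approach to completion over time.
    All parameter values are hypothetical.
    """
    occupancy = ligand_nM / (ligand_nM + half_max_nM)
    progress = 1.0 - math.exp(-rate_per_min * minutes)
    return occupancy * progress


# Product rises with both dose and time in this model.
for dose in (1.0, 10.0, 100.0):
    row = [round(spliced_fraction(dose, t), 3) for t in (10, 60, 240)]
    print(f"{dose:6.1f} nM: {row}")
```

The model returns zero product without ligand, which mirrors the low background reported for CPS, and it plateaus at a level set by the dose, so that lowering the effective dose (for instance with a competing inhibitor) lowers the final yield.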
10.1.5 Future Development
Because of the power of EPL and protein splicing, these techniques will undoubtedly be used for many applications in the future. EPL provides researchers with a versatile tool for the study of protein function by allowing the preparation of proteins containing both natural and artificial modifications. As seen above, this technology is well suited for biochemical and biophysical studies; however, it may also be a valuable tool for areas such as proteomics, materials science, and nanotechnology. For example, the Yao group has reported the preparation of a protein microarray by first biotinylating proteins using EPL and then spatially arranging these on an avidin-coated slide [92]. Importantly, EPL ensures that the site of modification in all proteins is consistent with respect to the site of immobilization, the C-terminus in this case. These types of protein surfaces could be used for proteomic profiling of both cellular interactions and protein modifications. In addition, homogeneous surfaces coated with specific proteins can be prepared, which can be useful for materials and other biophysical applications (e.g., assay development and cellular patterning). The highly controlled nature of EPL could also be used in biomedicine, through the generation of novel protein therapeutic drugs and diagnostic tools. In one such example, Sydor et al. established conditions that allow single-chain antibodies to be utilized in EPL reactions [93]. Thus, it should now be possible to attach any synthetic molecule to the C-terminus of an antibody. Used in conjunction with technologies such as quantum dots and contrast reagents, EPL can be powerful in the area of bioimaging, as well as vaccine development and targeted drug delivery. Protein trans-splicing also has potential in the area of proteomics. The Umezawa group has developed a two-hybrid approach to probe for protein-protein interactions in the cytosol of prokaryotic [94] and eukaryotic cells [95].
The strategy involves fusing each half of a reporter protein (GFP or luciferase) to the appropriate end of a split intein. The intein fragments are then fused to either a receptor protein (fish) or to a library of potential ligands (bait). Interaction between a fish and bait pair results in protein splicing and generation of an active reporter protein. This type of strategy could be extended to profile interacting partners of a protein of interest, by tagging binding partners with a reporter construct. CPS could also be extended to the investigation of enzymes and signaling proteins. Indeed, this has already been accomplished in vitro through the generation of an inducible version of the kinase PKA [96]. Extrapolation of this technology to cellular systems should follow in due course, and the development of nontoxic rapamycin analogs [97] may broaden the technology to living animals.
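The logic of the split-intein two-hybrid screen described above can be summarized in a few lines of code. The sketch below models each fusion construct as a reporter half plus an attached partner protein, and reports a restored reporter only when the two partners are known to interact. The class, the function names, and the protein names are hypothetical placeholders; real screens read out fluorescence or luminescence rather than looking up a table of interactions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fusion:
    reporter_half: str  # "N" or "C" fragment of the reporter (e.g., GFP)
    partner: str        # protein fused to the matching intein fragment


def reporter_restored(n_fusion, c_fusion, interactions):
    """Splicing joins the reporter halves only if the partners interact."""
    halves_ok = (n_fusion.reporter_half, c_fusion.reporter_half) == ("N", "C")
    pair = frozenset((n_fusion.partner, c_fusion.partner))
    return halves_ok and pair in interactions


def screen(receptor, library, interactions):
    """Return the library members that would give an active reporter."""
    return [c.partner for c in library if reporter_restored(receptor, c, interactions)]


# Hypothetical interaction table and library.
known_interactions = {frozenset(("receptorX", "ligandB"))}
receptor = Fusion("N", "receptorX")
library = [Fusion("C", name) for name in ("ligandA", "ligandB", "ligandC")]

hits = screen(receptor, library, known_interactions)
print(hits)  # ['ligandB']
```

The same structure would describe CPS if the interaction table were replaced by a check for the presence of the dimerizer, since in both cases splicing depends on the two intein fragments being brought together.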
10.1.6 Conclusion
As noted at the beginning of this chapter, a true understanding of biological processes requires that they be studied in a context that accounts for tissue and cell-type expression, modification patterns, and temporal changes in these patterns. EPL and protein splicing have been used with great success to scratch the surface of some of these questions by allowing for homogeneous protein engineering. In the future, these technologies should provide for a more intimate understanding of protein structure and function.
References
1. P. Cohen, The development and therapeutic potential of protein kinase inhibitors, Curr. Opin. Chem. Biol. 1999, 3, 459-465.
2. N.L. Pohl, Functional proteomics for the discovery of carbohydrate-related enzyme activities, Curr. Opin. Chem. Biol. 2005, 9, 76-81.
3. J.M. Antos, M.B. Francis, Selective tryptophan modification with rhodium carbenoids in aqueous solution, J. Am. Chem. Soc. 2004, 126, 10256-10257.
4. N.S. Joshi, L.R. Whitaker, M.B. Francis, A three-component Mannich-type reaction for selective tyrosine bioconjugation, J. Am. Chem. Soc. 2004, 126, 15942-15943.
5. I. Chen, A.Y. Ting, Site-specific labeling of proteins with small molecules in live cells, Curr. Opin. Biotechnol. 2005, 16, 35-40.
6. P.M. England, Unnatural amino acid mutagenesis: a precise tool for probing protein structure and function, Biochemistry 2004, 43, 11623-11629.
7. L. Wang, P.G. Schultz, Expanding the genetic code, Angew. Chem., Int. Ed. Engl. 2004, 44, 34-66.
8. T.W. Muir, D. Sondhi, P.A. Cole, Expressed protein ligation: a general method for protein engineering, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 6705-6710.
9. K. Severinov, T.W. Muir, Expressed protein ligation, a novel method for studying protein-protein interactions in transcription, J. Biol. Chem. 1998, 273, 16205-16209.
10. T.C. Evans Jr, J. Benner, M.Q. Xu, Semisynthesis of cytotoxic proteins using a modified protein splicing element, Protein Sci. 1998, 7, 2256-2264.
11. T.W. Muir, Semisynthesis of proteins by expressed protein ligation, Annu. Rev. Biochem. 2003, 72, 249-289.
12. R. David, M.P. Richter, A.G. Beck-Sickinger, Expressed protein ligation. Method and applications, Eur. J. Biochem. 2004, 271, 663-677.
13. C.J. Wallace, Peptide ligation and semisynthesis, Curr. Opin. Biotechnol. 1995, 6, 403-410.
14. D.F. Dyckes, T. Creighton, R.C. Sheppard, Spontaneous re-formation of a broken peptide chain, Nature 1974, 247, 202-204.
15. C.J. Wallace, I. Clark-Lewis, Functional role of heme ligation in cytochrome c. Effects of replacement of methionine 80 with natural and non-natural residues by semisynthesis, J. Biol. Chem. 1992, 267, 3852-3861.
16. Y. Chen, Y.W. Ebright, R.H. Ebright, Identification of the target of a transcription activator protein by protein-protein photocrosslinking, Science 1994, 265, 90-92.
17. J. Mukhopadhyay, A.N. Kapanidis, V. Mekler, E. Kortkhonjia, Y.W. Ebright, R.H. Ebright, Translocation of σ(70) with RNA polymerase during transcription: fluorescence resonance energy transfer assay for movement relative to DNA, Cell 2001, 106, 453-463.
18. D. Macmillan, R.M. Bill, K.A. Sage, D. Fern, S.L. Flitsch, Selective in vitro glycosylation of recombinant proteins: semi-synthesis of novel homogeneous glycoforms of human erythropoietin, Chem. Biol. 2001, 8, 133-145.
19. M. Ghosh, I. Ichetovkin, X. Song, J.S. Condeelis, D.S. Lawrence, A new strategy for caging proteins regulated by kinases, J. Am. Chem. Soc. 2002, 124, 2440-2441.
20. G.A. Homandberg, M. Laskowski Jr, Enzymatic resynthesis of the hydrolyzed peptide bond(s) in ribonuclease S, Biochemistry 1979, 18, 586-592.
21. D.Y. Jackson, J. Burnier, C. Quan, M. Stanley, J. Tom, J.A. Wells, A designed peptide ligase for total synthesis of ribonuclease A with unnatural catalytic residues, Science 1994, 266, 243-247.
22. F. Bordusa, Proteases in organic synthesis, Chem. Rev. 2002, 102, 4817-4868.
23. H.F. Gaertner, K. Rose, R. Cotton, D. Timms, R. Camble, R.E. Offord, Construction of protein analogues by site-specific condensation of unprotected fragments, Bioconjugate Chem. 1992, 3, 262-268.
24. H.F. Gaertner, R.E. Offord, R. Cotton, D. Timms, R. Camble, K. Rose, Chemo-enzymic backbone engineering of proteins. Site-specific incorporation of synthetic peptides that mimic the 64-74 disulfide loop of granulocyte colony-stimulating factor, J. Biol. Chem. 1994, 269, 7224-7230.
25. K. Rose, Facile synthesis of homogeneous artificial proteins, J. Am. Chem. Soc. 1994, 116, 30-33.
26. M. Schnölzer, S.B.H. Kent, Constructing proteins by dovetailing unprotected synthetic peptides: backbone-engineered HIV protease, Science 1992, 256, 221-225.
27. P.E. Dawson, S.B. Kent, Synthesis of native proteins by chemical ligation, Annu. Rev. Biochem. 2000, 69, 923-960.
28. P.E. Dawson, T.W. Muir, I. Clark-Lewis, S.B. Kent, Synthesis of proteins by native chemical ligation, Science 1994, 266, 776-779.
29. M. Chytil, B.R. Peterson, D.A. Erlanson, G.L. Verdine, The orientation of the AP-1 heterodimer on DNA strongly affects transcriptional potency, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 14076-14081.
30. C.J. Noren, J. Wang, F.B. Perler, Dissecting the chemistry of protein splicing and its applications, Angew. Chem., Int. Ed. Engl. 2000, 39, 450-466.
31. H. Paulus, Protein splicing and related forms of protein autoprocessing, Annu. Rev. Biochem. 2000, 69, 447-496.
32. M.Q. Xu, F.B. Perler, The mechanism of protein splicing and its modulation by mutation, EMBO J. 1996, 15, 5146-5153.
33. I. Giriat, T.W. Muir, F.B. Perler, Protein splicing and its applications, Genet. Eng. (N.Y.) 2001, 23, 171-199.
34. S. Chong, F.B. Mersha, D.G. Comb, M.E. Scott, D. Landry, L.M. Vence, F.B. Perler, J. Benner, R.B. Kucera, C.A. Hirvonen, J.J. Pelletier, H. Paulus, M.Q. Xu, Single-column purification of free recombinant proteins using a self-cleavable affinity tag derived from a protein splicing element, Gene 1997, 192, 271-281.
35. T.C. Evans Jr, J. Benner, M.Q. Xu, The in vitro ligation of bacterially expressed proteins using an intein from Methanobacterium thermoautotrophicum, J. Biol. Chem. 1999, 274, 3923-3926.
36. S. Mathys, T.C. Evans, I.C. Chute, H. Wu, S. Chong, J. Benner, X.Q. Liu, M.Q. Xu, Characterization of a self-splicing mini-intein and its conversion into autocatalytic N- and C-terminal cleavage elements: facile production of protein building blocks for protein ligation, Gene 1999, 231, 1-13.
37. D.W. Wood, W. Wu, G. Belfort, V. Derbyshire, M. Belfort, A genetic system yields self-cleaving inteins for bioseparations, Nat. Biotechnol. 1999, 17, 889-892.
38. M.W. Southworth, E. Adam, D. Panne, R. Byer, R. Kautz, F.B. Perler, Control of protein splicing by intein fragment reassembly, EMBO J. 1998, 17, 918-926.
39. K.V. Mills, B.M. Lew, S. Jiang, H. Paulus, Protein splicing in trans by purified N- and C-terminal fragments of the Mycobacterium tuberculosis RecA intein, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 3543-3548.
40. H. Wu, Z. Hu, X.Q. Liu, Protein trans-splicing by a split intein encoded in a split DnaE gene of Synechocystis sp. PCC6803, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 9226-9231.
41. T. Yamazaki, T. Otomo, N. Oda, Y. Kyogoku, K. Uegaki, N. Ito, Y. Ishino, H. Nakamura, Segmental isotope labeling for protein NMR using peptide splicing, J. Am. Chem. Soc. 1998, 120, 5591-5592.
42. B.J. Backes, J.A. Ellman, An alkanesulfonamide "safety-catch" linker for solid-phase synthesis, J. Org. Chem. 1999, 64, 2322-2330.
43. Y. Shin, K.A. Winans, B.J. Backes, S.B.H. Kent, J.A. Ellman, C.R. Bertozzi, Fmoc-based synthesis of peptide-(α)thioesters: application to the total chemical synthesis of a glycoprotein by native chemical ligation, J. Am. Chem. Soc. 1999, 121, 11684-11689.
44. Y. Kwon, K. Welsch, A.R. Mitchell, J.A. Camarero, Preparation of peptide p-nitroanilides using an aryl hydrazine resin, Org. Lett. 2004, 6, 3801-3804.
45. J.D. Warren, J.S. Miller, S.J. Keding, S.J. Danishefsky, Toward fully synthetic glycoproteins by ultimately convergent routes: a solution to a long-standing problem, J. Am. Chem. Soc. 2004, 126, 6576-6578.
46. P. Botti, M. Villain, S. Manganiello, H. Gaertner, Native chemical ligation through in situ O to S acyl shift, Org. Lett. 2004, 6, 4861-4864.
47. M. Villain, J. Vizzavona, K. Rose, Covalent capture: a new tool for the purification of synthetic and recombinant polypeptides, Chem. Biol. 2001, 8, 673-679.
48. D. Bang, S.B. Kent, A one-pot total synthesis of crambin, Angew. Chem., Int. Ed. Engl. 2004, 43, 2534-2538.
49. G.J. Cotton, B. Ayers, R. Xu, T.W. Muir, Insertion of a synthetic peptide into a recombinant protein framework: a protein biosensor, J. Am. Chem. Soc. 1999, 121, 1100-1101.
50. R.M. Hofmann, T.W. Muir, Recent advances in the application of expressed protein ligation to protein engineering, Curr. Opin. Biotechnol. 2002, 13, 297-303.
51. L.E. Canne, S.J. Bark, S.B. Kent, Extending the applicability of native chemical ligation, J. Am. Chem. Soc. 1996, 118, 5891-5896.
52. L.Z. Yan, P.E. Dawson, Synthesis of peptides and proteins without cysteine residues by native chemical ligation combined with desulfurization, J. Am. Chem. Soc. 2001, 123, 526-533.
53. E. Saxon, C.R. Bertozzi, Cell surface engineering by a modified Staudinger reaction, Science 2000, 287, 2007-2010.
54. B.L. Nilsson, R.J. Hondal, M.B. Soellner, R.T. Raines, Protein assembly by orthogonal chemical ligation methods, J. Am. Chem. Soc. 2003, 125, 5268-5269.
55. R.J. Hondal, B.L. Nilsson, R.T. Raines, Selenocysteine in native chemical ligation and expressed protein ligation, J. Am. Chem. Soc. 2001, 123, 5140-5141.
56. D. Wang, P.A. Cole, Protein tyrosine kinase Csk-catalyzed phosphorylation of Src containing unnatural tyrosine analogues, J. Am. Chem. Soc. 2001, 123, 8883-8886.
57. K. Alexandrov, I. Heinemann, T. Durek, V. Sidorovitch, R.S. Goody, H. Waldmann, Intein-mediated synthesis of geranylgeranylated Rab7 protein in vitro, J. Am. Chem. Soc. 2002, 124, 5648-5649.
58. R. Xu, B. Ayers, D. Cowburn, T.W. Muir, Chemical ligation of folded recombinant proteins: segmental isotopic labeling of domains for NMR studies, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 388-393.
59. F.I. Valiyaveetil, R. MacKinnon, T.W. Muir, Semisynthesis and folding of the potassium channel KcsA, J. Am. Chem. Soc. 2002, 124, 9113-9120.
60. T.M. Hackeng, J.H. Griffin, P.E. Dawson, Protein synthesis by native chemical ligation: expanded scope by using straightforward methodology, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 10068-10073.
61. S. Chong, K.S. Williams, C. Wotkowicz, M.Q. Xu, Modulation of protein splicing of the Saccharomyces cerevisiae vacuolar membrane ATPase intein, J. Biol. Chem. 1998, 273, 10567-10577.
62. R.Y. Tsien, The green fluorescent protein, Annu. Rev. Biochem. 1998, 67, 509-544.
63. T.J. Tolbert, C.-H. Wong, Intein-mediated synthesis of proteins containing carbohydrates and other molecular probes, J. Am. Chem. Soc. 2000, 122, 5421-5428.
64. V. Mekler, E. Kortkhonjia, J. Mukhopadhyay, J. Knight, A. Revyakin, A.N. Kapanidis, W. Niu, Y.W. Ebright, R. Levy, R.H. Ebright, Structural organization of bacterial RNA polymerase holoenzyme and the RNA polymerase-promoter open complex, Cell 2002, 108, 599-614.
65. A. Iakovenko, E. Rostkova, E. Merzlyak, A.M. Hillebrand, N.H. Thoma, R.S. Goody, K. Alexandrov, Semi-synthetic Rab proteins as tools for studying intermolecular interactions, FEBS Lett. 2000, 468, 155-158.
66. A. Rak, O. Pylypenko, T. Durek, A. Watzke, S. Kushnir, L. Brunsveld, H. Waldmann, R.S. Goody, K. Alexandrov, Structure of Rab GDP-dissociation inhibitor in complex with prenylated YPT1 GTPase, Science 2003, 302, 646-650.
67. P.R. Selvin, Fluorescence resonance energy transfer, Methods Enzymol. 1995, 246, 300-334.
68. G.J. Cotton, T.W. Muir, Generation of a dual-labeled fluorescence biosensor for Crk-II phosphorylation using solid-phase expressed protein ligation, Chem. Biol. 2000, 7, 253-261.
69. R.M. Hofmann, G.J. Cotton, E.J. Chang, E. Vidal, D. Veach, W. Bornmann, T.W. Muir, Fluorescent monitoring of kinase activity in real time: development of a robust fluorescence-based assay for Abl tyrosine kinase activity, Bioorg. Med. Chem. Lett. 2001, 11, 3091-3094.
70. A. Varki, R. Cummings, J. Esko, Essentials of Glycobiology, Cold Spring Harbor Labs, Cold Spring Harbor, 1999.
71. D. Macmillan, C.R. Bertozzi, Modular assembly of glycoproteins: towards the synthesis of GlyCAM-1 by using expressed protein ligation, Angew. Chem., Int. Ed. Engl. 2004, 43, 1355-1359.
72. P.M. Siegel, J. Massague, Cytostatic and apoptotic actions of TGF-β in homeostasis and cancer, Nat. Rev. Cancer 2003, 3, 807-821.
73. M. Huse, M.N. Holford, J. Kuriyan, T.W. Muir, Semisynthesis of hyperphosphorylated type I TGFβ receptor: addressing the mechanism of kinase activation, J. Am. Chem. Soc. 2000, 122, 8337-8338.
74. M. Huse, T.W. Muir, L. Xu, Y.G. Chen, J. Kuriyan, J. Massague, The TGF beta receptor activation process: an inhibitor- to substrate-binding switch, Mol. Cell 2001, 8, 671-682.
75. J.W. Wu, M. Hu, J. Chai, J. Seoane, M. Huse, C. Li, D.J. Rigotti, S. Kyin, T.W. Muir, R. Fairman, J. Massague, Y. Shi, Crystal structure of a phosphorylated Smad2. Recognition of phosphoserine by the MH2 domain and insights on Smad function in TGF-beta signaling, Mol. Cell 2001, 8, 1277-1289.
76. J.P. Pellois, M.E. Hahn, T.W. Muir, Simultaneous triggering of protein activity and fluorescence, J. Am. Chem. Soc. 2004, 126, 7170-7171.
77. M.E. Hahn, T.W. Muir, Photocontrol of Smad2, a multiphosphorylated cell-signaling protein, through caging of activating phosphoserines, Angew. Chem., Int. Ed. Engl. 2004, 43, 5800-5803.
78. F.I. Valiyaveetil, M. Sekedat, R. MacKinnon, T.W. Muir, Glycine as a D-amino acid surrogate in the K(+)-selectivity filter, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 17045-17049.
79. D. Cowburn, T.W. Muir, Segmental isotopic labeling using expressed protein ligation, Methods Enzymol. 2001, 339, 41-54.
80. J.A. Camarero, A. Shekhtman, E.A. Campbell, M. Chlenov, T.M. Gruber, D.A. Bryant, S.A. Darst, D. Cowburn, T.W. Muir, Autoregulation of a bacterial σ factor explored by using segmental isotopic labeling and NMR, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 8536-8541.
81. A. Romanelli, A. Shekhtman, D. Cowburn, T.W. Muir, Semisynthesis of a segmental isotopically labeled protein splicing precursor: NMR evidence for an unusual peptide bond at the N-extein-intein junction, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 6397-6402.
82. H. Iwai, A. Pluckthun, A. Lingel, Cyclic green fluorescent protein produced in vivo using an artificially split PI-PfuI intein from Pyrococcus furiosus, J. Biol. Chem. 2001, 276, 16548-16554.
83. H. Iwai, A. Pluckthun, Circular β-lactamase: stability enhancement by cyclizing the backbone, FEBS Lett. 1999, 459, 166-172.
84. C.P. Scott, E. Abel-Santos, M. Wall, D.C. Wahnon, S.J. Benkovic, Production of cyclic peptides and proteins in vivo, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 13638-13643.
85. J.A. Camarero, D. Fushman, S. Sato, I. Giriat, D. Cowburn, D.P. Raleigh, T.W. Muir, Rescuing a destabilized protein fold through backbone cyclization, J. Mol. Biol. 2001, 308, 1045-1062.
86. D.P. Goldenberg, T.E. Creighton, Folding pathway of a circular form of bovine pancreatic trypsin inhibitor, J. Mol. Biol. 1984, 179, 527-545.
87. T.M. Kinsella, C.T. Ohashi, A.G. Harder, G.C. Yam, W. Li, B. Peelle, E.S. Pali, M.K. Bennett, S.M. Molineaux, D.A. Anderson, E.S. Masuda, D.G. Payan, Retrovirally delivered random cyclic peptide libraries yield inhibitors of interleukin-4 signaling in human B cells, J. Biol. Chem. 2002, 277, 37512-37518.
88. I. Giriat, T.W. Muir, Protein semi-synthesis in living cells, J. Am. Chem. Soc. 2003, 125, 7180-7181.
89. H.D. Mootz, T.W. Muir, Protein splicing triggered by a small molecule, J. Am. Chem. Soc. 2002, 124, 9044-9045.
90. H.D. Mootz, E.S. Blum, A.B. Tyszkiewicz, T.W. Muir, Conditional protein splicing: a new tool to control protein structure and function in vitro and in vivo, J. Am. Chem. Soc. 2003, 125, 10561-10569.
91. A.R. Buskirk, Y.C. Ong, Z.J. Gartner, D.R. Liu, Directed evolution of ligand dependence: small-molecule-activated protein splicing, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 10505-10510.
92. M.L. Lesaicherre, R.Y.P. Lue, G.Y.J. Chen, Q. Zhu, S.Q. Yao, Intein-mediated biotinylation of proteins and its application in a protein microarray, J. Am. Chem. Soc. 2002, 124, 8768-8769.
93. J.R. Sydor, M. Mariano, S. Sideris, S. Nock, Establishment of intein-mediated protein ligation under denaturing conditions: C-terminal labeling of a single-chain antibody for biochip screening, Bioconjugate Chem. 2002, 13, 707-712.
94. T. Ozawa, S. Nogami, M. Sato, Y. Ohya, Y. Umezawa, A fluorescent indicator for detecting protein-protein interactions in vivo based on protein splicing, Anal. Chem. 2000, 72, 5151-5157.
95. T. Ozawa, A. Kaihara, M. Sato, K. Tachihara, Y. Umezawa, Split luciferase as an optical probe for detecting protein-protein interactions in mammalian cells based on protein splicing, Anal. Chem. 2001, 73, 2516-2521.
96. H.D. Mootz, E.S. Blum, T.W. Muir, Activation of an autoregulated protein kinase by conditional protein splicing, Angew. Chem., Int. Ed. Engl. 2004, 43, 5189-5192.
97. S.D. Liberles, S.T. Diver, D.J. Austin, S.L. Schreiber, Inducible gene expression and protein translocation using nontoxic ligands identified by a mammalian three-hybrid screen, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 7825-7830.
Chemical Biology. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
10.2 Chemical Synthesis of Proteins and Large Bioconjugates
Philip Dawson
Outlook
This chapter describes the strategies and techniques used to chemically synthesize large macromolecules. Because of the large size and functional diversity of biological macromolecules, traditional approaches that require extensive use of protecting groups have limited utility. Instead, biological macromolecules are synthesized using chemical ligation methods that utilize highly chemoselective reactions to link medium-sized synthetic precursors without the need for extensive functional group protection. Although these reactions are also used for the synthesis of carbohydrates and nucleic acids, the general principles will be described with a focus on the chemical synthesis of proteins.
10.2.1 Introduction
In many ways, proteins represent the most functionally diverse family of organic molecules. Polypeptides fold to form enzymes that are potent catalysts of an astounding variety of chemical transformations, and molecular machines and motors that drive the movement of cargo within cells and cell motility. Other proteins form selective ion channels and highly specific binding proteins, while others play structural roles in maintaining cellular structure or in forming the coat of a virus. Much of our knowledge about protein function is a result of detailed biophysical analysis of altered proteins. These proteins are produced using site-specific amino acid substitutions enabled by a technique termed site-directed mutagenesis [1]. Although these techniques are powerful, the ability to incorporate noncoded elements of structure and function enables new questions to be addressed experimentally, and it may also be applied to the development of novel proteins with altered functions for use as pharmaceuticals, biosensors, or in nanotechnology [2-4].
The sophisticated tools of organic synthesis have enabled the straightforward assembly of biopolymers such as peptides, oligonucleotides, and carbohydrates. Many complex biopolymers can be assembled using classical solution phase organic synthesis. In addition, solid phase organic chemistry, originally developed for the synthesis of these biopolymers [5, 6], has greatly facilitated the handling and solubility of protected biological macromolecules. These methods have been further elaborated for the synthesis of more complex biopolymers containing nonstandard subunits such as posttranslationally modified amino acids, unnatural amino acids, unnatural base pairs, and modified glycans.
However, the application of these tools becomes significantly more challenging as the molecular weight and functional group complexity of the biological macromolecules increase [7]. As a result, the synthesis of large proteins and their bioconjugates remains a significant challenge. To address these challenges, a growing set of highly chemoselective reactions has been developed that enables the conjugation of unprotected fragments of biological macromolecules in aqueous solution [2, 8, 9]. These chemoselective ligation reactions bridge the gap between the biopolymers accessible by classical solution phase and solid phase methodologies and the larger products that correspond to macromolecules such as proteins and glycoproteins. Although this chapter will focus on proteins and protein conjugates, the chemoselective ligation approach can be used to covalently assemble any large organic molecule of interest, and is not limited to biological polymers.
10.2.2 History/Development
10.2.2.1 Chemical Synthesis of Peptides
Attaining synthetic access to proteins was a stated goal of Emil Fischer at the turn of the twentieth century [10]. Early approaches to peptide synthesis utilized α-haloacids, acyl chlorides, and azide coupling methods [10, 11]. Interestingly, α-haloacids are still commonly used both in the synthesis of N-alkyl peptides [12] and for chemical ligation [13, 14]. Indeed, the challenge of synthesizing peptides has driven the development of key methods used in modern synthetic organic chemistry, including the use of reversible protecting groups [15], novel activation methods for carboxylic acids [16], and solid supported organic chemistry [5, 6]. The chemical synthesis of polypeptides in solution was refined throughout the twentieth century, with notable achievements such as the synthesis of glutathione, oxytocin, and β-corticotrophin. Although these methods tend to be time consuming and suffer from extreme solubility problems with large, fully protected fragments, several proteins have been synthesized using traditional solution phase methods, notably angiogenin (123 aa) and midkine (121 aa) by Sakakibara and coworkers [17]. The use of solvent mixtures greatly enhanced the solubility of late-stage fully protected synthetic products [17]. More recently, the solubility problem of fully protected peptides has been addressed by reversible backbone protection strategies that disrupt the aggregation caused by backbone hydrogen bonding [18, 19].
10.2.2.2 Solid Phase Peptide Synthesis
Despite the achievements of polypeptide synthesis in solution, most polypeptides at the research level are currently synthesized by solid phase peptide synthesis (SPPS) [5, 6]. This approach, pioneered by Bruce Merrifield, revolutionized the synthesis of peptides, and its principles have been applied to oligonucleotides and, in recent years, carbohydrates. The essential idea was to covalently anchor the C-terminal residue of a peptide to an insoluble swollen polymer support. The subsequent amino acids could then be assembled in a stepwise manner with activated amino acids while the growing polypeptide chain remained on the "solid support." Following chain assembly, the polypeptide could be cleaved from the support and deprotected to yield the desired polypeptide product. The advantage of the method was that synthetic intermediates did not require isolation and purification after the coupling of each amino acid. Instead, all reagents could be washed away, leaving the polypeptide attached to the solid support. The facile removal of reagents enabled an excess of activated amino acids to be used to ensure pseudo first-order kinetics throughout the course of the coupling reaction.
One key advantage of SPPS, which is often overlooked, is the tremendous solvation of the peptide on the solid support. As discussed before, fully protected peptides are poorly soluble in organic solvents such as dimethylformamide (DMF). However, as the polypeptide grows on a solid support (typically cross-linked polystyrene, although many new resins have been introduced in recent years), the peptide remains solvated and the peptide resin swells as much as 10-fold in volume. As a result, resin bound peptides are effectively in solution at a much higher concentration than the same peptide that is free in solution [20].
Through years of intense effort to perfect protecting groups, coupling reagents, and deprotection strategies, SPPS has become a standard technique for making polypeptides. Two basic protecting group strategies are used in the majority of peptide syntheses. The first method, Boc/Bzl, uses trifluoroacetic acid (TFA) for deprotection of the Boc group at the N-terminus of the growing peptide chain and hydrofluoric acid (HF) for side chain deprotection and cleavage from the solid support [5-7]. The second method is Fmoc/tBu, in which the N-terminal Fmoc group is removed by treatment with base (piperidine) and TFA is used to deprotect side chains and cleave the peptide from the resin [21].
In addition to improvements in synthetic techniques, SPPS has been enabled by the development of powerful methods for the analysis and subsequent purification of the complex mixture of products typically produced by SPPS. In particular, the development of reversed phase high performance liquid chromatography (HPLC) and macromolecular mass spectrometry [22], including matrix assisted laser desorption/ionization mass spectrometry (MALDI) [23] and electrospray ionization mass spectrometry (ESI-MS) [24], has revolutionized our ability to produce high quality synthetic peptides.
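The importance of driving every coupling cycle to completion can be illustrated with a short calculation. The sketch below is not taken from the chapter: it assumes, purely for illustration, that every cycle proceeds with the same efficiency, and the function name `crude_chain_yield` and the efficiencies used are hypothetical.

```python
def crude_chain_yield(step_efficiency: float, n_residues: int) -> float:
    """Fraction of resin-bound chains that reach full length.

    Assumption (illustrative only): every coupling/deprotection cycle has
    the same efficiency, and one cycle is needed for each residue added
    after the resin-bound C-terminal residue.
    """
    if not 0.0 <= step_efficiency <= 1.0:
        raise ValueError("step_efficiency must be between 0 and 1")
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    return step_efficiency ** (n_residues - 1)


if __name__ == "__main__":
    # Hypothetical per-cycle efficiencies and chain lengths.
    for eff in (0.95, 0.99, 0.995):
        for length in (10, 30, 60):
            fraction = crude_chain_yield(eff, length)
            print(f"efficiency {eff:.3f}, {length:2d} residues: "
                  f"{fraction:.1%} full-length chains")
```

Under this simplified model, small losses per cycle compound quickly with chain length. This is consistent with the use of excess activated amino acid in each cycle and with the need for HPLC purification of the crude product.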
10.2.2.3 Protein Synthesis using Peptide Fragments Derived from Solid Phase Peptide Synthesis
The ability of SPPS to generate high purity polypeptides (30-60 amino acids) in reasonable yields (5-25% based on the loading of the C-terminal amino acid) has led to the development of approaches to assemble these polypeptide fragments into the large polypeptides that compose proteins. One approach uses the backbone protection methods described above to enable the purification and assembly of protected peptide fragments [25]. More frequently, however, these approaches start with largely unprotected peptides derived from SPPS and purified by HPLC.
10.2.2.4 Partially Protected Peptides
Peptide fragment condensation using partially protected fragments in polar organic solvents was developed as a strategy to avoid some of the solubility and deprotection problems associated with fully protected peptides [26]. One key observation of this approach was that many amino acid side chains, such as those of Ser, Thr, Asp, Glu, His, Asn, Gln, and Trp, could be left unprotected during fragment coupling, while the amino group of Lys and the thiol group of Cys required protection. The second key observation was that thioacid (and later thioester) groups could be chemoselectively activated toward acylation in the presence of Glu and Asp carboxylic acid side chains. In this method (Fig. 10.2-1), peptides were synthesized by SPPS on a resin that yielded a C-terminal thioacid group. These peptides were deprotected and cleaved from the solid support, and the resulting unprotected peptides were purified to homogeneity by chromatography. In order to assemble these peptides, the Lys side chains had to be selectively reprotected. This approach has been refined to enable the synthesis of several proteins, some with posttranslational modifications. For example, cAMP response element binding protein with two phosphorylated threonine residues was synthesized by this method [27]. However, the general use of these methods has been limited because of the challenges associated with side chain reprotection and epimerization of the C-terminal activated amino acid in polar organic solvents. In addition, a final deprotection of a large peptide is still necessary to complete the synthesis.
Fig. 10.2-1 Thioester method for the fragment condensation of partially protected peptides (R = H or alkyl).
A philosophically different approach for the coupling of partially protected peptides was developed by Kemp (Fig. 10.2-2) [28]. In this method, the intermolecular linking of the peptides was achieved by an initial, nonamide forming reaction: a rapid asymmetric disulfide formation between an N-terminal Cys peptide and a peptide with a C-terminal 4-hydroxy-6-mercaptodibenzofuran ester. Once the peptide fragments were joined together, an intramolecular O-to-N acyl shift enabled peptide bond formation using moderate activation of the C-terminus (aryl ester). Since the method avoids strong activation of the C-terminus, most side chains did not need protection except for the Cys thiol group. However, this approach was not demonstrated using Lys with an unprotected side chain amine. The acyl transfer reactions proceeded over several hours in dimethylsulfoxide (DMSO)/base and enabled the synthesis of several peptides of up to 39 amino acids.
Fig. 10.2-2 Auxiliary mediated segment condensation in organic solvent.
10.2.2.5 Chemoselective Ligation of Unprotected Peptides
The majority of chemically synthesized proteins have been synthesized using chemoselective ligation methods. In principle, the problems associated with protected peptides could be avoided entirely by using fully unprotected peptides. However, this approach is complicated by the lack of selectivity of fragment coupling chemistries for the N-terminal amine over Lys side chain amino groups. The initial approaches to solve this problem were enabled by the powerful insight that molecules as large as proteins are able to tolerate significant changes to their covalent structure without significant effects on their function. For example, Ala scanning mutagenesis of proteins has demonstrated the tolerance of most side chains to alteration, except for a select few critical residues involved in binding or catalysis [29]. As a result, the synthetic chemist need not be limited to amide bond formation to link peptides together if the object is to use synthetic chemistry to understand and manipulate proteins. With this insight in mind, Offord and Rose utilized the chemoselective reaction of hydrazides and aldehydes to form a stable hydrazone linkage [30]. The reaction between one peptide with a C-terminal hydrazide and another peptide incorporating an N-terminal glyoxylyl functionality was facile in aqueous buffer at pH 4.6 (Fig. 10.2-3).
Fig. 10.2-3 Hydrazone ligation of unprotected peptides in aqueous solution.
Fig. 10.2-4 Thioester ligation of unprotected peptides in aqueous solution.
Concurrently, Kent demonstrated the chemoselective ligation principle with a thioester forming ligation reaction between a C-terminal thioacid group and an N-terminal bromoacetyl moiety (Fig. 10.2-4) [31]. This ligation took advantage of the unique nucleophilicity of thioacids at low pH. All strong nucleophiles in proteins have high pKa values, for example, Cys (pKa ~9) and Lys and Tyr (pKa ~10). In contrast, thioacids have a pKa of ~3 and react rapidly and selectively with alkyl bromides at pH 3-4. A key feature of the thioester and oxime ligations is that no side chain protecting groups are needed [32], and the final polypeptide product is generated after ligation with no further chemical manipulation.
The concept of chemoselective ligation for polypeptides inspired the development of an expanding set of selective chemical reactions to link complex organic molecules in aqueous solution [33]. These reactions include Schiff base type ligations (hydrazone [30], oxime [34]), thiazolidine-based ligations [33], alkylation of sulfhydryl groups [31] (thioester, thioether), Staudinger chemistry [35-38] (chemoselective reaction between a phosphine and an azide followed by acyl transfer to form an amide), and [3 + 2] cycloaddition/click chemistry (reaction of an azide and an alkyne to yield a triazole) [39-41]. Many of these reactions have found wide utility in the synthesis of proteins and other biological macromolecules.
A conceptually different approach to assembling fully unprotected peptides is to use an enzyme to attain both specificity and catalysis of the amide bond formation. This strategy has been developed using proteases, enzymes that cleave peptide backbone amide bonds. Following the principle of microscopic reversibility, any enzyme can be coerced to catalyze a reaction not only in the forward direction but also in the reverse direction. Such "reverse proteolysis" methods typically use substrates containing activated C-termini, altered reaction conditions (changing the solvent polarity, temperature, or pH), or active site modified enzymes [42-44]. In addition, the product ratio can be shifted in favor of ligated products by using organic solvents (lowering the concentration of water). However, slow ligation rates and background aminolysis of the peptides are significant problems with the approach.
The most successful strategy for this reverse proteolysis approach is the engineered protease "subtiligase" developed by Wells and coworkers [45, 46]. This approach took advantage of (a) C-terminal glycolate ester dipeptides that are stable to background hydrolysis but are excellent substrates since they mimic the natural substrate, and (b) protein engineering of the protease thiolsubtilisin [44] to yield an enzyme that catalyzes amide bond formation in preference to hydrolysis. The so-called "subtiligase" was used to synthesize RNaseA with fluorinated His analogs incorporated to probe the mechanism of RNaseA catalysis. Later studies used phage display to evolve a subtiligase variant that was more robust in the presence of denaturants [45]. Even with these improvements, the main hurdle for extensive use of this approach is the low solubility of large unprotected peptides in the nondenaturing buffer conditions required for efficient enzyme catalysis.
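Returning to the thioacid alkylation chemistry of Fig. 10.2-4, the pH selectivity can be made concrete with the Henderson-Hasselbalch relationship. The sketch below uses the approximate pKa values quoted in this section; the function name and the chosen pH of 3.5 are illustrative assumptions, and real pKa values shift with sequence context.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an acid present as its conjugate base."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Approximate pKa values quoted in the text (illustrative, not measured here).
APPROX_PKA = {
    "C-terminal thioacid": 3.0,
    "Cys thiol": 9.0,
    "Lys amine": 10.0,
    "Tyr phenol": 10.0,
}

if __name__ == "__main__":
    ligation_ph = 3.5  # assumed value within the pH 3-4 window described above
    for group, pka in APPROX_PKA.items():
        frac = fraction_deprotonated(pka, ligation_ph)
        print(f"{group:20s} pKa {pka:4.1f}: {frac:.2e} deprotonated at pH {ligation_ph}")
```

In this simple picture, most of the thioacid is present as the nucleophilic thiocarboxylate at pH 3-4, whereas only a very small fraction of the Cys, Lys, and Tyr side chains is deprotonated, which is in line with the selectivity described for the bromoacetyl ligation.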
10.2.2.6 Practical Requirements for Chemical Ligation Reactions
An effective ligation chemistry needs to fulfill several criteria. First, the reaction needs to be chemoselective: there should be no cross-reactivity with other functional groups found in biomolecules such as peptides, carbohydrates, or oligonucleotides. The necessity of even a single protecting group greatly complicates a synthesis and limits the utility of the method. Second, the ligation needs to be compatible with neutral or weakly acidic aqueous solutions to ensure compatibility with hydrophilic biomolecules without promoting base catalyzed side reactions. Third, the reaction kinetics need to be rapid. As their name implies, biological macromolecules are high in molecular weight, and they also have limited solubility. In addition, ligation reactions between two large biopolymers are bimolecular and are run with equimolar amounts of reactants to avoid wasting precious starting materials. As a result, reaction rates decline rapidly as the concentration decreases. Typically, effective ligation reactions need to proceed to completion within 24 hours at room temperature and at peptide concentrations at or below 1 mM.
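The kinetic requirement can be put into rough numbers using the integrated rate law for an equimolar second-order reaction, 1/C = 1/C0 + kt. The sketch below assumes ideal, irreversible second-order behavior with no side reactions, and treats "completion" as 95% conversion; both are illustrative assumptions, as are the function names.

```python
def conversion(k: float, c0: float, t: float) -> float:
    """Fractional conversion for A + B -> P with [A]0 = [B]0 = c0.

    k in M^-1 s^-1, c0 in M, t in s. Derived from 1/C = 1/C0 + k*t.
    """
    x = k * c0 * t
    return x / (1.0 + x)


def required_rate_constant(target: float, c0: float, t: float) -> float:
    """Second-order rate constant needed to reach `target` conversion by time t."""
    if not 0.0 < target < 1.0:
        raise ValueError("target conversion must be between 0 and 1")
    return target / ((1.0 - target) * c0 * t)


if __name__ == "__main__":
    day = 24 * 3600.0
    k_needed = required_rate_constant(0.95, 1e-3, day)  # 1 mM peptides, 24 h
    print(f"k needed for 95% in 24 h at 1 mM: {k_needed:.3f} M^-1 s^-1")
    # The same rate constant at 10-fold lower concentration:
    print(f"conversion at 0.1 mM after 24 h: {conversion(k_needed, 1e-4, day):.0%}")
```

Because the half-life of an equimolar second-order reaction is 1/(k C0), a 10-fold dilution lengthens it 10-fold, which is why poorly soluble fragments are so demanding on the ligation chemistry.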
10.2.2.7 Chemoselective Ligation to Form Native Peptide Bonds
The most commonly used chemical ligation reaction for the synthesis of proteins utilizes the highly chemoselective reaction between one peptide bearing an N-terminal Cys residue and another peptide containing a C-terminal thioester moiety (Fig. 10.2-5) [47]. In this native chemical ligation strategy, the deprotonated thiolate of the N-terminal Cys residue undergoes facile exchange with the C-terminal thioester group, forming an intermediate that links the peptides through a thioester bond. Subsequently, a rapid S-to-N intramolecular acyl transfer yields a stable amide bond at the site of ligation [47, 48]. An advantage of this reaction is that the "native" polypeptide with a Cys residue at the ligation site is obtained without further chemical manipulation. The chemoselectivity of the reaction stems from the combination of a Cys specific, reversible thioester exchange (any Cys residue in either peptide can participate in this equilibrium) with an essentially irreversible intramolecular reaction that is specific to N-terminal Cys residues. Under typical ligation conditions (pH 6.5-7.5, 1 mM peptide), the intermolecular transthioesterification is rate limiting and no thioester intermediate is observed because of rapid rearrangement [47].
Fig. 10.2-5 Native chemical ligation in aqueous solution.
The reaction also utilizes the unique reactivity profile of the thioester as an activated acyl group. Compared to oxoesters with identical substituents, thioesters are much more reactive toward thiol nucleophiles [49] (and to a lesser extent toward amine nucleophiles [50]), facilitating rapid reaction kinetics without resorting to high levels of activation that could result in epimerization of the C-terminal amino acid. In contrast to their high reactivity toward thiols, thioesters are remarkably resistant to hydrolysis, the main competing reaction in aqueous solution (where water is present at 55 M). Indeed, thioesters have been shown to hydrolyze more slowly than the corresponding ester derivative [50, 51]. It is these properties of thioesters that have made them important reactive intermediates in numerous biological processes, including nonribosomal peptide synthesis, ubiquitination, polyketide synthesis, and lipid biosynthesis.
The native chemical ligation reaction has proved to be remarkably robust and has enabled the synthesis of a variety of proteins [52] from two polypeptide fragments or, using a single N-terminal protecting group, from multiple peptide segments assembled in a sequential manner [53]. The chemoselectivity of the reaction extends beyond functional groups found in polypeptides, and the reaction has been used in the context of posttranslationally modified peptides/proteins including glycopeptides, lipopeptides, and phosphopeptides [2-4]. In addition, native chemical ligation has proved to be an effective approach for the synthesis of macromolecules that do not require "native" amide bonds. For example, the reaction has been used for the conjugation of peptides to deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and peptide nucleic acids (PNA), for conjugation to complex carbohydrates bearing an N-terminal Cys or a thioester, and in the assembly of branched dendritic macromolecules [54]. Because most chemical ligation reactions are fundamentally bimolecular, the ligation rate is highly sensitive to concentration. As a result, successful application of native chemical ligation to a given target is largely a function of the solubility of the macromolecules and is generally independent of their molecular weight. Indeed, as described in Chapter 10.1, methods have been developed to use large biologically derived protein fragments in these reactions.
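The two-step mechanism described above (reversible thioester exchange followed by an essentially irreversible S-to-N acyl shift) can be explored with a toy kinetic model. Every rate constant below is a hypothetical placeholder chosen only to make the acyl shift much faster than the exchange; none is a measured value, and the model ignores exchange at internal Cys residues, thiol additives, and hydrolysis.

```python
def simulate_ligation(c0=1e-3, k_exchange=1.0, k_reverse=0.1, k_shift=10.0,
                      t_end=6 * 3600.0, dt=0.05):
    """Explicit Euler integration of: thioester + Cys-peptide = intermediate
    (reversible), then intermediate -> amide product (irreversible).

    c0 in M, k_exchange in M^-1 s^-1, k_reverse and k_shift in s^-1, times in s.
    All default rate constants are hypothetical illustration values.
    Returns (product fraction, peak intermediate fraction), relative to c0.
    """
    thioester = cys_peptide = c0
    intermediate = product = 0.0
    peak_intermediate = 0.0
    for _ in range(int(t_end / dt)):
        forward = k_exchange * thioester * cys_peptide  # transthioesterification
        backward = k_reverse * intermediate             # reverse exchange
        shift = k_shift * intermediate                  # S-to-N acyl transfer
        thioester += dt * (backward - forward)
        cys_peptide += dt * (backward - forward)
        intermediate += dt * (forward - backward - shift)
        product += dt * shift
        peak_intermediate = max(peak_intermediate, intermediate)
    return product / c0, peak_intermediate / c0


if __name__ == "__main__":
    product_fraction, peak_fraction = simulate_ligation()
    print(f"product after 6 h: {product_fraction:.1%}")
    print(f"peak thioester-linked intermediate: {peak_fraction:.3%} of starting peptide")
```

With a fast intramolecular step, the intermediate in this model never accumulates to more than a tiny fraction of the starting material, and the overall rate tracks the bimolecular exchange step, which matches the qualitative picture of a rate-limiting transthioesterification.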
10.2.2.8 Some Variations on the Native Chemical Ligation Theme
Although most applications of native chemical ligation utilize the originally envisioned cysteine-thioester pairing, several variations of this reaction have been described [3, 55]. The critical amino-thiol moiety of an N-terminal Cys residue can be varied to yield alternative ligating groups. For example, adding an additional methylene group to the side chain yields a homocysteine (hCys) that can react with a peptidyl thioester at pH 8 to form a thioester intermediate, which can rearrange through a six-membered ring to form an amide bond at an hCys ligation site [56]. Similarly, selenocysteine has been substituted for Cys to facilitate ligation at pH 6 to form selenoproteins. These ligation reactions have excellent kinetics because of the high nucleophilicity of the selenol side chain [57-60]. An alternative strategy is to form the thioester intermediate by the reaction of a nucleophilic thioacid group with an N-terminal β-bromoalanine residue at low pH, in analogy to the thioester forming ligation described earlier in this chapter. Subsequent neutralization of the reaction leads to acyl transfer, generating an amide bond at the Cys ligation site [61].
10.2.2.9 Native Chemical Ligation to Yield Noncysteine Ligation Products
The main limitation of the native chemical ligation approach is the requirement for a Cys residue at the site of ligation. Although Cys is a natural amino acid, it is found in low abundance, limiting the chances of finding a convenient natural ligation site. In addition, the reactivity of free thiols, which is so useful in ligation, can be a liability when they are present in the final protein product. One approach to address this limitation is to modify the Cys residue following ligation. For example, Cys residues can be alkylated by alkyl halides to yield analogs of amino acid side chains, for example, glutamine, glutamate, or lysine [53]. This reaction is high yielding at pH 8 and is specific for reduced Cys residues. An alternative approach is to convert all reduced cysteine residues in the final polypeptide product to alanine by desulfurization [62]. This reaction is facilitated by treatment with hydrogenation catalysts such as activated Raney nickel and has been shown to proceed with retention of the peptide stereochemistry.
10.2.2.10 Amide Ligation Using Auxiliaries
Another approach for assembling unprotected peptides is through the reversible attachment of the functional equivalent of a Cys side chain (a ligation auxiliary) onto the N-terminus of a peptide. In analogy to native chemical ligation (Fig. 10.2-6(a)), an intermolecular thioester exchange followed by an intramolecular S-to-N acyl transfer yields an amide bond at the site of ligation. Subsequent removal of the auxiliary yields the desired polypeptide [63, 64, 66]. Two strategies for ligation auxiliaries have proven to be practical for polypeptide synthesis, and both utilize a benzyl moiety that is stable as a benzyl amine but labile as a benzyl amide. The first strategy is to incorporate a 3,4,5-trimethoxy-2-mercaptobenzyl (Tmb) group onto the N-terminus of the peptide (Fig. 10.2-6(b)) [64, 65]. Following the S-to-N acyl transfer to yield a secondary benzyl amide, the Tmb group can be removed by TFA and scavengers. A second strategy is to use an N-terminal 1-phenyl-2-mercaptoethyl group to facilitate ligation (Fig. 10.2-6(c)). When the phenyl ring has a 2,4-dimethoxy substitution, the auxiliary can be removed with TFA; alternatively, substitution with a 2-nitro moiety results in an auxiliary that is photolabile following ligation [66-69]. Both these approaches enable ligation when there is a Gly residue at the ligation junction.
Fig. 10.2-6 Auxiliary mediated native chemical ligation. (a) Transthioesterification, S-to-N acyl transfer, and removal of the auxiliary. (b) Tmb auxiliary. (c) 2-Phenylethanethiol auxiliary.
10.2.3 General Considerations
10.2.3.1 Synthesis of N-terminally Functionalized Peptides
N-terminal modification of peptides with reactive groups for chemoselective ligation is synthetically straightforward using both Boc and Fmoc SPPS. As shown in Fig. 10.2-7, bromoacetyl, ketone, aminooxy, azide, alkyne, thiol, and Cys groups can all be incorporated at the N-terminus using standard peptide coupling conditions. Aldehydes are most easily introduced after solid phase synthesis through quantitative transformation of an N-terminal Ser residue to a glyoxylyl group using NaIO4.
10.2.3.2 Synthesis of Functionalized Amino Acid Side Chains
Any group that can be attached to the N-terminus of a peptide can be attached to an amine side chain through appropriate protecting group manipulation. For example, Lys(Alloc) side chain protecting groups can be removed selectively after full chain assembly in both Fmoc and Boc solid phase synthesis protocols. The revealed amino group can be modified as described above. In addition, numerous amino acids with chemoselective ligation moieties have been synthesized for direct incorporation into peptides.
10.2.3.3 Synthesis of Peptides Modified at the C-terminus
C-terminal modification is significantly more complicated since it requires manipulation of the cleavable peptide linker or activation of the C-terminus after chain assembly. Specific peptide resin-linkers have been developed that generate C-terminal moieties such as thioacids, thiols, aldehydes, and hydrazones directly upon cleavage from the resin (Fig. 10.2-8(a)). Alternatively, safety-catch approaches, like the sulfonamide linker, can be selectively activated following chain assembly, enabling the peptide to be cleaved from the resin by a desired nucleophile (typically an amine or thiol). An additional approach is to modify the side chain of the C-terminal amino acid (Fig. 10.2-8(b)). This approach is useful if the geometry of the ligation site can tolerate significant changes. Such side chain manipulation is easier to perform than direct modification of the peptide C-terminus, making it an attractive alternative approach for C-terminal modification.
Fig. 10.2-7 Solid phase synthesis of N-terminally modified peptides.
Fig. 10.2-8 Solid phase synthesis of C-terminally modified peptides.
10.2.3.4 Synthesis of C-terminal Thioester Peptides
C-terminal thioester peptides are critical for the native chemical ligation approaches to peptide synthesis. In addition, thioester peptides are useful synthetic intermediates for many C-terminal modifications that can be introduced through aminolysis of the thioester bond after chain assembly by SPPS (see Chapter 10.1). Boc-based SPPS is the most effective method for the generation of thioester peptides because the thioester group is stable to the conditions used for Boc removal (TFA) and for side chain deprotection (HF). In contrast, Fmoc-based SPPS methods are less compatible with these thioester linkers since the Fmoc group must be removed with base, typically the secondary amine piperidine. Several protocols have been developed using hindered, nonnucleophilic bases for Fmoc deprotection that facilitate the generation of short (~10 amino acid) thioester peptides. Other approaches for thioester peptide synthesis by Fmoc SPPS protocols utilize "safety-catch" linkers that are stable during peptide elongation but can be subsequently converted into activated acyl groups. Following activation, the peptide can be cleaved from the resin using thiols. Several of these strategies are discussed in Chapter 10.1. However, Fmoc-based thioester peptide synthesis is still technically challenging and an active area of methodology development. Development of Fmoc SPPS compatible thioester synthesis is important since several posttranslational modifications, such as phosphorylation and glycosylation, are most efficiently incorporated using Fmoc SPPS.
10.2.3.5 Native Chemical Ligation Reactions
10.2.3.5.1 Selection of the Ligation Site
The first consideration when planning a protein synthesis by native chemical ligation approaches is the selection of an appropriate ligation site. Because of the challenges of large polypeptide synthesis by SPPS, no segment should exceed ~60 amino acids in length and, typically, peptides of 25-60 amino acids are selected. Many synthetic protein targets must be assembled sequentially from more than two components, requiring a protecting group for the N-terminal Cys residue of all internal segments. Since ligation requires both the amine and the thiol of the N-terminal Cys, only one of these groups needs to be protected, and the N-Msc and S-Acm protecting groups have been utilized for this purpose [53, 70, 71]. Alternatively, protection of both groups can be achieved using a thiaproline residue that can be converted to Cys through treatment with methoxylamine [72]. An additional synthetic constraint is the requirement for a Cys residue to facilitate the ligation reaction. Ideally, a natural Cys residue is selected, and it has been shown that native chemical ligation is compatible with a variety of Xaa-Cys ligation sites [73, 74]. If no native Cys residue is available, one solution is to substitute a Cys residue at a noncritical site in the polypeptide sequence. Following ligation, this Cys residue can be left as a free thiol or alkylated to mimic a natural side chain, or the polypeptide can be globally desulfurized, yielding a protein with Ala in place of all Cys residues [62]. An alternative approach to non-Cys ligation sites is the use of a ligation auxiliary that facilitates ligation at unhindered Gly residues in a polypeptide sequence. Although a Gly-Gly sequence is the most synthetically straightforward sequence to use with these auxiliaries, ligation sites using Xaa-Gly and Gly-Xaa sequences have been demonstrated [64-69].
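The site-selection constraints above (segments no longer than about 60 residues, each ligation junction of the form Xaa-Cys) lend themselves to a simple planning sketch. The function below is a hypothetical helper, not a published tool: it works on one-letter sequences, cuts immediately before Cys residues, and greedily picks the latest allowed cut so that the number of segments is kept small. It does not rank junctions by the identity of Xaa and does not consider auxiliary-mediated Gly sites.

```python
from typing import List, Optional


def plan_segments(sequence: str, max_len: int = 60) -> Optional[List[str]]:
    """Split a one-letter protein sequence into ligation segments.

    Every segment after the first starts with Cys (the N-terminal Cys needed
    for native chemical ligation) and no segment exceeds max_len residues.
    Returns None if the native Cys positions do not allow such a plan.
    """
    segments = []
    start = 0
    while len(sequence) - start > max_len:
        # A cut at index i makes sequence[i] the first residue of the next segment.
        window = range(start + 1, start + max_len + 1)
        cuts = [i for i in window if sequence[i] == "C"]
        if not cuts:
            return None  # would need an introduced Cys or a Gly auxiliary site
        cut = cuts[-1]   # latest allowed cut keeps the segment count low
        segments.append(sequence[start:cut])
        start = cut
    segments.append(sequence[start:])
    return segments


if __name__ == "__main__":
    # Made-up 112-residue sequence with Cys at positions 41 and 82 (1-based).
    toy = "A" * 40 + "C" + "A" * 40 + "C" + "A" * 30
    plan = plan_segments(toy)
    print([len(segment) for segment in plan])
```

A result of None corresponds to the situation described in the text where no native Cys is suitably placed, so a Cys substitution, later desulfurization, or a ligation auxiliary would have to be considered instead.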
Overall, it is important to consider that the strategy involving the fewest chemical manipulations and purifications following SPPS is likely to result in the highest yield of synthetic product.
10.2.3.5.2 Selection of Ligation Conditions
Chemical ligation methods are typically compatible with a wide range of reaction conditions. However, it is important to note that in addition to optimizing ligation rates, maintenance of the chemoselectivity of the reaction is critical. As a result, native chemical ligation is typically performed at a pH of 6.5-8.0 at 25-40°C to avoid the possibility of unwanted thioester reactivity such as aminolysis, hydrolysis, or epimerization of the C-terminal amino acid. To maintain pH control in the presence of high concentrations
of peptide functional groups and thiol additives, high buffer concentrations (100-500 mM) are used. Another important consideration is that Cys residues are prone to oxidation to form disulfides, which are unable to participate in ligation. An alkyl thiol or soluble phosphine is typically added to provide reducing conditions for the ligation reaction. Chemical ligation reactions proceed rapidly in aqueous solution, and additives or cosolvents can be added to facilitate peptide solubility. The most common additive is the denaturant guanidine hydrochloride (6 M), which facilitates the solubility of unstructured peptide fragments, thereby increasing peptide concentration and reducing the possibility of peptide conformation affecting ligation rates. Similarly, detergents have been used to facilitate the solubility of hydrophobic peptides and in some cases may also increase ligation rates by concentrating the peptides in peptide-micelle structures. Organic cosolvents such as trifluoroethanol, DMF, dimethylsulfoxide, or acetonitrile can also enhance peptide solubility, although these additives can make purification by HPLC more challenging.
10.2.3.5.3
Enhancing Ligation Rates
Ideally, chemical ligations should proceed with fast kinetics to avoid unwanted side reactions. Since ligations are typically equimolar bimolecular reactions, the most straightforward approach for increasing ligation rates is to increase peptide concentration. In addition, it has been shown that thioester peptides with better thiol leaving groups undergo faster ligation and transthioesterification. Such thioesters can be synthesized before ligation or, preferably, generated in situ by adding an excess of thiophenol to the ligation reaction. It should be noted that the ligation buffer can significantly affect thiophenol solubility, and more soluble thiols, such as mercaptoethylsulfonate, can be used when solubilizing agents such as 6 M guanidine HCl are not used. Another approach to enhance the ligation rate is to increase the effective concentration of the peptides. It has been shown that some proteins can adopt a native-like (although less stable) folded conformation following cleavage into two or more polypeptide segments. As a result, performing the ligation reaction under conditions that promote polypeptide folding can significantly accelerate the ligation reaction [75]. Similarly, use of detergents or lipid bilayers [76] can increase the effective concentration of hydrophobic polypeptides.
10.2.4 Applications and Practical Examples
10.2.4.1
Structure-Function Analysis of Chemokines and the Development of Protein Pharmaceuticals
Chemokines are a large family of proteins that mediate the directed migration of leukocytes in the body. The moderate size (~70 amino acids) and medical importance of these proteins have made them an attractive target for chemical
Fig. 10.2-9 Protein Medicinal Chemistry. The N-terminus of the chemokine RANTES was systematically modified to improve receptor binding and HIV microbicide activity. [Scheme labels: natural product of moderate potency, X-RANTES(4-68); position 1 optimization; position 2 optimization; combination of positions 1, 2, and 3, giving a highly potent protein mimetic.]
synthesis. These proteins adopt a conserved fold, consisting of three antiparallel β-strands and a C-terminal α-helix, which is stabilized by two conserved disulfide bonds (Fig. 10.2-9). The structure-function analysis of chemokines has been greatly enhanced by chemical synthesis, particularly in the work of Clark-Lewis and coworkers. Using total SPPS, over 1000 chemokines and chemokine analogs were synthesized, utilizing both natural and unnatural amino acids to probe the molecular basis of chemokine function. One notable study probed the relevance of dimerization for the biological activity of chemokines. The chemokine interleukin-8 (IL-8) dimerizes at high
concentrations necessary for structural determination by nuclear magnetic resonance (NMR) or crystallography. The dimerization interface includes an extended β-sheet structure between the monomers. To test the hypothesis that IL-8 functions as a monomer at biologically relevant subnanomolar concentrations, a derivative of IL-8 was synthesized with a methyl group attached to a backbone amide (N-Me amide), designed to disrupt backbone hydrogen bonding and prevent dimerization. The full biological activity of this analog provided the first strong support for monomeric IL-8 being the biologically relevant conformation of the chemokine.
The chemokine IL-8 was also the first protein synthesized by native chemical ligation. Forming the protein by ligation has the advantage of using smaller synthetic peptides that can be synthesized rapidly and with high purity. (Although chemokines can be synthesized by SPPS, at ~70 amino acids they represent the upper limit of effective synthesis by this approach, and different chemokines contained variable amounts of microheterogeneity.) The centrally located Cys34 provided a convenient site for ligation between peptides corresponding to IL-8 1-33-thioester and IL-8 34-72. Following ligation, the reduced polypeptide was oxidatively folded in 1 M guanidine HCl, pH 8.5, to yield fully active IL-8.
Recently, work on synthetic chemokines has been stimulated by the potential for analogs of the chemokine RANTES (regulated on activation, normal, T expressed, and secreted) to block human immunodeficiency virus (HIV) entry into cells. This inhibition is achieved through intracellular sequestration of the chemokine receptor CCR5, which is also a coreceptor for HIV entry. In order to develop RANTES as a pharmacological agent for use as an HIV microbicide, a large set of RANTES analogs was synthesized with nonnatural amino acid structures at the N-terminus of the protein.
The analogs were synthesized by native chemical ligation in analogy to the approach described for IL-8. As shown in Fig. 10.2-9, chemical synthesis enabled the screening of multiple analogs and resulted in a RANTES analog with >50-fold greater potency than the starting lead compound, AOP (aminooxypentane)-RANTES. Interestingly, AOP-RANTES was originally generated by an oxime ligation between aminooxypentane and an N-terminal glyoxylyl-RANTES analog (derived from biological expression), demonstrating the power of semisynthetic methods in protein chemistry. It is also notable that attempts to generate more potent N-terminal variants of RANTES using phage display libraries were unsuccessful. It was concluded that this work “was able to exploit the greater breadth of possible substitutions and thus higher degree of spatial resolution, afforded by total chemical synthesis.”
10.2.4.2
Synthesis of N-myristoylated HIV-1 Matrix Protein p17 from Three Peptide Segments
Protein lipidation is a critical posttranslational modification that serves to regulate the membrane attachment of numerous cellular and viral
Fig. 10.2-10 Total synthesis of HIV-1 matrix protein with an N-terminal myristoyl group.
proteins. HIV-1 matrix protein p17 is a 131 amino acid protein with an N-terminal myristoyl (C14) group. When covalently linked to the HIV Gag polyprotein, p17 targets the polyprotein to the host-cell membrane for particle assembly. However, on viral maturation, proteolytic cleavage occurs at the C-terminus of p17 and enables p17 to partially dissociate from the viral membrane. Since large quantities of myristoylated p17 cannot be obtained through heterologous expression systems, the protein was chemically synthesized to study the effects of myristoylation on p17 structure and function. As shown in Fig. 10.2-10, the 131 amino acid protein was assembled from three peptide segments, using an S-Acm protecting group on the peptide corresponding to residues 56-85 to avoid cyclization of this central segment. Using this approach, 275 mg of this 15-kDa lipoprotein was synthesized, which enabled detailed biophysical measurements. These studies suggest that the role of the myristoyl group is to stabilize the trimeric folded state of the protein rather than to effect a conformational change, as had been previously proposed. Significantly, this large protein was synthesized with an overall yield of 7.5% based on the loading of the peptide resin used in solid phase synthesis, emphasizing the efficiency of the synthetic procedures (over 300 synthetic steps were performed in the synthesis of this protein).
10.2.4.3 Synthesis of Nonlinear Protein Structures
The synthesis of proteins with nonlinear architecture has found many applications in protein design. One class of designed proteins consists of
a linear template that contains multiple reactive groups onto which linear peptides can be ligated to generate a branched peptide structure. Chemical ligation approaches are the methods of choice for the generation of such template assembled synthetic protein (TASP) [77] and multiantigenic peptide (MAP) [78] structures, and they have been assembled using thioester [79], thioether, oxime, hydrazone, and thiazolidine ligation reactions. A notable example of this approach is the synthesis of tetrameric and pentameric TASP molecules based on the transmembrane (TM) domain of HIV virus protein u (Vpu). Viral membrane proteins frequently oligomerize to form ion channels, but analysis of these channels is complicated by difficulties in determining the oligomerization state of the protein. As a result, the chemical synthesis of branched peptides with a defined stoichiometry (four or five) of TM peptides is an attractive approach. However, TM peptides are highly insoluble, which complicates the purification and assembly of the multimeric product. To overcome these problems, a polyethylene glycol-derived polyamide (PPO) solubilization tag was attached through a cleavable thioester bond to the C-terminus of each Vpu TM peptide. In order to ligate the peptides to the tetravalent or pentavalent template, an N-terminal aminooxy group was incorporated into each TM peptide, complementary to the ketoamide moieties on the template. As shown in Fig. 10.2-11, this synthetic strategy enabled the assembly of soluble Vpu TM-PPO-based TASP molecules with a molecular weight of over 20 000 Da. Cleavage of the thioester link to the solubilizing PPO moiety and incorporation into liposomes enabled the characterization of four- and five-helix bundle ion channels. Conductivity measurements on these Vpu TASP molecules suggest that a pentamer is the oligomeric state of the Vpu ion channel.
Another nonlinear architecture that has been explored in proteins is head-to-tail cyclization.
Small cyclic peptides are common in peptidomimetic efforts to mimic protein loops, but traditional peptide cyclization methods are not applicable to large polypeptide chains. Cyclic proteins can be synthesized from a polypeptide containing both an N-terminal Cys and a C-terminal thioester [80-82]. It has been shown for multiple proteins that the intramolecular ligation reaction proceeds faster than the competing polymerization reaction, yielding cyclic polypeptides in near-quantitative yield. This procedure has been used to synthesize naturally cyclic proteins such as the cyclotide family [82] and also engineered cyclic proteins designed to increase thermodynamic stability [80-82]. Protein cyclization was taken one step further by the synthesis of a protein catenane, consisting of two interlocked cyclic peptides [83, 84]. This structure was designed from the tetramerization domain of p53, which folds in a bisecting U conformation (Fig. 10.2-12). To construct the catenane, linear peptides corresponding to the p53 tet domain were synthesized with both an N-terminal Cys and a C-terminal thioester. The catenane was assembled by folding the peptide to preorganize the bisecting conformation. Since protein folding is faster than chemical ligation, native chemical ligation of the ends
Fig. 10.2-11 Assembly of a pentameric ion channel based on a transmembrane domain of HIV (Vpu). The membrane domain was attached to a PPO-peg group to solubilize the peptide for purification and ligation. Upon assembly into the 5-helix TASP molecule, the PPO-peg group was removed by hydrolysis.
of the p53 polypeptide resulted in quantitative catenane formation, forming a topologically linked dimer. These interlocked protein structures were found to be extremely thermodynamically stable, with the fold stabilized by >50 °C at 10 µM. Interestingly, the stability of these proteins stems from destabilization of the denatured state rather than stabilization of the folded state.
10.2.5 Future Directions
Chemical ligation approaches have revolutionized the synthesis of macromolecules, enabling the synthesis of monodisperse products over 50 000 Da in molecular weight. These highly chemoselective reactions have proven to be robust for the assembly of a wide variety of biological macromolecules and, as a result, many of the future directions in this field depend on the application of synthetic macromolecules to address fundamental questions about protein
Fig. 10.2-12 Synthesis of a protein catenane based on the p53 tetramerization domain.
structure and function in vitro as well as in vivo. Systematic incorporation of unnatural amino acids to modify the side chains and backbone structures of polypeptides promises to yield new insights into protein structure and function as well as into enzymatic catalysis. In addition, the incorporation of specific stable isotopes (²H, ¹³C, ¹⁵N) into proteins promises to be a powerful approach for both NMR and infrared (IR) analysis of proteins. In order to use chemical ligation approaches, it is necessary to synthesize the large macromolecular precursors in a straightforward manner. Indeed, the synthesis of the modified synthetic polypeptides is frequently the rate-determining step in synthesizing a protein. New methods for the synthesis of all peptides, but particularly peptide thioesters, need to be developed to improve synthetic access to proteins. For example, new approaches for synthesizing fragile posttranslationally modified glyco-, phospho-, and lipopeptides are being developed [85-87]. Similarly, improvements to SPPS will increase the length of peptide precursors and enable larger proteins to be synthesized.
Current methods for chemical ligation have great utility, but new advances will greatly enhance the size and quantity of proteins that can be chemically synthesized. Of particular importance is the development of straightforward methods for the handling of peptides following ligation reactions. The development of solid phase ligation approaches [88, 89], one-pot syntheses [90, 91], and the use of affinity tags [92] promise to greatly simplify the synthesis, and improve the yield, of synthetic proteins assembled from multiple components. New approaches for chemical ligation will provide greater synthetic flexibility, as shown with amide-forming ligation auxiliaries [62-69]. Approaches have been described that use the chemoselective reaction between phosphines and azides to yield a thioester-linked aminophosphorane intermediate that rearranges to yield a native amide bond [36, 37, 93]. In addition, non-native ligation chemistries forming structures such as triazoles promise to enhance the types of modifications that can be made to synthetic macromolecules [39-41]. Further development of simple and general ligation approaches will greatly enhance the synthesis of macromolecules and protein natural products.
References
1. M. Smith, In vitro mutagenesis, Annu. Rev. Genet. 1985, 19, 423.
2. P.E. Dawson, S.B.H. Kent, Synthesis of native proteins by chemical ligation, Annu. Rev. Biochem. 2000, 69, 923.
3. B.L. Nilsson, M.B. Soellner, R.T. Raines, Chemical synthesis of proteins, Annu. Rev. Biophys. Biomol. Struct. 2005, 34, 91.
4. J.D. Hartgerink, Covalent capture: a natural complement to self-assembly, Curr. Opin. Chem. Biol. 2004, 8, 604.
5. R.B. Merrifield, Solid phase peptide synthesis, J. Am. Chem. Soc. 1963, 85, 2149.
6. B. Merrifield, in Peptides: Synthesis, Structures, and Applications, 1st ed. (Ed.: B. Gutte), Academic Press, San Diego, 1995, 93.
7. S.B. Kent, Chemical synthesis of peptides and proteins, Annu. Rev. Biochem. 1988, 57, 957.
8. J.A. Borgia, G.B. Fields, Chemical synthesis of proteins, Trends Biotechnol. 2000, 18, 243.
9. H.C. Hang, C.R. Bertozzi, Chemoselective approaches to glycoprotein assembly, Acc. Chem. Res. 2001, 34, 727.
10. E. Fischer, Untersuchungen über Aminosäuren, Polypeptide und Proteine, Ber. Chem. Ges. 1906, 39, 530.
11. T. Kimmerlin, D. Seebach, '100 years of peptide synthesis': ligation methods for peptide and protein synthesis with applications to beta-peptide assemblies, J. Pept. Res. 2005, 65, 229.
12. R.N. Zuckermann, J.M. Kerr, S.B.H. Kent, W.H. Moos, Efficient method for the preparation of peptoids [oligo(N-substituted glycines)] by submonomer solid-phase synthesis, J. Am. Chem. Soc. 1992, 114, 10646.
13. F.A. Robey, R.A. Fields, Automated synthesis of N-bromoacetyl-modified peptides for the preparation of synthetic peptide polymers, peptide-protein conjugates, and cyclic peptides, Anal. Biochem. 1989, 177, 373.
14. M. Schnolzer, S.B.H. Kent, Constructing proteins by dovetailing unprotected synthetic peptides: backbone engineered HIV protease, Science 1992, 256, 221.
15. M. Bergmann, L. Zervas, Biochem. Z. 1932, 203, 280.
16. F. Albericio, L.A. Carpino, Methods Enzymol. 1997, 289, 104.
17. S. Sakakibara, Chemical synthesis of proteins in solution, Biopolymers 1999, 51, 279.
18. J. Bedford, C. Hyde, T. Johnson, W. Jun, D. Owen, M. Quibell, R.C. Sheppard, Amino acid structure and “difficult sequences” in solid phase peptide synthesis, Int. J. Pept. Protein Res. 1992, 40, 300.
19. M. Mutter, A. Nefzi, T. Sato, X. Sun, F. Wahl, T. Wohr, Pseudo-prolines (psi-pro) for accessing inaccessible peptides, Pept. Res. 1995, 8, 145.
20. V.K. Sarin, S.B.H. Kent, R.B. Merrifield, Properties of swollen polymer networks: solvation and swelling of peptide-containing resins in solid phase peptide synthesis, J. Am. Chem. Soc. 1980, 102, 5463.
21. R.C. Sheppard, New solid-phase methods in the synthesis of natural peptides, Biochem. Soc. Trans. 1980, 8, 744.
22. B.T. Chait, S.B.H. Kent, Weighing naked proteins: practical, high-accuracy mass measurement of peptides and proteins, Science 1992, 257, 1885.
23. K. Tanaka, The origin of macromolecule ionization by laser irradiation (Nobel lecture), Angew. Chem., Int. Ed. Engl. 2003, 42, 3860.
24. T. Hunt, Nobel Lecture. Protein synthesis, proteolysis, and cell cycle transitions, Biosci. Rep. 2002, 22, 465.
25. M. Quibell, L.C. Packman, T. Johnson, Solid-phase assembly of backbone amide-protected peptide segments: an efficient and reliable strategy for the synthesis of small proteins, J. Am. Chem. Soc., Perkin Trans. 1 1996, 1, 1227.
26. J. Blake, C.H. Li, New segment-coupling method for peptide synthesis in aqueous solution, Proc. Natl. Acad. Sci. U.S.A. 1981, 78, 4055.
27. S. Aimoto, Contemporary methods for peptide and protein synthesis, Curr. Organ. Chem. 2001, 5, 45.
28. D.S. Kemp, R.I. Carey, Synthesis of a 39-peptide and a 25-peptide by thiol-capture ligations: observation of a 40-fold rate acceleration of the intramolecular O,N-acyl transfer reaction between peptide fragments bearing only cysteine protecting groups, J. Org. Chem. 1993, 58, 2216.
29. J.A. Wells, Systematic mutational analyses of protein-protein interfaces, Methods Enzymol. 1991, 202, 390.
30. K. Rose, L.A. Vilaseca, R. Werlen, A. Meunier, I. Fisch, R.M. Jones, R.E. Offord, Preparation of well-defined protein conjugates using enzyme-assisted reverse proteolysis, Bioconjug. Chem. 1991, 2, 154.
31. M. Schnolzer, S.B. Kent, Constructing proteins by dovetailing unprotected synthetic peptides: backbone-engineered HIV protease, Science 1992, 256, 221.
32. M. Baca, T.W. Muir, M. Schnolzer, S.B.H. Kent, Chemical ligation of cysteine-containing peptides: synthesis of a 22 kDa tethered dimer of HIV-1 protease, J. Am. Chem. Soc. 1995, 117, 1881.
33. J.P. Tam, J.X. Xu, K.D. Eom, Methods and strategies of peptide ligation, Biopolymers 2001, 60, 194.
34. K. Rose, Facile synthesis of homogeneous artificial proteins, J. Am. Chem. Soc. 1994, 116, 30.
35. E. Saxon, C.R. Bertozzi, Cell surface engineering by a modified Staudinger reaction, Science 2000, 287, 2007.
36. E. Saxon, J.I. Armstrong, C.R. Bertozzi, A “traceless” Staudinger ligation for the chemoselective synthesis of amide bonds, Org. Lett. 2000, 2, 2141.
37. B.L. Nilsson, L.L. Kiessling, R.T. Raines, Staudinger ligation: a peptide from a thioester and azide, Org. Lett. 2000, 2, 1939.
38. M. Kohn, R. Breinbauer, The Staudinger ligation: a gift to chemical biology, Angew. Chem., Int. Ed. Engl. 2004, 43, 3106.
39. H.C. Kolb, M.G. Finn, K.B. Sharpless, Click chemistry: diverse chemical function from a few good reactions, Angew. Chem., Int. Ed. Engl. 2001, 40, 2004.
40. Q. Wang, T.R. Chan, R. Hilgraf, V.V. Fokin, K.B. Sharpless, M.G. Finn, Bioconjugation by copper(I)-catalyzed azide-alkyne [3 + 2] cycloaddition, J. Am. Chem. Soc. 2003, 125, 3192.
41. C.W. Tornoe, C. Christensen, M. Meldal, Peptidotriazoles on solid phase: [1,2,3]-triazoles by regiospecific copper(I)-catalyzed 1,3-dipolar cycloadditions of terminal alkynes to azides, J. Org. Chem. 2002, 67, 3057.
42. Z. Machova, R. von Eggelkraut-Gottanka, N. Wehofsky, F. Bordusa, A.G. Beck-Sickinger, Expressed enzymatic ligation for the semisynthesis of chemically modified proteins, Angew. Chem., Int. Ed. Engl. 2003, 42, 4916.
43. Z.P. Wu, D. Hilvert, Conversion of a protease into an acyl transferase: selenolsubtilisin, J. Am. Chem. Soc. 1989, 111, 4513.
44. T. Nakatsuka, T. Sasaki, E.T. Kaiser, Peptide segment coupling catalyzed by the semisynthetic enzyme thiolsubtilisin, J. Am. Chem. Soc. 1987, 109, 3808.
45. S. Atwell, J.A. Wells, Selection for improved subtiligases by phage display, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 9497.
46. D.Y. Jackson, J. Burnier, C. Quan, M. Stanley, J. Tom, J.A. Wells, A designed peptide ligase for total synthesis of ribonuclease A with unnatural catalytic residues, Science 1994, 266, 243.
47. P.E. Dawson, T.W. Muir, I. Clark-Lewis, S.B.H. Kent, Synthesis of proteins by native chemical ligation, Science (Washington, D.C.) 1994, 266, 776.
48. T. Wieland, E. Bokelmann, L. Bauer, H.U. Lang, H. Lau, Über Peptidsynthesen. 8. Mitteilung. Bildung von S-haltigen Peptiden durch intramolekulare Wanderung von Aminoacylresten, Liebigs Ann. Chem. 1953, 583, 129.
49. I.H. Um, G.R. Kim, D.S. Kwon, The effects of solvation and polarizability on the reaction of S-p-nitrophenyl thiobenzoate with various anionic nucleophiles, Bull. Korean Chem. Soc. 1994, 15, 585.
50. K.A. Connors, M.L. Bender, Kinetics of alkaline hydrolysis and n-butylaminolysis of ethyl p-nitrobenzoate and ethyl p-nitrothiolbenzoate, J. Org. Chem. 1961, 26, 2498.
51. W. Yang, D.G. Drueckhammer, Understanding the relative acyl-transfer reactivity of oxoesters and thioesters: computational analysis of transition state delocalization effects, J. Am. Chem. Soc. 2001, 123, 11004.
52. P.E. Dawson, S.B. Kent, Synthesis of native proteins by chemical ligation, Annu. Rev. Biochem. 2000, 69, 923.
53. T.W. Muir, P.E. Dawson, S.B.H. Kent, Protein synthesis by chemical ligation of unprotected peptides in aqueous solution, Methods Enzymol. 1997, 289, 266.
54. A. Dirksen, E.W. Meijer, W. Adriaens, T.M. Hackeng, Strategy for the synthesis of multivalent peptide-based nonsymmetric dendrimers by native chemical ligation, Chem. Commun. 2006, 15, 1667.
55. J.P. Tam, Q. Yu, Z. Miao, Orthogonal ligation strategies for peptide and protein, Biopolymers 2000, 51, 311.
56. J.P. Tam, Q. Yu, Methionine ligation strategy in the biomimetic synthesis of parathyroid hormones, Biopolymers 1998, 46, 319.
57. R. Quaderer, A. Sewing, D. Hilvert, Selenocysteine-mediated native chemical ligation, Helv. Chim. Acta 2001, 84, 1197.
58. W.A. van der Donk, M.D. Gieselman, Synthesis of selenocysteine-containing peptides by native chemical ligation, Abstr. Pap. Am. Chem. Soc. 2001, 222, U45.
59. S.M. Berry, M.D. Gieselman, M.J. Nilges, W.A. van der Donk, Y. Lu, An engineered azurin variant containing a selenocysteine copper ligand, J. Am. Chem. Soc. 2002, 124, 2084.
60. R.J. Hondal, B.L. Nilsson, R.T. Raines, Selenocysteine in native chemical ligation and expressed protein ligation, J. Am. Chem. Soc. 2001, 123, 5140.
61. J.P. Tam, Y.A. Lu, L. Chuan-Fa, J. Shao, Peptide synthesis using unprotected peptides through orthogonal coupling methods, Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 12485.
62. L.Z. Yan, P.E. Dawson, Synthesis of peptides and proteins without cysteine residues by native chemical ligation combined with desulfurization, J. Am. Chem. Soc. 2001, 123, 526.
63. L.E. Canne, S.J. Bark, S.B.H. Kent, Extending the applicability of native chemical ligation, J. Am. Chem. Soc. 1996, 118, 5891.
64. J. Offer, P.E. Dawson, Nα-2-Mercaptobenzylamine-assisted chemical ligation, Org. Lett. 2000, 2, 23.
65. J. Offer, C.N. Boddy, P.E. Dawson, Extending synthetic access to proteins with a removable acyl transfer auxiliary, J. Am. Chem. Soc. 2002, 124, 4642.
66. T. Kawakami, K. Akaji, S. Aimoto, Peptide bond formation mediated by 4,5-dimethoxy-2-mercaptobenzylamine after periodate oxidation of the N-terminal serine residue, Org. Lett. 2001, 3, 1403.
67. C. Marinzi, J. Offer, R. Longhi, P.E. Dawson, An o-nitrobenzyl scaffold for peptide ligation: synthesis and applications, Bioorg. Med. Chem. 2004, 12, 2749.
68. P. Botti, M. Villain, S. Manganiello, H. Gaertner, Chemical synthesis of proteins through native and extended chemical ligation, Biopolymers 2003, 71, 283.
69. P. Botti, M.R. Carrasco, S.B.H. Kent, Native chemical ligation using removable N-alpha-(1-phenyl-2-mercaptoethyl) auxiliaries, Tetrahedron Lett. 2001, 42, 1831.
70. T.M. Hackeng, J.A. Fernandez, P.E. Dawson, S.B. Kent, J.H. Griffin, Chemical synthesis and spontaneous folding of a multidomain protein: anticoagulant microprotein S, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 14074.
71. G.S. Beligere, P.E. Dawson, Synthesis of a three zinc finger protein, Zif268, by native chemical ligation, Biopolymers 2000, 52, 363.
72. D. Bang, S.B. Kent, A one-pot total synthesis of crambin, Angew. Chem., Int. Ed. Engl. 2004, 43, 2534.
73. T.M. Hackeng, J.H. Griffin, P.E. Dawson, Protein synthesis by native chemical ligation: expanded scope by using straightforward methodology, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 10068.
74. M. Villain, H. Gaertner, P. Botti, Native chemical ligation with aspartic and glutamic acids as C-terminal residues: scope and limitations, Eur. J. Org. Chem. 2003, 17, 3267.
75. G.S. Beligere, P.E. Dawson, Conformationally assisted protein ligation using C-terminal thioester peptides, J. Am. Chem. Soc. 1999, 121, 6332.
76. C.L. Hunter, G.G. Kochendoerfer, Native chemical ligation of hydrophobic [corrected] peptides in lipid bilayer systems, Bioconjugate Chem. 2004, 15, 437.
77. M. Mutter, P. Dumy, P. Garrouste, C. Lehmann, M. Mathieu, C. Peggion, S. Peluso, A. Razaname, G. Tuchscherer, Template assembled synthetic proteins (TASP) as functional mimetics of proteins, Angew. Chem., Int. Ed. Engl. 1996, 35, 1482.
78. J.P. Tam, Recent advances in multiple antigen peptides, J. Immunol. Methods 1996, 196, 17.
79. P.E. Dawson, S.B.H. Kent, Convenient total synthesis of a 4-helix template-assembled synthetic protein (TASP) molecule by chemoselective ligation, J. Am. Chem. Soc. 1993, 115, 7263.
80. J.P. Tam, Y.A. Lu, Synthesis of large cyclic cystine-knot peptide by orthogonal coupling strategy using unprotected peptide precursor, Tetrahedron Lett. 1997, 38, 5599.
81. J.A. Camarero, T.W. Muir, Biosynthesis of a head-to-tail cyclized protein with improved biological activity, J. Am. Chem. Soc. 1999, 121, 5597.
82. N.L. Daly, S. Love, P.F. Alewood, D.J. Craik, Chemical synthesis and folding pathways of large cyclic polypeptide: studies of the cystine knot polypeptide kalata B1, Biochemistry 1999, 38, 10606.
83. L.Z. Yan, P.E. Dawson, Design and synthesis of a protein catenane, Angew. Chem., Int. Ed. Engl. 2001, 40, 3625.
84. J.W. Blankenship, P.E. Dawson, Thermodynamics of a designed protein catenane, J. Mol. Biol. 2003, 327, 537.
85. J.D. Warren, J.S. Miller, S.J. Keding, S.J. Danishefsky, Toward fully synthetic glycoproteins by ultimately convergent routes: a solution to a long-standing problem, J. Am. Chem. Soc. 2004, 126, 6576.
86. R.S. Goody, T. Durek, H. Waldmann, L. Brunsveld, K. Alexandrov, in GTPases Regulating Membrane Targeting and Fusion, Methods Enzymol. 2005, 403, 29.
87. Y. Kajihara, N. Yamamoto, T. Miyazaki, H. Sato, Synthesis of diverse asparagine linked oligosaccharides and synthesis of sialylglycopeptide on solid phase, Curr. Med. Chem. 2005, 12, 527.
88. L.E. Canne, P. Botti, R.J. Simon, Y.J. Chen, E.A. Dennis, S.B.H. Kent, Chemical protein synthesis by solid phase ligation, J. Am. Chem. Soc. 1999, 121, 8720.
89. A. Brik, E. Keinan, P.E. Dawson, Protein synthesis by solid-phase chemical ligation using a safety catch linker, J. Org. Chem. 2000, 65, 3829.
90. D. Bang, S.B.H. Kent, A one-pot total synthesis of crambin, Angew. Chem., Int. Ed. Engl. 2004, 43, 2534.
91. T.W. Muir, Development and application of expressed protein ligation, Synlett 2001, 733.
92. D. Bang, S.B. Kent, His6 tag-assisted chemical protein synthesis, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 5014.
93. B.L. Nilsson, L.L. Kiessling, R.T. Raines, High-yielding Staudinger ligation of a phosphinothioester and azide to form a peptide, Org. Lett. 2001, 3, 9.
Chemical Biology. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
10.3 New Methods for Protein Bioconjugation
Matthew B. Francis
Outlook
This chapter surveys new chemical methods for the attachment of synthetic molecules to proteins. Strategies targeting both native and unnatural functional groups are discussed, including an evaluation of the selectivity that each technique can achieve. A particular emphasis has been placed on the unique mechanistic attributes that these reactions possess and the practical circumstances under which they can be used.
10.3.1 Introduction
The field of bioconjugation occupies a central role in chemical biology. At its simplest, this technique involves the attachment of new synthetic components to biomolecules of interest, with the goal of altering their chemical function or biological properties. The resulting hybrid structures have served as powerful tools for a variety of applications, including the observation of protein trafficking [1, 2], the elucidation of electron transfer pathways [3], the improvement of pharmacokinetic properties [4, 5], the synthesis of artificial glycoproteins [6], and the construction of nanoscale materials [7, 8]. Figure 10.3-1 summarizes some of the molecules and materials that are commonly used to achieve these goals. Regardless of the application, the preparation of each bioconjugate critically relies on at least one chemical reaction that forms a well-defined covalent link between the biomolecule and the synthetic group, creating a need for organic transformations that can modify biomolecules with high yield and specificity. The goals of this chapter are to survey the new chemical tools that have emerged to meet this demand and to provide a perspective on the unique reactivity attributes that have led to their success.
Synthetic organic chemistry has provided countless powerful and elegant strategies for the construction of complex natural products. Generally, the reactions used for this purpose arise from the systematic optimization of reaction parameters, such as solvent, temperature, concentration, and protecting groups, until the desired reactivity and selectivity are achieved. In sharp contrast, reactions for biomolecule modification cannot be developed with this flexibility because they must be carried out under a narrow set of conditions to maintain the properly folded structure of the protein substrates. Ideally, they should proceed in aqueous solution within a pH range of 6-8,
Chemical Biology. From Small Molecules to System Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
Fig. 10.3-1 A survey of molecules and materials that are commonly attached to proteins through bioconjugation reactions.
at temperatures ranging from 4 to 37 °C, and in the absence of any protective groups. In most cases, they also require the complete removal of excess reagents before the proteins are returned to the biological setting. Perhaps the most significant challenge to meet, however, is the low concentration of most biomolecules in solution (typically well below 100 μM), requiring reaction rate constants that are effectively 1000-100 000 times greater than those needed for traditional synthetic operations. Thus, from the perspective of an organic chemist, the field of biomolecule modification provides a fascinating context for the development of chemical transformations that push the limits of reactivity, chemoselectivity, and functional group tolerance. Conceptually, the new bioconjugation reactions described herein have been divided into two types: those that introduce new functionality by modifying the natural amino acid side chains, and those that target reactive groups not occurring in natural biomolecules. Historically, bioconjugation techniques targeting native functionality have been used more widely, as the introduction of abiotic functional groups into proteins has been difficult to achieve. However, with the advent of new technologies for the biosynthetic incorporation of unnatural amino acids, sugars, and lipids into proteins, exquisitely selective reactions targeting chemically distinct functional groups
have become possible. These techniques are not used for the majority of bioconjugation reactions at the time of this writing, but they are certain to provide countless new strategies as these methods become more available and general. Although these techniques are described in more detail in other chapters of this book, some examples of their use in selective bioconjugation will be presented whenever possible.
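The concentration argument made in this introduction can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: it treats the coupling as a second-order reaction run under pseudo-first-order conditions, and the 1 h target half-life and the 0.1 M "synthetic" concentration are hypothetical reference values that do not come from the chapter.

```python
import math

def required_rate_constant(half_life_s: float, reagent_conc_M: float) -> float:
    """Second-order rate constant (M^-1 s^-1) that gives the requested
    pseudo-first-order half-life when one partner is held at reagent_conc_M."""
    return math.log(2) / (half_life_s * reagent_conc_M)

# Hypothetical reference values (assumptions, not taken from the chapter)
TARGET_HALF_LIFE_S = 3600.0   # aim for a 1 h half-life
SYNTHESIS_CONC_M = 0.1        # a typical preparative-scale concentration

k_synthesis = required_rate_constant(TARGET_HALF_LIFE_S, SYNTHESIS_CONC_M)

# Compare with concentrations typical of protein solutions
for conc_M in (100e-6, 1e-6):
    k_bio = required_rate_constant(TARGET_HALF_LIFE_S, conc_M)
    print(f"{conc_M * 1e6:6.1f} uM: k = {k_bio:.3g} M^-1 s^-1, "
          f"{k_bio / k_synthesis:,.0f} times the 0.1 M case")
```

With these assumed numbers, moving from 0.1 M to 100 μM or 1 μM raises the required rate constant by factors of 1000 and 100 000, which is the range quoted in the text.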
10.3.2 History/Development
By far, the most common bioconjugation reactions target nucleophilic amino acid side chains, including lysine, cysteine, and aspartic/glutamic acid residues that occur in areas of the protein that are not required for proper function [9]. Of these, the reaction of NHS esters 1, isocyanates 2, and isothiocyanates 3 with the ε-amino groups of lysine residues (Fig. 10.3-2(a)) is perhaps the most widely used strategy, as most proteins possess multiple copies of this residue (often 20 or more) on their surface. These reactions rely on the ability of these reagents to acylate amino groups much more rapidly than they are hydrolyzed by the aqueous solvent. Because of the reliability of this reaction for simple protein modification, dozens of active acylating agents are now commercially available. As an alternative, lysine residues can also be modified through reductive alkylation. This reaction proceeds through the condensation of aldehydes with the amino groups, forming transient imines that are reduced by water-compatible hydride sources, such as NaBH3CN, NaBH4, or transition metal hydrides (see below). An advantage of this technique over lysine acylation is that it maintains the basicity of the amino group, thus preserving the overall charge state of the protein target. The carboxylate residues of proteins can also serve as sites for functionalization. Water-soluble carbodiimides, such as N-ethyl-3-N',N'-dimethylaminopropyl carbodiimide (EDC, 4), form active esters with aspartic and glutamic acid residues that react with exogenous amines to form amide bonds, Fig. 10.3-2(b). It should be noted that this reaction often generates side products arising from the rearrangement of the O-acylisouronium intermediate to form N-acyl urea 5, although nucleophilic catalysts (such as HOBT (hydroxybenzotriazole), 6) have been shown to suppress this pathway [10].
In instances where lysine amino groups are located near the activated carboxylates, this strategy can serve as a particularly useful method for protein cross-linking [11]. Unfortunately, the high prevalence of lysine and carboxylate-containing residues on protein surfaces places severe limitations on the ability to control the precise locations and the number of times a particular biomolecule is modified (for a notable exception, see Ref. 12). The need for this selectivity depends on the application at hand: while many experiments are tolerant of unevenly labeled samples, studies designed to probe enzyme function or to measure
[Fig. 10.3-2 graphic. Panels: (a) Lysine residues; (b) Aspartic and glutamic acid residues; (c) Cysteine residues. Labeled structures: 2, X = O (isocyanates); 3, X = S (isothiocyanates); 5 (formed in varying amounts); 6, HOBT; 8, iodoacetamides.]
Fig. 10.3-2 Common strategies for protein bioconjugation, targeting lysine, cysteine, aspartic acid, and glutamic acid residues. In most situations, only cysteine modification reactions are site selective.
distances with fluorescence resonance energy transfer (FRET) [13] require exquisite labeling specificity. To a limited extent, differences in pKa values can be used to distinguish between multiple copies of a single residue, but this does not provide a general method for achieving site selectivity. At present, virtually all applications that require the site-specific modification of proteins target cysteine residues. The low pKa of the sulfhydryl group (approximately 8), coupled with the potent nucleophilicity of the thiolate anion, provides a particularly reactive functional group for alkylation reactions. Cysteine is the rarest of the genetically encoded amino acids [14], and typically does not occur in the reduced form as a surface residue; as a result, it is frequently possible to introduce a uniquely reactive cysteine group using site-directed mutagenesis. Although this strategy can sometimes be accompanied by unwanted disulfide
bond formation or scrambling, the reliability of cysteine modification reagents renders this the current method of choice for applications that require functionalization in a precise location. Reagents for the modification of cysteine fall into two general classes. The first involves a series of alkylation reagents, including maleimides 7, acrylamides, iodoacetamides 8, and vinyl sulfones, designed to modify cysteines through the formation of a sulfur-carbon bond. This method is usually quite selective for thiolate anions, and in cases where lysine cross-reactivity is problematic, the selectivity can sometimes be improved by lowering the pH of the reaction medium. Similar to lysine modification strategies, a range of reagents is commercially available for the alkylation of cysteine residues. The second class of cysteine modification reagents includes disulfide formation reagents. Free cysteine residues participate in rapidly equilibrating exchange reactions with symmetric disulfides, such as 9, with complete modification occurring through mass action [15]. For more precious reagents, asymmetric disulfides can be generated with 4- and 2-thiopyridines [16]. These species react with cysteine residues through the selective release of the stabilized thiopyridone group. Disulfide formation reactions are inherently chemoselective, and offer the unique feature of reversibility. This property can be used to release chemical groups on entrance of the protein into reducing environments, a useful feature for drug delivery applications [17]. Despite the utility of cysteine modification, there remains a growing need for reactions that can target other functional groups on proteins. These techniques are necessary in cases where it is inconvenient or impossible to introduce a unique cysteine residue, or when complementary strategies are required to attach two different functional groups to a single protein (e.g., for FRET and optical tweezer studies).
Additionally, the targeting of a cysteine residue alone is not sufficient to select a single protein of interest in a living cell or crude lysate. To address these needs, new chemical strategies have become available to expand the set of residues that can be modified and to improve the selectivity with which they can be targeted. The remainder of this chapter focuses on the development, application, and future directions of this active area of research.
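The observation in this section that lowering the pH can improve cysteine-over-lysine selectivity can be rationalized, at least qualitatively, with the Henderson-Hasselbalch relationship. The following sketch is a deliberately simplified model: it compares only the fraction of each side chain present in its deprotonated, nucleophilic form, and the pKa values used (8.3 for the cysteine thiol, 10.5 for the lysine ammonium group) are assumed textbook-style values for free side chains rather than numbers taken from this chapter. Intrinsic nucleophilicity and the local protein environment are ignored.

```python
def fraction_deprotonated(pH: float, pKa: float) -> float:
    """Henderson-Hasselbalch fraction of an acid present as its conjugate base."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))

# Assumed typical side-chain pKa values (illustrative only)
ASSUMED_PKA = {"cysteine thiol": 8.3, "lysine amine": 10.5}

def thiolate_to_amine_ratio(pH: float) -> float:
    """Ratio of reactive thiolate fraction to reactive free-amine fraction."""
    thiolate = fraction_deprotonated(pH, ASSUMED_PKA["cysteine thiol"])
    amine = fraction_deprotonated(pH, ASSUMED_PKA["lysine amine"])
    return thiolate / amine

for pH in (6.0, 7.0, 8.0):
    print(f"pH {pH}: thiolate/amine availability ratio = "
          f"{thiolate_to_amine_ratio(pH):.0f}")
```

Under these assumptions the ratio grows as the pH drops, although the absolute amount of thiolate also falls, so in practice the gain in selectivity is traded against reaction rate.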
10.3.3 New Bioconjugation Methods Targeting the Natural Amino Acids
10.3.3.1 New Chemical Tools for the Modification of Tyrosine Residues
Tyrosine residues are underutilized targets for bioconjugate preparation. As it is displayed with intermediate frequency on protein surfaces, tyrosine can often be modified with greater selectivity than other residues. In contrast to charged amino acids, tyrosine residues are often partially “buried” in the surface of the proteins owing to the amphipathic nature of the phenolic group, Fig. 10.3-3(a-d). This close association with the topography of protein
Fig. 10.3-3 Tyrosine residues as targets for bioconjugation. (a) In contrast to charged amino acid side chains, tyrosine residues (yellow) are more closely associated with the protein surface. The reactive 3- and 5-positions of the phenolic ring (indicated by the white arrows) can be (b) fully exposed, (c) partially buried, or (d) fully buried. The protein shown is α-chymotrypsinogen A. (e) Modification of tyrosine residues through electrophilic aromatic substitution reactions.
surfaces results in varying levels of accessibility for tyrosine residues, and thus significant differences in their reactive properties. In cases where no surface accessible tyrosines are present, they can be introduced using genetic methods, with the added advantage that their incorporation produces minimal changes in the charge state and redox sensitivity of the expressed proteins. As an additional consideration, the tyrosine reactivity is largely complementary to that of cysteine, lysine, and carboxylate-containing residues. When used in conjunction with other methods, this chemical orthogonality is extremely useful for the preparation of proteins that are labeled in multiple sites. Electrophilic aromatic substitution is the most common method for the modification of tyrosine residues, typically involving iodination [18, 19], nitration [20], or azo bond formation [21-23], Fig. 10.3-3(e). Coupling reactions with diazonium salts provide the most general method for the introduction of new functional groups, as virtually any substituent can be attached to the aniline precursor. Through quantitative reactivity studies, it has been determined that diazonium salts prepared from 4-nitroaniline derivatives (such as 10a) are particularly effective, typically reaching very high levels of conversion in under 30 min using less than five equivalents of reagent [10, 24]. Diazonium salts bearing nitrile 10b and acyl 10c substituents in the 4-position provide efficient coupling in some instances, but more electron-rich analogs are generally low yielding. A general route to appropriately functionalized diazonium salts is provided using 4-nitro-3-anthranilic acid (11), Fig. 10.3-4(a).
Fig. 10.3-4 Highly efficient modification of tyrosine residues using electron-deficient diazonium salts. (a) General preparation method for nitro-substituted diazonium salts. (b) There are 180 copies of tyrosine 85 (green) displayed on the interior surface of bacteriophage MS2. (c) Virtually all these sites can be modified using diazonium salt 10a, as evidenced by (d) MALDI-TOF MS and (e) the appearance of an azo absorption band in the visible spectrum. (f) Similarly, 2100 copies of tyrosine 139 (yellow) line the exterior surface of the tobacco mosaic virus (TMV). (g) These sites can be modified using a two-step diazonium-coupling/oxime formation strategy. In both cases, the reactions are completely selective for the indicated tyrosine residues.
An advantage of diazonium-coupling strategies is the high level of conversion that can be reached. This is particularly useful for the functionalization of protein assemblies designed to serve as scaffolds for material applications, as their surfaces possess hundreds or thousands of individual sites for potential functionalization. As an example, diazonium-coupling reactions have been used to modify the tyrosine residues of two viral capsids, resulting in supramolecular assemblies that are homogeneously functionalized on the interior or exterior surfaces. In the first example, the targeting of tyrosine 85 of the protein capsid of bacteriophage MS2 provided 180 attachment sites on the interior surface of the spherical protein shell, Fig. 10.3-4(b) [24]. After exposure to two equivalents of nitrodiazonium salt 10a, analysis by MALDI-TOF MS and UV-vis spectroscopy indicated that >90% of the sites had been modified (Fig. 10.3-4(c-e)). Remarkably, no capsid disassembly was observed in these
studies. Through further elaboration of these sites, carrier materials are being prepared for drug delivery applications and as targeted diagnostic agents. As a second example, tyrosine 139 of the tobacco mosaic virus (TMV) capsid was modified using ketone-substituted diazonium salt 10c, resulting in the installation of 2100 sites on the exterior surface, which can be further labeled through oxime formation [10]. Once again, virtually complete conversion was obtained, and the capsid remained assembled after the modification reaction. As a result, tubelike materials with tailorable surface properties have become available for nanoscience applications. The above studies emphasize the ability of diazonium-coupling reactions to modify proteins with extremely high efficiency, but one of the limitations of this method is the lack of selectivity that can be obtained when there are multiple tyrosines on the surface of a single protein. This has not been problematic for the viral capsids shown above, as only one tyrosine is accessible on each monomer, but many applications demand higher levels of selectivity than allowed by these coupling reactions. To address this need, and to increase the substrate scope for bioconjugation reactions in general, a versatile Mannich-type reaction has been developed for tyrosine modification, Fig. 10.3-5 [25]. In this reaction, aldehydes and anilines are mixed to form
[Fig. 10.3-5 graphic. (a) Tyrosine residues treated with aldehyde (25 mM) and aniline (25 mM) in phosphate buffer for 18 h, via imine 12. (b) Reactive anilines (with formaldehyde); unreactive anilines and aliphatic amines (<5% conversion).]
Fig. 10.3-5 Tyrosine modification using a three-component Mannich-type reaction. (a) Aldehydes and anilines condense to form imines in situ, which react with tyrosine residues through an electrophilic aromatic substitution reaction. No reaction occurs when proteins are treated alone with either component. (b) The reaction conversion is listed for a number of anilines and aliphatic amines using α-chymotrypsinogen A as the substrate and formaldehyde as the aldehyde component.
imines 12, which subsequently react with phenolic side chains through an electrophilic aromatic substitution reaction [26]. Anilines bearing electron-donating substituents have proven to be the most effective components in the reaction, affording over 70% overall conversion in some cases. To date, no aliphatic amines have been observed to participate in the reaction, a useful feature, as cross-linking reactions with lysine residues are avoided. Formaldehyde has yielded the highest amount of reactivity, although aldehydes such as pyruvaldehyde, glyoxylic acid, and furan-carboxylic acid have proven effective in some instances. Enolizable aldehydes are generally ineffective in the reaction, presumably due to competing aldol self-condensation pathways. Some particularly attractive features of this reaction include its mild conditions (pH 6.5, aqueous buffer, 22-37 °C), very high selectivity for tyrosine residues, and broad substrate tolerance with respect to the aniline component. It should be noted that formaldehyde cross-linking techniques require high concentrations of the aldehyde (up to 37%) and/or elevated temperatures [27]. With the low concentrations used in these reactions, no modification of the proteins has been observed in the absence of the aniline component. In many labeling applications, anilines bearing additional aliphatic amino groups (such as 13) are particularly useful building blocks, as the aliphatic amino group of these compounds can be coupled to NHS esters before using them in the Mannich coupling reaction. This effectively converts the large number of commercially available lysine labeling reagents into more selective tyrosine modification reagents using a simple one-pot procedure, Fig. 10.3-6. This strategy has been applied to the labeling of two antibody binders, protein A and protein G', with a number of useful functional groups for immunoassays [28].
As the Mannich reaction does not target cysteine or lysine residues, both thiols and aliphatic amines can be present in the bioconjugation substrates. This allows unprotected peptides to be coupled to tyrosine residues using a tandem Mannich-native chemical ligation (NCL) [29] strategy. To do this, N-terminal cysteine mimic 14 has been coupled to tyrosine residues using the Mannich reaction, Fig. 10.3-7(a). This functional group couples to peptide thioesters (e.g., 15), ultimately resulting in the synthesis of branched polypeptide backbone architectures. By moving the location of the tyrosine residue through site-directed mutagenesis, the branch point can be repositioned on the protein surface, Fig. 10.3-7(b). The use of this technique allows the growing set of peptide building blocks, including lanthanide binding peptides [30] and affinity tags, to be appended to proteins in a flexible manner.
10.3.3.2 Protein Modification Using Transition Metal Catalyzed Reactions
Transition metal-mediated reactions provide an exceptionally powerful set of tools for site-selective protein modification. These strategies have had a striking impact on organic synthesis over the last three decades due to the ability of transition metals to activate otherwise unreactive functional groups with
Fig. 10.3-6 Tyrosine modification using commercially available lysine-reactive probes. (a) The aliphatic amino group reacts chemoselectively with NHS esters, leaving the aniline amino group free to participate in the Mannich reaction. On addition of formaldehyde and a protein target, tyrosine residues are modified. (b) Modification of chymotrypsinogen A with several chromophores using a two-step, one-pot procedure. Control reactions carried out in the absence of formaldehyde indicate that no lysine modification occurred owing to remaining NHS esters. (c) Structures of the chromophores used in (b).
[Fig. 10.3-7 graphic. Labels: 100 μM lysozyme; ligation center.]
Fig. 10.3-7 Native chemical ligations using tyrosine residues. (a) Reactive N-terminal cysteine mimics can be installed through tyrosine modification using the Mannich reaction. After disulfide reduction with DTT (dithiothreitol), these groups react with C-terminal thioesters (e.g., 15) obtained using solid-phase synthesis techniques. (b) By changing the location of the tyrosine residue, the branch point of the resulting structure can be moved.
exceptional selectivity. Many of these reactions have been used successfully in aqueous solution and possess virtually complete functional group tolerance. These features suggest that transition metal catalyzed reactions could similarly expand the synthetic repertoire for bioconjugation by targeting previously unmodifiable protein functional groups. It is also possible to tune the reactivity of transition metals through adjustments in the ligand sphere, and the complex stereochemical environments provided by asymmetric ligands could provide a way to distinguish between several otherwise identical amino acid residues. To demonstrate the feasibility of this approach, pioneering studies by several groups have shown that aryl halides introduced into amino acids and peptides can participate in cross-coupling reactions using palladium catalysts in aqueous solution [31-33]. As proteins possess a number of nucleophilic groups, it is likely that electrophilic transition metal complexes will prove to be the most useful. As an example, a new palladium based method has been developed for the alkylation of tyrosine residues [34]. In this reaction, allylic carbonates, esters, and carbamates are activated by palladium(0) complexes in aqueous solution, resulting in the formation of electrophilic π-allyl complexes (such as 16), Fig. 10.3-8(a). These species react at pH 8-10 with the phenolate anions of tyrosine residues, resulting in the formation of aryl ether 17 and regeneration of the Pd(0) catalyst. The reaction requires no organic cosolvent, is catalytic in palladium, and requires P(m-C6H4SO3-)3 as a water-soluble phosphine ligand. In contrast to alkyl or allylic halides, the inert character of the allyloxycarbonyl compounds used in this reaction ensures that nonspecific
[Fig. 10.3-8 graphic. (a) Tyrosine residues, 40 μM Pd(OAc)2, 0.5 mM P(C6H4SO3-)3, pH 8.6; π-allyl complex 16; aryl ether 17. (b) 18: water-soluble farnesyl derivative; chymotrypsinogen A (200 μM), 0.44 mM Pd(OAc)2, 5.3 mM P(m-C6H4SO3-)3, pH 9, RT, 3 h.]
Fig. 10.3-8 Tyrosine modification using palladium π-allyl chemistry. (a) Allylic acetates (shown), carbonates, and carbamates can be activated by palladium(0) in aqueous solution to yield electrophilic π-allyl complexes. These species alkylate tyrosine residues with high selectivity. (b) Charged groups can be attached to hydrophobic chains to assist in solubilization. These carriers are lost on formation of the π-allyl complexes, and thus are not incorporated into the protein targets. This provides a useful method for the synthesis of membrane-associated proteins.
[ESI-MS (m/z) inset: 25667 (expected 25656), unmodified; 25875 (expected 25860), M+1 modification.]
background alkylation of the protein does not occur. Extensive reactivity studies and trypsin digests have confirmed that the reaction displays excellent selectivity for tyrosine residues. Activated π-allyl complexes that do not react with tyrosine residues undergo β-elimination under the basic conditions, to yield diene by-products. A particularly attractive feature of this method is the use of a “disposable” activating group that is cleaved prior to protein attachment. This allows otherwise prohibitively hydrophobic molecules to be solubilized in water by coupling them to charged carrier groups, such as taurine, Fig. 10.3-8(b). This group is lost on activation of carbamate 18 by the water-soluble palladium complex, which then transfers the hydrophobic group to the protein. This “solubility switching” strategy is used to prepare lipid membrane-associated proteins.
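The mass spectrometry values reported with Fig. 10.3-8 (25667 observed and 25656 expected for unmodified chymotrypsinogen A; 25875 observed and 25860 expected for the singly modified protein) can be checked against the mass of a transferred farnesyl group. The sketch below assumes that O-alkylation replaces the phenolic hydrogen with a C15H25 farnesyl chain, so that the net gain is C15H24, and it uses standard average atomic masses. Both the composition and the variable names are illustrative assumptions, not part of the original analysis.

```python
# Standard average atomic masses (Da)
AVERAGE_ATOMIC_MASS = {"C": 12.011, "H": 1.008}

def average_mass(composition: dict) -> float:
    """Average mass (Da) of a fragment given as {element: count}."""
    return sum(AVERAGE_ATOMIC_MASS[el] * n for el, n in composition.items())

# Assumption: tyrosine O-H becomes O-farnesyl (C15H25), a net gain of C15H24
FARNESYL_NET_GAIN = average_mass({"C": 15, "H": 24})

# Values transcribed from the ESI-MS data accompanying Fig. 10.3-8
observed = {"unmodified": 25667, "modified": 25875}
expected = {"unmodified": 25656, "modified": 25860}

observed_shift = observed["modified"] - observed["unmodified"]
expected_shift = expected["modified"] - expected["unmodified"]
n_groups = round(observed_shift / FARNESYL_NET_GAIN)

print(f"net gain per farnesyl group: {FARNESYL_NET_GAIN:.1f} Da")
print(f"expected shift: {expected_shift} Da, observed shift: {observed_shift} Da")
print(f"estimated number of groups transferred: {n_groups}")
```

Under this assumption the expected shift of 204 Da matches a single farnesyl group, and the observed shift of 208 Da is consistent with one modification.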
10.3.3.3 Modification of Tryptophan Residues Using Metallocarbenoids
Similar to cysteine residues, the low abundance of surface accessible tryptophans suggests that these residues could serve as highly selective bioconjugation handles when introduced using genetic methods. Furthermore, the importance of indole side chains as mediators of protein-protein interactions and electron transfer processes [35] creates a need for modification reactions that can target this residue. To achieve this, a highly selective transition metal-based method has been reported for the functionalization of these groups [36]. On exposure of vinyl diazo compound 19 [37] to Rh2(OAc)4 in aqueous solution, electrophilic metallocarbenoid intermediate 20 is produced, Fig. 10.3-9. Normally, this highly reactive species reacts with water to form alcohol 21; however, it has been found that 20 can react with the indole side chains of tryptophan residues with comparable rates, resulting in one of the first modification reactions for this residue. The reaction proceeds readily in aqueous solution with ethylene glycol (up to 20%) [38] added as a cosolvent to assist in the solubilization of the diazo compound. Typically, 10 mM diazo compound and 100 μM Rh2(OAc)4 are used, reaching up to 70% conversion with protein concentrations as low as 10 μM. On the basis of reactions carried out with small molecule analogs, mixtures of N-alkyl 22 and 2-alkyl 23 products are produced, presumably resulting from direct NH insertion or through cyclopropanation followed by ring opening, respectively. Although the addition of some cosolvents can lead to the modification of disulfides (see below), the reaction otherwise displays excellent tryptophan selectivity. Early studies identified hydroxylamine hydrochloride as an essential component for the success of the reaction.
When added to an unbuffered aqueous solution, this additive results in dramatically enhanced catalytic activity, presumably through the binding of the oxygen atom to the remaining vacant coordination site of the bimetallic metallocarbene complex (species 24 in Fig. 10.3-9(b)). However, the addition of this HCl salt lowers the pH of the solution to 3.5, effectively denaturing many protein targets. Elevated
[Fig. 10.3-9 graphic. (a) Vinyl diazo compound 19 (10 mM), 100 μM Rh2(OAc)4, 75 mM HONH2·HCl, H2O/ethylene glycol (80:20), RT, 7 h; tryptophan residues (10-100 μM); intermediate 20; products 21, 22, and 23. (b) 24: active carbene (low pH); 25: inactive carbene; additive 26.]
Fig. 10.3-9 Tryptophan modification using rhodium carbenoids. (a) These species can be formed in situ through the reaction of vinyldiazo compound 19 with catalytic amounts of Rh2(OAc)4. Intermediate 20 can react with tryptophan residues, forming a mixture of N- and 2-alkylated indoles, in addition to reacting with the aqueous solvent. Control experiments that were run in the absence of rhodium catalyst afford no modification products. (b) Proposed binding modes for hydroxylamine at low 24 and elevated 25 pH levels.
reaction pH level results in substantial losses in reactivity with this additive, possibly by liberating the nitrogen lone pair and switching the preferred binding mode to 25. A solution to this problem was found through the use of N-tert-butyl hydroxylamine (26), which discourages the deprotonated nitrogen from binding through steric interactions with the catalyst ligand sphere. Using this additive, reactions can be carried out at pH 6-7.
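The reported drop to pH 3.5 on adding hydroxylamine hydrochloride to unbuffered water is what a simple weak-acid calculation predicts. The sketch below is a rough estimate under stated assumptions: it takes the 75 mM loading shown in the conditions of Fig. 10.3-9, assumes a pKa of about 5.96 for the hydroxylammonium ion (an approximate literature value at 25 °C that is not given in this chapter), and ignores activity corrections and any buffering by the protein.

```python
import math

def weak_acid_pH(conc_M: float, pKa: float) -> float:
    """pH of a solution of a monoprotic weak acid, from the exact
    quadratic [H+]^2 + Ka*[H+] - Ka*C = 0 (water autoionization ignored)."""
    Ka = 10.0 ** (-pKa)
    h = (-Ka + math.sqrt(Ka * Ka + 4.0 * Ka * conc_M)) / 2.0
    return -math.log10(h)

# Assumed pKa of the hydroxylammonium ion (approximate literature value)
ASSUMED_PKA_HYDROXYLAMMONIUM = 5.96

pH_75mM = weak_acid_pH(0.075, ASSUMED_PKA_HYDROXYLAMMONIUM)
print(f"estimated pH of 75 mM hydroxylamine hydrochloride: {pH_75mM:.2f}")
```

With these inputs the estimate falls close to the pH of 3.5 stated in the text, which supports the explanation that the acidity arises from the HCl salt itself rather than from the rhodium chemistry.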
10.3.3.4 Modification of Disulfide Bonds Using Metallocarbenoids
The selectivity of transition metal-mediated reactions is often sensitive to changes in the specific reaction conditions. This behavior is also observed in the case of metallocarbenoid-based protein modification reactions. In the presence of >25% tert-butanol, metallocarbenoid intermediate 20 reacts with the nucleophilic lone pairs of disulfide groups to form ylides, such as 27a, Fig. 10.3-10(a) [39]. In some instances, this species undergoes a sigmatropic
[Fig. 10.3-10 graphic. (a) Diazo compound 19, carbenoid 20, ylide intermediates 27a-c (27b: X = Rh2(OAc)4; 27c: X = H), and sigmatropic rearrangement to 28. (b) Tocinoic acid 30 (1 mM); mixture of 4 isomeric products. (c) Chymotrypsinogen A (100 μM). Conditions legible in the artwork: 100 μM Rh2(OAc)4; 10 mM diazo compound; 100 mM HONH2·HCl; 50% H2O / 30% glycerol / 20% t-BuOH; RT, 7 h; RT, 1.5 h.]
Fig. 10.3-10 Disulfide modification using rhodium carbenoids. (a) Disulfide bonds react with metallocarbenes to form ylide-like intermediates 27a-c (where X = a negative charge, coordinated rhodium, or a proton). These can undergo a sigmatropic rearrangement to form 1,3-adducts, or alkylate nearby nucleophiles. These pathways are demonstrated for (b) a cyclic peptide hormone and (c) a protein. The disulfide in (c) is represented by the yellow spheres, and the N-terminus is indicated by the yellow arrow.
rearrangement to form 1,3-dithiane 28 [40], a stable species that incorporates the functionality originating from the diazo compound while maintaining the overall linkage provided by the disulfide bond. Similar to the tryptophan reaction described above, this reaction requires hydroxylamine hydrochloride as a reaction additive, and it affords high product yields with substrate concentrations as low as 100 μM. Specific reaction conditions are shown for tocinoic acid (30) in Fig. 10.3-10(b). This reaction has also been applied to the modification of a protein target, although a new reaction pathway has been observed in this case. Instead of the pericyclic rearrangement, the ylide intermediate formed with a disulfide of chymotrypsinogen A is attacked by the nearby N-terminus, likely after protonation (yielding species 27c) or recomplexation of the rhodium
catalyst 27b. This results in the transfer of the styryl acetic acid group to this neighboring site. As is the case with the 1,3-insertion pathway, this reaction preserves the disulfide linkage after protein modification. Although the conditions of this reaction are unlikely to maintain secondary and tertiary protein structures, it still provides the only protein modification method that is directed by disulfide groups.
10.3.3.5 Reductive Alkylation of Lysine Residues Using Transfer Hydrogenation
A transition metal catalyst has also been used to effect the reductive alkylation of amino groups on proteins [41]. This reaction uses [Cp*Ir(4,4'-dimethoxybipy)(H2O)]SO4 (31) as a mild transfer hydrogenation catalyst and formate ion as the stoichiometric hydride source, Fig. 10.3-11(a). Presumably, this reaction occurs via the reversible formation of imine 33 with free amino groups on the protein surface, followed by reduction by iridium hydride 32. For most proteins, multiple modifications are observed (Fig. 10.3-11(b)), although the overall level of conversion can be altered through variation of either the reaction temperature or the concentrations of the aldehyde and catalyst. In general, the reaction has shown excellent reliability for protein alkylation between pH 5 and 7.4. Compared to lysine acylation with NHS esters, reductive alkylation strategies offer several key advantages. First, the overall charge state of the protein remains unchanged after the modification takes place, thus minimizing changes in protein solubility and stability. This method also avoids competitive hydrolysis pathways that can be problematic in some activated esters. Similarly, the aldehyde feedstock materials that are used in this technique are frequently more convenient to prepare and store than the corresponding NHS esters. As an example of the latter case, a simple two-step oxidation/reductive alkylation protocol can be used to attach unfunctionalized poly(ethylene glycol) (PEG) to proteins. Conversion of commercially available PEG alcohol to corresponding aldehyde 34 is accomplished through oxidation with Dess-Martin periodinane (DMP) in CH2Cl2, Fig. 10.3-11(c). After isolation of polymer 34 by precipitation from ethyl ether, it can be coupled to proteins using the transfer hydrogenation reaction.
In addition to providing access to PEGylated proteins for biomedical applications, the simplicity of this technique allows the facile attachment of virtually any polymer bearing primary hydroxyl groups.
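The statement that multiple modifications are observed for most proteins can be made concrete using the ESI-MS peak positions that accompany Fig. 10.3-11(b). The sketch below assumes that the peak at 14309 corresponds to the unmodified protein and that the peaks labeled +1 to +4 lie at 14428, 14547, 14665, and 14781 (the last value is read from a partly garbled label in the artwork). It only estimates the average mass added per alkylation and assigns a modification count to each peak; the identity of the aldehyde is not inferred.

```python
# Peak positions (m/z after deconvolution) transcribed from Fig. 10.3-11(b);
# the first entry is assumed to be the unmodified protein
peaks = [14309, 14428, 14547, 14665, 14781]

# Mass added by each successive alkylation event
increments = [later - earlier for earlier, later in zip(peaks, peaks[1:])]
mean_increment = sum(increments) / len(increments)

def modification_count(mass: float) -> int:
    """Number of alkyl groups implied by a peak, using the mean increment."""
    return round((mass - peaks[0]) / mean_increment)

print("increments (Da):", increments)
print(f"mean increment: {mean_increment:.1f} Da")
for mass in peaks:
    print(f"{mass}: {modification_count(mass)} modification(s)")
```

The roughly constant spacing of about 118 Da is consistent with a ladder of one to four additions of the same group, as expected for a protein with several accessible amino groups.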
10.3.3.6 Site-selective Modification of the N-terminus
The N-termini of proteins offer several reactive options for the installation of a single new functional group. Because the pKa of N-terminal amino groups (6-8) [42] is lower than that of lysine side chains, this position can in principle be acylated selectively, although absolute specificity is seldom achieved in practice. More reliable strategies instead target the amino group in combination with β-functional groups that are absent in the case of competing lysine
[Figure 10.3-11: reaction schemes, ESI-MS spectrum, and SDS-PAGE gel not reproduced. Recoverable labels: (a) catalyst 31, iridium hydride 32, imine 33. (b) Conditions: 10 µM protein, 20 µM catalyst, 25 mM HCO2Na, 50 mM K2HPO4 buffer, pH 7.4, 22-37 °C, 2-18 h; ESI-MS (m/z) peaks at 14309, 14428 (+1), 14547 (+2), 14665 (+3), and 14781 (+4). (c) PEG alcohol (MW = 2000) is converted to aldehyde 34 with 1 equiv DMP in CH2Cl2 (1 h), followed by PEG precipitation; coupling conditions: 100 µM lysozyme, 20 µM catalyst, aldehyde 34 at 1 mM, 25 mM HCO2Na, 50 mM K2HPO4 buffer, pH 7.4, 37 °C, 15 h; a 37% conversion is noted in the scheme. Gel lanes vary the presence of catalyst, aldehyde, and PEG-OH.]
Fig. 10.3-11 Reductive alkylation of proteins using iridium-catalyzed transfer hydrogenation. (a) The iridium(III) catalyst shown reacts with formate ion to form a water-stable hydride. This species reduces imines formed in situ. (b) This reduction process proceeds readily on proteins, affording multiple alkylated products. (c) Commercially available PEG alcohols can be readily oxidized to aldehydes using the Dess-Martin periodinane (DMP). This product can then be conjugated to proteins using the transfer hydrogenation process, as observed by SDS-PAGE analysis. The arrows indicate the PEG conjugates. No reaction occurs in the absence of catalyst.
residues. This approach has been particularly successful in the context of NCL strategies with thioesters (Fig. 10.3-12(a)) [29], a technique that is discussed in depth elsewhere in this book (see also Fig. 10.3-14). In addition, N-terminal cysteines can be modified with aldehydes through thiazolidine formation (Fig. 10.3-12(b)) [43], although the amide linkage formed in NCL reactions is more resistant to hydrolysis. Similar linkages have been reported using the Pictet-Spengler reaction (Fig. 10.3-12(c)) [44], which proceeds via electrophilic aromatic substitution reactions between indoles and imines formed with the N-terminus. An extensive review of these techniques has recently appeared in Ref. 43. A critical consideration for N-terminal modification strategies is the ease with which the identity of the first amino acid can be established. Although all proteins begin with methionine as the first amino acid due to the commonality of the AUG start codon, this group is nearly always removed after translation in eukaryotes. The situation is more complicated in prokaryotes, however, as the methionyl aminopeptidases are sensitive to the size of the second amino
[Figure 10.3-12: reaction schemes not reproduced. Panel (b): thiazolidine formation (R = H, CH3).]
Fig. 10.3-12 Common strategies for modification of the N-terminus.
acid residue [45]. Virtually 100% of the proteins expressed in Escherichia coli lack the N-terminal methionine if the second residue is small (such as glycine, alanine, serine, or cysteine). However, in cases where leucine, tryptophan, or tyrosine is present, the methionine is usually retained (note that the initial N-formyl group is always cleaved posttranslationally). As a result, 40% of the proteins encoded in the E. coli genome retain a methionine at the N-terminus, thus requiring further processing of the protein before some modification reactions can take place. For the modification of proteins expressed in prokaryotes, strategies targeting N-terminal serine residues offer the advantage that the initial methionine is always removed when this residue is present. The resulting β-amino alcohol can then be oxidized in the presence of periodate to afford a glyoxamide group [46], which can serve as a handle for additional bioconjugation through oxime or hydrazone formation (Fig. 10.3-12(d)). This reaction is reported to occur under mild conditions (40 µM NaIO4, pH 7, 0 °C) and with high yield. As an alternative to these techniques, reactive functionality can also be introduced at the N-terminus using a biomimetic strategy [47, 48]. In the presence of pyridoxal phosphate (PLP, 35), imines are formed reversibly with lysine side chains and the N-terminus; in the latter case, the relatively low pKa of the α-proton enables a tautomerization reaction, affording imine 36, Fig. 10.3-13. This species hydrolyzes in the presence of water, resulting in an overall transamination sequence that generates a reactive pyruvamide or glyoxamide group 37 for further elaboration. In the case of proteins bearing N-terminal aspartic acid residues, this reaction is accompanied by
[Figure 10.3-13: reaction scheme not reproduced. Labeled species include glyoxamide/pyruvamide 37 and an alkoxyamine (H2N-OR') coupling partner.]
Fig. 10.3-13 A biomimetic strategy for N-terminal modification. After condensation with pyridoxal phosphate (PLP), a variety of N-terminal amino acids undergo oxidative transamination at pH 6.5 and at 22-37 °C. The resulting pyruvamides can be further derivatized through oxime formation.
a decarboxylation step. Because it can be used with many amino acids, this technique provides a general method for the site-selective modification of virtually any protein under mild reaction conditions.
10.3.3.7
Selective Modification of the C-terminus
In contrast to the relatively large number of reactions that target the N-terminus, only one generally effective method for C-terminal modification is currently available. This can be achieved through the use of intein-based methods to produce C-terminal thioesters, which can then be modified through NCL with functionalized cysteines 38, Fig. 10.3-14 [49]. A more thorough description of the scope of this convenient method appears elsewhere in this book.
10.3.3.8
Binding of Tetracysteine Motifs to Fluorescein Bis(arsenical) (FlAsH) Dyes
The labeling of a single biomolecule in a complex protein mixture presents a particularly difficult challenge, as no bioconjugation reaction targeting a
Fig. 10.3-14 Modification of the C-terminus using native chemical ligation.
[Figure 10.3-15: reaction scheme not reproduced. A protein bearing a Cys-Cys-Pro-Gly-Cys-Cys motif reacts with non-fluorescent dye 39, releasing ethanedithiol (HS-SH in the scheme) to give a fluorescent complex.]
Fig. 10.3-15 Sequence-specific protein labeling with FlAsH dyes. Tetracysteine motifs on expressed proteins replace the ethanedithiol groups on biarsenical dye 39, resulting in a substantial enhancement in fluorescence.
single natural amino acid can be expected to display the required selectivity. As a solution to this problem, a labeling technique based on the recognition of a specific sequence of amino acids has been reported. It was recognized that the ethanedithiol groups of fluorescein bis(arsenical) dye 39 (also known as FlAsH) can be displaced by tetracysteine motifs expressed on a protein of interest, Fig. 10.3-15 [2]. Conformational changes that occur on binding reduce the fluorescence-quenching effect of the arsenic atoms, resulting in a substantial (up to 60-fold) enhancement in the quantum yield of the chromophore. The unbound dye remains relatively nonfluorescent, thereby reducing the need for scrupulous removal of the excess reagent. Although many (Cys)4 sequences can be recognized, CCPGCC has been particularly effective. Since the initial publication, additional chromophores with varied optical characteristics have become available [50]. Although similar labeling selectivity can be achieved on the translational level using green fluorescent protein (GFP) fusion techniques [51], a particular strength of the FlAsH approach is the reliance on a small-molecule modification that is less likely to affect protein trafficking, binding, and catalytic function. A more detailed description covering the applications of this powerful technique in cellular imaging appears elsewhere in this book.
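When planning a FlAsH labeling experiment, it can be useful to check a protein sequence for candidate tetracysteine sites. The Python sketch below scans a one-letter sequence for the general pattern CCXXCC and flags the CCPGCC variant singled out above. Treating CCXXCC as a stand-in for the "many (Cys)4 sequences" mentioned in the text, as well as the function name and the example sequence, are assumptions made for illustration.

```python
import re

# A lookahead is used so that overlapping candidate sites are all reported.
TETRACYS = re.compile(r"(?=(CC[A-Z]{2}CC))")


def find_tetracysteine_motifs(sequence: str) -> list:
    """Return (1-based start, motif, is_preferred) for each CCXXCC site."""
    hits = []
    for match in TETRACYS.finditer(sequence.upper()):
        motif = match.group(1)
        hits.append((match.start() + 1, motif, motif == "CCPGCC"))
    return hits


if __name__ == "__main__":
    seq = "MAEAAAREACCPGCCARAGSCCKLCCQ"  # hypothetical tagged sequence
    for start, motif, preferred in find_tetracysteine_motifs(seq):
        print(start, motif, "preferred" if preferred else "other")
```

A sequence search of this kind only identifies candidate sites; whether a given motif binds the dye well in the folded protein has to be established experimentally.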
10.3.4 New Methods for the Biosynthetic Incorporation of Unnatural Functional Groups
A number of versatile methods have recently become available for the incorporation of unnatural functional groups into biomolecules, allowing the development of previously impossible bioconjugation reactions that target these sites. The advantage of such techniques lies in their selectivity, as abiotic groups can be targeted with reagents that show no reactivity with ordinary biomolecules. As such, these reactions are exceptionally useful for the labeling
of a single target in a crude cell lysate or on the surface of living cells. Although more detailed descriptions of these techniques appear elsewhere in this book, each is briefly summarized here because of the striking impact that these methods are destined to have on protein modification. Corresponding bioconjugation reactions designed to target unnatural functional groups are detailed in Section 10.3.5. Recently, efforts by two groups have made it possible to incorporate unnatural amino acids into proteins on the translational level. The first of these makes use of the “Amber” codon, which lacks a cognate tRNA and therefore normally halts protein biosynthesis. Synthetic tRNAs generated to recognize this codon are charged with the new amino acid of interest and added to in vitro protein expression systems [52, 53]. This effectively reprograms the ribosome to install a 21st amino acid wherever the Amber codon occurs. Although this method displays absolute site selectivity and is remarkably general with respect to the amino acids that can be introduced, the difficulty in preparing and purifying the synthetic tRNAs restricts the quantities of protein that can be obtained. More recently, this challenge has been addressed through the evolution of modified tRNA synthetases that attach the unnatural amino acid to the tRNA molecules directly [54-57]. This allows bacteria or yeast to generate their own charged tRNA molecules, thus expanding the approach to large-scale applications. In one instance, bacterial hosts capable of biosynthesizing even the new amino acid have also been developed [58]. As this method continues to improve with respect to practicality and accessibility, it is certain to provide many new avenues for selective protein modification. An alternative strategy takes advantage of the promiscuity with which some naturally occurring tRNA synthetases attach amino acids to tRNA carriers [59].
In this technique, auxotrophic hosts that cannot produce a targeted amino acid are used. When the amino acid is removed from the culture medium, protein biosynthesis comes to a halt. A replacement amino acid is then added, and the expression of a desired protein is induced. The new amino acid is recognized by the synthetase and is incorporated into the expressed protein at each site where the original amino acid would have appeared. This residue-specific method has been demonstrated for both methionine [60] and phenylalanine [61] analogs to date. The technique is rapid and can incorporate new functionality without using directed evolution or heterologous tRNA/synthetase expression, although the site selectivity is not absolute. The incorporation of new functional groups can also be accomplished using the metabolic machinery for posttranslational protein modifications. These methods rely on the ability of some modification enzymes to process and install analogs of their natural substrates containing reactive handles of interest. In an early demonstration of this technique, it was shown that derivatives of N-acetylmannosamine 40a bearing ketones (40b) [62] or azides (40c) [63] in the acyl moiety are tolerated by enzymatic pathways that produce sialic acid. By “feeding” these unnatural building blocks to cell cultures,
the new functional groups are incorporated into the secreted and cell-surface glycoproteins of mammalian cells, Fig. 10.3-16. This technique has also proven successful for N-acetylglucosamine derivatives [64]. The new sites can then
Fig. 10.3-16 Introduction of unnatural functional groups through posttranslational modification. (a) Ketones and azides can be introduced onto cell surfaces by “feeding” cells with unnatural sialic acid precursors, such as mannosamine derivatives 40b and c. These are incorporated into cell-surface glycans, which can be further elaborated using additional bioconjugation reactions. (b) Specific amino acid sequences can be modified using biotin ligase. Interestingly, “ketobiotin” is also recognized as a substrate for the enzyme, allowing a uniquely electrophilic handle to be introduced on a single lysine residue. In this example, fluorescent hydrazide 42 is condensed with this group to form a hydrazone. (c) C-terminal modification through protein farnesylation. Azide-containing farnesyl derivatives 43a and b are recognized by farnesyltransferases and added to “CaaX” sequences. (d) Biotinylated phosphopantetheinyl derivative 44 can be added to fusion proteins bearing peptide carrier proteins (PCPs) derived from nonribosomal peptide synthetases.
be targeted for chemoselective modification using secondary bioconjugation methods described below, enabling one to modulate the surface interactions of living cells [65, 66], achieve surface attachment [67], and identify specific glycoprotein subtypes at the proteome level [68]. As many posttranslational modification enzymes display exquisite specificity for a particular amino acid sequence, they are uniquely effective tools for labeling one particular protein present in a complex mixture. A recent report has capitalized on this feature by using biotin ligase to install biotin groups 41a on a single lysine residue embedded in a specific 15-amino acid sequence, Fig. 10.3-16(b) [69]. After amide bond formation, these proteins can be further modified using commercially available avidin derivatives, including those bearing fluorescent semiconducting nanocrystals [70]. As the recognition sequence appears with little or no frequency in the proteome of most cells, virtually absolute selectivity can be obtained. Interestingly, biotin ligase has been shown to possess some substrate tolerance, allowing even “ketobiotin” analog 41b to be added to the proteins [69]. This results in the installation of a uniquely electrophilic ketone (see below) on the protein of interest, which can then be labeled chemoselectively with fluorescent chromophore 42 through hydrazone formation. Although this method is currently limited to the labeling of cell surfaces because of the presence of competing ketone metabolites in the cytoplasm, it provides a powerful tool for chemospecific labeling using the natural set of amino acids. A method for the incorporation of artificial functional groups into lipoproteins has also been developed [71]. In this technique, isoprenoid biosynthesis is first halted through the addition of lovastatin to a cell culture.
After this, exogenous azidofarnesol derivatives 43a and b are added, leading to the attachment of these groups to “CaaX” boxes by farnesyl transferases (where a is any aliphatic amino acid and X is a C-terminal amino acid). The azide groups can then be biotinylated using the Staudinger ligation (see Section 10.3.5.2), allowing detection of the modified proteins using Western blot analysis. Importantly, proteins targeted for geranylgeranylation are not modified using the azidofarnesyl analogs. In conjunction with mass spectrometric sequencing techniques, this method can be used for the proteomic analysis of farnesylated proteins. Specific labeling can also be accomplished through the targeting of specific protein domains. In one report, an 80-amino acid peptide carrier protein (PCP) domain derived from a nonribosomal peptide synthetase was fused to a protein of interest [72]. After expression and lysis of the cells, biotinylated derivative 44 was installed on this domain by an added phosphopantetheinyl transferase, Fig. 10.3-16(d). After the labeling step, the protein was attached to avidin-coated glass for further screening applications. The use of an exogenous transferase for the labeling step is advantageous, as the native transferases present in the E. coli lysate do not recognize the protein domain or substrate 44.
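The two translational strategies summarized earlier in this section differ in where the new residue ends up: amber suppression places it only at UAG codons, whereas residue-specific incorporation in an auxotroph replaces every occurrence of one natural amino acid. A toy Python model may make the contrast concrete. The abbreviated codon table, the symbols Z and B for the unnatural residues, the function names, and the example mRNAs are all hypothetical and serve only as a sketch.

```python
from typing import Optional

# Toy model only: a deliberately tiny codon table, enough for the example mRNAs.
CODONS = {
    "AUG": "M", "GCU": "A", "AAA": "K", "GGU": "G", "UUU": "F",
    "UAG": "*", "UAA": "*",  # UAG = amber stop, UAA = ochre stop
}


def translate(mrna: str, amber_residue: Optional[str] = None) -> str:
    """Translate codon by codon; optionally decode UAG as `amber_residue`."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UAG" and amber_residue is not None:
            protein.append(amber_residue)  # suppressor tRNA reads the amber codon
            continue
        residue = CODONS[codon]
        if residue == "*":
            break  # no suppressor available: translation halts here
        protein.append(residue)
    return "".join(protein)


def residue_specific(protein: str, natural: str, analog: str) -> str:
    """Replace every occurrence of one natural residue, as in an auxotrophic host."""
    return protein.replace(natural, analog)


if __name__ == "__main__":
    amber_mrna = "AUGGCUUAGAAAAUGUUUUAA"
    print(translate(amber_mrna))        # truncated at the amber codon
    print(translate(amber_mrna, "Z"))   # one unnatural residue, one defined site
    wild_type = translate("AUGGCUGGUAAAAUGUUUUAA")
    print(residue_specific(wild_type, "M", "B"))  # every Met position replaced
```

The model ignores incorporation efficiency, so it overstates the residue-specific case: as noted above, site selectivity in that method is not absolute.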
10.3.5 New Bioconjugation Methods Targeting Unnatural Functional Groups
Once artificial functional groups are incorporated into biomolecules, new reactive strategies can be developed to target these sites chemoselectively. As virtually any reactive group can be chosen in principle, these reactions allow the full range of organic chemical transformations to be considered. Several functional group pairs that exhibit orthogonal chemical reactivity in a biological setting have already been identified.
10.3.5.1
Ketone Functionalization through Hydrazone and Oxime Formation
Although proteins possess numerous nucleophilic groups that can be used for covalent modification, the natural amino acids do not provide any electrophilic sites. Because of this, a useful strategy for the introduction of chemically orthogonal reactive groups is to add electrophiles that react selectively with exogenous nucleophiles. Perhaps the first functional group that was used for this purpose was the ketone, using the highly selective condensation of this group with hydrazine and alkoxyamine derivatives to form hydrazones and oximes, respectively [62, 73]. Both of these reactions proceed readily in aqueous solution, typically using 25-1000 µM concentrations of ketone-reactive reagents. The reactions occur with a maximum rate at pH 6.5, and the dehydration of the aminocarbinol intermediate has been determined to be the rate-limiting step [74]. In most cases, complete selectivity is observed, and high conversions can be obtained through the use of excess reagent. These condensation reactions can be found in many of the examples provided above (Figs. 10.3-4(g), 10.3-12(d), 10.3-13, and 10.3-16(b)). This is one of the most reliable strategies for protein modification under mild reaction conditions. Ketone groups are often introduced using the primary bioconjugation reactions described above, and are generated directly by biotin ligase [69], N-terminal serine oxidation with periodate [46], and N-terminal transamination with PLP [47]. In addition, this group has been introduced through the incorporation of 4-acetylphenylalanine using the nonsense codon suppression technique [75]. As an alternative, ketone functional groups can also be incorporated into glycoproteins using metabolic engineering [62], as described above.
10.3.5.2
Azide Modification Using the Staudinger Ligation
Although ketones show predictable and chemospecific reactivity on the exterior of cells, they are of limited use in labeling studies that must take place inside living cells or in crude cell extracts because of the presence of endogenous ketone metabolites. To provide labeling reactions that can function under these circumstances, several pairs of reagents that do not react with any
biological functional groups have been developed. Of these, the azide has proven particularly useful, as it has a high thermodynamic driving force for several reactions, and yet it is kinetically inert under physiological conditions. The first bioconjugation reaction to capitalize on these properties was the Staudinger ligation [63]. In this method, azides on biomolecules react with triarylphosphines (such as 45) to form iminophosphorane 46 with concomitant loss of nitrogen gas, Fig. 10.3-17(a). Normally, this species would be hydrolyzed to yield the amine and the phosphine oxide; however, it was shown that this intermediate could be trapped by a pendant ester group displayed on the aromatic ring. This ultimately results in the formation of an amide bond that links the phosphine group to the biomolecular target of interest. The mechanism of the reaction was examined in detail, including the isolation and X-ray characterization of intermediate 47b, when the reaction was carried out in anhydrous solvent [76]. These studies have determined that the reaction rate is accelerated both in polar solvents (such as water) and when electron-rich aryl rings are attached to the phosphorus atom (although this also leads to more rapid aerobic oxidation). For aliphatic azides, the rate-determining step is the formation of iminophosphorane 46, but for aromatic azides the
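[Figure 10.3-17: reaction schemes not reproduced. Labeled species: phosphine 45, iminophosphorane 46, intermediates 47a (X = H) and 47b (X = CH3), product 48, mannosamine derivative 40c (processed by cellular metabolism and displayed on the cell surface), and azidohomoalanine 50.]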
Fig. 10.3-17 The Staudinger ligation. (a) Triarylphosphines and azides react to form iminophosphorane intermediate 46, which is trapped by the pendant ester group. Intermediate 47b has been characterized by X-ray crystallography under anhydrous conditions. (b) Modification of cell surfaces using the Staudinger ligation. Treatment of mammalian cells with mannosamine derivative 40c results in the incorporation of azides into sialic acid residues through metabolic engineering. These groups can then be labeled using biotinylated phosphine 49. (c) For direct protein modification, azidohomoalanine 50 can be incorporated into proteins biosynthesized in methionine auxotrophs. In this example, the azides were labeled with a phosphine conjugated to a FLAG peptide epitope 51.
intramolecular attack on the ester group is rate limiting. The size of the ester substituent also influences the efficiency of the reaction, with bulky alkyl groups favoring competing hydrolysis pathways. It should be noted that “traceless” versions of this reaction have also been developed [77, 78], in which the phosphine oxide moiety is excised during the peptide bond formation step. This alternative method has proven especially useful for protein synthesis via segment condensation reactions [79]. A review of both Staudinger ligation types has recently appeared in Ref. 80. The use of this reaction in the biological context was first demonstrated for the chemospecific labeling of Jurkat cell surfaces [63]. Metabolic engineering with N-acetylmannosamine derivative 40c was used to incorporate azides into sialic acid groups on cell surfaces. The cells were then incubated with biotinylated phosphine 49, and the extent of the reaction was quantified by flow cytometry after treatment with fluorescent avidin. Importantly, neither the azide nor the phosphine displayed any reactivity with the cell-surface groups in the absence of its reactive partner. In addition, the cells showed unchanged growth rates after modification. Since the original disclosure, the Staudinger ligation has evolved into a powerful tool for the study of glycosylation pathways. The reactive specificity of the azide/phosphine pair allows virtually any substrate bearing an azido sugar to be derivatized and quantified in a Western blot or well-plate assay. As examples, azide analogs have been used to identify protein targets for N-acetylglucosamine modification in crude lysates [81] and to identify glycosidases using azidosugars further substituted with fluorine atoms to prevent enzyme turnover [82].
The Staudinger ligation has also been used to develop a parallel-plate “azido-ELISA” assay for the identification of specific peptide sequences that are targeted for mucin-type O-glycosylation [68, 83]. More recently, the reaction has even been used to modify cell-surface glycoproteins in living animals [84]. The Staudinger ligation has also been used to modify proteins into which azides have been incorporated directly [85]. In this case, methionine auxotrophic hosts were used to introduce azidohomoalanine 50 into multiple sites of murine dihydrofolate reductase. These groups were then modified using a phosphine bearing a FLAG peptide epitope 51 and detected using a Western blot assay. The specificity of the labeling reaction was again demonstrated by labeling proteins in crude cell lysates. Second-generation phosphine reagents have been developed for the fluorescent detection of azide groups [86]. This system employs coumarin-substituted phosphine 52, which is nonfluorescent due to excited state quenching by the lone pair on the phosphorus atom. On oxidation of the phosphine in the Staudinger ligation, this quenching process is relieved, resulting in a dramatic enhancement in the quantum yield for the dye, Fig. 10.3-18. The use of this activatable fluorescence system provides significant advantages over the traditional Western blot analyses because it can detect azide-labeled proteins without the need for extensive washing steps and antibody-based detection schemes.
[Figure 10.3-18: reaction scheme not reproduced. Coumarin phosphine 52 (nonfluorescent, Φ = 0.011) reacts with an azide-bearing protein (N3-Protein) to give ligation product 53 (fluorescent, Φ = 0.65).]
Fig. 10.3-18 Generation of fluorescent Staudinger ligation products. Excited state quenching by the phosphorus lone pair is lost on ligation to azides, resulting in a dramatic enhancement in the fluorescence quantum yield.
10.3.5.3
[3 + 2] Dipolar Cycloadditions of Azides and Alkynes
In 2001, Kolb, Finn, and Sharpless published an article enumerating the
stereospecific chemical reactions that can join reactive components with high yields and little by-product formation [87]. An interesting feature shared by these transformations, termed “Click” reactions, is a great deal of exothermicity through the use of “spring-loaded” reactive components. This report also focused on the use of reactions that are air and water tolerant and can be used in the absence of protecting groups. Thus, many of the reactions on the “Click” list (e.g. hydrazone formation, oxime formation, and epoxide opening) would be natural considerations for biomolecule modification, and, in fact, have been used. One reaction that proceeds particularly well in aqueous solution is the Huisgen [3 + 2] cycloaddition of azides and alkynes [88]. Although the individual components of the reaction are unreactive under most conditions, they can be joined under thermal conditions (often by heating them to 80 °C in the absence of solvent) to form triazole products. In the thermal reaction, equimolar mixtures of syn- and anti-triazoles are obtained when terminal alkynes are used. As an early demonstration of the specificity of these components in this reaction, highly potent enzyme inhibitors were synthesized in the active site of acetylcholinesterase using a library of azide and alkyne components [89]. Although no reaction occurred between these compounds in the absence of enzyme, the proximity of the reactive groups in the active site promoted the [3 + 2] cycloaddition at room temperature, affording hybrid compounds with femtomolar binding constants. The chemospecificity of the reaction suggested that it could be carried out using azides or alkynes attached to proteins if the reaction temperature could be lowered. This breakthrough was achieved by two groups who simultaneously reported that the reaction could be dramatically accelerated in the presence of Cu(I) salts [90, 91]. This allowed the reaction to take place in aqueous solution at temperatures from 4 °C to RT.
In the copper-catalyzed version of the reaction, terminal alkynes show high specificity for the anti product.
Because of the handling difficulties associated with Cu(I) salts, CuSO4 is typically reduced in situ using a reducing agent, such as ascorbic acid, tris(carboxyethyl)phosphine (TCEP), or small portions of copper wire. It has also been reported that the tris(pyrazolylmethyl)amine ligand 58 dramatically accelerates the reaction rate, presumably by stabilizing the Cu(I) oxidation state [92]. As a result, the copper-catalyzed [3 + 2] cycloaddition reaction has matured into a powerful method for the construction of glycosylation inhibitors [93], dendrimers [94], and many other targets. In a recent study, a binuclear copper cluster has been implicated as the active species in the reaction mechanism [95]. Several reports have been made using this reaction as a protein bioconjugation technique. In the first report, 60 azide functional groups attached to the surface of the cowpea mosaic virus (CPMV) using NHS ester or iodoacetamide reactions served as attachment sites for fluorescent alkyne derivatives [92]. The optimal conditions for the reaction were 21 µM protein (based on azide-functionalized capsid monomers), 1 mM CuSO4, 2 mM TCEP, and 2 mM 58 in pH 8 buffer at 4 °C for 16 h. As an alternative, small amounts of copper wire were also effective in generating and maintaining the Cu(I) species. The reaction was also successful when the alkynes were attached to the viral capsid and azides were attached to the dye. Further improvements in the reaction rate have resulted from ligand optimization studies using a fluorescence-quenching assay [96]. In particular, sulfonated bathophenanthroline ligand 59 was identified as an improved ligand for the reaction, leading to substantial rate enhancements. It has been suggested that the reaction acceleration is largely due to the improved solubility of the charged ligand, compared to 58.
With this ligand system, and the use of Cu(MeCN)4OTf, efficient protein modification has been reported using as little as 2-2.5 equivalents of the coupling partner for each protein monomer. One drawback of this system is the requirement for the rigorous exclusion of oxygen. The exceptional functional group tolerance of the reaction was demonstrated through the coupling of PEG polymers, peptides, and even an intact protein (transferrin) to the surface of azide-functionalized CPMV capsids [97]. The [3 + 2] cycloaddition reaction has also been used as a method to detect probes attached to protein reactive sites [98]. Termed activity-based protein profiling, this approach first involves the alkylation of active site thiols in proteases using azide-bearing phenylsulfonates. Because of the enhanced nucleophilicity of these residues in the active site, only these locations are modified. The protein conjugates are then coupled to alkyne-functionalized biotin or rhodamine compounds for subsequent identification. Interestingly, the reaction takes place in crude homogenates, further underscoring the high functional group tolerance of the [3 + 2] cycloaddition chemistry. The advantage of this approach is that it avoids the use of bulky chromophores or biotin affinity tags that could influence the selectivity of the key protein-labeling step.
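Setting up the copper-catalyzed reaction at the concentrations reported for the CPMV study (21 µM protein, 1 mM CuSO4, 2 mM TCEP, 2 mM ligand 58) is a dilution calculation, V_stock = C_final × V_total / C_stock. The Python sketch below applies it. The stock concentrations, the 100 µL reaction volume, and the function name are assumptions chosen for illustration; only the final concentrations come from the text.

```python
# Final concentrations (mM) as reported for the CPMV labeling conditions.
FINAL_MM = {"protein": 0.021, "CuSO4": 1.0, "TCEP": 2.0, "ligand 58": 2.0}

# Hypothetical stock concentrations (mM); adjust to whatever is on hand.
STOCK_MM = {"protein": 0.1, "CuSO4": 20.0, "TCEP": 50.0, "ligand 58": 50.0}


def pipetting_plan(total_ul: float, final_mm: dict, stock_mm: dict) -> dict:
    """Volume of each stock (µL) plus the buffer needed to reach total_ul."""
    plan = {}
    for name, c_final in final_mm.items():
        # C_stock * V_stock = C_final * V_total
        plan[name] = round(c_final * total_ul / stock_mm[name], 2)
    buffer_ul = round(total_ul - sum(plan.values()), 2)
    if buffer_ul < 0:
        raise ValueError("stocks are too dilute for the requested volume")
    plan["buffer"] = buffer_ul
    return plan


if __name__ == "__main__":
    for component, volume in pipetting_plan(100.0, FINAL_MM, STOCK_MM).items():
        print(f"{component}: {volume} µL")
```

At these concentrations copper is present in roughly 48-fold excess over protein monomer (1 mM versus 21 µM), which is one reason ligand choice and copper removal matter for sensitive samples.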
Although the previously described studies have relied on primary bioconjugation reactions to introduce the azide and alkyne functional groups, several reports have used the [3 + 2] cycloaddition reaction for the modification of artificial amino acids incorporated at the translational level. In the first case, propargyloxy- and azide-functionalized phenylalanine derivatives were effectively added to the genetic code of Saccharomyces cerevisiae using orthogonal tRNA/synthetase pairs, Fig. 10.3-19(b) [99]. This approach was used to prepare human superoxide dismutase (SOD) mutants in which a tryptophan residue was replaced by either 60 or 61. After purification of the His6-tagged protein using Ni-affinity chromatography, the [3 + 2] cycloaddition reaction was carried out through exposure of the protein to CuSO4, copper wire, ligand 58, and the appropriate reactive partner. No reaction was observed in the case
Fig. 10.3-19 Modification of proteins using “Click” chemistry. (a) Sixty azide groups were introduced on the surface of the cowpea mosaic virus (CPMV) through the alkylation of genetically introduced cysteine residues. These groups can be modified through exposure to alkynes, Cu(I) (the Cu(II) source is reduced in situ by the TCEP), and ligand 58 or 59. Similar results were obtained when capsids bearing alkynes were exposed to azides. (b) Azide- and alkyne-containing amino acids were incorporated into proteins using unnatural tRNA/synthetase pairs obtained using selection techniques. These groups can be modified with high chemoselectivity using the appropriate Click coupling partners.
I of the wild-type protein. This study highlights the power of artificial amino 10 Synthesis of Large Biological Molecules
acid incorporation in the development of selective bioconjugation. In another example, azide functional groups were displayed on the surface of E. coli methionine auxotrophs by adding azidohomoalanine 50 to the culture medium before induction [loo, 1011. The Cu-catalyzed [3 21 cycloaddition reaction was then used to attach a biotinylated alkyne to expressed proteins on the surface of the living cells. After secondary labeling with fluorescent avidin, flow cytometry was used to confirm the success of the labeling reaction. No conjugation was detected in the case of cells grown in methionine-containing media. A potential cause for concern in Click reactions is the requirement of copper ions, as Cu(1) can bind to proteins and is toxic to cells. Although the previous studies indicate that these problems are not insurmountable (particularly if high affinity, water-soluble ligands are used), a metal-free version of the Click reaction has been developed for applications in which cells must remain viable after modification [102].This reaction is driven by the relief of ring strain for cyclooctyne, resulting in successful reaction with azides at room temperature (Fig. 10.3-20). This reaction was tested in the context of a bioconjugation reaction with recombinant glycoprotein GlyCAM-Ig, into which azides were incorporated through metabolic engineering techniques (see above). On treatment with biotin-cyclooctyne 62,the [3 21 cycloaddition occurred in the absence of metal ions, as determined by biotin quantification using Western blot analysis. Under identical conditions terminal alkynes did not react, and control reactions performed on proteins lacking azide groups showed no labeling. This reaction was also used successfully for the labeling of Jurkat cells bearing azide-containing sialic acid derivatives on
+
+
+ oo 63a
9 0 -
3 62 (250 pM)
63b
Fig. 10.3-20 Metal-free bioconjugation using a strain-promoted [3 21 dipolar cycloaddition reaction. This reaction i s accelerated by the relief o f ring strain in the transition state as the alkyne carbons become sp2 hybridized.
+
H N
Biotin
~
0
70.3 New Methodsfor Protein Bioconjugation
I
623
their surface. Although this reaction appears to be somewhat slower than the copper-catalyzed reaction, no losses in cell viability were observed in these studies.
10.3.5.4 Aniline Functionalization through Oxidative Coupling Reactions
The techniques described above demonstrate the value of identifying new coupling partners that possess no cross-reactivity with native functionality. To add to this group, new reactive pairs based on the chemoselective oxidative coupling of aniline groups have recently been developed [103]. This reaction is based on the observation that N-acyl phenylenediamine derivatives (such as 64) trimerize extremely rapidly under oxidative conditions, ultimately resulting in the formation of highly stable dye molecules (e.g., 65), Fig. 10.3-21 [104]. It was found that the addition of alkyl groups to the free phenylene diamine nitrogen atom blocks its participation as a nucleophile in this reaction, but still allows species 66 to react rapidly with additional anilines to form adduct 69. Subsequent reoxidation of this intermediate, followed by the nucleophilic addition of water and a final oxidation step, affords product 70. This “A + B” analog of 65 has similar stability, showing no degradation at pH levels 1 to 11 and under both oxidative and reductive conditions. The reaction proceeds in aqueous solution and is complete in less than 1 min, even at low concentrations of 66 and 67. The reaction has also been carried out using anilines attached to proteins, and shows virtually complete selectivity for the desired modification pathway. No protein modification occurs using either the aniline or the bis(iminoquinone) component alone. This chemoselectivity is presumed to arise from the intermediacy of radical pair 68, produced on electron transfer between the aniline and the oxidized phenylene diamine groups.
Fig. 10.3-21 Secondary bioconjugation using oxidative coupling reactions. (a) N-acylphenylene diamine derivatives rapidly trimerize under oxidative conditions to yield stable dyes, such as 65. (b) By adding substituents to the amino group, a chemoselective two-component analog was developed. Following formation of adduct 69, a series of oxidation and water addition steps occur to afford stable product 70. Presumably this reaction proceeds via charge transfer complex 68.
A similar coupling reaction has also been developed for aminotyrosines, which can be conveniently prepared from azotyrosine 71 through reduction with sodium dithionite, Fig. 10.3-22(a) [24]. In the presence of NaIO4, (NH4)2Ce(NO3)6, or H2O2, these groups undergo exceptionally rapid coupling reactions with phenylene diamine derivatives (again, typically reaching complete conversion in under 1 min), presumably through an analogous charge transfer complex 74 [103]. Subsequent oxidation of adduct 75 yields stable product 76. The chemoselectivity of this technique for aminotyrosine-containing proteins is shown in Fig. 10.3-22(b).
Fig. 10.3-22 Secondary bioconjugation using aminotyrosines. (a) Following reduction of azotyrosine 71 using dithionite, the aminotyrosine product 72 couples rapidly with N-acylphenylene diamine 73 under oxidative conditions. Following the formation of adduct 75, an additional oxidation step occurs to yield stable product 76. Similar to the aniline coupling strategy described above, this reaction is believed to proceed through charge transfer complex 74. (b) The chemoselectivity of the reaction was demonstrated by mixing proteins containing (boxed) or lacking aminotyrosine and labeling them with 73. Separation by SDS-PAGE and visualization of the fluorescent dye confirmed that only the aminotyrosine-labeled proteins participated in the reaction. Proteins: A - BSA, B - chymotrypsinogen A, C - RNAse A.
10.3.6 New Methods for Bioconjugate Purification
Many, if not most, bioconjugation reactions do not reach full conversion, affording mixtures of modified and unmodified proteins that are difficult to separate. While the presence of residual wild-type protein is tolerable in some cases, many applications (such as FRET studies) would benefit from the removal of unreacted proteins from the sample. Furthermore, the isolation and concentration of specifically labeled proteins or peptide fragments can assist their characterization by mass spectrometry. This is most commonly achieved using affinity chromatography based on biotin/avidin interactions. However, the harsh conditions required to release the protein substrate from the solid support can lead to substantial losses in activity in many instances. This method also requires the synthesis of bifunctional labeling agents that possess both the group of interest and the biotin tag. A simpler and more general alternative relies on the ability of β-cyclodextrin immobilized on a Sephacryl support to form host/guest complexes with a wide range of organic molecules, including chromophores [105]. On addition of cyclodextrin-functionalized Sephacryl resin 77 to the protein-labeling reactions, the modified protein is bound by the resin while unmodified protein is left in the solution. Following isolation of the resin via filtration, the captured protein can be released through the addition of a competitive cyclodextrin binder, such as adamantane carboxylic acid 78 (Fig. 10.3-23). A particularly attractive feature of this method is the mild conditions that can be used to elute the protein from the resin.
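One common way to judge by UV-vis how much unlabeled protein remains after such a purification is to estimate the dye-to-protein ratio from two absorbance readings using the Beer-Lambert law. The Python sketch below illustrates that arithmetic only. It is not the procedure used in the cited work, and the function name, extinction coefficients, correction factor, and absorbance values are hypothetical placeholders. It also assumes that the protein itself does not absorb at the dye wavelength.

```python
def degree_of_labeling(a_dye, a_280, eps_dye, eps_protein, cf_280=0.0, path_cm=1.0):
    """Estimate the number of dye molecules per protein molecule.

    a_dye       absorbance at the dye maximum
    a_280       absorbance at 280 nm (protein plus any dye contribution)
    eps_dye     molar extinction coefficient of the dye at its maximum (M^-1 cm^-1)
    eps_protein molar extinction coefficient of the protein at 280 nm (M^-1 cm^-1)
    cf_280      fraction of the dye's peak absorbance that it shows at 280 nm
    path_cm     optical path length in cm
    """
    if eps_dye <= 0 or eps_protein <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficients and path length must be positive")
    # Beer-Lambert: concentration = absorbance / (extinction coefficient * path)
    dye_conc = a_dye / (eps_dye * path_cm)
    # Remove the dye's own contribution from the 280 nm reading
    protein_a280 = a_280 - cf_280 * a_dye
    if protein_a280 <= 0:
        raise ValueError("corrected protein absorbance is not positive")
    protein_conc = protein_a280 / (eps_protein * path_cm)
    return dye_conc / protein_conc


if __name__ == "__main__":
    # Hypothetical readings before and after capture and release from the resin
    before = degree_of_labeling(0.35, 0.20, eps_dye=70000.0, eps_protein=20000.0)
    after = degree_of_labeling(0.70, 0.284, eps_dye=70000.0, eps_protein=20000.0,
                               cf_280=0.12)
    print(f"dye per protein before purification: {before:.2f}")
    print(f"dye per protein after purification:  {after:.2f}")
```

For a conjugate that carries a single chromophore, a ratio approaching 1 after elution would be consistent with the removal of unlabeled protein, although mass spectrometry gives a more direct confirmation.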
Fig. 10.3-23 A general strategy for the purification of chromophore-labeled proteins. (a) This approach takes advantage of host/guest interactions between Sepharose-bound cyclodextrins and hydrophobic organic molecules. A sample of compatible chromophores is shown at right. (b) The resin captures chromophore-labeled proteins selectively, allowing facile removal of unmodified protein via filtration. The captured proteins can be eluted from the resin using a competitive cyclodextrin binder, such as adamantane carboxylic acid (78). (c) Purification of Oregon Green labeled myoglobin. The removal of residual unlabeled protein can be confirmed through UV-vis analysis, or (d-f) by using ESI-MS.
10.3.7 Future Development
As the field of bioconjugation continues to evolve at a rapid pace, there are a number of challenges that are likely to be addressed. The first of these involves improvements in the overall reliability of existing protein modification reactions. A lesson from the field of natural product synthesis is that even the most predictable chemical reactions can display unexpected reactivity and selectivity when applied to complex molecular targets. Similar behavior is often observed for protein modification, as each biomolecular target presents multiple chemical environments of unmatched complexity. The “personality” of each protein can be difficult to predict, owing to variations in the solvent accessibility of targeted residues and the effects of local environments on pKa values. Further complications arise on consideration of the rapid conformational changes of the surface groups and the aggregation of proteins and reagents in aqueous solution. As a result, the scope and utility of each bioconjugation reaction can be evaluated only by applying it to many targets over a period of time. Although some of the structure/reactivity data can be generated through crystal structure analysis, it is likely that these studies will be facilitated by the new understanding which NMR structure determination, single molecule spectroscopy, and molecular dynamics can now provide.
Given this structural diversity, the continued development of new reactions is also crucial. Even in cases where a modification strategy is already in place for a particular functional group, alternative reactions can allow expansions in substrate scope, alterations in modification selectivity, synthetic convenience, and perhaps even greater biocompatibility. Just as a well-trained synthetic chemist must know a dozen methods for the oxidation of an alcohol to an aldehyde, protein bioconjugation will be approached with much more success if many techniques are available to address the situation at hand.
In terms of the reactions themselves, there are several areas that are likely to see improvement. First, there are still many native functional groups for which reliable modification strategies are yet to be developed (including disulfides, asparagines (potentially allowing an efficient synthesis of N-linked glycoproteins), and methionines). Secondly, many bioconjugation reactions currently in use do not reach full conversion in a reasonable period of time. The availability of new transition metal-based methods is likely to address both of these limitations as the design rules for effecting these transformations in aqueous media are further elucidated. Currently, it is difficult or impossible to distinguish between two instances of a native residue - a situation that is also likely to change as more complex modification reagents and catalyst ligands are applied. Clearly, artificial amino acid incorporation techniques will also be used to address these challenges.
As noted above, chromatography techniques that can purify modified proteins are beginning to appear, and the analytical techniques for bioconjugate characterization are steadily improving. Advances in mass spectrometry (both in terms of capability and accessibility) will certainly continue to play a key role in this regard, as the mass accuracy of this analysis method can reveal information about a reaction outcome that SDS-PAGE cannot resolve.
A frontier that is certainly gaining considerable attention is the ability to modify proteins inside living cells. Given the success of FlAsH techniques for in situ labeling, these strategies are certain to provide important tools that increase our understanding of cellular function. The design of these reactions is very challenging, in part due to the very low concentration of individual protein targets in a cell and the exquisitely high specificity required to avoid background labeling. Another significant challenge is presented by the high concentration (~5 mM) of glutathione in the cytoplasm of mammalian cells, as the free thiol group of this reagent foils most reactions involving electrophilic reagents, radicals, and oxidants. Added to this is the requirement for nontoxic reagents and the challenge of designing compounds that can cross the cell membrane to reach the targets inside. Over time these criteria will undoubtedly be met, perhaps by using drug-design principles from the pharmaceutical industry.
In the light of all of these considerations, perhaps the most important frontier is a conceptual one. As the access to structural information increases, along with convenient computer programs that can be used to visualize and analyze complex biomolecules, it is hoped that more chemists will see proteins as the organic molecules that they are. The principles of chemical reactivity and conformational analysis apply to these compounds just as they do to any other natural product, and many groups have used exactly these same concepts to develop the reactions described herein. The development of future methods to meet the above challenges will be achieved most successfully by those who have adopted this mindset.
10.3.8 Conclusion
Taken together, the new chemical tools described herein have dramatically altered the landscape of chemical biology. Each of these techniques has expanded the scope of bioconjugates that can be prepared, and thus the creativity with which new experimental systems can be designed. Many of the labeling reactions can achieve levels of selectivity that were previously impossible to attain, even allowing single proteins to be targeted in the complex biochemical settings of living cells. Equally important is the continued development of the conceptual framework that is needed to create future reactions. In addition to improving our understanding of enzyme function and protein trafficking, these new techniques have enabled frontier applications in proteomics, single molecule spectroscopy, and the preparation of biomolecular materials, among others. As new strategies continue to emerge, this field is certain to retain its crucially important role in chemical biology.
Acknowledgments
I would especially like to thank the students with whom I have had the pleasure of working during the past four years. They are an extremely talented and creative group of scientists, and I cannot overemphasize my gratitude for their enthusiasm, hard work, and intellectual input. Our efforts in the area of protein modification have been generously funded by the Biomolecular Materials Program at Lawrence Berkeley National Labs, the DOE Nanoscale Science and Engineering Technology (NSET) program, the NIH (R01 GM072700-01), and the Department of Chemistry at UC Berkeley.
References
1. A.F. Straight, A. Cheung, J. Limouze, I. Chen, N.J. Westwood, J.R. Sellers, T.J. Mitchison, Dissecting temporal and spatial control of cytokinesis with a myosin II inhibitor, Science 2003, 299, 1743-1747.
2. B.A. Griffin, S.R. Adams, R.Y. Tsien, Specific covalent labeling of recombinant protein molecules inside live cells, Science 1998, 281, 269-272.
3. E. Babini, I. Bertini, M. Borsari, F. Capozzi, C. Luchinat, X.Y. Zhang, G.L.C. Moura, I.V. Kurnikov, D.N. Beratan, A. Ponce, A.J. Di Bilio, J.R. Winkler, H.B. Gray, Bond-mediated electron tunneling in ruthenium-modified high-potential iron-sulfur protein, J. Am. Chem. Soc. 2000, 122, 4532-4533.
4. S. Zalipsky, Chemistry of polyethylene-glycol conjugates with biologically-active molecules, Adv. Drug Deliv. Rev. 1995, 16, 157-182.
5. S. Zalipsky, J.M. Harris, Introduction to chemistry and biological applications of poly(ethylene glycol), Poly(Ethylene Glycol) 1997, 680, 1-13.
6. H.C. Hang, C.R. Bertozzi, Chemoselective approaches to glycoprotein assembly, Acc. Chem. Res. 2001, 34, 727-736.
7. C.M. Niemeyer, Nanoparticles, proteins, and nucleic acids: biotechnology meets materials science, Angew. Chem. Int. Ed. Engl. 2001, 40, 4128-4158.
8. N.C. Seeman, A.M. Belcher, Emulating biology: building nanostructures from the bottom up, Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 6451-6455.
9. For an excellent review of common bioconjugation techniques, see G.T. Hermanson, Bioconjugate Techniques, Academic Press, San Diego, 1996.
10. T.L. Schlick, Z.B. Ding, E.W. Kovacs, M.B. Francis, Dual-surface modification of the tobacco mosaic virus, J. Am. Chem. Soc. 2005, 127, 3718-3723.
11. J.A. Maurer, D.E. Elmore, H.A. Lester, D.A. Dougherty, Comparing and contrasting Escherichia coli and Mycobacterium tuberculosis mechanosensitive channels (MscL) - new gain of function mutations in the loop region, J. Biol. Chem. 2000, 275, 22238-22244.
12. Q. Wang, T.W. Lin, L. Tang, J.E. Johnson, M.G. Finn, Icosahedral virus particles as addressable nanoscale building blocks, Angew. Chem. Int. Ed. Engl. 2002, 41, 459-462.
13. For an example of double chromophore labeling for FRET studies, see M. Borsch, M. Diez, B. Zimmermann, R. Reuter, P. Graber, Stepwise rotation of the γ-subunit of EFoF1-ATP synthase observed by intramolecular single-molecule fluorescence resonance energy transfer, FEBS Lett. 2002, 527, 147-152.
14. R.F. Doolittle, Redundancies in protein sequences, in Prediction of Protein Structure and the Principles of Protein Conformation, (Ed.: G.D. Fasman), Plenum Press, New York, 1989.
15. J. Houk, G.M. Whitesides, Structure reactivity relations for thiol disulfide interchange, J. Am. Chem. Soc. 1987, 109, 6825-6836.
16. T.P. King, Y. Li, L. Kochoumian, Preparation of protein conjugates via intermolecular disulfide bond formation, Biochemistry 1978, 17, 1499-1506.
17. For an example, see S. Zalipsky, M. Qazen, J.A. Walker, N. Mullah, Y.P. Quinn, S.K. Huang, New detachable poly(ethylene glycol) conjugates: cysteine-cleavable lipopolymers regenerating natural phospholipid, diacyl phosphatidylethanolamine, Bioconjug. Chem. 1999, 10, 703-707.
18. H.R. Adams, C.H. Paik, W.C. Eckelman, R.C. Reba, Electrophilic iodination of aromatic rings, J. Labelled Comp. Radiopharm. 1982, 19, 1477-1478.
19. W.C. Eckelman, H.R. Adams, C.H. Paik, Electrophilic iodination of aromatic rings, Int. J. Nucl. Med. Biol. 1984, 11, 163-166.
20. J.F. Leite, M. Cascio, Probing the topology of the glycine receptor by chemical modification coupled to mass spectrometry, Biochemistry 2002, 41, 6140-6148.
21. H.G. Higgins, D. Fraser, The reaction of amino acids and proteins with diazonium compounds. 1. A spectrophotometric study of azo-derivatives of histidine and tyrosine, Australian J. Sci. Res. Ser. A Phys. Sciences 1952, 5, 736-753.
22. H.G. Higgins, K.J. Harrington, Reaction of amino acids and proteins with diazonium compounds. 2. Spectra of protein derivatives, Arch. Biochem. Biophys. 1959, 85, 409-425.
23. J.A. Shin, Specific DNA binding peptide-derivatized solid support, Bioorg. Med. Chem. Lett. 1997, 7, 2367.
24. J.M. Hooker, E.W. Kovacs, M.B. Francis, Interior surface modification of bacteriophage MS2, J. Am. Chem. Soc. 2004, 126, 3718-3719.
25. N.S. Joshi, L.R. Whitaker, M.B. Francis, A three-component Mannich-type reaction for selective tyrosine bioconjugation, J. Am. Chem. Soc. 2004, 126, 15942-15943.
26. For an example of a lanthanide-promoted phenol modification with imines in organic solvents, see T.S. Huang, C.J. Li, Synthesis of amino acids via a three-component reaction of phenols, glyoxylates and amines, Tetrahedron Lett. 2000, 41, 6715.
27. H. Fraenkel-Conrat, H.S. Olcott, Reaction of formaldehyde with proteins. VI. Cross-linking of amino groups with phenol, imidazole, or indole groups, J. Biol. Chem. 1948, 174, 827-843.
28. K.J. Franz, M. Nitz, B. Imperiali, Lanthanide-binding tags as versatile protein coexpression probes, Chembiochem 2003, 4, 265-271.
29. H. Dibowski, F.P. Schmidtchen, Bioconjugation of peptides by palladium-catalyzed C-C cross-coupling in water, Angew. Chem. Int. Ed. Engl. 1998, 37, 476-478.
30. N.S. Joshi, M.B. Francis, Submitted.
31. P.E. Dawson, T.W. Muir, I. Clark-Lewis, S.B.H. Kent, Synthesis of proteins by native chemical ligation, Science 1994, 266, 776-779.
32. D.T. Bong, M.R. Ghadiri, Chemoselective Pd(0)-catalyzed peptide coupling in water, Org. Lett. 2001, 3, 2509-2511.
33. A. Ojida, H. Tsutsumi, N. Kasagi, I. Hamachi, Suzuki coupling for protein modification, Tetrahedron Lett. 2005, 46, 3301-3305.
34. S.D. Tilley, M.B. Francis, Submitted.
35. J. Stubbe, D.G. Nocera, C.S. Yee, M.C.Y. Chang, Radical initiation in the class I ribonucleotide reductase: long-range proton-coupled electron transfer? Chem. Rev. 2003, 103, 2167-2201.
36. J.M. Antos, M.B. Francis, Selective tryptophan modification with rhodium carbenoids in aqueous solution, J. Am. Chem. Soc. 2004, 126, 10256-10257.
37. H.M. Davies, P.R. Bruzinski, D.H. Lake, N. Kong, M.J. Fall, Asymmetric cyclopropanations by rhodium(II) N-(arylsulfonyl)prolinate catalyzed decomposition of vinyldiazomethanes in the presence of alkenes. Practical enantioselective synthesis of the four stereoisomers of 2-phenylcyclopropan-1-amino acid, J. Am. Chem. Soc. 1996, 118, 6897-6907.
38. Most proteins are not denatured by the use of this cosolvent. For examples, see Y.L. Khmelnitsky, V.V. Mozhaev, A.B. Belova, M.V. Sergeeva, K. Martinek, Denaturation capacity - a new quantitative criterion for selection of organic-solvents as reaction media in biocatalysis, Eur. J. Biochem. 1991, 198, 31-41.
39. J.M. Antos, M.B. Francis, Unpublished results.
40. An analogous rearrangement pathway has been observed for small molecule disulfides in organic solvents: M. Hamaguchi, T. Misumi, T. Oshima, Reaction of vinylcarbenoids with cyclic disulfides: formation of 1,3-insertion products as well as 1,1-insertion products, Tetrahedron Lett. 1998, 39, 7113-7116.
41. J.M. McFarland, M.B. Francis, Reductive alkylation of proteins using iridium catalyzed transfer hydrogenation, J. Am. Chem. Soc. 2005, in press.
42. T.J. Sereda, C.T. Mant, A.M. Quinn, R.S. Hodges, Effect of alpha-amino group on peptide retention behavior in reversed-phase chromatography - determination of the pK(a) values of the alpha-amino group of 19 different N-terminal amino-acid-residues, J. Chromatogr. 1993, 646, 17-30.
43. J.P. Tam, Q.T. Yu, Z.W. Miao, Orthogonal ligation strategies for peptide and protein, Biopolymers 1999, 51, 311-332.
44. X.F. Li, L.S. Zhang, S.E. Hall, J.P. Tam, A new ligation method for N-terminal tryptophan-containing peptides using the Pictet-Spengler reaction, Tetrahedron Lett. 2000, 41, 4069-4073.
45. P.H. Hirel, J.M. Schmitter, P. Dessen, G. Fayat, S. Blanquet, Extent of N-terminal methionine excision from Escherichia coli proteins is governed by the side-chain length of the penultimate amino-acid, Proc. Natl. Acad. Sci. U. S. A. 1989, 86, 8247-8251.
46. K.F. Geoghegan, J.G. Stroh, Site-directed conjugation of nonpeptide groups to peptides and proteins via periodate-oxidation of a 2-amino alcohol - application to modification at N-terminal serine, Bioconjug. Chem. 1992, 3, 138-146.
47. J.M. Gilmore, R.A. Scheck, M.B. Francis, Unpublished results.
48. For a related reaction catalyzed by copper ions, see H.B.F. Dixon, N-terminal modification of proteins - a review, J. Protein Chem. 1984, 3, 99-108.
49. For examples, see T.J. Tolbert, C.H. Wong, Intein-mediated synthesis of proteins containing carbohydrates and other molecular probes, J. Am. Chem. Soc. 2000, 122, 5421-5428.
50. S.R. Adams, R.E. Campbell, L.A. Gross, B.R. Martin, G.K. Walkup, Y. Yao, J. Llopis, R.Y. Tsien, New biarsenical ligands and tetracysteine motifs for protein labeling in vitro and in vivo: synthesis and biological applications, J. Am. Chem. Soc. 2002, 124, 6063-6076.
51. R.Y. Tsien, The green fluorescent protein, Annu. Rev. Biochem. 1998, 67, 509-544.
52. C.J. Noren, S.J. Anthony-Cahill, M.C. Griffith, P.G. Schultz, A general method for site-specific incorporation of unnatural amino-acids into proteins, Science 1989, 244, 182-188.
53. J.A. Ellman, D. Mendel, S. Anthony-Cahill, C.J. Noren, P.G. Schultz, Biosynthetic method for introducing unnatural amino-acids site-specifically into proteins, Methods Enzymol. 1991, 202, 301-336.
54. L. Wang, A. Brock, B. Herberich, P.G. Schultz, Expanding the genetic code of Escherichia coli, Science 2001, 292, 498-500.
55. J.W. Chin, S.W. Santoro, A.B. Martin, D.S. King, L. Wang, P.G. Schultz, Addition of p-azido-L-phenylalanine to the genetic code of Escherichia coli, J. Am. Chem. Soc. 2002, 124, 9026-9027.
56. L. Wang, P.G. Schultz, Expanding the genetic code, Chem. Commun. 2002, 1, 1-11.
57. L. Wang, Z. Zhang, A. Brock, P.G. Schultz, Addition of the keto functional group to the genetic code of Escherichia coli, Proc. Natl. Acad. Sci. U. S. A. 2003, 100, 56-61.
58. R.A. Mehl, J.C. Anderson, S.W. Santoro, L. Wang, A.B. Martin, D.S. King, D.M. Horn, P.G. Schultz, Generation of a bacterium with a 21 amino acid genetic code, J. Am. Chem. Soc. 2003, 125, 935-939.
59. K.L. Kiick, D.A. Tirrell, Protein engineering by in vivo incorporation of non-natural amino acids: control of incorporation of methionine analogues by methionyl-tRNA synthetase, Tetrahedron 2000, 56, 9487-9493.
60. K.L. Kiick, R. Weberskirch, D.A. Tirrell, Identification of an expanded set of translationally active methionine analogues in Escherichia coli, FEBS Lett. 2001, 502, 25-30.
61. K. Kirshenbaum, I.S. Carrico, D.A. Tirrell, Biosynthesis of proteins incorporating a versatile set of phenylalanine analogues, Chembiochem 2002, 3, 235-237.
62. L.K. Mahal, K.J. Yarema, C.R. Bertozzi, Engineering chemical reactivity on cell surfaces through oligosaccharide biosynthesis, Science 1997, 276, 1125-1128.
63. E. Saxon, C.R. Bertozzi, Cell surface engineering by a modified Staudinger reaction, Science 2000, 287, 2007-2010.
64. E. Saxon, S.J. Luchansky, H.C. Hang, C. Yu, S.C. Lee, C.R. Bertozzi, Investigating cellular metabolism of synthetic azidosugars with the Staudinger ligation, J. Am. Chem. Soc. 2002, 124, 14893-14902.
65. J.H. Lee, T.J. Baker, L.K. Mahal, J. Zabner, C.R. Bertozzi, D.F. Wiemer, M.J. Welsh, Engineering novel cell surface receptors for virus-mediated gene transfer, J. Biol. Chem. 1999, 274, 21878-21884.
66. S.J. Luchansky, C.R. Bertozzi, Azido sialic acids can modulate cell-surface interactions, Chembiochem 2004, 5, 1706-1709.
67. R.A. Chandra, E.A. Douglas, R.A. Mathies, C.R. Bertozzi, M.B. Francis, Programmable cell adhesion encoded by DNA hybridization, Angew. Chem. Int. Ed. Engl. 2006, 45, 896-901.
68. H.C. Hang, C. Yu, D.L. Kato, C.R. Bertozzi, A metabolic labeling approach toward proteomic analysis of mucin-type O-linked glycosylation, Proc. Natl. Acad. Sci. U. S. A. 2003, 100, 14846-14851.
69. I. Chen, M. Howarth, W. Lin, A.Y. Ting, Site-specific labeling of cell surface proteins with biophysical probes using biotin ligase, Nat. Methods 2005, 2, 99-104.
70. M. Howarth, K. Takao, Y. Hayashi, A.Y. Ting, Targeting quantum dots to surface proteins in living cells with biotin ligase, Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 7583-7588.
71. Y. Kho, S.C. Kim, C. Jiang, D. Barma, S.W. Kwon, J.K. Cheng, J. Jaunbergs, C. Weinbaum, F. Tamanoi, J. Falck, Y.M. Zhao, A tagging-via-substrate technology for detection and proteomics of farnesylated proteins, Proc. Natl. Acad. Sci. U. S. A. 2004, 101, 12479-12484.
72. J. Yin, F. Liu, X.H. Li, C.T. Walsh, Labeling proteins with small molecules by site-specific posttranslational modification, J. Am. Chem. Soc. 2004, 126, 7754-7755.
73. V.W. Cornish, K.M. Hahn, P.G. Schultz, Site-specific protein modification using a ketone handle, J. Am. Chem. Soc. 1996, 118, 8150-8151.
74. W.P. Jencks, Studies on the mechanism of oxime and semicarbazone formation, J. Am. Chem. Soc. 1959, 81, 475-481.
75. Z.W. Zhang, B.A.C. Smith, L. Wang, A. Brock, C. Cho, P.G. Schultz, A new strategy for the site-specific modification of proteins in vivo, Biochemistry 2003, 42, 6735-6746.
76. F.L. Lin, H.M. Hoyt, H. van Halbeek, R.G. Bergman, C.R. Bertozzi, Mechanistic investigation of the Staudinger ligation, J. Am. Chem. Soc. 2005, 127, 2686-2695.
77. E. Saxon, J.I. Armstrong, C.R. Bertozzi, A “traceless” Staudinger ligation for the chemoselective synthesis of amide bonds, Org. Lett. 2000, 2, 2141-2143.
78. B.L. Nilsson, L.L. Kiessling, R.T. Raines, Staudinger ligation: a peptide from a thioester and azide, Org. Lett. 2000, 2, 1939-1941.
79. B.L. Nilsson, R.J. Hondal, M.B. Soellner, R.T. Raines, Protein assembly by orthogonal chemical ligation methods, J. Am. Chem. Soc. 2003, 125, 5268-5269.
80. M. Kohn, R. Breinbauer, The Staudinger ligation - a gift to chemical biology, Angew. Chem. Int. Ed. Engl. 2004, 43, 3106-3116.
81. D.J. Vocadlo, H.C. Hang, E.J. Kim, J.A. Hanover, C.R. Bertozzi, A chemical approach for identifying O-GlcNAc-modified proteins in cells, Proc. Natl. Acad. Sci. U. S. A. 2003, 100, 9116-9121.
82. D.J. Vocadlo, C.R. Bertozzi, A strategy for functional proteomic analysis of glycosidase activity from cell lysates, Angew. Chem. Int. Ed. Engl. 2004, 43, 5338-5342.
83. H.C. Hang, C. Yu, M.R. Pratt, C.R. Bertozzi, Probing glycosyltransferase activities with the Staudinger ligation, J. Am. Chem. Soc. 2004, 126, 6-7.
84. J.A. Prescher, D.H. Dube, C.R. Bertozzi, Chemical remodelling of cell surfaces in living animals, Nature 2004, 430, 873-877.
85. K.L. Kiick, E. Saxon, D.A. Tirrell, C.R. Bertozzi, Incorporation of azides into recombinant proteins for chemoselective modification by the Staudinger ligation, Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 19-24.
86. G.A. Lemieux, C.L. de Graffenried, C.R. Bertozzi, A fluorogenic dye activated by the Staudinger ligation, J. Am. Chem. Soc. 2003, 125, 4708-4709.
87. H.C. Kolb, M.G. Finn, K.B. Sharpless, Click chemistry: diverse chemical function from a few good reactions, Angew. Chem. Int. Ed. Engl. 2001, 40, 2004-2021.
88. R. Huisgen, in 1,3-Dipolar Cycloaddition Chemistry, (Ed.: A. Padwa), Vol. 1, Wiley, New York, 1984, pp. 1-176.
89. W.G. Lewis, L.G. Green, F. Grynszpan, Z. Radic, P.R. Carlier, P. Taylor, M.G. Finn, K.B. Sharpless, Click chemistry in situ: acetylcholinesterase as a reaction vessel for the selective assembly of a femtomolar inhibitor from an array of building blocks, Angew. Chem. Int. Ed. Engl. 2002, 41, 1053-1057.
90. V.V. Rostovtsev, L.G. Green, V.V. Fokin, K.B. Sharpless, A stepwise Huisgen cycloaddition process: copper(I)-catalyzed regioselective “ligation” of azides and terminal alkynes, Angew. Chem. Int. Ed. Engl. 2002, 41, 2596-2599.
91. C.W. Tornøe, C. Christensen, M. Meldal, Peptidotriazoles on solid phase: [1,2,3]-triazoles by regiospecific copper(I)-catalyzed 1,3-dipolar cycloadditions of terminal alkynes to azides, J. Org. Chem. 2002, 67, 3057-3062.
92. Q. Wang, T.R. Chan, R. Hilgraf, V.V. Fokin, K.B. Sharpless, M.G. Finn, Bioconjugation by copper(I)-catalyzed azide-alkyne [3 + 2] cycloaddition, J. Am. Chem. Soc. 2003, 125, 3192-3193.
93. L.V. Lee, M.L. Mitchell, S.J. Huang, V.V. Fokin, K.B. Sharpless, C.H. Wong, A potent and highly selective inhibitor of human alpha-1,3-fucosyltransferase via click chemistry, J. Am. Chem. Soc. 2003, 125, 9588-9589.
94. P. Wu, A.K. Feldman, A.K. Nugent, C.J. Hawker, A. Scheel, B. Voit, J. Pyun, J.M.J. Frechet, K.B. Sharpless, V.V. Fokin, Efficiency and fidelity in a click-chemistry route to triazole dendrimers by the copper(I)-catalyzed ligation of azides and alkynes, Angew. Chem. Int. Ed. Engl. 2004, 43, 3928-3932.
95. V.O. Rodionov, V.V. Fokin, M.G. Finn, Mechanism of the ligand-free Cu(I)-catalyzed azide-alkyne cycloaddition reaction, Angew. Chem. Int. Ed. Engl. 2005, 44, 2210-2215.
96. W.G. Lewis, F.G. Magallon, V.V. Fokin, M.G. Finn, Discovery and characterization of catalysts for azide-alkyne cycloaddition by fluorescence quenching, J. Am. Chem. Soc. 2004, 126, 9152-9153.
97. S.S. Gupta, J. Kuzelka, P. Singh, W.G. Lewis, M. Manchester, M.G. Finn, Accelerated bioorthogonal conjugation: a practical method for the ligation of diverse functional molecules to a polyvalent virus scaffold, Bioconjug. Chem. in press.
98. A.E. Speers, G.C. Adam, B.F. Cravatt, Activity-based protein profiling in vivo using a copper(I)-catalyzed azide-alkyne [3 + 2] cycloaddition, J. Am. Chem. Soc. 2003, 125, 4686-4687.
99. A. Deiters, T.A. Cropp, M. Mukherji, J.W. Chin, J.C. Anderson, P.G. Schultz, Adding amino acids with novel reactivity to the genetic code of Saccharomyces cerevisiae, J. Am. Chem. Soc. 2003, 125, 11782-11783.
100. A.J. Link, D.A. Tirrell, Cell surface labeling of Escherichia coli via copper(I)-catalyzed [3 + 2] cycloaddition, J. Am. Chem. Soc. 2003, 125, 11164-11165.
101. A.J. Link, M.K.S. Vink, D.A. Tirrell, Presentation and detection of azide functionality in bacterial cell surface proteins, J. Am. Chem. Soc. 2004, 126, 10598-10602.
102. N.J. Agard, J.A. Prescher, C.R. Bertozzi, A strain-promoted [3 + 2] azide-alkyne cycloaddition for covalent modification of biomolecules in living systems, J. Am. Chem. Soc. 2004, 126, 15046-15047.
103. J.M. Hooker, M.B. Francis, Submitted.
104. J.F. Corbett, Benzoquinone imines. Part IV. Mechanism and kinetics of the formation of Bandrowski's base, J. Chem. Soc. B 1969, 818.
105. T. Nguyen, N.S. Joshi, M.B. Francis, Submitted.
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
11 Advances in Sugar Chemistry
11.1 The Search for Chemical Probes to Illuminate Carbohydrate Function
Laura L. Kiessling and Erin E. Carlson
Outlook
Until the 1970s, it was believed that the major cellular functions of carbohydrates were confined to their use as structural elements or energy sources. Since then, evidence that glycoconjugates function in many diverse roles has led to an increased appreciation of these biomolecules. Saccharides act as information carriers and influence signaling events, cell-cell communication, cell adhesion, differentiation, inflammation, and tumor cell metastasis [1-3]. Moreover, defects in the production of glycoconjugates cause a series of human diseases referred to as congenital disorders of glycosylation (CDG) [4, 5]. In prokaryotes, carbohydrates are essential constituents of bacterial cell walls; consequently, agents that block their incorporation can function as novel antimicrobials. These examples underscore the value of understanding glycoconjugate biosynthesis and function for human health.
11.1.1 Introduction
One barrier to understanding glycoconjugates is that their function is often only manifested in the context of the organism or physiologically relevant environment. For example, the loss of a glycosyltransferase can have no effect on eukaryotic cells grown in culture, but can have significant effects on the organism [6]. Similarly, some glycoconjugates in prokaryotes are likely to function only in pathogenesis [7]. Genetic approaches, such as the generation of knockout mice or the application of RNA interference, can illuminate the physiological roles of glycoconjugates. The use of compounds that block a specific protein-carbohydrate interaction or that inhibit the biosynthesis of carbohydrates would facilitate insights complementary to those obtained using only genetic approaches. Here, we will discuss the issues that have complicated the efforts to determine how carbohydrates function, the tools that have been developed to enhance our understanding, and the advances in the generation and identification of chemical agents to probe carbohydrate function.
11.1.2 History and Development
11.1.2.1 Protein-Carbohydrate Interactions
The diversity of carbohydrate structures is a major obstacle to understanding their function. Carbohydrate units can be connected in a variety of ways to afford many structural isomers and therefore thousands of polysaccharides. Moreover, complex glycoconjugates can be further elaborated via enzymes that add acyl, sulfate, and phosphate groups to sugar heteroatoms. Even subtle differences in functional group display can result in drastic differences in the specific proteins that recognize oligosaccharides and the affinity of the resulting complexes. Biologically relevant glycoconjugates are synthesized by the actions of many different enzymes. Activated sugar-nucleotide substrates are produced by nucleotidyltransferases and utilized by glycosyltransferases for the addition of carbohydrate units to a growing saccharide chain. Finally, glycoconjugates can also be heterogeneous. An oligosaccharide structure can differ depending on environmental conditions, cell type, and other variables. Thus, it is often very difficult to ascertain the substrate(s) of a carbohydrate-binding protein or enzyme of interest. Advances in chemistry are facilitating the identification of glycoconjugate ligands for target lectins of interest. Access to improved mass spectrometers and new methods for sample preparation and analysis are critical for determining the oligosaccharide sequences of physiological glycoconjugates [8, 9]. Additionally, the advent of new methods for carbohydrate and glycoprotein synthesis provides access to homogeneous samples of oligosaccharides and glycoproteins [10-20]. Glycoarrays, which can be generated using isolated glycoconjugates or chemically synthesized oligosaccharides or glycoconjugates, have emerged as useful tools to rapidly assess lectin specificity [21-24]. Increasing access to these complex biomolecules has dramatically improved our ability to probe protein-carbohydrate interactions.
Chemical approaches also have fueled the development of new assays to study protein-carbohydrate interactions. The features of these interactions mandate such novel strategies. When extracellular carbohydrate-binding events are investigated in solution, the resulting complexes are often of low affinity. Specifically, monovalent protein-carbohydrate dissociation constants are typically on the order of … to … M, and many protein-carbohydrate interactions are multivalent [25-29]. Both these factors complicate the evaluation of the binding kinetics and thermodynamics of these proteins. Thus, assay selection can be crucial for identifying those compounds that interfere with the target process under physiologically relevant conditions. For example, selectins are involved in the attachment and rolling of leukocytes on the vascular surface. Like most proteins that bind carbohydrates, their affinity for the monovalent tetrasaccharide is very weak (Kd ~ 1 mM); thus, a multimeric carbohydrate display is necessary for effective assay development. Additionally, because the selectins act under blood flow, static assays can provide dramatically different results from the more physiologically relevant flow-based methods [30]. Unfortunately, assays that closely mimic physiological conditions are often of very low throughput and are therefore not ideal for ligand discovery. Despite these challenges, a number of techniques have been applied successfully to study either monovalent or multivalent protein-carbohydrate interactions, including fluorescence anisotropy assays [31-37], enzyme-linked immunosorbent assays (ELISA), cell agglutination assays, and surface plasmon resonance studies [38-40]. Carbohydrate affinity screening using derivatized latex beads, magnetic particles, agarose, or Sepharose resins is another promising approach [41]. As mentioned previously, glycoarrays are useful new tools, and have been used to characterize protein-carbohydrate [23, 42-45] and enzyme-carbohydrate [39, 46-48] interactions and to examine the adhesion properties of hepatocytes, leukocytes [49], and bacteria [50]. These successes emphasize the value of high-throughput assay formats for elucidating carbohydrate function.
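To make the consequence of such weak affinities concrete, the following minimal Python sketch evaluates the fraction of lectin sites occupied by a monovalent ligand. It assumes a simple 1:1 binding model with the ligand in large excess over the protein; the function name fraction_bound and the ligand concentrations are illustrative assumptions, and only the dissociation constant of about 1 mM is taken from the text.

```python
def fraction_bound(ligand_conc_m: float, kd_m: float) -> float:
    """Fraction of protein sites occupied for simple 1:1 binding.

    Assumes the ligand is in large excess, so the free ligand
    concentration is approximately the total ligand concentration.
    """
    if kd_m <= 0 or ligand_conc_m < 0:
        raise ValueError("kd_m must be positive and ligand_conc_m non-negative")
    return ligand_conc_m / (kd_m + ligand_conc_m)


# Dissociation constant of roughly 1 mM, as quoted for a selectin
# binding its monovalent tetrasaccharide.
KD_SELECTIN_M = 1e-3

if __name__ == "__main__":
    # Hypothetical ligand concentrations spanning 1 uM to 10 mM.
    for conc_m in (1e-6, 1e-4, 1e-3, 1e-2):
        occupancy = fraction_bound(conc_m, KD_SELECTIN_M)
        print(f"[L] = {conc_m:.0e} M -> fraction bound = {occupancy:.3f}")
```

Under these assumptions, half-occupancy requires a ligand concentration equal to the dissociation constant, that is, about 1 mM, which illustrates why multimeric carbohydrate displays are favored in assay development.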
Many of these techniques have also been successfully utilized for the identification of inhibitors of protein-carbohydrate interactions. ELISAs are often employed, and their utility is illustrated in the efforts to identify selectin inhibitors [51-53]. However, other assays such as fluorescence polarization [33, 54] and carbohydrate affinity matrices [55] have also been utilized. These methods have provided valuable information about the substrate specificity of a number of carbohydrate-binding proteins and led to the discovery of many useful inhibitors. Still, there remains an acute need to develop effective high-throughput assays. Another key issue for identifying inhibitors of protein-carbohydrate interactions is the lack of information on what types of compounds might target lectins. Oligosaccharides that resemble the known or putative physiological ligand are the logical starting point, yet these compounds have drawbacks. They are typically polar, have low binding affinities, and often lack specificity (i.e., they interact with many related lectins). Moreover, even with advances in carbohydrate synthesis, it is difficult to rapidly assemble oligosaccharide derivatives to optimize their potency. These challenges are also apparent in the efforts to block carbohydrate-modifying enzymes.
11.1.2.2 Carbohydrate-modifying Enzymes
For understanding glycoconjugate function, a complementary strategy to inhibiting protein-carbohydrate interactions is to block glycoconjugate assembly. The development of strategies for high-throughput analysis of enzymes that act on carbohydrates or glycoconjugates is also challenging. Many such enzymes use identical or similar glycosyl donors; therefore, it is difficult to determine their acceptor specificity. For example, there are several hundred glycosyltransferases in humans [56], and many utilize similar or identical sugar-nucleotide substrates. Traditional proteomics-based strategies, such as two-dimensional gel electrophoresis [57] and isotope-coded affinity tags [58], can provide valuable information about protein abundance. However, these experiments give no information about enzyme activity levels or substrate specificity. The need for this information has prompted the development of several new strategies [59]. Recently, Pohl and coworkers reported an assay based on mass spectrometry for the study of a rabbit muscle phosphorylase [60] and sugar nucleotidyltransferases from yeast and Escherichia coli [61]. This group also has reported the design of a library of mass-differentiated substrates to examine the substrate specificity of glycosidases [62]. Additionally, several research groups have used sugar derivatives to directly label and detect active enzymes [63-65]. Finally, carbohydrate microarrays have also been utilized to explore the substrate specificity of carbohydrate-utilizing enzymes [39, 48]. These tools have provided valuable information about the activity and specificity of carbohydrate-utilizing enzymes. This knowledge is fundamental and also critical for developing assays that monitor the activities of biosynthetic enzymes. Another issue facing those interested in developing inhibitors of glycoconjugate biosynthesis is what types of compounds to test as inhibitors.
Many known ligands for carbohydrate-utilizing enzymes are transition state analogs. For example, imino sugars are commonly used to mimic oxocarbenium ions that serve as intermediates in glycosidase or glycosyltransferase reactions [66]. Although transition state analogs have provided important information about many enzymes, they often lack selectivity for the target of interest. Moreover, few of these compounds are cell permeable. Unlike protein-carbohydrate interactions that occur on the surface of the cell, carbohydrate processing occurs within the cell, necessitating the development of cell-permeable ligands to investigate the roles of carbohydrate-utilizing enzymes within an organism. Genetic knockout animals and human genetics have uncovered new and unexpected roles for enzymes that participate in glycoconjugate biosynthesis [6]. Some of these enzymes, however, are essential for early development. Cell-permeable compounds that block these enzymes would offer a number of benefits, including the ability to exert temporal control over enzyme function [67]. One of the most efficient ways to identify "cell-permeable" ligands is through the utilization of high-throughput screens. Identification of inhibitors through this method has been hampered by the lack of effective assays.
Recently, however, several such assays have been developed for the study of carbohydrate-enzyme interactions. As mentioned previously, carbohydrate microarrays can be employed for inhibitor identification. Indeed, they were recently used by Wong and coworkers to identify fucosyltransferase inhibitors from a small library (85 compounds) of triazole-containing compounds [46, 47]. Kiessling [68] and Walker [69] have reported high-throughput binding assays that use fluorescence polarization to facilitate the identification of ligands for uridine 5'-diphosphate-galactopyranose mutase (UGM) and MurG, enzymes that utilize nucleotide-sugar substrates and are involved in bacterial cell wall biosynthesis. These assays were used to screen large commercially available small molecule libraries (~16 000 and ~49 000 members, respectively). The availability of data from high-throughput screens such as these may lead to the identification of key scaffolds for inhibitor design. Such information will guide the development of effective probes for glycobiology.
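The readout of a fluorescence polarization binding assay can be illustrated with a short Python sketch. It uses the standard definitions of polarization and anisotropy from the emission intensities measured parallel and perpendicular to the excitation polarization, and it estimates the fraction of fluorescent probe bound from the observed anisotropy. The function names and all numerical values are hypothetical, and the fraction-bound estimate assumes that the probe's brightness does not change on binding; the sketch is not the protocol used in the cited screens.

```python
def polarization(i_par: float, i_perp: float) -> float:
    """Polarization P = (I_par - I_perp) / (I_par + I_perp)."""
    return (i_par - i_perp) / (i_par + i_perp)


def anisotropy(i_par: float, i_perp: float) -> float:
    """Anisotropy r = (I_par - I_perp) / (I_par + 2 * I_perp)."""
    return (i_par - i_perp) / (i_par + 2.0 * i_perp)


def fraction_probe_bound(r_obs: float, r_free: float, r_bound: float) -> float:
    """Fraction of probe bound, by linear interpolation of anisotropy.

    Assumes the fluorescence intensity of the probe is the same in the
    free and bound states, so anisotropies add in proportion to population.
    """
    if r_bound == r_free:
        raise ValueError("r_bound and r_free must differ")
    return (r_obs - r_free) / (r_bound - r_free)


if __name__ == "__main__":
    # Hypothetical intensities (arbitrary units) for one assay well.
    i_par, i_perp = 3.0, 1.0
    print("P =", polarization(i_par, i_perp))
    print("r =", anisotropy(i_par, i_perp))
    # Hypothetical reference anisotropies for free and fully bound probe.
    print("fraction bound =", fraction_probe_bound(0.15, 0.05, 0.25))
```

In a competition format, a compound that displaces the probe from the enzyme lowers the observed anisotropy toward the free-probe value, which is what makes the measurement suitable for plate-based screening.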
11.1.2.3 Inhibition Strategies
Given the barriers to identifying inhibitors of protein-carbohydrate interactions and carbohydrate-modifying enzymes, it is perhaps not surprising that most inhibitors studied to date are analogs of natural carbohydrate substrates. Inhibitors based on the sugar scaffold, or "carbohydrate-derived" glycomimetics, have been used extensively to explore saccharide binding events [66]. Many incorporate structural alterations to improve their affinity or stability over the natural sugar substrates (Fig. 11.1-1). Common strategies involve the removal of unnecessary functional groups (e.g., hydroxyl groups) or the addition of hydrophobic or charged groups to alter the polarity of the ligand and, thereby, facilitate additional interactions. Other designs incorporate changes in the pyranose ring to afford compounds with enhanced stability or altered electronic properties, such as imino sugars. Glycosidic linkages have also been replaced with a carbon or sulfur to yield a more stable substrate. Rarely do these kinds of changes result in high potency inhibitors. Still, multiple iterative rounds of synthesis, testing, and redesign have resulted in the generation of effective ligands. An alternative approach to the use of monomeric sugar mimics is to employ multivalent displays of either natural saccharides or glycomimetics. A number of reviews describe the generation and applications of multivalent ligands [28, 29, 40, 70-74]. Several key variables determine the activity of synthetic multivalent ligands: epitope valency and density, and the arrangement and flexibility of the binding epitopes [29, 75]. For example, well-defined, low-molecular-weight dimeric sugar displays have been employed, as have more complex, high-molecular-weight ligands that display hundreds of copies of a binding epitope. Structurally distinct scaffolds have been used to create multivalent carbohydrate displays, including dendrimers, linear polymers, neoglycoproteins, and polydisperse polymers.
Fig. 11.1-1 Common strategies for glycomimetic design.
The biological processes that have been studied using these ligands range from virus interaction with host cells [73, 76, 77] and bacterial toxin binding [78, 79] to adhesion of leukocytes to endothelial cells [30, 74, 80, 81]. Thus, multivalent ligands have been used to explore protein-carbohydrate interactions, and they often serve as potent inhibitors. The identification of monovalent ligands of modest affinity can be leveraged to create multivalent probes. As the aforementioned examples highlight, most efforts to inhibit either protein-carbohydrate interactions or the enzymes responsible for glycoconjugate biosynthesis have focused on the utilization of carbohydrates and their derivatives. Many of the available compounds, however, are not optimal for studies in cells or organisms because they have low binding affinity and selectivity, poor metabolic stability, and limited cell permeability. Additionally, the synthesis of carbohydrate derivatives can be difficult and labor intensive, and many iterations may be required to improve the activities of the typical low-affinity carbohydrate leads. Therefore, attention has recently turned to the design of compounds that are not derived from carbohydrate building blocks. This review highlights the development of noncarbohydrate-like ligands to study the physiological roles of carbohydrates. First, we discuss the approaches to examine lectins, receptors that use sugar-binding interactions to facilitate cell adhesion or cell signaling events. In conjunction with our overview of glycomimetics that block protein-carbohydrate interactions, we also discuss strategies to develop inhibitors of carbohydrate-processing enzymes; this section focuses on the enzymes involved in bacterial cell wall biosynthesis because it is an area in which many new advances have been made.
Enzymes that utilize sugars and synthesize glycoconjugates unique to pathogens have been identified, and cell-permeable inhibitors can be used to explore their biological roles or validate a potential therapeutic target. The scaffolds identified in this work may also be applicable to the development of probes of other prokaryotic and perhaps even eukaryotic carbohydrate-utilizing enzymes.
11.1.3 General Considerations: Cell-surface Carbohydrate Recognition Interactions
11.1.3.1 S-type Lectins, Galectins
Galectins, or S-type lectins, are one of the several classes of mammalian proteins that possess carbohydrate recognition domains (CRDs) [82, 83]. There are 14 known galectins (two families, galectin-1 and galectin-3) that possess one or more CRD [25]. Galectins bind β-galactosides in a shallow and highly solvent-exposed binding pocket (Fig. 11.1-2(a)) [84]. They are closely related in structure to plant lectins, such as concanavalin A. Most and possibly all galectins are functionally multivalent; members of this class either possess two CRDs within one peptide chain or form dimers or higher order oligomers. These lectins are involved in cell growth, differentiation and apoptosis, cell adhesion, chemoattraction, and cell migration [85]. Galectins are produced in the cytosol and then secreted, differentiating them from traditional lectins, which are often membrane-bound. These lectins lack a transmembrane segment or a secretion signal peptide, making their mechanism of secretion especially intriguing [86]. Consistent with their ability to occupy two different cellular locations, galectins appear to have important intracellular as well as extracellular functions. They have been implicated in RNA splicing, apoptosis, and the cell cycle [87, 88], although their intracellular roles remain obscure. Many questions about the roles and actions of galectins are unanswered, and glycomimetics could serve to resolve them. To probe galectin function, several groups have used mutagenesis [89, 90], X-ray crystallography [91, 92], and modeling [93] to design galectin inhibitors. It has been determined that the galactose 4- and 6-position hydroxyl groups form hydrogen bonds to the protein, while H1, H3, H4, and H5 form a hydrophobic patch that is in van der Waals contact with a tryptophan side chain. The remaining hydroxyl groups do not directly contact the protein, suggesting that these positions can be functionalized without the loss of activity.
Indeed, a structure of the galectin-3 and N-acetyllactosamine complex, which was determined by X-ray crystallography [84], suggests that derivatization of the galactose 3-OH and 4-OH positions could afford higher affinity ligands (Fig. 11.1-3) [52]. Since the 4-OH is involved in a critical interaction with the protein, it was hypothesized that functionalization at the C3-position would yield more effective inhibitors. This hypothesis is consistent with the affinity of some complexes of galectins and natural β-galactosides that possess carbohydrate residues at the 3-position [93].
Fig. 11.1-2 (a) Structure of S-type lectin, galectin-3, bound to N-acetyllactosamine and (b) C-type lectin, E-selectin, bound to sialyl LeX.
Fig. 11.1-3 Examination of galectin-3 bound to N-acetyllactosamine suggests that additional affinity may be gained by derivatization at C3.
Compounds with functional groups at this 3-position have been shown to have affinity for galectins. For example, Sörme et al. synthesized 12 N-acetyllactosamine (LacNAc, 1) derivatives with C3 functionalization. Specifically, they generated a 3'-deoxy-3'-aminolactosamine derivative, which could be further modified via N-acylation or N-sulfonylation reactions [52]. These compounds were tested for their ability to bind to galectin-3, a lectin proposed to participate in the formation of the "immunological synapse" between T cells and antigen-presenting cells [94]. Compounds with inhibitory activity were found, including a derivative approximately 50-fold more potent than LacNAc in a competitive ELISA (the most potent inhibitor, 2, is depicted in Fig. 11.1-4; IC50 = 4.4 µM) [52]. Structural data derived from the inhibitor-galectin-3 complex were later obtained and utilized along with a fluorescence polarization assay [35] to design and identify even more potent LacNAc derivatives (Kd ≥ 320 nM, Kd of 2 = 880 nM) [54]. The resulting compounds exhibit affinity that is more than 2 orders of magnitude tighter than the parent LacNAc. Although these compounds are carbohydrate derivatives rather than glycomimetics, they illustrate that sugar modification can be used to identify inhibitors with much higher potencies than the lead compound. As with most lectins, multivalency is an important component in galectin binding. Accordingly, several groups have generated polyvalent displays of lactose. Although these compounds were more potent than monovalent lactose, their binding was nearly equivalent when compared on the basis of lactose residue concentration [95, 96]. A recent study identified several peptidic ligands for a galectin [97, 98]; however, to date no other noncarbohydrate probes have been utilized for their study. Undoubtedly, such glycomimetics can further our understanding of galectin function.
Fig. 11.1-4 Natural substrate N-acetyllactosamine 1 and galectin inhibitor 2.
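The potency gains quoted for the LacNAc derivatives, and the comparison of polyvalent lactose displays on a per-residue basis, can be put on a common footing with a short Python sketch. It converts a fold improvement in affinity into a binding free energy difference using ΔΔG = -RT ln(fold), and it divides an observed potency enhancement by the ligand valency. The function names, the temperature of 298.15 K, and the IC50 values in the valency example are illustrative assumptions; treating a ratio of IC50 values as a ratio of dissociation constants is likewise an assumption that holds only when the assay conditions are identical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol K)


def ddg_from_fold(fold_improvement: float, temp_k: float = 298.15) -> float:
    """Free energy change (kcal/mol) for a given fold gain in affinity.

    A negative value means tighter binding than the parent compound.
    """
    if fold_improvement <= 0:
        raise ValueError("fold_improvement must be positive")
    return -R_KCAL_PER_MOL_K * temp_k * math.log(fold_improvement)


def per_residue_enhancement(ic50_mono: float, ic50_multi: float,
                            valency: int) -> float:
    """Potency gain of a multivalent ligand per displayed sugar residue.

    A value near 1 means the multivalent ligand is no better than the
    same concentration of monovalent sugar residues.
    """
    if valency < 1 or ic50_mono <= 0 or ic50_multi <= 0:
        raise ValueError("valency must be >= 1 and IC50 values positive")
    return (ic50_mono / ic50_multi) / valency


if __name__ == "__main__":
    # 50-fold gain reported for the most potent first-generation derivative.
    print(f"50-fold:  {ddg_from_fold(50):.2f} kcal/mol")
    # Two orders of magnitude, the lower bound quoted for later derivatives.
    print(f"100-fold: {ddg_from_fold(100):.2f} kcal/mol")
    # Hypothetical decavalent lactose display that is 10-fold more potent
    # per molecule than monovalent lactose (IC50 values are invented).
    print("per-residue gain:", per_residue_enhancement(1000.0, 100.0, 10))
```

Under these assumptions a 50-fold gain corresponds to about -2.3 kcal/mol, and a per-residue value close to 1 corresponds to the situation described for the polyvalent lactose displays, in which the apparent gain disappears once potency is expressed per lactose residue.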
11.1.3.2 C-type Lectins, Selectins
Many cell-surface interactions are mediated by selectins, which are members of a large class of Ca2+-dependent sugar-binding proteins known as the C-type lectins. There are a number (>70) of human proteins that contain C-type lectin-like domains (CTLDs). Many of these bind carbohydrates in a Ca2+-dependent manner, though some appear to bind proteins, lipids, and even sugar moieties in a Ca2+-independent manner [99]. CTLD-containing proteins that act via carbohydrate-binding include the selectins, mannose-binding proteins (MBPs) and dendritic cell-specific intercellular adhesion molecule-3-grabbing non-integrin (DC-SIGN), all of which are involved in immune system function. C-type lectins are produced either as secreted soluble proteins, such as MBPs, or as transmembrane proteins, such as the selectins and DC-SIGN. In the immune system, C-type lectins mediate both adhesion and pathogen recognition events. MBPs are involved in the recognition of pathogens [100]; selectins mediate cell-cell adhesion [101]; and DC-SIGN is involved in both processes [102]. Most glycomimetic inhibition strategies reported to date have focused on the selectins.
The selectins were first identified in 1989 as proteins that participate in the inflammatory immune response. Three members of this class have been identified; they are designated as E- [103], P- [104], and L-selectin [105] with reference to the cell type on which they were discovered (endothelium, platelets, and lymphocytes, respectively). The selectins facilitate migration of leukocytes into tissue (Fig. 11.1-5). Although neutrophil recruitment is necessary for an effective immune response, overrecruitment causes widespread tissue inflammation and damage and is the known cause of asthma, septic shock, psoriasis, and rheumatoid arthritis [106, 107]. As a result of their biological significance, many researchers have focused on identifying selectin inhibitors.
The selectins share many features. Each of the selectins has a similar domain structure; they are composed of an N-terminal Ca2+-dependent lectin domain, an epidermal growth factor repeat, and modules similar to those found in certain complement binding proteins [108]. In the CRD, human selectins share >50% homology. Similar to galectins, these proteins bind their substrates in shallow, solvent-exposed pockets (Fig. 11.1-2(b)) [109]; however, the tertiary structures of the selectins and the galectins are dramatically different, as is their carbohydrate recognition specificity. Selectins have been shown to bind sialyl LeX (sLeX) or the related sialyl Lea in vitro; yet in physiological settings, each selectin appears to recognize more complex glycoconjugates. Still, the finding that each selectin can recognize these tetrasaccharides has led to their use as blueprints for inhibitor design. Both monomeric and polymeric derivatives of sLeX have been generated [72, 74, 80]. Of these sLeX-based compounds, the oligomeric and polymeric inhibitors are the most potent.
They exhibit significant increases in activity compared to their monovalent counterparts [30, 110]. Despite these potencies, the search for high affinity monovalent inhibitors has continued. Specifically, the therapeutic value of selectin inhibitors has prompted considerable effort to develop more conventional "druglike" compounds that block these protein-carbohydrate interactions. Moreover, higher affinity monovalent ligands could be used to generate even more potent multivalent inhibitors.
Fig. 11.1-5 Selectins mediate the rolling of white blood cells, causing them to adhere to and then pass through the endothelium toward the site of infection.
11.1.4 Applications: Identification of Inhibitors of Protein-Carbohydrate Interactions
One of the first examples of the identification of non-carbohydrate inhibitors of a selectin came from Kondo and coworkers. Although structural data for E- or P-selectin bound to sLeX was unavailable at the time of their study, they utilized hypothesized interactions between E-selectin and a previously identified sLeX mimetic 3 [111] to devise a pharmacophore 4 (Fig. 11.1-6) [112]. They used this model to perform a high-throughput screen of a commercially available compound database in silico. These studies led first to the identification of lead structures and ultimately to the design of compound 5 (Fig. 11.1-7). Compound 5 was found to inhibit E-, P-, and L-selectin (IC50 values of 86, 6.1, and 30 µM, respectively, in an ELISA). In subsequent investigations, analogs of 5 were synthesized to develop structure-activity relationships, and several compounds with the ability to differentially inhibit specific selectins were identified [51]. It is not clear, however, whether compound 5 and its congeners are monomeric inhibitors of the selectins. Given their long hydrophobic alkyl substituents, compounds of this class may be acting as aggregates. Still, the results suggest that this general approach could be explored further.
Fig. 11.1-6 Comparison of the previously identified sLeX mimetic 3 and the pharmacophore design 4. Features labeled in the figure: hydrophobic interaction, ionic interaction, and calcium coordination.
Fig. 11.1-7 Potent inhibitor 5 of E-, P-, and L-selectins.
In an alternative approach, the Gravel research group utilized a cyclohexane scaffold 6 to generate a mimetic of sLeX (Fig. 11.1-8) [113]. As in the previous example, these experiments were performed prior to the publication of structural data for E- or P-selectin bound to sLeX. They utilized the unbound structures of these proteins to perform docking experiments with both the natural ligand and their proposed library of compounds. Pivotal to the design of their docking experiments was the hypothesis that interactions between the 2- and 3-OH groups on the fucose unit of sLeX and the calcium ion, and between the carboxylic acid group of the sialic acid moiety and Arg97, are essential for binding. Their modeling studies suggested that a bicyclic mimic such as compound 6, though significantly smaller than the natural ligand, would possess the features necessary to favorably interact with the receptor. Indeed, these molecules did have inhibitory activity comparable to the natural ligand. However, the authors found that both enantiomers (only one of which has a display of hydroxyl groups similar to that of fucose) had the same activity as measured by a cell-based competition assay (IC50 = 4.5-7.0 mM). This result, along with the elucidation of the structure of both P- and E-selectins bound to sLeX by X-ray crystallography [109], suggests that their model was only partially correct. Structural data confirm that interactions between the carboxylic acid moiety and Arg97 are important for binding. These data also indicate that it is the 3- and 4-OH groups that are important for substrate binding, not the hypothesized 2- and 3-OH groups. This difference likely explains the lack of specificity of compound 6 and its enantiomer.
Fig. 11.1-8 Bicyclohexyl mimetic 6 designed to probe E- and P-selectins.
As previously mentioned, high-throughput screening may lead to potent inhibitors of protein-carbohydrate interactions. Some success in the selectin field has been achieved by Slee et al., who identified several potent inhibitors of P-selectin by screening a library of compounds in an ELISA [53]. After initial lead identification, they performed modeling studies that suggested ligand modifications that would enhance the activity of their ligand. They ultimately identified a compound, 7, with very good P-selectin inhibitory activity (IC50 = 300 nM) (Fig. 11.1-9). What sites on P-selectin this ligand binds,
however, are not apparent. Given that the lectin interacts with glycosylated peptide sequences that contain sulfated tyrosine residues, compound 7 may compete with the peptide sequence. Interestingly, compound 7 bears some structural resemblance to the Kondo inhibitor 5, suggesting that this type of "trimodal" scaffold may be a general selectin inhibitor. As mentioned for the Kondo ligands, it is not clear whether this compound, with its long alkyl substituent, acts as a true monovalent inhibitor. Still, this compound is notable because it was also found to inhibit selectin-mediated rolling in vivo and dramatically reduce inflammation in a mouse peritonitis model.
Fig. 11.1-9 Potent inhibitor 7 of P-selectin and selectin-mediated rolling in vivo.
The vast majority of glycomimetic studies have been targeted toward one or two members of each lectin class. Strategies in which the same scaffold can be used to derive specific inhibitors of different members of a large class of proteins are even more powerful. Until recently, general scaffolds for inhibitors of protein-carbohydrate interactions had not been described. In contrast, peptidomimetic scaffolds such as benzodiazepines have been shown to be useful for generating a variety of agonists and antagonists to G-protein coupled receptors [114, 115]. Kiessling and coworkers sought to develop this type of privileged scaffold for use in generating glycomimetics. To ascertain whether such a strategy could be implemented, they targeted C-type lectins. Many C-type lectins bind oligosaccharides that possess a key carbohydrate residue with the axial-equatorial-equatorial hydroxyl orientation in mannose (and L-fucose). While these groups can be essential for binding, substitution at C1 and C6 of the mannosylated (or fucosylated) ligand often varies [116]. Thus, Schuster et al. utilized shikimic acid 8 as a building block to synthesize mannose (fucose)-like compounds 9 (Fig. 11.1-10) [55].
Fig. 11.1-10 Shikimic acid 8 can be functionalized to yield a mannose mimetic 9.
Functionalization of shikimic acid through the conjugate addition of a nucleophile (i.e., a thiolate) generates a structure that possesses the desired hydroxyl group orientation, while introducing a site of diversity. Further library diversification can be achieved by varying the amino acid substituent at the acid moiety (R1), adding dithiols (R2), and subsequently functionalizing the resulting free thiol with alkyl or benzyl bromides (R3). To test this strategy, they synthesized a focused library of 192 compounds, which was screened for inhibition of MBP. From this small library, they identified 10 compounds with activity comparable to or better than the known ligand, α-methyl mannopyranoside (IC50 = 4-14 mM). The high hit rate underscores the utility of this strategy.
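The three diversification sites multiply, which is how a modest set of building blocks yields a focused library. The sketch below enumerates such a library and computes the hit rate for the figures quoted above (10 hits among 192 compounds). The building-block names and the 4 × 6 × 8 split are hypothetical, chosen only so that the product equals 192; the text does not state how many reagents were used at each site.

```python
from itertools import product

# Hypothetical building-block sets; only the total (192) comes from the text.
R1_AMINO_ACIDS = [f"aa{i}" for i in range(1, 5)]     # 4 amino acid substituents
R2_DITHIOLS = [f"dithiol{i}" for i in range(1, 7)]   # 6 dithiols
R3_BROMIDES = [f"bromide{i}" for i in range(1, 9)]   # 8 alkyl or benzyl bromides


def enumerate_library(r1, r2, r3):
    """Return every (R1, R2, R3) combination on the shared core."""
    return list(product(r1, r2, r3))


def hit_rate(n_hits, n_screened):
    """Percentage of screened compounds scored as hits."""
    if n_screened <= 0:
        raise ValueError("n_screened must be positive")
    return 100.0 * n_hits / n_screened


library = enumerate_library(R1_AMINO_ACIDS, R2_DITHIOLS, R3_BROMIDES)
print(len(library))                          # size of the enumerated library
print(round(hit_rate(10, len(library)), 1))  # hit rate in percent
```

With these numbers the hit rate is about 5%, far higher than is typical for an unbiased screening collection, which is the point the authors make about a focused scaffold.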
11.1.5 Overview and Future Development: Inhibition of Protein-Carbohydrate Interactions
While several researchers have identified non-carbohydrate ligands for protein-carbohydrate interactions, this research area is still in its infancy. To date, most inhibitors of protein-carbohydrate interactions are based on the structure of the natural ligand. As highlighted in the above examples, rational design has facilitated the identification of effective inhibitors. Several research groups have developed non-carbohydrate probes of the selectins using design strategies aimed at mimicking key interactions between the natural ligand and the calcium ion. Nevertheless, the potent glycomimetics have features that suggest that they may not be functioning as monomeric inhibitors. Still, it is intriguing that the inhibitors identified in a high-throughput screen share common features with those developed through rational design. Together, these results emphasize the need to develop high-throughput screens that can be used to identify new scaffolds. One of the problems in optimizing glycomimetics is that it is often difficult to rapidly synthesize variants of the lead compounds. Thus, it is critical that additional privileged glycomimetic scaffolds be developed that can be readily diversified.
11.1.6 General Consideration: Inhibitors of Sugar-Nucleotide-binding Enzymes
Prokaryotes and eukaryotes devote significant resources to the biosynthesis of glycoconjugates. As a result of recent genome-sequencing projects, more than 7200 glycosyltransferase-related sequences have been identified. This accounts for about 1% of the open reading frames in a given organism [56]. Nucleotide-sugar substrates are used by organisms to make multisaccharide units or to perform posttranslational modifications. The enzymes that facilitate these processes primarily belong to the glycosyltransferase family; however, a number of other enzymes catalyze critical reactions that depend on these substrates. Given the many different known roles they play, studying the function and mechanism of these biosynthetic enzymes is of critical importance. The lack of cell-permeable inhibitors of these enzymes, however, is a barrier.
The importance of glycoconjugate biosynthesis is underscored by the number of human diseases that are associated with mutations in the biosynthetic machinery. For example, reduction in the activity of β-1,4-galactosyltransferase-T1 [117] appears to be a factor in rheumatoid arthritis, and loss of β-1,4-galactosyltransferase-T7 [118] activity has been implicated in progeroid-type Ehlers-Danlos syndrome [119]. Additionally, there are a growing number of congenital disorders of glycosylation (CDGs), multisystemic diseases caused by defects in the synthesis and processing of N-linked glycans [4, 5]. These glycans are involved in protein-carbohydrate interactions vital to normal function such as T-cell clustering through the galectins [94]. Moreover, N-linked glycoconjugates are thought to play crucial roles in protein folding, localization, and half-life [120, 121]. The study of N-glycan biosynthesis in mammals is complicated because when key glycosyltransferases are knocked out in mice, embryonic lethality can result [122, 123]. Thus, the temporal control offered by chemical inhibitors could have a major impact on our understanding of enzymes that mediate glycan biosynthesis.
There are several natural products that block glycoconjugate biosynthesis, including the N-glycosylation inhibitor tunicamycin 10 and glycosidase inhibitors such as castanospermine and deoxymannojirimycin (Fig. 11.1-11). These probes have proved valuable in a number of studies. Tunicamycin blocks a key transphosphorylation event required for N-glycan biosynthesis, which is initiated by the assembly of N-acetylglucosamines (GlcNAc) and a dolichol pyrophosphate [124]. Tunicamycin inhibits this assembly and has therefore been useful in the study of N-glycoprotein deficiency effects [125].
Fig. 11.1-11 Tunicamycin 10 is an inhibitor of N-glycan biosynthesis, while natural products such as 1-deoxymannojirimycin inhibit glycosidase function.
Most studies using this compound have been performed in cell culture, but there have been
several reports on the effects of tunicamycin treatment on sea urchin [126], Xenopus [127], and chick embryos [128], and recently Caenorhabditis elegans [129]. These studies indicate that tunicamycin has a dramatic effect during early development; thus, they highlight the utility of small molecule probes for investigating carbohydrate-processing enzymes. Similarly, the natural product glycosidase inhibitors can also be used to investigate glycoconjugate biosynthesis [130, 131]. These agents, such as deoxymannojirimycin, function as transition state inhibitors; their charged amino groups presumably mimic the charge distribution of an oxocarbenium ion-like transition state. Thus, nature has provided two general inhibitor strategies, both of which depend on carbohydrate derivatives. These have spawned many efforts to design bisubstrate inhibitors or transition state analogs. Still, many of the resulting inhibitors have low cell permeability or lack the specificity required to target a single enzyme. Thus, new inhibitor strategies are being sought.
Many of these new inhibitor strategies have emerged from studying glycoconjugate biosynthesis in prokaryotes. Indeed, sugar-nucleotide-utilizing enzymes are of critical importance in prokaryotes. All bacteria (gram-positive, gram-negative, and acid-fast or mycobacteria) utilize sugar-nucleotide substrates for the construction of their cell walls. The carbohydrate-containing cell wall acts as a formidable barrier against cellular destruction, and compromised structural integrity can result in the loss of cell viability [132, 133]. Despite the differing compositions of these cell walls, many of the same enzymes are involved in their biosynthesis. Peptidoglycan is the best studied of these crucial cell wall components and its synthesis is a common target of antibiotics [134, 135].
Unfortunately, traditional antibiotics have begun to lose their effectiveness due to the emergence of antibiotic-resistant strains of many gram-positive, gram-negative, and mycobacterial species [134-136]. Common antibiotics do not target enzymes that mediate carbohydrate biosynthesis. Thus, the development of ligands to study and inhibit these enzymes may facilitate the development of new antimicrobial agents. The structurally complex mycobacterial cell wall provides an example of the importance of carbohydrate residues (Fig. 11.1-12). In mycobacteria, the inner lipid membrane is attached to a peptidoglycan layer, which is composed of a complex structure of peptides and sugar moieties. The peptidoglycan layer is tethered to an arabinogalactan layer through a rhamnose-GlcNAc sugar linker. This rhamnose-GlcNAc disaccharide is found only in bacteria and its biosynthetic enzymes have been shown to be essential for mycobacterial growth [134, 137]. The arabinogalactan layer also contains sugar residues that have not been found in humans, specifically arabinofuranose and galactofuranose [135, 138]. The enzymes involved in synthesis of the arabinogalactan are necessary for mycobacterial viability [139-141]. Thus, development of inhibitors of the key biosynthetic enzymes would provide valuable therapeutic leads and useful probes of cell wall biosynthesis.
Fig. 11.1-12 Mycobacterial cell wall components.
Efforts to generate inhibitors of glycan biosynthesis suggest that agents that function in this capacity can be identified [66, 142]. Interestingly, much of the binding affinity for the natural sugar-nucleotide substrate arises from nucleotide-protein interactions. Thus, inhibitors that exploit this binding region should be more potent. Here, we will describe recent advances in understanding and inhibiting enzymes that use nucleotide-sugar building blocks for glycoconjugate biosynthesis. To date, several efforts have focused on enzymes critical for bacterial cell wall assembly. These general strategies will undoubtedly be useful for investigating eukaryotic glycoconjugate biosynthesis pathways as well.
11.1.7 Applications: Identification of Inhibitors of Sugar-Nucleotide-binding Enzymes
11.1.7.1 Probe Identification through High-throughput Screening
The recent development of high-throughput screens for several sugar-nucleotide-processing enzymes has aided in finding inhibitors of glycan biosynthesis. An example of the utility of this approach is highlighted by the identification of compounds that block members of the Mur family that use nucleotide-sugar substrates. The Mur family of enzymes mediates peptidoglycan construction in eubacteria (Fig. 11.1-13). MurC, MurD, MurE, and MurF are involved in the formation of peptide bonds, and inhibitors for these enzymes have been reviewed recently [143, 144]. In this chapter, we will focus on the enzymes that utilize sugar-nucleotide substrates including MurA (UDP-N-acetylglucosamine enolpyruvyltransferase), MurB (UDP-N-acetylenolpyruvylglucosamine reductase), and MurG (glycosyltransferase).
Fig. 11.1-13 Peptidoglycan synthesis proceeds through a complex set of reactions, primarily glycosyl-transfers and amide bond-forming reactions.
MurA is the enzyme responsible for the first committed step in bacterial cell wall biosynthesis, catalyzing the transfer of phosphoenolpyruvate (PEP) to position 3 of UDP-N-acetylglucosamine (Fig. 11.1-13). As with tunicamycin,
nature has generated an inhibitor of this enzyme: the natural product antibiotic, fosfomycin 11. Fosfomycin covalently labels a cysteine residue in the PEP binding site of MurA and renders the enzyme inactive [145]. A structure of the MurA-fosfomycin complex, determined by X-ray crystallographic analysis, has provided valuable information about the complex [146]. Moreover, it has been utilized in the design of inhibitors of this sugar-nucleotide-processing enzyme [143, 146]. Several research groups have reported the identification of non-carbohydrate inhibitors of MurA [146-148]. For example, Bush and coworkers identified inhibitors using a high-throughput screen of a library of compounds in an assay that monitored formation of inorganic phosphate (Fig. 11.1-13) [148]. Three of the identified inhibitors, 12-14, exhibit lower IC50 values than does fosfomycin (Fig. 11.1-14). Modeling and inhibition studies were used to determine the likely binding mode of these compounds. These data suggest that the identified inhibitors are noncovalently binding at or near the PEP binding site, leaving the sugar site unoccupied. These compounds are not glycomimetics, yet they suggest that targeting unique features of the sugar-nucleotide-binding site can lead to potent inhibitors.
High-throughput screening techniques have also been utilized to identify inhibitors of MurG, a glycosyltransferase that mediates one of the final steps of peptidoglycan synthesis (Fig. 11.1-13) [69]. Rather than assaying for activity, the Walker group screened for compounds that could inhibit binding of the substrate UDP-GlcNAc. With a fluorescence polarization assay, they tested a commercially available library of approximately 49 000 druglike compounds, and identified several MurG inhibitors containing a 2-thioxo-4-thiazolidinone core (15, Fig. 11.1-15, Ki = 1.3 μM, IC50 = 1.4 μM) [69, 149].
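A fluorescence polarization binding screen of this kind rests on a simple readout: a fluorescent ligand tumbles slowly when bound to protein (high polarization) and rapidly once a competitor displaces it (low polarization). The sketch below uses the standard definitions of polarization and anisotropy. The function names, the example intensities, and the omission of an instrument correction (the G-factor is taken as 1) are assumptions made for illustration and are not taken from the cited work.

```python
def polarization(i_parallel, i_perpendicular):
    """Polarization P = (I_par - I_perp) / (I_par + I_perp), with G-factor = 1."""
    total = i_parallel + i_perpendicular
    if total <= 0:
        raise ValueError("summed intensity must be positive")
    return (i_parallel - i_perpendicular) / total


def anisotropy(i_parallel, i_perpendicular):
    """Anisotropy r = (I_par - I_perp) / (I_par + 2 * I_perp), with G-factor = 1."""
    total = i_parallel + 2 * i_perpendicular
    if total <= 0:
        raise ValueError("summed intensity must be positive")
    return (i_parallel - i_perpendicular) / total


def percent_displacement(p_sample, p_bound, p_free):
    """Percentage of probe displaced, scaled between bound and free controls.

    Assumes the signal varies linearly between the two controls.
    """
    if p_bound == p_free:
        raise ValueError("bound and free controls must differ")
    return 100.0 * (p_bound - p_sample) / (p_bound - p_free)


# Hypothetical readings: a well whose polarization falls halfway between the
# protein-bound control and the free-probe control.
p_bound = polarization(3.0, 1.0)   # 0.5
p_free = polarization(1.2, 1.0)    # roughly 0.09
p_sample = (p_bound + p_free) / 2
print(percent_displacement(p_sample, p_bound, p_free))
```

Because the assay reports displacement of the labeled substrate rather than turnover, it can find compounds that occupy the substrate site without requiring a coupled activity readout.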
Fig. 11.1-14 Fosfomycin 11 and several other MurA inhibitors 12-14.
Fig. 11.1-15 MurG inhibitor 15 identified by Walker and coworkers.
Using the MurG structure determined by X-ray crystallography [150], they modeled the complexes to explore the possible binding mode(s) of
this scaffold. The authors suggest that the thiazolidinone heterocycle could mimic the diphosphate moiety by engaging in hydrogen-bonding interactions. Presumably, the carbonyl (and carbonyl-like) moieties of the heterocycle interact with hydrogen-bond donors on the protein. The studies also suggest that the thiazolidinone substituents interact with the uridine and sugar-binding regions of the protein. More recently, these inhibitors have been shown to selectively block MurG over several other enzymes that utilize similar or identical substrates [149].
Inhibitors of enzymes that use sugar-nucleotide substrates have also been found in the pathway that leads to arabinogalactan synthesis in mycobacteria [151, 152]. Arabinogalactan is composed of two sugars derived from the donors, UDP-arabinofuranose and UDP-galactofuranose (UDP-Galf). The biosynthetic donor of galactofuranose moieties (UDP-Galf) is synthesized by UGM and the Galf-containing oligosaccharides are assembled by the putative enzyme, UDP-galactofuranosyltransferase. Most efforts to explore Galf incorporation have focused on UGM. UGM is responsible for the isomerization of the thermodynamically favored UDP-galactopyranose to the less favored UDP-galactofuranose (Fig. 11.1-16). Sugar-based probes have been employed to study both UGM [153-156] and the transferase [157], but only recently have non-carbohydrate inhibitors been identified. The Bertozzi and McNeil groups used a design strategy that appears similar to that used by nature for tunicamycin. Specifically, they modified a uridine with substituents.
Fig. 11.1-16 UGM is responsible for the isomerization of UDP-galactopyranose (93%) to UDP-galactofuranose (7%).
Fig. 11.1-17 Recently identified inhibitors of UGM.
From their uridine-based library,
they identified several inhibitors of UGM [158]. Although the results are promising, the initial hits did not appear to be cell permeable. It will be interesting to explore the specificity of such ligands. Although it is not clear whether one can achieve selectivity against other UDP-sugar binding enzymes with this strategy, tunicamycin acts selectively on its target. The Kiessling group pursued an alternative approach. Although the assay used by Bertozzi and McNeil monitored UGM activity, a high-throughput fluorescence polarization binding assay was used by Soltero-Higgin et al. to identify UGM inhibitors. As with the Walker screen, the hits identified contain a thiazolidinone or related nitrogen-containing heterocyclic core (16-18, Kd ≥ 4.0 μM, IC50 ≥ 1.6 μM) (Fig. 11.1-17) [68]. It is intriguing that these compounds have structural features similar to those identified for MurG. These shared features include the five-membered ring heterocycle and the 1,3-arrangement of the substituents. This display of functionality likely facilitates interactions with the sugar-nucleotide-binding regions of the protein. Unlike the most potent MurG lead 15, which displays one aromatic and one aliphatic substituent, all the UGM inhibitors contain
two aromatic substituents. Both MurG and UGM inhibitors possess an aromatic group, which may serve as a uracil mimetic. The second aromatic functionality in the UGM inhibitors may act as a sugar mimic and/or participate in hydrophobic interactions with the enzyme's cofactor, flavin adenine dinucleotide (FAD). Soltero-Higgin et al. found that their most potent compound shows selectivity; it blocks UGM activity but has little or no effect on an α-1,3-galactosyltransferase. These leads, along with recently reported information about the mechanism of this enzyme [159], can be used to develop even more potent inhibitors.
Several inhibitors of the rhamnose synthetic pathway, a recently validated mycobacterial target [137], have also been identified through the use of a high-throughput screen [160]. McNeil and coworkers developed a microtiter plate-based assay and examined 8000 commercially available compounds. They identified 11 compounds that were active against one or more of the enzymes in the rhamnose biosynthetic pathway (RmlB, RmlC, and RmlD, percentage inhibition at 10 μM = 39-97%). Additionally, four of these molecules were found to hinder Mycobacterium tuberculosis growth in culture (minimum inhibitory concentration or MIC = 16-128 μg mL⁻¹). One of the most intriguing findings of these studies is the structural similarity of the inhibitors identified. Interestingly, two of the four molecules contained the same thiazolidinone core found by Walker and Kiessling (15 and 16).
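Single-concentration screening data such as the 39-97% inhibition at 10 μM quoted above can be placed on a rough potency scale. The sketch below assumes simple one-site inhibition with a Hill slope of 1, so that fractional inhibition equals [I] / ([I] + IC50). That model and the function name are assumptions made here for illustration; they are not stated in the source, and full dose-response curves would be needed to obtain actual IC50 values.

```python
def apparent_ic50(inhibitor_conc, percent_inhibition):
    """Estimate IC50 from one data point, assuming a Hill slope of 1.

    Rearranges  inhibition = [I] / ([I] + IC50)  to
    IC50 = [I] * (100 - %inhibition) / %inhibition.
    The result has the same units as inhibitor_conc.
    """
    if not 0 < percent_inhibition < 100:
        raise ValueError("percent_inhibition must be between 0 and 100 (exclusive)")
    return inhibitor_conc * (100.0 - percent_inhibition) / percent_inhibition


# Range reported for the screening hits at 10 uM.
for pct in (39, 97):
    print(pct, round(apparent_ic50(10.0, pct), 2))
```

Under this assumption the hits would span apparent IC50 values from roughly 0.3 μM to 16 μM, which should be read only as an order-of-magnitude guide.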
11.1.7.2 Design of Effective Probes
The heterocycles identified as inhibitors of sugar-nucleotide-binding enzymes are similar to those identified by researchers employing design strategies. Andres et al. successfully utilized known structural information on MurB (Fig. 11.1-13) complexed to its substrate, enolpyruvyl uridine diphosphate N-acetylglucosamine (EP-UNAG), to design effective inhibitors [161]. Their design incorporates a heterocyclic core decorated with three substituents (19). They hypothesized that the acid functionality would provide ionic interactions equivalent to that of the natural diphosphate substrate. Using an assay in which enzyme activity was monitored by spectrophotometric detection of cofactor (NADPH) consumption, they identified six inhibitors. The most effective of these (19, Fig. 11.1-18) has good potency, IC50 = 7.7 μM. This success attracted the attention of other research groups, prompting the development of several structurally related inhibitors. Snyder and coworkers utilized the best hit from the aforementioned study (19) to design compounds with a more rigid core structure (20) [162]. The authors point out that the scaffold used by Andres et al. was generated as a mixture of four diastereomers. They theorize that utilization of bioisosteric imidazolinone analogs, a core structure that does not contain any stereogenic centers, might provide compounds with potent activity without the difficulties associated with the biological analysis of diastereomeric mixtures.
Fig. 11.1-18 Potent MurB inhibitors developed by Walsh (19) and Snyder (20).
The resulting compounds not only had high inhibitory activity against MurB but also showed significant whole cell
antibacterial activity, which compound 19 did not have. One of the most potent inhibitors 20 (IC50 = 15 μM, MIC = 4 μg mL⁻¹) is depicted in Fig. 11.1-18. To identify probes of rhamnose biosynthesis, Lee and coworkers developed an in silico library of 3888 compounds that were based on heterocycle 19. The authors selected RmlC, as they believed it to be the best drug target in this biosynthetic cascade. It has high substrate specificity, a unique structure, and lacks a cofactor binding site. They docked these compounds into the active site of RmlC and selected compounds with the best affinity (the top 5%) for synthesis. They reported the synthesis of 47 of the 144 prospects (each of the 47 compounds was synthesized as the esterified and free acid forms, for example, 21 and 22 in Fig. 11.1-19). Although they did not find any compounds that potently inhibit bacterial growth, they were able to identify molecules 21 and 22 that can differentiate between two similar enzymes, RmlC and RmlD (Fig. 11.1-19) [163]. This result provides additional evidence that selective inhibitors of nucleotide-sugar-processing enzymes can be discovered.
Fig. 11.1-19 Inhibitors of the rhamnose biosynthetic pathway.
Fig. 11.1-20 The most potent urea-containing inhibitor (23) of MurA and B and bacterial growth.
To identify inhibitors of several Mur enzymes, Mansour and coworkers synthesized a small library (~50 members) of urea- or carbonate-containing
compounds. They discovered several effective in vitro inhibitors of both MurA and MurB [164]. Moreover, some compounds showed good antimicrobial activity against several gram-positive bacteria. The core of each of the most potent inhibitors contains a urea moiety (the most potent inhibitor, 23, is depicted in Fig. 11.1-20, MIC = 0.5-64 μg mL⁻¹, IC50 for MurA > 25 μg mL⁻¹, MurB = 19 μg mL⁻¹) [164]. Modeling studies (using MurB structural data) suggest that these compounds occupy regions of the binding site spanned by both the nucleotide and sugar portions of the substrate. The authors propose that the urea occupies the phosphate-binding region and that a strong hydrogen bond is formed between the carbonyl oxygen and an active-site lysine. Additionally, they suggest that the two aromatic moieties could be occupying the sugar and the nucleobase binding sites. Structural data are needed to determine how these compounds are oriented within the binding site.
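The potencies for these urea-containing compounds are quoted in mass units (μg mL⁻¹), whereas several other inhibitors in this section are described in molar units (μM), so a comparison requires the molecular weight. A minimal conversion sketch follows. The molecular weight used in the example is a placeholder, since the text does not give one for compound 23, and the function name is likewise an illustrative choice.

```python
def ug_per_ml_to_micromolar(conc_ug_per_ml, mol_weight_g_per_mol):
    """Convert a mass concentration (ug/mL) to a molar concentration (uM).

    ug/mL equals mg/L; dividing by g/mol gives mmol/L, and x1000 gives umol/L.
    """
    if mol_weight_g_per_mol <= 0:
        raise ValueError("molecular weight must be positive")
    return conc_ug_per_ml / mol_weight_g_per_mol * 1000.0


PLACEHOLDER_MW = 400.0  # g/mol; hypothetical, NOT the real value for compound 23

# MurB IC50 of 19 ug/mL expressed in uM for the placeholder molecular weight.
print(ug_per_ml_to_micromolar(19.0, PLACEHOLDER_MW))
```

For a compound of a few hundred daltons, a value of 19 μg mL⁻¹ corresponds to tens of micromolar, which is the scale on which the other MurB inhibitors are reported.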
11.1.8 Overview and Future Development: Inhibitors of Carbohydrate-processing Enzymes
Despite the relatively small number of studies that have identified noncarbohydrate inhibitors of sugar-nucleotide-processing enzymes, it is apparent that structural commonalities exist between these inhibitors (Fig. 11.1-21). Some authors have suggested that these core structures may be acting as electronic mimics of the diphosphate through hydrogen-bonding interactions with their protein-binding partners. It is also possible that these core elements are simply effective spatial mimics of the diphosphate moiety. The oriented display of substituents of these heterocyclic scaffolds appears to be conserved throughout the currently developed probes, suggesting that the spatial orientation enforced by these core elements is at least partially responsible for the inhibitory activity of these compounds. Undoubtedly, much will be learned from the continued pursuit of molecules based on these and similar core structures. While the identification of these core structures suggests a promising direction for generating inhibitors of glycan biosynthesis, it also suggests a potential problem. Specifically, given the aforementioned similarities between these probes, it may be difficult or impossible to achieve selectivity for targeting one enzyme over another. While this problem may arise, the current data suggest that selective inhibitors can be developed.
Fig. 11.1-21 Several structurally and/or electronically related scaffolds have been identified.
For example, despite
the large similarities between the MurG and UGM inhibitors presented here, both the Walker and Kiessling groups report selective inhibition of their target enzyme over related proteins [68, 149]. Thus, it seems likely that these common core structures can be diversified to yield selective inhibitors of many different sugar-nucleotide-utilizing enzymes. It is also possible that information acquired from the study of bacterial sugar-processing enzymes will provide clues for the development of probes for eukaryotic enzymes that mediate glycan biosynthesis. In addition to its role in bacterial cell wall biosynthesis, UGM is also found in eukaryotic parasites, such as Leishmania, and multicellular organisms, such as C. elegans [165]. Therefore, the thiazolidinone-based inhibitors identified for a bacterial UGM could be tested for efficacy in a eukaryotic system. It will be intriguing to determine whether these scaffolds or others will be identified as hits from screens with eukaryotic enzymes. We anticipate that with the advent of cell-permeable probes of glycan biosynthesis, a greater understanding of the roles of these enzymes in human disease will emerge.
11.1.9 Conclusion
Elucidating the biological roles of glycoconjugates is difficult. Through genetics, molecular biology, biochemistry, and chemistry, compelling evidence has emerged that glycoconjugates control fundamental processes ranging from developmental patterning [166] to immune system function [167]. Despite the power of current tools, inhibitors that can be used to explore key interactions or biosynthetic pathways are largely lacking from our armamentarium. Still, significant progress has been made toward the identification of potent inhibitors of glycan biosynthesis and their utilization for understanding carbohydrate-binding proteins and enzymes. Key elements enabling this progress are the development of effective high-throughput assays and advances in chemical syntheses, which provide access to defined carbohydrate substrates. It is intriguing that common inhibitor structures have emerged from these studies, suggesting that some scaffolds may be well suited to occupy lectin or nucleotide-sugar-binding sites. Undoubtedly, additional scaffolds will be uncovered as more targets are investigated. We envision that the chemical probes that result will provide insight into the biological roles of glycoconjugates.
References
1. G.E. Ritchie, B.E. Moffatt, R.B. Sim, B.P. Morgan, R.A. Dwek, P.M. Rudd, Glycosylation and the complement system, Chem. Rev. 2002, 102, 305-319.
2. C.R. Bertozzi, L.L. Kiessling, Chemical glycobiology, Science 2001, 291, 2357-2364.
3. T. Feizi, Carbohydrate-mediated recognition systems in innate immunity, Immunol. Rev. 2000, 173, 79-88.
4. S. Grunewald, G. Matthijs, J. Jaeken, Congenital disorders of glycosylation: a review, Pediatr. Res. 2002, 52, 618-624.
5. H.H. Freeze, Human disorders in N-glycosylation and animal models, Biochim. Biophys. Acta 2002, 1573, 388-393.
6. J.B. Lowe, J.D. Marth, A genetic approach to mammalian glycan function, Annu. Rev. Biochem. 2003, 72, 643-691.
7. M.A. Schmidt, L.W. Riley, I. Benz, Sweet new world: glycoproteins in bacterial pathogens, Trends Microbiol. 2003, 11, 554-561.
8. A. Dell, H.R. Morris, Glycoprotein structure determination by mass spectrometry, Science 2001, 291, 2351-2356.
9. J. Zaia, Mass spectrometry of oligosaccharides, Mass Spectrom. Rev. 2004, 23, 161-227.
10. A. Holeman, P.H. Seeberger, Carbohydrate diversity: synthesis of glycoconjugates and complex carbohydrates, Curr. Opin. Biotechnol. 2004, 15, 615-622.
11. S.J. Keding, S.J. Danishefsky, Prospects for total synthesis: a vision for a totally synthetic vaccine targeting epithelial tumors, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 11937-11942.
12. S. Hanson, M. Best, M.C. Bryan, C.-H. Wong, Chemoenzymatic synthesis of oligosaccharides and glycoproteins, Trends Biochem. Sci. 2004, 29, 656-663.
13. D. Kahne, Combinatorial approaches to carbohydrates, Curr. Opin. Chem. Biol. 1997, 1, 130-135.
14. P. Sears, C.-H. Wong, Toward automated synthesis of oligosaccharides and glycoproteins, Science 2001, 291, 2344-2350.
15. C. Leimkuhler, Z. Chen, R.G. Kruger, M. Oberthur, W. Lu, C.T. Walsh, D. Kahne, Glycosylation of glycopeptides: a comparison of chemoenzymatic and chemical methods, Tetrahedron: Asymmetry 2005, 16, 599-603.
16. P. Mowery, Z.Q. Yang, E.J. Gordon, O. Dwir, A.G. Spencer, R. Alon, L.L. Kiessling, Synthetic glycoprotein mimics inhibit L-selectin-mediated rolling and promote L-selectin shedding, Chem. Biol. 2004, 11, 725-732.
17. M.J. Grogan, M.R. Pratt, L.A. Marcaurelle, C.R. Bertozzi, Homogeneous glycopeptides and glycoproteins for biological investigation, Annu. Rev. Biochem. 2002, 71, 593-634.
18. Y. He, R.J. Hinklin, J. Chang, L.L. Kiessling, Stereoselective N-glycosylation by Staudinger ligation, Org. Lett. 2004, 6, 4479-4482.
19. D. Macmillan, A.M. Daines, Recent developments in the synthesis and discovery of oligosaccharides and glycoconjugates for the treatment of disease, Curr. Med. Chem. 2003, 10, 2733-2773.
20. W. Zhang, Fluorous tagging strategy for solution-phase synthesis of small molecules, peptides and oligosaccharides, Curr. Opin. Drug. Discov. 2004, 7, 2269-2272.
21. T. Feizi, W.G. Chai, Oligosaccharide microarrays to decipher the glyco code, Nat. Rev. Mol. Cell Biol. 2004, 5, 582-588.
22. I. Shin, S. Park, M.R. Lee, Carbohydrate microarrays: an advanced technology for functional studies of glycans, Chem. Eur. J. 2005, 11, 2894-2901.
23. D.M. Ratner, E.W. Adams, J. Su, B.R. O’Keefe, M. Mrksich, P.H. Seeberger, Probing protein-carbohydrate interactions with microarrays of synthetic oligosaccharides, Chembiochem 2004, 5, 379-383.
24. O. Blixt, S. Head, T. Mondala, C. Scanlan, M.E. Huflejt, R. Alvarez, M.C. Bryan, F. Fazio, D. Calarese, J. Stevens, N. Razi, D.J. Stevens, J.J. Skehel, I. van Die, D.R. Burton, I.A. Wilson, R. Cummings, N. Bovin, C.-H. Wong, J.C. Paulson, Printed covalent glycan array for ligand profiling of diverse glycan binding proteins, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 17033-17038.
25. Y.C. Lee, R.T. Lee, Carbohydrate-protein interactions: basis of glycobiology, Acc. Chem. Res. 1995, 28, 321-327.
26. E.J. Toone, Structure and energetics of protein carbohydrate complexes, Curr. Opin. Struct. Biol. 1994, 4, 719-728.
27. L.L. Kiessling, N.L. Pohl, Strength in numbers: non-natural polyvalent carbohydrate derivatives, Chem. Biol. 1996, 3, 71-77.
28. R. Roy, Syntheses and some applications of chemically defined multivalent glycoconjugates, Curr. Opin. Struct. Biol. 1996, 6, 692-702.
29. B.E. Collins, J.C. Paulson, Cell surface biology mediated by low affinity multivalent protein-glycan interactions, Curr. Opin. Chem. Biol. 2004, 8, 617-625.
30. W.J. Sanders, E.J. Gordon, O. Dwir, P.J. Beck, R. Alon, L.L. Kiessling, Inhibition of L-selectin-mediated leukocyte rolling by synthetic glycoprotein mimics, J. Biol. Chem. 1999, 274, 5271-5278.
31. K. Kakehi, M. Oda, M. Kinoshita, Fluorescence polarization: analysis of carbohydrate-protein interaction, Anal. Biochem. 2001, 297, 111-122.
32. E.G. Weinhold, J.R. Knowles, Design and evaluation of a tightly binding fluorescent ligand for influenza A hemagglutinin, J. Am. Chem. Soc. 1992, 114, 9270-9275.
33. G.S. Jacob, C. Kirmaier, S.Z. Abbas, S.C. Howard, C.N. Steininger, J.K. Welply, P. Scudder, Binding of sialyl Lewis X to E-selectin as measured by fluorescence polarization, Biochemistry 1995, 34, 1210-1217.
34. R.V. Weatherman, L.L. Kiessling, Fluorescence anisotropy assays reveal affinities of C- and O-glycosides for concanavalin A, J. Org. Chem. 1996, 61, 534-538.
35. P. Sorme, B. Kahl-Knutsson, M. Huflejt, U.J. Nilsson, H. Leffler, Fluorescence polarization as an analytical tool to evaluate galectin-ligand interactions, Anal. Biochem. 2004, 334, 36-47.
36. C.T. Oberg, S. Carlsson, E. Fillion, H. Leffler, U.J. Nilsson, Efficient and expedient two-step pyranose-retaining fluorescein conjugation of complex reducing oligosaccharides: galectin oligosaccharide specificity studies in a fluorescence polarization assay, Bioconjugate Chem. 2003, 14, 1289-1297.
37. M. Mizuno, M. Noguchi, T. Imai, T. Motoyoski, T. Inazu, Interaction assay of oligosaccharide with lectin using glycosylasparagine, Bioorg. Med. Chem. Lett. 2004, 14, 485-490.
38. E.A. Smith, W.D. Thomas, L.L. Kiessling, R.M. Corn, Surface plasmon resonance imaging studies of protein-carbohydrate interactions, J. Am. Chem. Soc. 2003, 125, 6140-6148.
39. B.T. Houseman, M. Mrksich, Carbohydrate arrays for the evaluation of protein binding and enzymatic modification, Chem. Biol. 2002, 9, 443-454.
40. D.A. Mann, L.L. Kiessling, in Glycochemistry: Principles, Synthesis, and Applications, 1st ed., (Eds.: P.G. Wang, C.R. Bertozzi), Marcel Dekker, New York, 2001, pp. 221-275.
41. D.M. Ratner, E.W. Adams, M.D. Disney, P.H. Seeberger, Tools for glycomics: mapping interactions of carbohydrates in biological systems, Chembiochem 2004, 5, 1375-1383.
42. E.W. Adams, D.M. Ratner, H.R. Bokesch, J.B. McMahon, B.R.
References I 6 6 1
43.
44.
45.
46.
47.
48.
49.
50.
51.
O’Keefe, P.H. Seeberger, Oligosaccharide and glycoprotein microarrays as tools in HIV glycobiology: glycan-dependent gpl20/protein interactions, Chem. Bid. 2004, 11, 875-881. S. Fukui, T. Feizi, C. Galustian, A.M. Lawson, W. Chai, Oligosaccharide microarrays for high-throughput detection and specificity assignments of carbohydrate-protein interactions, Nut. Biotechnol. 2002, 20, 1011-1017. S. Park, M.-r. Lee, S.-J. Pyo, I. Shin, Carbohydrate chips for studying high-throughput carbohydrateprotein interactions, /. Am. Chem. SOC.2004, 126,4812-4819. T. Feizi, F. Fazio, W. Chai, C.-H. Wong, Carbohydrate microarrays-a new set of technologies at the frontiers of glycomics, Cum. Opin. Struct. Biol. 2003, 13, 637-645. M.C. Bryan, L.V. Lee, C.-H. Wong, High-throughput identification of fucosyltransferase inhibitors using carbohydrate microarrays, Bioorg. Med. Chem. Lett. 2004, 14,3185-3188. F. Fazio, M.C. Bryan, 0. Blixt, J.C. Paulson, C.-H. Wong, Synthesis of sugar arrays in microtiter plate,]. Am. Chem. SOC.2002, 124, 14397-14402. H.C. Hang, C. Yu, M.R. Pratt, C.R. Bertozzi, Probing glycosyltransferase activities with the staudinger ligation, /. Am. Chem. Soc. 2004, t26,6-7. L. Nimrichter, A. Gargir, M. Gortler, R.T. Altstock, A. Shtevi, 0. Weisshaus, E. Fire, N. Dotan, R.L. Schnaar, Intact cell adhesion of glycan microarrays, Glycobioloa 2004, 14,197-203. M.D. Disney, P.H. Seeberger, The use of carbohydrate microarrays to study carbohydrate-cell interactions and to detect pathogens, Chem. Biol. 2004, 11,1701-1707. H. Moriyama, Y. Hiramatsu, T. Kiyoi, T. Achiha, Y. Inoue, H. Kondo, Studies on selectin blocker. 9. SARs of non-sugar selectin blocker against E-, P-, L-selectin bindings, Bioorg. Med. Chem. 2001, 9, 1479-1491.
52.
53.
54.
55.
56.
57.
58.
59.
60.
P. Sorme, Y. Qian, P. Nyholm, H. Leffler, U.J. Nilsson, Low micromolar inhibitors of galectin-3 based on 3’-Derivatization of N-acetyllactosamine, Chembiochem 2002,3, 183-189. D.H. Slee, S.J. Romano, 1. Yu, T.N. Nguyen, J.K. John, N.K. Raheja, F.U. Axe, T.K. Jones, W.C. Ripka, Development of potent non-carbohydrate imidazole-based small molecule selectin inhibitors with antiinflammatory activity, J . Med. Chem. 2001,44,2094-2107. P. Sorme, P. Arnoux, B. Kahl-Knutsson, H. Leffler, J.M. Rini, U.J. Nilsson, Structural and thermodynamic studies on cation-11 interactions in lectin-ligand complexes: high-affinity galectin-3 inhibitors through fine-tuning of an ariginine-arene interaction, /. Am. Chem. Soc. 2005, 127,1737-1743. M.C. Schuster, D.A. Mann,T.J. Buchholz, K.M. Johnson, W.D. Thomas, L.L. Kiessling, Parallel synthesis of glycomimetic libraries: targeting a C-type lectin, Org. Lett. 2003, 5, 1407-1410. P.M. Coutinho, E. Deleury, G.J. Davies, B. Henrissat, An evolving hierarchical family classification for glycosyltransferases, ]. Mol. Biol. 2003, 328,307-317. H. Wang, S. Hanash, Intact-protein based sample preparation strategies for proteome analysis in combination with mass spectrometry, Muss Spectrom. Rev. 2005, 24,413-426. S.P. Gygi, B. Rist, S.A. Gerber, F. Turecek, M.H. Gelb, R. Aebersold, Quantitative analysis of complex protein mixtures using isotope-coded affinity tags, Nut. BiotechnoL. 1999, 17, 994-999. N.L. Pohl, Functional proteomics for the discovery of carbohydrate-related enzyme activities, Curr. Opin. Chem. Bid. 2005, 9, 76-81. C.J. Zea, N.L. Pohl, Kinetic and substrate binding analysis of phosphorylase b via electrospray ionization mass spectrometry: a model for chemical proteomics of
662
I
1 7 Advances in Sugar Chemistry
61.
62.
63.
64.
65.
66.
67.
68.
69.
70.
sugar phosphorylases, Anal. Biochem. 2004,327,107-113. C.1. Zea, N.L. Pohl, General assay for sugar nucleotidyltransferases using electrospray ionization mass spectrometry, Anal. Biochem. 2004, 328,196-202. Y. Yu, K.4. KO, C. Zea, N.L. Pohl, Discovery of the chemical function of glycosidases: design, synthesis, and evaluation of mass-differentiated carbohydrate libraries, Org. Lett. 2004, 6,2031-2033. C.-S. Tsai, Y.-K. Li, L.-C. Lo, Design and synthesis of activity probes for glycosidases, Org. Lett. 2002, 4, 3607-3610. M. Ichikawa, Y. Ichikawa, A mechanism-based affinity-labeling agent for possible use in isolating N-acetylglucosaminidase, Bioorg. Med. Chem. Lett. 2001, 11, 1769-1773. D.J. Vocadlo, C.R. Bertozzi, A strategy for functional proteomic analysis of glycosidase activity from cell lysates, Angew. Chem., Int. Ed. Engl. 2004,43,5338-5342. P. Sears, C.-H. Wong, Carbohydrate mimetics: a new strategy for tackling the problem of carbohydrate-mediated biological recognition, Angew. Chem., Int. Ed. Engl. 1999,38,2300-2324. B.R. Stockwell, Chemical genetics: ligand-based discovery of gene function, Nut. Rev. Genet. 2000, I , 116-125. M. Soltero-Higgin, E.E. Carlson, J.H. Phillips, L.L. Kiessling, Identification of inhibitors for UDP-galactopyranose mutase, J. Am. Chem. SOC.2004, 126,10532-10533. J.S. Helm, Y. Hu, L. Chen, B. Gross, S. Walker, Identification of active-site inhibitors of MurG using a generalizable, high-throughput glycosyltransferase screen, I.Am. Chem. SOC.2003, 125,11168-11169. L.L. Kiessling, J.K. Pontrello, M.C. Schuster, in Carbohydrate-Based Drug Discovery, 1st ed. (Ed.: C.-H. Wong), Wiley-VCH, Weinheim, 2003, pp. 575-608.
71.
72.
73.
74.
75.
76.
77.
78.
79.
L.L. Kiessling, T. Young, K.H. Mortell, in Glycoscience: Chemistry and Chemical Biology 1-111,1st ed., (Eds.: B. Fraser-Reid, K. Tatsuta, J. Thiem), Springer, New York, 2003, pp. 1817-1861. L.L. Kiessling, J.E. Gestwicki, L.E. Strong, Synthetic multivalent ligands in the exploration of cell-surface interactions, Curr. Opin. Chem. Biol. 2000,4,696-703. M. Mammen, S.-K. Choi, G.M. Whitesides, Polyvalent interactions in biological systems: implications for design and use of multivalent ligands and inhibitors, Angew. Chem., lnt. Ed. Engl. 1998,37,2755-2794. E.E. Simanek, G.J. McGarvey, J.A. Jablonowski, C.-H. Wong, Selectin-carbohydrate interactions: from natural ligands to designed mimics, Chem. Rev. 1998, 98, 833-862. J.E. Gestwicki, C.W. Cairo, L.E. Strong, K.A. Oetjen, L.L. Kiessling, Influencing receptor-ligand binding mechanisms with multivalent ligand architecture, J. Am. Chem. Soc. 2002, 124,14922-14933. H. Kamitakahara, T. Suzuki, N. Nishigori, Y. Suzuki, 0. Kanie, C.-H. Wong, A lysoganglioside poly-L-glutamic acid conjugate as a picomolar inhibitor of influenza hemagglutinin, Angew. Chem., Int. Ed. Engl. 1998,37,1524-1528. J.D. Reuter, A. Myc, M.M. Hayes, Z.H. Gan, R. Roy, D.J. Qin, R. Yin, L.T. Piehler, R. Esfand, D.A. Tomalia, J.R. Baker, Inhibition ofviral adhesion and infection by sialic-acid-conjugated dendritic polymers, Bioconjugate Chem. 1999, 10,271-278. P.I. Kitov, J.M. Sadowska, G. Mulvey, G.D. Armstrong, H. Ling, N.S. Pannu, R.J. Read, D.R. Bundle, Shiga-like toxins are neutralized by tailored multivalent carbohydrate ligands, Nature 2000, 403,669-672. E.K. Fan, Z.S. Zhang, W.E. Minke, 2. Hou, C. Verlinde, W.G.J. Hol, High-affinity pentavalent ligands of Escherichia coli heat-labile enterotoxin
References I663
80.
81.
82.
83.
84.
85.
86.
87.
88.
89.
90.
by modular structure-based design, /. Am. Chem. SOL.2000,122,2663-2664. N. Kaila, B.E. Thomas, Design and synthesis of sialyl Lewis" mimics as E- and P-selectin inhibitors, Med. Res. Rev. 2002, 22, 566-601. E. J. Gordon, J.E. Gestwicki, L.E. Strong, L.L. Kiessling, Synthesis of end-labeled multivalent ligands for exploring cell-surface-receptor-ligand interactions, Chem. Biol. 2000, 7, 9-16. N.L. Perillo, M.E. Marcus, L.G. Baum, Galectins: versatile modulators of cell adhesion, cell proliferation, and cell death, J. Mol. Med. 1998, 76,402-412. H.-J. Gabius, H.-C. Siebert, S. Andre, J. Jimenez-Barbero, H. Riidiger, Chemical biology of the sugar code, ChemBioChem 2004,5740-764. J. Seetharaman, A. Kanigsberg, R. Slaaby, H. Leffler, S.H. Barandes, X-ray crystal structure of the human galectin-3 carbohydrate recognition domain at 2.1-A resolution, /. Biol. Chem. 1998, 273,13047-13052. R.-Y. Yang, F.-T. Liu, Galectins in cell growth and apoptosis, Cell. Mol. L@ Sci. 2003, 60, 267-276. R.C. Hughes, Secretion of the galectin family of mammalian carbohydrate-binding proteins, Biochim. Biophys. Acta 1999, 1473, 172-185. S.F. Dagher, J.L. Wang, R.J. Patterson, Identification of galectin-3 as a factor in pre-mRNA splicing, Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 1213-1217. R.Y. Yang, D.K. Hsu, F.T. Liu, Expression of galectin-3 modulates T-cell growth and apoptosis, Proc. Nntl. Acad. Sci. U.S.A. 1996, 93, 6737-6742. J. Hirabayashi, K. Kasai, Effect of amino acid substitution by site-directed mutagenesis on the carbohydrate recognition and stability of human 14-kDa B-galactosidebinding lectin, /. Biol. Chem. 1991, 266,23648-23653. W.M. Abbott, T. Feizi, Soluble 14-kDa 8-galactoside-specific bovine
91.
92.
93.
94.
95.
96.
97.
lectin, /. Bid. Chem. 1991, 266, 5552-5557. Y.D. Lobsanov, M.A. Gitt, H. Leffler, S.H. Barandes, J.M. Rini, X-ray crystal structure of the human dimeric S-Lac lectin, L-14-11,in complex with lactose at 2.9 A resolution,/. Biol. Chem. 1993, 268, 27034-27038. D.-I. Liao, G. Kapadia, H. Ahmed, G.R. Vasta, 0. Herberg, Structure of S-lectin, a developmentally regulated vertebrate 8-galactoside-binding protein, Proc. Natl. Acad. Sci. U.S.A. 1994, 91,1428-1432. K. Henrick, S. Bawumia, E.A.M. Barboni, B. Mehul, R.C. Hughes, Evidence for subsites in the galectins involved in sugar binding at the nonreducing end of the central galactose of oligosaccharide ligands: sequence analysis, homology modeling and mutagenesis studies of hamster galectin-3, Glycobiology 1998, 8, 45-57. M. Demetriou, M. Granocsky, S. Quaggin, J.W. Dennis, Negative regulation of T-cell activation and autoimmunity by Mgat5 N-glycosylation, Nature 2001, 409, 733-739. I. Vrasidas, S. Andre, P. Valentini, C. Bock, M. Lensch, H. Kaltner, R.M.J. Liskamp, H.-J. Gabius, R.J. Pieters, Rigidified multivalent lactose molecules and their interactions with mammalian galectins: a route to selective inhibitors, Org. Biomol. Chem. 2003, I , 803-810. N.L. Pohl, L.L. Kiessling, Scope of multivalent ligand function: lactose-bearing neoglycopolymers by ring-opening metathesis polymerization, Synthesis 1999, SI, 1515-1519. S. Andre, C.J. Arnusch, I . Kuwabara, R. Russwurm, H. Kaltner, H.-J. Gabius, R.J. Pieters, Identification of peptide ligands for malignancy- and growth-regulating galectins using random phage-display and designed combinatorial peptide libraries, Bioorg. Med. Chem. 2005, 13, 563-573.
664
I
7 7 Advances in Sugar Chemistry
98. C.J. Arnusch, S.Andre, P. Valentini,
M. Lensch, R. Russwurm, H.-C. Siebert, M.J.E. Fischer, H.-J. Gabius, R.J. Pieters, Interference of the galactose-dependent binding of lectins by novel pentapeptide ligands, Bioorg. Med. Chem. Lett. 2004, 14, 1437- 1440. 99. K. Drickamer, C-type lectin-like domains, Curr. Opin. Struct. Biol. 1999, 9,585-590. 100. K. Hskansson, K.B.M. Reid, Collectin structure: A review, Protein Sci. 2000, 9, 1607-1617. 101. W.I. Weis, M.E. Taylor, K. Drickamer, The C-type lectin superfamily in the immune system, Immunol. Rev. 1998, 163, 19-34. 102. T.B.H. Geijtenbeek, D.J. Kroopshoop, D.A. Bleijs, S.J. van Vliet, G.C.F. van Duijnhoven, V. Grabovsky, R. Alon, C.G. Figdor, Y. van Kooyk, DC-SIGN-ICAM-2 interaction mediates dendritic cell trafficking, Nat. Immunol. 2000, 1, 353-357. 103. M.P. Bevilacqua, S. Stengelin, M.A. Gimbrone, B. Seed, Endothelial leukocyte adhesion molecule 1: an inducible receptor for neutrophils related to complement regulatory proteins and lectins, Science 1989, 243,1160-1165. 104. G.I.Johnston, R.G. Cook, R.P. McEver, Cloning pf GMP-140, a granule membrane-protein of platelets and endothelium-sequence similarity to proteins involved in cell-adhesion and inflammations, Cell 1989, 56,1033-1044. 105. L.A. Lasky, M.S. Singer, T.A. Yednock, D. Dowbenko, C. Fennie, H. Rodriguez, T. Nguyen, S. Stachel, S.D. Rosen, Cloning of a lymphocyte homing receptor reveals a lectin domain, Cell 1989,56,1045-1055. 106. J.G. Geng, M. Chen, K.C. Chou, P-selectin cell adhesion molecule in inflammation, thrombosis, cancer growth and metastasis, Curr. Med. Chem. 2004, 11,2153-2160. 107. D. Marshall, D.O. Haskard, Clinical overview of leukocyte adhesion and
108.
109.
110.
111.
112.
113.
114.
115.
116.
migration: where are we now? Semin. Immunol. 2002, 14,133-140. L.A. Lasky, Selectins: interpreters of cell-specific carbohydrate information during inflammation, Science 1992, 258,964-969. W.S. Somers, J. Tang, G.D. Shaw, R.T. Camphausen, Insights into the molecular basis of leukocyte tethering and rolling revealed by structures of P- and E-selectin bound to sLeXand PSGL-1, Cell 2000, 103, 467-479. E.J. Gordon, L.E. Strong, L.L. Kiessling, Glycoprotein-inspired materials promote the proteolytic release of cell surface L-selectin, Bioorg. Med. Chem. 1998, 6, 1293- 1299. H. Tsujishita, Y. Hiramatsu, N. Kondo, H. Ohmoto, H. Kondo, M. Kiso, A. Hasegawa, Selectin-ligand interactions revealed by molecular dynamics simulations in solution, J . Med. Chem. 1997, 40, 362-369. Y. Hiramatsu, T. Tsukida, Y. Nakai, Y. Inoue, H. Kondo, Study of selectin blocker. 8. Lead discovery of a non-sugar antagonist using a 3D-Pharmacophore model, J . Med. Chem. 2000,43,1476-1483. M. De Vleeschauwer, M. Vaillancourt, N. Goudreau, Y. Guindon, D. Gravel, Design and synthesis of a new sialyl Lewis X mimetic: how selective are the selectin receptors? Bioorg. Med. Chem. Lett. 2001, 11, 1109-1112. M.A. Estiarte, D.H. Rich, Burger’s Medicinal Chemistry and Drug Discovery, 6th ed., (Ed.: D. Abraham), John Wiley and Sons, New York, 2003, pp. 633-685. G.R. Dawson, N. Collinson, J.R. Atack, Development of subtype selective GABA(A)modulators, C N S Spectr. 2005, 10, 21-27. R.T. Lee, M. Ichikawa, K. Fay, K. Drickamer, M.-C. Shao, Y.C. Lee, Ligand-binding characteristics of rat serum-type mannose-binding protein (MBP-A),J.Biol. Chem. 1991, 266, 48 10-481 5.
References I665 117. E.G. Berger, J. Rohrer, 118.
119.
120.
121.
122.
123.
124.
125.
Galactosyltransferase-still up and running, Biochimie2003,85,261-274. R. Almeida, S.B. Levery, U. Mandel, H. Kresse, T. Schwientek, E.P. Bennett, H. Clausen, Cloning and expression of a proteoglycan UDP-ga1actose:b-xylose ~-1,4-galactosyltransferase I. A seventh member of the human p4-galactosyltransferase gene family, J. Biol. Chem. 1999, 274, 26165-26171. T. Okajima, S. Fukumoto, K. Furukawa, T. Urano, K. Furukawa, Molecular basis for the progeroid variant of ehlers-danlos syndrome. Identification and characterization of two mutants in galactosyltransferase I gene, J. Biol. Chem. 1999,274,28841-28844. C . Hammond, I . Braakman, A. Helenius, Role of N-linked oligosaccharide recognition, glucose trimming, and calnexin in glycoprotein folding and quality control, Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 913-917. P. Schieffele, J. Peranen, K. Simons, N-glycans as apical sorting signals in epithelial cells, Nature 1995, 378, 96-98. E. Ioffe, P. Stanley, Mice lacking N-acetylglucosaminyltransferase I activity die at mid- gestation, revealing an essential role for complex or hybrid N-linked carbohydrates, Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 728-732. M. Metzler, A. Gertz, M. Sarkar, H. Sachachter, J.W. Schrader, J.D. Marth, Complex asparagine-linked oligosaccharides are required for morphogenic events during post-implantation development, EMBOJ. 1994, 13,2056-2065. E.S. Trombetta, The contribution of N-glycans and their processing in the endoplasmic reticulum to glycoprotein biosynthesis, Glycobiology 2003, 13, 77R-91R. J.S. Tkacz, 0. Lampen, Tunicamycin inhibition of polyisoprenyl N-acetylglucosaminyl pyrophosphate
126.
127.
128.
129.
130. 131.
132.
133.
134. 135.
formation in calf-liver microsomes, Biochem. Biophys. Res. Commun. 1975, 65,248-257. A. Mizoguchi, T. Mizuocki, Y. Kitazume, G. Tamura, A. Kobata, Abnormal spicule formation induced by tunicamycin in the early development of the sea-urchin embryo, Cell Struc. Funct. 1981, 6, 341- 346. R.S. Winning, N.C. Bols, J. J. Heikkila, Tunicamycin-inducible polypeptide-synthesis during xenopus-laevis embryogenesis, Differentiation 1991, 46, 167-172. N . Zagris, M. Panagopoulou, N-glycosylated proteins interfere with the 1st cellular migration in early chick embryo, Int. J. Deu. Biol. 1992, 36,439-443. X. Shen, R.E. Ellis, K. Lee, C.-Y. Liu, K. Yang, A. Solomon, H. Yoshida, R. Morimoto, D.M. Kurnit, K. Mori, R.J. Kaufman, Complementary signaling pathways regulate the unfolded protein response and are required for C. elegans development, Cell 2001, 107,893-903. K.M. Koeller, C.-H. Wong, Emerging themes in medicinal glycoscience, Nat. Biotechnol 2000, 18, 835-841. N. Asano, Glycosidase inhibitors: update and perspectives on practical use, Glycobiology 2003, 13,93R-l04R. D.S. Boyle, W.D. Donachie, MraY is an essential gene for cell growth in Escherichia coli, J. Bacterial. 1998, 180, 6429-6432. S.A. Denome, P.K. Elf, D.E. Henderson, D.E. Nelson, K.D. Young, Escherichia coli mutants lacking possible combinations of eight penicillin binding proteins: viability, characteristics, and implications for peptidoglycan synthesis, Antimicrob. Agents Chemother. 1999, 181,3981-3993. C. Walsh, Antibiotics: Actions, Origins, Resistance, ASM Press, Washington, 2003. T.L. Lowary, Recent progress towards the identification of inhibitors of mycobacterial cell wall
666
I
7 I Advances in Sugar Chemistry
136.
137.
polysaccharide biosynthesis, Mini. Rev. Med. Chem. 2003, 3,689-702. G.D. Wright, Mechanisms of resistance to antibiotics, Curr. Opin. Chem. Biol. 2003, 7, 1-7. Y. Ma, F. Pan, M. McNeil, Formation of dTDP-Rhamnose is essential for growth of mycobacteria,J . Bacteriol.
the fluorescence probe 8-anilino-1-naphthalenesulfonate (ANS) with the antibiotic target MurA, Proc. Natl. Acad. Sci. U.S.A. 2000, 97,6345-6349.
2002, 184,3392-3395. 138.
L.L. Pederson, S.J. Turco, Galactofuranose metabolism: a potential target for antimicrobial chemotherapy, Cell. Mol. Lfe Sci. 2003, 60,259-266.
139.
140.
141.
142.
143.
161.
145.
146.
Eschenburg, M.A. Priestman, F.A. Abdul-Latif,C. Delachaume, F. Fassy, E. Schonbrunn, A novel inhibitor that suspends the induced fit mechanisms of UDP-N-acetylglucosamine enolpyruvyl transferase (MurA),J. Biol. Chem. 2005, 280, 14070-14075. E.Z. Baum, D.A. Montenegro, L. Licata, I. Turchi, G.C. Webb, B.D. Foleno, K. Bush, Identification and characterization of new inhibitors of the Escherichia coli MurA enzyme, Antimicrob. Agents Chemother. 2001,
147. S.
148.
R. Koplin, J.R. Brisson, C.J. Whitfield, UDP-galactofuranose precursor required for formation of the lipopolysaccharide 0 antigen of Klebsiella pneumoniae serotype 01 is 45,3182-3188. synthesized by the product of the rfbD(KPO1)gene, /. Biol. Chem. 1997, 149. Y. Hu, J.S. Heim, L. Chen, C. Ginsberg, B. Gross, B. Kraybill, 272,4121-4128. P.M. Nassau, S.L. Martin, R.E. K. Tiyanont, X. Fang, T. Wu, Brown, A. Weston, D. Monsey, S. Walker, Identification of selective M. McNeil, K. Duncan, inhibitors for the glycosyltransferase Galactofuranose biosynthesis in via high-throughput MurG Escherichia coli K-12: Identification screening, Chem. Bid. 2004, I I, and cloning of UDP-galactopyranose 703-71 1. mutase, J . Bacteriol. 1996, 178, 150. Y. Hu, L. Chen, S . Ha, B. Gross, 1047- 1052. B. Falcone, D. Walker, F. Pan, M. Jackson, Y. Ma, M. Mokhtarzadeh, S. Walker, Crystal M. McNeil, Determination that cell structure of the MurG:UDP-GlcNAc wall galactofuran synthesis is complex reveals common structural essential for growth of mycobacteria, principles of a superfamily of J. Bacteriol. 2001, 183,3991-3998. glycosyltransferases, Proc. Natl. Acad. P. Compain, O.R. Martin, S C ~U.S.A. . 2003, 100,845-849. Carbohydrate mimetics-based 151. X. Wen, D.C. Crick, P.J. Brennan, glycosyltransferaseinhibitors, Bioorg. P.G. Hultin, Analogues of the Med. Chem. 2001, 9,3077-3092. mycobacterial arabinogalactan A.H. Katz, C.E. Caufield, linkage disaccharide as cell wall S tructure-based design approaches to biosynthesis inhibitors, Bioorg. Med. cell wall biosynthesis inhibitors, Chem. 2003, 1 I , 3579-3587. Curr. Pharm. Des. 2003, 9,857-866. 152. K. Marotte, T. Ayad, Y. Genisson, L.L. Silver, Novel inhibitors of G.S. Besra, M. Baltas, J. Prandi, bacterial cell wall synthesis, CUT. Synthesis and biological evaluation of Opin. Microbiol. 2003, 6,431-438. imino sugar-oligoarabinofuranoside F.M. Kahan, J.S. Kahan, P. J. Cassidy, hybrids, a new class of mycobacterial H. 
Kropp, The mechanism of action arabinofuranosyltransferase of fosfomycin (phosphonomycin), inhibitors, Eur. J . Org. Chem. 2003, Ann. N.Y. Acad. Sci. 1974, 235, 14,2557-2565. 364-386. 153. A. Caravano, D. Mengin-Lecreulx, E. Schonbrunn, S. Eshenburg, J.-M. Brandello, S.P. Vincent, K. Luger, W. Kabsch, N. Amrhein, P. Sinay, Synthesis and inhibition Structural basis for the interaction of properties of conformational probes
References I667
for the mutase-catalyzed development of a microtiter U DP-Galactopyranose/furanose plate-based screen for inhibitors of interconversion, Chem. - Eur. J. conversion of dTDP-glucose to 2003, 9,5888-5898. dTDP-rhamnose, Antimicrob. Agents 154. Q.Zhang, H. Liu, Mechanistic Chemother. 2001, 45, 1407-1416. investigation of UDP161. C.J. Andres, J.J. Bronson, galactopyranose mutase from S.V. D’Andrea, M.S. Deshpande, Escherichia coli using 2- and P.F. Falk, K.A. Grant-Young, 3-fluorinated UDPW.E. Harte, H.-T. Ho, P.F. Misco, galactofuranose as probes, J . A m . J.G. Robertson, D. Stock, Y. Sun, Chem. Soc. 2001, 123,6756-6766. A.W. Walsh, 4-thiazolidinones: Novel 155. J.N. Barlow, J.S. Blanchard, inhibitors of the bacterial enzyme Enzymatic synthesis of MurB, Bioorg. Med. Chem. Lett. 2000, UDP-(3-deoxy-3-fluoro)-D-galactose 10,715-717. , and UDP+deoxy-2-fluoro)-D162. J.J.Bronson, K.L. DenBleyker, P.J. galactose and substrate activity with Falk, R.A. Mate, H.-T. Ho, M.J. UDP-galactopyranose mutase, Pucci, L.B. Snyder, Discovery of the Carbohydr. Res. 2000, 328,473-480. first antibacterial small molecule 156. N.Veerapen,Y. Yuan, D.A.R. inhibitors of MurB, Bioorg. Med. Sanders, B.M. Pinto, Synthesis of Chem. Lett. 2003, 13,873-875. ammonium and ions 163. K. Babaoglu, M.A. Page, V.C. Jones, and their evaluation as inhibitors of M.R. McNeil, C. Dong, J.H. UDP-galactopyranose mutase, Naismith, R.E. Lee, Novel Inhibitors Carbohydr. Res. 2004, 339, of an emerging target in 2205-2217. Mycobacterium tuberculosis; 157. S. Cren, S.S.Gurcha, A.J. Blake, G.S. substituted thiazolidinones as Bersa, N.R. Thomas, Synthesis and inhibitors of dTDP-rhamnose biological evaluation of new synthesis, Bioorg. Med. Chem. Lett. inhibitors of UDP-Gay transferase-a 2003, 13,3227-3230. key enzyme in M. tuberculosis cell 164. G.D. Francisco, Z.Li, D. Albright, wall biosynthesis, Org. Biomol. Chem. N.H. Eudy, A.H. Katz, P.J. Petersen, 2004, 2,2418-2420. P. Labthavikul, G . Singh, Y. Yang, 158. M.S. 
Scherman, K.A. Winans, R.J. B.A. Rasmussen, Y. Lin, T.S. Stern, V. Jones, C.R. Bertozzi, M.R. Mansour, Phenyl thaizolyl urea and McNeil, Drug targeting carbamate derivatives as new mycobacterium tuberculosis cell wall inhibitors of bacterial cell-wall synthesis: development of a biosynthesis, Bioorg. Med. Chem. Lett. microtiter plate-based screen for 2004, 14,235-238. U DP-galactopyranose mutase and identification ofan inhibitor from a l65. S.’. K.L. Owens, M. Showalter, C.L. Griffith, T.L. uridine-based library, Antimicrob. Doering, V.C. Jones, M.R. McNeil, Agents Chemother. 2003, 47, 378-382. Eukaryotic UDP-galactopyranose 159. M. Soltero-Higgin, E.E. Carlson, T.D. mutase (GLF gene) in microbial and Gruber, L.L. Kiessling, A unique metazoal pathogens, Eukaryot. Cell catalytic mechanism for 2005,4,1147-1154. U DP-galactopyranose mutase, Nat. 166. R.S. Haltiwanger, Regulation of Struct. Mol. Biol. 2004, I I , 539-543. signal transduction pathways in 160. Y. Ma, R.J. Stern, M.S. Scherman, development by glycosylation, Curr. V.D. Vissa, W. Yan, V. Cox Jones, Opin. Struct. Biol. 2002, 12, 593-598. F. Zhang, S.G. Franzblau, W.H. 167. A. Cambi, C.G. Figdor, Dual function Lewis, M.R. McNeil, Drug targeting of C-type lectin-like receptors in the Mycobacteriurn tuberculosis cell wall immune system, Curr.Opin. Cell synthesis: Genetics of dTDPBiol. 2003, 15, 539-546. Rhamnose synthetic enzymes and
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
11 Advances in Sugar Chemistry
11.2 Chemical Glycomics as Basis for Drug Discovery
Daniel B. Werz and Peter H. Seeberger
Outlook
Chemical glycomics uses synthetic carbohydrates and glycoconjugates to study natural carbohydrates and glycoconjugates and their role in important biological processes such as inflammation, cell-cell recognition, immunological response, metastasis, and fertilization. The development of an automated oligosaccharide synthesizer greatly accelerates the assembly of complex, naturally occurring carbohydrates as well as chemically modified oligosaccharide structures, and promises to make a major impact in the field of glycobiology. Tools for mapping the interactions of carbohydrates in biological systems, such as microarrays, surface plasmon resonance (SPR), and fluorescent carbohydrate conjugates, are presented. Case studies of the successful application of carbohydrates as active agents are discussed: fully synthetic oligosaccharide vaccines to combat tropical diseases (e.g., malaria), bacterial infections (e.g., tuberculosis), viral infections (e.g., HIV), and cancer. Aminoglycosides serve as examples of drugs acting via carbohydrate-nucleic acid interactions, while heparin works through carbohydrate-protein interactions. A carbohydrate-functionalized fluorescent polymer has been shown to detect minuscule amounts of bacteria faster than commonly used methods.
11.2.1 Introduction
Three major classes of polymers are responsible for the storage and transfer of information in biological systems: nucleic acids, proteins, and polysaccharides. DNA, the genetic material transferring information from generation to generation, functions as the blueprint of life. RNA serves as a transient repository of genetic information on the way from DNA to proteins, but also has pivotal roles in cell division, gene expression, and catalysis. The protein synthesis machinery, called the ribosome, consists largely of RNA [1]. Proteins, the second major class of biopolymers, are encoded by nucleic acids and represent the catalytic machinery carrying out most of the reactions in the cell. Proteins are also important as skeletal material in numerous organisms, providing strength as well as flexibility. Glycosyltransferases, a special class of enzymes, are responsible for the synthesis of carbohydrates, the third class of biopolymers. While nucleic acids and proteins are linear assemblies, carbohydrates are structurally and stereochemically more diverse. A wide array of available monosaccharide building blocks, together with the possibility of different stereochemical linkages between each pair of carbohydrates, results in tremendous complexity. Additionally, the chain length can vary widely, from monosaccharides up to branched oligosaccharides with more than 30 building blocks or, in the case of polysaccharides, several thousand building blocks. The most prominent examples of the latter type are cellulose, the major constituent of plant tissues, and chitin, which forms the shells of insects and crabs.
Moreover, oligosaccharides are present in the form of glycoconjugates in all cell walls, mediating a variety of events such as inflammation, cell-cell recognition, immunological response, metastasis, and fertilization [2]. The carbohydrate coat surrounding a cell, called the glycocalyx, is specific for a particular species, its cell type, and its developmental status. Alterations in cell-surface oligosaccharides have been found in association with many pathological conditions such as cancer and tuberculosis. Usually, the desired glycoconjugates exist as heterogeneous mixtures; they are difficult to isolate in pure form and, when isolation is possible, only small amounts are obtained.
For the other two major classes of biopolymers, many tools are available to elucidate their structure, their function, and their structure-function relationships. Detailed insights into protein-protein interactions, protein-nucleic acid interactions, and nucleic acid-nucleic acid interactions have been gained (Fig. 11.2-1). This research has been of fundamental importance for the development of new therapeutics that aim to modify, enhance, or disrupt these interactions. In contrast, carbohydrates, although studied for more than 100 years, have attracted less interest in the field of drug discovery. Forty years ago, biochemical research concerning carbohydrates was focused on their role in energy storage and supply in biological systems.
Biosynthesis and biodegradation pathways were discovered, but the function of carbohydrates in biologically important recognition processes became evident much later. Thus, glycobiology, now often termed glycomics, is still less well understood than its two counterparts, genomics and proteomics, which deal with nucleic acids and proteins.
The era of biotechnology was initiated by two major breakthroughs that paved the way for further developments in biochemical research. First, the sequencing of nucleic acids and proteins has been automated, allowing the composition of an unknown sample to be determined quickly and reliably [3]. Second, the synthesis of defined oligonucleotides [4] and peptides [5] has also been automated, allowing even nonspecialists to rapidly obtain larger quantities of these important classes of biopolymers. The rational design of specific modifications has come within reach and is an important research tool in biomedicine, biotechnology, and pharmaceutics.
In contrast, oligosaccharide sequencing and structure determination remain difficult tasks, even though major efforts have been directed toward the improvement of modern analytical methods such as high-performance liquid chromatography (HPLC), two-dimensional nuclear magnetic resonance (NMR) techniques, and special mass spectrometric methods such as electrospray ionization and matrix-assisted laser desorption/ionization [6]. Until recently, access to pure oligosaccharides remained technically difficult and extremely time-consuming. Multiple chemical [7] and enzymatic [8] methods are known, and an automated method has been developed, but no general approach has evolved to date.
Fig. 11.2-1 Interactions of the three main classes of biopolymers. Panel labels: Proteomics (protein-protein interactions); nucleic acid-nucleic acid interactions; Glycomics (carbohydrate-carbohydrate interactions).
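The scale of the structural diversity described in the introduction can be illustrated with a simple count. The following Python sketch is a rough illustration under stated assumptions, not a rigorous enumeration: it considers only unbranched chains, treats every glycosidic linkage as a free choice among a fixed number of hydroxyl positions and two anomeric configurations (α or β), and ignores ring size and branching, both of which would increase the count further. The function names and the example values (20 amino acids, 10 monosaccharides, 4 linkage positions) are assumptions chosen for illustration only.

```python
def linear_sequences(n_units: int, n_building_blocks: int) -> int:
    """Count linear sequences in which only the order of building blocks varies
    (the situation for peptides and oligonucleotides)."""
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return n_building_blocks ** n_units


def linear_oligosaccharides(n_units: int, n_monosaccharides: int,
                            linkage_positions: int = 4,
                            anomeric_configs: int = 2) -> int:
    """Count unbranched oligosaccharides under the simplified model:
    each of the (n_units - 1) linkages independently picks one hydroxyl
    position and one anomeric configuration (illustrative assumption)."""
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    choices_per_linkage = linkage_positions * anomeric_configs
    return n_monosaccharides ** n_units * choices_per_linkage ** (n_units - 1)


if __name__ == "__main__":
    # Hypothetical comparison: 20 amino acids versus 10 monosaccharides.
    for n in (2, 3, 6):
        peptides = linear_sequences(n, 20)
        glycans = linear_oligosaccharides(n, 10)
        print(f"n={n}: peptides={peptides:,}  oligosaccharides={glycans:,}")
```

Even with half as many building blocks and no branching, the modeled number of trisaccharides exceeds the number of tripeptides, because every linkage contributes its own regio- and stereochemical choices.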
11.2.2 Automated Carbohydrate Synthesis
Analogous to the highly efficient synthesis of peptides and oligonucleotides, solid-phase synthesis has been used for the automated assembly of oligosaccharides [9, 101. Two advantages of the solid-phase approach are noteworthy: The use of excess reagent drives reactions to completion; and purification after each reaction step is not required, but rather washing procedures remove excess reagents [9,10]. Our laboratory decided to utilize an acceptor-bound approach for the carbohydrate assembly, whereby the anomeric position of the first carbohydrate is attached at its reducing end to the solid support [9, 101. Therefore, glycosyl
11.2 Chemical Glycomics as Basis for Drug Discovery
phosphates [11] and glycosyl trichloroacetimidates [12] proved to be ideal glycosylating agents that are relatively stable and can be stored for many months in the refrigerator. Glycosyl phosphates are readily synthesized by a one-pot procedure starting from differentially protected glycals. Epoxidation with dimethyldioxirane (DMDO) is followed by opening of the 1,2-anhydrosugar with dibutyl phosphate. Protection of the ensuing C2 hydroxyl group produces the desired glycosyl phosphates in good to excellent yield [11]. Glycosylation reactions in the presence of trimethylsilyl triflate proceed in good yield, with reaction times usually between 10 and 30 minutes. Selectivity at the anomeric center is achieved by using appropriate participating or nonparticipating groups at the C2 hydroxyl. Easily and selectively removable temporary protecting groups such as Fmoc (9-fluorenylmethoxycarbonyl), which is cleaved by weak bases, have been shown to be important for successful oligosaccharide syntheses [13]. Orthogonal protecting groups are utilized in concert to access branched oligosaccharides [13, 14]. In addition to a useful protecting group strategy, the next strategic consideration involves the choice of an appropriate resin and the right linker connecting the first sugar at its reducing end with the solid support. The linker has to be compatible with a wide range of reaction conditions applied during oligosaccharide assembly. However, after the synthesis is completed, rapid and efficient cleavage is necessary. Two linkers that are readily connected to Merrifield's resin have been shown to fulfill these requirements: an alkene-containing linker [15], which is released from the solid support by olefin cross-metathesis using Grubbs' catalyst and ethylene, and an ester-containing linker, which is cleaved by strong bases such as methanolate [13]. The latter linker can be used only when the deprotecting sequences during oligosaccharide assembly avoid strongly basic conditions.
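The compatibility requirement for linkers can be written down as a simple lookup. The following Python sketch is purely illustrative: the condition labels, dictionary names, and function names are assumptions introduced here, and the code encodes only the cleavage conditions named above (Fmoc removed by weak base, the alkene linker released by olefin cross-metathesis, the ester linker cleaved by strong base). It is not a chemical model.

```python
# Illustrative labels only; real compatibility depends on the exact reagents.
LINKER_CLEAVAGE = {
    "alkene linker": "olefin cross-metathesis",
    "ester linker": "strong base",
}

# Condition needed to remove each temporary protecting group.
TEMPORARY_GROUP_REMOVAL = {
    "Fmoc": "weak base",
}


def linker_is_compatible(linker, deprotection_conditions):
    """Return True if no planned deprotection condition also cleaves the linker."""
    return LINKER_CLEAVAGE[linker] not in set(deprotection_conditions)


def usable_linkers(temporary_groups, extra_conditions=()):
    """List the linkers that survive all deprotection steps of an assembly plan."""
    conditions = [TEMPORARY_GROUP_REMOVAL[group] for group in temporary_groups]
    conditions.extend(extra_conditions)
    return sorted(
        linker
        for linker in LINKER_CLEAVAGE
        if linker_is_compatible(linker, conditions)
    )
```

With Fmoc as the only temporary group, both linkers pass the check; adding a hypothetical strongly basic deprotection step leaves only the alkene linker, which mirrors the restriction stated for the ester linker.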
Furthermore, novel capping and tagging methods [16] developed for automated synthesis greatly simplify the postsynthetic workup and purification of synthetic oligosaccharides. Following each coupling step, unreacted hydroxyl groups that may give rise to shorter carbohydrate sequences are treated with a capping reagent that renders them silent in subsequent couplings. Usually, branched carbohydrates such as the Lewis antigens have been synthesized in solution by highly convergent routes [17, 18]. The Lewis X pentasaccharide, the Lewis Y hexasaccharide, and dimeric combinations of Lewis antigens, including the LeY-LeX nonasaccharide, are blood group determinant oligosaccharides. The latter two also act as tumor markers that are currently being explored in cancer therapy [19]. A retrosynthesis of the fully protected Lewis blood group oligosaccharides 1-3 is shown in Scheme 11.2-1. With our sequential strategy using a small number of glycosyl donors 4-8 as building blocks, an automated solid-phase synthesis of these biologically important compounds was possible [13]. Activation of the glycosyl phosphate monomers 4-8 was carried out at -15 °C in dichloromethane under acidic conditions with the Lewis acid TMSOTf. Removal of Fmoc was accomplished by treatment with excess piperidine,
Scheme 11.2-1 Retrosynthesis of the protected Lewis X pentasaccharide 1, Lewis Y hexasaccharide 2, and LeX-LeY nonasaccharide 3, indicating monosaccharide building blocks 4-8. Bn - benzyl, Bu - butyl, Fmoc - 9-fluorenylmethoxycarbonyl, Lev - levulinoyl, Piv - pivaloyl, TCA - trichloroacetyl.
whereas the levulinoyl group was removed by treatment with a solution of hydrazine. The coupling as well as the deprotection steps were repeated at least twice to ensure high coupling efficiencies and a single deprotection event. A general cycle for the installation of one building block is shown in Table 11.2-1. Repetition of these cycles (Scheme 11.2-2) with the corresponding building blocks completed the assembly of the penta-, the hexa-, and the nonasaccharide, respectively. The total assembly times for the carbohydrate skeleton were 12 h for 1, 14 h for 2, and 23 h for 3 [13]. Cleavage of the ester linker from the resin using a solution of sodium methanolate over a period of 6 h provided the crude oligosaccharides. HPLC purification produced the fully protected Lewis X pentasaccharide 1, Lewis Y hexasaccharide 2, and LeY-LeX nonasaccharide 3 in 12.6, 9.9, and 6.5% yields, respectively [13].
11.2.3 Tools for Glycomics
Once a carbohydrate structure of biological interest has been synthesized, several tools [20] to map the interactions of the carbohydrates in biological
Table 11.2-1 General cycle used with glycosyl phosphates for the construction of oligosaccharides 1-3
Step  Function      Reagent                                                                 Time (min)
1     Couple        5 equiv donor and 5 equiv TMSOTf                                        21
2     Wash          Dichloromethane                                                         9
3     Couple        5 equiv donor and 5 equiv TMSOTf                                        21
4     Wash          N,N-Dimethylformamide (DMF)                                             9
5     Deprotection  3 x 175 equiv piperidine in DMF or 5 x 10 equiv hydrazine in DMF        34 or 80
6     Wash          N,N-Dimethylformamide (DMF)
7     Wash          0.2 M acetic acid in tetrahydrofuran
8     Wash          Tetrahydrofuran
9     Wash          Dichloromethane
Scheme 11.2-2 Automated oligosaccharide synthesis with glycosyl phosphates. Initial glycosylation of resin-bound acceptor produces a coupling product that may be subsequently deprotected. Iteration of coupling and deprotection cycles with phosphate donors 4-8 followed by cleavage of the resin-bound oligosaccharides and purification gives 1-3.
systems are at the disposal of today's glycobiologist. Figure 11.2-2 provides an overview of tools including modified surfaces for microarrays and surface plasmon resonance (SPR), monovalent fluorescent conjugates, neoglycoproteins and carbohydrate vaccines, multivalent quantum dot conjugates, affinity-tagged saccharides, derivatized magnetic particles, and latex microspheres. All these methods rely on clever linking chemistries. Amine-containing linkers
Fig. 11.2-2 Tools for glycobiology: a - modified surfaces for microarrays and surface plasmon resonance (SPR), b - monovalent fluorescent conjugates, c - neoglycoproteins and carbohydrate vaccines, d - multivalent quantum dot conjugates, e - future neoglycoconjugates, f - affinity tag conjugates, g - magnetic particle conjugates, h - latex microsphere and sepharose affinity resin conjugates.
are able to react with amine-reactive substrates such as activated esters. By analogy, linkers containing a carboxyl group react with amine-containing molecules. Furthermore, thiol-containing linkers react readily with maleimide and iodoacetyl moieties and vice versa. In addition, thiol-containing moieties show a high affinity for gold surfaces. One special linker has been devised for most tools described in this chapter (Scheme 11.2-3). 2-(2-(2-Mercaptoethoxy)ethoxy)ethanol was selected due to its compatibility with existing synthetic methods, the ease of temporarily masking the thiol functionality with a protecting group, and the readily applicable thiol-based conjugation chemistry.
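The pairing rules for linker chemistries listed above can be summarized as a small lookup table. The Python sketch below is only an illustration of that summary: the names `REACTIVE_PARTNERS`, `can_conjugate`, and `linkers_for` are assumptions introduced here, and the table contains nothing beyond the pairings stated in the text.

```python
# Linker functional group -> partner groups it is described as reacting with
# (or, for gold, binding to). Entries are limited to the pairings in the text.
REACTIVE_PARTNERS = {
    "amine": {"activated ester"},
    "carboxyl": {"amine"},
    "thiol": {"maleimide", "iodoacetyl", "gold surface"},
}


def can_conjugate(linker_group, partner_group):
    """Return True if the linker group is listed as pairing with the partner group."""
    return partner_group in REACTIVE_PARTNERS.get(linker_group, set())


def linkers_for(partner_group):
    """Return the linker groups that can be attached to a given partner group."""
    return sorted(
        group
        for group, partners in REACTIVE_PARTNERS.items()
        if partner_group in partners
    )
```

For example, `linkers_for("maleimide")` returns `["thiol"]`, which corresponds to the thiol-terminated linker chosen for most of the tools in this chapter.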
11.2.3.1 Carbohydrate Microarrays
Microarrays [21] in the “chip” format, prepared by attachment of biopolymers to a surface in a spatially discrete pattern, have enabled a low-cost and high-throughput methodology for screening interactions involving these molecules. The most important advantage compared to classical methods is that microarrays allow for several thousand binding events to
Scheme 11.2-3 2-(2-(2-Mercaptoethoxy)ethoxy)ethanol as a linker for preparing neoglycoconjugates: a - Linker synthetically incorporated into reducing end of mono- or oligosaccharide. b - All protecting groups removed from carbohydrate and thiol. c - Reduced thiol coupled to maleimide or iodoacetyl functionalized structure (chip, bead, resin, fluorescent dye, quantum dot, etc.).
be screened in parallel, whereby the experiment requires only minuscule amounts of both analyte and ligand. Thus, binding profiles and lead structures are readily examined. Miniaturization through the construction of microarrays is particularly well suited to all investigations in the field of glycomics [22]. In contrast to the other two classes of biopolymers, no biological amplification strategy such as the polymerase chain reaction (PCR) or cloning exists to produce usable quantities of complex oligosaccharides. Therefore, the miniaturized assay format is the method of choice to perform several experiments with only mol of compound.
Hitherto, many methods for the preparation of carbohydrate microarrays have been described, such as nitrocellulose-coated slides for noncovalent immobilization of microbial polysaccharides [23] and self-assembled monolayers modified by Diels-Alder-mediated coupling of cyclopentadiene-derivatized oligosaccharides [24], to name just two. Unfortunately, the first method requires large polysaccharides or lipid-modified sugars for the noncovalent interaction. The latter method requires the preparation of oligosaccharides bearing the sensitive cyclopentadiene moiety. In our laboratory, the best results were obtained by utilizing maleimide functionalization of glass slides and the immobilization of the oligosaccharides with thiol-containing linkers. However, with this linker system two methods of surface functionalization should be distinguished: one presents a relatively low density of immobilized oligosaccharides and excellent resistance to nonspecific binding of proteins to the chip surface. The other permits a high-density immobilization of carbohydrates and therefore allows for the examination of oligosaccharide clusters at the surface.
11.2.3.2 Hybrid Carbohydrate/Glycoprotein Microarrays
A chip containing both carbohydrates and glycoproteins permits the rapid determination of the context of binding to the glycoprotein. Incubation of proteins with this hybrid array establishes whether the peptide context is essential for binding or the carbohydrate structure alone is sufficient. To prepare these slides, the glass surface is usually modified with two different chemistries, for example, on one side a maleimide chemistry, and on the other an N-hydroxysuccinimide (NHS) activated ester.
11.2.3.3 Microsphere Arrays
In contrast to common microarrays, the microsphere system uses optical methods to define the position and structure of a carbohydrate series [25]. Incubation of the immobilized microspheres with a fluorophore-labeled carbohydrate-binding protein and subsequent measurement of the fluorescence signals permit determination of the binding profile. A binding event is registered when a bead emits at both the wavelength of its internal code, which serves as a marker for the oligosaccharide attached to the microsphere, and the wavelength of the fluorophore-labeled protein.
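The decoding logic of such a bead assay can be sketched in a few lines of Python. Everything specific in this sketch is hypothetical: the bead codes, the intensity threshold, and the function name `binding_profile` are assumptions for illustration, not parameters of the published system.

```python
from collections import defaultdict


def binding_profile(beads, code_to_glycan, threshold):
    """Map each glycan to the fraction of its beads that show protein binding.

    beads: iterable of (code, protein_signal) pairs, where `code` stands for the
    internal optical code read from the bead and `protein_signal` is the
    intensity measured at the emission wavelength of the labeled protein.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for code, protein_signal in beads:
        glycan = code_to_glycan[code]  # the internal code identifies the sugar
        totals[glycan] += 1
        if protein_signal >= threshold:  # second emission present: binding
            positives[glycan] += 1
    return {glycan: positives[glycan] / totals[glycan] for glycan in totals}
```

A bead counts as positive only when both pieces of information are present: the code that identifies its oligosaccharide and a protein-channel signal above the chosen threshold.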
11.2.3.4 Surface Plasmon Resonance (SPR)
A method to get quantitative insights into the binding of analytes to ligands in real time is SPR [26]. For SPR experiments, one of the interacting species is immobilized on the surface of a chip. The prospective binding partner is flowed
over the chip. During this process, the refractive index of the chip changes owing to the interaction as well as the accumulation of analyte. The kinetic data obtained in this fashion allow one to calculate association and dissociation constants from sub-microgram quantities of material. There is no need to label the ligand or the analyte, and any influence of a label on the binding affinities can be excluded. A further advantage is that these measurements permit evaluation of both low- and high-affinity interactions. SPR is on its way to becoming an extremely powerful tool in glycomics, since structure-activity relationships are quickly assessed.
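How kinetic data translate into binding constants can be illustrated with the simplest textbook case. The Python sketch below assumes a 1:1 (Langmuir) interaction model, which the chapter does not specify and which multivalent carbohydrate interactions often do not obey; the function names and all numerical values in the usage note are hypothetical.

```python
import math


def equilibrium_dissociation_constant(k_on, k_off):
    """K_D in M from the association rate constant k_on (1/(M s)) and the
    dissociation rate constant k_off (1/s)."""
    return k_off / k_on


def association_response(t, conc, k_on, k_off, r_max):
    """Response after t seconds of analyte injection at concentration conc (M),
    starting from an empty surface, for a 1:1 interaction."""
    k_obs = k_on * conc + k_off            # observed rate constant
    r_eq = r_max * k_on * conc / k_obs     # plateau response at this concentration
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_response(t, r0, k_off):
    """Response t seconds after the switch to buffer, starting from r0."""
    return r0 * math.exp(-k_off * t)
```

With hypothetical rate constants of 1e4 1/(M s) and 1e-2 1/s, the first function gives a K_D of about 1e-6 M, and injecting analyte at that concentration drives the response toward half of `r_max`, as expected for a 1:1 interaction.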
11.2.3.5 Fluorescent Carbohydrate Conjugates
Microarrays do not represent ideal formats for the examination of monovalent protein-carbohydrate interactions. Commonly, the densities of the immobilized oligosaccharides are too high to ensure that monovalent interactions are observed. Another limitation of the array technique is the requirement of purified receptor. Therefore, another more appropriate approach is needed to study interactions with cells. Monovalent and multivalent fluorescent probes can be utilized to evaluate the influence of oligosaccharide clustering on recognition by cell-surface lectins. Fluorescence microscopy and flow cytometry are appropriate methods to visualize the corresponding receptor-carbohydrate interactions.
11.2.3.6 Carbohydrate Affinity Screening
In contrast to the array technique that usually utilizes purified receptors, this synthetic tool facilitates the isolation and purification processes of carbohydrate-binding proteins [20]. Crude mixtures or biological extracts are separated by carbohydrate-containing affinity columns. Thus, this purification method also provides information about the interaction of carbohydrates with other biopolymers.
11.2.4 Oligosaccharide Conjugate Vaccines: Malaria and HIV
In addition to serving as tools, carbohydrates also hold great potential as vaccines, as small amounts of antigen can be used to protect a large number of people. Immunological investigations using fully synthetic carbohydrate vaccines have shown very promising results in the treatment of various diseases. These afflictions include cancer, bacterial infections such as tuberculosis, and tropical diseases such as leishmaniasis and malaria. The malaria parasite Plasmodium falciparum, infecting 5-10% of the human population worldwide, accounts for about 100 million clinical cases and
the death of more than 2 million people annually caused by the malaria toxin [27]. Therefore, the development of a malaria vaccine would be of highest importance. Glycosylphosphatidylinositol (GPI), which is released when parasites rupture the host's red blood cells, has the properties predicted of this mortality-inducing toxin [28]. Experiments demonstrated that anti-GPI vaccination can prevent malarial pathology in an animal model [29]. To prepare this antigen, the synthetic hexasaccharide malaria toxin 9 (Fig. 11.2-3) [30] was reacted with a linker and conjugated to a maleimide-activated carrier protein. Mice treated with chemically synthesized GPI attached to the protein were substantially protected from death by malaria. Between 60 and 75% of the vaccinated mice survived, whereas the survival rate for unvaccinated mice was only 0-9%. It should be noted that only minuscule amounts (10^-9 to 10^-7 g per person) of the hexasaccharide 9, which was partly assembled by automated synthesis, are necessary to perform the vaccination. This study suggests that GPI is a highly conserved endotoxin of malarial parasite origin. The preclinical model revealed that a nontoxic GPI oligosaccharide coupled to a carrier protein is immunogenic and provides significant protection against malarial pathogenesis. An antitoxic oligosaccharide vaccine against malaria might be within reach.
The elucidation of HIV envelope glycoprotein interactions with prospective binding partners advances our understanding of viral entry and provides a basis for the design of new vaccines interfering with HIV entry. Using the chip format, interactions of carbohydrates decorating the viral surface envelope proteins with receptors are readily discovered. Relevant substructures that are important for binding can be identified simultaneously when the arrays are composed of a series of closely related analogs [31].
Fig. 11.2-3 The anti-toxin malaria GPI vaccine candidate 9.
One important carbohydrate structure found at the HIV envelope glycoprotein gp120 is the triantennary N-linked mannoside (Man)9(GlcNAc)2. Utilizing a variety of synthetic mannose-containing substructures 10-16 (Fig. 11.2-4(a)), a chip with a wide range of concentrations was printed to establish a saturation point for observed binding to a fluorescently labeled protein [31]. Thus, a carbohydrate-binding profile can be established for a given protein by comparing the integrated fluorescence of different spots. Incubation of these arrays with a series of different gp120-binding proteins (ConA, 2G12, Cyanovirin-N, DC-SIGN, and Scytovirin-N) allowed a precise evaluation of their binding profiles [31]. Figure 11.2-4(b) shows the corresponding chips. The experiments with 2G12 showed no binding with 12, 15, and 16, suggesting that a Manα1-2Man linkage, the only structural motif in common, is necessary for recognition by 2G12. In contrast, Scytovirin-N, a protein that was isolated from the cyanobacterium Scytonema varium, binds only to the structures 10 and 14. This result clearly illustrates that a different structural motif within the oligosaccharide is recognized by Scytovirin-N. The terminal Manα1-2Man linkage, together with the underlying α1-6 trimannoside moiety, is necessary for Scytovirin-N binding. These studies also corroborate that these proteins can bind high-density arrays of Manα1-2Man-containing oligosaccharides in the absence of the polypeptide backbone.
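Comparing the integrated fluorescence of different spots amounts to a background correction followed by normalization. The Python sketch below shows one simple way to do this; the function name, the spot labels, and the intensities in the usage note are hypothetical and do not reproduce data from the cited study.

```python
def relative_binding_profile(integrated_fluorescence, background=0.0):
    """Scale background-corrected spot intensities to the strongest spot.

    integrated_fluorescence: dict mapping a spot label to its integrated
    fluorescence. Values at or below the background are reported as 0.0.
    """
    corrected = {
        spot: max(value - background, 0.0)
        for spot, value in integrated_fluorescence.items()
    }
    strongest = max(corrected.values(), default=0.0)
    if strongest == 0.0:
        # No spot rises above background: report an all-zero profile.
        return {spot: 0.0 for spot in corrected}
    return {spot: value / strongest for spot, value in corrected.items()}
```

For hypothetical intensities `{"spot_a": 1100.0, "spot_b": 100.0, "spot_c": 600.0}` and a background of 100.0, the profile is 1.0, 0.0, and 0.5, so spots that bind no protein stand out as zeros.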
11.2.5 Carbohydrate-Nucleic Acid Interactions: Aminoglycosides
Aminoglycosides represent a family of naturally occurring pseudo-oligosaccharides that consist of two to five monomers and a one-to-one ratio between the amino and hydroxy groups. Clinically, these compounds have been used to treat infectious diseases induced by a variety of gram-negative bacteria. Aminoglycosides exert their antibiotic activity by binding to bacterial ribosomes and thereby inhibiting protein synthesis. Most commonly, aminoglycosides bind to the A site in the small ribosomal subunit (30S) of the bacterial ribosome, resulting in misreading during the translational process. Not surprisingly, charge interactions between amino groups and the phosphate backbone dominate as binding forces in these aminoglycoside-RNA complexes. As with many other antibiotics, the efficiency of aminoglycosides has been compromised by the emergence of resistant bacterial strains [32, 33]. The most prominent mechanisms that cause resistance are enzymatic modifications of the aminoglycoside, including N-acetylation and O-phosphorylation. These modifications result in a large decrease in binding affinity to the therapeutic target [34]. To facilitate the discovery of safer and more active aminoglycosides, high-throughput methods are necessary. Microarray techniques enable medicinal chemists to identify weak binders to resistance-causing enzymes and tight binders to ribosomal RNA. Recently, our laboratory reported the construction of aminoglycoside microarrays to study antibiotic resistance [35, 36].
Fig. 11.2-4 (a) Synthetic substructures 10-16 of the triantennary N-linked mannoside including thiol-containing linker for immobilization and conjugation chemistry. (b) Carbohydrate microarrays containing synthetic mannose 10-16 and galactose, printed at 2 mM. Each carbohydrate is spotted with a diameter of approximately 100-200 µm. False color image of incubations with fluorescently labelled ConA, 2G12, CVN, DC-SIGN, and Scytovirin.
The antibiotics were immobilized on amine-reactive glass slides using a DNA arraying robot. Two aminoglycoside acetyltransferase resistance enzymes, 2'-acetyltransferase (AAC(2')) from Mycobacterium tuberculosis [37] and 6'-acetyltransferase (AAC(6')) from Salmonella enterica [38], were used as examples. Hybridization to the aminoglycoside arrays revealed that each aminoglycoside interacts with both enzymes. Comparison with calorimetric studies of aminoglycoside-binding affinities to AAC(6') [39] found a strong correlation with the array results. Arrays were also incubated with two different RNA sequences to determine binding specificity for bacterial and human A-site RNA.
To facilitate the discovery of inhibitors of resistance-causing enzymes, a library of aminoglycoside mimetics was synthesized and immobilized. Guanidinoglycosides [40] (Fig. 11.2-5) were chosen as aminoglycoside analogs for several reasons: First, guanidinoglycosides can be readily prepared from aminoglycosides. Second, the increased positive charge due to the larger number of nitrogen-containing guanidino groups may allow guanidinoglycosides to bind more tightly to the negatively charged aminoglycoside binding pocket [41]. Third, the large difference in the pKa values of guanidino and amino groups (12.5 vs. 8.8) suggests that guanidinoglycosides are likely not substrates for acetyltransferases such as AAC(2') and AAC(6'). As anticipated, guanidinoglycosides revealed higher affinity to resistance-causing enzymes than the corresponding aminoglycosides. Guanidinoglycosides do not serve as substrates and inhibit acylation of several clinically important antibiotics. This promising approach proves valuable for screening a plethora of compounds in a short time to discover improved drugs that evade current modes of bacterial resistance.
11.2.6 Carbohydrate-Protein Interactions: Selectins and Heparin
Cell-surface carbohydrates also act as recognition molecules allowing for the normal trafficking of lymphocytes through the vascular system to the lymphatic compartment [42]. During this process lymphocytes have to migrate through specialized endothelial cells in the high endothelial venules. It has been shown that the binding of the lymphocytes is dependent on the presence of sialic acid and calcium. As binding counterparts, three different calcium-dependent proteins, called E-, P-, and L-selectins, were identified [43, 44]. These proteins
Fig. 11.2-5 Representative examples of aminoglycosides (Kanamycin A, Neomycin B, Ribostamycin). Furthermore, a guanidinoglycoside (6'-N-β-alanin-1,3,3'-N-guanidinoribostamycin) with a corresponding linker for immobilization chemistry is shown.
allow for normal trafficking and are involved in the extravasation of leukocytes during the inflammatory cascade. With the aid of monoclonal antibodies, sialylated carbohydrate structures, notably sialyl Lea and sialyl Lex, were discovered to function as receptors for the selectins [43]. Sialyl Lex is usually located on leukocytes, but is also highly expressed on a variety of different cancer cells [45]. The same holds true for sialyl Lea, which serves as a tumor marker on gastrointestinal and pancreatic cancers [46]. Owing to the function of sialyl Lewis structures in the extravasation of cancer cells from the bloodstream and in promoting metastatic spread to other tissues, a clear correlation of expression of sialyl Lea and sialyl Lex on tumors with enhanced progression and metastasis was observed. Since it is assumed that these tumor-associated carbohydrate markers enhance extravasation and metastasis by interactions with selectins, experiments were performed where selectin expression was inhibited. Long-term studies showed that cancer
patients with tumors that express high amounts of sialyl Lea had a 4.5 times higher probability to survive over a 10-year period if the expression of E-selectin was permitted [47]. These results point to a specific new form of cancer therapy by directly inhibiting these carbohydrate-protein interactions that are responsible for metastasis and tumor progression. Thus, the pharmaceutical industry has explored the use of the bioactive conformations of sialyl Lea and sialyl Lex to design glycomimetic drugs that bind to selectins. Beyond developing glycomimetics based on rational design, combinatorial approaches have had much success. Solid-phase techniques were used to obtain libraries of fucopeptides [48] for in vitro screening, and high-throughput screening of a P-selectin assay showed that glycomimetics devoid of carbohydrate structures also revealed strong binding [49]. However, in general selectins are problematic for drug discovery because they show relatively weak multivalent interactions that make a general approach more difficult.
Heparin is widely known to be a biologically important and chemically unique polysaccharide, regulating a large variety of physiological processes. It interacts with a plethora of different proteins of physiological importance [50]. The interaction with antithrombin III (AT III) is best understood. Thus, since the late 1930s heparin has served as a clinical anticoagulant in the treatment of heart disease. Interactions with growth factors, chemokines, lipid-binding proteins, and viral envelope proteins are worth noting [50]. Heparin is a linear, unbranched, highly sulfonated polymer that consists of (1→4)-linked pyranosyluronic acid and glucosamine units (Fig. 11.2-6) [51]. The type of uronic acid varies; usually 90% L-iduronic acid and 10% D-glucuronic acid are found. Commonly, 20 to 200 disaccharide repeat units are found, giving rise to a tremendous complexity.
Because of the high content of negatively charged sulfate and carboxyl groups, the most prominent type of interaction between heparin and basic amino acids of the protein is of ionic nature. But, in some cases, hydrogen bonding and even hydrophobic interactions are not negligible. With the exception of the AT III-heparin interaction, where the exact sequence of heparin associating with the protein has been identified, the structure-function relationship of
Fig. 11.2-6 Schematic view of heparin.
heparin is still very poorly understood. A better understanding is necessary to
apply defined heparin sequences in the treatment of other diseases. A variety of techniques including SPR have been applied to study heparin-protein interactions [50].
11.2.7 Detection of Pathogenic Bacteria
Usually, the detection of pathogenic bacteria, such as Escherichia coli, is based on the selective growth of these bacteria in liquid media or on plates. This procedure may require several days [52]. More recently, methods such as pathogen recognition by fluorescently labeled antibodies, DNA probes, or bacteriophages have been developed and proved to be much faster [52]. In many cases, bacteria as well as viruses bind to carbohydrates displayed on the host cells they infect. To name two examples, Escherichia coli binds to mannose and influenza virus binds to sialic acid [53]. To ensure the high binding affinity necessary for strong adhesion and successful infection of the cell, the pathogen often uses multivalent interactions [54]. Conducting polymers displaying carbohydrates can simulate these binding events and serve as an ideal material to detect even small amounts of pathogens.
Scheme 11.2-4 Synthesis of the carbohydrate-functionalized fluorescent polymer 17 for the detection of pathogenic bacteria.
Recently, our laboratory reported a carbohydrate-functionalized poly(p-phenylene ethynylene) (PPE) 17 (Scheme 11.2-4) that can be used for the detection of Escherichia coli by multivalent interactions [55]. For this purpose, 2'-aminoethyl mannoside and galactoside were coupled to PPE. Unreacted succinimide esters were quenched by addition of excess ethanolamine before washing with water removed uncoupled reagents. The loading of the polymer was determined by a phenol-sulfuric acid test, which revealed that about 25% of the reactive sites were functionalized with glycosides. A fluorescence resonance energy transfer (FRET) experiment ensured that mannose-binding lectins interact with mannose displayed on the polymer without affecting binding selectivity and do not exhibit any nonspecific binding. Experiments with two bacterial strains differing in their mannose-binding properties revealed that the mannose-functionalized polymer imparted strong fluorescence to mannose-binding Escherichia coli. Even separation and rinsing procedures are not able to remove the bacteria from the polymer. In contrast, the mutated strain unable to bind mannose showed no signal and no aggregation of bacteria. The binding events involving the functionalized polymers and the bacteria were studied with the microscope. Mutant bacteria that lost the ability to bind to mannose do not bind to the polymer, whereas the mannose-binding bacteria aggregate in clusters with fluorescent centers (Fig. 11.2-7). The number of cells in these clusters varies between 30 and several thousand. As anticipated, the larger the aggregates, the stronger the fluorescence signal. Competitive binding experiments with other carbohydrates displayed on the polymer do not reveal any fluorescent clusters. To determine the detection limit of this new method, serially diluted solutions of mannose-binding E. coli were incubated with the mannose-containing polymer.
Fluorescence microscopy experiments revealed a limit in the range of 10^3-10^4 bacteria. Similar values were obtained earlier by using fluorescently labeled antibodies. Further competitive experiments have shown that only relatively high concentrations of free mannose (10 mM) significantly inhibit binding to the polymer. At concentrations of less than 10 µM the clustering is not affected. However, many pathogens bind to the same carbohydrates; for example, E. coli as well as Salmonella enterica bind to mannose. This limitation may be overcome using cross-reactive sensor analysis [56], in which the binding to a variety of different analytes is checked in parallel. By comparison with known data, the detection and determination of single or multiple pathogens, even within complex mixtures, should be possible in the near future. The underlying principle is the basis for the olfactory sense in most animals.
11.2.8 Conclusion
The isolation, purification, and structure elucidation as well as the synthesis of carbohydrates have been challenging goals for decades. Recently, new methods
Fig. 11.2-7 Laser scanning confocal microscopy image of: (a) Mutant Escherichia coli that does not bind to polymer 17. (b) A fluorescent bacterial aggregate due to multivalent interactions between the mannose-binding bacterial pili and the polymer 17 (superimposed fluorescence and transmitted light images). (c) Fluorescence microscopy image of a large fluorescent bacterial cluster. (d) Conventional fluorescence spectra of polymer 17 (black) and normalized fluorescence spectra of a bacterial cluster obtained using confocal microscopy (red).
to gain access to these complex molecules have been developed, including a fully automated oligosaccharide synthesizer. Glycosyl phosphates and glycosyl trichloroacetimidates proved to be powerful classes of glycosylating agents for this purpose. High-yielding coupling steps are achieved on the solid support by using an excess amount of building blocks in the presence of a stoichiometric amount of TMSOTf. Suitable protection and deprotection strategies allow the assembly of linear and even branched oligosaccharides, which can now be performed in a fully automated manner. Several tools to understand the intricate role of oligosaccharides in various cell-signaling processes have been developed. The “chip” format enables glycoscientists to elucidate interactions of carbohydrates with fluorescently labeled proteins, including bacterial and viral toxins. Clever linking chemistries provide a wider range of glycans for screening in the microarray format. The chips are constructed by using standard DNA gene chip instrumentation. To
References I687
detect interactions, only miniscule amounts of both ligand and analyte are necessary. The tool kit consisting of carbohydrate synthesizer and carbohydrate microarrays lays the foundation for the discovery and elucidation of new drugs, as studies with the fully synthetic antitoxin malaria vaccine candidate have shown. HIV neutralizing proteins have been identified by studies with carbohydrate microarrays; aminoglycoside microarrays were used to test antibacterial resistance. Fluorescent polymers can be utilized to detect small amounts of pathogenic bacteria in a short time. Although many complex carbohydrate structures of pyranosides are now accessible by automated synthesis, the automated assembly of bacterial sugars is still a difficult goal to achieve. A further bottleneck is the rapid and highly efficient synthesis of the monosaccharide building blocks. More efficient syntheses for most of the approximately SO carbohydrate building blocks are required. Future glycobiologists will be able to screen a plethora of complex carbohydrates that are thought to play previously unimaginable roles in biological systems. The knowledge gained from glycomics will be as important a basis for future drug discovery as that discovered in the field of genomics and proteomics during the last 30 years. We are still just beginning to understand the importance of carbohydrates in biological information transfer and much remains to be discovered.
Acknowledgments
We thank all present and past members of the Seeberger group and our collaborators who contributed to the results reported in this chapter. Daniel B. Werz is grateful to the Alexander von Humboldt Foundation for a Feodor Lynen Research Fellowship and to the Deutsche Forschungsgemeinschaft (DFG) for an Emmy Noether Fellowship. Peter H. Seeberger thanks the ETH for financial support.
References

1. (a) P. Nissen, J. Hansen, N. Ban, T.A. Steitz, The structural basis of ribosome activity in peptide bond synthesis, Science 2000, 289, 920-930; (b) N. Ban, P. Nissen, J. Hansen, P.B. Moore, T.A. Steitz, The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution, Science 2000, 289, 905-920.
2. (a) A. Varki, Biological roles of oligosaccharides: all the theories are correct, Glycobiology 1993, 3, 97-130; (b) H. Lis, N. Sharon, Protein glycosylation. Structural and functional aspects, Eur. J. Biochem. 1993, 218, 1-27; (c) R.A. Dwek, Glycobiology: Toward understanding the functions of sugars, Chem. Rev. 1996, 96, 683-720; (d) Y.C. Lee, R.T. Lee, Carbohydrate-protein interactions: Basis of glycobiology, Acc. Chem. Res. 1995, 28, 322-327; (e) W.H. Chambers, C.S. Brisette-Storkus, Hanging in the balance: natural killer cell recognition of target cells, Chem. Biol. 1995, 2, 429-435.
3. T. Hunkapiller, R.J. Kaiser, B.F. Koop, L. Hood, Large-scale and automated DNA sequence determination, Science 1991, 354, 59-67.
4. (a) M.H. Caruthers, Gene synthesis machines: DNA chemistry and its uses, Science 1985, 230, 281-285; (b) M.H. Caruthers, Chemical synthesis of DNA and DNA analogs, Acc. Chem. Res. 1991, 24, 278-284.
5. E. Atherton, R.C. Sheppard, Solid-phase peptide synthesis: A practical approach, Oxford University Press, Oxford, 1989.
6. (a) R. Rodebaugh, S. Joshi, B. Fraser-Reid, H.M. Geysen, Polymer-supported oligosaccharides via n-pentenyl glycosides: methodology for a carbohydrate library, J. Org. Chem. 1997, 62, 5660-5661; (b) J. Rademann, A. Geyer, R.R. Schmidt, Solid-phase supported synthesis of the branched pentasaccharide moiety that occurs in most complex type N-glycan chains, Angew. Chem., Int. Ed. 1998, 37, 1241-1245.
7. (a) S.J. Danishefsky, M.T. Bilodeau, Glycals in organic synthesis: the evolution of comprehensive strategies for the assembly of oligosaccharides and glycoconjugates of biological consequence, Angew. Chem., Int. Ed. Engl. 1996, 35, 1380-1419; (b) P.H. Seeberger, S.J. Danishefsky, Solid-phase synthesis of oligosaccharides and glycoconjugates by the glycal assembly method: A five year retrospective, Acc. Chem. Res. 1998, 31, 685-695; (c) R.R. Schmidt, J.C. Castro-Palomino, O. Retz, New aspects of glycoside bond formation, Pure Appl. Chem. 1999, 71, 729-744.
8. C.-H. Wong, Enzymic and chemo-enzymic syntheses of carbohydrates, Pure Appl. Chem. 1995, 67, 1609-1616.
9. O.J. Plante, E.R. Palmacci, P.H. Seeberger, Automated solid-phase synthesis of oligosaccharides, Science 2001, 291, 1523-1527.
10. P.H. Seeberger, Automated carbohydrate synthesis to drive chemical glycomics, Chem. Commun. 2003, 1115-1121.
11. O.J. Plante, R.B. Andrade, P.H. Seeberger, Synthesis and use of glycosyl phosphates as glycosyl donors, Org. Lett. 1999, 1, 211-214.
12. R.R. Schmidt, W. Kinzy, Anomeric-oxygen activation for glycoside synthesis: the trichloroacetimidate method, Adv. Carbohydr. Chem. Biochem. 1994, 50, 21-123.
13. K.R. Love, P.H. Seeberger, Automated solid-phase synthesis of protected tumor-associated antigen and blood group determinant oligosaccharides, Angew. Chem., Int. Ed. 2004, 43, 602-605.
14. M.C. Hewitt, P.H. Seeberger, Automated solid-phase synthesis of a branched Leishmania cap tetrasaccharide, Org. Lett. 2001, 3, 3699-3702.
15. R.B. Andrade, O.J. Plante, L.G. Melean, P.H. Seeberger, Solid-phase oligosaccharide synthesis: Preparation of complex structures using a novel linker and different glycosylating agents, Org. Lett. 1999, 1, 1811-1814.
16. E.R. Palmacci, M.C. Hewitt, P.H. Seeberger, “Cap-Tag” - novel methods for the rapid purification of oligosaccharides prepared by automated solid-phase synthesis, Angew. Chem., Int. Ed. 2001, 40, 4433-4437.
17. G. Hummel, R.R. Schmidt, Glycosylimidates. 79. A versatile preparation of the lactoneo-series antigens - preparation of sialyl dimer Lewis X and the dimer Lewis Y, Tetrahedron Lett. 1997, 38, 1173-1176.
18. P.P. Deshpande, S.J. Danishefsky, Total synthesis of the potential anticancer vaccine KH-1 adenocarcinoma antigen, Nature 1997, 387, 164-166.
19. G. Ragupathi, P.P. Deshpande, D.M. Coltart, H.M. Kim, L.J. Williams, S.J. Danishefsky, P.O. Livingston, Constructing an adenocarcinoma vaccine: immunization of mice with synthetic KH-1 nonasaccharide stimulates anti-KH-1 and anti-Le(y) antibodies, Int. J. Cancer 2002, 99, 207-212.
20. D.M. Ratner, E.W. Adams, M.D. Disney, P.H. Seeberger, Tools for glycomics: Mapping interactions of carbohydrates in biological systems, ChemBioChem 2004, 5, 1375-1383.
21. (a) D. Barnes-Seemann, S.B. Park, A.N. Koehler, S.L. Schreiber, Expanding the functional group compatibility of small molecule microarrays: Discovery of novel calmodulin ligands, Angew. Chem., Int. Ed. 2003, 42, 2376-2379; (b) S. Fukui, T. Feizi, C. Galustian, A.M. Lawson, W. Chai, Oligosaccharide microarrays for high-throughput detection and specificity assignments of carbohydrate-protein interactions, Nat. Biotechnol. 2002, 20, 1011-1017; (c) A.N. Koehler, A.F. Shamji, S.L. Schreiber, Discovery of an inhibitor of a transcription factor using small molecule microarrays and diversity oriented synthesis, J. Am. Chem. Soc. 2003, 125, 8420-8421; (d) P.J. Hergenrother, K.M. Depew, S.L. Schreiber, Small molecule microarrays: Covalent attachment and screening of alcohol-containing small molecules on glass slides, J. Am. Chem. Soc. 2000, 122, 7849-7850.
22. (a) S. Bidlingmaier, M. Snyder, Carbohydrate analysis prepares to enter the “omics” era, Chem. Biol. 2002, 9, 400-401; (b) K.R. Love, P.H. Seeberger, Carbohydrate arrays as tools for glycomics, Angew. Chem., Int. Ed. 2002, 41, 3583-3586; (c) L.L. Kiessling, C.W. Cairo, Hitting the sweet spot, Nat. Biotechnol. 2002, 20, 234-235; (d) D.M. Ratner, E.W. Adams, J. Su, B.R. O’Keefe, M. Mrksich, P.H. Seeberger, Probing protein-carbohydrate interactions with microarrays of synthetic oligosaccharides, ChemBioChem 2004, 5, 379-383.
23. D. Wang, S. Liu, B.J. Trummer, C. Deng, A. Wang, Carbohydrate microarrays for the recognition of cross-reactive molecular markers of microbes and host cells, Nat. Biotechnol. 2002, 20, 275-281.
24. B.T. Houseman, M. Mrksich, Carbohydrate arrays for the evaluation of protein binding and enzymatic modification, Chem. Biol. 2002, 9, 443-454.
25. E.W. Adams, J. Ueberfeld, D.M. Ratner, B.R. O’Keefe, D.R. Walt, P.H. Seeberger, Encoded fiber-optic microsphere arrays for probing protein-carbohydrate interactions, Angew. Chem., Int. Ed. 2003, 42, 5317-5320.
26. B.T. Houseman, E.S. Gawalt, M. Mrksich, Maleimide functionalized self-assembled monolayers for the preparation of peptide and carbohydrate biochips, Langmuir 2003, 19, 1522-1531.
27. World Health Organization, World malaria situation 1990, World Health Stat. Q. 1992, 45, 257-266.
28. L. Schofield, F. Hackett, Signal transduction in host cells by a glycosylphosphatidylinositol toxin of malaria parasites, J. Exp. Med. 1993, 177, 145-153.
29. L. Schofield, M.C. Hewitt, K. Evans, M.-A. Siomos, P.H. Seeberger, Synthetic GPI as a candidate anti-toxic vaccine in a model of malaria, Nature 2002, 418, 785-789.
30. M.C. Hewitt, D.A. Snyder, P.H. Seeberger, Rapid synthesis of a glycosylphosphatidylinositol-based malaria vaccine using automated solid-phase oligosaccharide synthesis, J. Am. Chem. Soc. 2002, 124, 13434-13436.
31. E.W. Adams, D.M. Ratner, H.R. Bokesch, J.B. McMahon, B.R. O’Keefe, P.H. Seeberger, Oligosaccharide and glycoprotein microarrays as tools in HIV glycobiology: Glycan-dependent gp120/protein interactions, Chem. Biol. 2004, 11, 875-881.
32. C. Walsh, Molecular mechanisms that confer antibacterial drug resistance, Nature 2000, 406, 775-781.
33. G.D. Wright, Mechanisms of resistance to antibiotics, Curr. Opin. Chem. Biol. 2003, 7, 563-569.
34. B. Llano-Sotelo, E.F. Azucena Jr, L.P. Kotra, S. Mobashery, C.S. Chow, Aminoglycosides modified by resistance enzymes display diminished binding to the bacterial ribosomal aminoacyl-tRNA site, Chem. Biol. 2002, 9, 455-463.
35. M.D. Disney, S. Magnet, J.S. Blanchard, P.H. Seeberger, Aminoglycoside microarrays to study antibiotic resistance, Angew. Chem., Int. Ed. 2004, 43, 1591-1594.
36. M.D. Disney, P.H. Seeberger, Aminoglycoside microarrays to explore interactions of antibiotics with RNAs and proteins, Chem. Eur. J. 2004, 10, 3308-3314.
37. S.S. Hegde, F. Javid-Majd, J.S. Blanchard, Overexpression and mechanistic analysis of chromosomally encoded aminoglycoside 2’-N-acetyltransferase, J. Biol. Chem. 2001, 276, 45876-45881.
38. S. Magnet, T. Lambert, P. Courvalin, J.S. Blanchard, Kinetic and mutagenic characterization of chromosomally encoded Salmonella enterica AAC(6’)-Iy aminoglycoside N-acetyltransferase, Biochemistry 2001, 40, 3700-3709.
39. S.S. Hegde, T.K. Dam, C.F. Brewer, J.S. Blanchard, Thermodynamics of aminoglycoside and acyl-coenzyme A binding to Salmonella enterica (AAC(2’)-Ic) from Mycobacterium tuberculosis, Biochemistry 2002, 41, 7519-7527.
40. (a) T.J. Baker, N.W. Luedtke, Y. Tor, M. Goodman, Synthesis and anti-HIV activity of guanidinoglycosides, J. Org. Chem. 2000, 65, 9054-9058; (b) N.W. Luedtke, T.J. Baker, M. Goodman, Y. Tor, Guanidinoglycosides: A novel family of RNA ligands, J. Am. Chem. Soc. 2000, 122, 12035-12036.
41. M.W. Vetting, S.S. Hegde, F. Javid-Majd, J.S. Blanchard, S.L. Roderick, Aminoglycoside 2’-N-acetyltransferase from Mycobacterium tuberculosis in complex with coenzyme A and aminoglycoside substrates, Nat. Struct. Biol. 2002, 9, 653-658.
42. B.M. Gesner, V. Ginsburg, Effect of glycosidases on the fate of transfused lymphocytes, Proc. Natl. Acad. Sci. U.S.A. 1964, 52, 750-755.
43. (a) M.L. Phillips, E. Nudelman, F.C. Gaeta, M. Perez, A.K. Singhal, S. Hakomori, J.C. Paulson, ELAM-1 mediated cell adhesion by recognition of carbohydrate ligand, sialyl-Le(x), Science 1990, 250, 1130-1132; (b) E.L. Berg, J. Magnani, R.A. Warnock, M.K. Robinson, E.C. Butcher, Comparison of L-selectin and E-selectin ligand specificities: the L-selectin can bind the E-selectin ligands sialyl Le(x) and sialyl Le(y), Biochem. Biophys. Res. Commun. 1992, 184, 1048-1055; (c) M. Yoshida, A. Uchimura, M. Kiso, A. Hasegawa, Synthesis of chemically modified sialic acid-containing sialyl Le(x)-ganglioside analogues recognized by the selectin family, Glycoconjugate J. 1993, 10, 3-15.
44. J.L. Magnani, The discovery, biology, and drug development of sialyl Le(a) and sialyl Le(x), Arch. Biochem. Biophys. 2004, 426, 122-131.
45. R. Kannagi, Carbohydrate-mediated cell adhesion involved in hematogenous metastasis of cancer, Glycoconjugate J. 1997, 14, 577-584.
46. J.L. Magnani, B. Nilsson, M. Brockhaus, D. Zopf, Z. Steplewski, H. Koprowski, V. Ginsburg, A monoclonal antibody-defined antigen associated with gastrointestinal cancer is a ganglioside containing sialylated lacto-N-fucopentaose II, J. Biol. Chem. 1982, 257, 14365-14369.
47. S. Matsumoto, Y. Imaeda, S. Umemoto, K. Kobayashi, H. Suzuki, T. Okamoto, Cimetidine increases survival of colorectal cancer patients with high levels of sialyl Lewis-X and sialyl Lewis-A epitope expression on tumor cells, Br. J. Cancer 2002, 86, 161-167.
48. C.M. Huwe, T.J. Woltering, J. Jiricek, G. Weitz-Schmidt, C.-H. Wong, Design, synthesis and biological evaluation of aryl-substituted sialyl Lewis-X mimetics prepared via cross-metathesis of C-fucopeptides, Bioorg. Med. Chem. 1999, 7, 773-788.
49. D.H. Slee, S.J. Romano, J. Yu, T.N. Nguyen, J.K. John, N.K. Raheja, F.U. Axe, T.K. Jones, W.C. Ripka, Development of potent non-carbohydrate imidazole-based small molecule selectin inhibitors with anti-inflammatory activity, J. Med. Chem. 2001, 44, 2094-2107.
50. I. Capila, R.J. Linhardt, Heparin-protein interactions, Angew. Chem., Int. Ed. 2002, 41, 390-412.
51. (a) B. Casu, Structure and biological activity of heparin, Adv. Carbohydr. Chem. Biochem. 1985, 43, 51-134; (b) W.D. Comper, Heparin and Related Polysaccharides, Vol. 7, Gordon and Breach, New York, 1981.
52. R.C. Willis, Improved molecular techniques help researchers diagnose microbial conditions, Mod. Drug Discov. 2004, 7, 36-38.
53. (a) K.A. Karlsson, Bacterium-host protein-carbohydrate interactions and pathogenicity, Biochem. Soc. Trans. 1999, 27, 471-474; (b) K.A. Karlsson, Pathogen-host protein-carbohydrate interactions as the basis of important infections, Adv. Exp. Med. Biol. 2001, 491, 431-443.
54. M. Mammen, S.-K. Choi, G.M. Whitesides, Polyvalent interactions in biological systems: implications for design and use of multivalent ligands and inhibitors, Angew. Chem., Int. Ed. 1998, 37, 2745-2794.
55. M.D. Disney, J. Zheng, T.M. Swager, P.H. Seeberger, Detection of bacteria with carbohydrate-functionalized fluorescent polymer, J. Am. Chem. Soc. 2004, 126, 13343-13346.
56. K.J. Albert, N.S. Lewis, C.L. Schauer, G.A. Sotzing, S.E. Stitzel, T.P. Vaid, D.R. Walt, Cross-reactive chemical sensor arrays, Chem. Rev. 2000, 100, 2595-2626.
Chemical Biology. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

12 The Bicyclic Depsipeptide Family of Histone Deacetylase Inhibitors
Paul A. Townsend, Simon J. Crabb, Sean M. Davidson, Peter W. M. Johnson, Graham Packham, and Arasu Ganesan
Outlook
It is only a decade since the first human histone deacetylase (HDAC) was identified. Within this short period of time, these enzymes have had a glorious history. Broad ranging studies by both chemists and biologists have dramatically increased our fundamental understanding of HDACs and their function in eukaryotic cell regulation. On the drug discovery front, multiple HDAC inhibitors are at various stages of clinical development as anticancer agents. It is probable that more than one will soon be approved as a drug. A further development is the link between HDAC inhibitors and a growing set of therapeutic indications outside the cancer area. One can anticipate proof of concept animal models leading to clinical trials for these drugs in the near future. In this review, we have focused on the bicyclic depsipeptide family of natural product HDAC inhibitors. Compared to other classes, these compounds exhibit high potency and a marked degree of selectivity between individual HDACs. One of the natural products, FK228, is currently in advanced clinical trials for cancer. Others, the spiruchostatins, were recently discovered and show a similar biological profile of action. With these natural products, it is unclear, and indeed unlikely, that their precise structures represent the optimal molecules within this class for human therapeutics. Several academic laboratories, including our own, have achieved the total synthesis of depsipeptides. These routes are being applied to the preparation of novel unnatural analogs, which hold great promise in further exploiting the depsipeptides as subtype-selective biological probes of HDAC function and as potential therapeutic agents.

Chemical Biology. From Small Molecules to System Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
12.1 Epigenetic Mechanisms of Gene Regulation
One of the hallmarks of cellular pathologies such as neoplastic transformation is that the normal control of differentiation, cell-cycle progression, and appropriate entry into apoptosis (programmed cell death) becomes deranged. This abnormal phenotype is a consequence of altered patterns of protein expression, which in turn result from a variety of genetic abnormalities. An area of increasing interest in basic and clinical research is that of epigenetic control mechanisms [1], which focuses on the modulation of DNA packaging as a means of regulating gene expression. The genomic DNA of eukaryotes is tightly compacted into the higher order structure of chromatin, which comprises histones, nonhistone proteins, and DNA. These components come together in a tightly wound and organized structure that is dynamic in nature. The basic repeating unit of such chromosomal organization is the nucleosome, which occurs approximately every 200 bp of DNA and consists of 146 bp of DNA wrapped twice in a left-handed manner around an octamer core of paired histones H3, H4, H2A, and H2B, forming successive “beads on a string”. Nucleosomes are then usually further packed together via the linker histone H1, allowing condensation of this fundamental unit into higher order structures visible as chromosomes at metaphase. Posttranslational modification of the higher order structure of DNA has now been demonstrated to have an important role in regulating gene expression, bearing out a prediction [2] made over 40 years ago. Modification to DNA occurs primarily by methylation at CpG residues, which appears [3] to be a gene-silencing mechanism. In a similar manner, the histone proteins undergo a variety of reversible posttranslational modifications (Fig. 12-1) that cause an alteration in chromatin structure and, hence, have a profound impact on the accessibility of DNA to the transcriptional machinery. Histones exist as globular domains with long N-terminal tails making up 25% of their structure. Lysine residues in the tail can undergo acetylation, methylation, ubiquitinylation, and sumoylation. Additional posttranslational modifications include methylation of arginine, phosphorylation of serine, and poly-ADP ribosylation of glutamate and aspartic acid residues. Histone acetyltransferases (HATs) mediate the transfer of an acetyl group from acetyl-coenzyme A (CoA) to the ε-amino group of lysine. This simple change dramatically alters the lysine side chain from its protonated, positively charged state at physiological pH to a neutral residue. As a result, the affinity between the negatively charged DNA phosphodiester backbone and the positively charged histones is weakened, enabling protein complexes such as yeast mating type switching (SWI)/sucrose nonfermenting (SNF) and other transcriptional factors to bind DNA, further relaxing its tightly wound structure. Acetylation on the K9 and K4 lysines of the N-terminal tails of the internal, core histones of the nucleosome is particularly associated with enhanced gene expression. The return of acetyl-lysine to lysine is catalyzed by a second family of hydrolyzing enzymes, the histone deacetylases (HDACs).
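The charge argument above can be checked with a short calculation. The following is a minimal sketch, not taken from the chapter: it applies the Henderson-Hasselbalch relation to estimate what fraction of lysine ε-amino groups carry a positive charge at physiological pH. The pKa of about 10.5 and the pH of 7.4 are assumed textbook values, and the function name is purely illustrative.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a basic group in its protonated (charged) form.

    Follows from the Henderson-Hasselbalch relation:
    [BH+] / ([BH+] + [B]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumed values, not given in the text:
LYSINE_EPSILON_AMINO_PKA = 10.5  # typical textbook pKa of the lysine side chain
PHYSIOLOGICAL_PH = 7.4

lysine_charged = fraction_protonated(PHYSIOLOGICAL_PH, LYSINE_EPSILON_AMINO_PKA)

# After acetylation the nitrogen is part of an amide, which is not
# appreciably protonated at this pH, so the side chain is treated as neutral.
acetyl_lysine_charged = 0.0

print(f"Lysine side chains carrying a positive charge: {lysine_charged:.2%}")
print(f"Acetyl-lysine side chains carrying a positive charge: {acetyl_lysine_charged:.2%}")
```

Under these assumptions, more than 99% of unmodified lysine side chains are positively charged, which is why acetylation removes essentially one full positive charge per modified residue.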
Fig. 12-1 Examples of posttranslational modifications at histone tails. Source: M. Biel, M. Wascholowski, A. Giannis, Angew. Chem., Int. Ed. 2004, 44, 3186-3216.
Such deacetylation tends to lead to a more tightly bound and transcriptionally silenced state (Fig. 12-2). In general, transcriptional activators can bind and recruit HATs, while transcriptional repressors and corepressors interact with HDACs. The unwinding of DNA off histones by lysine acetylation is conceptually helpful for understanding the action of HATs and HDACs. It is, nevertheless, a simplistic and incomplete explanation for the way in which these enzymes control gene expression. For example, in some cases [4] inhibition of HDACs can lead to a counterintuitive decrease in gene expression. It is likely that the overall pattern of histone modification (of which acetylation is but one example) represents a “histone code” that in turn acts as a conduit for the recruitment of binding partners that determine the state of gene transcription. Among the histone modifying enzymes, HATs and HDACs are the best characterized biochemically and provide attractive opportunities for interdisciplinary research between chemists and biologists. At present, the HDACs have outstripped the HATs in terms of their impact on drug discovery. Multiple HDAC inhibitors from several chemical classes are currently in clinical trials for cancer chemotherapy, whereas the literature on HAT inhibitors is limited to in vitro data.

Fig. 12-2 Schematic representation of histone acetylation as a model for transcriptional control by epigenetic mechanisms.

12.2 Histone Deacetylases
HDACs are an evolutionarily conserved group of enzymes, which catalyze the hydrolysis of acetyl-lysine residues in proteins. While the importance of this process for histones in modulating chromatin structure cannot be overestimated, “histone” deacetylase is a dangerously misleading nomenclature. Reversible lysine acetylation has been identified [5] in an increasing number of nonhistone proteins, both nuclear and cytoplasmic (Table 12-1). Transcription factors dominate the list, with over 40 documented including MyoD, NF-κB, GATA-1, c-Jun, B-Myb, and AML-1. With these proteins, acetylation can modify DNA binding affinity, coregulator association, nuclear localization, and susceptibility to posttranslational modification such as phosphorylation or ubiquitinylation.

There are more than a dozen individual human HDAC enzymes [6], which can be divided into three main classes on the basis of structure and functional characteristics through homology to yeast HDACs. HDACs 1 through 11 share the common mechanistic feature of being metalloenzymes, with a highly conserved catalytic domain of 390 amino acids containing a zinc atom. They are further subdivided into class I and class II enzymes. The class I HDACs 1, 2, 3, and 8 are homologous in their catalytic sites to the yeast HDAC Rpd3. They have a ubiquitous distribution and are localized to the nucleus. The class II HDACs 4, 5, 6, 7, 9, and 10 are larger in size, are restricted in their tissue distribution, have the ability to shuttle between the cytoplasm and nucleus, and are homologous to the yeast HDAC Hda1. HDAC11 has similarities to both class I and class II and is usually classified separately.

The class III HDACs (sirtuins) comprise a distinct set of enzymes, sirtuin (SIRT) 1-7, with a common 275 amino acid catalytic domain and homology to yeast silent information regulator 2 (Sir2). These HDACs do not contain a catalytic zinc, instead using nicotinamide adenine dinucleotide (NAD+) as a cofactor, which acts as the acetyl acceptor following hydrolysis of the nicotinamide moiety. The sirtuins potentially constitute a link between cellular energy status and transcriptional regulation. They are gaining widespread interest [7] because of their intriguing involvement in several fundamental processes, including longevity, apoptosis, gene silencing, and DNA damage repair. Nevertheless, our understanding of the sirtuins is at a more embryonic stage than that of the zinc metalloenzymes. For this reason, they will not be discussed further.

Table 12-1 Nonhistone proteins regulated by acetylation status

Function: Acetylation targets
Transcription factor: p73, TCF, GATA-1, RelA, E2F, UBF, EKLF, NF-Y, STAT6, CREB, c-Jun, C/EBPβ, E2A, HMGI(Y), UBF, NF-κB p65/RelA, NF-κB p50, YY1, Bcl6, Cart-1, HIV-1 Tat, Brm, MyoD, TAL1/SCL, E2A, HIF-1α, TFIIE, TFIIF, PC4, TFIIB, TAF(I)68
Tumor suppressor: p53
Cell cycle: Rb
Cell adhesion: β-Catenin
Nuclear hormone receptor: AR, ERα
Nuclear import factor: Importin α, Rch1
Cytoskeleton protein: α-Tubulin
Chaperone protein: HSP90
Signaling regulation: Smad7
Apoptosis regulator: Ku70
Nonhistone chromatin protein: HMGB1/HMG1, HMGB2/HMG2, HMGN1/HMG14, HMGN2/HMG17
DNA metabolism: Flap endonuclease-1, thymine DNA glycosylase, Werner DNA helicase
DNA replication factor: PCNA, MCM3
Chromatid cohesion protein: San, cohesion subunits
Viral protein: Adenoviral E1A, large T antigen, HIV Tat, S-HDAg
Bacterial protein: Alba, CheY, acetyl CoA synthetase
Histone acetyl transferase: PCAF, p300, CBP
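The classification of human HDACs given in this section lends itself to a small lookup table. The sketch below is illustrative only: the dictionary layout, the key names, and the helper functions are assumptions made for this example, while the class memberships, cofactors, and yeast homologs are those stated in the text. HDAC11 is kept in its own group because the text describes it as usually classified separately.

```python
from typing import Dict, List, Optional, TypedDict


class HdacClass(TypedDict):
    members: List[str]
    cofactor: str                  # catalytic requirement named in the text
    yeast_homolog: Optional[str]   # None where the text names no homolog


# Class assignments as described in the text; the structure is hypothetical.
HDAC_CLASSES: Dict[str, HdacClass] = {
    "I": {
        "members": ["HDAC1", "HDAC2", "HDAC3", "HDAC8"],
        "cofactor": "zinc",
        "yeast_homolog": "Rpd3",
    },
    "II": {
        "members": ["HDAC4", "HDAC5", "HDAC6", "HDAC7", "HDAC9", "HDAC10"],
        "cofactor": "zinc",
        "yeast_homolog": "Hda1",
    },
    "III": {
        "members": [f"SIRT{i}" for i in range(1, 8)],
        "cofactor": "NAD+",
        "yeast_homolog": "Sir2",
    },
    "separate": {
        "members": ["HDAC11"],
        "cofactor": "zinc",
        "yeast_homolog": None,
    },
}


def classify(enzyme: str) -> str:
    """Return the class label for an enzyme name such as 'HDAC6' or 'SIRT1'."""
    name = enzyme.upper()
    for label, info in HDAC_CLASSES.items():
        if name in info["members"]:
            return label
    raise KeyError(f"unknown enzyme: {enzyme}")


def zinc_dependent() -> List[str]:
    """List the enzymes that the text describes as zinc metalloenzymes."""
    return [m for info in HDAC_CLASSES.values()
            if info["cofactor"] == "zinc" for m in info["members"]]


print(classify("HDAC8"), classify("HDAC6"), classify("SIRT3"), classify("HDAC11"))
print(len(zinc_dependent()), "zinc-dependent enzymes")
```

Keeping the cofactor alongside the membership makes it easy to select only the zinc-dependent enzymes, which are the ones discussed in the remainder of the chapter.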
12.3 Class I and Class II HDACs as Drug Discovery Targets
HDACs play a fundamental role in determining the state of chromatin, and are involved in the modulation of numerous other important proteins. Thus, although the first human HDACs were only identified a decade ago, it is not surprising that these enzymes are already attractive therapeutic targets [8] for a host of diseases including cancer, neurodegenerative disorders, cardiac hypertrophy, inflammation, diabetes, atherosclerosis, and infectious diseases. Altered acetylation patterns are a hallmark [9] of many primary tumor types. The best evidence for the importance of HDACs in cancer comes from studies with small molecule HDAC inhibitors, ranging from cell-based in vitro experiments to tumor xenograft models and human clinical trials. Reassuringly, despite the potential for HDAC inhibitors to affect a range of normal processes in healthy cells, early clinical studies have established [10] that they are relatively well tolerated in humans.

Investigations into HDAC inhibitors outside the cancer area are more recent and are at an earlier stage of drug development. Nevertheless, there are mouse and Drosophila models demonstrating [11] positive effects of HDAC inhibitors for the treatment of neurodegenerative ailments such as Parkinson’s and Huntington’s disease. Similarly, mouse knockouts and in vitro studies link [12] aberrant HDAC activity with cardiac hypertrophy. The inhibition of some HDACs has a beneficial effect in repressing hypertrophy, while HDACs 5 and 9 are antihypertrophic. This suggests that a nonselective HDAC inhibitor will not be suitable for therapy. In scientific papers and the patent literature, there are reports of the beneficial effects of HDAC inhibitors in models for various other therapeutic indications including inflammation [13], immunomodulation [14], diabetes [15], and atherosclerosis [16].

HDAC inhibitors are potentially useful for the treatment of infectious diseases. This is best documented with the malaria parasite. Merck and GlaxoSmithKline have reported [17] a series of inhibitors based on the apicidin cyclic tetrapeptide natural product scaffold with some selectivity for Plasmodium over human HDACs. In the antiviral field, HDAC inhibitors were recently shown [18] to drive the expression of latent reservoirs of HIV, thus facilitating their eradication. Outside the human therapeutic areas, there is an interesting recent patent [19] by Dow, which has independently isolated FK228, an HDAC inhibitor, from a Madagascar plant and shown that it is an insecticidal agent.
12.4 HDAC Inhibitors
The lead small molecule inhibitors of zinc-dependent class I and class II HDACs were identified indirectly, before their mechanism of action was understood or the human enzymes were characterized. Thus, Breslow’s pioneering studies on the cell-differentiating ability of DMSO led to synthetic hydroxamic acid compounds that were later recognized as potent HDAC inhibitors. Meanwhile, high-throughput screening of crude natural product extracts in cell-based antimicrobial and anticancer assays, followed by isolation of the active principle, provided compounds such as trichostatin, trapoxin, and FK228 that were later shown to share the common mechanism of HDAC inhibition. Regardless of their origin, the structures of most inhibitors of the zinc-dependent HDACs can be easily rationalized. They conform to the classical medicinal chemistry dogma for modulating hydrolase enzymes with a catalytic metal at the active site by competitive reversible inhibitors. Such compounds have two key features:
1. A resemblance to the enzyme substrate, promoting high affinity recognition and binding by the enzyme.
2. Replacement of the scissile bond by a metal-binding group, often a bidentate chelator.
This strategy has yielded successful drugs in the past, such as the angiotensin converting enzyme (ACE) inhibitor Captopril and later congeners. More recent examples include inhibitors of matrix metalloproteinases and peptide deformylase. For HDACs, the pharmacophore is defined by a metal-binding group attached to a linear unit of similar dimensions to the lysine side chain of the substrate. This is terminated by a “cap” that serves to orient the inhibitor in the enzyme’s substrate-binding channel.

The difficulty of expressing eukaryotic HDACs and obtaining them in pure form has hampered our understanding of the mechanism of action at the molecular level. A seminal breakthrough came about in 1999 with the X-ray structure [20] of an HDAC-like protein from the thermophilic bacterium Aquifex aeolicus. Since bacteria lack histones, presumably the protein acts as a lysine deacetylase upon other substrates. The bacterial protein shares high homology with class I HDACs in its catalytic domain and offers a reliable working model for the latter. The zinc atom in the enzyme active site lies at the end of a narrow substrate-binding channel that binds the acetyl-lysine side chain (Fig. 12-3). More recently, the structures of human HDAC8 [21] and a bacterial enzyme [22] homologous to class II HDACs were disclosed. At a gross level, all these structures are similar in their substrate-binding channels. They are less informative in predicting the differences between isoforms at the “rim” of the channel, and it is precisely these differences that are likely to determine substrate specificity (Fig. 12-4). In the absence of X-ray structures, one approach has been to approximate [23] the eukaryotic enzymes by homology modeling.

Fig. 12-3 The X-ray structure of a bacterial histone deacetylase-like protein homologous to human class I HDACs. The color coding of amino acid residues corresponds to that of Fig. 12-4, with the catalytic zinc in purple. Source: T. A. Miller, D. J. Witter, S. Belvedere, J. Med. Chem. 2003, 46, 5097-5116.

Fig. 12-4 Sequence homology between mammalian HDACs and the bacterial HDAC-like protein (HDLP). Conserved residues within the active site, channel, and rim regions are shown in color. Source: T. A. Miller, D. J. Witter, S. Belvedere, J. Med. Chem. 2003, 46, 5097-5116.

The simplest HDAC inhibitors are short chain carboxylic acids such as butyric acid, where presumably the acid is the zinc-binding group. These are relatively low in potency (micromolar IC50). Valproic acid, an old drug used as an anticonvulsant, is similarly a modest HDAC inhibitor and has now advanced to clinical trials as an anticancer agent (Fig. 12-5). Low potency, combined with short half-life and metabolic instability, are the liabilities associated with this class of HDAC inhibitors. Hydroxamic acids are excellent metal-binding chelators, and they represent the most important family of HDAC inhibitors with many examples of nanomolar potency. This motif has been exploited by nature, as in the natural
Fig. 12-5 Examples of HDAC inhibitors that have reached clinical trials as anticancer agents: valproic acid, SAHA, MS-275, and FK228.
product trichostatin A (TSA). Although too toxic for therapeutic use, TSA was the first HDAC inhibitor to be mechanistically identified as such [24]; it remains the standard chemical probe of HDAC function and is widely used as a molecular biological tool. Thousands of synthetic hydroxamic acid HDAC inhibitors have been reported. Breslow's suberoylanilide hydroxamic acid (SAHA) illustrates the design requirements for HDAC inhibition perfectly: a hydroxamic acid metal-binding group, a linear spacer, and an anilide cap. SAHA was commercialized via the startup Aton Pharmaceuticals, later acquired by Merck for several hundred million dollars. SAHA is currently under review for FDA approval and is an excellent illustration that drugs can be minimalistic in structure and can be successfully discovered in an academic setting.
The third family of HDAC inhibitors comprises cyclic tetrapeptide natural products, exemplified by the trapoxins and apicidins. A ketone functions as the metal-binding group, and an adjacent epoxide capable of irreversible covalent binding to the enzyme is often present. The natural products contain a mixture of L and D amino acids and a proline residue to favor the tight turn necessary to cyclize a tetrapeptide. Although the cyclic tetrapeptides have yet to advance to clinical trials, they are important biological tools. Schreiber's group [25] used an affinity column with immobilized trapoxin B to identify its target of action, and this led to the first characterization of a mammalian HDAC. More recently, Nishino and Yoshida have reported [26] a series of unnatural analogs based on the tetrapeptide scaffold with different zinc-binding groups.
Benzamides represent a fourth class of HDAC inhibitors. Unlike the other HDAC inhibitors above, benzamides do not conform to the simple pharmacophore model with an obvious metal-binding group connected to a linear spacer. Whether they work by the same mechanism or target an allosteric site on the enzyme is not fully resolved. Nevertheless, they display nanomolar potency, and more than one compound has reached phase I clinical trials for cancer.
At least in part, different HDACs presumably achieve selectivity by discriminating between the side chains of adjacent residues near the scissile acetyl-lysine in the protein substrate. The minimal pharmacophore for HDAC inhibition, a metal-binding group and a linear spacer, does not take these additional interactions into account. The early hydroxamic acid inhibitors have fairly small caps that do not protrude much beyond the substrate-binding channel. Although potent, they are pan-HDAC inhibitors that are effective against all the isoforms. Until a better understanding of the function of individual HDACs is available, it is unclear whether a global HDAC inhibitor is best for a therapeutic setting. The history of drug discovery does suggest that subtype-selective agents are generally superior to nonselective inhibitors. Using real-time quantitative polymerase chain reaction (qPCR), we have investigated the expression levels of HDAC genes in a wide variety of cancer cell lines compared to normal human dermal fibroblasts (Fig. 12-6). Although it is difficult to compare probes directly, the results suggest that cancer cells express more of the class I HDACs and that these should be the ones targeted by inhibitors. Similar observations [27] with patient samples show elevated levels of HDAC1 and HDAC2 in different cancers.
Fig. 12-6 Relative expression levels of HDACs, by qPCR, in a series of cancer cell lines.
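Relative expression values of the kind plotted in Fig. 12-6 are commonly derived from qPCR threshold cycles (Ct) by the comparative 2^-ΔΔCt method. The text does not state which quantification method was used, so the following Python sketch is only an illustration under that assumption; the function name, the reference gene, and all Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in a sample versus a control (2^-ddCt).

    Assumes (hypothetically) near-100% amplification efficiency for both
    the target and the reference amplicon.
    """
    # Normalize each target Ct to the reference gene measured in the same cells.
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized sample to the normalized control.
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: an HDAC transcript in a cancer cell line versus
# normal fibroblasts, each normalized to a housekeeping gene.
fold = relative_expression(ct_target_sample=24.0, ct_ref_sample=18.0,
                           ct_target_control=26.0, ct_ref_control=18.0)
print(f"fold change vs. control: {fold:.1f}")
```

A lower Ct means more transcript, so a sample whose normalized Ct is two cycles lower than the control corresponds to a fourfold higher expression under this model.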
To achieve selectivity in a classical metal-binding HDAC inhibitor, the cap needs to contain functionality for additional interactions with the “rim”. Of the inhibitors described above, the cyclic tetrapeptides have this potential due to their large macrocyclic scaffold, but have yet to result in clinical candidates. Structurally, the most complex HDAC inhibitors are the depsipeptide natural products exemplified by FK228. These compounds, which are treated separately in the next section, have even more elaborate “caps”, and are the best-documented example of selective HDAC inhibitors.
12.5 The Depsipeptide HDAC Inhibitors
The depsipeptide FK228 (originally called FR901,228) was isolated [28] by Fujisawa Pharmaceuticals from an extract of the bacterium Chromobacterium violaceum No. 968 on the basis of an assay for phenotypic reversal of ras-transformed tumor cells. The compound was shown to be active in a tumor xenograft animal model, and to have effects [29] similar to those of the known HDAC inhibitors trichostatin A and trapoxin. Superficially, FK228 (Fig. 12-7) does not
Fig. 12-7 The bicyclic depsipeptide HDAC inhibitors: FK228, FR901,375, spiruchostatin A (R = i-Pr), spiruchostatin B (R = s-Bu), spiruchostatin C (R = i-Bu), and spiruchostatin D. Intracellular disulfide reduction gives the active form of FK228.
resemble these classic HDAC inhibitors. Within the reducing environment of the cell, however, one can anticipate reduction of the disulfide bridge to give free thiols, which now fit the model of a metal-binding group connected to a linear spacer. Key experiments [30] by Yoshida’s group provided supporting evidence for this hypothesis. Thus, when assayed in vitro against partially purified HDAC1 and 2, FK228 is significantly more active in the presence of the reducing agent dithiothreitol (DTT). The activity is lost when the oxidizing agent H2O2 is added, or when the reduced dithiol version of FK228 is used. Furthermore, a thiomethyl derivative obtained by alkylation of the thiol was inactive. These results indicate that the free thiol is needed for enzyme inhibition. Excitingly, the data also revealed that FK228 was much more active against the class I HDACs 1 and 2 than the class II HDACs 4 and 6. Compared to simpler inhibitors such as trichostatin, the large macrocyclic “cap” contains sufficient structural information for additional rim interactions outside the substrate-binding channel, enabling differences in affinity between isoforms.
Another patent by Fujisawa disclosed [31] the structure of FR-901375 from an extract of Pseudomonas chlororaphis No. 2522. While it is a likely HDAC inhibitor, no data have been reported in this regard, and the decision seems to have been made to promote FK228 instead as the clinical candidate. In 2001, additional depsipeptide natural products, the spiruchostatins, were reported [32] by Shinya’s group at the University of Tokyo and Yamanouchi Pharmaceuticals. These compounds were isolated from an extract of Pseudomonas sp. 471576, on the basis of their ability to increase expression of luciferase driven by the plasminogen activator inhibitor (PAI-1) promoter. Given the structural similarity to FK228, the spiruchostatins were likely to be HDAC inhibitors, and this was confirmed in a later patent [33] and in our biochemical studies (see below) with the natural product prepared by total synthesis.
12.6 Total Synthesis of Depsipeptide HDAC Inhibitors - Routes to the β-Hydroxy Acid Fragment
Compared to other classes of HDAC inhibitors, the depsipeptides exhibit two impressive features. Firstly, they are highly potent, with IC50s in the low nanomolar range. Secondly, they are significantly more active against class I HDACs than against class II HDACs. Fortuitously, it is the former that are more heavily implicated in cancer and cardiac hypertrophy. On the other hand, the depsipeptides are structurally the most complicated class of HDAC inhibitors. Their elaborate framework has apparently deterred the pharmaceutical industry from the preparation of unnatural analogs and the iterative improvement of their properties. The Fujisawa and Yamanouchi patents cover only the natural products, and so far only academic groups have described the total synthesis of depsipeptides.
Disconnection of the depsipeptides at the amide and ester bonds plus the intramolecular disulfide bridge leads to a peptide fragment and a β-hydroxy acid. Neither of these is particularly daunting by the standards of modern complex molecule total synthesis. Nevertheless, the molecule as a whole has an intricate array of functional groups that need to be selectively manipulated. In addition, two macrocycles need to be made, which is always challenging due to the entropic difficulty of making large rings. All the depsipeptides contain a common β-hydroxy acid, which can be disconnected by an aldol reaction. However, it is an example of an “acetate aldol” that suffers from poor facial selectivity of the acetate enolate. Many of the auxiliaries and reagent-based conditions that work for propionate and other α-substituted enolates are unsuitable for acetate aldols. In the event, each depsipeptide total synthesis has featured a different route for the synthesis of this β-hydroxy acid fragment.
In Simon’s pioneering FK228 synthesis [34], methyl pentadienoate was reacted with trityl thiol to give the 1,6-conjugate addition product 1 as an inconsequential mixture of α,β- and β,γ-unsaturated isomers. Reduction to the alcohol 2 and oxidation provided the α,β-unsaturated aldehyde 3. The key asymmetric acetate aldol reaction was carried out using Carreira’s conditions (Scheme 12-1) to give 4 in nearly quantitative yield with excellent enantioselectivity (>98% ee), followed by hydrolysis to acid ent-5. This is the enantiomer of the fragment present in the natural products. Because of later difficulties with the macrolactonization, that step was carried out under Mitsunobu conditions with inversion of the alcohol, hence necessitating the opposite stereochemistry in precursor ent-5.
In the Wentworth-Janda synthesis [35] of FR-901375, aldehyde 3 was obtained by a shorter route via conjugate addition to acrolein and Wittig reaction (Scheme 12-2). The authors had difficulties reproducing the high enantioselectivity of Simon’s aldol reaction, and alternative solutions were sought. The successful synthesis utilized the Evans chiral auxiliary with chloroacetate. The chloride is a “dummy” substituent ensuring high diastereoselectivity in the aldol adduct 6. The chloride was then reduced and the auxiliary removed to give acid ent-5.
In our synthesis [36] of spiruchostatin A, we followed Simon’s procedure for the preparation of 3. We too were unable to achieve the Carreira aldol in good yield. Moreover, the reaction requires the preparation of three noncommercial materials: the binaphthyl chiral aminophenol, the t-butyl salicylaldehyde, and the silyl ketene acetal. Instead, we opted for a diastereoselective aldol with the Nagao auxiliary. For reasons that are not completely clear, the Nagao thiazolidinethione auxiliary exhibits high diastereoselectivity in acetate aldols, unlike the more popular Evans oxazolidinone auxiliary. In this case, aldol adduct 7 was obtained in good yield (Scheme 12-3). Unlike the other syntheses, this was coupled directly to the peptide rather than hydrolyzed to the acid 5.
In the Doi-Takahashi synthesis [37] of spiruchostatin A, the acetate aldol was performed with the Seebach quaternary oxazolidinone chiral auxiliary. The best
Scheme 12-1 Simon's route to acid 5. Reagents and conditions: (a) 1.2 equiv TrtSH, 1.2 equiv Cs2CO3, THF, 20 h. (b) 2 equiv DIBAL, CH2Cl2, -78 °C, 3 h. (c) 1.2 equiv (COCl)2, 2.4 equiv DMSO, CH2Cl2, -78 °C, 30 min; 2.4 equiv Et3N, -30 °C, 4 h. (d) 10 equiv LiOH, MeOH, 3 h. Yields shown in the scheme: (a) 78% of 1 (6:1 α,β to β,γ isomer); (b) 91% of 2 (6:1 α,β to β,γ isomer); aldol step from 3 to 4 (0.03 equiv Ti(Oi-Pr)4, toluene, 4 °C, 36 h; TBAF, THF, 5 min) 99%, >98% ee; (d) 100% of ent-5.
diastereoselectivity was observed with transmetallation of the lithium enolate to zirconium. Basic hydrolysis of the product 8 then afforded free acid 5.
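The selectivities quoted for these aldol routes (>98% ee in Scheme 12-1, >90% de in Scheme 12-2) are excesses rather than isomer ratios. As a small worked illustration, the Python sketch below converts an excess into the corresponding major-to-minor percentages using the standard definition, excess = major - minor with major + minor = 100. The function name is hypothetical, and since the source values are lower bounds, the printed ratios are lower bounds as well.

```python
def excess_to_ratio(excess_percent):
    """Convert an enantiomeric or diastereomeric excess (%) into
    (major %, minor %) of the two stereoisomers."""
    if not 0.0 <= excess_percent <= 100.0:
        raise ValueError("excess must lie between 0 and 100 percent")
    major = (100.0 + excess_percent) / 2.0
    minor = (100.0 - excess_percent) / 2.0
    return major, minor


# Lower-bound selectivities quoted in Schemes 12-1 and 12-2.
for label, excess in [("Carreira acetate aldol, 98% ee", 98.0),
                      ("Evans chloroacetate aldol, 90% de", 90.0)]:
    major, minor = excess_to_ratio(excess)
    print(f"{label}: {major:.1f} : {minor:.1f}")
```

Read this way, a 90% de still leaves about one part in twenty of the minor diastereomer, which is why a minor isomer can be isolated and carried through a synthesis separately.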
12.7 Total Synthesis of Depsipeptide HDAC Inhibitors - Peptide Synthesis and Formation of the seco-Hydroxy Acid
Simon's FK228 synthesis, the first in this area, provided a blueprint for preparation of the peptide fragment and its linkage to the β-hydroxy acid 5. Starting from D-valine, standard peptide coupling furnished the linear peptide 9 (Scheme 12-4). The dehydrobutyrine side chain was then introduced by conversion of the threonine to a tosylate followed by elimination. After Fmoc deprotection, the free N-terminus was coupled to acid ent-5, and the C-terminal methyl ester was hydrolyzed to give seco-acid 10. A similar strategy was employed in Wentworth and Janda's synthesis of FR901,375. For this target, the absence of the dehydrobutyrine unit simplifies the tetrapeptide synthesis, which was accomplished in a straightforward manner. Coupling with ent-5 and hydrolysis gave the seco-acid 11.
Scheme 12-2 The Wentworth-Janda route to acid 5. Reagents and conditions: (a) (i) 0.7 equiv TrtSH, 0.7 equiv Et3N, CH2Cl2, 1 h; (ii) Ph3P=CH-CHO, benzene, reflux, 7 h. (b) (i) Al amalgam, aq THF, 0 °C, 2 h; (ii) aq LiOH/H2O2 in THF, 1 h. Yields shown in the scheme: (a) 69%; aldol step to 6 (1.8 equiv Bu2BOTf, i-Pr2NEt, CH2Cl2, -78 to -10 °C, 8 h) 69%, >90% de.
Scheme 12-3 The Ganesan and Doi-Takahashi procedures for enantioselective acetate aldol reactions with aldehyde 3. Conditions shown in the scheme: 1.9 equiv TiCl4, 1.9 equiv i-Pr2NEt, CH2Cl2, -78 °C, 30 min (to 7); BuLi, 1.2 equiv Cp2ZrCl2, THF, -78 °C to rt (to 8). Yields shown: 76% and 51%.
In the spiruchostatin syntheses, the presence of a statine unit in the peptide fragment requires a significantly different protecting group strategy. Statine esters, unless sterically hindered, rapidly undergo intramolecular cyclization
Scheme 12-4 Simon’s and Wentworth-Janda’s routes to a linear seco-hydroxy acid. Reagents and conditions: (a) (i) 1 equiv Fmoc-L-Thr-OH, 1.5 equiv BOP, 3 equiv i-Pr2NEt, MeCN, 30 min; (ii) 5% Et2NH/MeCN, 3 h; 1.1 equiv Fmoc-D-Cys(Trt)-OH, 1.1 equiv BOP, 2.5 equiv i-Pr2NEt, MeCN, 30 min; (iii) 5% Et2NH/MeCN, 3 h; (iv) 1.1 equiv Fmoc-D-Val-OH, 1.6 equiv BOP, 6 equiv i-Pr2NEt, MeCN, 30 min. (b) (i) 3 equiv Ts2O, pyridine, 0 °C, 20 min; (ii) 10 equiv DABCO, MeCN, 2 h; 5% Et2NH/MeCN, 22 h; (iii) 1 equiv acid ent-5, 1.5 equiv BOP, 3 equiv i-Pr2NEt, MeCN/CH2Cl2, 30 min; (iv) 2 equiv LiOH, aq THF, 0 °C, 3.5 h. (c) (i) 1 equiv Fmoc-D-Cys(STrt)-OH, 1.2 equiv EDC, 1.2 equiv HOBt, DMF/CH2Cl2, 20 h; (ii) 1.3 equiv TBSCl, 1.3 equiv imidazole, DMF, 20 h; (iii) 50% Et2NH/CH2Cl2, 0 °C, 3 h; (iv) 1.1 equiv Fmoc-D-Val-OH, 1.3 equiv EDC, 1.4 equiv HOBt, DMF/CH2Cl2, 20 h; (v) 50% Et2NH/CH2Cl2, 0 °C, 4 h; (vi) 1.1 equiv Fmoc-D-Val-OH, 1.3 equiv EDC, 1.4 equiv HOBt, 20 h; (vii) 38% Et2NH/CH2Cl2, 0 °C, 3 h, rt, 3 h; (viii) 1 equiv acid ent-5, 1.5 equiv BOP, 3 equiv i-Pr2NEt, MeCN/CH2Cl2, 1 h; (ix) LiOH, aq THF, 16 h. Yields shown in the scheme: (b) 85%; 45% (step label not recoverable).
to the lactam when the amine is deprotected. Furthermore, the β-hydroxy ester unit is prone to protecting group migration and elimination. In our total synthesis (Scheme 12-5), the eventual solution used a nonhindered ester that can be removed under neutral conditions without destroying the fragile β-hydroxy acid. Meanwhile, the statine was N-protected with a Boc group. Upon acidic removal, the resulting protonated amine is not nucleophilic and does not cyclize to the lactam. Addition of an acylating agent and a base neutralizes the amine in situ, which then undergoes intermolecular coupling. That the undesired intramolecular pathway does not compete effectively is a testament to the speed of acylations with activated carboxylic acids. The statine 12 was prepared by Claisen condensation of valine with methyl acetate followed by stereoselective reduction of the β-keto ester, following precedents such as Joullié’s total synthesis [38] of tamandarin. The Boc group
Scheme 12-5 The Ganesan and Doi-Takahashi syntheses of spiruchostatin A seco-acids. Reagents and conditions: (a) (i) 1.1 equiv PfpOH, 1.2 equiv EDC·HCl, 0.2 equiv DMAP, CH2Cl2, 0 °C, 30 min, rt, 4 h; (ii) 3.2 equiv LiCH2CO2CH3, THF, -78 °C, 45 min; (iii) 3.5 equiv KBH4, MeOH, -78 to 0 °C, 50 min; (iv) 26 equiv LiOH, 4:1 THF/H2O, 0 °C, 2 h; (v) 15 equiv TceOH, 6.2 equiv DCC, 0.12 equiv DMAP, CH2Cl2, 0 °C to rt, 18 h. (b) (i) 20% TFA/CH2Cl2, 3 h; (ii) 1 equiv Fmoc-D-Cys(Trt)-OH, 1.2 equiv PyBOP, 3.5 equiv i-Pr2NEt, CH3CN, 20 min; (iii) 4 equiv TIPSOTf, 6 equiv 2,6-lutidine, CH2Cl2, 3 h; (iv) 5% Et2NH/CH3CN, 3 h; (v) 1.3 equiv Fmoc-D-Ala-OH, 1.3 equiv PyBOP, 3 equiv i-Pr2NEt, CH3CN, 1 h; (vi) 5% Et2NH/CH3CN, 5 h; (vii) 0.9 equiv 7, 0.1 equiv DMAP, CH2Cl2, 0 °C, then rt, 7 h; (viii) 10 equiv Zn, NH4OAc/THF, 5 h. (c) (i) Im2CO, (EtO2CCH2CO2)2Mg, THF; (ii) NaBH4, THF/MeOH; (iii) LiOH, aq THF; (iv) allyl bromide, K2CO3. (d) (i) HCl, EtOAc; (ii) Fmoc-D-Cys(STrt)-OH, EDC, HOBt, i-Pr2NEt; (iii) Et2NH; (iv) Fmoc-D-Ala-OH, EDC, HOBt, i-Pr2NEt; (v) acid 5, PyBOP, i-Pr2NEt; (vi) Pd(PPh3)4, morpholine, MeOH.
was removed, and the amine coupled with D-Cys(Trt) as described above. The free alcohol was protected and the peptide sequentially coupled with D-alanine and thiazolidinethione 7. Reductive removal of the trichloroethyl ester under neutral buffered conditions provided seco-acid 13. The Doi-Takahashi route was essentially similar, except that the statine unit 14 was an allyl ester, and the seco-acid 15 had a free alcohol in place of the triisopropylsilyl (TIPS) ether of 13.
12.8 Total Synthesis of Depsipeptide HDAC Inhibitors - Macrocyclizations and Completion of the Synthesis
Interestingly, all the depsipeptide total syntheses to date have chosen to form the macrocyclic ring by disconnecting the same ester bond.
There are two strategies for such macrolactonizations. The first, which is more common, involves the activation of the carbonyl group followed by nucleophilic intramolecular displacement by the alcohol. In Simon's FK228 synthesis, attempts at cyclizing seco-acid 10 in this manner were unsuccessful. Consequently, the second strategy, whereby the alcohol is converted into a leaving group that is displaced by the carboxylic acid, was explored. Under carefully controlled Mitsunobu conditions, the macrocycle was obtained in good yield (Scheme 12-6). The stereochemical inversion occurring in this process meant that the alcohol in 10 had to have the opposite configuration to that in the natural product. After macrolactonization, the second cyclization, formation of the disulfide bridge, was smoothly accomplished by iodine oxidation, completing the total synthesis of FK228. The same sequence of reactions was used in the Wentworth-Janda synthesis of FR901,375.
For our spiruchostatin total synthesis, we chose to reexamine the first strategy of carbonyl activation. At the very least, since our target was different from Simon's, it was possible that his negative results would not apply to us. Initial experiments with the popular Yamaguchi method, whereby the hydroxy acid is treated with trichlorobenzoyl chloride, proved promising. When the additional alcohol in the seco-acid was protected, this furnished the macrocycle in good yield (Scheme 12-7). The mechanism of the Yamaguchi procedure is believed to involve a mixed anhydride. A recent paper [39] suggests that in some cases the activated species is actually the symmetrical anhydride, and that the reagent can be replaced by simpler benzoic acids. Following cyclization, iodine oxidation by the Simon procedure gave the disulfide, which was deprotected
Scheme 12-6 Completion of the total syntheses of FK228 and FR901,375 by Mitsunobu macrolactonization. Conditions shown in the scheme: for FK228, (a) 25 equiv Ph3P, 20 equiv TsOH, 5 equiv DIAD, THF, 0 °C, 4 h; (b) 1 equiv I2, MeOH, 10 min; 52%. For FR901,375 (from 11), (a) 25 equiv Ph3P, 5 equiv TsOH, 20 equiv DIAD, THF, 0 °C, 4 h; (b) 20 equiv I2, MeOH, 10 min; (c) 5% aq HF/MeCN, 1 h; 37%.
Scheme 12-7 Final stages in the Ganesan and Doi-Takahashi total syntheses of spiruchostatin A, and the structure of spiruchostatin A epimer 16 (epi-spiruchostatin A). Conditions shown in the scheme: from 13, (a) 1.5 equiv Et3N, 2 h; 1 equiv DMAP, toluene; (b) I2, MeOH/CH2Cl2; (c) HCl, EtOAc; 34%. From 15, (a) 1.2 equiv of the anhydride reagent, 2.4 equiv DMAP, CH2Cl2; (b) I2, MeOH/CH2Cl2; 67%.
to furnish the natural product. Similarly, the minor diastereomer of 7 was carried forward through the whole sequence to provide 16, which is identical to spiruchostatin A except for being the epimer in the β-hydroxy acid fragment. In the Doi-Takahashi synthesis of spiruchostatin A, in which the additional alcohol remains unprotected, the Shiina procedure for carbonyl activation was used. This enabled the macrolactonization to proceed under milder conditions at room temperature. The spiruchostatin syntheses show that it is possible to form the macrocycle by the classical carbonyl activation method rather than the alcohol activation seen in the Simon and Wentworth-Janda syntheses of FK228 and FR901,375, respectively. The Shiina reagent is the reagent of choice due to its room temperature activation, and we have successfully used [40] this method for the preparation of a number of unnatural analogs. Since the depsipeptides contain two macrocycles, the depsipeptide framework and the disulfide bridge, the sequence in which these are formed is a separate issue. All the syntheses have first made the cyclic depsipeptide. In the Doi-Takahashi
route, an intermediate with the intramolecular disulfide bridge in place did not
undergo macrolactonization. This surprising result suggests that the disulfide bridge does not predispose the system toward the second cyclization, although modeling indicates that favorable low-energy conformations are accessible.
12.9 The Biological Characterization of Spiruchostatin A
As described above, the spiruchostatins were first isolated on the basis of their ability to regulate gene expression in cell-based reporter assays. Nevertheless, the close structural similarity to FK228 suggested that these natural products were HDAC inhibitors. Following our total synthesis, we characterized in detail the activity of spiruchostatin A as an HDAC inhibitor in various model systems. Initial analysis [36] demonstrated that spiruchostatin A was a potent nanomolar growth inhibitor of MCF7 human breast cancer cells. An increase was observed in histone acetylation and in p21cip1/waf1 promoter activity, two characteristic cellular responses to HDAC inhibitors. FK228 is believed to work by a prodrug mechanism involving intracellular activation by reduction of the disulfide bond. We have obtained evidence [41] that spiruchostatin A works in a similar fashion. In in vitro enzyme inhibition assays in the presence of DTT, reduced spiruchostatin A inhibited total HeLa cell HDAC activity with an IC50 of approximately 2 nM. In the absence of DTT, intact spiruchostatin A was essentially inactive. Another hallmark of FK228 is its selectivity between HDAC isoforms. The Yoshida group has investigated this with overexpressed, epitope-tagged HDACs that are partially purified from cell lysates by immunoprecipitation. In this assay, spiruchostatin A was approximately 500-fold more active against the class I HDAC1 than against the class II HDAC6 (Table 12-2). These results show that FK228 and spiruchostatin A have similar characteristics and mechanisms as HDAC inhibitors.
Table 12-2 Inhibition values of depsipeptides against HDAC1 and HDAC6. For comparison, the values with the nonselective inhibitor trichostatin A are shown
Compound                       HDAC1 IC50 [nM]    HDAC6 IC50 [nM]
Trichostatin A                 15.0               61
FK228 (with DTT)               4.0                790
Spiruchostatin A (with DTT)    0.6                360
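The isoform selectivity discussed in the text is simply the ratio of the two IC50 values for each compound. The short Python sketch below computes that ratio from the Table 12-2 entries, assuming the flattened table is read row by row (HDAC1 value first, HDAC6 value second for each compound); the dictionary and function names are hypothetical. Because it is a ratio, the result does not depend on the concentration unit.

```python
# IC50 values as read from Table 12-2, in the order (HDAC1, HDAC6).
# Assumption: each compound's two values appear consecutively in the table.
ic50 = {
    "Trichostatin A": (15.0, 61.0),
    "FK228 (with DTT)": (4.0, 790.0),
    "Spiruchostatin A (with DTT)": (0.6, 360.0),
}


def selectivity(hdac1_ic50, hdac6_ic50):
    """Fold selectivity for HDAC1 over HDAC6 (ratio of the two IC50 values)."""
    return hdac6_ic50 / hdac1_ic50


for name, (hdac1, hdac6) in ic50.items():
    print(f"{name}: about {selectivity(hdac1, hdac6):.0f}-fold")
```

On these numbers trichostatin A shows only a few-fold preference, whereas both depsipeptides favor HDAC1 by two to three orders of magnitude, in line with the several-hundred-fold selectivity quoted for spiruchostatin A.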
Fig. 12-8 Growth inhibition curves of four HDAC inhibitors (spiruchostatin A, FK228, SAHA, and TSA) in MCF7 and normal human dermal fibroblast (NHDF) cell lines, plotted against log dose [nM]. Cells were treated with inhibitor and relative cell growth determined 6 days later.
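Potencies are read from growth inhibition curves such as those in Fig. 12-8 as the dose at which relative growth falls to 50%. The source does not say how the curves were fitted, so the Python sketch below uses a deliberately simple stand-in, linear interpolation on a log-dose axis between the two measurements that bracket 50% growth. The function name and the data points are hypothetical and are not taken from the figure.

```python
import math


def ic50_by_interpolation(doses_nM, growth_percent):
    """Estimate the dose giving 50% relative growth.

    Interpolates linearly in log10(dose) between the two neighbouring
    measurements that bracket 50%. Doses must be positive and ascending.
    Returns None if growth never crosses 50%.
    """
    points = list(zip(doses_nM, growth_percent))
    for (d_lo, g_lo), (d_hi, g_hi) in zip(points, points[1:]):
        if g_lo >= 50.0 >= g_hi and g_lo != g_hi:
            # Fraction of the way from the lower dose to the higher dose.
            frac = (g_lo - 50.0) / (g_lo - g_hi)
            log_ic50 = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10.0 ** log_ic50
    return None


# Hypothetical dose-response readings (dose in nM, growth in % of control).
doses = [0.1, 1.0, 10.0, 100.0]
growth = [98.0, 80.0, 20.0, 5.0]
print(ic50_by_interpolation(doses, growth))
```

A fitted sigmoidal model would normally be preferred for real data; interpolation is used here only because it needs no external libraries and makes the definition of the 50% point explicit.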
We have performed side-by-side experiments to compare the growth inhibitory activity of spiruchostatin A and FK228 with that of other classes of HDAC inhibitors (Fig. 12-8). The depsipeptides FK228 and spiruchostatin A are extremely potent inhibitors, with subnanomolar to low nanomolar potency in MCF7 growth inhibition assays. By contrast, the hydroxamic acids SAHA and PXD101, both of which are in clinical trials, are much less potent. The same result was obtained with the benzamide MS-275, another clinical candidate, which had high nanomolar IC50 values in these cells. We observe similar relative potency for these inhibitors across various tumor cell types, including A2780 ovarian carcinoma cells and PC3 prostate cancer cells. We have also compared the activities of depsipeptide and hydroxamic acid HDAC inhibitors on cellular responses of malignant and normal cells. When tested at equipotent concentrations, these inhibitors have remarkably similar effects, inducing essentially identical levels of histone acetylation and p21Cip1/Waf1 protein expression. However, only SAHA and TSA induced robust α-tubulin acetylation. This is consistent with previous findings [42] that
HDAC6, which is very weakly inhibited by spiruchostatin A and FK228, is responsible for α-tubulin deacetylation, and with previous studies [43] of FK228. The inhibitors also had essentially identical effects on G2M cell-cycle arrest and cell death. Of a subset of eight genes selected from approximately 100 TSA-regulated genes identified [44] by microarray analysis, all were also regulated by spiruchostatin A. The growth inhibitory activity of HDAC inhibitors is relatively selective toward malignant cells [45], and all inhibitors tested showed approximately equivalent levels of sparing of normal human dermal fibroblasts (Fig. 12-8). These findings clearly demonstrate that the relative selectivity of depsipeptide inhibitors for class I enzymes does not limit their anticancer activity, at least in vitro, and confirm that inhibition of HDAC6 is not required for the effects [46] of HDAC inhibitors on cell-cycle arrest and gene expression. Such observations are consistent with the predominant expression of class I HDACs in malignant cells, and the correlation between expression of these enzymes and outcome in malignancies.
To address the importance of the cyclic cap structure of spiruchostatin A, we examined the properties of analog 16, epimeric at the thiol-bearing side chain. Although this compound conforms to the general requirements for an HDAC inhibitor (i.e., a zinc-binding group, an aliphatic chain to mimic the lysine side chain, and a cap structure), epi-spiruchostatin A was inactive in both in vitro and cell-based assays. Because of its epimeric nature, if this compound is oriented within the active site in the same way as spiruchostatin A, the rest of the depsipeptide framework will be a mirror image. Clearly, this leads to unfavorable interactions with the “rim”, or to the loss of favorable interactions, and hence to loss of activity.
One potential limitation of HDAC inhibitors is that they can induce expression of P-glycoprotein (pgp-1), a major drug efflux pump. This may lead to resistance to the HDAC inhibitors, as well as potential drug-drug interactions by decreasing the intracellular accumulation of coadministered agents. Using quantitative PCR, we demonstrated that spiruchostatin A significantly induced expression of pgp-1 RNA in MCF7 cells, as documented with FK228 (Fig. 12-9). Interestingly, pgp-1 RNA was not induced by epi-spiruchostatin A, demonstrating that induction is likely to be predominantly mediated by a direct effect of HDAC inhibition, rather than the xenobiotic responses that mediate the induction of pgp-1 by many other compounds. Previous work using TSA has demonstrated [47] a significant role for the transcription factor NF-Y in the control of pgp-1 expression. Besides their application as anticancer agents, HDAC inhibitors also have potential clinical utility in cardiovascular disease. We have characterized [48] the effects of spiruchostatin A in cardiac myocytes. In these cells, phenylephrine triggers a cascade of events leading to hypertrophy, including activation of markers of fetal cardiac gene expression, such as atrial natriuretic factor (ANF) and β-MHC, and reorganization of fibers to form sarcomeres. Spiruchostatin A increased histone acetylation in cardiac myocytes, and reversed the effects
Fig. 12-9 Induction of pgp-1 RNA expression. MCF7 cells were treated with the indicated compounds (DMSO, FK228 at 3.8 nM, epi-spiruchostatin A at 30 nM, or spiruchostatin A at 30 nM) for 16 h and the expression of pgp-1 RNA analyzed using Q-RT-PCR. Fold induction is shown relative to DMSO-treated cells.
of phenylephrine on ANF and β-MHC expression and sarcomere formation, suggesting that depsipeptides may have antihypertrophic activity. Despite the overall similarities between the effects of hydroxamic acid and depsipeptide HDAC inhibitors on cancer cells, we have identified some important class-specific effects (in addition to the selective induction of α-tubulin acetylation). Importantly, the kinetics of inhibition of cellular HDACs by these inhibitors vary widely. While hydroxamic acids induce rapid histone acetylation in intact cells, the onset of action of the depsipeptide inhibitors is much slower (Fig. 12-10). Also, following removal of compound by extensive washing, histone acetylation is rapidly lost in hydroxamic acid-treated cells, but is maintained for protracted periods in cells treated with depsipeptide inhibitors. The mechanisms responsible for these differences are not known, but presumably relate to uptake of the compound and/or its metabolism to active forms by intracellular reduction mechanisms.
Fig. 12-10 Histone acetylation in spiruchostatin A- or TSA-treated cells. MCF7 cells were treated with 15 nM spiruchostatin A or 80 nM TSA and analyzed by immunoblotting for histone acetylation at the indicated time points.
Fig. 12-11 Induction of histone acetylation by spiruchostatin A. (a) MCF7 cells were treated with spiruchostatin A, reduced spiruchostatin A, or spiruchostatin A in serum-free media (SFM), all at 15 nM for up to 24 h. Untreated cells were analyzed as a control (Co). (b) MCF7 cells were treated with the indicated concentrations of spiruchostatin A in the presence or absence of epi-spiruchostatin A. Histone acetylation and PCNA expression (loading control) were analyzed by immunoblotting.
The kinetics of acetylation were not altered by culturing cells in the absence of serum, suggesting that binding to serum proteins does not limit drug action (Fig. 12-11(a)). We also tested the effect of prereducing spiruchostatin A before addition to cells. However, the kinetics of acetylation induced by reduced and oxidized spiruchostatin A were essentially identical, suggesting that intracellular reduction is not a rate-limiting step (Fig. 12-11(a)). Finally, we used the inactive epimer of spiruchostatin A to investigate the potential contribution of saturable transporters (Fig. 12-11(b)). We reasoned that this chemically similar compound might compete for a putative transporter and interfere with spiruchostatin A-induced acetylation. However, spiruchostatin A-induced acetylation was equivalent in the presence or absence of its epimer. Further studies are required to determine the factors that influence the kinetics of action of depsipeptide HDAC inhibitors. The significance of these findings for the clinical application of these compounds is unclear. We and others have shown that transient histone acetylation associated with “pulse” treatment of cells with hydroxamic acids is not sufficient to promote G2M arrest. Consistent with this, it may be the
References
duration of histone acetylation rather than the peak levels that best predict responses in individual patients in clinical trials. Therefore, the ability of depsipeptide inhibitors to promote prolonged acetylation may be advantageous. However, it may be necessary to maintain the circulating concentrations of these compounds above a threshold for a considerable time before acetylation is induced. A combination of a rapid acting hydroxamic acid HDAC inhibitor and a long-lived depsipeptide HDAC inhibitor may provide a particularly attractive combination.
PART V Chemical Informatics
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
13 Chemical Informatics
13.1 Chemical Informatics
Paul A. Clemons
Outlook
This chapter begins with an overview of cheminformatics and chemical space, presenting concepts and terminology that will aid the reader’s understanding of the following sections. The second section provides a conceptual perspective on chemical structure, summarizing the evolution of the molecular graph representation now intimately familiar to the synthetic organic chemist. The third and main section outlines the development of computable molecular descriptors, including those based on both empirical and theoretical models. The purpose of this section is to demystify the process of computing descriptors and to give readers, especially experimental chemists and biologists, a clear connection between their intuitive concept of chemical structure and how molecular structures can be represented computationally. The fourth section uses several recent examples to illustrate how the concept of chemical space can be applied to problems in cheminformatics, such as property prediction, diversity analysis, and reagent selection. A brief final section challenges cheminformatics to approach future efforts to understand molecular diversity in terms of the experimental performance of small molecules across multiple biological contexts. The novice reader should use this narrative as a starting point for further inquiry, particularly by exploring the primary sources and other references cited herein. The expert reader is encouraged to allow this chapter to bring fresh perspective to a familiar field, and especially to appreciate how future challenges will require increasingly tight connections between synthetic chemists, chemical biologists, and computational scientists.
13.1.1 Introduction: Cheminformatics and Chemical Space
The similarity of small molecules and the diversity of small-molecule collections can be described in many ways, both computational and experimental. Predictive models and classification methods that relate computable properties to measured outcomes can provide useful insights into synthetic library planning, selection of compounds for screening efforts, and prioritization of “hits” from high-throughput screening (HTS).
In the past, chemical intuition dominated analyses of small-molecule structure, structural similarity, and chemical reactivity. Chemists trained in synthetic organic chemistry, for example, have developed over two centuries of deep intuition about chemical reactivity that can now be expressed in terms of formal logical rules [1]. Medicinal chemists have similarly built extensive working knowledge of the structural patterns “accepted” by human biology as bioavailable drug molecules.
In recent decades, chemists have increasingly turned to computation to solve chemical problems [2]. The diversity of applications for computers in chemistry reflects the variety in chemical research, and computers are now indispensable in all areas of chemistry. In 1959, Konrad Zuse sold the first commercially available computer, the magnetic drum-based Z22, to Bayer AG [2, 3]. In the mid-1960s, chemists began to make use of the rapidly growing capabilities afforded by computers to frame and solve problems in chemistry. Initially, computer assistance to chemical research focused on structure elucidation based on assisted evaluation of spectroscopic data [4, 5], and on programs to design organic syntheses on the basis of known reaction data [2, 6].
More than a decade ago, Ugi et al. made a distinction between “computational chemistry”, in which the calculation of molecular energy levels and geometries prevails, and “computer chemistry”, in which the logical and combinatorial capabilities of computers (rather than the arithmetic ones) are exploited to solve chemical problems not approachable by numerical computations per se [2]. Though this conceptual distinction is an important one, the precise terminology did not persist. Instead, the newer term cheminformatics now enjoys wide use to encapsulate a broad range of activities at the interface of chemistry and computer science, such as synthetic planning, molecular property calculation, database searching, combinatorial library manipulation, chemical similarity and diversity, and simulations of molecular behavior.
In the early days of computers in chemistry, it was a significant accomplishment if a computer provided any solution at all to a chemical problem; the present situation is far different. Today, a proliferation of methods and approaches requires a distillation of meaningful results from a vast array of potential solutions. Most frequently, this situation necessitates agile and iterative feedback between hypothesis generation (afforded by computational scientists) and hypothesis testing (usually performed in the laboratory).
Thus, despite the emergence of cheminformatics as a thriving and distinct subdiscipline, the need for close connections between cheminformatics and experimental (e.g., synthetic organic) chemists has never been greater. Against this backdrop, making clear distinctions between computed properties and measured properties of small molecules is especially important. In both cases, a structural representation of a small molecule is the input parameter to a conceptual set of operations that give rise to numerical outputs such as molecular descriptors, physicochemical properties, or biological outcomes (Fig. 13.1-1(a)). However, to be useful in predictive ways, such as when used to support prospective decisions about the investment of synthetic chemistry resources, at least some of these numerical outputs must be computable given only a structure representation. Only this situation allows relationships between experimentally determined values and computed values to be used to predict experimental outcomes for new molecules, based on their structural similarity to molecules that have already been experimentally tested (Fig. 13.1-1(b)). Most broadly, chemical space is a colloquialism that refers to the ranges and distributions of computed or measured outputs based on chemical structure inputs, and serves as a mathematical framework for quantitative comparisons of similarities and differences between small molecules (Fig. 13.1-1(c)).
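As a minimal sketch of chemical space as a mathematical framework, the following Python example places four small molecules in a three-dimensional descriptor space and treats Euclidean distance as dissimilarity. The descriptors (counts of carbon, nitrogen, and oxygen atoms, read from the molecular formulas) and the function names are illustrative assumptions chosen for simplicity, not descriptors proposed in this chapter; practical descriptor sets are far richer and are usually scaled before distances are computed.

```python
import math

# Toy descriptor vectors (assumption for illustration): counts of
# (carbon, nitrogen, oxygen) atoms read from each molecular formula.
DESCRIPTORS = {
    "ethanol": (2, 0, 1),      # C2H6O
    "1-propanol": (3, 0, 1),   # C3H8O
    "acetic acid": (2, 0, 2),  # C2H4O2
    "ethylamine": (2, 1, 0),   # C2H7N
}


def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor(query, descriptors):
    """Name of the molecule closest to `query`, excluding `query` itself."""
    others = (name for name in descriptors if name != query)
    return min(
        others,
        key=lambda name: distance(descriptors[query], descriptors[name]),
    )


if __name__ == "__main__":
    for name, vector in DESCRIPTORS.items():
        d = distance(DESCRIPTORS["ethanol"], vector)
        print(f"ethanol -> {name}: {d:.3f}")
    print("closest to ethylamine:", nearest_neighbor("ethylamine", DESCRIPTORS))
```

In this toy space, a larger distance is read as greater dissimilarity. Any prediction made for a new molecule from its nearest tested neighbors is only as good as the relationship between the chosen descriptors and the measured outcome.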
13.1.2 General Considerations: Chemical Structure Graphs
Synthetic organic chemistry can be viewed as an ongoing series of experiments to relate properties of chemical structure, particularly topological, steric, and electronic properties, to a particular class of measured outcomes, namely, the reactivities of combinations of functional groups under diverse reaction conditions, as judged by reaction rates and yields of product formation. Physical chemistry seeks relationships between chemical structure and such outcomes as boiling or melting points, vapor pressure, and electrochemical potential. Analytical chemistry often relates chemical structure to the measured behavior of molecules in appropriately applied electromagnetic fields. Each of these aspects of the field of chemistry is connected through the basic principle of chemical structure, which is a profound physical feature of the molecular world where we live.
At its most fundamental, stereoelectronic structure is a quantum-mechanical reality of all molecules, with the intrinsic uncertainty that this reality implies. Thus, perfectly accurate structural descriptions of molecules are both elusive and potentially cumbersome. Instead, chemists have devised an exceptional model of molecular structure by inference. This model has been built over decades through the interplay between evolving theory and experiments that measure various molecular properties that derive from structure itself. Closely aligned with our intuitive definition of “structure”, of course, are methods that provide direct information about the “size” and “shape” of molecules, such as X-ray crystallography and magnetic resonance spectroscopy. However, even these methods provide only a partial picture of molecular structure. Experimental realities such as lattice constraints, resolution limits, dynamic equilibria between rotamers, and modeling ambiguities often raise questions about how the same molecule might “look” under other experimental (or natural) circumstances.
Fig. 13.1-1 The concept of chemical space. (a) Chemical structure as an input to operations producing numerical outputs. (b) Conceptual illustration of a possible predictive relationship between arbitrary computed and measured properties. (c) Chemical space as a mathematical framework for comparing molecules, where “distance” is related to “dissimilarity”.
Considering structure in this manner, however, promotes the notion that it is rarely molecular structure per se that intrigues and excites us. Rather, molecular structure is often just a surrogate that we use to encode likely behaviors of molecules under different sets of circumstances. We often wonder, for example, how a change in structure might result in some difference in a measurable outcome. Indeed, it is molecular properties that are of primary interest after all! Because of this fact, chemists have developed very elegant and compact representations of chemical structure.
The concept of the chemical graph has a history that predates modern theories of chemical bonds and molecular structure. Scottish chemist William Cullen introduced “affinity diagrams” in his mid-eighteenth century lectures, using lines to represent forces acting between molecules undergoing chemical reactions [7]. Subsequently, in 1789, William Higgins used lines to denote forces connecting atoms to depict individual molecules, in this case the various oxides of nitrogen [7]. Both of these “chemical graphs” predated the modern concept of the chemical bond as articulated much later by Couper and Kekulé [8], among others [9], but they did set the stage for more serious attempts to study the spatial arrangement of atoms in molecules, notably by Dalton and Wollaston, each of whom made use of models reminiscent of the modern “ball-and-stick” depictions of chemical structure [7].
A more familiar concept of the molecular graph was introduced implicitly by Sir Arthur Cayley in 1874 [10], though the term graph was not used explicitly until several years later by Sylvester [11], who was inspired by the valence-theory pioneer Edward Frankland’s “graphic-like symbolic formulae” [12]. Cayley’s seminal paper in chemical graph theory considered the mathematical theory of isomers, and identified two types of molecular graphs, which Cayley named “plerograms” and “kenograms” [10, 13, 14].
Though a contemporary of the chemists involved in the development of chemical-bonding theory, such as Couper and Kekulé [9], Cayley is most widely known as a pure mathematician, a fact that foreshadows the modern need for interdisciplinary approaches to chemical research. In modern terminology, Cayley’s plerograms are molecular graphs in which all atoms are represented by vertices, and all bonds by edges. Cayley’s kenograms represent what are known today as hydrogen-suppressed molecular graphs [12].
The first half of the twentieth century brought many advances in the understanding of electronic structure, notably the introduction of shared electrons and electron-dot structures by Lewis in 1916 [15], quantum mechanics in 1926, and Pauling’s hybrid molecular orbitals in 1931 [8]. Despite these advances, chemists rarely take the time or trouble to draw the more “accurate” space-filling, or even three-dimensional ball-and-stick, structures during normal presentation. Rather, chemists have developed conventions such as condensed formulas, dashed-wedge line notation, and hydrogen-suppressed chemical graphs, each of which embeds implications of electronegativity, lone pairs, molecular orbitals, and three-dimensionality as a symbolic logic [15, 16] that trained chemists interpret automatically.
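The difference between a plerogram and a kenogram can be made concrete with a short Python sketch. It assumes a deliberately simple representation (a list of element symbols plus a list of index pairs for bonds) and a hypothetical helper named `hydrogen_suppressed`; neither is taken from Cayley’s paper or from any particular software package. Ethanol serves as the example.

```python
# Plerogram of ethanol (CH3-CH2-OH): every atom is a vertex, every bond an edge.
ETHANOL_ATOMS = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
ETHANOL_BONDS = [
    (0, 1), (1, 2),          # C-C and C-O
    (0, 3), (0, 4), (0, 5),  # three hydrogens on the methyl carbon
    (1, 6), (1, 7),          # two hydrogens on the methylene carbon
    (2, 8),                  # hydroxyl hydrogen
]


def hydrogen_suppressed(atoms, bonds):
    """Return the kenogram: heavy atoms, heavy-atom bonds, and H counts.

    Hydrogen vertices are dropped and the remaining atoms are renumbered.
    The number of hydrogens removed from each heavy atom is kept so that
    the full graph could be rebuilt.
    """
    heavy = [i for i, element in enumerate(atoms) if element != "H"]
    renumber = {old: new for new, old in enumerate(heavy)}
    h_counts = [0] * len(heavy)
    heavy_bonds = []
    for a, b in bonds:
        if a in renumber and b in renumber:
            heavy_bonds.append((renumber[a], renumber[b]))
        elif a in renumber:
            h_counts[renumber[a]] += 1
        elif b in renumber:
            h_counts[renumber[b]] += 1
    return [atoms[i] for i in heavy], heavy_bonds, h_counts


if __name__ == "__main__":
    heavy_atoms, heavy_bonds, h_counts = hydrogen_suppressed(
        ETHANOL_ATOMS, ETHANOL_BONDS
    )
    print(heavy_atoms, heavy_bonds, h_counts)
```

The nine-vertex plerogram collapses to a three-vertex kenogram. The hydrogen counts are retained only as a convenience here; in most hydrogen-suppressed notations they are implied by ordinary valence rules.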
[Fig. 13.1-2, panels (c)-(f), showing one molecule in four encodings: (c) systematic name, 3R,4S,5R-trihydroxy-cyclohex-1-enecarboxylic acid; (d) fragment-code bit string; (e) SMILES, O[C@@H]1CC(=C[C@@H](O)[C@H]1O)C(=O)O; (f) SDF record (V2000) giving atomic coordinates and a connection table for 12 non-hydrogen atoms and 12 bonds.]
Most importantly in the present context, of course, the intersection of hydrogen-suppressed graphs with general topological and graph-theoretical considerations [13] represents an important conceptual advance in the transition between human-readable and machine-readable structure representations, as we shall see in the following section. However, it is also important to remember that one result of simplifying representations, whether by man or machine, is the concealment of a considerable amount of latent complexity. Any representation of chemical structure is thus a complex cipher, allowing our model of structure such brevity as to mask the distinction between the model and the reality of chemical structure. The foregoing evolution of such representations is a testament to both our evolving understanding of structure and the human capacity for encoding any information. In this latter sense, however, chemical structure representation is quite naturally suited to the computer age.
13.1.3 History and Development: Computable Representations of Structure
Since the advent of modern computers, much attention has been paid to methods to represent chemical structure in ways that are electronically encodable. Such representations underlie most modern systems designed to store and utilize chemical information, such as chemical documentation using databases. Beginning in the mid-twentieth century, several methods of encoding chemical information for machine processing were developed. Chemical cipher notations had been introduced and refined by Gordon [17, 18], Dyson [19], Waldo [20, 21], and Wiswesser [22, 23], among others, beginning in the late 1940s. In 1962, Bouman introduced one of the first linear-cipher representations, a “linearly organized chemical code for use in computer systems (Locus)”, whose representations of chemical structure are recognizably ancestral to modern molecular line-entry notations (Fig. 13.1-2(a)) [24]. Significantly, one stated objective of Bouman was to reduce the chemical knowledge required to use the system, allowing more of the coding work to be done by machines or by chemically naïve clerical staff. In 1964, Spialter introduced the “atom connectivity matrix (ACM)” in an attempt to define algebraically a “characteristic polynomial” associated with chemical topology [25]. Again, though clearly inspired by earlier graph-theoretic work such as that by Ray and Kirsch [26] among others, Spialter’s paper is among the first to show something recognizable as a precursor to a molecular connection table (Fig. 13.1-2(b)) [25]. Many issues familiar in modern cheminformatics were addressed by these early studies, such as the trade-off in readability by an algorithm versus a trained chemist, the rank and seniority of substructures, and the uniqueness and generality of chemical representations. On the other hand, stereochemical distinctions were not addressed by these early systems; rather the focus of encoding was on the topological connectivity of the molecular graph.
Fig. 13.1-2 Encoding chemical structure. (a) Early encoding after Bouman [24], similar to modern line notation. (b) Early encoding after Spialter [25], similar to modern connection table. Modern encoding methods using (c) International Union of Pure and Applied Chemistry (IUPAC) systematic nomenclature, (d) fragment codes exemplified by MDL public keys (Elsevier MDL; San Ramon, CA), (e) Simplified Molecular Input Line Entry Specification (SMILES) [28, 29] line notation, and (f) atomic coordinates and a connection table from the industry-standard structure-definition file (SDF) format.
For the most part, current methods of computer-encodable structure representation fall into four classes [27]: systematic nomenclature (Fig. 13.1-2(c)), fragmentation codes (Fig. 13.1-2(d)), line notations (Fig. 13.1-2(e)), and connection tables (Fig. 13.1-2(f)). In general, unambiguous stereochemical representation remains a problem for all but the most sophisticated of encoding systems. Importantly, encoding methods such as these give rise directly to a wealth of computational approaches to assess similarity between compounds and diversity among compound collections. Rather than relying on chemical training to interpret chemical similarity or dissimilarity, such structure-encoding methods allow algorithmic processing of often-large collections of structures for specific properties, such as substructure matches, or general properties, such as the overall diversity of a compound collection. Many methods have been developed to take advantage of increased computing power and computer science sophistication in the representation and computation of structural features. The remainder of this section provides some key details about illustrative examples of several such molecular descriptor methods.
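To illustrate how a connection table supports algorithmic processing, the Python sketch below encodes the hydrogen-suppressed graph of the molecule in Fig. 13.1-2 as a list of bonds, converts it to a symmetric bond-order matrix in the spirit of an atom connectivity matrix, and runs a crude substructure query. The atom numbering follows the order of atoms in the SMILES string of Fig. 13.1-2(e) rather than the numbering in panel (f), stereochemistry is ignored, and the function names are hypothetical; this is a sketch under those assumptions, not the algorithm of any cited system.

```python
# Heavy atoms in the order they appear in the SMILES of Fig. 13.1-2(e):
# O[C@@H]1CC(=C[C@@H](O)[C@H]1O)C(=O)O   (stereo labels ignored here)
ATOMS = ["O", "C", "C", "C", "C", "C", "O", "C", "O", "C", "O", "O"]

# Connection table: (atom index, atom index, bond order).
BONDS = [
    (0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 2), (4, 5, 1), (5, 6, 1),
    (5, 7, 1), (7, 1, 1), (7, 8, 1), (3, 9, 1), (9, 10, 2), (9, 11, 1),
]


def bond_order_matrix(n_atoms, bonds):
    """Symmetric matrix whose entry [i][j] is the bond order between i and j."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for a, b, order in bonds:
        matrix[a][b] = order
        matrix[b][a] = order
    return matrix


def carboxyl_type_carbons(atoms, matrix):
    """Indices of carbons bonded to exactly one =O and one -O.

    Hydrogens are not tracked, so this crude query cannot tell a
    carboxylic acid from an ester.
    """
    hits = []
    for i, element in enumerate(atoms):
        if element != "C":
            continue
        oxygen_orders = sorted(
            matrix[i][j]
            for j, other in enumerate(atoms)
            if other == "O" and matrix[i][j] > 0
        )
        if oxygen_orders == [1, 2]:
            hits.append(i)
    return hits


if __name__ == "__main__":
    matrix = bond_order_matrix(len(ATOMS), BONDS)
    for symbol, row in zip(ATOMS, matrix):
        print(symbol, row)
    print("carboxyl-type carbons:", carboxyl_type_carbons(ATOMS, matrix))
```

The same question (does this structure contain a carboxyl-type group?) would require chemical training to answer from a drawing, but becomes a simple loop once the structure is stored as a connection table.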
13.1.3.1 Functional Group Constants
Attempts to investigate the effects of physicochemical properties on chemical reactivity, biological activity, and toxicity date back over a hundred years [30-32]. In 1936, Hammett predicted entropies of ionization of benzoic acid derivatives on the basis of both structural changes and a consideration of the temperature-dependence of the dielectric constant of the solvent in which ionization occurred. Hammett’s own comments prefigure ongoing controversy about interpreting the structural determinants of molecular properties: “The effect of a change in structure of reactant upon the equilibrium or rate of an organic chemical reaction . . . has been attributed [both] to an increase or decrease in the electrical work [of ionization due to] the substitution” [33, 34]. Further extensions of these groundbreaking ideas by Hammett resulted in the so-called Hammett equation, initially used to summarize substituent effects on rate and equilibrium constants for meta- and para-substituted benzene derivatives [35, 36]:
$$\log k = \log k_0 + \rho\,\sigma_p$$
The symbol $k_0$ is an intercept term that is equal to $k$ for the parent (unsubstituted) compound. The reaction constant $\rho$ depends on reaction conditions such as solvent and temperature, representing the susceptibility of the reaction to environmental effects. In contrast, the substituent constant $\sigma_p$ is a measure of the electronic effect of replacing hydrogen by a given substituent, and is assumed to be independent of the reaction conditions. By defining $\rho = 1$ for the room temperature ionization of substituted benzoic acids in water, Hammett calculated $\sigma_p$ values directly for 13 substituents, and predicted those for a further 17 substituents by applying the primary $\sigma_p$ values to other reactions. Later work increased the number of $\sigma_p$ values to 44 and the number of reaction series to 51 [35]. From a cheminformatic perspective, the most important consequence of the Hammett equation is that it separates explicitly the contribution of environment from that of chemical structure in the prediction of an outcome (in this case, a reactivity property). As such, the Hammett equation represents one of the earliest attempts to predict molecular behavior on the basis of chemical structure alone. Notably, however, later investigators experienced difficulties when trying to apply Hammett-type relationships to biological systems, indicating that additional structural determinants need to be considered [32, 37]. In the 1960s, several seminal papers by Hansch and coworkers inaugurated the era of quantitative structure-activity relationships (QSARs), using structural determinants to model and predict first physicochemical and then biological properties. First, Fujita et al. explicitly measured partition coefficients between 1-octanol and water for over 200 mono- and disubstituted benzenes [38].
These measured values were used to derive new substituent constants for 67 functional groups attached to various benzene derivatives, representing the change in partition coefficient introduced by adding the substituent. While some variation between these constants was observed across different electronic environments, the variations were relatively small and were sometimes related by simple linear expressions, allowing the authors to use this system to establish correlations between partition coefficients and biological activities. Shortly thereafter, Iwasa et al. demonstrated the value of using substituent constants, this time for aliphatic groups, to correlate chemical structure with the narcotic action of alcohols, esters, ketones, and ethers on tadpoles [39]. These seminal papers set the stage for the entire field of QSARs, which in general attempts to derive equations that relate predicted or measured physicochemical properties to some biological outcome. In 1969, Hansch reflected on these early results in Accounts of Chemical Research [37, 40], relating nearly 20 years of interest in indole derivatives, and an ongoing collaboration with Robert Muir of the Pomona botany department to correlate chemical structure with the biological activities of indoleacetic acid-like synthetic hormones. In an almost prescient allusion to the ongoing challenges of interdisciplinary work, Hansch recounts that “attempts to formulate these
[results] in quantitative terms were frustrated by our conceptual training . . .
Muir was well aware of “lock and key” theory of enzyme-substrate reactions, . . . [and] I was conditioned to explain substituent effects in the electronic terms of the Hammett equation.” Hansch et al. were considering different ways of mathematically combining Hammett constants and partition coefficients to reduce data variance in their models, and Fujita had initially suggested a linear combination. Only later, when Hansch could “bring [him]self to postulate that log (1/C) was not linearly but parabolically dependent on log P”, did they obtain a generally useful relationship. Hansch rationalized this relationship by saying that molecules that are highly hydrophilic will not penetrate lipophilic barriers, while highly hydrophobic molecules will be soaked up by the first lipophilic material they encounter; either way, such molecules will have difficulty reaching their sites of action. Thus, only molecules with intermediate lipophilicities will readily exert biological influence. These insights represent groundbreaking thinking for their time, and herald the modern age of QSAR. Currently, both linear and nonlinear relationships between structure and activity are routinely considered, and the effects of both electronic (polar) and hydrophobic interactions are embedded within QSAR models. Such considerations allow generally predictive models of activity based on small-molecule structures, at least within congeneric series of molecules. Moreover, hydrophobicity, expressed as the octanol-water partition coefficient (log P), has proven useful in predicting various biological observations [37, 40], and this property is now used extensively in drug discovery and predictive toxicology [41, 42].
The Hansch-type approach that correlates physicochemical properties with activities using multivariable regression techniques has subsequently been widely applied to problem areas such as toxicity, enzyme inhibition, ligand-receptor binding, carcinogenicity, mutagenesis, and metabolism [43], and the insights of Hansch with respect to the interplay of hydrophobic and electronic parameters presage decades of research into molecular descriptor analysis that continues to this day.
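The two relationships discussed in this subsection can be made concrete with a minimal sketch. It assumes the Hammett equation in the form log k = log k0 + ρσ and a parabolic Hansch model of the form log(1/C) = -a(log P)² + b(log P) + c. The function names and the coefficients a, b, and c are hypothetical and chosen only for illustration; the benzoic acid log K0 of -4.20 and the σ of 0.78 for a para-nitro group are approximate literature values and should be treated as assumptions rather than data from this chapter.

```python
def hammett_log_k(log_k0: float, rho: float, sigma: float) -> float:
    """Hammett equation: log k = log k0 + rho * sigma."""
    return log_k0 + rho * sigma


def hansch_log_inv_c(log_p: float, a: float, b: float, c: float) -> float:
    """Parabolic Hansch-type model: log(1/C) = -a*(log P)^2 + b*log P + c."""
    return -a * log_p ** 2 + b * log_p + c


def optimal_log_p(a: float, b: float) -> float:
    """Vertex of the parabola, i.e. the log P at which predicted activity peaks."""
    if a <= 0:
        raise ValueError("a must be positive for the parabola to have a maximum")
    return b / (2.0 * a)


# Hammett: rho = 1 by definition for benzoic acid ionization in water.
# log_k0 and sigma below are approximate, assumed values.
predicted_log_k = hammett_log_k(log_k0=-4.20, rho=1.0, sigma=0.78)

# Hansch: hypothetical coefficients, not fitted to any real data set.
a, b, c = 0.5, 2.0, 1.0
best_log_p = optimal_log_p(a, b)
activity_at_optimum = hansch_log_inv_c(best_log_p, a, b, c)
```

The Hammett function keeps the environment term (ρ) and the structure term (σ) as separate arguments, mirroring the separation that the text identifies as the equation's main cheminformatic consequence. The parabolic model reproduces Hansch's qualitative argument: predicted activity falls off on either side of an intermediate log P.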
13.1.3.2
Graph-Theoretic Indices
Recalling Cayley’s plerograms and kenograms [10, 12, 14], small molecules can be (and usually are) represented as polygonal shapes where each vertex represents an atom and each edge represents a bond. This representation is termed the molecular graph, and a given structure can be a path, a tree, or a graph, in the formal language of topology. Graph theory provides for the calculation of indicators defined over such graphs, generally termed indices [14, 44]. The use of topological indices in chemistry began in 1947 when Harry Wiener developed the oldest among the topological indices for molecular structure [45-47], the Wiener index, and used it to predict physical properties of paraffins [44, 48]. The Wiener index, $W$, on a graph $G$, is
given by:
$$W(G) = \sum_{i<j} d(\mathrm{atom}_i, \mathrm{atom}_j),$$
where $d$ is the shortest distance obtained by counting bonds between the two atoms, and the sum is computed over all pairs of atoms in $G$. Importantly, it has subsequently been shown that the Wiener index for a molecular graph may have strong correlations with chemical properties [49-51]. Consequently, it is often the objective of synthetic efforts, particularly in drug discovery optimization, to construct compounds with certain properties by synthesizing lead compounds with a particular Wiener index. This strategy is an important example of how computed properties (that correlate predictively with desired properties) can be used to create new compounds that have certain values; that is, they occupy certain regions of a chemical descriptor space [44]. Wiener also observed the following relation for molecules that have acyclic graphs: $$W(G) = \sum_{i} n_1(\mathrm{bond}_i \mid G)\, n_2(\mathrm{bond}_i \mid G),$$ where the sum is computed over all bonds in $G$, and where $n_1(\mathrm{bond}_i \mid G)$ and $n_2(\mathrm{bond}_i \mid G)$ are the numbers of atoms lying on either side of a given bond [14, 46, 48]. This result can be conceptualized first by considering that large contributions to the sum in the first definition of $W$ will come from atoms near the molecular perimeter, since these are more bonds removed, on average, from most other atoms, whereas smaller contributions to the sum will come from more central atoms (Fig. 13.1-3(a)). Since all pair-wise distances used in the sum are obtained by counting bonds, the alternative calculation of $W$ involves considering the number of times each bond must be traversed to account for all paths between pairs of atoms separated by at least that bond (Fig. 13.1-3(b)). Wiener’s work set the stage for one of the first true multidimensional molecular descriptor spaces. In 1979, Randić et al. published the details of a program, written in both BASIC and FORTRAN, which found all the paths through a molecular skeleton represented using a molecular graph [52].
Though the total number of such paths increases rapidly with molecular size, and especially with the number of rings in a molecule, even at the time of first publication such path counting was a practical computing task for most chemical structures. The strategy of this approach was to develop a set of molecular codes corresponding to the number of self-avoiding paths of each length in a molecule, for use both as a convenient representation in subsequent similarity searches [52-54], and as a quantitative measure of structural complexity. Since the basic calculation method for these codes was again based on counting bonds, it is easy to visualize how these path codes are related to the Wiener index (Fig. 13.1-3). While these initial molecular codes did not address
Fig. 13.1-3 Topological indices and path counts. (a) Illustration of path counting leading to topological indices; beginning with the atom labeled 1, red bonds illustrate paths of lengths 1 through 6, terminating with the atoms labeled with asterisks. (b) Illustration of Wiener’s observation [14, 48] that a bond, labeled with an asterisk, will be traversed 3 × 9 = 27 times to account for all paths between pairs of atoms on either side of the bond. (c) Illustration of a graph G as a molecular representation, and of the relationship between Randić’s path coding system [52] and the Wiener index. (d) Illustration of a graph G′ representing Randić’s later attempts [53] to include bond order in finding paths; note how this modification breaks the symmetry of this graph, requiring relabeling of four atoms.
multiple bonds, Randić later published a second version of the program [53] that enumerates paths in chemical graphs with multiple bonds (Fig. 13.1-3(d)). In this case, both the input information and the algorithm are more complex, and the numeric values of the codes could be much larger, especially in the case of molecules with multiple double bonds, but this improvement was a step closer to representing the chemical reality of bond order. Randić’s methods allow the association of numerical parameters with chemical structure in a way analogous to more detailed structural studies based on numerical calculations derived from theoretical models (e.g., quantum chemical calculations). The distinction between these two approaches is in the nature of the parameters, rather than the goal, which in both cases is to define correlative relationships between numerical computation and
actual molecular properties. While graph theory emphasizes conceptual development, information encoding, and speed of calculation, quantum-mechanical calculations focus on practical simulation of physical chemistry theory and a more accurate (though more computationally demanding) depiction of chemical structure. Since this initial work, several other topological indices proposed by Randić [54-57], Basak [55, 58-60], and Balaban [55, 61, 62] have been used to predict toxicity as well as many physicochemical and biological properties [55]. These indices have also been used in diversity analysis [55, 63] and in analyzing the “drug-likeness” of compounds and compound collections [55, 64]. In general, topological indices have now been highly developed theoretically [65], and are being complemented by related information-theoretic indices, such as the Shannon index [50, 65, 66].
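The two definitions of the Wiener index and the path-count codes described in this subsection can be sketched in a few lines. The sketch below assumes a connected, hydrogen-suppressed molecular graph given as a list of bonds between integer atom labels and ignores bond order; all function names are hypothetical. The example molecule is the carbon skeleton of 2-methylbutane.

```python
from collections import deque


def adjacency(n_atoms, bonds):
    """Build an adjacency map for an undirected molecular graph."""
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def bond_distances(adj, start):
    """Breadth-first search: shortest bond count from start to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wiener_by_distances(n_atoms, bonds):
    """First definition: sum of shortest distances over all atom pairs."""
    adj = adjacency(n_atoms, bonds)
    total = 0
    for i in range(n_atoms):
        dist = bond_distances(adj, i)
        total += sum(dist[j] for j in range(i + 1, n_atoms))
    return total


def wiener_by_bonds(n_atoms, bonds):
    """Second definition (acyclic graphs only): sum of n1 * n2 over bonds."""
    adj = adjacency(n_atoms, bonds)
    total = 0
    for a, b in bonds:
        seen, stack = {a}, [a]
        while stack:  # collect the atoms on a's side of the cut bond a-b
            u = stack.pop()
            for v in adj[u]:
                if v not in seen and not (u == a and v == b):
                    seen.add(v)
                    stack.append(v)
        total += len(seen) * (n_atoms - len(seen))
    return total


def path_counts(n_atoms, bonds):
    """Number of self-avoiding paths of each length (in bonds)."""
    adj = adjacency(n_atoms, bonds)
    counts = {}

    def extend(path, visited):
        for v in adj[path[-1]]:
            if v not in visited:
                length = len(path)  # bonds in the path once v is appended
                counts[length] = counts.get(length, 0) + 1
                extend(path + [v], visited | {v})

    for start in range(n_atoms):
        extend([start], {start})
    # every undirected path was discovered once from each end
    return {k: c // 2 for k, c in sorted(counts.items())}


# Carbon skeleton of 2-methylbutane: chain 0-1-2-3 with a branch 1-4.
bonds = [(0, 1), (1, 2), (2, 3), (1, 4)]
w_pairs = wiener_by_distances(5, bonds)   # 18
w_bonds = wiener_by_bonds(5, bonds)       # 18
codes = path_counts(5, bonds)             # {1: 4, 2: 4, 3: 2}
```

For an acyclic graph, each pair of atoms is joined by exactly one path, so the sum of length times count over the path codes (1·4 + 2·4 + 3·2 = 18) reproduces the Wiener index. In graphs with rings, the path codes count more paths than there are atom pairs and this identity no longer holds.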
13.1.3.3
Structural Feature Counts
Early QSAR studies concentrated on establishing correlations between biological activities and experimentally derived physicochemical properties, such as partition coefficients, molar refractivity, or pK_a, and predominantly used linear regression as a correlation technique [38, 43, 67]. Although this approach is still used, experimental physicochemical parameters have largely been supplanted by computer-generated descriptors. In many cases, these descriptors consist of feature counts, often computed on whole molecules (e.g., number of carbons, number of rings, etc.), but increasingly fragment codes have also been used to allow prediction of outcomes based on molecular fragments or more local structural features, such as pharmacophores. Such methods are generally fast, since only simple forms of structure representation are needed for this type of modeling, circumventing the need for time-consuming three-dimensional rendering, conformational analysis, and molecular alignment, as is done with some other QSAR methods. Fragment-based QSAR approaches are especially suited to rapid virtual screening of large libraries against protein structures, a need that is often encountered in both drug discovery and toxicology. The earliest attempt to utilize substructural fragments to predict outcomes was the 1956 Free-Wilson method [68], which uses linear correlations between an observable property and constant, additive contributions of substituents to a common skeleton. Subsequently, similar approaches have been used both by Leo et al. [69] and by Ghose, Crippen, and coworkers [70-72] to calculate log P values by adding partial log P “contributions” from each fragment in the molecule. Attempting to couple a description of molecular topology with atom identities in the molecular graph, Carhart et al. at Lederle Laboratories presented a new descriptor methodology based on atom-pairs [73], inspired by Wiener’s earlier work (Fig. 13.1-4(a)). Prior work had focused either on topology, as we have seen from Wiener and Randić (for example, summing electronegativity products for all pairs of atoms separated by paths of the same length [74]), or on developing chemically intuitive relationships only between directly connected
Fig. 13.1-4 Atom-pairs and topological torsions. (a) Illustration of atom-types used in atom-pair descriptor calculation including atomic identities, pi-bonding, and molecular topology. (b) Distinct atom-types, some of which occur multiple times, make up the basic unit of atom-pair calculation. (c) Atom-pairs are enumerated by assembling the list of all distinct pairs of atom-types and the path length connecting them. (d) Distance metric defined by Carhart [73] in an atom-pair descriptor space. (e) Topological torsions represent the topologies of sets of four directly connected atoms, using the same atom-types as atom-pairs; these encode local information only, whereas atom-pairs contain information about both local and distant pair-wise relationships. Note how the inclusion of stereochemistry in topological torsions would increase the number of distinct topological torsions in this molecule from 18 to 20 (gray ovals).
pairs of atoms [75]. Carhart outlines two applications to structure-activity problems to which molecular descriptors are to be applied: similarity between compounds, and correlations of descriptors with measured biological activities. Notably, this paper clearly frames the important goal of all molecular descriptor studies, namely, to “express an irregular object like a chemical structure in a regular form that allows the quantitative comparing and contrasting of those structures” [73]. Further, Carhart explicitly enjoins his readers to consider structure as a vector of numerical descriptors representing the position of compounds in a high-dimensional space (i.e., a chemical space) with each coordinate axis representing a different descriptor. Carhart argues that Hansch analysis requires a set of compounds that are closely related, sharing a common skeleton and differing only in the nature and positioning of a few substituents [73]. Descriptors such as molecular connectivity and other topological indices have the advantage that they can be computed easily from the connection table of a structure and can be applied to much more diverse sets of compounds. However, these descriptors encode only whole-molecule measures of topology and therefore may be difficult to interpret even if they do correlate with some measurable biology. In contrast, certain other parameters that are computed from analyses of molecular shape-encoded space-filling and electrostatic potential features can also yield good models of activity; however, their computation requires detailed conformational analyses. As a compromise, Carhart et al. offered the atom-pair, which is meant to afford generality, ease of interpretation, and encoding of local topological structure. Perhaps the simplest method of encoding chemical structure using a computer is simply to count molecular features, such as atoms, substructures, or topological elements (e.g., rings).
In general, descriptors relating to topological substructure can take the form either of counts (e.g., the number of hydroxyl groups in a structure) or of binary variables that record the simple presence or absence (as 1 or 0, respectively) of a particular moiety. Atom-pairs, in particular, encode the number of occurrences of pairs of atom-types separated by a particular number of bonds in the molecular graph. Atom-type designations in atom-pairs have constitutional, topological, and electronic character - atoms of the same type share atomic identity, the same number of non-hydrogen bonding partners, and the same number of bonding π electrons (Fig. 13.1-4(b)). Because of this representation, molecules tend to have many fewer distinct atom-pairs than the theoretical maximum possible ($\tfrac{1}{2}n(n-1)$ for a molecule with $n$ atoms), both by virtue of having multiple atoms of the same type, and because the order of the two atoms’ appearance within an atom-pair is not important (Fig. 13.1-4(c)). A very significant aspect of the definition of atom-types is that Carhart et al. provided both a distance metric and a normalized similarity score for molecules based on the atom-pair definition (Fig. 13.1-4(d)). Formally, such a provision is a requirement for any metric descriptor space intended to afford a basis for comparison of molecular
similarity or analysis of molecular diversity. Often, however, descriptors are
provided without such a formalism, leaving their value as a mathematical description of chemical structure wanting. One obvious criticism of atom-pairs is the loss of conformational information associated with two-body topological descriptions of structure. However, additional work at Lederle Laboratories sought to address this problem. The reasoning was that although a specific three-dimensional arrangement of atoms may be necessary for activity, the features essential for activity are actually encoded in the topological description of the molecule. Torsion angles defined by four consecutively bonded atoms represent the minimal structural unit in terms of which molecular conformation can be completely described. On the basis of this rationale, Nilakantan et al. proposed topological torsions as a new descriptor set for use in QSAR studies [76]. The topological torsion consists of four consecutively bonded non-hydrogen atoms along with the number of non-hydrogen branches (Fig. 13.1-4(e)), and is arguably the topological analog of the torsion angle. Immediately, the workers at Lederle recognized that the short-range description provided by topological torsions complements the atom-pair description in that each encodes different information about molecular topology and shape. Like atom-pairs, topological torsions correspond to readily recognizable features of molecules, and are similarly easy to calculate. Comparing the two descriptions gives slightly different results in similarity calculations. Whereas atom-pairs are sensitive to small changes even in large molecules, topological torsions are local - the effect of changing a single atom in a molecule is independent of the total number of atoms. Rather, the actual number of topological torsions affected by a change in structure depends only on the local topology in the vicinity of the change [76]. Nilakantan et al.
suggest that similarity analyses using both sets of descriptors be combined by merging the lists of similar compounds that were obtained by each method. Of course, by modern standards of high-dimensional descriptor spaces, this approach somewhat misses the point - a more useful measure of similarity (after accounting for differences in range and variance among the descriptors) would be to perform similarity calculations in a metric space containing both atom-pairs and topological torsions at the outset. Ongoing adaptation of the early work at Lederle has led to an explosion of methods designed to exploit structural feature counts. A computer-age generalization of feature-count descriptors is the structure key, which works by associating bits in a string with the presence or absence of defined molecular features (see Fig. 13.1-2(d)). In their most general form, structure keys require that the choice of features to be included be specified in advance, and the position of the bit in the bit-string encodes the same feature for all molecules [77]. Indeed, in principle, Carhart’s atom-pairs could be encoded as structure keys, with each bit corresponding to the presence or absence of an allowed (predefined) atom-pair. Structure
keys are an important advance in the substructure searching of large databases, or (more accurately) substructure screening. A screen is a process by which candidates are ruled out efficiently, leaving only a small number of candidates for more accurate but time-consuming comparisons. Because structure keys encode substructural features exactly, and at defined positions within bit-strings, encoded database objects that fail to contain any definite feature of a query structure can immediately be eliminated from consideration. Another conceptual possibility used in structure-key descriptors is to set lower limits for the number of occurrences of a structural feature required to set a particular bit. For example, one could associate a series of bits with properties encoding the number of rings in a molecule, with one bit for each of the features “>1 ring”, “>2 rings”, etc. In this way, bit-strings can be made to encode not only the presence but also the number of each desired feature. It is important to note that this type of strategy enables the encoding of any collection of molecular features, provided that a sufficient number of bits are allowed. In addition to discrete parameters (e.g., number of rings), even continuous-valued parameters (such as log P) could be encoded in keys, provided that the continuous values can be binned to an acceptable resolution. Similar to atom-pairs, atom-triples have also been used to encode ligand features in terms of the properties of triangles [78], since three-body objects retain more information than pair-wise representations. However, often the number of constituent triples for which calculations are required became limiting, allowing fewer structures or fewer conformers to be considered. In one adaptation, Good et al. [78] restricted their consideration to “key functional centers” in molecules that participate in the triplet descriptions.
While this method reduces computation times for large databases, its inherent bias (i.e., the preselection of which pharmacophores are allowed to participate) presents a new set of problems for truly generic database and substructure searching. Circumventing this conceptual limitation of structure keys required an important evolution of feature-counting methodologies - the notion of the molecular fingerprint [77, 79, 80]. In general, molecular fingerprints are bit-strings that encode information about molecular atom-types, topology, and even extended functional groups, but without prespecifying which features are to be encoded. This generality is accomplished by generating the list of features from the molecular structure itself, with a pattern representing each atom, each pair of connected atoms, each triplet of connected atoms, and so on. Each of these patterns, up to some connectivity radius, is used to seed a pseudorandom number generator that determines which bits are set by that pattern. Though this hash-coding procedure does not preserve the positional meaning of individual bits within the overall fingerprint, it does ensure that any molecular fingerprint containing a given pattern will contain the bits associated with that pattern.
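The hash-coding procedure described above can be illustrated with a minimal sketch. It is not the algorithm of any particular software package: it assumes a hydrogen-suppressed graph with element symbols as atom labels, ignores bond order, and uses Python's standard `random.Random` seeded with each pattern string as the pseudorandom generator. The function names and the default fingerprint length, bits per pattern, and path length are hypothetical choices.

```python
import random


def linear_patterns(atoms, bonds, max_bonds=3):
    """Enumerate linear atom paths of up to max_bonds bonds as strings."""
    adj = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    patterns = set()

    def walk(path):
        symbols = [atoms[i] for i in path]
        forward = "-".join(symbols)
        backward = "-".join(reversed(symbols))
        # store one canonical direction so a path and its reverse coincide
        patterns.add(min(forward, backward))
        if len(path) - 1 < max_bonds:
            for v in adj[path[-1]]:
                if v not in path:
                    walk(path + [v])

    for start in range(len(atoms)):
        walk([start])
    return patterns


def fingerprint(atoms, bonds, n_bits=256, bits_per_pattern=4, max_bonds=3):
    """Hash every pattern to a few bit positions; return the bits as an int."""
    fp = 0
    for pattern in linear_patterns(atoms, bonds, max_bonds):
        rng = random.Random(pattern)  # the pattern itself seeds the generator
        for _ in range(bits_per_pattern):
            fp |= 1 << rng.randrange(n_bits)
    return fp


def may_contain(candidate_fp, query_fp):
    """Screen: False means the candidate certainly lacks some query pattern."""
    return (candidate_fp & query_fp) == query_fp


# Heavy-atom skeletons of ethanol (C-C-O) and 1-propanol (C-C-C-O).
ethanol = fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
propanol = fingerprint(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
passes_screen = may_contain(propanol, ethanol)  # True
```

Because every pattern found in the ethanol skeleton also occurs in the 1-propanol skeleton, every bit set for ethanol is also set for 1-propanol, so the screen passes. The converse is not guaranteed: a candidate can pass the screen by coincidence of bits, which is why a screen is followed by an exact comparison.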
A fingerprint space can thus be viewed as a bit-string that is shared among a very large unknown set of molecular features. Since each feature sets its own subset of the bits (usually 4-5), the presence of a feature is related to the chance that at least one of these bits is shared with no other pattern. Obviously, this probability depends upon the total length of the fingerprint, the total number of bits set by each pattern, and the total number of patterns. While structure keys indicate the definitive presence or absence of a particular feature, fingerprints are better at ruling out features (a required bit is absent) than confirming them, since the presence of a pattern can only be determined with some probability [77, 79-81]. Nevertheless, because of their higher density than structure keys and their generality, fingerprints are now quite widely used in cheminformatic applications. Since the introduction of fingerprints, and their wide adoption in database systems such as Daylight, other fingerprints have been developed that are tailored for other applications, such as learning and clustering [77, 81, 82].
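The dependence of this probability on fingerprint length, bits per pattern, and number of patterns can be estimated with a short calculation. The sketch below borrows the standard Bloom-filter approximation, which is an assumption introduced here and not a formula from the cited work: each pattern is taken to set its bits independently and uniformly at random. The function names are hypothetical.

```python
def expected_bit_density(n_bits, bits_per_pattern, n_patterns):
    """Expected fraction of bits set after hashing n_patterns patterns."""
    draws = bits_per_pattern * n_patterns
    return 1.0 - (1.0 - 1.0 / n_bits) ** draws


def false_positive_rate(n_bits, bits_per_pattern, n_patterns):
    """Chance that all bits of an absent pattern happen to be set anyway."""
    density = expected_bit_density(n_bits, bits_per_pattern, n_patterns)
    return density ** bits_per_pattern


# Hypothetical molecule with 100 distinct patterns, 4 bits per pattern.
short_fp = false_positive_rate(n_bits=512, bits_per_pattern=4, n_patterns=100)
long_fp = false_positive_rate(n_bits=2048, bits_per_pattern=4, n_patterns=100)
```

Under these assumptions the longer fingerprint gives a lower chance of a spurious match for the same molecule, at the cost of a sparser bit-string. This is the trade-off between density and discriminating power that the text describes.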
13.1.3.4
Electrotopological States (E-states)
Among the most self-contained and complete molecular descriptions is that of Kier and Hall [83, 84], termed the electrotopological state (E-state). This description combines electronic and topological characteristics of small molecules, making use of the hydrogen-suppressed graph to generate state values for each non-hydrogen atom. To compute E-state values, individual non-hydrogen atoms within the molecular structure first receive intrinsic state values according to the formula:
$$I = \frac{(2/N)^2\,\delta^v + 1}{\delta}$$
where $N$ is the principal quantum number, $\delta$ is the number of connected atoms other than hydrogen, and $\delta^v$ is the number of valence electrons not involved in bonds to hydrogen (Fig. 13.1-5(a)). The intrinsic state aims to encode the accessibility of an atom to intramolecular interaction as well as the collection of bonds over which adjacent atoms may influence its state [83, 84]. Note that this definition provides the same resolution of structural elements as the atom-types used for atom-pair and topological torsion calculation (compare with Fig. 13.1-4(b)). E-states, however, modify the intrinsic state by accounting for all influences between atoms using the formula:
$$S_i = I_i + \sum_{j} \frac{I_i - I_j}{r_{ij}^2}$$
where $r_{ij}$ is the number of atoms in the shortest path containing atoms $i$ and $j$, and the sum is taken over all atoms $j$ in the molecule. The resulting
Fig. 13.1-5 Intrinsic and electrotopological states. (a) Illustration of intrinsic state values; note that these values encode similar information and have equivalent resolution to the atom-type definitions in Fig. 13.1-4(b). (b) Illustration of the electrotopological state (E-state) values of Kier and Hall [83, 84].
E-state values now reflect the influences of neighboring atoms, and thus discriminate atoms with quite similar environments as having at least slightly different E-state values (Fig. 13.1-5(b)). One of the primary benefits of the E-state description of molecules is its generality; the calculations proceed from first principles and can produce, overall, a high-dimensional “state space” into which each molecule is positioned. Indeed, Kier and Hall argue that to “generalize any analysis of molecular description to large collections of arbitrary structures, it is necessary to work in a mathematical framework that accounts adequately for the number and type of descriptors necessary to build a relatively complete description of chemical structure.” This and similar methods allow for an encoding of such structural features as size, branching, unsaturation, cyclicity, heteroatom content, etc., in quantitative terms, and provide a framework for numerous structure-activity applications [55, 56, 85-87].
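The E-state calculation can be followed step by step in a small sketch. It assumes the intrinsic state in the form I = ((2/N)² δᵛ + 1)/δ and the E-state in the form S_i = I_i + Σ_j (I_i − I_j)/r_ij², with r_ij counted in atoms (bonds in the shortest path plus one). The function names and the input convention, one (N, δ, δᵛ) tuple per non-hydrogen atom, are hypothetical. The example is the heavy-atom skeleton of ethanol.

```python
from collections import deque


def intrinsic_state(n_quantum, delta, delta_v):
    """Intrinsic state I = ((2/N)^2 * delta_v + 1) / delta."""
    return ((2.0 / n_quantum) ** 2 * delta_v + 1.0) / delta


def e_states(atoms, bonds):
    """Return (intrinsic states, E-states) for a connected heavy-atom graph.

    atoms: list of (N, delta, delta_v) tuples, one per non-hydrogen atom.
    bonds: list of (i, j) index pairs.
    """
    n = len(atoms)
    adj = {i: [] for i in range(n)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    intrinsic = [intrinsic_state(*atom) for atom in atoms]
    states = []
    for i in range(n):
        # breadth-first search for shortest bond counts from atom i
        dist = {i: 0}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # r_ij is counted in atoms, i.e. bonds + 1
        perturbation = sum(
            (intrinsic[i] - intrinsic[j]) / (dist[j] + 1) ** 2
            for j in range(n) if j != i
        )
        states.append(intrinsic[i] + perturbation)
    return intrinsic, states


# Ethanol, CH3-CH2-OH: (N, delta, delta_v) for each heavy atom.
ethanol_atoms = [(2, 1, 1), (2, 2, 2), (2, 1, 5)]
ethanol_bonds = [(0, 1), (1, 2)]
intrinsic, states = e_states(ethanol_atoms, ethanol_bonds)
# intrinsic == [2.0, 1.5, 6.0]
```

Because every pair-wise term (I_i − I_j) appears once with each sign, the perturbations cancel over the whole molecule and the E-states sum to the same total as the intrinsic states. The calculation redistributes intrinsic state among atoms; here it raises the value for the hydroxyl oxygen and lowers it for the two carbons.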
13.1.3.5
Shape and Field Descriptor Methods
While most of the foregoing methods focus on the rapid encoding of molecular structure, particularly to facilitate large database searches and similarity comparisons, it is still desirable and practical in some circumstances to encode chemical structure using descriptors that explicitly account for molecular shape properties, such as surface area or volume, in some regular fashion. In general, one obstacle to conformation-dependent drug design is the accurate characterization of molecular shape. One of the pioneers of this type of work, Hopfinger made an important distinction between shape and conformation, noting that conformation “is a component of shape in that conformation defines the location of atoms in space. The properties of these atoms, most notably their ‘sizes’, represent an additional set of factors needed” to fully specify molecular shape [88].
Earlier work in this area of shape analysis focused on QSAR studies accounting for conformational features of molecules, such as interatomic distances [89], explicit atomic coordinate sets [90], computed intermolecular distances [91], and simpler shape descriptors such as molecular volume [92]. Each of these descriptor types formally requires conformational analysis, and therefore produces, accordingly, a family of solutions for most structures. Against this backdrop, Hopfinger developed a model of molecular shape on the basis of shape overlap, and used these descriptors to aid in the prediction of activities of a series of dihydrofolate reductase inhibitors. In this study, Hopfinger compares his QSAR example favorably to a similar model from Silipo and Hansch [93], which is based solely on physicochemical and substructural features. In this example, at least two shape descriptors and one physicochemical feature were required to explain the variance in enzymatic inhibition data [88]. Thus, at least in this QSAR example, systematic consideration of three-dimensional molecular geometry was essential to explain drug potency. Hopfinger later developed a general formalism, on the basis of a molecular mechanics pair-wise potential function, to compute molecular potential energy fields [94]. These functions, too, are conformation-specific, requiring additional analysis and multiple solutions per molecule. However, molecular descriptors can be derived from the resulting potential energy fields, which in turn can be used in QSAR studies. In 1988, Cramer et al. introduced comparative molecular field analysis (CoMFA) [95], a descriptor methodology based on the notion that the most relevant calculable properties to small molecule-receptor interaction are shape-dependent properties. Cramer argued that because biological effects are noncovalent, molecular mechanics force fields used to model stereoelectronic effects could account for most such effects.
CoMFA attempts to sample these fields by considering a probe object designed to “feel” these forces from a molecule at each point of a three-dimensional lattice. Each lattice point gives rise to a steric and electronic potential term experienced by the probe object, and thus the size of the resulting descriptor list can depend greatly on the resolution of the probe object. However, because each descriptor has the same energetic unit (e.g., kcal/mol), there is no need to normalize the descriptor set before deriving a QSAR model. In general, CoMFA produces descriptor lists that are considerably larger than the number of compounds under consideration. Accordingly, CoMFA was one of the first QSAR methods to rely on partial least-squares (PLS) analysis [88, 95-98], which seeks to derive linear equations from tables having many more columns than rows. Since the development of CoMFA, a number of modifications and evolutionary advances have afforded methods to improve model performance through variable subset selection. QSAR methods such as those used by Hopfinger and Cramer measure the overall stereoelectronic similarity between pairs of molecules, in general by relating activity data to comparisons of query molecules with a single lead molecule. Good et al. extended this work by attempting such correlations
to data matrices obtained by the complete set of pair-wise comparisons among a collection of molecules [78, 99, 100], which gave excellent correlation for a set of steroids. This work extends the notion of a property overlap parameter, such as that used by Carhart [73] as a measure of similarity; again, the numerator measures property overlap while the denominator normalizes the similarity result (see also Fig. 13.1-4(d)). As originally applied, electron density was used as the structural property for which overlap was measured. In the study by Good, electrostatic potential, electric field, and shape were also used by modifying the original program. These additional parameters were used to derive good QSAR models for several systems. In 1996, Cramer introduced another advance in shape-based molecular description as an extension of CoMFA, introducing “topomers” [101]. Topomers make use of the substructural commonalities among members of congeneric series of molecules to align the structures in a CoMFA field. For this reason, their use is restricted to cases in which all members compared contain a common substructural element, which is reminiscent of the empirical work of Hansch on substituted benzenes. Cramer uses a “topomeric” algorithm to align the variable portion of each molecule, in the process selecting a representative conformation. The steric components of CoMFA are then calculated for each of these variable portions, and the resulting descriptors used to generate clusters of similar molecules. In the case of the original topomer paper, Cramer segregated over 700 commercially available thiols into 231 bioisosteric clusters whose compositions agreed with medicinal chemistry experience and intuition at least as well as clusters derived with previous computational methods.
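The overlap-based similarity measure described above for the pair-wise comparisons of Good et al. can be illustrated with a minimal sketch. It assumes that a property (electron density, electrostatic potential, or similar) has already been sampled for two aligned molecules on a shared grid. It also assumes one particular normalization, the geometric mean of the two self-overlaps; the text states only that the denominator normalizes the result. The function name and the grid values are hypothetical.

```python
import math


def overlap_similarity(prop_a, prop_b):
    """Overlap index for two property fields sampled on the same grid.

    Numerator: property overlap (sum of pointwise products).
    Denominator: geometric mean of the two self-overlaps (an assumed
    normalization), so that identical fields score 1.0.
    """
    if len(prop_a) != len(prop_b):
        raise ValueError("fields must be sampled on the same grid")
    overlap = sum(a * b for a, b in zip(prop_a, prop_b))
    self_a = sum(a * a for a in prop_a)
    self_b = sum(b * b for b in prop_b)
    if self_a == 0.0 or self_b == 0.0:
        raise ValueError("a field with zero self-overlap cannot be normalized")
    return overlap / math.sqrt(self_a * self_b)


# Hypothetical property values at five shared grid points.
field_a = [0.0, 0.2, 0.9, 0.4, 0.1]
field_b = [0.0, 0.4, 1.8, 0.8, 0.2]  # same spatial pattern as field_a, doubled
field_c = [0.9, 0.1, 0.0, 0.0, 0.5]  # property concentrated elsewhere

print(overlap_similarity(field_a, field_b))  # close to 1: same shape
print(overlap_similarity(field_a, field_c))  # small: little overlap
```

With this normalization the index depends on the spatial pattern of the property rather than on its overall magnitude, which is why the scaled copy of a field scores the same as the field itself.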
Cramer’s topomer work is based on the idea that earlier efforts at molecular alignment (including in his earlier CoMFA work) overemphasize the need to find receptor-bound or minimum-energy conformations [101]. The authors offer three explanations for why this might be so. First, they argue that steric interactions are the most important class of noncovalent interactions responsible for receptor engagement. Second, they cite the nonindependence of electronic factors from steric factors, alluding to the possibility of correlations between different descriptors, a complication that is endemic to multidimensional descriptor spaces. Third, they note that adding another geometric field (such as the electronic components of CoMFA) would halve the contribution of steric information to the differences between one molecular shape and another - in this case, many more compounds would be required to recapitulate the observed bioisosteric classes. This last reason is especially thought-provoking - there are infinite possible descriptors, but choosing too many for a particular comparison may obscure the classification one is seeking, particularly if the “extra” descriptors do not encode information germane to that classification. In Cramer’s case, bioisosteric classes were sought that aesthetically agreed with the intuition of medicinal chemists; for this reason, topomer classification
of these thiols was restricted to descriptors resulting from steric field
interactions. A less direct but equally significant feature of the topomer paper is the fact that Cramer et al. explicitly considered (and discussed in detail) several features of the available clustering methods and the consequences of the chosen number of clusters, and justified their choices. Sadly, such rigor is often lacking in molecular descriptor analysis, particularly as commercial descriptor calculation and clustering packages with fewer adjustable parameters (or more “entrenched” default values for these parameters) emerge. Cramer et al. rationalize the use of hierarchical clustering with complete linkage (where intercluster distances are defined in terms of the worst-case scenario, or maximum distance, between any pair of objects, one from each cluster) with the intention of maximizing intracluster similarity at the expense of computational resources. In particular, complete linkage hierarchical clustering produces roughly spherical clusters, whose positions remain essentially stationary as new objects are added, and which merge reluctantly. Practically speaking, such clusters should be relatively robust to the input set of molecules. In one particularly simple and elegant shape-based approach to molecular description, Sauer and Schwarz [102, 103] proposed the use of ratios between principal moments of inertia (Fig. 13.1-6(a)). Here the authors reasoned that the shape envelope of small molecules could be viewed as falling between three limiting cases representing rods, disks, and spheres (Fig. 13.1-6(b)). By using ratios computed using the principal moments of inertia of small molecules, the authors reduced the problem of shape to a two-dimensional mapping onto an isosceles triangle (Fig.
13.1-6(c)). Using this framework, the authors set out to describe differences in chemical space coverage coming from skeletal diversity, as defined by the number of different scaffolds represented by a compound collection, versus appendage diversity, as defined by the inclusion of multiple building blocks on a common scaffold. Most importantly, this method encodes molecular shape independently of molecular size, allowing shape comparisons to be made between molecules spanning large ranges of molecular weight. In general, shape-based descriptor methods can be viewed as the most “realistic” picture of chemical structure, since latent features such as molecular topology and valence remain implicitly encoded, whereas the overall description is capable of encoding additional stereochemical and conformational information. In general, this accuracy bears a certain computational cost, either because detailed modeling must be employed to generate a “good” three-dimensional structure for which to compute descriptors, or because conformational uncertainty warrants calculation of descriptors for a family of conformers. Nonetheless, shape-based molecular description can provide powerful insights into the relationships between topology, stereochemistry, and conformation in determining molecular properties.
Fig. 13.1-6 Shape-envelope analysis based on principal moments of inertia. (a) Illustration of principal moments of inertia. (b) Relationships of principal moments of inertia to the idealized “shape envelope” of small molecules. (c) Two-dimensional map of a chemical space based on principal moments-of-inertia ratios.
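The moment-of-inertia mapping can be made concrete with a short sketch. It assumes the common convention of sorting the principal moments as I1 <= I2 <= I3 and using the two ratios I1/I3 and I2/I3 as map coordinates; the text says only that ratios of principal moments are used, so this choice, the function names, and the idealized test geometries are all illustrative assumptions. Under this convention an ideal rod maps to (0, 1), an ideal disk to (0.5, 0.5), and an ideal sphere to (1, 1), which are the corners of an isosceles triangle.

```python
import math


def principal_moments(coords, masses):
    """Principal moments of inertia about the center of mass, ascending."""
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    ixx = iyy = izz = ixy = ixz = iyz = 0.0
    for m, c in zip(masses, coords):
        x, y, z = (c[k] - com[k] for k in range(3))
        ixx += m * (y * y + z * z)
        iyy += m * (x * x + z * z)
        izz += m * (x * x + y * y)
        ixy -= m * x * y
        ixz -= m * x * z
        iyz -= m * y * z
    # Closed-form eigenvalues of a symmetric 3x3 matrix (trigonometric method).
    p1 = ixy * ixy + ixz * ixz + iyz * iyz
    if p1 == 0.0:  # tensor is already diagonal
        return sorted([ixx, iyy, izz])
    q = (ixx + iyy + izz) / 3.0
    p2 = (ixx - q) ** 2 + (iyy - q) ** 2 + (izz - q) ** 2 + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(ixx - q) / p, ixy / p, ixz / p],
         [ixy / p, (iyy - q) / p, iyz / p],
         [ixz / p, iyz / p, (izz - q) / p]]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    e_max = q + 2.0 * p * math.cos(phi)
    e_min = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e_mid = 3.0 * q - e_max - e_min
    return sorted([e_min, e_mid, e_max])


def shape_ratios(coords, masses):
    """Return (I1/I3, I2/I3), a size-independent pair of shape coordinates."""
    i1, i2, i3 = principal_moments(coords, masses)
    if i3 == 0.0:
        raise ValueError("shape ratios are undefined for a single point")
    return i1 / i3, i2 / i3


# Idealized, hypothetical geometries with equal masses.
rod = [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
disk = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3), 0.0) for k in range(6)]
sphere = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
          (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]

for name, geom in (("rod", rod), ("disk", disk), ("sphere", sphere)):
    print(name, shape_ratios(geom, [12.0] * len(geom)))
```

Because both coordinates are ratios, scaling every coordinate or every mass by a constant leaves them unchanged, which is the sense in which the description is independent of molecular size.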
13.1.4 Applications and Examples: Molecular Descriptor Spaces
As we have seen, molecular descriptors constitute information about steric and electronic constraints conferred by chemical structure [104, 105]. Molecular descriptors underlie both pharmacophore models [106, 107] and analyses of similarity or diversity among compound collections [108, 109]. The calculation of descriptors therefore serves as a starting point in the analyses of small-molecule relationships assessed prior to compound synthesis, before selecting compounds for HTS, and in the interpretation of biological measurements of small-molecule perturbation. As described earlier, QSARs have emerged as a computational paradigm in modern drug design [110-112]. This approach attempts to encode biological activity as a mathematical function using numerical methods to correlate large amounts of screening data for hundreds or thousands of candidate compounds. The data are mapped onto a chemical space consisting of several descriptors, with the hope that this space can reliably estimate the properties of new molecules [44]. A fundamental assumption of QSAR is that variations in the biological activity of a series of chemicals that target a common mechanism of action are correlated with variations in their structural, physical, and chemical properties [32, 113]. Since structural properties of a small molecule can often be determined more efficiently than biological properties, a statistically valid QSAR model is a desirable substitute for the time- and labor-intensive processes of chemical synthesis and biological testing. Obtaining a statistically robust model depends on how well the selected descriptors encode variations in activity within a structure series [32]. Information about molecular mechanism can aid a chemist in selecting among available descriptors, but as we have seen, there are numerous bodies of molecular descriptor theory, and the overall number of available descriptors can easily number in the thousands.
For this reason, modern molecular modeling programs often include statistical tools to help evaluate which descriptors best encode structure-activity variation. About a decade ago, computational chemistry researchers began to address the questions associated with how to validate a descriptor or set of descriptors. Patterson et al. [114] established a framework for considering diversity in the context of both lead discovery and lead optimization. In particular, Patterson’s method relies on the discovery of “neighborhood behavior” between molecules when considering the effects of changes in a measure of molecular diversity and some biological activity. The chief requirement of a “valid” molecular diversity description, argue the authors, is that small differences (distances) in the underlying descriptor space do not often produce large differences in biological response. A second important result of this work was the finding that, in general, higher dimensionality of an underlying descriptor space most often was predictive of good neighborhood behavior, and therefore of “validity” of the descriptor space with respect to arbitrarily chosen biological
outcomes. In this particular study, Patterson et al. used their method to validate a number of individual descriptors and multidimensional descriptor spaces, concluding that CoMFA fields, as well as two-dimensional (2-D) fingerprints of the variable portions of the molecule series (each a molecular description of high dimensionality), most often exhibited neighborhood behavior. Satisfyingly, later work using these concepts at Bristol-Myers Squibb [115] allowed for the prospective choice of molecules for synthesis that were significantly enriched in biological activity against angiotensin II. In these later studies, the topomer shape similarity description was once again shown to be a highly effective predictor of activity, followed by the atom-pair description. For this particular problem, most other descriptions did not exhibit the required “neighborhood” behavior. Consistent with the results of Patterson, which allow large differences in diversity descriptors to produce large variation in biological activity, later work found that the use of “valid” molecular description methods was more important than whether the test compounds used to inform the prospective syntheses were weakly active or strongly active, suggesting that this method should be a general way to approach lead optimization problems. To generalize these conclusions with respect to chemical descriptor spaces, especially notable is the better performance of two-dimensional fingerprints of variable side-chains relative to whole-molecule two-dimensional fingerprints in the original validation study [114], suggesting that the highest dimensional space relating to the variable portions of the molecules is desirable to use as a diversity description. Intuitively, such descriptor spaces correspond to the most information-rich description of the molecules under consideration. Benigni et al.
[116] also compared different molecular description methods, inspired by the study of global versus local properties of a molecular descriptor space. Comparing a series of 148 structure keys, similar to those described earlier, to a heterogeneous set of 37 one-dimensional (e.g., molecular weight), two-dimensional (e.g., Wiener indices and E-states), and three-dimensional (e.g., surface areas) molecular descriptors, Benigni et al. investigated a collection of nearly 300 noncongeneric small molecules at both global and local levels. Among the strengths of this approach was the authors’ clear distinction between effects evident using local methods such as cluster analysis and effects evident using global methods such as principal component analysis (PCA). While cluster analysis techniques provide a detailed description of local structure within a chemical space, such as similarities between cluster members and intercluster distances, factorial techniques, such as PCA, describe the entire dataset in terms of a small number of orthogonal basis vectors. The authors make use of this complementarity to show that the two descriptor spaces are globally similar (isomorphic) as judged by the overall high mutual correlation of their PCA transforms, and the progressive increase in this concordance with increasing numbers of principal components (matched between the two spaces to achieve similar levels of explanation of the overall variance). On the other hand, cluster analysis, using k-means clustering and several choices of
k, revealed that the structure-key description had much lower cluster propensity (departure from a uniform population of the descriptor space) than did the composite space composed of the one-dimensional, two-dimensional, and three-dimensional descriptors. The authors suggest that this result can be explained by the much lower information density of the former space, composed as it is from a series of binary features (presence or absence of predefined structural features; see also Section 13.1.3.3) rather than from a collection of discrete- or continuous-value variables. The generality of these results to additional descriptor spaces will likely require additional experiments involving many more compounds, but the conclusion that global isomorphism between two descriptor spaces does not predict similarity in the fine structure between those spaces is inescapable. The latter result has very important consequences when considering the use of molecular descriptors in different computational chemistry tasks. First, it suggests that any sufficiently information-rich representation of chemical structure, whether composed of a large number of binary variables (such as fingerprints) or composed of a smaller number of discrete- or continuous-valued variables, is suitable for global analysis problems, such as maximizing the overall diversity of a screening collection. On the other hand, it suggests that the choice of descriptor space is quite important for local problems such as lead exploration as envisioned in the neighborhood plots of Patterson, or QSAR studies among members of congeneric series. Rusinko et al. [117] reported an elegant method for feature (chemical subspace) selection among binary descriptors using recursive partitioning. The method requires that some measure of activity be recorded for the compounds, but this activity figure can be qualitative. In this study, the activities were simply 0, 1, 2, 3, representing no activity, weak, moderate, or strong activity.
The authors' method uses sparse-matrix techniques to move quickly through a very large set of descriptors and choose those descriptors most responsible for discriminating active compounds from inactive ones. The descriptors used were atom-pairs, topological torsions, and atom-triples, computed for a group of 1650 monoamine oxidase (MAO) inhibitors. Using the statistical t-test to find individual descriptors that accounted for large differences in mean activities between the two groups, the authors achieved 15-fold enrichment (7/227 versus 72/35631) in inhibitors relative to random selection. However, the false-negative and false-positive rates were both high, since the method picked 220 other molecules that were not MAO inhibitors and failed to find 65 MAO inhibitors in the dataset. The authors provide an excellent discussion of the comparison of this method with other methods, especially including those methods that fail badly when multiple mechanisms of action are simultaneously operant in a dataset. Also using chemical space as a framework, Agrafiotis [118] presented a very fast method for diversity analysis on the basis of simple assumptions, statistical sampling of outcomes, and principles of probability theory. This method presumes that the optimal coverage of a chemical space is that of uniform coverage. The central limit theorem of probability theory
suggests that the distribution of distances between uniformly distributed points becomes normal in the limit of a large number of dimensions. By representing uniform coverage of chemical space in terms of a normal distribution of distances, Agrafiotis was able to use a statistical test for normality, the Kolmogorov-Smirnov (K-S) test, to determine whether a given experimental coverage of chemical space, represented by a collection of compounds under study, is more or less uniform. An important result of this work was that a relatively small sampling of the overall collection of intercompound distances closely approximated the expected distribution if all pair-wise distances were explicitly computed, allowing the method to be used to select subsets of building blocks in a combinatorial synthesis that provided the most uniform coverage of products in the descriptor space of interest. Oprea provided a novel and important advance in descriptor space analysis by introducing the ChemGPS system [119]. The key feature of this work is to attempt to provide a global map of “drug-like” descriptor space by deliberately choosing molecules well outside the drug-like space as “satellites” with extreme values relative to the molecules under consideration. As a method for providing a standard metric for chemical space, ChemGPS is essentially generic; though it focuses on the drug-like space, the principles could be applied broadly and are largely independent of the choice of molecular descriptors used. In later work, Oprea applied a different descriptor set to molecules in an effort to produce a chemical space relevant to absorption, distribution, metabolism, and excretion (ADME)/toxicology studies [120]. In this case, the principal components corresponding to this space, named GPSVS, were shown to be correlated to physically interpretable properties of compounds, namely, solubility and permeability.
This finding is certainly not a general feature of PCA-based methods, since a priori there is little reason to expect a preservation of chemical interpretability in the light of a PCA transformation of data. However, in this case, the combination of the ChemGPS method with a particular descriptor set (VolSurf) chosen for its relevance to ADME properties afforded a solution that provided a map of chemical space subject to practical interpretation, despite its reduced dimensionality. In an effort to compare descriptor distributions between compounds from different sources and synthetic paradigms, Feher and Schmidt [121] used PCA-based methods to compare property distributions from natural products, drugs, and combinatorial libraries. In this case, the authors used chemical space as a common framework to ask questions about how the origins of compounds are manifest in their structural features at a global level. In particular, this study demonstrates the general dominance of synthetic efficiency, rather than structural diversity, in the preparation of compounds by combinatorial chemistry. The descriptors most able to distinguish natural products from those synthetic molecules studied were those that rendered the latter class easier to make, such as fewer
stereocenters, more aromatic rings, fewer complex ring systems, and more
flexible substituents. The authors confront the apparent paradox that the search for synthetic substitutes for natural compounds often proceeds by making exactly the types of changes known to medicinal chemists to result in weaker and less specific activities. Not surprisingly, actual drug molecules occupy a region of chemical space overlapping with both natural products and synthetic molecules, since some drugs come from each of these sources. Here, the authors suggest complementing traditional “drug-like” property filters (i.e., Lipinski’s “rule of 5” [40]) with “natural product-like” property filters in an effort to synthesize molecules sharing more features in common with natural products, in hope of synthetically accessing a potentially underpopulated portion of pharmacologically relevant chemical space. These examples provide a good survey of approaches to problems in cheminformatics that rely on molecular descriptors and the definition of a molecular descriptor space. One take-home message underpinning all of these studies is that in defining chemical similarity and diversity, both the choices of objects (molecules) and attributes (descriptors) are important in determining the outcome. Many of these studies also show how advances in computer hardware and software have been brought to bear to address large-scale problems not explicitly tractable even a generation ago.
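As a small illustration of the “drug-like” property filter mentioned above, the following sketch checks precomputed properties against the four widely quoted thresholds of Lipinski’s rule of 5. It assumes that molecular weight, log P, and hydrogen-bond donor and acceptor counts have been calculated elsewhere. The function names, the default tolerance of one violation (a common convention rather than something stated in the text), and the example property values are hypothetical. No “natural product-like” thresholds are shown, because the text does not give any.

```python
def rule_of_five_violations(mol_weight, log_p, h_donors, h_acceptors):
    """List which of the four rule-of-5 thresholds a compound exceeds."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500")
    if log_p > 5:
        violations.append("log P > 5")
    if h_donors > 5:
        violations.append("H-bond donors > 5")
    if h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def passes_rule_of_five(mol_weight, log_p, h_donors, h_acceptors, max_violations=1):
    """Apply the filter; max_violations is an assumed, adjustable tolerance."""
    found = rule_of_five_violations(mol_weight, log_p, h_donors, h_acceptors)
    return len(found) <= max_violations


# Hypothetical property values: a small synthetic compound and a large,
# natural-product-like macrocycle.
small_compound = dict(mol_weight=180.2, log_p=1.2, h_donors=1, h_acceptors=4)
macrocycle = dict(mol_weight=1202.6, log_p=2.9, h_donors=5, h_acceptors=12)

print(rule_of_five_violations(**small_compound))
print(rule_of_five_violations(**macrocycle))
print(passes_rule_of_five(**macrocycle))
```

A filter of this kind discards the macrocycle on size and acceptor count alone, which illustrates why a purely “drug-like” filter can exclude regions of chemical space occupied by natural products.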
13.1.5 Future Development: Multidimensional Outcome Metrics
In the past, it has been difficult to assemble collections of data on small molecules that afford global comparisons of outcomes over both broad structural classes of molecules and broad coverage of biological motivation, for several reasons. First, many assays are still carried out in a low- or medium-throughput format, and are typically performed on subsets of compounds identified by higher throughput methods [122-125]. Consequently, the scope of chemical structural diversity exposed to these assays is restricted; indeed, such assays are often focused intensely on lead series lacking skeletal diversity. Furthermore, since many such assays are performed in the private sector by pharmaceutical companies, the results from diverse assays are often not cross-referenced between different organizations, producing result-sets that are either disjoint, or whose relationships are difficult to interpret [41]. However, the advent of technologies such as various microarray formats, and the increasing prevalence of HTS and high-content screening in the academic sector, now facilitates the public assessment of diverse compound collections in many different biological contexts, especially including phenotypic assays [126].
Early work in the area of generating multidimensional biological measurements of small molecules was carried out by Kauvar et al., who focused on generating vectors of binding affinities to collections of proteins [127]. Additional multidimensional phenotypic screening has involved chemical-genomic profiling of yeast with different genetic backgrounds for growth sensitivity [128], a study of stereochemical and skeletal diversity among a collection of carbohydrates using chemical-genetic modifier screens [129], and mechanism discovery by profiling small molecules using high-throughput microscopy [130], among others. More recently, similar studies have been extended to models of the proteome [131] and the tyrosine kinome [80]. The most obvious consequence of these types of experimental advances is the need for new computational methods in modeling structure-outcome relationships. Traditionally, QSAR has considered situations where the descriptors used to characterize molecular structure form a chemical space, but the measurement of activity is a scalar quantity, usually an IC50 against a particular target (Fig. 13.1-7(a)). In future, however, profile-based characterization of small molecules, particularly early in drug discovery or in the academic sector, will provide a much richer set of biological characterization - inherently multidimensional - about small-molecule collections. Under many circumstances, the data from multiple parallel or multiplexed biological assays can be rendered formally comparable, allowing activity (or, more broadly, phenotype) to be encoded as a vector of values (Fig. 13.1-7(b)). Thus, modeling the relationships between small-molecule structures and the phenotypes that they cause in biological systems will require new computational approaches beyond the traditional regression techniques of QSAR.
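One way to picture activity as a vector rather than a scalar is sketched below. The sketch assumes that “rendering assays formally comparable” is done by standardizing each assay to zero mean and unit variance across the compound collection; this is only one plausible choice and is not prescribed by the text. The compound names, assay readouts, and function names are hypothetical.

```python
import math
import statistics


def standardize_assays(raw):
    """Convert raw readouts into per-assay z-scores.

    raw maps each compound to a list of readouts, one per assay, in a
    fixed assay order. Each assay (column) is centered and scaled using
    the population standard deviation across compounds.
    """
    names = list(raw)
    n_assays = len(raw[names[0]])
    columns = [[raw[name][j] for name in names] for j in range(n_assays)]
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    profiles = {}
    for name in names:
        profiles[name] = [
            (raw[name][j] - means[j]) / sds[j] if sds[j] > 0 else 0.0
            for j in range(n_assays)
        ]
    return profiles


def profile_distance(u, v):
    """Euclidean distance between two standardized activity profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical readouts on three very different scales:
# percent inhibition, fluorescence counts, relative growth.
raw_readouts = {
    "cmpd_A": [90.0, 1200.0, 0.20],
    "cmpd_B": [85.0, 1100.0, 0.25],
    "cmpd_C": [10.0, 5200.0, 0.95],
    "cmpd_D": [15.0, 4800.0, 0.90],
}

profiles = standardize_assays(raw_readouts)
print(profile_distance(profiles["cmpd_A"], profiles["cmpd_B"]))
print(profile_distance(profiles["cmpd_A"], profiles["cmpd_C"]))
```

Once each compound is represented by such a profile, the same distance and clustering tools used for descriptor spaces can be applied to the space of biological outcomes.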
The more subtle, but potentially more exciting, consequence of such multidimensional data analysis is the superposition of biological annotations onto a collection of measurements, allowing connections between the biological “coordinates” to be made independently of the measurements themselves. As we have seen in this chapter, there are specific relationships between various calculated molecular descriptors, based on the theory of their construction or on their relationships to molecular properties such as size and shape. Similarly, there are implicit encodable relationships between the different assays that comprise any multidimensional fingerprint of assay outcomes, such as combinations of cell states and cellular assays (Fig. 13.1-7(c)). Exploiting such relationships across diverse collections of small molecules indeed may uncover new relationships between the biological states themselves. Even more powerful is the notion of a global set of annotations encompassing any conceivable small-molecule assay design and allowing connections between experiments (on the same or similar compounds) conceived and performed independently in different laboratories worldwide. In their simplest form, such annotations can take the form of literature terms [132], for example,
Fig. 13.1-7 Transition from one-dimensional to multidimensional activity measurements. (a) Traditional quantitative structure-activity relationship (QSAR) considers the relationship between some calculated descriptor space and a single measurement of activity, such as an IC50 for enzyme inhibition. (b) Future work with chemical space will require a more complex mapping to activities that are vector, rather than scalar, quantities, as increasing amounts of multidimensional data become available. (c) Conceptual illustration of complex design and experimental relationships possible among components of multidimensional biological activities (see text).
to connect members of different target classes among a large group of proteins. More complex examples are clearly possible, including visual phenotypes measured via high-content screening [130, 133-135], or the genotypes of cell lines used in cell-based assays [136, 137]. Fully leveraging this type of analysis will require a rich ontology for phenotypes that explicitly links the biological literature to the experimental design of small-molecule assays. It is in this way, requiring full engagement of experimental biologists, that cheminformatics and chemical space can fulfill their full potential in modern chemical biology research.
References
1. E.J. Corey, X.-M. Cheng, The Logic of Chemical Synthesis, John Wiley, New York, 1989.
2. I. Ugi, J. Bauer, K. Bley, A. Dengler, A. Dietz, E. Fontain, B. Gruber, R. Herges, M. Knauer, K. Reitsam, N. Stein, Computer-assisted solution of chemical problems - the historical development and the present state of the art of a new discipline of chemistry, Angew. Chem., Int. Ed. Engl. 1993, 32, 201-227.
3. K. Zuse, Der Computer, Mein Lebenswerk, Springer, Berlin, New York, 1984.
4. J. Lederberg, Topological mapping of organic molecules, Proc. Natl. Acad. Sci. U.S.A. 1965, 53, 134-139.
5. R.K. Lindsay, Applications of Artificial Intelligence for Organic Chemistry: The DENDRAL Project, McGraw-Hill Book, New York, 1980.
6. G.E. Vleduts, Concerning one system of classification and codification of organic reactions, Inf. Storage Retr. 1963, 1, 117.
7. D. Bonchev, D.H. Rouvray, Chemical Graph Theory: Introduction and Fundamentals, Abacus Press, New York, 1991.
8. J. McMurry, Organic Chemistry, Brooks/Cole Publisher, Pacific Grove, 1992.
9. C.A. Russell, The History of Valency, Humanities Press, New York, 1971.
10. A. Cayley, On the mathematical theory of isomers, Philos. Mag. 1874, 47, 444-446.
11. J.J. Sylvester, Chemistry and algebra, Nature 1877, 17, 284.
12. D. Vukicevic, A. Milicevic, S. Nikolic, J. Sedlar, N. Trinajstic, Paths and walks in acyclic structures: plerographs versus kenographs, ARKIVOC 2005, x, 33-44.
13. N. Biggs, E.K. Lloyd, R.J. Wilson, Graph Theory 1736-1936, Clarendon Press, Oxford [England], 1976.
14. I. Gutman, D. Vidovic, L. Popovic, Graph representation of organic molecules: Cayley’s plerograms vs
his kenograms, J. Chem. Soc., Faraday Trans. 1998, 94, 857-860.
15. A. Streitwieser, C.H. Heathcock, E.M. Kosower, Introduction to Organic Chemistry, Macmillan, New York, 1992.
16. K.P.C. Vollhardt, Organic Chemistry, W.H. Freeman, New York, 1987.
17. W.H.T. Davison, M. Gordon, Sorting for chemical groups using Gordon-Kendall-Davison ciphers, Am. Doc. 1957, VIII, 202.
18. M. Gordon, C.E. Kendall, W.H.T. Davison, Chemical Ciphering: A Universal Code as an Aid to Chemical Systematics, Royal Institute of Chemistry of Great Britain and Ireland, London, 1948.
19. G.M. Dyson, E.F. Riley, Mechanical storage and retrieval of organic chemical data, Chem. Eng. News 1961, 74-80.
20. W.H. Waldo, Searching two dimensional structures by computer, J. Chem. Doc. 1962, 2, 1.
21. W.H. Waldo, R.S. Gordon, J.D. Porter, Routine report writing by computer, Am. Doc. 1958, 9, 28.
22. W.J. Wiswesser, The Wiswesser line formula notation, Chem. Eng. News 1952, 3523.
23. W.J. Wiswesser, A Line-Formula Chemical Notation, W. Y. Crowell Co., New York, 1954.
24. H. Bouman, Linearly organized chemical code for use in computer systems (locus), J. Chem. Doc. 1962, 3, 92-96.
25. L. Spialter, The atom connectivity matrix (ACM) and its characteristic polynomial (ACMCP), J. Chem. Doc. 1964, 4, 261-269.
26. L.C. Ray, R.A. Kirsch, Finding chemical records by digital computers, Science 1957, 126, 814-819.
27. A.M.M. Jorgensen, J.T. Pedersen, Structural diversity of small molecule libraries, J. Chem. Inf. Comput. Sci. 2001, 41, 338-345.
28. D.A. Weininger, SMILES, a chemical language and information system 1: Introduction and encoding rules, J.
Chem. Inf. Comput. Sci. 1988, 28, 31-36.
29. D.A. Weininger, J.L. Weininger, SMILES 2: Algorithm for generation of unique SMILES notation, J. Chem. Inf. Comput. Sci. 1989, 29, 97-101.
30. S. Borman, Production of optically active drugs using lipases, Chem. Eng. News 1990, 28, 9-14.
31. R.L. Lipnick, Charles Ernest Overton: narcosis studies and a contribution to general pharmacology, Trends Pharmacol. Sci. 1986, 7, 161-164.
32. R. Perkins, H. Fang, W. Tong, W.J. Welsh, Quantitative structure-activity relationship methods: perspectives on drug discovery and toxicology, Environ. Toxicol. Chem. 2003, 22, 1666-79.
33. L.P. Hammett, The effect of structure upon the reactions of organic compounds. Temperature and solvent influences, J. Chem. Phys. 1936, 4, 613-617.
34. C. Hansch, A. Leo, R.W. Taft, A survey of Hammett substituent constants and resonance and field parameters, Chem. Rev. 1991, 91, 165-195.
35. L.P. Hammett, Physical Organic Chemistry; Reaction Rates, Equilibria, and Mechanisms, McGraw-Hill Book Company, Inc., New York, London, 1940.
36. J. Shorter, The prehistory of the Hammett equation, Chem. Listy 2000, 94, 210-214.
37. C. Hansch, A quantitative approach to biochemical structure-activity relationships, Acc. Chem. Res. 1969, 2, 232-239.
38. T. Fujita, J. Iwasa, C. Hansch, A new substituent constant, pi, derived from partition coefficients, J. Am. Chem. Soc. 1964, 86, 5175-5180.
39. J. Iwasa, T. Fujita, C. Hansch, Substituent Constants For Aliphatic Functions Obtained From Partition Coefficients, J. Med. Chem. 1965, 56, 150-3.
40. C.A. Lipinski, F. Lombardo, B.W. Dominy, P.J. Feeney, Experimental and computational approaches to estimate solubility and permeability
References I 7 5 5
41,
42.
43.
44.
45.
46.
47.
48.
49.
50.
51.
in drug discovery and development settings, Adv. Drug Delivery Rev. 1997,23, 3-25. A.P. Beresford, M. Segall, M.H. Tarbit, In silico prediction ofADME properties: are we making progress? Curr. Opin. Drug Discov. Devel. 2004, 7, 36-42. H. Yu, A. Adedoyin, ADME-Tox in drug discovery: integration of experimental and computational technologies, Drug Discov. Today 2003,8,852-61. C. Hansch, A. Leo, D.H. Hoekman, Exploring Q S A R ,American Chemical Society, Washington, 1995. Y.A. Ban, S. Bereg, N.H. Mustafa, A conjecture on Wiener indices in combinatorial chemistry, Algorithmica 2004, 40,99-117. R. Gozalbes, J.P. Doucet, F. Derouin, Application of topological descriptors in QSAR and drug design: history and new trends, Curr. Drug Targets In&. Disord. 2002, 2, 93-102. I. Gutman, O.E. Polansky, Mathematical Concepts in Organic Chemistry, Springer-Verlag, Berlin, New York, 1986. 0. Ivanciuc, S.L. Taraviras, D. Cabrol-Bass, Quasi-orthogonal basis sets of molecular graph descriptors as a chemical diversity measure, J. Chem. In$ Comput. Sci. 2000,40,126-134. H. Wiener, Structural determination of paraffin boiling points, J . Am. Chem. SOC.1947, 69, 17-20. E. Estrada, E. Uriarte, Recent advances on the role of topological indices in drug discovery research, Curr. Med. Chem. 2001,8, 1573-1588. A.R. Katritzky, V.S. Lobanov, M. Karelson, Normal boiling points for organic compounds: correlation and prediction by a quantitative structure-property relationship, /. Chem. In$ Comput. Sci. 1998,38, 28-41. D.E. Needham, I.C. Wei, P. J. Seybold, Molecular modeling of the physical properties of alkanes, J . Am. Chem. SOC.1988, 110,4186-4149.
52.
53.
54.
55.
56.
57.
58.
59.
60.
61.
62.
63.
M. Randic, G.M. Brissey, R.B. Spencer, C.L. Wilkins, Search for all self-avoiding paths for molecular graphs, Comput. Chem. 1979,3,5-13. M. Randic, G.M. Brissey, R.B. Spencer, C.L. Wilkins, Use of self-avoiding paths for characterization of molecular graphs with multiple bonds, Comput. Chem. 1980,4,27-43. M. Randic, On characterization of molecular branching, /. Am. Chem. SOC.1975, 97,6609-6615. A.K. Debnath, Quantitative structure-activity relationship (QSAR) paradigm - Hansch era to new millennium, Mini Rev. Med. Chem. 2001, 1, 187-195. L.B. Kier, L.H. Hall, W.J. Murray, M. Randic, Molecular connectivity. I: Relationship to nonspecific local anesthesia, 1.P h a m . Sci. 1975, 64, 1971-4. T. Pisanski, D. Plavsic, M. Randic, On numerical characterization of cyclicity,J. Chem. In$ Comput. Sci. 2000,40,520-523. S.C. Basak, S. Bertelsen, G.D. Grunwald, Use of graph theoretic parameters in risk assessment of chemicals, Toxicol. Lett. 1995, 79, 239-50. B.D. Gute, G.D. Grunwald, S.C. Basak, Prediction of the dermal penetration of polycyclic aromatic hydrocarbons (PAHs): a hierarchical QSAR approach, S A R Q S A R Environ. Res. 1999, 10, 1-15. C. Hansch, D. Hoekman, H. Gao, Comparative QSAR: Toward a Deeper Understanding of Chemicobiological Interactions, Chem. Rev. 1996, 96, 1045-1076. A.T. Balaban, Highly discriminating distance-based topological index, Chem. Phys. Lett. 1982, 89, 399-404. A.T. Balaban, D. Mills, S.C. Basak, Correlation between structure and normal boiling points of acyclic carbonyl compounds, /. Chem. In$ Comput. Sci. 1999, 39, 758-764. R.A. Lewis, J.S. Mason, I.M. McLay, Similarity measures for rational set selection and analysis of
756
I
13 Chemical Informatics
64.
65.
66.
67.
68.
69.
70.
71.
72.
combinatorial libraries: the diverse property-derived (DPD) approach, J . Chem. Inf: Comput. Sci. 1997, 37, 599-614. S.L. Dixon, H.O. Villar, Investigation of classification methods for the prediction of activity in diverse chemical libraries, J. Cornput.-Aided Mol. Des. 1999, 13, 533-45. A. Katritzky, E.V. Gordeeva, Traditional topological indices vs electronic, geometrical, and combined molecular descriptors in QSAR/QSPR research, J . Chem. Inf: Comput. Sci. 1993, 33, 835-857. C.E. Shannon, W. Weaver, The Mathematical 7'heot-pof Communication, University of Illinois Press, Urbana, 1998. C. Hansch, B.R. Telzer, L. Zhang, Comparative QSAR in toxicology: examples from teratology and cancer chemotherapy of aniline mustards, Crit. Rev. Toxicol. 1995, 25, 67-89. T.C. Bruice, N.Kharasch, R. J. Winzler, A correlation of thyroxine-like activity and chemical structure, Arch. Biochem. Biophys. 1956, 62,305-17. A. Leo, C. Hansch, D. Elkins, Partition coefficients and their uses, Chem. Rev. 1971, 71,525-616. A.K. Ghose, G.M. Crippen, Atomic physicochemical parameters for three-dimensional structure-directed quantitative structure-activity relationships I. Partition coefficients as a measure of hydrophobicity, J . Comput. Chem. 1986, 7,565-577. A.K. Ghose, A. Pritchett, G.M. Crippen, Atomic physicochemical parameters for three dimensional structure directed quantitative structure-activity relationships I I I: modeling hydrophobic interactions, J. Comput. Chem. 1988, 9,80-90. V.N. Vishwanadhan, A.K. Ghose, G.R. Revankar, R.K. Robins, Atomic physicochemical parameters for three dimensional structure directed quantitative structure-activity relationships: 4.Additional parameters for hydrophobic and dispersive interactions and their
73.
74.
75.
76.
77.
78.
79.
80.
application for an automated superposition of certain naturally occurring nucleoside antibiotics, J. Chem. Inf: Comput. Sci. 1989, 29, 163-172. R.E. Carhart, D.H. Smith, R. Venkataraghavan, Atom pairs as molecular features in structure-activity studies: Definition and applications, J. Chem. In$ Comput. Sci. 1985, 25, 64-73. G. Moreau, P. Broto, The auto-correlation of a topological structure: A new molecular descriptor, Nouv. J. Chim. 1980, 4, 359-360. T.H. Varkony, Y. Shiloach, D.H. Smith, Computer-assisted examination of chemical compounds for structural similarities, J . Chem. rnf: Comput. Sci. 1979, 19, 104-111. R. Nilakantan, N. Bauman, J.S. Dixon, R. Venkataraghavan, Topological torsion: A new molecular descriptor for SAR applications. Comparison with other descriptors, J . Chem. rnf: Comput. Sci. 1987, 27, 82-85. L. Xue, J. Bajorath, Molecular descriptors in chemoinformatics, computational combinatorial chemistry, and virtual screening, Comb. Chem. High Throughput Screen. 2000, 3, 363-72. A.C. Good, I.D. Kuntz, Investigating the extension of painvise distance pharmacophore measures to triplet-based descriptors, 1. Cornput.-Aided Mol. Des. 1995, 9, 373-9. C. Bologa, T.K. Allu, M. Olah, M.A. Kappler, T.I. Oprea, Descriptor collision and confusion: toward the design of descriptors to mask chemical structures, J . Cornput.-Aided Mol. Des. 2005, 19, 625-35. J.S. Melnick, J. lanes, S. Kim, J.Y. Chang, D.G. Sipes, D. Gunderson, L. James, J.T. Matzen, M.E. Garcia, T.L. Hood, R. Beigi, G. Xia, R.A. Harig, H. Asatryan, S.F. Yan, Y. Zhou, X.J. Gu, A. Saadat, V. Zhou, F.J. King, C.M. Shaw, A.I. Su, R. Downs, N.S. Gray, P.G. Schultz,
References I 7 5 7
81.
82.
83.
84.
85.
86.
87.
88.
89.
90.
M. Warmuth, J.S. Caldwell, An efficient rapid system for profiling the cellular activities of molecular libraries, Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 3153-8. Y.C. Martin, J.L. Kofron, L.M. Traphagen, Do structurally similar molecules have similar biological activity? J . Med. Chem. 2002, 45, 4350-8. J . Hert, P. Willett, D.J. Wilton, P. Acklin, K. Azzaoui, E. Jacoby, A. Schuffenhauer, Comparison of fingerprint-based methods for virtual screening using multiple bioactive reference structures, J . Chem. If: Comput. Sci. 2004, 44, 1177-85. L.B. Kier, L.H. Hall, An electrotopological-state index for atoms in molecules, Pharm. Res. 1990, 7,801-7. L.B. Kier, L.H. Hall, Molecular Structure Description: the Electrotopological State, Academic Press, San Diego, 1999. G.E. Kellogg, L.B. Kier, P. Gaillard, L.H. Hall, E-state fields: applications to 3D QSAR,J. Cornput.-Aided Mol. Des. 199G, 10, 513-20. L.B. Kier, L.H. Hall, General definition of valence delta-values for molecular connectivity, J . Pharm. Sci. 1983, 72,1170-3. L.B. Kier, W.J. Murray, L.H. Hall, Molecular connectivity. 4. Relationships to biological activities, J . Med. Chem. 1975, 18, 1272-4. A.J. Hopfinger, A QSAR investigation of dihydrofolate reductase inhibition by Baker triazines based upon molecular shape analysis, J . Am. Chem. SOC. 1980, 102,7196-9206. L.B. Kier, The preferred conformations of ephedrine isomers and the nature of the alpha adrenergic receptor, 1.Pharmacol. Exp. Ther. 1968, 164, 75-81. H.J. Weintraub, A.J. Hopfinger, Conformational analysis of some phenethylamine molecules, J . Theor. Bid. 1973, 41, 53-75.
91.
92.
93.
94.
95.
96.
97.
98.
99.
G.M. Crippen, Distance geometry approach to rationalizing binding data, J . Med. Chem. 1979, 22, 988-97. K. Yamamoto, A quantitative approach to the evaluation of 2-acetamide substituent effects on the hydrolysis by Taka-N-acetyl-betaD-glucosaminidase. Role of the substrate 2-acetamide group in the N-acyl specificity of the enzyme, J . Biochem. (Tokyo) 1974, 76, 385-90. C. Silipo, C. Hansch, Correlation analysis. Its application to the structure-activity relationship of triazines inhibiting dihydrofolate reductase, J . Am. Chem. SOC.1975, 97,6849-61. A.J. Hopfinger, Theory and application of molecular potential energy fields in molecular shape analysis: a quantitative structure--activity relationship study of 2,4-diamino-5-benzylpyrimidines as dihydrofolate reductase inhibitors, J . Med. Chem. 1983, 26, 990-6. R.D. Cramer, D.E. Patterson, J.D. Bunce, Comparative molecular field analysis (CoMFA): 1. Effect of shape on binding of steroids to carrier proteins,]. Am. Chem. SOC.1988, 110, 5959-5967. R.D. Cramer, J.D. Bunce, D.E. Patterson, I.E. Frank, Cross-validation, bootstrapping, and partial least squares compared with multiple linear regression in conventional QSAR studies, Quant. Struct.-Act. Relat. 1988, 7, 18-25. W. Lindberg, J.-A. Persson, S. Wold, Partial least-squares method for spectrofluorimetric analysis of mixtures of humic acid and ligninsulfonate, Anal. Chem. 1983, 55,643-648. S. Wold, A. Ruhe, H. Wold, W.J. Dunn, The collinearity problem in linear regression: The partial least squares (PLS) approach to generalized inverses, S I A M J . Sci. Stat. Comput. 1984, 5, 735-742. A.C. Good, E.E. Hodgkin, W.C. Richards, Utilization of Gaussian functions for the rapid evaluation of
758
I
13 Chemical lnformatics
100.
101.
102.
103.
104.
105.
106.
107.
108.
109.
molecular similarity,]. Chem. InJ Comput. Sci. 1992, 32, 188. A.C. Good, S.J. Peterson, W.G. Richards, QSAR’s from similarity matrices, Technique validation and application in the comparison of different similarity evaluation methods, J . Med. Chem. 1993, 36, 2929-37. R.D. Cramer, R.D. Clark, D.E. Patterson, A.M. Ferguson, Bioisosterism as a molecular diversity descriptor: steric fields of single ”topomeric”conformers,J . Med. Chem. 1996, 39, 3060-9. W.H. Sauer, M.K. Schwarz, Molecular shape diversity of combinatorial libraries: a prerequisite for broad bioactivity,J . Chem. InJ Cornput, sci, 2003, 43, 987-1003. W.H. Sauer, M.K. Schwarz, Size doesn’t matter: Scaffold diversity, shape diversity and biological activity of combinatorial libraries, Chimia 2003,57,276-283. M.G. Bures,Y.C. Martin, Computational methods in molecular diversity and combinatorial chemistry, Curr. Opin. Chem. Biol. 1998, 2, 376-80. P. Willett, Chemoinformatics similarity and diversity in chemical libraries, C u r . Opin. Biotechnol. 2000, 11,85-8. O.F. Guner, History and evolution of the pharmacophore concept in computer-aided drug design, Curr. Top. Med. Chem.2o02, 2, 1321-32. F. Yamashita, M. Hashida, In silico approaches for predicting ADME properties of drugs, Drug Metab. Phamacokinet. 2004, 19, 327-38. M.P. Bradley, An overview of the diversity represented in commercially-availabledatabases, ]. Comput. Aided Mol. Des. 2002, 16, 301-9. J.H. Voigt, B. Bienfait, S. Wang, M.C. Nicklaus, Comparison of the NCI open database with seven large chemical structural databases, ]. Chem. InJ Comput. Sci. 2001, 41, 702- 12.
110. C. Hansch, D. Hoekman, A. Leo, D. Weininger, C.D. Selassie, Chem-bioinformatics: Comparative QSAR at the interface between chemistry and biology, Chem. Rev. 2002, 102,783-812. 111. C. Hansch, A. Kurup, R. Garg, H. Gao, Chem-bioinformatics and QSAR: a review of QSAR lacking positive hydrophobic terms, Chem. Rev. 2001, 101, 619-72. 112. Y.C. Martin, 3D QSAR: current state, scope, and limitations, Perspectives in Drug Discovery and Design 1998, 12-14,3. 113. M.A. Johnson,G.M. Maggiora, American Chemical Society. Meeting C o n c e ~ t and s A ~ ~ l i c a t i oof n sMolecular Similarity, Wiley, New York, 1990. 114. D.E. Patterson, R.D. Cramer, A.M. Ferguson, R.D. Clark, L.E. Weinberger, Neighborhood behavior: a useful concept for validation of “molecular diversity” descriptors, J . Med. Chem. 199639,3049-59. 115. R.D. Cramer, M.A. Poss, M.A. Hermsmeier, T.J. Caulfield, M.C. Kowala, M.T. Valentine, Prospective identification of biologically active structures by topomer shape similarity searching, J . Med. Chem. 1999,42,3919-33. 116. R, Benigni, G , Gallo, F. Giorgi, A. Giuliani, On the equivalence between different descriptions of mo~ecules:Value for computational approaches, J . Chem. InJ Comput. Sci. 1999, 39, 575-578. 117. A. Rusinko 111, M.W. Farmen, C.G. Lambert, P.L. Brown, S . S . Young, Analysis of a large structure/biological activity data set using recursive partitioning, J . Chem. InJ Comput. Sci. 1999, 39, 1017-26. 118. D.K. Agrafiotis, A constant time algorithm for estimating the diversity of large chemical libraries, J . Chem. 1nJ Comput. Sci. 2001, 41, 159-67. 119. T.I. Oprea, J. Gottfries, Chemography: the art of navigating in chemical space, ]. Comb. Chem. 2001,3,157-66. 120. T.I. Oprea, I. Zamora, A.L. Ungell, Pharmacokinetically based mapping
device for chemical space navigation, J . Comb. Chem. 2002,4,258-66. 121. M. Feher, J.M. Schmidt, Property distributions: differences between drugs, natural products, and molecules from combinatorial 5 . Comput. Sci. chemistry, J. Chem. 1 2003,43,218-27. 122.
123.
124.
125.
126.
G.W. Caldwell, Compound optimization in early- and late-phase drug discovery: Acceptable pharmacokinetic properties utilizing combined physicochemical, in vitro and in vivo screens, Curr. Opin. Drug Discov. Devel. 2000, 3, 30-41. C.M. Krejsa, D. Horvath, S.L. Rogalski, J.E. Penzotti, B. Mao, F. Barbosa, J.C. Migeon, Predicting ADME properties and side effects: the BioPrint approach, Curr. Opin. Drug Discov. Devel. 2003, 6, 470-80. T.R. Stouch, J.R. Kenyon, S.R. Johnson, X.Q. Chen, A. Doweyko, Y. Li, In silico ADME/Tox: why models fail, J. Comput. Aided Mol. Des. 2003, 17,83-92. H. van de Waterbeemd, E. Gifford, ADMET in silico modelling: towards prediction paradise? Nat. Rev. Drug Discov. 2003, 2, 192-204. P.A. Clemons, Complex phenotypic assays in high-throughput screening, Curr. Opin. Chem. Biol. 2004, 8, 334-8.
127.
2005,48,6918-25.
D.E. Root, S.P. Flaherty, B.P. Kelley, B.R. Stockwell, Biological mechanism profiling using an annotated compound library, Chem. Biol. 2003, 10, 881-92. 133. Z.E. Perlman, T.J. Mitchison, T.U. Mayer, High-content screening and profiling of drug activity in an automated centrosome-duplication assay, Chembiochem 2005, 6, 145-51. 134. J.C. Yarrow, Y. Feng, Z.E. Perlman, T. Kirchhausen, T.J. Mitchison, Phenotypic screening of small molecule libraries by high throughput cell imaging, Comb. Chem. High Throughput Screen. 2003,
132.
6,279-86. 135.
L.M. Kauvar, D.L. Higgins, H.O. Villar, J.R. Sportsman, A. Engqvist-Goldstein, R. Bukar, K.E. Bauer, H. Dilley, D.M. Rocke, Predicting ligand binding to proteins by affinity fingerprinting, Chem. Biol. 1995, 2, 107-18.
S.J. Haggarty, P.A. Clemons, S.L. Schreiber, Chemical genomic profiling of biological networks using graph theory and combinations of small molecule perturbations, /. Am. Chem. SOC.2003, 125,10543-5. 129. Y.K. Kim, M.A. Arai, T. Arai, J.O. Lamenzo, E.F. Dean 111, N. Patterson,
128.
P.A. Clemons, S.L. Schreiber, Relationship of stereochemical and skeletal diversity of small molecules to cellular measurement space, J . Am. Chem. SOC.2004, 126,14740-5. 130. Z.E. Perlman, M.D. Slack, Y. Feng, T.J. Mitchison, L.F. Wu, S.J. Altschuler, Multidimensional drug profiling by automated microscopy, Science 2004, 306, 1194-8. 131. A.F. Fliri, W.T. Loging, P.F. Thadeio, R.A. Volkmann, Biospectra analysis: model proteome characterizations for linking molecular structure and biological response, J . Med. Chem.
J.C. Yarrow, Z.E. Perlman, N.J. Westwood, T.J. Mitchison, A high-throughput cell migration assay using scratch wound healing, a comparison of image-based readout methods, BMC Biotechnol. 2004, 4, 21.
E.O. Perlstein, D.M. Ruderfer, G. Ramachandran, S.J. Haggarty, L. Kruglyak, S.L. Schreiber, Revealing complex traits with small molecules and naturally recombinant yeast strains, Chem. Biol. 2006, 13, 319-27. 137. S.L. Schreiber, Small molecules: the missing link in the central dogma, Nat. Chyem. Biol. 2005, I, 64-<
136.
Chemical Biology. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
13.2 WOMBAT and WOMBAT-PK Bioactivity Databases for Lead and Drug Discovery
Marius Olah, Ramona Rad, Liliana Ostopovici, Alina Bora, Nicoleta Hadaruga, Dan Hadaruga, Ramona Moldovan, Adriana Fulias, Maria Mracec, and Tudor I. Oprea
Outlook
This chapter highlights the importance of gathering appropriate and accurate information with respect to chemical structures and associated bioactivities, focused on drug discovery. The contents of WOMBAT and WOMBAT-PK are summarized, and examples are given for some of the problems that are encountered when indexing correct biological properties and chemical structures. Two examples for data mining in WOMBAT are given.
13.2.1 Introduction: The WOMBAT Databases
The current paradigm for drug discovery allows a relatively short period, 6-12 months, for the process that modifies an initial active compound, whether from high-throughput screening (HTS) or from publications and patents, into a well-characterized lead molecule. During this time, project team members have relatively little time to familiarize themselves with ‘prior art’, that is, to gather information pertinent to the new biological target, the disease models, and the active chemotypes on the intended or related targets. The task of gathering background information related to chemotypes is made easier if one has access to chemical databases such as Chemical Abstracts via SciFinder [1], Beilstein [2], and Spresi [3], or to medicinal chemistry-related patent databases such as the MDL Drug Data Report, MDDR [4], the World Drug Index, WDI [5], and Current Patents Fast Alert [6]. Collections of biologically active compounds include Comprehensive Medicinal Chemistry, CMC [7] and DiscoveryGate [8], while the PubChem [9] database, part of the Molecular Libraries Initiative (MLI) [10], is more focused on tools for chemical biology. Clinical pharmacokinetics data for marketed drugs are captured in the Physician Desk Reference, PDR [11], while DrugBank [12] also captures compounds in clinical trials. Primary HTS data are captured in PubChem [9], which has author-defined labels for “active” and “inactive” chemical probes. However, most of the other databases listed above do not capture biological endpoints in a simple searchable manner: there are no fields that one can query quantitatively to identify the target-related activity of a particular compound, or what other measured properties it has. Such information is important if one considers that (a) not all chemotypes indexed in patent databases are indeed active (some are merely patent claims with no factual basis); (b) not all chemotypes disclosed as active are equally active, or selective for that matter, on the target(s) of choice; and (c) not all compounds sharing the same therapeutic indication behave in the same manner with respect to, for example, side effects.
Some of these issues were considered at AstraZeneca R&D Mölndal, Sweden, in May 2001, when a data-gathering project centered primarily on the Journal of Medicinal Chemistry (JMC) was initiated in collaboration with scientists at the Romanian Academy Institute of Chemistry in Timisoara, Romania. The major goal of this project was to capture chemical structures and the associated biological activities disclosed in the JMC, with an initial goal of 20 000 entries set for the first year. The first version of this database was available at AstraZeneca R&D Mölndal in May 2002; this version contained 21 700 structures (with duplicates) and 36 738 experimental activities on 324 targets, captured from 837 JMC papers (1996-1999). Because the internal dissemination of this database within AstraZeneca R&D (a company with 11 R&D sites across four continents) was not deemed a success, AstraZeneca decided to discontinue the project as of May 2002. Backed by private funding, the database, renamed World of Molecular BioAcTivity (WOMBAT) in 2003, continued to evolve [13], as discussed for WOMBAT 2006.1 below. Recognizing the paucity of chemical databases that capture clinical pharmacokinetics data in a searchable manner, we further developed WOMBAT-PK (WOMBAT-Pharmacokinetics) to index such data from the literature [14]. This chapter summarizes the contents of WOMBAT and WOMBAT-PK [15], describes some of the problems encountered in appropriately indexing biological activities and correct chemical structures (with a focus on machine-readable contents for data mining), and provides some examples of data mining with WOMBAT.
Other bioactivity databases [16], focused mostly on patent literature, are shown in Table 13.2-1 together with their online references.
13.2.2 WOMBAT 2006.1: Overview
WOMBAT 2006.1 contains 154 236 entries (136 091 unique SMILES; Simplified Molecular Input Line Entry System [17, 18]), covering 6801 series from over 6791 papers with more than 307 700 activities for 1320 unique targets. All biological activities are automatically converted to the -log10 of the molar concentration, regardless of activity type. Numerical values for activity are stored in three fields; the additional two fields capture the experimental error, when reported (see footnote 1). Besides exact numeric values (the vast majority), WOMBAT now captures ‘inactives’ (3639), ‘less than’ (21 926), ‘greater than’ (635), as well as percentage inhibition values (8448 single-dose experiments). The bioactivity distribution by target type is given in Fig. 13.2-1. Four target types are captured in WOMBAT: receptors (which include GPCRs, that is, G-protein coupled receptors, as well as nuclear hormone receptors, integrins, and other receptors, e.g., sigma), enzymes (associated with the Enzyme Commission E.C. number [19]), ion channels, and proteins (biological targets that are not known as receptors, enzymes, or ion channels, e.g., transporters). The vast majority of the biological activities are related to inhibitors and antagonists: ~56% of the activities are IC50 values (and variations), and 37% are Ki values (and variations). Much less frequent are D2 or EC50 values (~3% of the measurements are for agonists or substrates) and binding affinity constants (~1%, Kb and Kd). In WOMBAT 2006.1, enzyme inhibitors populate more of the inactive/low-activity bins, while receptor antagonists populate more of the medium/high-activity bins (see also Fig. 13.2-2). The target profile of biological activities is given in Table 13.2-2, with a focus on some target classes of current interest to the pharmaceutical industry. Table 13.2-2 further indicates the ratio of “actives” in this release of WOMBAT: for some target classes (e.g., phosphatases) there is a relatively small number of “actives”, a trend that is observed in most of the indexed enzymes. On the other hand, receptor classes have a higher ratio of “actives”. The target type distribution by activity in Fig. 13.2-2 reflects approximately 15 years of medicinal chemistry (see also Table 13.2-2). Medicinal chemistry publications currently indexed in WOMBAT are listed in Table 13.2-3.
1) In the absence of reported errors, the 3 activity value fields are equal. The decision to index these values for each molecule was taken because ‘missing values’ are given a different interpretation by statistical techniques.
Table 13.2-1 Examples of annotated databases, modified from [16]
Database | Description | Homepage
AurSCOPE | Databases containing biological and chemical information relating to a class of drug targets or a pharmaceutical topic of interest | http://www.aureus-pharma.com/
BIDD | Bioinformatics databases about drugs, natural products, protein targets, ADME (Absorption, Distribution, Metabolism and Excretion)/Tox, and drug-protein binding | http://bidd.nus.edu.sg/
Bioprint | Ligand profiling data including target-specific activity, pharmacology, and ADME-related properties | http://www.cerep.fr/
Blueprint | Resource for biomolecular data focused on public databases for small molecule/domain interactions | http://www.blueprint.org/
ChemBank | Database about small molecules and resources for studying their effects on biology | http://chembank.broad.harvard.edu/
Drugmatrix | Pharmacological, pathological, and gene expression profiles for benchmark drugs | http://www.iconixpharm.com/
GVK Biosciences databases | Chemical structures, biological activities, toxicity, and pharmacological data for a large number of compounds curated from patents and journals | http://www.gvkbio.com/
KiBank | Database of chemical structures with associated binding affinity (Ki) for given targets | http://kibank.iis.u-tokyo.ac.jp/
Kinase Knowledgebase | Captures published information for therapeutically relevant kinases | http://www.eidogen-sertanty.com/
Jubilant Biosys databases | Chemical structures, bioactivities, therapeutically relevant databases for a large number of compounds curated from journals and patents | http://www.jubilantbiosys.com/products.htm
Ligand Info | Small molecule meta-database which compiles various publicly available small molecule databases | http://ligand.info/
MDL Drug Data Report | Contains biologically relevant compounds, including launched and candidate drugs, and well-defined derivatives | http://www.mdli.com/
PDSP Ki | Public domain resource that provides information related to drugs and their binding properties | http://pdsp.cwru.edu/
PubChem | Provides a high volume of information on the biological activities of small molecules; it links chemical structures to other Entrez databases | http://pubchem.ncbi.nlm.nih.gov/
ZINC | Online resource of commercially available compounds dedicated to virtual screening practitioners | http://blaster.docking.org/zinc/
Fig. 13.2-1 Bioactivity distribution pie charts in WOMBAT 2006.1, classified by target type. The size of the pie chart is proportional to the representation of each target class: enzymes, 42%; ion channels, 7%; proteins, 7%; and receptors, 45%.
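The conversion to -log10 of the molar concentration and the three activity value fields described in the overview can be illustrated with a minimal Python sketch. The function names, the unit labels, and the way a reported error is mapped onto the two additional fields (value plus or minus error) are assumptions made for illustration; the chapter does not specify how WOMBAT implements this step.

```python
import math
from typing import Optional, Tuple

# Conversion factors to mol/L. The unit labels are illustrative assumptions.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_neg_log_molar(value: float, unit: str) -> float:
    """Return -log10 of a concentration after converting it to mol/L."""
    if value <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(value * UNIT_TO_MOLAR[unit])


def activity_fields(
    value: float, unit: str, error: Optional[float] = None
) -> Tuple[float, float, float]:
    """Return (central, lower, upper) activity values in -log10 units.

    Without a reported error all three fields are equal, as stated in the
    footnote. Mapping a reported error to value +/- error is an assumption
    of this sketch, and it requires error < value.
    """
    central = to_neg_log_molar(value, unit)
    if error is None:
        return (central, central, central)
    # A higher concentration corresponds to a lower -log10 value.
    lower = to_neg_log_molar(value + error, unit)
    upper = to_neg_log_molar(value - error, unit)
    return (central, lower, upper)
```

For example, `activity_fields(10, "nM")` returns three equal values close to 8, while passing a hypothetical error of 5 nM yields a lower and an upper bound on either side of the central value.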
Fig. 13.2-2 Target type distribution pie charts in WOMBAT 2006.1, classified by activity value (on the -log10 scale). The size of the pie chart is proportional to the representation of each activity category: inactives, 2%; low activity (0-6), 18%; medium activity (6-8), 41%; and high activity (8-14.4), 40%.
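The activity categories of Fig. 13.2-2, together with the definition of “actives” used in Table 13.2-2 (100 nM or better, which corresponds to 7 on the -log10 scale), can be expressed as a short Python sketch. The function names, the use of None to mark an entry flagged as inactive, and the assignment of the boundary values 6 and 8 to the higher bin are assumptions; the chapter does not state how boundaries are handled.

```python
from typing import Optional


def activity_category(p_activity: Optional[float]) -> str:
    """Classify an activity value given in -log10 molar units.

    None stands for an entry flagged as inactive (an assumption of this
    sketch). Values of exactly 6 or 8 are placed in the higher bin, which
    is also an assumption.
    """
    if p_activity is None:
        return "inactive"
    if p_activity < 6:
        return "low"
    if p_activity < 8:
        return "medium"
    return "high"


def is_active(p_activity: Optional[float], threshold: float = 7.0) -> bool:
    """Return True for an activity of 100 nM or better.

    100 nM is 1e-7 mol/L, so the threshold on the -log10 scale is 7.
    """
    return p_activity is not None and p_activity >= threshold
```

Counting the entries for which `is_active` is true and dividing by the total number of entries would give percentages of the kind reported for “actives” in Table 13.2-2.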
Table 13.2-2 Target class profile for WOMBAT 2006.1 (see footnote *)
Target class | Entries | Percentage | Actives | Percentage
G-protein coupled receptors | 50 778 | 32.92 | 31 111 | 20.17
Integrins | 3127 | 2.03 | 1692 | 1.10
Nuclear hormone receptors | 4335 | 2.81 | 2436 | 1.58
Sigma receptors | 2123 | 1.38 | 1661 | 1.08
Ion channels | 13 500 | 8.75 | 5352 | 3.47
Serine proteases | 7596 | 4.92 | 3166 | 2.05
(Oxido)reductases | 7770 | 5.04 | 2865 | 1.86
Kinases | 9705 | 6.29 | 3241 | 2.10
Phosphatases | 1361 | 0.88 | 81 | 0.05
Oxygenases | 6051 | 3.92 | 1716 | 1.11
Aspartyl proteases | 4904 | 3.18 | 2881 | 1.87
Metalloproteases | 4296 | 2.79 | 1471 | 0.95
Cysteine proteases | 2063 | 1.34 | 771 | 0.50
Transporters | 5462 | 3.54 | 2860 | 1.85
Others | 31 165 | 20.21 | N/A | N/A
*) Entries indicate the number of structures recorded for each target class, whereas “actives” indicate those entries with an activity of 100 nM or better; percentage values relate to the total number of entries.
Table 13.2-3 Medicinal chemistry publications covered in WOMBAT 2006.1
Journal title | Percentage | Publication years
J. Med. Chem. | 77.6 | 1991-2004 [complete], 2005 [partial coverage]
Bioorg. Med. Chem. Lett. | 15.4 | 2002-2003 [complete], 2004 [partial coverage]
Bioorg. Med. Chem. | 5.6 | 2002-2003 [complete], 2004 [partial coverage]
Eur. J. Med. Chem. | 1.0 | 2002-2003 [complete], 2004 [partial coverage]
Fig. 13.2-3 WOMBAT database schemata (simplified). Fields shown in the figure:
ROOT record
SMDLID: entry identifier
SID: series identifier (related to the references database)
Structure: chemical structure (MDL MOL and SMILES formats)
Reference: short bibliographic reference
Keywords: structure keywords (stereo and salt data)
Properties: calculated and experimental properties (LogP/S, Ro5, LigEff, etc.)
Activity sub-records
AID: activity identifier (1, 2, ..., n)
TargetType: target type (receptor, enzyme, ...)
TargetName: target name
ActType: activity type (IC50, Ki, EC50, ...)
ValueType: activity value type (=, <, >, inhib%, inactive)
ActValue: numeric activity value, in -log10 units
Range: confidence range for the activity value
BioKeywords: target and experimental determination information
SwissProtID: SwissProt ID/AN and species
RecClassif: GPCR/NHR family/subfamily classification
Fig. 13.2-4 WOMBAT bioactivity summary panel (example).
Fig. 13.2-5 WOMBAT target and biological information panel (example).
The WOMBAT database schemata, illustrated in Fig. 13.2-3, are further discussed in the next section. Their organization, illustrated in Figs. 13.2-4 to 13.2-6, shows the three panels of the database. The Bioactivity Summary panel (Fig. 13.2-4) provides bioactivity types and values, some basic target information, the minimal reference information, as well as structural and chemical information (2D depiction and SMILES code) and related information (chirality, salt). The Target and Biological Information panel (Fig. 13.2-5) provides detailed target information, including biological information (species, tissue, etc.), detailed target and target class information (including hierarchical classification for G-protein coupled receptors, nuclear hormone receptors, and enzymes), as well as further information regarding the bioassays (radioligand, assay type, etc.). SwissProt [20] reference IDs are stored for most targets (~88%). The Computed Chemical Properties panel (Fig. 13.2-6) includes several calculated and experimental properties for each chemical structure, for example, counts of miscellaneous atom types, Lipinski’s rule-of-five (Ro5) parameters [21] (including the calculated octanol/water partition coefficient), ClogP [22] and Tetko’s calculated water solubility [23], polar surface areas (PSAs) and nonpolar surface areas (NPSAs), and so on. Finally, the Reference Database contains bibliographic information (Fig. 13.2-7), including the Digital Object Identifier (DOI) format [24] with URL links to PDF files for all literature entries, as well as the PubMed ID for each paper.
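As an illustration of how the rule-of-five (Ro5) parameters mentioned for the Computed Chemical Properties panel might be used, the Python sketch below counts Ro5 violations from precomputed property values. The thresholds are those of Lipinski's rule as it is commonly stated (molecular weight above 500, calculated logP above 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The class name, the field names, and the example values are hypothetical and are not taken from WOMBAT.

```python
from dataclasses import dataclass


@dataclass
class Ro5Properties:
    """Precomputed properties for one structure (field names are hypothetical)."""
    mol_weight: float
    clogp: float
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(props: Ro5Properties) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    return sum(
        [
            props.mol_weight > 500,
            props.clogp > 5,
            props.h_bond_donors > 5,
            props.h_bond_acceptors > 10,
        ]
    )


# Made-up property values, used only to exercise the function.
small = Ro5Properties(mol_weight=180.2, clogp=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Ro5Properties(mol_weight=650.0, clogp=6.1, h_bond_donors=2, h_bond_acceptors=8)
```

Whether a given number of violations is tolerated is a project-level decision; the sketch reports the count only and leaves that choice to the user.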
13.2.3 WOMBAT Database Structure
WOMBAT is a dynamic database, which evolves as new data types are included. The database structure is, however, preserved as much as possible from one release to the next. Each root record (or WOMBAT entry) is identified by a unique number (SMDLID),and is defined by the combination of one chemical structure and one or more associated biological activities as entered in one publication (Fig. 13.2-3). One field, series identifier (SID), links all the root records indexed from one reference (article). There are 6801 SID values in WOMBAT 2006.1 (see also Fig. 13.2-7). At the root level, information about the bibliographic reference (unique SID) from which the entry originated the entry is recorded together with various properties (illustrated in Fig. 13.2-6). Separate keywords describe structural characteristics, related to stereochemistry (e.g., absolute, relative, f,R/S, ‘non-chiral’ or racemic) and to the salt
Fig. 13.2-6 WOMBAT computed chemical properties panel (example).
Fig. 13.2-7 WOMBAT references database (example).
(see also Fig. 13.2-3). We record the salt separately to avoid the salt form removal step that is usually performed in cheminformatic studies prior to structure computations. For each SMDLID, we define the following biological activity sub-records: the activity identifier (AID), with values from 1 to n, where n is the number of biological activity determinations for one structure; TargetName (the name of the target on which the activity was measured); ActType (the activity type, e.g., Ki); ValueType, which can be one of five types: exactly (=), lower than (<), greater than (>), percentage inhibition at a given concentration (@I), or inactive; ActValue, the numeric value of the activity, in -log10 of the molar concentration; and Range, the experimental confidence range for the measured activity, also in logarithmic units. For each SMDLID and each AID, we also record a number of BioKeywords related to biological activity information (e.g., bio-species, tissue and cell types, and so on) and target-related information (e.g., the E.C. number [19], what radio-labeled substrate or ligand was used, and so on); see also Fig. 13.2-5. Thus, for one series (same SID value), each activity block (AID range 1, ..., n) has separate TargetName, ActType, ValueType, and BioKeywords.
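The record layout described above can be pictured as a small data model. The following Python sketch is only an illustration under stated assumptions: the class and attribute names (Entry, Activity, smdl_id, and so on) are hypothetical and do not reflect the actual MDL Isis/Base field definitions, and the Range field is omitted for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum
from math import log10
from typing import List, Optional


class ValueType(Enum):
    """The five ValueType categories named in the text (symbols are illustrative)."""
    EXACT = "="
    LOWER_THAN = "<"
    GREATER_THAN = ">"
    PERCENT_INHIBITION = "@"
    INACTIVE = "inactive"


@dataclass
class Activity:
    aid: int                      # activity identifier, 1..n within one entry
    target_name: str              # target on which the activity was measured
    act_type: str                 # activity type, e.g. "Ki" or "IC50"
    value_type: ValueType
    act_value: Optional[float]    # -log10 of the molar concentration
    bio_keywords: List[str] = field(default_factory=list)


@dataclass
class Entry:
    smdl_id: int                  # unique root-record number (SMDLID)
    sid: int                      # series identifier shared by one reference
    smiles: str
    salt: Optional[str] = None    # kept apart from the structure itself
    activities: List[Activity] = field(default_factory=list)

    def add_activity(self, **kwargs) -> Activity:
        """Append an activity, numbering AIDs consecutively from 1."""
        act = Activity(aid=len(self.activities) + 1, **kwargs)
        self.activities.append(act)
        return act


def to_act_value(molar: float) -> float:
    """Convert a molar concentration to the -log10 scale used by ActValue."""
    if molar <= 0:
        raise ValueError("concentration must be positive")
    return -log10(molar)
```

On this scale a 100 nM measurement (1e-7 M) corresponds to an ActValue of about 7, which is the threshold used later in the chapter for high-activity entries.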
13.2.4 WOMBAT Quality Control
Quality control is performed at the moment of data entry, in particular with respect to errors present in publications. Chemical structures are checked for structural consistency by matching the molecular weight (MW) and chemical formula with the ones given in the Experimental section and/or Supporting Information, whenever available, and by comparison to prior publications. Whenever in doubt, we also use other sources, such as the Merck Index [25] and free Internet resources. In the instances where external and literature data cannot be reconciled, SciFinder [1] is also used. The error rate so far in medicinal chemistry publications is not at all negligible: we find an average of approximately two errors per publication in all the 6791 papers indexed in WOMBAT 2006.1. Given the median of 25 compounds per series, this implies an overall error rate of 8%. These errors are distributed as follows [26]: incorrectly drawn or written structures (3%); incorrect molecular formula or MW (3%); unspecified position of attachment of substituents, or ambiguous numbering scheme for the heterocyclic backbone (0.9%); structures with the incorrect backbone (0.7%); incorrect generic names or chemical names (0.2%); duplicates (0.2%); incorrect biological activity (0.3%); incorrect references (0.2%).
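The arithmetic behind the 8% figure can be retraced in a few lines. This is a sketch using only the numbers quoted above; the variable names are illustrative. The listed categories add up to about 8.5%, which is consistent with the rounded overall rate.

```python
# Numbers quoted in the text: about two errors per publication,
# and a median of 25 compounds per series (publication).
errors_per_publication = 2
median_compounds_per_series = 25

overall_rate = errors_per_publication / median_compounds_per_series
print(f"overall error rate: {overall_rate:.0%}")

# Per-category percentages as listed in the text [26].
categories = {
    "incorrectly drawn or written structures": 3.0,
    "incorrect molecular formula or MW": 3.0,
    "unspecified attachment or ambiguous numbering": 0.9,
    "incorrect backbone": 0.7,
    "incorrect generic or chemical names": 0.2,
    "duplicates": 0.2,
    "incorrect biological activity": 0.3,
    "incorrect references": 0.2,
}
total = sum(categories.values())
print(f"sum of listed categories: {total:.1f}%")
```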
Fig. 13.2-8 Human- vs. machine-readable chemical structure representations. Names based on the depicted structures were interpreted using ACDName [27]. The cross up/down wedge error (middle) causes errors in assigning the absolute chirality. Annotations in the figure: ‘Not machine-readable’ and ‘Machine-readable’ depictions; (1R,2S,3S,5S)-8-methyl-3-phenyl-2-propyl-8-azabicyclo[3.2.1]octane; (2R,3R,4S,5R)-2-(6-amino-9H-purin-9-yl)-5-(Rgroup)-tetrahydrofuran-3,4-diol; Error: ‘Stereo bonds are only allowed between chiral and achiral atoms’; cross up/down wedge error; undefined chirality may be interpreted as both R and S.
Special attention is given to stereochemistry, as some compounds are published without proper chirality representation even though the information is available, for example, for natural compounds and their derivatives. Furthermore, as illustrated in Fig. 13.2-8, compounds published in medicinal chemistry literature are often depicted in a “human-readable” format; that is, structures are drawn in a format that chemists can interpret to reconstruct proper chirality. However, this format is not “machine-readable”; that is, cheminformatics software for 3D structural conversion, or for automatically generating IUPAC (International Union of Pure and Applied Chemistry) nomenclature, cannot perceive the stereo centers correctly if the “above/below plane” convention is not strictly enforced. We illustrate
this with ACDName [27] on the structures depicted in Fig. 13.2-8: the software does not perceive two stereo centers for the tropane ring on the left side and returns an error for the sugar structure. The errors are not specific to ACDName; this program is used only to illustrate the problem. Another type of problem in structure conversion is the cross up/down wedge error, when two such bonds emerge from the same chiral center (Fig. 13.2-8): software cannot assign the proper chirality, since by convention three atoms are in the ‘paper plane’, and only one is ‘wedged’ (up or down); two wedged bonds are simply not possible according to the convention. Most of these errors can be corrected by checking previous literature. Sometimes, even the cited reference may turn out to be in error, for example, when the reported MW is not consistent with the drawn, or named, structure.
From a quality control standpoint, the assignment of the SwissProt ID for each target can be a challenge, as publications do not always specify the exact target used in an assay. In some instances, the species from which the target was isolated is not explicitly mentioned, whereas some publications do not mention what target subtype was used. For example, there are 1780 entries in WOMBAT 2006.1 that contain ‘estrogen receptor’ (ER) in the target name, which implies that ERs present in a particular organ (e.g., uterus, breast, brain) were tested for binding, agonism or antagonism. Of these, 1201 entries were annotated for a specific receptor subtype, either ERα or ERβ, or ‘3A1’ and ‘3A2’ according to the nuclear receptor nomenclature [28]. For the remaining 579 entries, a target could not specifically be assigned to a single SwissProt ID. This raises the question of storing multiple SwissProt ID values when a mixture of targets is present. This situation is common for integrin receptors that have the two protein chains separately defined in SwissProt.
In the ER example, 114 of the 579 entries were tested on MCF7 cells; however, it is now clear that a third ER, GPR30 [29], could be present in MCF7 cells [30]. Therefore, the observed anti-estrogenic activities for these 114 entries should be questioned in the light of this new information; should three such receptors be encoded? It further illustrates the dynamic nature of biological targets: as biologists uncover more information about a particular target or class of targets, and as our understanding about each target evolves, the exact nomenclature changes as well. For example, there are 852 entries in WOMBAT 2006.1 that contain ‘VEGFR-2’ as the target name: this target name stands for the vascular endothelial growth factor receptor subtype 2, but was previously known as ‘Flk-1/KDR’, or ‘fetal liver kinase-1’ and ‘kinase insert domain-containing receptor’. The VEGFR-2 name is present in all 852 entries, even though some of the older (before 1999) publications did not refer to this target by the VEGFR-2 name. In an annotated database such as WOMBAT, one has to monitor and update not only changes related to biology but also changes related to chemistry (and chemical
errors), discussed in more detail below. Practical applications based on WOMBAT data mining using targets [31] and descriptors [32] have been described.
13.2.5 Uncovering Errors From Literature
As the demand for integrated chemical and biological information increases, scientists rely more often on annotated databases that capture medicinal chemistry literature (see Tables 13.2-1 and 13.2-3). There is little, if any, error checking downstream from publication time, even though mechanisms for publishing errata have been in place for quite some time. While the responsibility for published data accuracy resides primarily with the author(s), it is also the responsibility of annotated database curators to capture as many of these errors as possible. While ensuring the quality control in WOMBAT, we have found inconsistencies in many of these publications. These errors may have a significant effect on the way we understand the molecular basis of chemical-biological interactions, at least for some particular series used for structure-activity studies. Coats has traced the errors in a known steroid benchmark for quantitative structure-activity relationship (QSAR) studies to the original publications [33]. Some of these errors are discussed below.
Example 1. The following errors were found in Table 1 of Ref. 34, page 126: compounds with molecule names 53 and 56 appear to be duplicated because all their substituents are identical. On the basis of their activities, 56 (compound 15e in [35]) has the meta -OCH3-C6H4 substituent, while 53 (compound 15g in [35]) has the para -OCH3-C6H4 substituent; the -NH- group is missing from the L substituent in compound 27 (compound 9 in [36]), and the -CH2- group is missing from the L substituent in compound 45 (compound 13 in [35]); the R substituent of compound 66 is -C6H2-2-CO2CH3-4,5-(CH3)2 instead of the correct -C6H2-2-CO2CH3-4,5-(OCH3)2 group (compound 51 in [36]); the R substituent of compound 68 is -C6H2-2-CO2-4,5-(CH3)2 instead of the correct -C6H2-2-CO2-4,5-(OCH3)2 group (compound 7 in [35]); compound 44 has a -log(IC50) of 7.67 instead of the correct 7.74 (compound 15d in [35]).
Example 2. In Table 1b of Ref. 37, page 4361, the core structure contains an oxygen atom instead of the correct sulfur atom [38]:
[Structure drawings: the core as printed, with oxygen (wrong), and the core with sulfur (correct).]
Thus, 47 structures (where X is the rest of the molecule) are incorrect in Ref. 37. Since the paper illustrates the capabilities of a particular structure-activity method, the consistent error does not influence the validity of the models; it would, however, greatly influence the use of this series/model in a medicinal chemistry project where the goal would be to improve the binding affinity. Starting from the same initial publication [38], other errors were propagated in [39]: compound 37 has an incorrect double substitution in the para position of the aromatic ring, 2,4-NO2,4-OH, while the correct one is 3-NO2,4-OH; the R substituent of compound B.12 is 2,4,6-Cl2,4-OMe instead of the correct 2,6-Cl2,4-OMe.
Example 3. Errors could also be found in Chemical Abstracts’ SciFinder [1]. All the errors we encountered originate in the primary publications; their appearance in SciFinder illustrates how such errors can propagate (since SciFinder is a very popular resource). For example, the compound RB-380 (CAS# 187454-94-0), published in [40] (original molecule name 24), has a ring size of 14 atoms, instead of the correct 13:
SciFinder structure: C34H42N6O7S2; L-Phenylalaninamide, N-(5-mercapto-1-oxopentyl)-α-methyl-D-tryptophyl-homocysteinyl-L-α-aspartyl-, cyclic (1→2)-disulfide (this name is given in CAS).
Correct structure: C33H40N6O7S2; Cyclo-S,S-[(5-thiopentanoyl)-αMe(R)-Trp-Cys]-Asp-Phe-NH2 (this name is given in the original publication [40], in the experimental section).
The correction we propose is based on the experimental section name and on the following text fragment (p. 648, results section [40]): “. . . by introducing an additional amide bond (compound 16 or RB 370) or a disulfide bridge (24 or RB 380) into the 13-membered ring (Schemes 2 and 3), and by changing the size of the ring (Table 1, compounds 43 and 45).” By analyzing the data from Table 1 of Ref. 40, compound 43 (which is actually 44, another small error) has a 13-atom ring, while compound 45 has a 14-atom ring.
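The formula and MW consistency check mentioned under quality control can be sketched for this example. The helper names below are hypothetical, the parser handles only simple formulas without brackets, and the atomic masses are rounded standard values; this is not the procedure actually used for WOMBAT.

```python
import re
from collections import Counter

# Rounded standard atomic masses for the elements that occur in this example.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as 'C33H40N6O7S2' (no brackets or charges)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def molecular_weight(formula: str) -> float:
    """Sum the atomic masses over all atoms in the formula."""
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())


def formula_difference(a: str, b: str) -> dict:
    """Element-wise atom count of a minus b, keeping only non-zero differences."""
    ca, cb = parse_formula(a), parse_formula(b)
    return {el: ca[el] - cb[el] for el in sorted(set(ca) | set(cb)) if ca[el] != cb[el]}


scifinder = "C34H42N6O7S2"   # formula of the SciFinder structure
published = "C33H40N6O7S2"   # formula consistent with the original publication

print(formula_difference(scifinder, published))   # {'C': 1, 'H': 2}
print(round(molecular_weight(scifinder) - molecular_weight(published), 2))  # 14.03
```

The difference is a single CH2 unit, which is what distinguishes homocysteine from cysteine and accounts for the 14-membered versus 13-membered ring.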
Example 4. Stereochemical ambiguities and structural errors can be encountered in the Merck Index [25] as well, as shown in these two examples:
Compound identifiers, with the Merck Index structure and error description (structure drawings, including the correct structures, are not reproduced):
M630, anagyrine (CAS# 486-89-5): chiral center inversion and cross up/down wedge.
M1854, carisoprodol (CAS# 78-44-4): completely different structure. All other information about M1854 is correct (name, formula and molecular weight). The formula is correct in the ninth edition of the Merck Index.
The examples from SciFinder and the Merck Index are not intended to question the quality of these products, which we consider to be outstanding. They are invaluable resources to many chemists worldwide, and the error rate in these two databases is insignificant if one takes into account the enormous volume of indexed data. We have published a structure-activity paper on HIV-protease inhibitors [41] in which a modified peptide was present in both the training set and the test set. Al Leo of Pomona College has recently [42] detected 100 chemical and name errors in the printed version of the sixth edition of Burger’s Medicinal Chemistry [43], errors that are to be corrected in the on-line edition [44]. One can never be too careful in verifying the available information, in particular if one is to invest a significant amount of resources in that area.
13.2.6 WOMBAT-PK: Clinical Pharmacokinetics (PK) and Toxicological (Tox) Data
As PK data has become more important during lead discovery and evaluation, we screened the clinical pharmacokinetics literature and developed a chemical database that captures such data in numerical searchable format (WOMBAT-PK). Its organization is illustrated in Figs. 13.2-9 to 13.2-11, which show three of the four panels of the database. The Compound Description panel (Fig. 13.2-9) provides the drug marketed names, some physico-chemical characteristics, as well as structural, chemical (2D depiction and SMILES code), and related information (chirality, salt). The Pharmacokinetic Data panel (Fig. 13.2-10) provides the drug target information, and multiple PK and Tox parameters, indexed in both numerical and text form. The third panel, Potential Side Effects, captures data for BBB (blood-brain barrier) permeability, cardiac toxicity data, possibly related to hERG (human ether-a-go-go potassium channel 1) bioactivity, in vitro bioactivities from WOMBAT, as well as mammalian tox data (e.g., the lethal dose 50%, LD50). The fourth panel, Computed Chemical Properties, is identical to the one in WOMBAT
Fig. 13.2-9 WOMBAT-PK compound description panel (example).
Fig. 13.2-10 WOMBAT-PK pharmacokinetic data panel (example)
(see Fig. 13.2-6). The 2006 release of WOMBAT-PK contains 900 marketed drugs (in rare cases, some are metabolites) with documented PK and Tox properties. Currently indexed PK, Tox, and physico-chemical properties data are summarized in Table 13.2-4. The top nine properties were captured from the following sources: Goodman & Gilman’s ninth edition [45] (G&G), Avery’s fourth edition [46] (Av), and the Physician Desk Reference [11] (PDR). FDA’s Center for Drug Evaluation and Research website [47] was consulted for FDA-approved drug labels. Other resources (e.g., Google) were sometimes used to compile the WOMBAT-PK database. The maximum recommended therapeutic dose [48] (MRTD) is available from the FDA [49], whereas MRTD-U (MRTD corrected for the fraction unbound) was determined by using the percentage plasma protein binding (%PPB) data already indexed in WOMBAT-PK. Thus, MRTD-U = MRTD × (1 - %PPB), with %PPB taken as a fraction, and is available for 498 drugs. Experimental LogD7.4 and LogP values from compilation tables [50] and from the Sangster database [51], and pKa values from Avery [46] and the Merck Index [25] were collected for these drugs. In WOMBAT-PK, drug targets are assigned to 753 drugs (of these, 97% have SwissProt IDs), whereas the phase I metabolizing enzymes (all with SwissProt IDs) are recorded for
Fig. 13.2-11 WOMBAT-PK potential side effects panel (example)
419 entries. Regarding cardiac toxicity, there are 218 drugs indexed for QT prolongation (a clinical observation based on the ECG, the electrocardiogram), 89 for Torsade de Pointes risk (another ECG signal), and 71 with hERG binding data. Curating clinical PK data requires individual examination [52], and sources such as Goodman & Gilman’s are often considered more reliable. Often, such experimental values are “greater than” or “less than” a given cutoff value. A systematic round-off procedure was implemented, whereby “< 5” was attributed a higher value (=2.5), compared to “< 1” (=0.5). Numeric values also differ, sometimes significantly, due to various factors (e.g., multiple dose vs. single dose, children vs. healthy volunteers); thus, conflicting values were sometimes reported. The “on file” values in Table 13.2-4 are often averages between G&G and Avery data, although ~30% of the indexed values differ by more than 20% between these two sources (data not shown). To identify trends, we attenuated the effect of such discrepancies by implementing an incremental increase procedure to some of the PK properties, as illustrated in Table 13.2-5. Incremental rank values were selected from experience whenever possible: for example, experimental errors related to percentage oral bioavailability occur mostly for values between 20 and 80%; 6/7 and 12/7 represent the 1/2 and full value
Table 13.2-4 Experimental PK and Tox data captured in WOMBAT-PK 2006.1

Property                                        G&G, Avery   On file
%Oral bioavailability                           277          740
%Urinary excretion                              N/A          339
%Plasma protein binding                         434          776
Clearance, Cl (mL min-1 kg-1)                   422          514
Nonrenal clearance (fractional)                 442          442
Volume of distribution, VDss (L kg-1)           453          552
Half-life, T1/2 (hr)                            576          839
Terminal half-life, TT1/2 (hrs)                 580          581
Effective concentration (mM L-1)                N/A          119
MRTD (µmole kg-1-bw/day)                        N/A          575
MRTD-U (µmole kg-1-bw/day; fu corrected)        N/A          498
LogD7.4 (measured)                              N/A          513
LogPoct (measured)                              N/A          472
pKa1                                            274          350
pKa2                                            75           99
In vitro Binding Data (from WOMBAT)             N/A          453
of creatinine clearance (120 mL min-1 per 70 kg), respectively; 3, 5.5, and 12 are typical 70-kg man volumes in liters for plasma, blood, and extracellular fluids [14].

Table 13.2-5 Parent value ranking for certain PK parameters in WOMBAT-PK 2006.1

% Oral          Rank 3 oral   Rank 5 oral
0-5             0             0
5.1-19.99       0             1
20.0-79.99      1             2
80.0-95         2             3
>95.1           2             4

%PPB            Rank PPB
0-5             0
5.01-20         1
20.01-80        2
80.01-95        3
95.01-99        4
>99.1           5

Cl (mL min-1 kg-1)     Rank Cl
0-(6/7)                0
(6.01/7)-(12/7)        1
(12.01/7)-5            2
5.01-10                3
10.01-15.5             4
>15.5                  5

% Urine         Rank urine
0-1             0
1.01-5          1
5.01-20         2
20.01-50        3
50.01-80        4
>80             5

VD (L kg-1)     Rank VD
0-1             0
1.01-3          1
3.01-5.5        2
5.51-12         3
>12             4

WOMBAT-PK also captures information about the known (or intended) drug target(s). These are often retrieved from the therapeutic classification data (e.g., anti-histaminic compounds are intended to act as antagonists of the H1 histamine receptor), or can be inferred by searching medicinal chemistry literature (see also Fig. 13.2-10). Of interest is the cross-index of the WOMBAT and WOMBAT-PK databases, which shows in vitro binding information for certain drugs in medicinal chemistry literature. For example, aspirin has a relatively weak binding affinity to cyclooxygenases COX-1 and COX-2 (but acts as suicide inhibitor); at the same time, it appears to be 2 to 3 orders of magnitude more potent on GP IIb/IIIa, an α2bβ3a integrin involved in platelet aggregation. This probably explains why aspirin is effective at the 75-80 mg/day dose range as an antiaggregant, compared to the 500-1000 mg/day dose range for the anti-inflammatory effects [53].
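Three of the numerical conventions described in this section (the fraction-unbound correction of the MRTD, the round-off of “less than” values, and the rank binning of Table 13.2-5) can be sketched as follows. Function and dictionary names are hypothetical. Two assumptions go beyond the text: halving the cutoff is a generalization from the two examples given (“< 5” to 2.5 and “< 1” to 0.5), and values falling in the small gaps between printed bins (e.g., %PPB between 99 and 99.1) are assigned to the higher rank. The % Oral ranks are left out because their bin edges follow a different convention.

```python
from bisect import bisect_left


def mrtd_unbound(mrtd: float, ppb_percent: float) -> float:
    """MRTD-U = MRTD x (1 - %PPB), with %PPB converted from percent to a fraction."""
    if not 0.0 <= ppb_percent <= 100.0:
        raise ValueError("%PPB must lie between 0 and 100")
    return mrtd * (1.0 - ppb_percent / 100.0)


def round_off_censored(text: str) -> float:
    """Map a 'less than' entry such as '<5' to half of its cutoff (assumed rule)."""
    text = text.strip()
    if text.startswith("<"):
        return float(text[1:]) / 2.0
    return float(text)


# Inclusive upper bounds of each bin in Table 13.2-5; the last bin is open-ended.
RANK_BOUNDS = {
    "ppb": [5, 20, 80, 95, 99],           # %PPB, ranks 0-5
    "cl": [6 / 7, 12 / 7, 5, 10, 15.5],   # clearance, mL/min/kg, ranks 0-5
    "urine": [1, 5, 20, 50, 80],          # % urinary excretion, ranks 0-5
    "vd": [1, 3, 5.5, 12],                # volume of distribution, L/kg, ranks 0-4
}


def rank(parameter: str, value: float) -> int:
    """Return the rank whose bin contains the value (upper bounds inclusive)."""
    return bisect_left(RANK_BOUNDS[parameter], value)


# Creatinine clearance of 120 mL/min in a 70-kg man, per kg of body weight.
creatinine_clearance = 120 / 70
print(round(creatinine_clearance, 2), rank("cl", creatinine_clearance))
```

The clearance bounds 6/7 and 12/7 are simply one half and the full value of that creatinine clearance, so a drug cleared at exactly the creatinine rate still falls in rank 1.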
13.2.7 Datamining With WOMBAT
Example 1. One of the major areas of interest in medicinal chemistry is oncology. The cancer medicinal chemistry space was described earlier by mining the WOMBAT and WOMBAT-PK databases [54]. The oncology subset of WOMBAT 2006.1 contains 917 unique active targets, detailed in Table 13.2-6. A query for targets that have over 300 entries allows us to establish an activity histogram, contrasting low-activity entries (Fig. 13.2-12a) with high-activity entries (Fig. 13.2-12b). This allows the user to rapidly identify targets for which the number of low-activity entries significantly exceeds the number of high-activity entries, such as GGTase, PKA, and Tubulin (see the Fig. 13.2-12 legend for target names). In fact, there are only seven entries of Tubulin inhibitors with activity better than 100 nM, and only two of them are Ro5 [21] compliant (data not shown). One can conclude that such targets are areas of opportunity for the design of novel inhibitors. By the same token, AR, ERβ, and MMP-13 are targets where the number of high-activity entries greatly exceeds the number of low-activity records. These targets probably already have abundant high-quality ligands, indicating that perhaps selectivity or pharmacokinetic profiling are currently the key areas for further optimization.

Table 13.2-6 Distribution of target types among oncology targets in WOMBAT 2006.1

Target type     Count   Percentage
Enzyme          759     82.77
Ion channels    4       0.44
Protein         56      6.11
Receptor        98      10.69

Example 2. The concept of leadlikeness [32, 55, 56] and its application in developing leadlike libraries [55, 57, 58] have been extensively discussed. The reduction of the leadlike concept into practice at Astex [59] resulted in a proposal for fragment libraries in lead discovery called the ‘Rule of Three’:
Fig. 13.2-12 Activity histogram for the most-populated oncology-related targets in WOMBAT 2006.1. There were at least 300 records per target. The top panel (a) shows low-activity compounds (10 µM or less), whereas the bottom panel (b) shows high-activity compounds (100 nM or better). The bars are color-coded according to Ro5 violations (see also legend). Numbers on top of each bar indicate the number of compounds with low (a) and high (b) entries per target. Target names are as follows: AR - androgen (or dihydrotestosterone) receptor, CDK2/cyclin A - cell division protein kinase 2, DHFR - dihydrofolate reductase, EGFR - epidermal growth factor receptor, ER, ERα, and ERβ - estrogen receptor, ER alpha and beta subtypes, respectively, FTase - protein farnesyltransferase, GGTase - protein geranyl-geranyltransferase, Lck - proto-oncogene tyrosine-protein kinase, MAPK p38 - cytokine suppressive anti-inflammatory drug binding protein, or mitogen-activated protein kinase p38 α, MMP-1 through MMP-9 - matrix metalloproteases 1 through 9, respectively, PDGFR - platelet derived growth factor receptor, PKA - cAMP-dependent protein kinase A, PKC-α - protein kinase C, alpha type, VEGFR-2 - vascular endothelial growth factor receptor 2, or kinase insert domain receptor, c-Src - proto-oncogene tyrosine-protein kinase SRC, and Tubulin.
MW < 300, ClogP < 3, number of hydrogen bond donors and acceptors ≤ 3, flexible bonds ≤ 3, and PSA ≤ 60 Å2. Using these criteria, WOMBAT 2006.1 returns 6607 entries. Of these, 2001 entries contain at least one biological activity better than, or equal to, 100 nM, and 543 of these contain a generic name. This usually means that they are either launched drugs, or natural products, or otherwise in an advanced stage of development. The examples given in Fig. 13.2-13 illustrate the chemotype, target, and activity diversities that can be found in rule-of-three compliant molecules: neurotransmitter and nuclear hormone receptor agonists (EC50) and antagonists (Ki, IC50, and A2), neurotransmitter transporters, as well as enzyme inhibitors are present, most of them with multiple activities. On the basis of the WOMBAT 2006.1 entries, it appears that there are a number of interesting chemotypes that are rule-of-three compliant. Such cheminformatics-based mining can identify target-specific small molecules for fragment library design [63].
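Both mining examples reduce to simple filters over precomputed properties and activity values. The sketch below is illustrative only: the Record class and its field names are hypothetical, activities are assumed to be stored as -log10 of the molar value (so 100 nM corresponds to 7), and the low-activity cutoff is taken as 10 µM (a value of 5) as an assumed reading of the Fig. 13.2-12 legend.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Record:
    target: str
    p_activity: float      # -log10 of the molar activity value
    mw: float
    clogp: float
    hbd: int               # hydrogen bond donors
    hba: int               # hydrogen bond acceptors
    flexible_bonds: int
    psa: float             # polar surface area, square angstroms


HIGH_ACTIVITY = 7.0        # 100 nM or better
LOW_ACTIVITY = 5.0         # 10 uM or weaker (assumed reading of the legend)


def is_rule_of_three(r: Record) -> bool:
    """Apply the 'Rule of Three' criteria exactly as listed in the text."""
    return (r.mw < 300 and r.clogp < 3 and r.hbd <= 3 and r.hba <= 3
            and r.flexible_bonds <= 3 and r.psa <= 60)


def activity_histogram(records: Iterable[Record]) -> Tuple[Counter, Counter]:
    """Count low-activity and high-activity records per target."""
    low, high = Counter(), Counter()
    for r in records:
        if r.p_activity >= HIGH_ACTIVITY:
            high[r.target] += 1
        elif r.p_activity <= LOW_ACTIVITY:
            low[r.target] += 1
    return low, high


def opportunity_targets(records: Iterable[Record]) -> list:
    """Targets with more low-activity than high-activity records."""
    low, high = activity_histogram(list(records))
    return sorted(t for t in low if low[t] > high[t])
```

A usage example would pass the records returned by a query for well-populated targets to `activity_histogram`, and separately keep only those records for which `is_rule_of_three` is true and `p_activity` is at least 7.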
13.2.8 Conclusions and Future Challenges
As annotated databases, WOMBAT and WOMBAT-PK continue to evolve in time, not only with the addition of more entries but also with updates and restructuring of the biological, clinical, and chemical information, which is subject to revision even after the data are captured and indexed. The inclusion of the precomputed properties panel allows the users to quickly identify rule-of-five or rule-of-three compliant datasets, or to constrain the query with respect to, for example, flexible bonds, PSA, computed solubility or LogP, and so on. WOMBAT and WOMBAT-PK are currently available in the MDL Isis/Base format. WOMBAT is also integrated in CABINET (Chemical And Biological
Fig. 13.2-13 Examples of rule-of-three compliant molecules that have biological activity better than 10 nM. Under each molecule, the following information is included: molecule name, MW, ClogP, the biological activity type, value, and target. Target names are as follows: D3 and D4 - dopaminergic receptor types 3 and 4, AChE and BChE - acetyl- and butyryl-choline esterases, PRA and PRB - progesterone receptor types A and B, H1 and H3 - histamine receptor types 1 and 3, 5-HT2A, 5-HT2B, 5-HT2C, 5-HT3, 5-HT4 - serotonin receptor subtypes 2A, 2B, 2C, and types 3 and 4, DAT, NET, 5-HTT - dopamine, norepinephrine, and serotonin transporter proteins, μ1, μ2, δ, κ1, κ3 - opioid receptor types mu-1, mu-2, delta, kappa-1, and kappa-3, 5α-R1 and 5α-R2 - 5-alpha-reductase isozymes 1 and 2, Flt-1 - fms-like tyrosine kinase receptor.
Molecules shown (MW, ClogP): Quinpirole (219.33, 2.02); Physostigmine (275.35, 1.95); Norethindrone (298.43, 2.78); RTI-110 (279.77, 3.12); Ondansetron (293.37, 2.71); Morphine (285.35, 0.57); 5-OMe-α-Me-Tryptamine (204.27, 1.75); LY-191704 (249.74, 2.82); SU-5416 (238.29, 2.83).
Informatics NETwork) [61, 62] as a server. CABINET [62], a federation of high-performance scientific databases that collaborate through web-like interfaces to provide integrated access to diverse chemical and biological information, is described elsewhere [61]. Federated database servers such as CABINET could, for example, bring together WOMBAT and C-QSAR [63], but the challenge goes beyond technical issues related to field correspondence. Data normalization (e.g., ensuring similar treatment regarding chirality, salt information, measured and computed properties) is likely to require on-the-fly data interpreters, which in turn requires that all data entries in WOMBAT and other databases be unambiguous. Data transparency is not always possible: for example, most WOMBAT entries related to epidermal growth factor receptor (EGFR) are classified as ‘TargetType = enzyme’, because EGFR is a membrane receptor-linked tyrosine-protein kinase and medicinal chemists target EGFR for kinase inhibition. However, in one instance, ‘TargetName’ was assigned as ‘receptor’ because the endogenous ligand, EGF, was used to test for EGFR antagonism [64]. Thus, restricting data fields to certain value types, usually an asset for database indexing, can become a hindrance when the unexpected occurs. One of the challenges in database federation remains adaptive data normalization for biology-related data fields, since biological phenomena are not always amenable to unambiguous mapping. By successfully addressing these problems, it is quite likely that integrated data mining tools will change the way we conduct everyday research.
Acknowledgments
The authors thank Prof. Hugo Kubinyi (Heidelberg, Germany) for suggestions.
References
1. Chemical Abstracts online and its search module, SciFinder, are available from the American Chemical Society, http://www.cas.org/SCIFINDER/, 2006.
2. The Beilstein Information Systems is available from http://www.beilstein.com/, 2006.
3. The Spresi Database is available from InfoChem GmbH, München, http://www.spresiweb.de/; and from Daylight Chemical Information Systems, http://www.daylight.com/products/databases/Spresi.html, 2006.
4. MDDR is available from MDL Information Systems, http://www.mdli.com/products/finders/database_finder/ and from Prous Science Publishers, http://www.prous.com/index.html, 2006.
5. WDI, the Derwent World Drug Index, is available from Derwent Publications Ltd., http://thomsonderwent.com/products/lr/wdi/; and from Daylight Chemical Information Systems, http://www.daylight.com/products/databases/WDI.html, 2006.
6. The Current Patents Fast Alert database is available from Current Patents Ltd., London,
http://www.current-patents.com/, 2006.
7. The Comprehensive Medicinal Chemistry database is available from MDL Information Systems, Inc., http://www.mdli.com/products/knowledge/medicinal_chem/index.jsp, 2006.
8. DiscoveryGate is available from MDL Information Systems, Inc., http://www.mdli.com/products/knowledge/discoverygate/; a subset of DiscoveryGate is available through the PubChem system, see http://www.mdli.com/company/news/press-releases/2006/pr-pubchem21mar06.jsp, 2006.
9. The PubChem database is available online at the National Center for Biotechnology Information, http://pubchem.ncbi.nlm.nih.gov/, 2006.
10. C.P. Austin, L.S. Brady, T.R. Insel, F.S. Collins, NIH molecular libraries initiative, Science 2004, 306, 1138-1139.
11. The Physician Desk Reference is produced by 2003, ISBN 1-56363-472-4, and is available online at http://www.pdr.net/, 2006.
12. The DrugBank database is available at http://redpoll.pharmacy.ualberta.ca/drugbank/, 2006.
13. M. Olah, M. Mracec, L. Ostopovici, R. Rad, A. Bora, N. Hadaruga, I. Olah, M. Banda, Z. Simon, M. Mracec, T.I. Oprea, WOMBAT: World of molecular bioactivity, in Cheminformatics in Drug Discovery, (Ed.: T.I. Oprea), Wiley-VCH, New York, 2005, 223-239.
14. T.I. Oprea, P. Benedetti, G. Berellini, M. Olah, K. Fejgin, S. Boyer, Rapid ADME filters for lead discovery, in Molecular Interaction Fields, (Ed.: G. Cruciani), Wiley-VCH, New York, 2006, 249-272.
15. WOMBAT and WOMBAT-PK are available from Sunset Molecular Discovery, Santa Fe, New Mexico, http://www.sunsetmolecular.com, 2006.
16. M. Olah, T.I. Oprea, Bioactivity databases, in Comprehensive Medicinal Chemistry II, (Eds.: J. Taylor, D. Triggle), Elsevier, New York, 2006.
17. D. Weininger, SMILES 1. Introduction and encoding rules, J. Chem. Inf. Comput. Sci. 1988, 28, 31-36.
18. D. Weininger, A. Weininger, J.L. Weininger, SMILES 2. Algorithm for generation of unique SMILES notation, J. Chem. Inf. Comput. Sci. 1989, 29, 97-101.
19. The Nomenclature is recommended by the International Union of Biochemistry and Molecular Biology, and is available at http://www.chem.qmul.ac.uk/iubmb/enzyme/, 2006.
20. Swiss-Prot Protein knowledgebase database, http://kr.expasy.org/sprot/, 2006.
21. C.A. Lipinski, F. Lombardo, B.W. Dominy, P.J. Feeney, Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, Adv. Drug Delivery Rev. 1997, 23, 3-25.
22. A. Leo, Estimating LogPoct from structures, Chem. Rev. 1993, 5, 1281-1306.
23. I.V. Tetko, V.Y. Tanchuk, Application of associative neural networks for prediction of lipophilicity in ALOGPS 2.1 program, J. Chem. Inf. Comput. Sci. 2002, 42, 1136-1145, http://146.107.217.178/lab/alogps/index.html.
24. The Digital Object Identifier (DOI) is a system for identifying and exchanging intellectual property in the digital environment (http://www.doi.org/). An object is directly accessible using the customized address http://dx.doi.org/DOI_VALUE, 2006.
25. Merck Index (13th edition), Merck & Co, Rahway NJ, 2001.
26. T.I. Oprea, M. Olah, L. Ostopovici, R. Rad, M. Mracec, On the propagation of errors in the QSAR literature, in EuroQSAR 2002 - Designing Drugs and Crop Protectants: Processes, Problems and Solutions, (Eds.: M. Ford,
References I 7 8 5
27.
28.
29.
30.
31.
32.
33.
34.
35.
D. Livingstone, J. Dearden, H. Van de Waterbeemd), Blackwell Publishing, New York, 2003, 314-315. ACDName is available from Advanced Chemistry Development Inc., Toronto, Ontario, CA, http://www.acdlabs.com. G-protein coupled receptors are classified according to the GPCR nomenclature available at http://www.gpcr.org/7tm, whereas nuclear receptors are annotated based on the N R nomenclature available at http://www.receptors.org/NR, 2006. E.J. Filardo, J.A. Quinn, K.I. Bland, A.R. Frackelton Jr, Estrogen-induced activation of Erk-1 and Erk-2 requires the G protein-coupled receptor homolog, GPR30, and occurs via trans-activation of the epidermal growth factor receptor through release of HB-EGF, Mol. Endocrinol. 2000, 14, 1649- 1660. C.M. Revankar, D.F. Cimino, L.A. Sklar, J.B. Arterburn, E.R. Prossnitz, A transmembrane intracellular estrogen receptor mediates rapid cell signaling, Science 2005,307,1625-1630. N. Nidhi, M. Glick, J.W. Davies, J.L. Jenkins, Prediction of biological targets for compounds using multiple-category Bayesian models trained on chemogenomics databases, J. Chem. In$ Model. 2006, 46,000-000. T.I. Oprea, Cheminformatics in lead discovery, in Cheminformatics i n Drug Discovery, (Ed.: T.I. Oprea), Wiley-VCH, New York, 2005,27-42. E.A. Coats, The CoMFA steroids as a benchmark dataset for development of 3D-QSAR methods, in 3 0 Q S A R in Drug Design, Vol. 3, Recent Advances, (Eds.: H. Kubinyi, G . Folkers, Y.C. Martin), Kluwer/ESCOM, Dordrecht, 1998,199-213. Q. Chen, C. Wu, D. Maxwell, G.A. Krudy, R.A.F. Dixon, T.1. You, A 3D-QSAR analysis of in vitro binding affinity and selectivity of 3-izoxazolylsulfonylaminothiophenes as endothelin receptor antagonists, Quant. Struct.-Act. Relat. 1999, 38, 124-133. C. Wu, M.F. Chan, F. Stavros, B. Raju, I. Okun, S. Mong, K.M. Keller,
36.
37.
38.
39.
40.
41.
42. 43.
44. 45.
T. Brock, T.P. Kogan, R.A. Dixon, Discovery ofTBC11251, a potent, long acting, orally active endothelin receptor-A selective antagonist, J. Med. Chem. 1997,40, 1690-1697. C. Wu, M.F. Chan, F. Stavros, B. Raju, I. Okun, R.S. Castillo, Structure-activity relationships of N2-aryl-3-(isoxazolylsulfamoyl)-2thiophenecarboxamides as selective endothelin receptor-A antagonists, /. Med. Chem. 1997,40, 1682-1689. S.S. So, M. Karplus, Three-dimensional quantitative structure-activity relationships from molecular similarity matrices and genetic neural networks. 2. Applications, /. Med. Chem. 1997, 40, 4360-4371. B.J. Burke, A.J. Hopfinger, 1-(Substituted-benzy1)imidazoleZ(3H)thione inhibitors of dopamine B-hydroxylase, J. Med. Chem. 1990, 33, 274-281. A. Vedani, D.R. McMasters, M. Dobler, Multi-conformational ligand representation in 4D-QSAR: Reducing the bias associated with ligand alignment, Quant. Struct.-Act. Relat. 2000, 19, 149-161. A.G.S. Blommaert, H. Dhotel, B. Ducos, C. Durieux, N. Goudreau, A. Bado, C. Garbay, B.P. Roques, Structure-based design of new constrained cyclic agonists of the cholecystokinin CCK-B receptor, J. Med. Chem. 1997,40,647-658. T.I. Oprea, C.L. Waller, G.R. Marshall, 3D-QSAR of human immunodeficiency virus ( I ) protease inhibitors. 11. Predictive power using limited exploration of alternate binding modes,J. Med. Chem. 1994, 37,2206-2215. A. Leo, Personal communication, 2004. D. Abraham, Burger’s Medicinal Chemistry (6th edn), Wiley-VCH, New York, 2003. D. Abraham, Personal communication, 2004. J.G. Hardman, L.E. Limbird, P.B. Molinoff, R.W. Ruddon, A.G. Gilman, Goodman @ Gilman’s the
786
I
13 Chemica/ informatics
pharmaceutical research, Curr. Opin. Chem. Biol. 2004,8,255-263. 57. R.A. Goodnow Jr, P. Gillespie, K. Bleicher, Cheminformatic tools for library design and the hit-to-lead process: a user’s perspective. in Cheminformatics in Drug Discovery, (Ed.: T.I. Oprea), Wiley-VCH, New York, 2005,381-435. 58. K.H. Baringhaus, H. Matter, Efficient fda.gov/scripts/cder/drugsatfda/. strategies for lead optimization by 2006. simultaneously addressing affinity, 48. J.F. Contrera, E. J. Matthews, N.L. selectivity and pharmacokinetic Kruhlak, R.D. Benz, Estimating the parameters, in Cheminformatics in safe starting dose in phase I clinical Drug Discovery, (Ed.: T.I. Oprea), trials and no observed effect level Wiley-VCH, New York, 2005, based on QSAR modeling of the 333-379. human maximum recommended daily dose, Regul. Toxicol. Pharmacol. 2004, 59. M. Congreve, R. Carr, C. Murray, H. Jhoti, A ‘Rule ofThree’ for 40,185-206. fragment-based lead discovery? 49. MRTD is available from the CDER, Drug Discov. Today 2003, 8, website, http://www.fda.gov/cder/ 876-877. Offices/OPS_IO/MRTD.htm,2006. 60. T.I. Oprea, J. Blaney, 50. C. Hansch, A. Leo, D. Hoekman, Cheminformatics approaches to Exploring QSAR, Vol. 2, ACS fragment-based lead discovery, in Publishers, Washington D.C., 1995. Fragment-based Approaches in Drug 51. The Sangster database is available at, Discovery, (Eds.: W. Jahnke, D.A. http://logkow.cisti.nrc.ca/. Erlanson), Wiley-VCH, New York, 52. L.Z. Benet, Personal communication, 2006,99-121. 2006. 61. V. Povolna, S. Dixon, D. Weininger, 53. S. Andrieu, M. Lebret, J. Maclouf, CABINET - Chemistry and biological F. Beverelli, J.F. Giudicelli, informatics network, in A. Berdeaux, Effects of antiaggregant Cheminformatics in Drug Discovery, and antiinflammatory doses of aspirin (Ed.: T.I. Oprea), Wiley-VCH, New on coronary hemodynamics and York, 2005,241-269. myocardial reactive hyperemia in 62. CABINET is available from conscious dogs, J. Cardiovasc. 
Metaphorics LLC, Santa Fe, N M , Pharmacol. 1999,33,264-272. http://cabinet.metaphorics.com/. 54. D.G. Lloyd, G. Golfis, A.J.S. Knox, 63. C. Hansch, D. Hoekman, A. Leo, D. Fayne, M.J. Meegan, T.I. Oprea, D. Weininger, C.D. Selassie, C-QSAR Oncology exploration: charting cancer database. Available from the BioByte medicinal chemistry space, Drug Corporation, Claremont, CA, Discov. Today 2006, 11, 149-159. http://www.biobyte.com. 55. M.M. H a m , A. Leach, D.V.S. Green, 64. P. Furet, B. Gay, G. Caravatti, Computational chemistry, molecular C. Garcia-Echeverria, J. Rahuel, complexity and screening set design, J . Schoepfer, H. Fretz, Structure-based in Cheminfomatics in Drug Discovery, design and synthesis of high affinity (Ed.: T.I. Oprea), Wiley-VCH, New tripeptide ligands of the Grb2-SH2 York, 2005,43-57. domain, I. Med. Chem. 1998,41, 56. M.M. Hann, T.I. Oprea, Pursuing the leadlikeness concept in 3442- 3449.
Pharmacological Basis of7herapeutics (9th edn), McGraw Hill, New York, 1996. 46. T.M. Speight, N.H.G. Holford, Avery’s Drug Treatment (4th edn), Adis International, Auckland, 1997. 47. FDA labels are at the Center for Drug Evaluation and Research (CDER), website, http://www.accessdata.
PART VI Drug Discovery
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
14 Chemical Biology and Drug Discovery

14.1 Managerial Challenges in Implementing Chemical Biology Platforms
Frank L. Douglas
14.1.1 Introduction
This chapter will present the experiences and perspectives that led to the creation of a concept named Chemical Biology Platform (CBP). CBPs embrace the modern-day version of the “drug discoverer” and the management challenges associated with innovation. The management challenges are largely due to the complexity and marked increase in quantity of information about chemical structures, disease targets, and pathophysiology, as well as the pharmacology studies in disease models and patient subpopulations. Currently, management must also address the additional complexity of mergers, which also affects information integration and organizational collaboration. The challenge of accessing and correlating information generated by the partners in the merger is often underestimated. Perhaps even more challenging is the attempt to build a culture for the newly merged company in which scientists from different countries and organizations share information, collaborate, determine global standards, and leverage both tacit and explicit knowledge. The discussion will therefore focus on both the scientific and cultural underpinnings of CBPs within an organizational context.

14.1.2 The Management Challenge
The discovery and development cycle requires 10 to 15 years to move from a conceptual biological and chemical approach, through preclinical and clinical development, to approval. Since, not surprisingly, the probability of success (POS) increases with a project’s progression along the development path to approval, the key challenge of management is to change the traditional relationship between POS and time, compressing it to one that is most representative of a knowledge-driven paradigm (see Fig. 14.1-1). The knowledge-driven S curve is often achieved when a team is working on follow-on compounds for a validated disease target, or when the target being pursued is a “common mechanism”, which is relevant for more than one disease, and the compound has approval for one of the diseases within the common mechanism.

Practically, in the innovation of new drugs, one can classify the set of research and development activities into four primary areas or clusters of activities, technologies, and responsibilities. The classifications are: target identification, lead finding, lead optimization, and proof of product or product realization. Note that target validation, a critical element in research and development, is ultimately demonstrated in successful phase III clinical studies. In Aventis, the traditional Research and Development organization was reorganized into four divisions in which the most relevant disciplines were clustered within each division, as shown in Table 14.1-1. This organizational design was based on three principles. First, clustering disease expertise with concomitant technological support increases innovation and knowledge. Second, aligning global resources and accountability will leverage scarce resources, enhance technological innovation, and reduce cycle time. Third, late-stage expertise applied to early innovative projects will rapidly identify issues, conserve resources, and provide clinical knowledge for next-generation projects.
Fig. 14.1-1 Probability of success (vertical axis, 0 to 1) versus discovery and development time (horizontal axis, up to 15 years) across the stages TI, LI, LO, IND/CTX, PR, and GRAMS. TI: target identification; LI: lead identification; LO: lead optimization; PR: product realization; GRAMS: global regulatory and marketing support.
Table 14.1-1 Centers of expertise in the Drug Innovation and Approval organization of Aventis

Lead generation: functional genomics; lead discovery technologies; chemistry (medicinal and computational); chemical development.
Lead optimization: drug metabolism and pharmacokinetics (DMPK); drug safety evaluation; clinical discovery and human pharmacology (phases I and IIa).
Product realization: clinical development (phases IIb and III); pharmaceutical development; biostatistics and data management; global project teams.
Global regulatory and marketing support: quality assurance; chemistry, manufacturing and control; pharmacovigilance and epidemiology; regulatory liaison and policy; global labeling.
We, at Aventis, were also convinced that a dramatic increase in the POS would occur only when the following three conditions were satisfied:
1. The selected target is relevant and critical in the disease process.
2. Proof of principle of target validation can be demonstrated in the relevant patient population, usually in phase IIa clinical trials.
3. Clinical trials can be designed and performed to demonstrate a good benefit/safety ratio, usually in phase III.
Each of these requirements represented a unique challenge in which insufficient information and knowledge affect the POS.
14.1.3 Observation-based Discovery Background
Historically, drug discovery proceeded through the exploitation of observations about a potential therapeutic product, without having either an optimized compound or an identified target as a starting point. Two outstanding examples are aspirin and penicillin; both exemplify how POS is increased, and the development of follow-on products accelerated, by the use of accumulated knowledge. The story of acetylsalicylic acid (aspirin) began as early as the fifth century B.C., when Hippocrates noted that the powder from the bark and leaves of the willow tree could treat headaches and fever. However, it was not until the late 1820s that the work of several European scientists, including Johann Buchner of Germany, Brugnatelli and Fontana of Italy, and Henri Leroux, resulted in the extraction of the active ingredient salicin [1]. In 1899, the German chemist Felix Hoffmann convinced Bayer to market acetylsalicylic acid, which was first synthesized in 1853 by Frederic Gerhardt and was devoid of the severe stomach irritation that was seen with the unbuffered salicylic acid. This was followed by the rapid development of several organic acids similar to aspirin, for example, ibuprofen and diclofenac, which were approved for the treatment of pain and inflammatory disorders [2]. Thus, the focus became the modification of chemical structure to optimize the activity of these compounds to treat inflammation and pain. Finally, in 1971, Sir John Vane identified aspirin’s mechanism of action, namely the inhibition of the cyclooxygenase (COX) enzyme that converts arachidonic acid to prostaglandins [3]. The identification of COX as the target accelerated the discovery and development of nonsteroidal anti-inflammatory drugs (NSAIDs). Perhaps one of the most impressive accelerations of the time from discovery to product was the development of COX-2 inhibitors, such as celecoxib (Celebrex). Celebrex, a COX-2 selective inhibitor, was brought onto the market in 1999, about 8 years after identification of the COX-2 enzyme. This example demonstrates the marked reduction in cycle time that is possible when one is able to satisfy the requirements of achieving the “S curve”. These requirements are:
1. a validated target
2. knowledge of the structure of the target
3. a large library of compounds with a clear structure-activity relationship (SAR)
4. predictive animal models, and/or a human model of disease.
By a human model of the disease, we mean a human illness in which Koch-like postulates can be demonstrated, that is: (a) a marker of the disease is present in the population; (b) the intervention impacts the marker; (c) the change in the marker correlates with the clinical response.
The recognition of the role of prostaglandins in inflammation and platelet function led to the rapid use of the production or inhibition of various prostaglandins as markers for inflammatory and thrombotic diseases.

Another example of this historical approach is the discovery of penicillin. In 1871, the English surgeon Lister observed that urine samples contaminated with mold did not allow the growth of bacteria. More importantly, in 1897 Ernest Duchesne reported that Penicillium glaucum inhibited the growth of Escherichia coli when both were grown in the same culture, and that P. glaucum also prevented animals inoculated with lethal doses of typhoid bacilli from contracting typhoid. Duchesne’s premature death from tuberculosis prevented his further pursuit of the observations [4]. In 1928, Sir Alexander Fleming observed that a species of the mold Penicillium had inhibited the growth of Staphylococcus aureus in a culture. Like a true drug discoverer, however, Sir Alexander Fleming, having discovered lysozyme in 1922, sensed the importance of his serendipitous observation and pursued it. His tireless enthusiasm for and presentation of his work on penicillin finally won the interest of Drs. Cecil Paine, Howard Florey, and Ernest Chain. They were able to demonstrate the medical potential of penicillin in individual infected cases, and succeeded in extracting “purified” drug in about 1940. Between 1940 and 1942, efforts were successfully focused on the challenge of optimizing the production of penicillin. Seventeen years later, John Sheehan of the Massachusetts Institute of Technology achieved a total synthesis of natural penicillin [5].

Thus, the penicillin story demonstrates a history similar to that of aspirin. As in the case of aspirin, a target was serendipitously recognized from an in vitro observation, and there was simultaneous proof of the presence of an active ingredient or compound. Drug discovery was thereafter focused on isolating and synthesizing the active ingredient while pharmacological experiments were performed in parallel. As with aspirin, the discovery of penicillin is a case in which one started with a validated but unidentified target and an active but unidentified, unoptimized drug. Progress was accelerated when the structure of penicillin was solved and its mechanism of inhibiting the crosslinking of peptidoglycan was identified [6]. This discovery led to a number of semisynthetic and synthetic penicillins and cephalosporins, both based on the β-lactam structure that inhibits the enzyme that forms the peptidoglycan structure of the cell walls of bacteria.

14.1.4 Mechanism-based Discovery Background
Propranolol is an interesting development and example of the progression toward mechanism-based research. The hypothesis of the existence of the β-receptor and the search for an antagonist occurred almost simultaneously. “Tools” to optimize compounds and to characterize α- versus β-receptors became available. The continued modification of these compounds, along with simultaneous improvement of the bioassays, resulted in a rapid cycle of information generation and exploitation. In addition, Sir James Black was able to go rapidly into a proof of concept in healthy volunteers with pronethalol, a prototype and predecessor compound to propranolol. This evidence revealed that a drug discovery team of pharmacologists and chemists was rapidly incorporating new information, making correlations, and prototyping. It was the genesis of the concept of chemical biology, although it was not yet formally accepted as a practice. The POS was greater than what would have been expected at the beginning of this project because tool compounds existed that allowed simultaneous attempts at validating the hypothesized target as well as finding the optimal compound. Similar conditions existed in the case of antihistamines, and that enabled Sir James Black to propose that there were two histamine receptors and to validate rapidly the hypothesized H2 receptor with cimetidine, an optimized compound. The key point of these successes, however, is the fact that Sir James Black, the pharmacologist, and Dr Stephenson, the chemist, integrated and correlated previous information to uncover new drugs [7].
14.1.5 Twenty-first Century Experience: Ketek (Novel Anti-infective Drug in 2003)
In our own experience at Aventis with Ketek, we could go rapidly from concept to regulatory submission because the in vitro biological models existed. The models rapidly validated (a) its antibacterial activities and (b) the binding at two sites on the 23S rRNA of the 50S ribosomal subunit, which made it effective against penicillin-resistant Streptococcus pneumoniae [8]. Secondly, an understanding of the drug’s metabolism enabled targeted clinical studies to evaluate any potential liabilities with respect to liver side effects or QTc. Thus, the POS was high due to the extensive knowledge in the antibiotic arena and the expertise in QTc that existed in Hoechst Marion Roussel, where it could be leveraged during the discovery and development of Ketek. This was the case of a validated target but unoptimized compound (Fig. 14.1-1). Ketek was also a second compound in the series, as the first compound was terminated because of liver side effects.

The above examples satisfy the Sir James Black criteria for selecting projects with a high initial POS. Sir James Black’s advice was:
1. Start with a clinical problem.
2. Identify the controlling chemicals or hormones in the system.
3. Start at the most basic molecular level and test similar molecules for in vitro activity [9].

The three points mentioned above were clearly observed in the discovery of Enbrel. In this case, a fusion protein consisting of the soluble p75 TNF (tumor necrosis factor) receptor type II and the Fc portion of human IgG was the “chemical” of interest. This approach was very clever in that Craig Smith and Raymond Goodwin proposed that injecting a soluble TNF receptor would assist in binding the excess TNF, which on interacting with its receptor on the cell triggers the inflammatory process in rheumatoid arthritis patients. The excess circulating TNFα was the identified and somewhat validated target. This cytokine plays a critical role in synovial proliferation. The technical optimization step was the cloning and expression of the TNF receptor. And, as in the earlier case of propranolol, an animal model existed, namely the collagen-induced arthritis mouse model, in which the concept could be simultaneously optimized and validated. Further, TNF served as a biomarker in the patient studies.
14.1.6 Observation Summary and Future Application
The above examples reveal the following characteristics for an enhanced POS:
1. degree of validation of the target
2. optimization of leads
3. ability to link optimization of the lead with in vivo validation of the target
4. ability to test early in humans, particularly with the aid of biomarkers
5. rapid prototyping through leveraging of knowledge generated from previous, relevant studies.

In complex, global organizations, the challenge is to create an environment that enables the transfer of information and knowledge, and utilizes rapid prototyping. One answer, which was applied in Aventis, is the establishment of CBPs. Figure 14.1-1 schematically shows the above scenario for a CBP project and compares it with known mechanism-based approaches and with unidentified and unvalidated target projects.

The middle curve represents the case of aspirin or penicillin, in which a validated but unidentified target is discovered. Concurrent with the discovery of this target there is also recognition of the existence of an active principle or compound. The discovery effort is therefore initially focused on isolating and characterizing the active compound, followed by simultaneous development of in vitro and in vivo biological assays to enable optimization of the compound. The positive POS value depends on the disease being studied. For example, it is greater for an anti-infective than for an antipsychotic, because the in vitro and animal efficacy assays are more predictive of efficacy in man when one is dealing with anti-infectives. The POS rises rapidly through phase IIa, the end of the lead optimization period.

The bottom curve, for a selected, unidentified, and unvalidated target, represents today’s paradigm. Here, the example is a putative target selected on the basis of differential gene expression. Targets of this nature are rarely validated. A second challenge is that the protein product, for example an enzyme, although easily identified, is often not easily crystallized, and therefore little structural information is available to permit a rational drug design approach. This period of target identification/lead identification (LI) can sometimes be quite long, 2 to 5 years, before one can start the lead optimization phase of activities. The POS approaches 100% much more slowly, even after initial work in clinical phase III is underway, and only at the conclusion of phase III are the data available to determine whether the target is valid and relevant.

The upper curve is the best-case scenario. Here the target is not only identified but also validated. In addition, the biological structure is known, and as a result one can start with rational drug design and de novo synthesis. Here, the time to LI is shortest. At the very outset of the project, the POS is very high, both because the target is validated and because there is structural information that enables rapid lead finding, optimization, and prototyping. This situation is approached when one is working on follow-on or next-generation compounds for a drug that is already on the market and has a clear mechanism of action or target.

The genomic age presents a significant opportunity to rapidly generate information and approximate the upper or common mechanism curve. Genomics, proteomics, metabolomics, pharmacogenomics, and bioinformatics will bear fruit when two additional disciplines mature. These disciplines are structural biology and the application of knowledge management to families of targets such as kinases, proteases, ion channels, and G-protein coupled receptors (GPCRs). This will enable prediction and generation of SARs in silico, which is the hope and future of CBPs.
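The three curves described above can be made concrete with a small numerical sketch. The chapter gives no equation for the curves, so the rescaled logistic form below, the function name `pos_curve`, and every parameter value (starting POS, midpoint year, steepness) are assumptions chosen only to reproduce the qualitative shapes: a high initial POS for the upper curve, an intermediate one for the middle curve, and a slow late rise for the bottom curve.

```python
import math


def pos_curve(t_years, start, midpoint, steepness=0.8):
    """Illustrative S-shaped probability of success (POS) at time t_years.

    start     -- assumed POS at t = 0 (not a value from the chapter)
    midpoint  -- assumed year around which the POS rises fastest
    steepness -- assumed slope of the logistic rise
    """
    logistic = 1.0 / (1.0 + math.exp(-steepness * (t_years - midpoint)))
    # Value of the same logistic at t = 0, used to anchor the curve at `start`.
    base = 1.0 / (1.0 + math.exp(steepness * midpoint))
    # Rescale so the curve equals `start` at t = 0 and approaches 1 for large t.
    return start + (1.0 - start) * (logistic - base) / (1.0 - base)


# Hypothetical (start, midpoint) pairs for the three scenarios in the text.
SCENARIOS = {
    "upper: validated target, known structure": (0.50, 3.0),
    "middle: validated but unidentified target": (0.20, 6.0),
    "bottom: unidentified, unvalidated target": (0.02, 11.0),
}

if __name__ == "__main__":
    for label, (start, midpoint) in SCENARIOS.items():
        row = ", ".join(
            f"{pos_curve(t, start, midpoint):.2f}" for t in (0, 5, 10, 15)
        )
        print(f"{label}: POS at years 0, 5, 10, 15 = {row}")
```

The only design choice of note is the rescaling step, which lets each scenario start at its own assumed POS while still approaching 1 at approval; any other monotonic S-shaped function would serve equally well for illustration.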
14.1.7 Establishment of Organizational Structures for Chemical Biology Platforms
In 1997, as mentioned above, Hoechst Marion Roussel, later to become Aventis, reorganized Research and Development and renamed it Drug Innovation and Approval (DI&A) (Fig. 14.1-2). A key aspect of this organization was the creation of the Lead Optimization organization, which had the responsibility to develop proof of concept in man. This organization provided support to the project teams by generating data in the areas of drug metabolism and pharmacokinetics (DMPK), toxicology, biomarkers, and phases I and IIa clinical trials. The goal was to go rapidly into human studies and, through “rapid prototyping”, feed back information to the project teams to enable the optimization of their compounds. Another key component of the Drug Innovation and Approval organization was the multidisciplinary project teams. The project teams were the “units of innovation” and were managed by the Heads of the various sites, who had responsibility from target identification through phase IIa. After phase IIa, the projects were managed on a global basis from the Global Drug Development Center in Bridgewater, New Jersey. Since each site had responsibility for specific diseases through phase IIa, as well as for the global functions, lead generation and lead optimization had units at each site (see Table 14.1-1); all members of these project teams were colocated through phase IIa. This permitted the close, rapid exchange of information and collaboration around each project. The members of project teams also benefited from the knowledge that existed in their disciplines, as they could bring the expertise of their colleagues to any challenge.

Fig. 14.1-2 Drug Innovation and Approval (DI&A).

In 1999, during another set of discussions on how best to share knowledge across project teams in different sites, we discerned several key points. First, we had 54 projects with kinases as targets. These projects were focused on inflammatory diseases, cancer, and central nervous system disorders and existed in all three sites. Secondly, there were no organized mechanisms to foster communication or knowledge sharing among the scientists. A third revelation was that there were some common problems, for example, the toxicity of lead compounds against kinase targets, the need to develop biased libraries of compounds to enhance “hit” finding, or the lack of structural information about the specific kinase enzymes. A fourth revelation was that, although we had made significant progress in DMPK, we were still dramatically losing compounds in man because of safety issues. However, sharing of knowledge among the DMPK scientists did contribute positively to the improvement in attrition rate due to poor DMPK characteristics. Another reality was that 60% of the 200 top-selling drugs came from four classes of mechanisms, namely, GPCRs, proteases, kinases, and ion channels and transporters.
Finally, there was the recognition that the strategies used to find leads were related to the amount of information we had about the structure of the target. Thus, the more knowledge available, the less time was needed to find a lead compound. In fact, the strategies used to find lead compounds were, in decreasing order of the knowledge available: de novo synthesis, virtual screening, focused screening, and high-throughput screening. A focus on understanding the structure of the target to identify the spatial and energy requirements of the potential agonist or inhibitor was a clear need. The anticipated deciphering of the human genome was seen as the event that would catalyze the ability to elucidate the structure of targets and further enable rational drug design.
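The ordering of lead-finding strategies by available knowledge can be sketched as a simple lookup. The four strategy names come from the text; the knowledge levels in `TargetKnowledge`, their names, and the one-to-one pairing with strategies are hypothetical simplifications introduced here for illustration, not a scheme described in the chapter.

```python
from enum import IntEnum


class TargetKnowledge(IntEnum):
    """Hypothetical levels of structural knowledge about a target."""

    NONE = 0                # no structural information
    FAMILY_KNOWN = 1        # target family and known ligands only
    STRUCTURAL_MODEL = 2    # a usable structural model of the target
    DETAILED_STRUCTURE = 3  # detailed structure of the target available


# Strategies named in the text, paired with the assumed levels above:
# the more knowledge available, the more directed the strategy.
STRATEGY_BY_KNOWLEDGE = {
    TargetKnowledge.DETAILED_STRUCTURE: "de novo synthesis",
    TargetKnowledge.STRUCTURAL_MODEL: "virtual screening",
    TargetKnowledge.FAMILY_KNOWN: "focused screening",
    TargetKnowledge.NONE: "high-throughput screening",
}


def choose_lead_finding_strategy(knowledge):
    """Return the lead-finding strategy for a given knowledge level."""
    return STRATEGY_BY_KNOWLEDGE[TargetKnowledge(knowledge)]


if __name__ == "__main__":
    for level in sorted(TargetKnowledge, reverse=True):
        print(f"{level.name}: {choose_lead_finding_strategy(level)}")
```

In practice the boundaries between these levels are a matter of judgment; the table only encodes the ranking stated in the text.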
14.1.8 Chemical Biology Platforms (CBP)
In 2000, I introduced the Kinase Chemical Biology Platform, which was the first of our four CBPs. The initial step was to identify all scientists across the company (now Aventis) with expertise and interest in kinases. The survey yielded about 300 scientists, many of whom were actively involved in kinase projects. We created a Kinase Community of Practice with these scientists as members and used knowledge mail to facilitate communication, exchange, and development of the kinase network.

The second step was the establishment of the Platform. There were two key principles in establishing the CBP: (a) there would be no changes in the basic DI&A organizational structure, and (b) the goal of the Platform was to facilitate knowledge transfer to enable simultaneous drug discovery. (Simultaneous drug discovery meant anticipating the critical issues and working on them in a parallel rather than sequential fashion.) A CBP core team was appointed and given a charter. This team consisted of senior scientists who were respected by their peers. Each represented one of the following disciplines: medicinal chemistry, computational chemistry, structural biology, molecular biology, toxicology, DMPK, clinical pharmacology, and IT. A knowledge management specialist was assigned to the CBP. The overall responsibility of each CBP core team was to:
1. leverage globally the target family knowledge across projects, independent of the disease focus and priorities of each site
2. improve Aventis’ target family compound collections (focused libraries)
3. develop and apply the concept “all target compounds see all targets of a family”
4. develop target family-specific predictive models and tools
5. use external networks of experts in the field
all in order to produce better compounds faster.
Each member of the CBP core team was expected to convene a small team of individuals from his/her discipline who were active members of project teams within the same target family. These CBP strategy teams, as they were called, identified problems that were common to several project teams and developed strategies to solve them. Sometimes this involved engaging academic experts to assist in the resolution. The results and “learnings” were shared with all interested scientists (Fig. 14.1-3). The responsibility of the core team was to discuss issues being pursued by the strategy teams, identify the downstream implications for their individual areas, and look for “breakthrough” solutions or new methods of solving problems. Areas of particular interest included use of structural biology information, strategies for designing focused libraries, and identification of biomarkers.
14.1.8.1
Chemical Biology Early Success and Organizational Benefits
One of the early successes in the kinase CBP was the establishment of a core panel of kinases against which all compounds of interest were screened, and from which “surrogates” were used to form cocrystals and develop SAR. Within 1 year, active compounds were found for the kinases, including 21 active series, and 9 lead compounds were selected.
A second immediate success was in DMPK. When a project team working on ITK realized that their early compounds had safety problems due to inhibition of P-450, the ITK team collaborated with the SYK team, who had had the same issue and had resolved it after a 2-year effort. ITK was able to benefit from the recent knowledge that was gained in solving the SYK problem. As a result, ITK required 6 months less to successfully design lead compounds without P-450 inhibition liabilities. A third, and perhaps the most significant, achievement was the reduction of the portfolio from 54 to 38 kinase projects, based on a more robust evaluation of the POS of each project and of the resource commitment required to prosecute the project. Thus, the organization conserved scarce resources and reallocated them to other priorities. We enabled knowledge sharing through the use of methods to capture lessons learned in projects. A particularly effective method was the interrupted case study approach. Whenever a “breakthrough” or novel solution to a problem was found, the scientists involved were invited to write up the results as a case and present the study in a workshop setting with an interrupted problem-solving approach. The scientist would, at the outset, describe the problem and its importance to the project. The participants would brainstorm among themselves on potential solutions. The presenter would select one or more suggested solutions that had been tried and share the results. After another round of brainstorming about other approaches or further efforts, the final direction was presented. In this manner, the presenter would finally unveil the unique solution. This method gained tremendous popularity because it sometimes uncovered additional unanticipated approaches.
During the establishment of the kinase CBP, we encouraged the core team, led by Dr Andreas Batzer, to develop a “Book of Knowledge” in which they recorded the organizational hurdles encountered in establishing the platform and the solutions that were found. This turned out to be a very useful exercise and led to one of the most memorable experiences that I have had in my career in the pharmaceutical industry. About 6 months after the initiation of the kinase CBP, I was invited by Dr Hans Peter Nestler to attend a workshop that he had organized. He had no other request but my presence. I was on vacation but in Frankfurt, so I decided to attend the afternoon session. The first remarkable thing was that Hans Peter had organized a “virtual” workshop among the centers in Frankfurt, Paris, and New Jersey, conducted by videoconference. The second was that it brought together scientists from the different disciplines who were working on projects in the protease target family. I listened without interruption and, at the end of the session, Hans Peter asked for my comments. I complimented him on the excellent effort and asked how he had been able to organize this workshop. He explained that he had used the recommendations from the chemical biology Book of Knowledge and had also benefited from discussions with Andreas Batzer and his colleagues. And
thus, the Protease Chemical Biology Platform, with Hans Peter as head, was launched. Shortly thereafter, a total of four chemical biology platforms were in operation: kinase (CBK) led by Dr Andreas Batzer, protease (CBP) led by Dr Hans Peter Nestler, ion channels and transformers (CBICT) led by Dr Heiner Glombik, and G-protein coupled receptors (CBG) led by Dr Bruce Baron. Thus, within 18 months of my describing CBPs in my keynote address at the IBC Drug Discovery Conference in Boston in 2000, four CBPs were functioning. Incidentally, this conference was very significant because the other keynote address was delivered by Dr Craig Venter, who described the challenges of deciphering the human genome. The next address was mine, and it acknowledged that, due to this incredible achievement led by Dr Venter and Dr Francis Collins, one would be able to think in terms of target families and develop knowledge about both structure and pathophysiology more rapidly. The deciphering of the genome was critical to the application of CBPs in industry.
14.1.9 Other Organizational and Knowledge Challenges
The desire to correlate information across projects and sites disclosed a critical barrier. As a consequence of mergers or of groups working independently, such as in business unit structures within a single company, there was a lack of standardization of assays, connectivity of databases, and annotation of data, and hence we were unable to leverage knowledge or data. Thus, the correlation of chemical and biological data was very difficult. We therefore launched, with the help of a small team from McKinsey & Company, a program to establish an informatics platform to support the CBPs. The goals of this effort included: Provision of a curated, standardized, central repository to enable rapid querying and retrieval of diverse, accurate biological data (e.g., sequence similarity, expression, disease association). Knowledge-based establishment of correlations between chemical space (compounds, hits, leads, etc.) and biological space (e.g., target sequence and target 3D structure, as well as ADMET data). Ability to increase the POS of the selected portfolio of projects by selecting groups of targets with similar biological properties. Identification of additional predictive and simulation tools to leverage curated data, for example, ADMET (absorption, distribution, metabolism, elimination, and toxicology). Rapid identification of “privileged fragments” that lead to selection of compounds of high interest for a specific target.
The overall hope was that the IT platform would not only improve communications among the scientists but lead to increased correlations and serendipitous findings.
14.1.10 Conclusion
Table 14.1-2 summarizes the differences between the traditional drug discovery approach and that fostered by chemical biology principles. CBPs were designed to take advantage of the promise of genomics and the power of information technology in improving decision making and POS in drug discovery and development. The platforms were expected to become the “Knodes” or knowledge nodes of scientific networks that were focused on understanding and generating information about families of enzymes, receptors, ion channels, and transporters with respect to their ability to provide solutions for altered homeostasis and disease in man. By the end of 2002, the Aventis project portfolio was transformed. Of the 139 projects in the LI phase, the kinase and GPCR target families each contributed 19%, and the protease and ion channels/transporters families about 8% each. For projects in the candidate identification phase, the GPCR, kinase, and protease target families each contributed about 20% of the compounds, and ion channels/transporters about 12% of the compounds in the portfolio. With respect to processes, there was greater focus on assuring standardization of assays and on sharing information, as well as biased compound libraries, across project teams, thus facilitating common mechanism projects across sites. External networks were under way and the early results of the experiment were encouraging. I would recommend further evaluation of this organizational approach to improve productivity in the biopharmaceutical industry, and of the attempts made to quantify the results to determine organizational benefits.

Table 14.1-2 Chemical biology

| | Traditional drug discovery | DI&A chemical biology |
|---|---|---|
| Targets | Collection of targets | Selected target families |
| Workflow | Sequential activities in chemistry and biology | Simultaneous efforts in internal and external networks |
| Scientific concept | Traditional disciplines | Knowledge-based approaches in biology and chemistry |
| Organization | Silos of functionality | Cross-functional, beyond disciplines, virtual, capability oriented, DI&A network centric |
| Capabilities | Existing skills in disciplines | Best in class, knowledge-based, learning curves |
| Value | Individual projects | Focus on optimizing the global target family portfolio |
| Mind-set | Functional, hierarchical lines of command | Entrepreneurial, value oriented |

Source: CBK
References

1. Mary Bellis, History of Aspirin, About.com.
2. John S. Nicholson, Ibuprofen, in Chronicles of Drug Discovery, (Eds.: J.S. Bindra, D. Leidner), John Wiley, New York, 1982, 149.
3. John R. Vane, Inhibition of prostaglandin synthesis as a mechanism of action for aspirin-like drugs, Nature 1971, 231, 232-235.
4. Mary Bellis, The History of Penicillin, in [email protected].
5. C&EN Special Issue, The Top Pharmaceuticals that Changed the World, vol. 83, Issue 25 (6/20/05).
6. E.M.J.R. Wise, J.T. Park, Penicillin: its basic site of action as an inhibitor of a peptide cross-linking reaction in a cell wall mucopeptide synthesis, Proc. Natl. Acad. Sci. U.S.A. 1965, 54(1), 75-81.
7. James W. Black, Nobel Lecture: Drugs from Emasculated Hormones: the Principles of Syntopic Antagonism, 1988, Dec. 8.
8. R. Bersicio, et al., Structural insight into the antibiotic action of telithromycin against resistant mutants, J. Bacteriol. 2003, 185(14), 4276-4279.
9. James Black Foundation Promotional Materials, published by The James Black Foundation, King’s College School of Medicine and Dentistry, Half Moon Lane, Dulwich (London), England.
14.2
The Molecular Basis of Predicting Druggability
Bissan Al-Lazikani, Anna Gaulton, Gaia Paolini, Jerry Lanfear, John Overington, and Andrew Hopkins
14.2.1 Introduction
Medicinal chemists have learnt through the experience of many hundreds of screening campaigns in the pharmaceutical industry that for many targets small-molecule modulators have not yet been discovered, even when the targets are screened against a diverse chemical file of hundreds of thousands to millions of compounds. Even when the medicinal chemist is fortunate enough to discover a small-molecule modulator of the biological target of interest, it is common for many “lead” compounds to be unsuitable for optimization into prototype drugs. Chemical biologists may not require such optimized chemical tools, but both the chemical biologist and the medicinal chemist can learn from each other’s experience in discovering chemical tools and leads. The failure of many screening campaigns to discover druglike leads or chemical tools against certain targets has led to two competing hypotheses to explain and overcome this phenomenon. The first hypothesis is that the discovery of a chemical tool against a target is a function of the diversity of the chemical space screened against the target, independent of the target: the diversity argument. The second hypothesis claims that the ability to discover a small-molecule modulator is an inherent property of the physicochemical topology of a biological target, independent of chemical space: the druggability argument. These constraints are more severe if the aim is to discover drugs that can be orally administered. The concept of druggability postulates that since the binding sites on biological molecules are complementary in terms of volume, topology, and physicochemical properties to their ligands, only certain binding sites on putative drug targets are compatible with high-affinity binding of compounds with “druglike” properties [1].
Furthermore, the concept also asserts that molecular recognition on biological targets, such as proteins, has evolved to be exquisitely specific at discrete sites on protein surfaces and creates stringent physicochemical limits that restrict the target set available to modulation by small molecules. The extension of this concept to a whole genome analysis leads to the identification of the druggable genome: the genes and their expressed proteomes predicted to be amenable to modulation by compounds compatible with druglike properties [2, 3].
Chemical Biology. From Small Molecules to System Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
14.2.2 Chemical Properties of Drugs, Leads, and Tools
For in vitro or cellular experiments, the chemical biologist requires a compound to have a minimum set of physicochemical characteristics, so that its solubility and polar/hydrophobic balance enable the tool to permeate the cell membrane and reach the site of action. For the medicinal chemist, the same principles apply, but the range of biological barriers that a drug needs to pass through to affect the biological system of a whole organism is far greater and thus reduces the molecular property range of chemical space. Lipinski introduced the concept of physicochemical property limits for drugs, with respect to solubility and permeability, from a seminal analysis of the Derwent World Drug Index, which demonstrated that orally administered drugs are far more likely to reside in areas of chemical space defined by a limited range of molecular properties. Lipinski’s analysis demonstrated that 90% of orally absorbed drugs had molecular weights of less than 500 Da, no more than 5 hydrogen-bond donors (counted as OH and NH groups), no more than 10 hydrogen-bond acceptors (counted as the combined nitrogen and oxygen atoms), and a lipophilicity (logP) of 5 or less [4]. The multiples of five observed in the molecular properties of drugs led to the coining of the term Lipinski’s rule-of-five (Ro5). Since the work of Lipinski et al., various expansions of the definition of, and methods to predict, “drug-likeness” have been proposed in the literature [4-16]. The common thread emerging from the field is that drug-likeness is defined by a range of molecular properties and descriptors that can discriminate between drugs and nondrugs for such characteristics as oral absorption, aqueous solubility, and permeability.
This is illustrated by the observation that the distribution of mean molecular properties of approved oral (small-molecule) drugs has changed little in the past 20 years, despite changes in the range of indications and targets [17].
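The rule-of-five limits described above can be expressed as a simple filter. The following is a minimal sketch, assuming the four descriptors have already been computed by some cheminformatics tool; the `Descriptors` container, the function names, and the example values are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container for this sketch)."""
    mol_weight: float      # molecular weight in Da
    h_bond_donors: int     # OH and NH group count
    h_bond_acceptors: int  # combined N and O atom count
    log_p: float           # calculated logP


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits the compound exceeds."""
    checks = [
        d.mol_weight > 500,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
        d.log_p > 5,
    ]
    return sum(checks)


def is_ro5_compliant(d: Descriptors, max_violations: int = 0) -> bool:
    """True if the compound exceeds no more than max_violations limits."""
    return ro5_violations(d) <= max_violations


# Illustrative values only: a small aspirin-like compound and a large macrocycle.
small = Descriptors(mol_weight=180.2, h_bond_donors=1, h_bond_acceptors=4, log_p=1.2)
large = Descriptors(mol_weight=1202.6, h_bond_donors=5, h_bond_acceptors=23, log_p=3.0)
print(is_ro5_compliant(small), ro5_violations(large))
```

The `max_violations` parameter is an assumption of this sketch: some practitioners tolerate one violation, whereas the strict reading used in the text corresponds to the default of zero.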
14.2.3 Molecular Recognition is the Basis for Druggability
The molecular basis of the a priori druggability hypothesis is derived from the biophysical study of molecular recognition. The binding energy (ΔG) of a ligand to a molecular target (e.g., protein, RNA, DNA, carbohydrate) is defined in Eq. (1).

$$\Delta G = RT \ln K_i \approx 1.4 \log_{10} K_i \qquad (1)$$

where R is the gas constant (1.986 cal mol⁻¹ K⁻¹), K_i is the dissociation (inhibition) constant of the ligand in mol L⁻¹, and the factor of 1.4 gives ΔG in kcal mol⁻¹ at ambient temperature. The affinity of binding is predominantly driven by the van der Waals and entropy components of the binding energy, through the burying of hydrophobic surfaces. Thus, for a ligand such as a drug molecule to bind with an affinity of Ki = 10 nM, it requires a binding energy (ΔG) of -11 kcal mol⁻¹. A lower affinity “hit” from a high-throughput screen, with Ki = 1 µM, equates to -8.4 kcal mol⁻¹. Thus a 10-fold increase in potency is equivalent to 1.36 kcal mol⁻¹ of binding energy. The binding energy potential of a ligand is, in general, proportional to the available surface area and its properties. The hydrophobic effect from the displacement of water and the van der Waals attractions between atoms contribute approximately 0.03 kcal mol⁻¹ Å⁻². Thus, a ligand with a 10 nM dissociation constant would be required to bury 370 Å² of hydrophobic surface area, assuming that there are no strong ionic interactions between the protein and the ligand. Empirical analysis of nearly 50 000 biologically active druglike molecules reveals a linear correlation between molecular weight and molecular surface area (Fig. 14.2-1). The contribution of the hydrophobic surface to the binding energy is demonstrated by the phenomenon of the “magic methyl”, in which experienced medicinal chemists often observe that a single methyl group, judiciously placed, can increase ligand affinity by 10-fold, approximately equivalent to the maximal affinity per nonhydrogen atom [18]. The accessible hydrophobic surface area of a methyl group is approximately 46 Å² (if one assumes that all of the hydrophobic surface area is encapsulated by the protein-binding site and thus makes full contact with the target), which at 0.03 kcal mol⁻¹ Å⁻² corresponds to approximately 1.36 kcal mol⁻¹, equivalent to the observed 10-fold affinity increase. In addition to the predominantly hydrophobic contribution to the binding of many drugs, ionic interactions, such as those formed by inhibitors of zinc proteases (e.g., ACE inhibitors), contribute to the binding energy.
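The arithmetic in this passage can be checked with a short calculation. This is a sketch under stated assumptions: Ki is treated as a dissociation-type constant in mol/L (so tighter binding gives a more negative energy), the temperature is taken as 298 K, and the function names are hypothetical. Because the text rounds the conversion factor to 1.4, its quoted values (for example, 8.4 kcal/mol for a 1 µM hit) are slightly larger in magnitude than those computed here.

```python
import math

R_KCAL = 1.986e-3  # gas constant in kcal mol^-1 K^-1, as quoted in the text


def binding_energy(ki_molar: float, temperature_k: float = 298.0) -> float:
    """Binding free energy (kcal/mol) from a dissociation-type constant Ki in mol/L."""
    return R_KCAL * temperature_k * math.log(ki_molar)


def buried_area_needed(delta_g: float, per_area: float = 0.03) -> float:
    """Hydrophobic surface (square angstroms) needed to supply delta_g,
    assuming a contribution of per_area kcal/mol per square angstrom."""
    return abs(delta_g) / per_area


dg_10nm = binding_energy(1e-8)                    # 10 nM ligand
dg_per_decade = binding_energy(1e-7) - dg_10nm    # cost of a 10-fold loss in potency
print(round(dg_10nm, 1), round(dg_per_decade, 2), round(buried_area_needed(-11.0)))
```

Using the rounded value of -11 kcal/mol from the text, the required buried area comes out close to the 370 Å² quoted above.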
The attraction of complementary polar groups contributes up to 0.1 kcal mol⁻¹ Å⁻², with ionic salt bridges contributing approximately three times more, allowing low-molecular-weight compounds to bind strongly. Unlike hydrophobic interactions, complementary polar interactions are dependent on the correct geometry. Encapsulated cavities are capable of binding low-molecular-weight compounds with high affinities since they maximize the ratio of the surface area to the volume. Thus, the physicochemical characteristics of the binding site define the physical and chemical properties of the ligand. Therefore, a target needs a pocket that is either predefined or formed on binding by allosteric mechanisms. In general, thermodynamics and selection pressure play a part in reducing the accidental existence of such favorable pockets for ligand interactions. The thermodynamic argument contends that it costs energy to maintain an exposed hydrophobic pocket in an aqueous environment. Selection pressure may also increase the specificity of molecular recognition for ligand pockets to avoid inappropriate signaling or inactivation from the milieu of metabolites and small molecules in which cells are bathed. A quantitative approach is already well established for assessing the druglike properties of a small molecule. Could such a quantitative approach be
Fig. 14.2-1 Relationship between molecular weight and molecular surface area. Analysis of 49 456 biologically active, druglike compounds (1100 Da MW) with IC50 <= 100 nM. Molecular weight was calculated from the chemical structures represented as desalted, canonical SMILES strings. The calculated molecular surface area of N, O, P, and S atoms was estimated using the fast Ertl method [19] using a 2D approximation. All other atom types (excluding hydrogen atoms) were estimated using an overlapping spheres method. All calculations were performed using SciTegic’s Pipeline Pilot (San Diego, CA).
established for assessing the properties of proteins as drug targets? The “rule-of-five” is a set of properties to suggest which compounds are likely to show poor absorption or permeation, since such compounds are unlikely to show good oral bioavailability [4]. Physicochemical constraints such as this limit the type of proteins we see as drug targets; simply put, drug targets need to be able to bind compounds with complementary properties. Since a receptor-binding site must be complementary to a drug, it is reasonable to assume that equivalent rules could be developed to describe the physicochemical properties of binding sites with the potential to bind “rule-of-five” compliant molecules
with a potent binding constant (e.g., Ki < 100 nM). A number of properties complementary to the “rule-of-five” can be calculated, for example, the surface area and volume of the pocket, its hydrophobic and hydrophilic character, and the curvature and shape of the pocket. Following the assumption that the properties of the drug are complementary to those of the binding site, analysis of the calculated physicochemical properties of the putative drug-binding pocket on the target protein can provide an important guide to the medicinal chemist in predicting the likelihood of discovering a drug against the particular target site. On the basis of the known physicochemical properties of passively absorbed oral drugs, one would predict “druggable” binding sites to be predominantly apolar cavities of 400-1000 Å³, where over 65% of the pocket is buried or encapsulated, with an accessible hydrophobic surface area of at least 350 Å². Druggability predictions have been empirically explored using heteronuclear NMR (nuclear magnetic resonance) to identify and characterize the binding surfaces on proteins by screening ~10 000 low-molecular-weight molecules (average MW 220, average cLogP 1.5) [20]. Screening results from 23 proteins reveal that 90% of the ligands bind to sites known to be small-molecule ligand-binding sites. In the relatively small sample of proteins studied, Hajduk et al. noted a high correlation between experimental NMR hit rates and the ability to find high-affinity ligands. In only 3 of the 23 proteins were distinct, uncompetitive new binding sites discovered. The authors postulated that these new sites could possibly play an unknown physiological role in the protein’s functions.
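The pocket criteria quoted above (an apolar cavity of 400-1000 Å³, over 65% buried, with at least 350 Å² of accessible hydrophobic surface) can be written as a simple screening predicate. This is only a sketch of how such thresholds might be applied to precomputed pocket properties; the `Pocket` container and its field names are hypothetical, and no particular pocket-detection software is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pocket:
    """Precomputed properties of a candidate binding pocket (hypothetical container)."""
    volume_a3: float            # cavity volume in cubic angstroms
    buried_fraction: float      # fraction of the pocket that is buried (0 to 1)
    hydrophobic_area_a2: float  # accessible hydrophobic surface in square angstroms


def meets_pocket_criteria(p: Pocket) -> bool:
    """Apply the three thresholds quoted in the text to one pocket."""
    return (
        400.0 <= p.volume_a3 <= 1000.0
        and p.buried_fraction > 0.65
        and p.hydrophobic_area_a2 >= 350.0
    )


# Illustrative values only.
enclosed_cleft = Pocket(volume_a3=620.0, buried_fraction=0.80, hydrophobic_area_a2=410.0)
shallow_groove = Pocket(volume_a3=250.0, buried_fraction=0.30, hydrophobic_area_a2=120.0)
print(meets_pocket_criteria(enclosed_cleft), meets_pocket_criteria(shallow_groove))
```

A hard cutoff is a simplification made for this sketch; a scoring scheme that weights each property would be an equally reasonable reading of the text.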
14.2.4 Estimating the Size of the Druggable Genome
Whilst our current knowledge may be limited in predicting a priori, from a protein sequence, where uncompetitive allosteric-binding sites may appear, we may be able to identify, at the sequence and structural levels, which targets are more likely to be amenable to modulation by druglike small molecules by extrapolating from our current knowledge. Using the knowledge about proteins to which current drugs and leads bind, we can infer the subset of human genes and proteins that have a high probability of being druggable, that is, capable of binding druglike small molecules with high affinity. Outlined below are a number of methodologies and approaches that have been used to infer the druggable portion of targets encoded by the human genome. In this paper, we have extended the work of Hopkins and Groom and attempted to estimate the size of the druggable human genome using three distinct methodologies: homology-based analysis from a comprehensive survey of drugs and leads; feature-based probabilistic druggability analysis; and structure-based amenability analysis.
14.2.4.1
Initial Estimates
To gauge the number of possible drug targets in the human genome, one should begin with a survey of the knowledge of the current modes of action of existing drugs. In a review of the pharmacological literature, Drews [21, 22] identified 483 targets for known drugs. From this figure, Drews later estimated the number of ligand-binding domains, as a measure of the number of potential points at which small-molecule therapeutic agents could act, to be close to 10 000; however, the methodology by which these numbers were derived is not disclosed [23].
14.2.4.2
Hopkins and Groom’s Method
The first systematic survey of the druggable genome, following the publication of the draft human genome [24, 25], was by Hopkins and Groom [2]. Hopkins and Groom attempted to identify the genes that produced potentially druggable proteins by their membership in druggable gene families. The explicit assumption of a gene family based analysis is that the architecture of the druggable protein domain is likely to be conserved amongst related members of that domain’s gene family. Hopkins and Groom approached the problem in two stages. Firstly, a database of drug target sequences was compiled from a comprehensive survey of the literature and investigation of drug databases. Secondly, the constructed drug target sequence database was used to identify related members of a putative druggable gene family from the protein domain annotation of the translated human protein sequences. Hopkins and Groom’s analysis of the literature, the Investigational Drugs Database, and the Pharma Projects database identified 399 nonredundant molecular targets shown to bind rule-of-five compliant compounds with binding affinities below 10 µM. Whilst there is some degree of overlap with Drews’s work [21, 22], a significant amount of redundancy was observed in the initial study. In addition, a number of new proteins targeted by experimental drugs were captured. Likewise, some targets for biological agents, for which modulation by rule-of-five compliant compounds has not yet been shown, were eliminated from the survey. Nearly half of the targets fall into just six major gene families: GPCRs (G-protein coupled receptors), the serine/threonine and tyrosine protein kinase superfamily, zinc metallopeptidases, serine proteases, nuclear hormone receptors, and phosphodiesterases. Of the 399 targets of the marketed and experimental drugs identified, 376 sequences could be assigned to 130 drug-binding domains, as captured by their InterPro domain annotation.
Of these, 125 are domains with homologs and orthologs present in the human proteome. The sequence and functional similarities within a gene family suggest a general conservation of binding site architecture between family
members. The explicit assumption is that if one member of a gene family is modulated by a drug molecule, other members of the family could also be able to bind a compound with similar physicochemical properties. Following the above logic, 3051 genes were identified as belonging to the 125 druggable InterPro domains and thus predicted to encode proteins that have some precedent for inferring their ability to bind druglike molecules. Hopkins and Groom’s database identifies only 120 biological targets as the modes of action for marketed, rule-of-five compliant drugs, significantly fewer than the previous estimate that launched drugs act on 483 targets. Interestingly, the vast majority (about 90%) of the drugs and leads identified in this survey are competitive with endogenous ligands at a structurally defined binding site. This figure is similar to the rates of discovering new binding sites, as shown by Hajduk et al. [20] (AH, personal communication).
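The two-stage, gene-family logic described above amounts to a set operation: collect the domains found on known drug targets, then flag every gene whose product carries at least one of them. The sketch below illustrates that idea only; it is not the authors' pipeline, and all gene names and domain labels are hypothetical placeholders rather than real InterPro accessions.

```python
from typing import Dict, Set


def druggable_domains(target_domains: Dict[str, Set[str]]) -> Set[str]:
    """Stage 1: collect every drug-binding domain annotated on a known drug target."""
    domains: Set[str] = set()
    for annotated in target_domains.values():
        domains |= annotated
    return domains


def predicted_druggable_genes(
    gene_domains: Dict[str, Set[str]], domains: Set[str]
) -> Set[str]:
    """Stage 2: flag genes whose domain annotation shares at least one druggable domain."""
    return {gene for gene, annotated in gene_domains.items() if annotated & domains}


# Toy annotation tables with placeholder identifiers.
known_targets = {"TARGET_1": {"DOM_KINASE"}, "TARGET_2": {"DOM_GPCR"}}
genome = {
    "GENE_A": {"DOM_KINASE", "DOM_SH2"},
    "GENE_B": {"DOM_GPCR"},
    "GENE_C": {"DOM_SH2"},
}

domains = druggable_domains(known_targets)
predicted = predicted_druggable_genes(genome, domains)
print(sorted(predicted))
```

Matching on shared domain annotation alone is what makes such estimates sensitive to the choice of domain classification, since any false-positive domain assignment propagates to the gene count.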
14.2.4.3
Orth et al. Update 2004
Orth et al. [26] based an estimate of the druggable gene families on the InterPro domain assignments in the annotated gene-encoding loci of the 2004 release of the CCDS. The authors estimate that 3080 nonredundant gene-encoding loci in the human genome are predicted to belong to the druggable genome, with over 2950 druggable gene sequences in public databases.
14.2.4.4
Russ and Lampel’s Update 2005
Russ and Lampel [27] estimated the size of the druggable genome based on the preliminary final assembly (Ensembl release 35) of the human genome, in which 99% of the sequence has high-quality coverage. The authors found that PFAM protein domain annotation predicted fewer false positives than the InterPro classification used by Hopkins and Groom [2], estimating 3100 druggable genes from the previously defined set of druggable protein domains, approximately 2900 of which were predicted by both approaches. Of the 3100 predicted genes, 2600 are covered by the consensus CCDS annotation of the major genome databases. Extrapolation from the manual VEGA genome annotation databases (about 40% of the total genome) leads the authors to a conservative estimate of around 2500 druggable genes. The authors consider these assessments from the highly confident gene prediction databases to be a conservative lower estimate of the size of the druggable genome.
14.2.4.5
Homology-based Analysis of Drug Targets
To expand the homology analysis methodology for identifying which targets expressed from the human genome are likely to be druggable, it is necessary to expand our survey to identify all the known biological targets of drugs and lead
compounds. Inpharmatica commissioned the construction of two databases, DrugStore™ and StARLITe™, to accurately ascertain the number of biological targets modulated by drugs and preclinical medicinal chemistry compounds, respectively. Inpharmatica’s DrugStore is a relational database relating all FDA-approved drugs to their molecular targets and approved indications. From this analysis, we have identified 26 000 drug products, which reduce to 1783 unique new molecular entities (NMEs), of which 1415 are small-molecule chemical entities, 180 are biological therapeutics (18 of which are antibodies), and the remainder are vitamins and supplements. As drug discovery has become more target centric in its research modus operandi over the past two decades, a key point of debate has been how many modes of action are acted upon by approved drugs. The first attempt to ascertain this number was by Drews, who estimated that known drugs acted on 483 targets, the source of the often quoted “500 targets” figure. Hopkins and Groom’s analysis challenged this figure and suggested that, irrespective of polypharmacology and off-target effects, rule-of-five compliant (orally administered) approved drugs acted primarily on only 120 modes of action. A subsequent analysis by Burgess and Golden proposed that all approved NMEs, consisting of new chemical entities (NCEs) and new biological entities (NBEs), targeted 272 proteins [28-31]. Here we propose, from the analysis of the DrugStore™ database, that all NMEs primarily act on 301 drug targets, of which 238 are human proteins and only 170 are human proteins targeted by small-molecule drugs (Table 14.2-1, Fig. 14.2-2). Biological drugs target 59 modes of action, with the currently marketed antibody therapeutics acting on 15 human targets. Only nine targets are currently found to be modulated by both small-molecule and biological drugs. The remaining targets are predominantly anti-infective drug targets.
The drug target universe expands considerably if we expand our analysis to include biological targets for which medicinal chemists have developed small-molecule leads. Unlike the bioinformatics community, which has developed a wealth of public databases to assemble and disseminate protein and genomic sequences, medicinal chemistry structure-activity relationship (SAR) data is
Table 14.2-1 Molecular targets of approved drugs

| Class of drug target | Species | Number of molecular targets |
|---|---|---|
| Targets of approved NMEs | All (anti-infectives and human) | 301 |
| Targets of approved NMEs | Human only | 238 |
| Targets of approved NCEs | Human | 170 |
| Targets of approved antibodies | Human | 15 |
| Targets of approved biologicals | All (anti-infectives and human) | 59 |
Fig. 14.2-2 Molecular targets of currently FDA-approved drugs (a) by number of drug substances and (b) by number of drug targets in gene family. Figures are derived from analysis of 1606 active ingredients (25 024 approved products), Orange Book, Sept 2002.
not publicly available in a systematic database and is spread between company in-house data warehouses, peer-reviewed journal articles, and patents, often in formats not easily accessible to machine processing. To survey the universe of drug targets with known leads, Inpharmatica have created the StARLITe™ database of bioactive compounds by extracting structures, assays, targets, and SAR from the key medicinal chemistry journals (i.e., J. Med. Chem. 1980-2004, Bioorg. Med. Chem. Lett. 1990-2004), covering 350 000 compounds and 1 275 000 assay points. This comprehensive survey of the medicinal chemistry literature identifies 1155 known targets with at least one drug or lead compound with a binding affinity below 10 µM, 707 of which are human molecular targets (Table 14.2-2, Fig. 14.2-3). Applying Lipinski’s criteria to the compounds in the dataset (represented as desalted, canonical SMILES strings) reveals 587 human proteins, each unambiguously identified and assigned to a protein sequence, with at least one compound that complies with the “rule-of-five” and has a binding affinity more potent than 10 µM (Fig. 14.2-4). The extremely thorough analysis of the literature, represented in the StARLITe™
Table 14.2-2 Molecular targets with chemical leads and tools. Identified from the medicinal chemistry literature in Inpharmatica's StARLITe™ database and unambiguously assigned to a molecular target via a protein sequence
Columns:
(1) Redundant ortholog targets (all species) <10 µM
(2) Ro5 redundant ortholog targets (all species) <10 µM
(3) Redundant ortholog mammalian targets <10 µM
(4) Ro5 redundant ortholog mammalian targets <10 µM
(5) Nonredundant human targets <10 µM
(6) Ro5 nonredundant human targets <10 µM
Gene family                      (1)   (2)   (3)   (4)   (5)   (6)
Aminergic GPCRs                   71    71    61    61    34    34
Aspartyl proteases                10     4     9     4     7     3
Cysteine proteases                20    18    19    17    16    14
Enzymes - others                 149   117   131   104   102    81
GPCRs class A - others            59    47    49    38    35    30
GPCRs class B                     12     7    10     5     5     2
GPCRs class C                     20    20    19    19    10    10
Hydrolases                        54    44    46    37    34    28
Ion channels - ligand gated       52    42    47    37    26    20
Ion channels - others             20    18    18    16    14    12
Kinases - others                  11     8    11     8     7     6
Metalloproteases                  60    56    53    50    41    39
Nuclear hormone receptors         45    33    33    26    22    19
Others                           188   144   146   109   108    79
Oxidoreductases                   67    63    62    58    39    37
PDEs                              15    13    15    13    11    11
Peptide GPCRs                     99    72    80    59    52    42
Protein kinases                  101    90    87    78    75    66
Serine proteases                  34    30    34    30    27    24
Transferases                      68    46    57    39    42    30
Total                           1155   943   987   808   707   587
Fig. 14.2-3 Gene family distribution of nonredundant human proteins with small-molecule chemical leads with binding affinities <10 µM. Data derived from an analysis of Inpharmatica's StARLITe™ database.
database, more than doubles the number of identified proteins with existing lead matter. Using this larger database of drug targets, which show some precedent of modulation by small-molecule leads or drugs, we attempted to estimate the size of the potential druggable genome based on homology to known drug targets. The underlying assumption in this analysis is that if one gene family member has shown the propensity to selectively bind small-molecule modulators, other members of the gene family may share physicochemical and architectural properties that are also likely to bind druglike small molecules. Proteins that have a similar sequence are generally likely to share very similar three-dimensional properties and perform similar or related functions. If a protein therefore has a high degree of sequence similarity to the target of a drug (or other protein that is known to be
14.2 The Molecular Basis of Predicting Druggability
Fig. 14.2-4 Proportion of targets with leads observed with at least one rule-of-five compliant compound within each gene family.
druggable) we predict that the protein is likely to be druggable too, if we believe the binding site architecture to be conserved. Where proteins are less closely related in sequence, it is more difficult to infer druggability. Relatively small differences in the binding site of a protein could have a large impact on its ability to bind small molecules. The authors recognize that this is a simplistic assumption and is likely, if anything, to overestimate the number of potential members of the predicted druggable subset of the human genome. For example, individual members of a gene family may bind distinct ligands, and the molecular recognition properties of their respective binding sites could be significantly divergent. Using the BLAST sequence alignment algorithm to search each of the sequences against the human genome, we identified 945 distinct genes that show homology to the molecular targets of approved drugs at a cut-off of 30% sequence identity and an E value less than or equal to lo-’. Expanding the BLAST analysis to
include human proteins with known druglike leads from the StARLITe™ database identified 2921 protein sequences within the same sequence identity cut-offs. In addition to the sequence homology approach, we also addressed the problem of identifying the druggable subset of the human proteome using a feature-based Bayesian method.
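The homology expansion described above can be pictured as a filter over BLAST hits. The following is a minimal sketch, not the authors' pipeline. It assumes the hits are available in BLAST's standard 12-column tabular output (query, subject, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, E value, bit score). The gene identifiers are invented, and because the E-value exponent is not legible in the text above, the `max_evalue` default is a hypothetical placeholder.

```python
from typing import Iterable, Set


def homologous_genes(
    blast_lines: Iterable[str],
    min_identity: float = 30.0,  # 30% sequence identity, as stated in the text
    max_evalue: float = 1e-5,    # hypothetical placeholder; the source exponent is illegible
) -> Set[str]:
    """Return the subject IDs of all hits that pass both cut-offs."""
    hits: Set[str] = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        subject = fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        if identity >= min_identity and evalue <= max_evalue:
            hits.add(subject)
    return hits


# Toy rows with invented identifiers (tab-separated, 12 columns each).
example = [
    "drugTargetA\tgeneX\t45.2\t300\t150\t4\t1\t300\t5\t310\t1e-40\t180",
    "drugTargetA\tgeneY\t28.0\t250\t170\t6\t1\t250\t3\t260\t1e-12\t70",
    "drugTargetB\tgeneZ\t33.1\t120\t75\t2\t10\t129\t8\t127\t0.5\t30",
    "drugTargetB\tgeneX\t61.0\t400\t150\t1\t1\t400\t1\t400\t1e-90\t350",
]
print(sorted(homologous_genes(example)))  # prints ['geneX']
```

Collecting subject identifiers in a set means that a gene hit by several known targets is counted once, which is how a figure such as "945 distinct genes" would be obtained from a much larger number of raw hits.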
14.2.4.6 Feature-based Druggability Prediction
Drug targets, be they targets of low molecular weight drugs or protein therapeutics, may share common sequence-based features that are not necessarily detectable by overall sequence similarity. An alternative to sequence similarity methods is to look for sequence-based features that are enriched in drug targets compared with the rest of the genome. A large set of over 100 protein properties and features was calculated for each sequence in the Drugstore database, such as the number of transmembrane helices, signal peptides, isoelectric point, length distribution, percentage of helical structure, antigenicity, net charge at pH 7.4, domain complexity, subcellular localization, and so on. Features that were enriched in existing drug targets were retained and used to construct probabilistic Bayesian models for both small-molecule druggability prediction and protein therapeutic druggability prediction. This Bayesian probabilistic scoring allows any portfolio of targets to be ranked by predicted druggability. The major advantage of this approach is that it requires no prior knowledge about the examined protein and no homology to precedented target families. The Bayesian models also hold the advantage of being tunable to reflect specific gene families or drug profiles. The probabilistic models were then used to rank all sequences from the human genome according to both small-molecule and protein druggability, as predicted by the presence of druggable features in the protein sequence. The small-molecule model predicts 2325 gene products to be druggable with a high confidence level (i.e., achieving scores comparable with those of existing targets).
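The text does not specify the form of the Bayesian models, so the following is only an illustrative sketch of one common choice: a naive Bayes log-odds score over binary sequence features, with add-one smoothing. The feature names, the training sets, and the candidate proteins are all invented for the example.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

# For each feature: (log-odds if present, log-odds if absent).
Weights = Dict[str, Tuple[float, float]]


def train_log_odds(
    targets: List[Set[str]],
    background: List[Set[str]],
    features: Iterable[str],
    pseudocount: float = 1.0,
) -> Weights:
    """Estimate per-feature log-likelihood ratios (targets vs. background)."""
    weights: Weights = {}
    for feature in features:
        # Smoothed frequency of the feature in each set; never exactly 0 or 1.
        p_target = (sum(feature in s for s in targets) + pseudocount) / (
            len(targets) + 2 * pseudocount
        )
        p_background = (sum(feature in s for s in background) + pseudocount) / (
            len(background) + 2 * pseudocount
        )
        weights[feature] = (
            math.log(p_target / p_background),
            math.log((1 - p_target) / (1 - p_background)),
        )
    return weights


def druggability_score(protein: Set[str], weights: Weights) -> float:
    """Sum the log-odds contributions, treating features as independent."""
    return sum(
        present if feature in protein else absent
        for feature, (present, absent) in weights.items()
    )


# Invented toy data: each protein is the set of features it exhibits.
features = ["tm_helix", "signal_peptide", "low_complexity"]
known_targets = [{"tm_helix"}, {"tm_helix", "signal_peptide"}, {"tm_helix"}, {"signal_peptide"}]
background = [set(), {"low_complexity"}, {"tm_helix"}, {"low_complexity"}, set(), set()]

weights = train_log_odds(known_targets, background, features)
candidates = {
    "candidate_1": {"tm_helix", "signal_peptide"},
    "candidate_2": {"low_complexity"},
    "candidate_3": {"tm_helix"},
}
ranked = sorted(
    candidates,
    key=lambda name: druggability_score(candidates[name], weights),
    reverse=True,
)
print(ranked)
```

A positive score means the protein looks more like the known targets than like the background. Ranking by this score needs no homology to any known target, which is the advantage the text attributes to the feature-based approach.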
14.2.4.7 Structure-based Druggability Analysis of PDB Structures
Following the hypothesis that druggable-binding sites can be predicted a priori, we have developed an algorithm to analyze the Protein Data Bank (PDB) for druggable-binding sites. Actual and putative ligand-binding sites were respectively identified either by virtue of the presence of a ligand in the crystal structure or by analysis of the surface of the protein structure. A range of physicochemical properties of the identified binding sites and cavities were calculated from the protein structures including volume, depth, curvature, accessibility, hydrophobic surface area, and polar surface area. The
algorithm was trained against a set of 400 protein complexes binding small-molecule, rule-of-five compliant ligands. From this analysis, a decision tree was derived to predict the druggability of a binding site or cavity from calculated physicochemical properties. The decision tree predicts whether a cavity is druggable within the statistical confidence levels of the tree. This method has demonstrated a 91% success rate when predicting druggability on the protein drug targets (of oral drugs as defined in Inpharmatica's Drugstore database of approved drugs). The method requires either an experimentally derived structure or a high-quality homology model. Ideally, because of the inherent flexibility of many protein-ligand binding sites, a sample of multiple conformations is preferred. The method scales to the entire PDB (December 2004 release). After removal of short peptides, 27 409 files were suitable for analysis; these were further classified into 76 322 structural domains using SCOP [32] and DISCO base, of which 28% (21 522) were found to have at least one site predicted, to some degree, to be druggable. Because of the high redundancy in the PDB and the high number of ligand-protein complexes, the results were reduced to a nonredundant set of human targets: 427 proteins were predicted to contain a druggable binding site, with 281 of these proteins having no prior known compounds or drugs developed against them. Structure-based druggability algorithms could be automatically applied to continuously assess the stream of novel structures determined by the structural genomic initiatives.
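To make the idea of a decision tree over pocket descriptors concrete, here is a minimal sketch of how such a tree is evaluated. It is not the tree derived by the authors: the descriptor names, the thresholds, and the two example pockets are hypothetical and chosen only to show the mechanics.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    druggable: bool  # prediction returned when this leaf is reached


@dataclass
class Node:
    feature: str        # name of the pocket descriptor tested at this node
    threshold: float    # hypothetical cut-off for that descriptor
    below: "Tree"       # branch taken when value < threshold
    at_or_above: "Tree"  # branch taken when value >= threshold


Tree = Union[Leaf, Node]


def predict(tree: Tree, pocket: Dict[str, float]) -> bool:
    """Walk from the root to a leaf and return its prediction."""
    while isinstance(tree, Node):
        if pocket[tree.feature] < tree.threshold:
            tree = tree.below
        else:
            tree = tree.at_or_above
    return tree.druggable


# Hypothetical tree: all thresholds are invented for illustration.
TOY_TREE: Tree = Node(
    "volume_A3", 300.0,
    Leaf(False),  # pocket too small
    Node(
        "hydrophobic_fraction", 0.5,
        Leaf(False),  # surface too polar
        Node("depth_A", 6.0, Leaf(False), Leaf(True)),
    ),
)

deep_hydrophobic = {"volume_A3": 550.0, "hydrophobic_fraction": 0.7, "depth_A": 9.0}
shallow_polar = {"volume_A3": 450.0, "hydrophobic_fraction": 0.3, "depth_A": 4.0}

print(predict(TOY_TREE, deep_hydrophobic))  # prints True
print(predict(TOY_TREE, shallow_polar))     # prints False
```

Because each prediction is a short walk through a handful of comparisons, a tree of this kind can be applied to every cavity of every structure in a large archive, which is consistent with the scalability noted in the text.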
Combining a nonredundant set of genes from all the methods outlined earlier (current targets of approved drugs; current targets of chemical leads or chemical tools; sequence homology to current drug targets; sequence homology to current chemical lead targets; feature-based sequence probability prediction; structure-based prediction; and sequence homology to structure-based predictions), we can identify a total of 3505 unique genes that are predicted, with first- and second-order evidence and with a high confidence level, to encode small-molecule druggable proteins, of which only 170 are the primary human targets for marketed drugs (Table 14.2-3). The results of this combined analysis concur with the previous estimate by Hopkins and Groom [2] and show that approximately 14% of the human genome could be inferred to be potentially druggable.
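The combination step is, in essence, a set union that also records which lines of evidence support each gene. The sketch below illustrates this with invented gene identifiers and method labels; it is an assumption about how such a merge could be organized, not a description of the authors' implementation.

```python
from typing import Dict, Set


def combine_evidence(evidence: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each gene to the set of methods that predict it to be druggable."""
    support: Dict[str, Set[str]] = {}
    for method, genes in evidence.items():
        for gene in genes:
            support.setdefault(gene, set()).add(method)
    return support


# Invented toy data: method label -> genes flagged by that method.
evidence = {
    "approved_drug_target": {"GENE_A"},
    "chemical_lead_target": {"GENE_A", "GENE_B"},
    "homology_to_drug_target": {"GENE_B", "GENE_C"},
    "feature_based": {"GENE_C", "GENE_D"},
    "structure_based": {"GENE_E"},
}

support = combine_evidence(evidence)
unique_genes = len(support)                              # 5 unique genes
summed_counts = sum(len(g) for g in evidence.values())   # 8 when overlaps are ignored
print(unique_genes, summed_counts)  # prints 5 8
```

The gap between the two numbers shows why the per-method counts in Table 14.2-3 cannot simply be added: genes supported by several methods must be counted once to arrive at a figure for unique genes.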
14.2.5 How Many Drug Targets are Accessible to Protein Therapeutics?
If, in our explorations, the proportion of the protein targets expressed by the human genome that is accessible to modulation by high-affinity, druglike small
Table 14.2-3 Predictions of the size of the human druggable genome
Number of molecular targets   Druggability prediction method
 170                          Targets of approved NCEs
 945                          Sequence homology to NCE drug targets
 707                          Targets of chemical leads with activities (binding affinities) below 10 µM
 587                          Targets of Ro5 chemical leads with activities (binding affinities ≤10 µM)
2921                          Sequence homology to targets with chemical leads
2325                          Feature-based druggability sequence probability prediction
 427                          Structure-based prediction
3541                          Sequence homology to proteins predicted druggable by structure-based method (high confidence level)
6619                          Sequence homology to proteins predicted druggable by structure-based method (low confidence level)
3505                          Predicted druggable genome (high confidence level) (a)
(a) Unique druggable targets from combining drug targets, targets with leads, homology to drug/lead targets, and structure-based prediction.
molecules is limited, how much larger is the universe of drug targets if we expand our investigations to include targets of protein therapeutics such as antibodies and recombinant biologicals? At the time of writing, approved antibody therapeutics were known to act on 15 human targets, whilst in total all biological drugs in the pharmacopeia currently work via 59 modes of action. Because of the inherently lower toxicity observed for fully humanized antibodies and the rising rate of biological approvals, it has been argued that antibody approvals may soon overtake NCE approvals [33]. Interestingly, it has also been observed by studying rates of attrition that antibodies acting against novel modes of action often show a higher chance of success in phase II clinical studies than small-molecule drugs acting on mechanisms of precedence [34-36]. Thus, we attempted to estimate how many targets are accessible to biological drugs as the targets of antibody therapies. Other criteria, such as antigenicity, are also important in developing inhibitory antibodies. However, these have not been considered in this analysis, as they are not common to both antibody and other protein drugs. To estimate the number of genes expressing products that could be accessible to antibody therapeutics, we assume that proteins are required to be located in the extracellular matrix. We also assume that the extracellular location is the union of the secreted and transmembrane sets of proteins. Where the extracellular location is known, this is often included in the Swiss-Prot and Gene Ontology (GO) [37] database annotation for the protein. Secreted proteins can be predicted by the presence of a signal peptide, whilst transmembrane
domains can be identified by sequence property prediction. Analysis reveals 1384 genes predicted to encode secreted proteins with a high confidence level (i.e., predicted by multiple different methods). If the confidence level is lowered (i.e., signal peptide predicted by a single method), 6560 genes are predicted to be secreted. Our transmembrane analysis reveals that 973 genes are predicted by multiple methods to have transmembrane domains and be located at the plasma membrane; this number increases by 1407 genes, which may be plasma membrane proteins, when prediction by only a single method is accepted. Combining these results, we identified that the total number of extracellular proteins predicted with high confidence is expressed by 2287 genes. The study was extended to identify proteins that have features similar to the current set of biological drug targets using the Bayesian probabilistic feature-based algorithm discussed above. Trained on the existing set of biological drug targets, the model predicted 1637 gene products to be druggable via biological therapeutics with high confidence levels (i.e., achieving scores comparable with those of existing protein targets). Therefore, the total number of genes predicted to encode protein therapeutic druggable proteins is 3258, equivalent to 13% of the genes in the human genome (Table 14.2-4).
14.2.6 Conclusion
Table 14.2-4 Predictions of the number of genes in the human genome accessible to protein therapeutics (recombinant soluble proteins and antibodies)
Number of molecular targets   Druggability prediction method
  15                          Targets of approved antibodies
  59                          Targets of approved biologicals
1384                          Secreted proteins (high confidence level)
6560                          Secreted proteins (low confidence level)
 973                          Transmembrane predictions (high confidence level)
1407                          Transmembrane predictions (low confidence level)
2287                          Unique, combined transmembrane, and secreted predictions (high confidence level)
1637                          Feature-based biological target sequence probability prediction
3258                          Total unique genes predicted to be accessible via biological therapeutics
Fig. 14.2-5 Gene family distributions: (a) small-molecule druggable genome; (b) protein therapeutics.
From a comprehensive survey of the medicinal chemistry literature and by combining a variety of methodologies (sequence homology, structure-based, and feature-based), we have identified that approximately 3500 genes in the human genome are predicted to be accessible to modulation by high-affinity, druglike small molecules: approximately 14% of the human genome. Of the approximately 3500 human druggable genes, small-molecule chemical tools or leads (with binding affinities equal to or more potent than 10 µM) have already been identified that act on 707 of these, and 170 are the primary targets
for approved, small-molecule drugs. While there may be many more proteins expressed by the human genome that may be discovered to be modulated by small-molecule tools or drugs, the proteins identified as belonging to the subset known as the druggable genome represent those targets for which we can predict, with higher confidence than for the remaining genes in the genome, that a small-molecule chemical tool can be discovered. Since it was first proposed that the various physicochemical constraints on druglike chemicals would reduce the available target space, it has been suggested that accessible drug target space may expand considerably with the application of biologic drugs such as fully humanized antibodies. Protein therapies approved to date act via about 59 human targets; 18 of these are targeted by marketed antibodies. With the commercialization of recombinant protein production, the number of biological drugs receiving approval and being studied in the clinic is steadily rising. Several commentators predict that the rise of antibody therapies may challenge the premier position of small-molecule chemical entities as the dominant technology of medicines [33]. Our analysis puts the proportion of the genome potentially accessible to modulation by protein therapeutics such as antibodies at around 13%, with 3258 genes predicted to encode proteins druggable via protein therapeutics. Interestingly, 70% of all the drug targets are also predicted to be accessible to modulation by antibody therapy. Indeed, if we expand the analysis to compare the overlap between the antibody-accessible druggable genome and the small-molecule druggable genome, 1516 genes are predicted to encode proteins druggable by both small molecules and protein therapeutics, which is approximately 45% of our current estimate of the small-molecule druggable genome (Figs. 14.2-5 and 14.2-6).
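The proportions quoted in this section can be cross-checked with simple inclusion-exclusion arithmetic. The sketch below uses only counts stated in the text; the derived quantities (the size of the combined druggable universe, the implied gene count, and the implied overlap between secreted and transmembrane predictions) are not stated by the authors and follow only under the assumption that the quoted totals are exact unions of the quoted sets.

```python
def union_size(a: int, b: int, both: int) -> int:
    """Size of the union of two sets, given their sizes and their overlap."""
    return a + b - both


def implied_overlap(a: int, b: int, union: int) -> int:
    """Overlap implied by two set sizes and the size of their union."""
    return a + b - union


small_molecule = 3505       # predicted small-molecule druggable genes
protein_therapeutic = 3258  # predicted protein-therapeutic druggable genes
both = 1516                 # genes predicted to be druggable by both

# Derived values (assumptions, not figures from the text).
either = union_size(small_molecule, protein_therapeutic, both)
overlap_share = both / small_molecule
genome_from_small_molecule = small_molecule / 0.14
genome_from_protein = protein_therapeutic / 0.13
extracellular_overlap = implied_overlap(1384, 973, 2287)

print(either)                             # prints 5247
print(round(overlap_share * 100))         # prints 43
print(round(genome_from_small_molecule))  # prints 25036
print(round(genome_from_protein))         # prints 25062
print(extracellular_overlap)              # prints 70
```

With these figures the overlap comes to about 43% of the small-molecule druggable genome, close to the approximately 45% quoted, and both percentages are consistent with a total of roughly 25 000 human genes.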
Fig. 14.2-6 Overlap of antibody and small-molecule druggable universes.
Acknowledgments
We would like to thank Colin Groom (UCB Celltech, Cambridge, UK) for his long-standing contribution to this work. We also sincerely thank Edith Chan (Inpharmatica, London), Robin Spencer (Pfizer, Groton), Lee Beeley (Pharmamatters, Ramsgate), and Jonathan Mason (Pfizer, Sandwich) for their helpful discussions in the development of this work.
References
1. A.L. Hopkins, C.R. Groom, Target analysis: a priori assessment of druggability, Ernst Schering Research Foundation Workshop, Berlin, 2003, 42.
2. A.L. Hopkins, C.R. Groom, The druggable genome, Nat. Rev. Drug Discov. 2002, 1, 727-730.
3. J. Overington, Prioritizing the proteome: identifying pharmaceutically relevant targets, Drug Discov. Today 2002, 7, 516-521.
4. C.A. Lipinski, F. Lombardo, B.W. Dominy, P.J. Feeney, Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, Adv. Drug Deliv. Rev. 1997, 23, 3-25.
5. A. Ajay, W.P. Walters, M.A. Murcko, Can we learn to distinguish between "drug-like" and "nondrug-like" molecules? J. Med. Chem. 1998, 41, 3314-3324.
6. J. Wang, K. Ramnarayan, Towards designing drug-like libraries: a novel computational approach for prediction of drug feasibility of compounds, J. Comb. Chem. 1999, 1, 524-533.
7. W.P. Walters, A. Ajay, M.A. Murcko, Recognizing molecules with drug-like properties, Curr. Opin. Chem. Biol. 1999, 3, 384-387.
8. C.A. Lipinski, Drug-like properties and the causes of poor solubility and poor permeability, J. Pharmacol. Toxicol. Methods 2000, 44, 3-25.
9. B.L. Podlogar, I. Muegge, L.J. Brice, Computational methods to estimate drug development parameters, Curr. Opin. Drug Discov. Devel. 2001, 4, 102-109.
10. I. Muegge, S.L. Heald, D. Brittelli, Simple selection criteria for drug-like chemical matter, J. Med. Chem. 2001, 44, 1841-1846.
11. D.F. Veber, S.R. Johnson, H.Y. Cheng, B.R. Smith, K.W. Ward, K.D. Kopple, Molecular properties that influence the oral bioavailability of drug candidates, J. Med. Chem. 2002, 45, 2615-2623.
12. J.R. Proudfoot, Drugs, leads, and drug-likeness: an analysis of some recently launched drugs, Bioorg. Med. Chem. Lett. 2002, 12, 1647-1650.
13. W.P. Walters, M.A. Murcko, Prediction of 'drug-likeness', Adv. Drug Deliv. Rev. 2002, 54, 255-271.
14. W.J. Egan, W.P. Walters, M.A. Murcko, Guiding molecules towards drug-likeness, Curr. Opin. Drug Discov. Devel. 2002, 5, 540-549.
15. I. Muegge, Selection criteria for drug-like compounds, Med. Res. Rev. 2003, 23, 302-321.
16. M.S. Lajiness, M. Vieth, J. Erickson, Molecular properties that influence oral drug-like behavior, Curr. Opin. Drug Discov. Devel. 2004, 7, 470-477.
17. M. Vieth, M.G. Siegel, R.E. Higgs, I.A. Watson, D.H. Robertson, K.A. Savin, P.A. Durst Hipskind, et al., Characteristic physical properties and structural fragments of marketed oral drugs, J. Med. Chem. 2004, 47, 224-232.
18. I.D. Kuntz, K. Chen, K.A. Sharp, P.A. Kollman, The maximal affinity of ligands, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 9997-10002.
19. P. Ertl, B. Rohde, P. Selzer, Fast calculation of molecular polar surface area as a sum of fragment based contributions and its application to the prediction of drug transport properties, J. Med. Chem. 2000, 43, 3714-3717.
20. P.J. Hajduk, J.R. Huth, S.W. Fesik, Druggability indices for protein targets derived from NMR-based screening data, J. Med. Chem. 2005, 48, 2518-2525.
21. J. Drews, S. Ryser, Classic drug targets, Nat. Biotechnol. 1997, 15, 1318-1319.
22. J. Drews, Genomic sciences and the medicine of tomorrow, Nat. Biotechnol. 1996, 14, 1516-1518.
23. J. Drews, Drug discovery: a historical perspective, Science 2000, 287, 1960-1964.
24. E. Lander, Initial sequencing and analysis of the human genome, Nature 2001, 409, 860-921.
25. J. Venter, The sequence of the human genome, Science 2001, 1304-1351.
26. A.P. Orth, S. Batalov, M. Perrone, S.K. Chanda, The promise of genomics to identify novel therapeutic targets, Expert Opin. Ther. Targets 2004, 8, 587-596.
27. A.P. Russ, S. Lampel, The druggable genome, Drug Discov. Today 2005, 10(23-24), 1577-9.
28. K. Davies, Cracking the 'Druggable Genome', Bio-IT World, 2002, http://www.bio-itworld.com/archive/100902/firstbase.html.
29. C. Burgess, I. Golden, IBC Drug Discovery and Technology Conference, Curagen Corp., Boston, 2002.
30. J.B. Golden, Prioritizing the human genome: knowledge management for drug discovery, Curr. Opin. Drug Discov. Devel. 2003, 6, 310-316.
31. J. Golden, Towards a tractable genome: knowledge management in drug discovery, Curr. Drug Discov. 2003, 17-20.
32. A.C. Murzin, S.E. Brenner, T. Hubbard, C. Chothia, SCOP: a structural classification of proteins database for the investigation of sequences and structures, J. Mol. Biol. 1995, 274, 536-540.
33. S. Arlington, S. Barnett, S. Hughes, J. Palo, Pharma 2010: The Threshold of Innovation, IBM Business Consulting Services, London, 2002.
34. A.K. Pavlou, J.M. Reichert, Recombinant protein therapeutics: success rates, market trends and values to 2010, Nat. Biotechnol. 2004, 22, 1513-1519.
35. J.M. Reichert, Protein therapeutic success rates increase with biotech advances, Tufts Center for the Study of Drug Development Impact Report 2005, 7.
36. Windhoven, Know thy R&D enemy: the key to fighting attrition, In Vivo 2005.
37. G.O. Consortium, Creating the gene ontology resource: design and implementation, Genome Res. 2001, 11, 1425-1433.
Chemical Biology. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
15 Target Families
15.1 The Target Family Approach
Hans Peter Nestler
Outlook
Chemical Biology strives to combine structural information about biological and chemical molecules to design and discover novel molecular entities to modulate biological processes. An integral concept is the clustering of proteins into target families based on their structural and functional similarities. In this chapter, we review the foundations of target families and highlight the application of this knowledge for the efficient use of synthesis and screening technologies to develop novel pharmaceutical agents.
15.1.1 Introduction
Chemical Biology. From Small Molecules to System Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
The sequencing of the human genome [1] marked the apex of the transformation of biology from an observational and descriptive activity to a hypothesis-driven science. With the information about the building blocks for cells, it is now possible to modulate and investigate the phenomenology of organisms at a molecular level. In parallel, drug discovery underwent a tremendous change from an empirical process, driven by the experience of medicinal chemists who translated pharmacological effects into changes in molecules, to a knowledge-driven operation based on biochemistry, high-throughput synthesis and screening, and structure-driven drug design. Yet, in spite of this evolution, the productivity of the pharmaceutical industry has plummeted, and 2004 saw the lowest number of new drugs in history coming to the market. Soon after the sequences became available, discussions arose about how many of the approximately 27 000 genes that had been assigned [1] would be “druggable”, that
is, their associated protein products could be modulated with small molecules in a directed fashion to achieve a desired therapeutic effect [2]. Considering that most novel therapies would rely on oral administration of drugs, these molecules have to fulfill the requirements for suitable pharmacokinetic behavior. The most quoted and commonly used guidelines are Lipinski’s “rule-of-five” [3] and Veber’s “rotatable bonds” [4], which are based on a statistical analysis of marketed oral drugs. Taking such boundaries into account, it has been estimated that about 10-15% of the human genome would be “druggable” [5]. While this number may seem low, it should be considered that only one-third of these mechanisms are consciously addressed and a significant fraction of drugs, even those in development, still act through undefined molecular pathways. Furthermore, the hype around the sequencing of the genome and its assumed impact on drug discovery has meanwhile vanished, as it was recognized that biological networks are too complex and redundant to allow control through one molecular dial. “Systems biology” tries to address this challenge by exploring the interactions of proteins and the resulting pathways of transferring biological signals and actions. “Chemical biology” is the matching complement in drug discovery that tries to capitalize on structural relationships of proteins to efficiently address the druggable genome (Fig. 15.1-1) [6]. Similarities among protein structures have been investigated for a long time, covering all levels from primary (sequence) via secondary (domain folds) to tertiary (overall three-dimensional) structures. Investigations of tertiary structures help to predict functional sites and roles for novel proteins and to understand enzyme mechanisms on a molecular level.
In particular, bioinformatic analysis has made it possible to identify homologous reaction mechanisms even within proteins with lower sequence similarities and various biochemical activities [7, 8], as highlighted by the cases of leukotriene A4 hydrolase and angiotensin converting enzyme (a zinc metalloprotease), which are both inhibited by bestatin but have distinct biological roles [9]. Primary structure investigations have been of preferential interest for evolutionary analyses. Yet, these phylogenetic analyses have been crucial in defining target families: groups of proteins of pharmaceutical interest having a similar gene and therefore protein sequence. Kinases are the prototype of a target family, as their active sites are structurally highly homogeneous and bind the same cosubstrate, adenosine triphosphate (ATP) (Fig. 15.1-1). Other gene families include G-protein-coupled receptors (GPCRs), ion channels and transporters, and proteases, although the structural diversity among these families is higher and they therefore group into structurally and mechanistically diverse subfamilies, such as cysteine or metalloproteases. “Chemical biology”, as we term these target family oriented concepts, has reshaped all the stages of drug discovery, and today it is a widely used discovery paradigm in the pharmaceutical industry. The focus as well as the impact of using target family knowledge has definitely been on the early stages, from target identification via structural understanding through lead finding efforts. The later stage of the drug discovery process, the optimization of
Fig. 15.1-1 Distribution of genes in representative target families, drug candidates by target families, and drugs by target families (data sources: [1] and Pharmaprojects). While a significant fraction of molecular targets is still unknown, GPCRs have been identified as the most prominent target of drugs on the market, with a representation significantly higher than in the human genome. Analysis furthermore shows the emergence of kinases in drug discovery, with a significant percentage of drugs in clinical trials, while proteases and ion channels are represented according to their occurrence in the human genome.
lead compounds into drug candidates, is not as amenable to technological solutions that can be provided through target family concepts, as the challenges become very specific for each lead series. Still, transferring insights and understanding compound interactions with targets and other proteins helps to avoid entering dead-end alleys of modification. As the impact of the latter aspects of chemical biology is hard to track (mostly because no “what-would-have-happened-if” control data exist) and is best shown by anecdotal examples, we will focus on target family ideas that enable the early stages. We will demonstrate their application and applicability to representative target families in this chapter. We will pay particular attention to the core aspect of “chemical biology”, the matching of chemical and biological spaces [6]. For a drug molecule to exert its pharmaceutical action, it is crucial that the molecular shape complements the cast offered by the target protein. This fact was first recognized by Emil Fischer, who phrased it as a “Key-Lock”
principle [10], being unaware of the dynamic and flexible nature of protein structures, and today we understand the interactions of two molecules more in a “Hand-Glove” fashion with strong elements of induced fit [11]. We term the ensemble of available interaction shapes in the genome the “biological space”, while the “chemical space” is considered the ensemble of shapes offered by small molecules. With our structural understanding constantly evolving through molecular biology and crystallography, the efforts to rationally design matching chemical structures increased and led to successes in drug development, such as the HIV-protease inhibitors. Rational design depends on valid starting points and structure-activity relationships (SAR) and is quite powerful for the optimization and understanding of structural motifs that trigger activity and selectivity at the protein target level. Rational design suffers shortcomings when we attempt to address the challenge of finding novel starting structures for optimization. High-throughput screening efforts try to tackle this challenge by playing a high-number trial-and-error game. As the screening collections reflect the target history of the respective company, they often cover narrow aspects of chemical space. Combinatorial chemistry claimed to fill the chemotype gaps in the collections and to cover the chemical space with diverse structures. Despite the tremendous number of compounds produced at the peak of combinatorial chemistry, the libraries fell short of the promise, as they offered diversity around a point in space, thus densely populating this area but neglecting others completely. This effort can be imagined as putting a small rubber ball on the tip of a needle and trying to fill and represent a large lecture hall in this manner. Thus, compound libraries can be very powerful for exploring the match of chemical and biological spaces once an active compound has been identified.
Unfortunately, combinatorial chemistry was limited in its early years to a small repertoire of usable synthetic reactions, and therefore to a limited structural diversity that could be addressed, and it often started from a biologically naive structure, thereby populating unpromising areas of chemical space. We will revisit these aspects and the attempts to resolve these issues when we discuss the lead finding approaches later in this chapter.
15.1.2 Understanding Biological Space
As mentioned above, the key concept of “chemical biology” is the structural matching of chemical and biological spaces. Thus, the first important element must be the understanding of the biological space. Many questions have to be addressed: Which proteins cluster into families and are related to each other at a structural level? Which genes are expressed under which physiological setting, and how do their levels respond to insults on the system? Are the expressed proteins functionally active, on which ligands do they exert their actions, and is their functioning dependent on their subcellular distribution? With the sequencing of the human genome, the blueprint of human physiology became
accessible: all proteins can be enumerated at the gene level and classified on the basis of their sequence homology by bioinformatic tools [1]. This comfortable, straightforward picture is complicated by the fact that genes can be expressed in various forms, but the target family classifications hold up to a first approximation. In spite of the successes at the genomic and proteomic levels, the identification of novel protein targets for modulation does not proceed at the expected pace, as the proteins do not act as isolated entities but as complexes in an almost overcrowded environment. To exert their biological effect, the individual entities enter into dynamic physical interactions with each other, and our textbook knowledge about kinetics and thermodynamics does not necessarily stand up to the task because of the high concentration and high viscosity of the cytoplasmic space. Furthermore, the monitoring of gene expression and protein analysis does not reveal the complete picture about their respective binding partners. Today, we are still ignorant about many protein/protein and protein/ligand complexes, such as GPCR-agonist, ion-channel-modulator, or protease-substrate pairs, that associate and dissociate in a cell and are responsible for biological activity. Even in cases where we know the respective binding partners, we are a long way from understanding the structural basis and dynamics of these interactions. Structural biology methods such as crystallography and nuclear magnetic resonance (NMR) have taught us much about soluble proteins, such as kinases and proteases, but gaining structural insights about membrane-bound proteins, such as GPCRs and ion channels, remains difficult. To date, only one structure for a bacterial GPCR and three for ion channels have been reported [12-15]. We will discuss in this section the approaches to identify physiological and artificial ligands for proteins as well as to gain structural knowledge about their interactions.
15.1.2.1
Charting Biological Space - Structural Biology and Informatics
Chemical biology relies heavily on a structure-driven rationale to make lead finding efforts within target families more efficient and to anticipate cross-reactivities between target family members on the basis of structural similarities. After the sequencing of the human genome opened the way for comparing proteins with each other at a sequence level, attempts to correlate primary sequence to three-dimensional structure intensified, especially for membrane-bound proteins, to provide a counterpart to the structural biological information available for soluble proteins. Sequence comparisons within the protein families of GPCRs [16, 17], ion channels [18], and kinases [19] yielded phylogenetic trees with functional and structural clustering according to ligand types, especially for the GPCR family [20]. While phylogenetic analyses give some functional and structural hints, the resolution of these analyses does not allow prediction or assignment of ligands or substrates. For families of soluble protein targets, the situation for gaining structural knowledge is quite comfortable. A wealth of crystal structures in free
as well as inhibitor-bound forms is available for proteases and kinases, very
often for a variety of ligands to each protein. This information is used intensely for inhibitor optimization purposes but also allows structural comparisons at a target family level. These analyses are based on structural overlays of the protein structures within the target families, or within the subfamilies in the case of proteases, and on the affinity and repulsion of various small molecular probes, such as water or methanol, to the active site’s surface. The studies provide “target family landscapes” that show the relationships of the target family members at a structural level [21-23]. The landscapes provide the tools necessary to understand the cross-reactivities of inhibitors with closely related proteins or to assess the likelihood of success for transforming an inhibitor for a particular target into an inhibitor for another family target (Fig. 15.1-2). Furthermore, they allow selection of closely related proteins as structural surrogates for those family members for which crystallographic information is not available. This so-called homology modeling is of crucial importance for understanding the structural space covered by membrane-bound proteins, such as GPCRs or ion channels. Using the rhodopsin GPCR structure [12] as a template and target family homology, it has been possible to get topological information about the binding sites for many GPCRs to foster an understanding of the binding modes of ligands [25]. At a resolution of about 3.5 Å, which can usually be achieved, it is possible to understand differential binding of ligands to the receptors and to rationalize their activation, as demonstrated by Goddard et al. in a homologous series of ketones activating the olfactory receptor 912-93. Furthermore, the differences in activation between mouse and human orthologs could be assigned to a Ser105/Gly105 mutation [26]. This study also points to an instrumental aspect for the structural modeling of membrane proteins.
In addition to sequence homologies, ligand-binding strengths are used to refine the topologies and interactions. If combined with molecular dynamics refinement of the loops connecting the transmembrane helices, as demonstrated by the program PREDICT [27], the accuracy of the models becomes sufficient to perform virtual screening and to discriminate between ligands and their binding modes [28, 29]. In the ion-channel field, homology models can be based on three crystal structures of various potassium channels, two of which show the channel in the open state [13, 14] and one in the closed state [15]. Although ion channels are multimeric proteins and structurally more diverse than GPCRs, good models have become available using the three structures and ligand-activity information, as highlighted by the possibility of predicting hERG blocking activity of ligands [30, 31]. The hERG channel is of general pharmacological interest as an antitarget, because blocking this channel can induce fatal cardiac fibrillation. Thus, most biological data is available for this channel, and the homology-based models, even though they are built on the bacterial MthK channel [13], have meanwhile reached the same accuracy as models derived from SAR data [32] and can guide chemical optimization to achieve specificity of ligands. Beyond the prediction of ligand binding, homology models help the functional analysis of ion channels. In a recent example, the gating of the
Fig. 15.1-2 Assigning membership of a protein to a protein family and analyzing the structural relationships can be achieved by two major concepts. Starting from protein sequence information, the similarities of the sequences can be investigated and proteins can be clustered in phylogenetic trees. These analyses were the basis of the assignments of target families as reported, for example, by Venter et al. [1]. At a higher resolution, such trees can also be generated within gene families. While these trees can provide information about the evolutionary relationships, the relations do not translate into structural similarities at a detailed level, as shown by the distribution of affinities toward various small molecule ligands throughout the kinome [24]. To gain insight at the structural level, three-dimensional structures must be aligned and compared. The comparison involves studies of interactions with various probes such as amides, carbonyl, or water. The proteins are positioned in a cube and the interaction of the probes at various positions in the cube is measured. The statistical analysis of the interaction surfaces provides the dimensions for separating the proteins in structure-based landscape maps [21, 22]. The protein relations within these maps reflect the affinity profiles toward small molecule ligands and can be used to rationalize specificities.
Kir6.2 channel by ATP could be explained at the atomic level [33] utilizing the structures of the open Kir3.1 channel [14] and the closed KirBac1.1 channel [15].
15.1.2.2
Understanding Biological Machines - From Structure to Function
With the structural knowledge acquired, the second challenging aspect is to establish the biological relevance of target family members. Gene expression analysis is a very powerful tool to identify changes of gene regulation under various physiological and pathophysiological conditions. The mRNA levels in cells and tissues give indications about which proteins could be relevant for a specific biological response. However, whereas the genomic tools have been quite successful in identifying candidate targets in the GPCR and kinase families, where the activity of proteins is tightly correlated to the expression levels, there are other gene families, such as proteases, that are not regulated by gene expression levels. Proteases usually have rather constant expression levels as proenzymes over a broad range of physiological conditions and are activated irreversibly by proteolytic cleavage, a characteristic that is important for quick responses through activation cascades and distinguishes them from other gene families. Thus, while we can deduce proprotease levels from the gene expression patterns, we cannot infer proteolytic activity levels from them. This fact points directly to a more important challenge that cannot be resolved at a genomic level: How do we find the interaction partners for our target proteins and what are the structural determinants of the interactions? Although phylogenetic analysis allows some classification in structural and functional terms, the question concerns all target families, and the approaches used are termed deorphaning for GPCRs, phosphoproteomics for kinases, and substrate mapping for proteases. All processes require tedious work, but can be rewarding by yielding structural knowledge that can be employed in lead discovery and optimization. Orphan GPCRs are receptors without known agonistic or antagonistic ligands.
As GPCRs are usually identified on the basis of sequence homologies, most of the GPCRs have no pharmacologic function or ligands associated at the time of their identification. To find such ligands and later on elicit a biological response, GPCRs are cloned and overexpressed with linkage to easily detectable reporter genes and are screened against a collection of known signal transmitters or dedicated libraries. Especially with the evolution of screening technology for GPCRs, it is possible today to deorphan many GPCRs, either with their endogenous ligands or synthetic analogs. The identified ligands give insight into the structural requirements for binding, information that can be used to refine the above-mentioned homology models, and can be used as tools to elucidate the biological functions [34]. Fortunately, the endeavor of deorphaning GPCRs is supported by the existence of many GPCR-targeted drugs. As GPCRs are endogenous targets that can be addressed extracellularly, the majority of drugs in the market are directed to GPCRs (see also Fig. 15.1-1). These compounds can be applied to modulate
GPCR action, and it is a valid assumption that many of the “orphan drugs” will prove to be GPCR modulators, thus expanding the tool chest of deorphaning agents. For kinases and proteases, the search for substrates may seem more straightforward, as these enzymes act on and transform other proteins. Phosphoproteomics has been established for kinases to identify interaction partners at the protein level on a genomic scale [35]. Basically, cell cultures are incubated with 32P-ATP and the cellular extracts are analyzed by two-dimensional gel electrophoresis. As all kinases can use ATP as a substrate, the phosphorylation patterns become very complex and do not point to an individual kinase. To achieve specificity in detection and to avoid the heavy use of radioisotopes, antibodies reacting to the phosphorylated proteins are required. While nonspecific antibodies recognizing phosphoserine or phosphotyrosine are available, they pose the same challenge of deconvoluting the specific phosphorylation of one substrate by a specific kinase. Sequence-specific antibodies can be raised against the phosphorylated peptide epitope [36]. To identify the epitopes, combinatorial peptide libraries are incubated with purified kinases and 32P-ATP. The phosphorylated peptides can be identified by microradiography and Edman degradation [37, 38] and can be used for raising the antibodies. The sequence information gained could be applied for designing selective inhibitors addressing the substrate-binding pockets instead of the ATP site, an approach that is currently not followed, as the peptide-binding sites are not as distinct as for proteases. While antibodies reveal information on the phosphorylation state of a protein, it remains unclear which kinase is responsible for the phosphorylation at a specific position. In a complementary approach, Shokat et al. were able to track phosphorylation substrates for individual kinases, using kinases with an extended ATP-binding site and a bulky ATP derivative.
As only the mutated kinases are able to use the bulky ATP analog, only the substrates of this kinase will be phosphorylated at the specific phosphorylation sites [39]. Taking the information from all these approaches together, it is possible to decipher the signaling pathways of the kinome and to derive structural insights from the substrate sequences, which could be translated into inhibitors and drugs. Tracking protease activity remains one of the major challenges. As mentioned earlier, gene expression levels do not correlate tightly with the activity of a protease, and even monitoring tools like in situ hybridization cannot elucidate the protease activity in tissues or cellular systems, as the antibodies employed often do not discriminate between the proenzyme and activated proteases. Recently, efforts to image protease activity in a cell have led to activity labeling probes that act as suicide substrates and lead to fluorescent tagging of the active site of active proteases [40]. Currently, this technology is limited to proteases that allow for covalent attachment of the probes, namely, serine and cysteine proteases that act through a nucleophilic substitution, and it does not reveal the proteins that are cleaved by the protease. Unfortunately, straightforward labeling approaches as for kinases are not suitable, as no
additional moieties are introduced. Therefore, alternate approaches based
on two-dimensional gel electrophoresis have been devised that either allow the differential labeling of substrates or utilize the differential mobility of substrates and cleavage products after digestion. The identities of the proteins are determined by mass spectrometry and sequence analyses, although these technologies do not reach a resolution that would allow determination of the characteristics of the protease selectivity pockets. For the first approach, cell extracts are divided into two portions and the proteins in each portion are labeled with a fluorescent dye, using a different dye for each portion. One portion is subjected to proteolytic digestion, while the other remains untreated. After mixing of the portions and electrophoretic separation, substrates can be identified through the varying color of the spots [34]. In the second approach, a cellular extract is separated by electrophoresis in one dimension. After proteolytic digestion in the gel, the protein mixture is separated in the second dimension, where the cleavage products show a different mobility from the parent proteins [41]. While the first approach allows for analysis at a proteomic level and under various conditions, the latter approach allows a direct correlation of the cleavage peptides to the parent proteins. We use the insights into the biological space of the target families to select screening collections as well as to define specificity requirements for target family members to build appropriate profiling panels. To gain a more detailed insight into the structural parameters controlling substrate selection, peptide libraries have been used intensely. Proteolytic digestion of such libraries, which commonly contain hexa- to octapeptides, returns ensembles of peptide substrates [42]. These substrate ensembles carry pharmacophoric information on the substrate pockets as well as on the specificity of these pockets.
Together with the knowledge about the preferred β-strand geometry of protease inhibitors [43] and the ensuing privileged scaffolds, this information can guide protease inhibitor design.
15.1.3 Exploring Chemical Space
As mentioned in the introduction, the expedition through biological space with small molecules has gone through several stages, swiveling between post- and presynthesis selection of chemical structures [44]. From a purely empirical level led by phenomenological studies without guidance from structural information, through a phase of strong desire to rationally design drugs, and via the high-number trial-and-error games of high-throughput screening and combinatorial chemistry, we have today reached a stage where chemical biology strives to integrate knowledge and technologies in the quest of finding novel starting points for biological space exploration (Fig. 15.1-3). The achievements of the past are not forgotten, but are used today in a biologically conscious combination, which is exemplified in the novel lead discovery approaches that were established in the last 5 years.
Fig. 15.1-3 Schematic visualization of the various concepts to address chemical and biological space (shaded areas) in drug discovery. Medicinal chemistry focused on compound series (red dots) that had shown activity in pharmacological assays, and compound optimization was driven by a tight feedback from biological experiments, leading to a focused nonarrayed addressing of chemical space. The combinatorial promise was to systematically explore the chemical space with diverse arrays of compounds (blue dots) to find the suitable starting points. Analysis of combinatorial chemistry libraries showed their limited diversity and often mismatch to biological space. Chemical biology approaches combine the technologies established for array synthesis with choosing appropriate starting points for the libraries. Focused libraries start from known active compounds. Scaffold hopping (blue arrows) and morphing (green arrows) attempts evolve known structures by searching for close neighbors or by combination of elements of two compound series. Fragment approaches identify chemical motifs with biological activity that can provide novel starting points (flags) for arrayed synthesis.
15.1.3.1
Building on the Established - Privileged Scaffolds
Combinatorial chemistry had raised the expectations of solving the challenge of making the complete chemical space available for testing. Yet, it was quickly realized that this hope was futile. Calculating the numbers of possible chemical structures that would be considered druglike, for example, based on
carbon, hydrogen, nitrogen, oxygen, sulfur, and phosphorus with molecular
weights below 500, estimates reached ballpark figures of 10'' [45]. Even if we assume that we could represent this space through 1% of the structures, an estimate that is made often for representative selections from compound sets, we are still looking at structures. The material requirements for a single representation of each structure go beyond the resources available in the known universe. Besides the disillusionment caused by the numbers, it was soon recognized that compounds from combinatorial libraries were often inactive or poorly active on biological molecules unless they were derived from known active compounds. The structures were based on chemical feasibility and therefore densely populated the regions of chemical space offered by the scaffolds. With the insight that combinatorial libraries would not be capable of addressing the biological space and would fall well short of filling the chemical space, even within the boundary of molecular weights below 500, the utilization of combinatorial chemistry and parallel synthesis shifted from a diversity approach to densely populating chemical space around proven starting points, that is, compounds with documented biological activity. The literature and databases on marketed drugs provide many of these starting points. The analysis of drugs in the market and in development revealed that a limited set of 32 frameworks formed the basis of more than 50% of the marketed drugs [46]. Although this analysis, like all retrospective studies, may be biased toward GPCR activity modulators, which represent a significant fraction of drugs in the market, the study underlines two aspects. First, up to now we have explored only a very limited subset of chemical space in our drug discovery efforts, but remaining within this space makes us quite successful. Second, nature may not be as structurally creative and tolerant as has been assumed, and therefore biological space may not be as diverse as envisioned.
Beyond these points, the bias toward GPCR ligands may not be as limiting as it may seem, as GPCRs, through their subfamilies, bind a variety of structural motifs, such as nucleotides, lipids, and peptides, as well as small molecule ligands like nicotinic acid or dopamine [47]. These ligand types are actually shared with other target families, and therefore the structural motifs from the drugs in the market can be transferred to drug discovery of other target families that may seem unrelated at first glance, such as nucleotide mimics for kinases and peptide mimics for proteases. Although we are using a target family approach, molecular frameworks may be the uniting concept between target families, a fact underlining the importance of the structural analysis and knowledge gathering discussed earlier. These insights have reshaped our thinking about library synthesis and high-throughput screening and led to the concept of focused target family libraries to improve screening efficiency. Focused screening sets provide, if constructed appropriately, multiple advantages. First, they reduce the cost and efforts of screening campaigns and address the throughput limitations of some assay types. Second, high-quality activity data are gathered from the beginning, as the smaller compound numbers allow measuring of multiple data points per
compound and thus reduce false positive and negative occurrence. Third, they provide higher hit rates and thus SAR from the initial screening and provide guidance for chemical programs directly. Yet, a delicate balance between focused screening and the chance for serendipity remains to be maintained, especially to address the challenge of discovering novel chemotypes that enable securing an intellectual property position and exploring novel interfaces of chemical and biological spaces.
15.1.3.2
A Journey through Chemical Space - Focused Libraries and Scaffold Hopping
The heavy use of privileged scaffolds leads to an incestuous reinvestigation of established structures. While this may be advantageous for efficiently optimizing lead structures toward drug candidates, as we are moving on known terrain, it also limits our ability to resolve old issues or to find new activities. It is commonly understood that similar chemical structures elicit similar biological responses, and we base our optimization strategies on this concept [48]. Yet, the investigation of target families and the ensuing structural investigations highlight one pitfall: If two similar molecules cause a similar response on the target, then we have to assume that two structurally similar targets respond to a molecule in a similar way. Especially for kinases, the prototypic target family, we observe this phenomenon with significant activities of one compound on several kinases. Most known kinase inhibitors act as competitors of ATP, the universal cosubstrate for all kinases, and therefore frequent hitters are quite common in high-throughput screening of kinase inhibitors [49]. In a recent investigation, the binding affinities of 20 structurally diverse kinase inhibitors that are in clinical trials or marketed drugs were investigated against a panel of 113 kinases distributed across the kinome. The study highlights that even “selectivity”-optimized kinase inhibitors are a long way from being selective and hit targets across the kinome [24]. The kinome maps are phylogenetic trees based on sequence similarities, and we have already discussed the shortcomings of phylogenetic analysis for high-resolution structural grouping.
Inhibition profiles of series of compounds can give us guidance for the structural clustering of kinases that is necessary to devise selective and potent inhibitors [23]. Taking the structural similarities of proteins, and especially of their ligand-binding sites, one step further, we realize that kinases are not the only proteins interacting with nucleotides such as ATP. A large group of GPCRs binds to nucleotides, and their modulators bear strong structural similarities to kinase inhibitors. Their scaffolds are interchangeable, and the activity of kinase inhibitors is often observed on nucleotide-binding GPCRs, most likely being an additional factor in the side effects of kinase inhibitors observed in physiological settings. However, as the nucleotide-binding sites of GPCRs are structurally more diverse, the problem of cross-reactivity is not as pronounced, and other GPCR subfamilies do not suffer as strongly from ligand promiscuity.
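The idea of grouping kinases by their inhibition profiles can be illustrated with a minimal sketch. It is not the method of the cited studies: the kinase names, the activity values, the Pearson correlation as similarity measure, the single-linkage grouping, and the 0.8 threshold are all assumptions chosen only to make the principle concrete.

```python
from math import sqrt

# Hypothetical activity values (e.g. pIC50) of the same four compounds
# against three kinases. Names and numbers are invented for illustration.
profiles = {
    "kinase_A": [8.1, 5.2, 6.9, 4.8],
    "kinase_B": [7.9, 5.0, 7.1, 5.0],
    "kinase_C": [4.9, 8.0, 5.1, 7.6],
}


def pearson(x, y):
    """Pearson correlation of two equally long activity profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a flat profile carries no information about similarity
    return cov / (norm_x * norm_y)


def cluster_by_profile(profiles, threshold=0.8):
    """Single-linkage grouping: a kinase joins every cluster that contains
    at least one member whose profile correlates at or above the threshold."""
    clusters = []
    for name, profile in profiles.items():
        linked = [c for c in clusters
                  if any(pearson(profile, profiles[m]) >= threshold for m in c)]
        merged = {name}.union(*linked)
        clusters = [c for c in clusters if c not in linked] + [merged]
    return clusters


clusters = cluster_by_profile(profiles)
print(sorted(sorted(c) for c in clusters))
```

With these invented numbers, the two kinases that respond alike to the four compounds end up in one group and the third stays apart, independent of any sequence-based tree.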
As we have gained more and more structural insights, the rational design of lead structures and the virtual screening of compound collections, or even of virtual compound collections, have gained tremendous importance. While the methods have become more sophisticated over the years, the challenges of making extrapolations from known chemotypes and data remain. With the advent of combinatorial chemistry, molecular diversity was one of the predominant themes. Although many measures for diversity have been devised, the “Tanimoto” coefficient being the best known, the results depend heavily on the descriptors used to span the chemical space. Furthermore, as they stem from a structural diversity assessment, the measures do not reflect the diversity with respect to the targets. To this day, the development and selection of suitable descriptors for the chemical space remains a challenge: an exhaustive enumeration of molecules in the “druglike” space is not feasible, and therefore all the descriptor sets in use focus on specific applications and pharmacophoric subregions of chemical space. The use of the above-mentioned “privileged fragments” as virtual building blocks for the enumeration of structures constitutes one approach that has proven useful for the design of target family oriented libraries (Fig. 15.1-4). Utilizing these scaffolds, for example, fused heteroaromatic cores for kinases and nucleotide-binding GPCRs, offers the ability to target the libraries toward the respective protein families and ensures the stability of the computational methods through the similarity of the generated structures. The targeted libraries usually represent 200-1000 compounds around a given scaffold, giving a high certainty in assessing whether the elaborated chemotype is suitable for a given target or target family. The privileged fragments mimic in most cases the natural ligands.
This makes kinases and nucleotide-binding GPCRs quite suitable for this approach, and the scaffolds used cannot deny their pedigree. In addition to these ATP mimetics, kinases accept another class of ligands, “hinge binders”, out of their catalytically inactive conformation. This conformation has been termed the DFG-out conformation, due to the observed orientation of a loop containing the amino acid triplet aspartate-phenylalanine-glycine. This binding mode was unexpected but is used by many selective kinase inhibitors, such as Gleevec. The other subclasses of GPCRs, such as amine- or peptide-binding GPCRs, accept tertiary amines or dipeptide ligand mimics. Peptidomimetic approaches are used heavily to build protease scaffolds. Selective protease inhibitors are quite straightforward to obtain because of the substrate variety and specificity of the proteases. However, the concept of privileged scaffolds does not carry far. The unifying element in protease substrates is the extended β-strand conformation that allows interactions with four to six subpockets in the protease active site [43]. Mimics for this conformation have been developed, but they still lack universal applicability. Unlike the scaffolds for kinase or GPCR ligands, the cores of protease inhibitors, like the peptidic backbone in the substrate, do not contribute the majority of binding energy, and are therefore not crucial for
Fig. 15.1-4 Examples of privileged scaffolds and their relation to target families. Drugs and compounds in discovery tend to mimic the natural ligands. For example, protease inhibitors, although they are diverse in structure, mimic the β-strand peptide conformation common to all protease substrates to orient small groups into the binding pockets. Some of them, in addition, address the catalytic residues with covalent binding (e.g., pralnacasan or GW 311616) or by mimicking the transition state of proteolysis, like saquinavir. Even protease inhibitors not reaching through the catalytic triad use cyclic scaffolds to achieve an extended conformation, like the factor Xa inhibitor [50]. With this characteristic, the structures come close to those of peptide-binding GPCR antagonists, as shown in the example of the neuropeptide Y antagonist. Structural orientation through cyclic structures is also used in more compact ligands, like the CB1 antagonist rimonabant, where the template establishes the correct orientation of the lipophilic residues to the tertiary amine. The same scheme helps position the two lipophilic residues of the MAP p38 kinase inhibitor relative to the ATP-mimicking pyrimidine amine. Inhibitors binding to the open conformation of kinases, like Gleevec or Iressa, show a more extended shape. In addition to addressing some ATP interactions in the active site, they also reach into the peptide substrate region and therefore have to mimic the strand conformation of phosphorylated peptides as well.
affinity to the target (although they may severely affect the pharmacokinetic properties of the inhibitors). The energetic drivers of protease/inhibitor binding are the interactions in the subpockets, which determine the activity and selectivity of the inhibitors. Recently, these pockets were probed directly with molecular fragments that are linked to each other upon showing affinity to the targets. These fragment-based approaches will be discussed below. As mentioned before, the design based on privileged scaffolds has an impact on the novelty of the discovered molecules. This problem is augmented by the fact that the virtual screening tools that are used today tend to favor interpolation to known or closely similar structures over extrapolation to novel scaffolds. To circumvent these issues, which became especially prominent in kinase-directed drug discovery, the concepts of “scaffold hopping” or “scaffold morphing” are applied (Fig. 15.1-5). In both exercises, the matrices comparing inhibition or affinity across targets and compounds [23, 24, 49, 52] are crucial to support the selection of appropriate starting points. Scaffold hopping describes
Fig. 15.1-5 In silico scaffold hopping and biological scaffold morphing. Starting from a bioactive probe reported as active against the 5-HT3A receptor in the MDDR, about 120 000 records of the MDDR were searched using relaxed similarity requirements. The discovered chemotypes provide novel ideas for chemistry [51]. The bicyclic structure evolved to address multiple targets for biological defense. While cyanobacteria as monocellular organisms use only cytotoxicity for defending themselves, multicellular organisms have fine-tuned the activity of tropane-like molecules to affect the central nervous system of natural enemies, while at the same time being resistant to the poisons. Yet the successful bicyclic amine was maintained as a core of the molecule.
[Figure panels: “Scaffold hopping - major variations discovered in silico”, showing the starting probe (described as active against 5-HT3A in the MDDR); “Scaffold morphing in biology - structural variation to modulate function”, showing epibatidine (acetylcholine receptor; poison frog), anatoxin A (very fast death factor; cyanobacteria), atropine (muscarinic cholinergic receptor antagonist; plant alkaloid), and cocaine (dopamine receptor antagonist; plant alkaloid). Chemical structures are not reproduced.]
a virtual screening technique that uses rather loose similarity boundaries. To assess the similarity of molecules, Boolean strings of structural property descriptors, so-called fingerprints, are compared with each other. One of the metrics of similarity is the “Tanimoto coefficient”, which relates the number of bits set in both strings to the total number of bits set in either string. If the fingerprints are based on two-dimensional structural descriptors, like frameworks and small fragments, compounds with a Tanimoto coefficient larger than 85% are usually considered similar. At this similarity level, compounds retrieved from database similarity searches are expected to be active on the same target [53]. For scaffold hopping, the similarity boundaries are loosened to a 60-70% level compared to the starting structures. The resulting compounds are clustered according to the similarity of their scaffolds. These scaffold clusters are investigated and, if found active, used for compound library design. Usually, the resulting compounds carry structural elements of the starting molecules, which also serve as anchors in the target protein [51, 54]. The description of molecules in two dimensions basically reflects the connectivity of the atoms, which is useful for fast searches of large databases starting from known structures. However, the interaction with the biological target occurs in three-dimensional space, and we currently assume that the target recognizes more properties of the ligand than individual atoms. Thus, virtual screening is often performed using 3D-pharmacophore models. These pharmacophore models are rather straightforward to derive if detailed structural information on the target is available. The structural information is then translated into a cast that is used to select fitting molecules. The shapes and, especially, electrostatic properties can be refined by information on ligands to the target proteins.
For kinases, the addition of the shapes provides an enveloping shape for the ATP-binding pocket that can be addressed through screening [XI. Using ligand information for the building of pharmacophore models comes especially into play when little or no structural information on the target is available, such as for GPCRs and ion channels. We discussed above the approaches for structural prediction based on sequence similarity, but in reality virtual screening with pharmacophores derived from ligand-activity relationships provides more accurate information. The need for ligand-target information is addressed by databases that collect and consolidate information from the literature in a target family oriented fashion. Under the target family paradigm, the crystallographic information for GPCRs and ion channels discussed above [12-15] is used to template the pharmacophore models. Although these ligand-based models are more challenging to build and are not as accurate as the models based on structural information, they are, in part also because of their fuzziness, quite useful in scaffold-hopping approaches [55]. The observation that structural elements are conserved even through changing scaffolds led to the idea of “scaffold morphing”. Several scaffolds with proven activities are overlaid and combined to yield novel chemotypes. In addition to generating novel chemical matter, it is hoped to combine favorable properties from the individual scaffolds while losing the undesired characteristics in the process. Scaffold morphing is not unknown to medicinal chemistry and biological evolution. If we recall the previous discussion on protein homology and phylogeny in this chapter, we realize that nature uses combinations of functioning domains to provide novel three-dimensional structures. The best domain combinations survive the evolutionary pressure. Thus, it should not really come as a surprise that we find small molecule motifs repeated with minor modifications in various natural products. Once a successful scaffold was selected, the biochemical synthesis pathway had an evolutionary advantage and propagated itself into various organisms. Medicinal chemistry uses iterative modification of bioactive structures in its efforts to provide selective and pharmacokinetically optimized compounds, once a suitable starting point for variation has been found. Although thus established experimentally for a while, the adaptation of “scaffold morphing” ideas and algorithms to lead finding and virtual drug discovery has been tackled only recently, and the success of generating structural diversity for finding novel starting points and entering novel regions of chemical space remains to be evaluated. The observation of “privileged fragments” across target families in the literature, as well as in the discussed virtual screening approaches, led to novel screening approaches that investigate the interactions of such fragments instead of “full-size” ligands with their protein counterparts.
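The fingerprint comparison used for scaffold hopping earlier in this section can be illustrated with a minimal Python sketch. It assumes that a fingerprint is represented simply as the set of bit positions that are switched on, and it uses invented toy fingerprints; the function names and the exact cutoffs passed as defaults (0.85 for "similar", 0.60 as the loosened scaffold-hopping boundary) are illustrative assumptions taken from the percentages quoted in the text, not the settings of any particular screening tool.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions.

    Shared on-bits divided by on-bits present in either fingerprint.
    """
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    # Two empty fingerprints carry no information; report zero similarity.
    return len(a & b) / union if union else 0.0


def classify(similarity, similar_cutoff=0.85, hop_cutoff=0.60):
    """Sort a hit into the similarity bands described in the text.

    Both cutoffs are assumptions for illustration and can be overridden.
    """
    if similarity > similar_cutoff:
        return "similar"
    if similarity >= hop_cutoff:
        return "scaffold-hop candidate"
    return "dissimilar"


# Hypothetical toy fingerprints: each integer stands for one structural descriptor.
query = set(range(1, 11))
library = {
    "close_analog": set(range(1, 12)),
    "hop_candidate": set(range(1, 9)) | {20, 21},
    "unrelated": set(range(1, 8)) | {20, 21, 22},
}

if __name__ == "__main__":
    for name, fingerprint in library.items():
        score = tanimoto(query, fingerprint)
        print(f"{name}: T = {score:.2f} -> {classify(score)}")
```

In a real campaign the fingerprints would come from a cheminformatics toolkit and the retained hits would then be clustered by scaffold, as described above; the sketch only shows how loosening the cutoff widens the set of retrieved compounds.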
15.1.3.3 Putting the Pieces Together - Fragment Approaches
For a long time it was thought to be impossible to detect the interaction of small molecular fragments with target proteins, as the energetic determinants of small fragments binding to a protein surface or pocket were believed to work against high-affinity interactions. Studying protein structures and the energetics of protein-ligand interactions leads to a different perception: First, it should be possible to identify weakly binding molecules by measuring the affinity of a ligand to the protein instead of attempting to influence the biochemical behavior in competition to natural ligands, as these molecules only have to interact in a two-way equilibrium instead of a three-way competition. Second, the required molecular size for ligand-protein interactions in defined pockets has been overestimated. A recent study by Kuntz et al. shows that even small molecules can form tight complexes with proteins. Each heavy atom can contribute as much as 1.5 kcal mol⁻¹ in binding energy or a 10-fold increase in affinity [56]. Third, it is not so much the enthalpy contributions but entropic aspects that determine the suitability of fragments to serve as anchors for lead optimization. “Molecular anchors” show an energetic “stability gap” between the best binding conformation and the second-best binding mode.
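The link between the per-atom binding energy and the fold change in affinity quoted above follows from the standard relation ΔG = RT ln Kd. The short Python sketch below works through the arithmetic under stated assumptions: a temperature of 298.15 K, a 1 M standard state, and strictly additive per-atom contributions at the 1.5 kcal mol⁻¹ upper bound from the text. The function names are hypothetical, and the additivity is an idealization rather than a claim about real ligands.

```python
import math

# Gas constant in kcal mol^-1 K^-1.
R_KCAL_PER_MOL_K = 1.987204e-3


def fold_change(delta_g_kcal, temp_k=298.15):
    """Factor by which affinity improves for an extra binding energy (kcal/mol)."""
    return math.exp(delta_g_kcal / (R_KCAL_PER_MOL_K * temp_k))


def energy_for_kd(kd_molar, temp_k=298.15):
    """Magnitude of the binding free energy (kcal/mol) for a dissociation constant.

    Assumes a 1 M standard state, so Kd is given in mol/L.
    """
    return -R_KCAL_PER_MOL_K * temp_k * math.log(kd_molar)


def min_heavy_atoms(kd_molar, per_atom_kcal=1.5, temp_k=298.15):
    """Smallest atom count reaching kd_molar if every heavy atom contributed
    per_atom_kcal (an idealized, strictly additive upper bound)."""
    return math.ceil(energy_for_kd(kd_molar, temp_k) / per_atom_kcal)


if __name__ == "__main__":
    print(f"1.5 kcal/mol -> {fold_change(1.5):.1f}-fold change in affinity")
    print(f"1 nM needs {energy_for_kd(1e-9):.1f} kcal/mol of binding energy")
    print(f"ideal minimum heavy atoms for 1 nM: {min_heavy_atoms(1e-9)}")
```

At room temperature 1.5 kcal mol⁻¹ corresponds to a factor of roughly 12 to 13, which matches the order-of-magnitude "10-fold" figure in the text, and under these idealized assumptions fewer than ten heavy atoms would already suffice for a nanomolar dissociation constant.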
Promiscuously binding fragments show a more or less continuous distribution of energy levels for different interactions of fragment and protein [57]. Consequently, even a molecule fragment with as few as 10-12 heavy atoms could theoretically lead to a nanomolar inhibitor or ligand. The small size may even prove advantageous as the detrimental effects of molecule parts bumping into the protein surface could be avoided. However, as the surface area addressed through the ligand determines the binding energy, the topology of the protein surface will be of crucial importance and will bias the applicability of this technology to enzymes. In fact, most of the approaches are directed to the deep specificity pockets of proteases or address the ATP site of kinases. Recently, several reviews have summarized some of the successes from the chemical point of view [58-60]; so, we will highlight here only some examples that are illustrative of the use in target families. Two different concepts are currently followed for affinity screening approaches. One focuses on optimizing the throughput for detecting interactions and employs mass spectrometry or surface plasmon resonance (SPR) technologies, establishing structural insights only in second-level experiments. Although several approaches have been described to use mass spectrometry in affinity screening, the most promising concept couples the equilibration with a brief size exclusion chromatography to remove unbound library members before determining the ligands bound to target proteins by mass spectrometry [61]. A family experiment using several JNK kinases thus provided selective inhibitors with nanomolar activities and molecular weights starting at around 350 Da [62]. As the removal of unbound compounds relies on the size difference between small molecules and proteins, the approach has also been shown to be quite powerful for screening membrane-bound proteins that are captured in micelles.
In a pilot study, GPCR aggregates provided a high molecular size during separation from the small unbound ligands and allowed the identification of ligands to the M2 receptor [63]. SPR, another established methodology for quantifying protein-protein interactions, suffered for a long time from slow speed because long equilibration times are required before the readout. Additionally, the detection of the interactions of small molecules with proteins seemed to be impossible as the SPR signal correlates with the increase in layer thickness. Small molecules lead to only a small change in layer thickness, but improvements in technology meanwhile allow the measurement of weaker affinities. The breakthrough of using SPR for affinity screening came with the capability to combinatorially synthesize small ligands conjugated to surface attachment tags. These conjugates can be spotted in arrays on the SPR detection surface [64]. Recently, the search for fragments binding to the S1-specificity pocket of the serine protease factor VIIa yielded haloaromatic moieties that can be substituted for the well-known but undesired benzamidine as anchor. Haloaromatic moieties were known as ligands to the benzamidine-binding S1 pockets of the S1-clan serine proteases, such as factor Xa or thrombin. This knowledge guided the design of a library of approximately 1500 small-size fragments, which were immobilized on a microarray. Affinity screening with factor VIIa identified several small ligands, and their interaction in the S1 pocket could be confirmed by crystallography using trypsin as a surrogate for faster crystallographic screening and reconfirmation of the binding in factor VIIa [65]. The second line of concepts for fragment screening tries to extract as much structural information as possible from the initial interaction experiment and relies on either NMR or crystallography, paying for the increased information content with a limitation in throughput. The door to these experiments was pushed wide open when Fesik and colleagues reported the successful screening of small molecular fragments against the S1′ pocket of stromelysin (matrix metalloprotease 3). Using biaryl systems, they could show that the resonances in the NMR spectra shifted when the molecule fragments bound to the protein (Fig. 15.1-6). Conjugating these fragments with hydroxamic acid, a potent zinc chelator, provided compounds with nanomolar affinities [66]. The elegance and potential of fragment-based screening approaches was underlined by a detailed investigation of the thermodynamics of the interactions [67]. Using NMR fragment screening but another fragment set, high-affinity inhibitors with novel structural motifs were discovered from a small set of fragments for urokinase, wherein the deep S1-specificity pocket served as an anchoring point for the ligands [68]. Starting from these anchors, the ligands grew into the S2, S3, and S4 pockets of the enzyme. Biochemical data as well as structural information from NMR experiments guided the optimization toward selective and nanomolar inhibitors [69]. Fragment-based screening for kinase ligands takes a slightly different approach, as kinase inhibitors do not have to bridge several pockets. They can be grown from a central scaffold into some side pockets of the ATP-binding site to improve selectivity and activity.
The growth of inhibitors has been demonstrated in the case of growing nonnucleotide binders into the nucleotide-binding pocket of adenosine kinase [70]. In addition to being used for confirmation and investigation of binding modes, crystallography has recently been established as a screening tool. The technological advances in computing power and structure-solving algorithms allow the soaking and high-throughput crystallography of compound libraries [71-73]. The intriguing aspect of seeing the ligand’s orientation directly as a screening result may counterbalance a higher false-negative rate caused by ligands cracking the crystals or by ligands not being able to penetrate the protein because of restricted conformational flexibility in the crystal. Fragments to be screened are usually selected on the basis of known ligands or crystal structures of the protein. Millimolar activities in biochemical assays usually provide enough affinity to yield cocrystals, but compounds with effects in this range are usually not detected in traditional high-throughput screenings as they are in competition with the biological substrates. In a de novo lead finding approach, Sanders et al. utilized the structural similarity of the active pockets from urokinase and dihydroneopterin aldolase (DHNA) to select the
Fig. 15.1-6 Selected fragment screening experiments applied to proteases and kinases. In their landmark study, Fesik et al. equilibrated hydrophobic molecules with stromelysin and detected binding by the shift of NMR signals, retrieving structural information from the initial study [66]. Other studies screened fragment collections using mass spectrometry [62] or surface plasmon resonance [65] and established the binding modes of the ligands after identification through crystallography. In a recent approach, crystals of CDK2 were used to select oxindole ligands from a dynamic combinatorial library and established the binding modes by crystallography in situ [76].
fragments to be screened against DHNA. Probing the enzyme with the same fragment set that had been used for urokinase by Nienaber et al. [71] allowed establishing the structural requirements for selectivity in the initial screening run and guiding the extension of the discovered fragments into nanomolar inhibitors for DHNA [74]. Starting from privileged scaffolds for the ATP pocket of kinases, fragments binding to p38 MAP kinase and cyclin-dependent kinase 2 (CDK2) were discovered that can serve as novel central building blocks for kinase inhibitors [75]. As the throughput of crystallography is still limited compared to biochemical screenings, collection sizes have to be small or, as in the previous example, mixtures of fragments have to be screened. To expand the size of collections that can be screened by crystallography, Congreve et al. devised a dynamic combinatorial library system using CDK2
protein crystals as selectors for the tightest-binding ligands, which are formed from the condensation of isatin and hydrazines. Instead of equilibrating with a large amount of template protein, the reaction mixture is exposed to individual crystals of CDK2, guiding the selective formation of imino-indolones. The structures of selected reaction products are determined by crystallography, immediately establishing a binding mode for the nanomolar inhibitors of CDK2 [76]. Today, the application of fragment approaches is still limited to soluble proteins, but in the future there will be adaptations to membrane-bound proteins, especially those in which the ligand does not have to compete with natural ligands, like GPCRs or ion channels, to exert a functional response in a biochemical assay. The structural insights into the target families will guide the selection of fragment sets and allow using individual proteins as surrogates for the whole target family.
15.1.4 Epilogue
Over the last 5 years chemical biology has reshaped the methods of doing drug discovery. The investigations of the structural characteristics of target families allow us today to take a more rational approach toward selecting appropriate compounds for synthesis and testing. Through the sequencing of the human genome, we have the blueprint of the building blocks of life that can be modulated in their interactions through therapeutics. In addition to the aspects discussed in this chapter, the analysis of pharmacokinetic characteristics of molecules in the human body has established guidelines and boundaries for molecules that help us to navigate the chemical space in regions that offer a higher population of structures that may be suitable as drugs [3,4]. While many of the concepts of the target family approach may not be novel if looked at individually, their conscious combination adds another dimension: “chemical biology” is based on a thorough structural knowledge of similarities and differences within a target family. On the basis of the sequence homologies of proteins we can currently make predictions for ligands to hitherto unexplored targets, thus building a powerful stepping-stone for lead discovery. We have also learned how to use closely related family members as surrogates when the target under study is not amenable to a particular technology, such as crystallography. Today’s structural understanding also allows us to make more sophisticated choices about investigations to prevent side effects, and the increasing biological knowledge helps us to rationalize side effects of drugs and to modify affected drugs accordingly. Yet, we still run into the trap of building assay schemes for drug discovery that allow high throughput and are self-consistent. The high-throughput design sacrifices the biochemical mimicking of the cellular environment, such as the previously mentioned high concentration and viscosity, for technical feasibility.
The self-consistency often carries the risk of losing the relevance for the pathophysiological phenomenology and thus jeopardizes the predictivity for the therapeutic setting, being detached from reality like the “Hessian glass bead game” [77]. Eventually “systems biology” will elucidate how the building blocks of life work together in networks and pathways and which results can be expected by tweaking one dial in the system, leading to novel and powerful assay set-ups. Thus, drug discovery may come full circle to where it started, but equipped with the chemical biology armamentarium of understanding and predicting the phenomenological changes observed in diseased states and after the administration of drugs.
References
1. J.C. Venter, M.D. Adams, E.W. Myers, P.W. Li, R.J. Mural, G.G. Sutton, H.O. Smith, M. Yandell, C.A. Evans, R.A. Holt, J.D. Gocayne, P. Amanatides, R.M. Ballew, D.H. Huson, J.R. Wortman, Q. Zhang, C.D. Kodira, X.H. Zheng, L. Chen, M. Skupski, G. Subramanian, P.D. Thomas, J. Zhang, G.L. Gabor Miklos, C. Nelson, S. Broder, A.G. Clark, J. Nadeau, V.A. McKusick, N. Zinder, A.J. Levine, R.J. Roberts, M. Simon, C. Slayman, M. Hunkapiller, R. Bolanos, A. Delcher, I. Dew, D. Fasulo, M. Flanigan, L. Florea, A. Halpern, S. Hannenhalli, S. Kravitz, S. Levy, C. Mobarry, K. Reinert, K. Remington, J. Abu-Threideh, E. Beasley, K. Biddick, V. Bonazzi, R. Brandon, M. Cargill, I. Chandramouliswaran, R. Charlab, K. Chaturvedi, Z. Deng, V. Di Francesco, P. Dunn, K. Eilbeck, C. Evangelista, A.E. Gabrielian, W. Gan, W. Ge, F. Gong, Z. Gu, P. Guan, T.J. Heiman, M.E. Higgins, R.R. Ji, Z. Ke, K.A. Ketchum, Z. Lai, Y. Lei, Z. Li, J. Li, Y. Liang, X. Lin, F. Lu, G.V. Merkulov, N. Milshina, H.M. Moore, A.K. Naik, V.A. Narayan, B. Neelam, D. Nusskern, D.B. Rusch, S. Salzberg, W. Shao, B. Shue, J. Sun, Z. Wang, A. Wang, X. Wang, J. Wang, M. Wei, R. Wides, C. Xiao, C. Yan, A. Yao, J. Ye, M. Zhan, W. Zhang, H. Zhang, Q. Zhao, L. Zheng, F. Zhong, W. Zhong, S. Zhu, S. Zhao, D. Gilbert, S. Baumhueter, G. Spier, C. Carter, A. Cravchik, T. Woodage, F. Ali, H. An, A. Awe, D. Baldwin, H. Baden, M. Barnstead, I. Barrow, K. Beeson, D. Busam, A. Carver, A. Center, M.L. Cheng, L. Curry, S. Danaher, L. Davenport, R. Desilets, S. Dietz, K. Dodson, L. Doup, S. Ferriera, N. Garg, A. Gluecksmann, B. Hart, J. Haynes, C. Haynes, C. Heiner, S. Hladun, D. Hostin, J. Houck, T. Howland, C. Ibegwam, J. Johnson, F. Kalush, L. Kline, S. Koduru, A. Love, F. Mann, D. May, S. McCawley, T. McIntosh et al., The sequence of the human genome, Science 2001, 291, 1304-1351.
2. J. Drews, Drug discovery: a historical perspective, Science 2000, 287, 1960-1963.
3. C.A. Lipinski, F. Lombardo, B.W. Dominy, P.J. Feeney, Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, Adv. Drug Delivery Rev. 1997, 23, 3-25.
4. D.F. Veber, S.R. Johnson, H.-Y. Cheng, B.R. Smith, K.W. Ward et al., Molecular properties that influence the oral bioavailability of drug candidates, J. Med. Chem. 2002, 45, 2615-2623.
5. A.L. Hopkins, C.R. Groom, The druggable genome, Nat. Rev. Drug Discov. 2002, 1, 727-730.
6. G. Wess, M. Urmann, B. Sickenberger, Medicinal chemistry: challenges and opportunities, Angew. Chem., Int. Ed. Engl. 2001, 40, 3341-3350.
7. P.P. Wangikar, A.V. Tendulkar, S. Ramya, D.N. Mali, S. Sarawagi, Functional sites in protein families uncovered via an objective and automated graph theoretic approach, J. Mol. Biol. 2003, 326, 955-978.
8. M.A. Koch, R. Breinbauer, H. Waldmann, Protein structure similarity as guiding principle for combinatorial library design, Biol. Chem. 2003, 384, 1265-1272.
9. L. Orning, G. Krivi, F.A. Fitzpatrick, Leukotriene A4 hydrolase. Inhibition by bestatin and intrinsic aminopeptidase activity establish its functional resemblance to metallohydrolase enzymes, J. Biol. Chem. 1991, 266, 1375-1378.
10. E. Fischer, Effekt der Zuckerkonfiguration auf die Enzymwirkung, Ber. Dtsch. Chem. Ges. 1894, 27, 2985.
11. D.E. Koshland Jr, The lock-and-key principle and the induced-fit theory, Angew. Chem., Int. Ed. Engl. 1994, 33, 2475-2478.
12. K. Palczewski, T. Kumasaka, T. Hori, C.A. Behnke, H. Motoshima et al., Crystal structure of rhodopsin: a G protein-coupled receptor, Science 2000, 289, 739-745.
13. Y. Jiang, A. Lee, J. Chen, M. Cadene, B.T. Chait et al., Crystal structure and mechanism of a calcium-gated potassium channel, Nature 2002, 417, 515-522.
14. M. Nishida, R. MacKinnon, Structural basis of inward rectification: cytoplasmic pore of the G protein-gated inward rectifier GIRK1 at 1.8 Å resolution, Cell 2002, 111, 957-965.
15. A. Kuo, J.M. Gulbis, J.F. Antcliff, T. Rahman, E.D. Lowe et al., Crystal structure of the potassium channel KirBac1.1 in the closed state, Science 2003, 300, 1922-1926.
16. R. Fredriksson, M.C. Lagerstrom, L.-G. Lundin, H.B. Schioth, The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, paralogon groups, and fingerprints, Mol. Pharmacol. 2003, 63, 1256-1272.
17. D.K. Vassilatis, J.G. Hohmann, H. Zeng, F. Li, J.E. Ranchalis et al., The G protein-coupled receptor repertoires of human and mouse, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 4903-4908.
18. M.H. Saier Jr, A functional-phylogenetic classification system for transmembrane solute transporters, Microbiol. Mol. Biol. Rev. 2000, 64, 354-411.
19. S. Caenepeel, G. Charydczak, S. Sudarsanam, T. Hunter, G. Manning, The mouse kinome: discovery and comparative genomics of all mouse protein kinases, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 11707-11712.
20. S.M. Foord, Receptor classification: post genome, Curr. Opin. Pharmacol. 2002, 2, 561-566.
21. T. Naumann, H. Matter, Structural classification of protein kinases using 3D molecular interaction field analysis of their ligand binding sites: target family landscapes, J. Med. Chem. 2002, 45, 2366-2378.
22. H. Matter, W. Schwab, Affinity and selectivity of matrix metalloproteinase inhibitors: a chemometrical study from the perspective of ligands and proteins, J. Med. Chem. 1999, 42, 4506-4523.
23. M. Vieth, R.E. Higgs, D.H. Robertson, M. Shapiro, E.A. Gragg et al., Kinomics-structural biology and chemogenomics of kinase inhibitors and targets, Biochim. Biophys. Acta 2004, 1697, 243-257.
24. M.A. Fabian, W.H. Biggs, D.K. Treiber, C.E. Atteridge, M.D. Azimioara et al., A small molecule-kinase interaction map for clinical kinase inhibitors, Nat. Biotechnol. 2005, 23, 329-336.
25. N. Vaidehi, W.B. Floriano, R. Trabanino, S.E. Hall, P. Freddolino et al., Prediction of structure and function of G protein-coupled receptors, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 12622-12627.
26. P. Hummel, N. Vaidehi, W.B. Floriano, S.E. Hall, W.A. Goddard III, Test of the binding threshold hypothesis for olfactory receptors: explanation of the differential binding of ketones to the mouse and human orthologs of olfactory receptor 912-93, Protein Sci. 2005, 14, 703-710.
27. S. Shacham, Y. Marantz, S. Bar-Haim, O. Kalid, D. Warshaviak, N. Avisar, B. Inbal, A. Heifetz, M. Fichman, M. Topf, Z. Naor, S. Noiman, O.M. Becker, PREDICT modeling and in-silico screening for G-protein coupled receptors, Proteins 2004, 57, 51-86.
28. T. Klabunde, G. Hessler, Drug design strategies for targeting G-protein-coupled receptors, ChemBioChem 2002, 3, 928-944.
29. O.M. Becker, Y. Marantz, S. Shacham, B. Inbal, A. Heifetz et al., G protein-coupled receptors: in silico drug discovery in 3D, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 11304-11309.
30. J.S. Mitcheson, J. Chen, M. Lin, C. Culberson, M.C. Sanguinetti, A structural basis for drug-induced long QT syndrome, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 12329-12333.
31. R.A. Pearlstein, R.J. Vaz, J. Kang, X.-L. Chen, M. Preobrazhenskaya et al., Characterization of HERG potassium channel inhibition using CoMSiA 3D QSAR and homology modeling approaches, Bioorg. Med. Chem. Lett. 2003, 13, 1829-1835.
32. A.M. Aronov, Predictive in silico modeling for hERG channel blockers, Drug Discov. Today 2005, 10, 149-155.
33. J.F. Antcliff, S. Haider, P. Proks, M.S.P. Sansom, F.M. Ashcroft, Functional analysis of a structural model of the ATP-binding site of the KATP channel Kir6.2 subunit, EMBO J. 2005, 24, 229-239.
34. O. Civelli, GPCR deorphanizations: the novel, the known and the unexpected transmitters, Trends Pharmacol. Sci. 2005, 26, 15-19.
35. D.E. Kalume, H. Molina, A. Pandey, Tackling the phosphoproteome: tools and strategies, Curr. Opin. Chem. Biol. 2003, 7, 64-69.
36. H. Zhang, X. Zha, Y. Tan, P.V. Hornbeck, A.J. Mastrangelo et al., Phosphoprotein analysis using antibodies broadly reactive against phosphorylated motifs, J. Biol. Chem. 2002, 277, 39379-39387.
37. Z. Songyang, S. Blechner, N. Hoagland, M.F. Hoekstra, H. Piwnica-Worms et al., Use of an oriented peptide library to determine the optimal substrates of protein kinases, Curr. Biol. 1994, 4, 973-982.
38. P.M. Chan, H.P. Nestler, W.T. Miller, Investigating the substrate specificity of the Her-2/Neu kinase using peptide libraries, Cancer Lett. 2000, 160, 159-169.
39. L.A. Witucki, X. Huang, K. Shah, Y. Liu, S. Kyin et al., Mutant tyrosine kinases with unnatural nucleotide specificity retain the structure and phospho-acceptor specificity of the wild-type enzyme, Chem. Biol. 2002, 9, 25-33.
40. D.C. Greenbaum, W.D. Arnold, F. Lu, L. Hayrapetian, A. Baruch et al., Small molecule affinity fingerprinting: a tool for enzyme family subclassification, target identification, and inhibitor design, Chem. Biol. 2002, 9, 1085-1094.
41. H.P. Nestler, A. Doseff, A two-dimensional, diagonal sodium dodecyl sulfate polyacrylamide gel electrophoresis technique to screen for protease substrates in protein mixtures, Anal. Biochem. 1997, 251, 122-125.
42. M. Meldal, I. Svendsen, K. Breddam, F.-I. Auzanneau, Portion-mixing peptide libraries of quenched fluorogenic substrates for complete subsite mapping of endoprotease specificity, Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 3314-3318.
43. J.D.A. Tyndall, T. Nall, D.P. Fairlie, Proteases universally recognize beta strands in their active sites, Chem. Rev. 2005, 105, 973-999.
44. A. Eschenmoser, One hundred years of the lock-and-key principle, Angew. Chem., Int. Ed. Engl. 1994, 33, 2363.
45. R.S. Bohacek, C. McMartin, W.C. Guida, The art and practice of structure-based drug design: a molecular modeling perspective, Med. Res. Rev. 1996, 16, 3-50.
46. G.W. Bemis, M.A. Murcko, The properties of known drugs. 1. Molecular frameworks, J. Med. Chem. 1996, 39, 2887-2893.
47. K. Bondensgaard, M. Ankersen, H. Thogersen, B.S. Hansen, B.S. Wulff et al., Recognition of privileged structures by G-protein coupled receptors, J. Med. Chem. 2004, 47, 888-899.
48. Y.C. Martin, J.L. Kofron, L.M. Traphagen, Do structurally similar molecules have similar biological activity? J. Med. Chem. 2002, 45, 4350-4358.
49. A.M. Aronov, M.A. Murcko, Toward a pharmacophore for kinase frequent hitters, J. Med. Chem. 2004, 47, 5616-5619.
50. H. Matter, E. Defossa, U. Heinelt, P.-M. Blohm, D. Schneider et al., Design and quantitative structure-activity relationship of 3-amidinobenzyl-1H-indole-2-carboxamides as potent, nonchiral, and selective inhibitors of blood coagulation factor Xa, J. Med. Chem. 2002, 45, 2749-2769.
51. J.L. Jenkins, M. Glick, J.W. Davies, A 3D similarity method for scaffold hopping from known drugs or natural ligands to new chemotypes, J. Med. Chem. 2004, 47, 6144-6159.
52. D. Horvath, C. Jeandenans, Neighborhood behavior of in silico structural spaces with respect to in vitro activity spaces-a novel understanding of the molecular similarity principle in the context of multiple receptor binding profiles, J. Chem. Inf. Comput. Sci. 2003, 43, 680-690.
53. H. Matter, Selecting optimally diverse compounds from structure databases: a validation study of two-dimensional and three-dimensional molecular descriptors, J. Med. Chem. 1997, 40, 1219-1229.
54. L. Naerum, L. Norskov-Lauritsen, P.H. Olesen, Scaffold hopping and optimization towards libraries of glycogen synthase kinase-3 inhibitors, Bioorg. Med. Chem. Lett. 2002, 12, 1525-1528.
55. D.G. Lloyd, C.L. Buenemann, N.P. Todorov, D.T. Manallack, P.M. Dean, Scaffold hopping in de novo design: ligand generation in absence of receptor information, J. Med. Chem. 2004, 47, 493-496.
56. I.D. Kuntz, K. Chen, K.A. Sharp, P.A. Kollman, The maximal affinity of ligands, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 9997-10002.
57. P.A. Rejto, G.M. Verkhivker, Unraveling principles of lead discovery: from unfrustrated energy landscapes to novel molecular anchors, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 8945-8950.
58. D.A. Erlanson, R.S. McDowell, T. O’Brien, Fragment-based drug discovery, J. Med. Chem. 2004, 47, 3463-3482.
59. D.C. Rees, M. Congreve, C.W. Murray, R. Carr, Fragment-based lead discovery, Nat. Rev. Drug Discov. 2004, 3, 660-672.
60. H.P. Nestler, Combinatorial chemistry and fragment screening - two unlike siblings? Curr. Drug Discov. Technol. 2005, 2, 1-12.
61. Y.M. Dunayevskiy, P. Vouros, E.A. Wintner, G.W. Shipps, T. Carell et al., Application of capillary electrophoresis-electrospray ionization mass spectrometry in the determination of molecular diversity, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 6152-6157.
62. G. Agnihotri, M.P. Scott, M.H. Alaoui-Ismaili, U.F. Mansoor, D. Murphy et al., Identification of potent inhibitors of c-Jun N-terminal kinase-1 (JNK1) using ultra high-throughput affinity based screening, 12th Symposium on Second Messengers and Phosphoproteins (SMP-2004), 2004.
63. Y. Hou, J. Felsch, A. Annis, C.E. Whitehurst, C.C. Cheng et al., Identification of small molecule ligands for G protein coupled receptor using affinity selection screening, GPCR IBC Conference, 2002.
64. G. Metz, H. Ottleben, D. Vetter, Small molecule screening on chemical microarrays, Meth. Princ. Med. Chem. 2003, 19, 213-236.
65. S. Dickopf, M. Frank, H.-D. Junker, S. Maier, G. Metz et al., Custom chemical microarray production and affinity fingerprinting for the S1 pocket of factor VIIa, Anal. Biochem. 2004, 335, 50-57.
66. P.J. Hajduk, G. Sheppard, D.G. Nettesheim, E.T. Olejniczak, S.B. Shuker et al., Discovery of potent nonpeptide inhibitors of stromelysin using SAR by NMR, J. Am. Chem. Soc. 1997, 119, 5818-5827.
67. E.T. Olejniczak, P.J. Hajduk, P.A. Marcotte, D.G. Nettesheim, R.P. Meadows et al., Stromelysin inhibitors designed from weakly bound fragments: effects of linking and cooperativity, J. Am. Chem. Soc. 1997, 119, 5828-5832.
68. P.J. Hajduk, S. Boyd, D. Nettesheim, V. Nienaber, J. Severin et al., Identification of novel inhibitors of urokinase via NMR-based screening, J. Med. Chem. 2000, 43, 3862-3866.
69. M.D. Wendt, T.W. Rockway, A. Geyer, W. McClellan, M. Weitzberg et al., Identification of novel binding interactions in the development of potent, selective 2-naphthamidine inhibitors of urokinase. Synthesis, structural analysis, and SAR of N-phenyl amide 6-substitution, J. Med. Chem. 2004, 47, 303-324.
70. P.J. Hajduk, A. Gomtsyan, S. Didomenico, M. Cowart, E.K. Bayburt et al., Design of adenosine kinase inhibitors from the NMR-based screening of fragments, J. Med. Chem. 2000, 43, 4781-4786.
71. V.L. Nienaber, P.L. Richardson, V. Klighofer, J.J. Bouska, V.L. Giranda et al., Discovering novel ligands for macromolecules using X-ray crystallographic screening, Nat. Biotechnol. 2000, 18, 1105-1108.
72. R. Carr, H. Jhoti, Structure-based screening of low-affinity compounds, Drug Discov. Today 2002, 7, 522-527.
73. A. Sharff, H. Jhoti, High-throughput crystallography to enhance drug discovery, Curr. Opin. Chem. Biol. 2003, 7, 340-345.
74. W.J. Sanders, V.L. Nienaber, C.G. Lerner, J.O. McCall, S.M. Merrick et al., Discovery of potent inhibitors of dihydroneopterin aldolase using CrystaLEAD high-throughput X-ray crystallographic screening and structure-directed lead optimization, J. Med. Chem. 2004, 47, 1709-1718.
75. M.J. Hartshorn, C.W. Murray, A. Cleasby, M. Frederickson, I.J. Tickle et al., Fragment-based lead discovery using X-ray crystallography, J. Med. Chem. 2005, 48, 403-413.
76. M.S. Congreve, D.J. Davis, L. Devine, C. Granata, M. O’Reilly et al., Detection of ligands from a dynamic combinatorial library by X-ray crystallography, Angew. Chem., Int. Ed. Engl. 2003, 42, 4479-4482.
77. D.F. Horrobin, Opinion: modern biomedical research: an internally self-consistent universe with little contact with medical reality? Nat. Rev. Drug Discov. 2003, 2, 151-154.
I851
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
15 Target Families
15.2 Chemical Biology of Kinases Studied by NMR Spectroscopy
Marco Betz, Martin Vogtherr, Ulrich Schieborr, Bettina Elshorst, Susanne Grimme, Barbara Pescatore, Thomas Langer, Krishna Saxena, and Harald Schwalbe
Outlook
This review presents NMR methods that contribute to structure-guided drug design for the family of protein kinases. Eight kinase-targeted oncology drugs emerged on the market in the past eight years, although the understanding of the key molecular events in tumourigenesis has made great advances. Kinases have a key role in the dysregulation of tumour growth and survival. Consequently, tumour-specific kinase inhibitors are needed to open new therapeutic opportunities for cancer patients. Recent advances in the recombinant expression of the catalytic domains of protein kinases, which have extended the range of kinases amenable to NMR-guided drug discovery, will be described. The review will focus on methods that provide information on the binding properties of small molecules to the catalytic domains of protein kinases, as identified from NMR-based screening trials. Moreover, aspects of the dynamic behaviour of key residues involved in kinase-ligand interactions at the active site will be explained. An applicable tool (LIGDOCK) for calculating docking complexes with small molecules at high precision, which helps medicinal chemists to judge structure-activity relationships, will be presented. The resulting information about selectivity, binding site, and binding mode was used for the step-by-step optimisation of molecular fragments. The insights obtained from NMR studies had important implications for the drug discovery process, as demonstrated by an enhanced selectivity of small compound collections towards a given target kinase.
15.2.1 Introduction
15.2.1.1 Kinases as Drug Targets
The so-called gene-family approach to drug discovery allows the simultaneous assessment of both potency and selectivity of protein targets. Primary screens of individual therapeutic targets are followed by secondary screens composed of homologous members of the protein family [1-3]. To benefit from synergy, drug discovery programs have integrated the recent advances in genomics and structural genomics [4]. This facilitates a gene-family approach for protein classes such as G-protein coupled receptors (GPCRs), ion channels, protein kinases, and protein phosphatases. These protein families are key players in cell signaling networks, governing processes such as cell growth, cell division, and cell death. The pathophysiology of many diseases is caused by the perturbation of these pathways, whether through environmental stresses or genetic defects. Regarding protein kinases, deregulated activity is involved in all aspects of neoplasia, including proliferation, invasion, angiogenesis, and metastasis [5]. As a result of the sequencing of the human genome, approximately 500 protein kinases have been predicted. Although only a comparatively small number of protein kinases have actually been targeted by established drugs, it is now accepted that finding protein kinase inhibitors is a viable way to discover new drugs. The remarkable success of the first inhibitors, the anticancer drugs Gleevec (Novartis) [6] and Iressa (AstraZeneca) [7], supports the idea of targeting a kinase that is pivotal to a malignant phenotype. These findings have increased the efforts in drug discovery and development research in this area.
15.2.1.2 Kinases - An Overview
The highly organized succession of biochemical reactions found in living organisms takes place along the route of signal transduction. The concept of "activating" a protein is essential in signal transduction [8]. Activation is usually accomplished by structural reorganization of the protein triggered by events such as ligand binding, protein-protein interactions, or chemical modifications. One of the most versatile activation mechanisms is activation by phosphorylation. A defined activation status of a target is maintained by two counteracting classes of enzymes: the protein kinases, which specifically phosphorylate targets, and the protein phosphatases, which specifically dephosphorylate them. The majority of protein kinases catalyze the transfer of the terminal phosphate group from ATP to specific serine or threonine residues of their protein substrates, whereas a minority phosphorylate tyrosine residues [9, 10]. Inhibition of protein kinases has been a powerful tool to study signal transduction pathways. It is easily achieved by inhibitors such as staurosporine. However, staurosporine inhibits a broad spectrum of kinases and is therefore not suited for therapeutic purposes [11]. As the understanding of kinase mechanism and inhibition has advanced in the past years, there is now an increasing number of kinase inhibitors that are specific for one particular kinase. The discovery of such specific inhibitors has in turn enhanced our understanding of kinase action and signal transduction pathways.
15.2.1.3 Structural Biology of Kinases
Most native protein kinases are assembled in a modular fashion. This assembly always contains the protein kinase catalytic domain that catalyzes the transfer of a phosphate group to a substrate. Most kinases have additional variable domains that are involved in kinase recognition, activation, or localization.
Common variable kinase domains include protein-protein recognition domains (e.g., SH2, SH3, PH, or polo box domains), signaling domains, and membrane anchoring domains [12, 13]. Although the variable domains are of particular interest in biochemical, structural, and pharmacological research, the focus of this review is restricted to the kinase catalytic domains. The high degree of structural and functional conservation is an ideal prerequisite for a target-family approach; therefore, this domain has also been the major point of attack in most kinase-directed drug strategies. All related serine/threonine kinases and protein tyrosine kinases share a structurally conserved catalytic domain of about 270 amino acids. Numerous kinase catalytic domains have been structurally characterized by X-ray crystallography [14-18]. All of them share the highly conserved bilobal fold that is depicted in Fig. 15.2-1. In this fold, the N-terminal lobe is composed almost entirely of β-sheets, whereas the C-terminal lobe is dominated by α-helices. The two lobes are joined by a polypeptide chain, which functions
Fig. 15.2-1 Ribbon diagram showing the structure of the catalytic domain of murine protein kinase A (PKA) in complex with Mg/ATP (1Q24.pdb). The basic architecture that has been observed in all subsequent kinase domain structures is denoted.
as a hinge. The catalytic site is located at the interface between the two lobes. The adenine moiety of ATP binds deep in a hydrophobic pocket between the lobes, while the phosphates of ATP are aligned by interactions with the backbone amides of a glycine-rich loop. The protein substrate binding site is associated mostly with the C-terminal lobe. The catalytic cycle of the phosphate transfer and the conformational reorganizations linked to it are reasonably well understood [19]. Crystallographic studies of mammalian protein kinase A (PKA) with and without Mg-ATP and an inhibitory polypeptide have revealed two different conformational states. The so-called open form is seen in the apo form and in the binary complex with the peptide. In this form, the N-terminal lobe is turned away from the C-terminal lobe by 14° when compared to the closed conformation. The closed structure can be observed in the ternary complex with Mg-ATP and the peptide substrate. This conformation is necessary to bring the residues into the correct orientation to promote catalysis [20]. A key aspect of regulation is that most kinases can be activated by specific phosphorylation, but there are numerous other kinase-specific activation and inactivation pathways that involve protein-protein interactions. The phosphorylation takes place on residues located in a particular segment in the center of the kinase domain, which is termed the activation segment. It is defined as the region spanning the conserved sequences DFG and APE. The conversion from an inactive to an active state involves conformational changes in the protein that lead to the correct disposition of substrate binding and catalytic groups. Structures of kinases with unmodified activation loops fall into two classes. Cyclin-dependent kinase 2 (CDK2) and insulin receptor kinase (IRK) are representatives of enzymes that adopt inactive conformations in their resting state. Their activation loop has an inhibitory fold, blocking the sites for ATP and substrate [14, 16]. p21-activated protein kinase 1 (PAK1) and PKA, when freed of a negative regulator such as the inhibitory switch (IS) domain [21] or the R subunit [22], respectively, appear to relax into an accessible conformation.
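The definition of the activation segment as the region spanning the conserved DFG and APE sequences can be illustrated with a short sequence search. The following Python sketch is only an illustration under stated assumptions: the function name and the toy sequence are hypothetical, and real kinases may carry variants of either motif that an exact-match search would miss.

```python
import re


def find_activation_segments(sequence):
    """Return (start, end, segment) for every DFG...APE span.

    Positions are 1-based and inclusive. The search is non-greedy, so each
    DFG motif is paired with the nearest following APE motif.
    """
    pattern = re.compile(r"DFG.*?APE")
    segments = []
    for match in pattern.finditer(sequence.upper()):
        segments.append((match.start() + 1, match.end(), match.group()))
    return segments


# Hypothetical toy sequence, used only to exercise the function.
toy_sequence = "MKKLVVADFGLAKRTWTLCGTPEYLAPEIILSK"

for start, end, segment in find_activation_segments(toy_sequence):
    print(f"activation segment {start}-{end}: {segment}")
```

An exact-match search is the simplest choice here; for a real target the motif positions would normally be taken from a sequence alignment against kinases of known structure.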
15.2.1.4 NMR Spectroscopy in Drug Discovery
NMR spectroscopy is involved at many different stages in pharmaceutical research. Biomolecular NMR spectroscopy, which will be discussed in this review, is just one application of a technique that is routinely used in the pharmaceutical industry for reaction control, metabonomics [23], or characterization of natural products [24]. The most commonly used biomolecular NMR techniques in pharmaceutical research are screening of small-molecule substance libraries and the structural characterization of protein-ligand complexes. These applications can be at least partially accomplished by other approaches (high-throughput screening (HTS), surface plasmon resonance (SPR), modeling, and X-ray crystallography) [25]. However, it has turned out that NMR ideally complements these methods.
For a long time, NMR spectroscopy of proteins was restricted to small proteins and peptides. However, recent methodological and technical progress has enabled NMR spectroscopy to routinely study proteins that are as large as 40 kDa, thereby in principle allowing NMR studies on most protein kinase catalytic domains. Although numerous internal studies on protein kinases have been conducted in pharmaceutical companies, there are only a few publications that describe the advantages of biomolecular NMR spectroscopy for protein kinases. In this review, the typical workflow of an NMR spectroscopic approach to investigating protein kinases as pharmaceutical targets is outlined. In particular, protein NMR experiments to characterize the target protein are covered. The review includes the NMR-driven search for the best expression construct and the optimal buffer conditions, where the achieved protein yield meets adequate signal-to-noise ratio (discussed in Chapters 15.2.2.1 and 2.2). Several kinases are screened in a fragment-based approach; the set-up of ligand-based NMR experiments results in the identification of new ligands, whose binding affinities toward the individual kinase are derived (Chapter 3.3). The combination of this strategy with the protein-based NMR characterization of the interactions, which leads to detailed knowledge about viable ligands at their residue-specific binding site, will be explained (Chapter 15.2.4). Subsequent molecular docking with NMR-derived constraints, which reveals the binding modes of molecular fragments at atomic resolution and serves as a starting point for further optimization steps, is shown (Chapter 15.2.4.3).
15.2.2 Protein NMR Spectroscopy on Kinases
15.2.2.1 Protein Expression
The requirement for stable proteins with good solution behavior is common to both X-ray crystallography and NMR techniques. In most protein expression laboratories, the two most frequently utilized expression hosts for recombinant proteins are Escherichia coli and baculovirus-infected insect cells [26]. For advanced NMR investigations, the incorporation of the nonradioactive but NMR-active isotopes 15N, 13C, and 2H is necessary throughout the protein. The host expression organism is grown in isotopically enriched minimal media or special commercial full media. In practical terms, this limits labeled protein expression to E. coli and yeast when cost efficiency is considered. Proteins with disulfide bonds and proteins that require glycosylation or other posttranslational modifications are often difficult if not impossible to obtain from expression in E. coli. In these cases, yeast expression systems such as Pichia pastoris can be used for NMR purposes, since P. pastoris can metabolize methanol as the sole carbon source and provides glycosylated proteins [27].
As an alternative for the recombinant expression of eukaryotic proteins, selectively labeled amino acids can be introduced by the baculovirus expression system in Spodoptera frugiperda (Sf9) insect cells at reasonable cost, as shown in [28]. Recently, the uniform isotope labeling of the Abl kinase using baculovirus-infected insect cells has been reported [29]. The expression of protein kinases involves several considerations, particularly if the specific protein targets are difficult to express. A typical workflow is depicted in Fig. 15.2-2. The employed strategy for recombinant kinase expression is
Fig. 15.2-2 Typical workflow of the expression of protein kinases. Optimization is performed in E. coli and Pichia pastoris simultaneously to find the most suitable expression system. If both expression hosts fail, the expression system is changed to baculovirus-infected insect cells. With the systems A and B, uniformly 15N/13C/2H-labeled protein kinases are possible and lead to a triple labeled sample for the NMR assignment. With the baculovirus system only 15N selective amino acid labeling is economically feasible.
to utilize E. coli and P. pastoris as expression hosts in parallel. Screening of several constructs and fusion partner tags in both expression systems leads to properly folded kinases in most cases. If both expression hosts fail, the expression system can be changed to baculovirus-infected insect cells. Figure 15.2-3 shows an ensemble of [1H,15N]-TROSY (transverse relaxation-optimized spectroscopy) spectra recorded on Bruton's tyrosine kinase (BTK) labeled with five different amino acids [30]. The expression of a particular kinase can be toxic to the host cells. This effect was reported for PAK1, which could only be expressed in E. coli in an autoinhibited state. A single point mutation of lysine 299 to arginine forces the activation segment into a conformation in which ATP binding is prevented, leading to a kinase-dead protein that is no longer toxic to the host cells [31]. However, for drug discovery that targets the active site of PAK1 with ATP analogs, a different point mutation has to be introduced: ATP binding should still be possible, but the catalytic activity should be inhibited. In case of expression difficulties, surrogate kinases are of major importance.
[Fig. 15.2-3, spectra not reproduced. Panel titles: Met, 12/12 signals; Ile, 12/13 signals; Leu, 25/26 signals; Val, 17/18 signals; Phe, 11/12 signals. Horizontal axis: δ 1H [ppm].]
Fig. 15.2-3 [1H,15N]-TROSY spectra of different BTK samples with selectively labeled 15N-amino acids: Met (a), Ile (b), Leu (c), Val (d), and Phe (e). The circles in the upper corner represent the 20 possible amino acids. Proline is not considered because it lacks the NMR-detectable amide proton.
Protein kinase B (PKB/Akt) is a validated target in drug design, but the catalytic domain cannot be expressed in E. coli in a functional form. PKA and Akt/PKB, both of which belong to the AGC family (PKA/protein kinase G/PKC) of protein kinases, share a high sequence homology. Distinct point mutations in the active site of PKA (PKAB6 and PKAB8 chimeras) are introduced to enhance their similarity and their corresponding binding profile [32]. Depending on the expression system and fusion protein used, the yield of an expressed kinase can vary by one order of magnitude. Changing the fusion partner from glutathione S-transferase (GST) to an N-terminal His tag for the expression of p38 increases the yield of the recombinant protein by a factor of 8. Another construct optimization effort was carried out for mitogen-activated protein kinase-activated protein kinase-2 (MAPKAP-2). Screening of 20 protein constructs with different N- and C-terminal ends led to an NMR-feasible kinase. In this case, the protein expression yield of the different constructs is of minor importance; the major goal is a properly folded protein with long-term stability during the recording of the NMR spectra. The domain boundary has to be carefully chosen, which is also observed for PKA expression. The A helix (amino acids 16-31) contains an N-myristylation motif and a 50-residue extension at the C-terminus of the catalytic domain. This amino acid stretch with the aromatic FTEF sequence must be included during the NMR investigations because it folds back onto a hydrophobic patch of the N-terminal lobe, thus stabilizing the whole protein construct [33]. As a result, these expression efforts lead to a triple labeled protein kinase sample, providing the basis for the NMR assignment of the specific kinase.
15.2.2.2 Construct and Condition Optimization
The highly conserved protein kinase catalytic domain fold has a size of approximately 40 kDa. This hampers protein NMR investigations through aggravated signal overlap and rapid 1H and 13C transverse relaxation. Deuteration of nonexchangeable protons in combination with 2H decoupling efficiently increases the size limit for solution NMR by eliminating possible relaxation pathways. Additionally, the development of [1H,15N]-TROSY [34] based triple-resonance methods enhances the advantages of deuteration, allowing the sequential resonance assignment of large proteins to be obtained. It is necessary to assign at least the NMR resonances of atoms comprising the protein backbone prior to further site-specific NMR studies. This can be accomplished routinely by using a suite of triple-resonance experiments and uniformly 2H/13C/15N-labeled protein samples [35]. However, the relatively high concentrations (100-600 µM) necessary for assignment require a careful choice of measuring conditions. Only in some cases, the published
conditions for crystallization or testing for enzymatic activity can be directly translated for NMR investigations. However, conditions can be directly optimized by the DOSY (diffusion ordered spectroscopy) NMR experiment, from which diffusion constants can be obtained [36], and by analyzing [1H,15N]-TROSY spectra for a set of conditions. Both methods are powerful diagnostic tools. Usually it is sufficient to analyze a small number of possible constructs at defined buffer conditions. Later the buffer conditions can be optimized for the best construct by a two-dimensional grid search for optimal pH and salt concentration. However, in some cases the optimal conditions for various constructs can be different. It is essential to keep the concentration constant in all samples because TROSY signal intensity and aggregation behavior are themselves concentration dependent. For diffusion measurements, the bipolar LED sequence with water suppression gives the best results [36]. While an absolute comparison between different proteins is generally difficult, the relative diffusion constants for one particular protein at different buffer conditions are good measures of its oligomeric or aggregation state. Furthermore, the concentration dependence of the diffusion constant is a practical indicator of the aggregation tendency. Figure 15.2-4 shows an example of what can be achieved by proper optimization of buffer conditions for MAPKAP-2.
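How relative diffusion constants can be compared across buffer conditions may be sketched as follows. This Python example is a minimal sketch, not the procedure used by the authors: it assumes the simple Stejskal-Tanner attenuation I = I0 exp(-D b) with b = (γ g δ)² (Δ - δ/3), whereas bipolar-gradient sequences use a slightly modified expression. The condition labels, gradient strengths, and diffusion constants are hypothetical and serve only to generate synthetic decays.

```python
import math

GAMMA_1H = 2.6752e8  # proton gyromagnetic ratio in rad s^-1 T^-1


def attenuation_factor(g, delta, big_delta, gamma=GAMMA_1H):
    """Simple Stejskal-Tanner b value in s/m^2 (assumed form, see lead-in)."""
    return (gamma * g * delta) ** 2 * (big_delta - delta / 3.0)


def fit_diffusion_constant(gradients, intensities, delta, big_delta):
    """Fit ln(I) = ln(I0) - D*b by linear least squares; return D in m^2/s."""
    b = [attenuation_factor(g, delta, big_delta) for g in gradients]
    y = [math.log(i) for i in intensities]
    b_mean = sum(b) / len(b)
    y_mean = sum(y) / len(y)
    sxy = sum((bi - b_mean) * (yi - y_mean) for bi, yi in zip(b, y))
    sxx = sum((bi - b_mean) ** 2 for bi in b)
    return -sxy / sxx


def rank_conditions(decays, gradients, delta, big_delta):
    """Return (condition, D) pairs sorted from fastest to slowest diffusion."""
    fitted = {
        name: fit_diffusion_constant(gradients, values, delta, big_delta)
        for name, values in decays.items()
    }
    return sorted(fitted.items(), key=lambda item: item[1], reverse=True)


def synthetic_decay(d, gradients, delta, big_delta):
    """Noise-free decay for a given D, used here instead of measured data."""
    return [math.exp(-d * attenuation_factor(g, delta, big_delta)) for g in gradients]


GRADIENTS = [0.05 * k for k in range(1, 11)]  # gradient strengths in T/m
DELTA, BIG_DELTA = 4e-3, 0.1                  # gradient length and diffusion delay in s

# Hypothetical conditions with hypothetical diffusion constants.
decays = {
    "pH 6.5, 50 mM NaCl": synthetic_decay(1.0e-10, GRADIENTS, DELTA, BIG_DELTA),
    "pH 7.5, 300 mM NaCl": synthetic_decay(0.6e-10, GRADIENTS, DELTA, BIG_DELTA),
}
ranking = rank_conditions(decays, GRADIENTS, DELTA, BIG_DELTA)
for name, d in ranking:
    print(f"{name}: D = {d:.2e} m^2/s")
```

For one protein at constant concentration, the condition with the largest fitted diffusion constant is the likeliest to be the least aggregated, provided the buffers do not differ much in viscosity.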
NMR conditions should follow the usual guidelines for NMR samples [37]. Ample buffering is particularly important, preferably at low pH and at a sufficient distance from the isoelectric point. To improve the signal-to-noise ratio, the highest possible concentration should be used without running into aggregation problems. Since one of the fundamental requirements for crystallization is good solution properties, this type of screen can also be used to assess suitable solution conditions for crystallization trials. The application of the described workflow led to the successful expression and purification of several kinases, which are validated targets in drug discovery. Figure 15.2-5 shows a gallery of the corresponding spectra, which were recorded on the uniformly 15N-labeled proteins in our group.
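Why the highest possible concentration is worth pursuing can be made concrete with a rough estimate. The sketch below assumes that the signal-to-noise ratio scales linearly with concentration and with the square root of the number of scans, and that line widths do not change between samples (no aggregation). The function name is hypothetical; the two concentrations are the limits of the 100-600 µM range mentioned above.

```python
def relative_measurement_time(reference_conc_uM, new_conc_uM):
    """Factor by which measurement time changes to keep the same S/N.

    Assumes S/N ~ concentration * sqrt(number of scans), so the time
    needed scales with the inverse square of the concentration.
    """
    if reference_conc_uM <= 0 or new_conc_uM <= 0:
        raise ValueError("concentrations must be positive")
    return (reference_conc_uM / new_conc_uM) ** 2


# Going from a 600 uM to a 100 uM sample under these assumptions:
factor = relative_measurement_time(600.0, 100.0)
print(f"measurement time increases {factor:.0f}-fold")
```

Under these assumptions, a sixfold dilution costs a 36-fold longer measurement, which is why aggregation-free conditions at high concentration are sought.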
15.2.2.3 NMR Resonance Assignment
The NMR resonance assignment is possible in a standard fashion using a set of triple-resonance experiments (e.g., HNCO, HN(CA)CO, HNCA, HNCACB, HN(CO)CACB) [35]. The use of uniformly 2H/13C/15N-labeled protein samples and a set of TROSY-type versions of these experiments is indispensable. High-field instruments (800 MHz or higher) and cryogenic probeheads contribute significantly to enhanced sensitivity. Figure 15.2-6 shows the amide region of a [1H,15N]-TROSY spectrum of uniformly 15N-labeled PKA catalytic domain with construct, buffer, and expression essentially taken over from published X-ray crystallization studies. The spectrum obtained on an 800 MHz NMR spectrometer is well resolved and demonstrates that NMR studies on protein kinases are viable. To benefit from the enhanced sensitivity of proton detection, 1H nuclei are required at exchangeable sites such as the amide protons or the Cα protons. Deuterated protein samples are usually prepared from host cells grown in D2O-based media, which contain deuterated carbon sources. Subsequent D/H exchange in the H2O-based NMR buffer reintroduces protons at the labile positions. However, large proteins usually yield fewer peaks in the 1H-15N correlation than expected from the primary sequence. One explanation is a high deuteration level of nonexchangeable amide sites in the hydrophobic core of large proteins. This problem can be minimized, but not abolished, if the protein is overexpressed in perdeuterated media powder that has been dissolved in H2O [38]. Even then, protein resonances are missing in the HN-based triple-resonance spectra. The absence of resonances presumably has more fundamental reasons. Possible reasons are fast proton exchange rates with the solvent or excessive line broadening caused by intramolecular motions. The dynamic properties of a particular segment of mitogen-activated protein (MAP) kinase p38α are discussed in more detail in Chapter 15.2.4.2.
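The number of peaks "expected from the primary sequence" can be counted directly. The following sketch assumes that every residue except proline contributes one backbone amide peak and that the N-terminal residue is not observed; both the function name and the toy sequence are hypothetical. Restricting the count to one amino acid type gives the reference value for a selectively labeled sample.

```python
def expected_amide_peaks(sequence, labeled_types=None):
    """Count backbone amide peaks expected in a 1H-15N correlation.

    Prolines are skipped because they lack an amide proton. The first
    residue is skipped on the assumption that its amino group is not
    observed. If labeled_types is given, only those residue types count.
    """
    count = 0
    for index, residue in enumerate(sequence.upper()):
        if index == 0 or residue == "P":
            continue
        if labeled_types is not None and residue not in labeled_types:
            continue
        count += 1
    return count


toy_sequence = "MALLPKMVFP"  # hypothetical sequence, for illustration only

print(expected_amide_peaks(toy_sequence))          # uniformly labeled sample
print(expected_amide_peaks(toy_sequence, {"L"}))   # Leu-selective labeling
```

Comparing the observed peak count with this value gives a quick measure of completeness, in the spirit of the "observed/expected signals" annotation in Fig. 15.2-3.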
In any case, the absence of peaks complicates sequential resonance assignment. As compensation, additional information beyond elaborate NMR pulse
Fig. 15.2-5 [1H,15N]-TROSY spectra of the catalytic domains of various protein kinases. For p38, PKA and PKC assignments of the correlation peaks are available.
Fig. 15.2-6 [1H,15N]-TROSY spectrum of active murine protein kinase A (PKA) with the annotated assignment.
sequences has to be included. There are three practical tools that are used to enable the assignment of the protein: (a) the chemical shift matching procedure, (b) the use of paramagnetic spin labels, and (c) the use of amino acid-selective labeled samples as described previously in Sections 15.2.1 and 15.2.2 [33].
15.2.2.3.1 The Chemical Shift Matching Procedure
The standard set of triple-resonance experiments needed for the sequential assignment complements each 15N-1H correlation resonance with a set of up to six intra- and interresidual cross peaks (Fig. 15.2-7(a)). The assignment process can be divided into two steps. The first step is the search for resonances of amino acids that are identified as neighbors on the basis of cross peaks with identical carbonyl or Cα/Cβ carbon chemical shifts. An ensemble of several consecutive cross peaks is called a stretch. Assumptions about the type of the amino acid can be made from the chemical shifts of the corresponding Cα and Cβ carbons. The second step is the unique positioning of a stretch onto the primary sequence. This process is called matching. It is possible that the stretch is so long that the consecutive amino acid types occur only once within the sequence. In practice, the incompleteness of observable resonances for a given protein implies that most of the stretches are comparatively short. On the other hand, the number of possible matching positions grows with increasing size of a protein. The MAPPER procedure [39] has been suggested to alleviate this problem. This procedure is based upon the average chemical shifts of each amino acid type and their standard deviations. It ranks possible assignments according to their probabilities based on chemical shift statistics. However, the NMR investigation of protein kinases is made easier if the X-ray structure of the particular kinase is available. For a protein with known structure it is possible to predict chemical shifts quite accurately [40]. The chemical shift matching procedure uses these predicted chemical shift values instead of a statistical input of normally distributed standard resonances. This approach was tested on proteins with known assignments to investigate its success rate and unambiguity.
Three cases are to be distinguished: (a) the correct position is found as a unique solution, (b) the correct position is found along with other solutions (i.e., minima within the fold of the global minimum), and (c) the correct position is not found. To be useful for the sequential assignment, case (a) is favored and case (c) is to be avoided. Figure 15.2-7(b) exemplifies the matching procedure for a given stretch of four amino acids. As expected, the assignment becomes less ambiguous with increasing number of residues within a given stretch. However, even for the chemical shift data belonging to one single 15N-1H correlation, the correct and unambiguous solution is found for nearly 50% of the amino acids. These data include carbon chemical shifts for two neighboring amino acids. For stretches of four amino acids, the solution is correct and unambiguous in 97% of the cases. Such stretches are difficult to find, particularly for larger proteins, but easier to match on the basis of qualitative chemical shift arguments. The strength of the matching procedure is more obvious for short stretches, which are easy to find but difficult to locate. Even for sparse data, the correct solution is found in approximately 90% or more of all proteins under investigation. For more than 60% of all two-residue stretches, the correct solution is observed unambiguously.
[Fig. 15.2-7, plots not reproduced. (a) Strips labeled Ala93, Arg94, Ser95, and Leu96. (b) Three diagrams for the stretches 93-94, 93-95, and 93-96, plotted against residue number.]
Fig. 15.2-7 The shift match procedure can be divided into two steps: the identification of resonances of subsequent residues and the calculation of the root mean square difference (RMSD) in calculated and predicted chemical shifts. (a) Strips from the three-dimensional NMR spectra HN(CO)CACB and HNCACB. Through the matching of Cα and Cβ chemical shifts the neighboring amino acids can be concatenated to a "stretch". The positioning of this stretch onto the primary sequence, putative Ala93 to Leu96, is ambiguous at this stage. (b) In each of these diagrams, at the bottom, the RMSD is shown as a function of the assumed start residue in the protein sequence. Low RMSD values indicate possible correct start residues. The diagram shows the alignment of the resonances of two residues, 93-94 (left), three residues, 93-95 (middle), and four residues, 93-96 (right). The alignment of two residues leads to three different possible solutions (circles). After the identification of three or four subsequent resonance sets, the correct alignment is the unique solution of the shift matching process.
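The matching step described in the caption, an RMSD computed for every assumed start residue, can be sketched as a sliding-window comparison. The Python example below is an illustration under stated assumptions and not the published implementation: the function names are hypothetical, all shift values are invented toy numbers, and an absolute RMSD tolerance is used as the acceptance criterion.

```python
import math


def stretch_rmsd(observed, predicted, start):
    """RMSD between a stretch and the predicted shifts from index start."""
    squared = []
    for offset, shifts in enumerate(observed):
        reference = predicted[start + offset]
        for nucleus, value in shifts.items():
            if nucleus in reference:  # compare only nuclei present in both
                squared.append((value - reference[nucleus]) ** 2)
    if not squared:
        return math.inf
    return math.sqrt(sum(squared) / len(squared))


def match_stretch(observed, predicted, tolerance):
    """Return (start residue, RMSD) pairs below tolerance, best first.

    Start residues are 1-based positions in the predicted list.
    """
    hits = []
    for start in range(len(predicted) - len(observed) + 1):
        rmsd = stretch_rmsd(observed, predicted, start)
        if rmsd <= tolerance:
            hits.append((start + 1, rmsd))
    return sorted(hits, key=lambda hit: hit[1])


# Invented predicted shifts (ppm) for a six-residue toy protein.
predicted = [
    {"CA": 52.5, "CB": 19.0},
    {"CA": 56.0, "CB": 30.5},
    {"CA": 58.5, "CB": 64.0},
    {"CA": 52.4, "CB": 19.2},
    {"CA": 56.2, "CB": 30.3},
    {"CA": 55.0, "CB": 42.5},
]
two_residues = [{"CA": 52.45, "CB": 19.1}, {"CA": 56.1, "CB": 30.4}]
three_residues = two_residues + [{"CA": 55.1, "CB": 42.4}]

print(match_stretch(two_residues, predicted, tolerance=0.5))    # two candidates
print(match_stretch(three_residues, predicted, tolerance=0.5))  # one candidate
```

In this toy case the two-residue stretch fits two start positions, while adding a third residue leaves a single solution, which mirrors the behavior shown in panel (b).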
In nearly all cases, the solution becomes unique upon the addition of a third amino acid.
15.2.2.3.2
The Use of Paramagnetic Spin Labels
Unpaired electrons cause faster relaxation of all neighboring nuclear resonances in a distance-dependent manner. Functional groups with unpaired electrons can be introduced by chemical modification of known inhibitors [41]. It has been demonstrated that these effects allow the detection and the structural characterization of protein-ligand interactions relative to the location of the paramagnetic center within the structure [42, 43]. The reverse case, the identification of neighboring protein resonances within a published protein-inhibitor structure, can be used as a tool for the assignment of signals. The quantification of this effect can be utilized to deduce distances between isotopically labeled protein residues and the paramagnetic center [44]. Two types of effects are caused by the paramagnetic center: line broadening due to increased relaxation rates, and chemical shift changes due to contact or pseudocontact shifts. As an alternative to chemically modified inhibitors, short polypeptide extensions that contain binding sites for trivalent metal ions can be appended to the primary sequence of the target protein. On loading with paramagnetic lanthanide ions, which bind with high affinity to the binding tags, chemical shifts of the neighboring residues are perturbed. Moreover, bound lanthanide ions induce residual dipolar couplings (RDCs) because the unpaired electron restricts the molecular tumbling by a weak alignment in the static magnetic field. This provides an additional tool for the determination of protein structures by solution NMR [45, 46]. However, for assignment (particularly of a large protein with overlapping signals) it is desirable to keep chemical shift changes to a minimum. Therefore, paramagnetic agents that induce a purely relaxing effect are best suited for assignment purposes. Metal ions (e.g., Mn2+) can bind nonspecifically to many proteins. Spin-labeled ligands that bind to well-defined sites in the protein are better suited to help the assignment procedure. For the Abelson kinase, spin-labeled Gleevec (Fig. 15.2-8(a)) has been reported previously as the ligand [44]. Because they exhibit a high affinity, such ligands cause additional problems due to the chemical shift perturbations (CSPs) of the protein resonances. This can be circumvented by a weakly binding adenosine derivative (Fig. 15.2-8(b)). The position of the paramagnetic center can be inferred with sufficient accuracy from kinase-ATP complex structures, which are publicly available. By superimposing the spin-labeled derivative over the adenosine moiety in the X-ray complex, the position of the ATP β-phosphorus atom can serve as a good approximation for the paramagnetic center. For apo and/or inactive kinases where no complex structure is available, the position can be inferred from a molecular docking approach on the basis of the known binding mode of adenosine to the hinge region of a kinase [33].
Fig. 15.2-8 Chemical structures of (a) the spin-labeled analog of STI571 (Gleevec; imatinib), (b) spin-labeled adenosine, (c) SB203580 (DFG-in inhibitor), and (d) BIRB796 (DFG-out inhibitor).
Although this approach works in principle with uniformly 15N-labeled protein, it is more favorable in combination with selective amino acid labeling. First, in the absence of a spin-labeled inhibitor, the resonances of the specific amino acid type are identified by comparison with the spectra recorded from uniformly labeled kinases. The number of peaks is reduced and therefore more manageable when the spin-labeled inhibitor is added in a second step. Especially in regions with severe signal overlap, the selective labeling technique clearly separates the peaks and therefore enables the quantification of the induced peak attenuation. Figure 15.2-9 shows an example of the selectively 15N-Met labeled kinase BTK, which was expressed in baculovirus-infected insect cells. For this kinase no assignment is available so far. The [1H,15N]-TROSY spectrum shows 12 peaks corresponding to the 12 methionines of the primary sequence (Fig. 15.2-9(a)). Upon adding the spin-labeled adenosine, four peaks are strongly attenuated, corresponding to the four methionines that are closest to the spin label. This information is very valuable for the assignment process.
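The distance dependence of the attenuation can be made concrete with a short calculation. The sketch below assumes the usual r^-6 dependence of dipolar paramagnetic relaxation enhancement and identical dynamics at all sites, and it uses a few of the methionine distances listed in Fig. 15.2-9(c); it gives relative values only and is not a substitute for a quantitative relaxation analysis.

```python
# Distances (in angstrom) of selected BTK methionines to the spin label,
# as listed in Fig. 15.2-9(c).
DISTANCES = {"Met431": 12.8, "Met449": 13.6, "Met509": 14.9, "Met437": 16.2, "Met489": 25.1}

def relative_pre(distances):
    """Relaxation enhancement of each residue relative to the closest one,
    assuming a pure r**-6 dependence (an idealization)."""
    r_min = min(distances.values())
    return {name: (r_min / r) ** 6 for name, r in distances.items()}

for name, value in sorted(relative_pre(DISTANCES).items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.3f}")
```

Under this assumption a residue at about twice the distance of the closest methionine experiences less than 2% of its relaxation enhancement, which is why only the nearest methionines are strongly attenuated.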
15.2.2.4
Protein-based Results of NMR Investigations on Kinases
15.2.2.4.1
Extent of Assignment
The assignments for two protein kinases have been published recently: for active murine PKA [33] and for the inactive human MAP kinase p38α [47]. Both kinases have been studied by numerous other biophysical methods and
Fig. 15.2-9 (a) [1H,15N]-TROSY spectra of the selectively 15N-Met labeled kinase BTK (black spectrum) showing 12 peaks corresponding to the 12 methionines in the primary sequence. Upon addition of spin-labeled adenosine, the peak intensities are attenuated according to the distance of the amino acid to the paramagnetic center. The percentage of the residual peak intensity is denoted by the peaks (light gray spectrum). Four peaks, which are strongly attenuated, are marked with an asterisk. (b) Ribbon presentation of the structure of BTK. Methionines are depicted as balls and the spin-labeled adenosine is shown as sticks with the unpaired electron marked as a star. (c) Table of the distances of the methionines to the spin-labeled adenosine. Four methionines are in closer distance to the spin-labeled adenosine, marked with an asterisk.

Residue   Distance [Å]     Residue   Distance [Å]
MET431    12.8 *           MET501    21.3
MET437    16.2             MET509    14.9 *
MET449    13.6 *           MET570    18.9
MET450    14.9             MET587    17.1
MET477    16.5             MET596    18.9
MET489    25.1             MET630    23.3
thus are well-characterized proteins. A wealth of structural and functional data is available to be compared with the NMR results. As pointed out above, there are commonly fewer peaks in the spectra recorded from large proteins than expected from their primary sequence. For the p38 MAP kinase only three quarters of the theoretically observable HN peaks could be detected in the [1H,15N]-TROSY spectrum and, therefore, the number of assigned peaks is comparably lower. Even the samples with 15N selectively labeled amino acids yield fewer than the expected number of peaks, indicating that the disappearance of signals is not due to overlapping resonances. Detailed statistics for p38 MAP kinase are given in Table 15.2-1. For PKA even fewer peaks could be observed and assigned. Figure 15.2-10 depicts the extent of both assignments mapped on the crystal structures of each kinase.

Table 15.2-1 Statistics of amino acids, observable and assigned peaks in the [1H,15N]-TROSY spectra of the kinase protein p38

Selectively labeled construct   Asp  Ile  Leu  Met  Phe  Tyr  Val  Total
Number of amino acids            27   22   42   10   13   15   22  345 (a)
Number of observable peaks       20   15   37    9   12   12   21  261 (76%)
Number of assigned peaks         15   15   22    5   12    9   19  167 (64%)

(a) The total number of amino acids without prolines, which in principle do not show a correlation peak due to the lack of an amide proton.
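The percentages in Table 15.2-1 are easiest to interpret when the reference quantity is made explicit. The following short check, using the counts as read from the table, suggests that the 76% refers to the non-proline residues and the 64% to the observable peaks; the per-residue fractions are computed here for illustration and are not given in the table.

```python
# Counts as read from Table 15.2-1 (p38 MAP kinase).
RESIDUES = ["Asp", "Ile", "Leu", "Met", "Phe", "Tyr", "Val"]
OBSERVABLE = [20, 15, 37, 9, 12, 12, 21]
ASSIGNED = [15, 15, 22, 5, 12, 9, 19]
TOTAL_NON_PRO, TOTAL_OBSERVABLE, TOTAL_ASSIGNED = 345, 261, 167

def percent(part, whole):
    """Rounded percentage of part relative to whole."""
    return round(100.0 * part / whole)

observable_pct = percent(TOTAL_OBSERVABLE, TOTAL_NON_PRO)  # relative to non-proline residues
assigned_pct = percent(TOTAL_ASSIGNED, TOTAL_OBSERVABLE)   # relative to observable peaks
per_residue = {r: percent(a, o) for r, a, o in zip(RESIDUES, ASSIGNED, OBSERVABLE)}
print(observable_pct, assigned_pct, per_residue)
```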
Fig. 15.2-10 Ribbon representation of the protein kinase PKA (a) and p38 MAP kinase (b) showing the N-lobe, the C-lobe, and the ATP-binding site. In both proteins the assigned regions marked in yellow are the more surface exposed regions. (c) Statistics of the assigned and unassigned peaks in the [1H,15N]-TROSY spectra.
In both proteins, the assigned regions are the more surface-exposed regions. The N- and C-terminal sequences and also the β-sheet N-lobe are almost entirely assigned. On the other hand, the C-helix, the catalytic loop, and parts of the activation segment remain unassigned in both proteins. These unassigned regions are solvent inaccessible in the tertiary structure and form a contiguous patch. However, the distribution of assigned versus unassigned regions of both proteins (see Fig. 15.2-11) is different in many regions of the C-lobe. It can be speculated that this observation indicates that the dynamics in the C-lobe, or in the activation segment, are different in the two kinases, which could correspond to the different functionality of these two proteins. It is documented for the crystal structures of inactive human CDK2 and the partially activated human CDK2-cyclin A complex that large conformational changes of the activation segment occur [48]. Comparing the position of the activation segment in the structures of twitchin kinase, IRK, calmodulin-dependent kinase I (CaMKI), and MAPK reveals a variety of conformations that are accessible to different kinases in their inactive state [15, 16, 49, 50]. The survey over the static crystal structures provides clues to the conformational malleability of particular regions of the protein kinases, as they move through the catalytic cycle while various substrates, inhibitors, and scaffold proteins participate. It can be presumed that the mentioned regions have residual mobility even in the absence of any ligands. These local segmental motions could happen on a timescale that is unfavorable for conventional detection by solution NMR; as a consequence, resonances vanish because of excessive line broadening. Fluorescence resonance energy transfer (FRET) measurements support this hypothesis. One cysteine at the N-terminal lobe of PKA was labeled with a fluorescent probe acting as an acceptor. The fluorescent donor was anchored at the opposing lobe and the observed intramolecular anisotropy decay revealed that the apoenzyme is likely to be highly dynamic [51].

Fig. 15.2-11 Sequence alignment of the protein kinases PKA and p38. Assigned amino acids (black), prolines (gray), and unassigned amino acids (white) are mapped onto the sequence.
15.2.2.4.2
Addressing the Activation Status by NMR
Activation and substrate binding of kinases are accompanied by concerted motions between the two domains of the catalytic core. These motions influence the relative orientation of the two lobes with respect to each other. In particular, the activation of PKA is triggered by phosphorylation of the pivotal residue Thr197 in the activation segment, which contributes a stabilizing ionic interaction with the conserved Arg165 preceding the catalytic aspartate. The crystal structure of CDK2, partially activated upon the binding of cyclin A, shows that the helix comprising the PSTAIRE residues is also involved in the long-range nature of the structural rearrangement [48]. Notably, the corresponding helix C in PKA has not been assigned because of the lack of correlation peaks, which indicates the dynamic character of this structural region. The length of the activation segment varies by up to 10 amino acids among protein kinases. The variability may allow a kinase to be constitutively active (e.g., phosphorylase kinase (PhK) possesses a glutamate residue at the conserved position [52]), or it may allow control by autophosphorylation, if the segment has a sequence corresponding to the specificity of the kinase itself (e.g., PKA). Alternatively, the specificity attracts other protein kinases, which function as part of the signal cascade. The complex rearrangements caused by the activation/inactivation are also easily detected by NMR spectroscopy since they result in large CSPs in [1H,15N]-TROSY spectra. As an example, PKA possesses four phosphorylation sites. Phosphorylation of the autophosphorylation site Thr197 is sufficient to achieve full activation, whereas the function of the other phosphorylation sites is presently unclear. Since PKA possesses a low basal activity, the suppression of the autophosphorylation reaction can only be achieved by introducing mutations that remove the phosphorylation site.
Figure 15.2-12 shows the spectrum of the resulting constitutively inactive mutant T197A [32]. Large CSPs as compared to wild-type PKA document the inactivation of the protein. By contrast, the mutation of the other possible phosphorylation sites, as exemplified by the
Fig. 15.2-12 (a) Overlay of a section of the [1H,15N]-TROSY spectra of wild-type protein kinase A (PKA) (black) and mutant T197A (red). The mutant T197A lacks the autophosphorylation site Thr197 and is therefore constitutively inactive. The conformational rearrangement between the active state of wild-type PKA and the inactive mutant T197A is proven by the large CSPs shown in the overlay. (b) The mutation of the other phosphorylation site Ser338 to an alanine does not cause conformational changes since no CSPs could be observed in the spectrum.
mutant S338A, leads to much smaller CSPs. Finding constitutively active mutants is an alternative, but care has to be taken: a single point mutation of the corresponding threonine to an acidic residue is successful in only a few cases. The implications have been investigated in more detail for MAPK and PKA, respectively [32, 53]. Changing the activation status in vitro can be a tedious undertaking, since the reaction by incubation with the activating kinase or the inactivating phosphatase, respectively, tends to be incomplete. Coexpression of a kinase together with its activating predecessor in the signal cascade usually leads to a defined activation status. For example, p38 MAP kinase was expressed in a dual construct with the activating kinase MKK6, which itself had to be activated by point mutation [54]. Comparing the recorded [1H,15N]-TROSY spectrum of a uniformly 15N-labeled sample with that of the inactive p38 MAP kinase provided the proof for the long-range nature of interactions that rearrange the particular protein loops at the lip of the catalytic cleft. Biomolecular NMR can monitor the success of such experiments as a complementary technique to the measurement of the enzymatic activity of the kinase. Mapping the observed CSPs on the crystal structure of the corresponding kinase illustrates atomic details of the activation itself.
Activation can also be monitored through ATP-binding studies. Measurement of ATP binding, for example, by the highly sensitive 1H STD (saturation transfer difference) NMR, needs small amounts of protein and does not rely on 15N-labeled protein. However, care should be taken since in many cases protein kinases possess a very weak affinity for ATP that is linked to their low basal activity. Although these affinities are very low (millimolar KD), STD NMR can be sensitive enough to indicate binding that is easily misinterpreted in terms of phosphotransferase ability. Direct observation of 31P NMR signals seems to be a more straightforward approach to monitor protein phosphorylation. It has the additional advantage that the phosphate groups in different phosphorylation sites can be discriminated and characterized in terms of function and dynamics [55]. However, 31P NMR cannot characterize alternative phosphorylation-independent activation mechanisms. The direct observation of 31P NMR signals is inherently less sensitive than the observation of proton signals and can thus easily fail to detect low phosphorylation levels.
15.2.2.4.3
Studying the Dynamic Behavior of Kinases
Complex motions that are essential for the biochemical function are typical for the entire class of protein kinases. A thorough understanding of protein dynamical processes can provide novel points of attack for pharmaceutical applications. X-ray structural analyses of protein kinases provide snapshot pictures of the protein at different stages of their conformational cycle. For PKA, which is the best studied kinase in this respect, many different crystal structures provide a thorough understanding of the motions [56, 57]. These data can help to understand activation processes in other kinases too. However, this picture is inherently incomplete as it relies upon the availability of structural data that cover the whole motion. Models based on analogy arguments have to be tested further to determine whether motions inferred for PKA can in fact be transferred to other kinases. Solution-state NMR can act as a complement in providing a dynamic picture that links the static structures obtained by X-ray crystallography. Two NMR methods can provide information about conformational rearrangements of a protein at atomic resolution. NMR relaxation measurements yield information on the timescale of a process, whereas RDCs can characterize the spatial nature of such a motion. 15N and 2H relaxation can be employed to detect fluctuations of backbone dynamics of protein kinases on the pico- to nanosecond and micro- to millisecond timescales. Analysis of the relaxation data allows for a semiquantitative estimation of the conformational entropy change for the main chain of protein kinases dependent on ligand binding or point mutation. RDCs can be measured in weakly aligning media such as phage solutions, bicelles or liquid crystals. The protein is restricted in its free tumbling by the
media. The orientation-dependent dipolar coupling between nuclear spins, which is averaged to zero in free solution, becomes measurable without losing the advantages of solution-state NMR. RDCs contain information about the orientation of a nuclear spin pair with respect to an alignment tensor that is dependent on the way the protein rotation is hindered. Notably, RDCs can be useful for the assignment of a protein with known tertiary structure. The knowledge of an orientation can be used as an additional tool to distinguish between several possible assignments. But the main contribution is the ability to determine relative orientations of protein domains or, as in the case of protein kinases, to distinguish between the more rigid and the flexible segments.
15.2.2.4.4
Binding Mechanisms by Lineshape Analysis
The elucidation of different mechanisms of ligand binding can lead to a deep understanding of specificity. The lineshapes of NMR signals include information about the kinetics of processes on the micro- to millisecond timescale [58, 59]. During the titration of a kinase with a ligand, the lineshape changes differently for different underlying reaction mechanisms [60]. Figure 15.2-13 shows the expected lineshape of a nucleus that changes its resonance frequency during the binding event. Typical lineshapes are revealed when two states exist in equilibrium: protein/ligand complexes and unbound molecules. At the beginning and at the endpoint of the titration, a sharp peak is expected at the resonance frequency of the free protein and of the complex, respectively (Fig. 15.2-13(a)). During the titration, the lineshape changes continuously to broad peaks at intermediate resonance frequencies. The mechanism showing this titration behavior corresponds to a classical lock-and-key (key/bolt) principle. The ligand fits into the pocket of the protein from the outset, without a conformational change in the neighborhood of the observed nucleus. In Fig. 15.2-13(b) it is presumed that the ligand first binds to the protein and an intermediate state is formed, which then reacts to the complex. A third peak occurs during the titration, which gives evidence for the existence of the intermediate state. A titration as depicted in Fig. 15.2-13(c) would indicate an induced fit mechanism. The ligand induces a conformational change of the protein (or the free protein already exists in two forms) and only in this induced state can the protein react with the ligand to form the complex. The peak of the "activated" protein initially occurs during the titration and disappears with increasing ligand concentration. The successful interpretation of lineshape titrations in terms of reaction mechanisms has already been demonstrated on SH2 domains [61, 62].
The reaction mechanism can be obtained independently for different areas of the protein and the consistency of the interpretations can be checked by mapping the results of each nucleus on the structure of the protein. With such an amount of data redundancy, ligand binding mechanisms can also be elucidated for catalytic domains of kinases. A relevant interpretation for a drug discovery process is
Fig. 15.2-13 Examples of simulated lineshapes of titration curves, which indicate different binding mechanisms. The color of the curves changes from blue to red with increasing ligand concentration, while the protein concentration is kept constant. (a) The model P + L ⇌ PL describes a titration curve of a small ligand binding, for example, to the hinge of the ATP-binding site. The lineshape changes from a sharp peak of the particular amide resonance in the free protein state to another sharp peak at the resonance of the bound state. The intermediate curves show broader peaks with maxima between the Larmor frequencies of the two states. This titration would give no evidence for a reaction mechanism that is more complicated than a simple key/bolt (lock-and-key) mechanism. (b) The model P + L ⇌ PL* ⇌ PL shows a possible titration curve of a ligand that does not initially fit into the ATP-binding pocket. The ligand binds to the protein, building an intermediate state. This intermediate state is in a conformational equilibrium with the complex. The titration ends with two separate peaks. This ensemble of titration curves is incompatible with the key/bolt mechanism described in (a). (c) The model P + L ⇌ P* + L ⇌ PL depicts possible titration curves of a weak binder to an alternative conformation of the kinase (e.g., the putative DFG-in/DFG-out). A conformational equilibrium already exists in the free form. The ligand binds only to one of the conformations. The titration starts with two peaks. One of them constantly decreases while the other signal broadens and then arises in a new sharp peak.
even possible in cases where a resonance appears or disappears upon ligand binding, as shown in an example in Chapter 15.2.4.2.
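The simplest case discussed above, the two-state model P + L ⇌ PL, can be simulated with the standard two-site exchange (Bloch-McConnell) lineshape. The sketch below is only an illustration under stated assumptions: equal transverse relaxation rates in both states, a single observed nucleus, and hypothetical values for the frequencies, rates, and concentrations. The function names are introduced here for illustration.

```python
import math

def bound_fraction(p_total, l_total, kd):
    """Fraction of protein in the complex for P + L <-> PL (all in the same units)."""
    s = p_total + l_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0
    return complex_conc / p_total

def two_site_intensity(nu, nu_free, nu_bound, p_bound, k_off, r2=10.0):
    """Absorption-mode intensity at frequency nu (Hz) for two-site exchange.
    Requires 0 <= p_bound < 1; rates k_off and r2 are in 1/s."""
    p_free = 1.0 - p_bound
    k_fb = k_off * p_bound / p_free  # free -> bound rate from detailed balance
    w = 2.0 * math.pi * nu
    a = complex(r2 + k_fb, w - 2.0 * math.pi * nu_free)
    d = complex(r2 + k_off, w - 2.0 * math.pi * nu_bound)
    det = a * d - k_fb * k_off
    numerator = d * p_free + k_off * p_bound + k_fb * p_free + a * p_bound
    return (numerator / det).real

def titration_spectra(ligand_points, p_total=0.1, kd=0.1, k_off=50.0):
    """Spectra from -50 to 150 Hz for each total ligand concentration (hypothetical values)."""
    spectra = {}
    for l_total in ligand_points:
        pb = bound_fraction(p_total, l_total, kd)
        spectra[l_total] = [two_site_intensity(nu, 0.0, 100.0, pb, k_off)
                            for nu in range(-50, 151)]
    return spectra

spectra = titration_spectra([0.0, 0.05, 0.1, 0.5, 5.0])
```

With a slow off-rate the two peaks of the free and bound states coexist during the titration, whereas a fast off-rate gives a single peak that moves between the two frequencies; intermediate rates produce the broad lines described in the text.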
15.2.3
Screening of Kinases by NMR
15.2.3.1
Screening Techniques/Strategies
Fragment-based screening is a lead discovery approach that serves as an alternative to HTS-based techniques [63, 64]. Compounds of much lower molecular weight (150-300 Da) are screened than in HTS campaigns. Fragment-based hits are typically weak inhibitors (10 µM to mM), and therefore need to be screened at higher concentrations using sensitive biophysical detection techniques such as protein crystallography and NMR. For high concentration bioassays, SPR, and
conventional HTS, the interpretation of data can often be hindered because of the high number of false-positive readouts. X-ray crystallography and NMR spectroscopy provide robust and straightforward information, but the typical throughput of up to 10000 compounds per screen is to be regarded as medium size. This drawback is compensated by higher hit rates than observed with HTS, because compounds of lower complexity have a higher probability of matching a target protein-binding site. Moreover, fragment hits typically possess high efficiency upon binding (binding energy per unit molecular mass). If the binding interactions of these fragment hits can be structurally validated, they are highly suitable for subsequent chemical optimization into clinical candidates with good druglike properties [25]. Especially after an initial HTS screen fails to produce viable hits, pharmaceutical research seeks to expand its lead identification strategies. Therefore, NMR-based screening is gaining momentum relative to HTS-based techniques [65-68]. Unlike enzymatic assays, NMR or X-ray crystallography screens do not require enzymatic activity but measure the binding effect itself. Advantageously, an inactive kinase can be targeted by the screening efforts. Additionally, both methods can monitor binding sites other than the active site. The general advantage of an NMR-based approach is that it is a solution-state technique, which facilitates the handling. The folded state is discriminated from the unfolded state of the target protein in each single experiment. There is no need for immobilization (as in SPR) or crystallization (as in X-ray screening) of the protein. NMR-based screening methods can be classified into two groups according to the source of the observed signals: ligand-detected NMR and protein-detected NMR. Ligand-detected NMR is a robust method, which is well suited to screening compound mixtures with rapid deconvolution [69]. The target protein does not have to be labeled and its size is typically much more than 20 kDa.
The protein production requirements are considered to be moderate. If the screen is performed in the presence of a competitor with known interaction mode, active site versus nonactive site binders can be distinguished and binding affinities can be derived. Protein-detected NMR, as exemplified by the patented "SAR (structure-activity relationship) by NMR" approach [70], provides the principal interactions between ligand and protein, if a backbone assignment is available. As with ligand-detected NMR, compound mixtures can be screened, but the deconvolution to identify the real hit needs additional experiments. Usually, the 15N CSPs of the protein amide resonances, which are caused by ligand binding, are observed. If the three-dimensional structure of the target protein is available, direct information about the binding interactions is extracted at residue-specific resolution. In combination with a follow-up investigation using NMR-restrained molecular docking, the binding mode of the viable hits can be derived at atomic resolution [71]. At least 15N-labeling is required, which increases the demands for the protein production and pushes the achievable size limit to 40-50 kDa. An investment in a cryo-probehead reduces the required protein amount by at least fourfold. The synergy of ligand- and protein-based NMR screening is revealed in their combination. Ligand-detected NMR is used as a primary screen in large-scale sampling; hit validation is performed with protein-based NMR with far fewer samples. False positives obtained from the primary screen are ruled out, and subsequent analysis during the validation step increases the knowledge gained about the desired interaction mode of the prestage drug candidates.
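The binding efficiency mentioned above for fragment hits (binding energy per unit molecular mass) can be illustrated with a short calculation. The sketch assumes the standard relation between the dissociation constant and the binding free energy at a 1 M standard state; the two compounds and their affinities are hypothetical examples chosen within the ranges quoted in the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)

def binding_free_energy(kd_molar, temperature=298.0):
    """Standard binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)

def efficiency_per_dalton(kd_molar, mol_weight):
    """Binding energy per unit molecular mass in kcal/(mol Da); larger is better."""
    return -binding_free_energy(kd_molar) / mol_weight

fragment = efficiency_per_dalton(1e-3, 200.0)  # hypothetical 200 Da fragment, KD = 1 mM
hts_hit = efficiency_per_dalton(1e-6, 500.0)   # hypothetical 500 Da HTS hit, KD = 1 uM
print(f"{fragment:.4f} {hts_hit:.4f}")
```

Although the hypothetical HTS hit binds a thousand times more tightly, the fragment contributes more binding energy per dalton, which is the sense in which weak fragment hits are considered efficient starting points.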
15.2.3.2
Fragment Approach
The fragments can be considered as building scaffolds of a more complex compound. After an initial validation they are combined or optimized into compounds that meet the rational criteria of lead generation. As reviewed in more detail elsewhere, the optimization process can be divided into three strategies [64]. Fragment linking is used if two fragments have been identified that bind in adjacent binding sites close enough to each other to be chemically linked. Fragment evolution means the subsequent chemical modification by introducing optimized functional groups or new side chains that target additional interactions in the active site of the protein. Fragment self-assembly makes use of reactive templates that are capable of self-assembly in the presence of a seed template molecule. The first two methods depend heavily on the available structural data. The assignment of, at least, the protein backbone resonances is clearly the prerequisite to be achieved prior to a chemical optimization series. Lacking an assignment, the information obtained by NMR-based fragment identification and by the corresponding binding affinities can be transferred to X-ray crystallography to reduce the number of trials. The prominent role of virtual screening of large compound databases is to be outlined. Both NMR and X-ray crystallography have a medium throughput combined with costly instrumentation or specialized infrastructure. A rational approach made in silico to select a smaller subset for screening addresses the economic demands of industrial research. The filter rules used during the virtual screening already incorporate the basic properties of fragments from known inhibitors. The first filter rule downsizes the available compound collection according to properties suitable for NMR or crystallography, for example water solubility, and removes unwanted functional groups.
During the next filtering step, the wanted functionality is included, which matches the localized recognition elements in a simplified model of interaction. As an example, the pharmacophoric fingerprint for a given protein kinase considers the aromaticity of the adenine moiety in combination with its H-bond donor and acceptor functions [72]. Successful applications have been reported not only for single protein kinases, for example carboxyl-terminal Src kinase homologous kinase-1 (ChK-1) kinase and CK2 [73, 74], but also at the gene-family level [75, 76].
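The two filtering stages described above can be sketched as follows. The compound records, the field names, and the solubility threshold are hypothetical and serve only to show the order of the filters; the molecular weight window is the fragment range quoted earlier in this section, and the second filter is a crude stand-in for a pharmacophoric fingerprint.

```python
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    mol_weight: float       # Da
    solubility_mm: float    # aqueous solubility in mM
    reactive_group: bool    # carries an unwanted functional group
    aromatic_rings: int
    hbond_donors: int
    hbond_acceptors: int

def passes_property_filter(c, min_solubility_mm=1.0):
    """First filter: fragment-sized, soluble enough for NMR, no unwanted groups."""
    return (150.0 <= c.mol_weight <= 300.0
            and c.solubility_mm >= min_solubility_mm
            and not c.reactive_group)

def passes_pharmacophore_filter(c):
    """Second filter: adenine-like pattern (aromatic ring plus H-bond donor and acceptor)."""
    return c.aromatic_rings >= 1 and c.hbond_donors >= 1 and c.hbond_acceptors >= 1

def select_library(compounds):
    stage_one = [c for c in compounds if passes_property_filter(c)]
    return [c for c in stage_one if passes_pharmacophore_filter(c)]

LIBRARY = [
    Compound("frag-1", 210.0, 5.0, False, 2, 1, 3),
    Compound("frag-2", 450.0, 2.0, False, 3, 2, 4),  # too heavy
    Compound("frag-3", 180.0, 0.1, False, 1, 1, 1),  # poorly soluble
    Compound("frag-4", 240.0, 3.0, True, 1, 1, 2),   # unwanted functional group
    Compound("frag-5", 160.0, 8.0, False, 0, 2, 2),  # no aromatic ring
]
selected = [c.name for c in select_library(LIBRARY)]
print(selected)
```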
15.2.3.3
NMR Reporter Screening
Ligand-detected NMR screening in the presence of a competitor with known complex structure and affinity has several advantages. Firstly, the strong resonances of the well-behaved and highly soluble competitor serve as the reporter for the binding event. As a matter of principle, it has lower affinity and its signals can be obtained by various NMR methods (e.g., T1ρ relaxation or STD NMR experiments, depending on the best dispersion within a given screening run) [42, 43, 77, 78]. This permits the detection of potential high-affinity molecules that are only marginally soluble, thus significantly enlarging the diversity of compounds amenable to NMR screening. Secondly, with the known binding constant of the reporter compound, Ki values of the hits can be derived. This useful approach allows a ranking to be applied to the primary hit list. For practical purposes, relative binding affinities are sufficient in most cases if the absolute KD value of the reporter remains unknown. It is the prioritization of viable compounds for further investigations that is of importance, especially in the competitive environment of industrial research. Thirdly, if the compound was identified in the absence of the competitor and if the experiment is repeated in its presence, conclusions about the binding site can be drawn. Either the competitor is directly displaced by the candidate or the binding event takes place at an allosteric site. If applied to protein kinases, a simple derivative of the adenine group is suitable, which provides rather low affinity combined with good aqueous solubility. In a similar manner, allosteric kinase inhibitors can be discovered by spin-labeled adenine analogs (Fig. 15.2-8(b)). The degree of paramagnetic relaxation enhancement allows an estimation of the distance of the ligand relative to the spin label. This method is ideally suited to the fragment-linking approach of lead generation [79, 80].
The use of the ATP resonances as the reporter has been described recently. Reduction of the ATP STD NMR signals by a competitive inhibitor permitted a direct measurement of the inhibitor Ki with respect to the natural substrate ATP. After this initial measurement, the assay was combined with the paramagnetic relaxation enhancement effect. In a second step, manganese ions were added to the samples, which turned the Mg2+/ATP complex into a paramagnetic probe. The proximity of a potential non-ATP competitive compound can be inferred [81]. Alternatively, a recognition site in close proximity to the ATP-binding site can be targeted with an oligopeptide as the reporter ligand, whose sequence is derived from the activating kinase MKK3b in the case of p38 MAP kinase [82]. The inhibition of the protein-protein interaction with a small peptide could serve as a template for peptidomimetic inhibitor development [83, 84].
Although no nonpeptidic small molecules have yet been reported to bind at the protein substrate recognition sites or the recruitment sites, there is still considerable scope for such ligands. Compared with targeting the conserved ATP-binding site, the diversity of potential contacts and the resulting selectivity make these sites attractive for further research efforts.
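The derivation of Ki values from a reporter-ligand competition experiment, as described above, can be illustrated with a minimal sketch. It assumes a single-site, purely competitive model, ligand concentrations in large excess over the protein (so that free and total concentrations are approximately equal), and a reporter signal that is proportional to the fraction of protein occupied by the reporter. These are simplifying assumptions, and the function names are hypothetical; they are not taken from the cited studies.

```python
def reporter_occupancy(reporter_conc, reporter_kd, competitor_conc=0.0, competitor_ki=1.0):
    """Fraction of protein occupied by the reporter under competitive binding.

    All concentrations and constants must share one unit (e.g. mM).
    Assumes ligands are in large excess over the protein.
    """
    apparent_kd = reporter_kd * (1.0 + competitor_conc / competitor_ki)
    return reporter_conc / (apparent_kd + reporter_conc)


def ki_from_competition(signal_ratio, reporter_conc, reporter_kd, competitor_conc):
    """Estimate the competitor Ki from the reduction of the reporter signal.

    signal_ratio is the reporter signal with competitor divided by the
    signal without competitor; it must lie strictly between 0 and 1.
    """
    if not 0.0 < signal_ratio < 1.0:
        raise ValueError("signal_ratio must lie strictly between 0 and 1")
    numerator = reporter_kd * competitor_conc * signal_ratio
    denominator = (reporter_kd + reporter_conc) * (1.0 - signal_ratio)
    return numerator / denominator


# Round trip with hypothetical values (all in mM): simulate the signal
# reduction for a known Ki, then recover that Ki from the ratio.
kd, reporter, competitor, true_ki = 0.5, 1.0, 2.0, 0.2
ratio = (reporter_occupancy(reporter, kd, competitor, true_ki)
         / reporter_occupancy(reporter, kd))
print(ki_from_competition(ratio, reporter, kd, competitor))
```

With identical reporter and competitor concentrations across a screening run, the estimated Ki grows monotonically with the signal ratio, which is why a ranking of hits remains possible under this model even when the reporter constant is only approximately known.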
15.2.3.4
Results for Screening
An effective follow-up strategy for initial fragment-based hits is crucial if the information from weak NMR binders is to lead to the identification of potent and selective optimization candidates. In particular, the protein kinase-family approach, with its deep and well-defined ATP-binding pocket, is challenged by the large amount of data generated by the high hit rates. Depending on the virtual screening method that leads to the tailored screening library, hit rates of up to roughly 5% can be expected. Figure 15.2-14 exemplifies that the family approach with a kinase-biased compound collection yields a large number of hits. Data mining and the subsequent selection for specificity toward a particular kinase is an important but difficult process [85]. The general workflow, which has proved valid for both compound and library optimization, is outlined as follows. First, the existing knowledge about kinase inhibitors is incorporated into the starting compound collection, which serves as a validation set. The fragments of known high-affinity ligands are identified and the ligand-detected NMR approach is applied to the ensemble of kinases. As a proof of principle, the binding profiles of the different kinases, that is, the binding sites of selected fragments, must be identified by the different NMR methods, and the results are compared with the data obtained by other assays or by X-ray crystallography. Second, the validated NMR methodology is applied to screening at a larger scale. Virtual screening creates the kinase-biased compound collection. Simplified models of the interactions observed in the validation set provide the basis for the filtering rules, which downsize large in-house compound collections. Third, all hits are subjected to an NMR competition assay to derive binding affinities (Ki values) for each kinase.
These data generate a ranking list, in which compounds with sufficient affinity or selectivity toward a particular kinase are identified. The patent situation and the chemical feasibility, that is, whether the fragment is capable of further development, are also taken into account. Fourth, protein-detected NMR using [¹H, ¹⁵N]-TROSY spectra is applied to a selection of compounds obtained during the ranking. On the one hand, this step verifies the observed hits and rules out false positives. On the other hand, the binding site is characterized by mapping the observed CSPs onto the corresponding 3D structure. Fifth, the CSPs for compounds of high interest are subsequently used as restraints for molecular docking simulations. The binding mode is revealed at atomic resolution and medicinal chemists can select desired ligand properties
Fig. 15.2-14 Kinase selectivity profiles for a representative dataset obtained by the ligand-detected NMR fragment approach. Each row represents a kinase and each column represents a small-molecule fragment. Eight hundred and seventy compounds were chosen out of a larger kinase-biased screening library. The color-coding scheme indicates whether a particular compound has a Ki value greater (light gray) or lower (dark gray) than 1.5 mM toward a single kinase. The horizontal order is the result of a hierarchical clustering analysis (Euclidean-Ward) with 65 descriptors (out of 210 chemical descriptors), which correlate best with the observed kinase affinities. Twenty-nine clusters are lined up consecutively and all the compounds that are members of a single cluster are ordered again: the higher the average affinity, the further to the left a compound is positioned within a given cluster, and vice versa. At the bottom, three clusters are denoted as an example. Most fragments bind with similar affinities to all kinases. To select for selectivity, isolated dark gray areas are to be picked within a row, where several similar compounds group together but do not show the same affinities toward the other kinases (light gray in the vertical).
to design the next optimized compound collection. In an iterative fashion, the newly synthesized library is subjected to step three. Figure 15.2-15 exemplifies the improvement after two iterations. The workflow demonstrates that the combination of ligand-detected and protein-detected NMR meets the economic demands of pharmaceutical research. The throughput of ligand-detected NMR is higher and the requirements for producing unlabeled protein are moderate. With increasing ligand knowledge, protein-detected NMR with ¹⁵N-labeled kinases becomes more prominent. Specific questions are addressed with more sophisticated NMR methods, which are more time consuming but lead to detailed information about protein-ligand interactions at atomic resolution.
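The ranking and selectivity selection in step three of the workflow can be sketched as follows. The 1.5 mM cut-off is the one quoted in the caption of Fig. 15.2-14; the Ki values, the kinase and fragment names, and the rule "binds exactly one kinase below the cut-off" are hypothetical illustrations rather than the authors' actual selection criteria.

```python
# Hypothetical Ki values in mM; kinase and fragment names are placeholders.
KI_THRESHOLD_MM = 1.5  # cut-off quoted in the caption of Fig. 15.2-14


def binders(ki_by_kinase, threshold=KI_THRESHOLD_MM):
    """Return the kinases a fragment binds with Ki below the threshold."""
    return sorted(k for k, ki in ki_by_kinase.items() if ki < threshold)


def selective_fragments(ki_table, threshold=KI_THRESHOLD_MM):
    """Map each fragment that binds exactly one kinase to that kinase."""
    selected = {}
    for fragment, ki_by_kinase in ki_table.items():
        hits = binders(ki_by_kinase, threshold)
        if len(hits) == 1:
            selected[fragment] = hits[0]
    return selected


def rank_for_kinase(ki_table, kinase):
    """Order fragments by increasing Ki (highest affinity first) for one kinase."""
    return sorted(ki_table, key=lambda fragment: ki_table[fragment][kinase])


ki_table = {
    "frag_1": {"kinase_A": 0.4, "kinase_B": 0.6, "kinase_C": 0.9},
    "frag_2": {"kinase_A": 3.2, "kinase_B": 2.8, "kinase_C": 0.7},
    "frag_3": {"kinase_A": 1.1, "kinase_B": 4.0, "kinase_C": 5.5},
    "frag_4": {"kinase_A": 2.5, "kinase_B": 3.1, "kinase_C": 2.2},
}

print(selective_fragments(ki_table))
print(rank_for_kinase(ki_table, "kinase_C"))
```

In this toy table, frag_1 corresponds to the common case of a fragment that binds all kinases with similar affinity, whereas frag_2 and frag_3 would appear as isolated dark gray entries within a single row.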
Fig. 15.2-15 Development of kinase-biased screening libraries during an NMR-based fragment approach. The Ki values of the fragments are obtained by quantification of the STD NMR resonances of an adenosine derivative, which is used as the reporter ligand in a competition assay. (a) As a proof of principle, published high-affinity ligands are fragmented into their components. The NMR method is applied to this validation set, which reveals the typical affinities of a particular kinase toward the “standard” kinase fragments. For example, fragments usually exhibit higher affinities to kinase A than to the others due to the activated state of this kinase. (b) A larger fragment library is created by virtual screening, which utilizes the pharmacophoric fingerprints of known kinase inhibitors. After the screening, viable hits are selected and characterized by protein-based NMR. The information about selectivity, binding site, and binding mode was used for step-by-step optimization of small compound collections. (c) and (d) show that the synthesis efforts result in the enhancement of selectivity toward the third kinase.
A potent inhibitor of the serine/threonine Jun N-terminal kinase 3 (JNK3) was identified using a fragment-based NMR approach. A follow-up study with competition-binding methods and molecular docking based on the crystal structure of JNK3 proposed potential binding models. These models were used in turn to synthesize a set of several thousand optimized compounds that contained elements from the original fragments, leading to the final hits [86]. The same NMR approach, following the fragment-linking strategy, was used to enhance the activity of a weak inhibitor of p38 MAP kinase. By adding first one and then a second aromatic ring onto the central five-membered heterocyclic ring, the activity reached the nanomolar regime [87].
15.2.4 Characterizing Kinase-Ligand Interactions by NMR
There are many interaction sites for inhibiting the phosphotransferase activity of a protein kinase. Antagonism at the ATP-binding site to inhibit enzymatic activity is the focus of most investigations. Inhibition at this site can be accomplished by unspecific inhibitors like staurosporine, and various kinase-specific inhibitors have also been discovered. Nevertheless, selectivity continues to be a problem because all kinases bind ATP in a similar way. All ATP-site binders bind to the highly conserved “hinge” region that connects the N- and C-terminal lobes. However, the deep ATP cleft consists of several subsites that can be utilized in the structure-based design of inhibitors. For example, a pivotal role has been reported for protruding nonconserved residues that govern access to particular subpockets, like a gatekeeper. In the cases of imatinib, gefitinib, and erlotinib, clinical trials showed that single point mutations in the active site lead to resistance during drug treatment [88, 89]. Alternatively, kinase activation can be prevented by interfering with regulatory subunit binding. Interactions can be stabilized that maintain the kinase in the inactive form, in which it cannot bind ATP or in which the residues are misaligned for catalytic activity. Since inactive kinases must be correctly recognized by activating enzymes, they differ more strongly from one another than the activated forms, all of which fulfill the same function. The design of binders to the inactive form could therefore achieve a higher degree of selectivity. In particular, the Asp-Phe-Gly (DFG) motif of the activation loop has attracted much attention from medicinal chemists. A selective inhibitor at an adjacent binding site turns a residue of the DFG loop into an “out” conformation that precludes ATP from binding [90, 91]. Kinase activity can also be inhibited indirectly by blocking the protein substrate recruitment site or by direct inhibition of the substrate phosphoacceptor subsite.
Like all protein-protein interaction surfaces, this binding site is more difficult to target with small-molecule inhibitors. Selectively targeting individual kinases in this manner remains a considerable task.
15.2.4.1
Mapping of Chemical Shift Perturbations
The observation of ligand-induced NMR CSPs usually defines the interaction site of a ligand reasonably well, provided the assignment of the NMR resonances is available. As mentioned above, protein-based NMR screening approaches are ideally suited as a follow-up to a ligand-detected NMR approach at a larger scale. Besides validating the primary assay, they allow the determination of the binding site for identified ligands (and can also be combined with other primary assays such as HTS or SPR). Figure 15.2-16 shows the ligand-induced CSPs for p38 interacting with either the small-molecule inhibitor SB203580 (see structure in Fig. 15.2-8(c)), which binds to the ATP-binding site, or an oligopeptide binding to the protein substrate docking site for MEF2A [82]. As the structures of the protein-ligand complexes are known in both cases, the CSPs can be easily compared with the complex structures; they show close similarity between the X-ray structure and the structure derived by NMR. The pronounced CSPs of SB203580 are induced by the ring current effects of its aromatic ring systems. The peptide derived from MEF2A induces weaker CSPs because of the lack of aromatic amino acids in the peptide sequence. The affected region covers a larger part of the protein, reflecting the size of the peptide and the tertiary rearrangements known from the X-ray structure.
Fig. 15.2-16 Ligand binding is detected by CSPs. The two-dimensional [¹H, ¹⁵N]-TROSY spectra of the uniformly ¹⁵N-labeled p38 MAP kinase in the absence and presence of the ligand are compared. The shift difference of a given amide resonance on ligand binding is calculated and projected onto the crystal structure of the kinase. A color-coding scheme is used, which considers the average value of the CSPs and their mean square deviation. (a) CSPs of the small-molecule inhibitor SB203580 mapped on 1A9U.pdb. (b) CSPs of the oligopeptide (KPDLRVVIPP) derived from the protein substrate MEF2A mapped on 1LEW.pdb.
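The CSP mapping described above can be sketched in a few lines. The text does not state how the ¹H and ¹⁵N shift differences were combined or where the color thresholds were placed, so the following assumes a commonly used weighted Euclidean distance with a ¹⁵N scaling factor of 0.2 and classes defined by the mean and the mean plus one standard deviation. The residue labels and shift values are hypothetical.

```python
import math
import statistics


def combined_csp(delta_h, delta_n, n_weight=0.2):
    """Combine 1H and 15N shift changes (ppm) into one perturbation value.

    n_weight is an assumed scaling factor that compensates for the larger
    15N chemical shift range; it is not taken from the text.
    """
    return math.sqrt(delta_h ** 2 + (n_weight * delta_n) ** 2)


def classify_csps(shifts_free, shifts_bound, n_weight=0.2):
    """Return (csp_by_residue, class_by_residue) for residues seen in both spectra.

    Each input maps a residue label to a (1H ppm, 15N ppm) tuple.
    Classes: 'strong' above mean + 1 SD, 'moderate' above the mean, else 'weak'.
    """
    csps = {}
    for residue, (h_free, n_free) in shifts_free.items():
        if residue not in shifts_bound:
            continue  # peak not observed or not assigned in the complex
        h_bound, n_bound = shifts_bound[residue]
        csps[residue] = combined_csp(h_bound - h_free, n_bound - n_free, n_weight)
    mean = statistics.mean(csps.values())
    sd = statistics.pstdev(csps.values())
    classes = {}
    for residue, value in csps.items():
        if value > mean + sd:
            classes[residue] = "strong"
        elif value > mean:
            classes[residue] = "moderate"
        else:
            classes[residue] = "weak"
    return csps, classes


# Hypothetical amide shifts (1H ppm, 15N ppm) for four residues.
free = {"A": (8.00, 120.0), "B": (8.50, 118.0), "C": (7.90, 125.0), "D": (9.10, 110.0)}
bound = {"A": (8.00, 120.0), "B": (8.52, 118.1), "C": (8.20, 127.0), "D": (9.11, 110.0)}

csps, classes = classify_csps(free, bound)
for residue in sorted(csps):
    print(residue, round(csps[residue], 3), classes[residue])
```

The resulting classes can then be transferred to a structure viewer as a per-residue color code, which is the projection step described in the caption of Fig. 15.2-16.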
15.2.4.2
DFG-in/DFG-out
Recently, an alternative binding site adjacent to the ATP-binding cleft has been exploited for pharmaceutical intervention. The pyrazole-urea-based inhibitor BIRB796 (for the structure, see Fig. 15.2-8(d)) induces an alternative conformation of the DFG motif of p38 MAP kinase, turning the side chain of Phe169 from an “in” to an “out” conformation. The corresponding loop undergoes a 10 Å shift, and the new position of the Phe side chain is incompatible with ATP binding. This recognition principle has been successfully applied to protein kinases such as Raf [92], p38 MAP kinase [90, 91, 93], or kinase insert domain receptor (KDR) [94]. In the NMR analysis of this part of the polypeptide chain, the DFG loop (Asp168-Phe169-Gly170) turned out to be one of the segments that could not be assigned in the spectra of the apo form of p38 MAP kinase. A [¹H, ¹⁵N]-TROSY spectrum recorded from selectively ¹⁵N-Phe-labeled samples revealed 12 of 13 phenylalanine correlations. The 12 visible signals were unambiguously assigned; the unobservable signal belongs to Phe169 in the DFG loop. This finding was confirmed by the spectrum of the selectively ¹⁵N-Phe-labeled mutant Phe169Tyr, which exhibited an identical TROSY spectrum with 12 peaks. Altered field strengths, temperatures, and more sensitive acquisition conditions with a cryoprobe head did not affect the result. On addition of the pyrazole-urea-based DFG-out inhibitor to a selectively ¹⁵N-Phe-labeled p38 sample, 13 peaks can be detected. A further investigation with a ¹³C′-Asp/¹⁵N-Phe-labeled sample, recording an HNCO-type experiment, confirmed the assignment of Phe169. The lineshape of the Phe169 amide resonance was simulated and analyzed with respect to the ability to detect the peak in a [¹H, ¹⁵N]-TROSY spectrum. The chemical shift difference between the DFG-in and DFG-out conformations was estimated by a chemical shift prediction based on the published X-ray structures.
Figure 15.2-17 shows the relative maximum peak intensities of the amide ¹⁵N resonance of Phe169 as a function of the exchange rate and the relative population of the “out” state in the simulated spectra. The lowest peak intensities are expected for medium exchange rates and equally populated states. The extent of this area shrinks with decreasing field strength of the spectrometer. During the NMR measurements of the apo protein and of complexes with DFG-in ligands, the system appears to lie within this area, where exchange leads to excessive line broadening. In principle, the peak can be made detectable again by changes in temperature (move left or right in the diagram), by a decrease of the field strength, or by “freezing” one of the two conformations with a ligand (move up or down in the diagram). For the apo form of p38 MAP kinase, it was deduced that the absence of the amide peak for Phe169 in the DFG motif under all tested NMR conditions is consistent with a conformational “in/out” equilibrium taking place on an intermediate NMR timescale. Binding of the pyrazole-urea-based DFG-out inhibitor is not compatible with the DFG-in conformation; therefore, it interferes directly with the conformational exchange process of the DFG loop.
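The qualitative behavior described above can be reproduced with a minimal two-site exchange calculation based on the Bloch-McConnell equations. This is a sketch under stated assumptions, not the simulation used for Fig. 15.2-17: both states are given the same intrinsic transverse relaxation rate, whose value (20 s⁻¹) is hypothetical, and the ¹⁵N frequency of about 60.8 MHz is an approximation for a 600 MHz spectrometer. Only the 13.7 ppm shift difference is taken from the text.

```python
import math

DELTA_PPM = 13.7      # 15N shift difference quoted for Fig. 15.2-17
N15_MHZ = 60.8        # approximate 15N frequency at 600 MHz 1H (assumption)
R2_INTRINSIC = 20.0   # s^-1, hypothetical exchange-free transverse relaxation rate
DELTA_OMEGA = 2.0 * math.pi * DELTA_PPM * N15_MHZ  # shift difference in rad/s


def two_site_intensity(omega, p_out, k_ex, delta_omega=DELTA_OMEGA, r2=R2_INTRINSIC):
    """Absorption-mode intensity at angular frequency omega (rad/s).

    The 'in' state resonates at 0 and the 'out' state at delta_omega.
    k_ex is the sum of the forward and backward exchange rates (s^-1).
    """
    p_in = 1.0 - p_out
    k_in_to_out = k_ex * p_out
    k_out_to_in = k_ex * p_in
    a = complex(r2 + k_in_to_out, omega)                # diagonal term, 'in' state
    d = complex(r2 + k_out_to_in, omega - delta_omega)  # diagonal term, 'out' state
    determinant = a * d - k_in_to_out * k_out_to_in
    numerator = (d + k_in_to_out) * p_in + (a + k_out_to_in) * p_out
    return (numerator / determinant).real


def relative_max_intensity(p_out, k_ex, delta_omega=DELTA_OMEGA, r2=R2_INTRINSIC,
                           n_points=2001):
    """Maximum peak height relative to a single non-exchanging resonance."""
    span = abs(delta_omega)
    low = min(0.0, delta_omega) - 0.1 * span
    high = max(0.0, delta_omega) + 0.1 * span
    step = (high - low) / (n_points - 1)
    peak = max(two_site_intensity(low + i * step, p_out, k_ex, delta_omega, r2)
               for i in range(n_points))
    return peak * r2  # a single Lorentzian line has height 1 / r2


# Equal populations: scan from slow to fast exchange.
for k_ex in (1.0, 1e2, 1e3, 1e4, 1e5, 1e7):
    print(f"k_ex = {k_ex:10.0f} s^-1  relative maximum = "
          f"{relative_max_intensity(0.5, k_ex):.3f}")
```

In this model the relative maximum passes through a deep minimum when the exchange rate is comparable to the shift difference in rad/s, and it recovers toward 1 when the population of one state approaches 0 or 1, which corresponds to "freezing" one conformation with a ligand.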
Fig. 15.2-17 Simulation of NMR spectra of a two-state DFG-in/DFG-out model. The grayscale represents the relative maximum peak intensities of the ¹⁵N amide resonance of Phe169 as a function of the exchange rate and the population of the “out” state. The magnetic field strength is set according to a ¹H resonance at 600 MHz. The chemical shift difference was set to 13.7 ppm, as predicted by chemical shift calculations applied to the published X-ray structures (1P38.pdb and 1KV1.pdb). The lowest maximum detectable peak intensities are expected for medium exchange rates and approximately equally populated states. The extent of this area shrinks with decreasing field strength of the spectrometer. Unobservable peaks can be made visible again by changes in temperature (move left or right in the diagram), by a decrease of the field strength, or by “freezing” one of the two conformations with a ligand (move up or down in the diagram).
The observation of the Phe169 amide resonance in the presence of DFG-out inhibitors confirms this hypothesis. In contrast to the pyrazole-urea-based compounds, the inhibitor class similar to SB203580 has been described as DFG-in binders [95], where the conformation of the DFG loop is similar to that observed in the crystal structures of the apo form of p38 [96, 97]. SB203580 and SKF86002, as DFG-in ligands of p38, do not give rise to additional peaks in the [¹H, ¹⁵N]-TROSY spectrum of ¹⁵N-Phe-labeled p38. The observed NMR data suggest that the reported “DFG-in” binders leave a putative conformational DFG “in/out” equilibrium in a time regime in which the Phe169 amide correlation is broadened beyond detection. This suggestion is in agreement with the results obtained by biological assays, that is, DFG-in ligands do not interfere with p38 activation [98], whereas DFG-out inhibitors block both activity and activation of p38 [99].
15.2.4.3
LIGDOCK
The LIGDOCK procedure [71] was suggested for the determination of protein-ligand complex structures from non-X-ray data. Experimental data from NMR [100, 101] or from other biophysical or biochemical experiments are introduced in an ambiguous manner [102-104], which makes it possible to determine protein-ligand complexes on the basis of only a few experiments. The concept is to collect readily available CSPs first and, if necessary, to add more sophisticated experimental results to improve the accuracy of the structure determination. The calculations consist of three stages. In the first stage, the two molecules are positioned apart from each other and a rigid-body minimization is performed. In the second, poses that best fulfill the experimental parameters proceed to simulated annealing in torsion angle space, in which the ligand and the binding area of the protein are kept flexible. In the third, possible solutions are equilibrated with a molecular dynamics simulation using explicit water. A critical step of the procedure is the ranking of the structures. Accurate structures are picked from a “selection plot” in which the intermolecular van der Waals energy is plotted against the experimental energy. Structures having both a low van der Waals and a low experimental energy are possible solutions. By contrast, structures in which only one of the two energy terms is low are discarded. The approach was tested on three examples of increasing complexity. The structure of PTP8 in complex with PTP1B can be determined with CSPs only; here, the definition of the binding site suffices to resolve the structural problem. The calculation for H7 in complex with PKA presents two problems that are common in structure determination using non-X-ray data: only a partial NMR assignment of the protein was available and, additionally, the protein adopts an “open” conformation in the complex, whereas the apo structure has a “closed” form.
The choice of the starting conformation influences the result of the simulation. Nevertheless, the calculations were started with the “wrong” apo form. Surprisingly, the orientation and possible constructive interactions of the isoquinoline ring, which is the main feature of the H series of inhibitors, are correctly reproduced, although the starting structure of the protein and the known X-ray structure of the complex were very different and only a partial assignment of PKA was available. The determination of the structure of SB203580 in complex with p38 was the most complicated because of the specific shape of the ligand. It has one twofold and one threefold rotational symmetry axis, implying that the ligand can also occupy the binding site in other, symmetry-related orientations. Therefore, it is not possible to determine the complex structure with CSPs only. However, in combination with either STD experiments on selectively labeled p38 (SOS-NMR, structural information using Overhauser effects and selective labeling) [105] or the introduction of a knowledge-based restraint, this structural problem could be resolved.
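The “selection plot” criterion of the LIGDOCK ranking step, keeping only structures in which both the van der Waals and the experimental energy are low, can be illustrated with a small sketch. How “low” was defined is not stated in the text; the lowest-fraction cut-off used here, the pose names, and the energy values are hypothetical.

```python
import math


def low_energy_cutoff(values, fraction):
    """Return the largest value that still belongs to the lowest `fraction` of values."""
    ordered = sorted(values)
    index = max(0, math.ceil(fraction * len(ordered)) - 1)
    return ordered[index]


def select_poses(poses, fraction=0.25):
    """Keep poses whose van der Waals AND experimental energies are both low.

    poses: list of (name, e_vdw, e_exp) tuples in arbitrary energy units.
    A pose that is low in only one of the two terms is discarded.
    """
    vdw_cut = low_energy_cutoff([pose[1] for pose in poses], fraction)
    exp_cut = low_energy_cutoff([pose[2] for pose in poses], fraction)
    return [name for name, e_vdw, e_exp in poses
            if e_vdw <= vdw_cut and e_exp <= exp_cut]


# Hypothetical docking poses: (name, van der Waals energy, experimental energy).
poses = [
    ("p1", -40.0, 5.0),
    ("p2", -38.0, 50.0),
    ("p3", -10.0, 4.0),
    ("p4", -39.0, 6.0),
    ("p5", -5.0, 60.0),
    ("p6", -20.0, 30.0),
    ("p7", -15.0, 45.0),
    ("p8", -25.0, 20.0),
]

print(select_poses(poses))
```

Pose p3 fits the experimental data well but has poor intermolecular packing, and p4 packs well but narrowly misses the experimental cut-off; both are discarded, which mirrors the rule that a single low energy term is not sufficient.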
References
1. A. Bellacosa, C.C. Kumar, A. Di Cristofano, J.R. Testa, Adv. Cancer Res. 2005, 94, 29-86.
2. H. Hirai, N. Kawanishi, Y. Iwasawa, Curr. Top. Med. Chem. 2005, 5, 167-179.
3. J.G. Shelton, L.S. Steelman, S.L. Abrams, F.E. Bertrand, R.A. Franklin, M. McMahon, J.A. McCubrey, Expert Opin. Ther. Targets 2005, 9, 1009-1030.
4. S.M. Keenan, J.A. Geyer, W.J. Welsh, S.T. Prigge, N.C. Waters, Comb. Chem. High Throughput Screen. 2005, 8, 27-38.
5. J. Dancey, E.A. Sausville, Nat. Rev. Drug Discov. 2003, 2, 296-313.
6. T. Schindler, W. Bornmann, P. Pellicena, W.T. Miller, B. Clarkson, J. Kuriyan, Science 2000, 289, 1938-1942.
7. S. Kobayashi, T.J. Boggon, T. Dayaram, P.A. Janne, O. Kocher, M. Meyerson, B.E. Johnson, M.J. Eck, D.G. Tenen, B. Halmos, N. Engl. J. Med. 2005, 352, 786-792.
8. T. Hunter, Cell 2000, 100, 113-127.
9. S.K. Hanks, Genome Biol. 2003, 4, 111.
10. S.S. Taylor, E. Radzio-Andzelm, T. Hunter, FASEB J. 1995, 9, 1255-1266.
11. A. Gescher, Gen. Pharmacol. 1998, 31, 721-728.
12. F.A. al-Obeidi, J.J. Wu, K.S. Lam, Biopolymers 1998, 47, 197-223.
13. T.J. Boggon, M.J. Eck, Oncogene 2004, 23, 7918-7927.
14. H.L. De Bondt, J. Rosenblatt, J. Jancarik, H.D. Jones, D.O. Morgan, S.H. Kim, Nature 1993, 363, 595-602.
15. S.H. Hu, M.W. Parker, J.Y. Lei, M.C. Wilce, G.M. Benian, B.E. Kemp, Nature 1994, 369, 581-584.
16. S.R. Hubbard, L. Wei, L. Ellis, W.A. Hendrickson, Nature 1994, 372, 746-754.
17. D.R. Knighton, J.H. Zheng, L.F. Ten Eyck, V.A. Ashford, N.H. Xuong, S.S. Taylor, J.M. Sowadski, Science 1991, 253, 407-414.
18. F. Zhang, A. Strand, D. Robbins, M.H. Cobb, E.J. Goldsmith, Nature 1994, 367, 704-711.
19. J.A. Adams, S.S. Taylor, Protein Sci. 1993, 2, 2177-2186.
20. J. Zheng, D.R. Knighton, N.H. Xuong, S.S. Taylor, J.M. Sowadski, L.F. Ten Eyck, Protein Sci. 1993, 2, 1559-1573.
21. M. Lei, M.A. Robinson, S.C. Harrison, Structure (Camb.) 2005, 13, 769-778.
22. D.R. Knighton, J.H. Zheng, L.F. Ten Eyck, N.H. Xuong, S.S. Taylor, J.M. Sowadski, Science 1991, 253, 414-420.
23. M.E. Bollard, E.G. Stanley, J.C. Lindon, J.K. Nicholson, E. Holmes, NMR Biomed. 2005, 18, 143-162.
24. W.F. Reynolds, R.G. Enriquez, J. Nat. Prod. 2002, 65, 221-244.
25. R.A. Carr, M. Congreve, C.W. Murray, D.C. Rees, Drug Discov. Today 2005, 10, 987-992.
26. I. Hunt, Protein Expr. Purif. 2005, 40, 1-22.
27. M.J. Wood, E.A. Komives, J. Biomol. NMR 1999, 13, 149-159.
28. M. Bruggert, T. Rehm, S. Shanker, J. Georgescu, T.A. Holak, J. Biomol. NMR 2003, 25, 335-348.
29. A. Strauss, F. Bitsch, G. Fendrich, P. Graff, R. Knecht, B. Meyhack, W. Jahnke, J. Biomol. NMR 2005, 31, 343-349.
30. C. Mao, M. Zhou, F.M. Uckun, J. Biol. Chem. 2001, 276, 41435-41443.
31. M. Lei, W. Lu, W. Meng, M.C. Parrini, M.J. Eck, B.J. Mayer, S.C. Harrison, Cell 2000, 102, 387-397.
32. T. Langer, S. Sreeramulu, M. Vogtherr, B. Elshorst, M. Betz, U. Schieborr, K. Saxena, H. Schwalbe, FEBS Lett. 2005, 579, 4049-4054.
33. T. Langer, M. Vogtherr, B. Elshorst, M. Betz, U. Schieborr, K. Saxena, H. Schwalbe, Chembiochem 2004, 5, 1508-1516.
34. K. Pervushin, R. Riek, G. Wider, K. Wuthrich, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 12366-12371.
35. M. Sattler, J. Schleucher, C. Griesinger, Prog. NMR Spectrosc. 1999, 34, 93-158.
36. C.S.J. Johnson, Prog. NMR Spectrosc. 1998, 34, 203-256.
37. W. Kremer, H.R. Kalbitzer, Methods Enzymol. 2001, 339, 3-19.
38. F. Lohr, V. Katsemi, J. Hartleib, U. Gunther, H. Ruterjans, J. Biomol. NMR 2003, 25, 291-311.
39. P. Guntert, M. Salzmann, D. Braun, K. Wuthrich, J. Biomol. NMR 2000, 18, 129-137.
40. S. Neal, A.M. Nip, H. Zhang, D.S. Wishart, J. Biomol. NMR 2003, 26, 215-240.
41. P.A. Kosen, Methods Enzymol. 1989, 177, 86-121.
42. W. Jahnke, Chembiochem 2002, 3, 167-173.
43. W. Jahnke, S. Rudisser, M. Zurini, J. Am. Chem. Soc. 2001, 123, 3149-3150.
44. B. Cutting, A. Strauss, G. Fendrich, P.W. Manley, W. Jahnke, J. Biomol. NMR 2004, 30, 205-210.
45. J. Wohnert, K.J. Franz, M. Nitz, B. Imperiali, H. Schwalbe, J. Am. Chem. Soc. 2003, 125, 13338-13339.
46. K.J. Franz, M. Nitz, B. Imperiali, Chembiochem 2003, 4, 265-271.
47. M. Vogtherr, K. Saxena, S. Grimme, M. Betz, U. Schieborr, B. Pescatore, T. Langer, H. Schwalbe, J. Biomol. NMR 2005, 32, 175.
48. P.D. Jeffrey, A.A. Russo, K. Polyak, E. Gibbs, J. Hurwitz, J. Massague, N.P. Pavletich, Nature 1995, 376, 313-320.
49. J. Goldberg, A.C. Nairn, J. Kuriyan, Cell 1996, 84, 875-887.
50. D.M. Payne, A.J. Rossomando, P. Martino, A.K. Erickson, J.H. Her, J. Shabanowitz, D.F. Hunt, M.J. Weber, T.W. Sturgill, EMBO J. 1991, 10, 885-892.
51. F. Li, M. Gangal, C. Juliano, E. Gorfain, S.S. Taylor, D.A. Johnson, J. Mol. Biol. 2002, 315, 459-469.
52. L.N. Johnson, M.E. Noble, D.J. Owen, Cell 1996, 85, 149-158.
53. J. Zhang, F. Zhang, D. Ebert, M.H. Cobb, E.J. Goldsmith, Structure 1995, 3, 299-307.
54. D. Brancho, N. Tanaka, A. Jaeschke, J.J. Ventura, N. Kelkar, Y. Tanaka, M. Kyuuma, T. Takeshita, R.A. Flavell, R.J. Davis, Genes Dev. 2003, 17, 1969-1978.
55. M.H. Seifert, C.B. Breitenlechner, D. Bossemeyer, R. Huber, T.A. Holak, R.A. Engh, Biochemistry 2002, 41, 5968-5977.
56. D.A. Johnson, P. Akamine, E. Radzio-Andzelm, M. Madhusudan, S.S. Taylor, Chem. Rev. 2001, 101, 2243-2270.
57. S.S. Taylor, J. Yang, J. Wu, N.M. Haste, E. Radzio-Andzelm, G. Anand, Biochim. Biophys. Acta 2004, 1697, 259-269.
58. P.W. Anderson, J. Phys. Soc. Jpn. 1954, 9, 316-339.
59. J. Sandstrom, Dynamic Nuclear Magnetic Resonance Spectroscopy, Academic Press, New York, 1982.
60. U.L. Gunther, B. Schaffhausen, J. Biomol. NMR 2002, 22, 201-209.
61. U. Gunther, T. Mittag, B. Schaffhausen, Biochemistry 2002, 41, 11658-11669.
62. T. Mittag, B. Schaffhausen, U.L. Gunther, J. Am. Chem. Soc. 2004, 126, 9017-9023.
63. D.A. Erlanson, R.S. McDowell, T. O’Brien, C. Wiesmann, K.J. Barr, J. Kung, J. Zhu, W. Shen, B.J. Fahr, M. Zhong, L. Taylor, M. Randal, S.K. Hansen, J. Med. Chem. 2004, 47, 3463-3482.
64. D.C. Rees, M. Congreve, C.W. Murray, R. Carr, Nat. Rev. Drug Discov. 2004, 3, 660-672.
65. M. Vogtherr, K. Fiebig, EXS 2003, 93, 183-202.
66. J.M. Moore, Curr. Opin. Biotechnol. 1999, 10, 54-58.
67. B. Meyer, T. Peters, Angew. Chem., Int. Ed. Engl. 2003, 42, 864-890.
68. M. Pellecchia, D.S. Sem, K. Wuthrich, Nat. Rev. Drug Discov. 2002, 1, 211-219.
69. K.A. Mercier, R. Powers, J. Biomol. NMR 2005, 31, 243-258.
70. S.B. Shuker, P.J. Hajduk, R.P. Meadows, S.W. Fesik, Science 1996, 274, 1531-1534.
71. U. Schieborr, M. Vogtherr, B. Elshorst, M. Betz, S. Grimme, B. Pescatore, T. Langer, K. Saxena, H. Schwalbe, Chembiochem 2005, 13, 13.
72. N. Baurin, F. Aboul-Ela, X. Barril, B. Davis, M. Drysdale, B. Dymock, H. Finch, C. Fromont, C. Richardson, H. Simmonite, R.E. Hubbard, J. Chem. Inf. Comput. Sci. 2004, 44, 2157-2166.
73. P.D. Lyne, P.W. Kenny, D.A. Cosgrove, C. Deng, S. Zabludoff, J.J. Wendoloski, S. Ashwell, J. Med. Chem. 2004, 47, 1962-1968.
74. E. Vangrevelinghe, K. Zimmermann, J. Schoepfer, R. Portmann, D. Fabbro, P. Furet, J. Med. Chem. 2003, 46, 2656-2662.
75. E. ter Haar, W.P. Walters, S. Pazhanisamy, P. Taslimi, A.C. Pierce, G.W. Bemis, F.G. Salituro, S.L. Harbeson, Mini Rev. Med. Chem. 2004, 4, 235-253.
76. C. Chuaqui, Z. Deng, J. Singh, J. Med. Chem. 2005, 48, 121-133.
77. C. Dalvit, M. Flocco, S. Knapp, M. Mostardini, R. Perego, B.J. Stockman, M. Veronesi, M. Varasi, J. Am. Chem. Soc. 2002, 124, 7702-7709.
78. W. Jahnke, P. Floersheim, C. Ostermeier, X. Zhang, R. Hemmig, K. Hurth, D.P. Uzunov, Angew. Chem., Int. Ed. Engl. 2002, 41, 3420-3423.
79. W. Jahnke, M.J. Blommers, C. Fernandez, C. Zwingelstein, R. Amstutz, Chembiochem 2005, 6, 1607-1610.
80. W. Jahnke, A. Florsheimer, M.J. Blommers, C.G. Paris, J. Heim, C.M. Nalin, L.B. Perez, Curr. Top. Med. Chem. 2003, 3, 69-80.
81. M.A. McCoy, M.M. Senior, D.F. Wyss, J. Am. Chem. Soc. 2005, 127, 7978-7979.
82. C.I. Chang, B.E. Xu, R. Akella, M.H. Cobb, E.J. Goldsmith, Mol. Cell 2002, 9, 1241-1249.
83. G. Kontopidis, M.J. Andrews, C. McInnes, A. Cowan, H. Powers, L. Innes, A. Plater, G. Griffiths, D. Paterson, D.I. Zheleva, D.P. Lane, S. Green, M.D. Walkinshaw, P.M. Fischer, Structure (Camb.) 2003, 11, 1537-1546.
84. C. McInnes, M.J. Andrews, D.I. Zheleva, D.P. Lane, P.M. Fischer, Curr. Med. Chem. Anticancer Agents 2003, 3, 57-69.
85. C. McInnes, P.M. Fischer, Curr. Pharm. Des. 2005, 11, 1845-1863.
86. J. Fejzo, C. Lepre, X. Xie, Curr. Top. Med. Chem. 2003, 3, 81-97.
87. J. Fejzo, C.A. Lepre, J.W. Peng, G.W. Bemis, M.A. Murcko, Ajay, J.M. Moore, Chem. Biol. 1999, 6, 755-769.
88. R. Ren, Nat. Rev. Cancer 2005, 5, 172-183.
89. T.A. Carter, L.M. Wodicka, N.P. Shah, A.M. Velasco, M.A. Fabian, D.K. Treiber, Z.V. Milanov, C.E. Atteridge, W.H. Biggs III, P.T. Edeen, M. Floyd, J.M. Ford, R.M. Grotzfeld, S. Herrgard, D.E. Insko, S.A. Mehta, H.K. Patel, W. Pao, C.L. Sawyers, H. Varmus, P.P. Zarrinkar, D.J. Lockhart, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 11011-11016.
90. C. Pargellis, L. Tong, L. Churchill, P.F. Cirillo, T. Gilmore, A.G. Graham, P.M. Grob, E.R. Hickey, N. Moss, S. Pav, J. Regan, Nat. Struct. Biol. 2002, 9, 268-272.
91. J. Regan, A. Capolino, P.F. Cirillo, T. Gilmore, A.G. Graham, E. Hickey, R.R. Kroe, J. Madwed, M. Moriak, R. Nelson, C.A. Pargellis, A. Swinamer, C. Torcellini, M. Tsang, N. Moss, J. Med. Chem. 2003, 46, 4676-4686.
92. P.T. Wan, M.J. Garnett, S.M. Roe, S. Lee, D. Niculescu-Duvaz, V.M. Good, C.M. Jones, C.J. Marshall, C.J. Springer, D. Barford, R. Marais, Cell 2004, 116, 855-867.
93. J. Branger, B. van den Blink, S. Weijer, J. Madwed, C.L. Bos, A. Gupta, C.L. Yong, S.H. Polmar, D.P. Olszyna, C.E. Hack, S.J. van Deventer, M.P. Peppelenbosch, T. van der Poll, J. Immunol. 2002, 168, 4070-4077.
94. P.W. Manley, G. Bold, J. Bruggen, G. Fendrich, P. Furet, J. Mestan, C. Schnell, B. Stolz, T. Meyer, B. Meyhack, W. Stark, A. Strauss, J. Wood, Biochim. Biophys. Acta 2004, 1697, 17-27.
95. Z. Wang, B.J. Canagarajah, J.C. Boehm, S. Kassisa, M.H. Cobb, P.R. Young, S. Abdel-Meguid, J.L. Adams, E.J. Goldsmith, Structure 1998, 6, 1117-1128.
96. Z. Wang, P.C. Harkins, R.J. Ulevitch, J. Han, M.H. Cobb, E.J. Goldsmith, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 2327-2332.
97. K.P. Wilson, M.J. Fitzgibbon, P.R. Caron, J.P. Griffith, W. Chen, P.G. McCaffrey, S.P. Chambers, M.S. Su, J. Biol. Chem. 1996, 271, 27696-27700.
98. S. Kumar, M.S. Jiang, J.L. Adams, J.C. Lee, Biochem. Biophys. Res. Commun. 1999, 263, 825-831.
99. Y. Kuma, G. Sabio, J. Bain, N. Shpiro, R. Marquez, A. Cuenda, J. Biol. Chem. 2005, 280, 19472-19479.
100. C. Dominguez, R. Boelens, A.M. Bonvin, J. Am. Chem. Soc. 2003, 125, 1731-1737.
101. A.D. van Dijk, R. Boelens, A.M. Bonvin, J.P. Linge, S.I. O’Donoghue, M. Nilges, FEBS J. 2005, 272, 293-312.
102. J.P. Linge, S.I. O’Donoghue, M. Nilges, Methods Enzymol. 2001, 339, 71-90.
103. M. Nilges, J. Mol. Biol. 1995, 245, 645-660.
104. M. Nilges, S.I. O’Donoghue, Prog. NMR Spectrosc. 1998, 32, 107-139.
105. P.J. Hajduk, J.C. Mack, E.T. Olejniczak, C. Park, P.J. Dandliker, B.A. Beutel, J. Am. Chem. Soc. 2004, 126, 2390-2398.
15.3 The Nuclear Receptor Superfamily and Drug Discovery
John T. Moore, Jon L. Collins, and Kenneth H. Pearce
Outlook
Nuclear receptors are an evolutionarily related family of proteins unified by common structural and functional properties. In general, these receptors act as specialized transcription factors that ultimately regulate target genes involved in a variety of critical biological processes, such as cellular differentiation, reproduction, metabolic homeostasis, and immune system function. For a subset of the receptors, activities can be regulated by endogenous hormones, lipids or metabolites and in some cases synthetic small molecule ligands. Drug discovery advances within the field have shown that designer ligands can exert pathway- and tissue-selective effects on the receptors, thus maximizing medically beneficial responses over side-effect liabilities. This review covers many of the features of nuclear receptor structure/function and highlights some of the key methodologies currently being used to aid discovery of new nuclear receptor-targeted drugs.
15.3.1 Introduction
A central theme that defines the field of endocrinology is the act of controlling activities and processes at distal sites in the body. Signaling molecules, in some cases nonprotein small molecules, traverse the body and ultimately relay their chemically encoded information to a protein receptor at the target tissue. The nuclear hormone receptor (NR) is a classic example of a receiver for such small-molecule chemical messengers. The NR is well adapted for this type of function because it not only specifically binds the small molecule but is also capable of relaying or transducing a complex set of signals carried along by the properties of the ligand. As reviewed herein, the nature of the information that the ligand-bound NR relays depends on a complex interplay of factors, such as ligand and cell type. In humans, 48 NR genes have been identified (Fig. 15.3-1) [1]. A feature that unifies the NRs as a superfamily is that each receptor consists of an assembly of functional modules (Fig. 15.3-2) [2]. For the purpose of this review, the module most relevant to current drug discovery approaches is the C-terminal
A similar version of this paper was published in ChemMedChem 2006, 1, 504-523, Wiley-VCH, Weinheim, Germany
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
Fig. 15.3-1 The NR superfamily represented as a phylogeny plot. The 48 identified receptors within the human genome are shown clustered according to amino acid sequence relationships. NRs are named according to the accepted unified nomenclature (see Table 15.3-1 for more description) [1].
ligand-binding domain (LBD). The LBD is typically about 250 amino acids in length and contains a key regulatory element, the so-called activation function 2 (AF2) domain, as well as all the recognition elements required for ligand binding (Fig. 15.3-3(a), left) [3]. The fold of the NR LBD is typically described as three stacked α-helical sheets. The helices comprising the “front” and “back” sheets are roughly aligned parallel to one another. The helices in the middle sheet run across the two outer sheets and occupy space only in the upper portion of the domain (Fig. 15.3-3(a), right). The space in the lower part of the domain is relatively
Fig. 15.3-2 Domain organization of the NRs. Shown are the basic structural modules comprising an NR (AF-1, activation function-1; DBD, DNA binding domain; LBD, ligand binding domain). In the linear schematic at the top, the general functions of the respective regions of NRs are noted. Examples of selected NRs (see Table 15.3-1 for abbreviations) are shown below to demonstrate that most NR LBDs are similar in amino acid length, but the N-terminal region varies among family members. Numbers represent amino acid position.
15 Target Families
15.3 The Nuclear Receptor Superfamily and Drug Discovery
Fig. 15.3-3 Representative structures of NR functional modules. (a) The first NR LBD to be solved crystallographically was the apo RXR LBD [5]. The representative example structure shown here, depicted as a ribbon diagram, is for PR bound to its natural ligand progesterone [6]. This structure, which was the first of the steroid receptors to be solved, shows the basic fold conserved among members of the NR superfamily. The major helices (red) are labeled, the well-conserved small β-sheet is shown in yellow, and the random coil stretches connecting the major structural elements are colored green. The final C-terminal helix is labeled as the AF2 helix and is described in more detail in the text. The progesterone molecule is colored by atom type with carbons shown as blue and oxygens shown as red. The domain on the right is rotated 90° to clearly show the three helical layers that comprise the NR LBD fold. (b) Shown, as a ribbon diagram, is the first X-ray crystal structure of an NR DBD bound to a DNA response element. This representative structure is the DBD from GR bound in an antiparallel fashion to its inverted direct-repeat DNA response site [7]. The GR DBD is bound as a homodimer, where one of the subunits is colored yellow and the other is colored blue. The DNA helix is shown with atoms represented as spheres and colored according to atom type (carbon - green; oxygen - red; nitrogen - blue; phosphorus - magenta).
void of protein, and for most NRs this creates an internal cavity for small molecule ligands. The central part of the typical NR contains the DNA binding domain (DBD), which is usually about 70 amino acids, contains two zinc-finger motifs, and is the most highly conserved sequence segment amongst the NRs. For some NRs, the DBD forms a dimer and binds a DNA response element containing a direct repeat of six base pairs [4]. The typical DBD contains three helices (Fig. 15.3-3(b)), the first of which docks into the major groove of the DNA recognition site. A second smaller helix and the loop preceding it create a domain-domain interface. The third helix makes no DNA or other contacts. Most NRs have an N-terminal domain, commonly referred to as the activation function 1 (AF1) domain. This module varies greatly in length amongst receptors and generally contains a nonligand-dependent transcriptional AF.
Upon activation by the ligand messenger, NRs typically function as transcription factors: they bind to recognition elements and regulate the expression of target genes. Once complexed to DNA, NRs recruit accessory proteins such as coactivators, corepressors, and basal transcriptional factors, thus initiating gene transcription (Fig. 15.3-4). In some cases, genes under the control of a negative response element are downregulated by an NR; thus NRs are able to act directly as activators or suppressors of gene function. As will be discussed later in this chapter, NR pathway regulation goes beyond direct, DNA-mediated transcriptional regulation. For example, some NRs crosstalk with other important signal transduction schemes such as nuclear factor kappa B (NF-κB) and activator protein 1 (AP-1) [8] (Fig. 15.3-4).
NRs have a rich and long-standing history in drug discovery. This can be attributed to several features inherent to this class of targets: (a) NRs have been designed by nature to selectively bind “druglike” small molecules and (b) a diverse set of biologically important functions can be regulated through a single ligand-activated receptor (see Table 15.3-1 for examples of NR-targeted drugs). Data
Fig. 15.3-4 A simplified schematic depicting the general mechanisms of NR function. Some unliganded NRs, such as the steroid receptors, exist in the cytoplasm in an inactive complex with heat shock proteins (hsps). Ligand binding triggers hsp uncoupling and transport of the NR to the nucleus. To directly regulate gene transcription, the ligand-bound NR associates with a DNA response element within the promoter of the target gene. In many cases the NR localizes in the form of a homo- or heterodimer. This complex is able to recruit coactivator (CoA) proteins and other transcriptional components to regulate target gene expression. Another mechanism whereby ligand-activated NRs can affect gene transcription involves association with other transcription factors (TF) such as NF-κB, AP-1 (activator protein-1), and GATA. The precise molecular mechanism of this latter activity remains controversial.
Table 15.3-1 The human nuclear receptor superfamily and examples of ligands and therapeutic utilities

| General category[a] | Name | Subtypes and abbreviations (other common abbreviations) | Unified nomenclature[b] | Natural ligand | Examples of therapeutic ligands (trade name) | Therapeutic relevance[c] |
|---|---|---|---|---|---|---|
| Classic steroid receptors | Estrogen receptor | ERα, ERβ | NR3A1, NR3A2 | Estradiol, estrogens | Tamoxifen, raloxifene (Evista), genistein, diethylstilbestrol, equine estrogens (Premarin) | Menopausal symptoms, osteoporosis prevention, breast cancer |
| | Glucocorticoid receptor | GR | NR3C1 | Cortisol; glucocorticoids | Prednisone, dexamethasone, fluticasone propionate (Flovent, Flonase), mometasone furoate (Nasonex), budesonide (Rhinocort/Pulmicort) | Inflammatory and immunological diseases, asthma, arthritis, allergic rhinitis, cancer, immune suppressant for transplant |
| | Mineralocorticoid receptor | MR | NR3C2 | Aldosterone; deoxycorticosterone | Spironolactone (Aldactone), eplerenone (Inspra) | Hypertension, heart failure |
| | Progesterone receptor | PR | NR3C3 | Progesterone; progestins | RU486 (Mifepristone) | Abortifacient, menstrual control |
| | Androgen receptor | AR | NR3C4 | Testosterone; androgens | Flutamide, bicalutamide (Casodex) | Prostate cancer |
| Classic RXR-heterodimer receptors | Thyroid hormone receptor | TRα, TRβ | NR1A1, NR1A2 | Thyroid hormone | Levothyroxine (Synthroid) | Thyroid deficiency |
| | Retinoic acid receptor | RARα, RARβ, RARγ | NR1B1, NR1B2, NR1B3 | Retinoic acid | Isotretinoin (Accutane) | Acne |
| | Peroxisome proliferator-activated receptor | PPARα, PPARδ, PPARγ | NR1C1, NR1C2, NR1C3 | Fatty acids, eicosanoids | Fenofibrate (Tricor; PPARα), thiazolidinediones (Avandia, Actos; PPARγ) | Dyslipidemia (PPARα), diabetes and insulin sensitization (PPARγ) |
| | Liver X receptor | LXRα, LXRβ | NR1H3, NR1H2 | 24,25-Epoxycholesterol, 24-hydroxycholesterol | - | Role in lipid and cholesterol metabolism; atherosclerosis |
| | Farnesoid X receptor | FXR | NR1H4 | Chenodeoxycholic acid | - | Cholesterol maintenance, protect hepatocytes from bile toxicity; cholestasis |
| | Vitamin D receptor | VDR | NR1I1 | Vitamin D, bile acids | Calcitriol (Rocaltrol) | Hypocalcemia, osteoporosis, renal failure |
| | Retinoid X receptor | RXRα, RXRβ, RXRγ | NR2B1, NR2B2, NR2B3 | All trans-retinoic acid | LG1069 (Targretin) | Skin cancer |
| Xenobiotic receptors | Pregnane X receptor | PXR | NR1I2 | Xenobiotics | St. John’s wort, rifampicin | Role in protection from toxic metabolites |
| | Constitutive androstane receptor | CAR | NR1I3 | Xenobiotics | Phenobarbital | Role in protection from toxic metabolites |
| Orphan receptor (or recently deorphaned) | ER-related receptor | ERRα, ERRβ, ERRγ | NR3B1, NR3B2, NR3B3 | Unknown | Tamoxifen, diethylstilbestrol (ERRγ) | Muscle fatty acid metabolism (ERR…) |
| | RAR-related orphan receptor | RORα, RORβ, RORγ | NR1F1, NR1F2, NR1F3 | Cholesterol, cholesterol sulfate | - | Role in cerebellum development, maintenance of b… (RORα); circadian rhythm (RORβ); lymph node organogenesis (RORγ) |
| | Hepatocyte nuclear factor 4 | HNF4α, HNF4γ | NR2A1, NR2A2 | Palmitic acid | - | Role in diabetes |
| | Reverse erbA | Rev-erbAα, Rev-erbAβ | NR1D1, NR1D2 | Unknown | - | Circadian rhythm |
| | Testis receptor | TR2, TR4 | NR2C1, NR2C2 | Unknown | - | Unknown |
| | Tailless-like | TLX | NR2E1 | Unknown | - | Role in neuronal development |
| | Photoreceptor-specific nuclear receptor | PNR | NR2E3 | Unknown | - | Role in photoreceptor differentiation |
| | Chicken ovalbumin upstream promoter-transcription factor | COUP-TFI, COUP-TFII, COUP-TFIII (Ear2) | NR2F1, NR2F2, NR2F6 | Unknown | - | Role in neuronal development (COUP-TFI); vascular development (COUP-TFII) |
| | NGF-induced factor B | NGFIBα (also NUR77) | NR4A1 | Unknown | - | Role in thymocyte apoptosis |
| | Nur related factor 1 | NGFIBβ (NURR1, NOT1) | NR4A2 | Unknown | - | Role in dopaminergic neuron development |
| | Neuron-derived orphan receptor 1 | NGFIBγ (NOR1) | NR4A3 | Unknown | - | Unknown |
| | Steroidogenic factor 1 | SF1 | NR5A1 | Phospholipids | - | Role in mammalian sexual development |
| | Liver receptor homologous protein 1 | LRH1 | NR5A2 | Phospholipids | - | Role in lipid homeostasis, cell-cycle control |
| | Germ cell nuclear factor | GCNF | NR6A1 | Unknown | - | Role in vertebrate embryogenesis |
| NR-like, DBD-less repressors | DSS-AHC critical region on the chromosome, gene 1 | DAX1 | NR0B1 | Unknown | - | Role in sex determination and development |
| | Short heterodimer partner | SHP | NR0B2 | Unknown | - | General repressor of NRs, obesity |

a Each of the 48 human receptors is roughly categorized into several very generalized groups. The order descends from the historically more studied, classical receptors (top) to the more recently discovered family members (bottom).
b Nomenclature from Ref. [1].
c Biological role of the receptor if ligand is currently not identified.
compiled for the year 2003 (http://www.rxlist.com/top200.htm) show that 34 of the top 200 most prescribed drugs target an NR. Currently, drugs targeting an NR account for over 30 billion dollars in pharmaceutical sales and treat numerous debilitating diseases. In light of these facts, the NR field remains an area of intense research, with most of the current effort directed toward improving upon current NR drugs or screening currently unexploited NRs. The purpose of this review is to briefly cover the following general topics as they pertain to the chemical biology of NRs: the history of NR-targeted drug discovery, principles of NR-ligand recognition and protein conformational change, biological pathways controlled by NRs, recent NR drug pursuits, and finally some new technologies and future pharmaceutical prospects for this target class.
15.3.2 Brief History of NRs in Medicine and Drug Discovery
The first generation of NR drugs was discovered prior to a detailed knowledge of the target class. Many clinically useful compounds were initially found by tracking down biological activity from natural extracts. Only later did these bioactive molecules lead scientists to the actual drug target. Studies dealing with bioactive fractions from natural extracts containing steroid or thyroid hormones helped lay the foundation for modern NR-based endocrinology. For example, study of adrenal gland extracts initiated GR drug discovery, and these tissue extracts were used clinically to correct the manifestations of Addison’s disease (glucocorticoid deficiency) (for review see Refs. [9, 10]). From this early clinical work, a well-defined relationship emerged that connected the adrenal extract with maintenance of homeostatic function. For example, it was noted that, in addition to bringing about remission from stress-related diseases, the extracts also suppressed symptoms in patients suffering from inflammatory conditions such as allergy, hay fever, and asthma. At the same time, biochemical characterization of the adrenal gland extracts identified cortisone as an active steroidal component. In 1948, when sufficient quantities of cortisone could be purified, its effects on inflammatory diseases were directly tested. Ultimately, total synthesis of cortisone was accomplished by Woodward and colleagues and a group at Merck [11, 12], thus completing the first-generation evolution of this drug and setting the stage for later syntheses of potent synthetic steroids such as prednisolone and dexamethasone. A similar history was seen with the first generation of drugs that targeted other steroid receptors. It was known as early as 1916 that ovariectomy could reduce the incidence of mammary cancer in high-incidence strains of mice [13].
Studies of the biological effects of extracts containing estrogenic activity prompted screens for compounds with antiestrogenic effects, initially for contraception in the 1960s, but later for estrogen-responsive breast cancers. Screens for antiestrogenic nonsteroidal compounds led to the discovery of
ethamoxytriphetol, clomiphene, and then tamoxifen. Tamoxifen ultimately
became the gold standard for the endocrine treatment of breast cancer and relatively recently became the first approved cancer chemopreventative agent. Not surprisingly, the first set of NR genes cloned were from the steroid receptor subgroup, where prior research had yielded compounds to aid in purification of the receptor. The first human NR cloned was the glucocorticoid receptor (GR), an accomplishment that relied heavily on reagents made available from the purification and biochemical characterization of adrenal extracts. With purified receptors, selective antibodies were used to help isolate the corresponding cDNA [14-16]. cDNAs representing the full-length coding region of GR provided the first full-length amino acid sequence of an NR. The estrogen receptor (ER) was also cloned around the same time by three groups using independent strategies [17-19]. Comparison of emerging NR sequences (from human as well as from other species) revealed conserved domains shared among virtually all NRs. The finding that NRs could be isolated without knowledge of their ligand increased the rate at which new NRs could be identified. Initially, oligonucleotides representing conserved NR motifs (such as the highly conserved DBD) were employed as molecular probes to perform low-stringency DNA hybridizations to cDNA libraries. The number of orphan NRs quickly surpassed the number of classical NRs [20-22]. By the late 1990s, the chosen method for identification of new NRs shifted from the laboratory to in silico methods. This advance was made possible by the availability of large databases of randomly generated partial cDNA sequences, known as expressed sequence tags (ESTs), and the development of bioinformatic search and query tools such as BLAST. Two new mammalian NRs were successfully identified through automated searches of EST databases.
The pregnane X receptor (PXR) was identified in a public database of mouse ESTs by a high-throughput in silico screen for NR-like sequences [23], and the photoreceptor cell-specific receptor (PNR) was found in a human EST database [24]. After the isolation of PNR from EST databases, the number of human NRs totaled 48. The availability of the complete human genome sequence in 2001 confirmed that this set of 48 constitutes the complete set of human NR genes [25, 26]. As new NRs were isolated, new connections between first-generation drugs and their targets were made. For example, thiazolidinediones (TZDs) had previously been discovered through traditional pharmacological methods to show clinical benefit in diabetes; however, the molecular basis for this therapeutic effect remained unclear. By using expression constructs derived from the isolated NR genes, activity screens for each receptor were developed. Using these screens, TZDs were found to be potent and selective activators of peroxisome proliferator-activated receptor gamma (PPARγ) [27]. Once this link was made, the search for a second generation of PPARγ compounds could be initiated using an in vitro assay for PPARγ activation.
This second-generation approach of using the receptor rather than a bioactive extract can be characterized as a “reverse endocrinology” approach. Traditionally, ligands were identified on the basis of their biological effects. When this process is reversed, the orphan receptors are used to identify the ligands, which are subsequently used to dissect the biology of the receptors. For example, a reverse endocrinology approach was used to link farnesoid X receptor (FXR) to bile acid ligands. Availability of chemical tools (bile acids as well as synthetic ligands) for FXR led to experiments that linked FXR to bile acid homeostasis and suggested the possibility that FXR ligands could be of benefit in treating disorders involving cholestatic liver disease [28]. Within the NR superfamily, a third-generation drug discovery effort has recently begun. In this phase, screening methods that give information beyond potency and selectivity (e.g., selective effects on gene expression) are used to discover compounds with therapeutic advantages over present drugs. Strategies that underlie this new drug discovery effort are the subject of a following discussion on NR modulators.
15.3.3 Basic Principles for Ligand-NR Recognition
From a medicinal chemistry perspective, targeting NRs via novel small molecule ligands is a fairly tractable exercise. As mentioned above, most NRs have a small, enclosed ligand-binding pocket, and a wide variety of druglike, high-affinity molecules that bind in this pocket can be identified. The inherent difficulty of rational drug design for NRs derives from the vast complexity of NR-associated biology. While small molecules that bind the target NR with high affinity can be fairly readily identified, the corresponding functional activity is not always obvious or immediately interpretable given the current level of biological understanding (discussed in more detail below). In this section, we will discuss the general principles of ligand binding for NRs.
15.3.3.1 Steroid Receptors: GR, MR, PR, AR, and ER
The ligand-binding pockets of the steroid receptors, which include GR, the mineralocorticoid receptor (MR), progesterone receptor (PR), and the androgen receptor (AR), as well as the more divergent ER, have many common features required for binding the natural hormone. At least one crystal structure exists for each of these LBDs [6, 29-31] (see Figs. 15.3-3(a) and 15.3-5 for an example of PR and GR, respectively). Typically, about 75% (roughly 17 of 22 residues) of the ligand-binding pocket’s inner lining consists of hydrophobic residues. Generally, all the polar residues within the binding pocket (roughly three to five residues) make a hydrogen bond to the natural ligand. In each case, the A ring
Fig. 15.3-5 Structure of the GR LBD and features of ligand binding. (a) Crystal structure of the GR LBD bound with dexamethasone [29]. The protein is shown as a ribbon diagram and the AF2 helix, which is in the active orientation, is colored red. The GR agonist, dexamethasone, is shown in space-filling mode and carbons are colored blue, oxygens are red, and hydrogens are white. (b) Close-up of the GR ligand-binding site. The pocket is shown as a cutaway, and the back face represents the hydrophobic nature of the pocket (carbons are colored green). Dexamethasone is shown oriented with the A-ring 3-position ketone toward the back of the pocket and the D ring positioned toward the AF2 helix. Hydrogen bonds with key amino acids within the pocket are shown as dotted yellow lines. (c) Representative structures of well-known GR ligands: dexamethasone, fluticasone propionate, and RU486.
of the steroid hormone is positioned between helices 3 and 5. The oxosteroid receptors GR, MR, PR, and AR lock the A-ring 3-position carbonyl of the steroid into place with a hydrogen bond “charge clamp” using a conserved
glutamine and arginine on helices 3 and 5, respectively. With ER, coordination of the 3-position hydroxyl is made via a glutamate and an arginine at the respective locations. In all cases, the D ring of the steroid points toward helix 10 and the AF2 helix. The volume of the pocket varies slightly amongst the receptors when in complex with the respective natural ligand: approximately 420, 450, 560, 580, and 590 Å³ for AR, ERα, PR, MR, and GR, respectively. However, depending on the size and shape of the bound ligand, the volume of the pocket can change significantly. This dynamic flexibility allows this class of receptors to accept a wide variety of synthetic ligands with numerous shapes and volumes. Interestingly, no crystal structures of unliganded steroid receptors have been reported, so the precise nature of the pocket in the absence of ligand is unknown. Crystal structures of steroid receptors in complex with synthetic ligands have revealed alternative binding modes as compared to the natural steroid hormone. To date, the ERα and ERβ subtypes [32] have provided the greatest variety of crystal structures with bound synthetic ligands [33]. There are currently several examples of ER in complex with synthetic ligands: diethylstilbestrol (DES), 4-hydroxytamoxifen (OHT) [34], genistein [35], raloxifene [36], (R,R)-5,11-cis-diethyl-5,6,11,12-tetrahydrochrysene-2,8-diol (THC) [37], and the pure antiestrogen ICI 164384 [38]. Each of these complexes, either with ERα or ERβ, reveals that the hydrogen bond clamp with a hydroxyl off the A-ring analog is conserved. The presence of this interaction in each of the structures emphasizes the importance of this hydrogen bond for high-affinity binding. The other commonality between these ligands is that they fill the core of the ligand-binding pocket with hydrophobic atoms, each roughly occupying the same volume. One of the key features of the OHT, raloxifene, and ICI 164384 structures is that each contains an extended amine or hydrophobic group directed toward the AF2 helix, which causes steric repositioning of this structural element (see Fig. 15.3-8(b) and the discussion in Section 15.3.4).
15.3.3.2 RXR-heterodimer Receptors: PPARs, RXR, LXR, FXR
Unlike the steroid receptors, most of which function as homodimers, a second class of NRs functions as heterodimers with the retinoid X receptor (RXR). Importantly, these receptors serve as sensors for metabolites such as fatty acids, oxysterols, and bile acids. Key elements of ligand recognition and receptor activation have been elucidated following structure-function analyses of several receptors in this family, including the PPARs, liver X receptors (LXRs), and FXR. The X-ray crystal structures of the PPARs, LXRs, and FXR have been determined in various unliganded and liganded states. The volumes of the ligand-binding pockets are larger than those of the steroid receptors and range from 700 to 850 Å³ for FXR/LXRs and up to 1300 Å³ for the PPARs. As with the steroid receptors, the size and shape of the ligand-binding pockets can vary depending
on the size and shape of the ligand. This plasticity permits the binding of
diverse, structurally distinct chemotypes. The majority of amino acids that line the ligand-binding pockets in these receptors are hydrophobic; however, several key polar amino acids are present, which have been shown to be critical for ligand recognition and receptor activation. For the PPARs, an acidic group present in fatty acids is involved in a complex hydrogen-bond network consisting of a tyrosine on AF2 and two histidine residues on helices 5 and 10, most of which are conserved between the three PPAR subtypes (Fig. 15.3-6). Importantly, the direct hydrogen-bonding interaction of the acidic moiety with tyrosine on AF2 stabilizes AF2 in an active conformation and initiates transcriptional activation. The requirement for this interaction for transcriptional activation is evidenced by the fact that PPAR ligands (such as GW0072, Fig. 15.3-9) that lack this hydrogen-bonding interaction show partial agonist or antagonist activity [39]. In contrast to the PPARs, the interaction of oxysterols and bile acids with LXR and FXR, respectively, does not occur through a direct interaction with an amino acid on AF2 [41, 42]. A critical hydrogen-bond interaction is observed between a histidine on helix 10/11 and either an acceptor oxygen on the natural ligand (epoxycholesterol) or a donor oxygen on a synthetic ligand (T0901317). This interaction positions the histidine perpendicularly to a tryptophan residue that is located on the AF2 helix (Fig. 15.3-7), which, in turn, promotes an electrostatic interaction between these two amino acids. In addition to contributing to ligand binding, this network of interactions connecting ligand to the AF2 helix helps stabilize the receptor in an active conformation (Fig. 15.3-7). It should be noted that hydrophobic interactions between ligand and receptor can also initiate the histidine/tryptophan electrostatic switch [43]. The cumulative data suggest that this histidine/tryptophan interaction is the molecular basis for ligand-dependent activation of the LXRs and FXR. Clearly, a select number of polar amino acids within the binding pockets of PPARs, LXRs, and FXR play important roles in mediating ligand recognition and receptor activation.
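The pocket volumes quoted in Sections 15.3.3.1 and 15.3.3.2 can be put on a common scale with a few lines of arithmetic. The following Python sketch is illustrative only: the dictionary and function names are assumptions introduced here, the numbers are the approximate values stated in the text, and for FXR/LXR and the PPARs only the upper bound of the quoted range is used.

```python
# Approximate ligand-binding pocket volumes in cubic angstroms, as quoted in the text.
# Steroid receptor values refer to complexes with the natural ligand; for FXR/LXR and
# the PPARs the upper bound of the quoted range is used (an assumption of this sketch).
POCKET_VOLUMES = {
    "AR": 420,
    "ERalpha": 450,
    "PR": 560,
    "MR": 580,
    "GR": 590,
    "FXR/LXR": 850,
    "PPAR": 1300,
}


def fold_over_smallest(volumes):
    """Express each pocket volume as a multiple of the smallest pocket."""
    smallest = min(volumes.values())
    return {name: round(vol / smallest, 2) for name, vol in volumes.items()}


if __name__ == "__main__":
    ratios = fold_over_smallest(POCKET_VOLUMES)
    # Print receptors from the smallest to the largest pocket.
    for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
        print(f"{name:8s} {POCKET_VOLUMES[name]:5d} A^3  {ratio:.2f}x")
```

On these figures the PPAR pocket is roughly three times the size of the AR pocket, which is consistent with the statement that the RXR-heterodimer receptors accommodate larger and more varied ligands. Because pocket volume changes with the bound ligand, the ratios are only a rough guide.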
15.3.3.3 “Orphan” Receptors: HNF4, CAR, NGFIB
While the steroid and RXR-heterodimer receptors show low transcriptional activity in the basal state, several NRs have been identified that are transcriptionally active in the basal state and are thus referred to as constitutively active receptors. Structural analyses of two NRs in this class, the hepatocyte nuclear factors 4 (HNF4s) [44] and nerve growth factor-induced B (NGFIB) [45], provide insight into two unique mechanisms that give rise to the constitutive activity. The X-ray crystal structure of HNF4γ has revealed the presence of host-derived fatty acids in the ligand-binding pocket. A similar observation was made in HNF4α [46]. The fact that these fatty acids were not displaceable led to the proposal that these natural ligands serve as structural cofactors for HNF4. In contrast to HNF4γ, the X-ray crystal structures of NURR1 and DHR38,
Fig. 15.3-6 Structure of the PPARγ LBD and features of ligand binding. (a) Shown in blue is a ribbon diagram of the crystal structure of PPARγ LBD bound with rosiglitazone [40]. The AF2 helix, which is colored red, is in the active position for binding an LXXLL coactivator peptide (not shown). The rosiglitazone molecule is buried in the receptor and is represented in space-filling mode with carbons colored green, oxygens red, and nitrogens blue. (b) Close-up of the binding site within the PPARγ LBD. The front face of the site is clipped away to show the bound rosiglitazone molecule and the hydrophobic backside of the binding pocket. As shown, a tyrosine residue from the AF2 helix of PPARγ makes a hydrogen bond with the thiazolidinedione head group of rosiglitazone. (c) Representative structure of a well-known PPARγ ligand.
Fig. 15.3-7 Structure of the LXRβ LBD and features of ligand binding. (a) A ribbon diagram representing the crystal structure of LXRβ in complex with the synthetic agonist ligand, T0901317, is shown in orange [42]. The AF2 helix, which assumes the agonist conformation, is colored red. The ligand is shown in space-filling mode and carbons are colored green, oxygens red, nitrogens blue, and fluorines magenta. Similar to the orientation of steroids with the steroid receptors, the D ring, or D-ring mimetics in the case of nonsteroidal synthetic molecules, protrudes toward the AF2 helix. (b) Close-up of the ligand-binding pocket for LXRβ. The front half of the receptor is cut away to show the ligand bound to the back face of the pocket. The histidine/tryptophan switch, which is key for ligand-induced activation of LXR, is highlighted. The His-mediated hydrogen bond is indicated with a yellow line. (c) Representative structures of well-known LXR ligands.
the mouse and Drosophila orthologs of NGFIBβ, respectively, showed the absence of a ligand-binding pocket [45, 47]. Instead, several bulky hydrophobic residues fill the space that is normally occupied by the ligand, suggesting that the receptor may not be regulated via the classical ligand-based approach. Clearly, determination of the X-ray crystal structures for the remaining orphan NRs will provide insight into the tractability of these targets for drug discovery.
15.3.4 Influence of Ligand on NR LBD Conformation
There have been numerous key studies demonstrating that ligand binding does not simply trigger NRs from an off-state to an on-state. In fact, these studies revealed, at a molecular level, that activation of an NR by a small molecule ligand is dramatically more complex than a two-state process. The concept that ligand alters NR conformation to produce activity profiles pertains mostly to the steroid receptors, PPARs, TR, RXR, RAR (retinoic acid receptor), LXR, and FXR. Considerable doubt exists whether this concept applies to select “constitutively active” receptors such as HNF4 and NGFIB. One of the first studies to reveal the conformational effect of ligand utilized a protease digestion assay to show that ER ligands could differentially affect the pattern of protease-generated peptides [48]. As suspected from earlier work, this study demonstrated that different ligand classes could affect NR conformation and thus alter the AF2 activity of the receptor. Predominantly, structural studies using X-ray crystallography have shed light on how ligands can alter NR conformation. In the late 1990s, two groundbreaking reports on ER showed that ligand can particularly affect the orientation of the most C-terminal α-helix of the LBD, referred to as the AF2 helix [34, 36]. In these studies, the AF2 helix of ER, bound with an agonist ligand such as estradiol or the synthetic DES, was shown to adopt a position similar to that seen in the original RAR and PPARγ agonist-bound structures [5, 40] (Fig. 15.3-8(a)). In this active conformation, the AF2 helix spans across H3 and H10. This arrangement creates a shallow, hydrophobic groove adjacent
Fig. 15.3-8 Examples showing the many possible conformations of the AF2 helix. (a) ERα with the agonist diethylstilbestrol [34]; (b) ERα with the antiestrogen 4-hydroxytamoxifen [34]; (c) PPARα with the antagonist GW6471 [49]. Each receptor, oriented in the standard position with H1/H3 in front and slightly off to the right, is shown in space-filling mode. The AF2 helix for each receptor is shown as a green ribbon, or as a green random coil for PPARα. On DES:ERα, the AF2 helix lies across the receptor to help form a binding site for an LXXLL coactivator peptide, which is colored yellow. The ligand tamoxifen sterically interferes with the loop preceding the AF2 helix and causes the AF2 helix to reorient, bind within the coactivator cleft, and block LXXLL peptide binding. For the PPARα:GW6471 complex, the AF2 helix is perturbed in a way that allows accommodation of a corepressor peptide (shown in magenta). In this case the AF2 helix is somewhat unwound and localizes on the receptor in a different position relative to that seen for other NR LBD structures.
to the AF2 helix. This pocket accommodates a short helical peptide presented at the surface of a coactivator protein (reviewed in a section below). Peptides that bind this region of the activated NR typically contain an LXXLL motif (where L and X represent leucine and any amino acid, respectively). This short peptide motif is typically α-helical, and the leucine residues are presented on one face of the amphipathic helix. Additional electrostatic interactions between amino acid side chains of the receptor and the peptide backbone are believed to aid the orientation and stability of the interaction. The structures of ER bound with either tamoxifen or raloxifene, both of which are antagonists for AF2 function, strikingly revealed that the AF2 helix could be repositioned from the agonist conformation (Fig. 15.3-8(b)). In each of these structures, an amine-containing head group from the ligand protrudes toward the surface of ER to destabilize the active position of the AF2 helix. This shift causes the AF2 helix to rotate approximately 90° from the active position. In the antagonist position, the AF2 helix occupies the coactivator peptide-binding site on the surface of the receptor. These studies highlight the ligand-induced flexibility and plasticity of the NR LBD, particularly with respect to the AF2 helix. More recent structural studies using the GR LBD further demonstrate how ligand can influence the conformation of the LBD [29, 50]. The structure of GR bound with the agonist dexamethasone shows that the AF2 helix exists in an active position that allows coactivator peptide association. Two structures of GR bound with the antagonist ligand RU486 have shown that a protruding dimethylaniline group effectively prevents the AF2 helix from occupying the active position. In one of these structures, the AF2 helix intramolecularly blocks the coactivator site.
In the other structure, the AF2 helix extends away from the core of the LBD and associates with an adjacent LBD subunit in the crystal. Again, these studies suggest that the AF2 helix and the loop that precedes it are prone to ligand-induced conformational flexibility. Two studies dealing with PPAR also demonstrate the ligand-induced conformational aspects of the LBD. In a structure of PPARα, in complex with both an antagonist ligand, GW6471 (Fig. 15.3-9), and a peptide motif
Fig. 15.3-9 Examples of NR tool compounds and drugs, many of which are referred to and discussed in the text. For some ligands, the region of the molecule that is oriented toward the AF2 helix (as determined from the crystal structure of the NR-ligand complex) is shaded.
from a corepressor (reviewed below), the AF2 helix assumes an alternative location (Fig. 15.3-8(c)) [49]. Here, the AF2 helix occupies neither the agonist nor the antagonist position (i.e., the coactivator groove as seen with ER), but lies adjacent to the corepressor peptide. Another study, using nuclear magnetic resonance on PPARγ, shows that the apo LBD is a highly flexible module in which over half of the chemical shifts of the backbone atoms are missing [51]. When the receptor is bound with rosiglitazone, these shifts, particularly those in the ligand-binding pocket and the AF2 helix regions, can be assigned. In general, these studies suggest that the physicochemical properties of the NR ligand can dramatically influence the conformational dynamics of the LBD, which in turn ultimately govern the downstream signaling aspects of the liganded receptor.
15.3.5 The Multitude of Ligand-induced NR Actions
By virtue of their ability to interact with a repertoire of molecules within the cell, ranging from DNA response elements to protein accessory factors, the NRs represent a target class of complex, multitasking proteins (see Refs. [52, 53] for reviews). Most of the NRs were initially considered to be simple ligand-induced transcription factors. However, studies over the past decade have revealed that NRs are much more complicated and serve more than a single functional purpose. In this section we highlight some of the types of activities of NRs using particular examples.
15.3.5.1 Gene Regulation and the Role of Activity-Enhancing Accessory Proteins
At various stages in the activity cycle, NRs act in concert with a variety of binding partners. For example, prior to ligand binding, GR resides in the cytoplasm of the cell in complex with chaperone proteins such as hsp90 or p23 [54]. Ligand association causes dissociation of the chaperones and allows GR to traverse the nuclear envelope. Using amino acids within the DBD, GR binds to a recognition site on a specific promoter, a site referred to as a glucocorticoid response element (GRE). NR response elements have a general half-site consensus of RGGTCA (where R is a purine); these DNA half-sites are commonly arranged as repeats, either direct or inverted. The precise mechanism by which NRs associate with DNA response elements varies among the superfamily. In general, the steroid receptors bind to their response elements as homodimers, although GR can form heterodimers with MR, and ERα and ERβ can also bind DNA as heterodimers. Several NRs, such as TR, PPARs, LXR, VDR, RAR, and FXR, require heterodimerization with RXR. Further, many of the orphan receptors, such as LRH1, SF1, and NGFIB, can bind DNA as monomers. The DNA-bound, ligand-activated NR serves as the docking site for a rather large extended family of proteins called coactivators. Binding of a coactivator
protein is believed to be one of the key events in initiating transcriptome
assembly and consequent gene transcription. The first coactivator, called steroid receptor coactivator 1 (SRC1), was identified in 1995 [55], and since then over 200 such cofactors have been discovered. The variety of functions for coactivators, as well as their nomenclature, is a vastly complex field, and a full description of their many functions is beyond the scope of this review (for more detail see Refs. [56-58]). Focusing on one representative, SRC1 is a member of the p160 family of coactivators, which also includes SRC2 (also called transcription intermediary factor 2 (TIF2)) and SRC3 (also called ACTR/pCIP/receptor-associated coactivator 3 (RAC3)/TRAM1/amplified in breast cancer 1 (AIB1)). SRC1 illustrates many features common among the coactivators. First, it contains several LXXLL motifs, otherwise known as NR boxes [59, 60]. As mentioned above, these short, α-helical motifs present a hydrophobic surface that is critical for successfully docking the coactivator protein onto an activated NR. Second, an activation domain within SRC1 contains an acetyltransferase activity that acts locally on histones to unravel DNA at the initiation site [61]. Third, SRC1 is able to aid recruitment of other nuclear enzymes, such as the histone-acetylating proteins cAMP-response element binding protein (CREB)-binding protein (CBP) and p300, and an arginine methyltransferase called coactivator-associated arginine methyltransferase 1 (CARM1). Ultimately, to initiate gene transcription, the NR-coactivator complex recruits the chromatin remodeling complex SWI/SNF, the basal transcription factor-recruiting complex TR-associated protein/vitamin D receptor-interacting protein (TRAP/DRIP), and other basal transcription factors.
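As a rough illustration of the half-site arrangement described in this subsection, the following Python sketch scans a DNA sequence for two RGGTCA half-sites arranged as a direct or an inverted repeat. The function name `find_repeats`, the `max_spacer` limit, and the example sequences are illustrative assumptions and do not come from the studies cited here; only the given strand is scanned, and other arrangements (for example, everted repeats) are ignored.

```python
import re

# Half-site consensus from the text: RGGTCA, where R is a purine (A or G).
HALF_SITE = "[AG]GGTCA"
# Reverse complement of RGGTCA is TGACCY, where Y is a pyrimidine (C or T).
HALF_SITE_RC = "TGACC[CT]"


def find_repeats(seq, max_spacer=5):
    """Return a list of (kind, start, spacer) tuples for paired half-sites.

    kind   -- "direct" (half-site, spacer, half-site) or
              "inverted" (half-site, spacer, reverse-complement half-site)
    start  -- 0-based position of the first half-site on the given strand
    spacer -- number of nucleotides between the two half-sites
    """
    seq = seq.upper()
    hits = []
    for n in range(max_spacer + 1):
        spacer = "[ACGT]{%d}" % n
        patterns = {
            "direct": HALF_SITE + spacer + HALF_SITE,
            "inverted": HALF_SITE + spacer + HALF_SITE_RC,
        }
        for kind, pattern in patterns.items():
            # A lookahead is used so that overlapping candidates are reported.
            for match in re.finditer("(?=(%s))" % pattern, seq):
                hits.append((kind, match.start(), n))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(find_repeats("ccAGGTCAaAGGTCAtt"))  # direct repeat, 1-nt spacer
    print(find_repeats("AGGTCAcagTGACCT"))    # inverted repeat, 3-nt spacer
```

A scan of this kind only flags candidate sites by sequence; whether a given receptor actually binds there depends on the dimerization partner and spacing preferences discussed above.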
15.3.5.2 Corepressors and the Role of Activity-Diminishing Accessory Proteins
Essentially the functional counterpart of coactivators, corepressor proteins bind to many NRs in the absence of ligand and serve to repress basal transcriptional activity [62]. Corepressors play a particularly important role for NRs that are found almost exclusively in the nucleus, unlike the apo steroid receptors, which are localized in the cytoplasm. Studies involving the nuclear-localized receptors TR and RAR led to the identification of silencing mediator of retinoid and thyroid receptors (SMRT) and nuclear receptor corepressor (NCoR) [63, 64]. Both SMRT and NCoR recruit histone deacetylases (HDACs), namely HDAC3, which function to reverse the chromatin unwinding produced by the coactivator-recruited histone acetylases [65]. Similar to how the coactivators use the LXXLL motif as a docking point, the corepressors contain an LXXIIXXXL peptide referred to as the corepressor nuclear receptor (CoRNR) box [66]. The precise nature of the interaction between corepressors and NRs remained elusive until the crystal structure of PPARα in complex with a peptide from SMRT was solved. As mentioned briefly in a previous section, this structure shows that the CoRNR box occupies the same general site on PPARα as the coactivator LXXLL motif. However,
the CoRNR box is approximately one α-helical turn longer, and the AF2 helix on PPARα is pushed out of position and does not play a role in molecular recognition (Fig. 15.3-8(c)). There are several reports showing that NRs occupied by nonagonist ligands, such as ER with raloxifene and GR with RU486, show increased corepressor binding. These results suggest that these types of ligands not only disfavor coactivator binding but also create a surface on the NR that is favorable for corepressor binding.
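The two docking motifs discussed above, the coactivator NR box (LXXLL) and the corepressor CoRNR box (written in the text as LXXIIXXXL), can be located in a protein sequence with a simple pattern search. The sketch below is a minimal illustration under that assumption; the function name `scan_motifs` and the short test sequences are hypothetical, and a sequence match alone says nothing about whether the motif is helical, exposed, or functional.

```python
import re

# Motif patterns as written in the text; "." stands for any amino acid (X).
MOTIFS = {
    "NR box (LXXLL)": "L..LL",
    "CoRNR box (LXXIIXXXL)": "L..II...L",
}


def scan_motifs(protein):
    """Return (motif name, 1-based start position, matched peptide) tuples."""
    protein = protein.upper()
    found = []
    for name, pattern in MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are reported.
        for match in re.finditer("(?=(%s))" % pattern, protein):
            found.append((name, match.start() + 1, match.group(1)))
    return found


if __name__ == "__main__":
    # Toy peptides, invented for illustration only.
    print(scan_motifs("MKHKILHRLLQEGS"))
    print(scan_motifs("AALEDIIRKALMG"))
```

Because the CoRNR pattern is longer than the NR box pattern, this also reflects the observation above that the corepressor helix extends roughly one turn beyond the coactivator helix.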
15.3.5.3 Interference in NF-κB and AP-1 Pathways
In addition to interacting with coactivator and corepressor proteins, NRs have been shown to associate with a variety of other proteins that are key to cellular maintenance and function. It has been well documented that several NRs, predominantly the steroid receptors and also PPARs, RXR, and RAR, have the ability to crosstalk with signaling pathways involving the transcription factors NF-κB and AP-1 [67, 68]. Activated NRs typically repress the ability of NF-κB and/or AP-1 to transcribe their target genes. This interference is believed to be the basis for the anti-inflammatory actions of corticosteroids and estrogens [69, 70]. Several mechanisms have been proposed for these activities, but a conclusive molecular basis remains elusive. One proposal suggests a direct interaction between the NR and NF-κB [71, 72]. Since both NRs and NF-κB require the aid of coactivator proteins, such as SRC1 and CBP, another proposed mechanism involves a “cofactor squelching” event. A third proposal, concerning GR, involves a direct association between the NR and protein kinase A, whereby cross-coupling of NF-κB and GR occurs in the cytoplasm [73]. Clearly, these studies show that NRs play a complex and integrated role in pathway management beyond the direct DNA-mediated regulation of gene transcription.
15.3.5.4 Nonnuclear Functions and Interactions with Other Cellular Proteins
Another level of complexity in NR function, apart from the vast network of coactivator, corepressor, and NF-κB/AP-1 interactions, involves interaction with a wide variety of cellular proteins. In general, these activities are commonly referred to as nongenomic actions [74]. Full coverage of this arena is beyond the scope of this review, but a few selected examples are highlighted to demonstrate the breadth of effects that liganded NRs have on adjacent pathways. For example, PR and other steroid receptors have been shown to interact with numerous cytoplasmic kinases, such as c-Src tyrosine kinases, in a ligand-dependent manner [75, 76]. GR has been shown to interact with a variety of cellular factors such as SMAD3 [77] and JNK [78]. ER has been shown to interact with a variety of factors, such as phosphatidylinositol-3-OH kinase (PI3K) [79]. Additionally, NRs are phosphorylation targets, primarily within the AF1 domain, and it has been shown that NR activities can be modulated by phosphorylation state [80-83].
15.3.6 Specific Examples of Recent NR Drugs and Novel Drug Candidates
As mentioned earlier in the text, the NRs have a rather illustrious history in pharmaceutical discovery (Table 15.3-1). Once a synthetic ligand has been identified for a receptor, typically via screening and/or structure-guided design efforts, the goal is to chemically alter the properties of the ligand to appropriately modulate the activities of the receptor. Throughout the last decade or so, ligands that display differential activities relative to the natural ligand have been commonly referred to as selective nuclear receptor modulators (SNuRMs). One of the original demonstrations of this concept involved ER and the two classic selective estrogen receptor modulators (SERMs), OHT and raloxifene. Essentially, it was found that these SERMs retained tissue-selective agonist activity (such as in bone tissue and on lipid profile for raloxifene), but functioned as antagonists in reproductive tissues [84, 85]. Furthermore, even though both molecules were originally considered "antiestrogens", OHT generally shows a trend toward estradiol-like activity in uterine tissue [85], whereas raloxifene does not. The groundbreaking work around novel ER ligands has opened the gates to finding novel, tissue-selective synthetic modulators for several of the therapeutically relevant NRs. In this section we highlight a few of the more recent pursuits of SNuRMs (Fig. 15.3-9). The purpose of this brief discussion is to give an overview of the current state of the art in ligand and drug discovery by mentioning a few relatively recent specific examples. Overall, the present mission in NR drug discovery is to manipulate the receptor with ligand to retain tissue-selective benefits while minimizing the unwanted activities (Table 15.3-2). These few selected examples cover the basic principles of NR drug discovery - such as identifying small-molecule binders and modifying hits for NR modulation - and the use of recent techniques and methodologies.
15.3.6.1 Selective ER Modulators (SERMs)
First reported in the 1970s, tamoxifen was the first synthetic NR small molecule to show differential tissue effects. The primary reason it has not been widely used to treat menopausal symptoms is that this molecule shows stimulatory effects on the uterus, which cause a significant risk of endometrial cancer [86]. However, tamoxifen remains a first-line treatment for ER-positive breast cancer. A second-generation SERM, raloxifene, was originally developed as a tamoxifen follow-up for breast cancer, but it was demonstrated that this molecule has significant osteoporosis-protective effects without the endometrial activity seen with tamoxifen [87]. The molecular basis for these ER-modulating activities has been the focal point for a wide body of pharmacological research [88]. One proposed mechanism is that SERM-bound ER differentially promotes corepressor association versus coactivator association [89, 90].
Table 15.3-2 Examples of therapeutic profiles for designer, tissue-selective nuclear receptor modulator ligands
Receptor | Desired efficacy with therapeutic modulator compound | Unwanted activity to be reduced with desired modulator
Estrogen receptor | Reduce menopausal hot flashes; prevent postmenopausal osteoporosis | Breast and uterine tissue stimulation
Glucocorticoid receptor | Reduce inflammatory conditions; suppress immune system for transplant | Fat redistribution and weight gain; increased bone loss; diabetes; depression/mood effects
Mineralocorticoid receptor | Reduce hypertension; protection against congestive heart failure | Hyperkalemia
Progesterone receptor | Reduce endometriosis | Abortive activities
Androgen receptor | Protection against skeletal muscle atrophy | Prostate stimulation
PPARα | Improve dyslipidemia | Peroxisome proliferation
PPARδ | Improve dyslipidemia | Unknown
PPARγ | Glucose lowering | Edema and weight gain
Liver X receptor (LXRα or LXRβ) | Reduce atherosclerosis; anti-inflammatory; antidiabetic | Hypertriglyceridemia
Farnesoid X receptor | Protection against cholestasis | Unknown
Building on the theory that ligands can induce specific ER conformations, a series of triphenylethylene ligands for ER were made and screened through a uterine Ishikawa cellular assay [91]. Compounds showing the ability to reduce estrogen-induced stimulation of Ishikawa cells were then tested in ovariectomized rats for the ability to protect against loss of bone mineral density. The molecule GW5638 was identified using this approach (Fig. 15.3-9); it was further shown that the compound had antagonist properties on the uterus and agonist activities on the bone and the cardiovascular system [92]. A further study has shown that the unique biological properties of GW5638 are derived from the unique structural conformation of ER when bound to GW5638 relative to other SERMs [93]. In addition to this one example, a number of novel SERMs have been identified using a combination of cellular screens, primarily uterine cell- and breast cell-based assays [94, 95]. These SERMs include idoxifene, lasofoxifene, Wyeth 424, levormeloxifene, and others (Fig. 15.3-9). Two new approaches to ER ligand discovery have recently been reported. One involves the use of NF-κB-driven reporter assays to discover pathway
selective ligands with the potential to treat inflammatory disorders [96].
Another relatively recent focus for ER-directed drug discovery relates to the fact that there are two subtypes of this receptor, ERα and ERβ, which derive from two separate genes [32, 97]. Stimulated by the distinct tissue distribution patterns of these two related receptors, the concept is that new indications, such as inflammation and cancer, can be treated with an ER subtype-selective molecule. Toward this goal, several reports have demonstrated that it is possible to identify ERβ-selective ligands [37, 98-100].
15.3.6.2 Selective GR Modulators (SGRMs)
A variety of debilitating diseases, such as rheumatoid arthritis, inflammatory myopathies, cancers, and a variety of immunological diseases, are treated with the classic synthetic glucocorticoids dexamethasone and prednisone. However, long-term treatment with these drugs often leads to serious side effects such as fat redistribution, diabetes, vascular necrosis, and osteoporosis. There is currently an intense effort to identify new small molecules that are able to differentially modulate GR to retain the beneficial effects of glucocorticoids and reduce the incidence of unwanted side effects [10]. A key genetic study, utilizing a knock-in of a dimerization-deficient mutant of GR, has shed light on the molecular basis for dissociative activity [101]. In essence, this GRdim mutant demonstrated that some of the direct gene transduction properties of GR can be reduced while other immune-modulating functions of the receptor are retained. This concept forms one of the principles of selective modulation of GR. Importantly, many of the anti-inflammatory effects of GR are believed to be driven by the ability of the monomeric form of the receptor to interfere with NF-κB and AP-1 function, which ultimately results in reduction of proinflammatory cytokines such as interleukins (IL)-1, -2, -6, and -8 and tumor necrosis factor α (TNFα) [69]. There have been several recent reports of ligands that display differential GR activation. Although a complete survey is beyond the scope of this review, we select a few examples to demonstrate the concept and the methods used to discover the ligands. Typically, three measures of GR activity were used to identify these ligands: (a) direct GR binding relative to other steroid receptors, (b) a cell-based assay measuring GRE-mediated gene transcription (referred to as transactivation), and (c) cell-based assays measuring the ability of GR to regulate NF-κB- and AP-1-driven genes (referred to as transrepression).
Several steroid-based compounds have been shown to differentially reduce transactivation with only minimal effects on transrepression (see Fig. 15.3-9) [102, 103]. In the nonsteroidal class of GR ligands, a quinoline-based series of compounds, particularly ones with an aryl substituent at the C5 position, yielded a trend toward a preferred transactivation/transrepression profile in cellular assays. Some of these ligands also showed a more promising therapeutic window for selective in vivo effects [104, 105]. In another study, a nonsteroidal GR ligand, ZK 216348, has been reported
to show significant dissociation of transactivation and transrepression activities [106]. Following a GR-binding assay to identify high-affinity binding compounds, hits were characterized using (a) an assay measuring a GRE-driven reporter (induction of tyrosine aminotransferase), (b) an assay monitoring reduction of lipopolysaccharide (LPS)-induced IL-8 production from THP-1 monocyte/macrophage cells, and (c) an assay measuring inhibition of TNFα and IL-12 p70 from LPS-induced peripheral blood mononuclear cells. This linear approach highlighted ZK 216348 as a dissociative molecule. Further in vivo work, using an ear inflammation model for efficacy and models of skin atrophy, weight gain, adrenal weight, and blood glucose levels for unwanted side effects, showed an improved therapeutic profile relative to prednisone.
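One simple way to summarize the transactivation/transrepression comparison described above is a potency ratio between the two cellular assays. The sketch below is an assumption-laden illustration rather than the analysis used in the cited studies: the `GRProfile` class, the `dissociation_index` function, and all numerical values are hypothetical, and a potency ratio ignores differences in maximal efficacy, which also matter for dissociated ligands.

```python
import math
from dataclasses import dataclass


@dataclass
class GRProfile:
    """Hypothetical potency summary for one GR ligand (values in nM)."""
    name: str
    transactivation_ec50_nM: float   # e.g. GRE-driven reporter assay
    transrepression_ic50_nM: float   # e.g. cytokine-suppression assay


def dissociation_index(profile):
    """Return log10(EC50 transactivation / IC50 transrepression).

    Positive values mean the ligand is more potent at transrepression than
    at transactivation; zero means the two potencies are equal.
    """
    if profile.transactivation_ec50_nM <= 0 or profile.transrepression_ic50_nM <= 0:
        raise ValueError("potencies must be positive")
    return math.log10(
        profile.transactivation_ec50_nM / profile.transrepression_ic50_nM
    )


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    ligands = [
        GRProfile("hypothetical balanced ligand", 10.0, 10.0),
        GRProfile("hypothetical dissociated ligand", 1000.0, 10.0),
    ]
    for ligand in ligands:
        print(ligand.name, round(dissociation_index(ligand), 2))
```

An index of this type is only a first-pass ranking tool; as the text notes, in vivo efficacy and side-effect models are still needed to establish a therapeutic window.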
15.3.6.3 Other Modulator Efforts: PR, MR, AR, PPAR, FXR, LXR
The concept of selective NR modulation to produce an activity and therapeutic profile distinct from the natural ligand has been applied to numerous other receptors (Fig. 15.3-9). For example, a modified steroid ligand for PR called asoprisnil has been shown to produce antiuterotrophic effects with only minimal antiabortive and breakthrough bleeding effects [107]. A selective MR modulator called eplerenone, a molecule that was discovered decades ago, has recently been approved as a drug for hypertension [108]. This synthetic steroid has improved specificity for MR over related receptors and functions as a partial antagonist of aldosterone [109]. Currently, there is an effort to identify a modulator of AR for use in prostate cancer as well as possibly for treating the neurological and muscular degenerative symptoms of androgen deficiency [110-112]. One recent example of a tissue-selective AR modulator is LGD2226, which appears to retain some anabolic effects on bone and muscle with reduced proliferative effects on the prostate [113]. Several groups have shown progress in developing selective peroxisome proliferator-activated receptor gamma modulators (SPPARMs). The first-generation TZD class of PPARγ agonists, used pharmacologically as insulin sensitizers, also exhibit dose-limiting liabilities such as hemodilution and edema (see Table 15.3-2). Initial studies of PPARγ activation by TZDs revealed that these compounds activated the receptor via a direct interaction with the C-terminal AF2 helix [40]. Structural studies have also revealed PPARγ activators that bind the LBD using non-TZD epitopes, such as the partial agonist GW0072 [39]. Compounds that have distinct binding and/or activation modes represent a potential avenue to discover PPARγ modulators with modified biological activities.
Non-TZD selective PPARγ modulator (e.g., [45] nTZDpa) compounds have been found that induce an altered LBD conformation compared to TZDs, as measured by protease protection and NMR spectroscopy [114]. Like GW0072, these compounds function as partial agonists and could antagonize the activity of PPARγ full agonists in 3T3-L1 adipogenesis assays. Moreover, the nTZDpa compounds demonstrated qualitative differences versus traditional agonists on gene expression in cell
culture (3T3-L1 adipocytes) and in vivo (white adipose tissue) and also on in vivo physiological responses such as adipose depot size. Thus, further efforts to develop SPPARMs may lead to compounds with improved characteristics relative to existing clinical compounds.
Modulator efforts have also begun for NRs that to date have only been investigated preclinically. In studies of both FXR and the LXRs [115-117], compounds with potentially novel biological activity relative to natural ligands are being identified. For example, LXRα/β are regulated in vivo by oxysterols, and this regulation is consistent with the role of the LXRs in cholesterol homeostasis. Animal models using nonselective LXR tool compounds indicate that, in addition to conferring atheroprotective effects, these agonists also promote lipogenesis and triglyceride accumulation in liver. Miao et al. reported that two LXR agonists (T0901317 and GW3965) show differential effects on cofactor recruitment in human hepatoma cell assays. Additionally, these two compounds differ in their in vivo effects on hepatic lipogenesis genes. These studies point toward the promise of developing LXR modulator compounds that possess antiatherogenic activities with limited hepatic liability. Whether the difference between these compounds reflects tissue versus gene selectivity remains to be elucidated. For both the steroid receptor and nonsteroid receptor modulators, more work is needed to better understand the underlying basis of modulator effects. Taken together, these examples highlight the complexity that must be addressed on several levels, such as achieving high-affinity binding to the receptor, inducing conformational change or altered structural dynamics, selecting an appropriate cellular assay for measuring NR modulation, and using relevant in vivo models for measuring the therapeutic index.
Because of the structural and functional similarities within the NR superfamily, lessons learned from one receptor concerning modulation by a designer small molecule can probably be applied to other members of the family [3, 118, 119]. Overall, with increasing knowledge of NR functions, the promise is high that novel, safer, and more effective medicines will be the eventual outcome. Important in this pursuit is the use of new technologies to profile ligands; this is the topic of the final section below.
15.3.7 New Approaches to NR Drug Discovery
One of the more recent principles in the field of NR research and drug discovery is the realization that a subset of the myriad functions of NRs can be selectively manipulated by ligand, a general concept referred to as NR modulation. New technologies, including advanced computational methods, are inspiring new strategies for discovering novel NR-modulating drug candidates. Importantly, new technologies allow profiling of NR ligands at greater speed and in a more physiologically relevant context. Several new approaches to NR
modulator discovery are illustrated in this section, drawing on recent work on ERα/ERβ to provide specific examples. As discussed briefly in the previous sections, NRs do not act in isolation, but in complex associations with other cellular factors. Cofactor interaction screening exploits the relationship between NR structure and functional activity. If a particular ligand uniquely alters the pattern of cofactor interaction relative to other ligands, there is a likelihood that the differential in vitro profile will translate into a unique gene expression pattern or physiological outcome in vivo. Peptides representing these interactions can be synthesized on the basis of known interaction motifs or isolated by screening random peptide libraries. In ER modulator discovery, this method has been used to characterize known SERMs and to discover ER ligands with unique properties [120]. Norris et al. applied affinity selection of peptides to identify binding surfaces that are exposed on ERα/β when complexed with different ligands, such as estradiol or 4-OH tamoxifen. They found that the established SERMs, known to produce distinct biological effects, induced distinct conformational changes in the receptors. The ability of the peptides to discriminate between different ERα/β ligand complexes has enabled the development of screens to detect subtle differences between ER ligands. Ligand screens have been developed on the basis of NR-peptide interactions using a high-throughput multiplexed technology that utilizes fluorescently encoded microspheres [121, 122]. Purified NR LBDs can be used in these screens. The repertoire of novel NR-interacting cofactors has expanded dramatically in the past few years. To rapidly identify novel interactors, genome-wide screens for binding partners have been carried out in yeast- and mammalian-based two-hybrid systems. As mentioned above, over 200 human NR cofactors have been identified.
These interactors are important in the era of NR modulator discovery since each new cofactor carries the potential to recapitulate a particular cellular interaction and thus provides the basis for a molecular screen for molecules that uniquely affect the interaction. Since NRs are transcription factors, monitoring ligand effects on NR target genes is a powerful approach to NR drug discovery. Until recently, the difficulties and expense involved in measuring endogenous gene expression limited the use of this approach as a drug-screening method. Microarray technology has made it possible to assess endogenous gene expression on a genome-wide scale, and this technology has been used to define an unbiased set of NR target genes. For example, multiple groups have utilized microarray technology to differentiate the functions of ERα and ERβ in estrogen target organs such as the bone, breast, and uterus. In one specific set of experiments, human U2OS osteosarcoma cells (which express neither ERα nor ERβ) were stably transfected with human ERα or ERβ to selectively overexpress the receptors in this bone model system [123]. Treatment of the two cell lines with 17β-estradiol resulted in two overlapping but distinct patterns of gene expression. Interestingly, 28% of the estradiol-regulated genes were ERα cell specific while 11% were ERβ cell specific. Not only did this work allow the functional
dissection of the pathways regulated by two functionally similar receptors but
it has also identified unique sets of endogenous target genes for use in ligand screening assays. Using a system similar to that described above (U2OS cells expressing either ERα or ERβ), Tee and colleagues [124] evaluated the effects of different ER ligands (including the SERMs raloxifene and tamoxifen) on ERα and ERβ target genes. Microarray analysis showed that raloxifene and tamoxifen regulated only 27% of the same genes in both the ERα- and ERβ-containing cells. These results indicate that estrogens and SERMs exert tissue-specific effects by regulating unique sets of target genes through ERα/β. Thus, these specific genes serve as unique identifiers of compound action, and a subset is especially useful in discriminating ER ligands. Higher throughput methods to analyze gene expression hold the promise of screening large numbers of compounds in a cellular environment using a cost-effective technology. For example, with advances in glass slide preparations for monitoring transcriptional changes of thousands of genes, a hit from a multiwell cell treatment can be inexpensively assessed over a genome-wide range of genes. With such an analysis, it is possible to observe distinctions between even very closely related chemotypes. A recent study has used gene expression profiling to characterize breast cancer cells and to identify desired “molecular fingerprints” within the data [125]. Key “biomarkers” can be identified, which provide information linked to the phenotypic effect of a compound. With such a screen, knowledge of the target of the compounds (e.g., whether a compound has antiestrogen effects) is not an a priori requirement. One challenge in this type of approach is that vast amounts of data are generated and bioinformatics analysis becomes a limiting factor. Current advances in gene expression profiling as a drug-screening method must go hand in hand with advances in bioinformatics and data handling.
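The receptor-specific and shared gene sets described above are, at bottom, set comparisons between lists of regulated genes. The following sketch shows one way such percentages could be computed; the function name `overlap_summary` and the gene labels are hypothetical, and expressing each fraction relative to the union of both lists is an assumption, since the text does not state which denominator the cited studies used.

```python
def overlap_summary(genes_a, genes_b):
    """Summarize how two lists of regulated genes overlap.

    Percentages are expressed relative to the union of both lists
    (an assumption made for this illustration).
    """
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        raise ValueError("both gene lists are empty")
    n = len(union)
    return {
        "a_only_pct": 100.0 * len(a - b) / n,
        "b_only_pct": 100.0 * len(b - a) / n,
        "shared_pct": 100.0 * len(a & b) / n,
    }


if __name__ == "__main__":
    # Placeholder gene labels, invented for illustration only.
    regulated_in_era_cells = ["g1", "g2", "g3", "g4"]
    regulated_in_erb_cells = ["g3", "g4", "g5"]
    print(overlap_summary(regulated_in_era_cells, regulated_in_erb_cells))
```

The same comparison applies when the two lists come from two ligands in one cell line rather than one ligand in two cell lines, which is how the raloxifene versus tamoxifen result above would be framed.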
Changes in the steady-state levels of mRNA do not tell the whole story. Research groups are now integrating the data obtained from mRNA steady-state level analysis with proteomic data. Huber et al. analyzed differences between the gene and protein expression patterns of the human breast carcinoma cell line T47D and its derivative T47D-r, which is resistant to the pure antiestrogen ZM 182780 [126]. Microarray analysis was carried out in parallel with a proteomics analysis in which the total cellular protein content of T47D or T47D-r was separated on two-dimensional gels. Thirty-eight proteins were found to be reproducibly up- or downregulated more than twofold in T47D-r versus T47D in the proteomics analysis. Comparison with differential mRNA analysis revealed that 19 of these were up- or downregulated in parallel with the corresponding mRNA molecules. For 11 proteins, the corresponding mRNA was not found to be differentially expressed, and for 8 proteins an inverse regulation was found at the mRNA level. A general conclusion from such studies is that, although the patterns of expression in the two data sets are similar, the discordant trends emphasize the importance of posttranslational mechanisms in cellular development. These types of changes can only be
observed through integration of the proteomic and transcriptomic approaches. New higher-throughput methods for analyzing proteome variation are making this type of analysis more practical.

The above examples illustrate how NR target genes have been discovered through physical experimentation. In silico approaches that increase the speed of NR drug discovery are also being developed. For example, comprehensive computational approaches can now be used to identify NR target genes. NUBIScan is a computer algorithm for predicting NR target sequences in the regulatory regions of genes [127]. This approach is being combined with other methods to quickly validate the target genes predicted in silico. High-throughput, genome-wide chromatin immunoprecipitation methods have been combined with computational methods to identify ER target genes and promoter sequences [128]. Genes identified by computational analysis are not biased by target tissue or expression levels, and thus complement microarray approaches. In summary, NR drug discovery is moving toward profiling compounds in a high-throughput fashion, either in a setting close to the native physiological environment or in an in vitro environment that includes a physiologically comprehensive array of functional partners.
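The kind of in silico response-element search mentioned above can be illustrated with a deliberately simplified sketch. The Python code below scans a DNA sequence for direct repeats of the commonly cited NR half-site AGGTCA separated by a fixed spacer (a "DR-n" element), tolerating a small number of mismatches. It is not the NUBIScan algorithm of [127]; the half-site consensus, the function names, the mismatch threshold, and the example sequence are all assumptions chosen for illustration.

```python
HALF_SITE = "AGGTCA"  # commonly cited consensus NR half-site (assumption, not from the chapter)


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_direct_repeats(seq, spacer, max_mismatch=1):
    """Return (start, matched_text) for each DR-<spacer> element found in seq.

    A hit is two half-sites in the same orientation separated by `spacer`
    bases, with at most `max_mismatch` total mismatches to the consensus.
    """
    seq = seq.upper()
    n = len(HALF_SITE)
    span = 2 * n + spacer
    hits = []
    for i in range(len(seq) - span + 1):
        first = seq[i:i + n]
        second = seq[i + n + spacer:i + span]
        total = mismatches(first, HALF_SITE) + mismatches(second, HALF_SITE)
        if total <= max_mismatch:
            hits.append((i, seq[i:i + span]))
    return hits


# Hypothetical promoter fragment containing one perfect DR-4 element.
promoter = "ttgcAGGTCActtcAGGTCAggat"
for start, site in find_direct_repeats(promoter, spacer=4):
    print(start, site)
```

A practical predictor would typically score both strands with position weight matrices rather than count mismatches against a single consensus, and its predictions would still need experimental validation, for example by chromatin immunoprecipitation as described in [128].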
15.3.8 Future Developments and Conclusions for NR Chemical Biology
The human NRs as a structural class are essential for life and survival, and they play an integral role in many critical physiological processes such as metabolism, homeostasis, differentiation, growth and development, aging, and reproduction. This family of receptors has a common evolutionary history, as evidenced by their sequence relationship and their commonality in cellular function [129]. The myriad functions of NRs are vastly complex, and the pathways they control are intertwined with each other as well as with numerous accessory proteins and functional partners. Even with this inherent complexity, as reviewed briefly above, this family of receptors has had a long and fruitful history in drug discovery. With the advent of high-throughput chemistries, structural biology, novel biochemical methods, and pathway analysis technologies, such as differential gene expression and proteomics, there will undoubtedly be new discoveries leading to drugs with improved therapeutic profiles. These NR modulator efforts should help to better define the ligand-induced activities that produce tissue-selective beneficial effects and to minimize unwanted activities. In addition, there are likely to be advances toward ligand discovery for the remaining orphan receptors. Studies using these tool compounds should lead to target validation and better definition of therapeutic relevance for the remaining orphan NRs. Overall, the future of targeting the NR superfamily with novel synthetic ligands holds
tremendous potential and should lead to a variety of safer, more effective medicines for treatment of a plethora of human diseases.
Acknowledgment
We would like to thank Tim Willson for critically reading this review. We also thank Lakshman Ramamurthy for his kind contribution of the NR superfamily phylogeny plot. Finally, we would like to thank our many GlaxoSmithKline colleagues for helpful discussions and collaborations on NR-related projects.

References

1. Nuclear Receptors Nomenclature Committee, A unified nomenclature system for the nuclear receptor superfamily, Cell 1999, 97, 161-163.
2. O. Wrange, J.A. Gustafsson, Separation of the hormone- and DNA-binding sites of the hepatic glucocorticoid receptor by means of proteolysis, J. Biol. Chem. 1978, 253, 856-865.
3. J.M. Wurtz, W. Bourguet, J.P. Renaud, V. Vivat, P. Chambon, D. Moras, H. Gronemeyer, A canonical structure for the ligand-binding domain of nuclear receptors, Nat. Struct. Biol. 1996, 3, 87-94.
4. S. Khorasanizadeh, F. Rastinejad, Nuclear-receptor interactions on DNA-response elements, Trends Biochem. Sci. 2001, 26, 384-390.
5. W. Bourguet, M. Ruff, P. Chambon, H. Gronemeyer, D. Moras, Crystal structure of the ligand-binding domain of the human nuclear receptor RXR-alpha [see comment], Nature 1995, 375, 377-382.
6. S.P. Williams, P.B. Sigler, Atomic structure of progesterone complexed with its receptor, Nature 1998, 393, 392-396.
7. B.F. Luisi, W.X. Xu, Z. Otwinowski, L.P. Freedman, K.R. Yamamoto, P.B. Sigler, Crystallographic analysis of the interaction of the glucocorticoid receptor with DNA, Nature 1991, 352, 497-505.
8. M.I. Diamond, J.N. Miner, S.K. Yoshinaga, K.R. Yamamoto, Transcription factor interactions: selectors of positive or negative regulation from a single DNA element, Science 1990, 249, 1266-1272.
9. R.F. Witzmann, Steroids, Keys to Life, 1981.
10. M.J. Coghlan, S.W. Elmore, P.R. Kym, M.E. Kort, The pursuit of differentiated ligands for the glucocorticoid receptor, Curr. Top. Med. Chem. 2003, 3, 1617-1635.
11. R.B. Woodward, F. Sondheimer, D. Taub, The total synthesis of cortisone, J. Am. Chem. Soc. 1951, 73, 4057.
12. L.H. Sarett, G.E. Arth, R.M. Lukes, R.E. Beyler, G.I. Poos, W.F. Johns, J.M. Constantin, Stereospecific total synthesis of cortisone, J. Am. Chem. Soc. 1952, 74, 4974-4976.
13. V.C. Jordan, Tamoxifen: a most unlikely pioneering medicine [see comment], Nat. Rev. Drug Discov. 2003, 2, 205-213.
14. R. Miesfeld, S. Okret, A.C. Wikstrom, O. Wrange, J.A. Gustafsson, K.R. Yamamoto, Characterization of a steroid hormone receptor gene and mRNA in wild-type and mutant cells, Nature 1984, 312, 779-781.
15. M.V. Govindan, M. Devic, S. Green, H. Gronemeyer, P. Chambon, Cloning of the human glucocorticoid receptor cDNA, Nucleic Acids Res. 1985, 13, 8293-8304.
16. S.M. Hollenberg, C. Weinberger, E.S. Ong, G. Cerelli, A. Oro, R. Lebo, E.B. Thompson, M.G. Rosenfeld, R.M. Evans, Primary structure and expression of a functional human glucocorticoid receptor cDNA, Nature 1985, 318, 635-641.
17. S. Green, P. Walter, G. Greene, A. Krust, C. Goffin, E. Jensen, G. Scrace, M. Waterfield, P. Chambon, Cloning of the human oestrogen receptor cDNA, J. Steroid Biochem. 1986, 24, 77-83.
18. P. Walter, S. Green, G. Greene, A. Krust, J.M. Bornert, J.M. Jeltsch, A. Staub, E. Jensen, G. Scrace, M. Waterfield, P. Chambon, Cloning of the human estrogen receptor cDNA, Proc. Natl. Acad. Sci. U.S.A. 1985, 82, 7889-7893.
19. G.L. Greene, P. Gilna, M. Waterfield, A. Baker, Y. Hort, J. Shine, Sequence and expression of human estrogen receptor complementary DNA, Science 1986, 231, 1150-1154.
20. D.J. Mangelsdorf, R.M. Evans, The RXR heterodimers and orphan receptors, Cell 1995, 83, 841-850.
21. B. Blumberg, R.M. Evans, Orphan nuclear receptors - new ligands and new possibilities, Genes Dev. 1998, 12, 3149-3155.
22. V. Giguere, Orphan nuclear receptors: from gene to function, Endocr. Rev. 1999, 20, 689-725.
23. S.A. Kliewer, J.T. Moore, L. Wade, J.L. Staudinger, M.A. Watson, S.A. Jones, D.D. McKee, B.B. Oliver, T.M. Willson, R.H. Zetterstrom, T. Perlmann, J.M. Lehmann, An orphan nuclear receptor activated by pregnanes defines a novel steroid signaling pathway, Cell 1998, 92, 73-82.
24. M. Kobayashi, S. Takezawa, K. Hara, R.T. Yu, Y. Umesono, K. Agata, M. Taniwaki, K. Yasuda, K. Umesono, Identification of a photoreceptor cell-specific nuclear receptor, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 4814-4819.
25. J.M. Maglich, A.E. Sluder, T.M. Willson, J.T. Moore, Beyond the human genome: examples of nuclear receptor analysis in model organisms and potential for drug discovery, Am. J. Pharmacogenomics 2003, 3, 345-353.
26. M. Robinson-Rechavi, A.S. Carpentier, M. Duffraisse, V. Laudet, How many nuclear hormone receptors are there in the human genome? Trends Genet. 2001, 17, 554-556.
27. J.M. Lehmann, L.B. Moore, T.A. Smith-Oliver, W.O. Wilkison, T.M. Willson, S.A. Kliewer, An antidiabetic thiazolidinedione is a high affinity ligand for peroxisome proliferator-activated receptor gamma (PPAR gamma), J. Biol. Chem. 1995, 270, 12953-12956.
28. D.J. Parks, S.G. Blanchard, R.K. Bledsoe, G. Chandra, T.G. Consler, S.A. Kliewer, J.B. Stimmel, T.M. Willson, A.M. Zavacki, D.D. Moore, J.M. Lehmann, Bile acids: natural ligands for an orphan nuclear receptor [see comment], Science 1999, 284, 1365-1368.
29. R.K. Bledsoe, V.G. Montana, T.B. Stanley, C.J. Delves, C.J. Apolito, D.D. McKee, T.G. Consler, D.J. Parks, E.L. Stewart, T.M. Willson, M.H. Lambert, J.T. Moore, K.H. Pearce, H.E. Xu, Crystal structure of the glucocorticoid receptor ligand binding domain reveals a novel mode of receptor dimerization and coactivator recognition, Cell 2002, 110, 93-105.
30. R.K. Bledsoe, K.P. Madauss, J.A. Holt, C.J. Apolito, M.H. Lambert, K.H. Pearce, T.B. Stanley, E.L. Stewart, R.P. Trump, T.M. Willson, S.P. Williams, A ligand-mediated hydrogen bond network required for the activation of the mineralocorticoid receptor, J. Biol. Chem. 2005, 280, 31283-31293.
31. P.M. Matias, P. Donner, R. Coelho, M. Thomaz, C. Peixoto, S. Macedo, N. Otto, S. Joschko, P. Scholz, A. Wegg, S. Basler, M. Schafer, U. Egner, M.A. Carrondo, Structural evidence for ligand specificity in the binding domain of the human androgen receptor. Implications for pathogenic gene mutations, J. Biol. Chem. 2000, 275, 26164-26171.
32. G.G. Kuiper, E. Enmark, M. Pelto-Huikko, S. Nilsson, J.A. Gustafsson, Cloning of a novel receptor expressed in rat prostate and ovary, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 5925-5930.
33. A.C. Pike, A.M. Brzozowski, R.E. Hubbard, A structural biologist's view of the oestrogen receptor, J. Steroid Biochem. Mol. Biol. 2000, 74, 261-268.
34. A.K. Shiau, D. Barstad, P.M. Loria, L. Cheng, P.J. Kushner, D.A. Agard, G.L. Greene, The structural basis of estrogen receptor/coactivator recognition and the antagonism of this interaction by tamoxifen, Cell 1998, 95, 927-937.
35. A.C.W. Pike, A.M. Brzozowski, R.E. Hubbard, T. Bonn, A.G. Thorsell, O. Engstrom, J. Ljunggren, J.A. Gustafsson, M. Carlquist, Structure of the ligand-binding domain of oestrogen receptor beta in the presence of a partial agonist and a full antagonist, EMBO J. 1999, 18, 4608-4618.
36. A.M. Brzozowski, A.C. Pike, Z. Dauter, R.E. Hubbard, T. Bonn, O. Engstrom, L. Ohman, G.L. Greene, J.A. Gustafsson, M. Carlquist, Molecular basis of agonism and antagonism in the oestrogen receptor, Nature 1997, 389, 753-758.
37. A.K. Shiau, D. Barstad, J.T. Radek, M.J. Meyers, K.W. Nettles, B.S. Katzenellenbogen, J.A. Katzenellenbogen, D.A. Agard, G.L. Greene, Structural characterization of a subtype-selective ligand reveals a novel mode of estrogen receptor antagonism, Nat. Struct. Biol. 2002, 9, 359-364.
38. A.C.W. Pike, A.M. Brzozowski, J. Walton, R.E. Hubbard, A.G. Thorsell, Y.L. Li, J.A. Gustafsson, M. Carlquist, Structural insights into the mode of action of a pure antiestrogen, Structure 2001, 9, 145-153.
39. J.L. Oberfield, J.L. Collins, C.P. Holmes, D.M. Goreham, J.P. Cooper, J.E. Cobb, J.M. Lenhard, E.A. Hull-Ryde, C.P. Mohr, S.G. Blanchard, D.J. Parks, L.B. Moore, J.M. Lehmann, K. Plunket, A.B. Miller, M.V. Milburn, S.A. Kliewer, T.M. Willson, A peroxisome proliferator-activated receptor gamma ligand inhibits adipocyte differentiation, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 6102-6106.
40. R.T. Nolte, G.B. Wisely, S. Westin, J.E. Cobb, M.H. Lambert, R. Kurokawa, M.G. Rosenfeld, T.M. Willson, C.K. Glass, M.V. Milburn, Ligand binding and co-activator assembly of the peroxisome proliferator-activated receptor-gamma, Nature 1998, 395, 137-143.
41. S. Svensson, T. Ostberg, M. Jacobsson, C. Norstrom, K. Stefansson, D. Hallen, I.C. Johansson, K. Zachrisson, D. Ogg, L. Jendeberg, Crystal structure of the heterodimeric complex of LXRalpha and RXRbeta ligand-binding domains in a fully agonistic conformation, EMBO J. 2003, 22, 4625-4633.
42. S. Williams, R.K. Bledsoe, J.L. Collins, S. Boggs, M.H. Lambert, A.B. Miller, J. Moore, D.D. McKee, L. Moore, J. Nichols, D. Parks, M. Watson, B. Wisely, T.M. Willson, X-ray crystal structure of the liver X receptor beta ligand binding domain: regulation by a histidine-tryptophan switch, J. Biol. Chem. 2003, 278, 27138-27143.
43. M. Farnegardh, T. Bonn, S. Sun, J. Ljunggren, H. Ahola, A. Wilhelmsson, J.A. Gustafsson, M. Carlquist, The three-dimensional structure of the liver X receptor beta reveals a flexible ligand-binding pocket that can accommodate fundamentally different ligands, J. Biol. Chem. 2003, 278, 38821-38828.
44. G.B. Wisely, A.B. Miller, R.G. Davis, A.D. Thornquest Jr, R. Johnson, T. Spitzer, A. Sefler, B. Shearer, J.T. Moore, T.M. Willson, S.P. Williams, Hepatocyte nuclear factor 4 is a transcription factor that constitutively binds fatty acids, Structure 2002, 10, 1225-1234.
45. Z. Wang, G. Benoit, J. Liu, S. Prasad, P. Aarnisalo, X. Liu, H. Xu, N.P. Walker, T. Perlmann, Structure and function of Nurr1 identifies a class of ligand-independent nuclear receptors, Nature 2003, 423, 555-560.
46. S. Dhe-Paganon, K. Duda, M. Iwamoto, Y.I. Chi, S.E. Shoelson, Crystal structure of the HNF4 alpha ligand binding domain in complex with endogenous fatty acid ligand, J. Biol. Chem. 2002, 277, 37973-37976.
47. K.D. Baker, L.M. Shewchuk, T. Kozlova, M. Makishima, A. Hassell, B. Wisely, J.A. Caravella, M.H. Lambert, J.L. Reinking, H. Krause, C.S. Thummel, T.M. Willson, D.J. Mangelsdorf, The Drosophila orphan nuclear receptor DHR38 mediates an atypical ecdysteroid signaling pathway, Cell 2003, 113, 731-742.
48. D.P. McDonnell, D.L. Clemm, T. Hermann, M.E. Goldman, J.W. Pike, Analysis of estrogen receptor function in vitro reveals three distinct classes of antiestrogens, Mol. Endocrinol. 1995, 9, 659-669.
49. H.E. Xu, T.B. Stanley, V.G. Montana, M.H. Lambert, B.G. Shearer, J.E. Cobb, D.D. McKee, C.M. Galardi, K.D. Plunket, R.T. Nolte, D.J. Parks, J.T. Moore, S.A. Kliewer, T.M. Willson, J.B. Stimmel, Structural basis for antagonist-mediated recruitment of nuclear co-repressors by PPARalpha, Nature 2002, 415, 813-817.
50. B. Kauppi, C. Jakob, M. Farnegardh, J. Yang, H. Ahola, M. Alarcon, K. Calles, O. Engstrom, J. Harlan, S. Muchmore, A.K. Ramqvist, S. Thorell, L. Ohman, J. Greer, J.A. Gustafsson, J. Carlstedt-Duke, M. Carlquist, The three-dimensional structures of antagonistic and agonistic forms of the glucocorticoid receptor ligand-binding domain: RU-486 induces a transconformation that leads to active antagonism, J. Biol. Chem. 2003, 278, 22748-22754.
51. B.A. Johnson, E.M. Wilson, Y. Li, D.E. Moller, R.G. Smith, G. Zhou, Ligand-induced stabilization of PPARgamma monitored by NMR spectroscopy: implications for nuclear receptor activation, J. Mol. Biol. 2000, 298, 187-194.
52. M. Beato, J. Klug, Steroid hormone receptors: an update, Hum. Reprod. Update 2000, 6, 225-236.
53. H. Gronemeyer, J.A. Gustafsson, V. Laudet, Principles for modulation of the nuclear receptor superfamily, Nat. Rev. Drug Discov. 2004, 3, 950-964.
54. W.B. Pratt, The role of heat shock proteins in regulating the function, folding, and trafficking of the glucocorticoid receptor, J. Biol. Chem. 1993, 268, 21455-21458.
55. S.A. Onate, S.Y. Tsai, M.J. Tsai, B.W. O’Malley, Sequence and characterization of a coactivator for the steroid hormone receptor superfamily, Science 1995, 270, 1354-1357.
56. C.K. Glass, D.W. Rose, M.G. Rosenfeld, Nuclear receptor coactivators, Curr. Opin. Cell Biol. 1997, 9, 222-232.
57. J.W. Lee, Y.C. Lee, S.Y. Na, D.J. Jung, S.K. Lee, Transcriptional coregulators of the nuclear receptor superfamily: coactivators and corepressors, Cell. Mol. Life Sci. 2001, 58, 289-297.
58. N.J. McKenna, B.W. O’Malley, Minireview: nuclear receptor coactivators - an update [Review], Endocrinology 2002, 143, 2461-2465.
59. D.M. Heery, E. Kalkhoven, S. Hoare, M.G. Parker, A signature motif in transcriptional co-activators mediates binding to nuclear receptors [see comment], Nature 1997, 387, 733-736.
60. D.M. Heery, S. Hoare, S. Hussain, M.G. Parker, H. Sheppard, Core LXXLL motif sequences in CREB-binding protein, SRC1, and RIP140 define affinity and selectivity for steroid and retinoid receptors, J. Biol. Chem. 2001, 276, 6695-6702.
61. T.E. Spencer, G. Jenster, M.M. Burcin, C.D. Allis, J. Zhou, C.A. Mizzen, N.J. McKenna, S.A. Onate, S.Y. Tsai, M.J. Tsai, B.W. O’Malley, Steroid receptor coactivator-1 is a histone acetyltransferase, Nature 1997, 389, 194-198.
62. M.L. Privalsky, The role of corepressors in transcriptional regulation by nuclear hormone receptors, Annu. Rev. Physiol. 2004, 66, 315-360.
63. J.D. Chen, R.M. Evans, A transcriptional co-repressor that interacts with nuclear hormone receptors [see comment], Nature 1995, 377, 454-457.
64. A.J. Horlein, A.M. Naar, T. Heinzel, J. Torchia, B. Gloss, R. Kurokawa, A. Ryan, Y. Kamei, M. Soderstrom, C.K. Glass, M.G. Rosenfeld, Ligand-independent repression by the thyroid hormone receptor mediated by a nuclear receptor co-repressor [see comment], Nature 1995, 377, 397-404.
65. M.G. Guenther, O. Barak, M.A. Lazar, The SMRT and N-CoR corepressors are activating cofactors for histone deacetylase 3, Mol. Cell. Biol. 2001, 21, 6091-6101.
66. X. Hu, M.A. Lazar, The CoRNR motif controls the recruitment of corepressors by nuclear hormone receptors, Nature 1999, 402, 93-96.
67. M. Gottlicher, S. Heck, P. Herrlich, Transcriptional cross-talk, the second mode of steroid hormone receptor action [see comment], J. Mol. Med. 1998, 76, 480-489.
68. L.I. McKay, J.A. Cidlowski, Cross-talk between nuclear factor-kappa B and the steroid hormone receptors: mechanisms of mutual antagonism, Mol. Endocrinol. 1998, 12, 45-56.
69. L.I. McKay, J.A. Cidlowski, Molecular control of immune/inflammatory responses: interactions between nuclear factor-kappa B and steroid receptor-signaling pathways, Endocr. Rev. 1999, 20, 435-459.
70. A. Ray, K.E. Prefontaine, P. Ray, Down-modulation of interleukin-6 gene expression by 17 beta-estradiol in the absence of high affinity DNA binding by the estrogen receptor, J. Biol. Chem. 1994, 269, 12940-12946.
71. A. Ray, K.E. Prefontaine, Physical association and functional antagonism between the p65 subunit of transcription factor NF-kappa B and the glucocorticoid receptor, Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 752-756.
72. E. Caldenhoven, J. Liden, S. Wissink, A. Van de Stolpe, J. Raaijmakers, L. Koenderman, S. Okret, J.A. Gustafsson, P.T. Van der Saag, Negative cross-talk between RelA and the glucocorticoid receptor: a possible mechanism for the antiinflammatory action of glucocorticoids, Mol. Endocrinol. 1995, 9, 401-412.
73. V. Doucas, Y. Shi, S. Miyamoto, A. West, I. Verma, R.M. Evans, Cytoplasmic catalytic subunit of protein kinase A mediates cross-repression by NF-kappa B and the glucocorticoid receptor, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 11893-11898.
74. R. Losel, M. Wehling, Nongenomic actions of steroid hormones, Nat. Rev. Mol. Cell Biol. 2003, 4, 46-56.
75. V. Boonyaratanakornkit, M.P. Scott, V. Ribon, L. Sherman, S.M. Anderson, J.L. Maller, W.T. Miller, D.P. Edwards, Progesterone receptor contains a proline-rich motif that directly interacts with SH3 domains and activates c-Src family tyrosine kinases, Mol. Cell 2001, 8, 269-280.
76. M.A. Shupnik, Crosstalk between steroid receptors and the c-Src-receptor tyrosine kinase pathways: implications for cell proliferation, Oncogene 2004, 23, 7979-7989.
77. C.Z. Song, X. Tian, T.D. Gelehrter, Glucocorticoid receptor inhibits transforming growth factor-beta signaling by directly targeting the transcriptional activation function of Smad3, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 11776-11781.
78. A. Bruna, M. Nicolas, A. Munoz, J.M. Kyriakis, C. Caelles, Glucocorticoid receptor-JNK interaction mediates inhibition of the JNK pathway by glucocorticoids, EMBO J. 2003, 22, 6035-6044.
79. T. Simoncini, A. Hafezi-Moghadam, D.P. Brazil, K. Ley, W.W. Chin, J.K. Liao, Interaction of oestrogen receptor with the regulatory subunit of phosphatidylinositol-3-OH kinase, Nature 2000, 407, 538-541.
80. S. Kato, Estrogen receptor-mediated cross-talk with growth factor signaling pathways, Breast Cancer 2001, 8, 3-9.
81. J. Bastien, C. Rochette-Egly, Nuclear retinoid receptors and the transcription of retinoid-target genes, Gene 2004, 328, 1-16.
82. C. Rochette-Egly, Nuclear receptors: integration of multiple signalling pathways through phosphorylation, Cell. Signalling 2003, 15, 355-366.
83. D.A. Lannigan, Estrogen receptor phosphorylation, Steroids 2003, 68, 1-9.
84. R.R. Love, R.B. Mazess, H.S. Barden, S. Epstein, P.A. Newcomb, V.C. Jordan, P.P. Carbone, D.L. DeMets, Effects of tamoxifen on bone mineral density in postmenopausal women with breast cancer [see comment], N. Engl. J. Med. 1992, 326, 852-856.
85. P.D. Delmas, N.H. Bjarnason, B.H. Mitlak, A.C. Ravoux, A.S. Shah, W.J. Huster, M. Draper, C. Christiansen, Effects of raloxifene on bone mineral density, serum cholesterol concentrations, and uterine endometrium in postmenopausal women [see comment], N. Engl. J. Med. 1997, 337, 1641-1647.
86. S.M. Ismail, The effects of tamoxifen on the uterus, Curr. Opin. Obstet. Gynecol. 1996, 8, 27-31.
87. C.H. Turner, M. Sato, H.U. Bryant, Raloxifene preserves bone strength and bone mass in ovariectomized rats, Endocrinology 1994, 135, 2001-2005.
88. D.P. McDonnell, The molecular pharmacology of SERMs, Trends Endocrinol. Metab. 1999, 10, 301-311.
89. Y. Shang, M. Brown, Molecular determinants for the tissue specificity of SERMs [see comment], Science 2002, 295, 2465-2468.
90. P. Webb, P. Nguyen, P.J. Kushner, Differential SERM effects on corepressor binding dictate ERalpha activity in vivo, J. Biol. Chem. 2003, 278, 6912-6920.
91. T.M. Willson, B.R. Henke, T.M. Momtahen, P.S. Charifson, K.W. Batchelor, D.B. Lubahn, L.B. Moore, B.B. Oliver, H.R. Sauls, J.A. Triantafillou, S.G. Wolfe, P.G. Baer, 3-[4-(1,2-Diphenylbut-1-enyl)phenyl]acrylic acid: a non-steroidal estrogen with functional selectivity for bone over uterus in rats, J. Med. Chem. 1994, 37, 1550-1552.
92. T.M. Willson, J.D. Norris, B.L. Wagner, I. Asplin, P. Baer, H.R. Brown, S.A. Jones, B. Henke, H. Sauls, S. Wolfe, D.C. Morris, D.P. McDonnell, Dissection of the molecular mechanism of action of GW5638, a novel estrogen receptor ligand, provides insights into the role of estrogen receptor in bone, Endocrinology 1997, 138, 3901-3911.
93. C.E. Connor, J.D. Norris, G. Broadwater, T.M. Willson, M.M. Gottardis, M.W. Dewhirst, D.P. McDonnell, Circumventing tamoxifen resistance in breast cancers using antiestrogens that induce unique conformational changes in the estrogen receptor, Cancer Res. 2001, 61, 2917-2922.
94. H.U. Bryant, Selective estrogen receptor modulators, Rev. Endocr. Metab. Disord. 2002, 3, 231-241.
95. M.J. Meegan, D.G. Lloyd, Advances in the science of estrogen receptor modulation, Curr. Med. Chem. 2003, 10, 181-210.
96. C.C. Chadwick, S. Chippari, E. Matelan, L. Borges-Marcucci, A.M. Eckert, J.C. Keith Jr, L.M. Albert, Y. Leathurby, H.A. Harris, R.A. Bhat, M. Ashwell, E. Trybulski, R.C. Winneker, S.J. Adelman, R.J. Steffan, D.C. Harnish, Identification of pathway-selective estrogen receptor ligands that inhibit NF-kappaB transcriptional activity, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 2543-2548.
97. J.A. Gustafsson, What pharmacologists can learn from recent advances in estrogen signalling, Trends Pharmacol. Sci. 2003, 24, 479-485.
98. B.R. Henke, T.G. Consler, N. Go, R.L. Hale, D.R. Hohman, S.A. Jones, A.T. Lu, L.B. Moore, J.T. Moore, L.A. Orband-Miller, R.G. Robinett, J. Shearin, P.K. Spearing, E.L. Stewart, P.S. Turnbull, S.L. Weaver, S.D. Williams, G.B. Wisely, M.H. Lambert, A new series of estrogen receptor modulators that display selectivity for estrogen receptor beta, J. Med. Chem. 2002, 45, 5492-5505.
99. H.A. Harris, J.A. Katzenellenbogen, B.S. Katzenellenbogen, Characterization of the biological roles of the estrogen receptors, ERalpha and ERbeta, in estrogen target tissues in vivo through the use of an ERalpha-selective ligand, Endocrinology 2002, 143, 4172-4177.
100. E.S. Manas, R.J. Unwalla, Z.B. Xu, M.S. Malamas, C.P. Miller, H.A. Harris, C. Hsiao, T. Akopian, W.T. Hum, K. Malakian, S. Wolfrom, A. Bapat, R.A. Bhat, M.L. Stahl, W.S. Somers, J.C. Alvarez, Structure-based design of estrogen receptor-beta selective ligands, J. Am. Chem. Soc. 2004, 126, 15106-15119.
101. H.M. Reichardt, K.H. Kaestner, J. Tuckermann, O. Kretz, O. Wessely, R. Bock, P. Gass, W. Schmid, P. Herrlich, P. Angel, G. Schutz, DNA binding of the glucocorticoid receptor is not essential for survival, Cell 1998, 93, 531-541.
102. B.R. Walker, Deflazacort: towards selective glucocorticoid receptor modulation? Clin. Endocrinol. 2000, 52, 13-15.
103. B.M. Vayssiere, S. Dupont, A. Choquart, F. Petit, T. Garcia, C. Marchandeau, H. Gronemeyer, M. Resche-Rigon, Synthetic glucocorticoids that dissociate transactivation and AP-1 transrepression exhibit antiinflammatory activity in vivo, Mol. Endocrinol. 1997, 11, 1245-1255.
104. S.W. Elmore, M.J. Coghlan, D.D. Anderson, J.K. Pratt, B.E. Green, A.X. Wang, M.A. Stashko, C.W. Lin, C.M. Tyree, J.N. Miner, P.B. Jacobson, D.M. Wilcox, B.C. Lane, Nonsteroidal selective glucocorticoid modulators: the effect of C-5 alkyl substitution on the transcriptional activation/repression profile of 2,5-dihydro-10-methoxy-2,2,4-trimethyl-1H-[1]benzopyrano[3,4-f]quinolines, J. Med. Chem. 2001, 44, 4481-4491.
105. S.W. Elmore, J.K. Pratt, M.J. Coghlan, Y. Mao, B.E. Green, D.D. Anderson, M.A. Stashko, C.W. Lin, D. Falls, M. Nakane, L. Miller, C.M. Tyree, J.N. Miner, B. Lane, Differentiation of in vitro transcriptional repression and activation profiles of selective glucocorticoid modulators, Bioorg. Med. Chem. Lett. 2004, 14, 1721-1727.
106. H. Schacke, A. Schottelius, W.D. Docke, P. Strehlke, S. Jaroch, N. Schmees, H. Rehwinkel, H. Hennekes, K. Asadullah, Dissociation of transactivation from transrepression by a selective glucocorticoid receptor agonist leads to separation of therapeutic effects from side effects, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 227-232.
107. D. DeManno, W. Elger, R. Garg, R. Lee, B. Schneider, H. Hess-Stumpp, G. Schubert, K. Chwalisz, Asoprisnil (J867): a selective progesterone receptor modulator for gynecological therapy, Steroids 2003, 68, 1019-1032.
108. B.J. Barnes, P.A. Howard, Eplerenone: a selective aldosterone receptor antagonist for patients with heart failure, Ann. Pharmacother. 2005, 39, 68-76.
109. J.A. Delyani, Mineralocorticoid receptor antagonists: the evolution of utility and pharmacology, Kidney Int. 2000, 57, 1408-1411.
110. J.P. Heaton, Andropause: coming of age for an old concept? Curr. Opin. Urol. 2001, 11, 597-601.
111. R.S. Tan, S.J. Pu, J.W. Culberson, Role of androgens in mild cognitive impairment and possible interventions during andropause, Med. Hypotheses 2004, 62, 14-18.
112. A.F. Santos, H. Huang, D.J. Tindall, The androgen receptor: a potential target for therapy of prostate cancer, Steroids 2004, 69, 79-85.
113. J. Rosen, A. Negro-Vilar, Novel, non-steroidal, selective androgen receptor modulators (SARMs) with anabolic activity in bone and muscle and improved safety profile, J. Musculoskelet. Neuronal Interact. 2002, 2, 222-224.
114. J.P. Berger, A.E. Petro, K.L. Macnaul, L.J. Kelly, B.B. Zhang, K. Richards, A. Elbrecht, B.A. Johnson, G. Zhou, T.W. Doebber, C. Biswas, M. Parikh, N. Sharma, M.R. Tanen, G.M. Thompson, J. Ventre, A.D. Adams, R. Mosley, R.S. Surwit, D.E. Moller, Mol. Endocrinol. 2003, 17, 662-676.
115. M. Downes, M.A. Verdecia, A.J. Roecker, R. Hughes, J.B. Hogenesch, H.R. Kast-Woelbern, M.E. Bowman, J.L. Ferrer, A.M. Anisfeld, P.A. Edwards, J.M. Rosenfeld, J.G. Alvarez, J.P. Noel, K.C. Nicolaou, R.M. Evans, A chemical, genetic, and structural analysis of the nuclear bile acid receptor FXR [see comment], Mol. Cell 2003, 11, 1079-1092.
116. E.M. Quinet, D.A. Savio, A.R. Halpern, L. Chen, C.P. Miller, P. Nambi, Gene-selective modulation by a synthetic oxysterol ligand of the liver X receptor, J. Lipid Res. 2004, 45, 1929-1942.
117. B. Miao, S. Zondlo, S. Gibbs, D. Cromley, V.P. Hosagrahara, T.G. Kirchgessner, J. Billheimer, R. Mukherjee, Raising HDL cholesterol without inducing hepatic steatosis and hypertriglyceridemia by a selective LXR modulator, J. Lipid Res. 2004, 45, 1410-1417.
118. S. Folkertsma, P. van Noort, J. Van Durme, H.J. Joosten, E. Bettler, W. Fleuren, L. Oliveira, F. Horn, J. De Vlieg, G. Vriend, A family-based approach reveals the function of residues in the nuclear receptor ligand-binding domain, J. Mol. Biol. 2004, 341, 321-335.
119. J.D. Baxter, J.W. Funder, J.W. Apriletti, P. Webb, Towards selectively modulating mineralocorticoid receptor function: lessons from other systems, Mol. Cell. Endocrinol. 2004, 217, 151-165.
120. J.D. Norris, L.A. Paige, D.J. Christensen, C.Y. Chang, M.R. Huacani, D. Fan, P.T. Hamilton, D.M. Fowlkes, D.P. McDonnell, Peptide antagonists of the human estrogen receptor, Science 1999, 285, 744-746.
121. M.A. Iannone, C.A. Simmons, S.H. Kadwell, D.L. Svoboda, D.E. Vanderwall, S.-J. Deng, T.G. Consler, J. Shearin, J.G. Gray, K.H. Pearce, Correlation between in vitro peptide binding profiles and cellular activities for estrogen receptor modulating compounds, Mol. Endocrinol. 2004, 18, 1064-1081.
122. K.H. Pearce, M.A. Iannone, C.A. Simmons, J.G. Gray, Discovery of novel nuclear receptor modulating ligands: an integral role for peptide interaction profiling, Drug Discov. Today 2004, 9, 741-751.
123. F. Stossi, D.H. Barnett, J. Frasor, B. Komm, C.R. Lyttle, B.S. Katzenellenbogen, Transcriptional profiling of estrogen-regulated gene expression via estrogen receptor (ER) alpha or ERbeta in human osteosarcoma cells: distinct and common target genes for these receptors, Endocrinology 2004, 145, 3473-3486.
124. M. Kian Tee, I. Rogatsky, C. Tzagarakis-Foster, A. Cvoro, J. An, R.J. Christy, K.R. Yamamoto, D.C. Leitman, Estradiol and selective estrogen receptor modulators differentially regulate target genes with estrogen receptors alpha and beta, Mol. Biol. Cell 2004, 15, 1262-1272.
125. P.E. Young, D.K. Bol, High-throughput transcriptional profiling for drug discovery and lead development, Genet. Eng. News 2003, 23.
126. M. Huber, I. Bahr, J.R. Kratzschmar, A. Becker, E.C. Muller, P. Donner, H.D. Pohlenz, M.R. Schneider, A. Sommer, Comparison of proteomic and genomic analyses of the human breast cancer cell line T47D and the antiestrogen-resistant derivative T47D-r, Mol. Cell. Proteomics 2004, 3, 43-55.
127. M. Podvinec, M.R. Kaufmann, C. Handschin, U.A. Meyer, NUBIScan, an in silico approach for prediction of nuclear receptor response elements, Mol. Endocrinol. 2002, 16, 1269-1279.
128. V.X. Jin, Y.W. Leu, S. Liyanarachchi, H. Sun, M. Fan, K.P. Nephew, T.H. Huang, R.V. Davuluri, Identifying estrogen receptor alpha target genes using integrated computational genomics and chromatin immunoprecipitation microarray, Nucleic Acids Res. 2004, 32, 6627-6635.
129. H. Escriva, S. Bertrand, V. Laudet, The evolution of the nuclear receptor superfamily, Essays Biochem. 2004, 40, 11-26.
Chemical Biology. From Small Molecules to System Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
15.4 The GPCR - 7TM Receptor Target Family
Edgar Jacoby, Rochdi Bouhelal, Marc Gerspacher, and Klaus Seuwen
Outlook
Chemical biology approaches have a long history in the exploration of the G-protein-coupled receptor (GPCR) family, which represents the largest and most important group of targets for therapeutics. The analysis of the human genome revealed a significant number of new members with unknown physiological functions, which are today the focus of many reverse pharmacology drug discovery programs. As the seven hydrophobic transmembrane segments are a defining common structural feature of these receptors, and as signaling via heterotrimeric G-proteins is not demonstrated in all cases, these proteins are also referred to as seven transmembrane (7TM) or serpentine receptors. This chapter will summarize important historic milestones of GPCR research, from the beginning when pharmacology was mainly descriptive, to the age of modern molecular biology with the cloning of the first receptor, and now the availability of the entire human GPCR repertoire at the sequence and protein levels. The chapter will show how GPCR-directed drug discovery was initially based on the careful testing of few specifically made chemical compounds and is today pursued with modern drug discovery approaches, including combinatorial library design, structural biology, and molecular informatics, as well as advanced screening technologies for the identification of new compounds activating or inhibiting GPCRs specifically. Such compounds, in conjunction with other new technology, allow us to study the role of receptors in physiology and medicine, and hopefully result in novel therapies. We will also outline how basic research on the signaling and regulatory mechanisms of GPCRs is advancing, leading to the discovery of new GPCR-interacting proteins, and thus opening new perspectives for drug development. Practical examples from GPCR expression studies, high-throughput screening (HTS), and the design of monoamine-related GPCR-focused combinatorial libraries illustrate ongoing GPCR chemical biology research. 
Finally, we will attempt to outline future progress that may relate today’s discoveries to the development of new medicines.
15.4.1 Introduction
G-protein-coupled receptors (GPCRs) are the largest known gene superfamily of the human genome. Around 30% of all marketed prescription drugs act on GPCRs (Fig. 15.4-1); in addition, GPCRs account for around 30% of all targets investigated until now, which makes this class of proteins historically the most successful therapeutic target family [1]. As illustrated in Table 15.4-1, GPCR-directed drugs cover a wide range of therapeutic indications [1, 2].
Fig. 15.4-1 Chemical structures of top-selling GPCR drugs listed in Table 15.4-1. Compounds shown: sumatriptan (5-HT1D agonist), olanzapine (mixed 5-HT2/D1/D2 antagonist), fexofenadine (H1 antagonist), gabapentin (GABAB agonist), salmeterol (β2 agonist), valsartan (AT1 antagonist), risperidone (mixed 5-HT2/D2 antagonist), clopidogrel (P2Y12 antagonist), famotidine (H2 antagonist), and leuprorelin (LH-RH agonist).
Table 15.4-1 Examples of top-selling GPCR drugs (source: IMS Knowledge Link). GPCR drugs cover many therapeutic indications and represent a substantial part of today’s marketed medicines. Reported world sales are for the 12 months ending with Q1 2005.

Trade name         Molecular entity   Company                Therapeutic indication   World sales ($ millions)
Allegra/Telfast®   Fexofenadine       Sanofi-Aventis         Allergies                1792
Diovan®            Valsartan          Novartis               Hypertension             2214
Gaster®            Famotidine         Yamanouchi             Gastric ulcer            656
Imigran            Sumatriptan        GlaxoSmithKline        Migraine                 1454
Leuplin/Lupron®    Leuprorelin        Takeda/Abbott          Cancer                   904
Neurontin®         Gabapentin         Pfizer                 Neurological pain        2480
Plavix®            Clopidogrel        Bristol-Myers Squibb   Stroke                   5277
Risperdal®         Risperidone        Johnson & Johnson      Schizophrenia            3716
Serevent®          Salmeterol         GlaxoSmithKline        Asthma                   679
Zyprexa®           Olanzapine         Eli Lilly              Schizophrenia            4905
“Before cloning”, GPCRs were originally defined as receptors transducing signals from the extracellular compartment to the cell interior via biochemical processes involving GTP-binding proteins. Molecular cloning of the first receptor genes suggested protein structures similar to rhodopsin, with seven transmembrane α-helical domains (hence also “7TM receptors”). Today, GPCRs are known as extremely versatile receptors for extracellular messengers as diverse as biogenic amines, purines and nucleic acid derivatives, lipids, peptides and proteins, odorants, pheromones, tastants, ions like calcium and protons, and even photons in the case of rhodopsin. GPCRs can form homo- and heterodimers, as well as complex receptosomes, which, depending on the case, can incorporate additional intra- and extracellular soluble and transmembrane proteins [3, 4].
As illustrated in Fig. 15.4-2, three main families of human GPCRs are known. The rhodopsin-like family A is the largest and the best studied from the structural and functional points of view. The other two main families are the secretin-like receptor family B, which binds several neuropeptides and other peptide hormones, and the metabotropic glutamate receptor (mGluR)-like family C. A further separate group is constituted by the receptors of the frizzled family, for which direct coupling to heterotrimeric G-proteins is still a matter of debate [5]. The human GPCRs have recently been reclassified using phylogenetic analyses into five different groups named GRAFS, which is the acronym for the groups: glutamate, rhodopsin, adhesion, frizzled/taste2, and secretin [9]. The GRAFS system shows some distinct differences from the classification given above: (a) the adhesion receptors, which are expressed on leukocytes and in the central nervous system, are formed by secretin-like receptors that have a long N-terminal domain including adhesion molecule repeats
like epidermal growth factor (EGF) domains and are likely involved in cell-cell interactions and (b) the taste receptors were reclassified into two subgroups, one within the glutamate group and one within the frizzled/taste2 group. While the GRAFS classification is useful, in this chapter for historic reasons we will maintain the A, B, C nomenclature, as described above.
In the last decades, several GPCR subfamilies were explored systematically in such a way that today selective ligands and drugs are known for a large number of receptors of these families [10]. The elucidation of the human genome, with the discovery of the sequences of many novel orphan GPCRs with unknown functions, provided the basis for further systematization of the exploration of the GPCR superfamily for drug discovery. Because of the evolutionarily conserved commonalities existing inside a homogeneous subgroup of GPCRs, especially for aspects of molecular recognition, it is a rational expectation that through further focus within subfamilies it will be possible to find ligands of the new receptors and to discover innovative medicines [11, 12]. This chapter will summarize the milestones of GPCR research and show how modern chemical biology disciplines and discovery technologies are currently used to explore this highly important target family and to contribute to new and better medicines.
Fig. 15.4-2 Classification of human GPCRs. As described in greater detail below, the human genome contains 720-800 genes coding for functional GPCRs. On the basis of their primary sequence, these genes were historically classified into three main families, A, B, C or 1-3, respectively [6]. Sequences within each family generally share over 25% sequence identity in the 7TM core region. The rhodopsin-like family A is by far the largest subgroup and contains the opsins, the olfactory GPCRs, small molecule/peptide hormone GPCRs, and glycoprotein hormone GPCRs. Family A GPCRs are characterized by several highly conserved amino acids in the 7TM bundle, and there is usually a disulfide bridge linking extracellular loops E1 and E2 (there are only a few exceptions, including the melanocyte-stimulating/adrenocorticotropic hormone (MSH/ACTH) and the cannabinoid receptors). Most of the family A receptors have a palmitoylated cysteine in the intracellular C-terminal tail. The binding sites of the endogenous small molecule hormone ligands of class A GPCRs are located within the 7TM bundle ((a), the ligand-binding site is indicated in orange). For peptide and glycoprotein hormone receptors ((b) and (c), respectively), binding occurs via the N-terminus, the extracellular loop segments, and the superior parts of the TM helices. Family B comprises 50 GPCRs for peptides like secretin, calcitonin, and parathyroid hormone. The family B GPCRs are characterized by a relatively long N-terminal tail, which together with the juxtamembrane 7TM parts is implicated in ligand binding and contains a network of three conserved disulfide bridges defining a globular domain structure (d); the 3D structure of the N-terminal ligand-binding domain of the mouse CRF2 receptor was recently determined by high-resolution NMR spectroscopy. As in family A, the family B receptors show a number of conserved proline residues within the TM segments, which are thought to be essential for the conformational dynamics of the receptors. Family B receptors appear to couple preferentially to the effector adenylate cyclase through the G-protein Gs, and in general to a lesser extent to Gq and Gi [7]. Family C GPCRs include the mGluR, the γ-aminobutyric acid type B (GABAB), and the Ca2+-sensing receptors (CaR). This group has 17 members in the human genome, including notably the pheromone receptors, which form a small family in humans, but a much larger one in rodents. The majority of family C receptors are characterized by very large N- and C-terminal tails, a disulfide bridge connecting the first and second extracellular loops, and a very short and well-conserved third intracellular loop (e). A number of the strongly conserved residues of class A GPCRs are also strongly conserved in class C GPCRs; this is consistent with class A and class C receptors having a common ancestor. The ligand-binding site is located in the N-terminal domain, which is composed of the so-called Venus flytrap module (VFTM) that shares sequence similarity with bacterial periplasmic amino acid-binding proteins. In all class C GPCRs, except the GABAB receptor, a cysteine-rich domain (CRD) containing nine conserved cysteines links the VFTM to the 7TM domain. For mGluR1, the VFTM domain was crystallized in the liganded and unliganded state and was shown to form a disulfide-linked homodimer undergoing considerable reorganization upon ligand binding [8]. The 11 human frizzled/smoothened receptors control cell development and proliferation mediated by secreted glycoproteins called Wnt and Hedgehog. The N-terminus contains a CRD ligand-binding domain with 10 conserved cysteines, all of which form disulfide bonds. The names frizzled and smoothened refer to specific Drosophila phenotypes that are linked to mutations in the Drosophila orthologs. The N-terminal domains of GPCRs contain, in general, N-glycosylation sites for posttranslational modification, which ensure correct folding in the endoplasmic reticulum and cell-surface expression.
15.4.2 History/Development
Given the unparalleled significance of GPCRs for medicine, the history of GPCR chemical biology is in principle as old as the history of pharmacology [13]. Since the beginning of the nineteenth century, pharmacologists like Ariens, Furchgott, Schild, Blake, and others investigated animal models and isolated organs and tissues to study the dose-dependent activity of neurotransmitters and peptide hormones, as well as natural and synthetic drugs. The targets for most of these molecules later turned out to be GPCRs and ion channels. Many essential concepts like the binding site and receptor theory, the definitions of agonists and antagonists, affinity and efficacy, as well as the usage of radioligands for binding studies and receptor quantification, were established (Table 15.4-2). Several methods emerged to analyze quantitatively the dose response of compounds. The molecular nature of the receptors remained, however, unrevealed long after pioneers of biochemistry - including Krebs, Rodbell, and Gilman, working on adrenoceptors - had discovered important elements of the signaling cascade in the 1960s and 1970s [15, 16]. The early milestones for the elucidation of the signaling cascades, which couple hormones via the receptors to the intracellular effector proteins, included the discovery of cyclic adenosine monophosphate (cAMP) by Sutherland as the first characterized second messenger [17]; the enzyme adenylate cyclase responsible for its synthesis;
and heterotrimeric G-proteins as transducers. Intracellular free calcium and inositol phosphates were later characterized as further second messengers, and phospholipases, kinases, and ion channels emerged as important effector systems downstream of GPCR activation. The list of effectors is ever expanding (see Fig. 15.4-3) [18, 19].
Table 15.4-2 General pharmacological terms used in this chapter to describe compound action at the GPCRs
Term: Definition
Receptor: A cellular macromolecule, or an assembly of macromolecules, that is concerned directly and specifically in chemical signaling between and within cells. Combination of a hormone, neurotransmitter, drug, or intracellular messenger with its receptor(s) initiates a change in cell function.
Agonist: A ligand that binds to a receptor and alters the receptor state resulting in a biological response. Conventional agonists increase receptor activity. Full agonists stimulate the maximum response capacity of the system; partial agonists do not reach the maximum response capacity. The designation of full versus partial agonist is system dependent, and a full agonist for one tissue or measurement may be a partial agonist for another. Inverse agonists reduce the constitutive biological response. Nonendogenous agonists may combine either with the same site as the endogenous agonist (primary or orthosteric site), or with a different site on the receptor (allosteric or allotopic site).
Antagonist: A drug that reduces the action of another drug, generally an agonist. Many antagonists act at the same receptor macromolecule as the agonist. In competitive antagonism, the binding of the agonist and antagonist is mutually exclusive, either because the agonist and antagonist compete for the same binding site or combine with spatially adjacent and overlapping binding sites (syntopic interaction); a third possibility is that different binding sites are involved but they influence the macromolecule in such a manner that simultaneous binding is impossible.
Allosteric (allotopic) modulator: A ligand that increases or decreases the action of a (primary or orthosteric) agonist or antagonist by combining with a distinct (allosteric or allotopic) site on the receptor macromolecule.
Desensitization: Decline in the response to continuous or repeated application of agonist.
Adapted from the recommendation of the IUPHAR Committee on Receptor Nomenclature and Drug Classification [14]
Long before the GPCR proteins were isolated and sequenced, many important therapeutic classes were successfully introduced into the clinics, including the β-blockers, antihistaminics, anticholinergics, analgesic opiates, and neuroleptics [20]. These compounds were developed from discovery to market very rapidly and were successful in the pharmaceutical industry. The sales provided funds to fuel further research in the field. A critical success
factor for their discovery was the existence of relatively well-established knowledge of the physiology of the related hormone, and the fact that new chemical compounds were systematically tested in biological models of multiple disease areas in parallel, allowing a rather complete understanding of their mode of action. Binding profiles of drugs and reference compounds were generated on membrane preparations from different organs, leading to the first clear evidence for receptor subtypes expressed in different tissues.
Fig. 15.4-3 Classical GPCR signaling. Receptors couple to heterotrimeric G-proteins to regulate a variety of cell responses. Agonist binding at the receptor leads to exchange of G-protein-bound GDP for GTP. The activated heterotrimer dissociates into the α-subunit (symbolized as α*) and the βγ-dimer, both of which have an independent capacity to signal forward through the activation or inhibition of effectors. Hydrolysis of GTP to GDP leads to signal termination and reassociation of the heterotrimer; regulators of G-protein signaling (RGS) proteins enhance the intrinsic GTPase activity of the α subunit. Some G-protein subunits and effectors are expressed ubiquitously, others only in specific tissues. The 16 mammalian G-protein α-subunits fall into four broad families based on primary structure and the dependent signaling cascade. Members of the stimulatory Gαs family couple to adenylate cyclase to cause an increase in intracellular cAMP levels. The eight members of the Gαi/o family inhibit adenylate cyclase and trigger other signaling events. The three members of the Gαq/11 family activate phospholipase Cβ (PLCβ), resulting in the intramembrane hydrolysis of phosphatidylinositol-4,5-bisphosphate (PIP2) to inositol-1,4,5-trisphosphate (IP3) and diacylglycerol (DAG); DAG increases the activity of protein kinase C (PKC) and IP3 triggers the release of Ca2+ ions from intracellular stores. Finally, the two members of the Gα12/13 family regulate Rho proteins. Gβγ-dimers are combinations of five known isoforms of the Gβ subunit and 12 known isoforms of the Gγ subunit. Each individual isoform can associate with a set of effectors and regulators. Gβγ-dimers signal to a large number of effectors including ion channels, phospholipases, phosphoinositide kinases, and the ras/raf/extracellular signal-regulated kinase (ERK) pathways. Examples of effectors include: G-protein regulated inward rectifying K+ channels (GIRK1-4), voltage-dependent Ca2+ channels (VDCC), phospholipase A2 (PLA2), PLCβ, and the Na+/H+ exchanger (NHE1). The specific function of individual Gβγ-dimers is not fully explored. A single GPCR can activate more than one type of G-protein. For further detail see Refs. [18, 19].
The development of new protein chemistry technologies, like affinity labeling and affinity chromatography procedures, allowed access to enriched and purified sources of receptors and finally introduced the molecular age of GPCR research. Having access to a broad range of adrenergic ligands, and by coupling the new affinity chromatography procedures with more conventional chromatographic procedures, the Lefkowitz group was the first to purify the β2-adrenoceptor, in 1979 [21]. The proof of concept that the purified β2-adrenoceptor protein is indeed the functional receptor was achieved by reconstitution experiments in phospholipid vesicles with purified G-protein and the catalytic moiety of adenylate cyclase [22]. The progress in molecular cloning techniques provided access to the DNA sequence of the receptors. Microsequencing of small peptide stretches obtained from the purified adrenoceptors enabled the design of oligonucleotide probes, allowing, in 1986, Merck Research Laboratories to clone the gene and cDNA encoding the hamster β2-adrenoceptor by using a genomic cDNA library and by identifying overlapping clones that encoded all the peptide stretches defining the full sequence [23]. The cloning of the β2-adrenoceptor was a historic breakthrough and catalyzed molecular GPCR research. The analysis of the sequence revealed the homology to bovine rhodopsin, which, since the beginning of the 1980s, was a model system for the study of membrane proteins and the investigation of the molecular basis of vision.
Given its remarkably easy access from retinal rod preparations, the sequence of bovine rhodopsin was, in 1982, determined using conventional protein sequencing by Ovchinnikov, and cloned in 1983 [24]. A structure-function relationship was established with bacteriorhodopsin, the photon-driven and retinal-binding proton pump from the purple membrane of Halobacterium halobium, for which Henderson and Unwin, in 1975, had already determined a 7TM topology using electron microscopy techniques [25]. Given that the investigation of the signaling mechanisms of rhodopsin had revealed its linkage to the G-protein transducin, the knowledge of the β2-adrenoceptor sequence and signaling mechanism consolidated the view that rhodopsin would provide an ideal model system for other GPCRs. The speculation about the existence of a large family of such receptors, with the 7TM arrangement being a characteristic fold, was then confirmed in the following years by successful cloning of essentially all monoamine GPCRs and several peptide class A GPCRs, all showing the characteristic 7TM signatures in the hydrophobicity plot analyses. To this end, cDNA libraries, prepared from cells or tissues known to be rich in certain receptors, were screened by low stringency hybridization, or were used for polymerase chain reaction (PCR) amplification of candidate genes using degenerate primers. Proof of function was obtained after the
expression of the cloned receptor in heterologous cells, by measuring an agonist response. In many cases, however, the functional identity of the cloned receptors could not be matched, and receptors of unknown function were identified. Since the end of the 1990s, such orphan GPCRs became the object of reverse pharmacology-based drug discovery programmes [26, 27]. The successfully completed deorphanization projects resulted in relevant patent and intellectual property claims for the inventors. The elucidation of the human genome, in 2001, motivated additional projects in this direction, because almost all members of the GPCR-7TM target family became visible at the DNA sequence level, and advanced gene expression analysis and bioinformatics methods became available for mining and classification purposes. Around 60 orphan receptors have since been paired with ligands, and progress was, in many cases, achieved for entire subfamilies such as the trace amine receptors or the endothelial differentiation gene (EDG) receptors.
Besides the systematic exploration in drug discovery programmes, other branches of GPCR research focused on the detailed investigation of receptor signaling and regulation. Generally, direct signaling via second messengers resulting in immediate cell responses can be distinguished from the persistent activation of gene expression in the nucleus. The discovery of mutations conferring constitutive receptor activation led to the identification of receptors that signal in the absence of agonist ligands and that were later related to a number of diseases [28] (e.g., in Jansen’s disease [29], the hypercalcemia and skeletal dysplasia found in many cases is the result of a constitutively overactive parathyroid hormone/parathyroid hormone related protein (PTH/PTHrP) receptor, carrying a point mutation). Studies of the constitutive activity of receptors led to both the in vitro and in vivo demonstrations of inverse agonism.
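The hydrophobicity plot analyses used to recognize 7TM signatures in newly cloned sequences can be illustrated with a sliding-window average over a hydropathy scale. The sketch below is only an illustration, not the procedure used in the studies cited in this section: it assumes the Kyte-Doolittle scale, a 19-residue window, and a cutoff of 1.6, and the function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def count_hydrophobic_segments(profile, threshold=1.6):
    """Count contiguous runs of window positions whose mean exceeds the threshold."""
    count = 0
    inside = False
    for value in profile:
        if value > threshold:
            if not inside:
                count += 1
            inside = True
        else:
            inside = False
    return count


# Hypothetical toy sequence (not a real receptor): seven hydrophobic stretches
# of 22 leucines, each followed by an 18-residue polar linker.
toy = ("L" * 22 + "DKSGNE" * 3) * 7

print(count_hydrophobic_segments(hydropathy_profile(toy)))
```

For this artificial sequence the profile rises above the cutoff seven times, once per leucine stretch. Real receptor sequences give noisier profiles, so window length and cutoff are choices to be checked against known examples rather than fixed values.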
In the extended ternary complex model of receptor activation, inverse agonists are ligands that preferably bind to and stabilize the inactive conformational state of the receptor and therefore reduce background signaling [30]. Many receptors show a weak constitutive activity in specific cell systems following overexpression, and this can be used to determine the coupling mechanisms engaged downstream. Using mutagenesis and chimeric receptors, the ligand-binding domains and intracellular domains interacting with G-proteins and other effectors were determined [31, 32]. Multiple signaling roles and signal switching mechanisms were discovered for many GPCRs. For instance, the β2-adrenoceptor signals on initial agonist binding via the Gs pathway. Protein kinase A (PKA)-mediated phosphorylation within the third intracellular loop switches the signaling specificity toward Gi signaling pathways. A subsequent change of the signaling properties occurs through G-protein-coupled receptor kinase (GRK)-mediated phosphorylation of the receptor C-terminal tail, resulting in binding of β-arrestin proteins, which mediate receptor downregulation via clathrin-coated pits. The internalized complexes subsequently undergo regulated endosomal sorting, either toward lysosomal degradation or by recycling back to the plasma membrane. β-Arrestin also acts as a scaffold
protein for other signaling pathways and recruits, for instance, the c-Src kinase via the poly-Pro-SH3 domain, and thereby activates mitogen-activated protein (MAP) kinase signaling. Also, G-protein independent signaling toward the NHE1 ion exchanger was observed. This occurs via the Na+/H+ exchanger regulatory factor (NHERF) protein interacting by its postsynaptic density-95, disc large, zonula occludens-1 (PDZ) domain with the PDZ binding motifs found at the C-terminus of several GPCRs [33]. The investigation of the mechanism of agonist-induced receptor signaling, desensitization, internalization, trafficking, and recycling resulted in the discovery of many proteins that interact with GPCRs and are collectively called G-protein-coupled receptor interacting proteins (GIPs) [34, 35]. The GIPs link GPCRs to large protein networks, called receptosomes, whose mechanistic investigation and exploration for drug discovery is the subject of intense research activity. We will elaborate more on this topic at the end of the chapter.
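The notion of inverse agonism discussed in this section can be made concrete with a simple two-state receptor model. The sketch below is a deliberately reduced illustration and not the extended ternary complex model itself: it assumes a receptor that isomerizes between an inactive state R and an active state R* with constant L = [R*]/[R], and a ligand whose affinity for R* is alpha times its affinity for R. The function name and all parameter values are hypothetical.

```python
def active_fraction(conc, K, L, alpha):
    """Fraction of receptors in the active state R* in a two-state model.

    conc  : ligand concentration (same units as K)
    K     : dissociation constant of the ligand for the inactive state R
    L     : isomerization constant [R*]/[R] in the absence of ligand
    alpha : affinity for R* relative to R (>1 agonist, 1 neutral, <1 inverse agonist)
    """
    x = conc / K
    # Relative abundances of the four species: R = 1, AR = x, R* = L, AR* = L*alpha*x
    active = L * (1.0 + alpha * x)
    inactive = 1.0 + x
    return active / (active + inactive)


# Hypothetical parameters: a receptor with measurable constitutive activity.
L = 0.1
K = 1.0
saturating = 1e6

basal = active_fraction(0.0, K, L, alpha=1.0)
agonist = active_fraction(saturating, K, L, alpha=100.0)
neutral = active_fraction(saturating, K, L, alpha=1.0)
inverse = active_fraction(saturating, K, L, alpha=0.01)

print(f"basal   {basal:.3f}")
print(f"agonist {agonist:.3f}")
print(f"neutral {neutral:.3f}")
print(f"inverse {inverse:.3f}")
```

Under these assumptions the agonist raises the active fraction above the basal level, the neutral ligand leaves it unchanged, and the inverse agonist lowers it, which corresponds to the reduction of background signaling described above.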
15.4.3 General Considerations
15.4.3.1 GPCRs in Human and Other Genomes
The human genome as well as genomes from several other species (mouse, rat, zebrafish, Drosophila, Caenorhabditis elegans) are now relatively well analyzed with respect to GPCRs, and these receptors constitute the largest gene family in mammals. The most recent studies concluded - depending on the stringencies of the different bioinformatics data mining methods used - that there are 720-800 human GPCRs, accounting for around 2% of the human genome. These include ca. 380 unique functional nonolfactory/nonsensory GPCR sequences for which endogenous ligands are expected and which are therefore referred to as endo-GPCRs [9, 36]. The endo-GPCR group has attracted the most attention in recent years. These receptors are expressed in different tissues and regulate various aspects of physiology. A recent comparative investigation of the human and mouse endo-GPCR repertoire [36] revealed 367 human and 392 mouse GPCRs; 343 were found in common to both species. The human receptors without orthologs in mice include several orphan receptors, but notably also the melanin-concentrating hormone subtype 2 (MCH2) receptor and the recently identified receptor for the eosinophil chemoattractant 5-oxo-eicosatetraenoic acid. Of the 362 human GPCRs, 284 belong to the rhodopsin-like class A, 50 to the secretin receptor-like class B, 17 to the class C, and 11 to the frizzled-smoothened receptor-like class F/S; and of the 387 mouse GPCRs, 313, 47, 17, and 10 belong to classes A, B, C, and F/S, respectively. The cataloguing of these receptors according to ligand specificities reported in the literature identified 224 human and 214 mouse GPCRs with known ligands. The remaining 138 human and 173 mouse GPCRs have no known ligands and are therefore orphan receptors. Among the orphan receptors, 98 human and 136 mouse receptors belong to class A,
34 human and 31 mouse receptors belong to class B, 6 receptors belong to class C in both species, and none belong to class F/S.
Olfactory receptor genes represent the largest mammalian subgroup. They are class A receptors encoded by single exons, and they are transcribed in the olfactory epithelium, where they interact specifically with the G-protein Golf to transduce odorant signals. They provided the basis for the understanding of odor recognition, work that was recognized in 2004 with the Nobel Prize in Physiology or Medicine awarded to Buck and Axel [37]. For some olfactory receptors, expressed sequence tags (ESTs) were picked up in peripheral organs; however, the significance of these findings remains unclear at present (e.g., prostate-specific gene receptor (PSGR)). Especially for the human olfactory receptor family, it is not yet entirely clear which of these receptors are functionally expressed, as about 50% of the genes identified likely represent pseudogenes. In the mouse family, the majority of olfactory receptors appear to be functional. The annotation and functional characterization of olfactory receptors is rapidly evolving, and specific databases have been created that follow recent developments “online” (see Table 15.4-3). In addition to olfactory receptors, taste and pheromone receptors are identified as chemosensory. The pheromone receptors play an important role in modulating behavior in rodents; whether they are involved in human behavior is a matter of debate. Pheromone receptors belong to class C. They are specifically expressed in the vomeronasal organ in rodents, which is a specific structure separate from, but in proximity to, the main olfactory epithelium. While there are more than 100 active receptors in the mouse, only 11 have been identified in humans, and their ligands are unknown. Taste receptors come in two families that are rather well conserved between human and mouse.
One group belongs to class C and has three members (T1R1, T1R2, T1R3); these receptors form heterodimers like γ-aminobutyric acid type B (GABAB) receptors, and the different entities formed are responsible for detecting sugars and the amino acid glutamate. The second group of taste receptors is class A like (T2Rs) and comprises more than 30 receptors in humans, which appear to be involved in detecting bitter tastes. All taste receptors are expressed exclusively in the tongue, and there is a separation between cells expressing T1- and T2-type receptors. The opsins represent the highly interesting small family of light-detecting GPCRs [38]. In addition to the four well-known opsins operating in rod and cone cells, there are four additional opsin-related receptors (retinal G protein-coupled receptor (RGR) opsin, peropsin, melanopsin, encephalopsin) that are likely to bind chromophores and appear to play interesting roles in light sensing outside the well-described primary phototransduction processes. For instance, melanopsin may be involved in the control of circadian rhythms. ESTs for encephalopsin were isolated from several tissues, including brain and skin. The genome of the nematode C. elegans was the first to be sequenced in full, followed by Drosophila shortly after. These very distantly related
Table 15.4-3 Publicly available Internet molecular informatics resources providing relevant information for GPCR chemical biology research

Specification of GPCR related information available | Internet resource URL
Official database of the IUPHAR Committee on Receptor Nomenclature and Drug Classification; includes information on name synonyms, structure, functional assays, ligands, agonist and antagonist potencies, radioligand assays, transduction mechanisms, receptor distribution, tissue function, and phenotype. | http://www.iuphar-db.org/iuphar-rd/index.html/
Database of the NIMH Psychoactive Drug Screening Program. Pharmacoinformatics system with strong focus on GPCR pharmacology and profile structure-activity data. | http://kidb.bioc.cwr.edu/
GPCRDB: Information system of CMBI in Nijmegen; contains information about sequences, multiple sequence alignments, phylogenetic trees, 3D models, GPCR mutation data, and ligand-binding constants. | http://www.gpcr.org/7tm/
hGPCRdb: The human druggable GPCR database at the University Louis Pasteur of Strasbourg; provides searching capabilities for chemogenomics analyses of the 7TM and the binding cavity domains of human GPCRs. | http://bioinfo-pharma.u-strasbg.fr/gpcrdb/gpcrdb-form.html
Olfactory Receptor Database of the SenseLab project at Yale University, a long-term effort to build integrated, multidisciplinary models of neurons and neural systems, using the olfactory pathway as a model. The database provides metadata of gene and protein sequences of olfactory receptors. | http://senselab.med.yale.edu/senselab/ORDB/
The GRAP database at the University of Tromso contains information on mutants of family A GPCRs with detailed description of the ligand-binding and signal transduction properties. | http://www-grap.fagmed.uit.no/GRAP/homepage.html/
A diagnostic bioinformatics resource at the University of Manchester profiling a query sequence against the PRINTS fingerprint database to determine the most similar families or receptor subtypes. | http://umber.sbs.man.ac.uk/dbbrowser/gpcrPRINTS/
Additional bioinformatics classifiers of GPCRs exist at the University of Athens and the University of California Santa Cruz, and are, respectively, based on Hidden Markov Model (HMM) and SVM methods. | http://bioinformatics.biol.uoa.gr/PRED-GPCR/ ; http://www.soe.ucsc.edu/research/compbio/gpcrsubclass/
ChemBank at Harvard University and PubChem at the NCBI are cheminformatics databases for small molecules and their biological activities. Both systems are supported by the NCI’s initiative for chemical genetics. | http://chembank.med.harvard.edu/ ; http://pubchem.ncbi.nlm.nih.gov/
InterPro at EBI is a general bioinformatics database of protein families, domains, and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences. | http://www.ebi.ac.uk/interpro/
organisms share with mammals the existence of receptor systems for monoamines, acetylcholine, GABAB, glutamate, Wnt glycoproteins, and several neuropeptides, suggesting their potential use as model organisms to explore the biology of the conserved receptor systems [39]. Virally encoded GPCRs might have a direct role in human diseases. Indeed, the GPCR from Kaposi’s sarcoma-associated herpesvirus has recently been implicated in Kaposi’s sarcomagenesis, and the human cytomegalovirus-encoded GPCRs have been implicated in atherosclerosis. Given the versatility of GPCR signaling and its wide involvement in physiological processes, it is not surprising that viruses have evolved to exploit these receptors to their advantage [40].
15.4.3.2 Strategies for the Deorphanization of GPCRs
Deorphanization, the identification of activating ligands for previously orphan receptors, is a key task in reverse molecular pharmacology. Identifying receptor/agonist pairs usually allows the rapid elucidation of the physiological role of both partners, sometimes putting them in an unexpected context. Thus, the identification of orexin unexpectedly led to an understanding of narcolepsy; the discovery of pH-sensing receptors triggered new experimental approaches in several areas of biology. Although bioinformatics methods were initially helpful in successfully directing ligand-pairing experiments, as illustrated by examples given in Section 4.3.5, deorphanization strategies rely on biological screening of orphan GPCRs expressed in specific recombinant expression systems, such as immortalized mammalian cells, yeast, or Xenopus melanophores [26, 27]. The agonist ligand libraries used for deorphanization include small molecules, peptides, proteins, and lipids or tissue extracts, which are specifically selected as described in Section 4.3.4. The identification of an activating agonistic ligand of a cell-surface expressed receptor depends on the activation of an intracellular signaling cascade. The difficulty in the assay design is that the signaling cascade is not known a priori for a new orphan receptor. Generic assay systems amenable to high-throughput screening (HTS) therefore need to be designed to allow screening of large surrogate ligand collections. One of the most successful approaches to deorphanization uses the fluorescent imaging plate reader (FLIPR) screening technology, which detects ligand-induced intracellular Ca2+ mobilization.
To direct the signaling via the PLC Ca2+ readout, the receptors are transiently expressed in mammalian cells in the presence of one or more cocktails of promiscuous G-proteins such as Gα15/16, which couple to the majority of GPCRs [41], or of engineered chimeric G-proteins like Gαqi5-6 or Gαqs5-6, in which five or six amino acids of the C-terminus of Gαq have been replaced by the corresponding amino acids of Gαi or Gαs to redirect coupling of Gαi- or Gαs-specific receptors via phospholipase Cβ (PLCβ) [42]. Through mechanisms that are not yet fully understood, prestimulation of some cell types with the agonists of Gq-coupled receptors dramatically sensitizes these cells to stimulation by Gi- and Gs-coupled receptors, again linking such receptors to the calcium signaling system. GPCRs have been successfully expressed in yeast and coupled to the endogenous mating response pathway. Yeast-based assays use a variety of stably expressed synthetic G-proteins, and the readout is linked, for instance, to the expression of β-galactosidase or other reporter genes. The use of Xenopus melanophores (frog skin cells) for transfection with mutant orphan GPCRs that have increased constitutive activity represents an alternative to the mammalian and yeast expression systems. In response to selective GPCR signaling via Gs or Gq, the melanosomes disperse the melanin pigment and cause darkening of the cells. Conversely, when signaling is via Gi, the melanosomes aggregate and cause lightening of the cells. Activation and signaling can thus specifically be determined by simple measurements of light transmittance. The so-called constitutively activating receptor technology (CART), which is limited in compound throughput, provides the advantage of identifying agonists and inverse agonists in the same experiment [43]. The validation of the hits includes testing for possible interference with endogenous receptors of the heterologous expression system and is followed by selectivity screening on other GPCRs and further investigation in cell-based, tissue, or in vivo models. These experiments help determine the physiological role of the newly discovered ligands and receptors. There are limitations in such screening strategies, as the heterologous expression systems may not provide a permissive context for signaling. For example, the class B GPCR calcitonin receptor-like receptor (CRLR) requires the presence of single-transmembrane-domain receptor activity modifying proteins (RAMPs), which regulate the transport to the membrane and the ligand specificity of the receptor.
Depending on the RAMP subtype, CRLR can act as a calcitonin-gene-related peptide or as an adrenomedullin receptor [44]. Other receptors are active only as heterodimers, as was first demonstrated for the GABAB receptors, which require the coexpression of the two partner receptor proteins. C5L2, a receptor that shares homology with the C3a and C5a anaphylatoxin receptors, is currently thought to work simply as a ligand sink without any classical signaling activity. Similarly, the chemokine receptor D6 is thought to bind several chemokines with the sole purpose of internalizing and degrading them. Unfortunately, it is not possible at present to predict with reasonable certainty from a receptor’s primary sequence whether it will signal. This raises the possibility that other orphan receptors either do not signal or use alternative G-protein-independent signaling pathways. These examples illustrate the need for the development of novel screening and imaging technologies, reporting, for instance, on the translocation of receptors and proteins between subcompartments of living cells using resonance energy transfer based on either fluorescence (FRET) or bioluminescence (BRET) [45, 46].
Other receptors, like viral GPCRs (e.g., ORF74 of Kaposi’s sarcoma-associated herpesvirus), are highly constitutively active and function in the absence of ligand, which raises the possibility that other ligandless orphans exist. Other orphan receptors might play roles only in intracellular mechanisms, acting, for instance, as trafficking factors via heterodimerization, or being expressed in the membranes of intracellular organelles; exogenously applied, non-membrane-permeable ligands will not activate such receptors. The correct plasma membrane localization of the orphan receptors studied should be verified using immunocytochemistry methods. There are several cases where the reported receptor agonists may not be the physiological ones. For instance, the receptors HM74 and HM74a respond to niacin (nicotinic acid), a clinically useful molecule normalizing dyslipidemia, but the physiological first messenger(s) remains to be discovered. In some cases, the original reports describing new receptor-ligand pairings were not reproducible. For instance, several years ago, the related receptors ovarian cancer G protein-coupled receptor 1 (OGR1), G protein-coupled receptor 4 (GPR4), and T cell death-associated gene 8 (TDAG8) were described as receptors for lipid messengers. Later, it was demonstrated that OGR1, GPR4, and TDAG8 may in fact be considered as genuine pH-sensing receptors [47].
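The melanophore readout described earlier in this section amounts to a simple decision rule on light transmittance, which the following Python sketch illustrates. It is not taken from any cited assay protocol: the function name, the 5% relative-change threshold, and the example readings are assumptions chosen purely for illustration.

```python
def classify_melanophore_response(before, after, threshold=0.05):
    """Classify a melanophore response from two light-transmittance readings.

    A drop in transmittance means the cells darkened (pigment dispersion);
    a rise means they lightened (pigment aggregation).
    The threshold is a hypothetical cutoff, not a published value.
    """
    if before <= 0:
        raise ValueError("baseline transmittance must be positive")
    relative_change = (after - before) / before
    if relative_change <= -threshold:
        return "Gs/Gq"  # dispersion, darkening of the cells
    if relative_change >= threshold:
        return "Gi"     # aggregation, lightening of the cells
    return "none"       # change too small to call


# Hypothetical readings (arbitrary transmittance units)
print(classify_melanophore_response(1.00, 0.80))
print(classify_melanophore_response(1.00, 1.20))
```

In practice the cutoff would have to be calibrated against control wells of the expression system in use.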
15.4.3.3 Structural Biology of GPCRs and Molecular Modeling of Ligand-Receptor Interactions
Until the year 2000, when the first 2.8-Å crystal structure of bovine rhodopsin was solved by Palczewski and coworkers, structural biological investigations of GPCRs were limited to indirect mutagenesis and second-generation affinity labeling methods based on the substituted-cysteine accessibility method (SCAM), where sulfhydryl-reactive affinity reagents are combined with either wild-type or a series of substituted-cysteine mutant receptors (e.g., D2 receptors). The rhodopsin structure was later refined in 2004 to 2.2-Å resolution, and in total seven crystallographic conformational states are deposited at the Protein Data Bank (PDB) [48]. The first 3D molecular models were based on the analysis of the 2D projection maps generated from cryoelectron microscopic data of 2D crystals of rhodopsin and the analogous bacteriorhodopsin, for which a 2.5-Å resolution X-ray structure became available in 1997 using microcrystals grown in lipid cubic phases [49]. The comparison of the 3D structures of rhodopsin and bacteriorhodopsin clearly showed differences in the length of the loop and helix segments and in the relative arrangement, tilts, and kinks of the individual helices between the two proteins, which had previously been inferred from the 2D projection maps. While the early 3D models based on the bacteriorhodopsin template were able to explain, to some degree, the data generated from mutagenesis experiments [50, 51], the quality of these analyses clearly improved when the 3D structure of bovine rhodopsin became available. This applies especially to the class A GPCRs, which, although having only a sequence similarity of 20-30% to rhodopsin, share characteristic
signature motifs in each TM helix [52]. The main ligand-binding site of small
molecule hormones and nonpeptidic agonists and antagonists is located within the central crevice of the 7TM bundle, in analogy to the lipophilic binding pocket of the retinal molecule in the light-sensing proteins. This is a remarkable similarity, especially given the overlap between the positions of the proposed ligand-contact residues and the positions of the retinal contact residues in rhodopsin. The extracellular side involved in ligand binding appears to form a receptor-specific binding site, while the cytoplasmic side and the ends of the transmembrane helices toward the cytoplasm are significantly more conserved. Illustrative examples include the work on the β2 adrenergic, serotonin 5-HT1A (see Fig. 15.4-4), neurokinin-1 (NK1), adenosine A3, purine P2Y1, angiotensin AT1, and chemokine CCR2 receptors [2, 52-54], where the 3D models helped in understanding detailed aspects of the observed structure-activity relationships (SARs) based on the analysis of the ligand-receptor interactions probed especially by two-dimensional mutagenesis experiments, that is, experiments in which both the ligand and the receptor are simultaneously modified according to the presumed nature of the specific molecular interaction. Such experiments are expected to be of better quality than the more frequent one-dimensional mutations of the receptor, where the observed effect on binding might result not only from a direct ligand-receptor contact but also from long-range structural perturbations. Such studies demonstrated that antagonists of small molecule hormone receptors bind isosterically to the endogenous ligands, whereas nonpeptide antagonists may bind rather differently from the peptide agonists. Mutation experiments in combination with molecular modeling of ligand-receptor interactions were also useful in understanding the species differences in ligand affinities and specificities (e.g., NK1 antagonists in human and rat).
More prospectively, the models were used to provide a conceptual framework for combinatorial library design strategies, as in the Novartis and Biofocus chemogenomics knowledge-based approaches described below, and for the optimization of the selectivity aspects of lead series. For instance, modeling of the 5-HT2C receptor-ligand interaction in combination with ligand-derived comparative molecular field analysis (CoMFA) was crucial for the discovery and optimization of 5-HT2C/2B selective indoline urea ligands not targeting the 5-HT2A receptors [55]. Rhodopsin-based models of the 7TM domain were also instrumental to researchers at Novo Nordisk in understanding the molecular recognition of privileged structures used in generalized library design approaches, which provide a means to target orphan GPCRs in the absence of knowledge of the endogenous or surrogate ligand [56]. Modeling the interactions of three sets of privileged motif-based ligands with their receptors, including 2-aryl-indole based ligands in the serotonin 5-HT6 and melanocortin-4 (MC4) receptors, spiro-piperidine-indane based ligands in the growth hormone secretagogue (GHS) and MC4 receptors, and 2-tetrazole-biphenyl based ligands in the AT1 and GHS receptors, showed the correlation of conserved patterns of residues in the ligand-binding pockets
Fig. 15.4-4 Three ligand binding sites model for monoamine-related GPCRs illustrated by a rhodopsin-based 3D model of the 5-HT1A receptor (left: extracellular view; right: side view). We recently proposed a three binding site hypothesis for the molecular recognition of ligands at monoamine GPCRs by combining: (a) analyses of the architectures of known monoamine GPCR ligands (see Fig. 15.4-9), (b) analyses of molecular models of the ligand-receptor interactions, and (c) structural bioinformatics analyses of the sequence similarities of the three distinct binding regions of "one-site filling" ligand fragments within the monoamine GPCR family. For the 5-HT1A receptor, which provided a template for the discussion of other related ligand-GPCR interactions, mutagenesis studies map three spatially distinct binding regions, which correspond to the binding sites of the "small, one-site filling" ligands 5-HT (serotonin, yellow), propranolol (cyan), and 8-hydroxy-N,N-dipropylaminotetralin (8-OH-DPAT, green), respectively. All three binding sites are located within the highly conserved 7TM domain of the receptor and overlap at the residue Asp3.32 (D116) in TM3, which constitutes the key anchor site for basic monoamine ligands. The three distinct binding sites are also reflected by the architectures of known high-affinity ligands, which cross-link two or three "one-site filling" fragments around a basic amino group. For further detail see references [51, 53]. Throughout this chapter the residue positions are number coded according to van Rhee and Jacobson [32]: The first digit gives the transmembrane domain and the following number indicates the position of the residue relative to position 50, which is arbitrarily attributed to the most conserved residue in each helix.
of the receptors with the recognition of specific privileged fragments. These findings imply that any one particular privileged structure can target a specific subset of receptors and that motif-based searches can be used for subsetting the receptor repertoire including the orphan receptors. The models also showed that only parts of the privileged structures are accommodated within the conserved subpocket; some contacts are between substructure elements of the
full privileged motif and the nonconserved part of the pocket, which suggests
the possibility for design of selective ligands based on privileged motifs. A broad spectrum of homology modeling techniques, ranging from strict template-based methods to de novo prediction methods (e.g., the PREDICT method [57]), is used to build GPCR models. Although some reports suggest that rhodopsin template-based approaches can be adapted to the entire GPCR repertoire [58], the underlying sequence alignments of such models, which for some helices in some subfamilies are not obvious, must be carefully investigated [9]. While most of the time these models neglect the long intracellular loops and N- and C-terminal domains, some studies emphasized the role of the second extracellular loop E2 in ligand specificity. In the bovine rhodopsin structure, the E2 loop, which is bridged via a conserved disulfide link to the residue Cys3.25 at the top of TM3, covers parts of the central binding crevice in a lidlike manner. One of the two β-strands that define the fold of the loop directly contacts the retinal ligand. As the length of the loop varies significantly within the class A family, general conclusions are difficult. Recently, it was proposed on the basis of random saturation mutagenesis experiments of the C5aR that the E2 loop acts as a negative regulator of receptor activation and stabilizes the nonsignaling receptor conformation in the absence of the agonist ligand [59]. Also, the E2 loop has been implicated in ligand-ligand allosteric interactions, which were experimentally investigated by the SCAM approach [60]. For instance, in the interaction of the muscarinic M1 receptor with the allosteric modulator gallamine, an acidic sequence segment just before the loop cysteine residue could be linked to these effects. A potential role of the E2 loop in the allosteric effects observed for amiloride on the action of antagonists of the α1A and α2A adrenoceptors and dopamine receptors has also been reported.
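The residue numbering used throughout this chapter (for example Cys3.25 or Asp3.32) can be reproduced with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the reference position 134 for TM3 of the 5-HT1A receptor is not quoted from the text but back-calculated from the statement that D116 corresponds to Asp3.32 (116 + 18 = 134).

```python
def generic_number(helix, position, reference_position):
    """Return a helix.index label for a residue.

    helix: transmembrane helix number (1-7)
    position: sequence position of the residue of interest
    reference_position: sequence position of the most conserved residue
        in that helix, which is assigned index 50 by convention
    """
    return f"{helix}.{50 + position - reference_position}"


def sequence_position(label, reference_position):
    """Invert generic_number: map a label such as '3.32' to a sequence position."""
    _helix_text, index_text = label.split(".")
    return reference_position + int(index_text) - 50


# Assumed reference: position 134 in TM3, inferred from D116 = Asp3.32
print(generic_number(3, 116, 134))
print(sequence_position("3.32", 134))
```

The same function applies to any helix once the position of its most conserved residue is known from a sequence alignment.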
Recently, the potential value of GPCR models for in silico screening applications has become of interest. Using a 3D model of the NK1 receptor generated by the modeling binding sites including ligand information explicitly (MOBILE) approach, in combination with 2D and 3D database searches, novel submicromolar NK1 antagonists were discovered [61]. As shown in another study [62], models of the dopamine D3, muscarinic M1, and vasopressin V1a receptors based on the rhodopsin template seemed to be of sufficient accuracy to be useful (20- to 40-fold enrichment compared to random screening) in protein-based virtual screening experiments. This procedure used standard docking software like DOCK, FlexX, or GOLD and searched for GPCR antagonists starting from antagonist-bound models shaped by minimizing a manually docked antagonist in the binding site. The same procedure was, however, not applicable when a single agonist ligand was used for the binding site shaping step, indicating that the structural changes that can be achieved by minimization to expand the binding site are not sufficient for simulating the conformational changes occurring in receptor activation. Instead, a multiagonist ligand pharmacophore-based receptor refinement method needed to be used to generate useful models for agonist virtual screening. Corroborative findings were described for models generated with
the PREDICT method and using the DOCK software in prospective virtual screening for the D2, 5-HT1A, 5-HT4, NK1, and CCR3 GPCRs [63]. Given the differences in the length of the intra- and extracellular loops (the latter are expected to contribute to ligand entry, binding, and/or modulation, especially for the peptide and protein binding GPCRs), and given that the currently available inactive-state rhodopsin structures can, at best, be a reference for an antagonist state of related class A GPCRs, many significant unknowns remain in the understanding of the structure-function relationship of GPCRs. In this respect, the modeling and indirect structural experiments of GPCRs also revealed the functional role of structural microdomains as opposed to simply considering individual residues. An important microdomain is the so-called DRY domain, which refers to a conserved sequence patch at the cytoplasmic end of TM3 in class A GPCRs and which also involves residues in TM2, TM6, and TM7 [64]. The overall picture common to many class A GPCRs is that residue Arg3.50 is hydrogen bonded to a carboxylate side chain at position Asp3.49 and to one or two residues in TM6 equivalent to residues Glu6.30 and Thr6.34 in rhodopsin. Removal of these interactions often results in constitutive activation of the receptor. Based on this and on the analysis of structural intermediates of the photocycle of rhodopsin, the emerging theory for receptor activation suggests a mechanism involving a separation of the TM3 and TM6 domains together with a twist in TM6, which pulls the third intracellular loop (I3) into the cell, uncovering residues related to G-protein coupling. Since the DRY microdomain is not conserved in other GPCR families (exceptions are some class C GPCRs), it may be concluded that the conformational changes and signaling mechanisms are not strictly conserved.
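The 20- to 40-fold enrichment over random screening quoted above for protein-based virtual screening can be made concrete with a small calculation. The chapter does not spell out the formula used in the cited study, so the sketch below assumes the common definition of an enrichment factor: the hit rate in the selected subset divided by the hit rate of the whole library. The function name and all numbers are invented for illustration.

```python
def enrichment_factor(actives_selected, n_selected, actives_total, n_total):
    """Enrichment of a ranked selection over random picking.

    EF = (actives_selected / n_selected) / (actives_total / n_total),
    rearranged into a single division to limit rounding error.
    """
    if n_selected <= 0 or actives_total <= 0 or n_total <= 0:
        raise ValueError("counts must be positive")
    return (actives_selected * n_total) / (n_selected * actives_total)


# Hypothetical screen: 10 000 compounds containing 50 known antagonists;
# the 100 best-scoring docked compounds contain 15 of them.
ef = enrichment_factor(actives_selected=15, n_selected=100,
                       actives_total=50, n_total=10_000)
print(ef)  # 30.0, i.e. within the 20- to 40-fold range mentioned above
```

An enrichment factor of 1 corresponds to random selection, so values well above 1 indicate that the receptor model carries useful information for ranking.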
Importantly, as the active conformations generated through constitutively activating mutations and specific agonist ligands seem to be nonidentical, the concept of protean ligands was defined by Kenakin to explain that each specific ligand-receptor pair defines a functional entity with distinct signaling and functional properties [65]. Obviously, this concept raises questions about the generality of the above-mentioned virtual screening studies for GPCR agonists. Regarding class B and class C GPCRs, significantly fewer modeling studies are reported. For class B GPCRs, a general two-site model has emerged for peptide binding [7]. In this mechanism, the C-terminal ligand region binds the extracellular N-terminal domain of the receptor. This interaction acts as an affinity trap, promoting the interaction of the N-terminal region of the ligand with the juxtamembrane 7TM domain of the receptor. Molecular models were, for instance, generated for the interaction of peptide agonists with the CRF2 and PTH receptors, putting emphasis on α-helix recognition sites [66, 67]. Nonpeptide ligands bind the juxtamembrane or the N-terminal domain and, in most cases, allosterically modulate peptide-ligand binding [7]. Also noteworthy is the modeling work around the allosteric binding sites of the class C Ca2+-sensing receptor (CaR) [68] and the mGluR1 and mGluR5 receptors [69], where site-directed mutagenesis and rhodopsin-based homology modeling showed a
novel antagonist binding site within the 7TM bundles, clearly separated from
the agonist binding site located in the N-terminal domains of these receptors. Oligomerization of GPCRs appears to further contribute to the complexity of the picture [70, 71], and recently a structural hypothesis was provided using molecular modeling to describe how the G-protein transducin docks onto dimeric and tetrameric oligomeric states of rhodopsin, revealing structural details of this critical interface in the signal transduction process [72]. Biophysical studies of the leukotriene B4 BLT1 receptor reconstituted with a heterotrimeric G-protein, using a combination of mass spectrometry after chemical crosslinking and neutron scattering in solution, support this hypothesis by providing evidence for the overall assembly of a pentameric complex formed by two BLT1 units and one trimeric G-protein [73]. Ultimately, understanding the conformational dynamics of GPCRs fully will require high-resolution structures of multiple receptors bound to multiple ligands, including agonists, inverse agonists, and antagonists, coupled to G-proteins and other modulators. The development of systematic approaches for X-ray and nuclear magnetic resonance (NMR) analysis of GPCR structures is hence currently a major scientific challenge, which requires further progress in the expression, purification, and crystallization of GPCRs and their interacting proteins [74].
15.4.3.4 Designing Compound Libraries Targeting GPCRs
In recent years, the design of GPCR-directed compound libraries has become an intense activity of drug discovery chemistry [75-77]. Generally, the design of deorphanization libraries can be distinguished from that of targeted lead-finding libraries. Given the broad chemical diversity of the hormones that are recognized by GPCRs, deorphanization libraries try to cover as many known active chemical classes as possible. The term surrogate agonist library is also appropriate given that the purpose of these libraries is to find a chemical compound that selectively activates a given orphan receptor of interest [26, 27]. Typically, compounds identical or similar to previously identified GPCR agonists are included along with approved drugs and other reference compounds with known bioactivity, like primary metabolites (e.g., KEGG compound set), or commercially available compilations, like the Tocris LOPAC, the Prestwick, or the Sial Biomol sets. In addition to high-performance liquid chromatography (HPLC) fractionations of tissue extracts to identify new peptides and metabolites, protein mimetic libraries including β-turn/α-helix mimetics are of interest, together with random or designed peptide libraries based on the bioinformatics analysis of putatively secreted peptides and protein hormones defined in the genome. Typically, the size of deorphanization or surrogate sets is on the order of a few thousand well-characterized compounds amenable to medium-throughput screening. The design of lead-finding libraries follows the same molecular mimicry principles and makes the best use of the substantial medicinal chemistry
knowledge generated during the last decades around GPCR compounds together with modern concepts, including lead/drug likeness and computational combinatorial library design [78, 79]. Although focused library design concepts target, in general, the classical binding sites, design concepts of bivalent ligands and allosteric ligands are expected to become more important in the future, given the anticipated progress in the understanding of the GPCR oligomerization phenomenon [80]. Divalent ligands selectively targeting δ-κ opioid receptor heterodimers provide a recent example [81]. The general experience with focused libraries and screening sets for GPCRs is very positive, and hit rates of 1-10% can be expected with library sizes of 500-2500 compounds when the libraries are designed toward new members with expected conserved molecular recognition. Peptide and protein mimetic libraries including β-turn/α-helix mimetics are recognized to be of central importance [82, 83]. A number of important hormones, like angiotensin, bradykinin, cholecystokinin (CCK), melanine stimulating factor (MSF), and somatostatin (SST), make their key recognition via specific β-turn motifs. Others, like corticotrophin releasing factor (CRF), PTH/PTHrP, neuropeptide Y (NPY), vasoactive intestinal peptide (VIP), or growth hormone releasing factor (GHRF), interact via α-helix motifs [7, 84]. While the design of organic druglike α-helix mimetics is still in its infancy, the design of orally active β-turn mimetics based on organic druglike scaffolds, or based on cyclic α-peptides or β/γ-peptides, has advanced to a quite routine methodology.
The work of Garland and Dean [85, 86], defining a set of triangular distance constraints that the substitution points of a scaffold have to satisfy to mimic the specific Cα atoms of the peptide template, provided a generalized framework for the design of novel β-turn mimetic scaffolds and, in combination with database searches, was successfully applied to the design of CCK and SST antagonists [84]. The use of privileged substructures or molecular master keys, whether target class specific or mimicking protein secondary structure elements, is an accepted concept in medicinal chemistry. The privileged structure approach emphasizes molecular scaffolds or selected substructures that are able to provide high-affinity ligands (agonists or antagonists) for diverse receptors and originates from work at Merck Research Laboratories on the design of benzodiazepine-based CCK antagonists, where the previously known κ-opioid tifluadom was identified as a lead structure [87]. A number of recent literature reviews provide impressive reference repertoires of empirically derived privileged structures, most notably the spiropiperidines, biphenyltetrazoles, benzimidazoles, and benzofurans [88-90]. The 2-aryl-indole scaffold illustrated in Fig. 15.4-5 represents a particularly successful example and was shown at Merck to generate actives for diverse class A GPCRs [91]. In view of the above-mentioned modeling of the ligand-receptor interactions, the privileged structural classes will need to be analyzed further to allow a more directed use of such libraries for specific receptor subsets. The development of cheminformatic methods and procedures enabling the automatic identification and extraction of privileged structures is especially
Fig. 15.4-5 Examples of GPCR active compounds based on the 2-aryl-indole privileged scaffold identified from a focused combinatorial library at Merck. Screening of the library against several GPCRs led to the discovery of NPY5, NK1, chemokine CCR3/CCR5, serotonin 5-HT2A/5-HT6, and SST receptor antagonists [91]. Activities shown with the structures: NPY5 (IC50 = 0.8 nM); NK1 (IC50 = 0.8 nM); CCR3 (IC50 = 1190 nM); CCR5 (IC50 = 920 nM); 5-HT2A (IC50 = 10 nM); 5-HT6 (IC50 = 0.7 nM); SST (Ki = 0.7 nM).
needed in the context of generating knowledge from HTS data [92]. On the basis of the molecular framework approach developed by Bemis and Murcko [93], we recently initiated a systematic analysis using reference compound and target information. Using the framework analysis as implemented in the SciTegic Pipeline Pilot software, we designed a data pipelining protocol that generates a frequency analysis based on the input of the various reference sets. The approach is illustrated in Fig. 15.4-6 for the monoamine GPCRs. A different type of fragment-based design method called thematic analysis was developed by researchers at Biofocus for the design of focused class A GPCR libraries [77]. This knowledge-based method is comparable to a method developed at Novartis, which is illustrated in Section 4.4.3 [53]. SARs were analyzed in detail across the whole class A GPCR family, and family-activity relationships were used to develop a new classification process on the basis of the pairing of sequence themes and ligand structural motifs. A sequence theme is a consensus collection of amino acids within the central binding cavity, and a motif is a specific structural element binding to such a particular microenvironment of the binding site. The analysis resulted in a compilation of themes and motifs that, to date, are used at Biofocus to generate focused discovery libraries and to increase the lead optimization efficiency for these targets. The individual compound libraries are targeting
Fig. 15.4-6 Analysis of the privileged scaffold-target matrix of monoamine GPCR ligands. For each GPCR ligand assigned in the MDL Drug Data Report (MDDR) database to a specific monoamine GPCR subtype, the Bemis-Murcko frameworks were generated. The lists of frameworks were then combined and duplicates were eliminated. The comprehensive list of unique frameworks defines the rows of the matrix, and the GPCR subtypes define the columns. Each matrix element is the number of compounds reported that include a given framework for a given subtype. In addition, for each framework the total number of monoamine GPCR subtypes addressed was added in the frequency column; the rows were then sorted by decreasing frequency. The structures of the seven most represented frameworks (PS1 to PS7), together with the addressed monoamine GPCR subtypes, are shown. [Structures not reproduced.]
subsets of GPCRs, including orphans, which share a predefined combination of themes consisting of a central dominant theme and peripheral ancillary themes. The library scaffold is designed such that it complements the central theme and can incorporate a variety of structural motifs addressing the individual sequence themes. Each library, consisting of approximately 1000 compounds, can thus be thought of as representing a number of predefined themes, which are either present or absent in any given receptor; such fingerprinting allows the computation of a library appropriateness score for each receptor. Thematic analysis is also used to aid lead optimization by analyzing which themes may or may not be involved in the binding of a particular hit molecule and by exploiting new combinations of used and unused themes to increase affinity and selectivity. In contrast to the fragment-based approaches, several groups have developed knowledge-based library design strategies which are, in principle, based on Sir James Black's frequently quoted statement: “the most fruitful basis for the discovery of a new drug is to start with an old drug”. The associated selective optimization of side activities (SOSA) approach is an additional, very successful medicinal chemistry concept, for which the atypical neuroleptics acting on several GPCRs simultaneously provide a relevant illustration of the rationale [94]. The related computer-assisted drug design (CADD) methods make use of selected reference compound sets and molecular descriptors together with advanced cheminformatic methods to compare and rank the similarity of designed candidate molecules [95, 96]. Homology-based similarity searching was developed at Novartis as a cheminformatics similarity searching method able to identify not only ligands binding to the same target as the reference ligand(s) but also potential ligands of other homologous targets for which no ligands are yet known [97].
The method is based on the Similog descriptor, which describes molecules as counts of pharmacophore triplets formed by the individual nonhydrogen atoms, and uses a centroid of the reference compounds to describe the distance to the candidate molecule. In a retrospective analysis, the method was shown to be highly effective for monoamine GPCRs and became an essential tool for the compilation of focused screening sets. Related to the cheminformatics similarity searching methods are machine learning methods, like artificial neural networks, Kohonen self-organizing maps, and support vector machines (SVMs), which try to align the chemical and biological spaces on the basis of mapping procedures [98]. The goal here is to identify which parts (islands) of the chemical-property space correspond to specific target families or therapeutic activities, and vice versa. A number of groups have applied such methods for the design of broad GPCR-focused libraries and, more recently, to distinguish specifically between ligands of the family subgroups class A, B, and C GPCRs, or to identify specific GPCR ligands for the adenosine A2A, cannabinoid, CRF, and endothelin GPCRs [99]. De novo design methods have also been reported in which virtually generated molecules are evaluated and proposed for synthesis on the basis of ligand-based pharmacophore models and abstract feature tree representations of GPCR ligands [100, 101].
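The framework-by-subtype frequency analysis described for Fig. 15.4-6 can be sketched in a few lines of Python. This is only a minimal illustration and not the Pipeline Pilot protocol itself: the record format, the function name `scaffold_target_matrix`, and the toy framework labels and subtype assignments are assumptions made for the example. A real analysis would derive the Bemis-Murcko frameworks from the compound structures with a cheminformatics toolkit.

```python
from collections import defaultdict


def scaffold_target_matrix(records):
    """Build a framework-by-subtype count matrix.

    records: iterable of (compound_id, framework, subtype) tuples
             (hypothetical input format chosen for this sketch).
    Returns a list of (framework, n_subtypes, {subtype: n_compounds}),
    sorted by decreasing number of subtypes addressed.
    """
    compounds = defaultdict(lambda: defaultdict(set))
    for compound_id, framework, subtype in records:
        # A set removes duplicate compound entries per matrix element.
        compounds[framework][subtype].add(compound_id)

    rows = []
    for framework, by_subtype in compounds.items():
        counts = {subtype: len(ids) for subtype, ids in by_subtype.items()}
        rows.append((framework, len(counts), counts))

    # Sort by decreasing frequency; ties are broken alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Toy data: framework names and subtype assignments are invented.
records = [
    ("c1", "arylpiperazine", "5-HT1A"),
    ("c2", "arylpiperazine", "D2"),
    ("c3", "arylpiperazine", "alpha1"),
    ("c4", "arylpiperazine", "5-HT1A"),
    ("c5", "benzamide", "D2"),
    ("c6", "tryptamine", "5-HT1A"),
    ("c7", "tryptamine", "5-HT6"),
]
matrix = scaffold_target_matrix(records)
for framework, frequency, counts in matrix:
    print(framework, frequency, counts)
```

Frameworks that address many subtypes rise to the top of the list, which is the sense in which the text calls them privileged.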
15.4.3.5 The Contribution of Molecular Informatics to GPCR Chemical Biology
Given the fast-growing volume of molecular data and information related to GPCRs, the need for molecular information systems that integrate bioinformatic and cheminformatic systems was recently recognized [11, 102]. The cross-linking of the chemical and biological GPCR knowledge spaces via classification and annotation schemes is an essential element of chemogenomics knowledge-based ligand design strategies, which are based on the principle that similar ligands bind to similar targets. The systems allow the compilation of relevant reference sets for cheminformatics-based similarity searches and for the library design of target class focused collections; vice versa, the ligand similarity principle can be used to infer putative molecular targets of compounds of interest. Most of the systematically generated information on GPCRs is today publicly accessible via the Internet, and a selection of relevant information sites is summarized in Tab. 15.4-3. In addition, a growing number of chemogenomics knowledge-based companies, like Aureus Pharma, Inpharmatica, GVKBio, Evolvus, and Jubilant Biosys, are developing molecular information systems that integrate, in a comprehensive manner, GPCR data from patents and selected literature together with chemical and biological search engines. Molecular information systems like the Cerep Bioprint Matrix or Iconix DrugMatrix, which summarize the analysis of validated IC50 profiling data of drugs and development compounds on a panel of GPCR and other targets together with absorption, distribution, metabolism, excretion, and toxicity (ADMET) data, are becoming important for lead prioritization and the design of safety pharmacology studies. Currently, such data are used up front, ahead of clinical investigations, to identify potential side effects using both in vitro and in silico testing [103-105].
Given the fast-growing complexity of the knowledge around GPCRs and their interacting effector and regulator proteins, which opens many new potential mechanisms for interaction with small drug molecules, the design of the data models of the molecular information systems will need to evolve further to enable integration and mining of knowledge within a broader systems biology and chemical genetics network concept space. Bioinformatics analyses provide an essential contribution to GPCR chemical biology. The investigation of sequence similarities through phylogenetic, diagnostic fingerprint, or Hidden Markov Model (HMM) analyses is a commonly used strategy to classify new orphan members and to facilitate the identification of the endogenous ligands [106, 107]. Phylogenetic analyses predicted, for instance, that sphingosine-1-phosphate (S1P), the endogenous ligand of the EDG1 GPCR, is also the ligand of the EDG3, EDG5, EDG6, and EDG8 GPCRs. Also, the ligand and the pharmacology of the human histamine H4 GPCR were predicted through phylogeny, even though it shares only 26% identity with the histamine H1 receptor. Conversely, examples are known where sequence homology can be misleading; for example, a receptor originally known as P2Y7 (BLT1) was thought to be a nucleotide receptor based
on its similarity to P2Y purinoceptors, but it was shown to be activated by an
unrelated ligand, leukotriene B4. A different type of bioinformatics analysis focuses on specific sequence motifs and signatures, which may lead to different conclusions than the analysis of the overall sequence identity. For instance, two orphan receptors, GPR61 and GPR62, reported to have an overall sequence identity of 30% to the human 5-HT6 receptor, were classified as monoamine-like receptors. Strikingly, both of them show mutations of the D3.32 residue and should therefore belong to a different subfamily. Understanding the principles of molecular recognition, in combination with residue and motif-based 1D and 3D bioinformatics data mining, is becoming an essential element for successful chemogenomics knowledge-based ligand design strategies. Noteworthy in this perspective is the recent work done at Pfizer and Biofocus, where, based on the analysis of sequence data, mutation data, and physicochemical properties of the ligands, approaches were outlined to discover sequence patterns characteristic of specific ligand classes [77, 108]. The potential of such computational methods was recently illustrated for the identification of ligands of the prostaglandin D2 receptor chemoattractant receptor-homologous molecule expressed on T helper type 2 (CRTH2). Using a computational strategy which emphasizes the classification of GPCRs with respect to physicochemical features of selected amino acid residues of the central binding cavity, researchers at 7TM Pharma showed that angiotensin AT1 and AT2 reference compounds can be used to identify high-affinity ligands for the CRTH2 receptor; notably, in the ordinary phylogenetic analysis, the AT1 and AT2 receptors are not identified as close neighbors of CRTH2 according to the conventional evolutionary relationship models [109]. Other signature motifs direct the signaling interactions of the receptors with effector and regulator proteins. The identification of a conserved motif within the intracellular loops I2 and I3 of the somatostatin receptor subtypes (SST1, SST3, and SST4), the dopamine D2 receptor, and the α2B-adrenoceptors, which confers inhibitory coupling to the NHE1 ion exchanger, is given as a recent example [110].
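The ligand similarity principle invoked in this section, namely that putative targets of a compound can be inferred from the annotated targets of its nearest reference ligands, can be illustrated with a short sketch. The example below is a deliberately simplified stand-in: it uses a Tanimoto coefficient on hand-written feature sets rather than the Similog descriptor or any commercial information system, and the feature labels, the threshold of 0.5, and the function names are assumptions made for illustration only.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def infer_targets(query, references, threshold=0.5):
    """Rank annotated targets by the best similarity of any reference ligand.

    references: list of (feature_set, target_name) pairs (hypothetical format).
    Returns [(target, best_similarity), ...] sorted by decreasing similarity.
    """
    best = {}
    for features, target in references:
        score = tanimoto(query, features)
        if score >= threshold and score > best.get(target, 0.0):
            best[target] = score
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Toy reference set: feature labels and target annotations are invented.
references = [
    ({"aryl", "piperazine", "basic_amine", "amide"}, "5-HT1A"),
    ({"aryl", "piperazine", "basic_amine", "halogen"}, "D2"),
    ({"carboxylate", "biphenyl", "tetrazole"}, "AT1"),
]
query = {"aryl", "piperazine", "basic_amine", "amide", "ether"}
hits = infer_targets(query, references)
print(hits)
```

In practice the feature sets would be replaced by real molecular fingerprints, and the threshold would have to be calibrated retrospectively on compounds with known activities.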
15.4.4 Applications and Practical Examples
15.4.4.1 Biological Expression of GPCRs
The analysis of the tissue distribution of the receptors provides valuable information related to the potential physiological function and therapeutic indication of a given GPCR, and is an essential part of the pharmacological target validation in the drug discovery process. Validation is based on the evidence that the target gene is expressed in cells relevant to the pathophysiologic mechanism of the disease indication. This information is combined with the epidemiological evidence that target gene expression
is associated with the appearance/progression of the disease indication. Furthermore, evidence that a target gene activity is necessary for a defined phenotypic response relevant to the disease indication is tested by the inhibition of its expression or function, or by overexpression. For instance, the undecapeptide substance P is a neurotransmitter that mediates diverse biological responses in the nervous and immune systems, mainly through the NK1 GPCR. The specific response to the hormone depends on the location of the NK1 receptor, and pain, neurogenic inflammation, asthma, and emesis are currently discussed as potential therapeutic indications for NK1 antagonists. Knowledge of the tissue distribution is thus essential to predict potential main and side activities. To this end, specifically designed functional genomic experiments using oligonucleotide GPCR chips or reverse transcriptase-polymerase chain reaction (RT-PCR) technologies in combination with immunochemistry approaches allow the identification of gene expression profiles across a wide variety of healthy versus diseased human and animal tissues [36, 43]. The GPCR expression matrix generated by Vassilatis et al. [36] and represented in Fig. 15.4-7 shows the expression of 100 randomly selected endo-GPCRs in peripheral and neural mouse tissues, demonstrating that most GPCRs are expressed in multiple tissues and that individual tissues express multiple receptors. Strikingly, over 90% of the analyzed GPCRs are expressed in the brain. The profiles of most GPCRs are unique, yielding thousands of tissue-specific receptor combinations for the potential modulation of physiological processes and the design of therapeutics. The observation that each tissue appears to have a unique combination of GPCRs indicates that second messenger pathways are used in different contexts to allow differentiation of cellular responses to hormone action.
Expression profiling also contributes to the understanding of the functional significance of receptor subtypes, which, in different tissues, couple the same hormone to different G-proteins and effector systems and which might also show differences in their constitutive activity or in regulatory aspects such as the desensitization kinetics.
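The kind of summary drawn from such an expression matrix (genes expressed only in the periphery, only in the brain, or in both, and the fraction of genes detected in the brain) can be sketched as follows. The gene names, tissue names, and expression levels are invented for the example; the 0 to 3 scale is only an assumed encoding of the four expression levels (none, low, moderate, strong) described for Fig. 15.4-7, and the simple rule below is not the cluster analysis used in the cited study.

```python
def classify_expression(profile, brain_tissues):
    """Assign a gene to a coarse expression class.

    profile: {tissue: level}, where level 0 means no expression
             (assumed encoding: 0 none, 1 low, 2 moderate, 3 strong).
    brain_tissues: set of tissue names counted as brain regions.
    """
    expressed = {tissue for tissue, level in profile.items() if level > 0}
    in_brain = bool(expressed & brain_tissues)
    in_periphery = bool(expressed - brain_tissues)
    if in_brain and in_periphery:
        return "brain and periphery"
    if in_brain:
        return "brain only"
    if in_periphery:
        return "periphery only"
    return "not detected"


# Toy matrix: gene and tissue names are hypothetical.
brain_tissues = {"cortex", "hippocampus"}
profiles = {
    "gpcr_A": {"cortex": 3, "hippocampus": 2, "heart": 0, "liver": 0},
    "gpcr_B": {"cortex": 0, "hippocampus": 0, "heart": 2, "liver": 1},
    "gpcr_C": {"cortex": 1, "hippocampus": 0, "heart": 1, "liver": 3},
}
classes = {
    gene: classify_expression(profile, brain_tissues)
    for gene, profile in profiles.items()
}
# Fraction of genes with any brain expression.
brain_fraction = sum("brain" in label for label in classes.values()) / len(classes)
print(classes, round(brain_fraction, 2))
```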
15.4.4.2 Advances in HTS of GPCRs
Since the birth of modern HTS in the mid-1980s, drug discovery has experienced an explosion in novel assay methodologies and technologies. While around 10 000 compounds were tested every year in a few assays in the mid-1980s, these numbers rapidly increased in the major pharmaceutical companies during the past 20 years to reach 1-2 million compounds tested within 50-100 assays. The major challenge is to develop and implement simple assay methods to expedite HTS while maintaining high quality and generating relevant information at low cost. GPCRs are targets to which these criteria apply well, since their mode of activation by ligands offers many opportunities for assay design and miniaturization. As illustrated in Fig. 15.4-3, the signaling cascade subsequent to GPCR activation opens, in addition to basic
Fig. 15.4-7 Cluster analysis of the expression of 100 randomly selected mouse endo-GPCR genes in 17 peripheral tissues and 9 different brain regions. The genes were analyzed individually by RT-PCR as shown, and the intensity of the observed bands was determined by scanning. Each gene is represented by a single row of colored boxes with four different expression levels: no expression, blue; low expression, purple; moderate expression, dark red; strong expression, pure red. Three groups of endo-GPCRs with broadly related profiles were observed. In the first group (a), genes were expressed primarily in peripheral tissues. Seven of these genes were expressed exclusively in peripheral tissues and not in the brain. The second group (b) contained genes expressed primarily in brain. Of these 41 genes, 14 were solely expressed in brain and not in peripheral tissues. In the third group (c), the genes were broadly expressed in the brain and throughout the periphery. Figure reproduced with permission from [36].
ligand-binding assays, versatile opportunities to develop HTS assays based on G-protein activation, determination of second messengers, or nuclear activation. Currently, a variety of biophysical readout techniques and assay formats are routinely used; their advantages and limitations are summarized in Table 15.4-4. Each assay is selected on the basis of a set of criteria including, among others, infrastructure, instrumentation, throughput requirement, and the type of information requested. For cell-based GPCR assays the question comes down to measuring affinity or efficacy [114]; both are fundamental and distinct characteristics of the compound-receptor pair pharmacology [115]. Functional cellular assays are especially superior in information content compared to ligand-binding assays when seeking allosteric modulators acting at receptor sites other than the binding site of the endogenous agonist, or when multiple measurements are required in the same well to provide additional activity and selectivity information. For instance, Sabroe et al. showed in a single HTS run that dual CCR1 and CCR3 blockers are able to abrogate chemokine-induced cell chemotaxis and other functional parameters such as eosinophil shape changes and calcium mobilization [116]. FLIPR duplex calcium mobilization assays were developed at Novartis to identify blockers of the chemokine CXCR4 receptor. Screening compounds are tested in the same well against the CXCR4 receptor and subsequently against the muscarinic MS receptor expressed in the same CEM-T cells. This duplex readout provides a hint on compound selectivity in a cost-effective fashion already in primary screening. The approach is made possible by the noninvasive nature of FLIPR calcium assays and enables the prioritization of compounds acting at the receptor level and the exclusion of compounds interfering with cellular components common to the two GPCRs.
Furthermore, GPCR triplex assays are routinely used at Novartis and rely on three successive readouts obtained from the same well. As shown in Fig. 15.4-8, the triplex GABAB heterodimeric receptor calcium assay enables the detection of agonist, modulator, or antagonist properties of screening compounds in a single run. The presence of an agonist is revealed not only by its own activity (Fig. 15.4-8(a)) but also through the receptor desensitization
Table 15.4-4 Commonly used assays for GPCR HTS

Molecular principle | Assay type | Plate format | Coupling, comments, limitations
Ligand binding | Radioligand filtration assay | 96 | Safety and costs
Ligand binding | SPA radioligand binding | 384 | Costs
Ligand binding | FP | 384, 1536 | Ligand labeling
G-protein activation | GTPγ35S filtration assay | 96 | Safety and costs
G-protein activation | GTPγ35S SPA | 384 | Costs
Second messenger | cAMP determination based on fluorescence approaches: FP, FI, HTRF/LANCE | 1536 | FP: sensitivity
Second messenger | IP3 determination based on binding and chromatographic approaches | 96, 384 | Low throughput, mainly Gq
Second messenger | Ca2+ determination using specific fluorescence reader technology and indicator dyes (FLIPR/Fluo-4, FDSS6000/Fluo-4, Fura-2) or proteins (Aequorin) | 384 | Mainly with Gq, and Gs with CNG2 channels
Nuclear activation | Reporter gene assays activated via CRE and SRE response elements, with SEAP, luciferase, and β-lactamase readouts | 384, 1536 | May lead to signal variation based on cell quality. Long incubation.

HTRF/LANCE - homogeneous time-resolved fluorescence/lanthanide cryptate excitation; SPA - scintillation proximity assay, a homogeneous assay which detects radioisotopes in close proximity to a solid scintillant; FP - fluorescence polarization; FI - fluorescence intensity; CNG2 - cAMP gated ion channel 2; CRE - cAMP response element; SRE - serum response element; SEAP - secreted alkaline phosphatase. For further details see Refs. [111-113].
during the antagonist assay phase. Modulators are detected by using a small agonist concentration in the second phase (Fig. 15.4-8(b)) and may be devoid of agonist properties. Antagonists clearly appear in the third phase, following an injection of a higher GABA concentration, and are characterized by a lack of intrinsic activity in the first phase (Fig. 15.4-8(c)). Multiplex assays do not achieve the compound throughputs possible with single measurement assays; however, they produce much richer information already in primary screening, which is invaluable for compound categorization and prioritization by the medicinal chemists. A further advantage of such assays
Fig. 15.4-8 FLIPR calcium traces from a GABAB receptor triplex assay in a 384-well format. Experiments are performed with Chinese hamster ovary K1 (CHOK1) cells stably expressing the GABABR1 and GABABR2 receptor subunits. The cells are loaded with the calcium-sensitive dye Fluo-4. Three successive injections are performed during the course of the experiment. The first injection is with screening compound, in general at 10 µM, followed by two injections of GABA at concentrations corresponding to its EC20 (0.15 µM) and EC80 (10 µM), respectively. Different FLIPR traces are obtained depending on the nature of the screening compound. (a) The agonist GABA; (b) L-Baclofen, a GABAB receptor modulator; (c) CGP56999, a competitive GABAB antagonist [117]. The signals shown are expressed as nonnormalized fluorescence changes.
is the possibility to exploit fluorescent kinetic traces to exclude compounds that interfere nonspecifically with the readout or that affect the cells. The lower compound throughput per unit time can be largely compensated by careful assay design and by using assay automation to ensure overnight operation.
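The decision logic of the triplex readout described above (compound injection, then GABA at its EC20, then GABA at its EC80) can be written down as a small classifier. This is a sketch under stated assumptions rather than the procedure used in the actual screen: the function name, the three cutoff parameters, and the numerical responses are hypothetical, and receptor desensitization by agonists is not modeled.

```python
def classify_triplex(phase1, phase2, phase3, ec20_ref, ec80_ref,
                     agonist_cutoff=0.2, modulator_factor=1.5,
                     antagonist_factor=0.5):
    """Classify a screening compound from three successive calcium responses.

    phase1: response after compound injection
    phase2: response after the low (EC20) agonist injection
    phase3: response after the high (EC80) agonist injection
    ec20_ref, ec80_ref: agonist responses in vehicle-only control wells
    The three cutoffs are illustrative assumptions, not published values.
    """
    if phase1 >= agonist_cutoff * ec80_ref:
        return "agonist"        # intrinsic activity in the first phase
    if phase2 >= modulator_factor * ec20_ref:
        return "modulator"      # enhances the small agonist response
    if phase3 <= antagonist_factor * ec80_ref:
        return "antagonist"     # blunts the large agonist response
    return "inactive"


# Hypothetical peak fluorescence changes (arbitrary units).
EC20_REF, EC80_REF = 200.0, 1000.0
wells = {
    "well_1": (900.0, 50.0, 100.0),
    "well_2": (20.0, 600.0, 950.0),
    "well_3": (10.0, 50.0, 100.0),
    "well_4": (15.0, 210.0, 980.0),
}
calls = {
    name: classify_triplex(p1, p2, p3, EC20_REF, EC80_REF)
    for name, (p1, p2, p3) in wells.items()
}
print(calls)
```

Checking the first phase before the other two mirrors the text: modulators and antagonists are defined by the absence of intrinsic activity, so an agonist call takes precedence.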
Although the information from cell-based, mainly heterologous, systems is very valuable, caution is necessary in the interpretation of its in vivo physiological relevance. For instance, in stably transfected CHO cells, Cevimeline (AF102B) behaves as a classical M1 antagonist when adenylate cyclase activity is measured, fully blocking the activation by carbachol. However, when IP3 activation is measured in the same cell line via PLCβ or PLA2, the compound behaves as a partial agonist. Even more striking, when intracellular Ca2+ mobilization is monitored by confocal microscopy, it behaves as a superagonist with a stronger response than carbachol [118]. Thus, advanced HTS data analysis plays an important role in decision support in drug discovery [119].
15.4.4.3 Designing a Focused Combinatorial Library for Monoamine-related GPCRs
On the basis of the central chemogenomics principle that similar ligands bind to similar targets, and given that ligands of closely homologous receptors are generally considered as putative starting points in lead-finding programs for receptors for which no specific ligands are yet known, we proposed a chemogenomics knowledge-based combinatorial library design strategy for lead finding [53]. The strategy is founded on the integration of both the deconvolution of known modular ligands of homologous receptors into their component fragments and the structural bioinformatics comparison of the binding sites for the individual ligand fragments. In essence, in the ligand space, by the analysis of both the ligand architectures and the structures of the component “one-site filling” fragments of known ligands, it should be possible (by referring to the locally most directly related and characterized receptors) to identify those component ligand fragments which, based on the binding site similarities, are potentially best suited for the design of ligands tailored to the new target receptor. The strategy was presented in the context of designing the tertiary amine (TAM) combinatorial library directed toward monoamine-related GPCRs, for which the conserved aspartate residue D3.32 in TM3 was demonstrated by two-dimensional mutation experiments to be responsible for the recognition of the charged amino group of monoamine ligands by their GPCRs (Fig. 15.4-9). Focusing on the central importance of the D3.32 residue and using the D3.32-X16-(DE)R(YFH) motif in TM3 as the sequence signature defining relatedness to the monoamine GPCR subfamily, we identified, by database searches, 50 human GPCRs, which included 7 orphan GPCRs (two of which are now known to correspond to pseudogenes) and constituted the originally intended target repertoire of the library.
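A signature search of this kind can be expressed as a regular expression over the TM3 sequence. The sketch below assumes that the signature reads as an aspartate (D3.32), followed by 16 arbitrary residues, followed by (D or E)-R-(Y, F, or H); the spacer length of 16 is an assumption inferred from the generic residue numbering (3.32 to 3.49) and should be checked against the original reference [53]. The two test sequences are invented and do not correspond to real receptors.

```python
import re

# Assumed reading of the TM3 signature: D, 16 arbitrary residues, then
# (D or E), R, (Y, F, or H). The spacer length is an assumption.
MONOAMINE_TM3 = re.compile(r"D.{16}[DE]R[YFH]")


def has_monoamine_signature(tm3_sequence):
    """Return True if the assumed TM3 signature occurs in the sequence."""
    return MONOAMINE_TM3.search(tm3_sequence) is not None


# Hypothetical TM3-like sequences built around a 16-residue spacer.
spacer = "VLCCTASILNLCAISL"
with_asp = "VL" + "D" + spacer + "DRY" + "WAI"
without_asp = "VL" + "A" + spacer + "DRY" + "WAI"

print(has_monoamine_signature(with_asp))
print(has_monoamine_signature(without_asp))
```

A production search would run such a pattern, or a more tolerant profile, against aligned TM3 segments from a receptor database rather than against raw strings.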
Later it was recognized that trace amine receptors, which conserve the D3.32 residue, and also chemokine receptors, which lack the D3.32 residue but in which a corresponding glutamate residue E7.39 in TM7 is responsible for the recognition of the TAM chemotype, have to be considered, on the basis of molecular recognition principles, as monoamine-like GPCRs,
Fig. 15.4-9 Prototype structures of the Novartis TAM combinatorial libraries generated through reductive amination of selected aldehydes and secondary amines. The new structures, for which examples are shown in (b), were designed to be similar to known monoamine GPCR ligands, for which examples are shown in (a). Ligands which are of the size of the endogenous ligands are herein called simple, one-site-filling ligands. In addition to this natural architecture, ligands exist where two or three such “simple” ligand fragments are linked around a basic positively charged group: these ligands are called, correspondingly, double and triple ligands. All three architectures - “simple”, “double”, and “triple” - of known monoamine GPCR ligands are represented in the TAM library. [Structures not reproduced. Panel (a), known reference architectures, includes serotonin (5-HT agonist), propranolol (β antagonist), 8-OH-DPAT (5-HT1A partial agonist), ketanserin (5-HT2A antagonist), WAY-100635 (5-HT antagonist), Janssen-1 and Kissei-1 (dopamine receptor antagonists), and RO-16814 (β agonist). Panel (b) shows novel compound prototypes, one of them labeled as a 5-HT antagonist with pKi = 7.25.]
which extends the target repertoire to around 80 GPCRs - a significant part of all the class A GPCRs [11]. Databases of site-specific ligand fragments, which should be recombined on an appropriate scaffold to yield ligands, are the keystones of such a knowledge-based system. Their generation is in principle possible through the deconvolution of the known ligands guided by SAR and by molecular similarity considerations. Given the promiscuity of some fragments (e.g., symmetric ligands), caution must be exercised before drawing definitive conclusions about the actual positioning of the fragments. Pragmatically,
these limitations to the generation of site-specific ligand fragment databases
were approached by pooling fragments into multiple pools and by designing generic combinatorial libraries of known privileged active fragments around appropriate scaffolds. The TAM library was screened in a number of GPCR campaigns, and high hit rates were observed, especially for the monoamine and chemokine GPCRs. Especially noteworthy is that the hit rates of the designed TAM library are higher than those observed for a corresponding library without specific design input. The TAM library includes many new combinations of known active fragments and privileged GPCR motifs. In addition to addressing new receptors, this should allow the discovery of multireceptor profiles of potential pharmacological interest. The search for antagonists of the 5-HT7 GPCR, which has the 5-HT1A receptors as neighbors in the sequence dendrogram, illustrates the successful use of the TAM library. Searching within the TAM library with 5-HT1A reference compounds, using the Similog method, we obtained a 10% hit rate (KB < 5 µM) at a time when only a biological assay with limited throughput capacity was available. The hits were arylpiperazines, which in follow-up studies were found to be active on other monoamine GPCRs as well.
15.4.5 Future Development
The molecular knowledge of GPCRs as information processing units continues to progress at an impressive pace [4, 120]. Besides the many efforts and opportunities on orphan receptors, GPCR research focuses on a deeper characterization of GIP networks and receptosomes. Key questions concern the physiological and therapeutic relevance of receptor homo- and heterodimerization [71, 73]. GPCRs were initially believed to be monomeric entities, but accumulating evidence, obtained with techniques like immunoblotting and coimmunoprecipitation combined with FRET and BRET experiments in living cells, now supports the presence of GPCRs in multimeric forms. The existence of homodimers is established for many class A GPCRs (e.g., dopamine D2 and D3, β2-adrenoceptor, muscarinic M1 and M2 receptors, NK1, opiate, and SST5 receptors) and class C GPCRs (e.g., mGluR and CaR, forming covalent dimers via a cysteine bridge linking the N-terminal domains of the two receptors). Proposed roles for heterodimerization include diversifying the pharmacological response, providing a further mechanism for the fine tuning of hormone signaling and G-protein specificity, and regulating receptor ontogenesis and internalization. Differences in the pharmacological properties of heterodimer GPCRs were observed for the δ/κ-opioid receptors, dopamine/somatostatin receptors, and GABABR1/GABABR2 receptors. The GABABR1/GABABR2 heterodimer is particularly illustrative [121]. It is known that GABABR1 is not trafficked effectively to the cell surface in the absence of GABABR2 expression. In addition, GABABR1 binds the agonist
ligand but is not coupled to G-proteins, whereas GABABR2 activates G-protein
signaling but does not bind the ligand. It was recognized that new compound screening strategies, allowing the detection of ligand binding or function only by a heterodimer pair in the presence of the corresponding homodimers, are required to allow rapid and effective identification of ligands with these characteristics. Only with such ligands at hand will it be possible to tease out the physiological relevance of GPCR heterodimerization [71]. The opioid agonist ligand 6'-guanidinonaltrindole (6'-GNTI) is the first example of such a ligand. 6'-GNTI has the unique property of selectively activating only δ/κ-opioid receptor heterodimers but not homomers [122]. Importantly, 6'-GNTI is an analgesic, thereby demonstrating that opioid receptor heterodimers are indeed functionally relevant in vivo. However, 6'-GNTI induces analgesia only when it is administered in the spinal cord but not in the brain, suggesting that the organization of heterodimers is tissue specific. Other studies are indirect and may reflect cross-talk between the signaling pathways at a level downstream of receptor activation. The ability of β-blockers to interfere with angiotensin AT1-mediated signaling, and the ability of the AT1 receptor blocker valsartan to reduce catecholamine-induced elevation in the heart rate, may indicate functional angiotensin AT1-adrenoceptor interactions in vivo. The discovery that some GPCRs appear to function in preformed and dynamic complexes with other signal transduction and scaffolding proteins opens many interesting possibilities for drug discovery. For instance, targeting the postsynaptic density (PSD-95) and Homer scaffolding proteins might result in a new way to modulate receptor activity [123]. PSD-95 is known to function in synaptic neurotransmission and plasticity by enhancing or depressing the synaptic strength depending on the frequency of neuronal firing.
The protein is a multiadapter, which binds via its PDZ domain specific GPCRs (e.g., 5-HT2A, 5-HT2C) and ion channels (e.g., N-methyl-D-aspartate (NMDA)) and enables, together with other protein-protein interactions, the spatial organization of complex microarchitectures jointly with the cytoskeleton. Similarly, the Homer proteins, which play a role in glutamatergic synaptic transmission, are composed of an N-terminal enabled/VASP homology type 1 (EVH1) domain, interacting with GPCRs (e.g., mGluR1, mGluR5), ion channels (e.g., IP3 or ryanodine Ca2+ receptor channels, transient receptor potential cation channels 1 and 2), and other proteins, and a C-terminal coiled-coil domain that enables dimerization and complex formation. It remains to be seen how general or specific these intracellular GPCR modulator mechanisms are. In addition, small-molecule compounds able to disrupt or reinforce these interactions are needed to further understand their physiological importance. A new trend is also the therapeutic evaluation of monoclonal antibodies against GPCRs. Although small-molecule drugs seem to be the preferred agents, recent success stories targeting the CCR5 receptors against HIV entry, or the thyroid-stimulating hormone (TSH) receptor in Graves' disease, show that this route is also feasible.
More generally, in the age of genomic medicine, the pharmacogenetics of GPCRs is becoming increasingly important and plays a role especially in the target validation of new GPCRs and the clinical validation of drugs [124], as was recently exemplified for adrenoceptors [125]. The study of allelic variations, based on single nucleotide polymorphism (SNP) or other sequence polymorphism data, allows the identification of the major allele of the target gene needed for the development of screening and profiling assays. Alternative splicing (e.g., for the human histamine H3 receptor, 20 isoforms are reported), RNA editing (e.g., for the 5-HT2C receptor, seven major isoforms differing in their second intracellular loop are predicted), and coupling to specific G-proteins have all been selected by evolution to modulate the activity of GPCRs, providing multiple regulatory switches to fine-tune basal cellular activities. In addition, genetic linkage studies provide evidence that a mutation in the gene is associated with susceptibility to the appearance/progression of the disease indication. Beyond the drug discovery applications emphasized so far, olfactory receptors play an important role in the perfume and cosmetic industry; the screening and design of new odorants is an economically interesting application. The discovery that the malaria-transmitting mosquito Anopheles, which is responsible for the death of more than one million people each year, possesses odorant receptors for particular components of human sweat means that different ligands could be screened for their activation or inhibition of these receptors, potentially leading to new, more effective insect traps and repellents [126].
15.4.6 Conclusions
Chemical biology investigations of GPCRs started with very simple questions: how do hormones, such as adrenaline or glucagon, signal at the intracellular level, and how does this signaling translate into the physiological response? During the last 25 years of molecular GPCR research, the machinery was elucidated in great detail for a few model GPCRs and revealed a fascinating beauty that turned out to be far more complex than initially expected [4]. During the next several years, our detailed knowledge about many newly deorphanized GPCRs and about the organization and regulation of the GIP network constituting the receptosomes will certainly continue to grow. The chemical biology approaches mentioned here will all contribute to the identification of chemical compounds that enable the directed targeting of each of these components. From the perspective of drug discovery, it will be especially interesting to follow whether signaling drugs will be discovered further downstream, or whether the GPCR ligand-binding sites will remain the preferred entry point for medication. A central question will be how fast these molecular discoveries will translate into new medicines. Especially with the newly discovered and deorphanized receptors, the ultimate challenge resides in the enormous knowledge gap between the new molecular discoveries and their significance for disease processes and medicine. While the classical hormone GPCR targets were validated "top-down", on the basis of pharmacology, physiology, and clinical medicine, the new hormone GPCR systems come "bottom-up": their early validation, based on bioinformatics and genetics data, is expected to direct clinical research, and the comprehensive understanding of their role in physiology will take time. It will be interesting to see the medical outcome of these activities after another decade of research.
Acknowledgments
Drs. K. Azzaoui, B. Faller, P. Floersheim, P. Fuerst, D. Hoyer, H. Mattes, H.-J. Roth, A. Sailer, G. Scheel, P. Schoeffter, S. Siehler, R.S. Tsai, and R. Wolf (all NIBR associates) are acknowledged for various support and discussions. We thank Drs. J. Mosbacher and K. Kaupman (also NIBR associates) for CHO-K1 cells expressing the heterotrimeric GABAB receptor and the selective modulators and antagonists. B. Frisch and M. Brasey, from NIBR Knowledge Center, are acknowledged for support with IMS Knowledge Link. S.C. DiClemente, from NIBR Communications, is gratefully acknowledged for editorial assistance.
References
1. A.L. Hopkins, C.R. Groom, The druggable genome, Nat. Rev. Drug Discov. 2002, 1, 727-730.
2. T. Klabunde, G. Hessler, Drug design strategies for targeting G-protein-coupled receptors, ChemBioChem 2002, 3, 928-944.
3. T.W. Schwartz, B. Holst, in Textbook of Receptor Pharmacology, (Eds.: J.C. Foreman, T. Johansen), CRC Press, Washington, 2003, pp. 81-109.
4. C. Ellis, The state of GPCR research in 2004, Nat. Rev. Drug Discov. 2004, 3, 577-626.
5. H.C. Huang, P.S. Klein, The Frizzled family: receptors for multiple signal transduction pathways, Genome Biol. 2004, 5, 234.1-234.7.
6. F. Horn, E. Bettler, L. Oliveira, F. Campagne, F.E. Cohen, G. Vriend, GPCRDB information system for G protein-coupled receptors, Nucleic Acids Res. 2003, 31, 294-297.
7. S.R.J. Hoare, Mechanisms of peptide and nonpeptide ligand binding to class B G-protein-coupled receptors, Drug Discov. Today 2005, 10, 417-427.
8. J.P. Pin, T. Galvez, L. Prezeau, Evolution, structure, and activation mechanism of family 3/C G-protein-coupled receptors, Pharmacol. Ther. 2003, 98, 325-354.
9. R. Fredriksson, M.C. Lagerstrom, L.G. Lundin, H.B. Schioth, The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, paralogon groups, and fingerprints, Mol. Pharmacol. 2003, 63, 1256-1272.
10. The IUPHAR committee on receptor nomenclature and drug classification, The IUPHAR Compendium of Receptor Characterization and Classification, 2nd ed., IUPHAR Media, London, 2000.
11. E. Jacoby, A. Schuffenhauer, P. Acklin, in Chemogenomics in Drug Discovery - A Medicinal Chemistry Perspective, (Eds.: H. Kubinyi, G. Müller), Wiley-VCH, Weinheim, 2004, pp. 139-166.
12. G. Wess, How to escape the bottleneck of medicinal chemistry, Drug Discov. Today 2002, 7, 533-535.
13. R. Lefkowitz, Historical review: a brief history and personal retrospective of seven-transmembrane receptors, Trends Pharmacol. Sci. 2004, 25, 413-422.
14. R.R. Neubig, M. Spedding, T. Kenakin, A. Christopoulos, International union of pharmacology committee on receptor nomenclature and drug classification. XXXVIII. Update on terms and symbols in quantitative pharmacology, Pharmacol. Rev. 2003, 55, 597-606.
15. M. Rodbell, Nobel lecture. Signal transduction: evolution of an idea, Biosci. Rep. 1995, 15, 117-133.
16. A.G. Gilman, Nobel lecture. G proteins and regulation of adenylyl cyclase, Biosci. Rep. 1995, 15, 65-97.
17. E.W. Sutherland, Studies on the mechanism of hormone action, Science 1972, 177, 401-408.
18. M.J. Marinissen, J.S. Gutkind, G-protein-coupled receptors and signaling networks: emerging paradigms, Trends Pharmacol. Sci. 2001, 22, 368-376.
19. H. Hamm, The many faces of G protein signaling, J. Biol. Chem. 1998, 273, 669-672.
20. J. Black, Nobel lecture in physiology or medicine - 1988. Drugs from emasculated hormones: the principle of syntopic antagonism, In Vitro Cell. Dev. Biol. 1989, 25, 311-320.
21. M.G. Caron, Y. Srinivasan, J. Pitha, K. Kociolek, R.J. Lefkowitz, Affinity chromatography of the beta-adrenergic receptor, J. Biol. Chem. 1979, 254, 2923-2927.
22. R.A. Cerione, B. Strulovici, J.L. Benovic, C.D. Strader, M.G. Caron, R.J. Lefkowitz, Reconstitution of beta-adrenergic receptors in lipid vesicles: affinity chromatography-purified receptors confer catecholamine responsiveness on a heterologous adenylate cyclase system, Proc. Natl. Acad. Sci. U.S.A. 1983, 80, 4899-4903.
23. R.A. Dixon, B.K. Kobilka, D.J. Strader, J.L. Benovic, H.G. Dohlman, T. Frielle, M.A. Bolanowski, C.D. Bennett, E. Rands, R.E. Diehl, Cloning of the gene and cDNA for mammalian beta-adrenergic receptor and homology with rhodopsin, Nature 1986, 321, 75-79.
24. Y.A. Ovchinnikov, Structure of rhodopsin and bacteriorhodopsin, Photochem. Photobiol. 1987, 45, 909-914.
25. R. Henderson, P.N. Unwin, Three-dimensional model of purple membrane obtained by electron microscopy, Nature 1975, 257, 28-32.
26. N. Robas, M. O'Reilly, S. Katugampola, M. Fidock, Maximizing serendipity: strategies for identifying ligands for orphan G-protein-coupled receptors, Curr. Opin. Pharmacol. 2003, 3, 121-126.
27. A. Wise, S.C. Jupe, S. Rees, The identification of ligands at orphan G-protein coupled receptors, Annu. Rev. Pharmacol. Toxicol. 2004, 44, 43-66.
28. R. Seifert, K. Wenzel-Seifert, Constitutive activity of G-protein-coupled receptors: cause of disease and common property of wild-type receptors, Naunyn Schmiedebergs Arch. Pharmacol. 2002, 366, 381-416.
29. E. Schipani, K. Kruse, H. Juppner, A constitutively active mutant PTH-PTHrP receptor in Jansen-type metaphyseal chondrodysplasia, Science 1995, 268, 98-100.
30. P. Strange, Mechanisms of inverse agonism at G-protein-coupled receptors, Trends Pharmacol. Sci. 2002, 23, 89-95.
31. C.D. Strader, I.S. Sigal, R.B. Register, M.R. Candelore, E. Rands, R.A. Dixon, Identification of residues required for ligand binding to the beta-adrenergic receptor, Proc. Natl. Acad. Sci. U.S.A. 1987, 84, 4384-4388.
32. A.M. van Rhee, K.A. Jacobson, Molecular architecture of G protein-coupled receptors, Drug Dev. Res. 1996, 37, 1-38.
33. J. Bockaert, P. Marin, A. Dumuis, L. Fagni, The 'magic tail' of G protein-coupled receptors: an anchorage for functional protein networks, FEBS Lett. 2003, 546, 65-72.
34. J. Bockaert, J.P. Pin, Molecular tinkering of G protein-coupled receptors: an evolutionary success, EMBO J. 1999, 18, 1723-1729.
35. J. Bockaert, A. Dumuis, L. Fagni, P. Marin, GPCR-GIP networks: a first step in the discovery of new therapeutic drugs?, Curr. Opin. Drug. Discov. Devel. 2004, 7, 649-657.
36. D.K. Vassilatis, J.G. Hohmann, H. Zeng, F. Li, J.E. Ranchalis, M.T. Mortrud, A. Brown, S.S. Rodriguez, J.R. Weller, A.C. Wright, J.E. Bergmann, G.A. Gaitanaris, The G protein-coupled receptor repertoires of human and mouse, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 4903-4908.
37. L. Buck, R. Axel, A novel multigene family may encode odorant receptors: a molecular basis for odor recognition, Cell 1991, 65, 175-187.
38. A. Terakita, The opsins, Genome Biol. 2005, 6, 213.1-213.9.
39. R. Fredriksson, H.B. Schioth, The repertoire of G-protein coupled receptors in fully sequenced genomes, Mol. Pharmacol. 2005, 67, 1414-1425.
40. A. Sodhi, S. Montaner, J.S. Gutkind, Viral hijacking of G-protein-coupled-receptor signaling networks, Nat. Rev. Mol. Cell. Biol. 2004, 5, 998-1012.
41. S. Offermanns, M.I. Simon, G alpha 15 and G alpha 16 couple a wide variety of receptors to phospholipase C, J. Biol. Chem. 1995, 270, 15175-15180.
42. B.R. Conklin, Z. Farfel, K.D. Lustig, D. Julius, H.R. Bourne, Substitution of three amino acids switches receptor specificity of Gq alpha to that of Gi alpha, Nature 1993, 363, 274-276.
43. D.T. Chalmers, D.P. Behan, The use of constitutively active GPCRs in drug discovery and functional genomics, Nat. Rev. Drug Discov. 2002, 1, 599-607.
44. L.M. McLatchie, N.J. Fraser, M.J. Main, A. Wise, J. Brown, N. Thompson, R. Solari, M.G. Lee, S.M. Foord, RAMPs regulate the transport and ligand specificity of the calcitonin-receptor-like receptor, Nature 1998, 393, 333-339.
45. G. Milligan, High-content assays for ligand regulation of G-protein-coupled receptors, Drug Discov. Today 2003, 8, 579-585.
46. C. Granas, B.K. Lundholt, A. Heydorn, V. Linde, H.-C. Pedersen, C. Krog-Jensen, M.M. Rosenkilde, L. Pagliaro, High content screening for G protein-coupled receptors using cell-based protein translocation assays, Comb. Chem. High Throughput Screen. 2005, 8, 301-309.
47. M.G. Ludwig, M. Vanek, D. Guerin, J.A. Gasser, C.E. Jones, U. Junker, H. Hofstetter, R.M. Wolf, K. Seuwen, Proton-sensing G-protein-coupled receptors, Nature 2003, 425, 93-98.
48. K. Palczewski, T. Kumasaka, T. Hori, C.A. Behnke, H. Motoshima, Crystal structure of rhodopsin: A G protein-coupled receptor, Science 2000, 289, 739-745.
49. E. Pebay-Peyroula, G. Rummel, J.P. Rosenbusch, E.M. Landau, X-ray structure of bacteriorhodopsin at 2.5 angstroms from microcrystals grown in lipidic cubic phases, Science 1997, 277, 1676-1681.
50. M.F. Hibert, S. Trumpp-Kallmeyer, J. Hoflack, This is not a G protein-coupled receptor, Trends Pharmacol. Sci. 1993, 14, 7-12.
51. E. Jacoby, J.L. Fauchère, E. Raimbaud, S. Ollivier, A. Michel, M. Spedding, A three binding site hypothesis for the interaction of ligands with monoamine G-protein coupled receptors: implications for combinatorial ligand design, Quant. Struct.-Act. Relat. 1999, 18, 561-572.
52. S. Filipek, D.C. Teller, K. Palczewski, R. Stenkamp, The crystallographic model of rhodopsin and its use in studies of other G protein-coupled receptors, Annu. Rev. Biophys. Biomol. Struct. 2003, 32, 375-397.
53. E. Jacoby, A novel chemogenomics knowledge-based ligand design strategy - application to G-protein coupled receptors, Quant. Struct.-Act. Relat. 2001, 20, 115-123.
54. D.R. Flower, Modelling G-protein-coupled receptors for drug design, Biochim. Biophys. Acta 1999, 1422, 207-234.
55. S.M. Bromidge, S. Dabbs, D.T. Davies, D.M. Duckworth, I.T. Forbes, P. Ham, G.E. Jones, F.D. King, D.V. Saunders, S. Starr, K.M. Thewlis, P.A. Wyman, F.E. Blaney, C.B. Naylor, F. Bailey, T.P. Blackburn, V. Holland, G.A. Kennett, G.J. Riley, M.D. Wood, Novel and selective 5-HT2C/2B receptor antagonists as potential anxiolytic agents: synthesis, quantitative structure-activity relationships, and molecular modeling of substituted 1-(3-pyridylcarbamoyl)indolines, J. Med. Chem. 1998, 41, 1598-1612.
56. K. Bondensgaard, M. Ankersen, H. Thogersen, B.S. Hansen, B.S. Wulff, R.P. Bywater, Recognition of privileged structures by G-protein coupled receptors, J. Med. Chem. 2004, 47, 888-899.
57. O.M. Becker, S. Shacham, Y. Marantz, S. Noiman, Modeling the 3D structure of GPCRs: advances and application to drug discovery, Curr. Opin. Drug. Discov. Devel. 2003, 6, 353-361.
58. C. Bissantz, A. Logean, D. Rognan, High-throughput modeling of human G-protein coupled receptors: amino acid sequence alignment, three-dimensional model building, and receptor library screening, J. Chem. Inf. Comput. Sci. 2004, 44, 1162-1176.
59. J.M. Klco, C.B. Wiegand, K. Narzinski, T.J. Baranski, Essential role for the second extracellular loop in C5a receptor activation, Nat. Struct. Mol. Biol. 2005, 12, 320-326.
60. L. Shi, J.A. Javitch, The binding site of aminergic G protein-coupled receptors: the transmembrane segments and second extracellular loop, Annu. Rev. Pharmacol. Toxicol. 2002, 42, 437-467.
61. A. Evers, G. Klebe, Successful virtual screening for a submicromolar antagonist of the neurokinin-1 receptor based on a ligand-supported homology model, J. Med. Chem. 2004, 47, 5381-5392.
62. C. Bissantz, P. Bernard, M. Hibert, D. Rognan, Protein-based virtual screening of chemical databases. II. Are homology models of G-protein coupled receptors suitable targets?, Proteins 2003, 50, 5-25.
63. O.M. Becker, Y. Marantz, S. Shacham, B. Inbal, A. Heifetz, O. Kalid, S. Bar-Haim, D. Warshaviak, M. Fichman, S. Noiman, G protein-coupled receptors: in silico drug discovery in 3D, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 11304-11309.
64. J. Ballesteros, S. Kitanovic, F. Guarnieri, P. Davies, B.J. Fromme, K. Konvicka, L. Chi, R.P. Millar, J.S. Davidson, H. Weinstein, S.C. Sealfon, Functional microdomains in G-protein-coupled receptors. The conserved arginine-cage motif in the gonadotropin-releasing hormone receptor, J. Biol. Chem. 1998, 273, 10445-10453.
65. T. Kenakin, Protean agonists. Keys to receptor active states?, Ann. N.Y. Acad. Sci. 1997, 812, 116-125.
66. C.R. Grace, M.H. Perrin, M.R. DiGruccio, C.L. Miller, J.E. Rivier, W.W. Vale, R. Riek, NMR structure and peptide hormone binding site of the first extracellular domain of a type B1 G protein-coupled receptor, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 12836-12841.
67. R.C. Gensure, N. Shimizu, J. Tsang, T.J. Gardella, Identification of a contact site for residue 19 of parathyroid hormone (PTH) and PTH-related protein analogs in transmembrane domain two of the type 1 PTH receptor, Mol. Endocrinol. 2003, 17, 2647-2658.
68. S.U. Miedlich, L. Gama, K. Seuwen, R.M. Wolf, G.E. Breitwieser, Homology modeling of the transmembrane domain of the human calcium sensing receptor and localization of an allosteric binding site, J. Biol. Chem. 2004, 279, 7254-7263.
69. A. Pagano, D. Ruegg, S. Litschig, N. Stoehr, C. Stierlin, M. Heinrich, P. Floersheim, L. Prezeau, F. Carroll, J.P. Pin, A. Cambria, I. Vranesic, P.J. Flor, F. Gasparini, R. Kuhn, The non-competitive antagonists 2-methyl-6-(phenylethynyl)pyridine and 7-hydroxyiminocyclopropan[b]chromen-1a-carboxylic acid ethyl ester interact with overlapping binding pockets in the transmembrane region of group I metabotropic glutamate receptors, J. Biol. Chem. 2000, 275, 33750-33758.
70. S. Angers, A. Salahpour, M. Bouvier, Dimerization: an emerging concept for G protein-coupled receptor ontogeny and function, Annu. Rev. Pharmacol. Toxicol. 2002, 42, 409-435.
71. G. Milligan, G protein-coupled receptor dimerization: function and ligand pharmacology, Mol. Pharmacol. 2004, 66, 1-7.
72. S. Filipek, K.A. Krzysko, D. Fotiadis, Y. Liang, D.A. Saperstein, A. Engel, K. Palczewski, A concept for G protein activation by G protein-coupled receptor dimers: the transducin/rhodopsin interface, Photochem. Photobiol. Sci. 2004, 63, 628-638.
73. J.L. Baneres, J. Parello, Structure-based analysis of GPCR function: evidence for a novel pentameric assembly between the dimeric leukotriene B4 receptor BLT1 and the G-protein, J. Mol. Biol. 2003, 329, 815-829.
74. K. Lundstrom, Structural genomics of GPCRs, Trends Biotechnol. 2005, 23, 103-108.
75. K.H. Bleicher, L.G. Green, R.E. Martin, M. Rogers-Evans, Ligand identification for G-protein-coupled receptors: a lead generation perspective, Curr. Opin. Chem. Biol. 2004, 8, 287-296.
76. P. Jimonet, R. Jager, Strategies for designing GPCR-focused libraries and screening sets, Curr. Opin. Drug. Discov. Devel. 2004, 7, 325-333.
77. R. Crossley, The design of screening libraries targeted at G-protein coupled receptors, Curr. Top. Med. Chem. 2004, 4, 581-588.
78. J. Bajorath, Integration of virtual and high-throughput screening, Nat. Rev. Drug Discov. 2002, 1, 882-894.
79. M.M. Hann, T.I. Oprea, Pursuing the leadlikeness concept in pharmaceutical research, Curr. Opin. Chem. Biol. 2004, 8, 255-263.
80. S. Halazy, G-protein coupled receptors bivalent ligands and drug design, Exp. Opin. Ther. Patents 1999, 9, 431-446.
81. D.J. Daniels, A. Kulkarni, Z. Xie, R.G. Bhushan, P.S. Portoghese, A bivalent ligand (KDAN-18) containing δ-antagonist and κ-agonist pharmacophores bridges δ2 and κ1 opioid receptor phenotypes, J. Med. Chem. 2005, 48, 1713-1716.
82. V.J. Hruby, Designing peptide receptor agonists and antagonists, Nat. Rev. Drug Discov. 2002, 1, 847-858.
83. M. Eguchi, M. McMillan, C. Nguyen, J.L. Teo, E.Y. Chi, W.R. Henderson Jr, M. Kahn, Chemogenomics with peptide secondary structure mimetics, Comb. Chem. High Throughput Screen. 2003, 6, 611-621.
84. T.R. Webb, in Chemogenomics in Drug Discovery - A Medicinal Chemistry Perspective, (Eds.: H. Kubinyi, G. Müller), Wiley-VCH, Weinheim, 2004, pp. 313-324.
85. S.L. Garland, P.M. Dean, Design criteria for molecular mimics of fragments of the beta-turn. 1. C alpha atom analysis, J. Comput.-Aided Mol. Des. 1999, 13, 469-483.
86. S.L. Garland, P.M. Dean, Design criteria for molecular mimics of fragments of the beta-turn. 2. C alpha-C beta bond vector analysis, J. Comput.-Aided Mol. Des. 1999, 13, 485-498.
87. B.E. Evans, K.E. Rittle, M.G. Bock, R.M. DiPardo, R.M. Freidinger, W.L. Whitter, G.F. Lundell, D.F. Veber, P.S. Anderson, R.S. Chang, Methods for drug discovery: development of potent, selective, orally effective cholecystokinin antagonists, J. Med. Chem. 1988, 31, 2235-2246.
88. A.A. Patchett, R.P. Nargund, in Annual Reports in Medicinal Chemistry, Vol. 35, (Ed.: G.L. Trainor), Academic Press, San Diego, 2000, pp. 289-298.
89. G. Müller, in Chemogenomics in Drug Discovery - A Medicinal Chemistry Perspective, (Eds.: H. Kubinyi, G. Müller), Wiley-VCH, Weinheim, 2004, pp. 7-42.
90. T. Guo, D.W. Hobbs, Privileged structure-based combinatorial libraries targeting G protein-coupled receptors, Assay Drug Dev. Technol. 2003, 1, 579-592.
91. C.A. Willoughby, S.M. Hutchins, K.G. Rosauer, M.J. Dhar, K.T. Chapman, G.G. Chicchi, S. Sadowski, D.H. Weinberg, S. Patel, L. Malkowitz, J. Di Salvo, S.G. Pacholok, K. Cheng, Combinatorial synthesis of 3-(amidoalkyl) and 3-(aminoalkyl)-2-arylindole derivatives: discovery of potent ligands for a variety of G-protein coupled receptors, Bioorg. Med. Chem. Lett. 2002, 12, 93-96.
92. R.P. Sheridan, Finding multiactivity substructures by mining databases of drug-like compounds, J. Chem. Inf. Comput. Sci. 2003, 43, 1037-1050.
93. G.W. Bemis, M.A. Murcko, The properties of known drugs. 1. Molecular frameworks, J. Med. Chem. 1996, 39, 2887-2893.
94. C.G. Wermuth, Selective optimization of side activities: another way for drug discovery, J. Med. Chem. 2004, 47, 1303-1314.
95. R.P. Sheridan, S.K. Kearsley, Why do we need so many chemical similarity search methods?, Drug Discov. Today 2002, 7, 903-911.
96. J. Hert, P. Willett, D.J. Wilton, P. Acklin, K. Azzaoui, E. Jacoby, A. Schuffenhauer, Topological descriptors for similarity-based virtual screening using multiple bioactive reference structures, Org. Biomol. Chem. 2004, 2, 3256-3266.
97. A. Schuffenhauer, P. Floersheim, P. Acklin, E. Jacoby, Similarity metrics for ligands reflecting the similarity of the target proteins, J. Chem. Inf. Comput. Sci. 2003, 43, 391-405.
98. N.P. Savchuck, K.V. Balakin, S.E. Tkachenko, Exploring the chemogenomic knowledge space with annotated chemical libraries, Curr. Opin. Chem. Biol. 2004, 8, 412-417.
99. M. von Korff, M. Steger, GPCR-tailored pharmacophore pattern recognition of small molecular ligands, J. Chem. Inf. Comput. Sci. 2004, 44, 1137-1147.
100. G. Schneider, M.L. Lee, M. Stahl, P. Schneider, De novo design of molecular architectures by evolutionary assembly of drug-derived building blocks, J. Comput.-Aided Mol. Des. 2000, 14, 487-494.
101. G. Jenkins, Targeting GPCRs in silico, Curr. Drug Discov. 2004, 3, 23-26.
102. A. Schuffenhauer, J. Zimmermann, R. Stoop, J.J. van der Vyver, S. Lecchini, E. Jacoby, An ontology for pharmaceutical ligands and its application for library design and in silico screening, J. Chem. Inf. Comput. Sci. 2002, 42, 947-955.
103. C.M. Krejsa, D. Horvath, S.L. Rogalski, J.E. Penzotti, B. Mao, F. Barbosa, J.C. Migeon, Predicting ADME properties and side effects: the BioPrint approach, Curr. Opin. Drug. Discov. Devel. 2003, 6, 470-480.
104. H. Roter, Large-scale integrated databases supporting drug discovery, Curr. Opin. Drug. Discov. Devel. 2005, 8, 309-315.
105. T. Klabunde, A. Evers, GPCR antitarget modeling: Pharmacophore models for biogenic amine binding GPCRs to avoid GPCR-mediated side effects, ChemBioChem 2005, 6, 876-889.
106. E.S. Huang, Predicting ligands for orphan GPCRs, Drug Discov. Today 2005, 10, 69-73.
107. A. Gaulton, T.K. Attwood, Bioinformatics approaches for the classification of G-protein-coupled receptors, Curr. Opin. Pharmacol. 2003, 3, 114-120.
108. E.S. Huang, Construction of a sequence motif characteristic of aminergic G protein-coupled receptors, Protein Sci. 2003, 12, 1360-1367.
109. T.M. Frimurer, T. Ulven, C.E. Elling, L.O. Gerlach, E. Kostenis, T. Hogberg, A physicogenetic method to assign ligand-binding relationships between 7TM receptors, Bioorg. Med. Chem. Lett. 2005, 15, 3707-3712.
110. C.Y. Lin, M.G. Varma, A. Joubel, S. Madabushi, O. Lichtarge, D.L. Barber, Conserved motifs in somatostatin, D2-dopamine, and alpha 2B-adrenergic receptors for inhibiting the Na-H exchanger NHE1, J. Biol. Chem. 2003, 278, 15128-15135.
111. A. Cacace, M. Banks, T. Spicer, F. Civoli, J. Watson, An ultra-HTS process for the identification of small molecule modulators of orphan G-protein-coupled receptors, Drug Discov. Today 2003, 8, 785-792.
112. D. Gabriel, M. Vernier, M.J. Pfeifer, B. Dasen, L. Tenaillon, R. Bouhelal, High throughput screening technologies for direct cyclic AMP measurement, Assay Drug Dev. Technol. 2003, 1, 291-303.
113. R.M. Eglen, Functional G protein-coupled receptor assays for primary and secondary screening, Comb. Chem. High Throughput Screen. 2005, 8, 311-318.
114. C. Williams, A. Sewing, G-protein coupled receptor assays: To measure affinity or efficacy that is the question, Comb. Chem. High Throughput Screen. 2005, 8, 285-292.
115. D. Colquhoun, Binding, gating, affinity and efficacy: the interpretation of structure-activity relationships for agonists and of the effects of mutating receptors, Br. J. Pharmacol. 1998, 125, 924-947.
116. I. Sabroe, M.J. Peck, B.J. van Keulen, A. Jorritsma, G. Simmons, P.R. Clapham, T.J. Williams, J.E. Pease, A small molecule antagonist of chemokine receptors CCR1 and CCR3. Potent inhibition of eosinophil function and CCR3-mediated HIV-1 entry, J. Biol. Chem. 2000, 275, 25985-25992.
117. M. Marcoli, S. Scarrone, G. Maura, G. Bonanno, M. Raiteri, A subtype of the γ-aminobutyric acid B receptor regulates cholinergic twitch response in the guinea pig ileum, J. Pharmacol. Exp. Ther. 2000, 293, 42-47.
118. D. Gurwitz, R. Haring, Ligand-selective signaling and high-content screening for GPCR drugs, Drug Discov. Today 2003, 8, 1108-1109.
119. H.P. Fischer, S. Heyse, From targets to leads: the importance of advanced data analysis for decision support in drug discovery, Curr. Opin. Drug. Discov. Devel. 2005, 8, 334-346.
120. T. Kenakin, Predicting therapeutic value in the lead optimization phase of drug discovery, Nat. Rev. Drug Discov. 2003, 2, 429-438.
121. A. Couve, A.R. Calver, B. Fairfax, S.J. Moss, M.N. Pangalos, Unravelling the unusual signalling properties of the GABA(B) receptor, Biochem. Pharmacol. 2004, 68, 1527-1536.
122. M. Waldhoer, J. Fong, R.M. Jones, M.M. Lunzer, S.K. Sharma, E. Kostenis, P.S. Portoghese, J.L. Whistler, A heterodimer-selective agonist shows in vivo relevance of G protein-coupled receptor dimers, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 9050-9055.
123. J. Bockaert, L. Fagni, A. Dumuis, P. Marin, GPCR interacting proteins (GIP), Pharmacol. Ther. 2004, 103, 203-221.
124. W.E. Evans, H.L. McLeod, Pharmacogenomics - drug disposition, drug targets, and side effects, N. Engl. J. Med. 2003, 348, 538-549.
125. K.M. Small, D.W. McGraw, S.B. Liggett, Pharmacology and physiology of human adrenergic receptor polymorphisms, Annu. Rev. Pharmacol. Toxicol. 2003, 43, 381-411.
126. E.A. Hallem, A. Nicole Fox, L.J. Zwiebel, J.R. Carlson, Olfaction: mosquito receptor for human-sweat odorant, Nature 2004, 427, 212-213.
Chemical Biology. From Small Molecules to System Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
15.5 Drugs Targeting Protein-Protein Interactions
Patrick Chène
Outlook
Most biological processes involve permanent and nonpermanent interactions between different proteins, and many protein complexes play a key role in various human diseases. Molecules that prevent the formation of these protein complexes could therefore be valuable new therapeutic agents. Unlike enzyme catalytic sites, protein interfaces have not evolved to bind low-molecular-weight molecules, so it is difficult to identify small compounds that inhibit protein-protein interactions. However, there is considerable diversity in the structure of protein interfaces, and some may be more attractive than others for medicinal chemistry. One of the main challenges in drug discovery is therefore to identify these interfaces and to exploit their properties to make marketable drugs. In this article, the properties of protein interfaces are examined in the light of their use as drug targets.
15.5.1 Introduction
The discovery of new drug targets is a constant challenge for pharmaceutical companies. In the last few decades, most drugs that have been developed are enzyme inhibitors [1]. One reason for this preference is that enzymes naturally bind small molecules, their substrates. This offers the possibility of identifying small molecules that mimic the substrate and bind to these proteins, inhibiting their biological activity. For example, transition state analogs bind with high affinity to enzymes and are potent inhibitors [2]. Furthermore, because enzyme inhibitors are normally small molecules, they usually have an acceptable bioavailability, which facilitates their development. Currently, however, while many enzymes have still not been targeted or are in the process of being evaluated in a more systematic fashion [3], the pharmaceutical industry is looking for new opportunities outside the enzyme field. Amongst the potential candidates, inhibitors of protein-protein interactions represent an attractive new class of molecules. Many proteins, including enzymes, exert part if not all of their biological activity by interacting with other proteins. Preventing these interactions is therefore a way of modulating the activities of these proteins. The structural diversity and large number of protein-protein interfaces offer an enormous number of new targets for the pharmaceutical industry.
A certain caution is needed, however, because this large number of possible new targets may not be as enormous as it appears. Protein interfaces have not evolved to bind low-molecular-weight molecules, as enzymes did. It might therefore be more difficult to identify protein-protein interaction inhibitors than it is to identify enzyme inhibitors. A second difficulty comes from the diversity of the protein interfaces. Since large families of enzymes bind the same substrate (e.g., ATP for the kinases), the knowledge gained and the compound libraries synthesized to target the first members of a family can be used to design compounds that target new members of the family more rapidly. This of course dramatically enhances the speed of the drug discovery process. In the case of protein-protein interactions, even if similarities have been observed between some interfaces, binding sites do not appear to be conserved amongst protein interfaces. Each protein interface is therefore rather unique, and new strategies in chemistry (synthesis and optimization of new scaffolds) may have to be developed for each new protein-protein interaction that is studied. This is of course more time consuming and less attractive for pharmaceutical companies, which have to maintain high productivity in the very competitive field of drug discovery. The following section presents an overview of the properties of protein interfaces, followed by the application of this knowledge to the design of competitive inhibitors of protein-protein interactions.
15.5.2 The Diversity of Protein-Protein Interfaces
In living organisms, a large number of proteins form transient or permanent complexes to exert their biological function, and recent studies have revealed the complexity of these protein-protein interaction networks [4]. Since so many protein-protein interactions occur in cells, one can expect differences in the structure and composition of the regions of the proteins committed to the formation of these complexes. These differences are necessary to reach the degree of specificity needed to form the "right" complexes in the crowded cellular environment (for example, the protein concentration in the endoplasmic reticulum is estimated to be 100 mg mL⁻¹ [5]) and to obtain complexes with different stabilities. This diversity of protein interfaces is an opportunity for drug discovery because it may allow more specific inhibitors to be generated. However, it is very likely that many of these interfaces do not have the properties required for the design of potent inhibitors. Before starting any drug discovery process aimed at designing competitive inhibitors of protein-protein interactions, it is therefore very important to evaluate the druggability of the selected interfaces. This depends on both the structure and the physicochemical properties of the interface. In this section, we summarize the general properties of protein interfaces.
Protein complexes are formed from identical subunits (homo-oligomers) or from different subunits (hetero-oligomers) [6]. These oligomers can be formed directly during protein synthesis (obligate complexes) or upon encounter between the different subunits (nonobligate complexes). Protein complexes also differ in their half-lives. Permanent complexes are very stable and their subunits remain associated, while others exist only transiently (nonpermanent complexes) and their chains associate and dissociate more easily. This means that the subunits of some protein complexes (obligate/permanent) never exist in cells as stable independent structures. Furthermore, the formation of some complexes depends on the presence of effector molecules (e.g., GTP), on changes in protein expression/localization, or on physiological conditions (e.g., pH). These general properties are already valuable for drug discovery. Targeting the interface of permanent oligomers is a priori a difficult task, since the only way to abolish this type of interaction is to identify compounds that act during protein synthesis/folding. In nonpermanent oligomers, however, it is conceivable that compounds may be identified that, upon binding to the contact surface of one subunit, prevent interaction with the other subunit. Finally, the synthesis of compounds mimicking the natural effector might be an attractive way of inhibiting the formation of effector-regulated complexes. In this case, the inhibitors are designed to bind not at the protein interface but in the effector-binding pocket. Depending on the structure of the effector-binding site, the design of such inhibitors might be similar to the design of enzyme inhibitors. This type of approach will not be considered in this chapter, which focuses on compounds that, on binding at protein interfaces, prevent the association between two proteins (competitive inhibitors).
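The reasoning in the preceding paragraph can be summarized as a small decision sketch. The following Python fragment is only an illustration under stated assumptions: the class and function names (Complex, Strategy, candidate_strategies) and the example complex are hypothetical and do not come from the chapter, and the rules simply restate the qualitative arguments made above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Strategy(Enum):
    # Labels paraphrase the options discussed in the text (hypothetical names).
    FOLDING_STAGE = "act during protein synthesis/folding (a priori difficult)"
    INTERFACE_COMPETITOR = "bind the contact surface of one subunit"
    EFFECTOR_MIMIC = "mimic the natural effector in its binding pocket"


@dataclass(frozen=True)
class Complex:
    name: str
    permanent: bool           # subunits remain associated once the complex has formed
    effector_regulated: bool  # formation depends on an effector such as GTP


def candidate_strategies(c: Complex) -> List[Strategy]:
    """Return the inhibition routes suggested by the qualitative rules in the text."""
    strategies = []
    if c.permanent:
        # Permanent oligomers can only be disrupted before or while they form.
        strategies.append(Strategy.FOLDING_STAGE)
    else:
        # Nonpermanent oligomers expose their contact surfaces when dissociated.
        strategies.append(Strategy.INTERFACE_COMPETITOR)
    if c.effector_regulated:
        strategies.append(Strategy.EFFECTOR_MIMIC)
    return strategies


if __name__ == "__main__":
    example = Complex("hypothetical GTP-regulated heterodimer",
                      permanent=False, effector_regulated=True)
    for s in candidate_strategies(example):
        print(s.name, "-", s.value)
```

Only the interface-competitor route corresponds to the competitive inhibitors that are the focus of this chapter; the other two routes are listed for completeness.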
Upon binding, the components of a protein complex bury part of their accessible surface to create the contact interface. On average, the size of the subunit interface in permanent homodimers is larger than in other protein complexes [6, 7]. Jones et al. have studied a set of 59 complexes and found that the surface buried in homodimers varies from 368 to 4746 Å² while in heterocomplexes it ranges from 639 to 3228 Å² [8]. Janin and collaborators found similar results [7, 9]. The study of the structure of the free and associated subunits shows that they are likely to undergo conformational changes when they form large interfaces (≥1500 Å²) [10, 11]. With the exception of coexpressed subunits (obligate complexes) [11], there does not seem to be a strong correlation between the size of the interface and the binding energy (ΔG¹⁾) [12]. However, the entire contact surface does not contribute equally to binding. Some regions - recognition patches or hot spots - are more important for recognition/binding [13]. These regions have a core and a rim [14], with the more accessible rim residues surrounding the more buried core residues. The amino acid composition of the rim is similar to that of the rest of the protein surface, while the core contains more aromatic residues, revealing a higher lipophilicity for this part of the contact region. There is a correlation between the number of recognition patches and the size of the interface [7, 14]: the larger the interface, the more hot spots are present. However, in most cases, only one hot spot is present at the interface, and on average it buries 1560 ± 340 Å² of surface upon binding. In interfaces with multiple recognition patches, one of them is generally larger and it has a size similar to that of the hot spots found in single-patch interfaces. The presence of recognition patches at protein interfaces is interesting for drug discovery. Compounds that interact with these hot spots should prevent interaction because a large part of the binding energy is concentrated in these areas. Since the hot spots are smaller than the full interface, it might be easier to identify low-molecular-weight compounds - comparable in size to enzyme inhibitors - that inhibit interaction. By contrast, if the binding energy were equally distributed over the entire interface, much larger molecules, with a lower likelihood of success as drug-development candidates, would have to be designed.

The shape of the interface is another important parameter for drug discovery because it is more difficult to obtain potent inhibitors for flat interfaces than for interfaces that contain well-defined cavities (pockets). The less flat the interface between two proteins, the greater the tendency of one partner to be buried and to form a more stable complex. Heterocomplexes have more planar interfaces than homodimers, and permanent heterocomplexes have more twisted contact surfaces than nonpermanent ones [8]. This suggests that the most attractive complexes for drug discovery - the nonpermanent complexes (see above) - have rather flat interfaces. The presence of cavities (pockets) at the contact region should therefore be looked at very carefully during the evaluation of a protein-protein interaction.

1) ΔG: Gibbs free energy. The change in Gibbs free energy is linked to the change in enthalpy (ΔH) and entropy (ΔS) by the following formula: ΔG = ΔH - TΔS. A process occurs spontaneously - at constant temperature and pressure - when ΔG < 0.
Even if an interface contains cavities, they must be suitable for drug discovery. Of course, they must be large enough to accommodate inhibitors, but their shape complementarity is also important. It might be more difficult to generate potent competitive inhibitors if the two interacting chains are closely packed and make an extensive number of direct interactions²⁾. In contrast, if the shape complementarity between the two chains within the cavity is low, the interacting subunits may only make a limited number of direct interactions. For such cavities, it might be easier to improve the potency of the inhibitors. A potent inhibitor should contain chemical groups that, upon binding to the target protein, mimic the key interactions (the most important for ΔG) made by the competing subunit, as well as chemical groups that make new interactions with the target protein. The creation of these additional contacts between the inhibitor and the target protein leads to a favorable enthalpic contribution (ΔH < 0) in the binding energy and, therefore, to an increased potency. “Loose” interfaces have a higher probability of containing atoms not directly involved in the formation of the protein complex than do very complementary protein contact regions. They therefore offer more possibilities for improving the potency of inhibitors. Several methods are used to determine the complementarity between two interacting proteins [10]. Thornton and collaborators [8] have used one method - the gap index - to measure the complementarity of different complexes. Their results show that the homodimers and permanent heterodimers make more complementary interfaces than the nonobligatory heterocomplexes. The latter may therefore be more “druggable”. It should be kept in mind that these methods give an indication of the atom density (packing) but not of the interaction network. Since it is important - to enhance potency - that inhibitors make more interactions than the competing chain, loose packing does not necessarily imply that the cavity is a good drug target. During the study of an interface, therefore, it is important, even in the case of “loose” interfaces, to carefully check that, in addition to the key interactions made in the protein complex, it is possible to create new interactions that will help enhance the potency of the inhibitors. One consequence of a lack of complementarity between two interacting proteins is that water molecules are present at the interface to satisfy the H-bond network between the subunits. The study of different protein interfaces shows that contact regions with few cavities do not contain many water molecules, while interfaces with more cavities contain a larger number of water molecules that are used to maintain close packing at the interface [15]. These trapped water molecules are involved in bridging H bonds between the two chains [15].

2) A direct interaction is an interaction that does not involve any bridging water molecules between the two interacting protein subunits.
Water is therefore an important element of the interaction, and it should be considered during drug design. The displacement of key bound water molecules by the inhibitor should enhance its affinity because of a favorable entropic effect. The presence of water molecules at the interface reflects its polar nature, but protein contact regions also contain hydrophobic areas, which are important for the interaction. In terms of energy, hydrophobic interfaces are more suitable for drug discovery than polar ones. The partial desolvation of both the protein and the inhibitor upon binding is a favorable component of the binding energy. The design of molecules that contain lipophilic moieties is then a prerequisite for obtaining potent drugs. The chemical nature of protein interfaces has been extensively studied, and their content of polar/apolar groups analyzed [6-10, 14, 16-18]. On average, 56% nonpolar carbon-containing groups, 29% neutral polar groups, and 15% charged groups are present at protein interfaces [10]. The interfaces in permanent complexes are generally more hydrophobic than those of nonpermanent complexes [6]. This could be explained by the fact that solvent-exposed hydrophobic patches are energetically unfavorable and that isolated subunits with hydrophobic surfaces are therefore not stable. The presence of hydrophobic cavities at the interface between two proteins is particularly attractive for drug discovery because it allows the design of lipophilic molecules which, upon binding, become buried in a hydrophobic environment.

Another feature of the interaction between two proteins is the loss of flexibility of their contact regions upon binding. It is expected from thermodynamics that better binding is obtained when the interaction does not induce a large loss of conformational entropy. Indeed, it has been shown that protein interacting sites are less flexible than the rest of the protein surface [19]. The loss of flexibility that occurs during the association between two proteins can be advantageously used to design inhibitors. The design of compounds conformationally constrained in such a way that they already take on their bound conformation in solution is a way of improving potency. Such molecules will not undergo large conformational changes upon binding, and they will therefore “pay” a smaller entropic penalty than more flexible inhibitors.

Altogether, this short summary indicates that there is no common recognition template used by oligomeric proteins to form complexes. On the contrary, even if protein interfaces share some general properties, they differ to a large extent. It is therefore very difficult to make a general statement about the druggability or nondruggability of protein interfaces. Amongst the large number of protein interfaces, some are more druggable, and the major challenge for drug discovery is to identify them.
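The entropic argument made above for conformationally constrained inhibitors can be illustrated with the relation ΔG = ΔH - TΔS. The following is a minimal sketch only: the function name and all numerical values are hypothetical and chosen for illustration, not taken from any measurement. It assumes two inhibitors that make the same contacts (identical ΔH) but differ in the entropy they lose upon binding.

```python
def gibbs_free_energy(delta_h, delta_s, temperature=298.15):
    """Return dG = dH - T*dS.

    delta_h is in kcal/mol, delta_s in kcal/(mol*K), temperature in K.
    """
    return delta_h - temperature * delta_s


# Hypothetical values for illustration only (not experimental data):
# both inhibitors gain the same enthalpy, but the flexible one loses
# twice as much entropy when it is frozen into its bound conformation.
flexible = gibbs_free_energy(delta_h=-12.0, delta_s=-0.020)
constrained = gibbs_free_energy(delta_h=-12.0, delta_s=-0.010)

print(f"flexible inhibitor:    dG = {flexible:.2f} kcal/mol")
print(f"constrained inhibitor: dG = {constrained:.2f} kcal/mol")
```

Under these assumed numbers both binding events are spontaneous (ΔG < 0), but the constrained compound has the more negative ΔG, which is the sense in which it "pays" a smaller entropic penalty.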
15.5.3 A Proposed Decision Tree to Select Interfaces for Drug Discovery
To help in identifying druggable interfaces, a decision tree is proposed. Two points need to be addressed before describing this tree. First, drug discovery is not - at least today - an exact science. So even if an interface does not fit the decision tree, it might still be possible to obtain molecules that prevent its formation. This leads to the second point: the potency of protein-protein interaction inhibitors. In many cases, molecules (peptides or low-molecular-weight compounds) with IC50³⁾ in the micromolar range are described as protein-protein interaction inhibitors. However, a large number of these compounds - while they may be useful tools to study the interaction - will never enter clinical use, which is the ultimate goal for pharmaceutical companies. These molecules need further optimization to achieve this goal. Protein-protein interaction inhibitors will only be considered attractive as new drugs when they demonstrate clinical efficacy, as do enzyme inhibitors. Such drugs can only be obtained if the target interface allows the design of potent and bioavailable molecules. A detailed analysis of the interface to assess its druggability is therefore required before any drug discovery programme can be started. The proposed decision tree may help in selecting the interfaces that possess the structural and physicochemical properties required for the design of potent inhibitors (Fig. 15.5-1).

3) IC50: concentration of inhibitor required to inhibit 50% of the interaction between two proteins.

[Figure: decision tree. Recoverable node labels: Interface to evaluate; Structure; Hydrophobicity; Complementarity; Attractive interface. Branches are marked LF or MF.]
Fig. 15.5-1 A decision tree to evaluate the druggability of protein interfaces. This tree can be used to determine whether a selected interface possesses some of the features required for drug discovery. LF - less favorable; MF - more favorable.

Just as it is easier to pick cherries from a tree in daylight than during a moonless night, so too is it easier to guide the drug discovery process when it is possible to see the structure of the target interface. A drug discovery programme can be successful even without using any structural information, but it might be harder and take longer to obtain potent molecules without this precious knowledge. The structure of the interface not only helps in deciding whether it is druggable - using the criteria described below - but also helps improve the potency of the compounds during their optimization. This explains why the availability of the interface structure is considered the most favorable case in the decision tree. The second criterion in the decision tree is the presence of cavities at the interface. The most favorable case is when a well-defined binding pocket is found at the contact region between both proteins. The presence of such a pocket allows the formation of a stable inhibitor-protein complex when the inhibitor mimics the protruding chain. The contact region of some uncomplexed proteins is flexible and, upon binding, this plasticity/flexibility allows conformational changes that enhance interface complementarity. Cavities that are present on the surface of the unbound proteins may therefore be absent from the structure of the final protein complex. Compounds that bind to these pockets could block the conformational changes required for the formation of the complex, thereby preventing the interaction. The knowledge of the structure of the unbound proteins is therefore very useful to identify this type of pocket. The next selection criterion concerns the polarity of the selected cavity. In the most favorable case, it should contain hydrophobic residues to favor the design of lipophilic inhibitors. The addition of hydrophobic substituents (taking care to preserve the solubility of the compounds) is an effective way of improving the potency of an inhibitor thanks to the hydrophobic effect. It has been shown that electrostatic interactions are important for the rate of association, but not for the stability of protein complexes [20]. Furthermore, electrostatic interactions are weakened by the high dielectric constant of water. It might therefore be more difficult to identify inhibitors that bind tightly to the target cavity when it is essentially polar.
The presence of a hydrophobic cavity is important, but its size is also relevant for drug discovery. It should be large enough to accommodate an inhibitor. An analysis of 20 marketed drugs shows that they have a solvent-accessible surface ranging from 150 to 500 Å² [21], so the target cavity should accommodate such molecules. On the other hand, the cavity should not be so large that the key contact residues for the interaction are too distant from each other. In such cases, inhibitors designed to contact these different residues might be excessively large. Keeping the size of inhibitors small is important for their bioavailability. As a general trend, the larger a synthetic molecule is, the lower its bioavailability. The last criterion of the decision tree is the shape complementarity between the two interacting subunits within the cavity. The less favorable case is when both chains are densely packed and make many direct interactions within the cavity. As already mentioned, inhibitors should mimic the natural substrate, but they should also make additional contacts that help enhance their potency. The cavity should therefore contain atoms that are not directly engaged in the interaction between the two proteins such that it is possible to design molecules that interact directly with them. Interfaces that possess cavities with low complementarity might therefore be more attractive. Since water molecules are present in such cavities, the potency of the inhibitors could be enhanced if they are designed in such a way that upon binding they displace some key water molecules.

The analysis of protein interfaces using the proposed decision tree leads to the selection of competitive inhibitors because it focuses on the characterization of the contact region between the two proteins. However, it is important to note that molecules that do not bind at the interface can also inhibit protein-protein interactions. The potency of competitive inhibitors - as determined by the measure of their IC50 - is affected by the concentration of the substrate (Fig. 15.5-2). The higher the concentration of the substrate, the less potent the inhibitor becomes. Therefore, if the competing subunit is very abundant and/or very stable (low turnover), so that it accumulates after inhibition of the interaction, it might be more difficult to reach efficacy with low doses of a competitive inhibitor. Higher doses of inhibitor will have to be administered to counterbalance this effect, but then compound-related toxicity could arise. Molecules that are not competitive inhibitors do not suffer these disadvantages. These molecules, which do not bind at the interface, induce conformational changes that prevent complex formation. Several such inhibitors - allosteric inhibitors - have been identified; see, for example, Arkin in Table 15.5-1. However, it is very likely that this strategy does not apply to every protein complex. Furthermore, if such binding sites do exist, they must also possess structural and physicochemical properties that allow the design of potent compounds.
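The five criteria of the decision tree discussed above (structure, cavity, hydrophobicity, size, complementarity) can be written down as a simple checklist. The sketch below is only one possible encoding: the class, field, and function names are hypothetical, each criterion is reduced to a yes/no answer, and the LF/MF labels follow the legend of Fig. 15.5-1. As stressed in the text, an interface that fails some criteria is not thereby excluded as a target.

```python
from dataclasses import dataclass


@dataclass
class InterfaceProfile:
    """Yes/no answers to the five criteria (hypothetical field names)."""
    structure_available: bool
    has_defined_cavity: bool
    cavity_hydrophobic: bool
    cavity_fits_drug_sized_ligand: bool  # e.g. ligands of 150-500 A^2 surface [21]
    low_shape_complementarity: bool      # loosely packed cavity


# Criteria in the order in which the text walks through the tree.
CRITERIA = [
    ("structure", "structure_available"),
    ("cavity", "has_defined_cavity"),
    ("hydrophobicity", "cavity_hydrophobic"),
    ("size", "cavity_fits_drug_sized_ligand"),
    ("complementarity", "low_shape_complementarity"),
]


def evaluate(profile):
    """Label each criterion MF (more favorable) or LF (less favorable)."""
    return {
        label: "MF" if getattr(profile, attr) else "LF"
        for label, attr in CRITERIA
    }


def is_attractive(profile):
    """True only when every criterion is in its most favorable state."""
    return all(verdict == "MF" for verdict in evaluate(profile).values())


# The p53-hdm2 interface as characterized later in this section:
# all criteria favorable except its high shape complementarity.
p53_hdm2 = InterfaceProfile(True, True, True, True, False)
print(evaluate(p53_hdm2))
print("all criteria favorable:", is_attractive(p53_hdm2))
```

A graded score per criterion would probably be closer to practice than a strict yes/no answer; booleans are used here only to keep the sketch short.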
[Scheme: P1I ⇌ I + P1 + P2 ⇌ P1P2]
Fig. 15.5-2 Competitive inhibition. The inhibitor (I) binds to the target protein P1, blocking its association with protein P2. IC50 corresponds to the concentration of inhibitor required to inhibit/inactivate the P1P2 complex by 50%. Note the influence of [S] on IC50. Cheng and Prusoff have published a detailed analysis on the relationship between IC50 and inhibition of enzymes [22].
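The dependence of IC50 on the substrate concentration noted in the caption can be made concrete with the Cheng-Prusoff relation for competitive inhibition, IC50 = Ki (1 + [S]/Km). The sketch below transposes it to a protein-protein competition assay under stated assumptions: the competing protein plays the role of the substrate S, its dissociation constant for the target plays the role of Km, and free concentrations are taken to be equal to total concentrations. The function name and the numerical values are hypothetical.

```python
def ic50_competitive(ki, competitor_conc, competitor_kd):
    """Cheng-Prusoff relation for a competitive inhibitor.

    ki              inhibition constant of the inhibitor
    competitor_conc concentration of the competing protein (role of [S])
    competitor_kd   affinity of the competing protein (role of Km)
    All three values must be given in the same concentration unit.
    """
    if ki <= 0 or competitor_kd <= 0 or competitor_conc < 0:
        raise ValueError("constants must be positive, concentration non-negative")
    return ki * (1.0 + competitor_conc / competitor_kd)


# Hypothetical inhibitor with Ki = 10 nM against a partner with Kd = 100 nM.
for conc_nm in (0.0, 100.0, 1000.0):
    ic50 = ic50_competitive(ki=10.0, competitor_conc=conc_nm, competitor_kd=100.0)
    print(f"[competitor] = {conc_nm:6.0f} nM -> IC50 = {ic50:.0f} nM")
```

With these assumed values, raising the competitor concentration from zero to ten times its Kd raises the apparent IC50 elevenfold, while Ki stays unchanged.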
Table 15.5-1 Some articles reviewing the latest findings in the discovery of protein-protein interaction inhibitors. These articles cover the period 2000-2004

First author | Title | Reference
Arkin, M.R. | Small-molecule inhibitors of protein-protein interactions: progress toward the dream | Nat. Rev. Drug Discov. 2004, 3, 301
Pagliaro, L. | Emerging classes of protein-protein interaction inhibitors and new tools for their development | Curr. Opin. Chem. Biol. 2004, 8, 442
Janin, Y.L. | Peptides with anticancer use or potential | Amino Acids 2003, 25, 1
Berg, T. | Modulation of protein-protein interactions with small organic molecules | Angew. Chem. 2003, 42, 2462
Loregian, A. | Protein-protein interactions as targets for antiviral chemotherapy | Rev. Med. Virol. 2002, 12, 239
Ockey, D.A. | Inhibitors of protein-protein interactions | Expert Opin. Ther. Patents 2002, 12, 393
Huang, Z. | The chemical biology of apoptosis: exploring protein-protein interactions and the life and death of cells with small molecules | Chem. Biol. 2002, 9, 1059
Perez-Montfort, R. | The interfaces of oligomeric proteins as targets for drug design against enzymes from parasites | Curr. Top. Med. Chem. 2002, 2, 457
Toogood, P.L. | Inhibition of protein-protein association by small molecules: approaches and progress | J. Med. Chem. 2002, 45, 1543
Cochran, A.G. | Antagonists of protein-protein interactions | Chem. Biol. 2000, 7, R85
Zeng, J. | Computational structure-based design of inhibitors that target protein surfaces | Comb. Chem. High Throughput Screen. 2000, 3, 355
Huang, Z. | Structural chemistry and therapeutic intervention of protein-protein interactions in immune response, human immunodeficiency virus entry, and apoptosis | Pharmacol. Ther. 2000, 86, 201
15.5.4 Experimental Validation of the Selected Interface
All the selection criteria presented in the decision tree in Fig. 15.5-1 are general, and many protein interfaces will only fulfill some of them. In these cases - and also for interfaces that meet all the decision tree criteria - an experimental study of the interface should be carried out before starting drug discovery activities. This experimental validation should provide a good level of confidence in the druggability of the selected interface. A powerful way of performing this experimental validation is to combine site-directed mutagenesis and peptide-binding experiments. Site-directed mutagenesis is used to demonstrate the role of selected residues in the interaction, while peptides help in mapping the binding site and also in defining the importance of key amino acids. The synthesis of peptides containing nonnatural amino acids can also be used to create new contacts with the targeted subunit. This should help in validating some optimization strategies that could be used later on in the design of low-molecular-weight compounds. It must be kept in mind that peptides can be used only if at least one of the two contact regions at the interface is formed by a contiguous stretch of amino acids. This is not often the case, and many protein-binding sites are fragmented [8]. Peptides are also useful tools to demonstrate the validity of the biological concept and thereby show that the inhibition of the selected protein-protein interaction leads to the expected phenotype. Since peptides generally have a low bioavailability, they often have to be coupled to special sequences that facilitate their transport into cells [23]. Finally, the peptides can serve as starting points for a drug discovery programme. They can be transformed to peptidomimetics that - in some cases - can be further depeptidized.
15.5.5 Screening Techniques, Compound Libraries, and Targets
Since the goal of any drug discovery programme that deals with a protein-protein interaction is to identify low-molecular-weight compounds that bind to a well-defined pocket, the technologies and compound libraries used to identify enzyme inhibitors can also be used to identify protein-protein interaction inhibitors. Various assays are used to identify competitive inhibitors of protein-protein interactions, but the ones in which the inhibition of the complex is directly measured - competition assays - are the most commonly used. Several assay formats exist: enzyme-linked immunosorbent assay (ELISA), fluorescence polarization, fluorescence resonance energy transfer, and others. These assays are designed in such a way that they use either the two full-length proteins, only their interacting domains, or even, when possible, peptides that mimic the binding region. One must be very cautious with this type of assay when determining IC50s. The potency of competitive inhibitors depends on the amount of the competing protein present in the assay (Fig. 15.5-2), and this amount may vary between laboratories and even between different protein batches (change in specific activity). To obtain an accurate estimate of the binding properties of the inhibitors, their Kd⁴⁾ should be measured. The data obtained with the competition assay should therefore be complemented with Kd measurements obtained, for example, by isothermal calorimetry. Calorimetric measurements also provide valuable information about the energy of the interaction, which can be used to further optimize the compounds (e.g., to generate more enthalpy-driven or entropy-driven compounds [24]).

4) Kd: apparent dissociation constant of the protein-inhibitor complex.
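The link between a measured Kd and the energy terms mentioned above can be sketched with two standard thermodynamic relations: ΔG = RT ln Kd (Kd expressed in mol/L, 1 M standard state) and -TΔS = ΔG - ΔH, where ΔH is the enthalpy reported by a calorimetric experiment. The function names and the example values below are hypothetical and serve only as an illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar, temperature=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature * math.log(kd_molar)


def minus_t_delta_s(delta_g, delta_h):
    """Entropic term -T*dS obtained as dG - dH (all values in kcal/mol)."""
    return delta_g - delta_h


# Hypothetical compounds: a 1 uM hit and a 10 nM optimized inhibitor.
hit = delta_g_from_kd(1e-6)
optimized = delta_g_from_kd(1e-8)
print(f"hit:       dG = {hit:.2f} kcal/mol")
print(f"optimized: dG = {optimized:.2f} kcal/mol")
print(f"gain for a 100-fold lower Kd: {hit - optimized:.2f} kcal/mol")

# With an assumed calorimetric dH of -6.0 kcal/mol for the optimized compound:
print(f"-T*dS = {minus_t_delta_s(optimized, -6.0):.2f} kcal/mol")
```

Splitting ΔG into its enthalpic and entropic parts in this way is what allows a compound to be described as enthalpy-driven or entropy-driven.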
The other assays used to identify protein-protein interaction inhibitors are the binding assays. In these cases, only one of the two interacting chains is present and the binding of the compounds to this protein is measured. Several assay formats are used: surface plasmon resonance, ¹H-¹⁵N heteronuclear single quantum correlation nuclear magnetic resonance (NMR), ultracentrifugation, and others. Many of these methods only indicate that the compounds bind to the target protein, but they do not show that their binding inhibits the interaction. This needs to be demonstrated in a subsequent analysis (e.g., with a competition assay). It is important to note that, in some competition and binding assays, it is difficult to directly determine whether the inhibiting molecules are competitive inhibitors. The inhibitors may bind to a pocket located outside the interacting region and modulate the interaction by an allosteric effect. To allow a better optimization of these inhibitors, their binding mode should be firmly demonstrated. It is essential in this process to determine the structure of the inhibitor-protein complex. All types of compound libraries can be screened to identify protein-protein interaction inhibitors: low-molecular-weight compound libraries, natural compound libraries, peptide/peptidomimetic libraries, combinatorial chemistry libraries, fragment libraries, and so on. A simple literature survey shows that molecules belonging to these different types of libraries are described as protein-protein interaction inhibitors. However, there is an argument sometimes cited in the literature about the diversity of compounds in these libraries: the libraries available in the pharmaceutical companies reflect their drug discovery history. Since most of them have focused on the design of enzyme inhibitors, it is possible that the structural diversity of their libraries might not match what is required to identify protein-protein interaction inhibitors.
Although this might be the case, the increasing number of drug discovery programmes dealing with protein interfaces will ensure that the chemical diversity of these libraries will change, and they may contain more compounds that prevent protein-protein interactions. An alternative explanation for the low success rate when randomly screening large libraries for protein-protein interaction inhibitors is that the selected interfaces have low druggability and that, independent of the chemical diversity of these libraries, the probability of finding inhibitors is also low. The availability of the three-dimensional structure of the protein complex allows structure-driven drug discovery approaches. In this case, a pharmacophore model is first established. This corresponds to identifying the interactions that take place at the interface and which contribute most to ΔG. The importance of these interactions can be validated by site-directed mutagenesis or, when possible, by the use of peptides. Once these interactions are validated, molecules containing chemical groups mimicking these key interactions are selected from compound libraries and tested. Very often these initial molecules are not optimal (e.g., they do not make all the key contacts) and they must be modified to enhance their potency. This is done, for example, by adding the missing pharmacophores and/or by creating contacts that are not present in the natural complex. Alternatively, de novo drug design may be carried out. In this case a “very basic” scaffold - which mimics only a few of the key interactions made by the competing subunit - is selected and modified progressively to obtain molecules that contain the different pharmacophores. This of course is very time consuming and resource demanding, because the affinity of the initial scaffold is usually low and a great deal of chemistry is required to improve its potency. The structure-driven and screening approaches are not mutually exclusive, but the former requires a good comprehension of the interaction, while the latter can be used without information regarding the target interface. The list of protein-protein interactions that have been the subject of drug discovery programmes is constantly increasing, and many excellent articles have reviewed the latest findings in this area. Some of these reviews are listed in Table 15.5-1, and they can provide the reader with an idea of the protein-protein interactions that have already been selected as targets for drug discovery programmes and of the inhibitors that have been identified in these studies. In the following section, we will focus on one protein interface: the p53-hdm2 interface. This protein-protein interaction has been selected from the literature because, when the various results are put together, the work carried out by the different research groups working on this interface makes up a very nice case study for the design of competitive protein-protein interaction inhibitors.

15.5.6 An Example: The Design of Inhibitors of the p53-hdm2 Interaction
15.5.6.1 Biological Background
The p53 protein is a transcription factor that regulates the expression of several genes with different biological functions, such as cell-cycle regulation, apoptosis, DNA repair, and differentiation [25]. The loss of p53 function has dramatic consequences and the p53 gene is deleted or mutated in more than 50% of human cancers [26]. The overexpression of the hdm2 protein can also lead to the inactivation of p53. The p53 and hdm2 proteins form an autoregulatory feedback loop [27, 28]: p53 stimulates the expression of hdm2, which in turn acts negatively on p53 in several ways (Fig. 15.5-3). It inhibits its transcriptional activity [29], promotes its degradation [30, 31], and favors its export from the nucleus [32]. The hdm2 gene is amplified in about 7% of human cancers [33], and hdm2 is overexpressed in different types of tumors [34, 35]. It is therefore likely that the p53 pathway is not active in these tumors, because the overexpressed hdm2 protein constantly inhibits the p53 protein. The idea that several pharmaceutical companies have pursued is to generate molecules which, by preventing the p53-hdm2 interaction, will activate the p53 pathway in these tumors and thereby show anticancer activity.
Fig. 15.5-3 Regulation of p53 by hdm2. The tumor suppressor p53 is a tetrameric transcription factor. Under various stress conditions, such as DNA damage, activation of various oncogenes, or hypoxia, p53 is activated and binds to DNA. Depending on the cell line and/or the cellular stress, p53 induces either a cell-cycle arrest or apoptosis. p53 is also able to mediate other biological responses such as senescence. hdm2 is a negative regulator of p53. Upon binding to p53 it inhibits its transcriptional activity, promotes its degradation, and favors its export from the nucleus. Therefore, in the presence of hdm2 the tumor suppressor activity of p53 is inhibited.

15.5.6.2 Characterization of the Interface
Yeast two-hybrid screen [3G]and immunoprecipitation experiments [37]were initially used to map the two contact regions between both the proteins. The hdm2-binding domain on p53 was localized between residues 1 and 52 [36, 371, and the p53-binding domain on hdm2 between residues 1 and 118 [36, 371. Further studies using site-directed mutagenesis identified Leul4, Phel9, Leu22, and Trp23 as key p53 contact residues [38], and a minimal hdm2-binding site on the p53 protein was mapped between residues 18 and 23 [39]. The strength of the interaction (&)between p53 peptides and hdm2 fragments has been determined by several methods and, depending on the length of these fragments and the methodology used, & values between GO and 700 nM have been obtained. The availability of the structure of a p53 peptide (residues 15-29) in complex with a hdm2 fragment (residues 17-125) permits a more detailed analysis of the interface (Fig. 15.5-4(a))[40].The p53-binding site on the hdm2 protein is a cleft, about 25 A long and 10 A wide. In the bound p53 peptide, residues 19 to 25 form an a-helix, and residues 17, 18, and 26 to 29 take a more
15.5 Drugs Targeting Protein-Protein lnteractions
extended conformation. The structure of the bound p53 peptide is stabilized by several intramolecular hydrogen bonds. This first observation indicates that hdm2 is the only one of the two proteins to possess a well-defined pocket. Inhibitors then have to be designed in such a way that they mimic p53. The calculated accessible surface area buried at the interface on hdm2 and p53 is about 660 and 809A2, respectively. So the interface between these two proteins is not excessively large, and it can accommodate standard sized drugs (see above). The determination of the planarity [8]of the hdm2 contact region is 3.1. This confirms that the contact region is not flat but twisted in agreement with the presence of the above-described pocket. NMR experiments show that p53-derived peptides do not take a well-defined structure in solution [41, 421 suggesting that the p53 fragments only adopt the observed helical conformation when bound to hdm2. This structural organization of p53 upon binding is associated with a decrease in entropy, and experimental data give a change in entropy of -40.4 cal mol-' for the binding of a p53 fragment to hdm2 [43]. Upon p53 binding, conformational changes are also detected within the hdm2 protein [44, 451. The interaction between p53 and hdm2 is essentially hydrophobic, and 70% of the atoms at the interface are nonpolar. The three amino acids Phel9, Trp23, and Leu26 from p53 are located on the same side of the helix and their lateral chain point is toward the hdm2 protein (Fig. 15.5-4(a)).These amino acids make several interactions with hydrophobic hdm2 residues (Leu54, Leu57, Ile61, Met62, Tyr67, Va175, Va193, Phe86, Ile99, Phe91, and Ile103). Only three direct hydrogen bonds are present at the interface (p53 Phel9 - hdm2 Gln72; p53 Trp23 - hdm2 Leu54; p53 Am29 - hdm2 TyrlOO),and there is no water molecule bridging the two contact regions. This suggests high packing at the interface. 
Indeed, the gap volume [46] between the two proteins is 892 Å³ and the gap volume index (ratio between the gap volume and the interface accessible surface area) [8] is 0.61 Å. Altogether, the structural study of the p53-hdm2 interface suggests that there is a good likelihood of it being a druggable target. It fits most of the criteria of the decision tree presented in Fig. 15.5-1 (except for its high shape complementarity). Furthermore, since the p53 contact region is formed by only one segment of contiguous amino acids, peptides mimicking p53 can be used to establish or confirm a pharmacophore model and to study the effect of the inhibition of the p53-hdm2 interaction in tumor cells.
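The gap volume index quoted above can be rechecked with a few lines of arithmetic. The sketch below is only an illustration: it assumes that the "interface accessible surface area" in the denominator is the sum of the areas buried on hdm2 and on p53 (660 + 809 Å²), an assumption that happens to reproduce the reported value. The function name is hypothetical.

```python
def gap_volume_index(gap_volume_a3: float, interface_asa_a2: float) -> float:
    """Return gap volume (Å^3) divided by interface accessible surface area (Å^2), in Å."""
    if interface_asa_a2 <= 0:
        raise ValueError("interface accessible surface area must be positive")
    return gap_volume_a3 / interface_asa_a2


# Values quoted in the text for the p53-hdm2 interface.
BURIED_ASA_HDM2 = 660.0  # Å^2
BURIED_ASA_P53 = 809.0   # Å^2
GAP_VOLUME = 892.0       # Å^3

# Assumption: the interface area is the sum of both buried surfaces.
interface_asa = BURIED_ASA_HDM2 + BURIED_ASA_P53
index = gap_volume_index(GAP_VOLUME, interface_asa)
print(f"gap volume index = {index:.2f} Å")
```

A small index indicates that little empty space is left between the two surfaces, which is the sense in which the interface is described as tightly packed.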
15.5.6.3
Establishment of a Pharmacophore Model and its Validation
The structure of p53 in complex with hdm2 [40] and the initial data obtained with p53-derived peptides [36, 39] indicate that peptides can be used to study this interaction and to establish a pharmacophore model. Phage display experiments [47] allowed the identification of a 12-mer phage-derived peptide (peptide 2, Table 15.5-2) that is 29 times more potent than the wild-type peptide (peptide 1, Table 15.5-2) [48]. Peptide 2 was truncated to eight residues, leading to a peptide with micromolar activity (peptide 3, Table 15.5-2) [48]. It should
Fig. 15.5-4 The structure of p53 (residues 17 to 29) in complex with hdm2 (residues 25 to 109) [40]. (a) The surface of hdm2 is represented in white, the p53-binding site in green, and the p53 peptide in red. The side chains of p53 Phe19, Trp23, and Leu26 are shown. (b) p53 Leu22 has been replaced by a tyrosine residue and the side chain is manually located in the structure of the p53-hdm2 complex. The backbone of the p53 peptide is shown in gray and hdm2 Lys94 is represented. (c) The different hdm2 residues (Leu57, Phe86, Ile99, and Ile103) surrounding p53 Trp23 are indicated and their van der Waals surface is represented in green.
Table 15.5-2 Examples of peptidic inhibitors used as tool compounds for studying the p53-hdm2 interaction. The IC50 values were obtained in a competition assay [49]. The positions of the three key residues Phe19, Trp23, and Leu26 are indicated.

Peptide   Sequence                                                            IC50 (µM)
1         Ac-Gln-Glu-Thr-Phe19-Ser-Asp-Leu-Trp23-Lys-Leu-Leu26-Pro-NH2        8.7
2         Ac-Met-Pro-Arg-Phe19-Met-Asp-Tyr-Trp23-Glu-Gly-Leu26-Asn-NH2        0.3
3         Ac-Phe19-Met-Asp-Tyr-Trp23-Glu-Gly-Leu26-NH2                        8.9
4         Ac-Phe19-Met-Aib-Tyr-Trp23-Glu-Ac3c-Leu26-NH2                       2.2
5         Ac-Phe19-Met-Aib-Pmp-6-Cl-Trp23-Glu-Ac3c-Leu26-NH2                  0.005

Aib = α-aminoisobutyric acid; Ac3c = 1-aminocyclopropanecarboxylic acid; Pmp = phosphonomethylphenylalanine; 6-Cl-Trp = 6-chlorotryptophan.
be noted that further deletions of peptide 3, which remove the essential residues Phe19 or Leu26, induce a dramatic drop in activity. Since short peptides are usually very flexible in solution, and because p53 takes a well-ordered structure when bound to hdm2, the next step was to reduce the flexibility of peptide 3 in order to decrease the entropic penalty "paid" upon binding. The two nonnatural amino acids α-aminoisobutyric acid (Aib) and 1-aminocyclopropanecarboxylic acid (Ac3c) were used to fix the conformation of the peptides in solution [49, 50]. Different peptides were synthesized, and the more potent peptide 4 was obtained (Table 15.5-2). NMR measurements confirm a higher preorganization in solution for peptide 4. This peptide was then modified to determine whether its potency could be improved by making new interactions with the hdm2 protein. Tyr22 was replaced by a phosphonomethylphenylalanine (Pmp) and Trp23 by a 6-chlorotryptophan (6-Cl-Trp) [49]. The modification at position 22 creates a salt bridge with the amino group of hdm2 Lys94 (Fig. 15.5-4(b)). The chlorine atom added at position 6 of Trp23 fills a small hydrophobic cavity formed by the hdm2 residues Leu57, Phe86, Ile99, and Ile103, which is unoccupied in the p53-hdm2 complex (Fig. 15.5-4(c)). Making these new contacts via Pmp22 and 6-Cl-Trp23 results in an approximately 440-fold increase in the potency of peptide 4 (compare peptides 4 and 5, Table 15.5-2). This gain in potency is probably associated with a more favorable enthalpic contribution to the binding energy. Altogether, this study with the peptides shows that the key contacts made by p53 Phe19, Trp23, and/or Leu26 are important for the binding of p53 to hdm2 and, therefore, nonpeptidic inhibitors should mimic these important interactions.
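The potency ratios quoted in this section (29-fold for peptide 2 over peptide 1, about 440-fold for peptide 5 over peptide 4) follow directly from the IC50 values in Table 15.5-2. The sketch below recomputes them and converts a ratio into an apparent free-energy difference with ΔΔG = -RT ln(ratio). This conversion is an illustration only: it assumes that IC50 ratios measured in the same competition assay track the ratios of the underlying dissociation constants, and the function names are hypothetical.

```python
import math

# IC50 values (µM) from Table 15.5-2, keyed by peptide number.
IC50_UM = {1: 8.7, 2: 0.3, 3: 8.9, 4: 2.2, 5: 0.005}

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def fold_gain(reference: int, improved: int) -> float:
    """Potency ratio IC50(reference) / IC50(improved)."""
    return IC50_UM[reference] / IC50_UM[improved]


def apparent_ddg(reference: int, improved: int, temperature_k: float = 298.15) -> float:
    """Apparent binding free-energy change (kcal/mol); negative means tighter binding.

    Assumes the IC50 ratio approximates the ratio of dissociation constants.
    """
    return -R_KCAL_PER_MOL_K * temperature_k * math.log(fold_gain(reference, improved))


print(f"peptide 2 vs 1: {fold_gain(1, 2):.0f}-fold")
print(f"peptide 5 vs 4: {fold_gain(4, 5):.0f}-fold")
print(f"apparent ddG (4 -> 5): {apparent_ddg(4, 5):.1f} kcal/mol")
```

Under these assumptions the 440-fold gain corresponds to roughly 3.6 kcal/mol of additional binding free energy at room temperature, an amount that a salt bridge and a better-filled hydrophobic cavity could plausibly provide together.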
The work carried out with the peptides containing nonnatural amino acids also indicates that, despite the high complementarity of the interface, it is possible to create additional interactions with hdm2
(e.g., Pmp22 and 6-Cl-Trp23), enhancing the potency of the inhibitors. This could also be exploited with nonpeptidic inhibitors. The peptides were also used to demonstrate that inhibition of the p53-hdm2 interaction in tumor cells leads to activation of the p53 pathway. Three different strategies have been used to introduce the p53 peptides into cells. Peptide 2 has been inserted into the Escherichia coli thioredoxin protein [51] or fused to the glutathione S-transferase protein [52], and peptide 5 has been used directly without further modification [53, 54]. The data obtained with these different tools reveal that p53-hdm2 interaction inhibitors stimulate p53 activity (as measured by the induction of p53-regulated genes) in different tumor cells. These results are expected, since preventing the hdm2-mediated degradation of p53 should induce its accumulation in cells and, as a consequence, its activation. The activation of p53 by the peptides induces either cell cycle arrest or apoptosis, depending on the tumor cell line, revealing that p53-hdm2 inhibitors have an antiproliferative effect and therefore behave as anticancer drugs. The peptides were also used to study the effect of inhibiting the p53-hdm2 interaction in vivo. A p53 peptide (residues 16-27) was linked to the Tat transduction sequence and was used in New Zealand white rabbits with intraocular retinoblastoma [55]. Injecting this peptide into the interior chamber induced tumor regression, and apoptosis was observed. This effect is specific to the tumor cells, since the peptide induced damage only to the tumor and not to the surrounding ocular tissues (lens, cornea, retina, etc.). These in vivo experiments suggest that inhibitors of the p53-hdm2 interaction have anticancer activity in vivo and, in addition, that they may not be toxic to nontumour tissues. This latter information is of importance, since p53-hdm2 inhibitors also activate p53 in nontumour cells [54].
Biological validation is a key step in any drug discovery programme because, even if a protein-protein interaction is a “top” drug target for medicinal chemistry, its inhibition should lead to the expected biological output. In the case of the p53-hdm2 interaction, the results obtained both in vitro and in vivo tend to demonstrate that inhibitors of this interaction will exert an anticancer activity in at least some tumors.
15.5.6.4
The Synthesis of Low-molecular-weight Compounds
For many years, the only synthetic low-molecular-weight inhibitors of the p53-hdm2 interaction described were not very potent. Only chalcone derivatives (6 - Fig. 15.5-5) [45], some polycyclic compounds (7 - Fig. 15.5-5) [56], and sulfonamides (8 - Fig. 15.5-5) [57] were described. A fungal metabolite, chlorofusin (9 - Fig. 15.5-5), was also described as an inhibitor of the p53-hdm2 interaction [58]. Finally, 1,4-benzodiazepine-2-ones were proposed from a computational approach (10 - Fig. 15.5-5) [59]. These data were not very encouraging and, despite the attractiveness of this approach, it seemed that not only the druggability of the p53-hdm2 interaction
Fig. 15.5-5 Low-molecular-weight inhibitors of the p53-hdm2 interaction. 6 Chalcone derivative [45], 7 polycyclic compound [56], 8 sulfonamide [57], 9 chlorofusin, 10 1,4-benzodiazepine-2-one [59], 11 cis-imidazoline [60].
was not as good as predicted by the structural analysis of the interface but also obtaining potent low-molecular-weight inhibitors was not an achievable goal. However, scientists at Hoffmann-La Roche recently demonstrated the feasibility of inhibiting the p53-hdm2 interaction with low-molecular-weight compounds. It took about 10 years from the publication of the first reports on peptidic inhibitors of the p53-hdm2 interaction to obtain such results. By screening a library of synthetic chemicals, Vassilev et al. were able to identify cis-imidazolines (11 - Fig. 15.5-5), which they optimized for potency and specificity [60]. These compounds bind at the p53-binding site on hdm2, and their different substituents mimic the key contacts made by p53 Phe19, Trp23, and Leu26 (Fig. 15.5-6). Furthermore, the halogen (Cl or Br) present on one of their phenyl groups mimics the chlorine atom of 6-Cl-Trp in peptide 5. Finally, these molecules are built around a heterocycle and have a rigid conformation that minimizes the entropic penalty paid upon binding. Their potency (IC50), measured in a competition assay, is in the 100 to 300 nM
Fig. 15.5-6 Binding mode of cis-imidazoline and p53 peptide. The structures of the cis-imidazoline-hdm2 complex [60] and the p53-hdm2 complex [40] have been superimposed. Only the bound cis-imidazoline and the p53 peptide (in red) are represented. The side chains of p53 Phe19, Trp23, and Leu26 are shown.
range. These compounds are active in various tumor cells (IC50 between 1 and 2 µM), in which they induce the activation of the p53 pathway. More importantly, they show efficacy as single agents in a tumor model in mice. One of these compounds (11 - Fig. 15.5-5), given orally at a dose of 200 mg kg⁻¹ twice daily for 20 days, induces 90% inhibition of tumor growth (i.e., of cells overexpressing hdm2). This treatment does not induce toxicity as judged by body weight measurements and necropsy. These data are highly encouraging, and it will be very exciting to see the effect of these molecules, or of their follow-up compounds, in the clinic.
15.5.7 Conclusions
The design of protein-protein interaction inhibitors is a hot topic in drug discovery today because many protein interfaces are exciting targets for pharmaceutical companies. However, one should be cautious while making any assumption that designing protein-protein interaction inhibitors will be a new Eldorado for the pharmaceutical industry or conversely that programmes based on protein-protein interactions should be avoided because of the low probability of obtaining potent inhibitors. Protein interfaces are quite unique,
and the only way to decide whether an interface is a “good” or a “bad” target for drug discovery is to carry out a careful analysis of its structure before starting any drug discovery activity. This should help in selecting better targets, thereby reducing the risk of investing time and resources in programmes that do not deliver the expected molecules. The p53-hdm2 interaction is one example of the interfaces that have been successfully targeted with low-molecular-weight compounds (see also Table 15.5-1). Many other protein-protein interactions are under investigation, and it is likely that new inhibitors of protein-protein interaction will be described in the future.
References
1. A.L. Hopkins, C.R. Groom, The druggable genome, Nat. Rev. Drug Discov. 2002, 1, 727-730.
2. A.R. Fersht, Enzyme Structure and Mechanism, 2nd ed., Freeman, New York, 1985.
3. P. Chene, The ATPases: a new family for a family-based drug design approach, Expert Opin. Ther. Targets 2003, 7, 453-461.
4. S. Li, C.M. Armstrong, N. Bertin, H. Ge, S. Milstein, M. Boxem, P.O. Vidalain, J.D. Han, A. Chesnau, T. Hao, D.S. Goldberg, A map of interactome network of the metazoan C. elegans, Science 2004, 303, 540-543.
5. B. Kleizen, I. Braakman, Protein folding and quality control in the endoplasmic reticulum, Curr. Opin. Cell Biol. 2004, 16, 343-349.
6. I.M.A. Nooren, J.M. Thornton, Diversity of protein-protein interactions, EMBO J. 2003, 22, 3486-3492.
7. R.P. Bahadur, P. Chakrabarti, F. Rodier, J. Janin, Dissecting subunit interfaces in homodimeric proteins, Proteins 2003, 53, 708-719.
8. S. Jones, J.M. Thornton, Principles of protein-protein interactions, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 13-20.
9. L. Lo Conte, C. Chothia, J. Janin, The atomic structure of protein-protein recognition sites, J. Mol. Biol. 1999, 285, 2177-2198.
10. S.J. Wodak, J. Janin, Structural basis of macromolecular recognition, Adv. Protein Chem. 2003, 61, 9-73.
11. I.M. Nooren, J.M. Thornton, Structural characterisation and functional significance of transient protein-protein interactions, J. Mol. Biol. 2003, 325, 991-1018.
12. N. Brooijmans, K.A. Sharp, I.D. Kuntz, Stability of macromolecular complexes, Proteins 2002, 48, 645-653.
13. W.L. DeLano, Unraveling hot spots in binding interfaces: progress and challenges, Curr. Opin. Struct. Biol. 2002, 12, 14-20.
14. P. Chakrabarti, J. Janin, Dissecting protein-protein recognition sites, Proteins 2002, 47, 334-342.
15. J. Janin, Wet and dry interfaces: the role of solvent in protein-protein and protein-DNA recognition, Structure 1999, 7, R277-R279.
16. C.J. Tsai, S.L. Lin, H.J. Wolfson, R. Nussinov, Protein-protein interfaces: architectures and interactions in protein-protein interfaces and in protein cores. Their similarities and differences, Crit. Rev. Biochem. Mol. Biol. 1996, 31, 127-152.
17. T.A. Larsen, A.J. Olson, D.S. Goodsell, Morphology of protein-protein interfaces, Structure 1998, 6, 421-427.
18. Y. Ofran, B. Rost, Analysing six types of protein-protein interfaces, J. Mol. Biol. 2003, 325, 377-387.
19. C. Cole, J. Warwicker, Side-chain conformational entropy at protein-protein interfaces, Protein Sci. 2002, 11, 2860-2870.
20. J.A. Wells, Binding in the growth hormone receptor, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 7-12.
21. T.R. Gadek, J.B. Nicholas, Small molecule antagonists of proteins, Biochem. Pharmacol. 2003, 65, 1-8.
22. Y.C. Cheng, W.H. Prusoff, Relationship between the inhibition constant (Ki) and the concentration of inhibitor which causes 50 per cent inhibition (IC50) of an enzymatic reaction, Biochem. Pharmacol. 1973, 22, 3099-3108.
23. J.J. Schwartz, S. Zhang, Peptide-mediated cellular delivery, Curr. Opin. Mol. Ther. 2000, 2, 162-167.
24. A. Velazquez-Campoy, I. Luque, E. Freire, The application of thermodynamic methods in drug design, Thermochim. Acta 2001, 380, 217-227.
25. K.H. Vousden, X. Lu, Live or let die: the cell's response to p53, Nat. Rev. Cancer 2002, 2, 594-604.
26. T. Soussi, K. Dehouche, C. Beroud, p53 website and analysis of p53 gene mutations in human cancer: forging a link between epidemiology and carcinogenesis, Hum. Mutat. 2000, 15, 105-213.
27. S.M. Picksley, D.P. Lane, The p53-mdm2 autoregulatory feedback loop: a paradigm for the regulation of growth control by p53?, BioEssays 1993, 15, 689-690.
28. X. Wu, J.H. Bayle, D. Olson, A.J. Levine, The p53-mdm-2 autoregulatory feedback loop, Genes Dev. 1993, 7, 1126-1132.
29. J. Momand, G.P. Zambetti, D.C. Olson, D. George, A.J. Levine, The mdm-2 oncogene product forms a complex with the p53 protein and inhibits p53-mediated transactivation, Cell 1992, 69, 1237-1245.
30. R. Honda, H. Yasuda, Activity of MDM2, a ubiquitin ligase, toward p53 or itself is dependent on the RING finger domain of the ligase, Oncogene 2000, 19, 1473-1476.
31. S. Fang, J.P. Jensen, R.L. Ludwig, K.H. Vousden, A.M. Weissman, Mdm2 is a RING finger-dependent ubiquitin protein ligase for itself and p53, J. Biol. Chem. 2000, 275, 8945-8951.
32. J. Roth, M. Dobbelstein, D.A. Freedman, T. Shenk, A.J. Levine, Nucleo-cytoplasmic shuttling of the hdm2 oncoprotein regulates the levels of the p53 protein via a pathway used by the human immunodeficiency virus rev protein, EMBO J. 1998, 17, 554-564.
33. J. Momand, D. Jung, S. Wilczynski, J. Niland, The MDM2 gene amplification database, Nucleic Acids Res. 1998, 26, 3453-3459.
34. B. Eymin, S. Gazzeri, C. Brambilla, E. Brambilla, Mdm2 overexpression and p14ARF inactivation are two mutually exclusive events in primary human lung tumors, Oncogene 2002, 21, 2750-2761.
35. D. Polsky, B.C. Bastian, C. Hazan, K. Melzer, J. Pack, A. Houghton, K. Busam, C. Cordon-Cardo, I. Osam, hdm2 protein overexpression, but not amplification, is related to tumorigenesis of cutaneous melanoma, Cancer Res. 2001, 61, 7642-7646.
36. J.D. Oliner, J.A. Pietenpol, S. Thiagalingam, J. Gyuris, K.W. Kinzler, B. Vogelstein, Oncoprotein mdm2 conceals the activation domain of tumour suppressor p53, Nature 1993, 362, 857-860.
37. J. Chen, V. Marechal, A.J. Levine, Mapping of the p53 and mdm-2 interaction domains, Mol. Cell. Biol. 1993, 13, 4107-4114.
38. J. Lin, J. Chen, B. Elenbaas, A.J. Levine, Several hydrophobic amino acids in the p53 amino-terminal domain are required for transcriptional activation, binding to mdm-2 and the adenovirus 5 E1B 55-kD protein, Genes Dev. 1994, 8, 1235-1246.
39. S.M. Picksley, B. Vojtesek, A. Sparks, D.P. Lane, Immunochemical analysis of the interaction of p53 with mdm2; fine mapping of the mdm2 binding site on p53 using synthetic peptides, Oncogene 1994, 9, 2523-2529.
40. P.H. Kussie, S. Gorina, V. Marechal, B. Elenbaas, J. Moreau, A.J. Levine, N.P. Pavletich, Structure of the mdm2 oncoprotein bound to the p53 tumor suppressor transactivation domain, Science 1996, 274, 948-953.
41. M.J.J. Blommers, G. Fendrich, C. Garcia-Echeverria, P. Chene, On the interaction between p53 and mdm2: transfer NOE study of p53-derived peptide ligated to mdm2, J. Am. Chem. Soc. 1997, 119, 3425-3426.
42. M. Uesugi, G.L. Verdine, The α-helical FXXFF motif in p53: TAF interaction and discrimination by mdm2, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 14801-14806.
43. Z. Lai, K.R. Auger, C.M. Manubay, R.A. Copeland, Thermodynamics of p53 binding to hdm2(1-126): effects of phosphorylation and p53 peptide length, Arch. Biochem. Biophys. 2000, 381, 278-284.
44. O. Schon, A. Friedler, M. Bycroft, S.M.V. Freund, A.R. Fersht, Molecular mechanism of the interaction between mdm2 and p53, J. Mol. Biol. 2002, 323, 491-501.
45. R. Stoll, C. Renner, S. Hansen, S. Palme, C. Klein, A. Belling, W. Zeslawski, M. Kamionka, T. Rehm, P. Muhlhahn, R. Schumacher, F. Hesse, B. Kaluza, W. Voelter, R.A. Engh, T.A. Holak, Chalcone derivatives antagonize interactions between the human oncoprotein MDM2 and p53, Biochemistry 2001, 40, 336-344.
46. R.A. Laskowski, SURFNET: a program for visualizing molecular surfaces, cavities and intramolecular interactions, J. Mol. Graph. 1995, 13, 323-330.
47. V. Bottger, A. Bottger, S.F. Howard, S.M. Picksley, P. Chene, C. Garcia-Echeverria, H.K. Hochkeppel, D.P. Lane, Identification of novel mdm2 binding peptides by phage display, Oncogene 1996, 13, 2141-2147.
48. A. Bottger, V. Bottger, C. Garcia-Echeverria, P. Chene, H.K. Hochkeppel, W. Sampson, K. Ang, S.F. Howard, S.M. Picksley, D.P. Lane, Molecular characterization of the hdm2-p53 interaction, J. Mol. Biol. 1997, 269, 744-756.
49. C. Garcia-Echeverria, P. Chene, M.J. Blommers, P. Furet, Discovery of potent antagonists of the interaction between human double minute 2 and tumor suppressor p53, J. Med. Chem. 2000, 43, 3205-3208.
50. R. Banerjee, G. Basu, P. Chene, S. Roy, Aib-based peptide backbone as scaffolds for helical peptide mimics, J. Pept. Res. 2002, 60, 88-94.
51. A. Bottger, V. Bottger, A. Sparks, W.L. Liu, S.F. Howard, D.P. Lane, Design of a synthetic Mdm2-binding mini protein that activates the p53 response in vivo, Curr. Biol. 1997, 7, 860-869.
52. C. Wasylyk, R. Salvi, M. Argentini, C. Dureuil, I. Delumeau, J. Abecassis, L. Debussche, B. Wasylyk, p53 mediated death of cells overexpressing MDM2 by an inhibitor of MDM2 interaction with p53, Oncogene 1999, 18, 1921-1934.
53. P. Chene, J. Fuchs, J. Bohn, C. Garcia-Echeverria, P. Furet, D. Fabbro, A small synthetic peptide, which inhibits the p53-hdm2 interaction, stimulates the p53 pathway in tumour cell lines, J. Mol. Biol. 2000, 299, 245-253.
54. P. Chene, J. Fuchs, I. Carena, P. Furet, C. Garcia Echeverria, Study of the cytotoxic effect of a peptidic inhibitor of the p53-hdm2 interaction in tumour cells, FEBS Lett. 2002, 529, 293-297.
55. J.W. Harbour, L. Worley, D. Ma, M. Cohen, Transducible peptide therapy for uveal melanoma and retinoblastoma, Arch. Ophthalmol. 2002, 120, 1341-1346.
56. J. Zhao, M. Wang, J. Chen, A. Luo, X. Wang, M. Wu, D. Yin, Z. Liu, The initial evaluation of non-peptidic small-molecule HDM2 inhibitors based on p53-HDM2 complex structure, Cancer Lett. 2002, 183, 69-77.
57. P.S. Galatin, D.J. Abraham, A nonpeptidic sulfonamide inhibits the p53-mdm2 interaction and activates p53-dependent transcription in mdm2-overexpressing cells, J. Med. Chem. 2004, 47, 4163-4165.
58. S.J. Duncan, S. Gruschow, D.H. Williams, C. McNicholas, R. Purewal, M. Hajek, M. Gerlitz, S. Martin, S.K. Wrigley, M. Moore, Isolation and structure elucidation of chlorofusin, a novel p53-mdm2 antagonist from a Fusarium sp, J. Am. Chem. Soc. 2001, 123, 554-560.
59. N. Majeux, M. Scarsi, A. Caflisch, Efficient electrostatic model for protein-fragment docking, Proteins 2001, 42, 256-268.
60. L.T. Vassilev, B.T. Vu, B. Graves, D. Carvajal, F. Podlaski, Z. Filipovic, N. Kong, U. Kammlott, C. Lukacs, C. Klein, N. Fotouhi, E.A. Liu, In vivo activation of the p53 pathway by small-molecule antagonists of mdm2, Science 2004, 303, 844-848.
Chemical Biology. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
16 Prediction of ADMET Properties
Ulf Norinder and Christel A. S. Bergström
Outlook
This chapter describes some of the approaches and techniques used currently to derive in silico models for the prediction of absorption, distribution, metabolism, elimination/excretion, and toxicity (ADMET) properties. The chapter also discusses some of the fundamental requirements for deriving statistically sound and predictive ADMET relationships as well as some of the pitfalls and problems encountered during these investigations. It is the intention of the authors to make the reader aware of some of the challenges involved in deriving useful in silico ADMET models for drug development.
16.1 Introduction
With the use of genomics, proteomics, and bioinformatics, the possibility of identifying and validating target proteins has recently improved. Once the target has been identified, the search starts for a pharmacophore, that is, a structural fragment that binds to the target and exerts the effect, with an acceptable therapeutic potency. After such a structure has been found, lead optimization is initiated. Computational chemistry (CC) and high-throughput screening (HTS) are used to synthesize new compounds and optimize them with regard to increased potency. The lead optimization is performed in cycles, and in the end the leads with the highest potency might be structurally rather different from the starting structure. The obtained chemical library can be composed of several thousand new structures. The synthesized library is experimentally examined for developability with the use of rapid experimental techniques for measuring, for example, stability, solubility, permeability, and toxicity. After
Fig. 16-1 From target identification to candidate drug (CD). Target identification and validation are followed by lead discovery and lead optimization. The lead optimization process is performed in cycles, and at the end of the lead optimization process the developability of the compounds is traditionally investigated.
these determinations, one to two candidate drugs (CDs) are selected from the library for further development (Fig. 16-1). The increase in new structures generated each year has not resulted in the expected increase in the number of new drugs marketed annually. This has been attributed, among other things, to poor pharmacokinetic (PK) properties of the CDs, and as much as 40% of the attrition of CDs has been related to poor PK profiles [1]. Given this, reliable screening filters for factors such as absorption, distribution, metabolism, elimination/excretion, and toxicity (ADMET) are highly desirable [2-4]. Indeed, the considerable effort that has been invested in the development of experimental absorption filters, for example, cell monolayers for permeability determinations [5, 6] and the turbidimetric method for solubility measurements [7],
Fig. 16-2 Reasons for attrition in drug development in the years 1991 and 2000. The following reasons were observed: clinical safety (black), efficacy (red), formulation (green), PK/bioavailability (blue), commercial (yellow), toxicology (gray), cost of goods (purple), and unknown/others (white). Note that formulation and cost of goods were only observed as reasons for attrition in 2000 and not in 1991. Further, PK/bioavailability profiles of new drugs were largely improved during this decade. Finally, commercial reasons for attrition were more than threefold higher in 2000 than in 1991 [8].
has lately resulted in a decrease in the attrition rate related to PK properties (Fig. 16-2) [8]. However, to allow an ADMET analysis of computationally designed druglike molecules to be performed prior to their chemical synthesis, computer-based filters for predicting PK properties are needed. Also, in current pharmaceutical research new challenges have been included where additional considerations have to be taken with respect to toxicological effects, such as avoiding interactions with the human Ether-a-go-go-Related Gene (hERG) as well as potential cytochrome P450 interactions related to avoidance of phase 1 metabolism. A particular problem associated with the prediction of toxicological effects is the lack of one well-defined and measurable target (end point) where the same mechanism is involved in giving rise to the observed effects. On the contrary, even fairly similar compounds may exert their toxicity using different mechanisms. From a development perspective, one of the first properties to be evaluated is the gastrointestinal (GI) absorption, since the extent to which a drug is absorbed through the intestine will determine if it is possible to give the drug in an oral dosage form. This formulation is the most convenient dosage form for the patient, allowing the patient to take care of the medication himself/herself. Two of the main factors influencing intestinal absorption are the solubility of the compound in the GI fluid and the permeability of the compound through the intestinal wall. The solubility will restrict the absorption if the oral dose given is not soluble in 250 mL in the pH interval relevant in the GI tract (pH 1 in the stomach up to pH 8 in the colon) [9].
Permeability will restrict the absorption if the permeability coefficient through the enterocytes is low, so that only a fraction of the compound in solution is transported over the epithelium during the transit time in the small intestine. Both solubility and permeability depend on the physicochemical properties of the molecule, unfortunately in opposing ways (Fig. 16-3). For instance, lipophilicity, which is the major driving force for permeability, is one of the most restricting properties for aqueous solubility.
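The 250 mL criterion for solubility-limited absorption can be written as a simple check. The sketch below is a minimal illustration under stated assumptions: solubility is supplied as measured values at a handful of pH points, and the dose counts as soluble only if it dissolves in 250 mL at every pH point within the interval of pH 1 to 8 (that is, the lowest solubility decides). The function name and the example compound are hypothetical.

```python
GI_FLUID_VOLUME_ML = 250.0        # volume used in the solubility criterion
GI_PH_RANGE = (1.0, 8.0)          # stomach (pH 1) to colon (pH 8)


def dose_is_soluble(dose_mg, solubility_mg_per_ml_by_ph,
                    volume_ml=GI_FLUID_VOLUME_ML, ph_range=GI_PH_RANGE):
    """Return True if the dose dissolves in `volume_ml` at every supplied pH in range.

    `solubility_mg_per_ml_by_ph` maps pH -> measured solubility (mg/mL).
    Assumption: the least favorable pH in the interval is limiting.
    """
    low, high = ph_range
    relevant = [s for ph, s in solubility_mg_per_ml_by_ph.items() if low <= ph <= high]
    if not relevant:
        raise ValueError("no solubility values inside the GI pH interval")
    return min(relevant) * volume_ml >= dose_mg


# Hypothetical weak base: well soluble in the stomach, poorly soluble at neutral pH.
example_solubility = {1.0: 2.0, 4.5: 0.5, 6.8: 0.1, 8.0: 0.08}

print(dose_is_soluble(10.0, example_solubility))   # 0.08 mg/mL * 250 mL = 20 mg >= 10 mg
print(dose_is_soluble(50.0, example_solubility))   # 20 mg < 50 mg: solubility-limited
```

Taking the minimum over the pH interval is a deliberately conservative design choice; a compound that dissolves in the stomach may still precipitate further down the GI tract.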
Fig. 16-3 Molecular properties important for solubility and permeability. (a) In the GI tract the tablet needs to dissolve to be able to permeate the intestinal wall. One of the main properties restricting solubility, for example, hydrophobicity, is a driving force for the transcellular permeability. (b) The following general properties can be extracted for permeability (from the left-hand side): the transcellular route is used by nonpolar, medium-sized (MW < 500), and uncharged compounds; the paracellular route is utilized by compounds that are polar, small (MW < 180), and charged; energy-dependent active transport processes (transport efflux and influx proteins) are used by compounds that are medium to large sized, both by polar and nonpolar compounds. Further, the compounds may be charged or uncharged. The figure presents hydrophobic atoms in gray (carbon atoms) and white (hydrogens bound to carbon atoms), and polar atoms are shown in red (oxygen atoms), pink (hydrogens bound to oxygen atoms), blue (nitrogens), and light blue (hydrogens bound to nitrogens).
16.1.1 Drug Solubility
The aqueous solubility of the compound depends both on the forces between the drug molecules in the solid state and on the intermolecular forces between the drug molecule and the surrounding intestinal fluid. The solubility will be poor if it is more energetically favorable for the molecules to bind to each other than to the water molecules, so that the molecules remain in the solid state rather than dissolving in the water-based fluid. However, poor solubility might also be a result of unfavorable binding between the water and the drug molecule. Depending on which of these underlying properties is the most important, different physicochemical properties will govern the behavior of the molecule in water. Multivariate data analysis of melting point, a property reflecting the stability of the solid state, has shown that molecules proven to form stable crystals are, in general, small, rigid, and polar [10]. On the other hand, compounds that are hydrophobic, flexible, and large demand a larger cavity to be formed in the aqueous fluid to become dissolved, and may be solubility restricted due to these properties. Models for prediction of solubility will be further discussed in Section 16.4.2, but the above-mentioned contrasts indicate that solubility is not a straightforward property to predict.
16.1.2 Intestinal Permeability
A compound can permeate the intestinal wall by using the paracellular route (between the cells) or the transcellular route (through the cells) by passive diffusion. To generalize, small, hydrophilic, and/or charged compounds, which cannot permeate the lipophilic cell membrane, diffuse through the aqueous pores. However, the pores cover less than 1% of the intestinal surface [11], and this, in concert with the solute restriction caused by the tight junctions of the pores, largely limits the contribution of the paracellular pathway. Compounds that show a reasonable hydrophobicity (log D at pH 7.4 of 0-2) and intermediate size (up to a molecular weight of 500) are assumed to permeate the intestinal wall by passive transcellular diffusion. Even though transport by the transcellular route seems to be a rather complex process, demanding partitioning between lipophilic and hydrophilic milieus several times, the vast majority of druglike compounds utilize this pathway. Larger molecules with a large number of hydrogen bond donors and acceptors, sometimes in combination with a high lipophilicity value, may utilize active processes and transport proteins to get through the cells. However, the latter properties also increase the risk that the compound might be transported by efflux proteins, resulting in secretion of the compound back to the intestinal lumen. Such efflux results in a lower drug concentration reaching the blood circulation and the site of action.
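The generalizations above can be summarized as a toy rule set. The sketch below is not a validated permeability model; it only encodes the rough thresholds mentioned in the text and in Fig. 16-3 (molecular weight below 180 for the paracellular route; log D at pH 7.4 between 0 and 2 and molecular weight up to 500 for passive transcellular diffusion). The class, the function, the fallback label, and the example compounds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float   # g/mol
    log_d_74: float     # log D at pH 7.4
    charged: bool       # net charge at intestinal pH


def likely_route(c: Compound) -> str:
    """Crude first guess of the dominant intestinal permeation route."""
    # Small, polar and/or charged compounds: aqueous pores between the cells.
    if c.mol_weight < 180 and (c.charged or c.log_d_74 < 0):
        return "paracellular"
    # Moderately lipophilic, medium-sized, uncharged compounds: through the cells.
    if c.mol_weight <= 500 and 0 <= c.log_d_74 <= 2 and not c.charged:
        return "passive transcellular"
    # Everything else needs a closer look (active influx or efflux may be involved).
    return "unclassified (possibly carrier-mediated)"


# Hypothetical examples, chosen only to exercise each branch.
examples = [
    Compound("small charged polar", 150.0, -1.5, True),
    Compound("typical druglike", 320.0, 1.2, False),
    Compound("large lipophilic", 720.0, 4.0, False),
]
for compound in examples:
    print(f"{compound.name}: {likely_route(compound)}")
```

Real compounds often use several routes at once, which is one reason why the text cautions that mechanism-based models have a restricted domain of validity.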
To conclude, two of the main factors influencing intestinal drug absorption are aqueous solubility and intestinal permeability. These characteristics depend on opposing physicochemical properties, which makes it difficult to find easily interpretable models for prediction of the drug absorption process. Several computational solubility and permeability models have been developed so far, and a majority of these are either dataset restricted, that is, only a small volume of the druglike space has been included in the training of the model, or mechanism based, that is, valid for a specific transport route or transport protein. This indicates, firstly, that the datasets used in the development of absorption models applicable in the drug discovery process need to cover a large volume of the druglike space. Secondly, the development of pharmaceutical informatics tools is crucial to extract correct information from combinations of all the mechanism-based models that are available.
16.1.3 Toxicity
Structure-activity relationships (SARs) in toxicology are based on the assumption that an adequate representation, that is, geometric and electronic, of the investigated structures will permit the derivation of a quantitative statistical model. This assumption is not unique to toxicological modeling but holds for all other areas of ADME modeling as well. However, in toxicology, the situation is further complicated by the fact that toxicological effects may result from many different mechanisms. This, in turn, means that it is possible to establish good in silico models for congeneric series of molecules, whereas more general models may be difficult to derive. In 1969, Corwin Hansch, the founder of modern quantitative structure-activity relationship (QSAR) analysis, proposed that, in general, a biological or toxicological action for a congeneric series of structures could be described by the model:
$$\log(\text{activity}) = a(\pi) + b(\varepsilon) + c(S) + d \tag{1}$$
where π, ε, and S are related to the hydrophobic, electronic, and steric descriptions, respectively, of the studied compounds. Toxicological structure-activity investigations have over the years been conducted in areas such as nonspecific toxicity, aquatic toxicology, mutagenicity, and carcinogenicity as well as developmental toxicity and skin sensitization. For a recent article on the subject see Ref. 12.
16.2 History and Development
Traditionally, the discovery setting has worked in series, with the primary focus on identifying new structures that show good pharmacological effect. After the screening for pharmacological effects, other important properties such as solubility, permeability, stability, metabolism, distribution, elimination, and toxicity have been investigated one after the other. This is an ineffective, time-consuming drug discovery process, which does not necessarily identify the optimal drug molecule because only one property is investigated at a time. Currently, the pharmaceutical industry is working with experimental screens in a parallel setting, in which the above-mentioned properties are experimentally examined at the same time and thereafter evaluated. Hence, all properties affect the final decision on which compounds to pursue, leading to better selection of the candidate drugs (CDs). Further, the discovery setting is now moving into the virtual era, applying several virtual tools to further cut time and costs during the discovery process. By designing virtual compound libraries and testing these by virtual docking to targets and by in silico models for ADMET properties, a prioritized library predicted to have a favorable pharmaceutical profile and acceptable pharmacological potency is computationally selected and thereafter synthesized. This scenario results in knowledge-based synthesis of fewer compounds with better properties than both the serial and parallel settings described above. After the synthesis of the prioritized library, the potency and the developability of the compounds must be experimentally confirmed (Fig. 16-4). Thus, methods for rapid and reliable experimental screening of these properties are warranted. Currently, rapid methods have been devised for the screening of several of the ADMET properties at the expense of reliability [7, 13], resulting in a large number of false-positive results in the screens. By incorporating reliable computational and experimental screens, better leads will be produced, saving time and money during the discovery process.
However, if the virtual-based discovery setting is to be successful, new computational tools need to be developed. The development of informatics tools applicable to pharmaceutical profiling, with the capacity to handle large databases containing information as diverse as in silico, in vitro, and in vivo data as well as qualitative and quantitative information, will be of utmost importance.
16.3 General Considerations
16.3.1 General Terms
When trying to develop in silico models for the prediction of ADMET properties there is in most cases a trade-off between accuracy, speed, and, often, transparency of the derived models. This is not always a significant problem as the various models may be intended for different uses, for example, for high-throughput in silico screening or for guidance and focusing, respectively. In practice, this often means that rapidly computed descriptors, often of one- and two-dimensional nature, are utilized in the former kind of models while more computer-intensive, three-dimensional variables are employed (sometimes in conjunction with one- and two-dimensional representations) in the latter type of models. Cronin and Schultz have, in a recent article [14], put forward some basic requirements for deriving statistically sound models:
1. a well-defined and measurable target
2. a chemically and biologically diverse data set
3. physicochemical descriptors that are consistent with the modeled target
4. usage of an appropriate statistical technique
5. where possible, a strong mechanistic basis.
Fig. 16-4 (panels: Traditional (serial); Current (parallel); Near future (knowledge based)) The traditional setting applied in the candidate drug (CD) selection was a serial experimental testing of pharmacology (P) followed by the different ADMET properties, resulting in extended development times and difficulties in finding the optimal compound. Currently the pharmaceutical industry applies a parallel setting and moves toward the knowledge-based setting. In the parallel setting, both pharmacology and ADMET properties are experimentally evaluated simultaneously and the complete profile can be used when selecting the CD. In the knowledge-based setting, a virtual library designed in the computer is primarily evaluated through different in silico models for pharmacology and ADMET properties. A privileged library is synthesized on the basis of the results from the virtual screening and the compounds are thereafter experimentally tested.
16.3.2 Datasets and Models
One consideration to take into account in ADMET modeling is the availability of relevant and accurate datasets. In general, there exists a relatively small number of datasets, especially public ones, with the desired quality of data, diversity of the investigated structures, and large enough size to permit sufficient validation of the derived model. In the ADMET literature, especially within the areas of solubility, absorption, and permeability, it is quite common that models are derived from rather few compounds (fewer than 50). These models are usually quite local, with a limited scope with respect to their predictive ability. Local models are in many cases quite useful for advancing a particular project or set of compounds, but in one particular respect a vast majority of the published models lack information, namely, with respect to the applicability domain in which they operate. Very few publications of ADMET models explicitly point out or discuss how the applicability domain of the derived model in question is established. Statistical models in general, including in silico ADMET models, should always have some protocol (measure) to determine whether the prediction of a property for a particular compound is within, on the border of, or outside (perhaps also how far outside) the applicability domain of the model based on the chemical description employed. This aspect will be further discussed in Section 16.3.3.6 together with an approach on how to proactively use the information on outliers to further advance the model. Absorption and permeability models, and the datasets they are based on, also have a particular problem with respect to active transport. In the past, datasets were modeled under the assumption that the absorption or permeation process was devoid of active transport, although later analysis showed that this was not entirely true. Most probably, compounds in datasets currently being investigated will later be found to be involved in active transport by transporters not yet identified.
A mitigating circumstance is that if a model with good statistics as well as good predictive ability is derived even though some compounds of the training set, that is, the compounds used to derive the model, are involved in active transport, then two alternative explanations may emerge: (a) that the amount of active transport of a particular compound is rather small (negligible) or (b) that the derived model somehow also encompasses the information related to active transport, although this was, in most cases, not the intent from the start.
16.3.3 Statistical Tools
16.3.3.1 Linear Multivariate Methods
The statistical methods most often employed for developing ADMET in silico structure-property relationships are linear multivariate methods, such as multiple linear regression (MLR) or partial least squares (PLS). Although aimed at the same end point, namely, to derive a statistically sound and predictive structure-property relationship, the underlying assumptions regarding the information contained in the independent variables, that is, the chemical description of the investigated structures, are quite different for the two methods. With respect to MLR the following should be considered:
1. MLR assumes each variable to be exact and relevant, that is, the information content in each variable is to be used in its entirety for developing the statistical model.
2. Strongly collinear variables must be eliminated by removing all but one of the strongly correlated variables; otherwise spurious chance correlation may result.
3. The number of variables cannot exceed the number of observations, for example, the number of measured ADMET property points, to be studied. A rule of thumb is that the number of variables used should not exceed a fourth of the number of observations.
Regarding PLS the following applies:
1. The descriptors (variables) are not treated as exact and relevant but as consisting of two parts, one part related to the dependent variable and the other part not related (noise).
2. Strong correlations between relevant variables are not a problem in PLS and all such variables can be kept in the analysis. In fact, the models derived using PLS become more stable with the inclusion of strongly correlated and relevant parameters.
3. The number of original descriptors may vastly exceed the number of compounds in the analysis since PLS uses, internally, only a few (usually less than 5-10) latent variables for the actual statistical analysis.
4. Again, a rule of thumb is that the number of latent variables used should not exceed a fourth of the number of observations.
The PLS model becomes identical to the MLR model when the number of latent variables of a PLS derived model becomes equal to the number of actual independent variables, something that rarely happens because model validation usually selects far fewer latent variables. The regression coefficients of the MLR model are straightforward to interpret while the PLS latent variables need to be retransformed into original variable space to be interpreted in a similar manner.
This also means that the PLS “regression” coefficients are dimension dependent, that is, they depend on how many latent variables (PLS components) are used. However, since each successive PLS component explains a decreasing amount of variance, it usually matters little whether a PLS model is based on three or four components, which also means that the PLS “regression” coefficients will not differ very much between the three- and four-component models.
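The relation between MLR and PLS described above can be checked numerically. The following is a minimal sketch, assuming a single dependent variable and the NIPALS variant of PLS (one common algorithm, chosen here for illustration and not taken from the chapter); the function names and the synthetic data are hypothetical. It retransforms the latent variables into "regression" coefficients in the original variable space and compares them with the MLR coefficients when all four latent variables are used.

```python
import numpy as np

def mlr_fit(X, y):
    """Ordinary least-squares MLR; returns (coefficients, intercept)."""
    A = np.column_stack([X, np.ones(len(X))])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[:-1], beta[-1]

def pls1_fit(X, y, n_components):
    """PLS for one y-variable via NIPALS; returns (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centered X and y residuals
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)               # weight vector
        t = E @ w                            # latent variable (scores)
        tt = t @ t
        p = E.T @ t / tt                     # X loadings
        qa = f @ t / tt                      # y loading
        E = E - np.outer(t, p)               # deflate X
        f = f - qa * t                       # deflate y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # back to original variable space
    return coef, y_mean - x_mean @ coef

# Synthetic data (assumption): 40 "compounds", 4 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = X @ np.array([1.0, -0.5, 0.3, 0.0]) + 0.1 * rng.normal(size=40)

b_mlr, _ = mlr_fit(X, y)
b_pls3, _ = pls1_fit(X, y, 3)   # three latent variables
b_pls4, _ = pls1_fit(X, y, 4)   # as many latent variables as descriptors
print(np.round(b_mlr, 3), np.round(b_pls3, 3), np.round(b_pls4, 3))
```

With four latent variables for four descriptors the PLS coefficients coincide with the MLR ones up to rounding, while the three-component coefficients can be printed alongside for comparison.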
16.3.3.2 Nonlinear Multivariate Methods
Although a majority of the published ADMET models are based on linear multivariate methods as discussed in Section 16.3.3.1, nonlinear methods have also been employed. The most commonly used nonlinear method in ADMET modeling is neural networks (NNs). Backpropagation NNs have been used to model absorption, permeation, as well as solubility and toxicological effects. A particular problem for many NNs is their tendency to overtrain (see further discussions on model validation in Section 16.3.3.4), which needs to be closely monitored to avoid the situation where the derived model becomes an “encyclopedia”, that is, the model can perfectly explain the variance of the investigated property of the compounds used to derive the model but has quite poor predictive ability with respect to new compounds.
16.3.3.3 Dataset Pretreatment
It is very important to give the variables used in the model development an equal chance, regardless of their respective numerical scales, to influence the outcome of the analysis. This can be achieved by scaling the variables in an appropriate way. One popular method for scaling variables is autoscaling, whereby the variance of each variable is adjusted to 1. Sometimes it is also desirable to center each of the variables with respect to its mean value.
16.3.3.4 Model Validation
Stringent model validation is a cornerstone for the successful development of any statistical model. Without proper validation the predictive ability of the derived model cannot be estimated, and the derived model may be nothing more than a random model. There are a few standard techniques that should be employed to ensure proper validation:
1. Cross-validation is one technique for the internal validation of a proposed model. In cross-validation the training set is divided into groups, usually four to seven, and one group is removed from the set. The model is then derived using the rest of the training set. The dependent property of the compounds of the left-out group is then predicted by the developed model. Each group is successively left out and predicted in the same manner. The predicted residual error sum of squares (PRESS) is computed from all the predictions. The PRESS value is compared with the sum of squares for the dependent variable y (SSY):
$$\mathrm{SSY} = \sum_i \left(y_{i,\mathrm{measured}} - y_{\mathrm{mean}}\right)^2 \tag{2}$$
A squared correlation coefficient (Q²) is then defined as:
$$Q^2 = 1 - \mathrm{PRESS}/\mathrm{SSY} \tag{3}$$
A significant difference between Q² and the normal squared correlation coefficient (R²) is that the former may also assume negative values, indicating that the model has worse predictive ability than using the mean value as the predicted value for each compound. Q² should be ≥ 0.5 for the model to be considered to have reasonable practical predictive performance.
2. An external validation set should be used as an independent test of the predictive ability of a derived model.
3. Randomization of the dependent variable, that is, the values of the dependent variable are randomly redistributed among the compounds. A model is then derived on the basis of the redistributed values and checked for its predictive performance using the methods outlined under points 1 and 2. This procedure is repeated a number of times, typically between 50 and 100 times. There should exist a clear separation in predictive ability between the model based on the “true” dependent values and the models based on redistributed values.
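The pretreatment and validation steps of Sections 16.3.3.3 and 16.3.3.4 can be sketched in a few lines. This is a minimal illustration under stated assumptions: the stand-in model is a plain least-squares MLR, PRESS is taken as the sum of squared differences between measured and cross-validation-predicted values, the data are synthetic, and all function names are hypothetical. For brevity the descriptors are autoscaled once before cross-validation; a stricter protocol would recompute the scaling within each training fold.

```python
import numpy as np

def autoscale(X):
    """Center each descriptor column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def fit_predict(X_train, y_train, X_test):
    """Stand-in model (least-squares MLR with intercept); any regressor could be used."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ beta

def q2_cross_validated(X, y, n_groups=5, seed=0):
    """Q2 = 1 - PRESS/SSY, Eqs. (2) and (3), from leave-group-out cross-validation."""
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(len(y)), n_groups)
    press = 0.0
    for test_idx in groups:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False                      # leave this group out
        y_pred = fit_predict(X[train], y[train], X[test_idx])
        press += np.sum((y[test_idx] - y_pred) ** 2)
    ssy = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ssy

def y_randomization(X, y, n_rounds=50, seed=1):
    """Q2 values obtained after randomly redistributing y among the compounds."""
    rng = np.random.default_rng(seed)
    return np.array([q2_cross_validated(X, rng.permutation(y)) for _ in range(n_rounds)])

# Synthetic data (assumption): 60 compounds, 3 descriptors on very different scales.
rng = np.random.default_rng(42)
X = autoscale(rng.normal(size=(60, 3)) * [1.0, 10.0, 100.0])
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.normal(size=60)

q2_true = q2_cross_validated(X, y)
q2_random = y_randomization(X, y)
print(f"Q2 (true y): {q2_true:.2f}; highest Q2 over randomized y: {q2_random.max():.2f}")
```

A clear gap between the Q² of the true model and the Q² values of the randomized models is the separation in predictive ability that point 3 asks for.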
16.3.3.5 Training and Test Set Selection
It is certainly possible to choose a training set at random and still derive a statistically sound and predictive model. Chances are, however, that the choice of training set compounds is somewhat skewed. This, in turn, most probably means that many of the remaining compounds, the external test set, will fall outside the applicability domain of the derived model and constitute outliers to the present model. For a model to show good predictive capability and to cover the investigated descriptor space well, the training set must be chosen with some care. There are several methods available for the selection of well-distributed training sets. Two such methods will be exemplified here:
1. Experimental design methods of some appropriate complexity are one such choice. The number of compounds to be used for the training set depends on the chosen design scheme and the number of investigated independent variables (descriptors) but may typically range between 8 and 64.
2. Maximin methods, where the aim is to maximize the closest (minimum) distance between two potential training set compounds in the investigated descriptor space. By maximizing the closest distance all other distances between training set compounds are greater, thus ensuring a rather uniform distribution of compounds comprising the final training set.
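The maximin idea in point 2 can be illustrated with a simple greedy heuristic. This is a sketch only: it approximates the maximin criterion by adding, at each step, the candidate that lies farthest from its nearest already-selected compound, which does not guarantee the globally optimal set. The function name, the arbitrary choice of the first compound, and the one-descriptor toy data are assumptions.

```python
import numpy as np

def maximin_select(X, n_select, start=0):
    """Greedy maximin selection of training set compounds in descriptor space."""
    selected = [start]
    # Distance from every candidate to its closest already-selected compound.
    min_dist = np.linalg.norm(X - X[start], axis=1)
    while len(selected) < n_select:
        nxt = int(np.argmax(min_dist))       # farthest from the current selection
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

# Toy example (assumption): five compounds described by a single descriptor.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
picked = maximin_select(X, n_select=3)
print(picked)   # indices of the selected training set compounds
```

In practice the descriptors would be scaled first (Section 16.3.3.3) so that no single variable dominates the distances.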
16.3.3.6 Applicability Domain Estimation
It is essential that the applicability domain of a derived model can be evaluated so that outliers to the model may be indicated. If an established statistical model is to be regarded as poor from a predictive point of view, this should be for the correct reasons, that is, because the model truly has poor predictive ability and not because the model cannot estimate outliers to the model with acceptable accuracy. The latter case is probably the most common cause for statistical (ADMET) models to “fall from fame”, especially those that can be accessed through internal or external web services. In many cases it is difficult, if not impossible, to find out which compounds were used as the training set and/or which chemical description was used in the model. Thus, many compounds outside the applicability domain of the model will be submitted. It is therefore of great importance to have an indication, together with the prediction, of whether the compound is considered to fall inside or outside this domain, that is, whether the compound is an outlier or not. The outlier information, and possibly also how far from the model the compound in question is, may in many cases be utilized in a more proactive way than just realizing that a number of compounds submitted to the model for prediction are, in fact, outliers to the present model. Thus, by analyzing the outliers, perhaps virtual compounds, from various points of view, for example, structural or synthetic, some of these compounds may later be synthesized and tested experimentally. The same compounds may then be incorporated into a revised model that will have a broader applicability domain. There are different methods available to determine whether a particular compound is to be labeled as an outlier. In this section, we will describe two of these methods:
1. The first of these methods is the Mahalanobis distance. This distance in descriptor space measures how similar the investigated compound is to the training set compounds. The Mahalanobis distance is superior to the corresponding, and more familiar, Euclidean distance since the former takes the correlations that normally exist between the variables into account, that is, the Mahalanobis distance does not assume orthogonal descriptors as does the Euclidean distance.
2. The second method is related to the remaining information present in the variables used to describe the compound that has not been utilized by the model. This method is closely related to the PLS method and its assumption with respect to the relevance of each variable (see Section 16.3.3.1). Thus, if a particular compound contains a lot of unexplained variance (information) in the chemical descriptor variables, much more than the training set compounds, it is quite likely that the compound in question will have other properties, not accounted for by the present model, which will have an impact on the true value for the investigated ADMET property. The predicted value will therefore, most likely, deviate substantially from the corresponding experimental value.
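The first of the two methods can be sketched as follows. The example uses two strongly correlated synthetic descriptors and two query compounds that lie at nearly the same Euclidean distance from the training set mean, one along the correlation and one against it. The cutoff (the 95th percentile of the training set distances) is an assumption made for illustration, as are the function name and the data; the chapter does not prescribe a threshold.

```python
import numpy as np

def mahalanobis_distance(X_train, X_query):
    """Mahalanobis distance of each query compound to the training set centroid."""
    mean = X_train.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X_train, rowvar=False))  # handles near-singular covariance
    diff = np.atleast_2d(X_query) - mean
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)        # diff_i^T * cov_inv * diff_i
    return np.sqrt(np.maximum(d2, 0.0))

# Synthetic training set (assumption): two strongly correlated descriptors.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X_train = np.column_stack([x1, x1 + 0.1 * rng.normal(size=200)])

# Query compounds: the first follows the correlation, the second violates it.
queries = np.array([[1.0, 1.0], [1.0, -1.0]])

cutoff = np.percentile(mahalanobis_distance(X_train, X_train), 95)  # hypothetical threshold
d = mahalanobis_distance(X_train, queries)
inside = d <= cutoff
print(np.round(d, 2), inside)
```

The second query is flagged as an outlier although its Euclidean distance to the centroid is about the same as that of the first, which is the advantage of the Mahalanobis distance noted in point 1.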
16.3.3.7 Calculation of Descriptors
A large number of different descriptors have been used to model ADMET properties. 1-D, 2-D, and 3-D based computed chemical properties have all been found useful for deriving statistically sound and predictive ADMET models. The choice of which type of descriptors, or combinations thereof, to use depends very much on the aim of the derived model. Is the model to be used for screening large sets of (virtual) compounds or for smaller sets of structures? How important is interpretability versus predictive accuracy and robustness of the prediction? How much computational time may be spent on each individual prediction? In fortunate cases many of these considerations coincide, that is, the model is robust and shows good predictive capability as well as being based on rapidly computed descriptors that are easy to interpret from a mechanistic or physicochemical point of view. However, in most cases there exists a trade-off between objectives. Depending on the priorities for the development of the particular model at hand, different sets of descriptors have to be employed. With these aspects in mind, it is usually quite useful to develop more than one model for the same ADMET property based on different sets of descriptors. In this way, both interpretability (incorporated into a model with acceptable although perhaps not the best predictive ability) and robustness, as well as speed and accuracy, can be achieved. For instance, the former kind of model can be used for understanding which physicochemical properties influence the particular ADMET property in question and how these physicochemical properties should be modified to achieve a suitable level for the investigated ADMET property. This, in turn, gives an indication of how new and improved compounds could be designed, and enables focusing on promising regions of the chemical space of the model.
Thus, instead of simulating a very large number of virtual compounds for prediction by the model, a much smaller number can be submitted. Subsequently, this smaller number of structures can be submitted to a more robust and accurate, although more complex, model for the final estimation of the ADMET property. In many cases, consensus or ensemble models, although more complex in nature, are quite useful for deriving in silico ADMET models with high levels of predictive accuracy as well as high degrees of robustness. These models may quite often be looked upon as “gray”, not “black”, boxes since each model can be interpreted but the multitude of them makes the overall picture difficult to comprehend. 1-D and 2-D descriptors are generally much faster to compute than the corresponding 3-D based ones. Also, the possible problems associated with generating a reasonable 3-D conformation for the investigated structure are eliminated.
1. 1-D descriptors such as molecular weight, molar refractivity, as well as number of atoms and bonds have been used to model permeability, absorption, solubility, and toxicological effects. These kinds of descriptors are usually rather easy to interpret.
2. A large number of 2-D descriptors exist. Many of them are topological in nature, that is, they are computed from the connectivity of the investigated compound or, more specifically, from the mathematical graph that the structure represents, and often contain important information with respect to ADMET modeling. Some of the more well known, and often much used, topological variables are the Kier and Hall descriptors. However, these topological descriptors are often somewhat difficult to interpret with respect to the question: “How should the present structure be modified to improve the ADMET property presently investigated?” A particular subset of topological descriptors, the so-called electrotopological ones, is an exception with respect to interpretability. These kinds of descriptors are quite easy to interpret in terms of hydrogen bonding and quite a few published investigations have found the electrotopological (or e-state) descriptors useful for deriving good ADMET models.
3. In many cases 3-D based descriptors are superior to lower dimensional ones because they capture important information, such as internal hydrogen bonds and other potentially important but buried functional groups, revealed only by using the actual 3-D representation of the investigated compound.
The 3-D descriptors may also be easier to interpret than some of the previously mentioned variables. However, choosing the correct 3-D conformation may, in some cases, cause problems depending on how rapidly the descriptors must be generated. There are software packages for converting 2-D structures into 3-D ones, for example, Corina and Concord, but although quite successful in a vast majority of cases, both these programs sometimes fail during the conversion process or the 3-D conformation given is not a reasonable one for this particular modeling exercise (Tab. 16-1). Certainly, some sort of conformational analysis would in many cases be desirable. For the 3-D descriptors there exists a large difference in complexity and computational speed, ranging from rapid calculations of various surfaces and volumes of a structure to high level, for example, ab initio, quantum mechanical based descriptors such as orbital energies, charges, polarizabilities as well as multipole moments. In some cases it is possible to go from more computationally demanding descriptors to more rapidly computed ones while preserving the information content from one descriptor matrix to the other.
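To make the speed and simplicity of 1-D descriptors concrete, the sketch below computes molecular weight and atom counts directly from a molecular formula, without any connectivity or 3-D information. It is an illustration only: the parser handles simple formulas without parentheses, the table of rounded atomic weights covers just a few elements, and the function names are hypothetical.

```python
import re

# Rounded standard atomic weights for a few common elements (extend as needed).
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                 "F": 18.998, "S": 32.06, "Cl": 35.45}

def parse_formula(formula):
    """Element counts from a simple formula such as 'C9H8O4' (no parentheses)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def one_d_descriptors(formula):
    """A few 1-D descriptors: molecular weight, total atoms, heavy (non-hydrogen) atoms."""
    counts = parse_formula(formula)
    return {
        "molecular_weight": sum(ATOMIC_WEIGHT[el] * n for el, n in counts.items()),
        "n_atoms": sum(counts.values()),
        "n_heavy_atoms": sum(n for el, n in counts.items() if el != "H"),
    }

aspirin = one_d_descriptors("C9H8O4")   # acetylsalicylic acid
print(aspirin)
```

Descriptors of the 2-D and 3-D kind require the molecular graph or a conformation, respectively, and are normally computed with dedicated cheminformatics software.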
16.4 Applications and Practical Examples
16.4.1 Physiological Factors and Experimental Parameters Influencing the Accuracy of Predictions of Intestinal Drug Absorption
16.4.1.1 Solubility
The intestinal solubility of a compound is dependent on physicochemical properties of the molecule (discussed in Sections 16.1.1 and 16.4.2), the location in the GI tract, the general physiology, and the dosage form. By analyzing the descriptors in the Noyes-Whitney equation [15] the physiological and pharmaceutical influence on dissolution becomes apparent:
$$\frac{dM}{dt} = \frac{D A C_s}{h} \tag{4}$$
where dM/dt is the dissolution rate, C_s is the maximum amount of drug that can be dissolved in the fluid, that is, the solubility value, A is the surface area of the undissolved compact, D is the diffusion coefficient in the intestinal fluid, and h is the thickness of the diffusion layer adjacent to the undissolved tablet. The diffusion coefficient of a molecule depends on the viscosity of the fluid: the higher the viscosity, the lower the diffusion coefficient and thereby the smaller the amount of compound dissolved per time unit. Furthermore, the larger the surface area of the undissolved compact and the higher the solubility of the compound, the more compound will be dissolved per time unit. The pH of the GI tract varies from pH 1 in the stomach up to pH 8 in the colon. Thus, the solubility of protolytes, that is, compounds with one or several ionizable groups, will be dependent on the location in the GI tract [16]. Compounds with an acidic functional group will show increased solubility at pH values above the pKa, whereas the solubility of bases will improve at pH values below the pKa value. For ampholytes, the lowest solubility will be found at the isoelectric point, which is obtained at a pH value between the acidic and basic pKa values. Another physiological factor that will influence the solubility is the ionic strength of the intestinal fluid. This will be dependent on food and fluid intake, and on the absorption and secretion of fluid within the intestine [17]. In general, the solubility decreases with increased ionic strength, because of the salting-out effect and/or the common ion effect displayed by the counterions in the solution [18, 19]. However, the presence of electrolytes can in specific cases improve the solubility [10]. This phenomenon is known as the salting-in effect, and occurs when additives such as electrolytes loosen up the tight water structure and thereby drive the formation of solvent cavities for the drug molecule. Further, food induces the secretion of bile salts, that is, surfactants secreted by the gallbladder, which may improve the solubility of poorly soluble compounds by acting as a wetting agent or by solubilization within the lipophilic core of bile salt micelles formed at higher bile salt concentrations [21].
Table 16-1 Examples of commercial software available for prediction of ADMET related properties
Columns: Software; Company; Dissolution; Sol; Perm; Trp; Oral bioavailability; HIA; BBB; Metabolism; Other PK; Toxicity
Software: AbSolv, ACD Solubility DB, ADME batches, ADME boxes, Cerius2, Cloe PK, GastroPlus, iDEA PKexpress, KnowItAll ADME/Tox, Oraspotter, PK-sim, QikProp, QMPRPlus, SLIPPER
Company: ACD labs, PharmaAlgorithms, PharmaAlgorithms, AccelRys, Cyprotex, Simulations Plus, Lion Biosciences, Bio-Rad Laboratories, ZyxBio, Bayer Technology Services, Schrodinger, Simulations Plus, TimeTec
[Table body: the cross marks assigning predicted properties to each software package did not survive extraction.]
Crosses show the properties predicted by each of the reported software packages. The following abbreviations are used: Sol - solubility, Perm - intestinal permeability, Trp - transporters, HIA - human intestinal absorption, BBB - blood-brain barrier permeability, PK - pharmacokinetic properties.
The in silico models derived for solubility are based on intrinsic solubility as their experimental input data. The intrinsic solubility is the solubility value determined for the neutral (i.e., uncharged) species of the compound and is generally determined at 2 pH units above the pKa value for bases and 2 pH units below the pKa value for acids. Ampholytes are determined at their isoelectric point. The solubility values used for the model development therefore seldom reflect the apparent solubility seen in the intestinal fluids. Hence, the predicted values obtained from the models need to be transferred to an in vivo situation, for instance, by use of the Henderson-Hasselbalch equation, which takes into account the pH dependency of solubility [16].
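The two relations used in Section 16.4.1.1, the Noyes-Whitney equation (Eq. 4) and the Henderson-Hasselbalch conversion from intrinsic to pH-dependent solubility, can be sketched as below. The sketch assumes a monoprotic acid or base and ignores the upper limit set by the solubility of the salt form, so it overestimates solubility far from the pKa; the function names and the example values (pKa 4.5, intrinsic solubility 10 µg/mL) are hypothetical.

```python
def dissolution_rate(D, A, Cs, h):
    """Noyes-Whitney, Eq. (4): dM/dt = D * A * Cs / h (consistent units assumed)."""
    return D * A * Cs / h

def apparent_solubility(S0, pKa, pH, kind):
    """pH-dependent solubility of a monoprotic protolyte from its intrinsic solubility S0.

    Henderson-Hasselbalch relation; the solubility limit of the salt form is ignored.
    """
    if kind == "acid":
        return S0 * (1.0 + 10.0 ** (pH - pKa))   # ionized above the pKa
    if kind == "base":
        return S0 * (1.0 + 10.0 ** (pKa - pH))   # ionized below the pKa
    raise ValueError("kind must be 'acid' or 'base'")

# Hypothetical weak acid: pKa 4.5, intrinsic solubility 10 ug/mL.
for pH in (1.5, 4.5, 6.5):
    print(pH, apparent_solubility(10.0, 4.5, pH, "acid"))
```

For this acid the estimated solubility is close to the intrinsic value at gastric pH and about a hundredfold higher two pH units above the pKa, which illustrates why a predicted intrinsic solubility has to be translated to the pH of the relevant intestinal segment.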
16.4.1.2 Permeability
The rate and extent of intestinal permeation depend on the physicochemical properties of the compound (see Sections 16.1.2 and 16.4.3) and on physiological factors. Drugs are mainly absorbed in the small intestine because of its much larger surface area and less tight epithelium in comparison with the colon [17]. Permeation may be affected by the presence of an aqueous boundary layer and mucus adjacent to the cells, but for the majority of substances the epithelium is the most important barrier to drug absorption. The lipoidal cell membrane restricts the permeability of hydrophilic and charged compounds, whereas large molecules are restricted by the ordered structure of the lipid bilayer. In the GI tract, a pH-dependent permeability is seen (see also Section 16.4.1.1): the higher the degree of ionization of the compound, the poorer the permeability. Other physiological factors influencing the permeability are the motility of the GI tract, the expression of transport proteins, and the thickness of the mucus layer adherent to the enterocytes. These factors influence the permeability as follows: the better the motility of the intestine, the thinner the unstirred water layer (UWL) adjacent to the cells. In general, peristalsis is so efficient in vivo that the UWL does not become the rate-limiting step in the absorption process. Further, the extent to which the transport proteins are expressed will largely influence the absorption. Depending on whether the transport protein is an influx protein, transporting the compound through the enterocytes into the blood circulation, or an efflux protein, transporting the compound out of the cell back to the intestinal lumen, the fraction absorbed (FA) will either increase or decrease with a high expression of the transporter. Finally, a thick mucus layer adjacent to the cells may slow down the diffusion of the compound and become the rate-limiting step of the absorption process. Taken together, these physiological factors may result in large interindividual variability in permeability, giving large standard deviations in the FA in vivo.
The in silico models derived for permeability are based on permeability values determined experimentally in different cell culture models. The most commonly used is the Caco-2 cell line, which is a human colon carcinoma cell line [22, 23]. This cell line is inexpensive and easy to culture, and these factors, in concert with its human origin, make it a popular cell model. However, the colonic epithelium is somewhat tighter than the small intestinal epithelium, resulting in permeability values 1-2 orders of magnitude lower than those seen in small intestinal tissue. Despite this, the permeability ranking of compounds is in good agreement with that obtained in the small intestine, and the model is therefore a valid tool for estimating FA over the small intestinal wall. Other cell lines used for determining permeability are MDCK cells, which originate from canine kidney tissue [24], and 2/4/A1 cells, which originate from the rat small intestine [25]. The drawback of these cell lines is that they are not obtained from human tissue, and the MDCK cell line is further restricted by its kidney origin, which results in, for example, an expression pattern of transporters different from that in the human small intestine. In vivo, perfusion studies in humans can be used to determine intestinal drug permeability [26]. The experimental settings and protocols applied in permeability measurements strongly influence the permeability data obtained. It is therefore important that the experimental values used in the development of computational models are determined in a consistent manner, within the same laboratory, using one experimental setting and one experimental protocol. Only then is the in silico model based on high quality data with a minimal noise level.
16.4.1.3 Fraction Absorbed
Several computational absorption models based on human FA data have been published [27-30]. These models should be interpreted with caution, because the datasets are compiled from a large number of literature sources of varying quality. The following facts must be taken into consideration:
1. Different experimental methods are used to determine the FA, resulting in a large variability in the numbers reported.
2. The influence of active transporters and the concentration dependency in vivo are not always clear.
3. It is often unclear whether the FA is solubility limited and/or permeability limited, which makes it difficult to obtain a mechanistically transparent model.
4. The datasets are often heavily biased toward compounds with high FA, because the majority of the compounds for which FA is known are commercially available compounds. These compounds are the result of years of discovery and development and are expected to show a good absorption profile. This bias influences the in silico models obtained: they will be rather good at identifying compounds with high FA, but poor at identifying other classes, such as intermediate or poor FA, because such compounds are lacking in the training sets.
To conclude, it is not unusual for FA data for the same compound to vary by 50 percentage points in the literature; for example, FA can be reported as either 10 or 60%, generally sorted as poor and intermediate FA, respectively. If such data are used for training the in silico model, the model will to a large extent be based on noise, leading to poor external predictions and noninterpretable results. In our opinion, it is more relevant to estimate the FA on the basis of in silico solubility and permeability screens.
16.4.2 In silico Solubility Models
Modeling solubility represents perhaps a bigger challenge than modeling absorption and permeability. Why is this so? Some of the particular issues involved in deriving good statistical models for solubility are related to the quality and precision of the dependent variable, namely, the solubility values; the complexity (or lack thereof) and diversity of the compounds in the investigated datasets; the possible influence of the solid state of each of the studied compounds; and whether modeling solubility is fundamentally a linear or a nonlinear problem. With respect to the first issue, the quality (precision) of the solubility values found in the literature, it must be recognized that the published values stem from a variety of experimental procedures, which makes comparisons between sets of measurements rather difficult. It is not uncommon for published values of a particular compound to differ by as much as a factor of 10! This, in turn, certainly makes modeling solubility a difficult problem. Many of the publications on modeling solubility contain a large number of compounds, but in many cases the majority of these structures are rather simple, nondruglike molecules in which the structural complexity with respect to functional groups and ring systems is somewhat limited. Good quantitative structure-solubility relationships are easier to derive for such datasets. Also, it has been recognized for many years that the solid state of each of the investigated compounds may very well play an important role in whether a modeling attempt succeeds. The difficulty here lies in the fact that it is hard to obtain a theoretical estimate of the solid phase within reasonable computation time and with satisfactory precision. Nevertheless, many attempts have been made and many articles published over the years on how to model solubility. In this section some of these recently published works are described and commented upon to illustrate the present status of the field:
1. A well-known paper is that by Huuskonen [31]. In this investigation a backpropagation artificial neural network (ANN) was used as the statistical engine and E-state descriptors were used to parameterize the chemical structures. The investigation was based on 1297 compounds, also known as the Huuskonen dataset, and used a large training set, a randomly chosen test set, and a second (external) test set composed of 21 compounds. A model with good statistical quality was developed. A point worth noticing in this investigation is the use of the dataset-specific “test” set where, according to the publication, “The network architecture and the training end point giving the highest coefficient of determination, r2pred, and the lowest standard error s for the predictions of the test set were then used”. This means that the randomly chosen test set is in fact a validation set for the training of the NN and that the only “true” external test set is the 21-compound set. A somewhat larger external test set is desirable to evaluate more extensively the predictive ability of the derived model. The statistical results are presented in Table 16-2.
2. Several other investigations of solubility using the “Huuskonen dataset” and other datasets, with ANNs and various other NN methods (for example, Bayesian NNs and Kohonen’s self-organizing NNs), have been published in the last few years (see Table 16-2 for results and references).
3. Jorgensen and Duffy have published a recent review of predictions of solubility focused on drugs [32].
4. Consensus modeling using ANNs has been published by Manallack and coworkers [33]. They used BCUT variables with diagonal elements consisting of charges, hydrogen bond acceptor and donor ability, and polarizability.
Many, not to say an overwhelming majority, of the investigations that have been published on the prediction of aqueous solubility of drugs (and other compounds) have identified the most important (influential) factors to be related to hydrogen bonding, polarizability or polarity, and hydrophobicity, expressed through terms such as E-state indices, hydrogen bonding terms, and the log P variable. Lately, consensus modeling has come into play as a useful tool for obtaining robust models with good predictive ability. With this approach the weakness of one particular model is compensated by the other models, giving a much more robust behavior for the ensemble of models. Apart from the accuracy of the experimental data discussed earlier, however, there is a problem with the presently derived models: although at first sight they appear to be quite respectable statistical models with rather good predictive ability, they are far from optimal for predicting the solubility of drugs. Why is that? An investigation by Norinder and coworkers [34] will be used to illustrate the situation, but, again, this is a general deficiency among the published models for predicting aqueous solubility. The statistics for the PLS model are quite respectable (see Table 16-2, Norinder; PLS), as is the plot of experimental versus calculated solubility (Fig. 16-5). However, a closer inspection of the solubility range relevant for most drugs, that is, log(S) from -6 to -3, reveals a rather different picture (Fig. 16-6). For the accurate prediction of such entities the derived model is not very useful. This is, however, the situation that investigators face when trying to derive models for accurately predicting drug solubility that can be of practical use to medicinal chemists, biologists, pharmacologists, and others trying to advance research projects toward compounds with reasonable solubility. Using consensus or ensemble modeling instead of a single model usually improves the situation somewhat, as exemplified by a rule-based ensemble model using two-dimensional parameters on the Huuskonen dataset (Table 16-2: Norinder; RDS/classification/ensemble) [34]. Sometimes, depending on the targeted use of the model as well as the precision of the experimental data, it is more useful to classify the range of solubility into two or three bins (categories). This approach is exemplified on the same dataset, for which three categories (log(S); good: > -2, medium: -2 to -4, poor: < -4) were used. The results of a single-model approach as well as of ensemble modeling (50 models) are reported in Table 16-2.
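The binning and ensemble ideas can be sketched in a few lines. The class limits follow the three categories given above; how the boundary values -2 and -4 are assigned, the majority-vote rule, and all function names are assumptions made for illustration. The rule-based ensemble of [34] is not reproduced here.

```python
from collections import Counter


def solubility_class(log_s):
    """Sort a log(S) value into good (> -2), medium (-4 to -2), or poor (< -4)."""
    if log_s > -2.0:
        return "good"
    if log_s >= -4.0:
        return "medium"
    return "poor"


def ensemble_class(predicted_log_s):
    """Majority vote over the classes predicted by several models.

    Ties are resolved in favor of the class that was counted first.
    """
    votes = Counter(solubility_class(value) for value in predicted_log_s)
    return votes.most_common(1)[0][0]


def accuracy(predicted, observed):
    """Percentage of compounds sorted into the correct class."""
    hits = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * hits / len(observed)


# Three hypothetical models predict log(S) for one compound.
print(ensemble_class([-3.8, -4.2, -4.5]))
```

A single model predicting -3.8 would sort this compound as medium, whereas the vote of the three models sorts it as poor. This is the kind of compensation between models that the consensus approach relies on.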
Table 16-2 Summary of different methods and models for the Huuskonen aqueous solubility dataset

Model | Type | Training set: n | R2 | s | Test set: n | R2 | s | Test set 2: n | References
Gasteiger | MLR | 797 | 0.79 | 0.93 | 496 | 0.82 | 0.79 | 21 | Yan and Gasteiger, J. Chem. Inf. Comput. Sci., 2003, 429-434
Gasteiger | ANN 40-8-1 | 797 | 0.93 | 0.50 | 496 | 0.92 | 0.59 | 21 | Yan and Gasteiger, J. Chem. Inf. Comput. Sci., 2003, 429-434
Liu | ANN 7-2-1 | 1033 | 0.86 | 0.70 | 258 | 0.86 | 0.71 | 21 | Liu and So, J. Chem. Inf. Comput. Sci., 2001, 1633-1639
Tetko | MLR | 879 | 0.86 | 0.75 | 412 | 0.85 | 0.81 | 21 | Tetko et al., J. Chem. Inf. Comput. Sci., 2001, 1488-1493
Tetko | ANN 33-4-1 | 879 | 0.94 | 0.47 | 412 | 0.91 | 0.60 | 21 | Tetko et al., J. Chem. Inf. Comput. Sci., 2001, 1488-1493
Huuskonen | MLR | 884 | 0.89 | 0.67 | 413 | 0.88 | 0.71 | 21 | Huuskonen, J. Chem. Inf. Comput. Sci., 2000, 773-777
Huuskonen | ANN 30-12-1 | 884 | 0.94 | 0.47 | 413 | 0.88 | 0.60 | 21 | Huuskonen, J. Chem. Inf. Comput. Sci., 2000, 773-777
Wegner | ANN 9-15-1 | 1016 | 0.94 | 0.52 | 253 | 0.93 | 0.54 | 21 | Wegner and Zell, J. Chem. Inf. Comput. Sci., 2003, 1077-1084
Norinder | PLS | 800 | 0.87 | 0.69 | 497 | ? | 0.58 | 21 | Unpublished work
Norinder | RDS/ensemble | 800 | 0.97 | 0.35 | 497 | ? | 0.51 | 21 | Unpublished work

Model | Type | Training set: n | Accuracy (%) | Test set: n | Accuracy (%) | Test set 2: n | References
Norinder | RDS/classification | 800 | 82.10 | 497 | 80.30 | 21 | Unpublished work
Norinder | RDS/classification/ensemble | 800 | 98.00 | 497 | 86.90 | 21 | Unpublished work

n - number of compounds, R2 - squared correlation coefficient, s - standard deviation, accuracy - % correctly classified compounds into the three classes: good, medium, poor.
Entries marked "?" and the R2, s, and accuracy entries for test set 2 cannot be assigned to individual rows from the available layout. The unassigned values are: 0.83, 0.91; 0.93, 0.95; 0.80, 0.87; 0.91, 0.82; 0.90, 0.83; 0.77; 0.82, 0.67; 0.63, 0.79; 0.64, 0.88; 0.99; 0.77, 0.93; 0.85, 0.79; 1.20; 0.56.
Fig. 16-5 Model of the Huuskonen aqueous solubility dataset using PLS [34]. Triangles - training set, circles - test set. The plot shows the "deceptively" good performance of the developed model with respect to usage for predicting aqueous solubility for new potential drug compounds (see also Figure 16-6). Horizontal axis: experimental log(S).
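The contrast between a good overall fit and a weak fit within the drug-relevant range can be reproduced with synthetic numbers. The sketch below assumes experimental log(S) values spread evenly between -12 and 2 and a model whose errors have a standard deviation of 0.7 log units. Both are arbitrary illustrative choices, not the data of [34].

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation coefficient between two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
experimental = [rng.uniform(-12.0, 2.0) for _ in range(1000)]
calculated = [value + rng.gauss(0.0, 0.7) for value in experimental]

r2_full = r_squared(experimental, calculated)

# Restrict the comparison to the range relevant for most drugs.
window = [(e, c) for e, c in zip(experimental, calculated) if -6.0 <= e <= -3.0]
r2_window = r_squared([e for e, _ in window], [c for _, c in window])

print(f"R2 over the full range: {r2_full:.2f}")
print(f"R2 for log(S) between -6 and -3: {r2_window:.2f}")
```

The prediction error is identical in both cases. Only the spread of the experimental values changes, which is why R2 computed over a wide solubility range says little about how well compounds within a narrow range are separated.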
16.4.3 In silico Models of Permeability and FA
16.4.3.1 Descriptors Used for Permeability Predictions
Response parameters when studying permeability-related absorption can be the permeability through a cell monolayer, such as Caco-2, MDCK, and 2/4/A1; the effective permeability in the intestine; or the FA of the dose. Permeability models predicting intestinal absorption are generally models of transcellular passive diffusion, and descriptors of hydrophobicity, hydrophilicity, and size have proven important (see Table 16-3). Hydrophobic descriptors can be regarded as measures of the distribution capacity into the membrane, hydrophilic descriptors as measures of the desolvation restriction when the compound partitions from the intestinal aqueous fluid into the hydrophobic membrane, and size as a reflection of the steric hindrance to diffusion through the membrane [35]. The log Poct descriptor has been used historically to predict membrane permeability and is hence incorporated into a large number of the models developed. For noncomplex datasets, properties such as log Poct, polar surface area (PSA), and hydrogen bond counts have each been used as a single predictor of permeability [36-39]. However, lipophilicity can be regarded as a composite property that is largely dependent on the size and hydrophilicity of the compound, and thus the use of these two components might be regarded as more sound than log Poct. Indeed, molecular weight and number of hydrogen bonds have been shown to predict the permeability of a smaller dataset better than log Poct did [40]. The introduction of more complex datasets for model development has pointed to the need for several descriptors and multivariate data analysis (Table 16-3). For instance, combinations of PSA and nonpolar surface area (NPSA) predicted the permeability of a series of peptides when PSA alone failed [41]. Moreover, the introduction of larger structures and structures with larger flexibility showed that the partitioned total surface areas (PTSAs), that is, the surface area of the molecule occupied by a specific atom, and/or descriptors related to the flexibility of the molecule are important in the permeability predictions [42, 43].
Electrotopological indices have also been used to predict permeability computationally (Table 16-3). The electrotopological descriptors are not always easily comprehended, even though they can be related to hydrophobicity, hydrophilicity, and size. Other typical 2D-generated descriptors are related to dispersion forces, polarizability, solute molar volume, and hydrogen bonding acidity and basicity [44-47]. Descriptors such as log Poct/log Doct, polarizability, polarity, strength of Lewis base and acid, and number and strength of hydrogen bond donors/acceptors, obtained from quantum mechanics, have also been correlated to permeability [42, 48, 49]. These descriptors did show high accuracy in the prediction, even though less complex and more rapidly calculated descriptors were almost as accurate. Thus, since quantum mechanical descriptors do not outperform the fragment-based descriptors with respect to accuracy, they will not be usable in the drug discovery setting until such calculations become faster.

Fig. 16-6 Close-up of the area of aqueous solubility of interest from a drug development perspective [34]. Triangles - training set, circles - test set. The graph shows the “true” or limited performance of the developed solubility model with respect to predictive capability for new compounds. Horizontal axis: experimental log(S).

Table 16-3 Quantitative in silico models based on Caco-2 permeability values or human fraction absorbed (FA) data

Response | Type of descriptors | Statistical method | R2 | Ntr | Nte | References
Caco-2 Papp | Number of hydrogen bonds | LR | 0.94 | 10 | 0 | Conradi et al., Pharm. Res., 1992, 435-439
Caco-2 Papp | PWASA | LR | 0.98 | 11 | 0 | Hjort Krarup et al., Pharm. Res., 1998, 972-978
Caco-2 Papp | PSA | SR | 0.96 | 9 | 0 | Ertl et al., J. Med. Chem., 2000, 3714-3717
Caco-2 Papp | Molecular surface areas | MLR | 0.96 | 19 | 0 | Stenberg et al., Pharm. Res., 1999, 205-212
Caco-2 Papp | Solute and solvation related | MLR | 0.86 | 30 | 8 | Kulkarni et al., J. Chem. Inf. Comput. Sci., 2002, 331-342
Caco-2 Papp | PSA, lipophilicity, size, and flexibility | MLR | 0.71 | 77 | 23 | Hou et al., J. Chem. Inf. Comput. Sci., 2004, 1585-1600
Caco-2 Papp | Hydrogen bond capacity, lipophilicity, and size | MLR | 0.71 | 33 | 12 | Marrero et al., J. Pharm. Pharm. Sci., 2004, 186-199
Caco-2 Papp | Hydrogen bond strength and electrostatics | PLS | 0.85 | 9 | 8 | Norinder et al., Pharm. Res., 1997, 1786-1791
Caco-2 Papp | Hydrogen bond capacity, lipophilicity, size, and flexibility | PLS | 0.80 | 16 | 0 | Oprea and Gottfries, J. Mol. Graph. Model., 1999, 261-274
Caco-2 Papp | Hydrogen bond capacity and lipophilicity | PLS | 0.92 | 11 | 0 | Österberg and Norinder, J. Chem. Inf. Comput. Sci., 2000, 1408-1411
Caco-2 Papp | Size, surface tension, and dielectric constant | PLS | 0.90 | 16 | 0 | Österberg and Norinder, Eur. J. Pharm. Sci., 2001, 327-337
Caco-2 Papp | Electrotopological indices | PLS | 0.71 | 17 | 10 | Stenberg et al., J. Med. Chem., 2001, 1927-1937
Caco-2 Papp | Hydrogen bond strength and electrostatics | PLS | 0.87 | 17 | 10 | Stenberg et al., J. Med. Chem., 2001, 1927-1937
Caco-2 Papp | Surface areas | PLS | 0.93 | 17 | 10 | Stenberg et al., J. Med. Chem., 2001, 1927-1937
Caco-2 Papp | Electrotopological indices | PLS | 0.91 | 9 | 8 | Norinder and Österberg, J. Pharm. Sci., 2001, 1076-1085
Caco-2 Papp | Surface areas | PLS | 0.93 | 13 | 10 | Bergström et al., J. Med. Chem., 2003, 558-570
Caco-2 Papp | Hydrogen bond capacity, PSA, and charge | PLS | 0.83 | 20 | 10 | Matsson et al., J. Med. Chem., 2005, 604-613
Caco-2 Papp | Hydrogen bond capacity, charge, polarizability, and dipole moment | NN | 0.62 | 87 | 0 | Fujiwara et al., Int. J. Pharm., 2002, 95-105
Caco-2 Pc | PSA | SR | 0.91 | ? | 0 | Palm et al., J. Med. Chem., 1998, 5382-5392
Caco-2 active trp (peptides) | Size, electrostatics, and flexibility | PLS | 0.75 | 20 | 0 | Wanchana et al., J. Pharm. Sci., 2004, 3057-3065
Caco-2 active trp (peptides) | Electrotopological indices | PLS | 0.92 | 20 | 0 | Wanchana et al., J. Pharm. Sci., 2004, 3057-3065
FA | PSA | SR | 0.94 | 20 | 0 | Palm et al., Pharm. Res., 1997, 568-571
FA | PSA | SR | 0.91 | 20 | 0 | Ertl et al., J. Med. Chem., 2000, 3714-3717
FA | Structural fragments | MLR | 0.79 | 417 | 50 | Klopman et al., Eur. J. Med. Chem., 2002, 253-263
FA | Hydrogen bond capacity, lipophilicity, size, and flexibility | PLS | 0.50 | 85 | 0 | Oprea and Gottfries, J. Mol. Graph. Model., 1999, 261-274
FA | Hydrogen bond capacity and lipophilicity | PLS | 0.93 | ? | 0 | Österberg and Norinder, J. Chem. Inf. Comput. Sci., 2000, 1408-1411
FA | Electrotopological indices | PLS | 0.83 | 13 | 7 | Norinder and Österberg, J. Pharm. Sci., 2001, 1076-1085
FA | Hydrogen bond capacity, size, and flexibility | NN | 0.87 | 76 | 10 | Wessel et al., J. Chem. Inf. Comput. Sci., 1998, 726-735
FA | Hydrogen bond capacity, flexibility, and hydrophobicity | NN | 0.86 | 76 | 10 | Niwa, J. Chem. Inf. Comput. Sci., 2003, 113-119

Compilation of descriptors, size of datasets, statistical methods used, and accuracy of published in silico absorption models. Several classification models can be found in the literature; these are regarded as qualitative models and are therefore not reported. Caco-2 and FA data were selected for the compilation, since these are the main responses used in the development of computational models. However, other responses, such as permeability in 2/4/A1 cell monolayers, artificial membranes, and the MDCK cell line, have also been used in computational model development. The following abbreviations are used: R2 - coefficient of determination, Ntr and Nte - number of compounds in training set and test set, Papp - apparent permeability, Pc - cellular permeability, active trp - active transport, PWASA - polar water accessible surface area, PSA - polar surface area, LR - linear regression, SR - sigmoidal regression, MLR - multiple linear regression, PLS - partial least squares projection to latent structures, NN - neural network. The two Ntr entries marked "?" cannot be assigned unambiguously; the values 9 and 74 belong to these two rows.

16.4.3.2 Factors Influencing the Accuracy of Computational Permeability Models
Most published models are based on experimentally determined permeability data in Caco-2 cell monolayers. However, models based on FA (human intestinal absorption) have also been developed. The descriptors used in these models are of the same type as those found in the cell-based models. However, the response parameter generally shows large variability, depending on the methodology used to determine the FA in humans and on the interindividual variability (see Section 16.4.1.3), and this strongly affects the accuracy of the resulting model. Even for datasets where the compounds have been carefully selected to use only passive diffusion to permeate the intestinal cell membrane [50], it has later become evident that some of the included compounds also have an active component in their transport mechanism. The quality of the response parameters can also vary for the datasets used in permeability models based on cell lines. Permeability values obtained for the same compound using the same cell line in different laboratories will differ in their absolute numbers because of the effects of cell culture protocols and experimental procedures during the measurements. Hence, the dataset used for training and evaluation should be determined within the same laboratory using the same experimental protocol. However, classification models might be based on compiled data, since measurements in different laboratories will in general result in the same ranking of compounds; that is, the compounds will be correctly sorted as poor, intermediate, or high permeability compounds even though the absolute numbers may differ considerably between the laboratories. Other important factors influencing the accuracy and applicability of the model are the chemical diversity of the training set used in the model development, the statistical tools used in the development, and the transport mechanisms included in the response parameter. These will influence the models as follows: to be generally applicable and to have high accuracy in the prediction of drug permeability, the training set should cover a large volume of the druglike space. If a model applicable to a specific therapeutic class is wanted, the training set should be focused on this region of the druglike space. In either scenario, the most important fact to bear in mind is that the training set should be representative of the type of compounds that are to be predicted; that is, if a model is to predict the permeability of drugs, then druglike molecules must be used in the model development. Regarding the statistical tool used, it is important to select a statistical and mathematical tool that is sound.
Hence, the data have to be preanalyzed so that linear versus nonlinear methods are correctly selected. Finally, it is difficult to obtain transparent and interpretable models if all kinds of transport routes are included in the measured permeability value. Ideally, separate models are developed for passive transcellular diffusion, passive paracellular diffusion, and each of the transport proteins that can be utilized. After the establishment of these models, pharmaceutical informatics tools are used to extract the information about the apparent permeability through the intestinal wall. When plotting the permeability versus FA, different cell models will result in largely different slopes and ranges of the respective permeability curve. What the cell models have in common is a relatively steep slope, as exemplified in Fig. 16-7. The presented curves are obtained from permeability measurements using Caco-2 and 2/4/A1 cell lines in our laboratory. The 2/4/A1 cell line has the steepest slope and the highest apparent permeability values of the two cell lines, and is in good agreement with the values obtained in human perfusion studies [25]. As a consequence of the steep slopes of these model systems, in silico models based on these data are good at discriminating high permeability from low permeability. However, a small difference between the predicted permeability and the experimental value in the region of the slope may shift the compound from being predicted as intermediately permeable to being predicted as either highly or poorly permeable. Hence, the predictions in the midrange of the permeability values are much more difficult to interpret and to draw conclusions from regarding further development.

Fig. 16-7 Permeability versus human fraction absorbed (FA). The range and the slope of the apparent permeability values obtained from different cell models used for in vitro studies of absorption differ largely, as exemplified with Caco-2 cell permeability values (full line) and 2/4/A1 cell permeability (dashed line). Drawn after Matsson et al., J. Med. Chem., 2005, 48, 604-613.
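The sensitivity in the midrange can be illustrated with a sigmoidal relationship between log permeability and FA. The functional form, the midpoint of -6.0, the slope of 2.0, and the class limits of 20% and 80% FA are assumptions chosen for illustration. They are not fitted to the Caco-2 or 2/4/A1 data shown in Fig. 16-7.

```python
def fraction_absorbed(log_papp, log_papp50=-6.0, slope=2.0):
    """Sigmoidal FA (%) as a function of the log10 apparent permeability.

    log_papp50 (the permeability giving 50% FA) and slope are hypothetical
    values, not parameters taken from any published model.
    """
    return 100.0 / (1.0 + 10.0 ** (-slope * (log_papp - log_papp50)))


def permeability_class(fa_percent):
    """Assumed classes: low (FA < 20%), intermediate, high (FA > 80%)."""
    if fa_percent < 20.0:
        return "low"
    if fa_percent > 80.0:
        return "high"
    return "intermediate"


experimental_log_papp = -6.0  # a compound in the middle of the slope
for error in (-0.4, 0.0, 0.4):
    fa = fraction_absorbed(experimental_log_papp + error)
    print(f"prediction error {error:+.1f} log units: "
          f"FA {fa:.1f}% ({permeability_class(fa)})")
```

With these assumed parameters, a prediction error of 0.4 log units in either direction moves a compound with 50% FA into the low or the high class. The same error for a compound far out on either plateau would leave its class unchanged.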
16.4.4 A Computer-based Biopharmaceutical Classification System
The biopharmaceutics classification system (BCS) is one way of obtaining information on drug absorption [51]. According to the BCS, compounds can be sorted into four classes depending on their solubility and permeability: class I compounds with high solubility and high permeability; class II compounds with poor solubility and high permeability; class III compounds with high solubility and poor permeability; and class IV compounds with poor solubility and poor permeability. A compound has high solubility if the maximum dose given orally is soluble in 250 mL of fluid within the pH interval 1-7.5; otherwise it has low solubility. High permeability is defined as ≥90% absorbed; otherwise the permeability is low [9]. If a compound is sorted as a class I compound, no further clinical studies need to be performed after minor changes in the formulation. Various cutoff values for the BCS have previously been applied as qualitative screening tools for drug absorption in drug discovery and development [9, 52, 53]. Recently, a semiexperimental study using literature solubility data in combination with FA data predicted from the calculated log Poct correctly sorted 65% of a series of 29 compounds [54]. If a computer-based BCS with high accuracy in the prediction of the absorption characteristics were to be devised, it would be possible to sort compounds absorption-wise prior to synthesis. Such virtual tools applied in early drug discovery would result in fewer CDs with formulation problems.
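The four-class sorting described above can be expressed as a short function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the solubility input is taken to be the lowest solubility measured across the pH interval 1-7.5.

```python
def bcs_class(max_dose_mg, min_solubility_mg_per_ml, fraction_absorbed_percent):
    """Sort a compound into BCS class I-IV.

    min_solubility_mg_per_ml: lowest solubility over pH 1-7.5 (assumed input).
    High solubility: the maximum oral dose dissolves in 250 mL of fluid.
    High permeability: at least 90% of the dose is absorbed.
    """
    high_solubility = max_dose_mg <= min_solubility_mg_per_ml * 250.0
    high_permeability = fraction_absorbed_percent >= 90.0
    if high_solubility and high_permeability:
        return "I"
    if high_permeability:
        return "II"
    if high_solubility:
        return "III"
    return "IV"


# Hypothetical compound: 500 mg dose, 0.1 mg/mL minimum solubility, 95% absorbed.
print(bcs_class(500.0, 0.1, 95.0))
```

In a computer-based BCS, the solubility and FA arguments would come from in silico models rather than from measurements, so the sorting itself stays the same and only the source of the inputs changes.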
In a recent study we used a BCS with six classes, where the solubility was classified as either “low” or “high” in accordance with the cutoff values set by the FDA and the permeability was classified as “low” (FA < 20%), “intermediate” (20% < FA < 80%), or “high” (FA > 80%) [55]. This classification was chosen because we believe it provides a better tool for absorption ranking of compounds in drug discovery than the stricter permeability classification provided by the FDA. Experimental determinations of the Caco-2 permeability and intrinsic solubility were performed in-house, and PLS in silico models based on PTSAs were derived. In comparison with the experimentally determined data, the combination of the two in silico models resulted in 87% of the compounds being sorted into the correct class. The compounds included in a reference test set given by the FDA were correctly sorted with an accuracy of 77%. To summarize, these results indicate that more sophisticated in silico models combining computational analysis of the solubility and permeability can successfully estimate the absorption process both qualitatively and quantitatively [55].
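How the outputs of a solubility model and a permeability model might be combined and scored against experimental classes is sketched below. The compounds, the values, and the treatment of FA values of exactly 20% or 80% are hypothetical. The PLS models of [55] are not reproduced.

```python
def absorption_class(solubility_is_high, fa_percent):
    """Six-class sorting: (solubility class, permeability class)."""
    if fa_percent < 20.0:
        permeability = "low"
    elif fa_percent > 80.0:
        permeability = "high"
    else:
        permeability = "intermediate"
    solubility = "high" if solubility_is_high else "low"
    return (solubility, permeability)


def percent_correct(predicted_classes, experimental_classes):
    """Percentage of compounds sorted into the experimentally determined class."""
    hits = sum(p == e for p, e in zip(predicted_classes, experimental_classes))
    return 100.0 * hits / len(experimental_classes)


# Hypothetical data: (high solubility?, FA in %) per compound.
experimental = [(True, 95.0), (False, 60.0), (True, 10.0), (False, 85.0)]
predicted = [(True, 90.0), (False, 75.0), (True, 25.0), (False, 88.0)]

experimental_classes = [absorption_class(*compound) for compound in experimental]
predicted_classes = [absorption_class(*compound) for compound in predicted]
score = percent_correct(predicted_classes, experimental_classes)
print(f"{score:.0f}% of the compounds sorted into the correct class")
```

A compound counts as correctly sorted only when both the solubility class and the permeability class agree with experiment, so errors in either model lower the combined score.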
16.4.5 In silico Toxicity Models
Toxicology is a rather different matter compared with the other ADME disciplines because many different mechanisms may be involved. Thus, the compounds of the investigated dataset may, although they appear to be rather similar, be subject to different toxicological mechanisms that, in turn, give rise to different types of toxicological responses. A large number of papers have been published over the years with proposed models (relationships) that relate molecular structure to a toxicological end point of some sort. Three good literature starting points with respect to the present state of in silico toxicology statistical modeling are by Green [56], Schultz and coworkers [57], and Dearden [58], respectively. The first article is an update on the various softwares that exist for prediction toxicology, for example, DEREK, OncoLogic, HazardExpert, COMPACT, multi-CASE, and TOPCAT, while the article by Schultz and coworkers focuses on QSARs in toxicology. Toxicological end points that are referred to in this investigation are aquatic toxicity, receptormediated toxicity, mutagenicity and carcinogenicity, skin sensitization, and skin and eye irritation, and they are acute. The article by Dearden deals with both softwares but also has references to some specific toxicological Q5AR investigations related to end points such as cytotoxicity, drug resistance, and skin permeability. A study with a historical perspective for the development of QSARs in toxicology published by Schultz and coworkers makes useful reading [12].Within the area of modeling QSARs, including pharmacophore approaches, several articles have appeared in recent years. A QSAR related article on cytochromes P450 has been published by Lewis [59]. Relationships between binding affinities related to various binding site interactions such
I
1033
1034
I as hydrogen bonding and n-n stacking and also to parameters related to 7 G Prediction ofADMET Properties
hydrophobicity, namely, log P and log D, have been developed. An extensive review article related to QSARs of cytochrome P450s has recently been published by Hansch and coworkers [60].A large number of P450 end points and datasets for which QSARs have been investigated are presented in this review article. A slight drawback in many of the P450 datasets in this review is that they are relatively small in size. Typically, many P450 datasets contain between 7 and 15 compounds and the largest investigated dataset contains only 28 members. Although useful for elucidating important properties and possibly rendering some mechanistic insight in fortunate cases, the resulting statistical models are rather local in character with a small applicability domain. The practical use of these models for predicting the behavior of new, virtual, sets of compounds may therefore be of limited value. Lately, additional considerations with respect to avoiding interactions with hERG have entered into the drug development scenario. Avoiding interactions with hERG has become a top priority for many drug companies due to the increased attention with respect to this issue by Federal Drug Administration (FDA) and regulatory agencies in other countries due to the severe consequences associated with hERG interaction such as Q-T interval prolongation. Only a few published studies on hERG SARs have been published so far and much work is currently being conducted to identify properties and/or structural entities that cause hERG channel inhibition. One structure-based model of hERG inhibition based on the KcsA crystal structure has been published, while the other models are ligand based using 3D QSAR techniques such as CoMFA, CoMSIA, and Catalyst. Recently, 2D QSAR descriptions using both more traditional variables as well as holograms have been used to derive models for hERG inhibition. 
Again, the publicly available training sets for developing in silico models for hERG are rather limited in size, which restricts the predictive ability of these models when estimating inhibition by new compounds. An interesting article published by Stouch and coworkers [61] addresses some cases where ADME/Tox models fail and the reasons for these failures. In some cases, the failure is related to the intended use of the in silico model and the expectations of its users. In other cases, failures are related to developmental aspects of the model, such as the choice of statistical tool, the description of the investigated structures, and limited model validation. Feng and coworkers [62] have benchmarked different sets of descriptors, for example, constitutional descriptors (CONS), topological information indices (TI), BCUT parameters, and some fragment (fingerprint) descriptors (FRAG), and statistical methods, for example, recursive partitioning (RP), ANNs, and PLS, on four different datasets with different toxicological end points. They found that three combinations (BCUT and RP, FRAG and PLS, and FRAG and RP) worked better than expected, while two combinations (BCUT and NN, and TI and RP) worked somewhat worse than expected. The fact that fragment (fingerprint) descriptors seem to work well
is not too surprising, since the concept of toxicophores has been used for quite some time in explaining the toxicological behavior of compounds. At the same time, however, the authors of the article also state that for large datasets there is a clear need for the development of new descriptors and/or statistical methods.
16.5 Future Development and Conclusions
To further improve solubility, permeability, and toxicity predictions, a number of actions are needed. Firstly, as mentioned above, attention should be paid to the datasets used for training the in silico models. The compounds included in model development and validation need to be representative of the intended application of the model. Hence, if a general in silico model is to be developed, a large dataset (i.e., hundreds of compounds) with a chemical diversity covering the volume of the druglike space should be used. On the other hand, if a model for the prediction of a specific subset is warranted, the dataset should concentrate on this region of the druglike space to improve the accuracy of the model. Secondly, the experimental setting needs to be standardized, and the experimental values used in the model development should be consistently determined using one type of assay. Only high quality data should be incorporated to minimize the effect of
Fig. 16-8 (a) To improve the drug discovery setting, the development of informatics tools suitable for virtual pharmaceutical screening is highly desirable. Such tools must be able to extract important information related to each of the main areas investigated during the drug discovery and early development process, that is, pharmacological effect and ADMET properties. (b) Each of these groups is further divided into a large number of subgroups, as exemplified by absorption. These subgroups may cooperate, counteract, or be independent of each other. Furthermore, both qualitative and quantitative information are compiled in these screens, further stressing the importance of developing specific software for this application.
noise on the model. Thirdly, the models should be as simplified as possible.
In our opinion, it is therefore better, for permeability, to develop several mechanism-based models revealing, for example, the extent of the passive transcellular and/or paracellular transport and possible binding to important transport proteins. Finally, new data-mining tools need to be devised to extract information from such different models and to transfer the computational predictions into approximations of the in vivo behavior (Fig. 16-8). The need for data-mining tools devised for pharmaceutical informatics can be exemplified by the absorption process per se. The extent to which a compound is absorbed depends on its dissolution rate, stability (chemical and enzymatic), solubility, and permeability (passive transcellular component, passive paracellular component, active influx, and active efflux). The same scenario is valid for each component in the ADMET screen, that is, a large number of in silico models need to be devised to predict each of the ADMET components. Hence, one of the future challenges will be the development of user-friendly, transparent, and fast data-mining tools that allow pharmaceutical informatics to be performed early in drug discovery. If such computational tools are devised and highly accurate in silico models of ADMET properties applicable to the druglike space are developed, then the prerequisites for a successful virtual drug discovery setting are present.
Acknowledgments
Christel Bergstrom acknowledges financial support from the Knut and Alice Wallenberg foundation and the Swedish Fund for Research without Animal Experiments.
Glossary
Multiple Linear Regression (MLR)
The relationship between the independent input variables $x_i$ and the dependent variable $y$ is described by the equation:
$$y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n + \varepsilon \tag{5}$$
The error parameter $\varepsilon$ is the residual. The parameters $a_i$ are adjusted so that the sum of the squared errors ($\sum \varepsilon^2$) for all the investigated objects (compounds) is minimized.
Partial Least Squares (PLS)
PLS re-expresses the original data matrix $X$ for the investigated objects (compounds) as the product of a score matrix $T$ and a loading matrix $P'$. The scores, where each investigated object (compound) has a computed set of score values, give the best summary of $X$ and can be seen as the underlying factors of the studied system. Similarly, the dependent variable $Y$ is decomposed into $U$ and $C'$, with $F$ denoting the residual of this decomposition.
$$U = B \times T$$
The PLS algorithm then minimizes $F$ while preserving the correlation between $X$ and $Y$ through the equation $U = B \times T$.
Neural Networks (NN)
NN systems are inspired by the manner in which biological nervous systems, for example, the brain, handle information. A typical NN is constructed from a number of “input nodes” (the X variables), a “hidden layer” of nodes, and an “output node” (the dependent Y variable).
The basic idea of the network is to adjust the weights ($w_i$) of each connection so that, as was the case for MLR, the sum of the squared errors ($\sum \varepsilon^2$) between experimental and predicted output for all the investigated objects (compounds) is minimized.
Huuskonen Dataset
The Huuskonen dataset [31] consists of 1297 compounds compiled from the AQUASOL dATAbASE of the University of Arizona (Yalkowsky, S. H.; Dannenfelser, R. M. The ARIZONA dATAbASE of Aqueous Solubility; College of Pharmacy, University of Arizona:
Tucson, AZ, 1990) and SRC's PHYSPROP Database (Syracuse Research Corporation. Physical/Chemical Property Database (PHYSPROP); SRC Environmental Science Center: Syracuse, NY, 1994). The experimental aqueous solubility values for the investigated compounds were measured between 20 and 25 °C. The log S values of the dataset range from -11.62 to +1.58.
BCUT Descriptors
The BCUT descriptors are the lowest and highest eigenvalues of a connectivity matrix of a molecule in which the diagonal elements for each atom are assigned properties such as atomic charge, atomic polarizability, or atomic hydrogen bond parameters.
Recursive Partitioning (RP)
RP is a method that in a repetitive (recursive) manner selects variables that separate and enrich different classes, for example, active and inactive or toxic and nontoxic, of compounds to achieve a good discrimination between the classes, thus creating sets of rules to attain that objective.
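The recursive selection of separating variables described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the two descriptors (a log P value and a polar surface area), the toy compounds, and their class labels are hypothetical, and Gini impurity is used as the splitting criterion although the text does not prescribe one. The function names are likewise chosen for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return (descriptor index, threshold) that lowers the weighted Gini impurity most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (j, threshold), score
    return best

def grow(rows, labels, depth=0, max_depth=3):
    """Recursively partition the compounds; leaves hold the majority class."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    j, threshold = split
    low = [(row, y) for row, y in zip(rows, labels) if row[j] <= threshold]
    high = [(row, y) for row, y in zip(rows, labels) if row[j] > threshold]
    return (j, threshold,
            grow([r for r, _ in low], [y for _, y in low], depth + 1, max_depth),
            grow([r for r, _ in high], [y for _, y in high], depth + 1, max_depth))

def predict(tree, row):
    """Follow the learned rules from the root down to a leaf."""
    while isinstance(tree, tuple):
        j, threshold, low, high = tree
        tree = low if row[j] <= threshold else high
    return tree

# Hypothetical toy data: (log P, polar surface area) per compound.
rows = [(1.0, 40.0), (2.0, 120.0), (2.5, 60.0), (4.0, 50.0),
        (4.5, 70.0), (5.0, 130.0), (5.5, 150.0), (4.2, 90.0)]
labels = ["nontoxic", "nontoxic", "nontoxic", "toxic",
          "toxic", "nontoxic", "nontoxic", "toxic"]
tree = grow(rows, labels)
print(tree)
```

Each internal node of the resulting tree is one rule of the form "descriptor j is at most a threshold", so the nested tuples can be read directly as the set of rules mentioned in the glossary entry.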
References
1. T. Kennedy, Managing the drug discovery/development interface, Drug Discov. Today 1997, 2, 436-444.
2. D.E. Clark, P.D. Grootenhuis, Progress in computational methods for the prediction of ADMET properties, Curr. Opin. Drug Discov. Devel. 2002, 5, 382-390.
3. S. Modi, Computational approaches to the understanding of ADMET properties and problems, Drug Discov. Today 2003, 8, 621-623.
4. H. van de Waterbeemd, E. Gifford, ADMET in silico modelling: towards prediction paradise? Nat. Rev. Drug Discov. 2003, 2, 192-204.
5. P. Artursson, J. Karlsson, Correlation between oral drug absorption in humans and apparent drug permeability coefficients in human intestinal epithelial (Caco-2) cells, Biochem. Biophys. Res. Commun. 1991, 175, 880-885.
6. P. Artursson, R.T. Borchardt, Intestinal drug absorption and metabolism in cell cultures: Caco-2 and beyond, Pharm. Res. 1997, 14, 1655-1658.
7. C.A. Lipinski, F. Lombardo, B.W. Dominy, P.J. Feeney, Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, Adv. Drug Deliv. Rev. 1997, 23, 3-25.
8. I. Kola, J. Landis, Can the pharmaceutical industry reduce attrition rates? Nat. Rev. Drug Discov. 2004, 3, 711-716.
9. C.A. Lipinski, Drug-like properties and the causes of poor solubility and poor permeability, J. Pharmacol. Toxicol. Methods 2000, 44, 235-249.
10. C.A.S. Bergstrom, U. Norinder, K. Luthman, P. Artursson, Molecular descriptors influencing melting point and their role in classification of solid drugs, J. Chem. Inf. Comput. Sci. 2003, 43, 1177-1185.
11. J.R. Pappenheimer, K.Z. Reiss, Contribution of solvent drag through intercellular junctions to absorption of nutrients by the small intestine of the rat, J. Membr. Biol. 1987, 100, 123-136.
12. T.W. Schultz, M.T.D. Cronin, J.D. Walker, A.O. Aptula, Quantitative structure-activity relationships (QSARs) in toxicology: a historical perspective, J. Mol. Struct. (THEOCHEM) 2003, 622, 1-22.
13. M. Kansy, F. Senner, K. Gubernator, Physicochemical high throughput screening: Parallel artificial membrane permeation assay in the description of passive absorption processes, J. Med. Chem. 1998, 41, 1007-1010.
14. T.W. Schultz, M.T.D. Cronin, Pitfalls in QSAR, J. Mol. Struct. (THEOCHEM) 2003, 622, 39-51.
15. A.A. Noyes, W.R. Whitney, The rate of solution of solid substances in their own solutions, J. Am. Chem. Soc. 1897, 19, 930-934.
16. K.A. Hasselbalch, The calculation of the hydrogen number of the blood from the free and bound carbon dioxide of the same and the binding of oxygen by the blood as a function of the hydrogen number, Biochem. Z. 1916, 78, 112-144.
17. T.T. Kararli, Comparison of the gastrointestinal anatomy, physiology, and biochemistry of humans and commonly used laboratory animals, Biopharm. Drug Dispos. 1995, 16, 351-380.
18. J.B. Bogardus, Common ion equilibriums of hydrochloride salts and the Setschenow equation, J. Pharm. Sci. 1982, 71, 588-590.
19. E. Khalil, S. Najjar, A. Sallam, Aqueous solubility of diclofenac diethylamine in the presence of pharmaceutical additives: a comparative study with diclofenac sodium, Drug Dev. Ind. Pharm. 2000, 26, 375-381.
20. T. Arakawa, S.N. Timasheff, Mechanism of protein salting in and salting out by divalent cation salts: balance between hydration and salt binding, Biochemistry 1984, 23, 5912-5923.
21. W.N. Charman, C.J. Porter, S. Mithani, J.B. Dressman, Physicochemical and physiological mechanisms for the effects of food on drug absorption: The role of lipids and pH, J. Pharm. Sci. 1997, 86, 269-282.
22. I.J. Hidalgo, T.J. Raub, R.T. Borchardt, Characterization of the human colon carcinoma cell line (Caco-2) as a model system for intestinal epithelial permeability, Gastroenterology 1989, 96, 736-749.
23. P. Artursson, Epithelial transport of drugs in cell culture. I: A model for studying the passive diffusion of drugs over intestinal absorptive (Caco-2) cells, J. Pharm. Sci. 1990, 79, 476-482.
24. J.D. Irvine, L. Takahashi, K. Lockhart, J. Cheong, J.W. Tolan, H.E. Selick, J.R. Grove, MDCK (Madin-Darby canine kidney) cells: A tool for membrane permeability screening, J. Pharm. Sci. 1999, 88, 28-33.
25. S. Tavelin, V. Milovic, G. Ocklind, S. Olsson, P. Artursson, A conditionally immortalized epithelial cell line for studies of intestinal drug transport, J. Pharmacol. Exp. Ther. 1999, 290, 1212-1221.
26. H. Lennernas, O. Ahrenstedt, R. Hallgren, L. Knutson, M. Ryde, L. Paalzow, Regional jejunal perfusion, a new in vivo approach to study oral drug absorption in man, Pharm. Res. 1992, 9, 1243-1251.
27. Y.H. Zhao, J. Le, M.H. Abraham, A. Hersey, P.J. Eddershaw, C.N. Luscombe, D. Boutina, G. Beck, B. Sherborne, I. Cooper, J.A. Platts, Evaluation of human intestinal absorption data and subsequent derivation of a quantitative structure-activity relationship (QSAR) with the Abraham descriptors, J. Pharm. Sci. 2001, 90, 749-784.
28. G. Klopman, L.R. Stefan, R.D. Saiakhov, ADME evaluation: 2. A computer model for the prediction of intestinal absorption in humans, Eur. J. Pharm. Sci. 2002, 17, 253-263.
29. T. Niwa, Using general regression and probabilistic neural networks to predict human intestinal absorption with topological descriptors derived from two-dimensional chemical structures, J. Chem. Inf. Comput. Sci. 2003, 43, 113-119.
30. M.A. Perez, M.B. Sanz, L.R. Torres, R.G. Avalos, M.P. Gonzalez, H.G. Diaz, A topological sub-structural approach for predicting human intestinal absorption of drugs, Eur. J. Med. Chem. 2004, 39, 905-916.
31. J. Huuskonen, Estimation of aqueous solubility for a diverse set of organic compounds based on molecular topology, J. Chem. Inf. Comput. Sci. 2000, 40, 773-777.
32. W.L. Jorgensen, E.M. Duffy, Prediction of drug solubility from structure, Adv. Drug Deliv. Rev. 2002, 54, 355-366.
33. D.T. Manallack, B.G. Tehan, E. Gancia, B.D. Hudson, M.G. Ford, D.J. Livingstone, D.C. Whitley, W.R. Pitt, A consensus neural network-based technique for discriminating soluble and poorly soluble compounds, J. Chem. Inf. Comput. Sci. 2003, 43, 674-679.
34. U. Norinder, P. Liden, H. Bostrom, Prediction of aqueous solubility using rule-based systems (RDS, www.compumine.com) and ensemble modelling, unpublished results.
35. S.J. Marrink, H.J.C. Berendsen, Simulation of water transport through a lipid membrane, J. Phys. Chem. 1994, 98, 4155-4168.
36. R.A. Conradi, A.R. Hilgers, N.F. Ho, P.S. Burton, The influence of peptide structure on transport across Caco-2 cells. II. Peptide bond modification which results in improved permeability, Pharm. Res. 1992, 9, 435-439.
37. K. Palm, P. Stenberg, K. Luthman, P. Artursson, Polar molecular surface properties predict the intestinal absorption of drugs in humans, Pharm. Res. 1997, 14, 568-571.
38. L. Hjorth Krarup, I. Thooger Christensen, L. Hovgaard, S. Frokjaer, Predicting drug absorption from molecular surface properties based on molecular dynamics simulations, Pharm. Res. 1998, 15, 972-978.
39. K. Palm, K. Luthman, A.L. Ungell, G. Strandlund, F. Beigi, P. Lundahl, P. Artursson, Evaluation of dynamic polar molecular surface area as predictor of drug absorption: Comparison with other computational and experimental predictors, J. Med. Chem. 1998, 41, 5382-5392.
40. G. Camenisch, J. Alsenz, H. van de Waterbeemd, G. Folkers, Estimation of permeability by passive diffusion through Caco-2 cell monolayers using the drugs' lipophilicity and molecular weight, Eur. J. Pharm. Sci. 1998, 6, 313-319.
41. P. Stenberg, K. Luthman, P. Artursson, Prediction of membrane permeability to peptides from calculated dynamic molecular surface properties, Pharm. Res. 1999, 16, 205-212.
42. P. Stenberg, U. Norinder, K. Luthman, P. Artursson, Experimental and computational screening models for the prediction of intestinal drug absorption, J. Med. Chem. 2001, 44, 1927-1937.
43. D.F. Veber, S.R. Johnson, H.Y. Cheng, B.R. Smith, K.W. Ward, K.D. Kopple, Molecular properties that influence the oral bioavailability of drug candidates, J. Med. Chem. 2002, 45, 2615-2623.
44. M.J. Kamlet, R.M. Doherty, V. Fiserova-Bergerova, P.W. Carr, M.H. Abraham, R.W. Taft, Solubility properties in biological media 9: prediction of solubility and partition of organic nonelectrolytes in blood and tissues from solvatochromic parameters, J. Pharm. Sci. 1987, 76, 14-17.
45. J.A. Gratton, M.H. Abraham, M.W. Bradbury, H.S. Chadha, Molecular factors influencing drug transfer across the blood-brain barrier, J. Pharm. Pharmacol. 1997, 49, 1211-1216.
46. M.H. Abraham, Y.H. Zhao, J. Le, A. Hersey, C.N. Luscombe, D.P. Reynolds, G. Beck, B. Sherborne, I. Cooper, On the mechanism of human intestinal absorption, Eur. J. Med. Chem. 2002, 37, 595-605.
47. O.A. Raevsky, S.V. Trepalin, H.P. Trepalina, V.A. Gerasimenko, O.E. Raevskaja, SLIPPER-2001 - Software for predicting molecular properties on the basis of physicochemical descriptors and structural similarity, J. Chem. Inf. Comput. Sci. 2002, 42, 540-549.
48. U. Norinder, T. Osterberg, P. Artursson, Theoretical calculation and prediction of Caco-2 cell permeability using MolSurf parametrization and PLS statistics, Pharm. Res. 1997, 14, 1786-1791.
49. U. Norinder, T. Osterberg, P. Artursson, Theoretical calculation and prediction of intestinal absorption of drugs in humans using MolSurf parametrization and PLS statistics, Eur. J. Pharm. Sci. 1999, 8, 49-56.
50. M.D. Wessel, P.C. Jurs, J.W. Tolan, S.M. Muskal, Prediction of human intestinal absorption of drug compounds from molecular structure, J. Chem. Inf. Comput. Sci. 1998, 38, 726-735.
51. G.L. Amidon, H. Lennernas, V.P. Shah, J.R. Crison, A theoretical basis for a biopharmaceutic drug classification: the correlation of in vitro drug product dissolution and in vivo bioavailability, Pharm. Res. 1995, 12, 413-420.
52. E. Walter, S. Janich, B.J. Roessler, J.M. Hilfinger, G.L. Amidon, HT29-MTX/Caco-2 cocultures as an in vitro model for the intestinal epithelium: in vitro-in vivo correlation with permeability data from rats and humans, J. Pharm. Sci. 1996, 85, 1070-1076.
53. S. Winiwarter, N.M. Bonham, F. Ax, A. Hallberg, H. Lennernas, A. Karlen, Correlation of human jejunal permeability (in vivo) of drugs with experimentally and theoretically derived parameters. A multivariate data analysis approach, J. Med. Chem. 1998, 41, 4939-4949.
54. N.A. Kasim, M. Whitehouse, C. Ramachandran, M. Bermejo, H. Lennernas, A.S. Hussain, H.E. Junginger, S.A. Stavchansky, K.K. Midha, V.P. Shah, G.L. Amidon, Molecular properties of WHO essential drugs and provisional biopharmaceutical classification, Mol. Pharm. 2004, 1, 85-96.
55. C.A.S. Bergstrom, M. Strafford, L. Lazorova, A. Avdeef, K. Luthman, P. Artursson, Absorption classification of oral drugs based on molecular surface properties, J. Med. Chem. 2003, 46, 558-570.
56. N. Greene, Computer systems for the prediction of toxicity: an update, Adv. Drug Deliv. Rev. 2002, 54, 417-431.
57. T.W. Schultz, M.T.D. Cronin, T.I. Netzeva, The present status of QSAR in toxicology, J. Mol. Struct. (THEOCHEM) 2003, 622, 23-38.
58. J.C. Dearden, In silico prediction of drug toxicity, J. Comput.-Aided Mol. Des. 2003, 17, 119-127.
59. D.F.V. Lewis, S. Modi, M. Dickins, Quantitative structure-activity relationships (QSARs) within substrates of human cytochromes P450 involved in drug metabolism, Drug Metab. Drug Interact. 2001, 18, 221-242.
60. C. Hansch, S.B. Mekapati, A. Kurup, R.P. Verma, QSAR of cytochromes P450, Drug Metab. Rev. 2004, 36, 105-156.
61. T.R. Stouch, J.R. Kenyon, S.R. Johnson, X.-Q. Chen, A. Doweyko, Y. Li, In silico ADME/Tox: why models fail, J. Comput.-Aided Mol. Des. 2003, 17, 83-92.
62. J. Feng, L. Lurati, H. Ouyang, T. Robinson, Y. Wang, S. Yuan, S.S. Young, Predictive toxicology: benchmarking molecular descriptors and statistical methods, J. Chem. Inf. Comput. Sci. 2003, 43, 1463-1470.
PART VII Systems Biology
Chemical Biology. From Small Molecules to System Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim ISBN: 978-3-527-31150-7
17 Computational Methods and Modeling
17.1 Systems Biology of the JAK-STAT Signaling Pathway
Jens Timmer, Markus Kollmann, and Ursula Klingmüller
Outlook
Systems biology is a rapidly growing field of research worldwide. The central idea of systems biology is to apply mathematical modeling to understand the dynamics of regulatory processes in cell biology. In this chapter we discuss the necessity of the systems biology approach and exemplify it by an application to cellular signal transduction.
17.1.1 Introduction
Now that the genomes of several organisms, including humans, have been sequenced, the “text of life” is available. The next step is to learn how to “read” it. This includes understanding and predicting cellular responses to external stimuli and deciphering the evolutionary design principles of biochemical networks. Of special medical importance is the understanding of conditions promoting health or leading to disease. In some cases, single gene mutations decide between the two states. But it is increasingly recognized that the function of cellular processes is not determined by a single gene, but by the regulation of complex cellular networks. Diseases like cancer result from dysregulation in these networks. Regulation is determined by the dynamical interaction of the involved components. Therefore, biological function becomes a systems property of dynamic networks. The goal of systems biology is to elucidate the network-based functions of cellular processes. Because of the complexity of these processes, intuition-based reasoning is not sufficient to reach this goal; mathematical, computer-based approaches are necessary.
17.1.2 History/Development
A systems-based approach to biology dates back to Norbert Wiener (1894-1964) [1] and Ludwig von Bertalanffy (1901-1972) [2]. These early approaches might have suffered from oversimplifying assumptions and far-reaching general claims, but they provided groundbreaking examples of how mathematical modeling and ideas from control theory can contribute to a systems-level understanding of biology. In the 1970s, two groups independently developed the systems biology of metabolic systems [3, 4]. Metabolic systems are especially suited for a mathematical treatment because, in contrast to signaling pathways and gene regulatory networks, they (i) are completely determined by the involved enzymes, (ii) usually operate in steady state, and (iii) obey conservation laws for their components, the metabolites. The control theory developed for metabolic systems allows one to infer, for example, the effects of local changes, such as the properties of an enzyme, on global properties, such as the flux through the system. Furthermore, general global properties of the systems were captured by summation and connectivity theorems; see [5] for a comprehensive review. For signaling pathways and gene regulatory networks, the above constraints do not hold and similar general statements are not available. But for specific examples, the ideas from metabolic systems have been generalized to signaling pathways [6], and design principles of signaling pathways and gene regulatory networks have been discovered [7, 8]. An important topic of recent research is the robustness of the systems, because they have to function in a noisy environment under fluctuating conditions. These investigations range from bacterial chemotaxis [9, 10] via components of signaling pathways [11] to developmental biology [12]; see Ref. 13 for a recent review. For signaling pathways, recent years have seen an increasing number of studies of specific pathways where mathematical modeling is applied to infer systems properties from the models. These applications include the mitogen-activated protein (MAP) kinase pathways [14-16], apoptotic pathways [17-19], the Wnt/β-catenin pathway [20], and the Janus kinase-signal transducer and activator of transcription (JAK-STAT) pathway [21]. A regulatory network that has been studied intensively is the cell cycle [7, 22, 23]. Because of the nascent state of systems biology, only a few textbooks are available [24-26].
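The summation theorem mentioned above for metabolic systems states that the flux control coefficients of all enzymes in a pathway add up to one. As a small numerical sketch, the Python code below checks this for a two-step pathway S0 -> S1 -> S2 with linear reversible kinetics. The pathway, the rate laws, all parameter values, and the function names are assumptions made purely for illustration and are not taken from the cited work.

```python
def steady_state_flux(e1, e2, s0=1.0, s2=0.1, k1f=1.0, k1r=0.5, k2f=2.0, k2r=0.2):
    """Steady-state flux through S0 -> S1 -> S2 for enzyme amounts e1 and e2.

    Hypothetical rate laws: v1 = e1*(k1f*s0 - k1r*s1), v2 = e2*(k2f*s1 - k2r*s2),
    with the boundary concentrations s0 and s2 held fixed.
    """
    # Setting v1 == v2 and solving for the intermediate concentration s1.
    s1 = (e1 * k1f * s0 + e2 * k2r * s2) / (e1 * k1r + e2 * k2f)
    return e2 * (k2f * s1 - k2r * s2)

def flux_control_coefficient(index, enzymes, rel_step=1e-6):
    """C_i = (e_i / J) * dJ/de_i, estimated by a central finite difference."""
    up, down = list(enzymes), list(enzymes)
    up[index] *= 1.0 + rel_step
    down[index] *= 1.0 - rel_step
    flux = steady_state_flux(*enzymes)
    return (steady_state_flux(*up) - steady_state_flux(*down)) / (2.0 * rel_step * flux)

enzymes = (1.0, 1.0)  # hypothetical enzyme amounts
coefficients = [flux_control_coefficient(i, enzymes) for i in range(2)]
print(coefficients, sum(coefficients))
```

In this toy pathway the local change (the amount of one enzyme) and the global property (the steady-state flux) are linked by the control coefficients, and their sum stays at one regardless of how control is distributed between the two steps.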
17.1.3 General Considerations
Since Newton’s days, physics and engineering have been extremely successful in understanding the inanimate part of nature by applying mathematics and translating these insights into technological developments. It is foreseeable that in the twenty-first century an analogous development will take place for the animate part of nature, including technology based on the insights of the basic sciences. Arguments for the helpful contributions of mathematics applied to the life sciences include:
Make assumptions explicit. Decades of work in biology have produced enormous amounts of knowledge, rendering it difficult to see the forest for the trees, that is, to judge what the important players and effects are. A mathematical description necessitates being explicit about what the assumptions of a model are.
Understand essential properties from failing models. If a mathematical model fails to describe biological data, this gives the valuable information that the assumptions of the model missed an essential part.
Condense information, handle complexity. The huge extent of biological knowledge is also an obstacle, since its complexity does not allow for intuition-based reasoning. Mathematical modeling can help handle the complexity by condensing it into a model.
Understand the role of dynamical processes, for example, feedback. Dynamic properties like combinations of positive and negative feedbacks induce system properties that can only be captured by mathematics; see Ref. 16 for an example, where a mathematical treatment elucidates why cells react differently to transient and sustained stimuli.
Impossible experiments become possible. Mathematical models allow for in silico biology. Experiments that might be impossible biochemically can be conducted using the computer.
Prediction and control. On the basis of mathematical models, new experiments can be suggested and their outcome can be predicted. In particular, the control of networks can be investigated in silico. This enables identification of targets for medical intervention.
Understand what is known. Pure biological facts can be understood in the context of dynamic behavior.
Discover general principles. It is expected that nature developed a limited number of “tricks” and principles independent of specific implementations to ensure, for example, robustness of the biological function in a noisy environment. Mathematics will be helpful in discovering these general design principles.
“You don’t understand it until you can model it”. Being able to mathematically model a biological process might be the final proof of understanding.
All these arguments apply to biology in general but, owing to its network structure, especially to cell biology in terms of metabolism, signal transduction, and gene regulation.
Systems biology can be defined as the endeavor to understand biomedical systems using data-based mathematical modeling of their dynamic behavior. The final goal is to turn the life sciences from a qualitative, descriptive science into a quantitative, predictive science. Systems biology relies on other fields of research but should also be distinguished from them, since systems biology is more than . . .
. . . Mathematical Biology, because systems biology is data based. Mathematical Biology formulates and investigates mathematical models inspired by biology, but it is de facto a part of mathematics that often does not get back to biology. Systems biology requires close collaboration between theoreticians and experimentalists. This ranges from the joint planning of experiments to the joint interpretation of the results of the mathematical models, including the formulation of new hypotheses to be tested in the next cycle between “wet-lab” and “dry-lab”.
. . . Bioinformatics, because systems biology considers the dynamics. Bioinformatics is an important basis for systems biology in, for example, identifying the components involved, but it does not deal with the dynamic aspects of networks that are essential for systems biology.
. . . another “omics” technology, because systems biology involves mathematics. Proteomics, genomics, metabolomics, and other high-throughput technologies that monitor the state of cells in certain respects provide important information for systems biology, but systems biology should not be understood as “putting the . . .omics together”. It should be noted that the term systems in systems biology originates from systems
sciences, that is, the mathematical discipline of how to infer properties from dynamical models.
. . . “one Postdoc - one protein”, because systems biology considers the system. Although “systems” in systems biology stems from systems sciences, the goal is to understand systems in the colloquial sense. The detailed investigation of the components of the systems is the indispensable basis for reaching this aim.
17.1.4 Practical Example
Considerable progress has been made in identifying the molecular composition of complex signaling networks. However, as outlined above, to reveal the systems properties, quantitative models based on experimental observations have to be developed. In this section, the core module of the JAK-STAT pathway of the Epo receptor is investigated. On the basis of time-resolved quantitative measurements of the receptor activity and of unphosphorylated and phosphorylated STAT-5 in the cytoplasm, the parameters in differential equations describing the pathway are estimated. The analysis will show that the previously held assumption of a feed-forward cascade to describe the pathway is not compatible with the experimental data. A generalization of the model that includes nucleocytoplasmatic cycling is suggested. The final model is validated by successfully predicting the outcome of a new experiment. From this model, we infer the time courses of the unobserved STAT-5 populations and show that, on a systems level, fast nucleocytoplasmatic cycling of STAT-5 serves as a remote sensing system to couple gene activation to receptor activity. The JAK-STAT pathway of the Epo receptor is essential for proliferation and differentiation of erythroid progenitor cells. Binding of Epo to the extracellular part of the receptor leads to activation, by phosphorylation, of JAK2 at the cytoplasmic domain of the receptor. In turn, this leads to recruitment to the receptor and phosphorylation of monomeric STAT-5, a member of the STAT family of transcription factors. The phosphorylated monomeric STAT-5 forms dimers, and these dimers migrate into the nucleus, where they bind to promoter regions of the DNA and initiate gene transcription. Afterwards, STAT-5 is dephosphorylated and dedimerized. It was debated whether STATs are degraded in the nucleus [27] or exported back to the cytoplasm [28]. In any case, it was believed that the active role of STAT-5 ends in the nucleus.
Thus, the JAK-STAT signaling pathway represents a feed-forward cascade. Its graphical representation is given in Fig. 17.1-1. Assuming mass-action kinetics and denoting the amount of activated Epo receptors by $EpoR_A(t)$, unphosphorylated monomeric STAT-5 by $x_1(t)$, phosphorylated monomeric STAT-5 by $x_2(t)$, phosphorylated dimeric STAT-5 in the cytoplasm by $x_3(t)$, and phosphorylated dimeric STAT-5 in the nucleus
I
1049
1050
I
17 Computational Methods and Modeling
Fig. 17.1-1 Graphical representation of the JAK-STAT pathway o f the Epo receptor. The dashed line represents a possible export of STAT-5 from the nucleus back t o the cytoplasm that is, however, not involved in the signaling.
by x 4 ( t ) ,we arrive at the following dynamic model where the time dependence is suppressed for the sake of clarity:
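$$\dot{x}_1 = -k_1 x_1 \,\mathrm{EpoR}_A \tag{1}$$
$$\dot{x}_2 = +k_1 x_1 \,\mathrm{EpoR}_A - k_2 x_2^2 \tag{2}$$
$$\dot{x}_3 = +0.5\, k_2 x_2^2 - k_3 x_3 \tag{3}$$
$$\dot{x}_4 = +k_3 x_3 \tag{4}$$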
These equations describe the gain and loss of the different components. For example, Eq. (1) states that the unphosphorylated STAT-5 monomer x1 is depleted, as expressed by the minus sign, with rate constant k1 through the interaction of the monomer with the activated receptor, described by the product x1 EpoRA. Since this interaction produces the phosphorylated STAT-5 monomer x2, the same term appears in Eq. (2) with a positive sign. The second term of Eq. (2) describes the loss of the phosphorylated monomer x2 by dimerization with rate constant k2. This term appears in Eq. (3) with a factor of 0.5, since two monomers form one dimer. Finally, the second term in Eq. (3) and the right-hand side of Eq. (4) describe the transport of the dimer into the nucleus.
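As a quick consistency check on Eqs. (1)-(4), one can count STAT-5 molecules, weighting each dimer twice. The short derivation below is an added illustration rather than part of the original analysis; the symbol N for the total amount of STAT-5 is introduced here only for convenience.

```latex
\begin{aligned}
\frac{d}{dt}\bigl(x_1 + x_2 + 2x_3 + 2x_4\bigr)
  &= -k_1 x_1 \mathrm{EpoR}_A
     + \bigl(k_1 x_1 \mathrm{EpoR}_A - k_2 x_2^2\bigr)
     + 2\bigl(0.5\,k_2 x_2^2 - k_3 x_3\bigr)
     + 2 k_3 x_3 = 0, \\
N &:= x_1 + x_2 + 2x_3 + 2x_4 = \text{const}, \\
\frac{d}{dt}\bigl(x_1 + x_2 + 2x_3\bigr) &= -2 k_3 x_3 \le 0 .
\end{aligned}
```

The first two lines say that the feed-forward model neither creates nor destroys STAT-5. The last line says that, because nothing leaves the nucleus in this model, the cytoplasmic pool x1 + x2 + 2x3 can never increase, whatever the values of k1, k2, and k3.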
The initial values for x2, x3, and x4 are zero; the initial value for x1 is a free parameter that, in addition to the parameters k1, k2, and k3, has to be estimated from the data. These equations have an intuitive meaning. For example, Eq. (1) means that the rate of change of the unphosphorylated monomer is negative and proportional to the interaction of the monomer with the activated receptor. The rate is determined by k1. By quantitative immunoblotting, the time courses of the phosphorylated STAT-5 (monomeric, x2, and dimeric, x3) in the cytoplasm, y1(t), the total amount of STAT-5 in the cytoplasm, y2(t), and the activation of the Epo receptor, y3(t), were determined. The measured values are in relative units. For a detailed description of the biochemical techniques used to measure the different components, see Ref. 21. Altogether, the observation equations read:

$$y_1 = k_5 (x_2 + 2 x_3) \tag{5}$$
$$y_2 = k_6 (x_1 + x_2 + 2 x_3) \tag{6}$$
$$y_3 = k_7 \,\mathrm{EpoR}_A \tag{7}$$
where k5 to k7 have to be included as scaling parameters, since only relative units can be measured. The factor of 2 in Eqs. (5) and (6) reflects the fact that a dimer produces a signal twice as high as that of a monomer. Note that EpoRA, measured by y3, is not a dynamical variable but an external input. The observables y1 and y2 will later be used to estimate the parameters. To first gain insight into the properties of this system, a simulation study is performed. For this purpose, all parameters are set to 1 and an artificial Epo-receptor time course is chosen. The dynamical model is solved numerically and the observation equations are evaluated. The resulting time courses for the phosphorylated STAT-5 in the cytoplasm, y1, and the total amount of STAT-5 in the cytoplasm, y2, are displayed in Fig. 17.1-2. The qualitative behavior is identical for all parameter settings: the phosphorylated STAT-5 in the cytoplasm shows a biphasic behavior, and the total amount of STAT-5 in the cytoplasm decreases monotonically. The quantitative behavior, however, depends on the parameters. Thus, if simulated model predictions are compared to experimental data, it is difficult to decide whether discrepancies between simulated and measured data result from inadequate parameters or from an insufficient model. To resolve this simulation dilemma [29], we will estimate the parameters from the experimental data. Mathematically, the equations of the system under investigation can be summarized as:

$$\dot{\vec{x}} = \vec{f}(\vec{x}, \vec{k}, u) \tag{8}$$
$$\vec{y}(t_i) = \vec{g}(\vec{x}(t_i), \vec{k}) + \vec{\varepsilon}(t_i) \tag{9}$$
Fig. 17.1-2 Results of a simulation study for y1 (phosphorylated STAT-5 in the cytoplasm, red) and y2 (total STAT-5 in the cytoplasm, blue). Initially (upper left), all parameters are set to 1; for the other plots, parameters k1 to k3 are set to 10.
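The simulation study can be reproduced in a few lines. The sketch below is an illustration, not the authors' code: it integrates Eqs. (1)-(4) with a classical fourth-order Runge-Kutta scheme and evaluates the observables y1 and y2, with all rate constants, the scaling factors, and the initial value of x1 set to 1. The receptor input `epo_ra` is a hypothetical pulse chosen for illustration (the artificial time course used for Fig. 17.1-2 is not specified in the text), and all function names are assumptions made here.

```python
import math


def epo_ra(t):
    """Hypothetical receptor activity: a smooth pulse peaking at t = 1 with height 1."""
    return t * math.exp(1.0 - t)


def rhs(t, x, k1, k2, k3):
    """Right-hand side of the feed-forward model, Eqs. (1)-(4)."""
    x1, x2, x3, x4 = x
    u = epo_ra(t)
    return (
        -k1 * x1 * u,
        k1 * x1 * u - k2 * x2 ** 2,
        0.5 * k2 * x2 ** 2 - k3 * x3,
        k3 * x3,
    )


def rk4_step(t, x, dt, *k):
    """One classical Runge-Kutta step for the state tuple x."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    s1 = rhs(t, x, *k)
    s2 = rhs(t + dt / 2, shifted(x, s1, dt / 2), *k)
    s3 = rhs(t + dt / 2, shifted(x, s2, dt / 2), *k)
    s4 = rhs(t + dt, shifted(x, s3, dt), *k)
    return tuple(
        xi + dt / 6 * (a + 2 * b + 2 * c + d)
        for xi, a, b, c, d in zip(x, s1, s2, s3, s4)
    )


def simulate(k1=1.0, k2=1.0, k3=1.0, x1_0=1.0, t_end=10.0, dt=0.01):
    """Return times and the observables y1, y2 (scaling factors k5 = k6 = 1)."""
    x = (x1_0, 0.0, 0.0, 0.0)
    times, y1, y2 = [], [], []
    for n in range(int(round(t_end / dt)) + 1):
        t = n * dt
        times.append(t)
        y1.append(x[1] + 2 * x[2])         # phosphorylated STAT-5 in the cytoplasm
        y2.append(x[0] + x[1] + 2 * x[2])  # total STAT-5 in the cytoplasm
        x = rk4_step(t, x, dt, k1, k2, k3)
    return times, y1, y2


times, y1, y2 = simulate()
peak = max(range(len(y1)), key=y1.__getitem__)
print(f"y1 peaks at t = {times[peak]:.2f}; y2 falls from {y2[0]:.2f} to {y2[-1]:.2f}")
```

Calling `simulate(k1=10.0)`, `simulate(k2=10.0)`, or `simulate(k3=10.0)` changes the numbers but not the shape of the curves, which is the point made in the text: in the feed-forward model, y1 rises and falls while y2 only decreases.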
Equation (8) captures the dynamical equations (1-4), the parameters, and the activation of the Epo receptor as an external input u. Equation (9) describes how the sampled observables are linked to the dynamical variables and also includes the observational noise ε(t_i) that is always present in experimental data. Estimation of the parameters is based on minimizing the error function:

$$\chi^2(\vec{x}(t=0), \vec{k}) = \sum_{i} \sum_{j} \frac{\left( y_j^{D}(t_i) - y_j(t_i; \vec{x}(t=0), \vec{k}) \right)^2}{\sigma_{ij}^2} \tag{10}$$

where y^D(t_i) denotes the experimental data, y(t_i; x(t = 0), k) denotes the model predictions depending on the parameters and the initial values, and σ² denotes the variance of the noise. Established numerical techniques are available for this task [29, 30].

Figure 17.1-3 displays time courses of Epo-receptor activation, phosphorylated STAT-5 in the cytoplasm, and the total amount of STAT-5 in the cytoplasm for one representative experiment. The receptor displays its maximal activity 8 min after stimulation. In the time series of phosphorylated STAT-5, a plateau is reproducibly detected between 10 and 30 min.

Fig. 17.1-3 Examples of the measured time series. (a) Activation of the Epo receptor. (b) Phosphorylated STAT-5 in the cytoplasm. (c) Total amount of STAT-5 in the cytoplasm.

With the feed-forward model, Eqs. (1-3), derived from the graphical representation in Fig. 17.1-1, the experimental data in Fig. 17.1-3, connected to the model by Eqs. (5-7), and the numerical techniques to estimate the parameters, we arrive at the modeling results displayed in Fig. 17.1-4. For the phosphorylated STAT-5 in the cytoplasm, the model does not capture the plateau between 10 and 30 min, and the behavior of total STAT-5 in the cytoplasm is completely missed. This calls for a reconsideration of the biological assumptions that led to Fig. 17.1-1.

Fig. 17.1-4 Fit of the feed-forward model, Eqs. (1-4), to the measured time series of phosphorylated (a) and total (b) STAT-5 in the cytoplasm.

In an iterative process, different extensions of the model were tested; see [21, 31, 32] for mathematical and statistical details. The result is that the export of STAT-5 from the nucleus plays an active and essential role in this pathway. The export of STAT-5 was modeled by a delay term $x_3^{\tau} = x_3(t - \tau)$, where τ describes the sojourn time of STAT-5 in the nucleus. The extended model reads:
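$$\dot{x}_1 = 2 p_4 x_3^{\tau} - p_1 x_1 \,\mathrm{EpoR}_A \tag{11}$$
$$\dot{x}_2 = p_1 x_1 \,\mathrm{EpoR}_A - p_2 x_2^2 \tag{12}$$
$$\dot{x}_3 = 0.5\, p_2 x_2^2 - p_3 x_3 \tag{13}$$
$$\dot{x}_4 = p_3 x_3 - p_4 x_3^{\tau} \tag{14}$$

The rate constants p1 to p4 correspond to k1 to k4 in the discussion and figure captions that follow; p4 (k4) is the rate constant of nuclear export.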
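Because of the delay term, Eqs. (11)-(14) form a delay differential equation, which cannot be integrated without access to past states. The sketch below shows one simple way to do this, assuming a fixed-step explicit Euler scheme with the delay equal to a whole number of time steps and x3 = 0 before stimulation. It is an illustration only: the receptor input `epo_ra` and all numerical values are hypothetical placeholders rather than fitted values, and the function names are assumptions made here.

```python
import math


def epo_ra(t):
    """Hypothetical receptor activity (relative units), peaking at t = 8 min."""
    return (t / 8.0) * math.exp(1.0 - t / 8.0)


def simulate(p1, p2, p3, p4, tau, x1_0=1.0, t_end=60.0, dt=0.01):
    """Integrate Eqs. (11)-(14) with explicit Euler; returns the list of states."""
    n_delay = int(round(tau / dt))  # delay expressed in time steps
    x1, x2, x3, x4 = x1_0, 0.0, 0.0, 0.0
    states = [(x1, x2, x3, x4)]
    for n in range(int(round(t_end / dt))):
        u = epo_ra(n * dt)
        # x3(t - tau): look up the stored past value, zero before stimulation
        x3_tau = states[n - n_delay][2] if n >= n_delay else 0.0
        dx1 = 2.0 * p4 * x3_tau - p1 * x1 * u
        dx2 = p1 * x1 * u - p2 * x2 ** 2
        dx3 = 0.5 * p2 * x2 ** 2 - p3 * x3
        dx4 = p3 * x3 - p4 * x3_tau
        x1, x2, x3, x4 = x1 + dt * dx1, x2 + dt * dx2, x3 + dt * dx3, x4 + dt * dx4
        states.append((x1, x2, x3, x4))
    return states


def cytoplasmic_total(state):
    """Total STAT-5 in the cytoplasm, counting each dimer twice."""
    x1, x2, x3, _ = state
    return x1 + x2 + 2.0 * x3


cycling = simulate(p1=0.5, p2=2.0, p3=0.1, p4=0.1, tau=6.0)
no_export = simulate(p1=0.5, p2=2.0, p3=0.1, p4=0.0, tau=6.0)

print("unphosphorylated monomer x1 at t = 60 min")
print(f"  with export:    {cycling[-1][0]:.3f}")
print(f"  without export: {no_export[-1][0]:.3f}")
```

Setting `p4 = 0` recovers the feed-forward model, for which the cytoplasmic total can only decrease. With export switched on, STAT-5 returns to the unphosphorylated pool x1 and can be activated again while the receptor is still active.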
The results of a fit of this model to the data are displayed in Fig. 17.1-5 and demonstrate good agreement of the model trajectories with the experimental data. As a surprising result, the sojourn time τ of STAT-5 in the nucleus turned out to be approximately 6 min. The fitted trajectory for phosphorylated STAT-5 shows that the "plateau" between 10 and 30 min is not a plateau, but results from waves of phosphorylated STAT-5 passing through the nucleus. Simulating the model allows investigation of the single populations x1 to x4 of STAT-5. The in silico results are given in Fig. 17.1-6. It is observed that the unphosphorylated monomer x1 is completely processed in the first wave of activation. Furthermore, the concentration of the phosphorylated monomer x2 is low for the whole time because the dimerization process is fast. The model thus explains in a natural way the experimental experience that the phosphorylated monomer is difficult to measure.

Fig. 17.1-5 Fit of the extended model, Eqs. (11-14), including nucleocytoplasmic cycling, to the measured time series of phosphorylated (a) and total (b) STAT-5 in the cytoplasm.

Fig. 17.1-6 In silico results. Simulation of the single STAT components. Blue: unphosphorylated monomer x1; black: phosphorylated monomer x2; green: phosphorylated dimer in the cytoplasm x3; red: phosphorylated dimer in the nucleus x4.

On the basis of the fitted model, a sensitivity analysis is performed. In these in silico investigations, the parameters of the model are changed and the (predicted) effect on the function of the system is determined. Because we deal with signal transduction, activation of target genes is the most important function. For the study, target gene activation is assumed to be proportional to the shuttling STAT-5 in the nucleus. The results are displayed in Fig. 17.1-7. Surprisingly, the first step in the network, that is, the phosphorylation of monomeric STAT-5 described by k1, has the smallest influence on gene activation. It can be varied by a factor of 2 with next to no effect. The parameters describing the nucleocytoplasmic shuttling (k3, k4, and τ) have the largest influence. In particular, setting k4 to zero, that is, inhibiting the nuclear export, is predicted to decrease target gene activation by a factor of 2.

Fig. 17.1-7 In silico results. Sensitivity analysis. Predicted influence of the single parameters on gene transcription. Black: phosphorylation of the monomer (k1); blue: dimerization (k2); green: nuclear import (k3); red: sojourn time in the nucleus (τ); yellow: nuclear export (k4).

This prediction can be tested experimentally. The substance Leptomycin B inhibits the nuclear export of STAT-5. Figure 17.1-8(a) shows the time course of the protein CIS, whose translation is initiated by the JAK-STAT signaling pathway. The areas under the curves differ roughly by the predicted factor of 2. Results for repeated experiments in Fig. 17.1-8(b) demonstrate that Leptomycin B has no effect on CIS translation without Epo stimulus. In the case of stimulation, the protein production is decreased by a factor of 2 if Leptomycin B is applied, which confirms the in silico prediction of the extended model and finally validates the model.

Fig. 17.1-8 Experimental confirmation of the in silico prediction of the extended model. (a) Time course of the translation of the protein CIS with and without inhibition of the nuclear export of STAT-5 by Leptomycin B (LMB). (b) Summary of repeated experiments.

In summary, the mathematical model allows for the inference of two systems properties:
- STAT-5 is not available in excess. The cell acts economically: by cycling, STAT-5 is "recycled".
- Fast cycling of STAT-5 represents a remote sensor system to closely couple gene expression to receptor activation.

A saying in mathematical modeling reads: "All models are wrong, but some are useful". This also holds in the presented case:
- No scaffolding for receptor-STAT-5 interaction: The interaction of STAT-5 with the receptor that we have described by Eq. (1) is a highly complex process. A detailed modeling of this process would require up to 50 equations containing approximately the same number of parameters.
- Spatial effects: We have treated the cell as a well-stirred reactor, which is certainly not true for the highly structured cell.
- Stochastic effects: The deterministic description by the proposed model does not capture the stochastic effects that are always present in living systems.
- Data averaged over 10‘ cells: The biochemical process used to generate the experimental data averages over 10‘ cells, which are not identical.

Nevertheless, the final model is reasonable because it fulfills the two central requirements of a successful model:
- Capture the main effect
- Make testable predictions

De facto, the above-listed shortcomings are not relevant. Moreover, it is in fact not desirable to have a model that exactly copies the cell. It would have too many parameters and it would not tell what the relevant effects are.1) In this sense, successful modeling means making well-chosen "errors". In summary, the example has shown that, given quantitative time-resolved experimental data, it is possible to turn qualitative, static cartoons like Fig. 17.1-1 into quantitative dynamical models allowing for
- Testing the cartoon
- Calculating unobservable components
- Manipulating the system in silico
- Identifying efficient manipulation targets
- Predicting the outcome of new experiments
- Inferring systems' properties

17.1.5 Future Development
The limiting factor in systems biology is high-quality data [16]. Mathematical modeling can only give as much information as is coded in the data. Unfortunately, most techniques, including the high-throughput "omics" technologies, up to now produce mainly qualitative data. The rapid technological developments in these areas and new techniques like quantitative immunoblotting [33] or protein chips will allow building and validating larger models, including also the interactions between signaling, gene regulatory, and metabolic networks. So far, most of the measurement techniques average over a large number of cells, not taking into account cell-to-cell variability. Imaging methods will allow investigation of the dynamic behavior in single cells [34, 35]. On the basis of these technologies, systems biology is expected to have a major impact on medicine:
- As demonstrated by Fig. 17.1-7 in the above application to the JAK-STAT pathway, sensitivity analysis can contribute to the identification of drug targets, facilitating the early stages of drug development.
- Adverse effects are a major reason for terminating clinical trials in the late stages of drug development. Systems biology models, including, for example, drug metabolism, can help discover adverse effects earlier.
- The effects of drugs show a large interindividual variability due to polymorphisms [36, 37]. Systems biology approaches taking this into account will help in transforming current medicine from being mainly reactive to a predictive and preventive personalized medicine, as envisioned in Ref. 38.

1) In analogy to Goethe's saying: "If I draw my dog exactly as he is, I will have two dogs, but never a piece of art", for modeling holds: "If I model the cell exactly as it is, I will have two cells, but never a model".
References

1. N. Wiener, Cybernetics, or Control and Communication in the Animal and the Machine, MIT Press, Cambridge, 1948.
2. L. von Bertalanffy, General Systems Theory, Braziller, New York, 1968.
3. R. Heinrich, T.A. Rapoport, A linear steady-state treatment of enzymatic chains. General properties, control and effector strength, Eur. J. Biochem. 1974, 42, 89-95.
4. H. Kacser, J.A. Burns, The control of flux, Symp. Soc. Exp. Biol. 1973, 27, 65-104.
5. R. Heinrich, S. Schuster, The Regulation of Cellular Systems, Chapman & Hall, New York, 1996.
6. R. Heinrich, B.G. Neel, T.A. Rapoport, Mathematical models of the protein kinase signal transduction, Mol. Cell 2002, 9, 957-970.
7. J.J. Tyson, K.C. Chen, B. Novak, Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cell, Curr. Opin. Cell Biol. 2003, 15, 221-231.
8. S. Shen-Orr, R. Milo, S. Mangan, U. Alon, Network motifs in the transcriptional regulation network of Escherichia coli, Nat. Genet. 2002, 31, 64-68.
9. N. Barkai, S. Leibler, Robustness in simple biochemical networks, Nature 1997, 387, 913-917.
10. U. Alon, M.G. Surette, N. Barkai, S. Leibler, Robustness in bacterial chemotaxis, Nature 1999, 397, 168-171.
11. N. Blüthgen, H. Herzel, How robust are switches in intracellular signaling cascades? J. Theor. Biol. 2003, 225, 293-300.
12. G. von Dassow, E. Meir, E.M. Munro, G.M. Odell, The segment polarity network is a robust developmental module, Nature 2000, 406, 188-192.
13. J. Stelling, U. Sauer, Z. Szallasi, F.J. Doyle, J. Doyle, Robustness of cellular functions, Cell 2004, 118, 675-685.
14. A.R. Asthagiri, D.A. Lauffenburger, A computational study of feedback effects on signal dynamics in a mitogen-activated protein kinase (MAPK) pathway model, Biotechnol. Prog. 2001, 17, 227-239.
15. B. Schoeberl, C. Eichler-Jonsson, E.D. Gilles, G. Müller, Computational modeling of the dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors, Nat. Biotechnol. 2002, 20, 370-375.
16. U.S. Bhalla, P.T. Ram, R. Iyengar, MAP kinase phosphatase as a locus of flexibility in a mitogen-activated protein kinase signaling network, Science 2003, 297, 1018-1023.
17. M. Fussenegger, J.E. Bailey, J. Varner, A mathematical model of caspase function in apoptosis, Nat. Biotechnol. 2000, 18, 768-774.
18. T. Eissing, H. Conzelmann, E.D. Gilles, F. Allgöwer, E. Bullinger, P. Scheurich, Bistability analyses of a caspase activation model for receptor-induced apoptosis, J. Biol. Chem. 2004, 279, 36892-36897.
19. M. Bentele, I. Lavrik, M. Ulrich, S. Stößer, H. Kalthoff, P.H. Krammer, R. Eils, Mathematical modeling reveals threshold behavior of CD95-induced apoptosis, J. Biol. Chem. 2004, 166, 839-851.
20. E. Lee, A. Salic, R. Krüger, R. Heinrich, M.W. Kirschner, The roles of APC and Axin derived from experimental and theoretical analysis of the Wnt pathway, PLoS 2003, 1, 116-132.
21. I. Swameye, T. Müller, J. Timmer, O. Sandra, U. Klingmüller, Identification of nucleocytoplasmic cycling as a remote sensor in cellular signaling by data-based modeling, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 1028-1033.
22. B. Novak, Z. Pataki, A. Ciliberto, J.J. Tyson, Mathematical model of the cell division cycle of fission yeast, Chaos 2001, 11, 277-286.
23. B. Novak, J.J. Tyson, Modelling the controls of the eukaryotic cell cycle, Biochem. Soc. Trans. 2003, 31, 1526-1529.
24. H. Kitano, Foundations of Systems Biology, MIT Press, Cambridge, 2001.
25. E. Klipp, R. Herwig, A. Kowald, C. Wierling, H. Lehrach, Systems Biology in Practice, Wiley-VCH, Weinheim, 2005.
26. L. Alberghina, H.V. Westerhoff, Systems Biology, Springer, New York, 2005.
27. T.K. Kim, T. Maniatis, Regulation of interferon-γ-activated STAT1 by the ubiquitin-proteasome pathway, Science 1996, 273, 1717-1719.
28. M. Köster, H. Hauser, Dynamic redistribution of STAT1 protein in IFN signaling visualized by GFP fusion proteins, Eur. J. Biochem. 1999, 260, 137-144.
29. J. Timmer, H. Rust, W. Horbelt, H.U. Voss, Parametric, nonparametric and parametric modelling of a chaotic circuit time series, Phys. Lett. A 2000, 274, 123-134.
30. H.G. Bock, Recent advances in parameter identification for ordinary differential equations, in Progress in Scientific Computing, vol. 2, (Eds.: P. Deuflhard, E. Hairer), Birkhäuser, Boston, MA, 1983, 95-121.
31. T.G. Müller, D. Faller, J. Timmer, I. Swameye, O. Sandra, U. Klingmüller, Tests for cycling in a signalling pathway, J. Royal. Stat. Soc. C: Applied Stat. 2004, 53, 557-568.
32. J. Timmer, T. Müller, O. Sandra, I. Swameye, U. Klingmüller, Modelling the non-linear dynamics of cellular signal transduction, Int. J. Bifurcat. Chaos 2004, 14, 2069-2079.
33. M. Schilling, T. Maiwald, S. Bohl, M. Kollmann, J. Timmer, U. Klingmüller, Quantitative data generation for systems biology - the impact of randomisation, calibrators, and normalisers, IEE Proc. Systems Biology, 2006, 152, 193-200.
34. D.E. Nelson, A.E.C. Ihekwaba, M. Elliott, J.R. Johnson, C.A. Gibney, B.E. Foreman, G. Nelson, V. See, C.A. Horton, D.G. Spiller, S.W. Edwards, H.P. McDowell, J.F. Unitt, E. Sullivan, R. Grimley, N. Benson, D. Broomhead, D.B. Kell, M.R.H. White, Oscillations in NF-κB signaling control the dynamics of gene expression, Science 2004, 306, 704-708.
35. N. Rosenfeld, J.W. Young, U. Alon, P.S. Swain, M. Elowitz, Gene regulation at the single-cell level, Science 2005, 307, 1962-1965.
36. T. Lang, K. Klein, J. Fischer, A.K. Nüssler, P. Neuhaus, U. Hofmann, M. Eichelbaum, M. Schwab, U.M. Zanger, Extensive genetic polymorphism in the human CYP2B6 gene with impact on expression and function in human liver, Pharmacogenetics 2001, 11, 399-415.
37. O. Burk, H. Tegude, I. Koch, E. Hustert, R. Wolbold, H. Glaeser, K. Klein, M.F. Fromm, A.K. Nuessler, P. Neuhaus, U.M. Zanger, M. Eichelbaum, L. Wojnowski, Molecular mechanisms of polymorphic CYP3A7 expression in adult human liver and intestine, J. Biol. Chem. 2002, 277, 24280-24288.
38. L. Hood, J.R. Heath, M.E. Phelps, B. Lin, Systems biology and new technologies enable predictive and preventative medicine, Science 2004, 306, 640-643.
Chemical Biology. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
17.2 Modeling Intracellular Signal Transduction Processes
Jason M. Haugh and Michael C. Weiger
Outlook
The ability to control normal and diseased cell function will require quantitative analyses of how cells perceive and decode information. Involving enzyme-catalyzed reactions and the assembly of protein-protein and protein-lipid complexes that modulate enzyme activity, signal transduction is the biochemical integration of information inside the cell, and manipulation of signal transduction networks thus offers a broad-based approach to influencing cell behavior. Mathematical modeling approaches, wherein chemical kinetics, spatial distributions of molecules, and biophysical constraints may be described in dynamic and unambiguous terms, are being applied with increasing frequency to analyze biochemical signaling mechanisms more critically. Once validated by quantitative measurements, such models may soon offer a means to predict the integrated behavior of interacting pathways and combinations of cell stimuli. We discuss here the recent advances in, and challenges faced by, this emerging field.
17.2.1 Introduction
The past 15 years or so have seen a shift in the focus of biological research to the study of molecular mechanisms underlying cell regulation and function. Thus, we now have a qualitative roadmap of how intracellular molecules are organized to form signal transduction pathways, which govern cell decision-making in a tightly controlled, context-dependent manner [1]. However, it is not yet fully appreciated how biochemical mechanisms affect the kinetics of pathway activation, or how the magnitudes and/or timing of those signals are related to the likelihood and quality of a cell response. Mathematical modeling of signal transduction interactions, pathways, and networks is emerging as a powerful tool that can aid in explaining and interpreting experimental data. In most cases, the explanations are fairly intuitive (at least in hindsight) once the model has been applied to the problem at hand; in other cases, the conclusions are less so. In any case, quantitative models provide a way to organize hypotheses and integrate the many effects that may be at play. If done correctly, all the inherent assumptions are clearly laid out, because the system is described in the unambiguous language of mathematics.

Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
In theory, quantitative models of signaling processes offer two distinct advantages over the conceptual models routinely invoked in the signaling literature. First, models may be formulated that are mechanistic, meaning they are based on established principles of physical chemistry and/or mechanics, in which case the form of the model equations is determined by the hypothetical mechanism assumed. In many cases, one may formulate multiple models corresponding to different candidate mechanisms and rule out one or more of them on the basis of a quantitative analysis. Models that are phenomenological, on the other hand, aim to capture at least the qualitative features observed in experiments. They are naturally less powerful, but they serve a definite and useful role and are appropriate in situations where the mechanisms that "connect the dots" are much less certain. Second, to the extent that the model has been trained on a large amount of high-quality quantitative data, and its mechanistic assumptions are sound, it may be used to predict the outcomes of novel experiments and may thus generate new, hypothesis-driven research. Some of the experimental findings will inevitably contradict the predictions of the model, but just as with conceptual models, one would iteratively refine the model on the basis of such data. In this chapter, we aim to review the progress that has been made in modeling signal transduction, mostly in recent years, while also noting the pioneering contributions to this field, and we critically assess the open questions that need to be addressed if the field is to advance. We have intentionally organized the discussion in a top-down manner, starting from the cell's initial perception of external stimuli and building up step by step to the complex models that incorporate multiple, interacting signaling pathways (Fig. 17.2-1).
Although reductionism is not so fashionable these days, we wish to stress that there is still much to learn from generalized models of relatively simple systems, and that it is easy to neglect the details as we strive toward models of greater scope [2]. Finally, we refer the interested reader to a number of related reviews published recently on the topic of modeling signal transduction [3-8].
17.2.2 Receptor-Binding and Regulation Mechanisms
The first step in most signaling pathways is the binding of cell surface receptors, which links the presence and concentration of a specific extracellular ligand to the intracellular processes that ultimately govern the cellular response. One often thinks of receptor binding simply as a reversible, bimolecular process, characterized by the dissociation (inverse equilibrium) constant, KD; an apparent KD value is generally defined as the free concentration of ligand that yields half-maximal binding to the cell surface (or to receptors immobilized on a solid support). In the simplest model, each ligand-bound receptor is activated for signal transduction. This picture belies a number of complexities, however, which are most often neglected in models of signal transduction. Arguably the two most important complexities involve the dimerization/aggregation and intracellular trafficking of receptors, which significantly impact the kinetics and dose response of receptor activation and subsequent intracellular signaling.

Fig. 17.2-1 Fundamentals of intracellular signaling. In this chapter, we discuss modeling of intracellular signaling processes from the top down. (a) One must first consider the binding of ligands to receptors and receptor dimerization at the cell surface, as well as the internalization and intracellular processing of receptors and ligands, which affect the number of functional complexes available for signaling. (b) Signaling complexes organize signal transduction pathways through the recruitment and covalent modification of signaling adaptors and enzymes, many of which act upon substrates associated with the membrane or colocalized in the receptor complex. (c) In many situations, such as the perception of ligand gradients, one must explicitly account for the spatial patterns of intracellular signaling molecules. After establishing these general concepts, we discuss modeling of the downstream signaling pathways and networks.
17.2.2.1 Receptor Dimerization
Many receptors form dimers or higher oligomers on the cell surface, spontaneously and/or in response to ligand binding. In many cases, receptor dimerization is required for downstream signal transduction. For example, structural constraints generally prevent receptor tyrosine kinases (RTKs) from phosphorylating their own cytosolic tails in an intramolecular fashion, and thus dimerization permits phosphorylation of receptor sites in trans. In the case of multi-subunit receptors such as the interleukin 2 (IL-2) receptor, different subunits can bring together distinct non-RTKs that rely on each other for activation. Although many receptor systems rely on dimerization, this process can occur in different ways, and models can be and have been used to discern between candidate mechanisms. The underlying issues informing such models include the number of binding sites per ligand and receptor molecule, whether multiple subunits/receptor types are involved and their relative affinities for ligand, and whether ligand binding is required/sufficient for dimerization or if other receptor domains are involved. These considerations and the receptor density determine whether receptor activation will exhibit a hyperbolic (as for 1:1 binding or Michaelis-Menten kinetics), sigmoidal (apparent cooperativity), or bell-shaped dose-response curve (Fig. 17.2-2), and evaluation of candidate models is generally achieved through comparisons with quantitative ligand binding and receptor activation data measured at various times and/or ligand concentrations. Models that focus on or include receptor dimerization have emerged for epidermal growth factor (EGF) [9-13], insulin [14], fibroblast growth factor [15, 16], FcεRI (immunoglobulin E) [17, 18], platelet-derived growth factor (PDGF) [19], human growth hormone [20, 21], and IL-2 [22] receptors.

Fig. 17.2-2 Receptor dimerization mechanisms and dose response. The manner in which receptor dimers form affects the dose response of receptor activation and downstream signaling. Here we invoke simple, steady-state models that account for receptor binding, dimerization, and trafficking to illustrate this point. (a) When dimers form via the lateral association of two 1:1 ligand-receptor complexes, the resulting steady-state dose response (solid curve) is predicted to exhibit more cooperativity than does the simple 1:1 binding case (dashed curve); here, ligand concentration is normalized by the value that yields half-maximal activation. (b) When dimers form via lateral association of one 1:1 complex and a free receptor, a bell-shaped dose-response curve is predicted.
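The three dose-response shapes can be illustrated with a minimal equilibrium calculation. The sketch below is not one of the cited models and, unlike the steady-state models behind Fig. 17.2-2, it ignores trafficking: it assumes simple mass-action equilibria at the cell surface, a total receptor density normalized to 1, and hypothetical constants `kd` (ligand binding) and `kx` (dimerization, in units of the total receptor density). All function names are assumptions made here.

```python
import math


def free_receptor(a, b, r_total=1.0):
    """Positive root of a*R**2 + b*R - r_total = 0, in a cancellation-free form."""
    return 2.0 * r_total / (b + math.sqrt(b * b + 4.0 * a * r_total))


def simple_binding(ligand, kd=1.0):
    """Fraction of receptors occupied for reversible 1:1 binding (hyperbolic)."""
    return ligand / (kd + ligand)


def dimer_of_complexes(ligand, kd=1.0, kx=0.1):
    """Fraction of receptors in dimers formed from two 1:1 complexes (C + C <-> D)."""
    lam = ligand / kd                    # C = lam * R at equilibrium
    r = free_receptor(2.0 * lam ** 2 / kx, 1.0 + lam)
    return 2.0 * (lam * r) ** 2 / kx     # 2 * D, with D = C**2 / kx


def dimer_with_free_receptor(ligand, kd=1.0, kx=0.1):
    """Fraction of receptors in dimers formed from a complex and a free receptor (C + R <-> D)."""
    lam = ligand / kd
    r = free_receptor(2.0 * lam / kx, 1.0 + lam)
    return 2.0 * lam * r ** 2 / kx       # 2 * D, with D = C * R / kx


print(" ligand   1:1    C+C    C+R")
for ligand in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"{ligand:7.2f}  {simple_binding(ligand):.3f}  "
          f"{dimer_of_complexes(ligand):.3f}  {dimer_with_free_receptor(ligand):.3f}")
```

In this sketch, the C + C mechanism gives a response that grows with the square of the ligand concentration at low dose and then saturates, whereas the C + R mechanism passes through a maximum, because at high ligand concentration too few free receptors remain to pair with the complexes.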
17.2.2.2 Receptor Trafficking
Receptors are not static on the cell surface, as the plasma membrane and all its constituents are turned over at various rates. Membrane proteins undergo endocytosis, whereby they are internalized in vesicles that bud off from the plasma membrane and later fuse with endosomes inside the cell. There, they are sorted for one of two fates: recycling back to the plasma membrane, or degradation in lysosomes. Receptor trafficking processes are modulated in response to receptor binding through a combination of protein-protein interactions and covalent modifications (e.g., ubiquitylation), which can specifically immobilize/sequester activated receptors in endocytic or endosomal structures or otherwise mark them for enhanced internalization and/or degradation rates. Certain growth factor receptors of the RTK family, as well as other receptor types, are regulated in this fashion, which over time leads to a significant downregulation of the number of receptors available for binding and signaling at the cell surface. Models accounting for these effects at various timescales and levels of abstraction have been offered, most notably for EGF/EGF receptor [23-25], and for other systems as well [21, 26, 27]. Besides the consideration of receptor abundance at the cell surface, one must also consider whether the receptor remains ligated and/or active in endosomes, and if so, which signaling processes endure or are initiated there. Although it is commonly assumed that internalized receptors are silent, implicitly or based on specific evidence, compartmentalization of signaling and its potential role in prolonging specific signaling events have been considered using modeling [28, 29].
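Ligand-induced downregulation can be made concrete with a two-species steady-state calculation. The following is a minimal sketch under stated assumptions, not a reproduction of any of the cited models: free surface receptors are synthesized at a constant rate and internalized slowly, ligand-bound complexes are internalized faster, and recycling and endosomal signaling are ignored. All parameter names and values are hypothetical.

```python
def surface_receptors(ligand, v_synth=100.0, k_on=0.1, k_off=0.2, k_t=0.02, k_e=0.2):
    """Steady-state free receptors R and complexes C at the cell surface.

    Model assumed in this sketch:
        dR/dt = v_synth - k_t*R - k_on*ligand*R + k_off*C
        dC/dt = k_on*ligand*R - k_off*C - k_e*C
    with k_e > k_t, i.e. occupied receptors are internalized faster.
    """
    coupling = k_on * ligand / (k_off + k_e)  # ratio C/R from dC/dt = 0
    free = v_synth / (k_t + k_e * coupling)   # from dR/dt + dC/dt = 0
    return free, coupling * free


for ligand in (0.0, 1.0, 10.0, 100.0):
    free, bound = surface_receptors(ligand)
    print(f"ligand = {ligand:6.1f}   surface receptors = {free + bound:7.1f}   "
          f"complexes = {bound:7.1f}")
```

Without ligand, the surface pool settles at `v_synth / k_t`; at saturating ligand, it approaches `v_synth / k_e`, so the extent of downregulation in this sketch is set by the ratio of the two internalization rate constants.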
17.2.3 Receptor-mediated Covalent Modifications and Molecular Interactions
Once the functional receptor-ligand complex has been assembled, it is rapidly activated for intracellular signaling. This often occurs through conformational changes in the receptor, which result in the switching on of an intrinsic enzymatic activity or the association of enzymes from the cytosol. In the case of G-protein-coupled receptors, the story ends here, as ligated receptors may then activate heterotrimeric G-proteins that are precoupled to the receptor or that encounter receptor complexes by lateral diffusion in the membrane. However, growth factor and cytokine receptors present a more complex situation, given the aforementioned phosphorylation of one or more receptor subunits by receptor-associated kinase activities. Receptors tend to be phosphorylated on multiple sites, and each site may be phosphorylated to a different extent on average. They are phosphorylated by the kinase(s) and dephosphorylated by protein phosphatases in a dynamic fashion and at various rates, and the pattern of phosphorylation might change with increasing receptor occupancy. The general purpose of receptor phosphorylation is to provide a scaffold for the association of cytosolic signaling enzymes and adaptor proteins, which possess one or more modular binding domains (e.g., Src-homology 2 and phosphotyrosine-binding domains) that recognize specific phosphorylation sites. The recruited proteins are thus activated to initiate various signaling pathways, and each functional receptor might have the capacity to form large, multiprotein complexes.
17.2.3.1
Receptor Phosphorylation and Binding States
It is clear that even these early stages of receptor signaling present significant challenges from the standpoint of modeling, as one has to decide whether to ignore or account for the combinatorial diversity of phosphorylated receptor species and their complexes with intracellular proteins. The former strategy is adopted most often, particularly when the downstream signal transduction is the focus, which may be appropriate when phosphorylation of a specific site and the resulting activation of an enzyme are known or assumed to be independent from other processes. One must deal with these issues, however, when receptor-binding sites overlap or when one receptor-bound protein affects another in the complex. To this end, the Cell Signaling group at Los Alamos National Labs has recently devised a general modeling framework that accounts for all possible receptor species while assuming that receptor binding, dimerization, and receptor phosphorylation are kinetically independent [4, 18]. Such assumptions are generally necessary to avoid an explosion in the number of rate constant values that must be specified. Another recent model has explicitly considered receptor-mediated regulation and localization of phosphatase activities (e.g., Shp-1 and -2) as a means of modulating receptor phosphorylation states and signaling specificity [30]. Even with these advances, we are far from capturing the true complexity in the formation of receptor complexes; multivalent interactions between different proteins suggest the possibility that protein interactions form cyclic (ring) structures, which could be important for maintaining the stability of the complex but are notoriously difficult to model even in the simplest cases [31].
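The combinatorial problem can be made concrete with a simple count. The sketch below is a hypothetical bookkeeping exercise, not the framework of Refs. [4, 18]: it assumes that each phosphorylation site on a receptor is independently in one of three states (unphosphorylated, phosphorylated, or phosphorylated with a bound partner) and that a homodimer is an unordered pair of monomer states.

```python
def monomer_states(n_sites: int, states_per_site: int = 3) -> int:
    """Number of distinct receptor states if every site is independent."""
    return states_per_site ** n_sites


def homodimer_states(n_monomer_states: int) -> int:
    """Unordered pairs of monomer states (a symmetric dimer)."""
    return n_monomer_states * (n_monomer_states + 1) // 2


for n in (1, 3, 5, 9):
    m = monomer_states(n)
    print(f"{n} sites: {m} monomer states, {homodimer_states(m)} dimer states")
```

Even under these simplifying assumptions, the species count grows exponentially with the number of sites, which is why independence assumptions are needed to keep the number of rate constants manageable.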
Proteins in complex with activated receptors are often phosphorylated by the associated kinase(s), leading to modulation of enzymatic activity or, in the case of adaptor proteins such as Shc, IRS-1/-2, and Gab-1/-2, binding of other proteins to the phosphorylated site(s). Because these proteins are substrates of receptor-associated kinase activity, they are commonly assumed to leave the receptor complex after phosphorylation in some models [11, 32], according to the Briggs-Haldane mechanism of enzyme action. Most of the biochemical evidence suggests otherwise, however, as the binding domains tend to be truly modular, and hence other models have treated the binding and phosphorylation of receptor-binding proteins as independent events [28, 33]. Certain phosphorylated enzymes such as phospholipase C (PLC) and phosphoinositide (PI) 3-kinase act on substrates at the plasma membrane and do so in a spatially localized manner, consistent with the view that maintenance of receptor association is critical for the functions of these enzymes. This is the perspective from which some models of these pathways have been formulated [19, 34]. Considering this, receptor binding of certain phosphorylated proteins may be compromised by competing intra- or intermolecular interactions, reflecting the need to access other locations or compartments; the phosphorylation and dimerization of STAT transcription factors is a case in point. Generally speaking, one needs to carefully consider whether phosphorylation of a particular protein affects its receptor-binding properties.
17.2.3.2
Kinetic Considerations
Ligands with sub-nanomolar effective KD values, including many growth factors and cytokines, tend to form functional receptor complexes that remain active for several minutes. In fact, some receptor dimers may dissociate so slowly that they rely on internalization for signal termination [21]. In such cases, it is generally safe to assume that intracellular phosphorylation and other reactions respond rapidly to changes in receptor occupancy (pseudosteady state). In cases where the functional complex dissociates more rapidly, however, one must also account for receptor complexes that are formed but are not yet phosphorylated as well as active complexes that dissociate but have not yet been dephosphorylated (or otherwise deactivated) (Fig. 17.2-3). For example, such issues arise in the case of T-cell receptor engagement of peptide-MHC complexes presented on antigen-presenting cells in which prospective peptide ligands naturally vary widely in receptor-binding affinity. Kinetic proofreading refers to the inability of rapidly dissociating ligands to transmit signals, because the short-lived receptor-ligand complexes fail to be activated, whether by dimerization, phosphorylation, association of other proteins, and/or other mechanisms [35, 36]. On the other hand, a shorter lifetime can be advantageous when active receptors persist for some time after ligand dissociation, particularly when ligand molecules may be limiting in number as in the case of antigen presentation [37]. Each ligand may thus participate in serial engagement of multiple receptors [38, 39]. As discussed in the following section, a shorter lifetime may also be beneficial when significant spatial gradients develop in the vicinity of an active receptor. Signaling outcomes may also be affected by disparities in the timescales associated with intracellular processes (Fig. 17.2-3). Substrate exchange refers to the ability of phosphorylated (or otherwise modified) proteins to associate with and dissociate from receptor complexes before they are dephosphorylated. Slow versus rapid exchange is determined by the relative rates of substrate phosphorylation, dissociation from the receptor complex, and dephosphorylation within the complex and in the cytosol; fast exchange has the effect of homogenizing the phosphorylation state of the protein, which thereby responds globally to the average status of the receptor complexes [28, 30]. The ability to hold information about the local receptor environment, in the context of phosphorylation within the receptor complex, requires slow substrate exchange [33].
Fig. 17.2-3 Kinetic considerations at the level of receptor complexes. The kinetic proofreading concept (top left) holds that ligands with fast off-rates will not allow the sequence of events required for activation of signaling to occur; however, a high off-rate, relative to the rate of receptor deactivation, can be advantageous when the number of ligand molecules is limiting (serial engagement, top right). The binding kinetics of intracellular proteins is also important relative to the rates of phosphorylation/dephosphorylation by receptor-associated, membrane-associated, and cytosolic kinases/phosphatases. Substrate exchange is said to be high when the kinetics are such that the phosphorylation state of the protein reflects a global average of these activities.
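The kinetic proofreading idea can be illustrated with the simplest sequential scheme: a receptor-ligand complex must complete N modification steps, each with rate constant k_p, and returns to the unmodified state if the ligand dissociates (rate constant k_off) at any point. Under those assumptions, the probability that a newly formed complex reaches the fully activated state is (k_p / (k_p + k_off))^N. The sketch below evaluates this expression; the number of steps and the rate constants are hypothetical values chosen only for illustration.

```python
def proofreading_fraction(k_off: float, k_p: float, n_steps: int) -> float:
    """Probability that a complex completes n_steps before the ligand dissociates."""
    return (k_p / (k_p + k_off)) ** n_steps


k_p = 1.0       # modification rate constant per step (1/s), assumed
n_steps = 5     # number of required modifications, assumed
for k_off in (0.1, 1.0, 10.0):
    frac = proofreading_fraction(k_off, k_p, n_steps)
    print(f"k_off = {k_off:5.1f} 1/s -> fraction activated = {frac:.2e}")
```

A 100-fold difference in off-rate translates into a far larger difference in the activated fraction, which is how a multistep activation sequence discriminates between slowly and rapidly dissociating ligands.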
17.2.4 Spatial Organization and Gradients on Cellular and Subcellular Length Scales
Most of the examples cited above are purely kinetic models with variables changing only with respect to time. While processes may be compartmentalized in such models, with rate terms that account for transfer between cellular compartments, spatial gradients within compartments are obviously not accounted for. In most cases, signaling molecules encounter one another through mutual diffusion, and net molecular transport from one location to another depends on such gradients. However, the concept of a concentration gradient serving as a “driving force” for macroscopic diffusion leads to a common misconception. On a microscopic level, biological molecules are constantly in motion through collisions with water (and occasionally other) molecules, and thus it is obvious that they can associate in the absence of concentration gradients. If one were to survey the cytoplasm of a typical cell, the average distance between the plasma membrane and the nucleus is in the range of L ~ 1-10 μm. The diffusion coefficient D of a small molecule such as Ca²⁺ or ATP in the cytosol is ~10³ μm² s⁻¹, and that of a larger macromolecule is ~10 μm² s⁻¹ (the cytosolic D value for green fluorescent protein, medium sized at 27 kDa, has been measured at 40 μm² s⁻¹). In three dimensions, the average time associated with traversing that distance is L²/(6D), which yields a range of times from 0.2 ms to 2 s. One concludes that diffusive transport in the cytosol is relatively efficient on cellular length scales, and that the formation of macroscopic gradients requires a fairly rapid degradation/turnover of the molecule. In the case of intracellular calcium and certain other second messengers, fluorescence imaging experiments and detailed kinetic and spatial modeling [40, 41] have demonstrated that spatial waves propagate in the cell as a result of rapid dynamical processes characteristic of excitable media [42].
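The timescale estimate above is easy to reproduce. The short sketch below evaluates L²/(6D) for the two extreme cases mentioned in the text (a small molecule over 1 μm and a macromolecule over 10 μm); the function name and the case labels are illustrative only.

```python
def mean_diffusion_time(distance_um: float, D_um2_per_s: float, dimensions: int = 3) -> float:
    """Characteristic time (s) to diffuse a given distance: L^2 / (2 * d * D)."""
    return distance_um ** 2 / (2 * dimensions * D_um2_per_s)


cases = [
    ("small molecule, L = 1 um, D = 1e3 um^2/s", 1.0, 1e3),
    ("macromolecule, L = 10 um, D = 10 um^2/s", 10.0, 10.0),
]
for label, L, D in cases:
    print(f"{label}: {mean_diffusion_time(L, D):.2g} s")
```

The two results bracket the range of roughly 0.2 ms to 2 s quoted in the text.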
For signaling proteins that are phosphorylated or otherwise modified at the plasma membrane and/or at endosomal membranes but dephosphorylated throughout the cell, models have been used to evaluate the possibility and functional consequences of gradients of these phosphorylated proteins in the cytosol [28, 43-45]; when the cytosolic phosphatase activity is either very strong or very weak, however, a kinetic model is adequate [33].
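A back-of-the-envelope way to judge whether such a gradient matters is to compare the cell size with the decay length of the phosphorylated species. The sketch below assumes an idealized one-dimensional geometry in which the protein is phosphorylated only at a planar membrane (x = 0), diffuses with coefficient D, and is dephosphorylated in the cytosol with a first-order rate constant k; the steady-state profile then decays as exp(-x/λ) with λ = sqrt(D/k). The geometry, the first-order kinetics, and the numerical values are assumptions made for illustration, not details of the cited models.

```python
import math


def decay_length(D_um2_per_s: float, k_dephos_per_s: float) -> float:
    """Length scale (um) over which the phosphorylated form decays: sqrt(D / k)."""
    return math.sqrt(D_um2_per_s / k_dephos_per_s)


def relative_level(x_um: float, D_um2_per_s: float, k_dephos_per_s: float) -> float:
    """Phosphorylated fraction at distance x relative to its value at the membrane."""
    return math.exp(-x_um / decay_length(D_um2_per_s, k_dephos_per_s))


D = 10.0  # um^2/s, typical macromolecule value from the text
for k in (0.01, 1.0, 100.0):  # hypothetical phosphatase activities (1/s)
    lam = decay_length(D, k)
    print(f"k = {k:6.2f} 1/s: decay length = {lam:6.2f} um, "
          f"level 5 um from membrane = {relative_level(5.0, D, k):.3f}")
```

When λ is much larger than the cell, the phosphorylated protein is nearly uniform; when λ is much smaller, it is confined to the immediate vicinity of the membrane. In both limits a purely kinetic description is a reasonable approximation, consistent with the statement above.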
17.2.4.1 Spatial Gradient Sensing and Chemotaxis
Spatial gradients, both inside and outside the cell, are an inherent component of directed cell migration, or chemotaxis, in which cell movement is biased over time toward the highest extracellular concentration of chemoattractant, or away from the highest concentration of repellent. Such gradients are formed as a natural consequence of physiological settings during development, the immune response and wound healing, for example. Eukaryotic cells sense the gradients spatially, that is by linking the local chemoattractant receptor signaling to the cytoskeletal and/or adhesion processes that drive cell crawling. The signaling pathways that mediate this linkage have been studied intensely in recent years, and in cells that exhibit rapid, amoeboid migration (Dictyostelium discoideum, neutrophils), it has been established that external gradients are amplified inside the cell to the point where an all-or-none decision is made concerning the direction of membrane protrusion. In response, numerous models have been proposed that include autocatalytic signaling processes or other positive feedback mechanisms, negative feedback that tends to desensitize the response, and/or a combination of slow- and fast-diffusing species (Fig. 17.2-4). While the classic treatment in this vein was offered over 30 years ago by Gierer and Meinhardt [46], most of the models have emerged recently [47-50], in tandem with experimental work revealing some of the underlying molecular details. One of the key features of spatial gradient sensing is the ability to localize the intracellular second messenger(s), which requires an appropriate turnover rate relative to diffusion across distances of ~10 μm. Well suited in this regard are membrane lipids such as 3' PIs, products of receptor-activated PI 3-kinases known to mediate spatial sensing [47, 51, 52], whose role is to organize motility processes specifically at the protruding plasma membrane.
17.2.4.2
Gradients on the Molecular Scale
The concentrating effect of enzyme recruitment by receptors combined with the slow diffusion of membrane-associated substrates that many signaling enzymes act upon can push such receptor-proximal reactions into a regime in which their rates are limited by lateral diffusion of the substrate. In such cases, substrate gradients would tend to form depletion zones surrounding the enzyme molecules (radius ~10-100 nm). Theoretical consideration of this problem in the biological context dates back to the seminal contributions of Adam and Delbrück [53] and Berg and Purcell [54], and more recent theories and simulations have focused on specific enzymatic mechanisms relevant to early signaling processes [55-58]. Another layer of complexity at this level of modeling is the subcompartmentalization or domain structure of the plasma membrane, which has been shown using models to affect the rates of enzyme-mediated reactions and the apparent motion of single particles tracked at various frame rates [59-61]. Accurate microscopic models of signaling reactions/interactions are needed especially in the light of the inability to spatially resolve such gradients by fluorescence microscopy.
Fig. 17.2-4 Spatial sensing of chemoattractant gradients. (a-c) Depict phenomena seen in gradient sensing by certain fast-moving cells, with concentrations of chemoattractant (dashed lines) and intracellular messenger (solid curves) at the "front" and "rear" of the cell shown as a function of time. (a) Uniform stimulation typically elicits adaptation of the signaling response. (b) Gradient stimulation, on the other hand, yields a persistent and amplified messenger gradient. (c) The sensing mechanism is able to track changes in the orientation of the extracellular gradient. (d) Models have been formulated on the basis of the opposition of positive and negative feedback loops, together with fast and slow diffusion of the various components. Here, m* denotes the active intracellular messenger.
17.2.5 Downstream Signaling Cascades and Networks
After the receptor-mediated events described above, signals are transduced through conserved biochemical pathways (Fig. 17.2-5), ultimately leading to the actuation of functional responses such as specific control of transcription, translation, or cytoskeletal dynamics. A signaling cascade generally refers to a series of enzyme modification processes, as in the activation of the various mitogen-activated protein (MAP) kinases, and thus presents a linear picture of signal transmission. As considered theoretically by Bray [62], the use of multiple intermediates in signaling pathways affords more opportunities for regulation, often from parallel pathways (crosstalk). In fact, most signaling “pathways” are simply dominant routes of regulation embedded in larger networks of interactions, in which proteins may interact with and/or modify multiple substrates (branch points) and receive regulatory inputs from multiple molecular partners (convergence points) (Fig. 17.2-5).
Fig. 17.2-5 Signal transduction pathways and networks. A partial interaction map, focusing on receptor-proximal signaling processes, is illustrated for the network typically activated by growth factor receptors (RTKs) and cytokine receptors that associate with nonreceptor tyrosine kinases such as those of the Src and JAK families (not depicted). Adaptor proteins are shown on the first level below the receptor, followed by the enzymes in complex with the receptor. These act upon membrane-associated substrates, which once modified recruit serine-threonine kinases and other enzymes to the membrane for initiation of signaling cascades. Of particular interest are branch points (blue), which act upon multiple molecules/pathways, and points of convergence (red), which receive and integrate inputs from multiple pathways. Pathway modulators are also shown (light green).
17.2.5.1
General Considerations and Pathway-specific Models
In addition to providing multiple nodes for pathway regulation, signaling cascades have long been considered a mechanism for amplifying signals. Biologists often refer to amplification in the linear sense, suggesting that a signaling cascade will amplify the absolute number of activated proteins, but in theory this outcome should not be expected. The sensitivity of the pathway, defined as the fractional change in output relative to that of the input, is another matter. Borrowing from formalisms developed for the analysis of metabolic pathways, it is readily shown that the sensitivity is additive as one moves down a sequence of reactions [63]. Pioneering theoretical work by Goldbeter and Koshland [64, 65] and later by Ferrell [66] showed that amplified sensitivity to a stimulus is readily achieved in systems governed by reversible, enzyme-mediated covalent modifications. These effects were shown to arise when the modifying enzymes are close to saturation, and when activation requires multiple modifications by the same enzyme (as in the dual phosphorylations of MEK and Erk in the MAP kinase cascade). More recent studies along these lines have considered the effects of enzyme/substrate compartmentalization [67, 68] and binding to scaffolding proteins [69], the kinetics in response to transient stimuli [67, 70], pathway feedback and branching [63, 71], and the existence and functional significance of bistability [72] in signaling cascades. Another suite of models has analyzed or otherwise considered the mechanisms involved in specific pathways.
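The effect of enzyme saturation on sensitivity can be sketched with the standard closed-form steady state of a single covalent modification cycle with Michaelis-Menten kinetics for both enzymes (commonly called the Goldbeter-Koshland function). In the code below, v1 and v2 are the maximal rates of the modifying and demodifying enzymes, and J1 and J2 are their Michaelis constants divided by the total substrate concentration; the numerical values are hypothetical, and the enzyme-substrate complexes are assumed to hold a negligible share of the substrate.

```python
import math


def goldbeter_koshland(v1: float, v2: float, J1: float, J2: float) -> float:
    """Steady-state modified fraction w* of a substrate in a modification cycle.

    Solves v1*(1 - w)/(J1 + 1 - w) = v2*w/(J2 + w) for w in [0, 1].
    """
    B = v2 - v1 + v2 * J1 + v1 * J2
    return 2.0 * v1 * J2 / (B + math.sqrt(B * B - 4.0 * (v2 - v1) * v1 * J2))


for label, J in (("unsaturated enzymes (J = 10)", 10.0),
                 ("near-saturated enzymes (J = 0.01)", 0.01)):
    below = goldbeter_koshland(0.9, 1.0, J, J)
    above = goldbeter_koshland(1.1, 1.0, J, J)
    print(f"{label}: modified fraction goes from {below:.3f} to {above:.3f}")
```

With unsaturated enzymes, a change in v1 from 0.9 to 1.1 barely moves the modified fraction, whereas with near-saturated enzymes the same change switches the substrate from mostly unmodified to mostly modified.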
Within the past 10 years or so, such models have been formulated to describe receptor-mediated formation of Ras-GTP [11, 67, 73], activation of the Raf-MEK-Erk and homologous kinase cascades [29, 69, 74-79], regulation of PtdIns(4,5)P2 lipid levels through activation of its synthesis and PLC-mediated hydrolysis [34, 80], and activation of PI 3-kinase and Akt [19], and still others have considered pathways of activation of NF-κB [81], STAT [82], and Gli [83] transcription factors. For the sake of simplicity, each of the models cited above implicitly assumes that its pathway operates in isolation; however, as models become more detailed it is clear that they will need to consider crosstalk interactions from other pathways emanating from the same receptor(s).
17.2.5.2
Pathway Crosstalk and Signaling Networks
When confronted with a system in which multiple signaling pathways are activated and crosstalk between them is prevalent, it is difficult to predict the consequences of mutations or interventions at the level of signaling intermediates, particularly those nodes that serve as branch and/or convergence points through which signals are distributed and integrated. Especially when a branch point leads to activation of some downstream signals and suppression of others, or when a convergence point receives both positive and negative signals, it is crucial to quantitatively characterize the magnitudes of the effects and how they influence the overall response [84]. An example of this sort of signal integration is seen in the activation of Erk, which is activated by the Raf-MEK-Erk cascade and negatively regulated through phosphorylation of Raf by Akt, a PI 3-kinase-dependent pathway; a model accounting for this crosstalk relationship has appeared recently [85]. Pathway crosstalk interactions may also be involved in positive feedback loops that produce prolonged responses, provided a threshold level of receptor signaling has been achieved. Activation of a negative feedback is then needed to break the cycle. To illustrate such bistable signaling mechanisms, Bhalla and Iyengar have formulated complex models in the context of Erk activation, which are robust with respect to producing bistability over relatively wide ranges of parameter values [32, 86]. Pathway crosstalk remains an important and developing area of signal transduction research, in both the experimental and modeling arenas.
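The threshold behavior produced by positive feedback can be illustrated with a one-variable toy model. This is not the Bhalla and Iyengar network; it is a generic sketch in which the active form x of a signaling protein promotes its own activation through a Hill-type term and is deactivated by a first-order process. All parameter values and names are hypothetical, and the initial condition stands in for the level of activity reached during a transient stimulus.

```python
def rate(x: float, v_max: float = 1.0, K: float = 0.5, n: int = 4, k_off: float = 1.0) -> float:
    """dx/dt for self-activation (Hill feedback) opposed by first-order deactivation."""
    return v_max * x ** n / (K ** n + x ** n) - k_off * x


def simulate(x0: float, t_end: float = 50.0, dt: float = 0.01) -> float:
    """Integrate dx/dt with the forward Euler method and return the final value."""
    x = x0
    for _ in range(round(t_end / dt)):
        x += dt * rate(x)
    return x


below_threshold = simulate(0.4)   # transient input leaves activity below x = 0.5
above_threshold = simulate(0.6)   # transient input pushes activity above x = 0.5
print(f"start at 0.4 -> settles near {below_threshold:.3f}")
print(f"start at 0.6 -> settles near {above_threshold:.3f}")
```

With these parameters the model has stable states near zero and near 0.92, separated by an unstable threshold at x = 0.5, so activity persists only if the initial stimulus carried the system past the threshold. Returning to the low state requires an additional process, such as a negative feedback, as noted above.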
17.2.6 Prospects and Challenges
With our ever-expanding knowledge of signal transduction mechanisms, it is envisioned that complex kinetic models incorporating all major intracellular pathways will be constructed. In tandem, stochastic simulations accounting for the full diversity of molecular interactions and intracellular compartments will allow researchers to visualize, at the single-molecule level, the sequence of signaling complex assembly and the local and global activation of signaling pathways that follows. Another exciting frontier is the linkage of signaling dynamics with control of the cytoskeleton, which will require an appreciation of both kinetics and mechanics, and yet another is the interface with gene regulatory networks and genomic data. In terms of implementation, the question is not whether such efforts are feasible; indeed, efforts along these lines are well underway. Rather, the real test will be to extract mechanistic insights that allow one to predict or at least explain the outcomes of specific experiments.
17.2.6.1
Limitations of Complex Models
If the field is to move toward more complicated models that include multiple pathways and cell stimuli, a number of hurdles must be overcome. First and perhaps foremost, one must choose a model structure that relates to molecular mechanisms that may not be known completely, and so it is inevitable that complex models will include controversial elements. Like conceptual models of signaling mechanisms, quantitative models will need to be refined and/or revised in the light of new findings, but then the model bears the burden of showing whether earlier predictions and analyses remain valid. Second, a fundamental problem with complex models is that they require the specification of an increasing number of parameter (e.g., rate constant) values; even when such values are obtained from the literature or from best-fits to available data sets, it must be recognized that there is a great deal of uncertainty associated with this exercise. In the best-case scenario, the model would be validated by direct comparison with quantitative measurements that assess multiple intermediates activated under the same stimulation conditions, and even then a sensitivity analysis will be warranted to identify those parameter values that drive the quality of fit; in spite of the vast literature on signaling mechanisms, the field is currently limited by the availability of such data. Model generality is a related issue; it seems unlikely that a model that was trained on one cellular context will transfer well to the analysis of other systems. Finally, more comprehensive models can be cumbersome to work with, and how one might approach the analysis depends on the specific question(s) being asked. In response, it has been suggested that one might build models from smaller process modules, which might be analyzed individually and in the context of other modules [87, 88]. Software packages such as Virtual Cell (http://www.nrcam.uchc.edu/) [89] have been developed for the purpose of linking models together in a seamless and interactive way.
17.2.6.2
Model Compression and Integration
The issues of model structure, parameter estimation, generality, and modularity all point to the continued need for detailed analyses of smaller models that focus on a particular aspect of the system, in conjunction with focused, quantitative experiments. While the modular strategy described above will no doubt become increasingly valuable as efforts are made to link the models, we offer here an approach that is similar in spirit yet distinct in one important respect; that is, once a submodel has been formulated and analyzed in full-blown mechanistic detail, we favor a compression step whereby the submodel is simplified by lumping parameters and processes to the extent where it retains its basic features (as illustrated in Fig. 17.2-6). Classically, this is achieved through a consideration of fast and slow kinetic processes, perhaps with input from sensitivity analysis. We argue that such a coarse-graining approach is forgiving with respect to the choices made in the submodel formulation and facilitates the process of submodel integration; one might initially explore the phenomenological behavior of the higher-level model with fewer parameters to specify, simplifying the sensitivity analysis and portability to other systems. The simplifying assumptions used to condense each submodel may be reevaluated at any time, and it would be expected that some findings would prompt a revision of the submodel, while others will simply reveal accessory processes that modulate the existing lumped parameters.
Fig. 17.2-6 Compression of a signaling module. As an illustrative example of model compression, we consider the activation of PDGF receptor as a module to be embedded in a model of cell response to PDGF. (a) Model schematic (adapted from Ref. 19). Our previous model accounted for PDGF receptor binding, dimerization, and internalization; in addition to the processes shown here, we have added basal receptor turnover, synthesis, and recycling. (b) The complete kinetic model is posed in terms of ordinary differential equations according to the laws of mass action. There are 10 adjustable rate constants in this model. (c) It is assumed that the ultimate cell responses are slow relative to the processes considered here, hence we assume a steady or pseudosteady state. The simplifications shown here further assume that the processes described by the rate constants k- and k,,,,, are much faster than those describing the initial receptor binding or receptor trafficking. (d) One could stop at this stage and simplify, or make further assumptions. The simplified receptor balance shown here, with R(0) defined as the cell surface receptor number prior to ligand stimulation (R(0) = (Vs/kt)(1 + krec/kdeg)), assumes that k , >> k,, kt (pseudoequilibrium, with KD = k , / k f ). (e) The simplified equations are used to solve for C2, the number of functional signaling complexes, in terms of only three lumped parameters. If one is interested only in the shape of the dose-response curve, one might normalize the ligand concentration, [L], and C2 (by KD and V , / k , , respectively; alternatively, C2 could be normalized by its maximum value, taken at 6 = 1). In that case, the normalized dose-response curve would be determined by a single parameter, Kx.
17.2.7 Concluding Remarks
Quantitative models, in conjunction with quantitative experimentation, are being used to evaluate biochemical signaling mechanisms, predict the outcomes of novel experiments, and generate nonintuitive insights and hypotheses warranting further study. Generalized and pathway-specific models have elucidated relationships between molecular properties and kinetics of signaling responses, incorporating spatial information where appropriate. The lessons learned from smaller, “reductionist” models have been significant, and one of the challenges we now face is how best to integrate such models to analyze complex intracellular systems.
Acknowledgments
Support from the NIH (R01-GM067739), NSF (#0133594), and Office of Naval Research (N00014-03-1-0594) is gratefully acknowledged.
References
1. T. Hunter, Signaling - 2000 and beyond, Cell 2000, 100, 113-127.
2. D. Bray, Reductionism for biochemists: how to survive the protein jungle, Trends Biochem. Sci. 1997, 22, 325-326.
3. B.M. Slepchenko, J.C. Schaff, J.H. Carson, L.M. Loew, Computational cell biology: spatiotemporal simulation of cellular events, Annu. Rev. Biophys. Biomol. Struct. 2002, 31, 423-441.
4. W.S. Hlavacek, J.R. Faeder, M.L. Blinov, A.S. Perelson, B. Goldstein, The complexity of complexes in signal transduction, Biotechnol. Bioeng. 2003, 84, 783-794.
5. A. Levchenko, Dynamical and integrative cell signaling: challenges for the new biology, Biotechnol. Bioeng. 2003, 84, 773-782.
6. J.J. Tyson, K.C. Chen, B. Novak, Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cell, Curr. Opin. Cell Biol. 2003, 15, 221-231.
7. N.J. Eungdamrong, R. Iyengar, Computational approaches for modeling regulatory cellular networks, Trends Cell Biol. 2004, 14, 661-669.
8. H.M. Sauro, B.N. Kholodenko, Quantitative analysis of signaling networks, Prog. Biophys. Mol. Biol. 2004, 86, 5-43.
9. C. Wofsy, B. Goldstein, K. Lund, H.S. Wiley, Implications of epidermal growth factor (EGF) induced EGF receptor aggregation, Biophys. J. 1992, 63, 98-110.
10. S.G. Chamberlin, D.E. Davies, A unified model of c-erbB receptor homo- and heterodimerisation, Biochim. Biophys. Acta 1998, 1384, 223-232.
11. B.N. Kholodenko, O.V. Demin, G. Moehren, J.B. Hoek, Quantification of short term signaling by the epidermal growth factor receptor, J. Biol. Chem. 1999, 274, 30169-30181.
12. P. Klein, D. Mattoon, M.A. Lemmon, J. Schlessinger, A structure-based model for ligand binding and dimerization of EGF receptors, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 929-934.
13. B.S. Hendriks, G. Orr, A. Wells, H.S. Wiley, D.A. Lauffenburger, Parsing ERK activation reveals quantitatively equivalent contributions from epidermal growth factor receptor and HER2 in human mammary epithelial cells, J. Biol. Chem. 2005, 280, 6157-6169.
14. S. Wanant, M.J. Quon, Insulin receptor binding kinetics: modeling and simulation studies, J. Theor. Biol. 2000, 205, 355-364.
15. K.E. Forsten, M. Fannon, M.A. Nugent, Potential mechanisms for the regulation of growth factor binding by heparin, J. Theor. Biol. 2000, 205, 215-230.
16. K. Forsten-Williams, C.C. Chua, M.A. Nugent, The kinetics of FGF-2 binding to heparan sulfate proteoglycans and MAP kinase signaling, J. Theor. Biol. 2005, 233, 483-499.
17. C. Wofsy, B.M. Vonakis, H. Metzger, B. Goldstein, One Lyn molecule is sufficient to initiate phosphorylation of aggregated high-affinity IgE receptors, Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 8615-8620.
18. J.R. Faeder, W.S. Hlavacek, I. Reischl, M.L. Blinov, H. Metzger, A. Redondo, C. Wofsy, B. Goldstein, Investigation of early events in FcεRI-mediated signaling using a detailed mathematical model, J. Immunol. 2003, 170, 3769-3781.
19. C.S. Park, I.C. Schneider, J.M. Haugh, Kinetic analysis of platelet-derived growth factor receptor/phosphoinositide 3-kinase/Akt signaling in fibroblasts, J. Biol. Chem. 2003, 278, 37064-37072.
20. M.M. Ilondo, A.B. Damholt, B.C. Cunningham, J.A. Wells, P. De Meyts, R.M. Shymko, Receptor dimerization determines the effects of growth hormone in primary rat adipocytes and cultured human IM-9 lymphocytes, Endocrinology 1994, 134, 2397-2403.
21. J.M. Haugh, Mathematical model of human growth hormone (hGH)-stimulated cell proliferation explains the efficacy of hGH variants as receptor agonists or antagonists, Biotechnol. Prog. 2004, 20, 1337-1344.
22. B. Goldstein, D. Jones, I.G. Kevrekidis, A.S. Perelson, Evidence for p55-p75 heterodimers in the absence of IL-2 from Scatchard plot analysis, Int. Immunol. 1992, 4, 23-32.
23. H.S. Wiley, D.D. Cunningham, A steady state model for analyzing the cellular binding, internalization and degradation of polypeptide ligands, Cell 1981, 25, 433-440.
24. C. Starbuck, D.A. Lauffenburger, Mathematical model for the effects of epidermal growth factor receptor trafficking dynamics on fibroblast proliferation responses, Biotechnol. Prog. 1992, 8, 132-143.
25. A.R. French, D.A. Lauffenburger, Intracellular receptor/ligand sorting based on endosomal retention components, Biotechnol. Bioeng. 1996, 51, 281-297.
26. E.M. Fallon, D.A. Lauffenburger, Computational model for effects of ligand/receptor binding properties on interleukin-2 trafficking dynamics and T cell proliferation response, Biotechnol. Prog. 2000, 16, 905-916.
27. C.A. Sarkar, D.A. Lauffenburger, Cell-level pharmacokinetic model of granulocyte colony-stimulating factor: implications for ligand lifetime and potency in vivo, Mol. Pharmacol. 2003, 63, 147-158.
28. J.M. Haugh, D.A. Lauffenburger, Analysis of receptor internalization as a mechanism for modulating signal transduction, J. Theor. Biol. 1998, 195, 187-218.
29. B. Schoeberl, C. Eichler-Jonsson, E.D. Gilles, G. Muller, Computational modeling of the dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors, Nat. Biotechnol. 2002, 20, 370-375.
30. J.M. Haugh, I.C. Schneider, J.M. Lewis, On the cross-regulation of protein tyrosine phosphatases and receptor tyrosine kinases in intracellular signaling, J. Theor. Biol. 2004, 230, 119-132.
31. R.G. Posner, C. Wofsy, B. Goldstein, The kinetics of bivalent ligand-bivalent receptor aggregation: ring formation and the breakdown of the equivalent site approximation, Math. Biosci. 1995, 126, 171-190.
32. U.S. Bhalla, R. Iyengar, Emergent properties of networks of biological signaling pathways, Science 1999, 283, 381-387.
33. J.M. Haugh, A.C. Huang, H.S. Wiley, A. Wells, D.A. Lauffenburger, Internalized epidermal growth factor receptors participate in the activation of p21ras in fibroblasts, J. Biol. Chem. 1999, 274, 34350-34360.
34. J.M. Haugh, A. Wells, D.A. Lauffenburger, Mathematical modeling of epidermal growth factor receptor signaling through the phospholipase C pathway: mechanistic insights and predictions for molecular interventions, Biotechnol. Bioeng. 2000, 70, 225-238.
35. T.W. McKeithan, Kinetic proofreading in T-cell receptor signal transduction, Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 5042-5046.
36. W.S. Hlavacek, A. Redondo, C. Wofsy, B. Goldstein, Kinetic proofreading in receptor-mediated transduction of cellular signals: receptor aggregation, partially activated receptors, and cytosolic messengers, Bull. Math. Biol. 2002, 64, 887-911.
37. P.A. Gonzalez, L.J. Carreno, D. Coombs, J.E. Mora, E. Palmieri, B. Goldstein, S.G. Nathenson, A.M. Kalergis, T cell receptor binding kinetics required for T cell activation depend on the density of cognate ligand on the antigen-presenting cell, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 4824-4829.
38. C. Wofsy, D. Coombs, B. Goldstein, Calculations show substantial serial engagement of T cell receptors, Biophys. J. 2001, 80, 606-612.
39. D. Coombs, A.M. Kalergis, S.G. Nathenson, C. Wofsy, B. Goldstein, Activated TCRs remain marked for internalization after dissociation from pMHC, Nat. Immunol. 2002, 3, 926-931.
40. C.C. Fink, B. Slepchenko, I.I. Moraru, J. Schaff, J. Watras, L.M. Loew, Morphological control of inositol-1,4,5-trisphosphate-dependent signals, J. Cell Biol. 1999, 147, 929-935.
41. J.C. Schaff, B.M. Slepchenko, Y.S. Choi, J. Wagner, D. Resasco, L.M. Loew, Analysis of nonlinear dynamics on arbitrary geometries with the virtual cell, Chaos 2001, 11, 115-131.
42. S.Y. Shvartsman, Shooting from the hip: spatial control of signal release by intracellular waves, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 9087-9089.
43. B.N. Kholodenko, G.C. Brown, J.B. Hoek, Diffusion control of protein phosphorylation in signal transduction pathways, Biochem. J. 2000, 350, 901-907.
44. B.N. Kholodenko, MAP kinase cascade signaling and endocytic trafficking: a marriage of convenience? Trends Cell Biol. 2002, 12, 173-177.
45. I.V. Maly, H.S. Wiley, D.A. Lauffenburger, Self-organization of polarized cell signaling via autocrine circuits: computational model analysis, Biophys. J. 2004, 86, 10-22.
46. A. Gierer, H. Meinhardt, A theory of biological pattern formation, Kybernetik 1972, 12, 30-39.
47. M. Postma, P.J.M. Van Haastert, A diffusion-translocation model for gradient sensing by chemotactic cells, Biophys. J. 2001, 81, 1314-1323.
48. A. Levchenko, P.A. Iglesias, Models of eukaryotic gradient sensing: application to chemotaxis of amoebae and neutrophils, Biophys. J. 2002, 82, 50-63.
49. K.K. Subramanian, A. Narang, A mechanistic model for eukaryotic
1080
I
17 Computational Methods and Modeling
50.
51.
52.
53.
54.
55.
56.
57.
58.
59.
gradient sensing: spontaneous and induced phosphoinositide polarization, J. Theor. Biol. 2004, 231, 49-67. L. Ma, C. Janetopoulos, L. Yang, P.N. Devreotes, P.A. Iglesias, Two complementary, local excitation, global inhibition mechanisms acting in parallel can explain the chemoattractant-induced regulation of PI(3,4,5)P3response in Dictyostelium cells, Biophys. /. 2004, 87, 3764-3774. J.M. Haugh, F. Codazzi, M. Teruel, T. Meyer, Spatial sensing in fibroblasts mediated by 3' phosphoinositides, J. Cell Biol. 2000, 151, 1269-1279. J.M. Haugh, I.C. Schneider, Spatial analysis of 3' phosphoinositide signaling in living fibroblasts: I. Uniform stimulation model and bounds on dimensionless groups, Biophys. /. 2004, 86, 589-598. G. Adam, M. Delbriick, Reduction of dimensionality in biological diffusion processes, in Structural Chemistry and Molecular Biology, (Eds.: A. Rich, N. Davidson), W.H. Freeman and Co., San Fransisco, 1968,198-215. H.C. Berg, E.M. Purcell, Physics of chemoreception, Biophys. /. 1977, 20, 193-219. L.D. Shea, G.M. Omann, J.J. Linderman, Calculation of diffusion-limited kinetics for the reactions in collision coupling and receptor cross-linking, Biophys. 1. 1997, 73,2949-2959. J.M. Haugh, A unified model for signal transduction reactions in cellular membranes, Biophys. J. 2002, 82,591-604. H. Berry, Monte Carlo simulations of enzyme reactions in two dimensions: fractal kinetics and spatial segregation, Biophys.]. 2002, 83, 1891-1901. P.J. Woolf, J.J. Linderman, Untangling ligand induced activation and desensitization of G-protein-coupled receptors, Biophys. J. 2003, 84, 3-13. M.J. Saxton, K. Jacobson, Single-particle tracking: applications to membrane dynamics, Annu. Rev. Biophys. Biomol. Struct. 1997, 26, 373-399.
60.
61.
62.
63.
64.
65.
66.
67.
68.
69.
70.
L.D. Shea, J.J.Linderman, Compartmentalization of receptors and enzymes affects activation for a collision coupling mechanism, J. Theor. Biol. 1998, 191, 249-258. K. Ritchie, X. Shan, J. Kondo, K. Iwasawa, T. Fujiwara, A. Kusumi, Detection of non-brownian diffusion in the cell membrane in single molecule tracking, Biophys. /. 2005, 88, 2266-2277. D. Bray, Intracehlar signaling as a parallel distributed process, /. Theor. Biol. 1990, 143, 215-231. B.N. Kholodenko, J.B. Hoek, H.V. Westerhoff, G.C. Brown, Quantification of information transfer via cellular signal transduction pathways, FEBS Lett. 1997, 414, 430-434. A. Goldbeter, D.E. Koshland Jr,An amplified sensitivity arising from covalent modification in biological systems, Proc. Natl. Acad. Sci. U.S.A. 1981, 78,6840-6844. A. Goldbeter, D.E. Koshland Jr, Ultrasensitivity in biochemical systems controlled by covalent modification: interplay between zero-order and multistep effects, /. Biol. Chem. 1984, 259,14441-14447. J.E. Ferrell Jr, Tripping the switch fantastic: how a protein kinase cascade can convert graded inputs into switch-likeoutputs, Trends Biochem. S C ~1996, . 21,460-466. J.M. Haugh, D.A. Lauffenburger, Physical modulation of intracellular signaling processes by locational regulation, Biophys. /. 1997, 72, 2014-2031. J.E. Ferrell Jr, How regulated protein translocation can produce switch-like responses, Trends Biochem. Sci. 1998, 23,461-465. A. Levchenko, J. Bruck, P.W. Sternberg, Scaffold proteins may biphasically affect the levels of mitogen-activated protein kinase signaling and reduce its threshold properties, Proc. Natl. Acad. Sci. U.S.A. 2000, 97,5818-5823. R. Heinrich, B.G. Neel, T.A. Rapoport, Mathematical models of protein
References I1081
71.
72.
73.
74.
75.
76.
77.
78.
79.
80.
kinase signal transduction, Mol. Cells 2002, 9,957-970. V.K. Mutalik, A.P. Singh, J.S. Edwards, K.V. Venkatesh, Robust global sensitivity in multiple enzyme cascade system explains how the downstream cascade structure may remain unaffected by cross-talk, FEBS Lett. 2004, 558, 79-84. J.E. Ferrell Jr, Self-perpetuating states in signal transduction: positive feedback, double-negative feedback and bistability, Curr. Opin. Cell Biol. 2002, 14,140-148. H. Resat, ].A. Ewald, D.A. Dixon, H.S. Wiley, An integrated model of epidermal growth factor receptor trafficking and signal transduction, Biophys. J. 2003, 85,730-743. C.F. Huang, J.E. Ferrell Jr, Ultrasensitivity in the mitogen-activated protein kinase cascade, Proc. Natl. Acad. Sci. U.S.A. 1996, 93,10078-10083. W.R. Burack, T.W. Sturgill, The activating dual phosphorylation of MAPK by MEK is nonprocessive, Biochemistry 1997,36,5929-5933. F.A. Brightman, D.A. Fell, Differential feedback regulation of the MAPK cascade underlies the quantitative differences in EGF and NGF signalling in PC12 cells, FEBS Lett. 2000,482,169-174. B.N. Kholodenko, Negative feedback and ultrasensitivity can bring about oscillations in the mitogen-activated protein kinase cascades, Eur. J. Biochem. 2000,267,1583-1588. A.R. Asthagiri, D.A. Lauffenburger, A computational study of feedback effects on signal dynamics in a mitogen-activated protein kinase (MAPK) pathway model, Biotechnol. Prog. 2001, 17,227-239. S. Sasagawa, Y. Ozaki, K. Fujita, S. Kuroda, Prediction and validation of the distinct dynamics of transient and sustained ERK activation, Nat. Cell Bid. 2005, 7, 365-373. C. Xu, J . Watras, L.M. Loew, Kinetic analysis of receptor-activated
81.
82.
83.
84.
85.
86.
87.
88.
89.
phosphoinositide turnover, J. Cell Biol. 2003, 161,779-791. A. Hoffmann, A. Levchenko, M.L. Scott, D. Baltimore, The IKB-NF-KB signaling module: temporal control and selective gene activation, Science 2002, 298,1241-1245. S. Yamada, S. Shiono, A. Joo, A. Yoshimura, Control mechanism of JAK/STAT signal transduction pathway, FEBS Lett. 2003, 534, 190- 196. K. Lai, M.J. Robertson, D.V. Schaffer, The sonic hedgehog signaling system as a bistable genetic switch, Biophys. 1. 2004, 86,2748-2757. B.N. Kholodenko, A. Kiyatkin, F.J. Bruggeman, E. Sontag, H.V. Westerhoff, J.B. Hoek, Untangling the wires: a strategy to trace functional interactions in signaling and gene networks, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 12841-12846. M. Hatakeyama, S. Kimura, T. Naka, T. Kawasaki, N. Yumoto, M. Ichikawa, J. Kim, K. Saito, M. Saeki, M. Shirouzu, S. Yokoyama, A. Konagaya, A computational model on the modulation of mitogen-activated protein kinase (MAPK) and Akt pathways in heregulin-induced ErbB signalling, Biochem. J. 2003,373,451-463. U.S. Bhalla, P.T. Ram, R. Iyengar, MAP kinase phosphatase as a locus of flexibility in a mitogen-activated protein kinase signaling network, Science 2002,297,1018-1023. G. Weng, U.S. Bhalla, R. lyengar, Complexity in biological signaling systems, Science 1999, 284, 92-96. A.R. Asthagiri, D.A. Lauffenburger, Bioengineering models of cell signaling, Annu. Rev. Biomed. Eng. 2000, 2, 31-53. J . Schaff, C.C. Fink, B. Slepchenko, J.H. Carson, L.M. Loew, A general computational framework for modeling cellular structure and function, Biophys. J. 1997, 73, 1135- 1146.
Chemical Biology. From Small Molecules to System Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
18 Genome and Proteome Studies
18.1 Genome-wide Gene Expression Analysis: Practical Considerations and Application to the Analysis of T-cell Subsets in Inflammatory Diseases
Lars Rogge and Elisabetta Bianchi
Outlook
The scope of this chapter is twofold. We will first review some important conceptual and technical issues related to experiment design that we feel should be addressed while designing studies using microarrays. In the second part, we will illustrate how this technology can be employed practically to promote insight into a specific biological field, by reviewing several studies that address the molecular basis of inflammatory diseases using gene profiling. We will focus on the gene expression analysis of T-lymphocyte subsets, the key players in several inflammatory diseases.
18.1.1 Introduction
The concept of systems biology is to use a holistic approach to understand the function of an organism. This approach involves a large-scale analysis of the interplay of the constituents of the organism using genetics, genomics, and proteomics. Systems biology would have remained an illusion without the significant progress that has been made in each of these three fields. Genomic-scale gene expression profiling has developed from its infancy in the mid-1990s into a robust tool that is currently used in many laboratories and has an increasing impact on biological and biomedical research. This technology is based on the development of so-called microarrays. Microarrays consist of an ordered array of DNA sequences on a solid support that allows the expression levels of many genes to be measured in parallel. The technology can reveal the physiology of cells and tissues on an unprecedented scale by quantitating the mRNA levels of tens of thousands of genes [1]. The amount of data generated by microarray experiments cannot be handled by simple sorting in spreadsheets or by plotting on graphs. Microarray data analysis has recently developed into a separate field, with mathematicians increasingly contributing dedicated algorithms and tools [2-4]. Sophisticated computational tools are now available, but it should be noted that a basic understanding of these tools is required for meaningful data analysis.
18.1.2 History/Development
Gene expression profiling using microarrays is a relatively new technology. Initially, global gene expression studies relied mainly on two technologies: spotted complementary DNA (cDNA) microarrays and commercial high-density oligonucleotide microarrays generated by light-directed chemical synthesis [5, 6] (see Refs. 7 and 8 for reviews of the two technologies). It is of interest to note that the technology of light-directed chemical synthesis was initially developed for the parallel synthesis of multiple peptides (e.g., for the identification of epitopes of monoclonal antibodies) [9], then applied to the parallel synthesis of oligonucleotides for rapid DNA sequence analysis (e.g., of HIV or other pathogens) [10], before it was commercialized for the monitoring of gene expression [11]. Currently, in addition to the two technologies described above, custom-designed and commercial platforms using “long” oligonucleotides (approximately 60 nucleotides) are increasingly used. Apart from the development of dedicated production technology, microarray technology depends on knowledge of the transcriptome (cDNA sequences) of the respective organism. In the early days, microarrays contained large numbers of expressed sequence tags (ESTs), whose origins and significance were sometimes dubious. The scarce annotation of ESTs sometimes turned the biological interpretation of microarray experiments into a nightmare. The availability of the draft sequence of the human genome represented a milestone in the development of microarray technology. The notion that humans have “only” approximately 30 000 genes made it technically possible to design microarrays that could measure the expression levels of all human genes on a single chip. In addition, the published draft sequence allowed the control of the cDNA sequences represented on microarrays and resulted in a much higher quality of both custom-made and commercial microarrays.
The recent publication of the finished euchromatic sequence of the human genome [12] will certainly result in a further refinement of this technology. Currently, custom-made and commercial microarrays typically interrogate the expression levels of approximately 30 000 human genes, although the International Human Genome Sequencing Consortium predicts “only” 20 000-25 000 protein-coding genes. This discrepancy indicates that it may still take some time to further improve this technology. Nevertheless, it is fair to say that genome-wide gene expression analysis has developed in only 10 years from a splendid idea into a robust tool.
18.1.3 General Considerations
18.1.3.1 Issues in Experimental Design
Array experiments are still far from inexpensive, in terms of both reagents and time. Careful design of these experiments is therefore essential to optimize information retrieval, in particular in studies involving primary human samples, which have to take into account the limitations imposed by restricted availability of sample material and high donor-to-donor variability. Two basic experimental designs are possible: in two-fluorescence methods, the two samples to be compared are labeled with two different dyes and hybridized to the same array, allowing direct comparison of gene expression levels; in one-fluorescence methods, each sample is hybridized to a separate array, and differences in gene expression levels between samples are determined by comparison with a common reference sample (Fig. 18.1-1).
18.1.3.1.1 Reference Sample
Microarray experiments are often employed to determine relative fold differences in gene expression levels between different experimental samples. The reference sample is the one to which the other samples are compared. For one-color platforms, in which each sample is hybridized to a separate array, the choice of the reference sample is quite flexible and can be made after the experiment has been carried out. For technologies in which two extracts are hybridized to the same array, the choice of the reference sample has to be included in the experimental design. The direct comparison between two samples (e.g., tumor vs. normal sample) reduces variations in measurements, providing a more accurate representation of expression changes. A method for optimizing direct comparisons, the loop design, has been proposed by Kerr and Churchill [13]. In loop design studies, samples are systematically compared with each other, an approach that allows the generation of more relevant data and a very precise assessment of gene expression levels. A drawback of this approach is its limited flexibility, since extending these studies to include additional samples calls for a redesign of the experiment and rapidly growing requirements for RNA and microarrays. In addition, with this study design, the efficiency of estimation of gene expression levels is greatly compromised by the loss of just one sample. The use of a common reference sample allows the comparison of data from multiple arrays and, ideally, from multiple experiments or laboratories that use the same reference, making it easier to build common databases of microarray data. The desirable characteristics of a reference sample are that it should be homogeneous, available in large quantities, and stable over time. Frequently used reference samples are genomic DNA or RNA from different cell lines that have been pooled to obtain coverage of all expressed genes [14, 15]. In a study comparing a direct two-dye measurement (where two samples are hybridized to the same array) with a common reference measurement (where each sample is hybridized to a separate array), Park et al. found a high correlation between the two settings, suggesting that multiple comparisons of experimental conditions using a common control can achieve a satisfactory degree of accuracy [16].
Fig. 18.1-1 Global gene expression studies rely mainly on two technologies: spotted complementary DNA (cDNA) microarrays (a) and oligonucleotide microarrays (b). The first type of microarray is generated by robotic spotting of cDNA fragments for defined genes on a glass slide, in an ordered fashion. In general, each gene is represented by a double-stranded DNA probe (up to 1 kb) that is usually generated by polymerase chain reaction (PCR) amplification. Current technology allows the deposition of more than 10 000 genes on a single slide. High-density oligonucleotide arrays are generated by in situ synthesis of short oligonucleotides (25-mers) on a glass slide. A sophisticated process developed in the semiconductor industry, termed photolithography, is used to synthesize approximately 1 300 000 distinct oligonucleotide features in defined places on a chip. In contrast to spotted cDNA arrays, each gene is represented by 11 to 20 pairs of oligonucleotides on a single chip. This allows the design of oligonucleotide probes that hybridize to a specific exon of a given gene. More recently, custom-designed and commercial platforms using “long” oligonucleotides (60-mers) are increasingly used. To generate hybridization targets, RNA is extracted from the tissue of interest and mRNA is reverse transcribed into cDNA. In protocols used mainly for spotted cDNA arrays, fluorescently labeled nucleotides are incorporated into the cDNA during this step. In other protocols used mainly for high-density oligonucleotide arrays, a biotin-labeled cRNA target is generated by transcribing the double-stranded cDNA target with T7 RNA polymerase. This last step also results in a linear amplification (approximately 50-fold) of the material. In both cases, the labeled target cDNA or cRNA is hybridized to the array, and the intensity of hybridization to individual cDNA fragments or oligonucleotides on the array is revealed by a high-resolution scanner. The hybridization signal is then used to determine the expression level of each gene represented on the array.
18.1.3.1.2 Replication and Sample Size
Microarray technology is very powerful but quite noisy, and this characteristic should be taken into account when planning array experiments. Replication is a good approach to decrease the effects of variability. Technical replicates (such as multiple hybridizations performed with the same RNA sample) can be used to assess the experimental noise of the system and to ensure quality control of the experiment. Technical replicates that have entered common practice include dye swapping for experiments in which two extracts are hybridized to the same array. In this case, it is recommended to repeat the hybridization with the dyes that label the two samples inverted. This expedient is commonly employed to control gene-specific dye bias [17-19]. Another common example of technical replication is the presence, on the array, of multiple probes that identify the same transcript. Reporter sequence replication may provide the additional advantage of facilitating cross-platform comparison of data, which requires adequate matching of corresponding probe sets and may be optimally performed by matching the sequences of the probes present on the different microarrays, rather than the genes represented [20]. It is generally agreed that experimental variations due to technical aspects of the process (such as cDNA and cRNA synthesis or chip hybridization) do not constitute the major source of variability in microarray experiments, which is instead the natural variability of gene expression levels, with variations among samples obtained from different individuals being the most pronounced.
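The dye-swap expedient mentioned above can be illustrated with a small worked example. The sketch below assumes, purely for illustration, that the gene-specific dye bias is multiplicative and identical in the forward and the swapped hybridization, so that it adds the same offset to both log ratios. The function names and the toy intensities are hypothetical and are not taken from the cited studies.

```python
import math

def log_ratio(red, green):
    """log2 ratio of red-channel to green-channel intensity for one spot."""
    return math.log2(red / green)

def dye_swap_estimate(forward, swapped):
    """Combine a forward and a dye-swapped hybridization for one gene.

    forward: (red, green) intensities with sample A in red, sample B in green
    swapped: (red, green) intensities with sample B in red, sample A in green
    Returns (estimated log2 A/B, estimated log2 dye bias).
    """
    m_forward = log_ratio(*forward)  # true log ratio + dye bias (assumed model)
    m_swapped = log_ratio(*swapped)  # -(true log ratio) + dye bias
    true_log_ratio = (m_forward - m_swapped) / 2  # bias cancels in the difference
    dye_bias = (m_forward + m_swapped) / 2        # signal cancels in the sum
    return true_log_ratio, dye_bias

# Toy numbers: sample A is truly 2-fold higher than B (2000 vs. 1000 units),
# and the red channel reads 2-fold too bright for this gene.
forward = (4000.0, 1000.0)  # A in red (2000 * 2), B in green (1000)
swapped = (2000.0, 2000.0)  # B in red (1000 * 2), A in green (2000)
estimate, bias = dye_swap_estimate(forward, swapped)
print(estimate, bias)
```

Under the stated assumption, the forward hybridization alone would suggest a 4-fold difference, whereas combining it with the swapped hybridization separates the expression difference from the dye effect.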
This biological variability is most effectively addressed by the use of biological replicates (e.g., mRNA from different extractions or from multiple biological samples) [21, 22]. The importance of replicate microarray experiments has been emphasized in a study addressing the natural differences of gene expression in inbred mouse strains [23]. The authors used a 5406-clone spotted cDNA microarray to quantitate transcript levels in the kidney, liver, and testis from each of six normal male C57BL6 mice. Analysis of variance (ANOVA) was used to compare the variance across the six mice to the variance among four replicate experiments performed for each tissue. The conspicuous finding was that statistically significant variable gene expression was detected for 3.3, 1.9, and 0.8% of the genes in the kidney, testis, and liver, respectively [23]. Importantly, many of the transcripts that were found to be most variable were immune-modulated, stress-induced, and hormonally regulated genes. Pritchard et al. point out that genetically diverse populations such as humans are very likely to show an even greater variability in gene expression than inbred mice [23]. This suggests that a meaningful study of the outbred human population will require many replicate experiments and/or an extensive characterization of normal variability, to discriminate between informative variations in gene expression and effects due to uncontrolled variables.
The estimation of adequate sample size for microarray studies takes into account several factors, including the variability of the population, the desired detectable fold differences in gene expression, the power (probability) to detect differences, and the acceptable error rate [24-28]. A number of papers provide computational methods or orientation tables to help determine the desirable number of replicates to be included in a statistically sound study. A general and sobering conclusion that derives from many of these calculations is that the number of samples required for a reasonably informative experiment is much larger than the number commonly used in human microarray case-control studies [25].
18.1.3.1.3 Pooling of RNA Samples
Messenger RNA is often pooled in microarray experiments, either because sufficient material cannot be obtained from a single individual or to reduce costs by reducing the number of microarrays hybridized. The effect of pooling on data quality is still debated in the literature. Pooling can be useful in reducing the variability in individual samples induced by experimental artifacts or by sample inhomogeneity [21]. However, a serious drawback of pooling is the loss of information regarding population variability, and therefore pooling should not be used if inferences are sought for single subjects. This is typically the case in studies aimed at identifying gene profiles that classify individual subjects and predict their membership in classes (e.g., cancer patients vs. normal patients, or distinguishing cancer subsets). An additional disadvantage of pooling is the inability to detect outliers and possibly remove them from the analysis. It has been proposed that appropriate RNA pooling can provide adequate statistical power and improve the efficiency and cost-effectiveness of many types of microarray experiments when inferences are made at the group level [29]. In particular, for small experimental designs, in which only a few arrays are available for each biological condition, pooling could actually improve accuracy [30]. For larger designs that include several biological replicates, pooling is not usually advantageous. Pooling extra subjects on a fixed number of arrays slightly decreases the variability across experiments, at the price of losing individual information. As pooling is often considered as a way to reduce the number of arrays (and therefore the costs) of an experiment, it should be noted that, to maintain accuracy, the number of subjects analyzed must be greatly increased, and that the added expense of additional samples for the pooled design may outweigh the benefit of saving on microarray cost [30, 31].
18.1.3.1.4 RNA Amplification
An alternative approach to pooling, in the analysis of small samples, is RNA amplification. In particular, this approach has been successfully used to derive enough RNA from sources such as laser capture microdissection of solid tissues (Refs. 32-36 and references therein). King et al. found that gene expression measurements from small sample RNA are not really equivalent to measurements from standard sample RNA, possibly because of amplification failure of low-abundance transcripts and sequence-specific differences in amplification efficiency. They concluded, however, that biological variability in gene expression between independent samples is greater than the technical variability associated with the amplification process [36]. Some amplification methods have been shown to have reproducible bias (such as overrepresentation of T-rich sequences), related to the amount of starting material and to the number of amplification cycles. Underrepresentation of mRNA with extensive secondary structure may be partially resolved by performing the reverse transcription step at higher temperatures [37]. Comparisons between amplified and nonamplified samples show that the best correlations of expression levels are obtained for abundant transcripts [38]. The choice of the amplification protocol may be important in determining the quality and robustness of the results, as even small variations in methodology introduce considerable distortion of gene expression profiles. Klur et al. have focused on procedures in which a double-stranded cDNA produced from total RNA is used as a template to generate a labeled cRNA, and have compared random PCR amplification, which includes a PCR amplification step at the double-stranded cDNA level, with linear amplification, which consists of two cycles of cDNA synthesis followed by in vitro transcription. The authors found that brain microdissections prepared with either method gave similar expression results in terms of the ability to identify differentially expressed genes. Analysis of technical replicates, however, suggests that random PCR amplification may be more reproducible, requires smaller RNA input, and generates cRNA of higher quality than linear amplification [39]. Several comparisons between amplification procedures are available in the literature [40-43].
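One simple way to examine the agreement between amplified and nonamplified samples as a function of transcript abundance is to correlate the two measurements separately for weakly and strongly expressed genes. The following is a minimal sketch of such a check, not the procedure used in the cited studies: the split at the median signal, the function names, and the eight invented log2 intensities are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_by_abundance(unamplified, amplified):
    """Split genes at the median unamplified signal and correlate each half.

    Both inputs hold log2 intensities for the same genes in the same order.
    Returns (r for the low-abundance half, r for the high-abundance half).
    """
    order = sorted(range(len(unamplified)), key=lambda i: unamplified[i])
    half = len(order) // 2
    low, high = order[:half], order[half:]
    r_low = pearson([unamplified[i] for i in low], [amplified[i] for i in low])
    r_high = pearson([unamplified[i] for i in high], [amplified[i] for i in high])
    return r_low, r_high

# Invented log2 intensities for eight genes (illustrative only, not measured data).
unamplified = [5.0, 5.5, 6.0, 6.5, 10.0, 11.0, 12.0, 13.0]
amplified = [6.2, 4.9, 6.8, 5.7, 10.1, 11.2, 11.9, 13.1]
r_low, r_high = correlation_by_abundance(unamplified, amplified)
print(f"low-abundance r = {r_low:.2f}, high-abundance r = {r_high:.2f}")
```

With real data, more than two abundance bins and many more genes per bin would be needed before drawing any conclusion about an amplification protocol.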
18.1.3.2 Some Principles of Data Analysis
The raw data produced by microarray analysis is a digital image. To generate numeric data of gene expression levels, the hybridization spots on the array have to be identified and their intensity measured (image quantitation). Image analysis is often performed through manufacturer’s software, which also generally provides the means for initial quality control and low-levelanalysis of the data (preprocessing). Initial transformation of the data includes background subtraction and elimination (flagging) of aberrant signals and hybridization spots of low intensity (usually, those with intensity less than two or three times the standard deviation of the background intensity). Data are normalized to eliminate systematic, nonbiological variations, such as those introduced by differences in RNA amounts used, sample labeling, dye incorporation, or scanner settings. Normalization makes adjustments for these effects, so that
I
1089
1090
I average gene expression levels are made equivalent among the arrays com18 Genome and Proteome Studies
pared. There are several normalization methods commonly used, and they can be either based on the complete set of arrayed genes, or on endogenous (housekeeping) or exogenous (spiked-in) control genes. All normalization methods are based on some assumptions, such as that most gene expression levels do not change across conditions or that total RNA levels in a sample do not change. When relying on housekeeping genes for normalization, it is useful to refer to a large number of genes, since expression of many of the housekeeping genes can actually vary among different biological settings. For more detailed discussion of data preprocessing, see Refs. 4, 44. These first steps of data transformation are required to organize the data into a gene expression matrix, a table where each row represents a gene and each column an experimental condition. In addition to information on gene expression levels, the table ideally contains information on the variability and accuracy of measurement (e.g., standard deviations among replicates). Data organized in such a way can then be used for analysis: the simplest is the identification of differentially expressed genes. Many publications still characterize differentially expressed genes as those whose expression ratios, or “fold changes” are above an arbitrary set level; however, more complex algorithms that take into account the intrinsic variability of the dataset are possible (see Refs. 4, 45, 46 for an overview of current methods). To further biological insight, additional analytical methods can be applied to simplify the dataset and produce an overview of the data. These analysis approaches can be “unsupervised”, that is, based exclusively on the information intrinsic to the data (Figs. 18.1-2 and 18.1-3),or “supervised”, such as class prediction, which assigns new samples to known classes, on the basis of already acquired biological information (Figs. 
18.1-4 and 18.1-5). Examples of unsupervised analyses are the various “clustering” algorithms that create categories of similar data, either by grouping genes into classes with similar expression profiles, or by grouping samples into classes defined by similarly expressed genes. Microarray analysis can also be used to delineate the biological pathways involved in a process, by analyzing whether certain functional classes of genes are overrepresented in a cluster. There is a current effort to develop informatic tools that provide informative gene annotation and correlation with biological pathways. Many of these, such as ArrayXPath (http://www.snubi.org/software/ArrayXPath/), GoMiner (http://discover.nci.nih.gov/gominer), MAPPFinder (http://www.genmapp.org/MAPPFinder.html), or Onto-tools [47], use the organizing principles of Gene Ontology, which characterizes genes on the basis of molecular function, biological process, and cellular component (http://www.geneontology.org). We are unable to discuss here the many algorithms that have been formulated to aid in both unsupervised and supervised analysis. For an introduction, we refer the reader to Refs. 4, 45, 46. For links to analysis software, the reader can refer to further websites for array databases: http://genopole.toulouse.inra.fr/bioinfo/microarray/; http://www.rockefeller.edu./genearray/links.php; http://www.stat.uni-muenchen.de/-strimmer/rexpress.html; http://nslij-genetics.org/microarray/soft.html; ihome.cuhk.edu.hk/-b400559/arraysoft.html
Fig. 18.1-2 In the unsupervised approach, pattern-recognition algorithms are used to identify subgroups of samples that have related gene expression profiles. A commonly used method, termed hierarchical clustering [2], calculates the similarity in expression of two different genes across a set of samples. Using this similarity measure, genes can be ordered hierarchically, leading to the identification of genes that are regulated in a similar fashion (coregulation). This method can also be used to determine the similarity in gene expression between different samples, such as hierarchical clustering of groups of genes with similar patterns of expression in a set of tumor samples. These so-called gene expression signatures may include genes expressed in a specific cell type or stage of differentiation, or genes expressed during a particular biological response, such as activation of a specific intracellular signaling pathway or cell proliferation. Typical graphic representations of data clustering are a dendrogram and a “heat map”, which usually color codes the levels of gene expression.
Fig. 18.1-3 K-means clustering. Another unsupervised learning approach is provided by “K-means clustering”. A number K of cluster centers (“centroids”, in black) are chosen randomly among the samples. The algorithm iteratively assigns samples (in white) to the nearest (most similar) centroid’s cluster and recalculates the centroid based on the new inclusion. The process is repeated until all samples are assigned and centroids no longer change.
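The K-means procedure summarized in the legend of Fig. 18.1-3 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any of the software packages listed above: the function names, the use of squared Euclidean distance as the similarity measure, and the optional `init` argument for fixing the starting centroids are assumptions made here for clarity.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(samples, k, init=None, seed=0, max_iter=100):
    """Cluster expression profiles into k groups.

    samples: list of equal-length tuples (one profile per sample).
    init:    optional list of k starting centroids; if omitted, k samples
             are drawn at random, as described in the figure legend.
    Returns (labels, centroids).
    """
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(samples, k)
    centroids = [tuple(c) for c in start]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(s, centroids[j]))
            for s in samples
        ]
        if new_labels == labels:  # assignments stable, so centroids are too
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids
```

Because the starting centroids are random, different seeds can give different partitions on real data; in practice the algorithm is usually run several times and the most compact solution retained.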
18.1.3.3 Interplatform Comparison of Results
With the expanding application of high-throughput technologies for analysis of gene expression, an increasingly attractive possibility is the comparison of data sets from independent experiments, sometimes based on different microarray platforms. Unfortunately, the obvious advantage of having multiple observations at our disposal is often offset by the difficulty in comparing experiments that are heterogeneous in format, sample annotation, type of microarray used, and statistical processing of results. While intraplatform reproducibility is quite satisfactory in many of the studies that have addressed this issue, the analysis of interplatform variability has occasionally produced discouraging findings. Studies comparing gene expression levels and significant gene expression changes obtained by analyzing the same RNA samples with different microarray systems often show relatively low correlation between platforms (Refs. 16, 48-50 and references therein), so that completely different sets of differentially expressed genes may be identified when the same sample is analyzed with different arrays [51]. Perhaps not surprisingly, the best correlations are obtained for highly expressed genes [16, 49].
A major source of variation for oligonucleotide arrays is the choice of the probe sequence, which determines the affinity of hybridization with the sample [16]. Short oligonucleotides result in more specific target identification than long cDNA clones, which are more likely to cross-hybridize to homologous sequences in other genes [52]. Jarvinen et al. report a fairly good correlation for gene expression data from two commercial platforms, Affymetrix and Agilent (r = 0.78-0.86), but lower correlations for data obtained from custom-made arrays. Their analysis shows that more than half of the discrepancies can be traced back to incorrect clones on the custom-made arrays and to problems in gene designation and annotation [52].
Another source of variation is introduced during data analysis, as different algorithms may cause variability in the measured spot intensity levels or in the number of analyzable data points between different microarray platforms. Low-level analysis, such as quality filtering and normalization, is most often performed with the software provided by the array manufacturer and may have a substantial influence on subsequent processing of the data. An additional level of difficulty in comparing results obtained with different microarray settings is introduced by the lack of standardization in gene annotation [48].
Fig. 18.1-4 Supervised methods represent an alternative that can be applied if previous information is available about which genes are expected to be coregulated. In general, supervised methods use a “training set” in which genes known to be related by function are provided as positive examples and genes not known to be members of that class are negative examples. This “training set” is used by the computer program to learn to distinguish between members and nonmembers of a class on the basis of gene expression data. The computer program is subsequently used to recognize and classify genes in the “data set” according to their gene expression levels. Supervised methods therefore compare biological information (e.g., clinical data) with already known gene expression features that are characteristic of a group.
Fig. 18.1-5 Supervised learning: linear classifiers. Class prediction can also be obtained through the use of support vector machines (SVMs). An SVM can test several mathematical combinations of genes to find a line or plane that optimally separates groups of samples in the training set and accurately classifies new samples. (Plot labels: Disease 1, Disease 2, Gene combination 1.)
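The idea of a separating line described in the legend of Fig. 18.1-5 can be illustrated with a small linear classifier trained on the hinge loss, which is the objective of a soft-margin linear SVM. The sketch below is a simplified stand-in and not a production SVM solver: the function names, the learning rate, the regularization constant, and the toy "gene combination" values are all assumptions chosen for illustration.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the soft-margin (hinge-loss) objective.

    samples: list of feature tuples (e.g., two gene combinations per sample).
    labels:  +1 or -1 for each sample (e.g., Disease 1 vs. Disease 2).
    Returns the weight vector w and the offset b of the separating line.
    """
    n, dim = len(samples), len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularization term
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # sample lies inside the margin or is misclassified
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Assign a new sample to class +1 or -1 by its side of the line."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy training set: two hypothetical gene combinations (as log ratios).
training_samples = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0),
                    (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
training_labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(training_samples, training_labels)
```

A new sample is then classified with `predict(w, b, new_sample)`. Real expression data have thousands of genes and few samples, so feature selection and cross-validation are normally needed before such a classifier can be trusted.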
One note of caution in the interpretation of the above validation studies is the observation that the number of replicates analyzed is often quite small. This fact could contribute to the limited overlap observed between findings obtained with different platforms. The differences between multiple platforms have also been exploited as a method to cross-validate microarray data. Lee et al. have proposed the application of a mutual validation algorithm to data obtained from two microarray platforms (oligonucleotide and cDNA arrays) that are subject to different artifacts, to generate a consensus gene expression dataset more reliable than either set. Such an approach would substitute individual validation of differentially expressed genes through more “classic” methods, such as northern blot or quantitative RT-PCR [53]. A conceptually similar approach has been used in silico, by comparing publicly available datasets for acute lymphoblastic leukemia to cross-validate findings from a new microarray experiment [54]. A list of differentially expressed genes that had been reported in the literature as possible subclass predictors was validated on all of the independent datasets generated on the various array platforms [54].
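Platform-to-platform agreement of the kind quoted earlier in this section (for example, r = 0.78-0.86) is typically summarized as a correlation coefficient computed over the genes that both platforms measure. The following sketch assumes a Pearson coefficient and gene identifiers that have already been mapped to a common annotation; the cited studies may have used other statistics, and the function names and gene labels here are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def interplatform_correlation(platform_a, platform_b):
    """Correlate expression values for genes present on both platforms.

    platform_a, platform_b: dicts mapping a gene identifier to a
    (log-transformed) expression value or ratio.
    Returns (r, shared_genes).
    """
    shared = sorted(set(platform_a) & set(platform_b))
    r = pearson_r([platform_a[g] for g in shared],
                  [platform_b[g] for g in shared])
    return r, shared
```

Only genes present in both dictionaries enter the calculation, which makes explicit why inconsistent gene designation and annotation reduce the number of comparable data points.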
18.1.3.4 Toward a Standardization of Microarray Data
Microarray data are context-dependent since they rely on the use of different reagents and software packages for data processing and analysis. The large number of hardware and software tools employed, as well as the fragmentary information on the experimental settings, constitute an obvious obstacle to the meaningful comparison of microarray data from different sources. Efforts to standardize the recording of microarray-based experiments and the formulation of gene expression data have been promoted by the Microarray Gene Expression Data (MGED) Society. MGED is an international organization of biologists, computer scientists, and data analysts whose aim is to develop and promote tools that facilitate the sharing of high-throughput data generated by functional genomics and proteomics experiments. Its efforts are articulated mainly in three areas.
Minimum Information About a Microarray Experiment (MIAME, http://www.mged.org/miame) is a document that describes the minimum information required to ensure easy interpretation and independent verification of microarray data [55]. A guideline itemizes the detailed information that should be included when reporting a microarray experiment (see Table 18.1-1 for a summarized checklist).
MIAME-required information should be encoded using a standard language, MAGE-ML (for Microarray Gene Expression Markup Language). MAGE-ML is a formal language designed to describe information about microarray-based experiments, including microarray designs and manufacturing information, microarray experiment setup and execution information, and gene expression data and data analysis results. The MAGE Workgroup (http://www.mged.org/Workgroups/MAGE/mage.html) has simplified the MAGE language by omitting some elements and producing MAGEML-Lite; however, the MAGE format may still be somewhat unfriendly for the inexperienced user.
Furthermore, terms used to provide MIAME-compliant information should be chosen from a controlled vocabulary, codified by the Ontology Working Group (OWG, http://mged.sourceforge.net/ontologies/index.php). The primary purpose of the MGED Ontology is to provide standard terms for the annotation of microarray experiments. The terms are provided in the form of an ontology, which not only defines precisely the terms included in the vocabulary but also describes how the terms are related to each other. The MGED website lists several links to MIAME-supportive gene expression databases or microarray analysis tools that use the ontology standard vocabulary. Although compliance with MGED guidelines is still somewhat limited, it is of note that journals such as Nature, Cell, and The Lancet have adopted these guidelines for submitting microarray expression data for publication. In addition to demanding MIAME-compliant data, Nature and Cell require authors to submit their microarray data to a public repository as a precondition for publication, a requirement shared by a number of other life-science journals as well.
Table 18.1-1 MIAME checklist
Experiment design
  • Goal of the experiment
  • Description of the experiment (e.g., abstract from a related publication)
  • Keywords (e.g., time course, cell type comparison)
  • Experimental factors (the parameters or conditions tested)
  • Experimental design: relationships between samples, treatments, extracts, and so on
  • Quality control steps taken (e.g., replicates or dye swaps)
  • Links to the publication, any supplemental websites or accession numbers
Samples used, extract preparation and labeling
  • Origin of each biological sample and its characteristics (e.g., gender, age, developmental stage, strain, or disease state)
  • Manipulation of biological samples and protocols used
  • Technical protocols for preparing the hybridization extract and labeling
  • External controls (spikes), if used
Hybridization procedures and parameters
  • Protocol and conditions used for hybridization, blocking and washing, including any postprocessing steps such as staining
Measurement data and specifications
  • Data
    - The raw data, namely, scanner or imager and feature extraction output
    - The normalized and summarized data (gene expression data matrix)
  • Data extraction and processing protocols
    - Image scanning hardware and software, processing procedures
    - Normalization, transformation, and data selection procedures
Array design
  • General array design, including the platform type
  • Array feature and reporter annotation, normally represented as a table
  • For each feature (spot) on the array, its location on the array and the reporter present in the location
  • For each reporter, unambiguous characteristics of the reporter molecule, including the sequence for oligonucleotide-based reporters, the source, preparation and database accession number for long reporters, and primers for PCR-based reporters
  • Appropriate biological annotation for each reporter
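A checklist such as Table 18.1-1 lends itself to a simple automated completeness check before data are submitted. The sketch below encodes a condensed version of the checklist as a Python dictionary and reports which items are missing from an experiment description. The section and item names are hypothetical shorthand invented for this illustration; they are not MAGE-ML element names or MGED Ontology terms, and external controls are left out because the checklist requires them only "if used".

```python
# Condensed, hypothetical encoding of the checklist in Table 18.1-1.
MIAME_CHECKLIST = {
    "experiment_design": [
        "goal", "description", "keywords", "experimental_factors",
        "design", "quality_control", "links",
    ],
    "samples": ["origin", "manipulation", "extract_and_labeling_protocols"],
    "hybridization": ["protocol"],
    "measurements": ["raw_data", "normalized_data", "processing_protocols"],
    "array_design": ["platform", "feature_annotation", "reporter_annotation"],
}


def missing_items(record):
    """List checklist items that are absent or empty in an experiment record.

    record: dict mapping a section name to a dict of item name -> value.
    Returns entries of the form "section.item", in checklist order.
    """
    missing = []
    for section, items in MIAME_CHECKLIST.items():
        provided = record.get(section, {})
        for item in items:
            if not provided.get(item):
                missing.append(f"{section}.{item}")
    return missing
```

An empty list returned by `missing_items` means that every item in this condensed checklist has some value; it says nothing about whether the values themselves are adequate or use the controlled vocabulary.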
18.1.3.5 Public Databases for Gene Expression Data
Gene profiling experiments produce large volumes of data, whose significance typically goes beyond the immediate analysis presented in the first report. The data generated in one laboratory may become a useful source of information for a large number of researchers and clinicians. The need to reinvestigate and compare over time the gene expression datasets generated in different experimental systems has encouraged the establishment of a growing number of public databases for gene expression data [56]. Examples are the ArrayExpress repository of the European Bioinformatics Institute (http://www.ebi.ac.uk/arrayexpress), the Gene Expression Omnibus at the National Center for Biotechnology Information (NCBI) of the National Institutes of Health (GEO, http://ncbi.nlm.nih.gov/geo/), and the Center for Information Biology Experimentation Databases (CIBEX, http://cibex.nig.ac.jp/index.jsp) in Japan. These databases have adopted the standards proposed by the MGED Society and implement the Gene Ontology vocabulary.
The RNA Abundance Database (RAD; http://www.cbil.upenn.edu/RAD) has recently been updated to provide a MIAME-supportive infrastructure for gene expression data management [57]. Software has been developed to generate MAGE-ML documents that permit export of studies from RAD to other MAGE-ML compatible databases. RAD has also been linked to an integrated database system, the Genomics Unified Schema (GUS, http://www.gusdb.org). GUS maximizes information from stored data by providing a platform that integrates genomic and transcriptome data from multiple organisms (http://www.allgenes.org). The RIKEN Expression Array Database of the Institute of Physical and Chemical Research, Japan (READ, http://read.gsc.riken.go.jp/) is a database of expression profile data from the RIKEN mouse cDNA microarray. It stores the microarray experimental data and information, and provides Web interfaces for researchers to retrieve, analyze, and display their data [58]. The Stanford Microarray Database (SMD; http://genome-www.stanford.edu/microarray/) serves as a microarray research database for the entire scientific community, by providing full public access to the data published by SMD users, along with many tools to explore and analyze those data. SMD currently provides public access to data from 5000 microarrays. Stanford Genomic Resources also offer a comprehensive yeast gene expression database (SGD).
A project-dedicated database is represented by Germonline (http://www.germonline.org), which provides cross-species microarray data relevant to the mitotic and meiotic cell cycles, as well as gametogenesis [59]. Several databases offer the possibility to perform global analysis of datasets derived from different technologies. CleanEx (http://www.cleanex.isb-sib.ch/) of the Swiss Institute of Experimental Cancer Research is a curated database that includes microarray and serial analysis of gene expression (SAGE) data. The data are presented in a way that facilitates joint analysis and cross-dataset comparisons [60]. By collecting and integrating different types of expression data, the Gene Expression Database (GXD, http://www.informatics.jax.org/ or http://www.informatics.jax.org/menus/expression_menu.shtml) provides information about expression profiles in different mouse strains and mutants. The database classifies genes and gene products according to the Gene Ontology project [61, 62]. Links to additional gene expression databases can be found at: http://ihome.cuhk.edu.hk/-b400559/arraysoft~public.html and http://www.123.genomics.com
18.1.4 Applications and Practical Examples
18.1.4.1 Development and Function of CD4+ T-cell Subsets: Gene Profiling as a Tool to Identify Transcriptional Networks in Infectious and Inflammatory Diseases
The discovery of polarized subsets of CD4+ T cells that differ in their cytokine secretion pattern and effector functions has provided the molecular framework for understanding the diversity of T-cell-dependent immune responses against different types of pathogens [63, 64]. The two subsets of differentiated CD4+ T cells, T helper type 1 (Th1) and T helper type 2 (Th2), protect against different microbial pathogens by producing cytokines able to mobilize different mechanisms of defense. Th1 cells are characterized by the secretion of interferon-γ (IFN-γ) and are adept at macrophage activation. Such cells have been demonstrated in numerous infectious disease models to activate appropriate host defenses against intracellular pathogens, including viruses, bacteria, yeast, and protozoa. Th2 cells produce interleukin (IL)-4, IL-5, and IL-13, and are involved in the development of humoral immunity protecting against extracellular pathogens (Fig. 18.1-6). On the other hand, uncontrolled Th1 responses are associated with inflammatory or autoimmune pathologies such as rheumatoid arthritis (RA), insulin-dependent diabetes mellitus (IDDM), or psoriasis, and excessive Th2 responses are associated with allergies and asthma [65]. This indicates that the development of Th1 and Th2 cells must be tightly controlled and that therapeutic modulation of immune responses may have an impact on human diseases. During the past decade, important progress has been made in understanding the mechanisms that regulate the development and the functional properties of Th1 and Th2 cells. Th1 and Th2 cells develop from a common precursor, the naive CD4+ T cell.
T helper cell differentiation is initiated by triggering of the T-cell receptor (TCR) on naive CD4+ T cells, and cytokines present at the time of stimulation are essential in determining the fate of the developing effector T-cell population: IL-4 activates signal transducer and activator of transcription 6 (STAT6) and promotes Th2 differentiation, while IL-12 is a potent inducer of Th1 development through activation of STAT4 [65-67]. IFN-γ has been shown to be an important cofactor for Th1 cell development because of its ability to stimulate antigen-presenting cells (activated macrophages and dendritic cells) to produce high levels of IL-12. An important breakthrough in the understanding of the molecular events that determine the differentiation and the activity of Th1 and Th2 cells has been the identification of two so-called master regulators, T-bet and GATA-3. The transcription factor T-bet is expressed in Th1 cells and activates Th1 cell-specific transcripts such as IFN-γ [68]. Conversely, the transcription factor GATA-3 plays a central role in Th2 cell development by inducing expression of the Th2 cytokines IL-4, IL-5, and IL-13 [69, 70] (Fig. 18.1-7).
Fig. 18.1-6 T helper cell differentiation. Th1 and Th2 cells develop from a common precursor, the naive CD4+ T cell. Naive CD4+ T cells differentiate into T helper type 1 and T helper type 2 (Th1 and Th2) cells that protect against microbial pathogens by producing cytokines that mobilize appropriate defence mechanisms. The differentiation process is initiated by stimulation of the T-cell receptor (TCR) on the naive CD4+ T cell with a peptide-major histocompatibility complex (MHC) complex on an antigen-presenting cell (APC). Differentiation of naive precursor T cells into Th1 or Th2 cells depends mainly on the cytokine environment at the time of priming. IL-4 promotes Th2 development, whereas IL-12 plays a central role in controlling the development of Th1 cells. IL-12 is produced by dendritic cells (DC), which are the most potent APC for naive CD4+ T cells. Th1 cells secrete IFN-γ and are important effectors of cell-mediated immunity, whereas Th2 cells secrete IL-4, IL-5, and IL-13 (the so-called Th2 cytokines) and are important mediators of humoral immunity.
Fig. 18.1-7 Control of T helper cell differentiation. Following the identification of T-bet as the master transcription factor inducing Th1 development, a model of T helper cell differentiation has been proposed [68]. According to this model, IL-12 signals through high-affinity IL-12 receptors via STAT4 to activate expression of T-bet. Subsequently, T-bet activates expression of IFN-γ and represses expression of the Th2 cytokines IL-4, IL-5, and IL-13. Consistent with previous findings from several laboratories (reviewed in Refs. 65, 70), IL-4 directs Th2 differentiation by a mechanism that involves STAT6-dependent activation of GATA-3 expression. GATA-3 is the “mirror image” of T-bet in that it activates expression of Th2 cytokines and represses the Th1 cytokine, IFN-γ. The main feature of this model is that cytokine receptor signaling and STAT activation are placed upstream of the master T helper lineage-determining transcription factors T-bet and GATA-3. This model also implies that T-bet and GATA-3 antagonize each other. Subsequent studies have shown that following stimulation of naive CD4+ T cells, expression of T-bet is strongly induced by IFN-γ signaling and STAT1 activation [71, 72], indicating a positive feedback loop similar to that in Th2 cell differentiation. This figure also indicates that in addition to TCR and cytokine receptor signaling, costimulatory molecules (such as CD28), adhesion molecules (such as LFA-1), and signaling through other cell surface receptors (e.g., CD40-CD40 ligand interactions) can influence T helper cell differentiation.
To learn more about the differentiation and functional properties of human Th1 and Th2 cells, and possibly to identify molecules that could be of interest for pharmacological intervention in chronic inflammatory diseases, we took an independent approach and studied human Th1 and Th2 cells by analyzing their gene expression profiles [73]. We generated human Th1 and Th2 cells from cord blood leukocytes and analyzed samples 3 days after stimulation to detect changes in gene expression that occurred early in the differentiation process. In this study, we used Affymetrix high-density oligonucleotide arrays with the capacity to display transcript levels of 6000 human genes. The analysis of the chip data was performed using software developed in house. After analyzing gene expression data from Th1 and Th2 cells derived from two independent donors, we realized that it was difficult to discriminate between subset-specific and donor-specific changes in gene expression. We therefore decided to analyze gene expression in Th1 and Th2 cells generated from three additional donors and to analyze the dataset using a statistical test (paired t-test). We found 215 genes that were differentially expressed at a confidence level of 95% and whose expression level changed at least twofold. To confirm the results obtained with oligonucleotide arrays by an independent technique, we also analyzed mRNA expression of a selected set of genes in Th1 and Th2 RNA samples using kinetic RT-PCR [74]. As expected, we noticed variability in gene expression changes in cell lines derived from different subjects, but we could confirm differential expression of 28 of 29 genes in Th1 and Th2 cells generated from two independent donors.
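The two selection criteria described above, a paired t-test at the 95% confidence level combined with a minimum twofold change, can be sketched as follows. This is an illustration under stated assumptions and not the in-house software used in the study: the t-test is applied here to log2-transformed expression values, the fold change is taken as the ratio of mean expression levels with a sign indicating direction, and significance is decided by comparing the t statistic with tabulated two-sided critical values. All function names and expression values are hypothetical.

```python
import math

# Two-sided 95% critical values of Student's t for 1 to 9 degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def paired_t_statistic(a, b):
    """t statistic for paired observations (e.g., Th1 and Th2 per donor)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:  # identical differences in every pair
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)


def fold_change(a, b):
    """Signed fold change of mean levels: positive if a > b, negative if a < b."""
    ratio = (sum(a) / len(a)) / (sum(b) / len(b))
    return ratio if ratio >= 1.0 else -1.0 / ratio


def is_differential(th1, th2, min_fold=2.0):
    """Apply both criteria to one gene measured in paired Th1/Th2 samples."""
    t = paired_t_statistic([math.log2(v) for v in th1],
                           [math.log2(v) for v in th2])
    significant = abs(t) > T_CRIT_95[len(th1) - 1]
    return significant and abs(fold_change(th1, th2)) >= min_fold
```

With five donors the test has four degrees of freedom, so a gene passes the first criterion only if the absolute t statistic exceeds 2.776. Pairing by donor removes the donor-to-donor variability that made the initial two-donor comparison difficult to interpret.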
Fig. 18.1-8 Gene expression profiles of human Th1 and Th2 cells generated from five independent donors were analyzed using high-density oligonucleotide arrays. Genes were selected if differential expression between Th1 and Th2 cells was determined at a confidence level of 95% on the basis of t-test statistics performed on a dataset derived from five independent experiments and if at least a twofold change in expression level was observed. Bars represent “fold change” of the mRNA level of a particular gene when comparing Th1 versus Th2 cells (mean of five experiments). Positive values indicate that the transcript is more abundant in Th1 than in Th2 cells and negative values indicate the opposite. Colors indicate the “absolute” expression level of a gene (arbitrary fluorescence units). Black: high level of expression (>1000); grey: medium level of expression (200-1000); white: low transcript abundance (<200). The column next to the bar diagram indicates the P value obtained from the result of a paired t-test performed with the data from independently derived Th1 and Th2 cell lines from five donors. Genes were grouped according to their presumed function, based on information available in public databases or in the literature (from Ref. 73). (Figure panels, by functional group: cytokines, growth factors and receptors; transcriptional regulation; adhesion and migration; ion channels and transporters; metabolic pathways; enzymes and other signaling molecules; apoptosis and proteolytic systems. Each gene is listed with its accession number, fold change, and P value.)
Well-established marker genes for Th1 cells, such as IFN-γ and IL-12Rβ2, were found at much higher levels in Th1 than in Th2 cells (Fig. 18.1-8). In addition, some genes that had previously not been implicated in the process of T helper cell differentiation, such as oncostatin M (OSM), were found to be overexpressed in Th1 cells (Fig. 18.1-8). The gene expression profiles of Th1 and Th2 cells also revealed differential expression of genes encoding transcription factors, some of which (GATA-3 and IRF-1) had previously been characterized in the context of T helper cell differentiation [69, 75-77]. In addition, several transcription factors that had not been associated with T helper cell polarization were also identified, including ETS-1, RORα2, IRF-7A, and c-fos. Although the target genes through which these factors regulate the gene expression patterns specific to each T helper cell subset are not known, it is possible that some of these factors control individual cytokine genes, as GATA-3 and T-bet control IL-4 and IFN-γ production, respectively. In fact, the recent analysis of Ets-1-deficient mice demonstrated that this transcription factor is an important cofactor of T-bet in promoting IFN-γ production and is essential for the efficient development of Th1 responses [78].
Th1 cells are more susceptible to activation-induced cell death (AICD), a mechanism for downregulation of an immune response and maintenance of T-cell tolerance, and are important mediators of tissue damage in inflammatory and autoimmune diseases. Results from our gene expression analysis suggested a potential mechanism for the increased susceptibility of Th1 cells to AICD and for their cytotoxic effects [73]. Compared with Th2 cells, Th1 cells expressed higher levels of TRAIL, an apoptosis-inducing molecule; BAK, a proapoptotic Bcl-2 family member; and proapoptotic caspase-8, perforin, and granzyme B (Fig. 18.1-8).
The functional program of Th1 and Th2 lymphocytes requires these cells to home to different sites.
Th1 cells have been shown to preferentially express the chemokine receptors CCR5 and CXCR3, whereas Th2 cells were reported to preferentially express CCR3, CCR4, CCR8, and the chemoattractant receptor CRTh2 [79]. Other gene expression changes identified in our study were consistent with previous experiments defining differential recruitment of Th1 and Th2 cells to sites of inflammation. We reported an increased expression of mRNA for fucosyltransferase VII (FucT-VII), which codes for an enzyme that mediates the fucosylation of selectin ligands on the surface of T cells (Fig. 18.1-8). This fucosylation is required for the first step of lymphocyte adhesion to endothelial cells, “rolling”. Recent in vivo observations have validated the biological relevance of this finding: FucT-VII was in fact found to be upregulated in Th1 cells infiltrating the inflamed joints of patients affected by either RA or juvenile idiopathic arthritis (JIA) [73, 80]. Moreover, FucT-VII expression and increased P-selectin binding capacity of T cells were associated with a more severe course of the disease [80]. These data indicate a critical role of FucT-VII in the enhanced homing of T cells to the inflamed synovium and suggest that inhibitors of FucT-VII enzyme activity may be of significant therapeutic value in the treatment of chronic arthritis. IL-12 also induced two chemokine receptors, CCR5 and CCR1, both of which promote increased responsiveness of Th1, but not Th2, cells to MIP-1α or RANTES. The activity of RANTES and other chemokines is regulated by CD26 (dipeptidyl-peptidase IV)-mediated cleavage. The DPP4 (encoding CD26) mRNA was found to be upregulated in Th1 cells compared to Th2 cells (Fig. 18.1-8). The inactivation of chemokines by CD26 may contribute to the fine control of chemotactic migration of Th1 cells by providing a stop signal that keeps cells at the site of inflammation.
Finally, higher expression of integrin aGP1 on Thl cells suggested that adhesion and extravasation of Thl cells into tissues triggered by inflammatory chemokines might be mediated by higher surface levels of integrin aGP1 binding to laminin in basal membranes and extracellular matrix. Of the 215 genes which we found differentially expressed in Thl and Th2 cells, 157 genes were expressed at higher levels in Thl cells and 58 genes were overexpressed in Th2 cells. There are several possible explanations for the apparent Thl bias of our shdy. Previous studies have demonstrated that Th2 cells may require more time to acquire their effector functions than Thl cells [81,821. Hamalainen et al. have used an oligonucleotide microarray specifically designed to screen for 250 inflammation-related genes to identify those differentially expressed in human, cord blood-derived Thl and Th2 lines, 2 weeks after initial stimulation [83]. Although the experimental protocol to generate Thl and Th2 cells used in the study by Hamalainen et al. was quite distinct from our protocol [73], there was a large overlap of the genes identified in both studies. In addition to the Thl/Th2 signature cytokines, several chemokines (MIP-la, MIP-lP, RANTES) and chemokine receptors (CCR1, CCR2, CCR4, CCR5) were found differentially expressed in human Thl and Th2 cells [83]. These results further emphasize the importance of correct homing of polarized effector T cells to eradicate pathogens.
In a subsequent study, Chtanova et al. used high-density oligonucleotide microarrays to analyze gene expression in mouse CD4+ Th1 and Th2 cells, as well as CD8+ type 1 and type 2 T cells (Tc1 and Tc2) [84]. In contrast to our study, in which Th1-overexpressed genes predominated [73], Chtanova et al. identified more type 2-biased genes [84]. It is important to note that different protocols were used to generate polarized T-cell subsets in the two studies. Chtanova et al. stimulated purified naive mouse CD4+ and CD8+ T cells with anti-CD3/CD28 antibodies, IL-2, and IL-6 plus the polarizing cytokine cocktail. Cells were cultured for 7 days and then restimulated for 24 h with anti-CD3 before RNA was extracted. A previous report has demonstrated that IL-6 is able to polarize naive CD4+ T cells into Th2 cells by inducing the initial production of IL-4 in CD4+ T cells [85]. In addition, it has been shown that IL-6 inhibits Th1 differentiation in an IL-4-independent manner through the induction of SOCS1 [86]. The addition of IL-6 to the cultures could therefore be a possible explanation for the Th2 bias observed in this study [84]. Genes that showed a change of at least twofold in at least two separate experiments were considered differentially expressed. An interesting finding of this study was that STAT4 (the signal transducer relaying IL-12 signals from the cell surface to the nucleus) was expressed at higher levels in mouse Th1 than in Th2 cells. We did not observe differential expression of STAT4 mRNAs in human Th1 and Th2 cells [73], and Hamalainen et al. found a rather modest downregulation of STAT4 transcripts in human Th1 cells [83]. Protein data from our lab and from others did not indicate differential expression of STAT4 in human Th1 and Th2 cells [87-90]. However, a study by Usui et al.
confirmed downregulation of STAT4 mRNA and protein during differentiation of mouse Th2 cells [91]. This study also showed that downregulation of STAT4 in Th2 cells is mediated by GATA-3, the "master" regulator of Th2 development [91]. It is at present not clear whether the observed differences of STAT4 regulation in human and mouse Th1 and Th2 cells reflect a species-specific differentiation program or result from the experimental conditions in which the cell populations were generated. Chtanova et al. also found two members of the tumor necrosis factor receptor-associated factor (TRAF) family to be differentially expressed in Th1 and Th2 cells. TRAF4 was expressed at a higher level in type 1 cells, while TRAF5 was preferentially expressed in type 2 cells. Members of this family serve as adapter proteins that mediate cytokine signaling; in particular, they seem to play a role in tumor necrosis factor (TNF) and Toll/IL-1 signaling, resulting in activation of the transcription factors NF-κB and AP-1. Clearly, additional studies are required to address the biological relevance of these findings.
A more recent study specifically addressed the kinetics of gene expression during mouse T helper cell differentiation. Lu et al. analyzed gene expression in unstimulated naive CD4+ T cells and in cells stimulated for 1, 2, 3, and 4 days in Th1- or Th2-inducing conditions [92]. In addition, the authors analyzed the gene expression profiles in Th1 and Th2 cells that had been restimulated for 4 h with anti-CD3 antibodies, a procedure that induces, in
particular, expression of genes that are associated with the effector functions of
T helper cells. Two independent experiments were performed, and genes that showed greater than twofold changes in both were chosen for further analysis. A global hierarchical clustering analysis revealed that the expression pattern of day 1 or day 2 Th1 cells is closer to that of day 1 or day 2 Th2 cells than to that of Th1 cells harvested on day 3 or 4. A similar relationship was also observed for Th2 cells, indicating that at the global gene expression level, Th1 and Th2 cells begin to diverge at day 3 after primary stimulation [92]. These findings correlate with previous studies that analyzed the kinetics of changes in the chromatin structure at the IFN-γ and IL-4 cytokine loci. Histone hyperacetylation at the IL-4 locus was observed in both Th1 and Th2 cultures during the first 2 days of T helper cell differentiation. However, at later time points of T helper cell differentiation, histone acetylation was selectively detected at the IL-4 locus in Th2 cells and at the IFN-γ locus in Th1 cells [93].
The above studies have provided insight into the mechanisms that control the development of polarized helper T-cell subsets and have given important information about previously unknown effector functions of these cells. However, the in vitro systems used to generate polarized Th1 and Th2 cells might not reproduce the conditions that lead to the differentiation of these subsets in vivo. In addition, a critical issue that could not be addressed in these studies concerns the interaction of differentiated Th1 and Th2 cells with the tissues during an infection or in the setting of an inflammatory disease. Infection with the parasite Schistosoma mansoni is a well-established model to study Th2 responses in vivo [64]. Intravenous injection of S. mansoni eggs, which are retained in the lung, results in a strong Th2 response and granuloma formation in the lung. This model has been widely used to study basic mechanisms of asthma, allergy, and other Th2-mediated inflammatory diseases.
Neutralization of IL-4 in this model results in a reduced granuloma size and a diminished Th2 response, whereas neutralization of IL-12 results in increased granuloma size and Th2 cytokine production. In the absence of the immunoregulatory cytokine IL-10, enhanced levels of IL-4 and IL-12 are secreted, compared to wild-type mice [94]. IL-4/IL-10 and IL-10/IL-12 double knockout mice develop highly polarized Th1 and Th2 responses, respectively, after infection with S. mansoni eggs [95]. Sandler et al. have recently analyzed gene expression profiles of lung tissue from wild-type, IL-4/IL-10, and IL-10/IL-12 double-deficient mice at several time points after challenge with S. mansoni eggs [96]. They found that Th1-polarized mice developed only small granulomas and expressed genes that are characteristic of tissue damage. In addition to genes known to be associated with Th1 responses (IFN-γ-induced genes and TNF-α-induced protein 2), Th1-polarized mice expressed several chemokines (IFN-γ-inducible protein 10 and RANTES), as well as Natural Killer (NK) cell ligands. Activation of macrophages, a hallmark of Th1 responses, was reflected by the upregulation of MIP-3α, macrophage-expressed gene 1, macrosialin, and macrophage C-type lectin. Th1-polarized mice also showed features of the acute-phase response, as levels of both IL-1β and its activator
caspase 1 were increased, as well as serum amyloids A2 and A3. A particularly striking observation was the upregulation of cytotoxic genes such as granzymes A, B, and K, caspases 1 and 3, and the programmed cell death 1 ligand. Finally, genes responsible for intracellular protein degradation, including ubiquitin D, ubiquitin-conjugating enzyme 8, and cathepsin D, were also upregulated in Th1-polarized mice [96].
In contrast, Th2-polarized mice formed large granulomas with massive collagen deposition and demonstrated upregulation of genes associated with wound healing [96]. In particular, expression of IL-13, an important mediator of fibrosis, of chemokines that recruit Th2 effector cells, such as MCP-2, and of genes induced by Th2 cytokines, such as TGF-β-induced and IL-4-induced gene 1, was found to be upregulated in Th2-polarized mice. Eosinophilia in the lung tissue correlated with the expression of several eotaxins (chemokines that recruit eosinophils) and with increased expression of eosinophil-specific genes such as eosinophil-associated ribonucleases 1, 2, and 5. Furthermore, the presence of alternatively activated macrophages was indicated by their markers, arginase and leukotriene. Thromboxane synthesis was suggested by the induction of arachidonate 15-lipoxygenase and platelet thromboxane A synthase 1. As noted above, a variety of genes involved in wound healing is found in the lung tissue of Th2-polarized mice. These include the matrix metalloproteinases (MMPs) 12 and 13, and the gene encoding tissue inhibitor of matrix metalloproteinases (TIMP)-1, a protein that inhibits the majority of MMPs. The time-course analysis revealed maximum expression of MMP-9, MMP-13, and TIMP-1 at day 4, followed by TIMP-2 at day 8, and MMP-12 peaking at day 14. Of note, the procollagens, which are precursors of elements of the extracellular matrix, followed a similar pattern.
Procollagen types I, III, and XVIII were expressed early at day 4, followed by procollagen types XIV and XV, peaking at day 8 [96]. In conclusion, this study demonstrated that Th1 responses to S. mansoni eggs are characterized by the expression of genes crucial for cytotoxicity and tissue damage, whereas Th2 responses direct tissue remodeling and wound healing [96].
All these studies have shown the impact of large-scale gene expression profiling on the analysis of polarized helper T-cell populations. The analyses of the expression of 6000 genes in human Th1 and Th2 cells and of 11 000 genes in mouse Th1, Tc1, Th2, and Tc2 cells were first attempts to understand the molecular mechanisms underlying the functional diversity of distinct CD4+ T-cell subsets. The finding that genes regulating key steps in the process of leukocyte extravasation into inflamed tissues are coregulated in human T-cell subsets sheds light on the importance of the correct homing of T cells within tissues to eliminate pathogens. Moreover, the analysis of global gene expression profiles during lung inflammation in an infectious disease model has revealed important information about the divergent effects of polarized Th1 and Th2 responses on tissues. These large-scale studies have furthered the understanding of the genetic program that controls the differentiation and functional properties of polarized helper T-cell subsets and may have an impact on the development of more advanced therapies for inflammatory diseases.
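Several of the studies summarized above selected genes by a reproducibility rule, namely a change of at least twofold in each of two independent experiments. The following Python sketch illustrates one way such a filter could be written. It is only an illustration: the function name `reproducible_twofold`, the gene labels, and the ratio values are invented for this example, and the requirement that the change point in the same direction in every experiment is an assumption that the cited reports do not spell out.

```python
import math


def reproducible_twofold(ratios, threshold=2.0):
    """Select genes changed at least `threshold`-fold in every experiment.

    `ratios` maps a gene label to a list of Th1/Th2 expression ratios,
    one ratio per independent experiment. Requiring the same direction
    of change in all experiments is an assumption made for this sketch.
    """
    cutoff = math.log2(threshold)
    selected = {}
    for gene, values in ratios.items():
        log_ratios = [math.log2(v) for v in values]
        if all(x >= cutoff for x in log_ratios):
            selected[gene] = "higher in Th1"
        elif all(x <= -cutoff for x in log_ratios):
            selected[gene] = "higher in Th2"
    return selected


# Invented ratios (Th1 signal / Th2 signal) from two hypothetical experiments.
example = {
    "gene_A": [4.1, 3.2],   # reproducibly higher in Th1
    "gene_B": [0.3, 0.45],  # reproducibly higher in Th2
    "gene_C": [2.5, 1.4],   # twofold in only one experiment
    "gene_D": [2.2, 0.4],   # twofold in both, but in opposite directions
}
hits = reproducible_twofold(example)
```

Working on log2 ratios makes the cutoff symmetric, so that a twofold increase and a twofold decrease are treated alike. With only two experiments, a rule of this kind controls reproducibility but gives no estimate of statistical significance.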
18.1.4.2
Uncovering the Mysteries of Regulatory CD4+ CD25+ T Lymphocytes by Gene Expression Profiling
One of the central problems in immunology is to understand how the immune system can discriminate between self and nonself, thereby inhibiting autoimmunity but mounting efficient immune responses to eradicate infectious microorganisms. The evolution of the adaptive immune system in higher vertebrates allows a more efficient and specific elimination of invading pathogens than the ancestral innate immune system. The adaptive immune system is characterized by the random generation of antigen receptors in lymphocyte clones with essentially unlimited specificities. This system, however, also generates self-reactive lymphocytes and therefore poses the threat of autoimmunity. Work over the past decades has unraveled the cellular and molecular mechanisms that lead to the elimination or functional inactivation of autoreactive T and B cells in the thymus and bone marrow, respectively. The clonal deletion of self-reactive T cells in the thymus by apoptosis is called recessive tolerance because elimination of individual autoreactive T-cell clones does not affect other autoreactive clones. However, it soon became clear that not all autoreactive T cells are deleted in the thymus and that additional mechanisms must exist to maintain tolerance. Work over the past 10 years has identified a subpopulation of CD4+ T cells that acts in a dominant way to actively suppress immune activation and plays a critical role in the maintenance of self-tolerance and homeostasis. These cells are characterized by high-level expression of the α chain of the IL-2 receptor (CD25) and have been called natural CD4+ CD25+ T regulatory (Treg) cells [97] (Table 18.1-2). The identification of CD25 as a cell surface marker of Treg and the development of an in vitro T-cell suppression assay have greatly facilitated the analysis of the mechanisms of T-cell-mediated dominant tolerance [98, 99].

Table 18.1-2 An overview of CD4+ T-cell subsets
| CD4+ T-cell subset | Function | Transcription factor involved in lineage specification | Cell surface marker | Cytokines secreted following stimulation |
| --- | --- | --- | --- | --- |
| Treg | Maintenance of tolerance; prevention of autoimmunity; homeostasis | FOXP3 | CD25, CTLA-4, GITR | IL-10 (?) |
| Th1 | Cell-mediated immunity; protection against intracellular pathogens | T-bet, STAT4 | IL-12Rβ2 | IFN-γ |
| Th2 | Humoral immunity; protection against extracellular pathogens | GATA-3, STAT6 | CRTH2 | IL-4, IL-5, IL-13 |
Only main features of the individual subsets are shown. Molecules in bold are unique for the respective subset.

Although this is still a matter of some discussion,
there is now good evidence that CD4+ CD25+ Treg constitute a separate lineage that develops in the thymus (see Refs. 100-102 for recent reviews). The recent identification of FOXP3 as a transcription factor essential for the development and function of CD4+ CD25+ Treg has provided an important breakthrough for the analysis of this subpopulation of peripheral CD4+ T cells [103-105]. Evidence that this forkhead/winged-helix transcription factor is essential for Treg development comes from the analysis of scurfy mice. These mice carry a mutated Foxp3 gene and are characterized by a massive activation and expansion of CD4+ T cells resulting in gross enlargement of secondary lymphoid organs, severe dermatitis, lymphocytic infiltration of multiple organs, hypergammaglobulinemia, and autoimmune hemolytic anemia [106]. The analysis of scurfy mice demonstrated that the disease is mediated by CD4+ T cells. This finding was confirmed by the analysis of FOXP3-deficient mice, which display polyclonal activation of CD4+ T cells as early as 7 days after birth [103]. By knock-in of a GFP-FOXP3 reporter allele into the murine FOXP3 locus, the Rudensky laboratory has now provided compelling evidence that Treg constitute a separate lineage that develops in the thymus and that FOXP3 is in fact the lineage-specification factor of these cells [107]. Importantly, FOXP3 mutations are also responsible for the pathogenesis of immune dysregulation, polyendocrinopathy, enteropathy, X-linked (IPEX), a fatal human X-linked disorder characterized by extensive multiorgan lymphocyte infiltration and abnormal activation of effector CD4+ T cells. At a very young age, IPEX patients present with massive lymphoproliferation, early onset IDDM, thyroiditis, eczema, severe enteropathy, food allergies preventing normal food intake, and additional autoimmune pathologies such as autoimmune hemolytic anemia and thrombocytopenia, as well as severe infection [108-110].
Affected males succumb to the IPEX syndrome between 3 and 4 weeks of age. Altogether, there is compelling evidence that FOXP3 is necessary for development of CD4+ CD25+ Treg in mice, and the identification of FOXP3 mutations in IPEX patients suggests that this transcription factor plays a similar role in humans. Although the identification of FOXP3 as the lineage specification factor of Treg has provided a precious tool to understand the ontogeny and function of this lineage, the important question of how Treg acquire and exert their suppressive action remains unresolved [111]. In particular, the target genes of Foxp3 have not been identified and nothing is known about the molecular mechanism by which this transcription factor downregulates the activity of CD4+ T cells. Given the accumulating evidence that the immunosuppressive potential of Treg could be used therapeutically to treat autoimmune diseases and facilitate transplant tolerance, or could be targeted to elicit tumor immunotherapy, it is not surprising that many laboratories are currently trying to unravel the molecular basis of Treg-mediated immunosuppression. Several labs have performed large-scale gene expression studies to identify molecules mediating the suppressive effects of Treg. Most of these studies have been performed in mice and, given the current pace of the field, human studies are sure to follow soon.
Gavin et al. purified resting CD4+ CD25+ and CD4+ CD25- T cells from normal B6 mice by cell sorting and analyzed their gene expression profiles using Affymetrix Mu11K and Mu19K oligonucleotide arrays [112]. In the first experiment, biotinylated cRNA was amplified directly from cDNA, whereas in the second experiment two sequential rounds of in vitro transcription were used to obtain enough cRNA for analysis. With a few exceptions, only transcripts that were differentially expressed in both experiments were considered for confirmation by real-time RT-PCR. A reassuring finding was the strong upregulation of CD25 in Treg when compared to CD4+ CD25- T cells. Additional cell surface receptors that were upregulated in Treg included cytotoxic T lymphocyte-associated protein 4 (CTLA4), a molecule that has been implicated in the suppressive effects of Treg, and several members of the TNF receptor superfamily, including glucocorticoid-induced tumor necrosis factor receptor (GITR, also called Tnfrsf18), OX40 (also called Tnfrsf4), 4-1BB (also called Tnfrsf9), and TNFR2 (also called Tnfrsf1b, the p75 chain of the TNF receptor). Together with the overexpression of FAS-associated phosphatase 1 (FAP-1), these data point to a prolonged survival of Treg by restriction of TCR-induced apoptosis. Furthermore, the authors found higher transcript levels of TGF-βRI, the signal-transducing subunit of the receptor for TGF-β, an important negative regulator of cell growth and inflammation. Additional transcripts that were found to be overexpressed in Treg include the suppressors of cytokine signaling SOCS1 and SOCS2, as well as RGS1, a molecule that inhibits chemokine-induced signaling through heterotrimeric G proteins [112]. The authors concluded that the interplay of several pathways, such as increased T-cell survival and blockage of TCR and cytokine signaling, may account for the unique characteristics of Treg [112].
The characteristics of mouse Treg and CD4+ CD25- T cells were also analyzed in a similar study by McHugh et al. [113]. As in the previous report, only two biological replicates were performed; however, this study also analyzed the gene expression profiles of Treg and CD4+ CD25- T cells that had been stimulated for 12 and 48 h with anti-CD3 antibodies. Gene expression profiling was performed using Affymetrix Mu11K oligonucleotide arrays. Only 29 genes were found to be differentially expressed when comparing resting Treg and CD4+ CD25- T cells. For unknown reasons, the "positive control" of this experiment, CD25, was not detected in Treg in this study [113]. Although the use of only two replicate experiments in both studies certainly does not allow major conclusions to be drawn, the fact that 50% of the genes found by McHugh et al. were also detected in the study by Gavin et al. provides some cross-validation of the results [112, 113]. McHugh et al. focused their study on the functional role of GITR in the suppressive functions of Treg and demonstrated that agonistic antibodies against GITR could abrogate Treg-mediated suppression in in vitro T-cell suppression assays [113]. Additional microarray-based studies have identified neuropilin-1 (Nrp1) [114] and lymphocyte activation gene-3 (Lag-3) [115] as Treg-specific cell surface molecules. With respect to Lag-3, it
should be noted that this receptor is also highly expressed on activated Th1 cells [116]. Herman et al. have recently analyzed the function of Treg in a type 1 diabetes model in mice [117]. Type 1 diabetes models are particularly useful for the study of autoimmune diseases because mice spontaneously develop the disease and their pathology is very similar to the human counterpart, IDDM. The disease develops in two stages: in the BDC2.5 model, cells invade the pancreas and set up a massive infiltrate in the islets at 15-18 days of age (insulitis). Subsequently, only 10-20% of animals develop diabetes resulting from the massive destruction of pancreatic β-cells at around 20 weeks of age. The authors studied whether the relatively long prediabetic period and low incidence of diabetes in this model may be explained by the presence of Treg in the pancreas. They showed that Treg and effector T cells coexist within the pancreatic lesion before the onset of diabetes. To assess the potential roles of Treg within the lesion, they sorted CD4+ CD25+ CD69- Treg cells from the pancreas of prediabetic mice and compared their gene expression profile to that of effector T-cell populations, also isolated by cell sorting from pancreas preparations. Since only small cell numbers could be obtained with these procedures, the authors used commercial kits to amplify RNA. Three to five independent experiments were performed for each cell population and statistical algorithms were used for data analysis [117]. In addition to genes overexpressed in Treg, such as GITR, CD103, Nrp-1, IL-10, and CTLA-4, the authors identified several molecules that had not previously been associated with Treg functions [117]. One of these molecules, inducible costimulator (ICOS), was shown to be specifically upregulated on Treg purified from pancreas but not on Treg that had been purified from peripheral lymph nodes.
The authors showed that blockade of ICOS results in a rapid progression from insulitis to diabetes, giving a strong indication that this molecule may play an important role in the maintenance of the prediabetic stage [117]. This study provides an excellent example of how increased understanding of the molecular and cellular basis of regulatory events in the pancreatic islets could lead to the development of therapies that promote long-term tolerance even after an immune response has been established in the lesion.
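The cross-validation argument used above for the two Treg studies, in which about half of the genes reported in one study were also detected in the other, amounts to a simple set overlap. The Python sketch below shows how such an overlap could be computed. The gene labels are placeholders and not the published gene lists, and the function name `overlap_fraction` is hypothetical.

```python
def overlap_fraction(reference, other):
    """Return the fraction of genes in `reference` that also occur in `other`."""
    reference, other = set(reference), set(other)
    if not reference:
        raise ValueError("reference gene list is empty")
    return len(reference & other) / len(reference)


# Placeholder gene lists; these are NOT the lists reported in the cited studies.
study_a = {"g1", "g2", "g3", "g4", "g5", "g6"}
study_b = {"g2", "g3", "g7", "g8"}

shared = sorted(study_a & study_b)
fraction_of_b_in_a = overlap_fraction(study_b, study_a)
fraction_of_a_in_b = overlap_fraction(study_a, study_b)
```

The measure is asymmetric: the fraction depends on which list serves as the reference, so a statement such as "50% of the genes were also detected" should always name the study whose list is the denominator.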
18.1.5 Future Development
Genome-wide gene expression analysis has become a tool that is widely used in biology and biomedical research. Technological improvements are likely to occur with respect to reduced sample input and/or more robust protocols for the preamplification of RNA, an increase of sensitivity, a better signal-to-noise ratio, the development of exon-specific probes to tackle the important issue of differentially spliced transcripts and of probes allowing the analysis of micro-RNAs. An equally important, although certainly more difficult, issue
concerns standardization. Some of the efforts in this direction have been discussed in this review, but we would like to reemphasize the importance of a common language for gene annotation and of standardized information about experimental setups. Initially, microarray technology was used to answer very basic biological questions, and the analysis usually focused on the identification of differentially expressed genes. Now, as microarray technology is becoming more widely used, it is possible to address increasingly sophisticated questions by reanalyzing gene expression data present in the databases. Finally, the development of whole-genome microarrays will allow the study of genome-wide regulation of gene expression. The recently developed, so-called ChIP-on-chip technology analyzes the association of a specific factor with a particular region of the genome at a given time. Very recent examples of this technology include the genome-wide analysis of RNA polymerase II association with the yeast genome [118], the global mapping of histone acetylation patterns to gene expression in yeast [119], and the genome-wide analysis of transcription factor binding to the yeast genome [120].
References
1. D.J. Lockhart, E.A. Winzeler, Genomics, gene expression and DNA arrays, Nature 2000, 405, 827-836.
2. M.B. Eisen, P.T. Spellman, P.O. Brown, D. Botstein, Cluster analysis and display of genome-wide expression patterns, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 14863-14868.
3. J. Quackenbush, Computational analysis of microarray data, Nat. Rev. Genet. 2001, 2, 418-427.
4. H.C. Causton, J. Quackenbush, A. Brazma, Microarray Gene Expression Data Analysis. A Beginner's Guide, Blackwell Publishing, Oxford, 2003.
5. M. Schena, D. Shalon, R.W. Davis, P.O. Brown, Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 1995, 270, 467-470.
6. R.J. Lipshutz, D. Morris, M. Chee, E. Hubbell, M.J. Kozal, N. Shah, N. Shen, R. Yang, S.P. Fodor, Using oligonucleotide probe arrays to access genetic diversity, Biotechniques 1995, 19, 442-447.
7. P.O. Brown, D. Botstein, Exploring the new world of the genome with DNA microarrays, Nat. Genet. 1999, 21, 33-37.
8. R.J. Lipshutz, S.P. Fodor, T.R. Gingeras, D.J. Lockhart, High density synthetic oligonucleotide arrays, Nat. Genet. 1999, 21, 20-24.
9. S.P. Fodor, J.L. Read, M.C. Pirrung, L. Stryer, A.T. Lu, D. Solas, Light-directed, spatially addressable parallel chemical synthesis, Science 1991, 251, 767-773.
10. A.C. Pease, D. Solas, E.J. Sullivan, M.T. Cronin, C.P. Holmes, S.P. Fodor, Light-generated oligonucleotide arrays for rapid DNA sequence analysis, Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 5022-5026.
11. D.J. Lockhart, H. Dong, M.C. Byrne, M.T. Follettie, M.V. Gallo, M.S. Chee, M. Mittmann, C. Wang, M. Kobayashi, H. Horton, E.L. Brown, Expression monitoring by hybridization to high-density oligonucleotide arrays, Nat. Biotechnol. 1996, 14, 1675-1680.
12. T.H.G.S. Consortium, Finishing the euchromatic sequence of the human genome, Nature 2004, 431, 931-945.
13. M.K. Kerr, G.A. Churchill, Experimental design for gene expression microarrays, Biostatistics 2001, 2, 183-201.
14. B.A. Williams, R.M. Gwirtz, B.J. Wold, Genomic DNA as a cohybridization standard for mammalian microarray measurements, Nucleic Acids Res. 2004, 32, e81.
15. N. Novoradovskaya, M. Whitfield, L. Basehore, A. Novoradovsky, R. Pesich, J. Usary, M. Karaca, W. Wong, O. Aprelikova, M. Fero, C. Perou, D. Botstein, J. Braman, Universal reference RNA as a standard for microarray experiments, BMC Genomics 2004, 5, 20.
16. P.J. Park, Y.A. Cao, S.Y. Lee, J.-W. Kim, M.S. Chang, R. Hart, S. Choi, Current issues for DNA microarrays: platform comparison, double linear amplification, and universal RNA reference, J. Biotechnol. 2004, 112, 225-245.
17. W. Gregory Cox, M.P. Beaudet, J.Y. Agnew, J.L. Ruth, Possible sources of dye-related signal correlation bias in two-color DNA microarray assays, Anal. Biochem. 2004, 331, 243-254.
18. A.A. Dombkowski, B.J. Thibodeau, S.L. Starcevic, R.F. Novak, Gene-specific dye bias in microarray reference designs, FEBS Lett. 2004, 560, 120-124.
19. M.-L. Martin-Magniette, J. Aubert, E. Cabannes, J.-J. Daudin, Evaluation of the gene-specific dye bias in cDNA microarray experiments, Bioinformatics 2005, 21, 1995-2000.
20. B.H. Mecham, G.T. Klus, J. Strovel, M. Augustus, D. Byrne, P. Bozso, D.Z. Wetmore, T.J. Mariani, I.S. Kohane, Z. Szallasi, Sequence-matched probes produce increased cross-platform consistency and more reproducible biological results in microarray-based gene expression measurements, Nucleic Acids Res. 2004, 32, e74.
21. M. Bakay, Y.-W. Chen, R. Borup, P. Zhao, K. Nagaraju, E. Hoffman, Sources of variability and effect of experimental approach on expression profiling data interpretation, BMC Bioinformatics 2002, 3, 4.
22. J.P. Novak, R. Sladek, T.J. Hudson, Characterization of variability in large-scale gene expression data: Implications for study design, Genomics 2002, 79, 104-113.
23. C.C. Pritchard, L. Hsu, J. Delrow, P.S. Nelson, Project normal: Defining normal variance in mouse gene expression, Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 13266-13271.
24. M.C.K. Yang, J.J. Yang, R.A. McIndoe, J.X. She, Microarray experimental design: power and sample size considerations, Physiol. Genomics 2003, 16, 24-28.
25. C. Wei, J. Li, R. Bumgarner, Sample size for detecting differentially expressed genes in microarray experiments, BMC Genomics 2004, 5, 87.
26. S.-H. Jung, H. Bang, S. Young, Sample size calculation for multiple testing in microarray data analysis, Biostatistics 2005, 6, 157-169.
27. K. Dobbin, R. Simon, Sample size determination in microarray experiments for class comparison and prognostic classification, Biostatistics 2005, 6, 27-38.
28. C.-A. Tsai, S.-J. Wang, D.-T. Chen, J.J. Chen, Sample size for gene expression microarray experiments, Bioinformatics 2005, 21, 1502-1508.
29. X. Peng, C. Wood, E. Blalock, K. Chen, P. Landfield, A. Stromberg, Statistical implications of pooling RNA samples for microarray experiments, BMC Bioinformatics 2003, 4, 26.
30. C. Kendziorski, R.A. Irizarry, K.-S. Chen, J.D. Haag, M.N. Gould, On the utility of pooling biological samples in microarray experiments, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 4252-4257.
31. J.H. Shih, A.M. Michalowska, K. Dobbin, Y. Ye, T.H. Qiu, J.E. Green, Effects of pooling mRNA in microarray class comparisons, Bioinformatics 2004, 20, 3318-3325.
32. L. Luo, R.C. Salunga, H. Guo, A. Bittner, K.C. Joy, J.E. Galindo, H. Xiao, K.E. Rogers, J.S. Wan, M.R. Jackson, M.G. Erlander, et al., Gene expression profiles of laser-captured adjacent neuronal subtypes, Nat. Med. 1999, 5, 117-122.
33. C. Leethanakul, V. Patel, J. Gillespie, M. Pallente, J.F. Ensley, S. Koontongkaew, L.A. Liotta, M. Emmert-Buck, J.S. Gutkind, Distinct pattern of expression of differentiation and growth-related genes in squamous cell carcinomas of the head and neck revealed by the use of laser capture microdissection and cDNA arrays, Oncogene 2000, 19, 3220-3224.
34. L.V. Hooper, M.H. Wong, A. Thelin, L. Hansson, P.G. Falk, J.I. Gordon, Molecular analysis of commensal host-microbial relationships in the intestine, Science 2001, 291, 881-884.
35. V. Luzzi, M. Mahadevappa, R. Raja, J.A. Warrington, M.A. Watson, Accurate and reproducible gene expression profiles from laser capture microdissection, transcript amplification, and high density oligonucleotide microarray analysis, J. Mol. Diagn. 2003, 5, 9-14.
36. C. King, N. Guo, G.M. Frampton, N.P. Gerry, M.E. Lenburg, C.L. Rosenberg, Reliability and reproducibility of gene expression measurements using amplified RNA from laser-microdissected primary breast tissue with oligonucleotide arrays, J. Mol. Diagn. 2005, 7, 57-64.
37. T. Ernst, M. Hergenhahn, M. Kenzelmann, C.D. Cohen, M. Bonrouhi, A. Weninger, R. Klaren, E.F. Grone, M. Wiesel, C. Gudemann, J. Kuster, W. Schott, G. Staehler, M. Kretzler, M. Hollstein, H.-J. Grone, Decrease and gain of gene expression are equally discriminatory markers for prostate carcinoma: A gene expression analysis on total and microdissected prostate tissue, Am. J. Pathol. 2002, 160, 2169-2180.
38. D.J. Kelly, S. Ghosh, RNA profiling for biomarker discovery: practical considerations for limiting sample sizes, Dis. Markers 2005, 21, 43-48.
S. Klur, K. Toy, M.P. Williams, U.Certa, Evaluation of procedures for amplification of small-size samples for hybridization on microarrays, Genomics 2004, 83, 508-5 17. 40. J. McClintick, R. Jerome, C. Nicholson, D. Crabb, H. Edenberg, Reproducibility of oligonucleotide arrays using small samples, BMC Genomics 2003, 4, 4. 41. R. Singh, R.J. Maganti, S.V. Jabba, M. Wang, G. Deng, J.D. Heath, N. Kurn, P. Wangemann, Microarray based comparison of three amplification methods for nanogram amounts of total RNA, AmJ Physiol Cell Physiol, 2005, 288, 1179-1189. 42. L. Li, J. Roden, B.E. Shapiro, B.J. Wold, S. Bhatia, S.J. Forman, R. Bhatia, Reproducibility,fidelity, and discriminant validity of mRNA Amplification for microarray analysis from primary hematopoietic cells, J. Mol. Diagn. 2005, 7,48-56. 43. J. J. Upson, R. Stoyanova, H.S. Cooper, C. Patriotis, E.A. Ross, B. Boman, M.L. Clapper, A.G. Knudson, A. Bellacosa, Optimized procedures for microarray analysis of histological specimens processed by laser capture microdissection, 1.Cell. Physiol. 2004, 201, 366-373. 44. B.M. Bolstad, F. Collin, K.M. Simpson, R.A. Irizarry, T.P. Speed, Experimental Design and Low-Level Analysis of Microarray Data, Int Rev Neurobiol. 2004, 60, 25-58. 45. N.J. Armstrong, M.A. van de Wiel, Microarray data analysis: From hypotheses to conclusions using gene expression data, Cell. Oncol. 2004,26,279-290. 46. D.K. Slonim, From patterns to pathways: gene expression data analysis comes of age, Nat. Genet. 2002,32,502-508. 47. P. Khatri, P. Bhavsar, G. Bawa, S. Draghici, Onto-Tools:an ensemble of web-accessible,ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments, Nucleic Acids Res. 2004, 32, W449-W456.
39.
References 48.
49.
50.
51.
52.
53.
54.
P.K. Tan, T.J. Downey, E.L. Spitznagel Jr, P. Xu, D. Fu, D.S. Dimitrov, R.A. Lempicki, B.M. Raaka, M.C. Cam, Evaluation of gene expression measurements from commercial microarray platforms, Nucleic Acids Res. 2003, 31, 5676-5684. R. Shippy, T. Sendera, R. Lockner, C. Palaniappan, T. Kaysser-Kranich, G. Watts, J. Alsobrook, Performance evaluation of commercial short-oligonucleotide microarrays and the impact of noise in making cross-platform correlations, B M C Genomics 2004, 5, 61. D. Hollingshead, D.A. Lewis, K. Mirnics, Platform influence on DNA microarray data in postmortem brain research, Neurobiol. Dis.2005, 18,649-655. L.W. Jurata, Y.V. Bukhman, V. Charles, F. Capriglione, J. Bullard, A.L. Lemire, A. Mohammed, Q. Pham, P. Laeng, J.A. Brockman, C.A. Altar, Comparison of microarray-based mRNA profiling technologies for identification of psychiatric disease and drug signatures, /. Neurosci. Methods 2004, 138,173-188. A,-K. Jarvinen, S. Hautaniemi, H. Edgren, P. Auvinen, J. Saarela, 0.-P. Kallioniemi, 0. Monni, Are data from different gene expression microarray platforms comparable? Genomics 2004,83,1164-1168. J. Lee, K. Bussey, F. Gwadry, W, Reinhold, G. Riddick, S. Pelletier, S. Nishizuka, G. Szakacs, J:P. Annereau, U. Shankavararn, S. Lababidi, L. Smith, M. Gottesman, J. Weinstein, Comparing cDNA and oligonucleotide array data: concordance of gene expression across platforms for the NCI-60 cancer cells, Genome Biol. 2003, 4, R82. S. Mitchell, K. Brown, M. Henry, M. Mintz, D. Catchpoole, B. LaFleur, D. Stephan, Inter-Platform comparability of microarrays in acute lymphoblastic leukemia, B M C Genomics 2004, 5, 71.
55.
56.
57.
58.
59.
A. Brazma, P. Hingamp, j. Quackenbush, G. Sherlock, P. Spellman, C. Stoeckert, J. Aach, W. Ansorge, C.A. Ball, H.C. Causton T. Gaasterland, P. Glenisson, F.C. Holstege, I.F. Kim, V. Markowitz, J.C. Matese, H. Parkinson, A. Robinson, U. Sarkans, S . Schulze-Kremer, J. Stewart, R. Taylor, J. Vilo, M. Vingron, Minimum information about a microarray experiment (MIAME)-toward standards for microarray data, Nat Genet. 2001, 29, 365-371. C.]. Penkett, J. Baehler, Navigating public microarray databases, Comp. Funct. Genomics 2004,5471-479, E. Manduchi, G.R. Grant, H. He, J. Liu, M.D. Mailman, A.D. Pizarro, P.L. Whetzel, C.J. Stoeckert Jr, RAD and the RAD study-annotator: an approach to collection, organization and exchange of all relevant information for high-throughput gene expression studies, Bioinformatics 2004, 20,452-459. H. Bono, T. Kasukawa, Y. Hayashizaki, Y. Okazaki, READ: R I K E N expression array database, Nucleic Acids Res. 2002, 30, 211-213. C. Wiederkehr, R. Basavaraj, C. Sarrauste de Menthiere, L. Hermida, R. Koch, U. Schlecht, A. Amon, S. Brachat, M. Breitenbach, P. Briza, S. Caburet, M. Cherry, R. Davis, A. Deutschbauer, H.G. Dickinson, T. Dumitrescu, M. Fellous, A. Goldman, J.A. Grootegoed, R. Hawley, R. Ishii, B. Jegou, R.J. Kaufman, F. Klein, N. Lamb, B. Maro, K. Nasmyth, A. Nicolas, T. Orr-Weaver, P. Philippsen, C. Pineau, K.P. Rabitsch, V. Reinke, H. Roest, W. Saunders, M. Schroder, T. Schedl, M. Siep, A. Villeneuve, D.J. Wolgemuth, M. Yamamoto, D. Zickler, R.E. Esposito, M. Primig, Germonline, a cross-species community knowledgebase on germ cell differentiation, Nucleic Acids Res. 2004, 32, D56O-DS67.
I
1113
1114
I
78 Genome and Proteome Studies 60.
61.
62.
63.
64.
65.
66.
67.
68.
69.
70.
V. Praz, V. Jagannathan, P. Bucher, CleanEx: a database of heterogeneous gene expression data based on a consistent gene nomenclature, Nucleic Acids Res. 2004, 32, D542-D547. M. Ringwald, J.T. Eppig, J.E. Richardson, GXD: integrated access to gene expression data for the laboratory mouse, Trends Genet. 2000, 16,188-190. D.P. Hill, D.A. Begley, J.H. Finger, T.F. Hayamizu, I. J. McCright, C.M. Smith, J.S. Beal, L.E. Corbani, J.A. Blake, J.T. Eppig, J.A. Kadin, J.E. Richardson, M. Ringwald, The mouse Gene Expression Database (GXD):updates and enhancements, Nucleic Acids Res. 2004, 32, D568-D571. T.R. Mosmann, R.L. Coffman, Th1 and Th2 cells: Different patterns of lymphokine secretion lead to different functional properties, Annu. Rev. Immunol. 1989, 7, 145-173. A.K. Abbas, K.M. Murphy, A. Sher, Functional diversity of helper T lymphocytes, Nature 1996,383, 787-793. K.M. Murphy, S.L. Reiner, The lineage decisions of helper T cells, Nut. Rev. Immunol. 2002, 2, 933-944. G. Trinchieri, Interleukin-12 and the regulation of innate resistance and adaptive immunity, Nat. Rev. Immunol. 2003,3,133-146. S.J. Szabo, B.M. Sullivan, S.L. Peng, L.H. Glimcher, Molecular mechanisms regulating Thl immune responses, Annu. Rev. Immunol. 2003,21,713-758. S.J. Szabo, S.T. Kim, G.L. Costa, X. Zhang, C.C. Fathman, L.H. Glimcher, A novel transcription factor, T-bet, directs Thl lineage commitment, Cell 2000, 100, 655-669. W. Zheng, R.A. Flavell, The transcription factor GATA-3 is necessary and sufficient for Th2 cytokine gene expression in CD4 T cells, Cell 1997, 89, 587-596. K.A. Mowen, L.H. Glimcher, Signaling pathways in Th2
71.
72.
73.
74.
75.
76.
77.
development, Immunol. Rev. 2004, 202,203-222. A.A. Lighvani, D.M. Frucht, D. Jankovic, H. Yamane, J. Aliberti, B.D. Hissong, B.V. Nguyen, M. Gadina, A. Sher, W.E. Paul, J.J. O’Shea, T-bet is rapidly induced by interferon-g in lymphoid and myeloid cells, Proc. Nutl. Acad. Sci. U.S.A. 2001, 98,15137-15142. M. Afkarian, J.R. Sedy, J. Yang, N.G. Jacobson, N. Cereb, S.Y. Yang, T.L. Murphy, K.M. Murphy, T-bet is a STAT1-induced regulator of IL-12R expression in naive CD4+ T cells, Nut. Immunol. 2002, 3, 549-557. L. Rogge, E. Bianchi, M. Biffi, E. Bono, S.Y. Chang, H. Alexander, C. Santini, G. Ferrari, L. Sinigaglia, M. Seiler, M. Neeb, J. Mous, F. Sinigaglia, U. Certa, Transcript imaging of the development of human T helper cells using oligonucleotide arrays, Nat. Genet. 2000,25,96-101. R. Higuchi, R. Watson, Kinetic PCR analysis using a CCD camera and without using oligo nucleotide probes. In PCR Applications (Eds.: M.A. Innis, D.H. Gelfand, J.J. Sninsky), Academic Press, San Diego, 1999, pp. 263-284. W. Ouyang, S.H. Ranganath, K. Weindel, D. Bhattacharya, T.L. Murphy, W.C. Sha, K.M. Murphy, Inhibition of Thl development mediated by GATA-3 through an IL-4-independent mechanism, Immunity 1998, 9,745-755. S. Taki, T. Sato, K. Ogasawara, T. Fukuda, M. Sato, S. Hida, G. Suzuki, M. Mitsuyama, E.-H. Shin, S. Kojima, T. Taniguchi, Y. Asano, Multistage regulation of Thl-type immune responses by the transcription factor IRF-1, Immunity 1997, 6,673-679. E.M. Coccia, N. Passini, A. Battistini, C. Pini, F. Sinigaglia, L. Rogge, IL-12 induces expression of interferon regulatory factor-1 via signal transducer and activator of transcription-4 in human T helper
References
78.
79.
80.
81.
82.
83.
84.
85.
type 1 cells, J . B i d . Chem. 1999, 274, 6698-6703. R. Grenningloh, B.Y. Kang, I.C. Ho, Ets-1, a functional cofactor ofT-bet, is essential for T h l inflammatory responses, J . Exp. Med. 2005, 201, 615-626. F. Sallusto, C.R. Mackay, A. Lanzavecchia, The role of chemokine receptors in primary, effector, and memory immune responses, Annu. Rev. Imrnunol. 2000, 18,593-620. F. De Benedetti, P. Pignatti, M. Biffi, E. Bono, S. Wahid, F. Ingegnoli, S.Y. Chang, H. Alexander, M. Massa, A. Pistorio, A. Martini, C. Pitzalis, F. Sinigaglia, L. Rogge, Increased expression of alpha(l,3)-fucosyltransferase-VIIand P-selectin binding of synovial fluid T cells in juvenile idiopathic arthritis, /. Rheol. 2003, 30, 1611-1615. J.J. Bird, D.R. Brown, A.C. Mullen, N.H. Moskowitz, M.A. Mahowald, J.R. Sider, T.F. Gajewski, C.-R. Wang, S.L. Reiner, Helper T cell differentiation is controlled by the cell cycle, Immunity 1998, 9, 229-237. J.A. Lederer, J.S. Liou, S. Kim, N. Rice, A. Lichtman, Regulation of NF-kB activation in T helper 1 and T helper 2 cells,]. Immunol. 1996, 156, 56-63. H. Hamalainen, H. Zhou, W. Chou. H. Hashizume, R. Heller, R. Lahesmaa, Distinct gene expression profiles of human type 1 and type 2 T helper cells, Genome Biol. 2001, 2, research 0022.1-0022.11 . T. Chtanova, R.A. Kemp, A.P. Sutherland, F. Ronchese, C.R. Mackay, Gene microarrays reveal extensive differential gene expression in both CD4(+) and CD8(+) type 1 and type 2 T cells,J. Immunol. 2001, 167, 3057-3063. M. Rincon, J. Anguita, T. Nakamura, E. Fikrig, R.A. Flavell, Interleukin (1L)-6directs the differentiation of IL-4-producing CD4+ T cells, J. Exp. Med. 1997, 185,461-469.
86.
87.
88.
89.
90.
91.
92.
93.
94.
S. Diehl, I.Anguita, A. Hoffmeyer, T. Zapton, J.N. Ihle, E. Fikrig, M. Rincon, Inhibition of T h l differentiation by IL-6 is mediated by SOCS1, Immunity 2000, 13,805-815. C.M.U. Hilkins, G. Messer, K. Tesselaar, A.G.I. van Rietschoten, M.L. Kapsenberg, E.A. Wierenga, Lack of IL-12 signaling in human allergen-specific Th2 cells, J. Irnmunol. 1996, 157,4316-4321. L. Rogge, L. Barberis-Maino, M. Biffi, N. Passini, D.H. Presky, U. Gubler, F. Sinigaglia, Selective expression of an interleukin-12 receptor component by human T helper 1 cells, /. Exp. Med. 1997, 185, 825-831. L. Rogge, D. D’Ambrosio, M. Biffi, G. Penna, L.J. Minetti, D.H. Presky, L. Adorini, F. Sinigaglia, The role of Stat4 in species-specific regulation of Th cell development by type I IFNs,]. lmrnunol. 1998, 161,6567-6574. V. Athie-Morales, H.H. Smits, D.A. Cantrell, C.M. Hilkens, Sustained IL-12 signaling is required for T h l development, J. Immunol. 2004, 172, 61-69. T. Usui, R. Nishikomori, A. Kitani, W. Strober, GATA-3 suppresses Thl development by downregulation of Stat4 and not through effects on IL-12Rbeta2 chain or T-bet, Immunity 2003, 18,415-428. B. Lu, P. Zagouras, J.E. Fischer, J. Lu, B. Li, R.A. Flavell, Kinetic analysis of genomewide gene expression reveals molecule circuitries that control T cell activation and Th1/2 differentiation, Proc. Nutl. Acud. Sci. U.S.A. 2004, 101,3023-3028. 0. Avni, D. Lee, F. Macian, S.J. Szabo, L.H. Glimcher, A. Rao, T(H) cell differentiation is accompanied by dynamic changes in histone acetylation of cytokine genes, Nut. lmrnunol. 2002,3,643-651. T.A. Wynn, R. Morawetz, T. Scharton-Kersten, S. Hieny, H.C. Morse 111, R. Kuhn, W. Muller, A.W. Cheever, A. Sher, Analysis of granuloma formation in double cytokine-deficient mice reveals a
1
1115
1116
I
78 Genome and Proteome Studies
95.
96.
97.
98.
99.
100.
101.
102.
103. J.D. Fontenot, M.A. Gavin, A.Y. central role for IL-10 in polarizing Rudensky, Foxp3 programs the both T helper cell 1-and T helper cell development and function of 2-type cytokine responses in vivo, J. CD4+CD25+ regulatory T cells, Nut. Immunol. 1997,159,5014-5023. rmmunol. 2003,4, 330-336. K.F. Hoffmann, S.L. James, A.W. 104. R. Khattri, T. Cox, S.A. Yasayko, Cheever, T.A. Wynn, Studies with F. Ramsdell, An essential role for double cytokine-deficientmice reveal Scurfin in CD4+CD25+ T regulatory that highly polarized Thl- and cells, Nut. Immunol. 2003, 4, Th2-Type cytokine and antibody 337-342. responses contribute equally to vaccine-induced immunity to 105. S . Hori, T. Nomura, S. Sakaguchi, Control of regulatory T cell schistosoma mansoni, J . Immunol. development by the transcription 1999, 163,927-938. factor Foxp3, Science 2003, 299, N.G. Sandler, M.M. Mentink-Kane, 1057-1061. A.W. Cheever, T.A. Wynn, Global gene expression profiles during acute 106. M.E. Brunkow, E.W. Jeffery, K.A. pathogen-induced pulmonary Hjerrild, B. Paeper, L.B. Clark, S.A. inflammation reveal divergent roles Yasayko, J.E. Wilkinson, D. Galas, for Thl and Th2 responses in tissue S.F. Ziegler, F. Ramsdell, Disruption repair, /. Immunol. 2003, 171, of a new forkheadlwinged-helix protein, scurfin, results in the fatal 3655-3667. S. Sakaguchi, N. Sakaguchi, lymphoproliferative disorder of the M. Asano, M. Itoh, M. Toda, scurfy mouse, Nut. Genet. 2001, 27, 68-73. Immunologic self-tolerance maintained by activated T cells 107. J.D. Fontenot, J.P. Rasmussen, L.M. expressing IL-2 receptor alpha-chains Williams, J.L. Dooley, A.G. Farr, A.Y. (CD25).Breakdown of a single Rudensky, Regulatory T cell lineage mechanism of self-tolerance causes specification by the forkhead various autoimmune diseases, 1. transcription factor foxp3, Immunity Immunol.1995, 155,1151-1164. 2005,22, 329-341. E.M. Shevach, CD4+ CD25+ 108. T.A. Chatila, F. Blaeser, N. Ho, H.M. 
suppressor T cells: more questions Lederman, C. Voulgaropoulos, than answers, Nut. Rev. Immunol. C. Helms, A.M. Bowcock, JM2, 2002,2,389-400. encoding a fork head-related protein, S. Sakaguchi, Naturally arising is mutated in X-linked CD4+ regulatory t cells for autoimmunity-allergic disregulation immunologic self-tolerance and syndrome, J. Clin. Invest. 2000, 106, negative control of immune R75-R81. responses, Annu. Rev. Immunol. 109. R.S. Wildin, F. Ramsdell, J. Peake, 2004,22,531-562. F. Faravelli, J.L. Casanova, N. Buist, R.H. Schwartz, Natural regulatory T E. Levy-Lahad, M. Mazzella, cells and self-tolerance, Nut. 0. Goulet, L. Perroni, F.D. Bricarelli, Immunol.2005, 6, 327-330. G. Byrne, M. McEuen, S . Proll, S. Sakaguchi, Naturally arising M. Appleby, M.E. Brunkow, X-linked Foxp3-expressingCD25+CD4+ neonatal diabetes mellitus, regulatory T cells in immunological enteropathy and endocrinopathy tolerance to self and non-self, Nut. syndrome is the human equivalent of Immunol. 2005, 6,345-352. mouse scurfy, Nut. Genet. 2001, 27, J.D. Fontenot, A.Y. Rudensky, A well 18-20. adapted regulatory contrivance: 110. C.L. Bennett, J. Christie, F. Ramsdell, regulatory T cell development and M.E. Bmnkow, P.J. Ferguson, the forkhead family transcription L. Whitesell, T.E. Kelly, F.T. factor Foxp3, Nut. Immunol. 2005, 6, Saulsbury, P.F. Chance, H.D. Ochs, 331-337. The immune dysregulation,
References I 1 1 1 7
111.
112.
113.
114.
115.
116.
polyendocrinopathy, enteropathy, X-linked syndrome (IPEX) is caused by mutations of FOXP3, Nut. Genet. 2001,27,20-21. H. von Boehmer, Mechanisms of suppression by suppressor T cells, Nat. Immunol. 2005, 6, 338-344. M.A. Gavin, S.R. Clarke, E. Negrou, A. Gallegos, A. Rudensky, Homeostasis and anergy of CD4(+)CD25(+) suppressor T cells in vivo, Nut. Immunol. 2002, 3, 33-41. R.S. McHugh, M.J. Whitters, C.A. Piccirillo, D.A. Young, E.M. Shevach, M. Collins, M.C. Byrne, CD4(+)CD25 (+)immunoregulatory T cells: gene expression analysis reveals a functional role for the glucocorticoid-induced TNF receptor, Immunity 2002, 16,311-323. D. Bruder, M. Probst-Kepper, A.M. Westendorf, R. Geffers, S. Beissert, K. Loser, H. von Boehmer, J. Buer, W. Hansen, Neuropilin-1: a surface marker of regulatory T cells, Eur. J . lmmunol. 2004,34,623-630. C.T. Huang, C.J. Workman, D. Flies, X. Pan, A.L. Marson, G. Zhou, E.L. Hipkiss, S. Ravi, J. Kowalski, H.I. Levitsky, J.D. Powell, D.M. Pardoll, C.G. Drake, D.A. Vignali, Role of LAG-3 in regulatory T cells, Immunity 2004, 21,503-513. F. Annunziato, R. Manetti, L. Cosmi, G. Galli, C.H. Heusser, S. Romagnani, E. Maggi, Opposite
117.
118.
119.
120.
role for interleukin-4 and interferon-gamma on CD30 and lymphocyte activation gene-3 (LAG-3) expression by activated naive T cells, Eur. J. Immunol. 1997, 27, 2239- 2244. A.E. Herman, G.J. Freeman, D. Mathis, C. Benoist, CD4+CD25+ T regulatory cells dependent on ICOS promote regulation of effector cells in the prediabetic lesion, J . Exp. Med. 2004, 199, 1479-1489. M. Radonjic, 7.-C. Andrau, P. Lijnzaad, P. Kemmeren, T.T.J.P. Kockelkorn, D. van Leenen, N.L. van Berkum, F.C.P. Holstege, Genome-wide analyses reveal RNA polymerase 11 located upstream of genes poised for rapid response upon S. cerevisiae stationary phase exit, Mol. Cells 2005, 18, 171-183. S.K. Kurdistani, S. Tavazoie, M. Grunstein, Mapping global histone acetylation patterns to gene expression, Cell 2004, 117, 721-733. C.T. Harbison, D.B. Gordon, T.I. Lee, N.J. Rinaldi, K.D. Macisaac, T.W. Danford, N.M. Hannett, J.B. Tagne, D.B. Reynolds, J. Yoo, E.G. Jennings, J . Zeitlinger, D.K. Pokholok, M. Kellis, P.A. Rolfe, K.T. Takusagawa, E.S. Lander, D.K. Gifford, E. Fraenkel, R.A. Young, Transcriptional regulatory code of a eukaryotic genome, Nature 2004, 431, 99-104.
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
18 Genome and Proteome Studies
18.2 Scanning the Proteome for Targets of Organic Small Molecules Using Bifunctional Receptor Ligands
Nikolai Kley
Outlook
The terms chemical genomics and chemical proteomics refer to a systematic analysis of the effects of organic small molecules on genomic and proteomic activity (i.e., a chemical approach to systems biology). The goal of this type of analysis is to improve our understanding of the cellular targets and signaling mechanisms that underlie or could predict drug effects. Recent chemical proteomic initiatives have resulted in the emergence of several novel, complementary methods for the characterization and identification of molecular targets of organic small molecules. This chapter reviews the evolution, development, and applications of three-hybrid-based (3H) technologies that utilize chemically engineered bifunctional ligands and facilitate proteome-wide small molecule target discovery. 3H approaches may prove particularly useful in tracing an observed therapeutic/physiological effect of a small molecule to one or more molecular targets or, alternatively, in revealing novel molecular targets that could suggest an alternative therapeutic potential for a particular drug, drug candidate, or chemical class.
18.2.1 Introduction
Organic small molecules embody an important class of therapeutic agents. They are also increasingly being used in chemical biological studies as molecular probes to study the cellular functions of proteins, signaling pathways, and processes associated with disease pathogenesis. The usefulness of a small molecule as a molecular probe is, however, critically dependent on an understanding of its target spectrum and its specificity and selectivity profiles. Similarly, when a small molecule is selected as a probe to unravel the molecular basis for an observed phenotypic effect in cultured cells or in vivo model systems, the identification of its molecular targets is of fundamental importance. In drug discovery research, an understanding of the molecular targets of a drug or drug candidate can shed important light on activities that are either positively or negatively associated with its therapeutic efficacy, as well as on activities that may raise concern with respect to potential adverse side effects. Whatever the individual scenario, target identification represents an important element in rational lead optimization that strives to achieve an optimal target spectrum and therapeutic index for a given drug candidate. Alternatively, the identification of proteins with known function as novel molecular targets could reveal previously unrecognized therapeutic applications for a drug candidate or a marketed drug. In some instances, this could also present an opportunity to resurrect drug candidates that failed to progress in the discovery or development process due to the lack of a good understanding of their mechanisms of action (MoA). With regard to drug development, target discovery may also lead to the identification of novel surrogate markers for therapeutic efficacy, permitting an assessment of the extent to which a putative therapeutic drug might yield a satisfactory clinical result. This is particularly important for the development of the new generation of mechanism-based drugs. Thus, the identification of protein targets of organic small molecules is of fundamental importance in many areas of biomedical research.
Recent chemical proteomic initiatives have resulted in the emergence of various alternatives to classical protein activity profiling (e.g., in vitro kinase assays using purified enzymes) for small molecule target identification. One such alternative method utilizes a variety of chemically reactive probes to profile and identify enzymes or other protein targets in complex mixtures based on their catalytic or ligand-binding activities. This approach, known as activity-based protein profiling (ABPP), is designed to address subproteomes, such as a discrete enzyme family [1-7]. Depending on the spectrum of targets recognized by a "pan-active" chemical probe, competitive profiling provides information on the selectivity profile of a compound. Because the number of suitable reactive probes is steadily growing, ABPP promises to become a more widely used methodology in chemical proteomics [7].
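The idea behind competitive profiling can be illustrated with a minimal sketch. It assumes hypothetical probe-labeling intensities (arbitrary units) for a few enzymes, measured with and without a competing test compound; the enzyme names, the numbers, and the 50% hit cutoff are illustrative assumptions and are not data from the cited studies.

```python
def percent_inhibition(control, treated):
    """Percent loss of probe labeling caused by the competing compound."""
    if control <= 0:
        raise ValueError("control labeling intensity must be positive")
    return 100.0 * (1.0 - treated / control)


def selectivity_profile(control_signals, treated_signals, cutoff=50.0):
    """Return {enzyme: (percent inhibition, is_hit)} for each profiled enzyme."""
    profile = {}
    for enzyme, control in control_signals.items():
        inhibition = percent_inhibition(control, treated_signals[enzyme])
        # An enzyme counts as a hit when labeling drops by at least `cutoff` percent.
        profile[enzyme] = (round(inhibition, 1), inhibition >= cutoff)
    return profile


# Hypothetical labeling intensities (arbitrary units), for illustration only.
control = {"hydrolase_A": 1000.0, "hydrolase_B": 800.0, "hydrolase_C": 500.0}
treated = {"hydrolase_A": 100.0, "hydrolase_B": 760.0, "hydrolase_C": 250.0}

profile = selectivity_profile(control, treated)
for enzyme, (inhibition, is_hit) in profile.items():
    print(f"{enzyme}: {inhibition}% inhibition, hit={is_hit}")
```

In this toy profile the compound competes strongly for one enzyme, weakly for another, and sits at the cutoff for the third; a real analysis would use replicate measurements and dose-response data rather than a single threshold.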
Another alternative that has recently been described is based on monitoring the interaction of a small molecule with proteins expressed as fusions to T7 bacteriophage [8, 9]. This approach has been applied successfully in target and selectivity profiling of kinase inhibitors [9]. It is conceivable that this approach could be adapted to support cDNA library screening, which would expand its application to proteins other than kinases, although it would be limited to proteins that function as monomers. Several other methods for detecting small molecule-protein interactions have been described, including ribosome display, drug-far western, and protein or small molecule microarray-based methods [10-14], but these reports were primarily proof-of-principle studies using known interaction partners. In contrast, the other alternatives noted above have already been successfully applied to the profiling of specific subproteomes, and have resulted in the discovery of many novel molecular interactions.
Traditionally, the identification of protein targets of small molecules has relied on in vitro biochemical methods, such as photocross-linking, radiolabeled ligand binding, and affinity chromatography. Affinity chromatography is still widely used and can identify targets present in any cell extract of choice. Therefore, it is, in principle, not restricted to an analysis of specific protein classes or subproteomes. Recent advances in the fabrication of solid supports with improved physical properties for protein affinity purification [15], improvements in experimental design and purification schemes [16-22], as well as advances in mass spectrometry methods have improved the success rate of affinity-based purification of small molecule targets [23]. However, despite some past and more recent successes, the affinity-based approach does not always deliver results. Most successful examples involve a combination of a high-affinity small molecule with a fairly abundant protein receptor. Such a scenario is more of an exception than the rule. Furthermore, most bioactive synthetic small molecules are somewhat hydrophobic, which predisposes them to nonspecific protein binding when coupled at high density to solid supports. This requires stringent wash conditions, which may be unfavorable for many interactions. An unfavorable signal-to-noise ratio is indeed the most common cause of failure in the identification of specific interactions [24]. Another drawback of affinity purification is that it does not directly deliver a cDNA clone that encodes the candidate protein target, which is required for subsequent validation of any putative interaction. cDNA cloning can be laborious and time consuming, especially if the number of candidate targets is large and prioritization of these targets based on some rationale is difficult.
Three-hybrid (3H) technologies, which enable the detection of small molecule-protein interactions in intact cells, lack some of the drawbacks encountered with methods such as those outlined above and will be described in more detail here.
18.2.2 History and Development
The yeast three-hybrid (Y3H) system is a cellular assay system designed for the identification and characterization of small molecule-protein interactions in intact cells [25]. It uses the yeast Saccharomyces cerevisiae as a host system and combines aspects of the yeast two-hybrid (Y2H) system [26] with recent developments in chemical dimerizer technology [27, 28]. The discovery of the MoA of the immunosuppressive macrocyclic lactone lactams FK506 and rapamycin marked the beginning of our current understanding of chemical dimerizers [29]. These bifunctional molecules are able to simultaneously interact with two different proteins through distinct structural elements, promoting the formation of a ternary complex (Fig. 18.2-1). In the case of FK506, the ternary complex consists of FKBP12-FK506-calcineurin. Recruitment of calcineurin, a Ca2+/calmodulin-dependent protein phosphatase, to the FKBP12-FK506 complex inhibits its function. This results in impaired signaling of the T-cell antigen receptor (TCR) and subsequent immunosuppression [30]. Rapamycin forms a FKBP12-rapamycin-FRAP ternary complex (FRAP: FKBP12-rapamycin-associated protein, also named RAFT1, RAPT1, or TOR). Recruitment of FRAP inhibits its function in T lymphocytes, which results in impaired interleukin-2 receptor signaling [31-34], and this is thought to be the basis for the immunosuppressive actions of rapamycin [30]. Another example of an immunosuppressant that acts in this manner is cyclosporin A, which interacts with cyclophilin to form a complex that then binds calcineurin [30].
Fig. 18.2-1 (a) Chemical structures of the immunosuppressants FK506 and rapamycin. (b) Ribbon diagram of the FKBP-FK506-calcineurin complex (adapted from Griffith et al., Cell 1995, 82, 507-522, with permission from Elsevier). Color coding is as follows: calcineurin A (blue), calcineurin B (green), FKBP12 (red), FK506 (white). (c) Schematic representation of how rapamycin may be used to induce signal transduction through dimerization of two fusion proteins containing FKBP and FRB (FKBP12-rapamycin binding domain of FRAP) fused to specific signaling domains. DD - "docking domain", which could be a DNA-binding domain or a sequence causing membrane localization of the FKBP fusion protein. ED - "effector domain", which could be a transcription activation domain or some other signaling domain (e.g., a kinase).
FK506 and rapamycin, and various analogs thereof, have been widely used to cross-link at will hybrid proteins that have been designed to contain appropriate binding sites for these molecules, thereby controlling intracellular signaling events that are naturally or otherwise regulated by protein-protein interactions [27, 29, 35] (Fig. 18.2-1). Chemically engineered analogs include
dimeric versions of FK506 (FK506-FK506, appropriately termed FK1012) [36], FK506-cyclosporin [37], and analogs of FK506 and rapamycin that recognize only mutated forms of FRAP or FKBP12 (and are therefore devoid of the cytotoxic activities associated with FK506 and rapamycin) [38-40]. The synthesis and use of the hybrid ligand/dimerizer FK1012 [36] marked the beginning of 3H systems, in which a synthetic bifunctional molecule is used to induce the homo- or heterodimerization of chimeric proteins. The use of this hybrid ligand and other dimerizers was initially strictly focused on promoting cellular signaling events in a temporally controllable and dynamic fashion. However, the synthesis of hybrid ligands incorporating a small molecule test compound for the purpose of de novo identification of protein targets of that molecule followed shortly thereafter [25]. This marked the beginning of the development of 3H systems for proteome-wide screening for protein targets of small molecules.
Liu and colleagues [25] took advantage of the concept of compound-induced protein-protein interaction and modified the previously developed Y2H system [26] to create a Y3H system. Y2H is arguably the most widely used technology for the detection and identification of hybrid proteins on a large scale [41-43]. A logical next step was to adapt it to a Y3H system that could, in principle, support the screening of complex cDNA libraries for identifying targets of small molecules. The basic elements of the Y2H system and their interactions are depicted and described in Fig. 18.2-2. The basic components of a Y3H system are shown and described in Fig. 18.2-3. Both the Y2H and Y3H systems make use of fusion proteins that contain a DNA-binding domain (DBD) or a transcription activation domain (AD). In Y2H, the interaction of the fusion proteins is a direct interaction of proteins or protein domains that are fused to the DBD and AD. In Y3H, the interaction of DBD- and AD-fusion proteins is mediated by a hybrid ligand (chemical dimerizer).
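The difference between the two systems can be sketched as simple logic. The following is a minimal illustration and not a model taken from the source: the class name, the protein and ligand names, and the table of binding pairs are hypothetical placeholders, binding is treated as all-or-nothing, and the DBD fusion is assumed to carry a domain that binds one end of the hybrid ligand.

```python
from dataclasses import dataclass

# Hypothetical binding pairs, for illustration only (binding is all-or-nothing here).
BINDS = {("anchor_X", "LBD_X"), ("compound_Y", "target_Y"), ("bait_P", "prey_Q")}


def binds(a, b):
    """True if the two partners are listed as a binding pair, in either order."""
    return (a, b) in BINDS or (b, a) in BINDS


@dataclass(frozen=True)
class HybridLigand:
    anchor: str         # moiety recognized by the domain fused to the DBD
    test_compound: str  # small molecule whose protein targets are sought


def y2h_reporter_on(bait, prey):
    """Y2H: the DBD-bait and AD-prey fusions must interact directly."""
    return binds(bait, prey)


def y3h_reporter_on(lbd, ligand, prey):
    """Y3H: the hybrid ligand must bridge the DBD fusion and the AD-prey fusion."""
    if ligand is None:
        return False
    return binds(ligand.anchor, lbd) and binds(ligand.test_compound, prey)


ligand = HybridLigand(anchor="anchor_X", test_compound="compound_Y")
print(y2h_reporter_on("bait_P", "prey_Q"))           # direct protein-protein interaction
print(y3h_reporter_on("LBD_X", ligand, "target_Y"))  # ternary complex can form
print(y3h_reporter_on("LBD_X", None, "target_Y"))    # no ligand, so no bridge
print(y3h_reporter_on("LBD_X", ligand, "prey_Q"))    # prey does not bind the test compound
```

The sketch makes one design point explicit: in Y3H the reporter depends on two independent binding events, so a failure of either the anchor interaction or the test compound interaction leaves the reporter off.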
The chemical dimerizer consists of an anchor moiety with known binding affinity for a ligand-binding domain (LBD) that is fused to the DBD domain. Recruitment of the AD-fusion protein to the promoter region of a reporter gene is induced by its interaction with a small molecule that is linked to the anchor moiety of the hybrid ligand. A productive interaction generates a ternary complex that promotes the transcriptional activation of the resident reporter gene. The Y3H system described by Liu and colleagues made use of a dexamethasone (DEX)-FK506 heterodimer. The fusion proteins consisted of the glucocorticoid receptor (GR, the LBD) fused to LexA (the DBD), and FKBP12 fused to a transcription AD derived from the bacterial protein B42 [25]. The use of mutant forms of GR, which displayed higher affinity for DEX than wild-type GR, was necessary for the detection of the interaction. These findings suggest that affinities in at least the nanomolar range are most likely required for successful display of a synthetic hybrid ligand by the DBD-fusion protein. Importantly, using DEX-FK506, FKBPl2 could be identified in a screen using a cDNA library encoding a complex mixture of AD-fusion proteins, suggesting that Y3H could, in principle, be used to identify novel drug receptors. This
18.2 Scanning the Proteome for Targets of Organic Small Molecules
Fig. 18.2-2 The Y2H system: interaction of bait and prey fusion proteins activates the expression of a reporter gene. DBD - DNA-binding domain. AD - transcription activation domain. RE - promoter response element. Reporters: HIS3 (an auxotrophic marker, the induction of which enables yeast cells to grow in the absence of histidine in the culture medium), LacZ (can be detected in a colorimetric assay). Inset shows an array of yeast cells that has been generated using an appropriate robot. As shown, LacZ reporter induction (blue/green colored yeast cells) reflects a productive protein-protein interaction.
was confirmed by the use of a DEX-methotrexate (MTX) hybrid ligand in cDNA library screening, which resulted in the identification of its known target dihydrofolate reductase (DHFR) [44]. These studies, however, involved molecules (FK506 and MTX) that exhibit high affinity for their respective receptors, leaving unanswered the important question of whether Y3H is also suitable for the detection of lower affinity small molecule-protein interactions and for the identification of novel protein targets.
Since the first report on Y3H, several other hybrid ligands have been described, all of which incorporate anchor moieties with high affinity for a particular receptor protein. These include DEX, FK506, estradiol analogs, and MTX [25, 36, 45-50]. MTX-based hybrid ligands (also referred to as MTX-fusion compounds or MFCs) appear particularly promising, as recently demonstrated by their use in the screening of cDNA libraries and the identification of known as well as novel targets of ATP-competitive small molecule kinase inhibitors [51] (see also Figs. 18.2-3 and 18.2-4). This work addressed a number of previously unanswered questions, providing evidence that cDNA library screens can be performed at high complexity and redundancy; the emergence of false positives can be easily controlled and deselected using appropriate genetic
18 Genome and Proteome Studies
Fig. 18.2-3 Y3H system. (a) Components of the Y3H system. An MTX-based hybrid ligand associates with a DBD-fusion protein and AD-fusion protein. Formation of a complex induces activation of a reporter gene. In the example shown here, the MTX-fusion compound (MFC) incorporates a PEG linker and the small molecule kinase inhibitor purvalanol B (PurvB). (b) Example of a Y3H interaction. The DNA-binding domain fusion protein is a LexA (DBD)-DHFR fusion. The AD-fusion protein is a GR*-Gal4 (AD) fusion. GR* represents a mutant form of glucocorticoid receptor (GR) with high affinity for dexamethasone (DEX). Activation of gene expression is reflected in positive yeast growth (HIS3 marker). Alternatively, induction of the LacZ reporter is detected by a colorimetric assay. (c) Example of outgrowth of yeast cells in which a positive interaction has taken place in the presence of an MFC. Such yeast colonies, typically formed during cDNA library screens, can be picked and subjected to subsequent analysis, as described in Fig. 18.2-4.
Fig. 18.2-4 Y3H-cDNA library screening workflow, as recently described (adapted from Becker et al., Chem. Biol. 2005, 11, 211-223, with permission from Elsevier). Screening involves transformation of yeast cells with a cDNA library, selection of yeast colonies (HIS3 selection), picking of yeast cells, rearraying of these yeast cells, interrogation of arrays with test MFC and other hybrid ligands (and MTX-PEG), picking of positives, isolation of plasmid DNA, and sequencing. Plasmids are then retransformed into yeast cells and arrays are interrogated once again with the test MFC and control compounds (96-well format assay). Each 96-well plate represents the effects of one particular compound. Images from each array screen are then clustered to yield a composite image, as shown. The composite image shows an example of the interaction of MFCs of kinase inhibitors, and variants thereof, with their respective protein kinase targets (adapted from Becker et al., Chem. Biol. 2005, 11, 211-223, with permission from Elsevier).
counterscreens and a combination of different hybrid ligands; interactions with affinities in the low micromolar range can still be detected; and interactions can be detected with a high degree of specificity.
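The deselection of false positives summarized above can be sketched in a few lines of Python. This is a minimal illustration rather than the published screening pipeline: the clone identifiers, the boolean growth calls, and the functions `is_specific_hit` and `specific_hits` are hypothetical, and the rule assumed here (growth with the test MFC, but no growth with the MTX-PEG control, with an unrelated MFC, or without any ligand) is only one plausible reading of the counterscreens described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneReadout:
    """Growth calls for one yeast clone (all fields are hypothetical)."""
    clone_id: str
    grows_with_test_mfc: bool       # test hybrid ligand present
    grows_with_mtx_peg: bool        # anchor plus linker only, no test compound
    grows_with_unrelated_mfc: bool  # hybrid ligand carrying a different compound
    grows_without_ligand: bool      # ligand-independent growth


def is_specific_hit(r: CloneReadout) -> bool:
    """Keep clones whose growth depends on the test-compound moiety."""
    return (
        r.grows_with_test_mfc
        and not r.grows_with_mtx_peg
        and not r.grows_with_unrelated_mfc
        and not r.grows_without_ligand
    )


def specific_hits(readouts):
    """Return the IDs of clones that pass every counterscreen, in input order."""
    return [r.clone_id for r in readouts if is_specific_hit(r)]


if __name__ == "__main__":
    example = [
        CloneReadout("clone_1", True, False, False, False),  # candidate target
        CloneReadout("clone_2", True, True, True, False),    # responds to anchor/linker
        CloneReadout("clone_3", True, False, False, True),   # ligand-independent growth
    ]
    print(specific_hits(example))
```

Treating growth with an unrelated MFC as disqualifying is an assumption made for simplicity; in practice a clone that binds several related chemotypes may still be a genuine target, so that criterion would need to be adapted to the ligands used.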
Kley and colleagues have also described the development of array-based screening approaches for the rapid profiling of small molecule-protein interactions with Y3H [51]. In this screening paradigm, yeast cells are transformed with a specific cDNA encoding a candidate target protein and are subsequently spotted with an appropriate robot on agar plates to generate a yeast cell array (96-well format, see Fig. 18.2-4). Prior to spotting the yeast cells, hybrid ligand is deposited at the same location. Positive concentric outgrowth of yeast cells at a particular coordinate indicates that an interaction of a candidate target protein with the small molecule of interest has occurred. The implementation of yeast cell arrays and automation of the spotting process were found to be critical in performing controlled cDNA library screens and appropriate quality control tests (to ensure a high signal/noise ratio). Array screening also enables a direct interaction analysis of any cloned open reading frame (ORF) of interest, as has been described for the screening of a defined set of kinases [51]. It should be noted that array screening is inherently more sensitive than complex cDNA library screening. In an array screen, each potential interaction is tested separately; therefore, no competitive growth selection is taking place and weaker interactions are easier to detect. The application of such an approach to the scanning of the kinome for small molecule-protein interactions is described below (see Section 18.2.4).
To successfully perform Y3H screens, the choice of the anchor moiety of the hybrid ligand is important. MTX, as already indicated, shows much promise. It exhibits high affinity (low nanomolar to picomolar) for the monomeric form of E. coli dihydrofolate reductase (eDHFR), which is a small, compact molecule that can be easily expressed as a fusion protein in yeast cells [46].
Furthermore, contrary to what is often observed with nonhybrid small molecules, MTX-hybrid molecules appear to generally permeate yeast cells quite readily. At GPC Biotech we have screened over 50 hybrid ligands in which MTX was coupled to various small molecule chemotypes. To date we have not encountered difficulties with cellular uptake of these molecules. Cellular uptake can readily be determined using appropriate competition experiments, as outlined in Fig. 18.2-5.
For practical purposes, the choice of linker and strategy for the chemical synthesis of the hybrid ligands is also important. MTX-based hybrid ligands that include polyethylene glycol (PEG) as a linker have proven quite successful [51]. PEG linkers of variable length have been used, and generally PEG repeats of n = 3-6 generate suitable hybrid ligands. A PEGylated test compound, which is generated as an intermediate in the synthesis of the MTX-hybrid ligand, also provides a suitable probe for coupling to solid phase and for biochemical validation of any interactions that might be identified with Y3H. A general strategy for the synthesis of MTX-based hybrid ligands is described in Fig. 18.2-6.
In summary, the development of MTX-based hybrid ligands and array-based screening approaches has led to the “reemergence” of Y3H as a chemical proteomics technology that can be successfully deployed for the scanning of the proteome or subproteomes with organic small molecules. Thus, although
Fig. 18.2-5 A Y3H competition assay. The competition assay provides a measure of cellular uptake/functionality of a test MFC. Also shown is an example of experimental results showing a dose-dependent competitive inhibition of HIS3 reporter activation induced by a “reference” MFC (reflected in the decrease in yeast growth in response to increasing concentration of test MFCs) (adapted from Becker et al., Chem. Biol. 2005, 11, 211-223, with permission from Elsevier).
Y3H may not be suitable for lead discovery, it could prove particularly useful in tracing an observed therapeutic/physiological effect of a small molecule on one or more molecular targets or, alternatively, reveal molecular targets that could suggest an alternative therapeutic potential for a particular drug, drug candidate, or chemical class.
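The array-based readout described earlier in this section lends itself to a simple bookkeeping step: each position on a 96-well-format array corresponds to one cloned ORF, and positions with outgrowth are reported as candidate interactions. The following Python sketch illustrates that mapping. It assumes a standard 8 x 12 layout read in row-major order (rows A-H, columns 1-12); the growth scores, the threshold, and the names `well_name` and `call_positives` are hypothetical and not taken from the studies cited.

```python
import string

ROWS = string.ascii_uppercase[:8]  # "A" through "H"
N_COLS = 12                        # 8 rows x 12 columns = 96 positions


def well_name(index: int) -> str:
    """Convert a 0-based, row-major position (0..95) to a label such as 'A1'."""
    if not 0 <= index < len(ROWS) * N_COLS:
        raise ValueError("index outside a 96-well array")
    row, col = divmod(index, N_COLS)
    return f"{ROWS[row]}{col + 1}"


def call_positives(scores, clone_map, threshold):
    """Return {well: clone} for wells whose growth score exceeds the threshold.

    scores    -- sequence of 96 growth scores in row-major order (arbitrary units)
    clone_map -- dict mapping a well label to the ORF/clone spotted there
    threshold -- user-chosen cut-off, e.g. derived from no-ligand control plates
    """
    positives = {}
    for i, score in enumerate(scores):
        well = well_name(i)
        if score > threshold and well in clone_map:
            positives[well] = clone_map[well]
    return positives
```

Because every position is scored independently, a weak but reproducible signal is not lost to competition from faster-growing clones, which is the reason given in the text for the higher sensitivity of array screens.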
18.2.3 General Considerations
As outlined above, Y3H offers a promising alternative to other methods for the identification and characterization of small molecule-target interactions. It provides a means to rapidly screen complex cDNA libraries encoding candidate target proteins. The identification of an interaction is directly associated with the availability of a cDNA clone encoding a target protein, which enables rapid secondary validation experiments. Furthermore, once a clone has been identified, it becomes a permanent resource that can be interrogated in a reiterative fashion with any small molecule hybrid ligand of interest. Another advantage of Y3H is that it is a binding assay that does not require a priori knowledge of the biochemical activity of candidate target proteins. Thus, it also makes possible the identification and characterization of targets whose biological functions are unknown. Compared to Y2H, Y3H boasts the advantage that the DBD-fusion protein for a given system (e.g., LexA-DHFR, see Fig. 18.2-3) remains invariant. Many
Fig. 18.2-6 A strategy for the synthesis of MFCs (adapted from Kley, Chem. Biol. 2005, 11, 599-608, with permission from Elsevier). A probe that can be immobilized on solid phase for biochemical studies is generated as an intermediate in the synthesis of the MFC. Various chemical reactions can be applied when using different functional groups for coupling reactions.
false positives in Y2H screens emerge due to "stickiness" of a particular DBD-bait fusion protein and its nonspecific binding to AD-fusion proteins. This is not an issue with Y3H. Furthermore, multiple hybrid ligands can easily be used to assess the specificity of a particular interaction [51].
However, Y3H also shares some limitations with Y2H. For instance, it is limited to proteins that can be expressed as fusion proteins in yeast and that translocate into the nucleus. Thus, it is not suitable for the analysis of membrane proteins, unless specific domains of such proteins are expressed (e.g., cytoplasmic domains of receptor tyrosine kinases). Interactions that require accessory proteins may also not be detected. The need for the use of hybrid ligands in Y3H may also limit its application. For example, coupling of the PEG linker and MTX to a test molecule may perturb its binding affinity to certain target proteins. In the event that structure-activity relationship (SAR) information, which can provide
a rational basis for the positioning of the PEG linker in the test molecule, is not available, positional scans may have to be performed. In that respect, Y3H has constraints similar to those seen with affinity purification methods, which require modification and solid-phase immobilization of a test molecule. MTX-based hybrid ligands that cause growth inhibition or cell death in yeast cells would also be unsuitable, although we have not yet encountered such a case. One complication, which we encountered once, involved an MTX ligand that autoactivated the Y3H system. This appeared to be due to the interaction of the test molecule with a yeast protein that, when recruited to the promoter region of the reporter gene, causes transcription activation (manuscript in preparation). This supposition is based on the finding that the same hybrid ligand was not autoactivating in a yeast strain that was made deficient in the gene encoding that particular yeast protein (which was identified by screening of a yeast cDNA library). Alternatively, autoactivation could be suppressed by adding 3-amino-1,2,4-triazole (3AT) to the culture medium (as frequently done in Y2H experiments that utilize baits that are autoactivating [41, 43]). Another arguable limitation of the Y3H system is that robust screening requires, ideally, robotic handling of yeast cells and the generation of yeast cell arrays. This technical capability may not be available to every laboratory, in which case more labor-intensive and error-prone manual handling and spotting of yeast cells would have to be performed.
In summary, although the application of Y3H may be limited in some scenarios, most of these are likely to be rare events. The most limiting factor is likely the requirement for expression of fusion proteins that are able to translocate into the nucleus of yeast cells while retaining a properly folded small molecule binding domain.
This may, however, not be an issue with many proteins, because of their modular structure. A modular structure favors proper folding of a binding domain, even when it is expressed in isolation or as part of a hybrid fusion protein. Thus, the use of complex cDNA libraries, which contain multiple fusion variants of a particular protein, is preferable and will decrease the occurrence of false negatives.
18.2.4 Applications and Practical Examples
As outlined in the previous section, the emergence of a Y3H system that supports cDNA library screening and the identification of novel interactions is fairly recent. However, its potential is clearly demonstrated by its application to the identification of targets of small molecule kinase inhibitors [51, 52]. Protein kinases have been implicated as pivotal signal transducers in many cell signaling networks, and have emerged as an attractive class of drug targets for many disease indications, in particular, cancer and inflammation [53].
The realization that small molecule inhibitors of protein kinases might be of therapeutic use, as exemplified by the phenyl-aminopyrimidine STI571 (also known as imatinib mesylate or Gleevec) in the treatment of myelogenous leukemia [54], has led to intensive drug discovery efforts involving multiple disease-relevant kinases. These include the cyclin-dependent kinases (CDKs) and CDK-related kinases (CRKs). Protein kinase inhibitors from a large number of different chemotypes have emerged in recent years. Most of these interact with the ATP-binding domain (active site) of kinases, thereby inhibiting catalysis and substrate phosphorylation. Because of the structural similarity of the active sites of different kinases, such compounds have the potential for cross-reacting with kinases other than the intended target kinase(s) [23]. Sequence similarity per se is not a good predictor of cross-reactivity, which often occurs with phylogenetically distantly related kinases [9, 23]. Cross-reactivity with other proteins, such as purine-binding proteins, may also occur. Thus, extensive target screening is an important factor in the characterization of the MoA of kinase inhibitors.
Assessing the effects of kinase inhibitors on the in vitro kinase activity of purified kinases has been critical in determining their selectivity profiles. However, screening of a large number of purified kinases is costly and assays are available for only part of the kinome (approx. 200 kinase assays; the human kinome comprises >500 kinases [55]). The functions of many kinases are unknown, as are their substrates, and no standard assays are available to probe the effects of a small molecule on their activity. Y3H provides an opportunity to simultaneously assay any kinase or kinase domains that can be expressed as a fusion protein in yeast. A recent study successfully made use of a hybrid ligand incorporating the potent CDK inhibitor purvalanol B, a purine analog, suggesting that many different kinases, or their modular ATP-binding domains, can be assayed with Y3H [51].
Thus, a significant coverage of the kinome might be achieved. That study also revealed that purvalanol B, deployed as a CDK inhibitor in a large number of biological studies, actually “sees” many more kinases than previously known, including tyrosine kinases. Roscovitine, a closely related purine analog, appeared to be more specific. However, this compound is also a far less potent CDK inhibitor. Similar observations were obtained with other kinase inhibitor chemotypes. For example, indenopyrazoles, which are potent inhibitors of CDK1/2/4, were found through Y3H screening to be much more promiscuous than one might have anticipated [51]. This was recently confirmed using in vitro kinase activity profiling (unpublished results). In contrast to the previous examples, potent CDK inhibitors that are based on a [1,3,6]-trisubstituted-pyrazolo-[3,4-d]-pyrimidine-4-one kinase inhibitor scaffold [56] have recently been found (using Y3H) to exhibit a remarkable proteome-wide specificity for a relatively small number of CDKs/CRKs [52]. These included kinases other than the known targets CDK1/2, some of which have been implicated in cellular processes associated with cellular proliferation or, alternatively, the pathogenesis of diseases other than cancer. Thus, a
compound derived from the [1,3,6]-trisubstituted-pyrazolo-[3,4-d]-pyrimidine-4-one scaffold could possibly be optimized for enhanced or decreased affinity for one or the other target(s), making it more suitable for one or the other therapeutic application. We have indeed recently identified such compounds (unpublished results). This latter study [52] provides a good example of how Y3H-based target profiling can be used to gain a more detailed understanding of targets that could underlie the biological effects of a small molecule, as well as the range of potential therapeutic applications of the compound class/inhibitor scaffold from which it was derived. Furthermore, the biological functions of some of the newly identified CRK targets are only poorly understood. The availability of chemical probes for these kinases should facilitate their functional characterization.
We have also used Y3H to profile a number of different kinase inhibitors that are in clinical trials or on the market. Consistent with results recently published [9], many of these were found to interact with kinases other than their intended targets. These findings strongly emphasize the importance of kinome-wide selectivity profiling of kinase inhibitors. Y3H-based kinase inhibitor profiling, using yeast cell arrays that display many kinases, should facilitate such studies. We have recently assembled such a resource (Ref. 52, manuscript in preparation) and will integrate it into Y3H for standard kinome profiling of putative kinase inhibitors.
Although the Y3H studies reported by our laboratory have focused on the use of kinase inhibitors, a growing number of studies indicate that Y3H is equally suitable for use with other types of small molecules. For example, we have detected bona fide interactions of small molecules with phosphodiesterases (PDEs), histone deacetylases (HDACs), sirtuins (SIRTs), carbonic anhydrase, and various other proteins (manuscript in preparation).
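Once an array of kinases has been interrogated with several hybrid ligands, the resulting interaction calls can be condensed into a per-compound selectivity summary. The Python sketch below shows one way this might be done. It is illustrative only: the function name `selectivity_summary`, the dictionary layout, and the idea of reporting the fraction of the panel that scores positive are assumptions of this example, not a metric defined in the studies cited, and any compound or kinase names passed in would be the user's own.

```python
def selectivity_summary(hits, intended, panel_size):
    """Summarize an interaction profile for each compound.

    hits       -- dict: compound -> set of kinases scored positive on the array
    intended   -- dict: compound -> set of intended target kinases
    panel_size -- number of kinases displayed on the array (must be > 0)
    """
    if panel_size <= 0:
        raise ValueError("panel_size must be positive")
    summary = {}
    for compound, positives in hits.items():
        targets = intended.get(compound, set())
        summary[compound] = {
            "n_hits": len(positives),
            "fraction_of_panel": len(positives) / panel_size,
            # positives that were not among the intended targets
            "off_targets": sorted(positives - targets),
            # intended targets that did not score (possible false negatives)
            "missed_targets": sorted(targets - positives),
        }
    return summary
```

A missed intended target in such a summary does not by itself show lack of binding, since, as discussed in Section 18.2.3, a fusion protein may fail to fold or to reach the nucleus.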
In addition to being broadly applicable to the de novo identification of targets of small molecules, 3H systems may be used to further characterize their interactions and to investigate SAR parameters. For example, one may rapidly investigate the effects of particular mutations or naturally occurring polymorphisms on the interaction of a small molecule with its target protein. Additionally, mutagenesis screens may be performed to identify protein variants that display altered characteristics in their ligand-binding properties. This kind of functional cloning approach has been used to identify FKBP or FRAP mutants that bind specific analogs of FK506 and rapamycin, which have reduced affinity for the naturally occurring forms of these proteins [38]. This has led to the development of chemical dimerizers with higher affinity for their target proteins, along with reduced cytotoxicity. A similar approach could be used to identify mutant variants of a target protein that have decreased affinity for a particular compound while retaining biological activity. Such drug-resistant mutants could be used to explore the relative importance of that target in the pharmacological effects of that compound [57]. Yet another functional cloning application of Y3H has recently been described by Cornish and colleagues [58], in which Y3H was used to assay for an enzymatic activity
of a protein expressed in yeast cells that could cleave the linker moiety of a specific dimerizer. These examples emphasize the broad range of the possible applications of Y3H.
18.2.5 Future Developments
Y3H is the first 3H system that has been successfully applied to large scale screening for small molecule targets. Future developments of 3H systems that operate in mammalian cells rather than in yeast cells should further expand the range of applications of the 3H concept. As already discussed, Y3H relies on the expression of hybrid proteins in yeast cells and their translocation into the nucleus. Furthermore, yeast cells are generally less permeable to small molecules than mammalian cells, with the previously noted exception of MTX heterodimers. These drawbacks render it difficult to perform competition experiments, in which the ability of a test compound to compete with a hybrid ligand for binding to a specific target protein is determined. This would be less of an issue in a mammalian 3H (M3H) system. Furthermore, an M3H system may facilitate the detection of interactions that require accessory proteins or posttranslational modifications of the target protein.
Several 2H systems that enable the detection of protein-protein interactions in mammalian cells have been described, for example: (a) the ubiquitin-split-protein-sensor (USPS) technology [59], (b) two-component protein fragment complementation assays (PCAs) [60, 61] (e.g., systems based on reconstitution of split-DHFR, split-β-lactamase, and split-GFP), and (c) interaction technologies based on resonance energy transfer between reporter proteins with either fluorescent or bioluminescent properties (FRET: fluorescent resonance energy transfer, and BRET: bioluminescent resonance energy transfer). These systems have been used to monitor specific known protein-protein interactions in intact cells or to determine whether one protein would be able to interact with another protein (direct interaction tests). They have not been applied to random screening of protein-protein interactions using cDNA library screening paradigms, with the exception of a recent report on the use of split-GFP [62].
How broadly applicable this system is remains to be determined. One potential drawback of PCA assays is susceptibility to steric constraints imposed on the assembly of two reporter protein fragments when these are fused to other proteins or protein fragments of varying sizes and properties. Limited sensitivity and dynamic range might also be an issue in some instances. Thus, even if these 2H systems could be adapted to a 3H version for the detection and characterization of defined small molecule-protein interactions (as has been described for some of these [60, 61]), it remains uncertain whether they would be suitable for random, large scale cDNA library screening and for de novo target identification. On the other hand, a recently described M2H method, termed mammalian protein-protein interaction trap (MAPPIT) [63], has already
provided a novel opportunity for the development of an M3H system with broader applications. MAPPIT has been successfully used by Tavernier and colleagues [63], as well as in our laboratory (unpublished observations), in the identification of novel protein-protein interactions using cDNA library screening. Its basic components and their mode of action are described in Fig. 18.2-7. It operates according to the concept of a “protein recruitment” system. In this instance, the bait protein (the “docking station”) recruits a prey protein to the cytoplasmic domain of a cytokine receptor, which triggers a signal transduction event that can be easily monitored. In that respect, MAPPIT displays similarities to the Y2H system, in which an AD-fusion protein (the prey) is recruited to DNA through its association with a DBD-fusion protein (the bait). Such protein recruitment systems are arguably less susceptible than PCA-based systems to the occurrence of false negatives due to steric constraints encountered during protein fragment assembly.
We have recently been successful in developing a 3H version of the MAPPIT technology, termed mammalian small molecule-protein interaction trap (MASPIT), which, similar to Y3H, is suitable for the detection of the interaction of MTX-based hybrid ligands with their target proteins [64]. The concept and components of this system are described in Fig. 18.2-7. In contrast to Y3H, MASPIT can be readily used to perform competition experiments with hybrid ligands and nonmodified parent molecules. Thus, the interaction of the parent molecule with a candidate target protein can be directly validated in this fashion. Additionally, dose-response experiments can provide a measure for the targeting potency of a compound for a target protein in the context of an intact cell [64].
Such measurements could lend some important insights into how effective a compound might be in inhibiting the activity of a target protein in the context of other competing interactions. For instance, if a competing protein was expressed at high levels, higher doses of the compound might be required to inhibit the intended target(s) as effectively as might otherwise be the case (as, for instance, with purified target protein). For a number of reasons, monitoring the interaction of a small molecule with its target protein(s) in intact cells could reflect a more realistic setting in which to study a compound’s cellular MoA. It would simultaneously address variables that may influence the cellular potency of a compound, such as cell permeability, posttranslational modifications of target proteins, competitive interactions, intracellular concentrations of molecules such as ATP, and so on. A cell-based assay would also enable the analysis of the interaction of a target protein with a drug that is presented to cells in the form of a prodrug and which requires intracellular conversion to an active ligand (unpublished observations). Since MASPIT is a “simple” binding assay, it could also potentially be used to screen small molecule libraries for compounds that interfere with or compete for binding of a known molecule with its target protein. Therefore, MASPIT provides an opportunity for small molecule discovery that is not possible with Y3H (due to the less favorable permeability of yeast cells to small molecules).
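The dependence of apparent potency on competing interactions can be made concrete with a textbook competitive-binding model. The Python sketch below is offered as an illustration of the reasoning above, not as a description of how MASPIT data are analyzed: it assumes a single binding site, equilibrium conditions, and free ligand concentrations equal to the added concentrations, and the function and parameter names are hypothetical. Under these assumptions the fraction of target occupied by the hybrid ligand is (H/Kd) / (1 + H/Kd + C/Ki), and the competitor concentration that halves this occupancy is Ki (1 + H/Kd).

```python
def hybrid_occupancy(hybrid, kd_hybrid, competitor, ki_competitor):
    """Fraction of target bound by the hybrid ligand under simple competition.

    All concentrations and constants must share the same units.
    Assumes one site, equilibrium, and no ligand depletion.
    """
    h = hybrid / kd_hybrid
    c = competitor / ki_competitor
    return h / (1.0 + h + c)


def competitor_ic50(hybrid, kd_hybrid, ki_competitor):
    """Competitor concentration that halves the hybrid-ligand occupancy."""
    return ki_competitor * (1.0 + hybrid / kd_hybrid)


if __name__ == "__main__":
    # Arbitrary illustrative values: raising the hybrid-ligand concentration
    # raises the competitor dose needed for half-maximal displacement.
    for hybrid in (0.1, 1.0, 10.0):
        print(hybrid, competitor_ic50(hybrid, kd_hybrid=1.0, ki_competitor=1.0))
```

The same relationship suggests why an abundant competing ligand such as ATP, or an abundant competing protein, can shift the dose of compound needed in intact cells relative to an assay with purified target protein.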
Fig. 18.2-7 The MAPPIT and MASPIT systems. (a) Events occurring in response to ligand-induced activation of a type 1 cytokine receptor. Ligand-binding results in conformational changes in the receptor complex, ultimately leading to juxtaposition and activation of a receptor-associated Janus kinase (JAK). JAK then phosphorylates the cytoplasmic part of the receptor, leading to recruitment of signaling molecules, including signal transducers and activators of transcription (STATs). JAK phosphorylates STAT, which causes STAT to dissociate from the receptor, form a homodimer, translocate to the nucleus and activate transcription of a STAT-response gene (or reporter gene). STAT3-activation can be monitored using a STAT3-responsive reporter gene, which uses the pancreatitis associated protein 1 (rPAP1) promoter. (b) MAPPIT. This 2H system is based on the concept described in (a). It employs a signaling-deficient leptin receptor F3 (lepRF3) variant that cannot recruit STAT3. An interaction of the bait and prey proteins results in the recruitment of a gp130 protein fragment containing STAT3 recruitment sites. STAT3 can now be recruited and subsequently phosphorylated by JAK2, leading to its activation. (c) MASPIT. In this system, the recruitment of the gp130 protein fragment is triggered by the interaction of a prey protein with the test compound moiety of an MFC.
Finally, we have recently successfully applied MASPIT to the screening of cDNA libraries and to the identification of novel small molecule-protein interactions [64]. These studies mark the beginning of the development of a broadly applicable M3H system that holds promise for future use in target identification and drug discovery.
18.2.6 Conclusions
A detailed understanding of the MoA of organic small molecules is equally important in chemical biology and drug discovery. In chemical biology, mapping of the target spectrum and selectivity profile of a small molecule is critical for its meaningful use as a probe to study protein function, as well as for tracing its observed therapeutic/physiological effects to molecular targets. In drug discovery, an understanding of the MoA of small molecules can have an impact on the discovery process at multiple stages, particularly in lead optimization and the assessment of the therapeutic potential of drugs or drug candidates [52, 65]. Thus, recent advances in the development of 3H systems hold promise for their more widespread use in biomedical research and drug discovery. Y3H has already provided a powerful approach for the identification of novel molecular targets of small molecules, as exemplified by the studies with protein kinase inhibitors, and a method for studying the effects of mutations or polymorphisms on small molecule-protein interactions. The emergence of mammalian-based systems promises to further expand the range of 3H applications, such as the determination of relative targeting potencies of small molecules for protein targets in intact cells and, pending a successful adaptation to higher throughput analysis, even limited compound screening and hit/lead identification.
Acknowledgments
I thank Dr. Margaret Lee Kley for a critical reading of the manuscript and many helpful comments.
References
1. Y. Liu, M.P. Patricelli, B.F. Cravatt, Activity-based protein profiling: the serine hydrolases, Proc. Natl. Acad. Sci. U.S.A. 1999, 96(26), 14694-14699.
2. D.C. Greenbaum, W.D. Arnold, F. Lu, L. Hayrapetian, A. Baruch, J. Krumrine, S. Toba, K. Chehade, D. Bromme, I.D. Kuntz, M. Bogyo, Small molecule affinity fingerprinting. A tool for enzyme family subclassification, target identification, and inhibitor design, Chem. Biol. 2002, 9(10), 1085-1094.
3. A. Borodovsky, H. Ovaa, N. Kolli, T. Gan-Erdene, K.D. Wilkinson, H.L. Ploegh, B.M. Kessler, Chemistry-based functional proteomics reveals novel members of the deubiquitinating enzyme family, Chem. Biol. 2002, 9(10), 1149-1159.
4. D. Leung, C. Hardouin, D.L. Boger, B.F. Cravatt, Discovering potent and selective reversible inhibitors of enzymes in complex proteomes, Nat. Biotechnol. 2003, 21(6), 687-691.
5. D.A. Campbell, A.K. Szardenings, Functional profiling of the proteome with affinity labels, Curr. Opin. Chem. Biol. 2003, 7(2), 296-303.
6. A.E. Speers, B.F. Cravatt, Profiling enzyme activities in vivo using click chemistry methods, Chem. Biol. 2004, 11(4), 535-546.
7. N. Jessani, B.F. Cravatt, The development and application of methods for activity-based protein profiling, Curr. Opin. Chem. Biol. 2004, 8(1), 54-59.
8. P.P. Sche, K.M. McKenzie, J.D. White, D.J. Austin, Display cloning: functional identification of natural product receptors using cDNA-phage display, Chem. Biol. 1999, 6(10), 707-716.
9. M.A. Fabian, W.H. Biggs, D.K. Treiber, C.E. Atteridge, M.D. Azimioara, M.G. Benedetti, T.A. Carter, P. Ciceri, P.T. Edeen, M. Floyd, J.M. Ford, M. Galvin, J.L. Gerlach, R.M. Grotzfeld, S. Herrgard, D.E. Insko, M.A. Insko, A.G. Lai, J.M. Lelias, S.A. Mehta, Z.V. Milanov, A.M. Velasco, L.M. Wodicka, H.K. Patel, P.P. Zarrinkar, D.J. Lockhart, A small molecule-kinase interaction map for clinical kinase inhibitors, Nat. Biotechnol. 2005, 23(3), 329-336.
10. M. McPherson, Y. Yang, P.W. Hammond, B.L. Kreider, Drug receptor identification from multiple tissues using cellular-derived mRNA display libraries, Chem. Biol. 2002, 9(6), 691-698.
11. H. Tanaka, N. Ohshima, H. Hidaka, Isolation of cDNAs encoding cellular drug-binding proteins using a novel expression cloning procedure: drug-western, Mol. Pharmacol. 1999, 55(2), 356-363.
12. G. MacBeath, S.L. Schreiber, Printing proteins as microarrays for high-throughput function determination, Science 2000, 289(5485), 1760-1763.
13. F.G. Kuruvilla, A.F. Shamji, S.M. Sternson, P.J. Hergenrother, S.L. Schreiber, Dissecting glucose signalling with diversity-oriented synthesis and small-molecule microarrays, Nature 2002, 416(6881), 653-657.
14. N. Winssinger, S. Ficarro, P.G. Schultz, J.L. Harris, Profiling protein function with small molecule microarrays, Proc. Natl. Acad. Sci. U.S.A. 2002, 99(17), 11139-11144.
15. N. Shimizu, K. Sugimoto, J. Tang, T. Nishi, I. Sato, M. Hiramoto, S. Aizawa, M. Hatakeyama, R. Ohba, H. Hatori, T. Yoshikawa, F. Suzuki, A. Oomori, H. Tanaka, H. Kawaguchi, H. Watanabe, H. Handa, High-performance affinity beads for identifying drug receptors, Nat. Biotechnol. 2000, 18(8), 877-881.
16. M. Knockaert, N. Gray, E. Damiens, Y.T. Chang, P. Grellier, K. Grant, D. Fergusson, J. Mottram, M. Soete, J.F. Dubremetz, K. Le Roch, C. Doerig, P. Schultz, L. Meijer, Intracellular targets of cyclin-dependent kinase inhibitors: identification by affinity chromatography using immobilised inhibitors, Chem. Biol. 2000, 7(6), 411-422.
17. M. Knockaert, K. Wieking, S. Schmitt, M. Leost, K.M. Grant, J.C. Mottram, C. Kunick, L. Meijer, Intracellular targets of paullones. Identification following affinity purification on immobilized inhibitor, J. Biol. Chem. 2002, 277(28), 25493-25501.
18. P.R. Graves, J.J. Kwiek, P. Fadden, R. Ray, K. Hardeman, A.M. Coley, M. Foley, T.A. Haystead, Discovery of novel targets of quinoline drugs in the human purine binding proteome, Mol. Pharmacol. 2002, 62(6), 1364-1372.
19. G. Lolli, F. Thaler, B. Valsasina, F. Roletto, S. Knapp, M. Uggeri, A. Bachi, V. Matafora, P. Storici, A. Stewart, H.M. Kalisz, A. Isacchi, Inhibitor affinity chromatography: profiling the specific reactivity of the proteome with immobilized molecules, Proteomics 2003, 3(7), 1287-1298.
20. K. Godl, J. Wissing, A. Kurtenbach, P. Habenberger, S. Blencke, H. Gutbrod, K. Salassidis, M. Stein-Gerlach, A. Missio, M. Cotten, H. Daub, An efficient proteomics method to identify the cellular targets of protein kinase inhibitors, Proc. Natl. Acad. Sci. U.S.A. 2003, 100(26), 15434-15439.
21. J. Wissing, K. Godl, D. Brehmer, S. Blencke, M. Weber, P. Habenberger, M. Stein-Gerlach, A. Missio, M. Cotten, S. Muller, H. Daub, Chemical proteomic analysis reveals alternative modes of action for pyrido[2,3-d]pyrimidine kinase inhibitors, Mol. Cell Proteomics 2004, 3(12), 1181-1193.
22. Y. Liu, K.R. Shreder, W. Gai, S. Corral, D.K. Ferris, J.S. Rosenblum, Wortmannin, a widely used phosphoinositide 3-kinase inhibitor, also potently inhibits mammalian polo-like kinase, Chem. Biol. 2005, 12(1), 99-107.
23. H. Daub, K. Godl, D. Brehmer, B. Klebl, G. Muller, Evaluation of kinase inhibitor selectivity by chemical proteomics, Assay Drug Dev. Technol. 2004, 2(2), 215-224.
24. L. Burdine, T. Kodadek, Target identification in chemical genetics: the (often) missing link, Chem. Biol. 2004, 11(5), 593-597.
25. E.J. Licitra, J.O. Liu, A three-hybrid system for detecting small ligand-protein receptor interactions, Proc. Natl. Acad. Sci. U.S.A. 1996, 93(23), 12817-12821.
26. S. Fields, O. Song, A novel genetic system to detect protein-protein interactions, Nature 1989, 340(6230), 245-246.
27. N. Kley, Chemical dimerizers and three-hybrid systems: scanning the proteome for targets of organic small
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
molecules, Chem. Biol. 2004, 1 1 ( 5 ) , 599-608. S . Lefurgy, V. Cornish, Finding Cinderella after the ball: a three-hybrid approach to drug target identification, Chem. Biol. 2004, 11(2),151-153. S.L. Schreiber, Chemical genetics resulting from a passion for synthetic organic chemistry, Bioorg. Med. Chem. 1998, 6(8), 1127-1152. J. Liu, J.D. Farmer, Jr.,W.S. Lane, J. Friedman, I. Weissman, S.L. Schreiber, Calcineurin is a common target of cyclophilin-cyclosporin A and FKBP-FK506complexes, Cell 1991, 66(4),807-815. J. Heitman, N.R. Mowa, M.N. Hall, Targets for cell cycle arrest by the immunosuppressant rapamycin in yeast, Science 1991, 253(5022), 905-909. E.J. Brown, M.W. Albers, T.B. Shin, K. Ichikawa, C.T. Keith, W.S. Lane, S.L. Schreiber, A mammalian protein targeted by G1-arresting rapamycin-receptor complex, Nature 1994, 369(6483), 756-758. D.M. Sabatini, H. ErdjumentBromage, M. Lui, P. Tempst, S.H. Snyder, RAFT1: a mammalian protein that binds to FKBPl2 in a rapamycin-dependent fashion and is homologous to yeast TORS, Cell 1994, 78(1), 35-43. M.I. Chiu, H. Katz, V. Berlin, RAPT1, a mammalian homolog of yeast Tor, interacts with the FKBPlZ/rapamycin complex, Proc. Natl. Acad. Sci. U.S.A. 1994, 91(26),12574- 12578. R. Pollock, T. Clackson, Dimerizer-regulated gene expression, C u r . Opin. Biotechnol 2002, 13(5), 459-467. D.M. Spencer, T.J. Wandless, S.L. Schreiber, G.R. Crabtree, Controlling signal transduction with synthetic ligands, Science 1993, 262(5136), 1019-1024. P.J. Belshaw, D.M. Spencer, G.R. Crabtree, S.L. Schreiber, Controlling programmed cell death with a cyclophilin-cyclosporin-based
1138
I
18 Genome and Proteome Studies
vivo, J. Am. Chem. SOC. 2000, 122, chemical inducer of dimerization, 4247-4248. Chem. Biol. 1996, 3(9),731-738. 46. W.M. Abida, B.T. Carter, E.A. Althoff, 38. S.D. Liberles, S.T. Diver, D.J. Austin, H. Lin, V.W. Cornish, S.L. Schreiber, Inducible gene Receptor-dependence of the expression and protein translocation transcription read-out in a using nontoxic ligands identified by a small-molecule three-hybrid system, mammalian three-hybrid screen, Proc. Chembiochem2002, 3(9),887-895. Natl. Acad. Sci. U.S.A. 1997, 94(15), 47. K. Baker, D. Sengupta, G. Salazar7825-7830. Jimenez,V.W. Cornish, An optimized 39. T. Clackson, W. Yang, L.W. Rozamus, dexamethasone-methotrexate yeast M. Hatada, J.F. Amara, C.T. Rollins, %hybrid system for high-throughput L.F. Stevenson, S.R. Magari, screening of small molecule-protein S.A. Wood, N.L. Courage, X. Lu, interactions, Anal. Biochem. 2003, F. Cerasoli, Jr., M. Gilman, D.A. Ilolt, 315(1),134-137. Redesigning an FKBP-ligand interface 48. K.S. De Felipe, B.T. Carter, to generate chemical dimerizers with E.A. Althoff, V.W. Cornish, novel specificity, Proc. Natl. Acad. Sci. Correlation between ligand-receptor U.S.A. 1998, 95(18),10437-10442. affinity and the transcription readout 40. T. Clackson, Redesigning small in a yeast three-hybrid system, molecule-protein interfaces, C u r . Biochemistry 2004, 43(32), Opin. Stmct. Biol. 1998, 8(4),451-458. 10353-10363. 41. P.Uetz, L. Giot, G. Cagney, 49. S.L. Hussey, S.S. Muddana, B.R. T.A. Mansfied, R.S. Judson, Peterson, Synthesis of a J.R. Knight, D. Lockshon, V. Narayan, beta-estradiol-biotinchimera that M. Srinivasan, P. Pochart, potently heterodimerizes estrogen A. Qureshi-Emili, Y. Li, B. Godwin, receptor and streptavidin proteins in a D. Conover, T. Kalbfleisch, yeast three-hybrid system, J. Am. G. Vijayadamodar, M. Yang, Chem. SOC.2003, 125(13),3692-3693. M. Johnston, S. Fields, J.M. Rothberg, 50. S.S. Muddana, B.R. 
Peterson, Facile A comprehensive analysis of synthesis of cids: biotinylated estrone protein-protein interactions in oximes efficiently heterodimerize Saccharomyces cerevisiae, Naturr estrogen receptor and streptavidin 2000, 403(6770),623-627. proteins in yeast three hybrid systems, 42. T. Ito, T. Chiba, R. Ozawa, Org. Lett. 2004, 6(9),1409-1412. M. Yoshida, M. Hattori, Y. Sakaki, A comprehensive two-hybrid analysis to 51. F. Becker, K. Murthi, C. Smith, J. Come, N. Costa-Roldan, explore the yeast protein interactome, C. Kaufmann, U. Hanke, Proc. Natl. Acad. Sci. U.S.A. 2001, C. Degenhart, S. Baumann, 98(8),4569-4574. W. Wallner, A. Huber, S. Dedier, 43. A.J. Walhout, M. Vidal, Protein S. Dill, D. Kinsman, M. Hediger, interaction maps for model N. Bockovich, S. Meier-Ewert, organisms, Nut. Rev. Mol. Cell Biol. A.F. Kluge, N. Kley, A three-hybrid 2001, 2(1),55-62. approach to scanning the proteome for 44. D.C. Henthorn, A.A. Jaxa-Chamiec, targets of small molecule kinase E. Meldrum, A GAL4-based yeast inhibitors, Chem. Biol. 2004, 11(2), three-hybrid system for the 211-223. identification of small molecule-target 52. M. Caligiuri, F. Becker, K. Murthi, protein interactions, Biochem. F. Kaplan, S. Dedier, C. Kaufmann, P ~ u ~ u c 2002, o ~ . 63(9),1619-1628. G. Zybarth, J. Richard, N. Bockovich, 45. H. Lin, W. Abida, R.C. Sauer, V.W. A.F. Kluge, N. Kley, A proteome-wide Cornish, DexamethasoneCDK/CRK-specific kinase inhibitor methotrexate: an efficient chemical promotes tumor cell death in the inducer of protein dimerization in
References
53.
54.
55.
56.
57.
58.
59.
absence ofcell cycle progression, Chem. Biol. 2005, 12, 1103-1115 in press. P.Cohen. Protein kinases-the major drug targets of the twenty-first century? Nat. Rev. Drug Discov. 2002, 1(4),309-315. R. Capdeville, E. Buchdunger, J. Zimmerrnann, A. Matter, Glivec (STI571, imatinib), a rationally developed, targeted anticancer drug, Nat. Rev. Drug Discov. 2002, 1(7),493-502. G. Manning, D.B. Whyte, R. Martinez, T. Hunter, S. Sudarsanarn, The protein kinase complement of the human genome, Science 2002, 298(5600),1912- 1934. J.A. Markwalder, M.R. Arnone, P.A. Benfield, M. Boisclair, C.R. Burton, C.H. Chang, S.S. Cox, P.M. Czerniak, C.L. Dean, D. Doleniak, R. Grafstrom, B.A. Harrison, R.F. Kaltenbach, 3rd, D.A. Nugiel, K.A. Rossi, S.R. Sherk, L.M. Sisk, P.Stouten, G.L. Trainor, P.Worland, S.P. Seitz, Synthesis and biological evaluation of l-ary1-4,5dihydro- 1H -pyrazolo[3,4-d]pyrimidin4-one inhibitors of cyclin-dependent kinases, /. Med. Chem. 2004, 47(24), 5894-5911. P.A. Eyers, I.P. van den, R.A. Quinlan, M. Goedert, P. Cohen, Use of a drugresistant mutant of stress-activated protein kinase 2a/p38 to validate the in vivo specificity of SB 203580, FEBS Lett. 1999, 451(2), 191-196. K. Baker, C. Bleczinski, H. Lin, G . Salazar-Jimenez,D. Sengupta, S. Krane, V.W. Cornish, Chemical complementation: a reactionindependent genetic assay for enzyme catalysis, Proc. Natl. Acad. Sci. U.S.A. 2002, 99(26), 16537-16542. N. Johnsson, A. Varshavsky, Split ubiquitin as a sensor of protein interactions in vivo, Proc. Natl. Acad. S C ~ U.S.A. . 1994, 91(22),10340-10344.
60. S.W. Michnick, I . Remy,
61.
62.
63.
64.
65.
F.X. Campbell-Valois, A. Vallee-Belisle, J.N. Pelletier, Detection of protein-protein interactions by protein fragment complementation strategies, Methods Enzynzol. 2000, 328,208-230. I. Remy, S.W. Michnick, Mapping biochemical networks with protein-fragment complementation assays, Methods Mol. Biol. 2004, 261, 41 1-426. I. Remy, S.W. Michnick, Regulation of apoptosis by the Ftl protein, a new modulator of protein kinase B/Akt, Mol. Cell. Biol. 2004, 24(4), 1493-1504. S. Eyckerman, A. Verhee, J.V. der Heyden, I. Lernmens, X.V. Ostade, J. Vandekerckhove, J . Tavernier, Design and application of a cytokine-receptor-based interaction trap, Nat. Cell Biol. 2001, 3(12), 1114-1119. M. Caligiuri, L. Molz, Q. Liu, F. Kaplan, J.P. Xu, J.Z. Majeti, R. Ramos-Kelsey, K. Murthi, S. Lievens, J. Tavernier, N. Kley, MASPIT: Three-hybrid trap for quantitative proteome fingerprinting of small molecule-protein interactions in mammalian cells, Chem. Biol. 200k 13,711-722. T.A. Carter, L.M. Wodicka, N.P.Shah, A.M. Velasco, M.A. Fabian, D.K. Treiber, Z.V. Milanov, C.E. Atteridge, W.H. Biggs, 3rd, P.T. Edeen, M. Floyd, J.M. Ford, R.M. Grotzfeld, S. Herrgard. D.E. Insko, S.A. Mehta, H.K. Patel, W. Pao, C.L. Sawyers, H. Varmus, P.P. Zarrinkar, D.J. Lockhart, Inhibition of drug-resistant mutants of ABL, KIT, and EGF receptor kinases, Proc. Natl. Acad. Sci. U.S.A. 2005, 102(31),11011- 11016.
I
1139
PART VIII Outlook
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Günther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
19 Chemical Biology - An Outlook
Günther Wess
Outlook
Chemical Biology has evolved into a strong driving force in biomedical science. It is a paradigm change and will enable scientists to approach grand challenges. Chemical Biology is not limited to academia. It will contribute to a wide range of industrial applications, in particular in the field of drug discovery. Systems biology as well as translational medicine might also benefit from several elements of Chemical Biology. This article highlights its wide range of applications and impacts.
19.1 The Evolving Concept of Chemical Biology
Almost 20 years ago Arthur Kornberg stated in his famous article “The Two Cultures: Chemistry and Biology” the following: “. . . we now have the paradox of the two cultures, Chemistry and Biology, growing farther apart even as they discover more common ground . . .” [1]. This statement was made at a time when it had already become apparent that the 1980s had ushered in a new era in biomedical research, with new technologies providing previously undreamed-of opportunities. Ten years later S.L. Schreiber and K.C. Nicolaou commented on the emerging concept of Chemical Biology as “. . . the perhaps most exciting development . . .”, “. . . that biological problems are increasingly well defined from a chemist’s point of view . . .” and “. . . while Molecular Biology allows the function of biological molecules such as proteins and nucleic acids to be altered by mutation, Chemical Biology directly alters the function of biological molecules by chemical means . . .”. Finally they defined the core of the field of chemical biology as “. . . using small molecules or designed molecules as ligands to directly alter the function of biological molecules . . .” [2]. The next milestone came in 2005: the Nature Publishing Group launched the new journal Nature Chemical Biology with the statement that “. . . Chemical Biology has emerged as a field grounded in technical advances brought about by the close collaborations of Chemists and Biologists . . .” and “. . . Chemical Biologists have tackled challenging problems in Biology, ranging from cellular signaling to drug development and Neurobiology . . . the field is connected by a common desire to understand and manipulate living systems at the molecular level with increasing precision . . .” [3].
Where are we today and what does the future hold? Is chemical biology the bridge or the common ground between both disciplines? What are the great challenges ahead of us that can be met in the next 20 years? How will the field evolve? In my view the previous chapters of this book have convincingly demonstrated that chemical biology is not simply a new scientific discipline. It is a paradigm change in the way scientists approach biomedical questions. In addition, it is a change of mindset across different scientific cultures that facilitates seamless interaction and collaboration. Such a change is required to approach grand challenges in biomedical science. If Arthur Kornberg was right in his time, chemical biology will bring scientific disciplines and research areas closer together and enable them to discover more common ground, sharing a common vision, setting common goals, and launching joint efforts.
19.2 Chemical Biology in Academia
Although there is not yet a precise definition of chemical biology, the common understanding among many scientists is that chemical biology directly alters, activates, perturbs, or inhibits the function of biological macromolecules by chemical means, that is, by small-molecule ligands. In future, this leitmotiv should be extended to higher levels of complexity and should also include biological systems and pathways, regulatory networks, cellular processes, and even whole organisms. The scientific questions will range from basic science, purely academic in nature, to questions of life science, drug discovery, and future medicine. They will also include plant biology and even ecosystems and their evolution. Chemical biology brings small molecules into play. It will give significant new insight into how things function at various levels. Needless to say, this will require the fruitful interplay of many disciplines and technologies, such as biology, chemistry, medicine, and mathematics, as well as screening, in vivo models, and metabolomics. Such an approach will not only give new insight into fundamental biological processes but will also create opportunities for new products and businesses.
At this point, some remarks on the future role of chemistry in the context of chemical biology seem to be required. With some oversimplification, chemistry was traditionally concerned with structure and synthesis, and biology more with function (with the exception of structural biology of biological macromolecules). Research into structure-activity relationships was always an interdisciplinary affair and was therefore fairly underdeveloped in view of the actual opportunities. In the world of chemical biology, structure-activity relationships would be extended to a broader understanding of how to induce a particular biological response in a biological system through a small molecule. It is quite compelling that, in addition to the three elements of structure, synthesis, and function, the paradigm of chemical biology requires a fourth element: selection. It addresses unambiguously the critical question of WHAT chemical structure is needed to get the desired biological response and how one gets there. Therefore, selection is a key element of chemical biology approaches. Eschenmoser has differentiated presynthesis selection from postsynthesis selection [4]. In his view, presynthesis selection is clearly a design process in which the chemist has the knowledge to define one molecule that will exhibit the desired biological properties. In contrast, postsynthesis selection means discovery, that is, finding the molecule in a typical high-throughput screening approach. As biological function is the ultimate goal, the chemist is challenged by the question: WHAT is the structure I need and how do I get there? The answer strongly depends on the information that is available about the biological system, in particular, the biological space that needs to be occupied by a small molecule to get the expected biological response. Therefore, the central theme is how to generate and accumulate knowledge that enables identification of the biologically relevant regions of the chemical space generated by small molecules. Every day we learn more about the complexity of biological systems and find that our reductionistic models are becoming less useful for explaining our experimental results. Therefore, we are far away from predicting de novo chemical structures that are biologically relevant.
A combination of design and discovery processes is still required. There is a very long way to go, and the accumulation of knowledge on the structure and biological function of biological macromolecules in whole systems is on the critical path for the future. Regarding the biological systems, we need very reliable experimental data to make correlations. Meaningful high-content screening systems as well as phenotypic screening and in vivo systems with smart readouts that allow quantification are required. These capabilities will also contribute significantly to projects in systems biology. One can even go one step further and say that chemical biology will become a driver of systems biology. As structure-function correlations are a central theme of chemical biology approaches and chemical biologists will define WHAT needs to be synthesized, they need excellent synthetic organic chemists as their partners, skilled at rapidly synthesizing in reasonable quantity what is really required. This includes single small molecules as well as small-molecule libraries. I also refer to the categories of diversity-oriented synthesis (DOS) and target-oriented synthesis (TOS), which have been introduced by S.L. Schreiber [5]. In conclusion, the study of biological systems at higher levels of complexity through small molecules, and finding out the rules behind how things function, will be the greatest challenge and a tantalizing opportunity. A typical example could be the understanding of stem cell biology in health and disease and stimulating the body’s own regenerative mechanisms through small-molecule treatment for promoting survival, migration and homing, proliferation, and differentiation [6].
19.3 Chemical Biology in Industry
Chemical biology is by no means limited to academic projects. It has the potential to contribute significantly to bringing industrial research, in particular drug discovery, to the next level and to help improve innovation and productivity. Currently, the pharmaceutical industry is challenged by a decline in its R&D productivity, in particular in delivering innovative products that are real breakthroughs. Many multifaceted reasons can be identified. In summary, they fall under three main categories: identifying relevant disease approaches based on druggable targets; generating a molecule that has the properties to become a drug (drug-like molecules); and demonstrating a real therapeutic advantage over existing therapies, which justifies a competitive label. With regard to the identification of druggable targets, chemical biology can, as described in the previous section, play an important role in target or pathway validation, helping to better understand the biological systems or to get an idea of potential side effects. In this context, it might provide valuable tools and probes for experiments to validate hypotheses. It should be mentioned here that several efforts are ongoing across the industry to improve the target identification/validation output, not only by introducing new technologies into the value chain or through new organizational models and processes but also by introducing new scientific strategies dealing with genomics and disease biology. Such an effort has recently been described as “a new grammar for drug discovery” [7]. One might also speculate that the interplay between chemical biology and systems biology opens new opportunities. However, the most important contribution of chemical biology is in the area of generating drug-like molecules. This can simply be summarized as “finding better compounds faster”: compounds that are not only high-affinity binders of bio-macromolecules but that can also be optimized into drugs with reasonable effort.
Two aspects have to be considered and distinguished. The first is finding a molecule with the right biological profile, one that interacts with the defined target(s) and/or exhibits the required pleiotropic effects in the biological system and, in addition, has the required selectivity and lacks activity against antitargets that would diminish the therapeutic effect and/or create unwanted side effects. The second is finding a molecule that has the right profile with regard to Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) as well as physicochemical properties. Both areas comprise very complex challenges. The first one deals with the question of what the molecule does to the biological system with regard to activity and specificity, for example, inhibiting an enzyme or activating a receptor. The second one deals with the question of what the system does to the molecule, for example, getting metabolized by an enzyme of the liver or being transported through a membrane. Although pharmaceutical companies will optimize these areas by applying new technologies and management processes [8], there are typical, critical success elements that chemical biology can contribute. These elements are primarily based on knowledge of targets and molecules, particularly of target families and privileged molecular scaffolds, recognition patterns, and binding motifs. This knowledge has to be accumulated over time and needs validation in vivo to become more valuable. In addition, this knowledge of target classes and privileged drug-like molecules will be complemented by further insight into the ADMET rules and their correlation to the human system. Chemical biology in drug discovery would also address how drugs really work in interdependent systems, including the pleiotropic effects of drugs [9]. Emphasis would also be placed on the characterization of compounds in distinct transgenic cellular and in vivo models to obtain a comprehensive set of data on the whole biological profile. Such a systematic, science-driven strategy would lead to a new science of drug discovery. New types of targets require new approaches that are much more knowledge-based and see the molecules in their complex environment of interdependent biological networks.
Needless to say, the intention is definitely not to replace the classical pharmacology approach. The question is simply how to reach the next level and get the most relevant, success-critical information as soon as possible (Fig. 19-1). Mechanisms of health and disease and the complex interaction with the environment at macroscopic and microscopic levels will become another central theme in the context of future medicine, which will be much more focused on prevention than on classical treatment and “polypharmacy” strategies. Other aspects are how to induce repair mechanisms and how to cope with the question of personalized medicine. It is apparent that these complex future questions will require much more interaction between academic research and industry. The grand challenges in drug discovery require new types of interaction, networks, and clusters of knowledge. Chemical biology will be not only a major contributor but also a key driver.
Fig. 19-1 Reaching the next level.
19.4 Chemical Biology and Translational Medicine
Finally, some comments are needed regarding the interaction of chemical biology and translational medicine. The leitmotiv of translational medicine is “from bench to bedside and bedside to bench”. Chemical biologists need validation of their hypotheses, and also a learning loop in which observations from clinical studies are fed back and built into new hypotheses. This is true for academic as well as industrial research. It concerns not only new compounds but also known drugs already on the market and their biological profiles, including side effects. In the long run, this will lead to a future medicine with a strong focus on individualized prevention. Key milestones and achievements will be the better use of existing drugs and drugs tailored to the individual needs of patients. This will require a battery of diagnostic tools that characterize the patient in such a way that personalized treatment becomes a reality. Chemical biology will also make valuable contributions by dealing with the biological systems and supporting the development of new diagnostic tools.
19.5 Knowledge and Networks, Education and Training
Integration and leverage of knowledge across disciplines and working in teams and networks are critical success factors. Therefore, it must be ensured that knowledge can flow and that there are no hierarchical or bureaucratic boundaries. There is also a component that has to do with values and behavior: sharing of knowledge across organizations and disciplines. Networks should have mechanisms in place that encourage and reward knowledge sharing. The networks should not be limited to academia; they should also include partners from industry. This is a great chance to approach new fields with grand challenges and to use the complementary capabilities of academia and industry. In the precompetitive area, it is just a question of commitment and real interest. In the competitive area, it should be possible to find adequate legal frameworks that respect the interests of the different stakeholders. In addition, through joint efforts these partners will find more common ground, as previously expected. How should chemical biologists be trained and educated? Through training on the job, a new curriculum or branch in chemistry departments, or a graduate program? Currently, there are all kinds of approaches, and a clear answer cannot be given at present. As the field emerges, the requirements and necessary skills will become defined. In the end, there might perhaps be fewer traditional chemistry departments but more chemical biologists working in different places.
19.6 Conclusion
There is already one common denominator, or even a leitmotiv, of future chemical biology: chemical structures of small molecules and their biological function in health and disease at the level of biological systems. What do the structures look like that induce the desired biological response profiles? Although we are far from predicting chemical structures and biological function in whole organisms and do not yet understand the underlying rules, we feel very much encouraged by the chemical biology approaches. We look forward with excitement to reaching the next milestones. We can define them and approach them in interdisciplinary projects. Some might be at the level of grand projects and need significant resources. They will all be based on knowledge. Knowledge is the key driver. The chemical biology approach is a new paradigm. It will guide us in the biomedical research of the twenty-first century. Currently, we are becoming more and more aware of how complex biological systems function. Even the question of what a gene really is has been asked recently [10]. Therefore, the realization of our vision requires even more joint efforts across disciplines, organizations, and institutions. Chemical biology has been the answer to Arthur Kornberg’s provocative statement. It is the common ground from which new directions will evolve and grand challenges will be approached. This will bring science to the next level.
Chemical biology will contribute significantly to systems biology and, to some extent, to translational medicine. Today chemical biology still means different things to different people. Nevertheless, this is more a strength than a weakness. It is a unique opportunity for the field to become defined and positioned over time by the scientists and their invaluable scientific achievements.
Acknowledgment
I am very grateful to a number of colleagues with whom I had the privilege to work and who have stimulated and encouraged me to develop Chemical Biology approaches in industry: Frank Douglas, Birgit Konig, Hildegard Nimmesgern, Daniel Schirlin, Andreas Batzer, Hans-Peter Nestler, Heiner Glombik, and Bruce Baron. They all contributed significantly not only to developing a great concept but also to implementing it and making it a success.
References
1. A. Kornberg, The Two Cultures: Chemistry and Biology, Biochemistry 1987, 26, 6888-6891.
2. S.L. Schreiber, K.C. Nicolaou, What’s in a name? Chem. Biol. 1996, 3, 1-2.
3. A community of chemists and biologists, Nat. Chem. Biol. 2005, 1, 3, Editorial.
4. A. Eschenmoser, One Hundred Years Lock-and-key Principle, Angew. Chem. Int. Ed. Engl. 1994, 33, 2363.
5. S.L. Schreiber, Target-Oriented and Diversity-Oriented Organic Synthesis in Drug Discovery, Science 2000, 287, 1964-1969.
6. S. Ding, P. Schultz, A role for chemistry in stem cell biology, Nat. Biotechnol. 2004, 22, 833-840.
7. M.C. Fishman, J.A. Porter, A new grammar for drug discovery, Nature 2005, 437, 491-493.
8. G. Wess, M. Urmann, B. Sickenberger, Medicinal Chemistry: Challenges and Opportunities, Angew. Chem. Int. Ed. Engl. 2001, 40, 3341-3350.
9. G. Drews, Case histories, magic bullets and state of drug discovery, Nature Reviews Drug Discovery, 2006, 5, 635-640.
10. H. Pearson, What’s a Gene, Nature 2006, 441, 399-401.
Chemical Biology. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
Index
a
AANAT, Arylalkylamine N-acetyltransferase (AANAT) 385
ABPP, Activity-based protein profiling (ABPP) 403, 1119
Absorption, distribution, metabolism, elimination and toxicology (ADMET) 801, 1147
Absorption, distribution, metabolism, elimination/excretion, and toxicity (ADMET) properties
  applicability domain, estimation of 1015f
  applications and examples of 1018ff
  datasets 1010f
    pretreatment of 1013
  descriptors, calculation of 1016ff
  development of 1008f
  drug solubility 1007
  future developments in 1035f
  general considerations for 1009ff
  history of 1008f
  in silico toxicity models 1033ff
  intestinal permeability 1007f
  model validation 1013f
  models 1010f
  multivariate methods
    linear 1011f
    nonlinear 1013
  outlier compound, labeling of 1015f
    Mahalanobis distance 1015
  prediction of 1003ff
  statistical tools 1011ff
  toxicity 1008
  training and test set selection 1014f
acdAla, Acridonylalanine (acdAla) 289
ACDName 771
ACE, Angiotensin converting enzyme (ACE) 699
N-Acetyl Galactosamine (GalNAc) 551
N-acetyllactosamine
  natural substrate 643
2’-acetyltransferase (AAC(2’)) 681
6’-acetyltransferase (AAC(6’)) 681
ACM, Atom Connectivity Matrix (ACM) 729
ACP, Acyl carrier protein (ACP) 463, 472, 521
AcpS, Acyl carrier protein synthase (AcpS) 472
Acridonylalanine (acdAla) 289
Actin, see Cytochalasin
Actinorhodin 525
Activated sugar-nucleotide substrates 636
Activation domain (AD) 1122
Activation function 1 (AF1) 895
Activation function 2 (AF2) 892
Activation-induced cell death (AICD) 1101
Activator protein 1 (AP-1) 895
activities like depudecin 99
Activity identifier (AID) 769
Activity-based protein profiling (ABPP) 403, 406, 1119
  disease-associated enzymes
    parallel discovery of 423
  human disease, diagnostic markers and therapeutic targets for 423
  small-molecule probes, active site-directed
    measuring protein activity 403
Acyl carrier protein (ACP) 463, 472, 521
  fusion proteins
    fluorescence labeling of 474
Chemical Biology. From Small Molecules to Systems Biology and Drug Design. Edited by Stuart L. Schreiber, Tarun M. Kapoor, and Gunther Wess. Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. ISBN: 978-3-527-31150-7
Acyl transferase (AT) 521
Acyl-carrier protein synthase (AcpS) 472
AD, Activation domain (AD) 1122
Adenine triphosphate (ATP) 826
ADMET, Absorption, distribution, metabolism, elimination and toxicology (ADMET) 801, 1147
ADMET properties, Absorption, distribution, metabolism, elimination/excretion, and toxicity (ADMET) properties 1003
Adrenoceptor 938
  β2-adrenoceptor protein 941
    cloning of 941
AF1, Activation function 1 (AF1) 895
AF2, Activation function 2 (AF2) 892
Affinity chromatography 941
Affinity labeling 941
Agonist 939
  full 939
  inverse 939
  partial 939
AGT, O6-Alkylguanine-DNA alkyltransferase (AGT) 428, 463
  fusion proteins
    application of 465
    immobilization, scheme for 468
    labeling of 463ff
Aib, α-amino isobutyric acid (Aib) 995
AICD, Activation-induced cell death (AICD) 1101
AID, Activity identifier (AID) 769
Aldehyde dehydrogenase-1 (ALDH-1) 411
ALFUC, α-L-Fucosidase (ALFUC) 369
Alkene-containing linker 671
Allosteric (allotopic) modulator 939
Amide ligation
  using auxiliaries 577f
Amine-containing linkers 673
α-amino-3-hydroxy-5-methyl-4-isoxazole-propionate (AMPA) 460
α-amino isobutyric acid (Aib) 995
Amino acid
  FlAsH approach
    small molecule modification, reliance on 612
Amino acid side chains
  synthesis of functionalized 578
Amino group
  lysine acylation 595
  secondary bioconjugation
    oxidative coupling reactions 623
Aminoacyl tRNA synthases 386
Aminoglycoside 668, 679, 681, 682
Aminoglycoside arrays
  hybridization to 681
Aminooxypentane (AOP) 583
AMPA, α-amino-3-hydroxy-5-methyl-4-isoxazole-propionate (AMPA) 460
Ampholyte 1020
Analog-specific Kinases 127
  kinase-signaling pathways 128
  peptide substrates
    combinatorial 128
  phosphoproteomics 128
  targets
    in the genome 128
    of each kinase 128
Analysis of variance (ANOVA) 1087
Androgen receptor (AR) 903
ANF, Atrial natriuretic factor (ANF) 374, 714
Angiogenesis 104
  blood vessels
    from preexisting 104
    new 104
  Curcuminoids 105
  Fumagillin 105
  Inhibitors 104
Angiotensin converting enzyme (ACE) 699
1,2-anhydrosugar 671
Animal Models 239
  degenerative diseases 240
  of Disease 239
  study of
    in vivo 239
    pathway 239
    protein 239
  transgenic mice 239
ANN, Artificial neural network (ANN) 1023
ANOVA, Analysis of variance (ANOVA) 1087
Antagonist 939
Antibodies 52
  catalytic antibodies 53
  molecules
    clonal expansion 53
  designed 52
  guide 53
  somatic mutation 53
Antithrombin III (AT III) 683
AOP, Aminooxypentane (AOP) 583
AP-1, Activator protein 1 (AP-1) 895
Apicidin 98
  cyclic tetrapeptide 98
  Depudecin 98
  structural similarity to TPX 99
Apoptotic pathways 1046
Applications 96, 216, 237f, 255
  Angiogenesis 104
  Animal Models 239
  Capsaicin 108
  Catalysis 220
  Cell Therapies 240
  DNA-Protein Interactions 218
  Helical Mimetics 260
  Immunosuppressant 106
  Mechanism of action 97
  modulators
    bioavailability 255
    peptide-based 255
  Parthenolide 109
  Practical Examples 96, 255
  Proteasome 101
  Protein Function 239
  Protein-Protein Interactions 216
  Regulated Transcription and Gene Therapies 241
  RNA-Protein Interactions 219
  Small Molecule-Protein Interactions 220
  β-Turns/Strands 256
  two-hybrid assay
    for biology research 216
    integral 216
Aqueous solution
  native chemical ligation in 575
AR, Androgen receptor (AR) 903
ArCPs, Aryl carrier proteins (ArCPs) 472
Array experiments
  experimental designs, issues of 1085ff
  global gene expression studies
    high-density oligonucleotide arrays, biotin-labeled cRNA target 1086
  microarray technology
    amplification protocol, choice of 1089
    messenger RNA, and pooling of RNA samples 1088
    replication and sample size 1087f
    RNA amplification 1088
  replicate microarray experiments
    natural differences of, gene expression in inbred mouse strains 1087
  spotted complementary DNA (cDNA) microarrays
    and oligonucleotide microarrays 1086-
β-Arrestin 942
Artificial neural network (ANN) 1023
Aryl carrier proteins (ArCPs) 472
Arylalkylamine N-acetyltransferase (AANAT) 394
  melatonin production 394
  nonphosphorylated 395
  phosphonate-containing 394
  role of phosphorylation of 394
  semisynthetic, stabilities of 395
Ascomycin 558
AT, Acyl transferase (AT) 521
AT III, Antithrombin III (AT III) 683
Atom Connectivity Matrix (ACM) 729
ATP, Adenine triphosphate (ATP) 826
ATP-binding site 3%
ATPγS-acetyl-kemptide 400
Atrial natriuretic factor (ANF) 374, 714
Automated carbohydrate synthesis 670
  oligosaccharide synthesis with glycosyl phosphates 673
  oligosaccharide synthesizer 668
Aventis
  traditional research and development organization
    organizational design, of three principles 790
    relevant selected target, critical in disease 791
Azides and alkynes
  dipolar cycloaddition
    Click reactions, use of “spring-loaded” reactive components 619
    enumerating stereospecific chemical reactions 619ff
b
Bacteria, pathogenic
  detection of 684
Bacteriorhodopsin 941, 949
Bafilomycin and Concanamycin 103
  biological activities
    in vitro 103
  regulators of organelle pH 104
BAL, British anti-lewisite (BAL) 435
BCS, Biopharmaceutics classification system (BCS) 1032
BCUT descriptors 1038
Beadle and Tatum’s original tenets
  of “one gene-one enzyme” hypothesis 302
Benzamide
  HDAC inhibitors, fourth class of 701
O6-Benzylguanine (BG) 463
O6-Benzylguanine-Methotrexate (BGMtx) 467
Benzylguanine-SNARF (BGSF) 466
BG, O6-Benzylguanine (BG) 463
BG, Binding group (BG) 409, 463
BGMtx, O6-Benzylguanine-Methotrexate (BGMtx) 467
BGSF, Benzylguanine-SNARF (BGSF) 466
Biarsenical
  for tetracysteine peptide
    picomolar affinity of 452f
  photoinduced generation of
    ReAsH-tetracysteine complex 449
    singlet oxygen 449ff
Biarsenical dye
  sequence-specific protein labeling
    with FlAsH dyes 612
Biarsenical-tetracysteine
  FlAsH-tetracysteine anisotropy
    monitoring proteolysis 447
Biarsenical-tetracysteine complex
  biarsenical ligand 432
  SDS-polyacrylamide Gel Electrophoresis (PAGE) analysis 453
Biarsenical-tetracysteine method
  dithiol arsenic antidotes
    EDT 437
Biarsenical-tetracysteine system
  application
    BarNile-EDT2, synthesis of 446
    smaller fluorescent reporter, constructing 441
Bicyclohexyl mimetics 646
Bile salt export pump (BSEP) 367
Binding energies 396
Binding group (BG) 409
Biochemical mechanisms
  pathway activation
    kinetics of, magnitudes and timing of signals 1061
Biochemical networks 1045
Biochemical pathways
  downstream signaling cascades and networks 1072ff
Biochemical signaling mechanisms
  evaluation of, quantitative models and quantitative experimentation 1077
Bioconjugates
  chemical synthesis of large 567ff
Bioconjugation
  history and development of 595ff
  new methods
    targeting of, unnatural functional groups 616ff
NHS esters, reaction of widely used strategy 595 Bioconjugation proteins metal-free bioconjugation using strain-promoted [3 + 21 dipolar cycloaddition reaction 622 Bioconjugation reaction bioconjugate purification 624f mass spectrometry, advances in 627 new transition metal-based methods, availability of 627 Bioconjugation technique targeting native functionality countless new strategies, provision of 594 Bioinformatics 959, 1048 Biological Analysis Screening 20 Biological field gene profiling, molecular basis of 1083 Biological networks connectivity of 302 Biological Problems 18, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39,41,43, 5 3 BiologicalAnalysis 20 Chemical Synthesis 20 designed biological functions 54 DNA modules predefined 54 genome fully synthetic 54 man-made cell 53 by Nature directly 53 synthetic biology 53 Biological research cell complete protein repertoire (proteome) 404 functional proteomics chemical strategies for 405ff protein expression and protein function 405 history and development of 404f molecular mechanisms focuson 1061 introduction to lO6lff novel genes, identification of 405 postgenome era global approaches for 404f Biological Solutions to 45,47,49, 51 Biological space changing scaffolds, to scaffold morphing 841
chemical and biological space concepts, schematic visualization of 835 chemical space focused libraries and scaffold hopping, iournev through 837ff kinase inhibitors: competitors of ATP 837 kinome maps 837 combinatorial chemistry building on established - privileged scaffolds 835ff building on established - privileged scaffolds, relation to target families 839 in silico scaffold hopping, and biological scaffold morphing 840 molecular diversity, advent of 838 combinatorial libraries chemical space, around proven starting points 836 exploring of 834ff putting pieces together - fragment approaches 842ff selected fragment screening experiment application to, proteases and kinases 845 Biological studies gene function chemical probes with, specificity of genetic methods 365 Biological systems different strategies comparison of 363 global response of 379 levels of hierarchy, probing of 355, 356 protein function deeper understanding of 355 modulation by small molecules 355 reverse chemical genetics agonists of a - and /?-adrenergic receptors 359 chemical biology, probe tools identification 357 concept of 356ff general considerations 361ff Biology and biomedical research genome-wide gene expression analysis widely used tool 1109f Bioluminescent resonance energy transfer (BRET) 1132 Biomedical research proteomics methods, need for 405
new diagnostic markers and drug targets, identification of 404 transcript profiling, standard tool in 404 Biomolecular Interfaces 135 biological specificity oflarger interfaces 135 Engineering 135 Extended 135 interfaces large regions of protein 135 redesign 135 Biomolecules unnatural functional groups “Amber” codon 613 methods for, biosynthetic incorporation of 612ff posttranslational protein modifications using metabolic machinery for 613 successful for, N-acetylglucosamine derivatives 614 Biopharmaceutics classification system (BCS) classes of 1032 computer-based 1032ff Biopolymers classes of 669 interactions of classes of 670 Bisubstrate analog 395 for serinelthreonine kinase 399 Bisubstrate tyrosine kinase inhibitors 396 Black, James alky-substituted histamine analogs beta-blockers, development of 359 antihistamines and two histamine receptors 794 Blood group determinant oligosaccharides 671 Bovine rhodopsin 941 Branched oligosaccharides 671 Breast cancer cells gene expression profiling identification of desired molecular fingerprints 922 Brefeldin A (BFA) 84 110-kDprotein 85 BFA action biochemical 87 mechanism 87 cycle GTP-GDP 87 Golgi ARF binding to 87
Index Caged Proteins 156 channel activation kinetics of 159 methodology biosynthetic 156 Mutagenesis Amino Acid 156 Site Directed 156 Unnatural 156 Photoactive Residues Introduction of 156 photoirradiation after 157 C before 157 C-Abl protein kinase 549 replacing C-Crk-I1 550 natural ones 157 C-Crk-I1 signaling protein 549 trans-cis C-terminal thioester 387 photoisomerization of the azobenzene C-terminal tyrosine phosphorylated tail moiety 158 391 Caged Tyrosine Residues 160 C-type lectin-like domains (CTLDs) 643 Caged Cysteine and Thiophosphoryl C-type lectins 643ff Residues 162 C2 hydroxyl group 671 caged version a-CA, a-Chloroacetamide(a -CA) 411 in vitro 161 Ca+2-sensingreceptor (CaR) 953 in vivo 161 CaBP, Calcium-bindingprotein (CaBP) 369 LMS-1 161 CADD, Computer-assisted drug design RS-20 161 (CADD) 958 nitrobenzyl group cage asacage 160 cyclic nucleotides 147 signaling pathway 162 Caged Compounds 140 Calcitonin receptor-like receptor Caged Proteins 156 (CRLR) 948 Controlling Protein Function 140 Calcium-binding protein (CaBP) modulate protein function 140 369 Multiresidue Protein Caging 150 CALI, Chromophore-assistedlight Photoactivatable Groups 140 inactivation (CALI) 428 Single Residue Protein Caging 152 Calmodulin (CaM) 446 Small Caged Molecules 159 Ca2+ activation small molecule 140 protein dynamics of 446 Caged Cysteine and Thiophosphoryl single FlAsH-labeled CaM molecules Residues 162 protein motions of 448 on serine residues 162 CaMKI, Calmodulin-depend kinase I peptide 163 (CaMKI) 870 GRTGRRNAI 164 CAMP, Cyclic adenosine monophosphate inhibitory behavior 164 (CAMP) 312,938 thiophosphotyrosyl 163 CAMP response element binding (CREB) protein kinase A 163 313 thiophosphoryl-Ser residue CAMP-response Element Binding Protein over a cysteine residue 162 (CBP) 914 Caged Peptides 159 Cancer chemotherapy Caged lysine 160 multiple HDAC inhibitors, in clinical trials for 696 Caged Tyrosine Residues 160 Phosphorylation Sites and Candidate drugs (CDs) 1004 
selection of 1010 Phosphopeptides 165
Brefeldin A (BFA) (continued)
  Golgi-ER recycling pathway 85
  membrane transport
    from the Golgi 85
BRET, Bioluminescent resonance energy transfer (BRET) 1132
British anti-lewisite (BAL) 435
BSEP, Bile salt export pump (BSEP) 367
BTK, Bruton’s Tyrosine Kinase (BTK) 858
Bumps and Holes 231
Capsaicin 108, 133 biochemical change in mammal versus avianVR1 134 cation channel avianVR1 133 VR1 133 channel's response to heat and acid 134 component of hot chili 133 pungent ingredient of hot pepper 108 Sensitivity 133 VR1 108 CaR, Ca*+-sensingreceptor (CaR) 953 Carbodiimide coupling reagents 485 Carbohydrate 567,635,668 branched 671 cell-surface 681 function of, in biologically important recognition processes 669 interactions of, in biological systems 672 as vaccines 677 Carbohydrate affinity screening 637, 677 Carbohydrate-functionalized fluorescent polymer 668,684 Carbohydrate microarrays 674, 676 preparation of 676 Carbohydrate-modifying enzymes 638f Carbohydrate-nucleic acid interactions aminoglycosides 679 Carbohydrate-processing enzymes inhibitors of 657f Carbohydrate-protein interactions selectins and heparin 681 Carbohydrate recognition domains (CRDs) 641 CARMl, Coactivator-associated arghine methyltransferuse 2 (CARMI) 914 Carrier protein (CP) 471 CART, Constitutively activating receptor technology (CART) 948 Catalysis 206, 220 bond formation acceptor 222 donor 222 glycosidic 221 cephalosporin hydrolysis 207 enzyme 206 as a fourth component 206 three-hybrid system 20G Quest 208 S. cerevisiae 222 CBD, Chitin binding domain (CBD) 545
CBP, CAMP-response Element Binding Protein (CBP) 914 CBP, Chemical biology platforms (CBP) 789,914 CC, Computational Chemistry (CC) 1003 CCD, Charge-coupled device (CCD) 448 CCK, Choleqstokinin (CCK) 955 CDCA, Chenodeoxycholic acid (CDCA) 367 CDG, Congenital disorders ofglycosylation (CDG) 635 CDK-related kinases (CRKs) 1130 CDK2, Cyclin-dependent Kinase 2 (CDKZ) 845 CDKs, Cyclin-dependent kinases (CDKs) 1130 cDNA, Complementary DNA (cDNA) 1084 CDs, Candidate drugs (CDs) 1004 Cell living cells designing protein tags for 454 Cell biology regulatory processes in 1045 Cell culture isoprenoid biosynthesis, halting of with addition of lovastatin 615 Cell cycle 1046 Cell decision making in context-dependent manner, tightly controlled 1061 Cell function cytosolic signaling enzymes and adaptor proteins association of 1066 growth factor and cytokine receptors of more complex situations 1065 intracellular signal transduction processes modelling of lO6lff intracellular signaling 1065 normal and diseased cell function ability of control 1061 receptor phosphorylation general purpose of 1066 receptor-mediated covalent modifications and molecular interactions 1065ff signal transduction biochemical integration of 1061 Cell lines human cancer cell lines, behavior of 416
Cell lines (continued) xenograft-derivedbreast cancer cells secreted protease activities, dramatic elevations in 416 Cell-permeable inhibitors 640 Cell receptor complexes kinetic proof reading 1067 ligands with fast off-rates 1068 receptor phosphorylation and binding states lO66f significant challenges, standpoint of modeling 1066 slow versus rapid exchange determination of, substrate phosphorylation rates 1069 sub-nanomolar ligands functional receptor complexes, forming of 1067ff T-cell receptor engagement of peptide-MHC complexes 1067 Cell regulation and function molecular mechanisms underlying cell function 1061 Cell surface receptor dimerization forming dimers, or higher oligoniers oncell 1063ff receptor trafficking non static receptors 1065 Cell-surfacecarbohydrate 681 Cell-surface carbohydrate recognition interactions 641ff Cell surface receptors binding of signaling pathways 1062ff Michaelis-Menten kinetics hyperbolic binding 1064 receptor dimerization receptor dirnerization mechanisms and dose response 1064 Cell Therapies 240 cell growth switch 240 death switch 241 Regulated 240 signaling proteins 240 vaccine cellular cancer 241 Cell-based assays 361 Cell-based reporter assays FK228 studied by Yoshida group 712 spiruchostatin A, biological characterization of 712ff Cell-cell recognition 668
Cellular compartments cellular and subcellular length scales concentration gradient, concept of 1069 cytosol, diffusive transport in 1069 spatial organization and gradients on 1069ff Cellular functions spatial gradient sensing ability of localizing, intracellular second messenger(s) 1070 and chemotaxis 1070 spatial gradient sensing, in eukaryotic cells adhesion processes, driving cell crawling 1070 Cellular gene products target identification problem 308 Cellular processes multiparametric considerations dosage effects 318 dose and time 318 multidimensional 318 Cellular retinoic acid binding protein (CRABP-1) 442 Cellular retinoic acid binding protein 11 (CRABP-11) 369 Cellulose GG9 Central nervous system (CNS) 379 CFP, Cyanfluorescent protein (CFP) 428 cGMP, Cyclic guanosine monophosphate (cGMP) 373 Chain length factor (CLF) 520 Charge-coupled device (CCD) 448 Selvin and coworkers single ReAsH-tetracysteine complexes 448 single ReAsH-tetracysteinecomplexes nanometer localization of 448 Chemical Abstracts via SciFinder 760 Chemical and biological data other organizational and knowledge challenges 801f Chemical biological studies molecular probes to study, cellular functions of proteins 1118 Chemical biology 1143ff altering landscape with new chemical tools 628 array synthesis, starting points for libraries 835
biological space “molecular toolkit”, expanding of charting biological space - structural 300 biology and informatics 829ff nonnatural amino acids homology modeling, understanding transfection method 288 structural space 830 nonnatural mutagenesis membership of, protein to protein application of 289ff family 831 basic strategy of 291 orphan GPCRs, receptors without fluorescence labeling 289 agonistic or antagonistic ligands polarity-sensitive fluorescent amino 832 acids 289 understanding biological machines, position-specific fluorescence labeling from structure to function 832ff 289 understanding of 828ff nonnatural mutants and drug discovery engineered aaRSs 287 understanding of, MoA of organic in vivo aminoacylation 287 small molecules 1135 microinjection method 288 and target family approach 847 synthesis of 287ff and polar/hydrophobic balance 805 novel molecular entities chemical-genetic approaches modulating biological processes 825 high-throughput phenotypic assays pathways and networks 307 screens to reveal connections between chemical-genetic maps, creation of 307 307 PNA-assisted aminoacylation method chemical-genetic modifier, use of 307 for amino acids and tRNAs 281 combining structural information reshaping methods of, drug discovery biological process modulation 825 846 concept of 1143ff role of chemistry in 1144 drug discovery single molecular spectroscopic analysis synergizing structural relationships of 289 proteins 826 small-molecule modulators drug-like molecules, generation of charge of identifying 423 1146 structure function correlations 1145 drugable targets, identification of structure-activity relationships 1146 1145 education and training of chemical synthetic codons biologists 1149 containing nonnatural nucleobases genomic tools 286f for identifying candidate targets 832 Schultz’s group, nonnatural base pairs green fluorescent protein (GFP) 287 alternative methods, variety of 427 system biology 1145 Hecht method target family approach 825 
for isolated tRNAs in test tube 281 foundations of 825 micelle-mediated method, for target family oriented concepts aminoacylation in test tube 281 discovery paradigm in, pharmaceutical industry 826 in academia 1144ff translational medicine 1148 in biomedical science 1143 in drug discovery 1143,1147 tRNA aminoacylated with nonnatural amino acids, import of 288 in industry 114Gf Y3H-cDNA library screening workflow in vitro cellular experiments interaction of MFCs, of kinase compound within range of, solubility inhibitors 1125 knowlegde and networks 1148 Chemical biology and drug discovery medicinal chemist chemical tools and leads, learning protein function important strategy for 355ff from experience 804
Chemical biology platforms (CBP) 789 core team appointment of knowledge management specialist 798 drug innovation and approval (DIdA) 797 drug metabolism and pharmacokinetics (DMPK) human studies and “rapid prototyping“ feed back information 796 Kinase Chemical Biology Platform first of four CBPs 798ff lead optimization organization areas of drug metabolism and pharmacokinetics (DMPK) 796 management mergers, additional complexity of 789 management challenge discovery and development cycle 789ff in implementation 789ff knowledge-driven S curve 790 modern day version of, “drug discoverer” 789 organizational structures for establishment of 796ff Chemical Complementation 199, 201, 203,205,207, 209, 211, 213, 215, 217, 219,221 Power of Genetics 199, 201, 203, 205,207, 209, 211, 213, 215, 217, 219,221 Chemical databases 760 Chemical Dimerization Technology 228 Development of 228 Dimerization Systems 229 FK1012 homodirnerizer 229 Rapamycin 229 immunosuppressive drug FK506 228 interaction FK506-FKBP 229 Chemical genetics and classical genetics perturbations, nonheritable and combinations of 316f applications and practical examples 336ff biological mechanisms small-molecule probes of 299
centrosome-duplication assay chemical-genetic modifier screens 315 chemical space dimensionality reduction and visualization of 330ff dimensionality-reduction and pattern-finding techniques 331 overview of 331 classical genetics development and refinement of 344 general considerations of 307ff genetically encoded probes, use of 345 molecular recognition code(s) 346 cluster analysis of multidimensional, chemical-genetic data 332 computational framework dendrogram showing clustering of, small molecules 332 mapping chemical space 327f cytoskeleton and cell division 305 forward and reverse chemical genetics 308 forward chemical genetics important role in 314 molecular tool box, development of 299 forward chemical-genetic discovery probes of biological mechanisms 346 gene products probes of biological mechanisms 301 targeting small molecules 301 historical and conceptual developments of 299ff history/development of 302ff image-based phenotypic screen inhibiting PI3K/Akt signaling 316 localization of GFP-tagged FOXOla 315 PI3K/PTEN/Akt signal transduction pathway, importance of 316 mapping chemical space adjacency matrix 329 using forward chemical genetics 326ff small molecules as chemical graphs 329 mRNA profiling chemical-genomic profiling 333
Index multidimensional phenotypic descriptors 330 chemical-genetic data array 330 Neurospora work one gene-one enzyme 299ff Pearson correlation matrix 332 phenotypic assays neural stem-cell differentiation 314 phenotypic assays for 312 forward chemical-genetic screening 311ff protein function, study of 371ff protein targets biologically active small molecules, examples of 303 reverse chemical genetics applications and practical examples 366 Schreiber group immunophilins and histone deacetylases, chemical biology of 366 small molecules, assaying of 347 small-molecule libraries appropriate cell-based assays 304 small organic molecules screening of 308 T-cell signaling role of calcineurin in 307 using “forward” chemical genetics 299 using signaling pathway characterizing of 304 Chemical genomics and chemical proteomics scanning proteome for 1118ff scanning proteome using bifunctional receptor ligands, outlook of 1118 Chemical glycomics automated carbohydrate synthesis 670ff carbohydrate-nucleic acid interactions 679ff carbohydrate-protein interactions 68lff for drug discovery 668ff oligosaccharide conjugate vaccines 6778 pathogenic bacteria, detection of 684ff tools for 672ff carbohydrate affinity screening 677 Carbohydrate Microarrays 674ff fluorescent carbohydrate conjugates 677
hybrid carbohydrate/glycoprotein microarrays 676 microsphere arrays 676 surface plasmon resonance (SPR) 676f Chemical graph concept of 727 Chemical Inducer of Dimerization (CID) 208,466 pairs high-affinity 210 ligand/receptor 209 to dimerize in vivo 208 transcriptional activator 208 Chemical Industry 54 CO and HL,tohydrocarbons 55 Fossil Fuel Dilemma 54 Nuclear energy 55 Present 54 Chemical library synthesis conceptual development in 319 Chemical ligation future directions of 586ff Chemical ligation reactions conditions, selection of 580f native 580f rates, enhancement of 581 requirements for 574 site, selection of 580 Chemical ligation theme native variations on 576 Chemical probes search to illuminate carbohydrate function 635ff development of 636ff history of 636ff Chemical Problems 10, 11, 13, 15, 17,45, 47, 49, 51 Antibodies 52 artificial models of living systems 12 Biological Problems 18, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 53 chemical sciences chemical biology 10 Diels-Alder Reaction 16 Historical Periods 12 ideal synthesis 11 industry efficiency 11 expediency 11
Chemical Problems (continued) nanotechnology chemical sciences 10 Organic synthesis bottom-up strategies 10 perfect reaction 11 Proteins 45 synthetic chemist as a practicing technologist 12 Chemical proteomics affinity chromatography 378 widely used method 1119 cellular assay system strategy for, synthesis of MFCs 1128 yeast three-hybrid (Y3H) 112Off chemical proteomic initiatives alternatives to, classical protein activity profiling 11 19 compound-induced protein-protein interaction concept of 1122 interaction of, small molecule with proteins supporting cDNA library screening 1119 new cheminformatic approaches 379 organic small molecules embodying therapeutic agents, important class of 1118ff small molecule targets and future development of 1132ff three-hybrid-based (3H) technologies evolution, development, and applications of 1118 understanding, cellular targets and signaling mechanisms 1118 Y2H system interaction of bait and prey fusion proteins 1123 Chemical research computer assistance to 724 Chemical shift perturbations (CSPs) 866 Chemical Solutions 10,11, 13, 15, 17 for the construction ofmolecular skeletons 10 trusted reactions 10 Chemical space 723,725 cheminformatics and 724f concept of 726 Chemical structure basic principle of 725 encoding 729 properties of 725
Chemical Synthesis 20 lock-and-keymetaphor 20 modify target structure 21 Multicomponent 28 Preparation 20 Sequence 20 Single-component 21 target molecule synthesizing 21 with particular properties 21 Chemical synthesis 538 Chemical topology 730 Chemical-genetic modifier screens small-molecule suppressors and enhancers identification of 317 Chemical-genetic network chemical-genetic modifier screens graph-theoretic framework 336 forward chemical-genetic screen for inhibitors of mitosis 337 Chemical-genetic screens discrete methods of analysis of forward chemical-genetic data 334ff Cheminformatics 723, 958 chemical space 724f chemical structure graphs 725ff computable representations of structure 729ff molecular descriptor spaces 746ff multidimensional outcome metrics 750ff Chemistry complex biochemical milieu, compatible with high reactivity and selectivity 454 future development discipline of 421f functional analysis of proteome 421 Chemistry and biological applications biarsenical-tetracysteine method 427 biarsenical-tetracysteine protein tag 427ff protein trafficking 427f novel applications, development of 427 Chemistry and Biology 3 analysis Top-dow 3 biochemistry 4 Biological Solutions 45, 47,49, 51 Chemical Industry 54 Chemical Solutions 10, 11, 13, 15, 17 Darwinian evolution 4
interdisciplinary 3 Lessons 55 living cell as a model 3 molecular biology 4 protein synthesis 4 synthesis bottom-up 3 Chemoattractant Receptor-Homologous Molecule Expressed on T Helper Type 2 (CRTH2) 960 Chemokine 581 structure-function analysis of %Iff Chemokine receptor 948 Chemoselective coupling reaction 540 Chemoselective ligation 539 Chemoselective transthioesterification reaction 540 Chenodeoxycholicacid (CDCA) 367 Chinese Hamster Ovarian (CHO) 395 Chitin binding domain (CBD) 545 CHO, Chinese Hamster Ovarian 385,465 Cholecystokinin (CCK) 955 Chromophore-assisted light inactivation (CALI) 428 Chromophore-labeled proteins purification of hostlguest interaction 626 chromosome 77 genes 79 genetic screens Mad/Bub 80 nocodazole 80 spindle assembly 78 checkpoint 78 mitotic 78 CID, Chemical Inducers of Dimerimtion ( C I D ) 466 Classical genetics central dogma (DNA-to-RNA-to-protein) tenets of 300 chemical genetics mapping “chemical space” using phenotypic descriptors 299 vs. chemical genetics 301 genetic maps,creation of 299 Cleavage Plane 80 in Cytokinesis 80 Mad2 81 model 81 Monastrol cytokinesis 80 inhibitor 80 Positioning 80
CLF, Chain lengthfactor (CLF) 520 CMC, Comprehensive Medicinal Chemistry (CMC) 760 CNS, Central nervous system ( C N S ) 379 CoA, Coenzyme A (CoA) 694 Coactivator-associatedarginine methyltransferase 1 (CARMI) 914 Colchicine and Tubulin 72 aneuploidy 72 chromosome movements 72 colchicine binding activity 74 labeled with H3 74 microtubules 74 mitosis 72 spindle fiber dynamics 73 mitotic 72 taxol 74 vinca alkaloids 74 Column chromatography 484 combinatorial approach large variations of related molecules 33 Combinatorial chemistry building blocks growing accessibility of 378 compound libraries natural product guided compound library development 362 in silico scaffold hopping, and biological scaffold morphing kinase-directed drug discovery 840 isoform selective inhibitor roles of isoforms 370 privileged fragments DFG-out conformation 838 peptide-binding GPCR antagonists 839 target family oriented libraries, design of 838 Combinatorial library system using CDK2 protein crystals 845 Combinatorial synthesis 487ff Combinatorialization power of 487f CoMFA, Comparative molecularjield analysis (CoMFA) 950 Competitive antagonism 939 Complementary DNA (cDNA) 1084 Complex proteomes ABPP strategies for in vivo analysis of enzyme activities 418f
activity-based probes functional role of, cysteine proteases 416 activity-based protein profiling (ABPP) comparative and competitive ABPP, applications and practical examples 415ff general considerations of 407ff schematic of, representative protease posttranslational regulation mechanisms 407 activity-based protein profiling (ABPP), expanding scope of 419ff bio-orthogonal chemical reactions enabling ligation of, reporter tags onto proteins 419 comparative profiling for discovery of enzyme activities 415ff competitive ABPP for potent and selective reversible enzyme inhibitors 417f 1DE gel-based methods for employing gel-based or gel-free strategies 422 probe-labeled proteomes 422 enzyme activities global profiling of 407 general method for, performing ABPP 419 in vivo model of, human cancer-breast cancer xenografts 416 inhibitor discovery by ABPP reversible inhibitor library, and activity-based probe 418 papain-directed ABPP probes inhibitor screening 418 probe-enzyme reactions molecular basis for 421 SH superfamily, of enzymes 415 Complex signaling networks molecular composition of 1049 Compound libraries synthesis of 378 Comprehensive Medicinal Chemistry (CMC) 760 Computational chemistry 724 Computational Chemistry (CC) 1003 Computational permeability models accuracy, factors influencing 1030ff Computational tools 3D-pharmacophore searches and high-throughput docking 362
small molecule protein target, identification of 362 small molecule probes computer-assisted drug design 362 Computer chemistry 724 Computer-assisted drug design (CADD) 958 Computer-encodable structure representation classes of 730 Concanamycin, see Bafilomycin Conditional protein splicing (CPS) 557, 559 Congenital disorders of glycosylation (CDG) 635, 649 Conklin receptors activated solely by synthetic ligands (RASSL) approach 365 Connection tables 730 Conotoxins nAChRs, chemical biological study of 376 Constitutively activating receptor technology (CART) 948 Core team and strategy teams (CBP strategy teams) responsibility of downstream implications 799 Corepressor activity diminishing accessory proteins, role of 914f interference in NF-κB and AP-1 pathways 915 CoRNR, Corepressor nuclear receptor (CoRNR) 914 Correcting Errors 81 anaphase 84 attachment errors 83 syntelic 83 Aurora kinase inhibitors 81 Reversible 81 small molecule 81 dynamics microtubule fibers 83 mitosis timescales 84 oncogenesis 83 Corticotrophin releasing factor (CRF) 955 Cowpea mosaic virus (CPMV) 620 COX, Cyclooxygenase (COX) 792 CP, Carrier protein (CP) 471
CP-fusion proteins labeling of as tool to study cell surface proteins 470ff CPMV, Cowpea mosaic virus (CPMV) 620 CPS, Conditional protein splicing (CPS) 557, 559 CRABP-I, Cellular retinoic acid binding protein (CRABP-I) 442 CRABP-II, Cellular retinoic acid binding protein II (CRABP-II) 369 CRDs, Carbohydrate recognition domains (CRDs) 641 CREB, cAMP response element binding (CREB) 313 CRF, Corticotrophin releasing factor (CRF) 955 Critical circadian rhythm hormone 394 CRKs, CDK-related kinases (CRKs) 1130 CRLR, Calcitonin receptor-like receptor (CRLR) 948 Cross-reactive sensor analysis 685 Cross-validation 1013 CRTH2, Chemoattractant Receptor-Homologous Molecule Expressed on T Helper Type 2 (CRTH2) 960 Crystallography 583 binding modes, investigation of 844 CsA, Cyclosporine A (CsA) 304 CSPs, Chemical shift perturbations (CSPs) 866 CTLA-4, Cytotoxic T lymphocyte-associated protein 4 (CTLA-4) 1108 CTLDs, C-type lectin-like domains (CTLDs) 643 Curcuminoids 105 isolated from turmeric 105 Current Patents Fast Alert 760 Cyan fluorescent protein (CFP) 428 Cyclic adenosine monophosphate (cAMP) 312, 938 Cyclic guanosine monophosphate (cGMP) 373 Cyclic peptides 556 Cyclin-dependent Kinase 2 (CDK2) 845 Cyclin-dependent kinases (CDKs) 1130 Inhibitors 99 Purine Analogs 99 Cyclooxygenase (COX) 792 Cyclosporin A (CsA) and FK506 107 biological activity same phenotypic 107
different potencies 107 structurally different 107 Cyclosporine A (CsA) 304 Cys residue 547 Cysteine modification of 597 uniquely reactive cysteine group using site-directed mutagenesis 596 Cysteine protection 546 Cysteine residue chemical modification of 386 Cytochalasin and Actin 74 actin filaments 75 cytochalasin phenotype 75 direct link 75 microfilaments 75 Cytochrome P450 interactions 1005 Cytoplasm apoptosis, programmed cell death release of, mitochondrial cytochrome 441f Cytotoxic T lymphocyte-associated protein 4 (CTLA-4) 1108
d
DAB, Diaminobenzidine (DAB) 449 Darwinian Era 18 genotype 19 natural selection rested on analogy 18 Origin of Species 18 phenotype 19 DBD, DNA binding domain (DBD) 895, 1122 DC-SIGN, Dendritic cell-specific intracellular adhesion molecule-3-grabbing non-integrin (DC-SIGN) 643 2DE, Two-dimensional electrophoresis (2DE) 405 DEBS, 6-Deoxyerythronolide B Synthase (DEBS) 523 Deciphering human genome challenges of 801 Dehydratase (DH) 522 Dendritic cell-specific intracellular adhesion molecule-3-grabbing non-integrin (DC-SIGN) 643 Deorphanization 947ff 6-Deoxyerythronolide B Synthase (DEBS) 523 schematic diagram of 524 system, manipulation of 529
Deoxyribonucleic acid (DNA) 300, 576 Depsipeptide HDAC inhibitors completion of, total syntheses of FK228 and FR901,375 by Mitsunobu macrolactonization 710 total synthesis of macrocyclizations, and completion of synthesis 709ff Derived from Natural Repressors 175 IPTG stable synthetic analog 175 lac binds to operons 175 LacR-VP16 chimera 176 Ligand-dependent 175 activators 176 repressors 176 Tet-On 176 tetracycline 175 DES, Diethylstilbestrol (DES) 905 Descriptors 1030 1-D 1017 2-D 1017 3-D based 1017 biological 501 hydrophobic 1026 physicochemical 501 structural 501 used for permeability predictions 1026ff Desensitization 939 Desogestrel total synthesis 25 Dess-Martin Periodinane (DMP) 607 Desulfination 547 Desulfurization reaction 546 DEX, Dexamethasone (DEX) 1122 DH, Dehydratase (DH) 522 DHFR, Dihydrofolate reductase (DHFR) 460, 1123 DHNA, Dihydroneopterin aldolase (DHNA) 844 Diaminobenzidine (DAB) 449 Diarylpropionitrile (DPN) 368 Diazonium salt coupling reactions introduction of, new functional groups 598 tyrosine residues, modification of using electron-deficient 599 Dictyostelium discoideum amoeboid migration 1070 DI&A, Drug innovation and approval (DI&A) 706
Diels-Alder Reaction 16 Prototype of a Synthetically Useful Reaction 16 steroid synthesis 17 in the synthesis of steroids 16 structurally complex natural products 16 Diethylstilbestrol (DES) 905 Diffusion ordered spectroscopy (DOSY) 860 Difluoromethylene 389 Dihydrofolate reductase (DHFR) 460, 556, 1123 Dihydroneopterin aldolase (DHNA) 844 2,3-dimercaptopropanesulfonate (DMPS) 453 Dimerization Systems 229 Homodimerization 229 Reverse Dimerization 235 Transcription 235 Dimethyl dioxirane (DMDO) 671 Dimethylformamide (DMF) 539, 569 Dimethylsulfoxide (DMSO) 572 Discoverygate 760 Disease biology complete human-genome sequence single-gene Mendelian disorders 300 Disulfide bonds modification of using metallocarbenoids 605ff Dithiothreitol (DTT) 438, 602, 704 Divalent ligands 955 Diversity-oriented synthesis (DOS) 483ff applications and examples for 502ff assessing library diversity 501f chemical and biological space 496 chemical methodologies for 502 of combinatorial libraries early efforts in 495 development of 484ff early efforts in 492f future development of 514 general considerations in 496ff history of 484ff libraries design strategies 496ff screening of 502 separation techniques in 487 synthetic strategies 499ff planning 499ff DMDO, Dimethyl dioxirane (DMDO) 671 DMF, Dimethylformamide (DMF) 569
DMP, Dess-Martin Periodinane (DMP) 607 DMPK, drug metabolism and pharmacokinetics (DMPK) 796 DMPS, 2,3-dimercaptopropanesulfonate (DMPS) 453 DMSO, Dimethylsulfoxide (DMSO) 572 DNA, Deoxyribonucleic acid (DNA) 300, 576, 668 DNA binding domain (DBD) 895 DNA-Protein Interaction 204, 218 AD-cDNA fusion 205 genes olfactory-specific 205 one-hybrid assay 204 phage display 219 transcriptional activators 218 two-hybrid assay into one-hybrid system 218 zinc-finger evolution 219 DOS, Diversity-oriented synthesis (DOS) 483 DOSY, Diffusion ordered spectroscopy (DOSY) 860 DPN, Diarylpropionitrile (DPN) 368 DRIP, Vitamin D receptor-interacting protein (DRIP) 914 Drosophila phenotypes 937 Drospirenone combinatorial acceleration of preparation 28 screening 28 leading position in hormonal contraception 27 synthesis 27 unnatural biologically 27 Drug delivery applications chemical groups on entrance of protein into reducing environments 597 Drug development inhibition of HDACs beneficial effect in, repressing hypertrophy 698 reasons for attrition in 1005 Drug discovery approaches to C-terminal 891 biological models discovery of, penicillin-resistant Streptococcus pneumoniae 794 novel anti-infective drug 794f
biomolecular NMR spectroscopy 855 chemical glycomics for 668ff COX-2 inhibitors development of, celecoxib (Celebrex) 792 enzyme, identification of 792 drugs target NR account, in pharmaceutical sales 901 gene-family approach for protein classes 852 histone deacetylases (HDACs) outstripping histone acetyltransferases (HATs) 696 isolating and synthesizing active ingredient and pharmacological experiments in parallel 793 mechanism-based discovery background 793f propranolol, interesting development of 793 new rules for 379 NMR spectroscopy different stages of, pharmaceutical research 855f NR drug discovery tissue-selective benefits 916 tissue-selective benefits, examples of 917 NR drugs, brief history of 901ff NR function binding druglike small molecules 895 NR LBD fold, of three stacked α-helical sheets 892 NR superfamily reverse endocrinology approach 903 NR-targeted drug discovery history of 901 nuclear receptor structure/function, features of 891 nuclear receptor superfamily classic steroid receptors 897 domain organization of 893 features of 891ff general mechanisms of, NR function 896 key methodologies, for nuclear receptor-targeted drugs 891 representative structures of, NR functional modules 895
observation-based discovery background 791ff organic acids, ibuprofen and diclofenac 792 penicillin discovery in historical approach 792 recent NR drugs and novel drug candidates 916ff small molecules new protein discovery, role in 360 target validation, critical factor in 355 traditional approach differences between 802f traditional drug discovery differences between 802 validated disease target “common mechanism” target 790 Drug discovery research understanding of molecular targets of, drug or drug candidate 1118 Drug innovation and approval (DI&A) 796 organization of Aventis centers of expertise in 791 units of innovation 796 Drug metabolism and pharmacokinetics (DMPK) sharing of knowledge and improved attrition rate 797 Drug molecule binding energy affinity of 806 hydrophobic surface to, binding energy “magic methyl” 806 Drug targets accessible to, protein therapeutics 817ff approved drugs molecular targets of 811 COX-2 inhibitors withdrawal of drugs 355 NCE approvals, antibody taking over 818 physicochemical constraints of 807 whole genomes sequencing of 355 Drug-like libraries 496f Drugbank 760 Druggability druggability argument 804 druggability hypothesis molecular recognition, basis of 805ff
medicinal chemists and chemical biologists predicting molecular basis of 804ff predictions using nuclear magnetic resonance (NMR) 808 Druggability prediction method human genome, accessible to protein therapeutics 819 predictions of, human druggable genome size 818 Druggable genome draft human genome systematic survey of 809 drug targets feature-based druggability prediction 816 initial estimates of 809 druggable-binding sites structure-based druggability analysis of, PDB Structures 816f Drugstore and StARLITe 811 estimating size of 808ff gene family distributions small-molecule druggable genome, and protein therapeutics 820 homology-based analysis of, drug targets 810ff Hopkins and Groom’s method systematic survey of 809f Orth druggable gene families, Interpro domain assignments 810 protein sequence uncompetitive allosteric-binding sites 808 Russ and Lampel’s Update 2005 810 sequence and structural levels 808 Druglike compounds fast Ertl method with 2D approximation 807 relationship between, molecular weight and molecular surface area 807 Drugs discovery 979 proposed decision tree 984ff Drugs and leads feature-based probabilistic druggability analysis 809 homology-based analysis, comprehensive survey of 808 structure-based amenability analysis 809
DTT, Dithiothreitol (DTT) 438, 602, 704 Dynamic Variation 34 activity (inhibition) 40 affinity (binding) 40 activity of a conjugate triplet 44 single molecular species 44 Base-pairing dynamics of single strands a, b, and c 35 binary complexes R:A, R:B, and R:C 39 conjugates A, B, and C 37 equilibria 37 three sets 37 dynamic system heterobifunctional character 45 receptor profiling 45 enzyme-binding experiment 40 exchangeability of effectors 40 receptor 40 experiment enzyme inhibition 43 screening 43 inhibition competitive (ACB:R) 41 mixed (ACB:R+ACB:R:S) 41 uncompetitive (ACB:R:S) 41 inhibitory activity color coding 43 degree of 43 interactions equilibria 38 receptor R 37 specific 37 triple peptide combinations 37 nonbiogenic substance dendrimers 44 in place of the peptides 44 pairing equilibrium constants 36 ternary complexes acb 36 Preparation 34 pyranosyl-RNA (p-RNA) single strands a, b, and c 35 into binary 35 into ternary supermolecules 35 self-assembly 35 quaternary complex R A C B 39 Screening 34 stoichiometry for maximum activity 43
substitution equilibria conjugates exchanged 38 substitutions binary 39 pathways 39 substrate S fluorescence-labeled 43 ternary complexes R A B, R A C, and R B C 39 Dyslipidemia 949
e
e-NOS, endothelial Nitric Oxide Synthase (e-NOS) 368 E. coli 211 assays alternate 211 transcription-based 211 bacterial three-hybrid 213 two-hybrid 212 doubling rate 211 pathway lytic/lysogenic 212 proteins heat shock 213 Transcription Activation Assays 211 yeast proteins Gal11 212 Gal4 212 interacting 212 E. coli dihydrofolate reductase (eDHFR) 1126 Ebola virus viral coat proteins trafficking of 439 EDG, Endothelial differentiation gene (EDG) 942 eDHFR, E. coli dihydrofolate reductase (eDHFR) 1126 Edman sequencing 488 EDT, 1,2-Ethanedithiol (EDT) 429 EF-Tu, Elongation factor (EF-Tu) 271 EGF, Epidermal growth factor (EGF) 938, 1065 EGFP, Enhanced Green Fluorescent Protein (EGFP) 466 Ehlers-Danlos syndrome progeroid-type 649 Elan pharmaceuticals MVIIa Ziconotide (Prialt™) novel nonopioid drug 376
Electron microscopy (EM) fluorescently labeled proteins, imaging of 451 gap junctions of connexin43-tetracysteine 451 ReAsH-mediated photoconversion of diaminobenzidine for correlated fluorescence 451f Electron paramagnetic resonance (EPR) 454 Electrophoretic mobility shift assays (EMSA) 513 Electrospray ionization 670 Electrospray ionization mass spectrometry (ESI-MS) 569 Electrotopological indices 1027 ELISA, Enzyme-linked immunosorbent assays (ELISA) 513, 637, 989 Elongation factor (EF-Tu) 271 EM, Electron microscopy (EM) 451 EMSA, Electrophoretic mobility shift assays (EMSA) 513 Enabled VASP homology type 1 (EVH1) 969 Encephalopsin 944 Endocrinology controlling activities and processes, act of 891-901 controlling activities and processes NR superfamily, a phylogeny plot 892 ligand-bound NR relays, and ligand and cell type 891 Endoplasmic reticulum (ER) 465 Endothelial differentiation gene (EDG) 942 endothelial Nitric Oxide Synthase (e-NOS) 368 Engineered Nuclear Receptor 185 Potential 185 Engineering Uniquely Inhibitable Kinases 126 Engineering Control 174, 175, 177, 179, 181, 183, 185, 187, 189 ligand naturally occurring 174 ligand-dependent multiple 174 transcription 174 Over Protein Function 174, 175, 177, 179, 181, 183, 185, 187, 189 proteins de novo 174
Transcription Control by Small Molecules 174, 175, 177, 179, 181, 183, 185, 187, 189 Transcriptional Regulators 175 Enhanced Green Fluorescent Protein (EGFP) 466 Enol reductase (ER) 522 Enolpyruvyl uridine diphosphate N-acetylglucosamine (EP-UNAG) 655 Enzyme activity enzyme-catalyzed reactions protein-protein and protein-lipid complexes, assembly of 1061 signal transduction modeling intracellular processes 1061 outlook of 1061 Enzyme classes cysteine proteases useful pharmacological agents 417 nondirected ABPP - probe design for 410ff Enzyme families nondirected strategies bona fide activity-based probes for 411 Enzyme inhibitors 979 Enzyme mechanisms domain folds, on molecular level 826 Enzyme recruitment slow diffusion of, membrane-associated substrates gradients on, molecular scale 1070ff Enzyme-linked immunosorbent assay (ELISA) 513, 637, 989 Enzymes 385 ABPP, proteome coverage of probe-labeled 422 complex physiological and pathological processes 421 enzyme classes whole proteomes, active site profiling in 421 enzyme superfamily cryptic members, of enzyme classes 421 database (BLAST) searches 420 sequence-unrelated members, class assignment of 420f histone deacetylases conserved group of 696f individual human HDAC enzymes 696
histone modifying enzymes nonhistone proteins, regulated by acetylation status 697 history and outlook of 693 EP-UNAG, Enolpyruvyl uridine diphosphate N-acetylglucosamine (EP-UNAG) 655 Epidermal growth factor (EGF) 938, 1065 Epigenetic mechanisms histone acetylation, schematic representation of model for transcriptional control 695 EPL, Expressed protein ligation 385 Epothilone 519 α,β-Epoxyketones 102 Bafilomycins and Concanamycins 103 chemokines 103 chemotaxis 103 covalent inhibitors 102 downmodulation mechanism 103 eponemycin 102 Epoxomicin 102 EPR, Electron paramagnetic resonance (EPR) 454 ER, Endoplasmic reticulum (ER) 465, 522, 902 ER, Enol reductase (ER) 522 ER, Estrogen receptor (ER) 559, 902 Erythroid progenitor cells 1049 Erythromycin 519 ESI-MS, Electrospray ionization mass spectrometry (ESI-MS) 569 EST, Expressed sequence tags (EST) 378, 902, 944, 1084 Ester-containing linker 671 Estrogen receptor (ER) 559, 902 1,2-Ethanedithiol (EDT) 429 Eukaryotes 648 examples of, posttranslational modifications at histone tails 695 gene-silencing mechanism CpG residues, methylation at 694 genomic DNA of 694 Eukaryotic 177 heat-shock protein 178 hormone Steroid 178 receptors ecdysone 179 endogenous 179 reprogram ligand-binding 177 Reprogramming 177
specificity gene targeting 177 Eukaryotic HDACs difficulty of expressing 699 EVH1, Enabled VASP homology type 1 (EVH1) 969 Evolutionary Thinking 18 Darwinian Era 18 Darwinian evolution accepted as a reality 19 post-Darwinian Era 19 pre-Darwinian 18 quasispecies 19 Role of 18 Shaping Biology 18 Expanding By Design 51 By Natural Selection 50 Experimental design and purification schemes affinity-based purification of, small molecule targets 1120 issues of general considerations 1085ff Exploit fusion proteins chemical approaches to 458ff applications and examples of 463ff future developments of 476f general considerations of 459ff Expressed protein ligation (EPL) 387, 390, 537ff applications of 548ff bottleneck of 542 general considerations in 542ff genesis of 538ff and ligation reaction 545 and protein transsplicing 556 reactions, one-pot 548 segmental isotope labeling 555 semisynthetic nature of 550 use of, in future developments 560 Expressed Sequence Tags (EST) 378, 902, 944, 1084 Exteins 540
f
FACS, Fluorescence activated cell sorter (FACS) 435 FAD, Flavin adenine dinucleotide (FAD) 655 FAP-1, FAS-associated phosphatase 1 (FAP-1) 1108 Farnesoid X receptor (FXR) 366, 511, 903 FAS, Fatty acid synthesis (FAS) 471
FAS-associated phosphatase 1 (FAP-1) 1108 Fatty acid synthesis (FAS) 471 FCS, Fluorescence correlation spectroscopy (FCS) 361 FDA drugs molecular targets of drug substances and drug targets, in gene family 812 FDG-PET, Fluorodeoxyglucose positron-emission tomography (FDG-PET) 304 Fetal liver kinase-1 (Flk-1) 771 Fexaramine 511-512 FITC, Fluorescein isothiocyanate (FITC) 446 FKBP, FK506-binding protein (FKBP) 470 FKBP12-rapamycin-associated protein (FRAP) 303, 1120 FlAsH-tetracysteine complex fluorescence anisotropy of four arsenic-sulfur bonds 446 FlAsH-tetracysteine complexes fluorescent properties, and stability of FlAsH bound to, peptide with higher affinity 434 Flavin adenine dinucleotide (FAD) 655 Flavopiridol (FLV) 100 mechanisms 100 rohitukine 100 semisynthetic 100 Fleming, Alexander lysozyme discovery 793 FLIPR, Fluorescent imaging plate reader (FLIPR) 312, 947 FLIPR duplex calcium mobilization assays 963 Flk-1, Fetal liver kinase-1 (Flk-1) 771 Flow cytometry 677 Fluorescein isothiocyanate (FITC) 446 Fluorescence activated cell sorter (FACS) 435 FRET or ReAsH fluorescence with pooling or single-cell collection options 436 Fluorescence and Electron microscopy (EM) ReAsH-mediated photoconversion diaminobenzidine, for correlated fluorescence 452 Fluorescence correlation spectroscopy (FCS) 361 Fluorescence imaging plate reader (FLIPR) 312
Fluorescence labeling 465 Fluorescence microscopy 677 Fluorescence polarization (FP) 361 Fluorescence resonance energy transfer (FRET) 291, 361, 428, 466, 511, 549, 596, 685, 871, 1132 Fluorescent carbohydrate 668 Fluorescent carbohydrate conjugates 677 Fluorescent imaging plate reader (FLIPR) 947 Fluorescent Probes 548ff Fluorescent proteins 548 Fluorescent spectroscopy 548 Fluorodeoxyglucose positron-emission tomography (FDG-PET) 304 9-fluorenylmethoxycarbonyl (Fmoc)-based SPPS 543 Fluorophore-labeled carbohydrate-binding protein 676 Fluorophores 549 biarsenical derivatives of 432 tetracysteine motifs, requiring 433 Fluorophosphonate (FP) 409, 410 Fluorous tags 485 FLV, Flavopiridol (FLV) 100 Fmoc (fluorenylmethoxycarbonyl) 671 Forward chemical genetics chemical-genetic screens overlapping distance measurements 326 computational framework chemical-genetic screens 326 Morgan and Sturtevant, legacy of 325f small-molecule probes for, biological mechanisms 348 target identification problem 319ff Fosfomycin 652, 653 FP, Fluorescence polarization (FP) 361, 409 FP, Fluorophosphonate (FP) 409, 410 Fragmentation codes 730 FRAP, FKBP12-rapamycin-associated protein (FRAP) 303, 1120 Frenolicin 525 FRET, Fluorescence resonance energy transfer (FRET) 291, 361, 428, 466, 511, 596, 871, 1132 α-L-Fucosidase (ALFUC) 369 FucT-VII, Fucosyltransferase VII (FucT-VII) 1102 Fumagillin A. fumigatus 105 drug candidate TNP-470 105
mechanism of action 106 p21CIP/WAF 106 TNP-470 106 Functional genomics central aim of 302 Functional Orthogonality 180 ligand-receptor pair modified 180 Requirement of 180 Functional proteomics activity-based probes enzyme activity profiles 408 activity-based protein profiling (ABPP) chemical ABPP probes 408 directed ABPP - probe design for enzyme classes 409 directed versus nondirected strategies 408 general strategy for 409 integrity of, enzyme active sites 408 chemical probes activity-based probes 408 chemical proteomic strategy active site-directed chemical probes 404 click chemistry-based ABPP 419 second bio-orthogonal reaction, Staudinger ligation 419 covalent inhibitors combinatorial, or nondirected strategy for ABPP 410 serine hydrolase (SH) fluorophosphonate labeling of 410 Fusion proteins CP-based labeling of 473 Future Development 222 dynamics analyzing 223 in living cells 223 total protein 223 genetics 223 FXR, Farnesoid X Receptor (FXR) 366, 511, 903
g
G protein-coupled receptor (GPCR) 312, 428, 471, 647, 796, 809, 826, 852, 933 active compounds examples of 956 applications and examples of 960ff biological expression of 960f chemical biology molecular informatics, contribution of 959f deorphanization strategies for 947ff designing compound libraries 954ff endo- 943 family A 937 family B 937 family C 937 future developments of 968ff glycoprotein hormone 937 HTS, advantages in 961ff human classification of 937 families of 935 and other genomes 943ff monoamine ligands 957 monoamine-related combinatorial library for 966ff ligand binding sites model for 951 olfactory 937 reporter gene easy-to-measure surrogate for gene product 313 signaling of 940 small molecule/peptide hormone 937 structural biology of 949ff thematic analysis 956 top selling drugs chemical structures of 934, 935 Venus flytrap module (VFTM) 937 G protein-coupled receptor 4 (GPR4) 949 G-protein-coupled receptor interacting proteins (GIPs) 943 G-protein-coupled receptor kinase (GRK) 942 G-protein transducin 941 GABAB, γ-aminobutyric acid type B (GABAB) 944 Galectin-3 bound to N-acetyllactosamine 642 structure of 642 Galectins 641ff multivalency 643 γ-aminobutyric acid type B (GABAB) 944 γ-lactone aminolysis 499 Ganesan and Doi-Takahashi procedures for enantioselective acetate aldol reactions, with aldehyde 707 syntheses of spiruchostatin A seco acids 709 Gastrointestinal (GI) absorption 1005
GE-HTS, Gene expression-based high-throughput screening (GE-HTS) 313 Gene expression selected putative target, based on differential gene expression 795 Gene expression omnibus (GEO) 1096 Gene expression profiling using microarrays new technology, history and development of 1084f Gene expression-based high-throughput screening (GE-HTS) 313 Gene family molecular targets with, chemical leads and tools 813 redundant ortholog targets 813 Gene microarrays complementary oligonucleotide hybridization inherent specificity of 405 Gene ontology (GO) 818 Gene profiling genome-wide gene expression analysis outlook of 1083 practical considerations and application to 1083ff microarray analysis data analysis, principles of 1089ff delineating of, biological pathways involved in a process 1090 pattern-recognition algorithms, identifying gene expression profiles 1091 supervised methods, using “training set” 1092 support vector machines (SVMs), use of 1092 public databases for gene expression data 1095f T-cell subsets application and practical examples of 1097ff unsupervised learning approach K-means clustering 1091 Gene profiling T helper cell differentiation Th1 and Th2 cells, developing from common precursor 1098 Gene regulation altered patterns of, protein expression 694 epigenetic mechanisms of 694ff and role of, activity enhancing accessory proteins 913f Gene regulatory networks 1046
Gene therapy targeted nuclear acid repair assay for 442 Genes chemical events regulation of 300 genes 79 Bub 78 Mad 78 Genetic approaches forward chemical genetics phenotype of interest, relies on 309 protein targets and genetic pathways, identification of 310 forward genetics classical genetic approach 309 novel gene products, identification of 309 use of, phenotype-based screening 308 forward versus reverse chemical genetics small molecules and phenotypic assays 310 new small-molecule modulator of gene product 311 reverse chemical-genetic approach for dissecting biological systems 311 reverse chemical-genetic screen starting point, protein of interest 311 reverse genetics phenotypic consequences of, mutations in known gene 309 Genetic Code Cracking 50 Expanding 50 Genetic Disease 186 Complementation/Rescue 186 compounds Computer-aided design 188 that rescue mutations 188 hormone analog specific forms 186 nuclear/steroid 186 receptors 186 hormone analogs designed 187 interface receptor-hormone 187 mutations genetic disease 186 in nuclear receptors 186
Genetic diversity chemical mutagens ethylnitrosourea capable of, inducing point mutations 318 genetic vs. chemical diversity phenotypic variation, sources of 318f Herman J. Muller heritable mutations, in Drosophila 318 Genomic age generating information and approximate upper or common mechanism curve 796 Genomic approach mRNA transcript levels, reliance on 404 Genomics unified schema (GUS) 1096 GEO, Gene expression omnibus (GEO) 1096 GFP, Green fluorescent protein (GFP) 314, 458, 612 see Green fluorescent protein (GFP) 427 GHRF, Growth hormone releasing factor (GHRF) 955 GHS, Growth hormone secretagogue (GHS) 950 GIPs, G-protein-coupled receptor interacting proteins (GIPs) 943 GITR, Glucocorticoid-induced tumor necrosis factor receptor (GITR) 1108 Global organizations CBP project scenario for 795 observation summary and future application 795f Glucocorticoid receptor (GR) 467, 902, 1122 Glucocorticoid response element (GRE) 913 Glucocorticoid-induced tumor necrosis factor receptor (GITR) 1108 Glucose signaling 505ff Glutathione S-transferase (GST) 446 Glutathione S-transferase fusion protein 859 GlyCAM-1 551 Glycan biosynthesis inhibitors of 651 Glycine 554 Glycoarrays 636 Glycobiology tools for 674 Glycocalix 669
Glycoconjugate biosynthesis 635 importance of 649 Glycoconjugates 636, 658, 668, 669 N-linked 649 Glycogen Synthase Kinase-3β (GSK-3β) 509 Glycomimetics 641, 647 carbohydrate-derived 639 strategies for 640 Glycoprotein microarrays 676 p-Glycoprotein protein (pgp)-1 714 Glycoproteins Hedgehog 937 Wnt 937 Glycosidic linkages 639 Glycosyl phosphate monomers 671 Glycosyl phosphates 671 Glycosyl trichloroacetimidates 671 Glycosylating agents 671 Glycosylation 550 Glycosylation reactions 671 Glycosylphosphatidylinositols (GPI) 678 Glycosyltransferase loss of 635 Glycosyltransferases 668 GO, Gene ontology (GO) 818 Golgi-ER 85 dynamic nature of 87 in vitro 87 transport 87 GPCR, G-protein coupled receptor (GPCR) 312, 428, 471, 796, 809, 826, 852, 933 GPR4, G protein-coupled receptor 4 (GPR4) 949 GR, Glucocorticoid receptor (GR) 467, 902, 1122 Grave’s disease 969 GRE, Glucocorticoid response element (GRE) 913 Green fluorescent protein (GFP) 314, 427, 458, 548, 612 FRET sensors of biochemical pathways replacing CFP with FlAsH 440f relative sizes of and biarsenical-tetracysteine complex 428 GRK, G-protein-coupled receptor kinase (GRK) 942 Growth hormone releasing factor (GHRF) 955 Growth hormone secretagogue (GHS) 950 GSK-3β, Glycogen Synthase Kinase-3β (GSK-3β) 509
GST, Glutathione S-transferase (GST) 446, 859 GTPases to XTPases 128 mutation aspartate to the asparagine 129 D138N 130 nucleotides radiolabeled 130 orthogonal nucleotide specificity 129 translation experiments in vitro 129 Guanidinoglycosides 681 GUS, Genomics unified schema (GUS) 1096
h
H1 histamine receptor 778 Halobacterium halobium 941 HATs, Histone acetyltransferases (HATs) 694 HDAC, Histone deacetylase (HDAC) 505, 693f, 914, 1131 Heat shock proteins (hsps) 896 Hedgehog signaling pathway 509 HeLa cells FlAsH fluorescence specificity of FlAsH staining 444 turnover of, Connexin43 in gap junctions two-color pulse chase 443f Helical Mimetics 260 α-helix mimetics BH3 domain 261 of the Bak protein 261 assay fluorescence polarization 261 that Disrupt the Bcl-xL/Bak Interaction 260 HEK293 cells 262 pathway apoptotic 261 blocking 261 protein p53 263 tumor suppressor 263 protein surface shallow cleft 261 scaffold synthetic agents 260 terphenyl 260 secondary structures α-helical 260 side chains alkyl or aryl 260 terphenyl derivatives cylindrical shape 262 with side chains 261 staggered conformation 262 structural mimetics 261 synthetic inhibitor 261 Terphenyl-based 260
Heparin 681, 683 Heparin-protein interactions 684 Hepatocyte nuclear factors 4 (HNF4s) 906 hERG, Human Ether-a-Go-Go-Related Gene (hERG) 1005 Hetero-oligomers 981 Heterodimerization 230, 949 Ligand-Protein Pairs 231 rapamycin heterodimerizer 230 Heterodimerizers 233 bump-hole solutions 234 Bumped 233 Ma-rap in vivo 235 preclude 235 Rapalogs 233 C16-substituted 234 rapamycin C16 methoxy 234 C20-methallyl 234 Heterodimers 944, 948 HF, Hydrofluoric acid (HF) 569 Hidden Markov Model (HMM) 959 High performance liquid chromatography (HPLC) 369, 569 orexin-A and orexin-B, existence of 369 High-throughput screening (HTS) 355, 484, 724, 760, 933, 947, 1003 Histacin 505, 508f Histone acetyltransferases (HATs) 694 Histone deacetylase (HDAC) 96, 505, 693f, 694, 914, 1131 Apicidin 98 Inhibitors 96, 508 Modifications 96 Trapoxin 98 Trichostatin A (TSA) 97 Historical Periods 12 advancements discontinuities 12 of Chemical Synthesis 12 first phase 12 pre-Woodwardian 12
scientific technological 12 Woodwardian 14 HIV, Human immunodeficiency virus (HIV) 583 HIV Protease (HIV PR) 116 drugs indinavir 116 nelfinavir 116 Inhibition 116 mutants HIV PR 118 inhibitor resistant 118 V82A 118 mutation alanine-to-valine 118 coevolve 119 in the enzyme 119 at the NC-p1 cleavage site 118 at P2 118 in the substrate 119 Substrate Selectivity 116 HIV-1, Human immunodeficiency virus type 1 (HIV-1) 445 HIV-1 matrix protein synthesis with an N-terminal myristoyl 584 HMM, Hidden Markov Model (HMM) 959 HNF4s, Hepatocyte nuclear factors 4 (HNF4s) 906 HOBT, Hydroxybenzotriazole (HOBT) 595 Homer scaffolding proteins 969 Homo-oligomers 981 Homodimerization 229 clustering order 230 FK1012 design 230 Heterodimerization 230 Homodimerizers 233 AP1903 in vivo studies 233 affinity 233 selectivity 233 Bumped 233 Homogeneous time resolved fluorescence (HTRF) 361 Hopkins and Groom Investigational Drugs Database and Pharma Projects database 399 nonredundant molecular targets, identification of 810 identification of, 399 nonredundant molecular targets 809
potentially druggable proteins in druggable gene families 809 HPLC, High performance liquid chromatography (HPLC) 369, 434, 569, 954 hsps, Heat shock proteins (hsps) 896 HTRF, Homogeneous time resolved fluorescence (HTRF) 361 HTS, High-throughput screening (HTS) 355, 484, 933, 947, 1003 Human enzymes human histone deacetylase (HDAC) inhibitors depsipeptide HDAC inhibitors 703f Human Ether-a-Go-Go-Related Gene (hERG) 1005 Human genome computer-aided drug design methods docking compounds into binding pockets 368 deorphanizing receptors by reverse pharmacology 369f finished euchromatic sequence of 1084 high-throughput synthesis and screening, and structure-driven drug design 825 Hopkins and Groom druggable target, estimating size of 808 isotype-selective small molecule probes computational design of 367ff isotype-selective probes for ERα and ERβ 368 methodologies and approaches for druggable portions of targets 808 orphan nuclear receptors isotype-selective small molecule probes for 366f reverse chemical genetics sequencing of 378 reverse pharmacology strategy of 370 selective tool compounds for farnesoid X receptor 367 sequencing of 825ff sequencing of, protein kinases 853 target families, drug candidates of 827 target validation pharmacological approach of 376ff Human histone deacetylase (HDAC) 693 depsipeptide HDAC inhibitors Evans’ chiral auxiliary, with chloroacetate 705
Human histone deacetylase (HDAC) (continued) drug discovery targets class I and class II HDACs 697f HDAC inhibitors, in infectious diseases 698 investigations into, HDAC inhibitors 698 small molecule HDAC inhibitors, study of 697 function in, eukaryotic cell regulation 693 growing set of, therapeutic indications 693 histone acetylation immunoblotting analysis 716 in spiruchostatin A-, or TSA-treated cells 715 induction of, pgp-1 RNA expression and expression of pgp-1 RNA, analyzed using Q-RT-PCR 715 natural product, bicyclic depsipeptide family of 693 natural products, FK228 in advanced clinical trials for cancer 693 Parkinson’s and Huntington’s disease HDAC inhibitors for, neurodegenerative ailment treatment 698 transient histone acetylation associated with, “pulse” treatment of cells 716 Human histone deacetylase (HDAC) inhibitors bicyclic depsipeptide HDAC inhibitors 703 depsipeptide HDAC inhibitors Simon’s aldol reaction 706 Wentworth-Janda synthesis 705 HDAC inhibitors, third family of cyclic tetrapeptide natural products 701 hydroxamic acids excellent metal-binding chelators 700 lead small molecule inhibitors of zinc-dependent class I and class II HDACs 698ff peptide synthesis and formation of seco-hydroxy acid 706ff
relative expression levels of by qPCR in series of, cancer cell lines 702 selectivity in, classical metal-binding HDAC inhibitor 703 sequence homology between mammalian HDACs, and bacterial HDAC-like protein (HDLP) 700 simplest HDAC inhibitors in clinical trials, anticancer agents 701 short chain carboxylic acids 700 total synthesis of depsipeptide HDAC inhibitors - routes to, β-hydroxy acid fragment 704ff X-ray structure of, bacterial histone deacetylase-like protein homologous to human class I HDACs 699 Human immunodeficiency virus (HIV) 583 Human immunodeficiency virus type 1 (HIV-1) 445 synthesis, intracellular site of probing of 445 Human nuclear receptor superfamily classic RXR-heterodimer receptors thyroid hormone receptor (TR) 898 classical receptors to more recently discovered family members 900 ligands and therapeutic utilities, examples of 897 role in, neuronal development (COUP-TFI) and vascular development (COUP-TFII) 899 Huuskonen aqueous solubility dataset 1026 Huuskonen dataset 1023, 1037 HxBP, Hydroxamate-benzophenone (HxBP) 420 Hybrid carbohydrate 676 Hydrofluoric acid (HF) 569 Hydrogen-suppressed molecular graphs 727 Hydrophobic descriptors 1026 Hydroxamate-benzophenone (HxBP) 420 Hydroxybenzotriazole (HOBT) 595 Hypothesis generation 724 Hypothesis testing 724
I
ICAT, Isotope-coded affinity tagging (ICAT) 406 ICOS, Inducible costimulator (ICOS) 1109 IDDM, Insulin-dependent diabetes mellitus (IDDM) 1097 IFN-γ, Interferon-γ (IFN-γ) 1097 IL, Interleukin (IL) 1097 IL-2, Interleukin 2 (IL-2) 1063 IL-8, Interleukin-8 (IL-8) 582 Immune dysregulation, polyendocrinopathy, enteropathy, X-linked (IPEX) 1107 Immunological response 668 Immunology regulatory CD4+ CD25+ T lymphocytes by gene expression profiling 1106ff T-cell subsets Rudensky laboratory findings 1107 T-cell subsets, overview of by gene expression profiling 1106 Immunosuppressant 106 Cyclosporin A (CsA) and FK506 107 pathways signal transduction 107 in T lymphocytes 107 Rapamycin 108 IMPACT (intein-mediated purification with an affinity chitin binding tag) system 544, 545 in the synthesis of 16 estrone 17 Inducible costimulator (ICOS) 1109 Inflammatory diseases transcriptional networks in gene profiling of T-cell subsets 1097 Informatic tools development of 1009 Inhibitory switch (IS) 855 Inpharmatica’s Drugstore relational database FDA approved drugs 811 Inpharmatica’s Drugstore database predicting druggability on, protein drug targets 817 Inpharmatica’s StARLITe database gene family distribution of, human proteins with small-molecule chemical leads 814 Insulin receptor kinase (IRK) 397, 855 Insulin receptor kinase (IRK) inhibitors 398 Insulin receptor tyrosine kinase 399
Insulin-dependent diabetes mellitus (IDDM) 1097 Intein 540 Interferon-γ (IFN-γ) 1097 Interleukin (IL) 1097 Interleukin 2 (IL-2) 1063 Interleukin-8 (IL-8) 582 International Union of Pure and Applied Chemistry (IUPAC) 770 Intestinal drug absorption factors influencing 1008 fraction absorbed 1021f in silico models 1026ff permeability 1020f vs. human fraction absorbed 1032 in silico models 1021, 1026ff prediction of physiological factors and experimental parameters influencing 1018ff solubility 1018ff in silico models 1020, 1022ff salting-in effect 1020 Intestinal permeability 1007f IPEX, Immune dysregulation, polyendocrinopathy, enteropathy, X-linked (IPEX) 1107 IRK, Insulin receptor kinase 385, 855 IS, Inhibitory switch (IS) 855 Isotope-coded affinity tagging (ICAT) 406 Isotopes stable 555
J James Black alkyl-substituted histamine analogs beta-blockers, development of 359 Janus kinase-signal transduction and activator of transcription (JAK-STAT) pathway 1046, 1049 JIA, Juvenile idiopathic arthritis (JIA) 1102 Joshua Lederberg genetic recombination discovery of 300 Journal of Medicinal Chemistry (JMC) 761 Jurkat cell surfaces chemospecific labeling of 618 Juvenile idiopathic arthritis (JIA) 1102
K κ Opioid Receptor (KOR) 365 Kaposi’s sarcomagenesis 947 Kenograms 727
Index Ketones and azides I L-type Calcium Channel Signaling unnatural functional groups through posttranslational modification 130 614 assay radioligand-binding 132 Ketoreductase (KR) 522 calcium channel Ketosynthase (KS) 520 DHP-resistant 133 Kinase amendable kinases to, NMR-guided dmg L-type 133 T1006Y mutant 133 discovery 852 calcium channels cancer patients antineoplastic drugs 122 Voltage-gated 131 calcium signal as drug targets 856 imatinib targets act locally 131 Bcr-Abl 123 chimeric channels 132 c-Abl 123 photoaffinity labels 132 c-Kit 123 Resistance Mutations 130 kinases 123 single protein PDGFR 123 uniquely resistant to a general inhibitor inhibitor 131 Lactacystin 101 BAY43- 9006,125 Bcr-Abl tyrosine kinase 123 a,B-Epoxyketones 102 analog 101 imatinib 123 nonspecific of (VEGFR) 125 Inhibitors 122 inhibitor 101 ligand binding TMC-95A 103 Lag-3, Lymphocyte activation gene-3 (Lag-3) binding mechanisms by lineshape analysis 874f 1108 mechanism LBD, Ligand-binding domain (LBD) 366, imatinib resistance 123 559,892,1122 mutation LC-MS, Liquid chromatography-mass spectrometry ( L C - M S ) 408 control ligand selectivity 124 Le” - Ley nonasaccharide 672 T315I 124 Ley-Le” nonasaccharide 671 Philadelphia chromosome 123 Lead identification (LI) 795 protein NMR spectroscopy 856ff, Leptomicin B 1056 875 Bruton’s Tyrosine Kinase (BTK) Lessons 858 From 55 Patchouli Alcohol 55 Resistance 122 single kinase Published Total Syntheses 55 Quinine 56 cancers 125 catalytic activity of 125 Lewis antigens 671 tumour-specific kinase inhibitors dimeric combinations of 671 Lewis hexasaccharide 672 cancer patients, therapeutic opportunities for 852 Lewis X pentasaccharide 671,672 Kinase CBP Lewis Y hexasaccharide 671 establishment of, core panel kinases LI, Lead identijcation ( L r ) 795 799 Library synthesis kinase insert domain-containing receptor guidelines for 493 (KDR) 771 Ligand Kinase-substrate interactions 388 binding energy potential of 806 physicochemical characteristics 
of, KOR, K Opioid Receptor ( K O R ) 365 binding site 806 KR, Ketoreductase ( K R ) 522 small molecule ligand-binding sites KS, Ketosynthase ( K S ) 520 808
thermodynamic argument thermodynamics and selection pressure, for ligand interactions 806 Ligand binding ER ligand discovery ER-directed drug discovery 918 ER-selectivemolecule 918 ligand on N R LBD conformation, influence of 909ff LXRB LBD structure and features of 909 multitude of, ligand-induced N R actions 913ff Ligand Selectivelyof Ion Channels 130 Capsaicin 133 Engineering 130 L-type Calcium Channel Signaling 130 Ligand-binding domain (LBD) 366, 559, 892,1122 Ligand-binding Pocket de novo binding sites 189 De Novo Design 188 into proteins 188 zinc finger domains inducible 189 Ligand-binding Pockets 188 Ligand-dependent Activators 177 Exploiting 177 Prokaryotic 177 receptors quorum-sensing 177 Ligand-Protein Pairs 231 Bumps and Holes 231 modified ligand 231 steric clash 231 Heterodimerizers 233 Homodimerizers 233 Refining 231 Ligand-receptor interactions molecular modeling of 949ff Ligation sequential 545 single 545 strategies of 547f Ligation reaction 546 Light-activated Gene Expression 189 cell cultured 190 monolayer 190 duration of reporter gene response 190 from Small Molecules 189
light-activated transcription 189 translation 189 nuclear receptor agonists photocaging 190 small molecules gene expression 190 photocaged 189 Line notations 730 Lipinski Dement World Drug Index concept of, physicochemical property limits to drugs 805 Lipinski’s rule-of-five 805 “rule-of-five” (Ro5) 766 commonly used guidelines of 826 Lipophilicity 1026 Liquid chromatography-mass spectrometry (LC-MS) 408 Low-molecular-weightcompounds synthesis of 99Gff LXRs, Liver X receptors (LXRs) 905 Lymphocyte activation gene-3 (Lag-3) 1108 Lymphocytes 681 Lysine residues modification through, reductive alkylation 595 Lysozyme 385 m
M3H, Mammalian 3H (M3H) 1132 mAb, Monoclonal antibody (mAb) 337 MAGE-ML, Microarray gene expression markup language (MAGE-ML) 1094 Magnetic resonance imaging (MRI) 438 Major histocompatibility complex (MHC) 1098 MALDI, Matrix assisted laser desorption/ionization spectrometry (MALDI) 569 Maltose binding protein (MBP) 558 Mammalian 3H (M3H) 1132 Mammalian protein-protein interaction trap (MAPPIT) 1132 Mammalian small molecule-protein interaction trap (MASPIT) 1133 Mammalian target of rapamycin (mTOR) 303 Mannich reaction not targeting cysteine, or lysine residues 601
Mannose-binding bacteria 685 Mannose-binding proteins (MBPs) 643 MAP, Mitogen-activatedprotein 861, 943, 1073 MAP kinase activation 393 MAP, Multiantigenicpeptide ( M A P ) 585, 861,943,1073 MAPKAP-2, Mitogen-activated protein kinase-activated protein kinase-Z(MAPKAP-2) 859 MAPPIT, Mammalian protein-protein interaction trap ( M A P P I T ) 1132 MASPIT, Mammalian small molecule-protein interaction trap ( M A S P I T ) 1133 Mass spectrometry (MS) 405 Mathematical biology 1048 Mathematical modeling 1045 Mathematical models in silico biology 1047 Matrix assisted laser desorption/ionization spectrometry (MALDI) 569 Matrix metalloproteases (MMPs) 420, 1105 Maximum recommended therapeutic dose (MRTD) 776 MBP, Maltose binding protein (MBP) 558 MBPs, Mannose-binding proteins (MBPs) 643 MC4, Melanocortin-4 ( M C 4 ) 950 MCF7 cells 771 MCH2, Melanin-concentrating hormone subtype 2 ( M C H 2 ) 943 MDL Drug Data Report (MDDR) 760 Mechanisms of action (MoA) 1119 Medicinal chemistry ligand-NR recognition structure of, GR LBD and ligand binding features 904 ligand-NR recognition, basic principles of 903ff RXR-heterodimer receptors PPARs, RXR, LXR, FXR 905ff small-molecule modulator biological target of interest 804 steroid and RXR-heterodimer receptors “orphan” receptors 906ff steroid receptors ligand-binding pockets of 903ff Melanin-concentrating hormone subtype 2 (MCH2) 943 Melanine stimulating factor (MSF) 955 Melanocortin-4 (MC4) 950 Melanopsin 944
Melatonin pineal gland biosynthesis of 394 Members of Later Generations 24 Desogestrel 24 Drospirenone 25 exogenous gestagen new 24 Gestoden 24 norethindrone 28 trial and error approach 24 Members of the First Generation 22 Norethindrone from estrone-methylether by partial synthesis 22 gestagenic component 22 Members of the Second Generation 23 ethyl group in C(13) 23 gestagen (-)-norgestrel31b 23 total synthesis 23 Mendel, Gregor discovery of “heritable factors” 300 genetic maps law of independent assortment 326 2-Mercaptoethane sulfonate (MES) 434 2-Mercaptoethansulfonic Acid (MESNA) 545 2-(2-(2-Mercaptoethoxy)ethoxy)ethanol 674,675 Merrifield’sresin 671 MES, 2-Mercaptoethane sulfonate ( M E S ) 434 Messenger Ribonucleic Acid (mRNA) 299 Metabolic pathways amplified sensitivity to stimulus enzyme-mediated covalent modifications 1073 enzymejsubstrate compartmentalization, effects of 1073 Metabolic systems 1046 connectivity theorems 1046 control theory for 1046 robustness of 1046 summation 1046 Metabotropic Glutamate Receptor (mGluR) 935 Metalloproteases (MPs) 419 activity-based probes for proteomic profiling of 419f Metastasis 668 Methotrexate (MTX) 460,1123 Methylene 389 MFCs, MTX-fision compounds (MFCs) 1123
MGED, Microarray gene expression data (MGED) 1094 mGluR, Metabotropic Glutamate Receptor (mGluR) 935 MHC, Major histocompatibility complex ( M H C ) 1098 MIAME, Minimum information about a microarray experiment ( M I A M E ) 1094 Microarray data MGED Ontology MGED guidelines, compliance with 1094 standard terms for, annotation of microarray experiments 1094 Nature and Cell requiring authors to submit microarray data, for public repository 1094 Microarray data analysis mathematicians generating, dedicated algorithms and tools 1084 Microarray experiments context-dependent standardization toward 1094f experimental designs gene expression levels, estimation of 1085 loop design, of Kerr and Churchill 1085 reference sample 1085ff use of, common reference sample 1085 gene expression interplatform comparison of results 1091ff Microarray gene expression data (MGED) 1094 Microarray gene expression markup language (MAGE-ML) 1094 Microarray technology transcriptome (cDNA sequences) knowledge of 1084 Microarrays 668 and binding events 674 ordered array of DNA sequences technology revealing, physiology of cells and tissues 1083 Microsequencing of small peptide 941 Microsphere arrays 676 Mimetics 250 anchor low-affinity 265
antagonists potency 265 Applications 255 complexation receptor-ligand 265 drug design computer-aided 264 structure-based 253, 264 hotspot 251 interactions protein-peptide 254 protein-protein 254 thermodynamic 254 interface barnase-barstar 254 protein-protein 254 interfaces analysis 255 interfacial residues 252 as Modulators of Protein-Protein 250 nonpeptide agents 252 protein clefts or cavities 250 Protein Secondary Structure 250 as Protein-Ligand Interactions 250 protein-protein association 253 disrupters 253 mechanism 254 screening methods mass spectrometry 264 N M R 264 small molecule 250 small molecules druglike 251 structural mimetics of @-helices 251 B-turns 251 strands 251 synthetic agents in drug discovery 250 synthetic inhibitors 251 Mineralocorticoid receptor (MR) 903 Minimum information about a microarray experiment (MIAME) 1094 Mitogen-activated protein (MAP) 861, 943,1073 linear picture of signal transmission 1073 Mitogen-activated protein kinase-activated protein kinase-2 (MAPKAP-2) 859 Mitogen-activated protein (MAP)-kinase pathways 1046 Mixture synthesis 488f
MLR, Multiple linear regression (MLR) 1011 MMPs, Matrix metalloproteases (MMPs) 420, 1105 MoA, Mechanisms of action (MoA) 1119 MOBILE, Modeling binding sites including ligand information explicitly (MOBILE) 952 Molecular biology new techniques emergence of 360 Molecular cloning 935, 941 Molecular connection table 730 Molecular encoding molecular tags 33 Molecular genetics biological systems, understanding of 300 Molecular graph 727 types of 727 Molecular information systems 959 Molecular Libraries Initiative (MLI) 760 Molecular mechanisms chemical-genomic profiling 340ff small-molecule perturbagens (SMPs) 344 WT strain of the budding yeast 342 mitosis and spindle assembly 336ff chemical-genetic screens for, inhibitors of mitosis 336 molecular toolbox intracellular protein acetylation 338ff, 343 selective inhibitors of, α-tubulin (tubacin) and histone deacetylation 342 Molecular properties for solubility and permeability 1006 Molecules assessing druglike properties 806 quantitative approach "rule-of-five" index 807 assessing druglike properties of 806 Monoclonal antibody (mAb) 337 Monomeric red-fluorescent protein (mRFP) ReAsH-mediated CALI of Connexin43 and L-type calcium channels 450f Monomeric sugar mimics use of 639 MPs, Metalloproteases (MPs) 419 MR, Mineralocorticoid receptor (MR) 903 MRI, Magnetic resonance imaging (MRI) 438
mRNA, Messenger Ribonucleic Acid (mRNA) 299 MRTD, Maximum recommended therapeutic dose (MRTD) 776 MS, Mass spectrometry (MS) 405 MSF, Melanine stimulating factor (MSF) 955 mTOR, Mammalian target of rapamycin (mTOR) 303 MTX, Methotrexate (MTX) 460, 1123 MTX-fusion compounds (MFCs) 1123 hybrid ligand DBD-fusion protein and AD-fusion protein, associating with 1124 MudPIT, Multidimensional protein identification technology (MudPIT) 406 Multiantigenic peptide (MAP) 585 Multicomponent 28 asthmatic controlling 29 inflammation 29 Dynamic Variation 34 focused variation cluster of points 31 combinatorial approach 31 natural products 29 non-natural ligands action on the immune system 30 collection of 30 synthesized independently 30 signal carriers cascade of 29 immunosuppressants 29 initiated by allergens 29 T-cell overproduction 29 signaling pathways pharmacological treatment 29 Simultaneous Procedure 28 Static Variation 31 variant collective screening 28 population 28 restricted 28 Multidimensional protein identification technology (MudPIT) 406 Multiple linear regression (MLR) 1011, 1036 Multiresidue Protein Caging 150 dynamics in actin filament 151 local perturbation 151 G-actin conjugates 151 o-nitrobenzyl group toward specific residues 150
Multiscaffold libraries early efforts toward 495 MurA 651 MurB inhibitors 656 MurG inhibitors 653 Mutagenesis site-directed 567,988 Mutagenic analysis 386 Mutant bacteria 685 Mutant inteins 542 Mutants classes of 389 mutation 118 Mutation genetics forward chemical genetics 356 phenotypes or biomarkers 356 Mycobacterial cell wall components of 651 n N-hydroxy succinimidyl ester (NHS) 453 N-myristoylated HIV-1 matrix protein synthesis from three peptide segments 583f N-terminal Cys 387 N-terminal cysteine alternative to 546 N-terminal cysteine residues protecting groups for 546 Na+/H+ Exchanger Regulatory Factor (NHERF) 943 NAD+, Nicotinamide adenine dinucleotide ( N A D + ) 696 Narcolepsy orexin sleep and wakefulness, regulation of 370 National center for biotechnology information (NCBI) 1096 Native chemical ligation 387 auxillary mediated 577 to yield noncysteine ligation products 577 Native chemical ligation (NCL) 540, 601 mechanism of 541 protein a-thioesters 542 for protein semisynthesis 540 Native peptide bonds chemoselective ligation to form 574ff Natural amino acids new bioconjugation methods targeting of 597ff Natural Killer (NK) 370, 1104
Natural product-like libraries 497ff Natural Products 95 bioassay screening cell-based 109 natural products 109 cell systems model 96 perturbing 96 chemical genetics 95 protein inhibit 95 knockout 95 Small molecules conditional alleles 95 to Unravel Biological Mechanisms 71 to Unravel Cell Biology 95 NBEs, New biological entities (NBEs) 811 NCBI, National centerfor biotechnology information ( N C B I ) 1096 NCEs, New chemical entities (NCEs) 811 NCL, Native chemical ligation ( N C L ) 601 NCoR, Nuclear receptor corepressor (NCoR) 914 Nerve growth factor-induced B (NGFIB) 906 Nestler, Hans Peter chemical biology Book of Knowledge recommendations from 800 Network connectivity FOXOla nuclear export nucleocytoplasmic transport 324 small-molecule probes relationship between 323ff Neural networks (NNs) 1013,1037 backpropagation 1013 Neurons glutamate receptors activity dependant turnover and trafficking of 443ff Neuropeptide Y (NPY) 955 Neuropilin-1 (Nrpl) 1108 New biological entities (NBEs) 811 New chemical entities (NCEs) 811 New Ligand Specificities 179 bump and hole 179 chemical inducers of dimerization (CID) 179 Engineering 179 intoNHRs 179 New molecular entities (NMEs) 811 NF-KB, Nuclearfactor kappa B ( N F - K B ) 895 NF-AT, Nuclearfactor ofactivated T cell (NF-AT) 304
NGFIB, Nerve growth factor-induced B
(NGFIB) 906 NHERF, Na+/H+Exchanger Regulatory Factor ( N H E R F ) 943 NHRs 185 actions of NHRs extranuclear 185 nongenomic 185 Chemical Biology 185 pathways cellular signaling 186 Vitamin D analogs 186 NHS, N-hydro? succinimidyl ester ( N H S ) 453 Niacin 949 Nicotinamide adenine dinucleotide (NAD+) 696 Nitric oxide (NO) 373 Nitrilotriacetate (NTA) 471 2-nitrobenzyl 141 kinetics of muscle contraction 141 Nitrobenzyl and Nitrophenyl 140 o-nitrobenzyl 141 2-nitrobenzyl 141 applications invivo 145 cage coumarin-based 146 peptides 146 proteins 146 derivatives alcohol 141 aldehyde 141 electron-donating groups to the aromatic moiety 143 formation of diastereomers 144 isomeric nitroaromatic 145 photo-by-product 145 protecting groups photolabile 144 o-nitrobenzyl 141 effect of electronic nature 144 release kinetics 143 Nitrocellulose coated slides 676 NK, Natural Killer ( N K ) 370, 1104 NMEs, New molecular entities (NMEs) 811 NMR investigations kinases protein-based results of 867ff
statistics of amino acids 869 ribbon representation of, protein kinase PKA p38 MAP kinase, and N-lobe, C-lobe, ATP-binding site 869 NMR methods activation and substrate binding protein phosphorylation 873 eight kinase-targeted oncology drugs 852 kinases activation and substrate binding 871ff kinases, chemical biology outlook 852 NMR-based screening trials 852 applicable tool (LIGDOCK) 852 protein kinases structure-guided drug design 852ff NMR, Nuclear magnetic resonance ( N M R ) 362,583,808,954,990 NMR spectroscopy chemical biology of kinases, studies of 852ff fragment approach fragment linking, building scaffolds of complex compound 877ff fragment-based hits M detected NMR fragment approach 880 NMR-basedfragment approach 881 fragment-based hits, strategy of 879ff kinases NMR-based screening 876,877 screening techniques/strategies 875 titrations curves, indicating different binding mechanisms 875 kinases, screening of 875ff, 882 ligand-detected NMR screening NMR reporter screening 878f NNs, Neural networks ( N N s ) 1013, 1037 NO, Nitric oxide ( N O ) 373 Nonlinear protein structures synthesis of 584ff nonpolar surface area (NPSA) 766,1027 nonribosomal peptide synthesis 471 nonribosomal peptide synthetase (NRPS) 522 Nonsteroidal anti-inflammatory drugs (NSAIDs) 792 Noonan syndrome 391 Novartis TAM combinatorial libraries prototype structures of 967
NPSA, Nonpolar surface area (NPSA) 1027 NPY, Neuropeptide Y (NPY) 955 NR Chemical biology human NRs structural class 923 NR modulation concept of 919f NR, Nuclear hormone receptor (NR) 891 NR research and drug discovery new approaches to 920ff microarray technology 921 Nrp1, Neuropilin-1 (Nrp1) 1108 NRPS, Nonribosomal peptide synthesis (NRPS) 471, 522 NSAIDs, Nonsteroidal anti-inflammatory drugs (NSAIDs) 792 NTA, Nitrilotriacetate (NTA) 471 Nuclear factor kappa B (NF-κB) 895 Nuclear factor of activated T cell (NF-AT) 304 Nuclear hormone receptor (NR) 891 nonnuclear functions and interactions, with other cellular proteins 915 NR drugs and novel drug candidates examples of 916ff NR genes, identification in humans 891 Nuclear magnetic resonance (NMR) 362, 583, 808, 954, 990 Nuclear receptor corepressor (NCoR) 914 Nuclear Receptor Engineering 183 by Selection 183 NHR mutants screening 183 selecting 183 selectivities 184 Nucleic acid-nucleic acid interactions 669 Nucleophilic groups ketone functionalization through hydrazone and oxime formation 616 Nucleotide-binding site 396 Nucleotide-sugar substrates 649 O
OGR1, Ovarian cancer G protein-coupled receptor 1 (OGR1) 949 OGW, Ontology working group (OGW) 1094 Olfactory receptor genes 944 Olfactory receptors 944
Oligomerization of GPCRs 954 Oligomers 981 Oligonucleotides 567 Oligosaccharide conjugate vaccines malaria and HIV 677 Oligosaccharide sequencing 669 Oligosaccharides 550, 636, 637, 669 automated assembly of 670 chain length of 669 Oncostatin M (OSM) 1101 One-pot EPL reactions 548 Ontology working group (OGW) 1094 Open reading frame (ORF) 1126 Opsins 937, 944 Oral Contraceptives 21 estrogenic 19-nor-steroid Binding of a gestagen 22 hand-and-glove metaphor 22 Members of Later Generations 24 Members of the First Generation 22 Members of the Second Generation 23 ORF, Open reading frame (ORF) 1126 Organic chemistry synthetic organic chemistry strategies for, construction of complex natural products 593 Organic solvent auxiliary mediated segment condensation 571 Organic synthesis sophisticated tools of 567 Orphan receptors 949 OSM, Oncostatin M (OSM) 1101 Ovarian cancer G protein-coupled receptor 1 (OGR1) 949 Oxidative coupling reactions, aniline functionalization 623ff Oxocarbenium ions 638 Oxyethanethiol group 546
P P-selectin potent inhibitor of 647 p21-activated protein kinase 1 (PAK1) 855 p53-hdm2 interaction inhibitors of 991ff biological background of 991 interface, characterization of 992f pharmacophore model, establishment and validation of 993ff P450 datasets 1034
PAGE, Polyacrylamide gel electrophoresis (PAGE) 447 PAI-1, Plasminogen activator inhibitor (PAI-1) 704 PAK1, p21-activated protein kinase 1 (PAK1) 855 Pancreatic trypsin inhibitor 539 Parallel synthesis 489 Parathyroid Hormone/Parathyroid Hormone Related Protein (PTH/PTHrP) 942 Parthenolide 109 Feverfew 109 nuclear translocation NF-κB 109 phosphorylation IκB 109 Partial least squares (PLS) 1011, 1036 Partitioned total surface areas (PTSAs) 1027 Patchouli Alcohol accepted X-ray 55 proof of structure total synthesis 55 Structural Proof 55 structure wrong 55 Synthetic Lesson 55 Trouble with 55 Patient population target validation proof of principle, in phase IIa clinical trials 791 PCA, Principal component analysis (PCA) 333, 501 PCAs, Protein fragment complementation assays (PCAs) 1132 PCP, Peptidyl carrier protein (PCP) 472, 522, 615 PCR, Polymerase chain reaction (PCR) 405, 436, 941, 1086 PDB, Protein Data Bank (PDB) 949 PDE, Phosphodiesterases (PDE) 374, 1131 PDGF, Platelet-derived growth factor (PDGF) 1065 PEG, Poly(ethylene glycol) (PEG) 607, 1126 PEP, Phosphoenolpyruvate (PEP) 651 Peptide optimal peptides library approach 435 Peptide α-thioesters 543 Peptide binding 953
Peptide carrier protein (PCP) 615 Peptide moiety-kinase interaction 399 Peptide nucleic acid (PNA) 272, 576 Peptide thioesters production of 543 solid-phase peptide synthesis 543 tent-botylmethoxycarbonyl (Boc)-based peptide synthesis 543 Peptides 567, 989 C-terminal thioester synthesis of 579f C-terminally modified solid phase synthesis of 579 synthesis of 578-579 chemical synthesis of 568 fragment condensation of 570 thioester method for 570 fully unprotected 572 intermolecular linking of 571 N-alkyl 568 N-terminal modification of 578 solid phase synthesis of 578 N-terminally functionalized synthesis of 578 partially protected 570f coupling of 571 synthesis of 988 unprotected chemoselective ligation of 572ff hydrazone ligation in aqueous solution 572 thioester ligation in aqueous solution 573 Peptidoglycan 650 synthesis 652 Peptidyl carrier protein (PCP) 472, 522 Peropsin 944 Peroxisome proliferator activated receptor gamma (PPARy) 902 PET, Positron emission tomography (PET) 438 PGIS, Prostacyclin synthase ( P G I S ) 369 pgp-1,p-Glycoprotein protein (pgpj-l 714 Pharmaceutical industry medicinal chemists screening campaigns for 804 Pharmaceutical research combination strategy of, ligand-detected and protein-detected NMR 880 fragment-based NMR approach Jun N-terminal Kinase 3 (JNK3) 881
Pharmacological literature Drews identication of, 483 known drug targets 809 ligand-binding domains, estimation of 809 Phenol sulphuric acid test 685 Phenylalanine phosphonates 390 Pheromone receptors 944 PhK, Phosphorylase kinase ( P h K ) 871 Phosphatidylinositol-3-OH kinase (PI3K) 915 Phosphodiesterases (PDE) 374, 1131 Phosphoenolpyruvate (PEP) 651 Phosphoinositide (PI) 1067 Phospholamban pentamer biarsenical-tetracysteine complex structure of 447 Phospholipase C (PLC) 1067 Phospholipase Cp (PLCB) 947 Phosphonates 389 Phosphonomethylene alanine (Pma) 390 Phosphonomethylene phenylalanine (Pmp) 390 Phosphonomethylphenylalanine (Pmp) 995 Phosphopantetheine transferase (PPTase) 463 Phosphorylase kinase (PhK) 871 Phosphorylated STAT-5 in cytoplasm 1051 Phosphorylation Sites and Phosphopeptides 165 cage to the phosphate 166 Caged 165 caged phosphoserine containing phosphopeptides 166 efficiency of photoactivation 166 peptide probe activity 165 monitors protein kinase C 165 photoactivatable fluorescent 165 Ser-caged 165 phosphoproteins on the phosphate moiety 167 with cages 167 phosphoserine 2-nitrophenylethyl-caged 166 tripeptide N-formyl-(L) Met-(L) Leu-(L) Phe 168 Caged versions 168 Phosphoserine/threonine 389
Photoactivatable Groups 140 Applications 140 cinnamate cage E + 2 photoisomerization 147 Nitrobenzyl and Nitrophenyl 140 nucleophilic group alcohol 148 amino 147 in proteins and peptides 147 thiol 147 Photocleavable Groups 147 thiophosphates 149 via diazo compounds 149 Photocleavable Groups 147 Vinylogenic 147 Photoreceptor cell-specific receptor (PNR) 902 Photoremovable Groups photoremovable protecting groups 146 Physical chemistry 725 Physician Desk Reference (PDR) 760 PI, Phosphoinositide ( P I ) 1067 P13K, Phosphatidylinositol-3-OH kinase ( P 1 3 K ) 915 PKA, Protein kinase A 385, 855, 942 PKB, Protein kinase B ( P K B ) 859 PKS, Polyketide synthesis ( P K S ) 471 Plasma membrane (PM) 439,445 Plasminogen activator inhibitor (PAI-1) 704 Platelet-derived growth factor (PDGF) 1065 PLC, Phospholipase C ( P L C ) 1067 PIXa, Phospholipase CB(PLC,) 947 Plerograms 727 PLP, Pyridoxal phosphate ( P L P ) 610 PLS, Partial least squares ( P L S ) 1011,1036 PM, Plasma membrane (PM) 439,445 Pma, Phosphonomethylene alanine (Pma) 390 Pma-32 AANAT 395 Pmp, Phosphonomethylene phenylalanine (Pmp) 390 Pmp, Phosphonomethylphenylalanine (Pmp) 995 PNA, Peptide nucleic acid ( P N A ) 272, 576 PNR, Photoreceptor cell-spectj'ic receptor ( P N R ) 902 polar surface area (PSA) 766, 1026 Poly(ethy1eneglycol) (PEG) 607 Poly@-Phenylene Ethynylene (PPE) 685 Polyacrylamide gel electrophoresis (PAGE) 447 Polyethylene glycol (PEG) 1126
Polyethylene glycol-derived polyamide (PPO) 585 Polyfluorocarbon chains 485 Polyhistidine-containing sequence (HIS) 558 Polyketide synthases (PKSs) 520 Polyketide synthesis (PKS) 471 Polyketides aromatic 525, 533 analog production 526 combinatorial biosynthesis of 529 classes of 520 formation of 521 Polyketides and nonribosomal peptides combinatorial biosynthesis of 519ff applications and examples of 529ff development of 523ff future development of 531ff general considerations for 527 history of 523ff Polymerase chain reaction (PCR) 405, 436, 675, 941, 1086 Polymers classes of 668 non-cross-linked 485 Polypeptides 567 chemoselective ligation for 573 POS, Probability of success (POS) 790 Positron emission tomography (PET) 438 post-Darwinian Era 19 genetic mutation 20 Modern Synthesis 20 multidimensional sequence space 20 natural selection 20 New Synthesis 20 Postsynaptic density (PSD-95) 969 Posttranslational modifications 550 Power of Genetics 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221 Chemistry 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221 PPARγ, Peroxisome proliferator activated receptor gamma (PPARγ) 902 PPO, Polyethylene glycol-derived polyamide (PPO) 585 PPT, Propyl pyrazole triol (PPT) 368 PPTase, Phosphopantetheine transferase (PPTase) 463 PR, Progesterone receptor (PR) 903 pre-Darwinian 18 anatomical function 18 anatomical structure 18 Cuvier-Geoffroy debate 18
pre-Woodwardian 12 Emil Fischer synthetic chemistry in biology 13 Estrone Dane strategy 14 Robert Robinson employ mechanistic considerations 14 modifications in a pathway 13 steroid synthesis 13 Precipitation tags 485 Predicted residual error sum of squares (PRESS) 1013 Pregnane X receptor (PXR) 902 preparative chemistry 9 Preparative Chemistry - Synthetic Chemistry 9 preparative chemistry 9 PRESS, Predicted residual error sum of squares (PRESS) 1013 Principal component analysis (PCA) 333, 501 Euclidean distance-preserving rotation 333 Pearson correlation coefficients 333 linear dimensionality reduction 334 Probability of success (POS) 790 Probe 77 Brefeldin A Principles of Membrane Transport 84 Correcting Errors in Chromosome-spindle Attachments 81 Progression through Mitosis 77 Ribosomal RNA 88 Progesterone receptor (PR) 903 Progression 77 chromosome into two daughter cells 78 movements 77 segregation 78 sister 77 Cleavage Plane 80 Prokaryotes 635, 648 Prokaryotic and eukaryotic organisms complete genome sequences availability of 403f genomic and proteomic methods mRNA and protein abundance, measurements of 403 Propyl pyrazole triol (PPT) 368 Prostacyclin synthase (PGIS) 369
Prostaglandins markers for inflammatory and thrombotic disease's 792 role in, inflammation and platelet function 792 Prostate-specific gene receptor (PSGR) 944 Protease Chemical Biology Platform, launching of by Hans Peter 801 Proteasome 101 700 kDa 101 Inhibitors 101 Lactacystin 101 proteolysis of intracellular 101 regulator 101 Protecting groups for N-terminal cysteine 546 orthogonal 671 strategies of 546 Protein fluorescein bis(arsenical) (FlAsH) dyes binding of, tetracysteine motifs to 611f lysine residues reductive alkylation using transfer hydrogenation 607 modification of transition metal catalyzed reactions, using 601ff N-termini of site-selective modification of 607ff posttranslational modifications of 387ff Protein α-thioesters 542 Protein assemblies functionalization of diazonium-coupling strategies 599 Protein bioconjugation activity based protein profiling cycloaddition reaction, detecting probes attached protein reactive sites 620 central role in, Chemical biology 593ff field of unique reactivity attributes 593 future development of 625ff ketone groups using primary bioconjugation reactions 616
lysine, cysteine, and glutamic acid residues strategies for 596 molecules and materials, attached to proteins survey of 594 new chemical methods attachment of, synthetic molecules to proteins 593ff outlook of 593 Protein biosynthetic system Central Dogma micelle-mediated aminoacylation 275ff synthetic expansion of 271ff directed evolution of, existing aaRS/tRNA Pair to accept nonnatural amino acids 278ff four-base codons CGGG and AGGU 285f complementary four-base anticodons 285 frame-shift suppressor tRNA 285 nonnatural base pairs, orthogonal to 287 principle of, four-base codon strategy 285 top codons for, multiple incorporations 286 genetic codes amber suppression method 284 expansion of 284f stop-codon suppression method, drawbacks of 285 three stop codons (UAG, UAA, UGA) 285 nonnatural amino acids adaptability of EF-Tu to aminoacyl-tRNAs 283 adaptability of, E. coli ribosome 283 adaptability of ribosome 283f biomolecules optimized for 281f EF-Tu molecule 283 incorporation of, proteins and small-sized ones 284 using puromycin analogs 283 variety of 271 nonnatural aminoacylation alternative approach to 278 Methanococcus jannaschii, mutation of tRNA structure 278 negative selection for, eliminating TyrRS 279
Protein biosynthetic system (continued) nonnatural amino acid as 21st amino acid 280 selection of, tRNAs not aminoacylated 279 TyrRS mutants, positive selection for 280 orthogonal aaRS/tRNA pair in mammalian cells 281 Schultz and Yokoyama, elegant approaches of 281 orthogonal tRNAs nonnatural amino acids 282 outlook of 271 PNA-assisted aminoacylation 277f in vitro translation system 278 Nielsen-type PNA, obstacle of 278 yeast phenylalanine tRNA, 9-mer PNA 277 protein synthesis, mechanism of 273 ribozyme-mediated aminoacylation 276f flexizyme 277 Protein catenane synthesis of 587 Protein circularization 556 Protein Complementation Assay 213 interactions detection 214 protein-small molecule 214 protein interactions detect 213 in cell 213 in vitro 213 in vivo 213 Protein cyclization 585 Protein Data Bank (PDB) 949 Protein Engineering 134 Challenges 134 mutant proteins compromised function 134 mutations engineered 135 impact on the activity of the protein 135 Protein engineering 556 Protein fragment complementation assays (PCAs) 1132 Protein Function 115, 239 Analysis of 239 Engineering Control 115 pathway it controls 239
Protein-Ligand Interactions 117, 119, 121, 123, 125, 127, 129, 131, 133, 135 Protein-Ligand Interactions 115 Using Chemistry 115 Protein interfaces analysis of 987 Protein kinase A (PKA) 394, 399, 855, 942 Protein kinase B (PKB) 859 Protein kinase inhibitors 388 Protein kinase-bisubstrate analog inhibitors 396f Protein kinases 388 catalytic domain fold construct and condition optimization 859 characterizing kinase-ligand interactions by NMR 882ff construct and condition optimization [¹H,¹⁵N]-TROSY spectra of, protein kinase catalytic domains of 862 as drug targets 852f implicated as, pivotal signal transducers in cell signaling networks 1129 inhibition of signal transduction pathways, study of 853 kinase-ligand interactions chemical shift perturbations 883 simulation of NMR spectra, of two state DFG-in/DFG-out model 885 kinase-ligand interactions DFG-in/DFG-out 884ff LIGDOCK procedure 886 mapping of, chemical shift perturbations 882f NMR resonance assignment [¹H,¹⁵N]-TROSY spectra 868 [¹H,¹⁵N]-TROSY spectrum of, active murine protein kinase A (PKA) 863 chemical shift matching procedure 864f paramagnetic spin labels 867 use of, paramagnetic spin labels 866ff using, triple-resonance experiments 861ff optimization of, buffer conditions unfolded or aggregated protein state, folded protein suitable for NMR 860
protein dynamic behavior solution-state NMR 873 protein dynamic behavior, study of 873f protein kinase catalytic domain 853 protein-based bisubstrate analogs of 385 ribbon diagram of murine protein kinase A (PKA) in complex with Mg/ATP, catalytic domain of 854 signal transduction biochemical reactions, succession of 853 structural biology of 853ff Yeast three-hybrid (Y3H) applications and practical examples 1129ff using in vitro kinase activity profiling 1130 Protein ligation 544 Protein lipidation 583 Protein Medicinal Chemistry 582 Protein network analysis proteome analysis position-specific fluorescence labeling 289 Protein phosphatases 388 Protein phosphorylation 388 Protein semisynthesis 386f, 390, 539 in living cells 558 and proteolytic enzymes 539 scope of 539 Protein splicing 540ff in living cell conditional protein splicing 557 control of 557 in living cells 557ff Protein substrate sites advantage of 396 Protein synthesis and protein folding bacteria with FlAsH, monitoring of 442 using peptide fragments from solid phase peptide synthesis 569 Protein target isoform selective inhibitor new clinical aspect 373f Protein transduction domain (PTD) 557, 558 Protein transsplicing 542, 556, 560
Protein tyrosine phosphatase (PTP) 385, 388, 391 Protein-based catalysts 385 Protein-DNA interactions antagonist 511ff Protein-Ligand Interactions 115 Biomolecular Interfaces 135 Engineering 115 Genetic approaches 115 Ligand Selectivity of Ion Channels 130 mutations 115 phenotype 116 protein alter ligand specificity 116 mutated 116 protein engineering 116, 134 Resistance Mutations 116 Revealing Biological Specificity 115 Sensitizing Mutations 126 Protein-Ligand Interactions 117, 119, 121, 123, 125, 127, 129, 131, 133, 135 Engineering 117, 119, 121, 123, 125, 127, 129, 131, 133, 135 Revealing Biological Specificity 117, 119, 121, 123, 125, 127, 129, 131, 133, 135 Protein-carbohydrate interactions 636f inhibition 648 strategies for 639ff inhibitors, identification of 645ff Protein-nucleic acid interactions 669 Protein-protein Interactions 199, 216, 227, 388, 511ff, 669 activators fully synthetic 245 transcriptional 245 aptamer peptide 217 selections 217 Applications 216, 237 Catalysis 206 Chemical Dimerization Technology 228 Chemical Inducer of Dimerization (CID) 208 CID anchor 215 compound libraries 989ff Controlling 199, 227 cyclin-dependent kinase (CDK) Cdc20 204 Development 202 dimerization reverse 227
Protein-protein Interactions (continued) dimerizer cell-permeant organic molecule 227 diversity of 980ff DNA-Protein Interactions 204 drugs targeting 979ff E. coli 211 transcription assays 210 Future Development 222 genetic assays 210 pathway-specific 201 traditional 201 History 202 n-hybrid assays 202 independent domains DBD 202 functionally 202 transcription AD 202 inhibitors of 979 KD cutoff 215 medium lacking histidine 205 molecules chemical discovery 200 in the cell 200 nucleic acids 200 small molecules 200 Myc-Max 513 protein evolution 216 protein chimera DNA-binding 203 transcription activation 203 Protein Complementation Assay 213 receptors activate 245 cytokine 245 RNA-Protein Interactions 205 S. cerevisiae 208 screening techniques 989ff selected interface experimental validation of 988f Small molecule-Protein Interactions 206 targets 989ff three-hybrid assay small molecule 208 Transient 227 two-hybrid assay 199 Using Chemical Inducers and Using Disrupters of Dimerization 227 Yeast 210 zinc-finger protein 215
variant 215 Proteins 45, 668 Ala scanning mutagenesis of 572 amino acids ordered arrangement 48 in proteins 48 azide modification using Staudinger ligation 616ff bio-macromolecules 46 biochemist bottom-up view 46 biomimetic strategy for N-terminal modification 610 carboxylate residues of 595 chemical orthogonality preparation of 598 chemical synthesis of 567ff chemically synthesized 572 common strategies for N-terminus, modification of 609 competitive inhibition 987 complementarity 983 complexes of 981 different binding sites of conotoxins and nicotinic acetylcholine receptors 375f expressed in, prokaryotes strategies targeting N-terminal serine residues 610 function of 458 Generation 45 Genetic Code 50 human genes and proteins potentially druggable 808 intein-based labeling of 460 labeling of 459 messenger-RNAs (mRNAs) template-RNA 49 unstable intermediates 49 modification of, C-terminus using native chemical ligation 611 modification of, cowpea mosaic virus (CPMV) using “Click” chemistry 621 modulation of enzymatic activity Briggs-Haldane mechanism, of enzyme action 1067 (molecular) biologist top-down attitude 46 Molecular Biologist’s Look 48 N-terminal modification strategies critical consideration 609 nucleophilic groups number of 603
pharmaceuticals, development of 581ff phosphorylated proteins receptor binding of 1067 plasma membrane receptor association 1067 Polypeptide synthesis polymer supports 47 protein target cysteine residues site-specific modification of 596 ReAsH-mediated photoconversion practical for 452 reductive alkylation of using iridium catalyzed transfer hydrogenation 608 self-assembly due to codon-anticodon interaction 49 during translation 49 mRNA and tRNA 49 STAT transcription factors phosphorylation and dimerization of 1067 Structure 45 hydrogen bonding 47 polypeptide chains 47 synthesis automated solid-phase 47 protecting group technology 47 targeting of other functional groups 597 The Chemist’s Look 47 unwanted disulfide bond formation or scrambling 597 Proteomes candidate inhibitors, library of 417 enzyme target ABPP probe structures, and target enzyme classes 412 SE probe library, reactivity profile with 411 probe library screening libraries of 411 Proteomics activity-based proteomics and activity-based methods 403 chemical strategies for 403ff complex biological proteomes functional analysis of 403 prokaryotic and eukaryotic genomes assignment of, molecular and cellular functions 403 proteins functional characterization of 422f PSA, Polar surface area (PSA) 1026
PTD, Protein transduction domain (PTD) 557, 558 PSD-95, Postsynaptic density (PSD-95) 969 Pseudooligosaccharides 679 PSGR, Prostate-specific gene receptor (PSGR) 944 PTH/PTHrP, Parathyroid Hormone/Parathyroid Hormone Related Protein (PTH/PTHrP) 942 PTP, Protein tyrosine phosphatase 385 PTSAs, Partitioned total surface areas (PTSAs) 1027 Pubchem database 760 Pulmonary fibrosis 391 Purine Analogs 99 CDK inhibition 99
Flavopiridol (FLV) 100 inhibitors selective kinase 99 PXR, Pregnane X receptor (PXR) 902 pyranosyl-RNA (p-RNA) single strands with nucleobases 36 Pyridoxal phosphate (PLP) 610
Q
qPCR, Quantitative, polymerase chain reaction (qPCR) 702 QSAR, Quantitative structure-activity relationship (QSAR) 310, 1008 Qualitative roadmap intracellular molecules signal transduction pathways, organizing to form 1061 Quantitative, polymerase chain reaction (qPCR) 702 Quantitative Structure-Activity Relationship (QSAR) 310, 731, 1008 Quinine 56 partial synthesis from quinitoxine 56 Rabe and Kindler 56 Synthetic Lesson 56 total synthesis formal 58 Stork 58 Woodward and Doring 58 Trouble with Total Syntheses 56
R
R1128 525 RA, Rheumatoid arthritis (RA) 1097 Rab escort protein (REP) 549
Rab geranylgeranyl transferase (RabGGTase) 549 Rab GTPase effect of prenylation on 550 RAC3, Receptor associated coactivator (RAC3) 914 RAD, RNA abundance database (RAD) 1096 Radio Immune Assay (RIA) 368 RAMPs, Receptor activity modulating proteins (RAMPs) 948 Rapamycin 108, 519 to FK506 different activity 108 structurally similar 108 RASSL, Receptors activated solely by synthetic ligands (RASSL) 365 RDCs, Residual dipolar couplings (RDCs) 866 Reaction constant 731 Reactive group (RG) 408 Reagents carbodiimide coupling 485 solid-supported 485 Receptor 939 Receptor activity modulating proteins (RAMPs) 948 Receptor associated coactivator (RAC3) 914 Receptor Plasticity 180 arginine residue 181 estrogen analogs 182 estrogen receptor 181 functionalized carboxylate 183 ligands 183 hormone-binding selectivity 181 hormones bumped 180 mutation Glu353 182 near drugs 9-cis retinoic acid 180 Overcoming 180 polar group exchange 183 receptor RAR 180 retinoid 181 salt bridge ligand-receptor 181 Receptor target family GPCR-7TM 933ff development of 938ff
general considerations for 943ff history of 938ff Receptor tyrosine kinases (RTKs) 1063 Receptors activated solely by synthetic ligands (RASSL) 365 Receptosomes 935, 943 Recombinase 184 Conditional 184 Cre-ER system 184 Engineered Nuclear Receptors 185 Ligand-dependent 184 NHRs 185 receptor antagonists 185 synthetic 185 site-specific 184 Recursive deconvolution 491f Recursive partitioning (RP) 1034, 1038 Regulated Transcription and Gene Therapies 241 activation allosteric 242 diphtheria toxin 242 genes control of 242 endogenous 242 tetracycline-inducible 241 Three-hybrid Approaches chemical complementation 243 REP, Rab escort protein (REP) 549 Research and development clinical knowledge for next generation projects 790 successful phase III clinical studies 790 Residual dipolar couplings (RDCs) 866 Resistance Mutations 116 HIV Protease 116 Kinase 122 to Small-molecule Agents 116 Target of Rapamycin 119 The Selection 116 Resistance-causing enzymes inhibitors of 681 Retinal G protein-coupled receptor (RGR) 944 Retinoid X receptor (RXR) 905 Reverse chemical genetics proteins biological function of, full control of 380 target validation, necessary tools in 379f
Reverse Dimerization 235 Inducible Disaggregation 235 ligand analogous 236 bumped 236 to one half of AP1903 236 two-hybrid assay 236 Reverse transcriptase-polymer chain reaction (RT-PCR) 961 RG, Reactive group (RG) 408 RGR, Retinal G protein-coupled receptor (RGR) 944 Rhamnose biosynthesis probe identification 656 Rhamnose biosynthetic pathway 655 inhibitors of 656 Rheumatoid arthritis (RA) 1097 Rhodium carbenoids in disulfide modification 606 using, tryptophan modification 605 Rhodopsin 935, 949, 953 RIA, Radio Immune Assay (RIA) 368 Ribonucleic acid (RNA) 300, 576 Ribonucleic acid-based interference (RNAi) 307 Ribosomal RNA 88 aminoacyl-tRNA mimic 88 Catalysis 88 model 88 Puromycin 88 ribosome 88 Yarus inhibitor 88 Ribosome 668 Ribosome-synthesized proteins 554 RMSD, Root mean square difference (RMSD) 865 RNA abundance database (RAD) 1096 RNA-Protein Interactions 205, 219 in vitro methods 219 specificity 220 switch sperm/oocyte 220 third component hybrid RNA 205 three-hybrid assay 205 to the two-hybrid system 205 RNA, Ribonucleic acids (RNA) 300, 576, 668 RNAi, Ribonucleic acid-based interference (RNAi) 307 Root mean square difference (RMSD) 865 RP, Recursive partitioning (RP) 1034, 1038
RT-PCR, Reverse transcriptase-polymer chain reaction (RT-PCR) 961 RTKs, Receptor tyrosine kinases (RTKs) 1063 RXR, Retinoid X receptor (RXR) 905
S
S-type lectins 641ff structure of 642 S1P, Sphingosine-1-phosphate (S1P) 959 Saccharides 635 SAE, Sialic acid 9-O-acetylesterase (SAE) 420 SAGE, Serial Analysis Of Expression (SAGE) 1096 SAHA, Suberoylanilide hydroxamic acid (SAHA) 701 SAR, Structure-activity relationship (SAR) 792, 811, 828, 876, 950, 1008, 1128 Saturation transfer detection (STD) 873 SCAM, Substituted-cysteine accessibility method (SCAM) 949 Scavengers 485 Scintillation proximity assay (SPA) 361 Screening campaigns failure of druglike leads or chemical tools, discovery of 804 Scytovirin-N 679 SE, Sulfonate ester (SE) 411 Segmental isotopic labeling 555 Selectins 643ff, 681 features of 644 Selective estrogen receptor modulators (SERMs) 916 tamoxifen and second generation SERM, raloxifene 916ff Selective GR modulators (SGRMs) 918 drugs for variety of, debilitating diseases 918f Selective nuclear receptor modulators (SNuRMs) 916 Selective optimization of side activities (SOSA) 958 Selective peroxisome proliferator activated receptor gamma modulators (SPPARMs) 919 Selenocysteine 576 Self-assembled monolayers 676 Semantics 4, 5, 7, 9 Preparative Chemistry - Synthetic Chemistry 9
Semantics (continued) Synthetic Design 8 Sensitizing Mutations 126 to Engineer Nucleotide Binding Pockets 126 Exploiting 126 GTPases to XTPases 128 Uniquely Inhibitable Kinases 126 Sequential ligation 545 Serial Analysis Of Expression (SAGE) 1096 series identifier (SID) 767 Serine hydrolase (SH) 409 Serine/threonine kinase 385, 399 SERMs, Selective estrogen receptor modulators (SERMs) 916 Serotonin N-acetyltransferase 394 Serpentine receptors 933 7TM, Seven transmembrane (7TM) 933 SGRMs, Selective GR modulators (SGRMs) 918 SH, Serine hydrolase (SH) 409 Shikimic acid 647, 648 Shokat kinases, allele-specific chemical intervention of 365 Short synthetic peptides 400 SHP, Small heterodimer partner (SHP) 367 SHP-1 391, 392 mutations of, in mice 391 SHP-2 391, 392 Sialic acid 9-O-acetylesterase (SAE) 420 Sialyl Leᵃ 682 Sialyl Leˣ 682 Signal transducer and activator of transcription 6 (STAT6) 1097 Signal transducers and activators of transcriptions (STATs) 1134 Signal transduction intracellular signaling, modeling fundamentals of 1063 modeling of 1062 Signal transduction mechanisms complex kinetic models control of cytoskeleton 1074 gene regulatory networks and genomic data, interface with 1074 limitations of 1074ff model compression and integration, issues of 1075 prospects and challenges 1074ff sequence of, signaling complex assembly 1074
multiple pathways and cell stimuli model generality 1075 signaling module, compression of activation of PDGF receptor 1076 Signal transduction pathways mathematical modeling emergence of, powerful tool 1061 Signaling cascades and networks bistability, existence and functional significance of 1073 general considerations and pathway-specific models 1073f multiple signaling pathways 1074 Erk activation, by Raf-MEK-Erk cascade 1074 pathway crosstalk interactions in positive feedback loops 1074 Signaling literature conceptual models invoked in 1062 Signaling pathways 1046 binding, cell surface receptors 1062 Signaling processes novel experiments, outcomes of generating hypothesis-driven research 1062 quantitative models of 1062 Silencing mediator of retinoid and thyroid (SMRT) 914 Similog descriptor 958 Simplified Molecular Input Line Entry System (SMILE) 761 Single gene mutations 1045 Single nuclear polymorphism (SNP) 970 Single nucleotide polymorphism (SNP) 378 Single Residue Protein Caging 152 alkyl halides photolabile 152 amino acid different from lysine or cysteine 155 residues 155 specific 155 BChE catalytic activity 155 mechanistic properties of 156 cysteine residues essential 152 modification 152 in vitro F-actin filaments 153 motility assay 153 motility models 154
in vivo role of cofilin 154 kinase protein 153 phenacyl groups 154 Single-component 21 Consecutive Procedure 21 example total synthesis of estrone 21 Oral Contraceptives 21 Singlet oxygen CALI, alternative methods of transgenic knockouts 450 chromophore, or fluorophore assisted laser or light inactivation 450 SIRT, Sirtuin (SIRT) 696, 1131 Site-directed mutagenesis 386, 988 Skeletal diversity approaches to 501 Smad2 553, 555 Small Caged Molecules 159 Caged Peptides 159 to Control Protein Activity 159 ligand activating 159 inhibiting 159 synthesis obstacles 150 Small heterodimer partner (SHP) 367 Small molecule perturbagens (SMPs) 318 Small Molecule-Protein Interactions 206, 220 chemical inducers of dimerization in a small molecule 206 drug discovery research 220 enzyme 220 in vivo 221 three-hybrid assay yeast 206 Small Molecules 71, 73, 75, 77, 79, 81, 83, 85, 87, 89 inhibitor design strategies 89 Discovery 89 specificity 89 probes fluorescence-based 90 as Probes for Biological Processes 77 proteome small fraction 90 targeted 90 to perturb designing strategies 71
protein function 71 short timescales 71 Small-molecule interaction database (SMID) 348 SMD, Stanford microarray database (SMD) 1096 SMDLID 767 SMID, Small-molecule interaction database (SMID) 348 SMILE, Simplified Molecular Input Line Entry System (SMILE) 761 SMPs, Small molecule perturbagens (SMPs) 318 SMRT, Silencing mediator of retinoid and thyroid (SMRT) 914 SNF, Sucrose nonfermenting (SNF) 694 SNP, Single nuclear polymorphism (SNP) 970 SNP, Single nucleotide polymorphism (SNP) 378, 970 SNuRMs, Selective nuclear receptor modulators (SNuRMs) 916 SOD, Superoxide dismutase (SOD) 621 Solid phase peptide synthesis (SPPS) 543, 568-569 ability of 569 restrictions of 545 Solid-phase reactions heterogeneous nature of 485 Solid-phase synthesis 484f, 487, 670 advantages of 670 Solid-supported reagents 485 SOS-NMR, Structural information using overhauser effects and selective labeling (SOS-NMR) 887 SOSA, Selective optimization of side activities (SOSA) 958 SPA, Scintillation proximity assay (SPA) 361 Sphingosine-1-phosphate (S1P) 959 spindle 72 Spiruchostatin epimer of investigating, saturable transporters 716 Split inteins 542, 559 Split-pool synthesis 489ff encoding 492 solid-phase 493 SPPARMs, Selective peroxisome proliferator activated receptor gamma modulators (SPPARMs) 919
SPPS, Solid phase peptide synthesis (SPPS) 568 SPR, Surface plasmon resonance (SPR) 361, 843, 855 SRC1, Steroid receptor coactivator 1 (SRC1) 511, 914 Stanford microarray database (SMD) 1096 STAT6, Signal transducer and activator of transcription 6 (STAT6) 1097 Static Variation 31 Clark Still’s encoding-decoding alternation 32 combinatorial approach antiasthma drug 31 split-and-mix strategy 31 Molecular decoding 33 Molecular encoding 32 molecular tags cleaved photochemically 33 on-bead selection test specified 33 Preparation 31 Screening 31 variants identified 32 removed 32 with affinity for the receptor 32 variation preparative rounds 32 on resin-beads 32 screened 32 STATs, Signal transducers and activators of transcriptions (STATs) 1134 Staudinger ligation 546, 547 first bioconjugation reaction 617 generation of, fluorescent Staudinger ligation products 619 powerful tool study of, glycosylation pathways 618 quenching process enhancement in, dye quantum yield 618 STD, Saturation transfer detection (STD) 873 Stem cells differentiation modulators 509f differentiation of small molecule modulators 510 pluripotent embryonic 509 Steroid receptor coactivator 1 (SRC1) 511, 914 Stimuli 1045
Structural biology and application of knowledge management families of, targets kinases, proteases, ion channels 796 Structural information using overhauser effects and selective labeling (SOS-NMR) 887 Structure activity relationship (SAR) 505, 792, 811, 828, 876, 950, 1008, 1128 Suberoylanilide hydroxamic acid (SAHA) 701 Substituent constant 731 Substituted-cysteine accessibility method (SCAM) 949 Subtiligase 539, 574 Sucrose nonfermenting (SNF) 694 Sugar-nucleotide-binding enzymes effective probes, design of 655ff high-throughput screening probe identification through 651ff inhibitors of 648ff identification of 651ff Sulfhydryl-reactive affinity reagents 949 Sulfonate ester (SE) 411 Superoxide dismutase (SOD) 621 Support vector machines (SVMs) 958 Surface plasmon resonance (SPR) 361, 668, 673, 676, 843, 855 SVMs, Support vector machines (SVMs) 958 SWI, Yeast mating type switching (SWI) 694 SwissProt ID 771 Synaptotagmin FlAsH-FALI inactivation of Davis and coworkers, using 450 Synthesis - Genesis - Preparation 4 artificial indigo 6 artificial urea 5 biological indigo 7 chemical indigo 7 construction anabolic pathway 5 degradation catabolic pathway 5 example indigo 7 genesis programmed 8 indigo 6f N-phenylglycine 5
preparation intuitive 8 synthesis planned 8 synthetic chemist as designer 8 as molecule maker 8 Synthetic Execution 8 target molecule constitution of 6 degradation products 6 Urea 5 Wohler 5 Synthetic carbohydrates 668 Synthetic chemistry Hecht method, chemical aminoacylation of isolated tRNAs 274 nonnatural amino acids aminoacylation of tRNA 274f progress of 272f Synthetic Design 8 Design 8 execution bottom-up-oriented 8 R. B. Woodward art of organic synthesis 9 synthetic planning 9 top-down event 8 Synthetic drugs 496 and natural products structures of 499 vs. natural products 503 Synthetic Execution 8 Synthetic organic chemistry 725 Systematic nomenclature 730 Systems biology 1045 vs. bioinformatics 1048 biological signals and actions 826 interactions of proteins, and pathways of transferring 826 chemical biology and 1145 chemical genomics and chemical proteomics, chemical approach to 1118 definition of 1048 general considerations in 1047ff goal of 1045 history of 1046 holistic approach of biological networks and experimental data 379 impact on medicine 1058 limiting factor in 1057 vs. mathematical biology 1048
of metabolic systems 1046 one Postdoc - one protein 1049 organism function concept of 1083f protein function of cell multicell organisms, complex interplay in 355 vs. proteomics, genomics, metabolomics and high-throughput technologies 1048 signal pathways turning into signal networks 379 VS.
T
T cell death-associated gene 8 (TDAG8) 949 T Helper Type 1 (Th1) 1097 T Helper Type 2 (Th2) 1097 T Regulatory (Treg) 1106 T-cell receptor (TCR) 1097, 1120
cell differentiation control of 1099 Tail tyrosine residue of Src phosphorylation of, by Csk 400 TAM, Tertiary amine (TAM) 966 Tamoxifen first synthetic NR small molecule with differential tissue effects 916ff first-line treatment for ER-positive breast cancer 916 Tanimoto metrics mean and standard deviation of, the distribution 332 Target family kinases prototype of 826 Target family approach foundations of 825 proteins, clustering of 825 Target of Rapamycin 119 FKBP-rapamycin complex 120 Identification 119 immunosuppressants cyclosporin A 121 FK506 121 Mechanism 121 mechanism of action 121 proteins target of rapamycin 120 TOR1 120 TOR2 120 rapamycin cellular targets of 120
Target of Rapamycin (continued) immunosuppressant 119 natural product 119 resistance mutations from genome-wide screens 122 isolating 122 targets phenotypically 119 relevant 119 TASP, Template assembled synthetic protein (TASP) 585 Taste receptors 944 TCEP, Tris(carboxyethylphosphine) (TCEP) 620 TCR, T-cell receptor (TCR) 1097, 1120 TDAG8, T cell death-associated gene 8 (TDAG8) 949 TE, Thioesterase (TE) 522 Temperature-Sensitive Glycoprotein of Vesicular Stomatitis Virus-O6-alkylguanine-DNA Alkyltransferase (tsVSVG-AGT) 465 multicolor analysis of 466 Template assembled synthetic protein (TASP) 585 Tert-butyloxycarbonyl (Boc)-based peptide synthesis 543 Tertiary amine (TAM) 966 Tetracenomycin 525 Tetracysteine-biarsenical system biarsenicals, chemistry of 430ff environment-sensitive fluorescent biarsenicals 445f FlAsH-EDT2, synthesis of 431 fluorescence anisotropy of FlAsH-tetracysteine complex 446ff future developments, and applications of 453f general considerations of 430ff genetically encoded fluorescence tag small size of 439ff history and design concepts of 429f multicolor pulse-chase labeling 443ff nonspecific staining limitation of 454 peptide libraries optimizing tetracysteine sequence with 435ff practical applications of 439ff protein-lipoates cofactors and enzyme thiols regeneration of 429f regeneration of, to arsenic 429
red-fluorescent dye resorufin (ReAsH) important biarsenical besides FlAsH 431 single-molecule studies using biarsenical-tetracysteines 448 small-molecule labeling systems comparison with 438f specificity of, biarsenical-tetracysteine method optimized tetracysteine sequences 435 tetracysteine motif 433ff toxicity and antidotes 437f two-color method continuous imaging of single cells 443 TFA, Trifluoroacetic acid (TFA) 569 TGFβ, Transforming growth factor β (TGFβ) 552 TGFβ signaling 552 Th1, T Helper Type 1 (Th1) 1097 Th2, T Helper Type 2 (Th2) 1097 Thermal Sensation 76 capsaicin cellular phenotype 76 natural product 76 cation channel 77 cloned receptor VR1 (vanilloid receptor subtype 1) 77 cloning strategy 77 Thiazolidinediones (TZDs) 902 Thioesterase (TE) 522 Thioesters 542ff C-terminus intein-based methods 611 Thiolate anions sulfur-carbon bond lysine cross-reactivity 597 Three-hybrid (3H) 1120 Thyroid-stimulating hormone (TSH) 969 TIF2, Transcription intermediary factor 2 (TIF2) 914 TIMP-1, Tissue inhibitor of matrix metalloproteinase (TIMP)-1 1105 TIPS, Triisopropylsilyl (TIPS) 709 Tissue-specific progenitor cells dedifferentiation of 509 TMC-95A 103 TMSOTf 671 TMV, Tobacco mosaic virus (TMV) 600 TNF, Tumor necrosis factor (TNF) 794, 1103 to Link a Protein Target 72 capsaicin and menthol 76
Colchicine and Tubulin 72 Cytochalasin and Actin 74 phenotypes inhibition 72 Thermal Sensation 76 to a Cellular Phenotype 72 Tobacco mosaic virus (TMV) 600 Toxicology 1033 TPX, Trapoxin (TPX) 98 TR-associated protein (TRAP) 914 TRAF, Tumor necrosis factor receptor-associated factor (TRAF) 1103 Transcription 235 Regulated 235 transcription activate 235 Transcription intermediary factor 2 (TIF2) 914 Transcriptional Regulators 175 Derived from Natural Repressors 175 Eukaryotic 177 Functional Orthogonality 180 Genetic Disease 186 Ligand-binding Pockets 188 Ligand-dependent Activators 177 Light-activated Gene Expression 189 New Ligand Specificities 179 Nuclear Receptor Engineering 183 Receptor Plasticity 180 Recombinases 184 Role of Ligand-dependent 175 Transducers 939 Transforming growth factor β (TGFβ) 552 Translational medicine chemical biology and 1148 Transthioesterification 522 TRAP, TR-associated protein (TRAP) 914 Trapoxin (TPX) 98 affinity reagent synthesized 98 fungal metabolite 98 Treg, T Regulatory (Treg) 1106 Triantennary N-linked mannoside (Man)9(GlcNAc)2 679, 681 Triarylphosphines and azides Staudinger ligation reacting to form, iminophosphorane intermediate 617 Trichostatin A (TSA) 97, 508, 701 antifungal from a Streptomyces 97 concentrations low 97
nanomolar 97 Trifluoroacetic acid (TFA) 569 Triisopropylsilyl (TIPS) 709 Trimethylsilyl triflate 671 Tris(carboxyethylphosphine) (TCEP) 620 Tryptophan residues, modification of using metallocarbenoids 604ff TSA, Trichostatin A (TSA) 508, 701 TSH, Thyroid-stimulating hormone (TSH) 969 tsVSVG-AGT, Temperature-Sensitive Glycoprotein of Vesicular Stomatitis Virus-O6-alkylguanine-DNA Alkyltransferase (tsVSVG-AGT) 465 Tubacin 505, 508f Tumor necrosis factor (TNF) 794, 1103 cytokine synovial proliferation, critical role in 794 Tumor necrosis factor receptor-associated factor (TRAF) 1103 Tunicamycin 649 β-Turns/Strands bilayer lipid 258 Computational modeling Macromodel program 257 computer-simulated conformational search 257 HIV-1 protease inhibitors 258 surface 258 pyrrolinone derivatives 258 inhibitory effects 258 scaffold β-D-glucose 256 nonpeptide 257 scaffolds de novo 259 designed 259 structures mimic 257 protein 257 secondary 257 synthetic scaffolds 259 β-Turns/Strands 256 Peptidomimetics 256 Two-dimensional electrophoresis (2DE) 405 two-hybrid assay 199 biased toward proteins 201
two-hybrid assay (continued) eukaryotic transcription factor 200 genetic 199 key modifications 202 libraries DNA 201 exact cDNA-AD 201 screen entire genome 204 selection strain 200 yeast 199 Tyrocidine 527 Tyrosine 385 bioconjugation protein surface residues, as targets for 598 electrophilic aromatic substitution method for 598 modification of commercially available lysine-reactive probes 602 three component Mannich-type reaction 600 using palladium π-allyl chemistry 603 residues native chemical ligations, using 602 residues, modification of new chemical tools 597ff TZDs, Thiazolidinediones (TZDs) 902 U
Ubiquitin-split-protein-sensor (USPS) 1132 UGM, Uridine 5’-diphosphate-galactopyranose mutase (UGM) 639 uHTS, Ultra high-throughput screening (uHTS) 361 Uniquely Inhibitable Kinases 126 Analog-specific Kinases 127 gatekeeper residue mutation 126 inhibitor designed 127 pyrazolopyrimidine-based 127 uniquely sensitive kinase allele 126 Unstirred water layer (UWL) 1021 Ure2p 505ff Uretupamines 505ff Uridine 5’-diphosphate-galactopyranose mutase (UGM) 639, 653 inhibitors of 654
USPS, Ubiquitin-split-protein-sensor (USPS) 1132 UWL, Unstirred water layer (UWL) 1021 V
Vaccines for malaria and HIV 677ff Vacuolar ATPases (V-ATPases) 103 enzymes 103 function as proton pumps 103 Inhibitors 103 van der Waals components affinity of binding 805 Vancomycin 519 Vasoactive intestinal peptide (VIP) 955 VEGFR-2, Vascular endothelial growth factor receptor subtype 2 (VEGFR-2) 771 VFTM, Venus flytrap module (VFTM) 937 VIP, Vasoactive intestinal peptide (VIP) 955 Viral membrane proteins 585 Vitamin D receptor-interacting protein (DRIP) 914 VLP, Virus-like particle (VLP)s 439 W
Wild-type O6-alkylguanine-DNA alkyltransferase (wtAGT) 464 Wild-type (WT) 317 Wnt/β-Catenin 1046 WOMBAT 760f activity identifier (AID) 769 bioactivity summary panel 766 computed chemical properties panel 766, 768 database structure of 767ff datamining with 779ff and errors 772ff quality control 769ff reference database 766, 768 rule-of-three compliant molecules 782 SMDLID 767 target and biological information panel 766, 767 target types in 763 WOMBAT 2006.1 761ff Bioactivity distribution pie charts in 763 enzyme inhibitors 763 estrogen receptor 771 most populated oncology-related targets in 781
target type distribution pie charts in 764 vascular endothelial growth factor receptor subtype 2 (VEGFR-2) 771 WOMBAT-Pharmacokinetics, WOMBAT-PK 755ff Woodwardian 14 beginning in 1937 14 Case Study (±)-Estrone 15 chemical reactions by diastereoselection 14 second phase 14 World Drug Index (WDI) 760 World of Molecular BioAcTivity, WOMBAT 761 WT, Wild-type (WT) 317 wtAGT, Wild-type O6-alkylguanine-DNA alkyltransferase (wtAGT) 464 X
X-ray crystallography 641, 646, 652 Xenopus melanocytes 948
Y
Y2H, Yeast two-hybrid (Y2H) 1120 Y3H, Yeast three-hybrid (Y3H) 1120
Yeast 210 GFP and tetracysteine tags to β-tubulin 440 hybrid systems reverse 210 n-Hybrid System 210 split hybrid systems 210 transcriptional strength 211 yeast chromosome 211 Yeast mating type switching (SWI) 694 Yeast three-hybrid (Y3H) 1120 chemical structures of immunosuppressants FK506 and rapamycin 1121 competition assay measure of, cellular uptake/functionality of test MFC 1127 promising alternative methods general considerations 1127ff Yeast two-hybrid (Y2H) 1120 Yeast cloning 214 Yersinia bacteria mammalian cells infection of 440 YFP, Yellow fluorescent protein (YFP) 441