Genetic Engineering Principles and Methods Volume 23
GENETIC ENGINEERING Principles and Methods Advisory Board Carl W. Anderson Peter Day Donald R. Helinski Maynard V. Olson John Shanklin
A Continuation Order Plan is available for this series. A continuation order will bring delivery of each new volume immediately upon publication. Volumes are billed only upon actual shipment. For further information please contact the publisher.
Genetic Engineering: Principles and Methods
Volume 23
Edited by Jane K. Setlow
Brookhaven National Laboratory, Upton, New York
KLUWER ACADEMIC PUBLISHERS NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: 0-306-47572-3
Print ISBN: 0-306-46645-7
©2002 Kluwer Academic Publishers, New York, Boston, Dordrecht, London, Moscow
Print ©2001 Kluwer Academic/Plenum Publishers, New York
All rights reserved. No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher.
Created in the United States of America
Visit Kluwer Online at: http://kluweronline.com
and Kluwer's eBookstore at: http://ebooks.kluweronline.com
CONTENTS OF EARLIER VOLUMES VOLUME 1 (1979) Introduction and Historical Background • Maxine F. Singer Cloning of Double-Stranded cDNA • Argiris Efstratiadis and Lydia Villa-Komaroff Gene Enrichment • M. H. Edgell, S. Weaver, Nancy Haigwood, and C. A. Hutchison III Transformation of Mammalian Cells • M. Wigler, A. Pellicer, R. Axel, and S. Silverstein Constructed Mutants of Simian Virus 40 • D. Shortle, J. Pipas, Sondra Lazarowitz, D. DiMaio, and D. Nathans Structure of Cloned Genes from Xenopus: A Review • R. H. Reeder Transformation of Yeast • Christine Ilgen, P. J. Farabaugh, A. Hinnen, Jean M. Walsh, and G. R. Fink The Use of Site-Directed Mutagenesis in Reversed Genetics • C. Weissmann, S. Nagata, T. Taniguchi, H. Weber, and F. Meyer Agrobacterium Tumor Inducing Plasmids: Potential Vectors for the Genetic Engineering of Plants • P. J. J. Hooykaas, R. A. Schilperoort, and A. Rörsch The Chloroplast, Its Genome and Possibilities for Genetically Manipulating Plants • L. Bogorad Mitochondrial DNA of Higher Plants and Genetic Engineering • C. S. Levings III and D. R. Pring Host-Vector Systems for Genetic Engineering of Higher Plant Cells • C. I. Kado Soybean Urease—Potential Genetic Manipulation of Agronomic Importance • J. C. Polacco, R. B. Sparks, Jr., and E. A. Havir
VOLUME 2 (1980) Cloning of Repeated Sequence DNA from Cereal Plants • J. R. Bedbrook and W. L. Gerlach The Use of Recombinant DNA Methodology in Approaches to Crop Improvement: The Case of Zein • Benjamin Burr Production of Monoclonal Antibodies • Sau-Ping Kwan, Dale E. Yelton, and Matthew D. Scharff Measurement of Messenger RNA Concentration • S. J. Flint DNA Cloning in Mammalian Cells with SV40 Vectors • D. H. Hamer Adenovirus–SV40 Hybrids: A Model System for Expression of Foreign Sequences in an Animal Virus Vector • Joseph Sambrook and Terri Grodzicker Molecular Cloning in Bacillus subtilis • D. Dubnau, T. Gryczan, S. Contente, and A. G. Shivakumar Bacterial Plasmid Cloning Vehicles • H. U. Bernard and D. R. Helinski Cloning with Cosmids in E. coli and Yeast • Barbara Hohn and A. Hinnen DNA Cloning with Single-Stranded Phage Vectors • W. M. Barnes Bacteriophage Lambda Vectors for DNA Cloning • Bill G. Williams and Frederick R. Blattner
VOLUME 3 (1981) Constructed Mutants Using Synthetic Oligodeoxyribonucleotides as Site-Specific Mutagens • M. Smith and S. Gillam
Evolution of the Insertion Element IS1 that Causes Genetic Engineering in Bacterial Genomes In Vivo • E. Ohtsubo, K. Nyman, K. Nakamura, and H. Ohtsubo Applications of Molecular Cloning to Saccharomyces • M. V. Olson Cloning Retroviruses: Retrovirus Cloning? • W. L. McClements and G. F. Vande Woude Repeated DNA Sequences in Drosophila • M. W. Young Microbial Surface Elements: The Case of Variant Surface Glycoprotein (VSG) Genes of African Trypanosomes • K. B. Marcu and R. O. Williams Mouse Immunoglobulin Genes • P. Early and L. Hood The Use of Cloned DNA Fragments to Study Human Disease • S. H. Orkin Physical Mapping of Plant Chromosomes by In Situ Hybridization • J. Hutchinson, R. B. Flavell, and J. Jones Mutants and Variants of the Alcohol Dehydrogenase-1 Gene in Maize • M. Freeling and J. A. Birchler Developmentally Regulated Multigene Families in Dictyostelium discoideum • R. A. Firtel, M. McKeown, S. Poole, A. R. Kimmel, J. Brandis, and W. Rowekamp Computer Assisted Methods for Nucleic Acid Sequencing • T. R. Gingeras and R. J. Roberts VOLUME 4 (1982) New Methods for Synthesizing Deoxyoligonucleotides • M. H. Caruthers, S. L. Beaucage, C. Becker, W. Efcavitch, E. F. Fisher, G. Galluppi, R. Goldman, P. deHaseth, F. Martin, M. Matteucci, and Y. Stabinsky An Integrative Strategy of DNA Sequencing and Experiments Beyond • J. Messing Transcription of Mammalian Genes In Vitro • J. L. Manley Transcription of Eukaryotic Genes in Soluble Cell-Free Systems • N. Heintz and R. G. Roeder Attachment of Nucleic Acids to Nitrocellulose and Diazonium-Substituted Supports–B. Seed Determination of the Organization and Identity of Eukaryotic Genes Utilizing Cell-Free Translation Systems • J. S. Miller, B. E. Roberts, and B. M. Paterson Cloning in Streptomyces: Systems and Strategies • D. A. Hopwood and K. F. Chater Partial Sequence Determination of Metabolically Labeled Radioactive Proteins and Peptides • C. W. 
Anderson Molecular Cloning of Nitrogen Fixation Genes from Klebsiella pneumoniae and Rhizobium meliloti • F. M. Ausubel, S. E. Brown, F. J. deBruijn, D. W. Ow, G. E. Riedel, G. B. Ruvkun, and V. Sandaresan The Cloning and Expression of Human Interferon Genes • R. M. Lawn Cloning by Complementation in Yeast: The Mating Type Genes • J. B. Hicks, J. N. Strathern, A. J. S. Klar, and S. L Dellaporta Construction and Screening of Recombinant DNA Libraries with Charon Vector Phages • B. A. Zehnbauer and F. R. Blattner VOLUME 5 (1983) Microcloning of Microdissected Chromosome Fragments • V. Pirrotta, H. Jackie, and J. E. Edstrom Transient Expression of Cloned Genes in Mammalian Cells • J. Banerji and W. Schaffner Transposable Elements in Archaebacteria • W. F. Doolittle, C. Sapienza, J. D. Hofman, R. M. Mackay, A. Cohen, and W.-L. Xu The Application of Restriction Fragment Length Polymorphism to Plant Breeding • B. Burr, S. V. Evola, F. A. Burr, and J. S. Beckmann Antibodies against Synthetic Peptides • G. Walter and R. F. Doolittle Wheat α-Amylase Genes: Cloning of a Developmentally Regulated Gene Family • D. Baulcombe Yeast DNA Replication • J. L. Campbell Chromosome Engineering in Wheat Breeding and Its Implications for Molecular Genetic Engineering • C. N. Law Bovine Papillomavirus Shuttle Vectors • N. Sarver, S. Miltrani-Rosenbaum, M.-F. Law, W. T. McAllister, J. C. Byrne, and P. M. Howley Chemical Synthesis of Oligodeoxyribonucleotides: A Simplified Procedure • R. L. Letsinger
VOLUME 6 (1984) Cloning of the Adeno-Associated Virus • K. I. Berns Transformation of the Green Alga Chlamydomonas reinhardii • J.-D. Rochaix Vectors for Expressing Open Reading Frame DNA in Escherichia coli Using lacZ Gene Fusions • G. M. Weinstock An Enigma of the Leghemoglobin Genes • J. S. Lee and D. P. S. Verma Yeast Transposons • G. S. Roeder Rearrangement and Activation of C-MYC Oncogene by Chromosome Translocation in the B Cell Neoplasias • K. B. Marcu, L. W. Stanton, L. J. Harris, R. Watt, J. Yang, L. Eckhardt, B. Birshtein, E. Remmers, R. Greenberg, and P. Fahrlander Screening for and Characterizing Restriction Endonucleases • I. Schildkraut Molecular Studies of Mouse Chromosome 17 and the T Complex • L. M. Silver, J. I. Garrels, and H. Lehrach Use of Synthetic Oligonucleotide Hybridization Probes for the Characterization and Isolation of Cloned DNAs • A. A. Reyes and R. B. Wallace Hybridization of Somatic Plant Cells: Genetic Analysis • Yu. Yu. Gleba and D. A. Evans Genetic Analysis of Cytoskeletal Protein Function in Yeast • P. Novick, J. H. Thomas, and D. Botstein Use of Gene Fusions to Study Biological Problems • L. Guarente The Use of the Ti Plasmid of Agrobacterium to Study the Transfer and Expression of Foreign DNA in Plant Cells: New Vectors and Methods • P. Zambryski, L. Herrera-Estrella, M. De Block, M. Van Montagu, and J. Schell Analysis of Eukaryotic Control Proteins at Their Reception Sequences by Scanning Transmission Electron Microscopy • P. V. C. Hough, M. N. Simon, and I. A. Mastrangelo The Mass Culture of a Thermophilic Spirulina in the Desert • K. Qian, G. H. Sato, V. Zhao, and K. Shinohara DNA-Mediated Gene Transfer in Mammalian Gene Cloning • F. H. Ruddle, M. E. Kamarck, A. McClelland, and L. C. Kühn VOLUME 7 (1985) Biochemical and Genetic Analysis of Adenovirus DNA Replication In Vitro • B. W. Stillman Immunoscreening Recombinant DNA Expression Libraries • R. A. Young and R. W. Davis In Situ Hybridization to Cellular RNAs • R. C. 
Angerer, K. H. Cox, and L. M. Angerer Computer Methods to Locate Genes and Signals in Nucleic Acid Sequences • R. Sladen Biochemical and Molecular Techniques in Maize Research • N. Fedoroff Analysis of Chromosome Replication with Eggs of Xenopus laevis • R. A. Laskey, S. E. Kearsey, and M. Mechali Molecular Genetic Approaches to Bacterial Pathogenicity to Plants • M. J. Daniels and P. C. Turner Synthesis of Hybridization Probes and RNA Substrates with SP6 RNA Polymerase • P. A. Krieg, M. R. Rebagliati, M. R. Green, and D. A. Melton Identification and Isolation of Clones by Immunological Screening of cDNA Expression Libraries • D. M. Helfman, J. R. Feramisco, J. C. Fiddes, G. P. Thomas, and S. H. Hughes Molecular Studies on the Cytomegaloviruses of Mice and Men • D. H. Spector Gene Transfer with Retrovirus Vectors • A. Bernstein, S. Berger, D. Huszar, and J. Dick HPRT Gene Transfer as a Model for Gene Therapy • T. Friedmann Catabolic Plasmids: Their Analysis and Utilization in the Manipulation of Bacteria Metabolic Activities • S. Harayama and R. H. Don Transcription of Cloned Eukaryotic Ribosomal RNA Genes • B. Sollner-Webb, J. Tower, V. Culotta, and J. Windle DNA Markers in Huntington’s Disease • J. F. Gusella
VOLUME 8 (1986) Regulation of Gene Activity During Conidiophore Development in Aspergillus nidulans • W. E. Timberlake and J. E. Hamer Regulation of Expression of Bacterial Genes for Bioluminescence • J. Engebrecht and M. Silverman Analysis of Genome Organization and Rearrangements by Pulse Field Gradient Gel Electrophoresis • C. L. Smith, P. E. Warburton, A. Gaal, and C. R. Cantor Structural Instability of Bacillus subtilis Plasmids • S. D. Ehrlich, Ph. Noirot, M. A. Petit, L. Jannière, B. Michel, and H. te Riele Geminiviruses, The Plant Viruses with Single-Stranded DNA Genome • A. J. Howarth The Use of Bacterial Plasmids in the Investigation of Genetic Recombination • A. Cohen Shuttle Mutagenesis: A Method of Introducing Transposons into Transformable Organisms • H. S. Seifert, M. So, and F. Heffron Genetic Advances in the Study of Rhizobium Nodulation • S. R. Long Galactokinase Gene Fusion in the Study of Gene Regulation in E. coli, Streptomyces, Yeast and Higher Cell Systems • M. Rosenberg, M. Brawner, J. Gorman, and M. Reff Structure and Function of the Signal Recognition Particle • V. Siegel and P. Walter Alteration of the Structure and Catalytic Properties of Rubisco by Genetic Manipulation • S. Gutteridge Electrophoresis of DNA in Denaturing Gradient Gels • L. S. Lerman Caulimoviruses as Potential Gene Vectors for Higher Plants • R. J. Shepherd An Insect Baculovirus Host-Vector System for High-Level Expression of Foreign Genes • D. W. Miller, P. Safer, and L. K. Miller Preparation of cDNA Libraries and the Detection of Specific Gene Sequences • J. Brandis, D. Larocca, and J. Monahan Construction of Human Chromosome Specific DNA Libraries: The National Laboratory of Gene Library Project • L. L. Deaven, C. E. Hildebrand, J. C. Fuscoe, and M. A. Van Dilla New Approaches to the Expression and Isolation of a Regulatory Protein • D. Bastia, J. Germino, S. Mukherjee, and T. Vanaman VOLUME 9 (1987) Gene Transfer in the Sea Urchin • B. R. Hough-Evans and E. H. 
Davidson Properties and Uses of Heat Shock Promoters • H. Pelham The Expression of Introduced Genes in Regenerated Plants • D. Dunsmuir, J. Bedbrook, D. Bond-Nutter, C. Dean, D. Gidoni, and J. Jones Control of Maize Zein Gene Expression • R. S. Boston and B. A. Larkins Dnase I Footprinting as an Assay for Mammalian Gene Regulatory Proteins • W. S. Dynan Use of Gene Transfer in the Isolation of Cell Surface Receptor Genes • D. R. Littman, and M. V. Chao A New Method for Synthesizing RNA on Silica Supports • D. J. Dellinger and M. H. Caruthers Activity Gels: Reformation of Functional Proteins from SDS-Polyacrylamide Gels • R. P. Dottin, B. Haribabu, C. W. Schweinfest, and R. E. Manrow Plasmid Vectors Carrying the Replication Origin of Filamentous Single-Stranded Phages • G. Cesareni and J. A. H. Murray High Level Production of Proteins in Mammalian Cells • R. J. Kaufman Plant Microinjection Techniques • R. J. Mathias Genetic Transformation to Confer Resistance to Plant Virus Disease • R. N. Beachy, S. G. Rogers, and R. T. Fraley Alternative Splicing: Mechanistic and Biological Implications of Generating Multiple Proteins from a Single Gene • B. Nadal-Ginard, M. E. Gallego, and A. Andreadis VOLUME 10 (1988) Genomic Footprinting • P. B. Becker and G. Schütz Theoretical and Computer Analysis of Protein Primary Sequences: Structure Comparison and Prediction • P. Argos and P. McCaldon
Affinity Chromatography of Sequence-Specific DNA-Binding Proteins • C. Wu, C. Tsai, and S. Wilson Applications of the Firefly Luciferase as a Reporter Gene • S. Subramani and M. DeLuca Fluorescence-Based Automated DNA Sequence Analysis • L. M. Smith Phosphorothioate-Based Oligonucleotide-Directed Mutagenesis • J. R. Sayers and F. Eckstein Design and Use of Agrobacterium Transformation Vectors • M. Bevan and A. Goldsbrough Cell Commitment and Determination in Plants • F. Meins, Jr. Plasmids Derived from Epstein–Barr Virus: Mechanisms of Plasmid Maintenance and Applications in Molecular Biology • J. L. Yates Chromosome Jumping: A Long Range Cloning Technique • A. Poustka and H. Lehrach Isolation of Intact MRNA and Construction of Full-Length cDNA Libraries: Use of a New Vector, and Primer-Adapters for Directional cDNA Cloning • J. H. Han and W. J. Rutter The Use of Transgenic Animal Techniques for Livestock Improvement • R. M. Strojek and T. E. Wagner Plant Reporter Genes: The GUS Gene Fusion System • R. A. Jefferson Structure of the Genes Encoding Proteins Involved in Blood Clotting • R. T. A. MacGillivray, D. E. Cool, M. R. Fung, E. R. Guinto, M. L. Koschinsky, and B. A. Van Oost VOLUME 11 (1989) DNA Methylases • A. Razin Advances in Direct Gene Transfer into Cereals • T. M. Klein, B. A. Roth, and M. E. Fromm The Copy Number Control System of the 2µm Circle Plasmid of Saccharomyces cerevisiae • B. Futcher The Application of Antisense RNA Technology to Plants • W. R. Hiatt, M. Kramer, and R. E. Sheehy The Pathogenesis-Related Proteins of Plant • J. P. Carr and D. F. Klessig The Molecular Genetics of Plasmid Partition: Special Vector Systems for the Analysis of Plasmid Partition • A. L. Abeles and S. J. Austin DNA-Mediated Transformation of Phytopathogenetic Fungi • J. Wang and S. A. Leong Fate of Foreign DNA Introduced to Plant Cells • J. Paszkowski Generation of cDNA Probes by Reverse Translation of Amino Acid Sequence • C. C. Lee and C. T. 
Caskey Molecular Genetics of Self-Incompatibility in Flowering Plants • P. R. Ebert, M. Altschuler, and A. E. Clarke Pulsed-Field Gel Electrophoresis • M. V. Olson VOLUME 12 (1990) Folding of Eukaryotic Proteins Produced in Escherichia coli • R. F. Kelley and M. E. Winkler Human Retinoblastoma Susceptibility Gene • C.-C. Lai and W.-H. Lee A New Chimeric Nucleic Acid Analog • F. Morvan, B. Rayner, and J.-L. Imbach The Utility of Streptomycetes and Hosts for Gene Cloning • P. K. Tomich and Y. Yagi From Footprint to Function: An Approach to Study Gene Expression and Regulatory Factors in Transgenic Plants • E. Lam Purification of Recombinant Proteins with Metal Chelate Adsorbent • E. Hochuli Determinants of Translation Efficiency of Specific mRNAs in Mammalian Cells • D. S. Peabody The Polymerase Chain Reaction • N. Arnheim Regulation of Alternative Splicing • M. McKeown Structure and Function of the Nuclear Receptor Superfamily for Steroid, Thyroid Hormone and Retinoic Acid • V. Giguère Identification and Functional Analysis of Mammalian Splicing Factors • A. Bindereif and M. R. Green The Genes Encoding Wheat Storage Proteins: Towards a Molecular Understanding of Bread-Making Quality and Its Genetic Manipulation • V. Colot Control of Translation Initiation in Mammalian Cells • R. J. Kaufman
Electroporation of Bacteria: A General Approach to Genetic Transformation • W. J. Dower The Isolation and Identification of cDNA Genes by Their Heterologous Expression and Function • G. G. Wong Molecular Cloning of Genes Encoding Transcription Factors with the Use of Recognition Site Probes • H. Singh
VOLUME 13 (1991) The Mutator Transposable Element Family of Maize • V. Walbot Protein Phosphorylation and the Regulation of Cellular Processes by the Homologous Two-Component Systems of Bacteria • A. J. Ninfa The Peculiar Nature of Codon Usage in Primates • S. Zhang and G. Zubay The Role of Nodulation Gene in Bacterium–Plant Communication • A. Kondorosi, E. Kondorosi, M. John, J. Schmidt, and J. Schell Regulation of Gene Expression by Epidermal Growth Factor • L. G. Hudson and G. N. Gill Machinery of Protein Import into Chloroplasts and Mitochondria • D. Pain, D. J. Schnell, H. Murakami, and G. Blobel High-Level Expression of Foreign Genes in Mammalian Cells • S. E. Kane Aromatic Hydrocarbon Degradation: A Molecular Approach • G. J. Zylstra and D. T. Gibson Employment of Fibroblasts for Gene Transfer Applications for Grafting into the Central Nervous System • M. D. Kawaja, J. Ray, and F. H. Gage The Molecular Biology of Amino Acid Biosynthesis in Plants • T. Brears and G. M. Coruzzi Genetic Manipulation of Bacillus thuringiensis Insecticidal Crystal Protein Genes in Bacteria • C. Gawron-Burke and J. A. Baum Progress Towards Gene Targeting in Plants • J. I. Yoder and E. Kmiec Molecular Biology of Mating-Type Determination in Schizophyllum commune • R. C. Ullrich, C. A. Specht, M. M. Stankis, H. Yang, L. Giasson, and C. P. Novotny Functions of Intracellular Protein Degradation in Yeast • M. Hochstrasser Transgenic Fish for Aquaculture • G. L. Fletcher and P. L. Davies
VOLUME 14 (1992) Cleavage-Site Motifs in Protein Targeting Sequences • G. von Heijne Complications of RNA Heterogeneity for the Engineering of Virus Vaccines and Antiviral Agents • E. Domingo and J. J. Holland The Quaternary Structures of SV40 Large T Antigen and Tumor Suppressor p53: Analysis by Gel Electrophoresis • J. E. Stenger, G. A. Mayr, K. Mann, S. Ray, M. E. Anderson, and P. Tegtmeyer Assembly of Antibodies and Mutagenized Variants in Transgenic Plants and Plant Cell Cultures • A. Hiatt, Y. Tang, W. Weiser, and M. B. Hein Maize Endosperm Tissue as an Endoreduplication System • R. V. Knowles, G. L. Yerk, F. Crienc, and R. L. Phillips Study of Chlorate-Resistant Mutants of Arabidopsis: Insights into Nitrate Assimilation and Ion Metabolism of Plants • N. M. Crawford Approaches and Progress in the Molecular Cloning of Plant Disease Resistance Genes • J. L. Bennetzen and J. D. G. Jones Is GRP78 a Sensor of Cellular Secretory Activity? • T. Leustek The Molecular Biology of Pathogenesis in Ustilago maydis • B. J. Saville and S. A. Leong Molecular Design of Oligomeric Channel Proteins • A. Grove, J. M. Tomich, and M. Montal Regulation of Gene Expression by Thyroid Hormones and Retinoic Acids • S. M. Lipkin, M. G. Rosenfeld, and C. K. Glass RNA Trans-Splicing • X.-Y. Huang and D. Hirsch Structural Constraints on Residue Substitution • J. Overington Molecular and Functional Analysis of the A Mating Type Genes of Coprinus cinereus • U. Kües and L. A. Casselton Physical Mapping of Human Chromosomes • G. A. Evans and D. L. McElligott
VOLUME 15 (1993) Application of Computational Neural Networks to the Prediction of Protein Structural Features • S. R. Holbrook Human Cellular Protein Patterns and Their Link to Genome Data Mapping and Sequencing Data: Towards an Integrated Approach to the Study of Gene Expression • J. E. Celis, H. H. Rasmussen, H. Leffers, P. Madsen, B. Honore, K. Dejgaard, P. Gromov, and E. Olsen, H. J. Hoffman, M. Nielsen, B. Gesser, M. Puype, J. Van Damme, and J. Vandekerckhove Regulation of Translation in Plants • A. Danon, C. B. Yohn, and S. P. Mayfield On the Origins, Structures and Functions of Restriction-Modification Enzymes • J. Heitman Manipulation of Amino Acid Balance in Maize Seed • T. Ueda and J. Messing Investigational Approaches for Studying the Structures and Biological Functions in Myeloid Antimicrobial Peptides • M. E. Selsted Progress in the Cloning of Genes for Plant Storage Lipid Biosynthesis • V. C. Knauf Genes for Crop Improvement • J. Bennett Molecular Biology and Genetics of Protective Fungal Endophytes of Grasses • C. L. Schardl
and Z. An Prospects for Human Gene Therapy • A. B. Moseley and C. T. Caskey The Use of Microparticle Injection to Introduce Genes into Animal Cells • In Vitro and In Vivo S. A. Johnston and D-C. Tang VOLUME 16 (1994) RNA Polymerase III Transcription in the Yeast Saccharomyces cerevisiae • Stephen Buratowski Lens Oncogenesis and Differentiation • Heiner Westphal Genetic Engineering of Cardiac Muscle Cells: In vitro and In vivo • Stephen J. Fuller and Kenneth R. Chien Genetic Control of Plant Ureases • Joseph C. Polacco and Mark A. Holland Gene Discovery of Dictyostelium • William F. Loomis, Adam Kuspa, and Gad Shaulsky Transfer of YACs to Mammalian Cells and Transgenic Mice • Clare Huxley Plant Genetic Engineering and Future Agriculture • S. Riazuddin Internal Initiation of mRNA Translation in Eukaryotes • Ann Kaminski, Sarah L. Hunt, Catherine L. Gibbs, and Richard J. Jackson Genetic Recombination Analysis Using Sperm Typing • Karin Schmitt and Norman Arnheim Genetic Regulation in Plant Pathogenic Pseudomonads • David K. Willis, Jessica J. Rich, Thomas G. Kinscherf, and Todd Kitten Defense-Related Gene Induction in Plants • Danny Alexander, Kay Lawton, Scott Uknes, Eric Ward, and John Ryals The P1 Vector System for the Preparation and Screening of Genomic Libraries • Nancy S. Shepherd and David Smoller The Unmasking of Maternal mRNA During Oocyte Maturation and Fertilization • James L. Grainger Recognizing Exons in Genomic Sequences Using Grail II • Ying Xu, Richard Mural, Manesh Shah, and Edward Uberbacher Gene Expression of Plant Extracellular Proteins • Beat Keller VOLUME 17 (1995) The Molecular Biology of Nucleotide Excision Repair and Double-Strand Break Repair in Eukaryotes • Alan R. Lehman Manipulating and Mapping RNA with RecA-Assisted Restriction Endonuclease (RARE) Cleavage • Lance J. 
Ferrin Molecular Studies on the Virulence of Listeria monocytogenes • Michael Kuhn and Werner Goebel Indirect Use of Immobilized Metal Affinity Chromatography for Isolation and Characterization of Protein Partners • Michèle Sawadogo and Michael W. Van Dyke Structure and Function of RNA Pseudoknots • C. W. A. Pleij Role of Molecular Chaperones in the Initiation of Plasmid DNA Recognition • Dhruba K. Chattoraj
Structure, Function and Engineering of Bacillus thuringiensis Toxins • Mark A. Thompson, H. Ernest Schnepf, and Jerald S. Feitelson Uses of GAL4 Expression in Mammalian Cells • Ivan Sadowski Protein Thiol Modification of Glyceraldehyde-3-Phosphate Dehydrogenase • Bernhard Brüne and Eduardo G. Lapetina The Genetics of Nuclear Migration in Fungi • Susan M. Beckwith, Christian H. Roghi, and N. Ronald Morris Structure and Function of the Platelet-Derived Growth Factor Family and Their Receptors • Kristen C. Hart, Brendan D. Galvin, and Daniel J. Donoghue Recombination between Prokaryotic and Eukaryotic DNA: Integration of Agrobacterium tumefaciens T-DNA into the Plant Genome • Bruno Tinland and Barbara Hohn Metal Precipitation by Marine Bacteria: Potential for Biotechnological Applications • Bradley M. Tebo
VOLUME 18 (1996) Cloning and Characterization of DNAs with Palindromic Sequences • David R. F. Leach DNA Isolation, Manipulation and Characterization from Old Tissues • Rob DeSalle and Elizabeth Bonwich Growth Factors and Neural Connectivity • Sarah McFarlane and Christine E. Holt Gene Identification by 3´ Terminal Exon Trapping • David B. Krizman Engineering Transgenes for Use in the Mammary Gland • Sinai Yarus, Darryl Hadsell, and Jeffrey M. Rosen Problems that Can Limit the Expression of Foreign Genes in Plants: Lessons to Be Learned from B.t. Toxin Genes • Scott H. Diehn, E. Jay De Rocher, and Pamela J. Green Renaturation and Reconstitution of Functional Holoenzyme from Recombinant Subunits of Casein Kinase II Expressed as Inclusion Bodies in E. coli • Wey-Jinq Lin, Rolf Jakobi, and Jolinda A. Traugh Plant ACYL-ACP Thioesterases: Chain-Length Determining Enzymes in Plant Fatty Acid Biosynthesis • Toni Voelker Genetic Engineering of an Insect Parasite • Randy Gaugler and Sarwar Hashmi The Stop Signal Controls the Efficiency of Release Factor-Mediated Translational Termination • Warren P. Tate, Mark E. Dalphin, Herman J. Pel, and Sally A. Manning Mechanism of Replication and Copy Number Control of Plasmids in Gram-Positive Bacteria • Saleem A. Khan Pathways of Protein Remodeling by Escherichia coli Molecular Chaperones • Marie Pak and Sue H. Wickner Pheromones and Pheromone Receptors as Mating-Type Determinants in Basidiomycetes • Lisa J. Vaillancourt and Carlene A. Raper Synthesis and Applications of Phosphopeptides • Kazuyasu Sakaguchi, Peter K. Roller, and Ettore Appella
VOLUME 19 (1997) Novel Approaches to Engineering Disease Resistance in Crops • Kathy M. M. Swords, Jihong Liang, and Dilip M. Shah The Structure of Plant Gene Promoters • Tom J. Guilfoyle Plasmid Stabilization by Post-Segregational Killing • Kenn Gerdes, Jimmy Schouv Jacobsen, and Thomas Franch Pathways and Genes Involved in Cellulose Synthesis • Yasushi Kawagoe and Deborah P. Delmer Conjugative Transposons • Abigail A. Salyers and Nadja B. Shoemaker Termination of DNA Replication in Prokaryotic Chromosomes • Deepak Bastia, Adhar C. Manna, and Trilochan Sahoo Regulation of Protein Degradation in Plants • Judy Callis Genetic Engineering of Oilseeds for Desired Traits • Anthony J. Kinney
Specificity of Receptor Tyrosine Kinase Signaling Pathways: Lessons from Drosophila • Willis Li and Norbert Perrimon Switching on Gene Expression: Analysis of the Factors that Spatially and Temporally Regulate Plant Gene Expression • Lee Meisel and Eric Lam Nucleic Acid Transport in Plant-Pathogen Interactions • Robert Lartey and Vitaly Citovsky Leaf Senescence: Gene Expression and Regulation • Louis M. Weaver, Edward Himelblau, and Richard M. Amasino Production and Analysis of Transgenic Mice Containing Yeast Artificial Chromosomes • Kenneth R. Peterson Comparative Molecular Analysis of Genes for Polycyclic Aromatic Hydrocarbon Degradation • Gerben J. Zylstra, Eungbin Kim, and Anil K. Goyal Recognition and Signaling in Plant-Pathogen Interactions: Implications for Genetic Engineering • Michael Lawton VOLUME 20 (1998) Agrobacterium-Mediated Horizontal Gene Transfer • Clarence I. Kado Computer-Assisted Methods for the Identification and Characterization of Polymerase II Promoters • Ingmar Reuter, Thomas Werner, and Edgar Wingender Retroviral cDNA Integration: Mechanism, Applications and Inhibition • Mark S. T. Hansen, Sandrine Carteau, Christopher Hoffman, Ling Li, and Frederic Bushman The Signal Transduction of Motion and Antigen Recognition: Factors Affecting T Cell Function and Differentiation • Stephen C. Bunnell and Leslie J. Berg Synthetic DNA Arrays • Alan Blanchard Detection of Single Nucleotide Variations • Pui-Yan Kwok and Xiangning Chen Antisense: A Key Tool for Cell and Developmental Studies in Dictyostelium • Richard H. Gomer Antisense in Abundance: The Ribosome as a Vehicle for Antisense RNA • Rosemary Sweeney, Qichaag Fan, and Meng-Chao Yao Salinity Tolerance—Mechanisms, Models and the Metabolic Engineering of Complex Traits • Donald E. Nelson, Bo Shen, and Hans J. Bohnert Biochemistry, Molecular Biology and Regulation of Starch Synthesis • Jack Preiss and Mirta N. 
Sivak Genetic Engineering and the Expression of Foreign Peptides or Proteins with Plant Virus-Based Vectors • Christophe Lacomme, Lisa Smolenska, and T. Michael A. Wilson Cloning and Expression of Large Mammalian cDNAs: Lessons from ATM • Yosef Shiloh, Anat Bar-Shira, Yaron Galanty, and Yael Ziv The Use of Genetically Engineered Cells in Drug Discovery • Gerhard Loeber and Renate Schnitzer Molecular Engineering of Monoterpene Production • Christian D. Haudenschild and Rodney B. Croteau VOLUME 21 (1999) Nuclear Plasmids of Dictyostelium • Joanne E. Hughes and Dennis L. Welker The Translation Initiation Signal in E. Coli and Its Control • Eckart Fuchs Direct Isolation of Specific Chromosomal Regions and Entire Genes by Tar Cloning • Vladimir Larionov Regulation of Lysine and Threonine Metabolism in Plants • Rachel Amir and Gad Galili Genetic Engineering of Plant Chilling Tolerance • James Tokuhisa and John Browse Role of Bacterial Chaperones in DNA Replication • Igor Konieczny and Maciej Zylicz Transformation of Cereals • Roland Bilang, Johannes Fütterer, and Christof Sautter Mechanisms of Initiation of Linear DNA Replication in Prokaryotes • Margarita Salas Diverse Regulatory Mechanisms of Amino Acid Biosynthesis in Plants • Katherine J. Denby and Robert L. Last Forage and Turf-Grass Biotechnology: Principles, Methods, and Prospects • John W. Forster and Germán C. Spangenberg Informatics Needs of Plant Molecular Biology • Mary Polacco
VOLUME 22 (2000) Post-Transcriptional Light Regulation of Nuclear-Encoded Genes • Marie E. Petracek and William F. Thompson Novel Methods of Introducing Pest and Disease Resistance to Crop Plants • Jeremy Bruenn Targeting Gene Repair in Mammalian Cells Using Chimeric Oligonucleotides • Eric B. Kmiec, Sarah Ye, and Lan Peng Exploring the Mechanism of Action of Insecticidal Proteins by Genetic Engineering Methods • Jeremy L. Jenkins and Donald H. Dean Enzyme Stabilization by Directed Evolution • Anne Gershenson and Frances H. Arnold ET-Cloning: Think Recombination First • Joep P. P. Muyrers, Youming Zhang, and A. Francis Stewart Growth and Genetic Modification of Human β-Cells and β-Cell Precursors • Gillian M. Beattie, Albert Hayek, and Fred Levine Elucidation of Biosynthetic Pathways by Retrodictive/Predictive Comparison of Isotopomer Patterns Determined by NMR Spectroscopy • Wolfgang Eisenreich and Adelbert Bacher Are Gene Silencing Mutants Good Tools for Reliable Transgene Expression or Reliable Silencing of Endogenous Genes in Plants? • Philippe Mourrain, Christophe Béclin, and Hervé Vaucheret Manipulating Plant Viral RNA Transcription Signals • Cynthia L. Hemenway and Steven A. Lommel Genetic Engineering Strategies for Hematologic Malignancies • Thomas J. Kipps Telomerase and Cancer • Murray O. Robinson
ACKNOWLEDGMENT All the final processing was done most competently by Bonnie J. McGahern. The Editor is particularly grateful.
CONTENTS

EVOLUTION OF TRANSPORT PROTEINS
Milton H. Saier, Jr.

MECHANISMS OF APOPTOSIS REPRESSION
Collin C.Q. Vu and John A. Cidlowski

CYTOKINE ACTIVATION OF TRANSCRIPTION
Kerri A. Mowen and Michael David

ENZYMATIC APPROACHES TO GLYCOPROTEIN SYNTHESIS
Pamela Sears, Thomas Tolbert and Chi-Huey Wong

VECTOR DESIGN AND DEVELOPMENT OF HOST SYSTEM FOR PSEUDOMONAS
Herbert P. Schweizer, Tung T. Hoang, Katie L. Propst, Henry R. Ornelas and RoxAnn R. Karkhoff-Schweizer

GENETIC AND BIOCHEMICAL STUDIES ON THE ASSEMBLY OF AN ENVELOPED VIRUS
Timothy L. Tellinghuisen, Rushika Perera and Richard J. Kuhn

ENZYME AND PATHWAY ENGINEERING FOR SUICIDE GENE THERAPY
Margaret E. Black

DECONSTRUCTING A CONSERVED PROTEIN FAMILY: THE ROLE OF MCM PROTEINS IN EUKARYOTIC DNA REPLICATION
Sally G. Pasion and Susan L. Forsburg

EXPRESSION OF FOREIGN GENES IN THE YEAST Pichia pastoris
Geoffrey P. Lin Cereghino, Anthony J. Sunga, Joan Lin Cereghino and James M. Cregg

PROTEIN SPLICING AND ITS APPLICATIONS
Izabela Giriat, Thomas W. Muir and Francine B. Perler

GLOBAL TRANSCRIPT EXPRESSION PROFILING BY SERIAL ANALYSIS OF GENE EXPRESSION (SAGE)
Hamish S. Scott and Roman Chrast

INDEX
EVOLUTION OF TRANSPORT PROTEINS
Milton H. Saier, Jr.
Department of Biology University of California at San Diego La Jolla, CA 92093-0116
Abbreviations: TC, transporter classification; MIP, major intrinsic protein; MF, major facilitator; MFS, major facilitator superfamily; TMS, transmembrane spanning segment; APC, amino acid/polyamine/organocation; RND, resistance-nodulation-division; DMT, drug/metabolite transporter; CHR, chromate resistance; VIC, voltage-gated ion channel; ORF, open reading frame; SMR, small multidrug resistance; aas, amino acyl residues; BAT, Bacterial/Archaeal Transporter
Genetic Engineering, Volume 23, Edited by J. K. Setlow Kluwer Academic / Plenum Publishers, 2001
INTRODUCTION
Over the past decade, our laboratory and others have been concerned with molecular archaeological studies aimed at revealing the origins and evolutionary histories of permeases (1). These studies have revealed that several different families, defined on the basis of sequence similarities, arose independently of each other, at different times in evolutionary history, following different routes. When complete microbial genomes first became available for analysis, we adapted preexisting software and designed new programs that allowed us quickly to identify probable transmembrane proteins, estimate their topologies and determine the likelihood that they function in transport (2). This work allowed us to expand previously-recognized families and to identify dozens of new families. All of this work then led us to attempt to design a rational but comprehensive classification system that would be applicable to the complete complement of transport systems found in all living organisms (3). The classification system that we have devised is based primarily on mode of transport and energy coupling mechanism, secondarily on molecular phylogeny, and lastly on the substrate specificities of the individual permeases (4). Our laboratory has utilized the knowledge that has accrued from the classification and characterization of transport systems and their protein constituents in order to gain insight into the evolutionary origins of transporters found in living organisms. This information is based in part on our genome analyses. An extensive website is available describing the genome analysis work as well as the classification system (http://www-biology.ucsd.edu/~msaier/transport/). 
This site includes (i) names, (ii) abbreviations, (iii) transporter classification numbers (TC #), (iv) descriptions of the families, (v) primary references and (vi) representative well-characterized family members including their names, substrate specificities, organismal sources and database accession numbers. The interested reader is referred to reference 4 for a comprehensive review.
In this chapter, I summarize evidence concerning the evolution of transport proteins. These proteins are proposed to have evolved independently of other protein types following the pathway: channel proteins → secondary active transporters → primary active transporters. Presently documented examples are cited to support this proposal, and other relevant new information is provided.
INDEPENDENT ORIGINS OF DIFFERENT PERMEASE FAMILIES
Our current classification system includes nearly 300 different families of transporters (4). These families are defined on the basis of degrees of sequence similarity exhibited by their members. Homology of two proteins is considered to be proven if comparable regions of these proteins (60 residues or more) exhibit a comparison score in excess of 9 standard deviations (5). This value corresponds to a probability that is less than that the observed degree of similarity occurred by chance (6). If two proteins do not exhibit significant sequence similarity, they are assigned to different families. Two families may have arisen independently of each other, or they may have arisen from a common ancestor by extensive sequence divergence. In many cases, it is impossible to distinguish between these two possibilities. However, in some cases, evidence for the latter possibility can be obtained by conducting matrix/motif analyses with programs such
as PSI-BLAST (7), or MEME and MAST (8, 9). In other cases, specifically for proteins that have arisen by intragenic multiplication events, it is sometimes possible to establish independent origins. We have previously discussed the origins of three families for which compelling evidence of independent origin exists (10, 11). These families are (a) the mitochondrial carrier (MC) family of anion exchangers [TC #2.A.29] (12); (b) the major intrinsic protein (MIP) family of aquaporins and glycerol channels [TC #1.A.8] (13); and (c) the major facilitator (MF) superfamily of secondary carriers [TC #2.A.1] (14, 15). The MC family arose in eukaryotic organelles by an intragenic triplication event in which a two transmembrane spanner (TMS) polypeptide gave rise to a 6 TMS polypeptide chain. The MIP family probably arose in prokaryotes before the three major kingdoms of life (bacteria, archaea and eukarya) diverged from each other. In this case, an intragenic duplication event occurred in which a 3-TMS-encoding genetic element gave rise to a 6-TMS-encoding element. This event gave rise to proteins with their two halves in opposite orientation in the membrane. Finally, the MFS probably arose very early, either soon after life on earth became established or perhaps even before life came to earth. In the MFS, an intragenic duplication event gave rise to a 12-TMS-encoding element from a 6 TMS unit. Many additional families of transporters are now known to have arisen independently of these families and of each other as discussed in the next section.
EVOLUTION OF COMPLEX SUPERFAMILIES INCLUDING PROTEINS OF MULTIPLE TOPOLOGICAL TYPES
In recent publications, the evolutionary origins of several independently arising superfamilies were analyzed (14-20, 23, 25).
Among the families analyzed were (A) the major facilitator (MF) superfamily [TC #2.A.1] (14, 15), (B) the amino acid/polyamine/organocation (APC) superfamily [TC #2.A.3] (16), (C) the resistance-nodulation-division (RND) superfamily [TC #2.A.6] (18), (D) the drug/metabolite transporter (DMT) superfamily [TC #2.A.7] (25), (E) the chromate resistance (CHR) family [TC #2.A.51] (23), (F) the voltage-gated ion channel (VIC) superfamily [TC #1.A.1] (17), (G) the ATP-binding cassette (ABC) superfamily [TC #3.A.1] (19) and (H) the lysine exporter (LysE) superfamily (20). The three constituent families of the putative LysE superfamily have TC numbers: #2.A.65, 2.A.66 and 2.A.67 (20). Current topologies known for members of most of these families and their postulated routes of appearance are presented in Table 1. The VIC superfamily contains protein members of varying topological types that arose both due to fusion of genetic elements of dissimilar origin and due to intragenic duplication events. The ABC superfamily has a basic 6 TMS unit, but these units have undergone fusion to give 12 TMSs with various domain orders in different transporters, particularly in efflux pumps. Additionally, in many nutrient uptake systems, the 6 TMS unit has lost one TMS to give 5 TMSs due to gene deletion, and the basic unit has also gained 1, 2 or 3 TMSs due to gene fusion events to give proteins with more unusual 7, 8 or 9 TMS topologies, respectively. By contrast, the RND and MFS superfamilies are believed to have arisen exclusively by independent intragenic duplication events that occurred multiple times during their evolution. In the case of the RND superfamily, unduplicated “primordial-like” genes were identified in presently existing genomes (18). The demonstration of their presence supported the suggestion that gene duplication events giving rise to 12 TMS permeases occurred repeatedly during the evolution of this superfamily.
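The family and superfamily assignments above all rest on the homology criterion stated earlier: a comparison score more than 9 standard deviations above that obtained with shuffled sequences, over at least 60 residues. The following Python sketch illustrates the idea under loudly stated assumptions. A toy ungapped identity count stands in for a real alignment score, the function names are hypothetical, and the chance probability is approximated by a one-sided Gaussian tail, which at 9 standard deviations is on the order of 10^-19. It is not the procedure of the cited programs.

```python
import math
import random


def identity_score(a, b):
    """Toy score (assumption): identical residues at aligned, ungapped positions."""
    return sum(x == y for x, y in zip(a, b))


def comparison_z_score(seq_a, seq_b, shuffles=200, seed=0):
    """Observed score expressed in standard deviations above the shuffled mean."""
    rng = random.Random(seed)
    observed = identity_score(seq_a, seq_b)
    residues = list(seq_b)
    shuffled_scores = []
    for _ in range(shuffles):
        rng.shuffle(residues)  # preserves composition, destroys order
        shuffled_scores.append(identity_score(seq_a, residues))
    mean = sum(shuffled_scores) / len(shuffled_scores)
    variance = sum((s - mean) ** 2 for s in shuffled_scores) / (len(shuffled_scores) - 1)
    if variance == 0:
        raise ValueError("shuffled scores show no variation; z-score undefined")
    return (observed - mean) / math.sqrt(variance)


def gaussian_tail(z):
    """One-sided probability of exceeding z standard deviations (normal assumption)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    region = "ACDEFGHIKLMNPQRSTVWY" * 3  # hypothetical 60-residue region
    z = comparison_z_score(region, region)
    print(f"z = {z:.1f}, homologous by the 9 S.D. rule: {z > 9}")
    print(f"Gaussian tail at 9 S.D.: {gaussian_tail(9):.2e}")
```

A real analysis would replace `identity_score` with a gapped alignment score from a substitution matrix; only the shuffle-and-standardize step carries over.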
Footnotes to Table 1:
1 An entry indicates numbers of putative or established TMSs in currently recognized homologues of the family.
2 Appearance of present day homologues from a primordial 2, 4, 6 or 12 TMS precursor occurred by intragenic duplication (d), triplication (t), addition (a) of TMSs, and/or subtraction (s) of TMSs. When not indicated by a letter, the gain or loss of TMSs, as for the MFS and the CHR family, probably occurred by insertion or desertion of preexisting portions of the polypeptide chain.
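The route codes in the footnote above (d, t, a, s) amount to simple arithmetic on TMS counts. The sketch below replays a few of the routes described in the text. The function name and the `(operation, amount)` step format are hypothetical conventions introduced here for illustration, not part of the classification system.

```python
def derive_topology(primordial_tms, steps):
    """Return the TMS count after each step of a postulated evolutionary route.

    Each step is (op, n): 'd' doubles and 't' triples the count (n is ignored),
    'a' adds n TMSs and 's' subtracts n TMSs.
    """
    count = primordial_tms
    history = [count]
    for op, n in steps:
        if op == "d":      # intragenic duplication
            count *= 2
        elif op == "t":    # intragenic triplication
            count *= 3
        elif op == "a":    # addition of TMSs
            count += n
        elif op == "s":    # subtraction of TMSs
            count -= n
        else:
            raise ValueError(f"unknown operation: {op!r}")
        if count <= 0:
            raise ValueError("a route cannot remove every TMS")
        history.append(count)
    return history


if __name__ == "__main__":
    routes = {
        "MC":  (2, [("t", 0)]),             # 2 -> 6
        "MIP": (3, [("d", 0)]),             # 3 -> 6
        "MFS": (6, [("d", 0), ("a", 2)]),   # 6 -> 12 -> 14
        "CHR": (6, [("d", 0), ("s", 2)]),   # 6 -> 12 -> 10
    }
    for family, (start, steps) in routes.items():
        print(family, " -> ".join(str(n) for n in derive_topology(start, steps)))
```

For the MFS and CHR routes the text attributes the final gain or loss to insertion or desertion of existing portions of the chain, so the 'a' and 's' steps there only track the resulting counts.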
Previous studies with MFS permeases failed to reveal the occurrence of such half-sized (6 TMS) protein-encoding genes (15). Recently, as a result of the comprehensive analysis of the Bacillus subtilis genome (21), we identified a single open reading frame (ORF) that encodes a 6 TMS, half-sized, MFS permease homologue (22). This protein, YitZ, is 168 amino acyl residues (aas) in length, exhibits 6 putative TMSs rather than the usual 12, and most closely resembles known drug resistance permeases. The function of this 6 TMS Bacillus MFS homologue is unknown, and it is not known whether it functions as a monomer, a homodimer, or some other oligomer. Physiological and biochemical analyses of this protein should prove most interesting. The occurrence of this protein provides further support for the notion that the 12 TMS MFS proteins arose by internal gene duplication events.
Not all MFS permeases exhibit 12 (or 6) TMSs. Three of the 30 currently recognized families within this superfamily appear to exhibit 14 TMSs, and in one of these proteins, a 14 TMS topology has been experimentally verified. Sequence analyses have revealed that the 14 TMS topology probably arose by insertion of a cytoplasmic loop between TMSs 6 and 7 into the membrane due to replacement of hydrophilic residues by hydrophobic residues. This event must have occurred by a series of point mutations in the encoding gene over an extended period of evolutionary time (see Table 1). The functional advantages of having a 14 TMS topology versus a 12 TMS topology are not known. On the other hand, LysE of Corynebacterium glutamicum probably arose from a 6 TMS unit, but it lost one TMS due to substitution of hydrophilic residues for primordial hydrophobic residues in its N-terminal TMS so that a 5 TMS unit resulted (20).
Another interesting example where we can trace the probable evolutionary pathway taken by a permease family is the small but widespread chromate resistance (CHR) family [TC #2.A.51; Table 1].
Two “half-sized” 6 TMS CHR family permease homologues are found encoded adjacent to each other in the Bacillus genome (23). Surprisingly, most of the members of this family exhibit an apparent 10 TMS topology, and topological analyses have established that at least one member of the family, ChrA of Alcaligenes eutrophus, does in fact exhibit a 10 TMS topology (23) rather than the expected 12 TMS topology. In this case, the 10 TMS protein may have arisen from a 12 TMS precursor protein by hydrophilic amino acyl residue substitutions in the first 2 TMSs of the primordial 12 TMS protein. The hydrophilic N-termini of CHR permeases, corresponding to the first two TMSs in the putative 12 TMS primordial protein, prove to be localized to the cytoplasmic side of the membrane (23). Thus, even though these proteins exhibit 10 TMSs, they apparently arose by an internal gene duplication event from a genetic precursor that encoded a 6 TMS protein. The 6 TMS protein was the precursor of a 12 TMS protein, and a 12 TMS protein gave rise to a 10 TMS protein (Table 1). In spite of the sequence similarity that allows us to establish that the two halves of these proteins arose from a single precursor polypeptide, it is therefore clear that the N-terminal domain has assumed a topology quite different from that of the corresponding portion of the C-terminal half.
A related but distinct situation may exist for a yeast phosphate permease, Pho84 of Saccharomyces cerevisiae (16, 24). Based on sequence similarity, the two halves of Pho84 are homologous. Uniquely, however, hydropathy analyses have led to the prediction that the two halves of the protein differ very substantially in topology (24). Although the topology of this protein has not been experimentally verified, entirely different portions of the polypeptide chains of these two halves are predicted to traverse the membrane.
It is therefore not yet established that for membrane proteins, common evolutionary origin necessarily implies common structure.
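Topology predictions of the kind mentioned for Pho84 come from hydropathy analysis. As a minimal sketch, assuming the widely used Kyte-Doolittle scale with a 19-residue sliding window and a cutoff of 1.6 (the text does not say which scale or settings the cited analyses used), putative TMSs can be located as follows. The function names and the test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named in the text).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    values = [KD[residue] for residue in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def predict_tms(seq, window=19, threshold=1.6):
    """Return (start, end) spans, 0-based and end-exclusive, of putative TMSs.

    Windows whose mean hydropathy reaches the threshold are merged when they
    overlap or touch, so each hydrophobic stretch yields one span.
    """
    spans = []
    for start, mean in enumerate(hydropathy_profile(seq, window)):
        if mean >= threshold:
            end = start + window
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    loop = "DEKR" * 5  # hypothetical hydrophilic loop
    toy = loop + "L" * 21 + loop + "I" * 21 + loop
    print(len(predict_tms(toy)), "putative TMSs:", predict_tms(toy))
```

Comparing the spans predicted for the N-terminal and C-terminal halves of a protein, aligned by sequence, is one way to see whether homologous halves are predicted to place the same portions of the chain in the membrane.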
Divergence in topology and structure for proteins of similar sequence may be more common for membrane proteins than for soluble proteins.
A NOVEL SUPERFAMILY OF TOPOLOGICALLY DIVERSE PERMEASES
Recently our phylogenetic efforts have resulted in definition of a novel superfamily that we have called the Drug/Metabolite Transporter (DMT) superfamily [TC #2.A.7]. The DMT superfamily consists of 14 currently recognized families derived from bacteria, archaea and eukaryotes (25). One of these families, the small multidrug resistance (SMR) family of drug exporters, consists of small proteins (105-115 aas) with only 4 TMSs per polypeptide chain. These proteins probably form oligomeric structures that function by drug:H+ antiport. A second family within the DMT superfamily is called the Bacterial/Archaeal Transporter (BAT) family. This family consists of proteins with 5 putative TMSs. In this case, an N-terminal TMS is apparently present that is lacking in the 4 TMS SMR-like proteins. BAT family proteins thus resemble SMR drug exporters but are longer due to an N-terminal extension (25). Finally, several remaining families within the DMT superfamily appear to exhibit a 10 TMS topology, having probably arisen from a 5 TMS precursor protein like those of the BAT family by an internal gene duplication event. As for the MIP family of channel proteins, the 10 TMS DMT superfamily proteins possess two halves that must exhibit opposite orientation in the membrane (see Table 1). This is the only currently documented example of such an occurrence in a family of secondary carriers. It will be interesting to determine if DMT superfamily permeases, with opposite membrane topological orientations for their two halves, will prove to function by a mechanism that is substantially different from those that are operative for other secondary carriers where the two halves of the proteins have the same membrane orientation.
PATHWAYS FOR COMPLEX PERMEASE EVOLUTION
We have recently proposed that integral membrane transport proteins evolved independently of soluble enzymes, regulatory proteins and ligand-binding receptors (27). Moreover, the pathway proposed for the appearance of secondary carriers and primary active transporters from simple channel proteins is simply:
channel proteins → secondary (2°) active transporters → primary (1°) active transporters
In the review cited above (27), examples of present-day transporters in which 2° active transporters have evidently been converted to 1° active transporters include members of the ArsAB family of ATP hydrolysis-driven active arsenite efflux pumps [TC #3.A.4] and those of the decarboxylation-driven efflux pumps [TC #4.A.1] (4). In these examples, a typical soluble enzyme, an ATPase or a decarboxylase, respectively, may have been superimposed on a secondary active transporter to give rise to a primary active transporter (27). In a third family of transporters (the TRAP-T family, [TC #2.A.56]), a receptor protein was superimposed on a secondary active transporter to yield a high affinity, high specificity carrier (28). The evidence suggests that some families of transporters may be more conducive than others to modification by superimposition of other protein activities upon them (28).
A POSSIBLE EVOLUTIONARY PATHWAY FOR THE APPEARANCE OF P-TYPE ATPASES
Another superfamily of primary active transporters is the P-type ATPase superfamily [TC #3.A.3]. Most members of this superfamily pump ions. The of mammals is an example of a P-type ATPase. Until recently, no member of this superfamily had been shown to function in a capacity other than in ATP hydrolysis-driven active transport (29, 30). However, recently, Ogawa et al. (31) identified an ORF from the fully sequenced genome of Methanococcus jannaschii that was proposed to resemble an ancestral protein of the P-type ATPases. The primary structure of this protein proved to resemble those of the core catalytic domains of P-type ATPases. However, the protein lacks sequences that constitute the transmembrane domains of P-type ATPases. Thus, the novel M. jannaschii protein, MJ0968, appears to resemble the cytoplasmic, energy-coupling, ATP-hydrolyzing domains of P-type ATPases. Ogawa et al. (31) overexpressed the MJ0968 gene in E. coli, purified the encoded protein and demonstrated its catalytic activity. MJ0968 proved to exhibit an ATPase activity that was strongly inhibited by vanadate. Moreover, the protein could be shown to be autophosphorylated by ATP. These properties are characteristic of P-type ATPases. The results thus support the contention that MJ0968 is in fact a soluble ATPase that resembles a putative evolutionary precursor of the energy-coupling domains of P-type ATPases (31). A polypeptide chain capable of catalyzing ATP hydrolysis may have joined with the integral membrane transporter component corresponding to those of present-day P-type ATPases to create ATP-driven active transporters from preexisting secondary active transporters. The identification of MJ0968 and its mode of action strengthens the postulate that ATP-driven 1° active transporters arose from 2° carriers by the superimposition of chemical energy coupling mechanisms onto carriers.
In the case of P-type ATPases, the two functional domains became fused into a single polypeptide chain. A similar scenario may have resulted in the coupling of substrate decarboxylation or ATP hydrolysis to ion transport without fusion of the two functional polypeptides as noted above for decarboxylases or ArsAB family members, respectively. Further studies are expected to provide documentation of other mechanistically related examples of primary active transporters that have arisen from simpler channel-type or carrier-type transporters.
TRANSPORT PROTEIN SIZE VARIATION OBSERVED FOR HOMOLOGUES IN DIFFERENT ORGANISMAL TYPES
Many families of transporters are ubiquitous, being found in bacteria, archaea and eukaryotes. In a recent study, we compared the sizes (in numbers of amino acyl residues) of homologues within various protein families derived from the three domains of life: bacteria, archaea and eukarya. We also examined the size differences observed for homologues derived from the three principal eukaryotic kingdoms: animals, plants and fungi (32). The surprising results will be briefly summarized here. With respect to integral membrane secondary carriers, the predominant class of transporters analyzed, it was found that while the eukaryotic homologues are consistently larger than their bacterial counterparts, the archaeal homologues are almost always smaller (32). Moreover, within the eukaryotic domain, plant homologues proved to be consistently smaller than the fungal and animal homologues. Proteins from the latter two kingdoms, however, are of similar
sizes. All of these observations appear to apply to integral membrane channel and primary active transport proteins, although the sample sizes were, in these cases, small, preventing definitive statistical analysis. By contrast, corresponding size differences were not observed for extracellular receptors of archaeal versus bacterial ABC-type uptake permeases. The archaeal receptors proved to be somewhat larger than those in bacteria. Moreover, no size difference was observed for cytoplasmic enzymes derived from the three domains of living organisms or from the three eukaryotic kingdoms. These results clearly imply that integral membrane proteins have been subject to evolutionary pressures that were not applicable to cytoplasmic or extracytoplasmic proteins. The mechanistic and physiological implications of these surprising observations have yet to be examined.
GENERAL CONCLUSIONS AND PERSPECTIVES
In previous publications, we suggested that families of transporters have arisen continuously over at least the past 4 billion years. Those that have arisen recently, i.e., since the three major domains of living organisms (bacteria, archaea and eukaryotes) separated, are in general found only in the domain in which that family arose. Many of the families that are restricted to only the bacterial, archaeal or eukaryotic domain undoubtedly arose within that domain, and little or no horizontal transmission of relevant genetic material caused the family to become dispersed throughout the living world (1, 3). However, it should be kept in mind that only in a relatively few cases have independent origins been proven for different permease families, and two families, defined on the basis of primary sequence similarity, or the lack thereof, may share a common origin but have diverged from each other beyond recognition.
Such extensive sequence divergence could have occurred either because of early divergence of the host organisms or because of functional differentiation of the proteins. Either possibility could have prevented exchange of genetic material between the two gene pools. We have seen that many permease proteins arose by tandem intragenic duplication events. In many cases, additional extragenic duplications have given rise to functional differentiation of constituents within the permease complexes. In other cases, independent origins of the constituents of modularly constructed permeases have been proposed and documented. It is clear that different families have evolved at different times in evolutionary history, from different sources, following different routes. A common topological unit of 6 TMSs is found in many families of permeases, and it has been possible to show that the 6 TMS unit evolved independently in several distinct families. Variations in this 6 TMS unit (or its duplicated 12 TMS unit) have resulted from the documented gain or loss of TMSs by amino acyl substitutions that increased or decreased the hydrophobicities of portions of a polypeptide chain (see, for example, the MFS and CHR families illustrated in Table 1). In other cases, it seems more likely that proteinaceous elements were incorporated into or deleted from a pre-existing 6 TMS unit, giving rise to the observed topological variability (see, for example, the APC, DMT and VIC families illustrated in Table 1). Such events could have occurred at the gene level by DNA splicing, fusion and/or deletion. The fact that so many permeases of distinct origin exhibit units of 6 TMSs, and that so many primary and secondary carriers include two such units in either one or at most two polypeptide chains, is still without explanation. However, the 6 (or 12) TMS topology may have arisen frequently because of the
ease with which such a topology can be generated from primordial units of either 2 or 3 TMSs. It seems that channel formation occurs most frequently by construction of the transmembrane hydrophilic pore from multiple polypeptide chains that may reversibly come together and separate. However, maintenance of carrier function probably requires that the complete functional unit be present within just one, or at most two, polypeptide chains, so that a well-defined transport pathway is formed. An architecturally constrained solute transport pathway may be a prerequisite for stereospecific solute recognition and carrier-mediated transport. Motif and sequence matrix analyses of permease families led to the tentative identification of superfamilies, the members of which may exhibit very little primary sequence similarity. Of great potential significance is the fact that any such well-conserved residues and sequence motifs are likely to be of structural and/or functional significance. The identification of these residues and motifs is likely to be the first step toward providing detailed three-dimensional structures and mechanistic schemes for these permeases. Finally, phylogenetic analyses, based on primary protein structure, provide a reliable guide to macromolecular structure, function and mechanism, with very few exceptions. Any laboratory involved in structure/function studies must therefore be cognizant of the benefits and limitations of the phylogenetic approach. A knowledge of family members allows extrapolation of data from many proteins to all members of the family, and detailed information derived from just one family member will be applicable to all family members to degrees that in general will be directly related to their degrees of sequence similarity. Such extrapolations are not justified for nonhomologous proteins.
It seems certain that when genome sequencing increases the volume of molecular data available for analysis by several orders of magnitude, the experimental biologist will become increasingly dependent on molecular archaeology to provide a guide through the wilderness (1, 3, 4).
ACKNOWLEDGMENTS
I wish to thank Jackie Richardson for her assistance in the preparation of this manuscript. Work in the author’s laboratory was supported by NIH grants 2R01 AI14176 from The National Institute of Allergy and Infectious Diseases and 9R01 GM55434 from the National Institute of General Medical Sciences, as well as by the M.H. Saier, Sr. Research Fund.
REFERENCES
1 Saier, M.H., Jr. (1999) Curr. Opin. Microbiol. 2, 555-561.
2 Paulsen, I.T., Sliwinski, M.K. and Saier, M.H., Jr. (1998) J. Mol. Biol. 277, 573-592.
3 Saier, M.H., Jr. (1998) in Advances in Microbial Physiology (R.K. Poole, ed.), pp. 81-136. Academic Press, San Diego, CA.
4 Saier, M.H., Jr. (2000) Microbiol. Mol. Biol. Rev. 64, 354-411.
5 Devereux, J., Haeberli, P. and Smithies, O. (1984) Nucl. Acids Res. 12, 387-395.
6 Dayhoff, M.O., Barker, W.C. and Hunt, L.T. (1983) Meth. Enzymol. 91, 524-545.
7 Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Nucl. Acids Res. 25, 3389-3402.
8 Bailey, T.L. and Gribskov, M. (1998) Bioinformatics 14, 48-54.
9 Bailey, T.L. and Gribskov, M. (1998) J. Comp. Biol. 5, 211-221.
10 Saier, M.H., Jr. (1994) Microbiol. Rev. 58, 71-93.
11 Saier, M.H., Jr. (1996) Microb. Comp. Genom. 1, 129-150.
12 Kuan, J. and Saier, M.H., Jr. (1993) Crit. Rev. Biochem. Mol. Biol. 28, 209-233.
13 Park, J.H. and Saier, M.H., Jr. (1996) J. Memb. Biol. 153, 171-180.
14 Pao, S.S., Paulsen, I.T. and Saier, M.H., Jr. (1998) Microbiol. Mol. Biol. Rev. 62, 1-32.
15 Saier, M.H., Jr., Beatty, J.T., Goffeau, A., Harley, K.T., Heijne, W.H.M., Huang, S.-C., Jack, D.L., Jahn, P.S., Lew, K., Liu, J., Pao, S.S., Paulsen, I.T., Tseng, T.-T. and Virk, P.S. (1999) J. Mol. Microbiol. Biotechnol. 1, 257-279.
16 Saier, M.H., Jr. and Tseng, T.-T. (1999) in Transport of Molecules across Microbial Membranes, Symposium 58, Society for General Microbiology (Broome-Smith, J.K., Baumberg, S., Stirling, C.J. and Ward, F.B., eds.), pp. 252-274. Cambridge University Press, Cambridge, UK.
17 Nelson, R.D., Kuan, G., Saier, M.H., Jr. and Montal, M. (1999) J. Mol. Microbiol. Biotechnol. 1, 281-287.
18 Tseng, T.-T., Gratwick, K.S., Kollman, J., Park, D., Nies, D.H., Goffeau, A. and Saier, M.H., Jr. (1999) J. Mol. Microbiol. Biotechnol. 1, 107-125.
19 Paulsen, I.T., Beness, A.M. and Saier, M.H., Jr. (1997) Microbiology 143, 2685-2699.
20 Vrljic, M., Garg, J., Bellmann, A., Wachi, S., Freudl, R., Malecki, M.J., Sahm, H., Kozina, V.J., Eggeling, L. and Saier, M.H., Jr. (1999) J. Mol. Microbiol. Biotechnol. 1, 327-336.
21 Paulsen, I.T., Nguyen, L., Sliwinski, M.K., Rabus, R. and Saier, M.H., Jr. (2000) J. Mol. Biol. (in press).
22 Saier, M.H., Jr., Goldman, S.R., Maile, R.R., Moreno, M.S., Weyler, W., Yang, N. and Paulsen, I.T. (2001) in Bacillus subtilis and other Gram+ Bacteria (A.L. Sonenshein, ed.), ASM Press, Washington, DC. (in press).
23 Nies, D.H., Koch, S., Wachi, S., Peitzsch, N. and Saier, M.H., Jr. (1998) J. Bacteriol. 180, 5799-5802.
24 Persson, B.L., Berhe, A., Fristedt, U., Martinez, P., Pattison, J., Petersson, J. and Weinander, R. (1998) Biochim. Biophys. Acta 1365, 23-30.
25 Jack, D.L., Yang, N. and Saier, M.H., Jr. (2001) Eur. J. Biochem. (in press).
26 Paulsen, I.T., Skurray, R.A., Tam, R., Saier, M.H., Jr., Turner, R.J., Weiner, J.H., Goldberg, E.B. and Grinius, L.L. (1996) Mol. Microbiol. 19, 1167-1175.
27 Saier, M.H., Jr. (2000) J. Bacteriol. 182, 5029-5035.
28 Rabus, R., Jack, D.L., Kelly, D.J. and Saier, M.H., Jr. (1999) Microbiology 145, 3431-3445.
29 Fagan, M.J. and Saier, M.H., Jr. (1994) J. Molec. Evol. 38, 57-99.
30 Palmgren, M.G. and Axelsen, K.B. (1998) Biochim. Biophys. Acta 1365, 37-45.
31 Ogawa, H., Haga, T. and Toyoshima, C. (2000) FEBS Lett. 471, 99-102.
32 Chung, Y.J., Krueger, C., Metzger, D. and Saier, M.H., Jr. (2001) J. Bacteriol. (in press).
MECHANISMS OF APOPTOSIS REPRESSION
Collin C. Q. Vu and John A. Cidlowski Laboratory of Signal Transduction National Institute of Environmental Health Sciences
National Institutes of Health 111 T. W. Alexander Drive Research Triangle Park, NC 27709
INTRODUCTION
In both vertebrates and invertebrates, tissue homeostasis is preserved by a continuous cycle of cell proliferation and death. The molecular and biochemical mechanisms that propel this cycle drive such a massive flux of cells through our bodies that, in a given year, each of us could produce and concurrently eradicate a mass of cells nearly equal to our entire body weight (1). To achieve this coupling, cells are endowed with an elaborate arsenal of survival and proliferative factors that are appropriately counterbalanced by death effector molecules. Under most physiological circumstances, cells die by an intrinsic program known as apoptosis. This form of cell death shows remarkable conservation during evolution as it occurs throughout the animal kingdom. Apoptosis is executed in response to both environmental and developmental signals, specifically during embryogenesis, maintenance of the immune repertoire, and the selective elimination of virally infected or malignant cells, all of which culminate in a systematic dismantling of cellular components. Interestingly, the observation that many death effector molecules are constitutively expressed in living cells has led to the idea that apoptosis is a default program restrained
only by survival or anti-death factors (2, 3). One example of maintaining cell survival by apoptosis repression is provided by the nematode Caenorhabditis elegans, in which the anti-apoptotic ced-9 gene is essential to suppress the killing of cells that are required for normal development (4). Like defects in the death effector mechanisms, defects in survival pathways inevitably disrupt the life and death balance, and consequently promote multiple pathologies ranging from carcinogenesis to neurodegeneration, depending on whether the defects produce excessive or insufficient cell survival. This understanding has led to the discovery of anti-apoptotic mechanisms that functionally repress apoptosis in living cells. In this review, we highlight some of the cellular molecules that are the key elements of such anti-apoptotic mechanisms, including the structural and functional bases by which they contribute to apoptosis repression.
THE BCL-2 FAMILY PROTEINS
Structural and Functional Diversity
The genetics of apoptosis in vertebrates first emerged with the unexpected finding that bcl-2, a proto-oncogene constitutively expressed as a result of chromosomal translocation in human B-cell lymphomas (5), facilitates the survival of cytokine-dependent hematopoietic cells (6). In contrast to many proto-oncogenes, BCL-2 facilitates neoplastic B-cell expansion by anti-apoptotic rather than proliferative mechanisms (7, 8), thus introducing the novel concept that cell survival and growth are under separate genetic control. Overexpression of BCL-2 in various cell lines confers extended survival and resistance to diverse apoptotic stimuli, including cytotoxic drugs, oxidative stress, irradiation, glucocorticoids, and activation of some but not all death receptors (9-11). Conversely, bcl-2 knockout mice initially develop a normal hematopoietic system but subsequently exhibit massive apoptosis of lymphocytes with loss of the thymus and spleen (12).
Since the discovery of BCL-2, a growing number of other mammalian, avian, and viral proteins sharing amino acid sequence homology have emerged and constitute the BCL-2 family (Figure 1). Functional characterization of individual members surprisingly reveals the existence of not only anti-apoptotic proteins but also pro-apoptotic effectors. Structurally, all members known to date possess at least one of four conserved motifs known as BCL-2 homology (BH) domains. Among these, the BH3 domain is necessary and in some cases sufficient for pro-apoptotic function (13). This idea is supported by the recently identified subset of "BH3-only" pro-apoptotic members such as BID, BIK, BAD, HRK and BIM (Figure 1). The BH domains are thought to be essential for the inherent propensity of BCL-2 family proteins to form dimers. Indeed, certain pro- and anti-apoptotic members can heterodimerize and apparently antagonize each other's function, as shown for BAX and BCL-2 (14, 15) or BAK and BCL-XL (16, 17). This mutual neutralization suggests that the relative ratios of these counteractive members may dictate the ultimate sensitivity of cells to apoptotic stimuli (14, 18). In addition to BH domains, most BCL-2 family proteins (except the BH3-only members) possess a C-terminal hydrophobic motif, which appears to constitute a transmembrane domain important for post-translational membrane targeting and insertion (19, 20).
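The idea that the relative ratio of opposing family members sets a cell's sensitivity can be illustrated with a toy calculation. The following is a minimal sketch and not a model taken from the cited studies: the function name, the arbitrary expression units, and the use of a simple fraction as a "death propensity" score are all assumptions made purely for illustration.

```python
def death_propensity(pro_apoptotic: float, anti_apoptotic: float) -> float:
    """Toy score in [0, 1]: the pro-apoptotic share of total expression.

    Hypothetical illustration only; real cells integrate many more inputs.
    """
    if pro_apoptotic < 0 or anti_apoptotic < 0:
        raise ValueError("expression levels must be non-negative")
    total = pro_apoptotic + anti_apoptotic
    if total == 0:
        raise ValueError("at least one expression level must be positive")
    return pro_apoptotic / total


# Hypothetical relative expression levels (arbitrary units).
balanced = death_propensity(pro_apoptotic=1.0, anti_apoptotic=1.0)
bcl2_high = death_propensity(pro_apoptotic=1.0, anti_apoptotic=4.0)
bax_high = death_propensity(pro_apoptotic=4.0, anti_apoptotic=1.0)

for label, score in [("balanced", balanced),
                     ("BCL-2 high", bcl2_high),
                     ("BAX high", bax_high)]:
    print(f"{label}: {score:.2f}")
```

In this sketch, raising the anti-apoptotic term lowers the score and raising the pro-apoptotic term increases it, which is all the ratio hypothesis asserts; the numerical scale itself carries no biological meaning.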
MECHANISMS OF APOPTOSIS REPRESSION
Tissue and Cellular Distribution

BCL-2 family proteins are widely expressed in many tissues, although the pattern of expression can differ vastly among the members. Immunohistochemical analyses reveal the presence of the BCL-2 protein in lymphoid and non-lymphoid tissues such as the breast, thyroid, prostate, pancreas, brain and intestine, but not in liver, lung, heart, kidney, cervix, myometrium, testis, ovary, or bladder (21). Similarly, the BCL-X and MCL-1 proteins are expressed in epithelial cells of the prostate, breast, endometrium, epidermis, stomach, intestine, colon, respiratory tract and reproductive tissues (22, 23). However, the expression gradient of BCL-X or MCL-1 in the epidermis often opposes that of BCL-2, suggesting that these members may regulate apoptosis at distinct stages of cell differentiation. Also contrasting with BCL-2, little or no MCL-1 protein is present in neurons of the brain and spinal cord, but strong MCL-1 immunostaining is found in
chondrocytes, hepatocytes, as well as cardiac and skeletal muscles that contain little or no BCL-2 (23). The protein expression profile of the pro-apoptotic member BAX often overlaps with that of BCL-2 in several epithelia, including the small intestine, colon, breast, skin, respiratory tract, and prostate (24). In the prostate, however, strong BAX immunostaining is seen in androgen-dependent secretory epithelial cells, whereas BCL-2 is limited to androgen-independent basal cells. BAX protein expression is also detectable in neurons, including the Purkinje cells, large neurons of the cortex and brain stem, and neurons of sympathetic ganglia (24). Interestingly, BAX protein is found in the liver, pancreas and kidney where BCL-2 appears undetectable. Such differential expression of BCL-2 family proteins suggests the presence of tissue-specific mechanisms for controlling their expression and thereby the survival or apoptotic functions. Within a particular cell, the subcellular localization differs greatly among the pro- and anti-apoptotic members. BCL-2 and its anti-apoptotic relatives are generally membrane-integral proteins found in membranes of the mitochondria, endoplasmic reticulum (ER) and nuclear envelope (21, 25, 26). However, the pro-apoptotic counterparts often localize to cytosol or cytoskeleton under quiescent conditions, but can target and integrate into membranes and particularly the mitochondrial outer membrane during apoptosis (27-29).

Potential Mechanisms of Action

The precise mechanisms underlying the pro- and anti-apoptotic actions of BCL-2 family proteins are only beginning to be elucidated, but recent genetic and biochemical studies suggest the existence of multiple mechanisms that may function in concert (Figure 2).
The mitochondria have been proposed as key elements that mediate many apoptotic pathways in part by the release of apoptogenic factors such as cytochrome c and apoptosis-inducing factor (30, 31); thus it is not surprising that these organelles are also critical targets of the BCL-2 family proteins. The release of mitochondrial cytochrome c triggers the activation of caspases (cysteine aspartate-specific proteases) in a cascade of signal transduction, first by binding to apoptotic protease-activating factor-1 (Apaf-1) in the presence of dATP or ATP, and subsequently recruiting the initiator pro-caspase-9 to activate downstream effector caspases (30-32). Both BCL-2 and BCL-XL can prevent, whereas the pro-apoptotic BAX can activate, this caspase pathway by modulating the release of mitochondrial cytochrome c (33-36). Even downstream from the release of cytochrome c, BCL-XL and BOO/DIVA have the ability to interact with Apaf-1 (37-39). This interaction presumably sequesters Apaf-1 and prevents it from activating the caspase cascade, in a manner analogous to that used in C. elegans by CED-9 (BCL-2 homologue) to regulate CED-4 (Apaf-1 homologue) (40-42). Exactly how cytochrome c is released during apoptosis is unclear, but one theory suggests the participation of the so-called mitochondrial permeability transition pore (PTP), a multimeric complex composed of voltage-dependent anion channel (VDAC), adenine nucleotide translocator (ANT), cyclophilin D, the peripheral benzodiazepine receptor, and possibly other proteins (31, 43). BCL-XL inhibits this pore opening in part by interacting with VDAC (as shown with reconstituted liposomes), whereas BAX or BAK in isolated mitochondria can open the pore through interaction with VDAC and presumably ANT to induce cytochrome c
release (44, 45). In contrast, the BH3-only proteins BID and BIK do not appear to affect VDAC activity directly, at least in synthetic liposomes (46), but may act through indirect effects on other pro-apoptotic members or through mechanisms independent of the PTP. The former possibility is supported by the demonstration that BID cooperatively induces a conformational change of BAX and BAK, and that BH3 mutations that hinder this interaction can cripple BID's pro-apoptotic activity (47).
Interestingly, both anti- and pro-apoptotic members of the BCL-2 family such as BCL-2, BCL-XL and BAX can form ion channels in lipid bilayers in vitro (48-50), and those formed by BAX and BCL-2 have distinct ion selectivity (51). While the mechanism by which this ability is linked to the control of apoptosis is currently being defined, certain BCL-2 family proteins also display activities in non-mitochondrial membranes. BCL-2 has been shown to modulate Ca2+ fluxes across the endoplasmic reticulum to control intracellular Ca2+ homeostasis that in turn may affect mitochondrial function (52-54). Additional evidence suggests that BCL-2 may regulate protein trafficking across the nuclear envelope (55, 56). Some BCL-2 family proteins also have been shown to modulate cellular redox status (57, 58), and influence the activities of certain transcription factors, including p53 and nuclear factor-κB (NF-κB) (59-61). Although BCL-2 can protect cells against diverse death stimuli and particularly those involved in stress pathways, this protective effect does not prevail in all apoptotic paradigms. Several reports have identified a BCL-2-insensitive mechanism for receptor-mediated cell death in lymphocytes (62-64). Similarly, the deletion of autoreactive thymocytes during negative selection appears unaffected by the overexpression of BCL-2 in transgenic mice (65). Our laboratory also showed that S49 murine lymphoma cells overexpressing BCL-2 are resistant to glucocorticoid-induced apoptosis but retain the potential to undergo apoptosis in response to the calcium ionophore A23187, cycloheximide and serum depletion (66, 67). These studies indicate that cell-type and signal-specific mechanisms can govern the protective effect of BCL-2. It is also conceivable that certain death stimuli may bypass the BCL-2-inhibitable step and target a downstream effector pathway.
Alternatively, where BCL-2 does not appear to play a major protective role, the effects of other anti-apoptotic members of the BCL-2 family may become dominant. Nevertheless, it is likely that mechanisms by which the BCL-2 family proteins regulate apoptosis are as diverse and complex as their structural heterogeneity.

Regulation of BCL-2 Family Proteins

As previously mentioned, members of the BCL-2 family have an inherent propensity to dimerize and this interaction may represent one way to regulate their activities negatively or positively. An example is the interaction between BAK and BCL-XL, which apparently neutralizes BAK's pro-apoptotic function (16, 17). X-ray and NMR analyses reveal that such interaction is accomplished by the BH1, BH2 and BH3 domains of BCL-XL forming an elongated hydrophobic cavity to which the BH3 domain of BAK can bind (68, 69). In addition to heterodimerization, certain members can function independently of each other. One example is BID, which is cleaved by caspase-8 and possibly other caspases during apoptosis to produce a truncated C-terminal fragment that subsequently translocates to the mitochondria and induces cytochrome c release (70, 71). This observation has led to the idea that the N-terminal domain of BID, BIM, or other pro-apoptotic members may function as an inhibitory domain, the removal of which induces a conformational change to expose the BH3 domain for death-inducing activity (11, 72). An additional level of regulating the BCL-2 family proteins is phosphorylation. Upon a survival stimulus, BAD is phosphorylated at three distinct serine sites
by Akt/PKB/RAC (a serine/threonine kinase downstream of phosphatidylinositol 3-kinase), mitogen-activated protein (MAP) kinase-activated protein kinase-1 (MAPKAP-K1, also called RSK), and cAMP-dependent protein kinase (PKA) (73-75). This phosphorylation facilitates the sequestration of BAD by a 14-3-3 protein and results in its inactivation and cytosolic confinement. Conversely, BAD is dephosphorylated by calcineurin upon a death signal and associates with and thereby neutralizes BCL-XL or BCL-2 via the exposed BH3 domain (76). In the case of BCL-2, in vivo phosphorylation also has been observed and this modification may either enhance or attenuate activity, depending on the particular phosphorylation site (77-80; Huang and Cidlowski, unpublished data). Aside from these post-translational mechanisms, emerging evidence indicates that both pro- and anti-apoptotic BCL-2 members can be transcriptionally regulated. For instance, the tumor suppressor p53 reportedly can function as a direct transcriptional activator of the bax gene (81). In hematopoietic progenitor cells, growth factor withdrawal selectively induces rapid expression of the pro-apoptotic member HRK while downregulating the expression of BCL-XL. Recent studies also demonstrated that the anti-apoptotic BCL-XL and A1/BFL-1 are direct transcriptional targets of NF-κB (83, 84). Similarly, the expression of BCL-2 can be induced by neurotrophins in part through a mechanism involving the CREB family of transcription factors (85). These studies illustrate some of the potential mechanisms by which BCL-2 family proteins are regulated. However, it is likely that many regulatory pathways exist and may work together to provide a functional balance between pro- and anti-apoptotic members.

THE INHIBITOR OF APOPTOSIS PROTEINS (IAPs)
Like the BCL-2 family, a growing number of IAP proteins are continually being identified in a host of organisms ranging from yeast and C. elegans to viruses and mammals (Figure 3) (86, 87). These proteins were first discovered in baculoviruses by Miller and colleagues for their ability to suppress the cell death response of the host (88). Overexpression of human X-IAP, c-IAP1, c-IAP2, NAIP, or Survivin has been shown to suppress apoptosis induced by apoptotic stimuli such as Fas, BAX, etoposide and staurosporine (89-93). Although the physiological functions of IAPs are largely unknown, their vital role in apoptosis suppression predicts that genetic defects of certain members are likely to have carcinogenic consequences. Indeed, overexpression of Survivin mRNA and protein has been detected in many common human cancers, including those of the lung, breast, pancreas, colon, intestine and prostate (91, 92, 94). Additionally, this expression correlates with unfavorable prognostic outcome and a more aggressive form of the disease (91, 95, 96). The prevalent association between Survivin and cancer is partly attributable to cell cycle-dependent expression of the Survivin gene, particularly at G2/M, wherein more than a 40-fold increase has been detected in HeLa cells (97). Exactly how Survivin and possibly other IAPs mechanistically contribute to the etiology of human cancers awaits further study, but like the BCL-2 family proteins, overexpression of IAPs may conceivably impart an anti-apoptotic advantage for tumor progression.
Structure and Function

Structurally, the IAPs belong to a family of proteins containing a unique motif termed the baculoviral IAP repeat (BIR) of approximately 70 amino acids, and up to three such motifs can occur in tandem within a given member (Figure 3). While the anti-apoptotic function of some IAPs has been defined, there are a number of evolutionarily conserved BIR-containing proteins whose anti-apoptotic roles remain undefined. The existence of such proteins in Saccharomyces cerevisiae and Schizosaccharomyces pombe (98), yeast species that do not appear to possess intrinsic apoptotic programs (99), suggests a possible involvement for BIR-containing proteins in other biological processes. Indeed, overexpression of the nematode BIR-1 does not affect cell death in vivo, but genetic deletion of this protein causes a marked cytokinesis defect in embryos (100). For the IAPs with known anti-apoptotic properties, structural and functional studies have demonstrated a requirement for at least one BIR domain for apoptosis suppression (101-103). As described below, several IAPs can interact with caspases through regions involving BIRs, and thus it is thought that BIRs are caspase-interacting regions. Some members also harbor a C-terminal RING finger zinc-binding domain, which in the case of X-IAP and c-IAP1 can exhibit ubiquitin ligase activity for autoubiquitination but appears dispensable for anti-apoptotic activity in some cellular contexts (104). Among the IAPs, c-IAP1 and c-IAP2 additionally possess a caspase recruitment domain (CARD) that is conserved in several apoptosis regulatory proteins, including pro-caspase-9, Apaf-1, RAIDD, RIP2 and ARC. This similarity suggests a possible CARD-CARD interaction between these proteins and the IAPs, although the physiological role of this domain for the anti-apoptotic function of IAPs is unclear.
Tissue Distribution

Northern blot analyses of RNAs demonstrated the ubiquitous expression of X-IAP in all human adult and fetal tissues examined except peripheral blood leukocytes (90). Likewise, c-IAP1 and c-IAP2 mRNAs are also widely distributed and the most abundant expression is often found in the kidney, liver, small intestine, lung, and to a lower extent the central nervous system (105). Although their expression overlaps considerably in most tissues, the c-IAP2 message is generally higher than that of c-IAP1 by as much as 4-fold in some tissues, particularly the kidney, liver, appendix and lung (105). In contrast to the widespread distribution of X-IAP, c-IAP1 and c-IAP2, no transcripts of Survivin are detectable by Northern analysis in most adult tissues, except in thymus and less prominently in placenta (91). However, substantial expression of the Survivin message and protein is seen in fetal liver and kidney, with lower levels in lung and brain (95). Immunostaining studies further showed that Survivin is expressed in a variety of transformed cells of lymphoid and myeloid lineages but not in untransformed human lung fibroblasts or endothelial cells (91). The expression of NAIP mRNA also appears to be highly tissue-specific, with detectable levels only in adult liver and placenta as judged by Northern analysis, although additional RT-PCR studies have detected its expression in brain (106). Since the majority of these studies were directed at the mRNA levels of the IAPs, additional immunohistochemical analyses are needed to confirm whether such levels are mirrored by comparable protein expression.
Potential Mechanisms of Action

Recent biochemical studies have revealed that certain caspases are direct targets of IAPs. Caspases commonly exist in living cells as inactive zymogens until a death signal is conveyed, whereby an initiator caspase is activated by proteolytic cleavage. This activation is then transmitted downstream to effector caspases by sequential proteolytic processing. One pathway that leads to caspase activation, as described earlier, is induced by the release of mitochondrial cytochrome c and its subsequent association with Apaf-1 and pro-caspase-9 (32). In this oligomeric complex termed the apoptosome, pro-caspase-9 becomes activated (presumably by oligomerization) and subsequently initiates a downstream effector caspase cascade involving caspase-3, -6 and -7. The human X-IAP, c-IAP1 and c-IAP2 have been shown to target this caspase pathway at the apex by direct binding to both the unprocessed and processed forms of caspase-9, thereby preventing both the processing and activity of this initiator caspase (Figure 4) (103). Additional targets of the IAPs in this pathway are the effector caspase-3 and -7, but only after these caspases have been proteolytically cleaved (92, 101, 103, 107). These findings are consistent with the known ability of human IAPs to block cell killing induced by BAX and other pro-apoptotic BCL-2 family proteins at a point downstream from cytochrome c release (92, 101, 103, 108, 109). In addition to the cytochrome c/Apaf-1 pathway, an alternate mode of activating caspases is induced upon the engagement of cell surface death receptors (e.g., Fas/CD95/Apo-1) with their cognate ligands (110, 111). This interaction triggers the recruitment of the adaptor molecule FADD (Fas-associated death domain) and sequentially pro-caspase-8 via homophilic death domain and death effector domain (DED) interactions, respectively, to form the so-called death-inducing signaling complex (DISC).
Activated caspase-8 then initiates a signal transduction pathway by activating downstream effector caspases (110, 111). This pathway is also sensitive to the inhibitory effects of the IAPs; however, existing data suggest that the human IAPs do not directly target the initiator caspase-8 but only the downstream effector caspase-3 and -7 (92, 101, 103, 107). Like caspase-8, several other caspases including caspase-1, -6 and -10 also appear insensitive to the IAPs tested thus far, indicating that IAPs are caspase-selective, but the mechanisms governing this selectivity are unknown.

Mechanisms that Regulate IAPs

Our current understanding of how the IAP family proteins are regulated is still limited, although recent advances suggest that these proteins are controlled at multiple levels. One example is auto-ubiquitination that arises from intrinsic ubiquitin protein ligase activity within the RING domain of some IAPs. This process has been observed for c-IAP1 and X-IAP during dexamethasone- or etoposide-induced apoptosis in rat thymocytes (104). Exactly how this self-destructive mechanism is modulated during apoptosis and whether IAPs in turn can functionally target other substrates for proteolysis remain to be investigated. More recently, a newly discovered protein termed Smac/DIABLO has been shown to physically interact with and inhibit X-IAP (112, 113). During UV-induced apoptosis, Smac/DIABLO is released from the mitochondria into the
cytosol where it binds to certain IAPs and neutralizes their inhibitory activities (112, 113). This process is thought to sequester IAPs away from caspases and thus prevent their inhibitory action on these proteases.
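The two caspase activation routes described in this section, and the points at which IAPs are reported to act, can be summarized as a small directed graph. The sketch below is a deliberately simplified Boolean illustration, not a quantitative model: the node names, the function `active_caspases`, and the all-or-nothing treatment of IAP inhibition and of Smac/DIABLO release are assumptions made for clarity.

```python
from collections import deque

# Activation edges taken from the two pathways described in the text.
ACTIVATES = {
    "cytochrome c/Apaf-1": ["caspase-9"],
    "Fas/FADD (DISC)": ["caspase-8"],
    "caspase-9": ["caspase-3", "caspase-6", "caspase-7"],
    "caspase-8": ["caspase-3", "caspase-7"],
}

# Caspases described in the text as direct targets of human IAPs.
IAP_TARGETS = {"caspase-9", "caspase-3", "caspase-7"}


def active_caspases(stimulus, iap_present, smac_released=False):
    """Return the set of caspases activated by a stimulus (Boolean sketch).

    IAPs block their target caspases unless Smac/DIABLO has been released,
    in which case the IAPs are treated as fully neutralized.
    """
    blocked = IAP_TARGETS if (iap_present and not smac_released) else set()
    active = set()
    queue = deque([stimulus])
    while queue:
        node = queue.popleft()
        for target in ACTIVATES.get(node, []):
            if target in blocked or target in active:
                continue  # inhibited by an IAP, or already counted
            active.add(target)
            queue.append(target)
    return active


mito_no_iap = active_caspases("cytochrome c/Apaf-1", iap_present=False)
mito_iap = active_caspases("cytochrome c/Apaf-1", iap_present=True)
fas_iap = active_caspases("Fas/FADD (DISC)", iap_present=True)
mito_iap_smac = active_caspases("cytochrome c/Apaf-1", iap_present=True,
                                smac_released=True)
```

In this sketch, IAPs silence the mitochondrial route entirely because caspase-9 is blocked at the apex, whereas in the death receptor route caspase-8 still becomes active and only the effector caspase-3 and -7 are held in check. Setting `smac_released=True` restores the full cascade.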
OTHER ENDOGENOUS INHIBITORS OF CASPASES

FLICE-inhibitory proteins (FLIPs)

A family of proteins termed FLIPs, capable of inhibiting receptor-mediated apoptosis (particularly that induced by Fas and TNFR1), was identified first in viruses (114-116) and more recently in mammals (117-123). The viral forms (v-FLIPs) are structurally characterized by two DEDs in tandem resembling those of pro-caspase-8, whereas some mammalian homologues (c-FLIPs) also exhibit a catalytically inert caspase-8-like domain in the C-terminus. Consistent with these structural features, it has been shown that the anti-apoptotic effects of FLIPs arise in part from their recruitment to the receptor-associated DISC via DED-DED interactions with pro-caspase-8 and/or FADD (116, 117, 123, 124). This interaction apparently hinders further recruitment of pro-caspase-8 and its proteolytic processing, thus preventing the activation of downstream effector caspases. The precise role of FLIPs in apoptosis under physiological conditions is unclear as most data obtained to date were derived from protein overexpression, and in some cases a pro-apoptotic function for c-FLIP has been suggested (118-120, 122). Nevertheless, c-FLIP-deficient mice exhibit defective embryogenesis and impaired heart development, a phenotype reminiscent of that reported for FADD and caspase-8 knockout mice (125). Embryonic fibroblasts derived from c-FLIP-deficient mice are highly sensitive to Fas- or TNF-induced apoptosis and display rapid induction of caspase activities, thus affirming an important anti-apoptotic function for c-FLIP in vivo (125).

CrmA

A cowpox viral product known as cytokine response modifier A (CrmA) was the first endogenous caspase inhibitor identified. Although CrmA bears no structural resemblance to IAPs, it is a potent selective inhibitor of caspase-1 and -8 with inhibitory constants in the picomolar range (126-128). Transgenic expression of CrmA has been shown to inhibit anti-Fas-induced apoptosis of lymphocytes (129).
This protein is structurally similar to serine protease inhibitors (serpins), but unlike conventional serpins, CrmA specifically targets cysteine proteases. The ability of CrmA to inhibit caspases is partly attributable to the LVAD motif, which serves as a pseudosubstrate region that is proteolytically cleaved by target caspases, possibly through a suicide substrate mechanism (130).

p35

Similar to CrmA, the baculoviral protein p35 can block host cells from undergoing apoptosis during viral infection. p35 is structurally distinct from IAPs and CrmA, but can effectively inhibit a wide spectrum of caspases. Some in vitro target caspases include caspase-1, -2, -3, -4, -6, -7, -8 and -10, as well as the C. elegans CED-3 (caspase-3 homologue) (131-134). The anti-apoptotic mechanism of p35 appears to involve irreversible competitive inhibition of target caspases, which recognize a reactive site loop and cleave p35. Subsequently, the cleaved form of p35 remains bound to the caspases in a stable stoichiometric complex and thus sequesters these proteases (134). Several
studies have shown that p35 can block apoptosis induced by Fas, glucocorticoids, DNA damage and nerve growth factor withdrawal (135). Thymocytes from transgenic mice expressing p35 are resistant to apoptosis and antigen-induced negative selection (136). In addition, targeted expression of p35 in oligodendrocytes can protect mice against autoimmune-mediated demyelination (137).

THE NF-κB TRANSCRIPTION FACTORS

NF-κB Regulation of Apoptosis

A group of transcription factors collectively termed Rel/NF-κB often regulates the expression of genes involved in immune and inflammatory responses. In recent years, it has become increasingly clear that these factors are also critical regulators of the cell death apparatus. The cellular expression of NF-κB is highly ubiquitous, as detectable levels have been found in virtually all cell types examined (138). Interestingly, a constitutively active and nuclear form of NF-κB can exist in certain cell types such as mature B cells, thymocytes, monocytes, macrophages, neurons, corneal keratinocytes and vascular smooth muscle cells (138). NF-κB is normally confined to the cytosol by physical interaction with inhibitory IκB proteins. Certain stimuli can trigger the phosphorylation of IκB proteins and subsequently target them for proteasome-mediated degradation. During this process, the NF-κB proteins are liberated and translocate into the nucleus for transcriptional activation of specific genes. This response can function positively to promote apoptosis or negatively as part of a death-suppressing pathway, depending on the cellular context (139). Studies with transgenic mice expressing a super-repressive form of IκBα in T cells suggest an essential role for NF-κB during T-cell receptor-mediated apoptosis of thymocytes, possibly through down-regulation of the anti-apoptotic bcl-xL gene (140). On the other hand, one of the first indications that NF-κB also mediates cell survival comes from the observation that RelA knockout mice exhibit massive hepatocyte apoptosis (141). Consistent with this finding, our laboratory has observed an anti-apoptotic role for NF-κB against glucocorticoid-mediated killing of hepatoma cells (142). It has been shown also that glucocorticoid treatment in vivo induces the expression of IκB proteins in thymocytes and down-regulates NF-κB DNA binding activity, and conversely NF-κB induction diminishes dexamethasone-induced death (143).
The anti-apoptotic effects of NF-κB also extend to other cellular models, including both B and T cells of the immune system and neurons of the nervous system in response to chemotherapeutic agents, ionizing radiation and amyloid β-peptide (139). These findings reinforce the idea that under certain cellular contexts, NF-κB is important for apoptosis repression.

Putative Target Genes of NF-κB for Apoptosis Repression

The major mechanism by which NF-κB suppresses apoptosis is transcriptional activation of specific anti-apoptotic genes. Several members of the bcl-2 family genes have been identified as direct targets of NF-κB. These include the genes encoding
BCL-XL, A1/BFL-1, and NR13, all of which are anti-apoptotic members and possess functional κB binding sites in the promoter regions (83, 84, 144, 145). The transcriptional activation of A1/BFL-1 has been shown to protect cells against TNF- and etoposide-induced apoptosis (83, 84, 145). NF-κB can also affect the transcriptional regulation of certain IAPs, whose induction can inhibit the activation of caspase-3, -7 and -9 (see above). c-IAP2 is induced in an NF-κB-dependent manner in response to TNF and in cells overexpressing protein kinase (146). An NF-κB site within the promoter region has been identified for c-IAP2, and the increased expression of c-IAP2 mRNA correlates with high nuclear NF-κB activity in smooth muscle cells (147). An additional known target gene of NF-κB is X-IAP, which can protect endothelial cells against apoptosis (148). In HT1080 fibrosarcoma cells, treatment with TNF-α induces the simultaneous expression of c-IAP1, c-IAP2 and certain TNF receptor-associated factors (TRAF1 and TRAF2). Subsequently, these molecules function cooperatively to suppress caspase-8 at the apical point of the caspase cascade during TNF signaling (149). Thus, NF-κB in this survival pathway provides an exquisite mechanism to suppress caspase activation, both at the initiator and effector levels. In addition to the IAPs, other putative anti-apoptotic proteins such as the A20 zinc finger protein (150), manganese superoxide dismutase (151) and IEX-1L (152) reportedly are also transcriptionally regulated by NF-κB. These findings illustrate a multitude of functional NF-κB targets involved in cell survival pathways, and suggest that the anti-apoptotic action of NF-κB can be manifested at multiple levels.

CELL VOLUME REGULATION AND ION MOVEMENT AS POSSIBLE MECHANISMS OF APOPTOTIC REPRESSION

One of the most striking morphological changes common to all apoptotic cells is the loss of volume.
Accumulating evidence indicates that the shrunken morphology is not simply a passive or secondary event of the death process, but rather an early regulatory event essential for apoptotic progression (153, 154). Studies with a variety of apoptotic models have shown that cell shrinkage occurs prior to ultrastructural and biochemical events, including DNA degradation, cytochrome c release and effector caspase activation (155). Our laboratory and others have demonstrated that in lymphoid cells (156) and neuroblastoma cells (157), physical cell shrinkage by hyperosmotic stress is sufficient to initiate an apoptotic program. This hypertonically induced apoptosis apparently occurs only in cells that lack regulatory volume increase (RVI) capability but not in RVI-responsive cells such as COS, HeLa, or L-cells (156), suggesting that an RVI response may act as a survival checkpoint that must be either inhibited or overridden for apoptotic progression. The mechanisms by which cells shrink during apoptosis are largely undefined, although this morphological change is not fully dependent on caspase activities in some death pathways (158; Vu and Cidlowski, unpublished data). The maintenance of cell volume is orchestrated mainly by ion transport, osmolyte accumulation, cytoskeletal organization and enzymatic activities (159-163). Among these, the efflux of monovalent ions has been demonstrated to be a critical and necessary element for the loss of cell volume during apoptosis (155, 160, 164-166). The primary ionic movement during cell shrinkage is likely dominated by K+ ions, which are the most abundant intracellular ions.
Accumulating evidence implicates the activities of certain K+ channel(s) as likely important regulatory components of apoptosis (155, 166-168), primarily because certain K+ channel blockers can effectively attenuate cell shrinkage during apoptosis (155, 164, 169, 170). A possible relationship between K+ efflux and apoptosis is substantiated by the observation that the K+ ionophore valinomycin can induce cell death (171-173). In contrast, inhibiting K+ efflux by a high concentration of extracellular K+ ions can completely abrogate Fas-mediated apoptosis in Jurkat cells (160). In cortical neurons, serum depletion- or staurosporine-induced apoptosis is associated with an early enhancement of outward K+ channel activity, resulting in net loss of intracellular K+ ions and cell shrinkage (166). Our laboratory has also shown that many apoptotic features such as nuclease activity, effector caspase activation, mitochondrial collapse and cell shrinkage are restricted to cells with reduced K+ levels (158, 160, 165). These findings raise the hypothesis (as depicted in Figure 5) that the maintenance of K+ homeostasis is a critical element for cell survival, possibly by preventing inappropriate activation of apoptotic enzymes. In support of this idea, our laboratory has shown that K+ ions at normal physiological levels can effectively prevent DNA fragmentation and effector caspase activation in rat thymocytes exposed to dexamethasone, thapsigargin and staurosporine (160, 165). Furthermore, the presence of volume regulatory mechanisms that increase intracellular ions can protect against apoptosis (156).
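The observation that a high extracellular ion concentration suppresses efflux can be put in quantitative terms with the Nernst equation, which gives the membrane potential at which an ion has no net tendency to move. The sketch below is an illustration only, assuming that the ion concerned is K+ (as the valinomycin experiments imply); the concentrations of 140 mM inside and 5 mM outside are typical textbook values chosen for the example, not measurements from the studies cited above, and the function name is hypothetical.

```python
import math

R = 8.314462618   # molar gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_mV(conc_out_mM, conc_in_mM, temp_C=37.0, z=1):
    """Equilibrium (Nernst) potential in millivolts for an ion of valence z."""
    if conc_out_mM <= 0 or conc_in_mM <= 0:
        raise ValueError("concentrations must be positive")
    temp_K = temp_C + 273.15
    return 1000.0 * (R * temp_K) / (z * F) * math.log(conc_out_mM / conc_in_mM)


# Assumed textbook-style K+ concentrations (mM); illustrative values only.
e_k_normal = nernst_mV(conc_out_mM=5.0, conc_in_mM=140.0)
e_k_high_out = nernst_mV(conc_out_mM=140.0, conc_in_mM=140.0)

print(f"E_K with 5 mM outside:   {e_k_normal:.1f} mV")
print(f"E_K with 140 mM outside: {e_k_high_out:.1f} mV")
```

With the assumed normal gradient the equilibrium potential is about -89 mV, so a cell resting at a less negative potential loses K+ whenever K+ channels open. When the extracellular concentration is raised to match the intracellular one, the equilibrium potential is zero and the concentration gradient that drives efflux disappears, which is consistent with the protective effect of high extracellular K+ described above.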
The mechanisms that drive the apoptotic K+ efflux and exactly how the ensuing loss of K+ ions activates the death machinery are presently unknown. In the latter case, it is possible that perturbation of the intracellular ionic environment may directly or indirectly affect enzymatic activities of proteases and nucleases, along with protein structure and function. Furthermore, the loss of cell volume may physically force a localized concentration of pro-apoptotic factors to initiate cell death and/or the release of pro-apoptotic factors during cytoskeletal perturbation, as well as altering gene expression and cellular metabolism. Although it is currently unclear whether the efflux of K+ ions and volume loss function independently or cooperatively to regulate apoptosis, existing evidence suggests that the proper maintenance of intracellular K+ ions and cell volume is likely to have important consequences for cell survival.

ACKNOWLEDGMENTS
We thank Drs. C. Bortner, M. Gomez-Angelats and B. Necela from the Laboratory of Signal Transduction at NIEHS for critical review of the manuscript.

REFERENCES
1 Reed, J. C. (1997) Vitam. Horm. 53, 99-138.
2 Jacobson, M. D., Weil, M. and Raff, M. C. (1997) Cell 88, 347-354.
3 O'Connor, R., Fennelly, C. and Krause, D. (2000) Biochem. Soc. Trans. 28, 47-51.
4 Hengartner, M. O. and Horvitz, H. R. (1994) Cell 76, 665-676.
5 Tsujimoto, Y., Cossman, J., Jaffe, E. and Croce, C. M. (1985) Science 228, 1440-1443.
6 Vaux, D. L., Cory, S. and Adams, J. M. (1988) Nature 335, 440-442.
7 McDonnell, T. J., Deane, N., Platt, F. M., Nunez, G., Jaeger, U., McKearn, J. P. and Korsmeyer, S. J. (1989) Cell 57, 79-88.
8 Katsumata, M., Siegel, R. M., Louie, D. C., Miyashita, T., Tsujimoto, Y., Nowell, P. C., Greene, M. I. and Reed, J. C. (1992) Proc. Nat. Acad. Sci. U.S.A. 89, 11376-11380.
9 Adams, J. M. and Cory, S. (1998) Science 281, 1322-1326.
10 Reed, J. C. (1998) Oncogene 17, 3225-3236.
11 Gross, A., McDonnell, J. M. and Korsmeyer, S. J. (1999) Genes Dev. 13, 1899-1911.
12 Veis, D. J., Sorenson, C. M., Shutter, J. R. and Korsmeyer, S. J. (1993) Cell 75, 229-240.
13 Kelekar, A. and Thompson, C. B. (1998) Trends Cell Biol. 8, 324-330.
14 Oltvai, Z. N., Milliman, C. L. and Korsmeyer, S. J. (1993) Cell 74, 609-619.
15 Hanada, M., Aime-Sempe, C., Sato, T. and Reed, J. C. (1995) J. Biol. Chem. 270, 11962-11969.
16 Simonian, P. L., Grillot, D. A. and Nunez, G. (1997) Oncogene 15, 1871-1875.
17 Holinger, E. P., Chittenden, T. and Lutz, R. J. (1999) J. Biol. Chem. 274, 13298-13304.
18 Yang, E., Zha, J., Jockel, J., Boise, L. H., Thompson, C. B. and Korsmeyer, S. J. (1995) Cell 80, 285-291.
MECHANISMS OF APOPTOSIS REPRESSION
19 Tanaka, S., Saito, K. and Reed, J. C. (1993) J. Biol. Chem. 268, 10920-10926.
20 Nguyen, M., Branton, P. E., Walton, P. A., Oltvai, Z. N., Korsmeyer, S. J. and Shore, G. C. (1994) J. Biol. Chem. 269, 16521-16524.
21 Hockenbery, D. M., Zutter, M., Hickey, W., Nahm, M. and Korsmeyer, S. J. (1991) Proc. Nat. Acad. Sci. U.S.A. 88, 6961-6965.
22 Krajewski, S., Krajewska, M., Shabaik, A., Wang, H. G., Irie, S., Fong, L. and Reed, J. C. (1994) Cancer Res. 54, 5501-5507.
23 Krajewski, S., Bodrug, S., Krajewska, M., Shabaik, A., Gascoyne, R., Berean, K. and Reed, J. C. (1995) Amer. J. Pathol. 146, 1309-1319.
24 Krajewski, S., Krajewska, M., Shabaik, A., Miyashita, T., Wang, H. G. and Reed, J. C. (1994) Amer. J. Pathol. 145, 1323-1336.
25 de Jong, D., Prins, F. A., Mason, D. Y., Reed, J. C., van Ommen, G. B. and Kluin, P. M. (1994) Cancer Res. 54, 256-260.
26 Krajewski, S., Tanaka, S., Takayama, S., Schibler, M. J., Fenton, W. and Reed, J. C. (1993) Cancer Res. 53, 4701-4714.
27 Hsu, Y. T., Wolter, K. G. and Youle, R. J. (1997) Proc. Nat. Acad. Sci. U.S.A. 94, 3668-3672.
28 Gross, A., Jockel, J., Wei, M. C. and Korsmeyer, S. J. (1998) EMBO J. 17, 3878-3885.
29 Puthalakath, H., Huang, D. C., O'Reilly, L. A., King, S. M. and Strasser, A. (1999) Mol. Cell 3, 287-296.
30 Green, D. R. and Reed, J. C. (1998) Science 281, 1309-1312.
31 Kroemer, G., Dallaporta, B. and Resche-Rigon, M. (1998) Annu. Rev. Physiol. 60, 619-642.
32 Budihardjo, I., Oliver, H., Lutter, M., Luo, X. and Wang, X. (1999) Annu. Rev. Cell. Dev. Biol. 15, 269-290.
33 Jurgensmeier, J. M., Xie, Z., Deveraux, Q., Ellerby, L., Bredesen, D. and Reed, J. C. (1998) Proc. Nat. Acad. Sci. U.S.A. 95, 4997-5002.
34 Kharbanda, S., Pandey, P., Schofield, L., Israels, S., Roncinske, R., Yoshida, K., Bharti, A., Yuan, Z. M., Saxena, S., Weichselbaum, R., Nalin, C. and Kufe, D. (1997) Proc. Nat. Acad. Sci. U.S.A. 94, 6939-6942.
35 Kluck, R. M., Bossy-Wetzel, E., Green, D. R. and Newmeyer, D. D. (1997) Science 275, 1132-1136.
36 Yang, J., Liu, X., Bhalla, K., Kim, C. N., Ibrado, A. M., Cai, J., Peng, T. I., Jones, D. P. and Wang, X. (1997) Science 275, 1129-1132.
37 Hu, Y., Benedict, M. A., Wu, D., Inohara, N. and Núñez, G. (1998) Proc. Nat. Acad. Sci. U.S.A. 95, 4386-4391.
38 Pan, G., O'Rourke, K. and Dixit, V. M. (1998) J. Biol. Chem. 273, 5841-5845.
39 Song, Q., Kuang, Y., Dixit, V. M. and Vincenz, C. (1999) EMBO J. 18, 167-178.
40 Chinnaiyan, A. M., O'Rourke, K., Lane, B. R. and Dixit, V. M. (1997) Science 275, 1122-1126.
41 Wu, D., Wallen, H. D. and Núñez, G. (1997) Science 275, 1126-1129.
42 Hengartner, M. O. (1997) Nature 388, 714-715.
43 Marzo, I., Brenner, C., Zamzami, N., Susin, S. A., Beutner, G., Brdiczka, D., Remy, R., Xie, Z. H., Reed, J. C. and Kroemer, G. (1998) J. Exp. Med. 187, 1261-1271.
44 Narita, M., Shimizu, S., Ito, T., Chittenden, T., Lutz, R. J., Matsuda, H. and Tsujimoto, Y. (1998) Proc. Nat. Acad. Sci. U.S.A. 95, 14681-14686.
45 Shimizu, S., Narita, M. and Tsujimoto, Y. (1999) Nature 399, 483-487.
46 Shimizu, S. and Tsujimoto, Y. (2000) Proc. Nat. Acad. Sci. U.S.A. 97, 577-582.
47 Desagher, S., Osen-Sand, A., Nichols, A., Eskes, R., Montessuit, S., Lauper, S., Maundrell, K., Antonsson, B. and Martinou, J. C. (1999) J. Cell Biol. 144, 891-901.
48 Minn, A. J., Velez, P., Schendel, S. L., Liang, H., Muchmore, S. W., Fesik, S. W., Fill, M. and Thompson, C. B. (1997) Nature 385, 353-357.
49 Schendel, S. L., Xie, Z., Montal, M. O., Matsuyama, S., Montal, M. and Reed, J. C. (1997) Proc. Nat. Acad. Sci. U.S.A. 94, 5113-5118.
50 Antonsson, B., Conti, F., Ciavatta, A., Montessuit, S., Lewis, S., Martinou, I., Bernasconi, L., Bernard, A., Mermod, J. J., Mazzei, G., Maundrell, K., Gambale, F., Sadoul, R. and Martinou, J. C. (1997) Science 277, 370-372.
51 Schlesinger, P. H., Gross, A., Yin, X. M., Yamamoto, K., Saito, M., Waksman, G. and Korsmeyer, S. J. (1997) Proc. Nat. Acad. Sci. U.S.A. 94, 11357-11362.
52 He, H., Lam, M., McCormick, T. S. and Distelhorst, C. W. (1997) J. Cell Biol. 138, 1219-1228.
53 Srivastava, R. K., Sollott, S. J., Khan, L., Hansford, R., Lakatta, E. G. and Longo, D. L. (1999) Mol. Cell. Biol. 19, 5659-5674.
54 Foyoua-Youssefi, R., Arnaudeau, S., Borner, C., Kelley, W. L., Tschopp, J., Lew, D. P., Demaurex, N. and Krause, K. H. (2000) Proc. Nat. Acad. Sci. U.S.A. 97, 5723-5728.
55 Beham, A., Marin, M. C., Fernandez, A., Herrmann, J., Brisbay, S., Tari, A. M., Lopez-Berestein, G., Lozano, G., Sarkiss, M. and McDonnell, T. J. (1997) Oncogene 15, 2767-2772.
56 Voehringer, D. W., McConkey, D. J., McDonnell, T. J., Brisbay, S. and Meyn, R. E. (1998) Proc. Nat. Acad. Sci. U.S.A. 95, 2956-2960.
57 Kane, D. J., Sarafian, T. A., Anton, R., Hahn, H., Gralla, E. B., Valentine, J. S., Ord, T. and Bredesen, D. E. (1993) Science 262, 1274-1277.
58 Hockenbery, D. M., Oltvai, Z. N., Yin, X. M., Milliman, C. L. and Korsmeyer, S. J. (1993) Cell 75, 241-251.
59 Shen, Y. and Shenk, T. (1994) Proc. Nat. Acad. Sci. U.S.A. 91, 8940-8944.
60 Grimm, S., Bauer, M. K., Baeuerle, P. A. and Schulze-Osthoff, K. (1996) J. Cell Biol. 134, 13-23.
61 de Moissac, D., Mustapha, S., Greenberg, A. H. and Kirshenbaum, L. A. (1998) J. Biol. Chem. 273, 23946-23951.
62 Debatin, K. M. and Krammer, P. H. (1995) Leukemia 9, 815-820.
63 Maraskovsky, E., O'Reilly, L. A., Teepe, M., Corcoran, L. M., Peschon, J. J. and Strasser, A. (1997) Cell 89, 1011-1019.
64 Soilu-Hanninen, M., Ekert, P., Bucci, T., Syroid, D., Bartlett, P. F. and Kilpatrick, T. J. (1999) J. Neurosci. 19, 4828-4838.
65 Sentman, C. L., Shutter, J. R., Hockenbery, D., Kanagawa, O. and Korsmeyer, S. J. (1991) Cell 67, 879-888.
66 Caron-Leslie, L. A., Evans, R. B. and Cidlowski, J. A. (1994) FASEB J. 8, 639-645.
67 Huang, S. T. and Cidlowski, J. A. (1999) FASEB J. 13, 467-476.
68 Muchmore, S. W., Sattler, M., Liang, H., Meadows, R. P., Harlan, J. E., Yoon, H. S., Nettesheim, D., Chang, B. S., Thompson, C. B., Wong, S. L., Ng, S. L. and Fesik, S. W. (1996) Nature 381, 335-341.
69 Sattler, M., Liang, H., Nettesheim, D., Meadows, R. P., Harlan, J. E., Eberstadt, M., Yoon, H. S., Shuker, S. B., Chang, B. S., Minn, A. J., Thompson, C. B. and Fesik, S. W. (1997) Science 275, 983-986.
70 Li, H., Zhu, H., Xu, C. J. and Yuan, J. (1998) Cell 94, 491-501.
71 Luo, X., Budihardjo, I., Zou, H., Slaughter, C. and Wang, X. (1998) Cell 94, 481-490.
72 McDonnell, J. M., Fushman, D., Milliman, C. L., Korsmeyer, S. J. and Cowburn, D. (1999) Cell 96, 625-634.
73 Zha, J., Harada, H., Yang, E., Jockel, J. and Korsmeyer, S. J. (1996) Cell 87, 619-628.
74 Lizcano, J. M., Morrice, N. and Cohen, P. (2000) Biochem. J. 349, 547-557.
75 Virdee, K., Parone, P. A. and Tolkovsky, A. M. (2000) Curr. Biol. 10, 1151-1154.
76 Wang, H. G., Pathan, N., Ethell, I. M., Krajewski, S., Yamaguchi, Y., Shibasaki, F., McKeon, F., Bobo, T., Franke, T. F. and Reed, J. C. (1999) Science 284, 339-343.
77 Haldar, S., Jena, N. and Croce, C. M. (1995) Proc. Nat. Acad. Sci. U.S.A. 92, 4507-4511.
78 Chang, B. S., Minn, A. J., Muchmore, S. W., Fesik, S. W. and Thompson, C. B. (1997) EMBO J. 16, 968-977.
79 Maundrell, K., Antonsson, B., Magnenat, E., Camps, M., Muda, M., Chabert, C., Gillieron, C., Boschert, U., Vial-Knecht, E., Martinou, J. C. and Arkinstall, S. (1997) J. Biol. Chem. 272, 25238-25242.
80 Furukawa, Y., Iwase, S., Kikuchi, J., Terui, Y., Nakamura, M., Yamada, H., Kano, Y. and Matsuda, M. (2000) J. Biol. Chem. 275, 21661-21667.
81 Miyashita, T., Harigai, M., Hanada, M. and Reed, J. C. (1994) Cancer Res. 54, 3131-3135.
82 Sanz, C., Benito, A., Inohara, N., Ekhterae, D., Nunez, G. and Fernandez-Luna, J. L. (2000) Blood 95, 2742-2747.
83 Wang, C. Y., Guttridge, D. C., Mayo, M. W. and Baldwin, A. S., Jr. (1999) Mol. Cell. Biol. 19, 5923-5929.
84 Zong, W. X., Edelstein, L. C., Chen, C., Bash, J. and Gelinas, C. (1999) Genes Dev. 13, 382-387.
85 Riccio, A., Ahn, S., Davenport, C. M., Blendy, J. A. and Ginty, D. D. (1999) Science 286, 2358-2361.
86 Miller, L. K. (1999) Trends Cell Biol. 9, 323-328.
87 Deveraux, Q. L. and Reed, J. C. (1999) Genes Dev. 13, 239-252.
88 Crook, N. E., Clem, R. J. and Miller, L. K. (1993) J. Virol. 67, 2168-2174.
89 Duckett, C. S., Nava, V. E., Gedrich, R. W., Clem, R. J., Van Dongen, J. L., Gilfillan, M. C., Shiels, H., Hardwick, J. M. and Thompson, C. B. (1996) EMBO J. 15, 2685-2694.
90 Liston, P., Roy, N., Tamai, K., Lefebvre, C., Baird, S., Cherton-Horvat, G., Farahani, R., McLean, M., Ikeda, J. E., MacKenzie, A. and Korneluk, R. G. (1996) Nature 379, 349-353.
91 Ambrosini, G., Adida, C. and Altieri, D. C. (1997) Nature Med. 3, 917-921.
92 Tamm, I., Wang, Y., Sausville, E., Scudiero, D. A., Vigna, N., Oltersdorf, T. and Reed, J. C. (1998) Cancer Res. 58, 5315-5320.
93 Datta, R., Oki, E., Endo, K., Biedermann, V., Ren, J. and Kufe, D. (2000) J. Biol. Chem. 275, 31733-31738.
94 Kawasaki, H., Altieri, D. C., Lu, C. D., Toyoda, M., Tenjo, T. and Tanigawa, N. (1998) Cancer Res. 58, 5071-5074.
95 Adida, C., Crotty, P. L., McGrath, J., Berrebi, D., Diebold, J. and Altieri, D. C. (1998) Amer. J. Pathol. 152, 43-49.
96 Lu, C. D., Altieri, D. C. and Tanigawa, N. (1998) Cancer Res. 58, 1808-1812.
97 Li, F., Ambrosini, G., Chu, E. Y., Plescia, J., Tognin, S., Marchisio, P. C. and Altieri, D. C. (1998) Nature 396, 580-584.
98 Uren, A. G., Coulson, E. J. and Vaux, D. L. (1998) Trends Biochem. Sci. 23, 159-162.
99 Fraser, A. and James, C. (1998) Trends Cell Biol. 8, 219-221.
100 Fraser, A. G., James, C., Evan, G. I. and Hengartner, M. O. (1999) Curr. Biol. 9, 292-301.
101 Roy, N., Deveraux, Q. L., Takahashi, R., Salvesen, G. S. and Reed, J. C. (1997) EMBO J. 16, 6914-6925.
102 Takahashi, R., Deveraux, Q., Tamm, I., Welsh, K., Assa-Munt, N., Salvesen, G. S. and Reed, J. C. (1998) J. Biol. Chem. 273, 7787-7790.
103 Deveraux, Q. L., Roy, N., Stennicke, H. R., Van Arsdale, T., Zhou, Q., Srinivasula, S. M., Alnemri, E. S., Salvesen, G. S. and Reed, J. C. (1998) EMBO J. 17, 2215-2223.
104 Yang, Y., Fang, S., Jensen, J. P., Weissman, A. M. and Ashwell, J. D. (2000) Science 288, 874-877.
105 Young, S. S., Liston, P., Xuan, J. Y., McRoberts, C., Lefebvre, C. A. and Korneluk, R. G. (1999) Mamm. Genome 10, 44-48.
106 Roy, N. et al. (1995) Cell 80, 167-178.
107 Deveraux, Q. L., Takahashi, R., Salvesen, G. S. and Reed, J. C. (1997) Nature 388, 300-304.
108 Orth, K. and Dixit, V. M. (1997) J. Biol. Chem. 272, 8841-8844.
109 Duckett, C. S., Li, F., Wang, Y., Tomaselli, K. J., Thompson, C. B. and Armstrong, R. C. (1998) Mol. Cell. Biol. 18, 608-615.
110 Krammer, P. H. (1999) Adv. Immunol. 71, 163-210.
111 Wallach, D., Varfolomeev, E. E., Malinin, N. L., Goltsev, Y. V., Kovalenko, A. V. and Boldin, M. P. (1999) Annu. Rev. Immunol. 17, 331-367.
112 Du, C., Fang, M., Li, Y., Li, L. and Wang, X. (2000) Cell 102, 33-42.
113 Verhagen, A. M., Ekert, P. G., Pakusch, M., Silke, J., Connolly, L. M., Reid, G. E., Moritz, R. L., Simpson, R. J. and Vaux, D. L. (2000) Cell 102, 43-53.
114 Thome, M., Schneider, P., Hofmann, K., Fickenscher, H., Meinl, E., Neipel, F., Mattmann, C., Burns, K., Bodmer, J. L., Schroter, M., Scaffidi, C., Krammer, P. H., Peter, M. E. and Tschopp, J. (1997) Nature 386, 517-521.
115 Hu, S., Vincenz, C., Buller, M. and Dixit, V. M. (1997) J. Biol. Chem. 272, 9621-9624.
116 Bertin, J., Armstrong, R. C., Ottilie, S., Martin, D. A., Wang, Y., Banks, S., Wang, G. H., Senkevich, T. G., Alnemri, E. S., Moss, B., Lenardo, M. J., Tomaselli, K. J. and Cohen, J. I. (1997) Proc. Nat. Acad. Sci. U.S.A. 94, 1172-1176.
117 Irmler, M., Thome, M., Hahne, M., Schneider, P., Hofmann, K., Steiner, V., Bodmer, J. L., Schroter, M., Burns, K., Mattmann, C., Rimoldi, D., French, L. E. and Tschopp, J. (1997) Nature 388, 190-195.
118 Goltsev, Y. V., Kovalenko, A. V., Arnold, E., Varfolomeev, E. E., Brodianskii, V. M. and Wallach, D. (1997) J. Biol. Chem. 272, 19641-19644.
119 Shu, H. B., Halpin, D. R. and Goeddel, D. V. (1997) Immunity 6, 751-763.
120 Inohara, N., Koseki, T., Hu, Y., Chen, S. and Nunez, G. (1997) Proc. Nat. Acad. Sci. U.S.A. 94, 10717-10722.
121 Srinivasula, S. M., Ahmad, M., Ottilie, S., Bullrich, F., Banks, S., Wang, Y., Fernandes-Alnemri, T., Croce, C. M., Litwack, G., Tomaselli, K. J., Armstrong, R. C. and Alnemri, E. S. (1997) J. Biol. Chem. 272, 18542-18545.
122 Han, D. K., Chaudhary, P. M., Wright, M. E., Friedman, C., Trask, B. J., Riedel, R. T., Baskin, D. G., Schwartz, S. M. and Hood, L. (1997) Proc. Nat. Acad. Sci. U.S.A. 94, 11333-11338.
123 Rasper, D. M., Vaillancourt, J. P., Hadano, S., Houtzager, V. M., Seiden, I., Keen, S. L., Tawa, P., Xanthoudakis, S., Nasir, J., Martindale, D., Koop, B. F., Peterson, E. P., Thornberry, N. A., Huang, J., MacPherson, D. P., Black, S. C., Hornung, F., Lenardo, M. J., Hayden, M. R., Roy, S. and Nicholson, D. W. (1998) Cell Death Differ. 5, 271-288.
124 Scaffidi, C., Schmitz, I., Krammer, P. H. and Peter, M. E. (1999) J. Biol. Chem. 274, 1541-1548.
125 Yeh, W. C., Itie, A., Elia, A. J., Ng, M., Shu, H. B., Wakeham, A., Mirtsos, C., Suzuki, N., Bonnard, M., Goeddel, D. V. and Mak, T. W. (2000) Immunity 12, 633-642.
126 Komiyama, T., Ray, C. A., Pickup, D. J., Howard, A. D., Thornberry, N. A., Peterson, E. P. and Salvesen, G. (1994) J. Biol. Chem. 269, 19331-19337.
127 Zhou, Q., Snipas, S., Orth, K., Muzio, M., Dixit, V. M. and Salvesen, G. S. (1997) J. Biol. Chem. 272, 7797-7800.
128 Garcia-Calvo, M., Peterson, E. P., Leiting, B., Ruel, R., Nicholson, D. W. and Thornberry, N. A. (1998) J. Biol. Chem. 273, 32608-32613.
129 Smith, K. G., Strasser, A. and Vaux, D. L. (1996) EMBO J. 15, 5167-5176.
130 Ekert, P. G., Silke, J. and Vaux, D. L. (1999) EMBO J. 18, 330-338.
131 Bump, N. J. et al. (1995) Science 269, 1885-1888.
132 Xue, D. and Horvitz, H. R. (1995) Nature 377, 248-251.
133 Bertin, J., Mendrysa, S. M., LaCount, D. J., Gaur, S., Krebs, J. F., Armstrong, R. C., Tomaselli, K. J. and Friesen, P. D. (1996) J. Virol. 70, 6251-6259.
134 Zhou, Q., Krebs, J. F., Snipas, S. J., Price, A., Alnemri, E. S., Tomaselli, K. J. and Salvesen, G. S. (1998) Biochemistry 37, 10757-10765.
135 Villa, P., Kaufmann, S. H. and Earnshaw, W. C. (1997) Trends Biochem. Sci. 22, 388-393.
136 Izquierdo, M., Grandien, A., Criado, L. M., Robles, S., Leonardo, E., Albar, J. P., de Buitrago, G. G. and Martinez, A. C. (1999) EMBO J. 18, 156-166.
137 Hisahara, S., Araki, T., Sugiyama, F., Yagami, K., Suzuki, M., Abe, K., Yamamura, K., Miyazaki, J., Momoi, T., Saruta, T., Bernard, C. C., Okano, H. and Miura, M. (2000) EMBO J. 19, 341-348.
138 Ghosh, S., May, M. J. and Kopp, E. B. (1998) Annu. Rev. Immunol. 16, 225-260.
139 Barkett, M. and Gilmore, T. D. (1999) Oncogene 18, 6910-6924.
140 Hettmann, T., DiDonato, J., Karin, M. and Leiden, J. M. (1999) J. Exp. Med. 189, 145-158.
141 Beg, A. A., Sha, W. C., Bronson, R. T., Ghosh, S. and Baltimore, D. (1995) Nature 376, 167-170.
142 Evans-Storms, R. B. and Cidlowski, J. A. (2000) Endocrinology 141, 1854-1862.
143 Wang, W., Wykrzykowska, J., Johnson, T., Sen, R. and Sen, J. (1999) J. Immunol. 162, 314-322.
144 Lee, H. H., Dadgostar, H., Cheng, Q., Shu, J. and Cheng, G. (1999) Proc. Nat. Acad. Sci. U.S.A. 96, 9136-9141.
145 Grumont, R. J., Rourke, I. J. and Gerondakis, S. (1999) Genes Dev. 13, 400-411.
146 Johannes, F. J., Horn, J., Link, G., Haas, E., Siemienski, K., Wajant, H. and Pfizenmaier, K. (1998) Eur. J. Biochem. 257, 47-54.
147 Erl, W., Hansson, G. K., de Martin, R., Draude, G., Weber, K. S. and Weber, C. (1999) Circ. Res. 84, 668-677.
148 Stehlik, C., de Martin, R., Kumabashiri, I., Schmid, J. A., Binder, B. R. and Lipp, J. (1998) J. Exp. Med. 188, 211-216.
149 Wang, C. Y., Mayo, M. W., Korneluk, R. G., Goeddel, D. V. and Baldwin, A. S., Jr. (1998) Science 281, 1680-1683.
150 Sarma, V., Lin, Z., Clark, L., Rust, B. M., Tewari, M., Noelle, R. J. and Dixit, V. M. (1995) J. Biol. Chem. 270, 12343-12346.
151 Jones, P. L., Ping, D. and Boss, J. M. (1997) Mol. Cell. Biol. 17, 6970-6981.
152 Wu, M. X., Ao, Z., Prasad, K. V., Wu, R. and Schlossman, S. F. (1998) Science 281, 998-1001.
153 Bortner, C. D. and Cidlowski, J. A. (1998) Biochem. Pharmacol. 56, 1549-1559.
154 Yu, S. P. and Choi, D. W. (2000) Proc. Nat. Acad. Sci. U.S.A. 97, 9360-9362.
155 Maeno, E., Ishizaki, Y., Kanaseki, T., Hazama, A. and Okada, Y. (2000) Proc. Nat. Acad. Sci. U.S.A. 97, 9487-9492.
156 Bortner, C. D. and Cidlowski, J. A. (1996) Amer. J. Physiol. 271, C950-961.
157 Matthews, C. C. and Feldman, E. L. (1996) J. Cell. Physiol. 166, 323-331.
158 Bortner, C. D. and Cidlowski, J. A. (1999) J. Biol. Chem. 274, 21953-21962.
159 Burg, M. B., Kwon, E. D. and Kultz, D. (1996) FASEB J. 10, 1598-1606.
160 Bortner, C. D., Hughes, F. M., Jr. and Cidlowski, J. A. (1997) J. Biol. Chem. 272, 32436-32442.
161 Moustakas, A., Theodoropoulos, P. A., Gravanis, A., Haussinger, D. and Stournaras, C. (1998) Contrib. Nephrol. 123, 121-134.
162 Chevaile, A., Santos, B., Randall, J., Stears, R. and Gullans, S. R. (1998) Contrib. Nephrol. 123, 110-120.
163 Roger, F., Martin, P. Y., Rousselot, M., Favre, H. and Feraille, E. (1999) J. Biol. Chem. 274, 34103-34110.
164 Beauvais, F., Michel, L. and Dubertret, L. (1995) J. Leukoc. Biol. 57, 851-855.
165 Hughes, F. M., Jr., Bortner, C. D., Purdy, G. D. and Cidlowski, J. A. (1997) J. Biol. Chem. 272, 30567-30576.
166 Yu, S. P., Yeh, C. H., Sensi, S. L., Gwag, B. J., Canzoniero, L. M., Farhangrazi, Z. S., Ying, H. S., Tian, M., Dugan, L. L. and Choi, D. W. (1997) Science 278, 114-117.
167 Avdonin, V., Kasuya, J., Ciorba, M. A., Kaplan, B., Hoshi, T. and Iverson, L. (1998) Proc. Nat. Acad. Sci. U.S.A. 95, 11703-11708.
168 Wang, L., Xu, D., Dai, W. and Lu, L. (1999) J. Biol. Chem. 274, 3678-3685.
169 Dallaporta, B., Marchetti, P., de Pablo, M. A., Maisse, C., Duc, H. T., Metivier, D., Zamzami, N., Geuskens, M. and Kroemer, G. (1999) J. Immunol. 162, 6534-6542.
170 Nietsch, H. H., Roe, M. W., Fiekers, J. F., Moore, A. L. and Lidofsky, S. D. (2000) J. Biol. Chem. 275, 20556-20561.
171 Allbritton, N. L., Verret, C. R., Wolley, R. C. and Eisen, H. N. (1988) J. Exp. Med. 167, 514-527.
172 Duke, R. C., Witter, R. Z., Nash, P. B., Young, J. D. and Ojcius, D. M. (1994) FASEB J. 8, 237-246.
173 Inai, Y., Yabuki, M., Kanno, T., Akiyama, J., Yasuda, T. and Utsumi, K. (1997) Cell Struct. Funct. 22, 555-563.
CYTOKINE ACTIVATION OF TRANSCRIPTION
Kerri A. Mowen and Michael David Division of Biology, Molecular Biology Section University of California, San Diego La Jolla, CA 92093
INTRODUCTION
Cytokines represent a group of transiently expressed, low molecular weight molecules that fulfill a vital role in intercellular communication, particularly in the immune system. Cytokines regulate numerous physiological and pathological events such as lymphocyte development and activation, programmed cell death and inflammatory processes. Research over the past decade has led to the discovery of many new cytokines and their respective cell surface receptors, and has elucidated the signaling pathways that lead to cytokine receptor-mediated gene induction. In keeping with their complex biological effects, cytokine-induced transcriptional activation is the product of an intricate network of multiple signaling cascades. This article presents an overview of cytokine-mediated transcriptional induction through STAT proteins and NF-κB, two signaling pathways widely utilized among cytokine receptors.

TRANSCRIPTIONAL ACTIVATION THROUGH THE JAK/STAT PATHWAY
The discovery of the Jak/STAT pathway is owed to the quest to understand the mechanism by which interferons activate immediate early response genes. The first STAT proteins were identified as the regulatory components of the type I interferon-induced
Genetic Engineering, Volume 23, Edited by J. K. Setlow Kluwer Academic / Plenum Publishers, 2001
K.A. MOWEN AND M. DAVID
transcription factor complex, ISGF3 (1). STAT1 and STAT2, in conjunction with the DNA-binding subunit p48, were found to bind the ISRE (Interferon Stimulated Response Element) enhancer as a heterodimer in response to IFNα/β. Alternatively, STAT1 homodimers activated via the IFNγ receptor can bind a distinct element termed GAS. Shortly after the cloning of STAT1 and STAT2, numerous other cytokines and growth factors were reported to activate DNA-binding proteins with STAT-like characteristics (summarized in Table 1) (3, 4). Efforts to identify these proteins resulted in the cloning of five additional mammalian STAT genes (STAT3, STAT4, STAT5A, STAT5B, STAT6), whose products, in analogy to the interferon-signaling model, bind to GAS-like elements (5). Several of the STAT proteins are represented by various splice variants; however, STAT5A and STAT5B are encoded by two distinct genes. In contrast to the more restricted expression of STAT4 in spleen, heart, brain, peripheral blood cells, and testis, most STAT proteins are expressed rather ubiquitously.
All seven STAT proteins share several conserved structural and functional domains (Figure 1). A carboxy-terminal tyrosine residue, which serves as the target for ligand-induced phosphorylation, is crucial for dimerization, nuclear translocation and DNA-binding of STAT molecules. The src-homology 2 (SH2) domain of STAT proteins facilitates homo- or heterodimerization via reciprocal interaction with the phosphorylated tyrosine residue. In addition, the SH2 domain accounts for the specific interaction of STAT molecules with activated, tyrosine phosphorylated cytokine and growth factor receptors, as well as Jak tyrosine kinases (6). Dimerization of STAT proteins allows the centrally located DNA-binding domains to interact with consensus palindromic sequences that differ only in their core nucleotide sequence (7). 
Interestingly, STAT2 is unable to bind DNA in such a way; rather, its function as a transcriptional activator appears to be restricted to its participation in the multimeric ISGF3 complex. The more divergent region carboxy-terminal to the tyrosine phosphorylation site harbors the transactivation domain (TAD). Posttranslational modification of several STAT members through serine phosphorylation (8), presumably by members of the ERK/SAPK kinase family (9), accounts for the regulated interaction of the STAT TADs with transcriptional coactivators such as CBP/p300 or MCM5 (10, 11).
It was again the paradigm of the interferon-signaling cascade that led to the discovery of the participation of the Janus tyrosine kinases in STAT-mediated gene transcription (12). The demonstration that mutagenically ablated expression of Tyk2 or Jak1 resulted in IFNα-unresponsive cells, while similar abrogation of Jak1 and Jak2 expression resulted in the loss of IFNγ responsiveness, provided the basis for the function of these tyrosine kinases and the identity of this novel signaling pathway (13, 14). In contrast to the ubiquitous expression of these kinases, a fourth family member, Jak3, was identified whose expression is restricted to cells of hematopoietic origin (15). The most striking feature in the structure of Jak family members is the presence of a kinase-like domain of as yet unclear function, in addition to the true kinase domain. Also noteworthy is the absence of any SH2- or SH3-like structures that are commonly found in cytoplasmic tyrosine kinases. The relatively large (120-140 kDa) Jak kinases are constitutively associated with many cytokine and growth factor receptors. Nevertheless, this association can be increased after cytokine stimulation.
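The dimerization and DNA-binding rules for STAT proteins described above can be restated as a small Python sketch. It is a toy model for illustration only: the `Stat` class and both function names are hypothetical, and treating every STAT2-containing dimer as unable to bind GAS-like elements is a simplifying assumption drawn from the statement that STAT2 acts only within ISGF3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stat:
    """Toy STAT molecule; only the state needed for this sketch is tracked."""
    name: str
    tyr_phosphorylated: bool = False  # carboxy-terminal tyrosine


def can_dimerize(a: Stat, b: Stat) -> bool:
    # Each SH2 domain docks on the partner's phosphotyrosine, so the
    # interaction is reciprocal and both partners must be phosphorylated.
    return a.tyr_phosphorylated and b.tyr_phosphorylated


def binds_gas_like_element(a: Stat, b: Stat) -> bool:
    # Assumption of this sketch: any dimer containing STAT2 is treated as
    # unable to bind GAS-like elements, since STAT2 acts only within ISGF3.
    return can_dimerize(a, b) and "STAT2" not in (a.name, b.name)


stat1 = Stat("STAT1")
stat1_p = Stat("STAT1", tyr_phosphorylated=True)
stat2_p = Stat("STAT2", tyr_phosphorylated=True)

print(can_dimerize(stat1_p, stat1))              # False
print(binds_gas_like_element(stat1_p, stat1_p))  # True
print(binds_gas_like_element(stat1_p, stat2_p))  # False
```

The model deliberately ignores serine phosphorylation and coactivator recruitment, which the text treats as a separate layer of regulation.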
The general outline of the sequential events that lead to STAT-mediated gene transcription has been derived mostly from evidence gathered from the interferon signaling paradigm. It is believed that engagement of ligands with their respective receptors results in an increased local concentration of Jak proteins due to receptor aggregation and increased affinity of the receptors for Jak kinases. Subsequent cross-phosphorylation of the Jaks results in the activation of their kinase activity, such that they can phosphorylate tyrosine residues in the receptor chains. These phospho-tyrosine moieties provide the docking sites for the STAT proteins via their SH2 domains, leading to their tyrosine phosphorylation, dimerization and nuclear translocation. Parallel signaling events, presumably involving ERK/SAPK family members, are responsible for further phosphorylation of the serine residues in the carboxy-terminus of the STAT proteins (Figure 2), which is required for an efficient transcriptional response. In the case of growth factor receptors, it is the intrinsic tyrosine kinase activity of the receptors, rather than the presence of Jak kinases, that is required for the tyrosine phosphorylation of STAT proteins.
As is the case with other signaling pathways, much of our understanding of the diverse functions of Jak and STAT proteins in vivo has been revealed through the generation of knock-out mice (16). As anticipated, STAT1-deficient mice display dramatically increased sensitivity towards viral and microbial pathogens, presumably due to their inability to respond to interferons (17, 18). Disruption of the STAT3 gene results in early embryonic lethality, and only the use of conditional gene targeting revealed an essential role for STAT3 in T cell and macrophage function (19, 20). As predicted by the
restricted activation of STAT4 by IL-12, Th1-cell development is severely impaired by the disruption of the STAT4 gene (21, 22). Consistent with the activation of STAT5 by prolactin, STAT5a-deficient mice fail to lactate; in contrast, the absence of STAT5b causes failure to respond to growth hormone (23). The role of STAT6 in IL-4 signaling was confirmed by the impairment of Th2 development and lack of IgE class switching in its absence. Jak1-deficient mice fail to nurse, display severely impaired lymphocyte development and die perinatally (24, 25). The function of Jak2 in erythropoietin signaling is evidenced by embryonic lethality due to defective erythropoiesis in its absence (26). Of particular interest are results obtained from Jak3-deficient mice (27), since mutations in Jak3 have also been identified in humans. Jak3 interacts with the common cytokine receptor γ chain, a component of the receptors for IL-2, IL-4, IL-7, IL-9 and IL-15, mutation of which is responsible for X-SCID. Similarly, Jak3 deficiency in mice or mutations of Jak3 in humans cause SCID-like phenotypes (28).
Of equal importance to the activation of a signaling pathway is its spatially and temporally coordinated attenuation. Several independent mechanisms are responsible for the negative regulatory control over the Jak/STAT pathway (29). The tyrosine
phosphatase SHP1 acts on activated receptors and the Jak kinases (30, 31), whereas an as yet unidentified nuclear tyrosine phosphatase dephosphorylates STAT proteins (32). The SH2 domain-containing SOCS proteins have been identified as a family of cytokine-inducible inhibitors of Jak kinase activity, thus acting in a classical negative feedback loop (33). In contrast, the constitutively expressed nuclear PIAS proteins do not prevent the phosphorylation of STAT proteins, but exert their negative regulatory role by associating with tyrosine phosphorylated STAT dimers and preventing them from binding DNA (34).

TRANSCRIPTIONAL REGULATION VIA NF-κB ACTIVATION
Stress events such as infections or tissue damage cause the immune system to evoke an inflammatory response leading to a cascade of local and systemic counteracting effects. Components or products of infectious microorganisms, e.g., LPS, induce the expression of cytokines such as IL-1 and tumor necrosis factor (TNF). The binding of these proinflammatory cytokines to their cell surface receptors rapidly induces a genetic program to respond to cellular stress and to generate inflammatory mediators (e.g., chemokines, acute phase proteins, proteases and prostaglandins). Chronic inflammation can result in extensive tissue destruction and disease, such as rheumatoid arthritis. Therefore, it is important that signaling molecules transducing IL-1 and TNF signals not only be triggered quickly but also signal transiently. The transcription factor NF-κB fulfills these characteristics. Although a key mediator in the inflammatory response, NF-κB was first isolated in B cells as a factor necessary for immunoglobulin kappa light chain transcription. Later, NF-κB was found to exist in most cell types (35). NF-κB binding sites have been identified in a number of genes encoding cytokines and chemokines, adhesion molecules, acute phase proteins, antiapoptotic proteins or transcription factors (36). Thus, NF-κB was implicated in innate immunity as well as in the adaptive immune response. NF-κB exists as a homo- or heterodimer of any of the five subunits thus far isolated: RelA (p65), RelB, c-Rel, p50 and p52. The p50 and p52 subunits are first generated as the longer p105 and p100 forms, respectively, which are then proteolytically processed into the shorter, active forms. All members contain a conserved rel homology domain (RHD) within their N-terminus which contains the dimerization, IκB-binding and NLS regions. Additionally, RelA, RelB and c-Rel contain a C-terminal transactivation domain which is absent in p50 and p52. Hence, p50 and p52 homodimers are transcriptionally repressive (35). 
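The subunit combinatorics described above can be made concrete with a short Python sketch. It takes the phrase "any of the five subunits" at face value and simply enumerates all unordered pairings; whether every pairing actually forms in cells is not addressed here, and the names `SUBUNITS`, `HAS_TAD`, `all_dimers` and `has_tad` are illustrative choices rather than established nomenclature.

```python
from itertools import combinations_with_replacement

# The five subunits named in the text.
SUBUNITS = ("RelA", "RelB", "c-Rel", "p50", "p52")
# Subunits stated to carry a C-terminal transactivation domain (TAD).
HAS_TAD = {"RelA", "RelB", "c-Rel"}


def all_dimers():
    """Every unordered pairing, homodimers included (a simplifying assumption)."""
    return list(combinations_with_replacement(SUBUNITS, 2))


def has_tad(dimer):
    """True if at least one partner contributes a transactivation domain."""
    return any(subunit in HAS_TAD for subunit in dimer)


dimers = all_dimers()
tad_less = [d for d in dimers if not has_tad(d)]

print(len(dimers))  # 15
print(tad_less)     # [('p50', 'p50'), ('p50', 'p52'), ('p52', 'p52')]
```

Of the 15 formal pairings, only the three built entirely from p50 and p52 lack a TAD. The text describes the p50 and p52 homodimers as repressive; the sketch says nothing about the activity of the p50/p52 heterodimer beyond the absence of a TAD.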
TNF and IL-1, and LPS itself, activate a pathway which results in the degradation of inhibitors of NF-κB (IκB), freeing NF-κB proteins from the cytosolic tether and exposing the nuclear localization signal (NLS). Similar to transcriptional induction via STAT proteins, the activation of NF-κB does not require protein synthesis. Once in the nucleus, NF-κB negatively regulates its own activity by inducing transcription of IκBα, which enters the nucleus and chaperones NF-κB molecules out through an active export pathway involving the nuclear export sequence (NES) (37). The IκB proteins bind to NF-κB dimers and mask the NLS. The IκB family members (including Bcl-3) contain ankyrin repeats which bind the RHD, an N-terminal regulatory domain and a C-terminal PEST motif for proteolytic degradation (35). Activators of the NF-κB pathway trigger serine phosphorylation of IκB proteins (Ser32 and Ser36 on IκBα), which targets IκB for
ubiquitin-mediated destruction by the 26S proteasome (38). Two closely-related proteins with IκB kinase (IKK) activity were identified as IKKα and IKKβ. Additionally, a putative scaffolding subunit was isolated and called IKKγ (NEMO). IKKα and IKKβ are 52% similar and both contain an N-terminal serine/threonine kinase domain, a leucine zipper and a helix-loop-helix (HLH) motif (39). Upon stimulation by TNF and IL-1, the IKK subunits are phosphorylated on serine residues. Substitution of alanines for these serines in IKKβ abrogated kinase activity; however, mutations of the homologous serines within IKKα left TNF- and IL-1-induced activity intact, suggesting that IKKβ is the main IKK kinase subunit through which proinflammatory cytokines signal (40). The IKKα and IKKβ knock-out mice confirmed these earlier biochemical experiments, as IKKα-/- embryonic fibroblasts are normal in proinflammatory cytokine-induced NF-κB activation. In contrast, IKKβ-/- mice, which, like RelA-/- embryos, die between embryonic day 12.5 and 14.5 from severe fetal liver destruction, lack the ability to suppress TNF-induced apoptosis via NF-κB. Indeed, when RelA-/- or IKKβ-/- mice are crossed to TNF receptor 1- or TNF-deficient mice, embryonic survival is rescued (37). Analysis of the activation sites of the IKK kinases revealed that the MAPKKK NIK (NF-κB-inducing kinase) activates the IKK complex. NIK was originally isolated in a yeast two-hybrid screen with TNF receptor-associated factor 2 (TRAF2) and was found to inhibit TNF- and IL-1-induced NF-κB activation when expressed in a dominant negative form (41). The natural mouse mutant alymphoplasia (aly) was found to carry a single amino acid mutation (G855R) within the C-terminal TRAF-interacting region of NIK. Interestingly, embryonic fibroblasts from NIK-/- mice are defective only in lymphotoxin-induced, but not IL-1- or TNF-induced, NF-κB activation, suggesting that NIK plays a critical role in lymphotoxin signaling but not in the TNF or IL-1 pathways, despite the results of the earlier overexpression studies (42).
In addition to NIK, MEKK1 can also phosphorylate IKK in vitro (43). However, although MEKK1 activity is induced by TNF, MEKK1 is not as potent a stimulator of IKK activity as NIK (42). Furthermore, inactivation of MEKK1 does not impair NF-κB activation by TNF or IL-1 (44). Thus, additional kinases are likely to be involved in activating IKK in particular in vivo situations. A number of adapter proteins link the IL-1 and TNF receptors to IKK activation and consequently NF-κB activation. The TNFR1 is a death-domain-containing receptor. The death domains of TNFR1 interact with the death domain of FADD, initiating proapoptotic signals, but also bind the death domain of TRADD, thereby inducing the antiapoptotic NF-κB pathway. In turn, TRADD binds to TRAF2 (45). A dominant negative form of TRAF2 inhibits TNF-induced NF-κB activation; nevertheless, TRAF2-/- fibroblasts have no defect in TNF-induced NF-κB activation (46). However, TRAF2 recruits RIP1 (receptor interacting protein) to the TNFR complex, and RIP1-/- cells are deficient in NF-κB activity (47). The binding of IL-1 to the IL-1R also leads to the development of a signaling scaffold resulting in NF-κB activation. First, the adapter protein MyD88 (myeloid differentiation factor 88) binds to the receptor. Interacting through its death domain, MyD88 binds the serine/threonine kinase IRAK (IL-1R-associated kinase) (48), which in turn recruits TRAF6 to the complex. Both IRAK-/- and TRAF6-/- mice display defects in IL-1-induced NF-κB activity. Additionally, the MAPKKK TAK1 (TGF-β-activated kinase)
associates with TRAF6 in an IL-1-dependent manner, leading to activation of NF-κB in a NIK-dependent manner (49).
CONCLUSIONS

In this review we have attempted to outline the events leading to cytokine-mediated gene transcription by using the Jak/STAT and the NF-κB pathways as two examples. It is important to remember that cytokine activation of gene transcription is not the result of a single, linear signaling cascade, but involves a complex network of interacting signal-transduction pathways. Integration of numerous, often contradicting instructions received by the cell through a variety of cytokine and growth factor receptors determines the qualitative and quantitative transcriptional response.

REFERENCES
1. Fu, X.-Y., Kessler, D. S., Veals, S. A., Levy, D. E. and Darnell, J. E., Jr. (1990) Proc. Nat. Acad. Sci. U.S.A. 87, 8555-8559.
2. Fu, X.-Y. (1992) Cell 70, 323-335.
3. Larner, A. C. and Finbloom, D. S. (1995) Biochim. Biophys. Acta 1266, 278-287.
4. Larner, A. C., David, M., Feldman, G. M., Igarashi, K., Hackett, R. H., Webb, D. S. A., Sweitzer, S. M., Petricoin, E. F., III and Finbloom, D. S. (1993) Science 261, 1730-1733.
5. Schindler, C. and Darnell, J. E., Jr. (1995) Annu. Rev. Biochem. 64, 621-651.
6. Heim, M. H., Kerr, I. M., Stark, G. R. and Darnell, J. E., Jr. (1995) Science 267, 1347-1349.
7. Seidel, H. M., Milocco, L. H., Lamb, P., Darnell, J. E., Jr., Stein, R. B. and Rosen, J. (1995) Proc. Nat. Acad. Sci. U.S.A. 92, 3041-3045.
8. Wen, Z., Zhong, Z. and Darnell, J. E., Jr. (1995) Cell 82, 241-250.
9. David, M., Petricoin, E. F., III, Benjamin, C., Pine, R., Weber, M. J. and Larner, A. C. (1995) Science 269, 1721-1723.
10. Korzus, E., Torchia, J., Rose, D. W., Xu, L., Kurokawa, R., McInerney, E. M., Mullen, T. M., Glass, C. K. and Rosenfeld, M. G. (1998) Science 279, 703-707.
11. Zhang, J. J., Zhao, Y., Chait, B. T., Lathem, W. W., Ritzi, M., Knippers, R. and Darnell, J. E., Jr. (1998) EMBO J. 17, 6963-6971.
12. Velazquez, L., Fellous, M., Stark, G. R. and Pellegrini, S. (1992) Cell 70, 313-322.
13. Muller, M., Briscoe, J., Laxton, C., Guschin, D., Ziemiecki, A., Silvennoinen, O., Harpur, A. G., Barbieri, G., Witthuhn, B. A., Schindler, C., Pellegrini, S., Wilks, A. F., Ihle, J. N., Stark, G. R. and Kerr, I. M. (1993) Nature 366, 129-135.
14. Watling, D., Guschin, D., Muller, M., Silvennoinen, O., Witthuhn, B. A., Quelle, F. W., Rogers, N. C., Schindler, C., Ihle, J. N., Stark, G. R. and Kerr, I. M. (1993) Nature 366, 166-170.
15. Kawamura, M., McVicar, D. W., Johnston, J. A., Blake, T. B., Chen, Y.-Q., Lal, B. K., Lloyd, A. R., Kelvin, D. J., Staples, E., Ortaldo, J. R. and O'Shea, J. J. (1994) Proc. Nat. Acad. Sci. U.S.A. 91, 6374-6378.
16. Akira, S. (1999) Stem Cells 17, 138-146.
17. Durbin, J. E., Hackenmiller, R., Simon, M. C. and Levy, D. E. (1996) Cell 84, 443-450.
18. Meraz, M. A., White, J. M., Sheehan, K. C. F., Bach, E. A., Rodig, S. J., Dighe, A. S., Kaplan, D. H., Riley, J. K., Greenlund, A. C., Campbell, D., Carver-Moore, K., DuBois, R. N., Clark, R., Aguet, M. and Schreiber, R. D. (1996) Cell 84, 431-442.
19. Takeda, K., Kaisho, T., Yoshida, N., Takeda, J., Kishimoto, T. and Akira, S. (1998) J. Immunol. 161, 4652-4660.
20. Takeda, K., Noguchi, K., Shi, W., Tanaka, T., Matsumoto, M., Yoshida, N., Kishimoto, T. and Akira, S. (1997) Proc. Nat. Acad. Sci. U.S.A. 94, 3801-3804.
21. Kaplan, M. H., Sun, Y. L., Hoey, T. and Grusby, M. J. (1996) Nature 382, 174-177.
22. Thierfelder, W. E., van Deursen, J. M., Yamamoto, K., Tripp, R. A., Sarawar, S. R., Carson, R. T., Sangster, M. Y., Vignali, D. A., Doherty, P. C., Grosveld, G. C. and Ihle, J. N. (1996) Nature 382, 171-174.
23. Teglund, S., McKay, C., Schuetz, E., van Deursen, J. M., Stravopodis, D., Wang, D., Brown, M., Bodner, S., Grosveld, G. and Ihle, J. N. (1998) Cell 93, 841-850.
24. Shimoda, K., van Deursen, J., Sangster, M. Y., Sarawar, S. R., Carson, R. T., Tripp, R. A., Chu, C., Quelle, F. W., Nosaka, T., Vignali, D. A. A., Doherty, P. C., Grosveld, G., Paul, W. E. and Ihle, J. N. (1996) Nature 380, 630-633.
25. Takeda, K., Tanaka, T., Shi, W., Matsumoto, M., Minami, M., Kashiwamura, S., Nakanishi, K., Yoshida, N., Kishimoto, T. and Akira, S. (1996) Nature 380, 627-630.
26. Neubauer, H., Cumano, A., Muller, M., Wu, H., Huffstadt, U. and Pfeffer, K. (1998) Cell 93, 397-409.
27. Nosaka, T., van Deursen, J. M. A., Tripp, R. A., Thierfelder, W. E., Witthuhn, B. A., McMickle, A. P., Doherty, P. C., Grosveld, G. C. and Ihle, J. N. (1995) Science 270, 800-802.
28. Russell, S. M., Tayebi, N., Nakajima, H., Riedy, M. C., Roberts, J. L., Aman, M. J., Migone, T.-S., Noguchi, M., Markert, M. L., Buckley, R. H., O'Shea, J. J. and Leonard, W. J. (1995) Science 270, 797-800.
29. Hilton, D. J. (1999) Cell. Mol. Life Sci. 55, 1568-1577.
30. Klingmuller, U., Lorenz, U., Cantley, L. C., Neel, B. G. and Lodish, H. F. (1995) Cell 80, 729-738.
31. David, M., Chen, H. E., Goelz, S., Larner, A. C. and Neel, B. G. (1995) Mol. Cell. Biol. 15, 7050-7058.
32. David, M., Grimley, P. M., Finbloom, D. S. and Larner, A. C. (1993) Mol. Cell. Biol. 13, 7515-7521.
33. Chen, X. P., Losman, J. A. and Rothman, P. (2000) Immunity 13, 287-290.
34. Shuai, K. (2000) Oncogene 19, 2638-2644.
35. Ghosh, S., May, M. J. and Kopp, E. B. (1998) Annu. Rev. Immunol. 16, 225-260.
36. Pahl, H. L. (1999) Oncogene 18, 6853-6866.
37. Karin, M. (1999) J. Biol. Chem. 274, 27339-27342.
38. Zandi, E. and Karin, M. (1999) Mol. Cell. Biol. 19, 4547-4551.
39. Karin, M. (1999) Oncogene 18, 6867-6874.
40. May, M. J. and Ghosh, S. (1998) Immunol. Today 19, 80-88.
41. Mercurio, F. and Manning, A. M. (1999) Curr. Opin. Cell Biol. 11, 226-232.
42. Karin, M. and Delhase, M. (2000) Semin. Immunol. 12, 85-98.
43. Lee, F. S., Hagler, J., Chen, Z. J. and Maniatis, T. (1997) Cell 88, 213-222.
44. Israel, A. (2000) Trends Cell Biol. 10, 129-133.
45. Wallach, D., Varfolomeev, E. E., Malinin, N. L., Goltsev, Y. V., Kovalenko, A. V. and Boldin, M. P. (1999) Annu. Rev. Immunol. 17, 331-367.
46. Arch, R. H., Gedrich, R. W. and Thompson, C. B. (1998) Genes Dev. 12, 2821-2830.
47. Barkett, M. and Gilmore, T. D. (1999) Oncogene 18, 6910-6924.
48. Adachi, O., Kawai, T., Takeda, K., Matsumoto, M., Tsutsui, H., Sakagami, M., Nakanishi, K. and Akira, S. (1998) Immunity 9, 143-150.
49. Ninomiya-Tsuji, J., Kishimoto, K., Hiyama, A., Inoue, J., Cao, Z. and Matsumoto, K. (1999) Nature 398, 252-256.
ENZYMATIC APPROACHES TO GLYCOPROTEIN SYNTHESIS
Pamela Sears,* Thomas Tolbert and Chi-Huey Wong The Scripps Research Institute 10550 N. Torrey Pines Rd. La Jolla, CA 92037 In memory of Norton B. Gilula
INTRODUCTION

Saccharides are an important class of biomolecules, though their function was underestimated for many years. This was in part due to the subtlety of many of the biological effects, but also due to the lack of methods available for isolating and/or synthesizing specific saccharides or glycoconjugates. As part of glycoproteins, glycolipids and other conjugates, they play key roles in a variety of processes such as signaling and molecular and cellular targeting. Despite the important roles that glycoproteins play in many biological recognition events, the details of these events are generally not well understood. Research has been stymied by the lack of synthetic and analytical methods available. Recent advances in both classical and enzymatic synthesis are beginning to provide methods for the preparation of homogeneous glycoproteins which were previously inaccessible.

NATURAL GLYCOPROTEIN STRUCTURES

Proteins with both N-linked and O-linked glycans have been found in eukaryotes, eubacteria and the archaea. Although the number of glycoforms seen in nature is fairly overwhelming, they do fall into a few classes, which are categorized first by the residues to which the sugars are linked. The linkages observed in nature are shown in Figure 1.

Genetic Engineering, Volume 23, Edited by J. K. Setlow. Kluwer Academic / Plenum Publishers, 2001
The most common are the N-linked (to asparagine) and mucin-type O-linked (GalNAc linked to Ser/Thr), though the transient O-GlcNAc modification of cytoplasmic and nuclear proteins is becoming recognized as a common mode for regulation of their function, akin to phosphorylation (1). N-linked glycans in eukaryotes fall into three categories as shown in Figure 2. Apart from the chitobiose core, the high-mannose type saccharides contain almost entirely mannose. Complex-type saccharides have all but the innermost 3 mannose residues cleaved away, and have subsequently been re-extended, usually with LacNAc, and then capped with galactose, N-acetylgalactosamine, fucose and/or sialic acids. Hybrid-type chains have the characteristics of high-mannose glycans on the α1,6 mannose branch, but look like complex-type chains on the α1,3 mannose branch. O-linked glycans have more varied core structures. The mucin-type glycoproteins have GalNAc α-linked to serine or threonine, which can be extended into a variety of basic units, shown in Figure 3. In mammals, core structures of both N- and mucin-type O-linked glycans are most commonly extended via the addition of LacNAc or polyLacNAc chains which may be fucosylated at the GlcNAc residues. The non-reducing ends, especially of N-linked glycans, usually terminate in type 1
or type 2 saccharides that are capped with fucose, N-acetylgalactosamine, galactose, and sialic acids in any of a variety of linkages. There may be a variety of other modifications in addition: sulfation, methylation, phosphorylation, O-acetylation, and addition of GlcNAc-, mannose-, or GalNAc-1-phosphate.

BIOLOGICAL RELEVANCE OF GLYCOSYLATION

Glycosylation is an important modification at the molecular, cellular and organismal levels, though the effects can be subtle and hard to pinpoint. A well-researched review of the biological consequences of glycosylation was written by Varki (2). At the molecular level, there are examples where glycosylation affects nearly every property of a protein. There are a host of cases in which it has been demonstrated to increase the stability of proteins toward denaturation or protease degradation. Glycosylation can also alter the physical characteristics of the glycoprotein solution, as in the case of mucins, which are responsible for the sliminess of mucous secretions. The activity of proteins can be modulated by their glycosylation states. O-GlcNAcylation is a recently discovered transient modification of cytoplasmic and nuclear proteins that is thought to have a regulatory role (3). Recognition of glycoproteins is often based on the saccharides they carry. Many immunoglobulins recognize foreign saccharide epitopes. It is the recognition of the Galα1,3Gal epitope that is responsible, at least in part, for many cases of xenograft rejection, and N-glycans fucosylated at the core may contribute to bee sting anaphylaxis (4). Some proteins cannot fold properly when unglycosylated, frequently aggregating instead, so in these cases glycosylation may help both to improve solubility and to aid folding (possibly by nucleating "kinks" in the protein). Glycosylation also allows the binding of folding chaperones such as calnexin (5). There are many examples in which glycosylation affects protein targeting.
Proteins labeled with mannose-6-phosphate are shuttled to lysosomes, for example, and the presence of GalNAc-4-sulfate on some pituitary hormones is thought to play a part in their retrieval from the blood by hepatic endothelial cells (6).

APPROACHES TO GLYCOPROTEIN PREPARATION

A major stumbling block to the study of glycoproteins has been microheterogeneity: a single protein species can have literally hundreds of isoforms that differ only in saccharide composition. This occurs because the saccharide that a protein receives reflects the cumulative effort of many glycosidases and transferases, and the action of some of these will preclude the action of others. The glycan produced will be determined by many factors (7, 8) including the local protein structure around the glycosylation site, the relative amounts of glycoprocessing enzymes produced in the cell and their location, the rate of transit of the protein through the endoplasmic reticulum and Golgi apparatus, and so on. Many of these factors also vary with the cell line, so a glycoprotein produced in one cell line will have different glycosylation than the same protein produced in another cell line. A variety of techniques are becoming available for the preparation of homogeneous glycoproteins. For small glycopeptides with relatively simple glycans, chemical synthesis is a good option, and in fact many of the techniques that will be presented in this chapter involve the chemical synthesis of a short glycopeptide fragment that is then biochemically
coupled to a larger protein core. Synthetic techniques were initially problematic, since the protecting groups commonly used for standard solid-phase peptide synthesis required deprotection conditions under which the glycosides were unstable: strong acid, which promoted acetal cleavage and anomerization, and/or strong base, which caused β-elimination of O-glycans (9). Protecting group chemistry with milder conditions (such as hydrogenation, oxidation, or treatment with weak base) compatible with the glycosidic linkage has been worked out, however, and so good yields can be obtained for short peptides with small to moderate amounts of sugar (9, 10). As the peptide and sugar moieties become longer, of course, yields still drop off, and so for larger glycoproteins and peptides it becomes desirable to do much of the synthesis biologically. Fermentation, as mentioned above, necessarily produces a population of many different glycoforms of a given protein. This mixture, however, can be used as a starting point in a variety of schemes in which the glycoprotein is digested down to a simple, homogeneous core and then re-elaborated enzymatically (Figure 4).
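The scale of the glycoform mixture produced by fermentation can be put in rough numbers. The sketch below is a back-of-the-envelope count only: it assumes that each glycosylation site is occupied independently of the others, which the chapter does not claim, and the per-site repertoires in the usage line are hypothetical. The function name `count_glycoforms` is illustrative.

```python
from math import prod


def count_glycoforms(glycans_per_site, allow_unoccupied=False):
    """Count distinct glycoforms of one protein.

    glycans_per_site: number of different glycans that can occupy each site.
    allow_unoccupied: if True, each site may also carry no glycan at all.
    Assumes the sites are filled independently of one another.
    """
    if any(n < 1 for n in glycans_per_site):
        raise ValueError("each site needs at least one possible glycan")
    extra = 1 if allow_unoccupied else 0
    return prod(n + extra for n in glycans_per_site)


# Hypothetical protein with three sites carrying 5, 6 and 8 possible glycans.
print(count_glycoforms([5, 6, 8]))                         # 240
print(count_glycoforms([5, 6, 8], allow_unoccupied=True))  # 378
```

Even modest per-site variety multiplies into hundreds of species, which is why digestion to a single common core is such an attractive simplification.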
One option is to remove the glycosylated sections with proteases, and then reattach short, chemically-synthesized glycopeptides in their place. This ligation can be accomplished via "native peptide ligation" strategies, via the use of proteases, or with inteins (in which case the peptide segment to be replaced is substituted at the genetic level with the sequence encoding the intein). Alternatively, N-glycosylated proteins can have the glycans digested down to the innermost N-acetylglucosamine with endoglycosidases, thus converting a heterogeneous
population to a homogeneous, albeit simple, one. These simple glycoproteins can then be elaborated enzymatically to increase the size and complexity of the glycan with the use of glycosyltransferases or endoglycosidase-catalyzed transglycosylation.

Protein Hydrolysis and Glycopeptide Ligation with Proteases
One starting point for this method is to produce either a heterogeneous glycoprotein population from eukaryotic cell culture or a non-glycosylated protein from prokaryotic cell fermentation. Proteases may then be used in an aqueous medium to remove the (glyco)peptide to be substituted. A replacement glycopeptide is synthesized chemically and then ligated onto the protein fragment with proteases again, this time under conditions that favor peptide synthesis (e.g., addition of organic cosolvents). Clearly, the fewer segments one must reassemble, the more useful this approach will be. In order for this method to work, one must be able to find a set of conditions under which the peptides can be reassembled by the same enzymes that normally degrade them. Peptide synthesis with the use of proteases has been a topic of interest to many (11-13). There are two different approaches by which normally proteolytic enzymes can be coaxed into running the synthetic reaction, as illustrated in Figure 5.
In the thermodynamic method, the acid of the acyl donor and the amino group of the acyl acceptor are driven to form the amide, typically by removing one of the two products (peptide or water). In some cases, the product precipitates in water and so the reaction is pulled to the peptide product even in aqueous solution. This is true, for example, in the synthesis of cbz-L-Asp-L-Phe-O-methyl ester (a precursor of the artificial sweetener aspartame) with thermolysin (14), which can be run to greater than 90% yield in buffer. In most cases, however, the reaction is driven toward the peptide by conducting the reaction in nearly dry organic solvents. Unfortunately, under very low water conditions, many enzymes are minimally active. In particular, the solvents in which enzyme suspensions are most active and stable (hydrophobic solvents) are the same solvents in which the peptide substrates are minimally soluble. The more polar solvents such as DMF and DMSO, which are better able to dissolve the peptide substrates, are poor solvents with regard to enzymatic activity and stability (15). Also, the substrate specificity (16) and enantioselectivity (17) can be hampered by the addition of solvents. Finally, the activity and stability of the enzyme suspension are exquisitely sensitive to many variables (18), including the type, concentration, and pH of the buffer from which it was lyophilized (19), the amount of water in the solvent (20, 21) and particle shearing. An alternate approach, then, is to produce the peptides in a kinetically-controlled fashion, in which the acyl donor is activated as an ester; the reaction is then typically conducted in a water/cosolvent mixture and is a competition between hydrolysis and aminolysis of the ester. This approach suffers from problems of its own. The enzymes are often less stable in high concentrations of solvent than in water, and this instability is exacerbated at elevated temperature (22).
Also, the competing hydrolytic reaction must be suppressed as much as possible, which can be accomplished in a number of ways: the organic cosolvent concentration can be increased, the amount of acyl acceptor can be increased and/or the protease can be modified. Another approach is to freeze the reaction mixture. Frozen solutions have solutes concentrated in water "channels" between ice crystals, and this freeze concentration effect is at least in part responsible for the increased yield observed in frozen reactions (23, 24). The most common approach is to add a water-miscible organic cosolvent such as DMF, DMSO, or methanol. The addition of cosolvent enhances synthesis by removing much of the competing nucleophile (water) and suppressing the ionization of the amine, but appears to have additional effects on the active site of the serine protease subtilisin. X-ray crystallography and NMR studies have provided evidence for a flip in the orientation of the active-site histidine (Figure 6) (25). How this would contribute to an enhanced rate of aminolysis is not clear, although the flipped orientation is reminiscent of that of the active-site modified enzymes methylsubtilisin and methylchymotrypsin, in which the N-methylated active-site histidine is also thought to be flipped during catalysis (26, 27). These enzymes also display an improved aminolysis:hydrolysis ratio, which has been attributed to the displacement of the nucleophilic water molecule due to the new active-site geometry (27). An increase in the concentration of cosolvent can have negative effects on enzyme activity and stability, and for this reason many groups have been improving the stability of several key proteases toward both thermal and organic-cosolvent-mediated denaturation and evolving them for better activity under elevated concentrations of cosolvent. The subtilisins, a set of serine proteases from Bacillus species, have been the subject of intensive studies in this regard.
These enzymes have been favored because of their broad
substrate specificity (28), innate stability and ease of production. Directed evolution approaches have been used with subtilisin E, for example, to produce a variant (named subtilisin 13M) that is 471 times more active than the parent in 60% DMF (29) and another, subtilisin E-5H3, with a highly elevated thermostability (half-life at 83°C = 3.5 min) and optimum temperature (30). Likewise, subtilisin BPN' has been modified at 5 sites via site-directed mutagenesis to produce a variant, subtilisin 8397, that denatures at 73°C (whereas the wild type denatures at 59°C) (31).
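To give a feel for what a thermostability figure of this kind means at the bench, the sketch below reads the 3.5 min value quoted for subtilisin E-5H3 at 83°C as a half-life and assumes simple first-order thermal inactivation. Both points are assumptions made for illustration; real inactivation kinetics can be more complex. The function names are hypothetical.

```python
import math


def fraction_active(minutes, half_life_min):
    """Fraction of enzyme still active after `minutes`,
    assuming first-order (single-exponential) inactivation."""
    if half_life_min <= 0:
        raise ValueError("half_life_min must be positive")
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return 0.5 ** (minutes / half_life_min)


def inactivation_rate_constant(half_life_min):
    """First-order rate constant (per minute) for a given half-life."""
    if half_life_min <= 0:
        raise ValueError("half_life_min must be positive")
    return math.log(2) / half_life_min


# With a 3.5 min half-life, a quarter of the activity remains after 7 min.
print(fraction_active(7.0, 3.5))        # 0.25
print(inactivation_rate_constant(3.5))  # about 0.198 per minute
```

Under the same assumptions, a reaction of 30 min at that temperature would leave well under 1% of the starting activity, which is why coupling reactions are run far below the temperature at which the half-life was measured.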
Many of these variants have been shown to be useful in the synthesis of peptides under high concentrations of solvent, but significant hydrolysis is frequently observed, even when a high ratio of acyl acceptor:donor is used. In order to suppress the hydrolysis further, the enzyme active site can be modified. The replacement of the active site serine of serine proteases with a cysteine (32) or selenocysteine (33) can dramatically improve the aminolysis:hydrolysis ratio. This modification, which can be made chemically (Figure 7A) (34) or at the genetic level (35) produces a thiolenzyme that can have an improvement in the aminolytic:hydrolytic ratio of several orders of magnitude (Figure 7B).
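The aminolysis:hydrolysis ratio discussed above translates into product yield through a simple competition for the acyl-enzyme intermediate. The following sketch uses a minimal partition model in which the intermediate is captured either by the amine acceptor or by water, each with a second-order rate constant. This model is an assumption made for illustration: it ignores nucleophile binding, ionization of the amine and secondary hydrolysis of the product, and the selectivity values used are hypothetical.

```python
def peptide_fraction(selectivity, amine_conc_molar, water_conc_molar=55.5):
    """Fraction of acyl-enzyme intermediate that ends up as peptide.

    selectivity: k_aminolysis / k_hydrolysis, the ratio of the second-order
        rate constants for attack by amine and by water (dimensionless).
    amine_conc_molar: concentration of the acyl acceptor (M).
    water_conc_molar: concentration of water (M); 55.5 M for pure water,
        lower when organic cosolvent replaces part of the water.
    """
    if selectivity < 0 or amine_conc_molar < 0:
        raise ValueError("selectivity and amine concentration must be non-negative")
    if water_conc_molar <= 0:
        raise ValueError("water concentration must be positive")
    aminolysis = selectivity * amine_conc_molar
    return aminolysis / (aminolysis + water_conc_molar)


# Hypothetical comparison at 0.1 M acyl acceptor in pure water:
# an enzyme with no preference versus one favoring the amine 1000-fold.
print(peptide_fraction(1, 0.1))
print(peptide_fraction(1000, 0.1))

# Replacing 60% of the water with cosolvent (22.2 M water left) also helps.
print(peptide_fraction(1, 0.1, water_conc_molar=22.2))
```

In this picture, the three levers named in the text (more cosolvent, more acyl acceptor, a modified active site) correspond to lowering `water_conc_molar`, raising `amine_conc_molar` and raising `selectivity`, respectively.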
Reactions with the thiolenzyme can, in fact, be run in water without cosolvent at all to obtain good yields of the peptide product. The enzyme is somewhat crippled, however, and works best with activated ester substrates such as cyanomethyl esters. With the use of thermostable thiolsubtilisin variants, however, the reaction can be run at elevated temperature (50°C) to avoid the need for activated leaving groups. Simple methyl esters of glycopeptide substrates can be coupled at moderate yields, in aqueous solution, at 50°C (Figure 8) (36). Under these conditions, thiolsubtilisin BPN' would quickly deactivate, while the serine protease would primarily hydrolyze the substrate. Wells and coworkers took a different tactic to improve thiolsubtilisin activity. Rather than raising the reaction temperature, they discovered that an additional modification near the active site (Pro225Ala) would pull the active site thiol away from the histidine slightly, providing more room for the larger sulfur atom and restoring much of the activity of the enzyme toward ester hydrolysis at room temperature, though the enzyme still retains the highly diminished protease activity characteristic of thiolsubtilisin (37).
Many proteases are commercially available for use in the degradation and reassembly of glycoproteins. Those used for peptide ligation are typically serine or thiolproteases, for which a covalent acyl enzyme intermediate is formed that may then undergo aminolysis to form the amide product. Proteases range from the exquisitely specific (and thus of limited utility for this purpose), such as enterokinase, to the very nonspecific. An exhaustive discussion of the substrate specificities of these enzymes is not feasible here, but an excellent handbook was recently published that tabulates substrate preferences and other information on the known proteases (38). Notably, subtilisin and papain are frequently used because of their ready availability, good stability and broad specificity. Subtilisin, for example, prefers hydrophobic sequences, but will accept a very wide range of amino acid residues around the scissile bond (Figure 9).
[For table of subsite preferences of subtilisin, see Moree et al. (28).] Subtilisin, being a microbial protease, is very easy to modify genetically, and many subtilisin mutants have also been created with altered or broadened specificity (22, 39). Other proteases that have been used in synthesis include trypsin, chymotrypsin, elastase and thermolysin. An early synthesis of the octapeptide dynorphin was accomplished with several proteases in succession (11), while human insulin is prepared from the porcine hormone by coupling the desoctapeptide (B23-30)-insulin with a synthetic peptide corresponding to positions 23-30 of the human hormone (40). The artificial sweetener aspartame is made by coupling cbz-L-aspartic acid with L-phenylalanine methyl ester, catalyzed by thermolysin in aqueous solution (14). Most proteases have preferred cleavage sequences, but it is worth noting that the tertiary structure of the protein substrate has so much influence on the ability of the protease to cleave it that even the broadly-specific enzymes may only cleave the substrate at one or two sites. Subtilisin, despite its broad substrate acceptance, only cleaves a single site of ribonuclease after four hours at room temperature, and that site is not flanked by the "preferred" residues for subtilisin cleavage. This site is, however, on the surface of the protein, and thus highly accessible to proteases (Figure 10).
This is a fairly typical observation. For this reason, the choice of a protease for cleavage and religation will require a certain amount of trial and error. When sugars are added to the equation, matters change even further, if the sugars are near the scissile bond. Glycosylation has been recognized for many years to promote resistance to proteolysis. The ability of subtilisin to accept glycosylated peptide substrates has been investigated in this laboratory via the systematic placement of single sugars at the P and P' positions (nomenclature according to Schechter and Berger (41), where Pn and Sn refer to the peptide residues and enzyme subsites, respectively, to the N-terminal side of the scissile bond and Pn' and Sn' refer to those on the C-terminal end). Subtilisin would not accept sugars immediately adjacent to the scissile bond, and accepted the sugars better as they were moved further away (Figure 11) (42).
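Because the subsite nomenclature recurs throughout the rest of this section, a small sketch may help fix the convention. In the Schechter and Berger scheme, residues on the N-terminal side of the scissile bond are numbered P1, P2, P3, ... moving away from the bond, and those on the C-terminal side P1', P2', P3', ... The helper below labels a peptide accordingly; the function name and the example sequence are hypothetical and chosen only for illustration.

```python
def label_subsites(peptide, scissile_index):
    """Label each residue of `peptide` relative to a scissile bond.

    scissile_index is the number of residues on the N-terminal side of the
    bond, so the bond lies between peptide[scissile_index - 1] (P1) and
    peptide[scissile_index] (P1').
    Returns a dict mapping labels such as "P2" or "P1'" to residues.
    """
    if not 0 < scissile_index < len(peptide):
        raise ValueError("the scissile bond must lie inside the peptide")
    labels = {}
    for i, residue in enumerate(peptide):
        if i < scissile_index:
            # N-terminal side: numbering increases away from the bond.
            labels[f"P{scissile_index - i}"] = residue
        else:
            # C-terminal side: primed positions.
            labels[f"P{i - scissile_index + 1}'"] = residue
    return labels


# Hypothetical hexapeptide (one-letter codes) cleaved between F and G.
labels = label_subsites("AAPFGL", 4)
print(labels)
```

For the example, phenylalanine occupies P1 and glycine P1'; a sugar placed on either of these residues would sit immediately adjacent to the scissile bond, the arrangement that subtilisin was found not to accept.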
Although cleavage and religation reactions have been grouped together here, it is not necessary that the same enzyme be used to complete both reactions. They frequently are, due to enzyme specificity issues, but an interesting strategy for ligating peptides in which the resulting amide is not a substrate for hydrolysis has been presented, and is called the "inverse substrate" (43) or "substrate mimetic" (44) principle. With this strategy, the acyl donor contains a small P1 residue and a leaving group that satisfies the substrate requirements of the S1 subsite. The leaving group wraps around into the S1 site, eliminating the need for the P1 residue to fill that site. This has been illustrated in Figure 12 for trypsin, which prefers basic residues at the P1 position. With the use of a p-guanidinophenyl ester, the necessity of having a basic residue can be avoided. This substrate can serve as an acyl donor, but once the leaving group is gone, the resulting amide product is no longer a substrate for the enzyme, avoiding the problem of further hydrolysis of the peptide product.
Intein-Mediated Glycopeptide Ligation

A recent alternative to protease-catalyzed fragment condensation has been made available since the discovery of self-splicing inteins, segments of proteins that excise themselves post-translationally. The general mechanism of a self-splicing intein is shown in Figure 13 (45). The intein is flanked by the two exteins to be ligated. In the canonical mechanism, the N-terminal residues of both the intein and the C-terminal extein are cysteine (or sometimes a hydroxylated amino acid). Acyl migration of the C-terminus of the N-terminal extein to the side-chain thiol at the N-terminus of the intein forms the thioester. Transesterification produces another thioester, this time with the side-chain thiol of the C-terminal extein. The asparagine at the C-terminus of the intein cyclizes to the aspartimide,
releasing the N-terminus of the C-terminal extein, which attacks the thioester to form the peptide product. It is possible to intervene in the splicing reaction by eliminating the C-terminal extein, then allowing the reaction to be completed with an exogenously-added peptide. This tactic has been used, for example, in the production of tyrosine-phosphorylated Src kinase (46), and in the semisyntheses of cytotoxic proteins such as RNase A by biological production of an inactive "proenzyme" that has a segment replaced with an intein, then in vitro replacement of the intein with a synthetically-produced peptide to generate the native protein (47). The introduction of glycosylated peptides to the C-terminus of a biologically-produced protein has also been accomplished by this mechanism (48), as shown in Figure 14. The necessity of having a thiol at the amino terminus of the incoming peptide, which can sometimes be overcome by using very high concentrations of nucleophile, may not always be desirable. The intein can also be used to simply generate a thioester, which then acts as a substrate for protease-catalyzed peptide synthesis as shown by Welker and Scheraga (Figure 15) (49).
Ligation Strategies (i.e., cleavage/religation and N-terminus versus C-terminus)
The strategies presented above for protein ligation each have areas of application for which they are best suited. Kinetically controlled ligation with proteases requires activation of the acyl donor as an ester. As a result, protease-catalyzed ligation is better suited to attaching a small synthetic peptide to the N-terminus of an expressed protein. Intein-mediated ligation, on the other hand, is useful for attaching a peptide to the C-terminus of an expressed protein. One can therefore envision using the two techniques in combination to produce a relatively complex glycoprotein with several glycosylation sites (Figure 16).
Glycosidase Digestion to a Common Core
Another method for converting heterogeneous glycoproteins produced by fermentation into homogeneous ones is to remove much of the glycan with an endoglycosidase. A number of microbial endoglycosidases cleave N-linked glycans between the two N-acetylglucosamine residues of the reducing-end chitobiose unit. These differ in the classes of N-glycan that they can accept (50). The first discovered, Endo-D from Streptococcus (Diplococcus) pneumoniae (51, 52), accepts high-mannose-type and some complex-type glycans. Endo-H, the second such enzyme found, accepts only high-mannose-type N-glycans. Endo-C1 and Endo-C2 are similar to Endo-D and Endo-H, respectively (53, 54). Others have been identified as well. Three were isolated from Chryseobacterium (Flavobacterium) meningosepticum and are called Endo-F1, F2 and F3 (55). Endo-F1 accepts high-mannose and hybrid, but not complex, types; Endo-F2 accepts high-mannose and biantennary-complex types; and Endo-F3 cleaves biantennary- and triantennary-complex types. Another enzyme that shows great promise, Endo-M, has been isolated from Mucor hiemalis. It has perhaps the broadest specificity, and accepts a wide range of high-mannose, hybrid and complex-type glycans (56). This tactic has been used, for example, in the synthesis of a molecule of ribonuclease that has a pendant sialyl Lewis X group at the single glycosylation site (57). Ribonuclease B (the glycosylated form) from bovine pancreas was digested to completion with Endo-H, and then re-elaborated with galactosyltransferase, sialyltransferase and fucosyltransferase to produce homogeneous sLex-RNase (see next section). Exoglycosidases may also be of use in this strategy. In some cases, the saccharides may be refractory to cleavage by the endoglycosidases, and will require partial exoglycosidase digestion of the glycan to make inner glycosidic linkages accessible to endoglycosidases.
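The endoglycosidase specificities listed above lend themselves to a simple lookup table. The following Python sketch is only an illustrative summary of the statements in the text; the class labels (for example "complex (some)") and the function name are assumptions introduced here, not established nomenclature, and real substrate preferences are more nuanced than a set of labels can capture.

```python
# Glycan classes accepted by each endoglycosidase, transcribed from the
# paragraph above. The label strings are simplifications chosen for this
# sketch (hypothetical names, not a standard vocabulary).
ENDOGLYCOSIDASE_SPECIFICITY = {
    "Endo-D": {"high-mannose", "complex (some)"},
    "Endo-H": {"high-mannose"},
    "Endo-F1": {"high-mannose", "hybrid"},
    "Endo-F2": {"high-mannose", "biantennary-complex"},
    "Endo-F3": {"biantennary-complex", "triantennary-complex"},
    "Endo-M": {"high-mannose", "hybrid", "complex"},
}


def enzymes_accepting(glycan_class):
    """Return the enzymes whose listed classes include glycan_class, sorted by name."""
    return sorted(
        name
        for name, classes in ENDOGLYCOSIDASE_SPECIFICITY.items()
        if glycan_class in classes
    )


if __name__ == "__main__":
    for glycan_class in ("high-mannose", "hybrid", "triantennary-complex"):
        print(glycan_class, "->", enzymes_accepting(glycan_class))
```

A table of this kind is useful mainly as a first filter when choosing an enzyme for a given glycoform mixture; the cited primary literature remains the authority on what each enzyme actually accepts.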
Alternatively, digestion back to the innermost sugar may not even be necessary; a battery of exoglycosidases may be used to trim the glycans back to a larger common core instead. A host of exoglycosidases are commercially available to do this, and their reactivities are discussed in several reviews (58, 59). Although many of the glycosidases show strict specificities and thus are of limited utility when attempts are made to digest the glycan back to a homogeneous core, many have broad specificity. Mannosidase II from jack bean, for example, is a highly nonspecific enzyme that can hydrolyze mannose in several linkages, including 6-linked mannose. N-acetylneuraminic acid hydrolase III (NANAase III) can cleave sialic acid in any of the natural linkages. Hexosaminidase II can cleave both GlcNAc and GalNAc.
O-glycan cores vary more than N-glycan cores, and so there is no enzyme analogous to Endo-H that can cleave off all but the reducing-end sugar. Instead, a combination of enzymes is required to digest the glycan down to the innermost N-acetylgalactosamine. Again, enzymes with broad specificity such as hexosaminidase II and NANAase III will be of great help in these cases, although there may be some glycans that are still refractory to digestion by the available enzymes. One should bear in mind, then, that while glycoproteins produced in some cell lines will have glycans that are refractory to hydrolysis by the available glycosidases, production of the same protein in another cell line can give more readily hydrolyzed glycans. For example, yeasts are well known to use almost exclusively high-mannose-type oligosaccharides, so mannosidases can be used to trim them partially, or endoglycosidase H can cleave the entire glycan except the innermost GlcNAc. There are also many mammalian cell lines (see Cell Line Engineering, below) that are
deficient in glycoprocessing enzymes and produce highly truncated glycans, which will be more accessible to the available glycosidases (60). Growth in the presence of glycoprocessing inhibitors, such as the glucosidase inhibitors castanospermine and deoxynojirimycin, can also be used to produce glycans which are more accessible to endoglycosidases.
Re-elaboration of the Glycan
When a homogeneous glycoprotein with a simple glycosylation site has been synthesized, the glycan can be re-elaborated with any one of a variety of enzymatic techniques. Glycosyltransferases can be used in series to build the glycan back up from the core. A great number of glycosyltransferases have been cloned (61), and many of these are commercially available. Still more have been isolated and characterized. Although the substrates for these enzymes are typically expensive, regeneration schemes that restore the glycosyl donor from an added energy source (such as phosphoenolpyruvate), an inexpensive activated sugar (such as a sugar phosphate), and the released cofactor have been developed for most of them (62). These regeneration methods have the added benefit of removing the released cofactors, which are in general inhibitory to the glycosyltransferases. A thorough review of the literature regarding the use of glycosyltransferases in synthesis was recently published (58), but a few examples from this laboratory will be highlighted here (Figure 17). A glycotripeptide, Boc (GlcNAc)-Asn-Gly-Tyr, was prepared by solid-phase peptide synthesis. This was elongated while still bound to the resin with galactosyltransferase, sialyltransferase and then fucosyltransferase to give the tripeptide with sialyl Lewis X attached (63). Yields for two of the glycosyltransferase reactions in this example were not exceptional (55 and 65% for the GalT and SialT reactions, respectively), but could probably be improved with the use of cofactor recycling schemes, since the cofactor released in glycosyltransferase reactions is a strong competitive inhibitor of glycosyltransferases (61). A few years later, the sialyl Lewis X derivative of the MAdCAM-1 mucin repeat was prepared along similar lines (64).
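To make the cost of modest step yields concrete, the short Python sketch below multiplies the two yields quoted above for the glycotripeptide example (55% for GalT, 65% for SialT). The function name is hypothetical, and the calculation assumes that the steps are strictly sequential and that the quoted figures are isolated fractional yields; the fucosyltransferase yield is not given in the text and is therefore left out.

```python
from math import prod


def overall_yield(step_yields):
    """Return the cumulative fractional yield of a series of sequential steps."""
    for y in step_yields:
        if not 0.0 <= y <= 1.0:
            raise ValueError(f"yield must be a fraction between 0 and 1, got {y}")
    return prod(step_yields)


if __name__ == "__main__":
    # GalT and SialT step yields quoted in the text; FucT yield is not reported.
    two_step = overall_yield([0.55, 0.65])
    print(f"Cumulative yield after GalT and SialT: {two_step:.4f}")
```

Under these assumptions only a little over a third of the starting glycopeptide survives the first two elongation steps, which illustrates why cofactor recycling to raise individual step yields matters in multi-enzyme sequences.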
The singly glycosylated heptapeptide was produced by solid-phase peptide synthesis and then elaborated enzymatically. The elaboration could be performed both on the resin-bound peptide and in solution, though the reaction in solution gave a better yield (Figure 17A). Ribonuclease was prepared with both sialyl Lewis X and a mercuric derivative of that tetrasaccharide by the sequential addition of sugars to GlcNAc-RNase (Figure 17B), which itself was prepared via the Endo-H digestion of RNase B (a mixture of glycoforms) (57). More recently, intein-mediated ligation was used to attach an N-acetylglucosaminyl dipeptide (Cys(GlcNAc)Asn) to the C-terminus of maltose binding protein, and galactosyltransferase was able to modify this glycoprotein further to produce the LacNAc derivative (Figure 14) (48). In some cases, there will be no glycosyltransferase available for the desired transformation, but a glycosidase that hydrolyzes the desired linkage may be known. By analogy with protease-catalyzed synthesis, some groups have added organic cosolvent to glycosidase reactions in hopes of driving them in reverse [see, for example, Finch and Yoon (65) and references therein]. A retaining glycosidase is mixed with an activated substrate
such as a p-nitrophenyl glycoside; a covalent intermediate is formed, which can then be attacked either by water (to give the hydrolysis product) or by the hydroxyl of an acceptor (to give the desired transglycosylation product). While this works to some extent, the yields are usually very poor, even with a large excess of glycosyl donor. A somewhat different approach has been investigated by Withers and coworkers (67, 68). Hehre and coworkers made the interesting discovery in 1979 that a glycosidase could catalyze the hydrolysis of a maltosyl fluoride, and further, that the enzyme seemed to do this by using two molecules of the substrate (66). It was postulated that the reaction proceeded through the formation of a glycosidic linkage between two substrate molecules via direct displacement of the fluoride. This glycoside, being inverted from the original, is now a substrate for the enzyme, and is rapidly hydrolyzed. Many glycosidases have since been shown to accept fluorides of an anomer opposite to that preferred (67). These and similar studies prompted the removal of the nucleophilic carboxylate of retaining glycosidases to create an enzyme that could catalyze the addition of glycosyl fluorides of the opposite anomer to a glycosyl acceptor, but which would then be incapable of hydrolyzing the glycosidic linkage formed (Figure 18). These enzymes have been shown to catalyze transglycosylation reactions in good yields (typically 65-86% of the acceptor is modified, with 2-3 equivalents of glycosyl donor) (68). A recent approach that shows some promise is the transfer, en masse, of an entire saccharide to the simple glycoprotein via endoglycosidase-catalyzed transglycosylation. Several groups have used endoglycosidases to transfer saccharides from short oligosaccharylpeptides to proteins with a GlcNAc attached to asparagine. In general, an excess of either the saccharide donor or acceptor must be used, and yields are generally not above 30%, even in the presence of cosolvent.
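Because transglycosylation yields are quoted relative to whichever partner is limiting, it can help to put them on a common basis. The Python sketch below converts an acceptor-based yield into a donor-based one by using the figures quoted above for the mutant glycosidases (65-86% of acceptor modified with 2-3 equivalents of donor). The function name is an assumption for illustration, and the conversion presumes that one donor molecule is consumed per product molecule.

```python
def yield_on_donor(yield_on_acceptor, donor_equivalents):
    """Convert a fractional yield based on acceptor into one based on donor.

    donor_equivalents is the moles of donor supplied per mole of acceptor.
    """
    if donor_equivalents <= 0:
        raise ValueError("donor_equivalents must be positive")
    if not 0.0 <= yield_on_acceptor <= 1.0:
        raise ValueError("yield_on_acceptor must be a fraction between 0 and 1")
    return yield_on_acceptor / donor_equivalents


if __name__ == "__main__":
    # Extremes of the ranges quoted in the text for the mutant glycosidases.
    low = yield_on_donor(0.65, 3)   # lowest yield, most donor
    high = yield_on_donor(0.86, 2)  # highest yield, least donor
    print(f"Donor-based yield range: {low:.3f} to {high:.3f}")
```

Expressing yields this way makes it easier to compare methods when the donor, rather than the acceptor, is the precious component.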
The saccharide must be bound to asparagine, as the endoglycosidases will not accept saccharides that are not N-linked to that amino acid. Inazu and coworkers synthesized eel calcitonin analogs by using Endo-M to transfer saccharides from human transferrin to a synthetically produced GlcNAcylated peptide, as shown in Figure 19A (8.5% yield based on acceptor; 2.5 equivalents of donor) (69). Y.C. Lee and coworkers constructed both a natural and a C-glycopeptide using the transglycosylation route (70). A glycopeptide with either a C- or N-linked GlcNAc was prepared synthetically, and Endo-A (Arthrobacter endoglycosidase, which has similar specificity to Endo-H) was used to transfer the oligosaccharide from a donor prepared from protease-digested soybean agglutinin. The reaction was carried out in 35% acetone, and the yield was 25% based on donor, with a 4-fold excess of acceptor (70) (Figure 19B).
Cell-line Engineering
Cell-line engineering is an alternative to the chemoenzymatic techniques presented here that has only recently begun to be investigated (60). The heterogeneity of glycoproteins produced in biological systems can be greatly diminished by knocking out key glycoprocessing enzymes and/or overexpressing others. Many cell lines are naturally deficient in particular glycoprocessing enzymes. Chinese hamster ovary (CHO) cells, for example, are naturally deficient in a number of glycosyltransferases such as C2GnT (Core 2 GlcNAc transferase; see Figure 3). Various groups have altered CHO and other cell lines such
as baby hamster kidney (BHK) cells to modify their glycosylation, usually by overexpressing a new glycosyltransferase, and in some cases by knocking others out. The resulting glycans are typically more homogeneous, since the glycosyltransferases introduced are in general overexpressed and outcompete many of the endogenous enzymes. As an example, Berger and coworkers transfected CHO cells with a construct containing C2GnT and a Neo resistance marker. The resulting cell lines were then transfected with the gene for the P-selectin glycoprotein ligand (PSGL-1). This ligand binds to E- and P-selectins, but only if it is glycosylated with certain types of glycans such as sialyl Lewis X. Unmodified CHO cells transfected with PSGL-1 were incapable of binding to P-selectin, but those with the added glycosyltransferases gained PSGL-1 binding (71).
It is possible to make very radical changes in glycosylation by knocking out key enzymes involved in the construction of the N-glycan core. Several CHO cell lines have been constructed which produce highly truncated N-glycans. Some of these, for example, produce glycoproteins containing only the innermost cores (60). If further elaboration is desired, it could be accomplished in vitro with glycosyltransferases.
FUTURE PROSPECTS
New strategies in combined chemical and enzymatic synthesis are finally making homogeneous glycoproteins accessible. Hundreds of proteases, glycosidases and glycosyltransferases have become widely available, and the specificities of many of these enzymes are well documented. Recycling systems have been developed to minimize the cost of reactants. The biggest stumbling block to these combined approaches is simply the perception that enzymes are difficult to work with, that many of them are hard to acquire, highly unstable, and have poorly-characterized specificities. In some cases this is still true, but it has become less so in the past decade. A number of companies sell proteases,
glycosidases and even glycosyltransferases, which were at one time the most difficult enzymes to acquire. A great deal of work has been conducted regarding the preferences of the enzymes, too, granting us more predictive power on the results of reactions. Stability issues can sometimes be a problem and can increase the cost of the synthesis since more catalyst is required to account for enzyme deactivation, but frequently one can find conditions that stabilize the enzyme (e.g., enzyme immobilization or crosslinking; inclusion of certain additives such as glycerol to stabilize the enzyme). Enzyme engineering to improve stability or cloning of homologous enzymes from alternate organisms is another alternative. A combined chemical and enzymatic approach is still an excellent, and in some cases, the only available option for the synthesis of large, complex glycoproteins.
REFERENCES
1 Hart, G. W., Haltiwanger, R. S., Holt, G. D. and Kelly, W. G. (1989) Annu. Rev. Biochem. 58, 841-874.
2 Varki, A. (1993) Glycobiology 3, 97-130.
3 Hart, G. W., Kreppel, L. K., Comer, F. I., Arnold, C. S., Snow, D. M., Ye, Z., Cheng, X., Dellamanna, D., Caine, D. S., Earles, B. J., Akimoto, Y., Cole, R. N. and Hayes, B. K. (1996) Glycobiology 6, 711-716.
4 Tretter, V., Altmann, F., Kubelka, V., Marz, L. and Becker, W. M. (1993) Internat. Arch. Allergy Immunol. 102, 259-266.
5 Helenius, A., Trombetta, E. S., Hebert, D. N. and Simons, J. F. (1997) Trends Cell. Biol. 7, 193-200.
6 Baenziger, J. U. (1996) Endocrinology 137, 1520-1522.
7 Lis, H. and Sharon, N. (1993) Eur. J. Biochem. 218, 1-27.
8 Kornfeld, R. and Kornfeld, S. (1985) Annu. Rev. Biochem. 54, 631-664.
9 Kunz, H. and Schultz, M. (1997) in Glycopeptides and Related Compounds (Large, D. C. and Warren, C. D., eds.), pp. 23-78. Marcel Dekker, Inc, New York, NY.
10 Danishefsky, S. and Roberge, J. Y. (1997) in Glycopeptides and Related Compounds (Large, D. C. and Warren, C. D., eds.), pp. 245-296. Marcel Dekker, Inc, New York, NY.
11 Kullman, W. (1982) J. Org. Chem. 47, 5300-5303.
12 Inoue, K., Watanabe, K., Tochino, Y., Kobayashi, M. and Shigeta, Y. (1981) Biopolymers 20, 1845-1858.
13 Sears, P. and Wong, C.-H. (1996) Biotechnol. Prog. 12, 423-433.
14 Isowa, Y., Ohmori, M., Ichikawa, T., Mori, K., Nonaka, Y., Kihara, K.-I., Oyama, K., Satoh, H. and Nishimura, S. (1979) Tetrahedron Lett. 28, 2611-2612.
15 Dordick, J. S. (1989) Enzym. Microb. Technol. 11, 194-211.
16 Wescott, C. R. and Klibanov, A. M. (1993) J. Amer. Chem. Soc. 115, 1629-1631.
17 Tawaki, S. and Klibanov, A. M. (1992) J. Amer. Chem. Soc. 114, 1882-1884.
18 Sears, P., Witte, K. and Wong, C.-H. (1999) J. Mol. Catal. B 6, 297-304.
19 Khmelnitsky, Y. L., Welch, S. H., Clark, D. S. and Dordick, J. S. (1994) J. Amer. Chem. Soc. 116, 2647-2648.
20 Xu, Z.-F., Focht, K. and Dordick, J. S. (1992) Ann. N.Y. Acad. Sci. 672, 94-99.
21 Yang, Z., Zacherl, D. and Russell, A. J. (1993) J. Amer. Chem. Soc. 115, 12251-12257.
22 Sears, P., Schuster, M., Wang, P., Witte, K. and Wong, C.-H. (1994) J. Amer. Chem. Soc. 116, 6521-6530.
23 Schuster, M., Aaviksaar, A., Haga, M., Ullmann, U. and Jakubke, H.-D. (1991) Biomed. Biochim. Acta 50, 84-89.
24 Ullmann, G., Haensler, M., Gruender, W., Wagner, M., Hofmann, H.-J. and Jakubke, H.-D. (1997) Biochim. Biophys. Acta 1338, 253-258.
25 Kidd, R. D., Sears, P., Huang, D.-H., Witte, K., Wong, C.-H. and Farber, G. K. (1999) Protein Sci. 8, 410-417.
26 Henderson, R. (1971) Biochem. J. 124, 13.
27 Zhong, Z., Bibbs, J. A., Yuan, W. and Wong, C.-H. (1991) J. Amer. Chem. Soc. 113, 2259-2263.
28 Moree, W. J., Sears, P., Kawashiro, K., Witte, K. and Wong, C.-H. (1997) J. Amer. Chem. Soc. 119, 3942-3947.
29 Zhao, H., Li, Y. and Arnold, F. A. (1996) Ann. N.Y. Acad. Sci., 1-5.
30 Zhao, H. and Arnold, F. H. (1999) Protein Eng. 12, 47-53.
31 Pantoliano, M. W., Whitlow, M., Wood, J. F., Dodd, S. W., Hardman, K. D., Rollence, M. L. and Bryan, P. N. (1989) Biochemistry 28, 7205.
32 Nakatsuka, T., Sasaki, T. and Kaiser, E. T. (1987) J. Amer. Chem. Soc. 109, 3808.
33 Wu, Z.-P. and Hilvert, D. (1989) J. Amer. Chem. Soc. 111, 4513.
34 Polgar, L. and Bender, M. (1967) Biochemistry 6, 610.
35 Jackson, D. Y., Burnier, J., Quan, C., Stanley, M., Tom, J. and Wells, J. A. (1994) Science 266, 243-247.
36 Wong, C.-H., Schuster, M., Wang, P. and Sears, P. (1993) J. Amer. Chem. Soc. 115, 5893-5901.
37 Abrahmsen, L., Tom, J., Burnier, J., Butcher, K. A., Kossiakoff, A. and Wells, J. A. (1994) Biochemistry 30, 4151-4159.
38 Barrett, A. J., Rawlings, N. D., and Woessner Jr., F. F., (1998) Handbook of Proteolytic Enzymes, Academic Press, San Diego, CA.
39 Bonneau, P. R., Graycar, T. P., Estell, D. A. and Jones, J. B. (1991) J. Amer. Chem. Soc. 113, 1026-1030.
40 Inouye, K., Watanabe, K., Morihara, K., Tochino, Y., Kanaya, T., Emura, J. and Sakakibara, S. (1979) J. Amer. Chem. Soc. 101, 751-752.
41 Schecter, I. and Berger, A. (1967) Biochem. Biophys. Res. Commun. 27, 157.
42 Witte, K., Seitz, O. and Wong, C.-H. (1998) J. Amer. Chem. Soc. 120, 1979-1989.
43 Tanizawa, K., Kasaba, Y. and Kanaoka, Y. (1977) J. Amer. Chem. Soc. 99, 4485-4488.
44 Bordusa, F., Ullmann, D., Elsner, C. and Jakubke, H.-D. (1997) Angew. Chem. Internat. Ed. Engl. 36, 2473-2475.
45 Noren, C. J., Wang, J. and Perler, F. B. (2000) Angew. Chem. Internat. Ed. Engl. 39, 450-466.
46 Muir, T. W., Sondhi, D. and Cole, P. A. (1998) Proc. Nat. Acad. Sci. U.S.A. 95, 6705-6710.
47 Evans, T. C. J., Benner, J. and Xu, M.-Q. (1998) Protein Sci. 7, 2256-2264.
48 Tolbert, T. J. and Wong, C.-H. (2000) J. Amer. Chem. Soc. 122, 5421-5428.
49 Welker, E. and Scheraga, H. A. (1999) Biochem. Biophys. Res. Commun. 254, 147-151.
50 Yamamoto, K. (1994) J. Biochem. 116, 229-235.
51 Koide, N. and Muramatsu, T. (1974) J. Biol. Chem. 249, 4897-4904.
52 Tai, T., Yamashita, K., Ogata-Arakawa, M., Koide, N., Muramatsu, T., Iwashita, S., Inoue, Y. and Kobata, A. (1975) J. Biol. Chem. 250, 8569-8575.
53 Tai, T., Yamashita, K. and Kobata, A. (1977) Biochem. Biophys. Res. Commun. 78, 434-441.
54 Ito, S., Muramatsu, T. and Kobata, A. (1975) Arch. Biochem. Biophys. 171, 78-86.
55 Trimble, R. B. and Tarentino, A. L. (1991) J. Biol. Chem. 266, 1646-1651.
56 Yamamoto, K., Kadowaki, S., Fujisaki, M., Kumagai, H. and Tochikura, T. (1994) Biosci. Biotech. Biochem. 58, 72-77.
57 Witte, K., Sears, P., Martin, R. and Wong, C.-H. (1997) J. Amer. Chem. Soc. 119, 2114-2118.
58 Ichikawa, Y. (1997) in Glycopeptides and Related Compounds (Large, D. C. and Warren, C. D., eds.), pp. 79-206. Marcel Dekker, Inc, New York, NY.
59 Dwek, R. A., Edge, C. J., Harvey, D. J. and Wormald, M. R. (1993) Annu. Rev. Biochem. 62, 65-100.
60 Stanley, P. (1992) Glycobiology 2, 99-107.
61 Sears, P. and Wong, C.-H. (1998) Cell. Mol. Life Sci. 54, 223-252.
62 Wong, C.-H., Halcomb, R. L., Ichikawa, Y. and Kajimoto, T. (1995) Angew. Chem. Internat. Ed. Engl. 34, 412-432, 521-546.
63 Schuster, M., Wang, P., Paulson, J. C. and Wong, C.-H. (1994) J. Amer. Chem. Soc. 116, 1135-1136.
64 Seitz, O. and Wong, C.-H. (1997) J. Amer. Chem. Soc. 119, 8766-8776.
65 Finch, P. and Yoon, J. H. (1997) Carbohydr. Res. 303, 339-345.
66 Hehre, E. J., Brewer, C. F. and Genghof, D. S. (1979) J. Biol. Chem. 254, 5942-4940.
67 Williams, S. J. and Withers, S. G. (2000) Carbohydr. Res. 327, 27-46.
68 Mackenzie, L. F., Wang, Q., Warren, R. A. J. and Withers, S. G. (1998) J. Amer. Chem. Soc. 120, 5583-5584.
69 Mizuno, M., Haneda, K., Iguchi, R., Muramoto, I., Kawakami, T., Aimoto, S., Yamamoto, K. and Inazu, T. (1998) J. Amer. Chem. Soc.
70 Wang, L.-X., Tang, M., Suzuki, T., Kitajima, K., Inoue, Y., Inoue, S., Fan, J.-Q. and Lee, Y. C. (1997) J. Amer. Chem. Soc. 119, 11137-11146.
71 Dinter, A., Zeng, S., Berger, B. and Berger, E. G. (2000) Biotechnol. Lett. 22, 25-30.
VECTOR DESIGN AND DEVELOPMENT OF HOST SYSTEMS FOR PSEUDOMONAS
Herbert P. Schweizer (1), Tung T. Hoang (2), Katie L. Propst (1), Henry R. Ornelas (1) and RoxAnn R. Karkhoff-Schweizer (1)
(1) Department of Microbiology, College of Veterinary Medicine and Biomedical Sciences, Colorado State University, Fort Collins, CO 80523
(2) Department of Biological Sciences, University of Calgary, Calgary, Alberta T2N 1N4, Canada
Genetic Engineering, Volume 23, Edited by J. K. Setlow, Kluwer Academic / Plenum Publishers, 2001
INTRODUCTION
Many members of the genus Pseudomonas and closely related bacteria are agriculturally, industrially and medically important and therefore receive considerable attention from researchers in agricultural, industrial and biomedical settings. Since commercially available cloning and expression systems exist only for Escherichia coli, many laboratories around the world have designed broad-host-range cloning and expression vectors suitable for analysis of bacteria other than E. coli. The majority of these broad-host-range vectors are based on the RSF1010, RK2 or pRO1600 replicons (see Table 1 for representative examples). Many of the vectors contain regulatable promoters that allow regulated transcription of cloned genes, such as the T7 promoter (from the T7 bacteriophage gene 10), the leftward and rightward promoters from E. coli phage λ, a promoter from the Pseudomonas putida TOL plasmid pWWO, and the E. coli lac operon-based promoters. Of these, the T7 and lac operon-based promoters have traditionally found the most widespread use, because the biology of these two systems is more familiar and the required materials are readily available in most laboratories. With the rapid emergence of entire genome sequences for various bacteria, including P. aeruginosa, P. putida and presumably other pseudomonads in the future, widely applicable, user-friendly vectors and host systems will be required
for elucidation of the functions of the many unknown proteins encoded by these genomes, since many of these may be toxic when expressed in heterologous hosts. In this chapter, we summarize some of our past and recent efforts aimed at development of improved host and vector systems for Pseudomonas. Although we developed these systems for P. aeruginosa, with appropriate modification they should also be applicable to other bacteria.
THE pUCP FAMILY OF BROAD-HOST-RANGE CLONING AND EXPRESSION VECTORS
Almost ten years ago we described the first generation of the pUCP vectors, pUCP18 and pUCP19 (1). These vectors are based on the popular Escherichia coli pUC18 and pUC19 cloning vectors (2) and preserve, in addition to the multiple cloning sites (MCS), all of their other popular features. These include (i) blue/white screening for recombinants, (ii) use of commercially available primers for sequencing and PCR amplification of inserts, and (iii) regulated expression from the lac promoter. Direct cloning, regulated expression and blue/white screening in P. aeruginosa were facilitated by construction of a transposon-based expression and regulatory module carrying the required lac operon elements. Determination of the complete nucleotide sequence of the pRO1600-derived replicon allowed the construction of improved pUCP derivatives containing different selectable markers and a minimal RK2-derived origin of transfer (oriT) for conjugative mobilization of the vectors from E. coli mobilizer strains (4, 5). Although regulated expression was possible in specialized P. aeruginosa strains containing the aforementioned chromosomally integrated expression cassette, the same was not readily possible in different P. aeruginosa mutant backgrounds or in other Pseudomonas species. To address this, we constructed a mobilizable expression vector, pEX30 (Figure 1), that exhibits the following features: (i) expression of cloned genes from a regulated promoter (6, 7); (ii) blue/white screening by insertional inactivation of lacZα after cloning into several unique cloning sites; (iii) in-frame cloning with the ATG start codon; and (iv) increased plasmid stability because of the E. coli 5S RNA gene-derived double transcriptional terminators (8). We and others (9) have used this vector for expression of various P. aeruginosa proteins. Several specialized derivatives of the pUCP vectors that facilitate protein expression and purification have been described.
Two of these, pUCP-Nde and pUCP-Nco, facilitate protein expression by fusion of the target reading frame to a vector-borne reading frame via unique NcoI or NdeI sites (10). With the use of pUCP-Nde, a subunit of P. putida p-cresol methylhydroxylase was successfully expressed in the P. aeruginosa PAO1-Lac1 host, which contains a chromosomal copy of the aforementioned transposon-based expression and regulation cassette, after attempts to express the same protein in E. coli hosts proved unsuccessful (11). Two derivatives of pUCP19, pUCPKS and pUCPSK, were constructed that contain the extended MCS of pBluescript and allow expression in E. coli and Pseudomonas (12). Since these two vectors contained the redundant DNA found in the original pUCP19 vector, including the 3' end of a gene, its transcriptional terminator and a Tn3 terminal repeat (5), we derived two new pBluescript-derived
vectors, pBSP II SK(-) (Figure 2) and pBSP II KS(-), using a previously outlined strategy (12). In conjunction with specially engineered host strains, these vectors allow the controlled production of plasmid-encoded proteins from either the lac or the T7 promoter in both E. coli and P. aeruginosa.
A derivative of pUCP26, pEP31, was constructed to allow expression of hexahistidine fusion proteins in E. coli and Pseudomonas. With this vector, a Rhodococcus globerulus oxygenase component was expressed in and purified from a P. putida host strain (13). We have successfully expressed fusion proteins in a new P. aeruginosa host after equipping a pET-15b (Novagen, Madison, WI) derivative with the pRO1600 replicon (14). Specialized expression vectors for E. coli can easily be converted into broad-host-range vectors by inserting the 1.6 kb cassette from pSF2 (4) into a suitable restriction site in these vectors. We also constructed pUCP derivatives containing the in vivo selectable asd marker, and their possible applications have been discussed (14, 15).
STRATEGIES FOR PSEUDOMONAS HOST STRAIN CONSTRUCTION
Since many of the regulatory elements required for regulated expression, especially from lac- and T7-based plasmids, are missing from heterologous hosts, they must either be carried on the cloning vector or be provided from another source, either a compatible plasmid or a regulatory element integrated into the host’s chromosome. Until very recently, introduction of regulatory elements into bacterial genomes other than that of E. coli was mostly achieved via transposable elements (3, 16, 17). The two most common approaches involve Tn5-based (16, 17) or phage D3112-based transposons (3, 18). Although it has been used successfully many times, this approach has several drawbacks: (i) insertions occur randomly in the genome and position effects cannot easily be controlled; (ii) analysis of gene expression in different mutant backgrounds is difficult since the regulatory elements will be integrated in different chromosomal locations; and (iii) the integrated transposable elements contain antibiotic markers that prevent use of vectors with the same selectable markers. We recently described a family of integration-proficient mini-CTX vectors that allow insertion of gene cassettes at a defined location, attB, in the P. aeruginosa chromosome (14); the 30 bp attB sequence (19) is located at 2.94 Mb on the 6.24 Mb chromosome (38). Furthermore, unwanted sequences, including plasmid-associated promoters and antibiotic markers, can be removed in vivo by yeast Flp recombinase. This system is therefore analogous to constructing E. coli host strains by lysogenization with recombinant phage, except that mini-CTX vector integrants are more stable due to the absence of excision functions. In the following paragraphs, we describe two mini-CTX elements that allow derivation of host strains for expression vectors containing lac operon and T7 promoters.
CONSTRUCTION OF HOST STRAINS FOR lac OPERON-BASED PLASMIDS
Our newest vector, mini-CTX-LacM15, contains the lac repressor gene and a lacZΔM15 allele (Figure 3A), and allows ready engineering of host strains, creating an α-complementation and regulated expression system for cloning in P. aeruginosa. The system is analogous to a previously described system based on the transposable bacteriophage D3112 (3). However, the new system is more user-friendly and allows site-specific integration of an expression cassette into the P. aeruginosa chromosome by the steps outlined in Figure 3B, and the resulting host strain is devoid of any antibiotic resistance marker. Integration into the genome and excision of unwanted plasmid sequences via yeast Flp recombinase generate a host strain with a stable, chromosomally integrated expression cassette (Figure 4A) that allows (i) blue/white screening of recombinants in the presence of isopropyl-β-D-thiogalactopyranoside (IPTG) and 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (Xgal) and (ii) tight regulation of expression from lac operon-based promoters (Figure 4B). In contrast to E. coli, regulation in this system is very tight and full induction is readily obtained within 2 hr of inducer addition. This is advantageous in cases where overproduced gene products may be toxic to the host strain.
CONSTRUCTION OF HOST STRAINS FOR T7 PROMOTER-BASED PLASMIDS
We previously described an integration-proficient vector, mini-CTX-T7 (14), that allows construction of host strains for tight regulation of expression from broad-host-range vectors containing the T7 promoter (Figure 5). The only previously described system for T7-based expression in P. aeruginosa required either a specific host strain or rather cumbersome derivatization of custom strains via phage infection and transposition of the regulatory element into random locations on the chromosome. In contrast, the mini-CTX-T7 vector allows ready construction of host strains within desired genetic backgrounds. In addition, the host strains are devoid of any antibiotic resistance markers, thus allowing use of virtually any broad-host-range T7 expression vector.
CURRENT DEVELOPMENTS AND FUTURE DIRECTIONS
Although the arsenal of genetic tools for analysis of non-enteric bacteria has steadily improved over the last few decades, there is still considerable room for improvement in many areas. In the following paragraphs we present some ideas on how advances in molecular biology, genetics and genomics may be exploited to further facilitate genetic manipulation and studies of the many non-enteric Pseudomonas and related bacteria of agricultural, biotechnological and medical importance.
EXTENSION OF THE MINI-CTX SYSTEM TO P. PUTIDA
A search of finished and unfinished bacterial genome databases revealed a sequence in the P. putida KT2440 genome that, over 30 bp, is 87% identical to the P. aeruginosa attB sequence (Figure 6A). Moreover, this sequence overlaps at the 3' end with the same tRNA gene as in P. aeruginosa, and this tRNA gene contains the same TGA anticodon and the same 24 kDa putative membrane protein in its 5' upstream region (Figure 6B) (19). Given the high degree of similarity between the two sequences, it will be interesting to test whether this site could indeed mediate integration of vectors containing the attP site and its cognate integrase gene. Alternatively, the attB sequence contained on the mini-CTX vectors could be engineered to match exactly the sequence found in the P. putida chromosome. If the mini-CTX system could be adapted for P. putida, it may become useful for the genetic or metabolic engineering of strains destined for environmental release. Since our integration vectors are equipped with FRT sites, they will allow engineering of strains eventually devoid of any selection marker, as is currently required for metabolic engineering of novel phenotypes for environmental applications and development of live vaccine strains (20, 21).
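The 87% identity over 30 bp quoted above can be reproduced with a simple ungapped comparison. The sketch below is only illustrative: the function name and the two 30-mers are made-up placeholders, not the actual attB sequences, which are shown in Figure 6A. It also checks which number of matching positions in a 30 bp window rounds to 87%.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions that match between two pre-aligned, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Hypothetical 30-mers (NOT the real attB sequences): a toy sequence and a
# copy carrying four substitutions.
toy_a = "ACGT" * 7 + "AC"
toy_b_bases = list(toy_a)
for pos in (0, 7, 15, 29):
    toy_b_bases[pos] = "T" if toy_b_bases[pos] != "T" else "A"
toy_b = "".join(toy_b_bases)

# Match counts over a 30 bp window whose identity rounds to 87%.
WINDOW = 30
consistent = [m for m in range(WINDOW + 1) if round(100.0 * m / WINDOW) == 87]

print(f"{percent_identity(toy_a, toy_b):.1f}% identity; matches consistent with 87%: {consistent}")
```

Under this reading, 87% over 30 bp corresponds to 26 matching positions, i.e. four mismatches between the two sites.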
OTHER PROMOTER SYSTEMS
One of the strongest promoters available for regulated gene expression in E. coli is based on the AraC-regulated arabinose operon (araBAD) promoter. In E. coli, the ratio of induction/repression can be 1200-fold, compared to 50-fold with vectors (22). The tight regulation of this promoter has been exploited to study the phenotypes of null mutations and to construct various expression vectors (22), some of which are commercially available (Invitrogen Corp., Carlsbad, CA). Although the system has significant potential for use in non-enteric Gram-negative bacteria, only a few broad-host-range expression vectors have been described and they have only been tested in a few bacteria, e.g., Agrobacterium tumefaciens (23) and Xanthomonas spp. (24). The use of this promoter system in Pseudomonads has been very limited, although broad-host-range derivatives of commercially available vectors have been used to study gene and protein expression in P. aeruginosa (P. Warrener, PathoGenesis Corp., personal communication). The main drawbacks of the system outside of E. coli, at least in some non-enteric bacteria, seem to be high background activity in the absence of the inducer arabinose and induction/repression ratios that are lower or only marginally higher than those obtainable with (23, 24). Nonetheless, promising results of this promoter system with P. aeruginosa suggest that further investigations with broad-host-range expression vectors are warranted.
POSITIVE SELECTION MARKERS
A drawback of many broad-host-range vectors is the absence of screenable or selectable markers that would allow positive selection of recombinants. Although incorporation of a allele into many of these vectors would allow blue/white screening of recombinants in appropriate E. coli, P. aeruginosa and conceivably other bacterial hosts, positive selection may in many cases be the preferred alternative. Positive E. coli selection vectors have been described that are based on ccdB, a gene derived from the F plasmid and encoding a bacterial cytotoxin (25, 26). The target of CcdB is the bacterial gyrase, an A2B2 subunit tetrameric protein catalyzing the ATP-dependent negative supercoiling of DNA. More specifically, CcdB acts by trapping DNA gyrase in a complex with the gyrase A subunit covalently linked to the cleaved DNA. CcdB-resistant mutants contain mutations in gyrA. E. coli vectors were developed containing a multiple cloning site fused to the promoter-proximal end of ccdB (26) and these vectors can be stably propagated in a gyrA462-resistant mutant. Since evidence was obtained that CcdB is active against gyrase from P. aeruginosa (cited in (26)), it is conceivable that broad-host-range vectors derived from these E. coli vectors could be obtained in conjunction with appropriate Pseudomonas gyrA mutant host strains.
HOST STRAINS
Routine cloning and subcloning in any bacterial system also require suitable host strains. Such strains should (i) be easy to transform, (ii) allow screening or positive selection of recombinant clones, (iii) stably maintain recombinant DNA and (iv) allow
isolation of high-quality plasmid DNA suitable for subsequent manipulations, e.g., restriction enzyme digestion, subcloning, PCR analysis, etc. In E. coli, many such host strains were engineered and contain alleles for blue/white screening, recA and other alleles for increased plasmid stability, endA mutations for superior plasmid DNA quality and many more. As outlined earlier in this chapter, we already have the capability of extending the blue/white screening system past the enteric bacteria. Furthermore, with many bacterial genome sequences available, it should be possible to engineer improved host strains, e.g., strains containing recA, endA and other mutations. Such mutations should preferably be unmarked so that plasmid antibiotic resistance markers are not compromised; suitable broad-host-range technologies to achieve this already exist, e.g., the Flp recombinase/FRT (Flp recombinase target site) method (27). Transformation barriers can be overcome by the use of alternative DNA transfer techniques, i.e., electroporation or conjugation.
ACKNOWLEDGMENTS
Research by H.P.S. and R.R.K.-S. was supported by NIH grant GM56685. K.L.P. was supported by an Undergraduate Research Fellowship from the Howard Hughes Medical Foundation. H.R.O. was an Undergraduate McNair Summer Research Scholar.
REFERENCES
1. Schweizer, H.P. (1991) Gene 97, 109-112.
2. Yanisch-Perron, C., Vieira, J. and Messing, J. (1985) Gene 33, 103-119.
3. Karkhoff-Schweizer, R.R. and Schweizer, H.P. (1994) Gene 140, 7-15.
4. Schweizer, H.P., Klassen, T.R. and Hoang, T. (1996) in Molecular biology of Pseudomonads (T. Nakazawa, K. Furukawa, D. Haas and S. Silver, eds.), Amer. Soc. Microbiol. Press, pp. 229-237.
5. West, S.E.H., Schweizer, H.P., Dall, C., Sample, A.K. and Runyen-Janecky, L.J. (1994) Gene 128, 81-86.
6. De Boer, H.A., Comstock, L.J. and Vasser, M. (1983) Proc. Nat. Acad. Sci. U.S.A. 80, 21-25.
7. Brosius, J., Erfle, M. and Storella, J. (1985) J. Biol. Chem. 260, 3539-3541.
8. Brosius, J., Dull, T.J., Sleeter, D.D. and Noller, H.F. (1981) J. Mol. Biol. 148, 107-127.
9. Howell, M.L., Alsabbagh, E., Ma, J.-F., Ochsner, U., Klotz, M.G., Beveridge, T.J., Blumenthal, K.M., Niederhoffer, E.C., Morris, R.E., Needham, D., Dean, G.E., Wani, M.A. and Hassett, D.J. (2000) J. Bacteriol. 182, 4545-4556.
10. Cronin, C.N. and McIntire, W.S. (1999) Anal. Biochem. 272, 112-115.
11. Cronin, C.N. and McIntire, W.S. (2000) Prot. Expr. Pur. 19, 74-83.
12. Watson, A.A., Alm, R.A. and Mattick, J.S. (1996) Gene 172, 163-164.
13. Chebrou, H., Hurtubise, Y., Barriault, D. and Sylvestre, M. (1999) J. Bacteriol. 181, 4804-4811.
14. Hoang, T.T., Kutchma, A.J., Becher, A. and Schweizer, H.P. (2000) Plasmid 43, 59-72.
15. Handfield, M. (1998) BioTechniques 24, 261-264.
16. De Lorenzo, V. and Timmis, K.N. (1994) Methods Enzymol. 235, 386-405.
17. De Lorenzo, V., Eltis, L., Kessler, B. and Timmis, K.N. (1993) Gene 123, 17-24.
18. Schweizer, H.P. and Karkhoff-Schweizer, R.R. (1996) in Methods in Molecular Biology: Recombinant Gene Expression Protocols (R. Tuan, ed.), pp. 17-27, Humana Press, Totowa, NJ.
19. Hayashi, T., Matsumoto, H., Ohnishi, M. and Terawaki, Y. (1993) Mol. Microbiol. 7, 657-667.
20. De Lorenzo, V. and Timmis, K.N. (1992) in Pseudomonas: Molecular biology and biotechnology (S. Galli, S. Silver, B. Witholt, eds.), pp. 151-174, Amer. Soc. Microbiol. Press, Washington, DC.
21. De Lorenzo, V. (1994) Trends Biotechnol. 12, 365-371.
22. Guzman, L.-M., Belin, D., Carson, M.J. and Beckwith, J. (1995) J. Bacteriol. 177, 4121-4130.
23. Newman, J.R. and Fuqua, C. (1999) Gene 227, 197-203.
24. Sukchawalit, R., Vattanaviboon, P., Sallabhan, R. and Mongkolsuk, S. (1999) FEMS Microbiol. Lett. 181, 217-223.
25. Bernard, P. (1995) Gene 162, 159-160.
26. Bernard, P. (1996) BioTechniques 21, 320-323.
27. Hoang, T.T., Karkhoff-Schweizer, R.R., Kutchma, A.J. and Schweizer, H.P. (1998) Gene 212, 77-86.
28. Brunschwig, E. and Darzins, A. (1992) Gene 111, 35-41.
29. Leemans, R., Remaut, E. and Fiers, W. (1987) J. Bacteriol. 169, 1899-1904.
30. Winstanley, C., Morgan, J.A.W., Pickup, R.W., Jones, J.G. and Saunders, J.R. (1989) Appl. Environm. Microbiol. 55, 771-777.
31. Blatny, J.M., Brautaset, T., Winther-Larsen, H.C., Haugan, K. and Valla, S. (1997) Appl. Environm. Microbiol. 63, 370-379.
32. Ramos, J.L., Gonzalez-Carrero, M. and Timmis, K.N. (1988) FEBS Lett. 226, 241-246.
33. Blatny, J.M., Brautaset, T., Winther-Larsen, H.C., Karunakaran, P. and Valla, S. (1997) Plasmid 38, 35-51.
34. Fuerste, J.P. (1986) Gene 48, 119-131.
35. Keen, N.T., Tamaki, S., Kobayashi, D. and Trollinger, D. (1988) Gene 70, 191-197.
36. Brosius, J. (1989) DNA 8, 759-777.
37. Prentki, P. (1992) Gene 122, 231-232.
38. Stover, C.K., Pham, X.-Q., Erwin, A.L., Mizoguchi, S.D., Warrener, P., Hickey, M.J., Brinkman, F.S.L., Hufnagle, W.O., Kowalik, D.J., Lagrou, M., Garber, R.L., Goltry, L., Tolentino, E., Westbrock-Wadman, S., Yuan, Y., Brody, L.L., Coulter, S.N., Folger, K.R., Kas, A., Larbig, K., Lim, R., Spencer, D., Wong, G. K.-S., Wu, Z., Paulsen, I.T., Reizer, J., Saier, M.H., Hancock, R.E.W., Lory, S. and Olson, M.V. (2000) Nature 406, 959-964.
39. Calos, M.P. (1978) Nature 274, 762-765.
GENETIC AND BIOCHEMICAL STUDIES ON THE ASSEMBLY OF AN ENVELOPED VIRUS
Timothy L. Tellinghuisen1, Rushika Perera and Richard J. Kuhn*
Department of Biological Sciences, Purdue University, West Lafayette, IN 47907
*Corresponding Author
1 Present Address: Center for the Study of Hepatitis C, The Rockefeller University, 1230 York Avenue, New York, NY 10021
Genetic Engineering, Volume 23, Edited by J. K. Setlow, Kluwer Academic / Plenum Publishers, 2001
INTRODUCTION
Sindbis virus (SINV) is a member of the Togaviridae family of enveloped, positive-stranded RNA viruses, and is the prototype virus of the alphavirus genus. The alphaviruses are of worldwide distribution (1). Several of the alphaviruses are important pathogens of man, causing diseases ranging from rash and arthralgia to fatal encephalitis (2). Additionally, some alphaviruses are pathogens of domestic animals. They are transmitted in nature by mosquitoes, and replication in the mosquito is an important stage of the virus life cycle. The alphaviruses are well characterized at a structural, genetic and biochemical level. In addition to cryo-electron microscopy (cryo-EM) image reconstructions of a number of alphaviruses, the structure of a portion of the capsid protein (CP) has been solved at atomic resolution for several members of the genus. The organization and composition of the virion, described in greater detail below, suggest a relatively simple assembly mechanism. The use of the alphaviruses as a model system for studying enveloped virus assembly is further supported by the powerful reverse genetics systems available. The sequence of all or part of the approximately 12,000 nt genome is known for at least nine alphaviruses. Infectious RNA transcripts can be generated from cDNA clones for at least five of these viruses, allowing mutagenesis to be performed and mutant viruses to be rescued. Additionally, cDNA-derived systems for the study of viral replication and assembly in cell culture have been
described. The alphaviruses provide an ideal system for understanding the complex events of virus assembly, envelopment and release. The material presented in this chapter will summarize the life cycle of the alphaviruses and will then focus on the assembly process of the virion.
THE ALPHAVIRUS LIFE CYCLE
The alphavirus life cycle begins with the attachment of the virion to a host cell receptor protein, followed by receptor-mediated endocytosis. Following acidification of the endocytotic vesicle, membrane fusion occurs and the nucleocapsid core (NC) is released into the cytoplasm (3, 4). After NC interaction with cellular ribosomes, the viral genomic RNA is uncoated from the NC and released into the cytoplasm (5). Genomic RNA serves directly as messenger RNA (mRNA) for the translation of two non-structural polyproteins, which differ only at their C-termini as a result of readthrough of a stop codon. A complex series of proteolytic processing events that are mediated by a virus-encoded protease generates the viral non-structural proteins (nsP1, nsP2, nsP3, nsP4) (6, 7). The temporal processing of the polyproteins has been proposed to be responsible for the switch between positive- and negative-strand RNA synthesis. Once generated, these nsPs comprise the membrane-associated viral replication complex (8). After translation and processing of the nsPs, genome replication occurs through a negative-strand replication intermediate. In addition to the production of progeny positive-strand genomes, a subgenomic mRNA is transcribed from a subgenomic promoter on the minus-strand RNA by the replication complex (Figure 1). This subgenomic mRNA is translated to produce the 130 kDa structural polyprotein. The order of proteins in this polyprotein is CP-PE2-6K-E1. The CP, through an autocatalytic cleavage, removes itself from the polyprotein co-translationally, thereby exposing a signal sequence directing the remaining structural polyprotein to the endoplasmic reticulum (ER).
Once in the ER, the glycoproteins undergo a complex maturation process leading to the generation of a ‘spike’ composed of a trimer of heterodimers of the E1 and E2 proteins (9-11). For some alphaviruses, the E3 glycoprotein is secreted from the cell and is not part of the mature virion, but in other members, the E3 protein remains part of the spike complex and is present in the mature virion. The glycoprotein spikes are transported to the cell surface where they associate to form patches of spike complexes. Independent of glycoprotein processing, newly-released CP is associated with both cellular ribosomes and viral genomic RNA. The role of the ribosomal association in the assembly process is unclear (12, 13). Following association with viral RNA, the CP oligomerizes into the NC. The NC is a stable cytoplasmic complex that can be directly observed in infected cells by electron microscopy and purified from the cytoplasm of infected cells. Once assembled, the NC is found associated with both internal cell membranes and the plasma membrane at the cell surface. This interaction has been shown to be via direct contact of a hydrophobic pocket on the surface of the NC with the cytoplasmic domain of the E2 glycoprotein (cdE2) (14-17). Some debate exists as to whether the location of productive NC assembly is both cytoplasmic and membrane associated or exclusively at the plasma
membrane (18, 19). Nevertheless, the association of NCs with cell surface spikes leads to budding of progeny virions from the plasma membrane. With an overview of the alphavirus life cycle now presented, more specific details of the various processes of viral infection can be discussed as they relate to virus assembly.
VIRION STRUCTURE
Despite the inability to obtain an x-ray crystal structure of the alphavirus virion (20), considerable information about the structure and organization of the virion is available. Cryo-EM image reconstructions of Ross River virus (RRV), SINV and Semliki Forest virus (SFV) have been generated that demonstrate the organization of the alphavirus virion, which is strikingly similar among these viruses (21-25). The cryo-EM image reconstruction of RRV is shown in Figure 2 (23). The virion has an outer diameter of 710 Å with the outermost layer made up of the glycoprotein spikes (Figure 2A). These glycoproteins are arranged as 80 spikes, each of which is composed of a trimer of heterodimers of the E1 and E2 glycoproteins. The 80 spikes, which project approximately 100 Å off the surface of the virus particle, are arranged into a T=4 icosahedron. The most striking feature of the outer layer is that the majority of the particle is covered with protein, with only small holes at the five-fold axes and slightly larger holes at the twofold axes, through which the lipid bilayer can be observed. As the density corresponding to the spikes is followed inward towards the level of the lipid bilayer, a dramatic rearrangement of the spikes occurs. The eighty trimers seen on the outside of the particle reorganize to yield 12 pentamers and 30 hexamers of the glycoproteins at the level of the outer leaflet of the plasma membrane (Figure 2B). Spikes on the three-fold axes of symmetry become part of three adjacent hexamers, whereas the spikes on the quasi-threefold axes of symmetry become part of two hexamers and a pentamer.
This rearrangement of the glycoproteins produces an orientation that mirrors that seen for the internal NC (Figure 2C). Following the rearrangements of the spikes just above the plasma membrane, the transmembrane domains of both the E1 and E2 glycoproteins cross the 45 Å membrane. Once internal to the membrane, the E1 protein terminates, with only two amino acids on the inner leaflet of the membrane (26). The E2 protein, which has 33 internal residues (26), crosses the membrane and interacts with the internal NC (15, 17, 27-30). This interaction with E2 is believed to stabilize the structure of the NC, possibly serving a role analogous to that described below for Helix I of the CP. Removal of the density corresponding to the glycoproteins and the lipid bilayer from the image reconstruction allows visualization of the inner NC (Figure 2C). The 410 Å diameter NC is composed of 240 copies of the CP surrounding a single viral genomic RNA. The NC is arranged as a T=4 icosahedral lattice, with the CP organized into discrete pentameric and hexameric capsomeres that project approximately 40 Å from the surface of the core particle. The complete NC is composed of 12 pentamers and 30 hexamers. The resolution of the cryo-EM reconstruction (about 25 Å) allows visualization of density corresponding to the C-terminal domain of CP as discrete ‘lobes’ within the capsomeres (23). The region at the base of these capsomeres, the ‘floor’ of the NC, is believed to be composed of both RNA and the N-terminal region of the CP.
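The subunit counts quoted for the T=4 lattice are mutually consistent, which can be verified with the standard Caspar-Klug relations for icosahedral shells (60T subunits, 12 pentamers and 10(T-1) hexamers). These relations are general textbook geometry rather than something stated in this chapter, and the function name below is a placeholder chosen for illustration.

```python
def icosahedral_counts(t_number: int) -> dict:
    """Caspar-Klug counts for an icosahedral lattice with triangulation number T.

    Only certain integers are valid T numbers; this sketch does not validate T
    and is exercised here only for T=4.
    """
    if t_number < 1:
        raise ValueError("triangulation number must be >= 1")
    pentamers = 12                   # one per five-fold vertex, for any T
    hexamers = 10 * (t_number - 1)   # the remaining capsomeres are hexamers
    subunits = 60 * t_number         # total protein copies in the shell
    return {"pentamers": pentamers, "hexamers": hexamers, "subunits": subunits}


counts = icosahedral_counts(4)

# Copies contributed by the capsomeres: 5 per pentamer, 6 per hexamer.
from_capsomeres = 5 * counts["pentamers"] + 6 * counts["hexamers"]

# Each of the 80 spikes is a trimer of E1-E2 heterodimers.
heterodimers = 80 * 3

print(counts, from_capsomeres, heterodimers)
```

For T=4 this gives 12 pentamers, 30 hexamers and 240 subunits, and the 80 trimeric spikes likewise contribute 240 E1-E2 heterodimers, the same number as the CP copies in the NC.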
Although large numbers of cytoplasmic NCs are observed in infected cells, it has been proposed that these cores represent a dead end in the assembly process, with productive assembly occurring only at the plasma membrane (18, 19). The role of cytoplasmic cores in the assembly pathway remains an area of debate. No structure of the free ‘cytoplasmic form’ of the NC exists, and only limited information is available about the structural changes that occur in the NC during interaction with the glycoproteins and other maturation events (31).
The SINV CP is a 264 amino acid protein of approximately 30 kDa (32). The structures of the C-terminal domain of the SINV CP (residues 114-264) and CP(106-264), as well as the SFV CP(119-267), have been determined by x-ray crystallography (15, 33, 34). In all of these crystal structures the CPs are arranged as crystallographic dimers. In addition, crystal structures of CP(106-264) have demonstrated another interesting protein-protein contact. In this structure an N-terminal arm, corresponding to
amino acids 106-113, was found to extend from one monomer and dock into a hydrophobic pocket on the surface of an adjacent monomer (15). In this interaction, two conserved leucine residues, L108 and L110, are directly inserted into the hydrophobic pocket. It has been proposed that this hydrophobic pocket functions both in assembly and virus budding, and may serve as a switch between NC formation and NC association with the glycoproteins (15, 17). The structure of the CP demonstrated that the C-terminal domain (amino acids 114 to 264) possesses a chymotrypsin-like serine proteinase fold, containing the canonical catalytic triad residues. These residues, S215, H141 and D163, have been shown by mutagenesis to be required for the autocatalytic activity of the CP (35, 36). CP(114-264) has been shown to represent the minimum sequence required for proteolysis in vivo and in vitro (37; Little, C. and RJK, unpublished data). The structure of the C-terminal protease domain of the CP consists of two 6- or 7-stranded antiparallel sub-domains, with the active site located in a cleft between the two sub-domains. The folding of the CP appears to position the C-terminal tryptophan residue (W264) in the protease active site, where cleavage of the CP occurs. Once cleaved, W264 remains in the substrate binding pocket, preventing further catalytic activity of the CP.
OVERVIEW OF VIRION ASSEMBLY
The mechanism of the alphavirus NC assembly, although studied for several decades, is poorly understood. It is known from previous in vivo studies that immediately following translation and autocatalytic proteolysis, the CP is associated with the large subunit of the ribosome (12, 38). It then associates with genomic RNA and rapidly assembles into cytoplasmic NCs. The rapid rate of NC formation has made identification of assembly intermediates difficult.
To date, in vivo studies have only identified large protein-nucleic acid aggregates that have been proposed to represent valid intermediates in the assembly process (13). An in vitro assembly system for SINV core-like particles (CLPs) was previously established with the use of CP isolated from virus particles (39, 40). CLPs produced by this system with viral genomic RNA closely resembled cytoplasmic cores purified from infected cells in size, shape and composition. This assembly system provided the first insights into the biochemical requirements of NC assembly. A significant limitation of this early assembly system was the reliance on CP purified from assembled virus particles, thereby eliminating the ability to assay CP mutants that were defective in virus production. An in vitro CLP assembly system with CP purified from E. coli and a variety of nucleic acids has been developed to overcome the limitations of the previous assembly system (41). CLPs generated by this system are also identical to authentic NCs and intermediates in the assembly process have now been identified.
The synthesis and processing of alphavirus glycoproteins have been reviewed extensively (1, 42). Following autocatalytic cleavage of the CP, the glycoproteins are translated from the nascent polyprotein in the form of PE2-6K-E1, with the PE2 peptide being the precursor of E3 and E2. During transit to the plasma membrane, PE2, 6K and E1 are co-translationally processed from the polyprotein by proteolytic enzymes within the lumen of the ER, and glycosylated by the addition of high mannose oligosaccharides.
The glycoproteins are then transported to the cis or medial Golgi cisternae where they are covalently modified with fatty acyl chains. These modified proteins are then transported to the trans Golgi cisternae where the high mannose oligosaccharides are trimmed and then modified to form complex oligosaccharides. In a late Golgi or post-Golgi compartment, PE2 is cleaved into mature E3 and E2 proteins. Finally, the E3, E2, 6K and E1 complex is transported to the cell surface where the cdE2 interacts with NCs that have assembled in the cytoplasm. This interaction has been proposed to involve the binding of the cdE2 into the hydrophobic pocket found in the CP. A productive set of cdE2-CP interactions leads to budding of the mature virus particle from the plasma membrane of the cell. However, the precise molecular details that describe this budding process have yet to be elucidated.
THE SEARCH FOR THE MECHANISM OF NC ASSEMBLY
Regions of the CP Implicated in Assembly
Examination of the CP with structural, biochemical, and genetic approaches has allowed dissection of the protein into three main regions (Figure 3A). The C-terminal portion of the CP, corresponding to amino acids 114-264, has been solved to atomic resolution and contains the protease domain (33). Examination of the cryo-EM image reconstruction of RRV suggests this domain of the CP lies completely within the capsomeric projections on the surface of the NC (23). This region of the CP has also been shown to contain a hydrophobic pocket that has been predicted to be involved with NC assembly as well as E2-NC interactions (15, 16). The central region of the CP, composed of amino acids 76-113, is involved in specific RNA recognition and genome packaging. Virus particles normally contain greater than 95% genome RNA, suggesting that specific RNA packaging occurs. The sequence on the RNA genome that serves as the encapsidation signal has been mapped to residues 945-1076 on the viral genome RNA (43).
Weiss and colleagues have demonstrated that deletion of CP amino acids 76-107 resulted in almost a complete loss of specific RNA binding activity, whereas residues 76-132 by themselves exhibit specific RNA binding activity (44). The deletion of CP residues 97-106 disrupted specific RNA binding but not NC assembly in vivo, leading to increased viral encapsidation of the subgenomic RNA (45). This region of the CP, specifically amino acids 94-106, has also been implicated in the interaction of the CP with ribosomes (46-48). This association with ribosomes has been postulated to be involved in both the disassembly and assembly of the NC (5, 48). The third region of the CP, consisting of amino acids 1-75, is poorly conserved among the alphaviruses. This highly charged region of the CP has been proposed to be involved in nonspecific charge neutralization interactions with the viral RNA (37, 49-52). Deletions of large charged sequences in this region did not eliminate virus replication, suggesting no specific role exists for this unconserved region of the CP (37). However, within this region lies a sequence of largely hydrophobic and polar amino acids that have been proposed to form an α-helix based on modeling of the primary sequence into helical wheel plots (52a). These residues represent the only significant stretch of amino acids in
this region of the CP that are conserved among the alphaviruses. This putative helix, designated Helix I, is believed to stabilize interactions between capsomeres by extending from one CP to another, across the inter-capsomeric space, where a leucine zipper dimerization interaction occurs. Additional regions of the CP that may be involved in the assembly of the NC can be gleaned from the atomic model of the NC generated by the fitting of the atomic coordinates of the CP(114-264) crystal structure into the electron density of the RRV NC (23). In this NC model, the interface between individual CP monomers within a
capsomere has been proposed to form a clamp-like interaction in which a loop from one monomer contacts a cleft in an adjacent monomer. Analysis of a series of N-terminal deletions of the CP with the in vitro assembly system suggests that deletions up to amino acid 19 do not affect assembly in vitro, whereas further deletions block CLP formation (Figure 3B) (41). Truncations of the N-terminus of the CP that failed to assemble CLPs, but contained the specific RNA binding site (amino acids 97-106), retained nucleic acid binding capability. Additionally, truncated CPs that were competent for nucleic acid binding could be incorporated into wild-type CLPs when present in small amounts in assembly reactions (Figure 3B). These proteins could inhibit CLP formation when larger amounts of truncated CP were present in assembly reactions. Particles containing a mixture of truncated and wild-type CP appeared identical to wild-type CLPs by negative stain electron microscopy. The best characterized of these incorporation-active truncations is the SINV CP containing amino acids 81-264, referred to as CP(81-264). Truncations of the CP that remove the RNA binding site generated proteins that failed to bind nucleic acid and failed to incorporate into CLPs. These data suggested that nucleic acid binding was an early and required step in the assembly process and that additional contacts in the N-terminal region of the CP were required to complete particle formation following nucleic acid binding. A wide range of mutants designed to disrupt the key assembly regions of the CP has been generated and examined for the ability to assemble CLPs in the in vitro system (41). Mutations designed to disrupt the 'clamp' interaction observed in the NC model have no effect on either the ability to generate CLPs in vitro or the properties of the resulting particles.
Analysis of residues that make contact surfaces between monomers in the crystallographic dimer also demonstrates no aberrations in the in vitro CLP assembly system. Alteration of the conserved leucine residues in the 'arm-pocket' region to charged residues leads to CPs that retain assembly competence (15). The particles generated from these mutants have altered phenotypes in several of the in vitro particle assays; however, they appear normal at the level of negative stain electron microscopy. The reason for the altered behavior of these particles remains unclear. Additionally, a mutant CP lacking amino acids 107-113 has been generated and it failed to assemble CLPs in vitro. Mutations that delete the specific RNA binding site of the CP appear to have no effect on in vitro assembly. Perhaps the most interesting mutants generated to date are those that lie within the Helix I region of the CP (52a). Single amino acid substitutions of the key leucine residues in this region are sufficient to block assembly completely in vitro. The best characterized of these mutants is the CP containing a substitution of leucine 52 to aspartic acid (referred to as CP(19-264)L52D). Experiments outlined below serve to define the role of the N-terminal region of the CP in assembly.
NC Assembly in Vitro Requires Nucleic Acid
During the development of the in vitro NC assembly systems, the importance of nucleic acid to the assembly process was observed (39, 41). More extensive analysis of the nucleic acid and protein requirements for in vitro CLP assembly has indicated that nucleic acid is absolutely required for NC formation. This supports in vivo observations
that virus particles lacking nucleic acid were never observed. No particle formation was detectable in vitro in the absence of nucleic acid with the use of a variety of low and high ionic strength buffers. In addition to the requirement for nucleic acid, several limitations on the type, size and concentration of nucleic acid required for in vitro CLP assembly were observed (41). Initial assembly experiments were conducted with viral genomic RNAs, single-stranded DNAs and synthetic polyanions. Although CLP assembly occurred with each of these substrates, these nucleic acids were all single-stranded nucleic acids, or polyanion analogs of single-stranded nucleic acids. Double-stranded DNAs of a wide range of sizes failed to serve as substrates for CLP assembly. Surprisingly, structured non-specific RNAs, such as tRNA, could serve as assembly substrates, despite containing large regions of helical, double-stranded RNA. The ability to assemble in vitro CLPs with a variety of single-stranded nucleic acids and polyanions, but not with dsDNA, indicates that charge neutralization of the basic CP by polyanionic substrates is not the only role of nucleic acid in core assembly. The helical structure of DNA may prevent some nucleotide base-specific contacts required for protein-nucleic acid interactions in particle formation. The DNA helix also may pose problems for assembly due to the relative size and rigidity of the helix compared to single-stranded substrates. Nucleic acid length requirements were observed for in vitro CLP formation with a variety of single-stranded nucleic acid substrates (41). The standard assembly oligonucleotide used to develop the heterologous protein in vitro assembly system was a non-virus specific 48 nucleotide single-stranded DNA. Truncations of this non-specific oligonucleotide were generated in order to define a minimum oligonucleotide length for assembly of CLPs.
Truncations of the standard oligonucleotide past 14 bases led to a loss of particle assembly in vitro. The reason for the observed minimum length of single-stranded substrates for in vitro assembly is unknown, but a role for nucleic acid in tethering adjacent CPs together has been proposed. As a specific CP binding site on the viral RNA has been identified and shown to be required for encapsidation of viral RNAs in vivo, the ability of the viral encapsidation signal to be specifically recognized in the in vitro system was investigated (41, 43). In the presence of a 5-fold molar excess of tRNA or assembly oligonucleotide, only viral genomic RNA was encapsidated, with other substrates excluded from assembled particles.

Nucleic Acid-Dependent Crosslinking of the CP

As several CP truncations and mutations that lacked assembly competence were capable of binding nucleic acid in vitro, the oligomeric state of these proteins when present as part of a protein-nucleic acid complex was investigated. Analysis of the oligomeric state of the SINV CP by a variety of methods has indicated that the protein exists as a monomer in solution in the absence of nucleic acid (41). Previous crosslinking analysis of free CP and CP in NCs suggested the protein existed in different conformations in these two environments (53-55). As mentioned previously, the ability of truncated, assembly-defective proteins to bind nucleic acid has been observed in the in vitro assembly system. In addition to nucleic acid binding, these truncated proteins had biological activity, in that they were capable of incorporating into CLPs when present as
minor components in assembly reactions. The ability of these proteins to bind nucleic acid and incorporate into CLPs, but not assemble alone, suggested that the nucleic acid-bound form of these CPs represented an early step in the assembly process. A nucleic acid-dependent crosslinking of the CP into protein dimers was observed with the imidoester crosslinking reagent dimethyl suberimidate (DMS) (Figure 3B) (56). The nucleic acid type, length and concentration restrictions observed for in vitro CLP assembly were also observed for crosslinking. In addition, crosslinking of CLPs suggested the same dimeric species existed in complete particles. Full-length assembly-defective CPs that retained nucleic acid binding competence were capable of being crosslinked (Figure 3B). No crosslinking was observed in the absence of nucleic acid. It was found that lysine-specific crosslinking reagents, regardless of chemical mechanism, could generate the dimer of the CP if the crosslinking groups were approximately 11 Å apart. Shorter reagents failed to generate dimeric forms of the CP. To demonstrate the biological activity and physiological relevance of the dimer, crosslinked dimers of assembly-defective CPs were added as minor components to assembly reactions containing wild-type CP (56). Crosslinked dimers of CP(81-264) and CP(19-264)L52D were capable of incorporating into CLPs under these conditions. Analysis of these particles containing crosslinked dimers by negative stain electron microscopy suggested that no major defects in the particle morphology existed and that they were indistinguishable from wild-type in vitro CLPs. The ability of crosslinked dimers of the CP to incorporate into assembled CLPs suggests the dimer represents a valid trapped intermediate in the assembly process. With the relevance of the dimer to the assembly pathway demonstrated, the location of the crosslink was investigated and found to occur between lysine 250 residues on adjacent monomers (Figure 4A) (56).
Examination of the NC model indicated that lysine 250 faces outward from the capsomeres into the inter-capsomeric space. The distances between lysine 250 residues of CPs in adjacent capsomeres were found to be 10.5 Å between pentamer-hexamer contacts and 14.7 Å between hexamer-hexamer contacts. Thus the dimer isolated in these crosslinking experiments may represent one of the two possible dimer forms present in the mature NC. However, it does appear to be a true intermediate in the assembly process and has substantiated the atomic model of the NC.

Rescue of Assembly Defects by Crosslinking

The demonstration of a crosslink between adjacent CPs across the inter-capsomeric space, in conjunction with the idea that the N-terminal portion of the CP occupies this same space, led to the hypothesis that assembly-defective CPs with truncations or mutations in the N-terminus might become assembly competent if the crosslink was used to substitute for a functional N-terminus. The CLP assembly-defective CPs, CP(81-264) and CP(19-264)L52D, contain either a deletion of the entire N-terminal region or a point mutation in the Helix I region of the CP, respectively. No CLP formation for these proteins has been observed under normal in vitro assembly conditions, but the ability to bind nucleic acid and incorporate into CLPs in the presence of the wild-type protein was retained in these mutants. As these mutations were in the
region of the CP believed to lie in the inter-capsomere space of the NC, it was proposed that inter-capsomere crosslinking of CPs could provide a stabilizing contact in place of the authentic N-terminus. Methods for the purification of monomer-free crosslinked dimers of these assembly-defective CPs were developed. The presence of the inter-capsomere crosslink was found to rescue assembly of CLPs for these mutants (Figure 3B) (56a). Mutagenesis of Helix I and analysis of the resulting virus mutants in vivo have suggested this region is not required for virus viability, but is directly involved in the stability of the virion, presumably through stabilization of the NC (52a). The ability of crosslinked dimers of CP(19-264)L52D to assemble CLPs suggests the chemical crosslink overcomes the lack of stabilization caused by disruption of inter-capsomere helix interactions when amino acid substitutions are introduced at the conserved leucine 52 residue of Helix I. The ability of CP(81-264), which has a substantial deletion of the N-terminus, to assemble into CLPs when present as crosslinked dimers demonstrates that the crosslink can substitute for the first 80 amino acids of the CP. This implies that no required contacts exist in this region of the CP for spherical particle assembly, apart from the Helix I contacts described above. Previous deletion analysis of the N-terminal region of the SFV CP demonstrated that much of this region could be eliminated without completely eliminating virus formation in vivo (37). This suggested that glycoprotein interactions could suppress the defect caused by the loss of N-terminal sequences.
A MODEL OF NC ASSEMBLY

Previous models of assembly of the NC have proposed that the assembly process begins with the crystallographic dimer, the ‘arm-pocket’ dimer, or preformed pentameric and hexameric capsomeres, which then polymerize to generate the NC (15, 33, 50). In these models, the polymerization of the individual subunits is driven by charge neutralization induced by binding to the viral RNA. Several independent lines of inquiry have demonstrated that the solution state of the CP is that of a monomer. Additionally, previous examination of assembly in vitro and in infected cells has suggested that preformed capsomeres do not exist as precursors to NC formation. These data make the previous models of assembly unsatisfying. More recent data outlined above have led to the development of a new NC assembly model in which the polymerization of 120 nucleic acid-bound dimers in an inter-capsomere fashion can generate an icosahedral NC (56). It is apparent that nucleic acid binding is an early and absolutely required step in the initiation of assembly. In addition to the previously proposed charge neutralization role of nucleic acid in assembly, a specific role for nucleic acid in protein oligomerization has been identified (Figure 4A). Once the initial nucleic acid-bound dimer has oligomerized in solution, the polymerization of 120 identical dimers leads to the generation of the assembled NC. The ability to generate what appears to be a complete particle from only purified crosslinked dimers of the CP and nucleic acid supports a model in which the polymerization of identical dimers leads to the generation of the NC. Presumably these dimers make both intra-capsomeric and inter-capsomeric stabilization contacts with nucleic acid during the condensation into a core particle.
The polymerization of 120 identical dimers raises an important question: does the initial inter-capsomeric contact occur between two hexamers at the three-fold symmetry axis, or between a pentamer and a hexamer at the quasi-three-fold axis? An initial inter-capsomeric contact between a hexamer and a pentamer around a growing hexamer would generate a hexamer of dimers, with half of each dimer occupying a portion of either a hexamer or pentamer on an adjacent capsomere (Figure 4B, top row). If the polymerization is controlled such that the growing hexamer makes four hexamer and two pentamer contacts, a T=4 icosahedron would result as polymerization of additional dimers into the particle proceeded. A recent 9 Å reconstruction of the SFV virion indicates a skewing of the two-fold axes that are occupied by the hexameric capsomeres (25). Perhaps this skewing of the hexamer limits polymerization around the hexamer to that of four hexamers and two pentamers. If this is the case, the assembly of a T=4 icosahedron is pre-programmed into the particle early in the assembly process. The observation of uniform-sized particles is consistent with a T=4 structure and suggests a controlling element in the polymerization of dimers. Assembly around a pentamer would generate a pentamer of dimers in which each dimer occupied a portion of a hexamer in the growing icosahedron (Figure 4B, bottom row). Once assembled, a pentamer surrounded by nearly complete hexamers could continue polymerization of additional dimers and generate either a T=3 or T=4 icosahedron. It is easy to imagine that the pentamer-hexamer contacts skew the hexamer, thereby allowing the oligomerization of only two pentamers around a hexamer. The current assembly data are unable to distinguish between hexamer-ordered and pentamer-ordered assembly.
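The subunit counts behind this model can be made explicit with a short Python sketch. It assumes the standard quasi-equivalence relations for icosahedral shells (60T subunits, 12 pentamers, 10(T-1) hexamers), which the text does not state, and it assumes that every dimer spans exactly one inter-capsomere contact, as the model proposes. The function names are illustrative only.

```python
def shell_counts(t_number):
    """Subunit bookkeeping for an icosahedral shell of triangulation number T.

    Assumes the standard quasi-equivalence relations:
    60*T subunits, 12 pentamers, 10*(T-1) hexamers.
    """
    subunits = 60 * t_number
    pentamers = 12
    hexamers = 10 * (t_number - 1)
    # Every subunit sits in exactly one capsomere.
    assert 5 * pentamers + 6 * hexamers == subunits
    return {
        "subunits": subunits,
        "pentamers": pentamers,
        "hexamers": hexamers,
        "dimers": subunits // 2,
    }


def t4_dimer_classes(pentamer_contacts_per_hexamer=2,
                     hexamer_contacts_per_hexamer=4):
    """Split the T=4 dimers by the kind of inter-capsomere contact they span.

    Assumption (from the model in the text): each hexamer makes two pentamer
    and four hexamer contacts, and each contact corresponds to one dimer.
    """
    hexamers = shell_counts(4)["hexamers"]
    pentamer_hexamer = hexamers * pentamer_contacts_per_hexamer
    # Each hexamer-hexamer contact is shared by two hexamers.
    hexamer_hexamer = hexamers * hexamer_contacts_per_hexamer // 2
    return {"pentamer_hexamer": pentamer_hexamer,
            "hexamer_hexamer": hexamer_hexamer}


if __name__ == "__main__":
    for t in (3, 4):
        print(t, shell_counts(t))
    print(t4_dimer_classes())
```

For T=4 this bookkeeping gives 240 CPs and 120 dimers. Under the stated assumptions the dimers divide evenly into 60 pentamer-hexamer and 60 hexamer-hexamer contacts, which is consistent with the two possible dimer forms noted for the mature NC.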
Examination of the model of the NC provides some insight into the possible assembly mechanism. Measurement of the distances and angles between lysine 250 side chains in the pentamer-hexamer (10.5 Å, 35°) and hexamer-hexamer (14.7 Å, 35°) inter-capsomeric spaces suggests that the observed crosslink, which requires crosslinking groups approximately 11 Å apart, represents a pentamer-hexamer contact. However, one must consider that the inter-capsomere distances described above are taken from the model of the NC, and may not represent actual distances in the NC. It is likely that the NC exists as a dynamic structure, which would introduce flexibility into the measured distances. Nevertheless, the data suggest that the initial assembly nucleus forms around a pentameric structure, possibly a pentamer of dimers. The use of longer crosslinking reagents may allow discrimination between the pentamer-hexamer and hexamer-hexamer contacts.

MATURATION OF THE GLYCOPROTEINS

Proteolytic Processing

Immediately following the autocatalytic cleavage of the CP, the N-terminal 19 residues of PE2 function as a signal sequence and insert PE2 and the nascent polyprotein into the ER. The topology of this membrane-bound polyprotein is shown schematically in Figure 5. This polypeptide is then immediately glycosylated, resulting in the retention of the uncleaved signal sequence in the lumen, and preventing further translocation of the polyprotein (57-59). The 26-residue transmembrane domain of E2 (SINV) acts as a stop transfer signal and anchors PE2 in the membrane (26, 60). The 33-residue cdE2 then acts as a signal peptide for the translocation of a 55-residue peptide, termed 6K. The cdE2, although transiently found in the lumen of the ER, is later retracted into the cytoplasm to interact with the NC (61, 62). The C-terminal domain of the 6K protein acts as the signal peptide for the E1 protein (63-67).
Following translocation, the transmembrane domain of E1, also 26 residues in length (SINV), acts as a stop transfer signal leaving two E1 residues on the cytoplasmic side of the membrane. The release of 6K from both PE2 and E1 is accomplished by signalase cleavage in the lumen of the ER (66, 68, 69). In vertebrate cells, PE2 is cleaved into the E3 and the E2 polypeptides by a furin-like protease downstream of an R-X-(K/R)-R sequence. This cleavage occurs during the transport of the proteins through the trans Golgi complex (70-76). In contrast to vertebrate cells, mosquito cells process this cleavage site at an early step during transport (76, 77). Several studies have shown that the cleavage of PE2 is not required for the production of virus particles, and that mutants deficient in PE2 cleavage can incorporate the precursor into particles (75, 78-83). It has, however, been demonstrated that these particles have low, if any, specific infectivity and that their biological activity is highly dependent on the genetic background of the virus. Second site suppressors of PE2 cleavage mutants have been mapped to E2 and E3 (78, 83, 84). The PE2-E1 heterodimer is more stable than the E2-E1 heterodimer. Therefore, the delayed cleavage of PE2 may be a mechanism to maintain a stable heterodimer until the final stages of assembly. Prior to exit, the virus needs to be activated for disassembly, hence PE2 is cleaved, resulting in a less stable E2-E1 heterodimer. The low infectivity of the PE2 cleavage-deficient mutants may be due to the increased stability of the PE2-E1
complex, rendering it less efficient in disassembly. The compensating mutations in E2 and E3, therefore, may destabilize the complex and favor proper disassembly (1).

Post-Translational Additions

It has been well established that glycosylation is important for the proper folding and maturation of proteins that travel through the secretory pathway. The alphavirus glycoproteins are glycosylated by asparagine-linked oligosaccharides of the form These groups are transferred to the polypeptide chain via a lipid intermediate, dolichol pyrophosphate, and the transfer is initiated shortly after the
translocation of the polypeptides into the lumen of the ER (1, 57, 85-87). Both PE2 and E1 glycoproteins are glycosylated, but the number and position of these glycosylation sites are not conserved among the different alphaviruses. In SINV, there are two potential glycosylation sites in both E1 (N139 and N245) and E2 (N196 and N318), whereas SFV and RRV have only one site in E1 (N141) and two sites in E2 (N200 and N262). Several studies with tryptic or pronase-digested peptides of SINV glycoproteins have shown that all four sites are glycosylated (88, 89). Yet, the type and extent of glycosylation are variable depending on the host cell, time of passage, and most importantly, the accessibility of the site to cellular processing enzymes. In chicken cells, for instance, the N196 site in E2 contains a complex oligosaccharide and N318 contains a high-mannose structure. The N139 site in E1 is also complex, but the N245 site is variable. In BHK cells, by contrast, the oligosaccharides are predominantly complex (90-92). It has been suggested that this variability in glycosylation patterns could be due to the accessibility of the oligosaccharide structures in these proteins. The sites with high-mannose structures are less accessible to processing enzymes, and the complex chains are found at the more accessible N-termini of these glycoproteins (93, 94). The existence of differential kinetics in the processing and transport of the proteins, or availability of the processing enzymes, may also contribute to the variability in glycosylation. The fatty acid acylation of SINV and SFV glycoproteins occurs about 10-20 minutes after their synthesis (95, 96). The covalent attachment of palmitic acid, the primary long-chain fatty acid found in these proteins, is catalyzed by an acyl transferase that uses palmitoyl coenzyme A as the donor (97). The site of modification has been localized to cysteines proximal to or embedded within the plasma membrane (42).
Studies in vertebrate cells have shown that the acylation reaction takes place in an intermediate compartment prior to the arrival of proteins in the cis-Golgi (74, 98). α-1,2-mannosidase I, which is a cis-Golgi-associated function, releases high mannose chains from oligosaccharides on these glycoproteins. It has been shown through sensitivity to endo-D and endo-H that fatty acyl chains are observed in these proteins prior to mannosidase function (74). Chemical analysis of purified acyl peptides of E1 has identified the palmitoylation site for SFV as C433. This residue is in the transmembrane segment of the protein in the middle of the inner leaflet of the membrane (99). In SINV E1, C430 has been identified as a palmitoylation site. Substitution of C430 with alanine showed that the mutant virus was defective in replication early during infection, and that the incorporation of fatty acid was decreased by <5% compared to the wild type. The mutant virus particles were also sensitive to detergent treatment (100). C433 in SFV and C430 in SINV are conserved among the alphaviruses, with the exception of Venezuelan equine encephalitis virus (VEE) and Eastern equine encephalitis virus (EEE), which do not have these residues and do not appear to be palmitoylated (1). Site-directed mutagenesis of three conserved cysteine residues in the cdE2 (C396S, C416A, C417A), and two residues within the transmembrane domain of E2 (C388A and C390A), has implicated these residues as sites for palmitoylation (100-103). Interestingly, although all the mutants were reduced in the amount of palmitic acid
incorporated and the rate of growth early in infection, the C396S, C416A and C417A mutants produced infectious, multi-cored particles that were more thermostable than wild-type particles. This observation indicated that these mutants were also defective in proper NC-cdE2 interactions (102, 103). Analysis of the 6K protein by labeling with palmitic acid has indicated that there are four possible palmitic acid groups attached to the cysteines in the protein (104). Site-directed mutagenesis of the 6K protein has also indicated that C35, C36, and C39 and possibly another cysteine are palmitoylated (105). The above studies have demonstrated that underacylated proteins affect virus assembly. It has been suggested that acylation influences the lipid composition of membranes in the vicinity of viral integral membrane proteins and may be responsible for the selective acquisition of membrane components during assembly and budding of the virus (42). This is further supported by the observation that the lipid composition of viral membranes differs from that of the plasma membrane. Acylation of the cytoplasmic domain cysteines in E2 may also play a role in presenting the cytoplasmic domain in a favorable conformation for NC binding. The multi-cored phenotypes observed in the C416 and C417 mutants could then result from a combination of the inaccessibility of the cdE2 to the NC and the lack of lipid organization favorable for proper positioning of the envelope around the NC.

Glycoprotein Transport

Analyses of temperature-sensitive mutants in the glycoproteins have contributed significantly to the understanding of glycoprotein transport through the secretory pathway (11, 106-113). These studies have shown that the glycoproteins move from the ER to the plasma membrane via the cisternal membranes of the Golgi complex in a cis to trans direction (114-120).
It has also been shown that this transport may involve intermediate compartments referred to as pre- and post-Golgi vacuoles (108, 109). Analyses of these mutants have indicated that the proper folding of the proteins in the ER is a prerequisite for transport of the proteins to the plasma membrane. The folding of these proteins occurs immediately following translocation of the polypeptide chain into the ER, and is assisted by molecular chaperones, folding enzymes and disulfide bond formation (121-123). Oligomerization of PE2 and E1 is also a requirement for the transport of the glycoproteins, but the presence of the CP or the non-structural proteins is not required (3, 4, 58, 101, 124-128). It has been shown that PE2 oligomerizes with a partially-folded intermediate of E1 and that this oligomerization is sufficient for the proteins to exit the ER (113, 129). In the absence of E1, PE2 is able to reach the plasma membrane, although mutants in E1 that abrogate its folding have also been shown to aggregate PE2. In the absence of PE2, E1 is unable to exit the ER (123). However, deletion of the cytoplasmic domain, or substitution of the transmembrane domain with the transmembrane domains of vesicular stomatitis virus glycoprotein G or fowl plague virus hemagglutinin HA, has been shown to have no effect on the transport process (58, 126).
NC AND CP INTERACT WITH GLYCOPROTEINS

CP and Viral Glycoproteins are Required for Alphavirus Budding

The final stage of the alphavirus life cycle involves interaction of the NC with membrane-bound glycoprotein spikes. Although several factors contribute to this process, structural, biochemical and molecular genetic analyses have indicated that the primary driving force for this envelopment is a specific interaction between a hydrophobic pocket in the C-terminal domain of the CP and the cdE2. Thin section electron microscopy of cells infected with the wild-type virus has shown a clustering of NCs on the ER membrane and the plasma membrane (8, 17, 30). This membrane localization of NCs has been shown to be coincident with the distribution of glycoproteins on these membranes. When the glycoproteins are expressed in the absence of the CP, an accelerated turnover of the glycoproteins is observed, with no virus-like particles being released. Similarly, when the CP is expressed in the absence of the glycoproteins, NC assembly occurs normally but no virus-like particles are released (14, 130). Furthermore, studies with polarized cells, where the transport of glycoproteins to the cell surface is limited to a selected surface, have also shown that virus budding only occurs through the surface containing the glycoproteins (131, 132). These observations are further supported by studies with temperature-sensitive SINV and SFV mutants that are defective in the transport of glycoproteins to the cell surface (133). ts23 of SINV, which has two substitutions in E1 (A106T and R267Q), is defective in glycoprotein transport at the non-permissive temperature, and its glycoproteins remain in the ER (11, 70, 134). The NCs in this mutant are distributed freely in the cytoplasm and seem unable to associate with membranes or form budding particles at this temperature (68, 135).
At the permissive temperature, the transport defect is corrected and the association of the NC with the glycoproteins is resumed, releasing virus at high levels. Similar results were observed for a temperature-sensitive mutant in SFV (106). These data stress the importance of the NC-glycoprotein interaction during the budding process of these viruses. The final destination of these membrane-bound NCs is the plasma membrane, where the budding process has been found to occur synchronously in specific patches on the membrane (136, 137). This preferential clustering of budding structures is due to the localized concentration of membrane-bound spikes in these regions. This phenomenon has been observed through immunogold labeling and electron microscopy of thin sections of infected cells (138-140). It has been shown that these patches of spikes are almost exclusively associated with budding structures and that these regions are devoid of host proteins (139-141). As mentioned previously, the membrane composition of the virus particles is distinct from the membrane composition of the cellular membrane, and it has been suggested that the patching phenomenon mentioned above is partially responsible for the selection of membrane components during envelopment (42).
Structural Evidence for the CP-E2 Interaction

Several crystal structures of the SINV CP have shown that the N-terminal arm of one CP monomer binds into a hydrophobic pocket on an adjacent CP monomer (15, 142). Residues within this N-terminal arm (SINV residues 108-110) show similarity to the region of cdE2 (residues 400-402) that specifically interacts with the NC during spike incorporation (15-17). Therefore, it has been proposed that the arm-pocket interaction observed in these structures plays a role during the early stages of assembly (discussed under NC assembly), and also mimics the interaction that occurs between the NC and cdE2 during the final stages of virus assembly (15, 17). The hydrophobic pocket is situated between two subdomains in the CP and is bordered by Y180, W247 and F166 (SINV numbering) (15). Following NC assembly, the hydrophobic pocket is oriented towards the surface of the NC, making the N-terminal arm too short to maintain its interaction with the pocket (15, 143). It has been suggested that following NC assembly, the arm is displaced from the pocket so that the pocket is now available for cdE2 docking. Fitting the atomic coordinates of the alphavirus CPs into the cryo-EM reconstructions of RRV and SFV has provided a location for the hydrophobic pocket in the NC (23, 25). These fits placed the hydrophobic pocket ~28 Å from the site of penetration of E2 through the inner leaflet of the lipid bilayer. This distance corresponds well with the distance spanned by amino acids 391-400 of cdE2 and places the hydrophobic pocket in a position favorable for cdE2 docking (15). The current crystallographic model for the NC-cdE2 interaction places Y400 and L402 (SINV cdE2) in the hydrophobic pocket of the CP. In this model, A401 of cdE2 is exposed on the surface and does not make any significant contacts with the CP.
Furthermore, this model also proposes that the side of the pocket that binds L402 is shorter in length than the side that binds Y400. This is supported by computer modeling studies that substituted a tyrosine for L402, and showed steric clashes between Y402 of the mutated cdE2, and M137 and F166 in the pocket (17).

Molecular Genetic and Biochemical Studies of the NC-cdE2 Interaction

Sequences within cdE2 are conserved among the alphaviruses, with residues within the 398-403 region being invariant (see Table 1). A 31 amino acid synthetic peptide corresponding to the complete sequence of the SFV cdE2, and a shorter peptide corresponding to the N-terminal 13 amino acids of cdE2, were tested for their binding properties to viral and cytoplasmic NCs. These peptides, either free in solution or coupled to solid supports, were observed to interact with both types of NCs, although the longer peptide was favored over the shorter one for binding. Furthermore, oligomeric forms of the longer peptides were favored for binding over the monomeric peptides (27). In a separate experiment, Collier and coworkers added synthetic peptides of varying lengths to SINV-infected cells and assayed for virus release (144). They observed that a 25 amino acid peptide corresponding to residues 391-415 of cdE2 of SINV inhibited virus release by 50-70% at peptide concentrations of 1 mg/ml. Analysis of shorter overlapping peptides with the same assay indicated that the shortest inhibitory sequence,
LTPYAL, corresponding to residues 397-402, inhibited virus release by 70% at peptide concentrations of 1 mg/ml. Mutagenesis of residues within this 6 amino acid sequence (residues 397-402) has further supported its involvement in NC binding. Substitution of L402 in SINV with residues of varying side-chain character reduced the efficiency of virus release by 100- to 100,000-fold compared to wild type, with more detrimental effects occurring when the substitution was less hydrophobic (17). Y400 was also analyzed in a similar manner and proven critical for virus budding (30). Interestingly, substitution of A401 did not affect virus release or plaque phenotype significantly (145). These data support the crystallographic model presented above by showing that Y400 and L402 are critical, but A401 is dispensable for the cdE2-NC interaction (17, 145). This is further supported by the observation that Y400 and L402 are invariant among the alphaviruses, while A401 lacks conservation (Table 1).
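The residue labels used for these mutants can be checked against the LTPYAL sequence with a short Python sketch. It assumes only what the text states, namely that LTPYAL spans residues 397-402 of SINV cdE2. The helper names are hypothetical, and the code does no more than bookkeeping on mutation labels such as Y400F.

```python
import re

PEPTIDE = "LTPYAL"      # shortest inhibitory cdE2 sequence quoted in the text
FIRST_RESIDUE = 397     # residue number of the first position (SINV numbering)


def parse_mutation(label):
    """Split a label such as 'Y400F' into (wild type, position, substitution)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def mutate(peptide, first_residue, label):
    """Apply one substitution, checking the wild-type residue and the range."""
    wild_type, position, substitution = parse_mutation(label)
    index = position - first_residue
    if not 0 <= index < len(peptide):
        raise ValueError(f"{label} lies outside residues "
                         f"{first_residue}-{first_residue + len(peptide) - 1}")
    if peptide[index] != wild_type:
        raise ValueError(f"{label}: residue {position} is {peptide[index]}, "
                         f"not {wild_type}")
    return peptide[:index] + substitution + peptide[index + 1:]


if __name__ == "__main__":
    # Substitutions within residues 397-402 that are named in this section.
    for label in ("T398A", "P399G", "Y400F", "A401I", "L402Y"):
        print(label, mutate(PEPTIDE, FIRST_RESIDUE, label))
```

A label whose wild-type letter does not match the sequence, or whose position falls outside 397-402, raises an error, so the sketch doubles as a consistency check on the numbering.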
Mutagenesis of several other regions of the cdE2 has also demonstrated its importance in NC binding (16, 102, 103, 146). A series of mutants, including P399G, P404G, A401I, A401K and C461A, released normal amounts of virus, although at a slower rate, and demonstrated slight defects in PE2-6K cleavage. The C417A mutation was extremely detrimental to virus replication (103, 147) and the double mutant, C416S/C417A, failed to produce virus (102). Substitution of A401I, C396S and Y400F produced multi-cored particles (102, 103). Revertants isolated from these mutants, C417A/T256M, and P404G/S182N, mapped close to the hydrophobic pocket of the CP
(147). The three cysteine residues included in the above study have been previously proposed to be palmitoylated and have been implicated in orienting the cdE2 in the membrane in a conformation favorable for binding the NC (100-103). Therefore, it has been suggested that mutagenesis of these residues alters the conformation of cdE2 and affects its interaction with the NC (102, 103). The two residues in the cytoplasmic domain of E1 are dispensable for NC binding (29). Through analysis of mutants at T398, Y400 and S420, Liu and Brown have proposed that phosphorylation is important for retracting the cdE2 from the lumen of the ER and orienting it in a conformation favorable for NC binding (61, 62). A double mutant, T398A/Y400N, prevented phosphorylation of PE2 or E2, and retained the cdE2 in the membrane. As a result, the double mutation inhibited virus release, although glycoproteins were transported normally to the cell surface. Revertants of this mutant retained the 398 substitution, but had a five amino acid deletion (residues 401-405) that restored a tyrosine at position 400, and valine at position 402. This revertant restored virus release to wild-type levels (61, 62). Further evidence for the pocket-cdE2 interaction is provided by the analysis of chimeric viruses where the glycoproteins and CP are derived from different viruses (148, 149). Substitution of the RRV CP with the SINV CP renders the virus nonviable, although NC assembly remains unaffected. Adapting the cdE2 sequence of RRV to that of SINV improved viability by 10,000-fold (149).

BUDDING AND VIRION RELEASE

Cholesterol Requirements During Alphavirus Budding

Studies on alphavirus fusion have demonstrated that cholesterol is required in the target membranes for efficient fusion to occur (150). Optimal fusion efficiency requires 33 mole-percent of cholesterol in the bilayer (151).
Sphingolipid is also required, although only at 2-5 mole-percent, with ceramide being the minimal sphingolipid structure tolerated (152, 153). This sensitivity to membrane composition has been attributed to the direct interaction of cholesterol and sphingolipids with E1, where cholesterol is involved in the initial low pH binding of the spikes to membranes, and sphingolipid is involved in subsequent fusion (152, 154-156). Interestingly, cholesterol is also required during the exit of these viruses from infected cells (157). Studies with control and sterol-depleted C6/36 mosquito cells have demonstrated that SFV released only 3-4% of infectious virus from sterol-depleted cells compared to control cells (157). Sterol-independent srf (sterol requirement in function) mutants (157, 158), or vesicular stomatitis virus (which does not require cholesterol for exit or fusion), released infectious virus similarly in both control and sterol-depleted cells (157-159). Using radiolabeled spike proteins derivatized with biotin and isolating biotin-tagged virions using magnetic streptavidin particles, Lu and Kielian (160) have shown that cholesterol-dependent mutants were deficient in spike protein incorporation into particles, but were not affected in E1-E2 heterodimerization, or spike protein transport to the cell surface. These studies also correlated an increased turnover of spike proteins with a deficiency in virus budding.
This cholesterol dependence has been localized to a region of 12 residues between 224-235 (SFV numbering) in the E1 protein (157, 158, 161). The srf mutants discussed above have mutations within this region that enable them to function in a cholesterol-depleted environment. Analysis of single-site mutants has indicated that it is the structure of the entire region, rather than specific amino acids, that influences cholesterol dependence (161). Furthermore, analyses of these mutants have demonstrated that in the absence of cholesterol, the virus particles are highly unstable and dissociate under shear force, indicating that cholesterol plays a critical role in virus structure. Although their exact role is as yet unidentified, the above studies have suggested that cholesterol and sphingolipid aid in the formation of proper lateral interactions among the spike subunits that are critical for efficient NC-E2 interactions, structural integrity and fusion competency.

Lateral Interactions Between Glycoproteins Play a Role in Virus Budding
It has been proposed that the energy required for budding comes from the thermodynamics of burying the cdE2 sequences into the hydrophobic pocket on the surface of the NC (14-17). This interaction occurs 240 times, as each trimeric spike contacts three CPs. Several groups have indicated that lateral interactions between the glycoproteins also play an important role in the budding process (162, 163). von Bonsdorff and Harrison have shown that in the absence of interaction with the NCs, the glycoproteins form hexagonal arrays similar to those observed in the viral surface lattice (164). This indicates that the T=4 arrangement of the glycoproteins is not solely due to their interaction with a T=4 NC, but that lateral interactions among glycoproteins also play an important role in lattice formation. Hahn and colleagues have shown that a change of A344 to valine (ts103) in the ectodomain of E2 causes virus to be released at a slower rate than the wild-type virus and produces multi-cored particles (165, 166). A revertant of this mutant maps to E1, with a K227 to methionine change in the ectodomain. The revertant, although still multi-cored, releases virus at a much faster rate, indicating that lateral interactions between E2 and E1 seem to support the budding process. Site-directed mutagenesis of the 6K protein has also indicated its involvement in these lateral interactions (105, 145, 167). Although only 1 molecule of 6K is found per 10 E1-E2 heterodimers, the complete deletion of the protein causes severe defects in budding and thermolability of the virus particles. Single-site mutants in the 6K gene produced multi-cored particles that resembled the ts103 mutant (105, 145). A revertant of one of these mutants corrected the aberrant particle formation and mapped to the ectodomain of E2. Another set of mutants constructed by Barth and Garoff suggested that deletion of the entire E1 protein disrupts the cdE2-NC interaction and prevents virus release (168).
Furthermore, E2 in these mutants, although properly processed and transported to the plasma membrane, degraded rapidly, with a small fraction forming aggregates. These data indicated that an intimate interaction exists between the individual members of the glycoprotein complex and that changes in the interface between the E1, E2 and 6K proteins could affect the conformation of cdE2 and its ability to bind NCs.
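The copy numbers quoted in this section follow from simple lattice arithmetic, sketched below in Python. The sketch assumes the standard rule that an icosahedral shell with triangulation number T contains 60T subunits, a rule that is not stated in the text above. The function and variable names are illustrative only, and the 6K figure is a rough estimate derived from the 1-per-10 ratio.

```python
def subunits_in_shell(t_number: int) -> int:
    """Number of subunits in an icosahedral shell, assuming the 60*T rule."""
    if t_number < 1:
        raise ValueError("triangulation number must be a positive integer")
    return 60 * t_number


T_NUMBER = 4  # T=4 lattice described in the text

# One cdE2-pocket contact per CP (and per E1-E2 heterodimer).
cp_copies = subunits_in_shell(T_NUMBER)

# Each trimeric spike contacts three CPs.
trimeric_spikes = cp_copies // 3

# About 1 molecule of 6K per 10 E1-E2 heterodimers (rough estimate).
six_k_copies = cp_copies // 10

print(f"cdE2-pocket contacts: {cp_copies}")
print(f"trimeric spikes:      {trimeric_spikes}")
print(f"approximate 6K count: {six_k_copies}")
```

Under these assumptions the 240 contacts correspond to 80 trimeric spikes, which is consistent with the statement that each spike contacts three CPs.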
In conclusion, virus release from the cell is dependent upon the interaction of the NC and the glycoproteins. This process is mediated through the specific interaction between the cdE2 and the surface of the NC. Specifically, a hydrophobic pocket in the CP bordered by Y180, W247 and F166 binds residues Y400 and L402 of cdE2. Other residues in cdE2 may play a role in positioning the cytoplasmic domain in an orientation favorable for the pocket-tail interaction. This positioning could occur through binding of these residues at a secondary, yet unidentified, site on the NC, or through the interaction of these residues with the lipid bilayer. Lateral interactions between the glycoproteins also play a significant role in NC-cdE2 binding and virus budding.

CONCLUSIONS AND FUTURE PERSPECTIVES

The data summarized in this chapter present the sum of our understanding of the assembly mechanism of the alphavirus virion. Significant gains have recently been made in the understanding of the mechanism of NC assembly through the use of an in vitro assembly system. The definition of a putative assembly intermediate, a nucleic acid bound dimer of the CP, has led to the development of a model of NC assembly based on polymerization of multiple dimeric CPs. Although significant portions of the assembly mechanism remain unclear, it should be possible, given the recent progress, to define the assembly process of the NC completely in the near future. A complete understanding of glycoprotein maturation and budding will require a high-resolution atomic structure of the components involved. Although the crystal structure of an alphavirus is not available, recent progress in solving the x-ray structures of viral proteins combined with higher resolution cryo-electron microscopy reconstructions will soon provide us with a detailed look at the molecular organization of this group of viruses.
These results will be extremely informative in deciphering the vast amount of genetic and biochemical data already accumulated on virus assembly as well as providing a roadmap for future studies on structure and function. Despite the simple architecture of the alphavirus virion, the goal of a comprehensive understanding of the particle and its assembly has yet to be achieved.

ACKNOWLEDGMENTS

This research was supported by Public Health Service grant GM56279 from the National Institutes of Health. T.L.T. was supported, in part, by NIH Biophysics training grant (GM98296). The authors thank Dr. Alexander Gorbalenya for assistance with the sequence alignments.

REFERENCES

1. Strauss, J. H. and Strauss, E. G. (1994) Microbiol. Rev. 58, 491-562.
2. Griffin, D. E. (1986) in The Togaviridae and Flaviviridae, (Schlesinger, S. and Schlesinger, M. J., eds.), pp. 209-250, Plenum Publishing Corp., New York, NY.
3. Kondor-Koch, C., Riedel, H., Soderberg, K. and Garoff, H. (1982) Proc. Nat. Acad. Sci. U.S.A. 79, 4525-4529.
4. Kondor-Koch, C., Burke, B. and Garoff, H. (1983) J. Cell Biol. 97, 644-651.
5. Singh, I. and Helenius, A. (1992) J. Virol. 66, 7049-7058.
6. Strauss, E. G., Rice, C. M. and Strauss, J. H. (1984) Virology 133, 92-110.
7. Hardy, W. R. and Strauss, J. H. (1988) J. Virol. 62, 998-1007.
8. Froshauer, S., Kartenbeck, J. and Helenius, A. (1988) J. Cell Biol. 107, 2075-2086.
9. Jones, K. J., Scupham, R. K., Pfeil, J. A., Wan, K., Sagik, B. P. and Bose, H. R. (1977) J. Virol. 21, 778-787.
10. Ziemiecki, A., Garoff, H. and Simons, K. (1980) J. Gen. Virol. 50, 111-123.
11. Rice, C. M. and Strauss, J. H. (1982) J. Mol. Biol. 154, 325-348.
12. Söderlund, H. (1973) Intervirology 1, 354-361.
13. Söderlund, H. and Ulmanen, I. (1977) J. Virol. 24, 907-909.
14. Suomalainen, M., Liljeström, P. and Garoff, H. (1992) J. Virol. 66, 4737-4747.
15. Lee, S., Owen, K. E., Choi, H. K., Lee, H., Lu, G., Wengler, G., Brown, D. T., Rossmann, M. G. and Kuhn, R. J. (1996) Structure 4, 531-541.
16. Skoging, U., Vihinen, M., Nilsson, L. and Liljeström, P. (1996) Structure 4, 519-529.
17. Owen, K. E. and Kuhn, R. J. (1997) Virology 230, 187-196.
18. Ekstrom, M., Liljeström, P. and Garoff, H. (1994) EMBO J. 13, 1058-1064.
19. Forsell, K., Griffiths, G. and Garoff, H. (1996) EMBO J. 15, 6495-505.
20. Harrison, S. C., Strong, R. K., Schlesinger, S. and Schlesinger, M. J. (1992) J. Mol. Biol. 226, 277-280.
21. Fuller, S. D. (1987) Cell 48, 923-934.
22. Paredes, A. M., Brown, D. T., Rothnagel, R., Chiu, W., Schoepp, R. J., Johnston, R. E. and Prasad, B. V. V. (1993) Proc. Nat. Acad. Sci. U.S.A. 90, 9095-9099.
23. Cheng, R. H., Kuhn, R. J., Olson, N. H., Rossmann, M. G., Choi, H.-K., Smith, T. J. and Baker, T. S. (1995) Cell 80, 621-630.
24. Fuller, S. D., Berriman, J. A., Butcher, S. J. and Gowen, B. E. (1995) Cell 81, 715-725.
25. Mancini, E. J., Clarke, M., Gowen, B. E., Rutten, T. and Fuller, S. D. (2000) Mol. Cell 5, 255-266.
26. Rice, C. M., Bell, J. R., Hunkapiller, M. W., Strauss, E. G. and Strauss, J. H. (1982) J. Mol. Biol. 154, 355-378.
27. Metsikkö, K. and Garoff, H. (1990) J. Virol. 64, 4678-4683.
28. Kail, M., Hollinshead, M., Ansorge, W., Pepperkok, R., Frank, R., Griffiths, G. and Vaux, D. (1991) EMBO J. 10, 2343-2351.
29. Barth, B. U., Suomalainen, M., Liljeström, P. and Garoff, H. (1992) J. Virol. 66, 7560-7564.
30. Zhao, H., Lindqvist, B., Garoff, H., von Bonsdorf, C. H. and Liljeström, P. (1994) EMBO J. 13, 4204-4211.
31. Coombs, K., Brown, B. and Brown, D. T. (1984) Virus Res. 1, 297-302.
32. Boege, U., Wengler, G., Wengler, G. and Wittman-Liebold, B. (1981) Virology 113, 293-303.
33. Choi, H.-K., Tong, L., Minor, W., Dumas, P., Boege, U., Rossmann, M. G. and Wengler, G. (1991) Nature 354, 37-43.
34. Choi, H.-K., Lee, S., Zhang, Y.-P., McKinney, B. R., Wengler, G., Rossmann, M. G. and Kuhn, R. J. (1996) J. Mol. Biol. 262, 151-167.
35. Hahn, C. S., Strauss, E. G. and Strauss, J. H. (1985) Proc. Nat. Acad. Sci. U.S.A. 82, 4648-4652.
36. Hahn, C. S. and Strauss, J. H. (1990) J. Virol. 64, 3069-3073.
37. Forsell, K., Suomalainen, M. and Garoff, H. (1995) J. Virol. 69, 1556-1563.
38. Glanville, N. and Ulmanen, J. (1976) Biochem. Biophys. Commun. 71, 393-399.
39. Wengler, G., Boege, U., Wengler, G., Bischoff, H. and Wahn, K. (1982) Virology 118, 401-410.
40. Wengler, G., Wengler, G., Boege, U. and Wahn, K. (1984) Virology 132, 401-412.
41. Tellinghuisen, T. L., Hamburger, A. E., Fisher, B. R., Ostendorp, R. and Kuhn, R. J. (1999) J. Virol. 73, 5309-5319.
42. Schlesinger, M. and Schlesinger, S. (1986) in The Togaviridae and Flaviviridae, (Schlesinger, S. and Schlesinger, M. J., eds.) pp. 121-148, Plenum Publishing Corp., New York, NY.
43. Weiss, B., Nitschko, H., Ghattas, I., Wright, R. and Schlesinger, S. (1989) J. Virol. 63, 5310-5318.
44. Weiss, B., Geigenmüller-Gnirke, U. and Schlesinger, S. (1994) Nucl. Acids Res. 22, 780-786.
45. Owen, K. E. and Kuhn, R. J. (1996) J. Virol. 70, 2757-2763.
46. Ulmanen, I., Söderlund, H. and Kääriäinen, L. (1976) J. Virol. 20, 203-210.
47. Wengler, G. and Wengler, G. (1984) Virology 134, 435-442.
48. Wengler, G., Würkner, D. and Wengler, G. (1992) Virology 191, 880-888.
49. Garoff, H., Frischauf, A. M., Simons, K., Lehrach, H. and Delius, H. (1980) Proc. Nat. Acad. Sci. U.S.A. 77, 6376-6380.
50. Wengler, G. (1987) Arch. Virol. 94, 1-14.
51. Rossmann, M. G. and Johnson, J. E. (1989) Annu. Rev. Biochem. 58, 533-573.
52. Geigenmüller-Gnirke, U., Nitschko, H. and Schlesinger, S. (1993) J. Virol. 67, 1620-1626.
52a. Perera, R., Owen, R. E., Tellinghuisen, T. L., Gorbalenya, A. E. and Kuhn, R. J. (2001) J. Virol. 75, 1-10.
53. Coombs, K. and Brown, D. T. (1987) J. Mol. Biol. 195, 359-371.
54. Coombs, K. and Brown, D. T. (1987) Virus Res. 7, 131-49.
55. Coombs, K. M. and Brown, D. T. (1989) J. Virol. 63, 883-891.
56. Tellinghuisen, T. L. and Kuhn, R. J. (2000) J. Virol. 74, 4302-4309.
56a. Tellinghuisen, T. L., Perera, R. and Kuhn, R. J. (2001) J. Virol. 75 (in press).
57. Sefton, B. M. (1977) Cell 10, 659-668.
58. Garoff, H., Kondor-Koch, C., Petterson, R. and Burke, B. (1983) J. Cell Biol. 97, 652.
59. Garoff, H., Huylebroeck, D., Robinson, A., Tillman, U. and Liljeström, P. (1990) J. Cell Biol. 111, 867-76.
60. Beltzer, J. P., Fiedler, K., Fuhrer, C., Geffen, I., Handschin, C., Wessels, H. P. and Spiess, M. (1991) J. Biol. Chem. 266, 973-978.
61. Liu, N. and Brown, D. T. (1993) J. Cell Biol. 120, 877-883.
62. Liu, N. and Brown, D. T. (1993) Virology 196, 703-711.
63. Welch, W. J. and Sefton, B. M. (1980) J. Virol. 33, 230.
64. Hashimoto, K., Erdei, S., Keränen, S., Saraste, J. and Kääriäinen, L. (1981) J. Virol. 38, 34-40.
65. Melancon, P. and Garoff, H. (1986) EMBO J. 5, 1551-1560.
66. Liljeström, P. and Garoff, H. (1991) J. Virol. 65, 147-154.
67. Migliaccio, G., Pascale, M. C., Leone, A. and Bonatti, S. (1989) Exp. Cell Res. 185, 203-16.
68. Strauss, E. G. and Strauss, J. H. (1986) in The Togaviridae and Flaviviridae, (Schlesinger, S. and Schlesinger, M. J., eds.) pp. 35-90, Plenum Publishing Corp., New York, NY.
69. Melancon, P. and Garoff, H. (1987) J. Virol. 61, 1301-1309.
70. Erwin, C. and Brown, D. T. (1980) J. Virol. 36, 775.
71. Strauss, E. G. and Strauss, J. H. (1985) in Virus Structure and Assembly, (Casjens, S., ed.) pp. 205-234, Jones and Bartlett Publishers, Boston, MA.
72. Kinney, R. M., Johnson, R. J. B., Brown, V. L. and Trent, D. W. (1986) Virology 152, 400-413.
73. de Curtis, I. and Simons, K. (1988) Proc. Nat. Acad. Sci. U.S.A. 85, 8052-8056.
74. Bonatti, S., Migliaccio, G. and Simons, K. (1989) J. Biol. Chem. 264, 12590-12595.
75. Watson, D. G., Moehring, J. M. and Moehring, T. J. (1991) J. Virol. 65, 2332-2339.
76. Scharer, C. G., Nairn, H. Y. and Koblet, H. (1993) Arch. Virol. 132, 237-254.
77. Nairn, H. Y. and Koblet, H. (1988) Arch. Virol. 102, 73-89.
78. Russell, D. L., Dalrymple, J. M. and Johnston, R. E. (1989) J. Virol. 63, 1619-1629.
79. Presley, J. F. and Brown, D. T. (1989) J. Virol. 63, 1975-1980.
80. Lobigs, M. and Garoff, H. (1990) J. Virol. 64, 1233-1240.
81. Salminen, A., Wahlberg, J. M., Lobigs, M., Liljeström, P. and Garoff, H. (1992) J. Cell Biol. 116, 349-357.
82. Dubuisson, J. and Rice, C. M. (1993) J. Virol. 67, 3363-3374.
83. Heidner, H. W., McKnight, K. L., Davis, N. L. and Johnston, R. E. (1994) J. Virol. 68, 2683-2692.
84. Davis, N. L., Brown, K. W., Greenwald, G. F., Zajac, A. J., Zacny, V. L., Smith, J. F. and Johnston, R. E. (1995) Virology 212, 102-110.
85. Robbins, P. W., Hubbard, S. C., Turco, S. J. and Wirth, D. F. (1977) Cell 12, 893-900.
86. Schlesinger, S., Koyama, A. H., Malfer, C., Gee, S. L. and Schlesinger, M. J. (1985) Virus Res. 2, 139-149.
87. Kornfeld, R. and Kornfeld, S. (1985) Annu. Rev. Biochem. 54, 631.
88. Mayne, J. T., Bell, J. R., Strauss, E. G. and Strauss, J. H. (1985) Virology 142, 121-133.
89. Davidson, S. K. and Hunt, L. A. (1985) J. Gen. Virol. 66, 1457-1468.
90. Burke, D. J. and Keegstra, K. (1976) J. Virol. 20, 676-686.
91. Rice, C. M. and Strauss, J. H. (1981) J. Mol. Biol. 150, 315-340.
92. Hsieh, P., Rosner, M. R. and Robbins, P. W. (1983) J. Biol. Chem. 258, 2548-2554.
93. Hsieh, P., Rosner, M. R. and Robbins, P. W. (1983) J. Biol. Chem. 258, 2555-2561.
94. Pollack, L. and Atkinson, P. H. (1983) J. Cell Biol. 97, 293-300.
95. Schmidt, M. F. G., Bracha, M. and Schlesinger, M. J. (1979) Proc. Nat. Acad. Sci. U.S.A. 76, 1687-1691.
96. Schmidt, M. F. G. (1982) Virology 116, 327.
97. Berger, M. and Schmidt, M. F. G. (1984) J. Biol. Chem. 259, 7245-7252.
98. Schmidt, M. F. G. and Schlesinger, M. J. (1980) J. Biol. Chem. 255, 3334-3339.
99. Schmidt, M., Schmidt, M. F. G. and Rott, R. (1988) J. Biol. Chem. 263, 18635-18639.
100. Ryan, C., Ivanova, L. and Schlesinger, M. J. (1998) Virology 249, 62-67.
101. Roman, L. M. and Garoff, H. (1986) J. Cell Biol. 103, 2607-2618.
102. Gaedigk-Nitschko, K. and Schlesinger, M. J. (1991) Virology 183, 206-214.
103. Ivanova, L. and Schlesinger, M. J. (1993) J. Virol. 67, 2546-2551.
104. Gaedigk-Nitschko, K. and Schlesinger, M. J. (1990) Virology 175, 274-281.
105. Gaedigk-Nitschko, K., Ding, M. and Schlesinger, M. J. (1990) Virology 175, 282-291.
106. Saraste, J., von Bonsdorff, C.-H., Hashimoto, K., Kääriäinen, L. and Keränen, S. (1980) Virology 100, 229-245.
107. Kääriäinen, L., Virtanen, I., Saraste, J. and Keränen, S. (1983) Meth. Enzymol. 96, 453-465.
108. Saraste, J. and Hedman, K. (1983) EMBO J. 2, 2001-2006.
109. Saraste, J. and Kuismaner, E. (1984) Cell 38, 535-549.
110. Arias, C., Bell, J. R., Lenches, E. M., Strauss, E. G. and Strauss, J. H. (1983) J. Mol. Biol. 168, 87-102.
111. Lindqvist, B. H., DiSalvo, J., Rice, C. M., Strauss, J. H. and Strauss, E. G. (1986) Virology 151, 10-20.
112. Presley, J. F., Draper, R. K. and Brown, D. T. (1991) J. Virol. 65, 1332-1339.
113. Carleton, M. and Brown, D. T. (1996) J. Virol. 70, 5541-5547.
114. Bergmann, J. E., Tokuyasu, K. T. and Singer, S. J. (1981) Proc. Nat. Acad. Sci. U.S.A. 78, 1746-1750.
115. Bergeron, J. J., Kotwal, G. J., Levine, G., Bilan, P., Rachubinski, R., Hamilton, M., Shore, G. C. and Ghosh, H. P. (1982) J. Cell Biol. 94, 36-41.
116. Green, J., Griffiths, G., Louvard, D., Quinn, P. and Warren, G. (1981) J. Mol. Biol. 152, 663-698.
117. Griffiths, G., Brands, R., Burke, B., Louvard, D. and Warren, G. (1982) J. Cell Biol. 95, 781-792.
118. Griffiths, G., Quinn, P. and Warren, G. (1983) J. Cell Biol. 96, 835-850.
119. Quinn, P., Griffiths, G. and Warren, G. (1983) J. Cell Biol. 96, 851-856.
120. Rindler, M. J., Ivanov, I. E., Plesken, H., Rodriguez-Boulan, E. and Sabatini, D. D. (1984) J. Cell Biol. 98, 1304-1319.
121. Syväoja, P., Peränen, J., Suomalainen, M., Keränen, S. and Kääriäinen, L. (1990) Virology 179, 658-666.
122. Doms, R. W., Lamb, R. A., Rose, J. K. and Helenius, A. (1993) Virology 193, 545-562.
123. Carleton, M. and Brown, D. T. (1997) J. Virol. 71, 7696-7703.
124. Huth, A., Rapoport, T. A. and Kääriäinen, L. (1984) EMBO J. 3, 767-771.
125. Rice, C. M., Franke, C. A., Strauss, J. H. and Hruby, D. E. (1985) J. Virol. 56, 227-239.
126. Reidel, H. (1985) J. Virol. 54, 224-228.
127. Wen, D., Ding, M. and Schlesinger, M. J. (1986) Virology 153, 150-154.
128. Lobigs, M., Wahlberg, J. M. and Garoff, H. (1990) J. Virol. 64, 5214-5218.
129. Carleton, M., Lee, H., Mulvey, M. and Brown, D. T. (1997) J. Virol. 71, 1558-1566.
130. Zhao, H. and Garoff, H. (1992) J. Virol. 66, 7089-7095.
131. Nitsch, L., Tramontano, D., Ambesi-Impiombato, F. S., Quarto, N. and Bonatti, S. (1985) Eur. J. Cell Biol. 38, 57-66.
132. Zurzolo, C., Polistina, C., Saini, M., Gentile, R., Aloj, L., Migliaccio, G., Bonatti, S. and Nitsch, L. (1992) J. Cell Biol. 117, 551-564.
133. Strauss, E. G., Lenches, E. M. and Stamreich-Martin, M. A. (1980) J. Gen. Virol. 49, 297-307.
134. Knipfer, K. W. and Brown, D. T. (1989) Virology 170, 117-122.
135. Brown, D. T. and Smith, J. F. (1975) J. Virol. 15, 1262-1266.
136. Birdwell, C. R., Strauss, E. G. and Strauss, J. H. (1973) Virology 56, 429-438.
137. Brown, D. T., Waite, M. R. F. and Pfefferkorn, E. R. (1972) J. Virol. 10, 524-536.
138. Smith, J. F. and Brown, D. T. (1977) J. Virol. 22, 662-678.
139. Pavan, A., Lotti, L. V., Torrisi, M. R., Migliaccio, G. and Bonatti, S. (1987) Exp. Cell Res. 168, 53-62.
140. Pavan, A., Covelli, E., Pascale, M. C., Lucania, G., Bonatti, S., Pinto da Silva, P. and Torrisi, M. R. (1992) J. Cell Sci. 102, 149-155.
141. Strauss, E. G. (1978) J. Virol. 28, 466-474.
142. Choi, H. K., Lee, S., Zhang, Y. P., McKinney, B. R., Wengler, G., Rossmann, M. G. and Kuhn, R. J. (1996) J. Mol. Biol. 262, 151-167.
143. Lee, S., Kuhn, R. J. and Rossmann, M. G. (1998) Proteins 33, 311-317.
144. Collier, N. C., Adams, S. P., Weingarten, H. and Schlesinger, M. J. (1992) Antivir. Chem. Chemother. 3, 31-36.
145. Ivanova, L., Le, L. and Schlesinger, M. J. (1995) Virus Res. 39, 165-79.
146. Lee, H., Ricker, P. D. and Brown, D. T. (1994) Virology 204, 471-474.
147. Ryan, C., Ivanova, L. and Schlesinger, M. J. (1998) Virology 243, 380-387.
148. Hahn, C. S., Lustig, S., Strauss, E. G. and Strauss, J. H. (1988) Proc. Nat. Acad. Sci. U.S.A. 85, 5997-6001.
149. Lopez, S., Yao, J.-S., Kuhn, R. J., Strauss, E. G. and Strauss, J. H. (1994) J. Virol. 68, 1316-1323.
150. White, J. and Helenius, A. (1980) Proc. Nat. Acad. Sci. U.S.A. 77, 3273-3277.
151. Kielian, M. C. and Helenius, A. (1984) J. Virol. 52, 281-283.
152. Nieva, J. L., Bron, R., Corver, J. and Wilschut, J. (1994) EMBO J. 13, 2797-2804.
153. Moesby, L., Corver, J., Erukulla, R. K., Bittman, R. and Wilschut, J. (1995) Biochemistry 34, 10319-10324.
154. Kielian, M. C. and Helenius, A. (1985) J. Cell Biol. 101, 2284-2291.
155. Kielian, M., Jungerwirth, S., Sayad, K. U. and DeCandido, S. (1990) J. Virol. 64, 4614-4624.
156. Wilschut, J., Corver, J., Nieva, J. L., Bron, R., Moesby, L., Reddy, K. C. and Bittman, R. (1995) Mol. Membr. Biol. 12, 143-149.
157. Marquardt, M. T., Phalen, T. and Kielian, M. (1993) J. Cell Biol. 123, 57-65.
158. Vashishtha, M., Phalen, T., Marquardt, M. T., Ryu, J. S., Ng, A. C. and Kielian, M. (1998) J. Cell Biol. 140, 91-99.
159. Eidelman, O., Schlegel, R., Tralka, T. S. and Blumenthal, R. (1984) J. Biol. Chem. 259, 4622-4628.
160. Lu, Y. E. and Kielian, M. (2000) J. Virol. 74, 7708-7719.
161. Lu, Y. E., Cassese, T. and Kielian, M. (1999) J. Virol. 73, 4272-4278.
162. Strauss, J. H., Strauss, E. G. and Kuhn, R. J. (1995) Trends Microbiol. 3, 346-350.
163. Garoff, H., Hewson, R. and Opstelten, D.-J. E. (1998) Microbiol. Mol. Biol. Rev. 62, 1171-1190.
164. von Bonsdorff, C. H. and Harrison, S. C. (1975) J. Virol. 16, 141-145.
165. Strauss, E. G., Birdwell, C. R., Lenches, E. M., Staples, S. E. and Strauss, J. H. (1977) Virology 82, 122-149.
166. Hahn, C. S., Rice, C. M., Strauss, E. G., Lenches, E. M. and Strauss, J. H. (1989) J. Virol. 63, 3459-3465.
167. Loewy, A., Smyth, J., von Bonsdorff, C. H., Liljeström, P. and Schlesinger, M. J. (1995) J. Virol. 69, 469-475.
168. Barth, B. U. and Garoff, H. (1997) J. Virol. 71, 7857-7865.
ENZYME AND PATHWAY ENGINEERING FOR SUICIDE GENE THERAPY
Margaret E. Black Department of Pharmaceutical Sciences P. O. Box 646534 Washington State University Pullman, WA 99164-6534
INTRODUCTION

In 1986, Moolten proposed that genetic modification of tumor cells could impart a novel drug sensitivity that would then allow cell killing by exposure to a prodrug (1). Two primary therapeutic or "suicide" genes account for the majority of clinical gene therapy trials for cancer: thymidine kinase from Herpes simplex virus type 1 and cytosine deaminase from E. coli. Both of these enzymes are absent in human cells and possess unique activities. While significant advances have been made over the last decade to improve the delivery and specific expression of suicide genes, much improvement remains to be achieved before this therapy becomes a mainstay. One focal point for improving suicide gene therapy is the gene itself. With enzyme engineering and pathway engineering, novel genes have been created that demonstrate dramatic therapeutic effects in the presence of the appropriate prodrug. In this chapter, two suicide genes (HSV-1 thymidine kinase and cytosine deaminase) are discussed with regard to the unique biochemical features and activities that make them excellent candidates for suicide gene therapy. While both enzymes demonstrate compelling results in preclinical evaluations, there are drawbacks that restrict their efficacy in vivo. A discussion of these limitations is presented, as well as an overview of technologies used to improve these genes and results obtained from mutagenesis and analysis of the HSV-1 thymidine kinase mutants.

Genetic Engineering, Volume 23, Edited by J. K. Setlow, Kluwer Academic / Plenum Publishers, 2001
HSV-1 Thymidine Kinase

Unlike the majority of prokaryote and virus-encoded nucleoside metabolizing enzymes that display similar activities to their eukaryotic counterparts, thymidine kinases from Herpes viruses are structurally and functionally distinct (Figure 1). The thymidine kinase (TK) from Herpes simplex virus type 1 (HSV-1) has been extensively studied and is the most widely used TK to date. Structurally, HSV-1 TK is a ~90 kDa homodimer composed of 45 kDa monomeric subunits (2). This contrasts with the ~80 kDa homotetrameric enzymes (4 x 20 kDa) of all non-herpes sources (3). While non-herpes TKs are strictly thymidine kinases, able to phosphorylate only thymidine and closely related analogs (4), the TK from HSV-1 is multifunctional. HSV-1 TK can phosphorylate thymidine, thymidylate (dTMP), deoxycytidine and a variety of pyrimidine and guanosine analogs (5-9). HSV-1 TK is also much less sensitive to feedback inhibition by dTTP than other thymidine kinases (10). From a clinical perspective, the enzyme’s ability to bind to and phosphorylate guanosine analogs is of particular importance. Such biochemical features have made HSV-1 TK a target for antiviral drug development and, more recently, a suicide agent for gene therapy of cancers.
The relaxed substrate specificity displayed by HSV-1 TK is the basis for the use of nucleoside analogs in the treatment of herpetic infections. Antiviral drugs such as acyclovir (ACV) and ganciclovir (GCV) are guanosine analogs that are selectively phosphorylated by HSV-1 TK (Figure 2). A second phosphorylation step is carried out by the cellular guanylate kinase (11). After phosphorylation to the triphosphate by nucleoside diphosphokinases, the analogs are incorporated into DNA and prevent nascent DNA synthesis (12).
The ability to phosphorylate guanosine analogs effectively makes HSV-1 TK particularly attractive for gene therapy. In suicide gene therapy applications (Figure 3), the suicide gene is introduced into tumor cells and the individual is treated systemically with the appropriate prodrug. With HSV-1 TK as the suicide gene, the prodrug GCV is phosphorylated by the suicide gene product, thereby providing the substrate for further conversion of the prodrug to an active drug by endogenous enzymes (11). In the triphosphate state, GCV acts like a chain terminator to inhibit DNA synthesis and subsequently causes the cells to undergo apoptosis (12, 13). In 1992, Culver et al. demonstrated complete tumor regression in a xenograft mouse model of glioma using the HSV-1 TK/GCV approach (14). Today many ongoing clinical trials utilize the HSV-1 TK/GCV paradigm for a wide variety of cancers.
Critical to the success of suicide genes in ablating tumors is a phenomenon generally known as the bystander effect (13, 15-18). The best delivery systems are able to introduce genes into only 10-30% of the target cells, yet the entire tumor is observed to regress. The ability of suicide gene-expressing cells to induce the death of neighboring cells that lack the suicide gene, the bystander effect, is therefore a potent and essential factor in the success of this type of therapy. The most extensively studied bystander effect is that which occurs with the introduction of HSV-1 TK and GCV treatment. Once phosphorylated by TK, GCV travels into neighboring cells via gap junctions and prevents DNA replication (15). Another component of the TK/GCV bystander effect consists of an immune-mediated response to the presence of a foreign antigen. The immunosuppressive action of GCV precludes its use as a fully effective prodrug. In addition, it is thought that phosphorylated GCV and/or TK itself is transported to neighboring cells via apoptotic vesicles derived from dying cells.
Currently over 45 clinical trials use HSV-1 TK in combination with GCV. Only two phase III gene therapy clinical trials are in progress; one of these uses HSV-1 TK/GCV as a therapeutic modality. The TK/GCV system is also being extended beyond cancer to treat a variety of diverse indications including restenosis (19), AIDS (20) and graft versus host disease (GVHD) (21, 22) and is being used in tumor imaging systems (23). Despite the broad use of HSV-1 TK and GCV, there are significant drawbacks. One such problem is associated with the enzyme itself. GCV is very inefficiently phosphorylated by the wild-type HSV-1 TK compared to the native substrate, thymidine (5, 24, 25). This requires that a high dose of GCV be administered to achieve a therapeutic effect, one that is immunosuppressive and is associated with toxic side-effects. A widely used anti-herpetic drug, acyclovir (ACV), is non-toxic even at doses greater than those given with GCV and is therefore a potential prodrug. However, HSV-1 TK displays an even worse Km for ACV than it does for GCV (5, 24) and has been shown not to function in suicide gene therapy, presumably because the affinity for ACV is so poor. Novel HSV-1 TKs with improved specificity towards GCV and/or ACV could enhance tumor cell killing without increased toxicity and, if ACV is used, potentially with no side-effects. Another approach to improve the efficacy of the HSV-1 TK/GCV system is to combine HSV-1 TK with guanylate kinase, the second enzyme in the prodrug activation pathway. Kinetic parameters suggest that the endogenous guanylate kinase creates a bottleneck once GCV has been initially phosphorylated by HSV-1 TK. Preliminary studies of a fusion protein containing combined HSV-1 TK and guanylate kinase activities suggest that the fusion enzyme significantly improves cell sensitivity to GCV (unpublished data).
Cytosine Deaminase

Cytosine deaminase (CD) from bacteria or yeast is also being investigated for gene therapy applications due to the absence of this gene in humans and the enzyme’s ability to deaminate 5-fluorocytosine (5FC) to produce 5-fluorouracil (5FU), a potent antimetabolite (26-30). In non-mammalian cells, cytosine deaminase is responsible for the deamination of cytosine to uracil (Figure 4). The prodrug, 5-fluorocytosine, is commonly used as an antifungal drug in the treatment of candidiasis (31). The product of 5FC deamination, 5FU, and its deoxyribonucleoside, fluorodeoxyuridine (FUdR), are potent inhibitors of DNA synthesis and are widely used in cancer treatment (31). Phosphorylation of FUdR by the endogenous thymidine kinase results in the production of the irreversible inhibitor of thymidylate synthase, 5FdUMP (Figure 4). In addition, 5FU can be incorporated into RNA by salvage pathways and interfere with the function of mRNAs. The bystander effect is also a key component of the success of cytosine deaminase/5FC for gene therapy, although the mechanism of the bystander effect differs from that associated with the HSV-1 TK/GCV system due to the biophysical nature of the activated prodrug, 5FU (17). Because 5FU is small and uncharged, it can pass freely through cell membranes such that cell types with limited gap junctions will be affected by 5FC treatment. This also means that 5FU can enter the blood stream and affect normal cells distant to the tumor site. Considerably less work has been done with CD for use in gene therapy than with HSV-1 TK and only a few early phase clinical trials with CD are ongoing. One difficulty has been the poor activity of the bacterial enzyme for 5FC (Km = 17.9 mM) (29). In 1997, the sequence of the Saccharomyces cerevisiae cytosine deaminase was reported (32, 33). While the yeast CD displays a much lower Km for 5FC, it is very thermolabile in comparison to the bacterial CD (29). Not only do these enzymes appear functionally distinct, but they are also structurally quite different from each other, specifically in regard to the primary amino acid sequence, monomeric size and subunit number (32, 33) (Figure 5). 5FC lacks significant side-effects when used in humans (31) but the overall inefficiency of gene delivery indicates that one limitation to the bacterial or yeast CD/5FC system is kinetic. While the yeast enzyme may have a greater affinity for 5FC, instability of the protein suggests that it is less than ideal.
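To illustrate why a millimolar Km can be a kinetic limitation, the sketch below evaluates the Michaelis-Menten expression v/Vmax = [S]/(Km + [S]) using the 17.9 mM value quoted above for the bacterial enzyme, read here as a Km. The substrate concentrations are hypothetical values chosen only for illustration and are not taken from the chapter; the function name is likewise illustrative.

```python
def fraction_of_vmax(substrate_mm: float, km_mm: float) -> float:
    """Michaelis-Menten velocity as a fraction of Vmax: [S] / (Km + [S])."""
    if substrate_mm < 0 or km_mm <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return substrate_mm / (km_mm + substrate_mm)


KM_BACTERIAL_CD_MM = 17.9  # value quoted in the text for bacterial CD and 5FC

# Hypothetical 5FC concentrations (mM), for illustration only.
HYPOTHETICAL_5FC_MM = (0.1, 1.0, 17.9)

for conc in HYPOTHETICAL_5FC_MM:
    frac = fraction_of_vmax(conc, KM_BACTERIAL_CD_MM)
    print(f"[5FC] = {conc:5.1f} mM -> v/Vmax = {frac:.3f}")
```

In this simple model the enzyme reaches half of its maximal rate only when the prodrug concentration equals the Km, so an enzyme with a lower Km would be expected to work closer to capacity at any given sub-saturating prodrug level.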
Both cytosine deaminase and thymidine kinase activities are used in the conversion pathway of 5FC to FdUMP. While phosphorylation of one intermediate, FUdR, is normally performed by endogenous thymidine kinase, the presence of HSV-1 TK can enhance both metabolic suicide and radiosensitivity in tumor cells when both 5FC and GCV are administered (34-37). This synergistic double suicide gene therapy approach has been examined to date only with the bacterial CD. Given that the yeast CD is reported to have a lower Km for 5FC, it is likely to provide an even greater enhancement than the bacterial enzyme. Furthermore, a combined therapy is more likely to overcome problems associated with drug resistance than a single gene therapy/prodrug modality.

Random Sequence Mutagenesis

The identification of amino acid residues by virtue of the types of amino acid residues that can replace the native residue and maintain function is one approach to understanding the structure-function relationships between enzymes and substrates. While structures obtained by x-ray crystallography provide snapshots of the interactions within the enzyme, they do not reflect the dynamics of conformational changes that may occur during substrate binding and catalysis. Additional approaches are required to determine the catalytic mechanisms and to define the potential of individual amino acid residues in substrate or analog interactions. Random sequence mutagenesis in concert with positive genetic selection provides a powerful method to generate large numbers of functional mutant enzymes, to evaluate the diversity of individual residues that can occur throughout the target region and to identify variants that display desired attributes (38-40). A major strength of random sequence selection is the direct application to protein structure-function studies. Implementation of this technique is especially informative when little structural information is known about the protein of interest.
It is important to note that the three-dimensional structure of only a very few proteins has been established. Furthermore, if one has information on catalytically active regions one can restrict random mutagenesis to these regions and perhaps devise strategies to select for new functions. Knowledge of the structure to function relationship of an enzyme can serve as the basis for creating enzymes with altered or novel functions. Indeed random sequence mutagenesis methods can be used not only for structure-function studies but also to generate such new functions. One of the major difficulties in this task is to evaluate the contribution of different amino acids and their placement to the function of a protein. At some positions side chain characteristics are essential for function; amino acids lacking these side chains cannot substitute (41, 42). In contrast, at other positions a wide variety of different residues may be tolerated at sites that are not necessary for structure or function (41). Positions with stringent requirements for specific residues are likely to contribute significantly to the structure and/or function of the protein and thus have high informational content. A primary means of identifying residues with high informational content is by the use of mutant proteins. Random sequence mutagenesis and selection utilizes oligonucleotides randomized at several contiguous or noncontiguous codons to introduce
all possible residues simultaneously at specific positions into a protein sequence (42-44). Recombinant molecules are used to transform E. coli and functional clones are identified on the basis of positive genetic complementation. A comparison of the nucleotide sequences among the active clones is used to evaluate the importance of each position and each amino acid in providing catalytic activity. Creation of a library with the potential to encode all possible residue permutations at a single codon or at multiple sites and identification of the altered residues from active clones is a rapid and efficient means of determining the informational content of different amino acids. In cases where there is little or no information regarding the position of the active site, techniques such as error-prone PCR and DNA shuffling have been developed and used with great success to introduce random mutations throughout the coding region of a gene (45-48). Furthermore, both of these methods provide the opportunity to select for genes with novel functions. We have used random sequence mutagenesis and DNA shuffling in combination with genetic selection to study HSV-1 TK and have obtained more than 5000 active mutants. The properties of newly evolved TKs include increased enzyme activity, altered thermostability, altered substrate specificities and enhanced ability to phosphorylate a variety of nucleoside analogs (38, 49, 50, unpublished data). In the following sections, we describe specific techniques for creating a random sequence library including preparation of the random oligonucleotides and the vector, biological selection and analysis of active mutants.

General Procedure

The method we use for constructing inserts containing random nucleotide sequences employs two overlapping deoxyoligonucleotides with sequences that code for restriction endonuclease sites at each end (see Figure 6 for overview, 38 for review). One or both of the oligonucleotides contain random sequences.
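The comparison of active clones described earlier in this section can be sketched in a few lines of Python. This is only an illustration: the three-residue segment, the clone sequences and the function name are hypothetical and are not data from the studies cited. The sketch tallies which residues appear at each randomized position among active clones and flags positions where only the wild-type residue is seen.

```python
from collections import Counter

def tolerated_residues(wild_type, active_clones):
    """Tally the residues observed at each position among active clones.

    wild_type and every clone are amino acid strings covering the
    randomized segment only (hypothetical input format).
    """
    if any(len(clone) != len(wild_type) for clone in active_clones):
        raise ValueError("all clones must match the wild-type length")
    spectrum = []
    for pos, wt_res in enumerate(wild_type):
        counts = Counter(clone[pos] for clone in active_clones)
        spectrum.append({
            "position": pos + 1,
            "wild_type": wt_res,
            "observed": dict(counts),
            # Invariant here means only the wild-type residue was recovered.
            "invariant": set(counts) <= {wt_res},
        })
    return spectrum

# Made-up example: a three-residue segment and five active clones.
WILD_TYPE = "LIF"
ACTIVE_CLONES = ["LIF", "LVF", "LLF", "LIF", "LMF"]

for row in tolerated_residues(WILD_TYPE, ACTIVE_CLONES):
    print(row)
```

In this made-up data set the first and third positions would be reported as invariant and the middle position as permissive. With real data, the number of clones sequenced limits how confidently a position can be called invariant.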
To construct a library of random sequences these oligonucleotides are annealed, extended with the large fragment of E. coli DNA polymerase I (Klenow fragment), amplified by the polymerase chain reaction (optional), restricted with the appropriate endonucleases and cloned into the restricted, purified vector of choice. After transformation of the appropriate bacterial strain either by electroporation or another high-efficiency transformation procedure, the transformants are plated onto both antibiotic and selective media. For example, in our random sequence selection for mutants within the putative nucleoside binding site of the Herpes simplex virus type 1 thymidine kinase (TK) gene, transformed cells were plated onto 2× YT + carbenicillin plates for determination of the number of transformants, and onto TK selection + carbenicillin plates, which allow growth of only functional TK mutants (44). The number of colonies on the antibiotic plates represents the total number of transformants and, together with the number of colonies on the selective plates, allows one to calculate the frequency of active transformants. DNA from the "active" colonies is isolated and sequenced, and the corresponding amino acid change(s) are deduced. By sequencing a number of plasmids isolated from bacteria that grow under selective conditions one can compile a spectrum of all the amino acid changes at each position within the randomly mutagenized segment. A position that is permissive for a variety of amino acid residues suggests that the wild-type residue at that position is not essential or important for function (41). If no residues other than the wild-type residue
are tolerated, then that position and particular residue are likely to be essential (41). Further analysis of such mutants will aid in identifying how particular residues are involved in catalysis, structural integrity, etc. This technique allows one to identify important residues rapidly without protein purification or extensive biochemical or biophysical analysis. Furthermore, this technique can be used to create mutants and to select for variants with specific novel or altered activities.

Improving HSV-1 TK for Gene Therapy

In the following section we describe the creation, selection and evaluation of novel HSV-1 thymidine kinase mutants. Several different approaches were taken to generate and identify mutants with improved phosphorylation activities towards the nucleoside analogs ACV and GCV as well as to define residues of structural and functional importance. In the absence of a crystal structure or experimental data to elucidate the substrate binding site, we sought initially to create numerous single amino acid substitutions. From these studies (44, 50, 51), it was apparent that highly conserved (immutable) residues were poor targets since only a few, if any, residues could substitute for the native
residues. Our overriding goal is to create HSV-1 TK mutants with improved activities towards GCV and/or ACV. Towards that end, we selected six codons for mutagenesis based on earlier mutational structure-function studies and amino acid alignments (44, 49-51). To create mutants that display major alterations in substrate binding, we reasoned that introducing multiple amino acid substitutions was more likely to generate enzymes with the desired activities. Therefore, all six codons were completely randomized. The total number of possible amino acid substitutions generated in this library is 20^6, or 6.4 × 10^7 (if one disregards the possibility that multiple codons may encode a particular amino acid). Clearly, this is an impossible number of clones to screen if one were to make them all by site-directed mutagenesis or to assay them individually. This factor makes positive genetic selection a critical component of this technique.
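The size of such a library can be checked with a short calculation. The sketch below assumes the 20 standard amino acids and, at the nucleotide level, fully random NNN codons (64 triplets, stop codons included). The function names and the one-million-clone library size in the example are illustrative assumptions.

```python
AMINO_ACIDS = 20   # standard amino acids
TRIPLETS = 64      # nucleotide triplets, assuming fully random NNN codons

def protein_diversity(n_codons):
    """Number of distinct amino acid combinations at n randomized codons."""
    return AMINO_ACIDS ** n_codons

def dna_diversity(n_codons):
    """Number of distinct nucleotide sequences at n randomized NNN codons."""
    return TRIPLETS ** n_codons

def max_fraction_sampled(library_size, n_codons):
    """Upper bound on the fraction of amino acid combinations present.

    Assumes every clone is unique, which overestimates real coverage.
    """
    return min(1.0, library_size / protein_diversity(n_codons))

print(protein_diversity(6))                 # 64000000
print(dna_diversity(6))                     # 68719476736
print(max_fraction_sampled(1_000_000, 6))   # 0.015625
```

Even under the generous assumption that every clone is unique, a library of about a million members could contain fewer than 2% of the possible six-residue combinations, which is why selection rather than individual assay is needed.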
LIF-ALL Library Mutant Analysis

A plasmid library consisting of over a million TK mutants was constructed by random sequence mutagenesis and used to transform a tk-deficient E. coli strain (49). Because the codons targeted for mutagenesis correspond to residues L I F at positions 159-161 and residues A L L at positions 168-170, it was designated the LIF-ALL library. Any functional mutant complementing the tk-deficient phenotype allowed the bacteria to grow. Only a very small subset of the library expressed active mutants: 426 clones, or 0.039% of the entire library. All of these functioned with thymidine as the substrate. The next step was to determine which of the 426 mutants displayed an increased affinity towards GCV and/or ACV. All 426 mutants were grown to log phase, diluted and spread onto selective plates containing GCV or ACV plus a low level of thymidine. Bacteria expressing the wild-type TK grow on these plates by virtue of a higher affinity for thymidine than for the prodrug. Any mutant with an improved affinity for the prodrug or a reduced affinity for thymidine will be sensitive to prodrug-mediated cell death. Several rounds of prodrug plates were used to identify the high prodrug affinity mutants. Each round of selection was done with lower prodrug concentrations than the previous round to select for mutants with higher prodrug specificity. A total of 26 mutants were shown to be sensitive to GCV and 54 were shown to be sensitive to ACV on negative selection plates. Only six demonstrated increased sensitivity to both GCV and ACV (49). From this library, ten individual mutants were identified on the basis of activity towards the nucleoside analogs, ganciclovir and acyclovir, with the use of enzymes synthesized in rabbit reticulocyte lysates (49). These TK mutants contain three to six amino acid substitutions within the active site.
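The logic of the negative selection rests on competition between thymidine and prodrug for the same active site. A minimal sketch of that competition uses the standard Michaelis-Menten rate law for two substrates competing for one enzyme, in which the competing substrate raises the apparent Km of the other. The function name and all concentrations and Km values below are hypothetical, in arbitrary units, and are not measured parameters for any TK variant.

```python
def prodrug_rate(prodrug, thymidine, km_prodrug, km_thymidine, vmax=1.0):
    """Prodrug phosphorylation rate with thymidine as a competing substrate.

    v = vmax * [P] / (Km_P * (1 + [T] / Km_T) + [P])
    """
    if km_prodrug <= 0 or km_thymidine <= 0:
        raise ValueError("Km values must be positive")
    apparent_km = km_prodrug * (1.0 + thymidine / km_thymidine)
    return vmax * prodrug / (apparent_km + prodrug)

# Hypothetical conditions: the same prodrug and thymidine levels for all enzymes.
P, T = 10.0, 1.0

wild_type_like = prodrug_rate(P, T, km_prodrug=50.0, km_thymidine=0.5)
weak_thymidine_binding = prodrug_rate(P, T, km_prodrug=50.0, km_thymidine=50.0)
tight_prodrug_binding = prodrug_rate(P, T, km_prodrug=5.0, km_thymidine=0.5)

print(wild_type_like, weak_thymidine_binding, tight_prodrug_binding)
```

Under these assumed values, either raising the Km for thymidine or lowering the Km for the prodrug increases the rate of prodrug phosphorylation relative to the wild-type-like enzyme, which corresponds to the two routes to prodrug sensitivity described above. The sketch ignores feedback inhibition and differences in Vmax or protein stability.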
All 10 TK mutants were evaluated in vitro with a mammalian expression vector that allows selection of clones by histidinol resistance and a secondary selection by cell sorting for GFP expression (49, 52, 53). In that study, eight of the ten mutants were shown to confer enhanced sensitivity on transfected BHK cells in the presence of ganciclovir (GCV) and acyclovir (ACV). In the presence of GCV, cell ablation studies with the six mutants showed three mutant TKs (30, 75 and 340) with decreased IC50 values. One mutant, 75, displayed a 43-fold reduction in IC50 compared to wild-type TK-expressing cells. Others displayed dramatic reductions in IC50 when transfectants were incubated in the presence of ACV. Mutant TK-expressing cells demonstrated between 1.4- and 18-fold lower IC50 values than wild-type TK-expressing transfectant pools. Clearly, this indicates the potential use of
clinically relevant doses of ACV with several of the mutant TKs described here for ablative gene therapy. While we sought to understand the biochemical nature of the changes in substrate specificities conferred by the multiple amino acid changes, the differences in prodrug sensitivity values cannot be correlated with the differences in the Michaelis constants determined (53, unpublished data). At a minimum, TK must contend with competitive substrates (thymidine, dTMP and deoxycytidine), feedback inhibition by dTTP and factors that dictate protein longevity, e.g., sensitivity to proteases and protein stability. Characterization of the kinetic parameters of two mutants suggested two distinct mechanisms for improving prodrug-mediated cell killing (53, data not shown). Although the Km values determined do not directly relate to the changes in prodrug sensitivities conferred to cells expressing the mutant TKs, they do reflect the result of multiple amino acid changes at the active site. An overriding factor is that the majority of these mutants display a poorer Km for thymidine. One exception to this is mutant 75. Mutant 75 exhibits close to wild-type Km for thymidine and apparently achieves improved prodrug-mediated cell killing due to 5- and 18-fold reduced Km for GCV and ACV, respectively. In BHK cells, mutant 75 demonstrated a 43- and 20-fold increased sensitivity to GCV and ACV, respectively (49). However, in rat C6 glioma cells, mutant 75 is unable to exhibit the level of sensitivity previously observed. Conversely, mutant 30, which has a very high Km for thymidine compared to wild-type TK, displays greater sensitivity to both ACV and GCV in both cell lines (53). One explanation as to why these mutants function as superior suicide genes is an ability to reduce the competition between the normal substrates and prodrug. In contrast to our results indicating that mutant 75-expressing rat C6 glioma cells respond similarly to those expressing the wild-type TK, Valerie et al.
have recently demonstrated that, in the presence of ACV, mutant 75 enhances apoptosis in RT2 glioma cells compared to the wild-type TK (54). Using adenoviral vectors, wild-type or mutant 75 thymidine kinases were used to transduce RT2 glioma cells. Cells expressing mutant 75 were found to be much more sensitive to ACV than cells expressing wild-type TK. When adenoviral-mutant 75 transduced cells were exposed to ACV and radiation, they were dramatically more sensitive than wild-type TK-expressing cells exposed to the same treatment. Studies with mutant 75 have so far only been performed in cell culture experiments. Because of solubility issues associated with GCV and its inability to traverse the blood brain barrier effectively, ACV and mutant 75 are proposed to perform better in vivo than the wild-type TK with either ACV or GCV (54). In xenograft tumor studies, mutant 30 demonstrates enhanced rat C6 glioma tumor growth reduction in the presence of both GCV and ACV compared with wild-type TK (53). Growth of mutant 30-expressing tumors (rat C6 gliomas) is restricted by GCV at a dose at least 10-fold lower than one that impedes growth of wild-type TK-expressing tumors. Furthermore, in bystander experiments, in which cells containing either TK (or mutant 30) or vector were mixed at different ratios prior to tumor seeding, a substantial tumor growth restriction was observed with GCV when only 20% of the tumor cell population expressed mutant 30, whereas no restriction in tumor growth was seen in tumors bearing the wild-type TK gene under the same conditions. Mutant 30 has also been tested in breast (MCF-7, MDA-MB231), brain (SKI-1, SKMG-4), colon (HT-29), cervix (HeLa S3), ovarian (PA1) and pancreatic (Panc89, PancTuI, Colo357) tumor cell
lines for enhanced sensitivity to GCV (53, 55, 56). In all cases, mutant 30 displayed increased sensitivity to GCV as compared to wild-type TK-expressing tumor cell lines. Furthermore, a 13-fold reduction in tumor volume was observed in a xenograft pancreatic tumor model with pseudotype retrovirus containing mutant 30 transduced PancTuI tumors (56). Three of the seven mice in this treatment group displayed complete responses. A bystander effect was also observed with mutant 30 but not with wild-type TK (53). Clearly, mutants 30 and 75 are promising candidates for suicide gene therapy clinical trials.

Semi-Random Library
Above we described the creation and selection of mutant TKs that conferred enhanced prodrug sensitivity from a random sequence derived library of over a million members (LIF-ALL library). Because the library was constructed to be randomized 100% at six codons, one would expect the diversity of this library to be on the order of 10^7. Therefore, with a library size of one million variants, only a small sampling of the molecular diversity of the first library was examined. As a way to tap into the rest of the diversity pool, a second library was created with oligonucleotides that were only partially randomized (semi-random) in order to skew the amino acid substitutions to represent a specific subset of residues. This subset includes those found in mutants 30 and 75 as well as any predominant residue observed in the other top ten mutants from the LIF-ALL library. In essence, this second generation library is a scrambling of the substitutions that yielded the desired activities from the LIF-ALL library with the anticipation that novel combinations with greater activities would be generated (57). The total number of possible amino acid combinations found in this library is 512. To ensure that all mutants were present multiple times, a total of 12376 clones were plated onto TK selection plates. A total of 1717 colonies were TK positive, indicating that 13.9% of the 512 possible substitutions yielded detectable levels of enzyme activity. Therefore, approximately 71 amino acid variations from the library of 512 yield active enzymes. Small colonies (~140) were preferentially picked because we had observed from previous libraries that there is a general correlation between colony size and the ability to utilize thymidine.
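The figures quoted for the semi-random library can be reproduced with simple arithmetic. The sketch below uses the colony counts given in the text. The coverage estimate additionally assumes that all 512 variants are equally represented and that each colony is an independent draw, which is an idealization (the library was deliberately skewed), and the function name is illustrative.

```python
def screen_summary(plated, positive, library_size):
    """Summarize a selection screen under a uniform-representation assumption."""
    active_fraction = positive / plated
    return {
        "active_fraction": active_fraction,
        # Estimated number of distinct variants that encode active enzyme.
        "active_variants": active_fraction * library_size,
        # Average number of colonies plated per possible variant.
        "fold_coverage": plated / library_size,
        # Chance that one given variant was never plated (uniform sampling).
        "p_variant_missing": (1.0 - 1.0 / library_size) ** plated,
    }

summary = screen_summary(plated=12376, positive=1717, library_size=512)
for key, value in summary.items():
    print(f"{key}: {value:.3g}")
```

The active fraction comes out at about 13.9% and the estimated number of active variants at about 71, matching the text. The roughly 24-fold coverage makes it very unlikely, under the uniform assumption, that any given variant was missed.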
We suggest that mutants with high Km for thymidine, such as mutant 30, are at a distinct kinetic advantage to phosphorylate prodrugs, presumably due to reduced competition with thymidine for the active site and less feedback inhibition by dTTP, which binds at the thymidine site. To identify the diversity of substitutions that provide functional activity, 64 mutants were sequenced (57). All mutants were novel combinations and contained three to five amino acid substitutions. Of the 64 sequences, 12 were duplicates and 3 were represented three times. This level of multiple representatives indicates that most, if not all, of the 512 members of this library were screened. To determine the level of sensitivity to ACV and GCV of these mutants, 120 clones were picked into TK selection broth and, after overnight incubation, diluted and spread onto TK selection plates containing different concentrations of prodrug as described (49). Approximately 30 mutants conferred increased sensitivity to at least one of the prodrugs. Crude lysates were made from the bacterial cultures and directly assayed for thymidine, GCV or ACV phosphorylation. Seven mutants with the desired
activities were identified and completely sequenced. All contained multiple amino acid substitutions. To evaluate the seven semi-random (SR) mutants in vitro, mammalian expression vectors were constructed and used to transfect rat C6 glioma cells. Pools of bulk sorted transfectants expressing TK were then assayed for the level of sensitivity to GCV or ACV as compared to wild-type TK transfected C6 cells. Mutants SR11, SR26 and SR39 show a similarly high degree of sensitivity to both GCV and ACV. Mutant SR39 displays a 2500-fold lower IC50 (GCV) than wild-type TK. Because of this, SR39 was chosen for evaluation in a mouse xenograft tumor model. To examine the tumor ablative activity of SR39 in the presence of GCV or ACV, mice (n = 5) were injected with rat C6 cells (SR39 transfectant pools). After five days to allow for tumor formation the mice were injected intraperitoneally twice a day for 5 days with saline (control), GCV at 0.5 mg/kg or 5 mg/kg, or ACV at 5 mg/kg or 25 mg/kg. During the prodrug dosing period and for an additional 10 days, tumor size was monitored by caliper measurement every other day. Tumors expressing wild-type TK continued to grow over the duration of the experiment but were significantly smaller in the GCV at 5 mg/kg and ACV at 25 mg/kg treatment groups compared to saline controls. A statistical comparison of the prodrug treated tumors expressing wild-type TK and mutant SR39 shows a significant size difference in all four prodrug treatment groups. Of particular note is the lack of tumor growth in ACV-treated SR39 tumors. This is the first in vivo demonstration that ACV could function as an effective prodrug for suicide gene therapy of cancer. Furthermore, doses of GCV required to limit or ablate tumors are 10- to 100-fold lower than others reported to be necessary with the wild-type TK. The extent of the bystander phenomenon was also examined in the xenograft model.
Mixtures of vector to TK-expressing cells (TK or SR39) were injected into nude mice and treated as described above with GCV at 5 mg/kg or ACV at 25 mg/kg. Ratios of 80:20 and 95:5 were used such that only 20% or 5% of the cells expressed TK or SR39. Treatment with ACV indicated a slight difference in tumor growth. However, treatment with GCV retarded tumor growth when SR39-expressing cells were present at 20%. No impairment of tumor growth was observed with wild-type TK-expressing cells at the same GCV concentration. Not only does mutant SR39 confer increased sensitivity to GCV and ACV in transfected tumors but it also displays an effective bystander effect at low doses of GCV. Such a mutant could provide significant benefit to ablative gene therapy protocols.

CONCLUSIONS

While some work remains to be done to establish firmly that these new TK variants function as super suicide genes in vivo, it is important to recognize that they contain multiple amino acid substitutions and would likely never be made by site-directed mutagenesis. Clearly, these results demonstrate the power of random sequence mutagenesis to tailor enzymes such as cytosine deaminase and HSV-1 thymidine kinase for gene therapy of cancers. Mutant HSV-1 TKs also hold promise for use in a wide variety of other applications, including as a negative selectable marker for homologous recombination events, for cell lineage ablation investigations and for tumor imaging by positron emission tomography. Random sequence mutagenesis coupled with positive
genetic selection is a powerful means not only to investigate the contributions of a particular residue to function but also for the generation of enzymes with novel functions for numerous biomedical applications.

REFERENCES

1 Moolten, F.L. (1986) Cancer Res. 46, 5276-5281.
2 Sanderson, M.R., Freemont, P.S., Murthy, H.M.K., Krane, J.F., Summers, W.C. and Steitz, T.A. (1988) J. Mol. Biol. 202, 917-919.
3 Black, M.E. and Hruby, D.E. (1990) Biochem. Biophys. Res. Comm. 169, 1080-1086.
4 Black, M.E. and Hruby, D.E. (1991) Rev. Med. Virol. 1, 235-245.
5 Balzarini, J., Bohman, C. and De Clercq, E. (1993) J. Biol. Chem. 268, 6332-6337.
6 Balzarini, J., Bohman, C., Walker, R.T. and De Clercq, E. (1994) Molec. Pharm. 45, 1253-1258.
7 Alrabiah, F.A. and Sacks, S.L. (1996) Drugs 52, 17-32.
8 Cameron, J.M. (1993) Rev. Med. Virol. 3, 225-236.
9 Faulds, D. and Heel, R.C. (1990) Drugs 39, 597-638.
10 Black, M.E. and Hruby, D.E. (1992) J. Biol. Chem. 267, 9743-9748.
11 Miller, W.H. and Miller, R.L. (1980) J. Biol. Chem. 255, 7204-7207.
12 Reardon, J.E. (1989) J. Biol. Chem. 264, 19039-19044.
13 Freeman, S.M., Abboud, C.N., Whartenby, K.A., Packman, C.H., Koeplin, D.S., Moolten, F.L. and Abraham, G.N. (1993) Cancer Res. 53, 5274-5283.
14 Culver, K.W., Ram, Z., Wallbridge, S., Ishii, H., Oldfield, E.H. and Blaese, R.M. (1992) Science 256, 1550-1552.
15 Bi, W.L., Parysek, L.M., Warnick, R. and Stambrook, P.J. (1993) Hum. Gene Ther. 4, 725-731.
16 Colombo, B.M., Benedetti, S., Ottolenghi, S., Mora, M., Pollo, B., Poli, G. and Finocchiaro, G. (1995) Hum. Gene Ther. 6, 763-772.
17 Kuriyama, S., Masui, K., Sakamoto, T., Nakatani, T., Kikukawa, M., Tsujinoue, H., Mitoro, A., Yamazaki, M., Yoshiji, H., Fukui, H., Ikenaka, K., Mullen, C.A. and Tsujii, T. (1998) Anticancer Res. 18, 3399-3406.
18 Dilber, M.S., Abedi, M.R., Christensson, B., Bjorkstrand, B., Kidder, G., Naus, C.C.G., Gahrton, G. and Smith, C.I.E. (1997) Cancer Res. 57, 1523-1528.
19 Ohno, T., Gordon, D., San, H., Pompili, V.J., Nabel, G.J. and Nabel, E.G. (1994) Science 265, 781-784.
20 Caruso, M., Salomon, B., Zhang, S., Brisson, E., Clavel, F., Lowry, I. and Klatzmann, D. (1995) Virology 206, 495-503.
21 Cohen, J.L., Boyer, O., Salomon, B., Onclercq, R., Charlotte, F., Brunei, S., Boisserie, S. and Klatzmann, D. (1997) Blood 89, 4636-4645.
22 Helene, M., Lake-Bullock, V., Bryson, S.J., Jennings, C.D. and Kaplan, A.M. (1997) J. Immun. 158, 5079-5082.
23 Gambhir, S.S., Barrio, J.R., Wu, L., Iyer, M., Namavari, M., Satyamurthy, N., Bauer, E., Parrish, C., MacLaren, D.C., Borghei, A.R., Green, L.A., Sharfstein, S., Berk, A.J., Cherry, S.R., Phelps, M.E. and Herschman, H.R. (1998) J. Nucl. Med. 39, 2003-2011.
24 Field, A.K., Davies, M.E., DeWitt, C., Perry, H.C., Liou, R., Germershausen, J., Karkas, J.D., Ashton, W.T., Johnston, D.B.S. and Tolman, R.L. (1983) Proc. Nat. Acad. Sci. U.S.A. 80, 4139-4143.
25 Munir, K.M., French, D.C. and Loeb, L.A. (1993) Proc. Nat. Acad. Sci. U.S.A. 90, 4012-4016.
26 Austin, E.A. and Huber, B.E. (1992) Molec. Pharm. 43, 380-387.
27 Hirschowitz, E.A., Ohwada, A., Pascal, W.R., Russi, T.J. and Crystal, R.G. (1995) Hum. Gene Ther. 6, 1055-1063.
28 Huber, B.E., Austin, E.A., Good, S.S., Knick, V.C., Tibbels, S. and Richards, C.A. (1993) Cancer Res. 53, 4619-4626.
29 Kievit, E., Bershad, E., Ng, E., Sethna, P., Dev, I., Lawrence, T.S. and Rehemtulla, A. (1999) Cancer Res. 59, 1417-1421.
30 Mullen, C.A., Kilstrup, M. and Blaese, R.M. (1992) Proc. Nat. Acad. Sci. U.S.A. 89, 33-37.
31 DiPiro, J.T., Talbert, R.L., Yee, G.C., Matzke, G.R., Wells, B.G. and Posey, L.M. Eds. (1997) in Pharmacology, A Pathophysiologic Approach, 3rd edition. Appleton and Lange, Stamford, CT.
32 Erbs, P., Exinger, F. and Jund, R. (1997) Curr. Genet. 31, 1-6.
33 Hayden, M.S., Linsley, P.S., Wallace, A.R., Marquardt, H. and Kerr, D.E. (1998) Prot. Exp. Purif. 12, 173-184.
34 Rogulski, K., Kim, J.H., Kim, S.H. and Freytag, S.O. (1997) Hum. Gene Ther. 8, 73-85.
35 Aghi, M., Kramm, C.M., Chou, T., Breakefield, X.O. and Chiocca, E.A. (1998) J. Nat. Cancer Inst. 90, 370-380.
36 Freytag, S.O., Rogulski, K.R., Paielli, D.L., Gilbert, J.D. and Kim, J.H. (1998) Hum. Gene Ther. 9, 1323-1333.
37 Kim, J.H., Kolozsvary, A., Rogulski, K., Khil, M.S., Brown, S.L. and Freytag, S.O. (1998) Cancer J. Sci. Amer. 4, 364-369.
38 Black, M.E., Munir, K.M. and Loeb, L.A. (1993) Meth. Mol. Genet. 2, 26-43.
39 Black, M.E. and Loeb, L.A. (1996) Meth. Mol. Biol. 57, 335-349.
40 Dube, D.K., Black, M.E., Munir, K.M. and Loeb, L.A. (1993) Gene 137, 41-47.
41 Bowie, J.U., Reidhaar-Olson, J.F., Lim, W.A. and Sauer, R.T. (1990) Science 247, 1306-1310.
42 Reidhaar-Olson, J.F. and Sauer, R.T. (1988) Science 241, 53-57.
43 Dube, D.K. and Loeb, L.A. (1989) Biochemistry 28, 5703-5707.
44 Munir, K.M., French, D.C., Dube, D.K. and Loeb, L.A. (1992) J. Biol. Chem. 267, 6584-6589.
45 Stemmer, W.P.C. (1994) Proc. Nat. Acad. Sci. U.S.A. 91, 10747-10751.
46 Stemmer, W.P.C. (1994) Nature 370, 389-391.
47 Crameri, A., Whitehorn, E.A., Tate, E. and Stemmer, W.P.C. (1996) Nature Biotech. 14, 315-319.
48 Zhang, J.H., Dawes, G. and Stemmer, W.P.C. (1997) Proc. Nat. Acad. Sci. U.S.A. 94, 4504-4509.
49 Black, M.E., Newcomb, T.G., Wilson, H.-M.P. and Loeb, L.A. (1996) Proc. Nat. Acad. Sci. U.S.A. 93, 3525-3529.
50 Black, M.E. and Loeb, L.A. (1993) Biochemistry 32, 11618-11626.
51 Black, M.E., Rechtin, T.M. and Drake, R.R. (1996) J. Gen. Virol. 77, 1521-1527.
52 Kokoris, M.S., Sabo, P. and Black, M.E. (2000) Anticancer Res. 20, 959-964.
53 Kokoris, M.S., Sabo, P., Adman, E.T. and Black, M.E. (1999) Gene Ther. 6, 1415-1426.
54 Valerie, K., Brust, D., Farnsworth, J., Amir, C., Taker, M.M., Hershey, C. and Feden, J. (2000) Cancer Gene Ther. 7, 879-884.
55 Qiao, J., Black, M.E. and Caruso, M. (2000) Hum. Gene Ther. 11, 1569-1576.
56 Howard, B.D., Boenicke, L., Schniewind, B., Henne-Bruns, D. and Kalthoff, H. (2000) Cancer Gene Ther. 7, 927-938.
57 Black, M.E., Kokoris, M.S. and Sabo, P. (unpublished data).
DECONSTRUCTING A CONSERVED PROTEIN FAMILY: THE ROLE OF MCM PROTEINS IN EUKARYOTIC DNA REPLICATION
Sally G. Pasion and Susan L. Forsburg
Molecular Biology and Virology Laboratory
The Salk Institute for Biological Studies
10010 N. Torrey Pines Rd.
La Jolla, CA 92037
INTRODUCTION

Cell division is an exquisitely orchestrated process resulting in the duplication of cellular structures, organelles, and genetic material, followed by their separation into two daughter cells. Proper chromosome replication and segregation are essential for successful completion of division. Eukaryotic cells face a particular challenge with their relatively large genomes divided among multiple linear chromosomes. They regulate S phase by coordinating the initiation of DNA replication from replication origins throughout the genome. To ensure that S phase occurs just once per cell cycle, re-initiation from these origins is prevented until cell division occurs and the cells enter the subsequent cycle. Much progress has been made in dissecting the initiation of DNA replication in eukaryotes over the last ten years, based on complementary strategies including the genetic analysis of the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe, the biochemistry of in vitro replication with SV40 DNA replication and Xenopus laevis cell-free systems, and molecular studies in cultured cells. Work in multiple systems has provided a model for how replication is initiated at individual origins (Figure 1). In this two-step model, the hexameric origin recognition complex (ORC) binds the origin sequence, apparently throughout the cell cycle in yeast cells. The first step is the assembly of the pre-replicative complex, operationally defined as the origin that is bound by ORC and replication initiation proteins and is competent for the initiation of replication.

Genetic Engineering, Volume 23, Edited by J. K. Setlow, Kluwer Academic / Plenum Publishers, 2001
S.G. PASION AND S.L. FORSBURG
The second step involves triggering of the initiation of DNA replication, presumably due to the action of cyclin-dependent kinases (CDK) and CDC7 kinase (reviewed in 1). During G1, the initiation factor CDC6/Cdc18 and the associated protein Cdt1p bind to ORC. This binding is required for the subsequent loading of an essential protein complex, the heterohexameric minichromosome maintenance (MCM) protein complex, to form the pre-replicative complex. Firing of the replication origin occurs in response to phosphorylation and remodeling of the MCM complex, allowing the binding of additional replication factors. After the origin fires, CDC6/Cdc18 is degraded, and as replication progresses, the MCM proteins dissociate from the chromatin. After the completion of S phase, the origins will not fire again because the absence of the initiator protein CDC6/Cdc18 prevents re-loading of the MCM proteins and reassembly of the pre-replicative complex. Most of the proteins that assemble into the pre-replicative complex are conserved among all eukaryotes (Table 1). The conserved factor CDC45 is essential for DNA replication initiation, physically interacts with the MCM proteins, and may recruit downstream replication factors (2, 3). However, although fission yeast Cdt1p has homologues in Xenopus, Drosophila melanogaster, human, mouse, Arabidopsis thaliana and Caenorhabditis elegans, no structural homologue has yet been reported in budding yeast (4-7). Detailed consideration of the process of replication initiation has been covered in several recent reviews (1, 8, 9). This chapter focuses on one set of conserved factors. The MCM proteins form a family of six related proteins that are found in all eukaryotes and are essential for the initiation of DNA replication and progression of S phase. Despite their related sequences, each MCM subtype (named MCM2 through MCM7) is essential for viability, suggesting that they are not simply redundant, but that each provides unique functions.
Recent in vitro studies have led to the speculation that the six MCM proteins may function together as a DNA helicase, although their precise function(s) in origin unwinding, replication fork progression, or chromatin remodeling in vivo remains to be defined. In this chapter, we will discuss the molecular characterization and
genetic and physical interactions of each MCM family member, and review the data concerning their roles in DNA replication. We will also consider some of the important questions remaining in the field and provide a model for MCM protein activity. Several excellent recent reviews on MCM proteins are available for additional detail (10-12).

Identification of the MCM Family and the Licensing Factor Hypothesis

The original identification of the MCM proteins came from genetic screens for budding yeast mutants that were defective in minichromosome maintenance (13). A subset of these original mcm mutants was subsequently found to define a family of structurally and functionally related proteins, now referred to as MCM2 through MCM7 (Table 1). Other mutants in mcm genes were isolated in a cell division cycle mutant screen (14). Several fission yeast mutants defective for cell division or minichromosome maintenance identified S. pombe homologues of MCM proteins (15-20). The identification of a human MCM protein homologue, P1MCM3, via its interaction with DNA polymerase alpha-primase, illustrated the conservation of this family outside of fungi, and was consistent with an apparent role in DNA replication (21). Interest in this family of proteins heightened when the budding yeast MCM proteins were demonstrated to display cell cycle-regulated nuclear localization consistent with the predicted behavior of "licensing factor" (22-24). The licensing factor model attempted to explain the "once and only once" regulation of the initiation of DNA replication. Based on experiments carried out in Xenopus cell-free systems, this model proposed that DNA replication initiation was restricted to once per cell cycle because of the nuclear envelope. Licensing factor was presumed to be a soluble factor that binds chromatin after nuclear envelope formation and is degraded or inactivated after initiation.
The cytoplasmic protein cannot regain access to the chromatin until the nuclear envelope breaks down during mitosis. Thus, cells must pass through cell division to allow the factor access to the chromatin for another round of DNA replication. Shuttling of these proteins in budding yeast was expected because yeast undergoes a closed mitosis during which the nuclear envelope never breaks down. Ironically, as we will describe in more detail below, recent work showed that MCM protein localization is regulated during the cell cycle only in the budding yeast. The MCM proteins in other organisms remain in the nucleus constitutively, and biochemical data indicate that another protein is likely to be the true “licensing factor”. For clarity of nomenclature, in this review, we will describe specific MCM proteins using their species designation (e.g., SpMcm2 is S. pombe Mcm2p), and identify the MCM classes in upper case (e.g., MCM2 for all Mcm2p family members). Table 1 lists additional allelic designations for several genes.
Six Classes of MCM Proteins
The MCM proteins share a central homology domain containing a nucleotide binding motif of about 200 amino acids that is characteristic of DNA-dependent and AAA+ ATPases (25, 26). The genes encoding all six MCM proteins have been cloned or partially identified in S. cerevisiae (Sc), S. pombe (Sp), Drosophila (Dm), Xenopus (Xl), mouse (Mm) and human (Hs) cells (Table 1). Homologues for some of the MCM proteins have been identified in the genome databases for Arabidopsis and C. elegans. In addition, from one to four MCM homologues have been identified in the sequenced genomes of various Archaea, indicating that the MCM proteins are an ancient protein family. Members within each class of MCM proteins share conserved
structural motifs in addition to the conserved MCM central homology domain. Outside of this region, they are relatively unrelated to each other, so that the MCM2 homologues from yeast and humans, for example, are more related to one another than the MCM2 and MCM6 homologues from yeast. Figure 2 shows an alignment of the structural motifs found in the S. pombe MCM proteins, which here depict representative MCM proteins. SpMcm2, SpMcm4, SpMcm6 and SpMcm7 contain zinc finger motifs, which are conserved in their homologues in other species. These zinc finger domains may be involved in protein-protein or protein-DNA interactions. Each MCM protein also has regions rich in proline, aspartic and glutamic acid, serine and threonine residues (PEST sequences), which may be involved in protein degradation and are found in various MCM proteins from other species (27). SpMcm2 and SpMcm3 have nuclear localization signals (NLS), which are conserved in MCM2 and MCM3 homologues in other species. However, the putative leucine zipper motifs detected in SpMcm3 and SpMcm5 are not found in their cognate homologues. The consensus CDK phosphorylation site sequence (S/TPXK/R) is conserved in the amino-terminal region of the MCM4 class; CDK sites are also found in other MCM classes but their presence is absolutely conserved in MCM4 homologues. Furthermore, at least one consensus CDK phosphorylation site is positioned near the NLS in the yeast and Drosophila MCM3 homologues.
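The consensus CDK phosphorylation site quoted above, S/TPXK/R, can be read as a simple sequence pattern. The following is a minimal sketch, assuming a plain one-letter protein sequence as input; the function name `find_cdk_sites` and the 20-residue test fragment are hypothetical and invented for illustration, not taken from any MCM protein.

```python
import re

# Consensus CDK site as quoted in the text: S/T-P-X-K/R (X = any residue).
# The pattern sits inside a lookahead so that overlapping sites are all reported.
CDK_SITE = re.compile(r"(?=([ST]P.[KR]))")

def find_cdk_sites(sequence):
    """Return (1-based start position, matched motif) for each consensus site."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in CDK_SITE.finditer(seq)]

# Hypothetical fragment, invented for illustration only.
toy_fragment = "MASTPVRKLSPEKAATPLQG"
print(find_cdk_sites(toy_fragment))  # [(4, 'TPVR'), (10, 'SPEK')]
```

The third S/TP dipeptide in the fragment (TPLQ) is not reported because it lacks the basic residue at the fourth position. A relaxed pattern such as `[ST]P` would be needed to find proline-directed sites that do not match the full consensus.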
Figure 3 shows a sequence alignment of the six fission yeast MCM proteins. This alignment highlights the regions of the protein that are conserved across the six class members. The MCM central homology domain corresponds to the original Region II described by Yan and coworkers (28) and contains the four domains identified by Koonin to have significant similarity with the DNA-dependent ATPase domain (25). This includes the Walker A domain and the diagnostic MCM motif (IDEFDKM) within the Walker B domain. In addition, the alignment shows conservation of the zinc finger motifs present in the MCM2, MCM4, MCM6 and MCM7 homologues. Similar full length or partial sequence alignments for MCM proteins in other species have been published previously (Sc, 10; Hs, 29; Xl, 30; Dm, 31).
MCM2
MCM2 proteins range in length from 830 to 890 amino acid residues in different species, with predicted molecular masses of 93 to 100 kDa. Typical of many MCM proteins, MCM2 migrates more slowly than predicted, with an apparent 120 kDa molecular mass when analyzed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE). This difference between the predicted and apparent molecular mass suggests that post-translational modifications or charged amino acids may affect electrophoretic mobility (28, 32-34). In addition to the characteristic MCM central homology domain, the MCM2 proteins contain a conserved zinc finger which may be involved in interactions with proteins or DNA. Mutational analysis of the conserved zinc finger in ScMcm2 and SpMcm2 demonstrates that the cysteine residues are essential for viability (28, 32). MCM2 family members often have two potential NLS motifs. When these were examined by mutational analysis in SpMcm2, each was found to be essential for nuclear localization and cell viability, although a single NLS sequence is able to target a heterologous protein to the fission yeast nucleus (32, 35). The mouse Mcm2 protein similarly has two potential NLSs, a bipartite NLS at residues 18-34 and positively-charged amino acids at residues 118-152. The charged residues at 118-152 are capable of targeting MmMcm2-GFP fusions to the nucleus (36, 37). In addition, DmMcm2 has basic residues in its amino-terminal region, which may function as an NLS (31). Although SpMcm2 is rich in PEST sequences, the steady-state protein level in S. pombe remains constant during the cell cycle (19, 32). The analysis of phenotypes associated with loss of MCM2 function indicates that MCM2 is required for DNA replication. Budding yeast conditional mutants exhibit plasmid stability defects (28) and the Aspergillus nidulans MCM2 homologue encoded by nimQ is essential for DNA replication (34). 
In Drosophila, a P-element insertion mutant allele of MCM2 results in reduced cell proliferation and delays S phase in both embryonic and larval CNS (38). In mammalian cells, injection with anti-MCM2 antisera inhibits DNA synthesis and cell division (39). In fission yeast, mcm2ts cells undergo a checkpoint-dependent arrest late in S phase, with an apparently replicated genome; this arrest occurs within one cell cycle of shifting to the restrictive temperature and results in loss of viability (18, 19, 40). mcm2 mutants show genetic interactions with mutations in either the initiator protein Cdc18p or the subunits of the replicative DNA polymerase, consistent with a role in replication initiation and elongation. Genetic interactions between mcm mutants in both fission and budding yeasts gave some of the first indications that MCM proteins act together to perform their cellular function. Budding yeast mutant mcm2-1 is synthetically lethal with mutant alleles in two other MCM genes: mcm3-1 (28) and mcm4-1 (41). In fission yeast, mcm2ts is synthetically lethal with mcm4ts (19). In S.
cerevisiae, overproduction of ScMcm3 in the mcm2 mutant is toxic (28), and in S. pombe, overproduction of SpMcm4, SpMcm5, or SpMcm6 is toxic in the mcm2 mutant strain (32).
MCM3
MCM3 proteins in various organisms range from 812 to 971 amino acids with a predicted 90 to 107 kDa molecular mass but an apparent 104 to 125 kDa molecular mass on SDS-PAGE gels (31, 42-46). One conserved feature of the MCM3 protein class is the presence of a potential NLS. In addition, budding yeast, fission yeast and Drosophila MCM3 homologues contain at least one consensus CDK phosphorylation site. S. cerevisiae Mcm3 has a bipartite NLS, which is necessary for translocation of ScMcm3 to the nucleus and is also able to target a heterologous protein to the nucleus. The loss of four out of five consensus CDK phosphorylation sites has no apparent effect on cell growth, viability, or nuclear accumulation of ScMcm3 (42, 47). SpMcm3 has a putative SV40 large T antigen-type NLS in the carboxy-terminal region (residues 676-681, KPKRKK) near a consensus CDK phosphorylation site (residues 527-530, TPVR) (43). Both mouse and Xenopus MCM3 have a putative NLS in the C-terminal region, and XlMcm3 has nine DNA-dependent protein kinase phosphorylation sites also within the C-terminal region of the protein, three of these being non-optimal CDK phosphorylation sites (45, 48). HsMcm3 also has a possible bipartite NLS in its carboxy-terminal region, which may require interaction with an associated protein, Map80, for its nuclear localization (49), while DmMcm3 has a putative bipartite NLS in its carboxy-terminal region with a consensus CDK phosphorylation site embedded within it (31). SpMcm3 contains a leucine zipper (residues 41 to 62), which is not conserved among MCM3 homologues and may have a role in interacting with DNA or proteins (43). Not surprisingly, loss of MCM3 function also causes defects in DNA replication. The S. cerevisiae temperature-sensitive mutant mcm3-1 arrests as large-budded cells with fully or nearly fully replicated DNA content and exhibits an increased rate of chromosome loss and recombination (42). 
Injection of anti-MmMCM3 antisera into synchronized NIH3T3 fibroblasts in G1 inhibits DNA synthesis, consistent with MCM3 being required for DNA replication (50). In addition to the genetic interactions between mcm2 and mcm3 mutants described above, overproduction of ScMcm2 suppresses the temperature-sensitive growth defect of the mcm3-1 allele (28). Furthermore, carboxy-terminal epitope tagging of SpMcm3 in fission yeast with either the myc or hemagglutinin epitope generates a cold-sensitive conditional mutant strain. This mutant phenotype may reflect loss of SpMcm3 interactions with other MCM proteins (35, 51).
MCM4
S. cerevisiae, S. pombe, Xenopus, Drosophila, mouse and human MCM4 genes have been cloned and encode 863 to 933 amino acid proteins with predicted molecular masses of 97 to 105 kDa and apparent molecular masses on SDS-PAGE gels ranging from 97 to 110 kDa (17, 52-56). The MCM4 homologues all contain zinc finger motifs distinct from the motif found in MCM2 homologues. However, the zinc fingers in both budding and fission yeast MCM4 have an additional two residues between the third and fourth conserved cysteine residues compared to Drosophila, Xenopus, mouse and human MCM4 homologues. The S. pombe, Xenopus and mouse MCM4 proteins have several amino-terminal potential proline-directed serine or threonine phosphorylation sites, some of which
are consensus CDK phosphorylation sites (17, 53, 54). No functional NLS has been identified in this class of proteins, although HsMcm4 has two possible NLSs at residues 504 and 855 (55). Cold-sensitive budding yeast mcm4 mutants arrest with a large bud and the nucleus at the neck between the mother and daughter (14). At the restrictive temperature, the fission yeast mcm4 mutant arrests with an apparently replicated genome (15, 17). In Drosophila, mutations in the MCM4 homologue encoded by dpa (disc proliferation abnormal) indicate that dpa is essential for mitotically proliferating cells (56). In addition to the synthetic lethality observed between mcm2 and mcm4 in fission yeast discussed above, the budding yeast mcm4cs is synthetically lethal with mcm5cs and mcm7ts, and mcm5ts suppresses mcm4cs (2). These genetic interactions are again consistent with direct physical interactions among the MCM proteins.
MCM5
MCM5 genes have been cloned in S. cerevisiae, S. pombe, Drosophila, Xenopus, mouse and human cells and encode 720 to 776 amino acid proteins with predicted molecular masses of 80 to 86 kDa and apparent molecular masses of 80 to 95 kDa (18, 30, 33, 41, 54, 57-59). In addition to the MCM central homology domain, budding yeast, fission yeast, Xenopus and human MCM5 proteins contain highly acidic stretches (2, 18, 33, 59). No functional NLS has been identified in the MCM5 class of proteins. Analysis of SpMcm5 reveals that it harbors a leucine zipper motif (residues 57 to 78), which is not shared by other MCM5 homologues but aligns with the leucine zipper motif found in SpMcm3. Loss of MCM5 function in budding yeast causes cells to arrest at G1/S with a large bud and a single nucleus. In fission yeast, mcm5cs cells arrest in interphase and are defective in the onset of DNA synthesis, though this block is reversible (18). Injections of anti-DmMcm5 antisera into Drosophila embryos result in the accumulation of chromosome bridges, consistent with a role for MCM5 in DNA synthesis (58). Pulsed field gel electrophoresis (PFGE) analysis of the chromosomes demonstrated that incubation of either budding yeast or fission yeast MCM5 mutant cells at the restrictive temperature resulted in chromosomes which fail to migrate into the gels (33, 57). Similar results have been obtained in PFGE analysis of the S. pombe mcm2 mutant strains and are considered typical of S phase mutants and diagnostic for the presence of replication intermediates (33, 40). As mentioned above, budding yeast mcm5ts suppresses the cold-sensitive growth defect of mcm4cs (2). A different mcm5ts allele suppresses cdc45cs, a mutant allele of another DNA replication gene, CDC45 (60). Similar suppression is also observed between the mutant alleles of the fission yeast CDC45 and MCM5 homologues: the temperature-sensitive sna41-912 suppresses the cold sensitivity of mcm5cs (61). 
One revealing genetic interaction was uncovered in budding yeast when the bob1 mutation was recognized as an allele of MCM5 (62). The bob1 mutation is a recessive bypass suppressor of mutations in the Cdc7 kinase, which is normally essential for the G1/S transition and for initiation of DNA replication. Cdc7 activity requires association with an essential regulatory subunit Dbf4 (reviewed in 63, 64); mcm5(bob1) also bypasses dbf4 mutations (62). This suppression will be discussed further as we consider MCM protein phosphorylation.
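As a rough cross-check of the residue counts and predicted molecular masses quoted above for the MCM2 through MCM5 classes, the predicted mass of a protein can be approximated from its length. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not a value taken from this chapter; the names `AVG_RESIDUE_KDA` and `estimate_kda` are hypothetical.

```python
# Rule-of-thumb average residue mass in kDa; an assumption, not a value from the chapter.
AVG_RESIDUE_KDA = 0.110

# Residue ranges and predicted mass ranges (kDa) as quoted in the text.
QUOTED = {
    "MCM2": ((830, 890), (93, 100)),
    "MCM3": ((812, 971), (90, 107)),
    "MCM4": ((863, 933), (97, 105)),
    "MCM5": ((720, 776), (80, 86)),
}

def estimate_kda(n_residues):
    """Approximate the predicted molecular mass (kDa) from the residue count."""
    return n_residues * AVG_RESIDUE_KDA

for name, ((lo_aa, hi_aa), (lo_kda, hi_kda)) in QUOTED.items():
    print(f"{name}: estimated {estimate_kda(lo_aa):.0f}-{estimate_kda(hi_aa):.0f} kDa, "
          f"quoted {lo_kda}-{hi_kda} kDa")
```

Under this assumption the estimates fall within a few percent of the quoted predicted masses. The larger apparent masses observed on SDS-PAGE are not captured by a length-based estimate, in line with the suggestion that post-translational modification or charged residues alter mobility.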
MCM6
Fission yeast, budding yeast, Drosophila, Xenopus, mouse and human MCM6 have been cloned and encode proteins of 817 to 1017 amino acid residues with predicted molecular masses of 93 to 113 kDa and a range of apparent molecular masses from 100 to 120 kDa on SDS-PAGE gels (10, 12, 20, 29-31, 36, 65, 66). The amino-terminal region contains a conserved zinc finger motif found also in the yeast MCM4 homologues and all MCM7 homologues (10, 20, 29, 31, 36) and distinct from the zinc finger sequence found in the MCM2 class. While the Drosophila MCM6 protein has a consensus CDK phosphorylation site near the amino-terminal end, the human MCM6 protein contains four proline-directed serine and threonine phosphorylation sites, which are non-optimal CDK phosphorylation sites (29, 31). Loss of MCM6 function has been assayed in the fission yeast conditional mutant mcm6ts (20). This mutant allele was generated in a screen for mutant strains that exhibit a high loss rate of minichromosomes at a permissive or semi-permissive temperature. The fission yeast mcm6ts mutant strain has a DNA replication defect, is hypersensitive to treatment with ultraviolet light, indicating sensitivity to DNA damage, and is sensitive to hydroxyurea, an inhibitor of DNA replication (20). In addition, analysis of the S phase progression of haploid spores disrupted for mcm6 function demonstrates that the spores exhibit a delayed S phase and are unable to proceed through the cell cycle (40).
MCM7
MCM7 has been cloned in S. cerevisiae, S. pombe, Drosophila, Xenopus, mouse, human and Arabidopsis and encodes a protein ranging from 720 to 845 amino acid residues with both the predicted and apparent molecular masses ranging from 81 to 95 kDa (D.T. Liang and S.L. Forsburg, in preparation, 31, 36, 65, 67-72). MCM7 homologues contain the characteristic conserved zinc finger motif found in the yeast MCM4 homologues and all MCM6 homologues. ScMcm7 exhibits cell cycle-regulated nuclear localization (67). SpMcm7 contains a consensus CDK phosphorylation site in the carboxy-terminal region, which is not conserved amongst MCM7 homologues (D.T. Liang and S.L. Forsburg, in preparation). The budding yeast mutant mcm7-1 was originally isolated as an allele-specific suppressor of the cdc45cs mutant. The mcm7-1 mutant is itself thermosensitive, and at the restrictive temperature arrests with a large bud and a single nucleus. mcm7-1 is lethal in combination with mcm4cs (2). In Arabidopsis, the MCM7 homologue PROLIFERA is expressed in proliferating cells and is required for embryo development (72). In human fibroblast WI38 cells, the introduction of antisense RNA to MCM7 inhibits S phase entry, further supporting the view that MCM7 is essential for DNA replication (73).
Archaeal MCM
The MCM proteins are clearly conserved proteins in eukaryotes; in every eukaryote examined, all six MCM proteins have been identified. As archaeal genomes have been sequenced, it is apparent that Archaea utilize translation, transcription and replication mechanisms similar to those in eukaryotes, while their metabolic mechanisms are similar to those in prokaryotes (74). Several MCM homologues have been found in the archaeal genomes; depending on the species, there are between one and four MCM proteins (10). Recently, three laboratories have cloned and
characterized MtMcm, the single MCM homologue found in the Methanobacterium thermoautotrophicum genome. Although diverged from the eukaryotic MCM proteins, MtMcm clearly falls into the family and has closest homology to the MCM4 class. Structural analyses show that the protein assembles into the tube-like structure of a double hexamer. This complex binds single-stranded DNA, shows DNA-dependent ATPase activity, and possesses a robust 3' to 5' ATP-dependent DNA helicase activity (75-77). The enzymatic activity of this distant MCM homologue provides some of the best evidence of the cellular role of the eukaryotic MCM proteins, as will be discussed later.
MCM Proteins Form a Hexameric Protein Complex
Allele-specific genetic interactions between different mcm mutants in both yeast species provide evidence that the MCM proteins physically interact. Indeed, the six MCM proteins appear to assemble together in a complex in all eukaryotes. They have been detected as a heterohexamer with each MCM protein present in an equimolar ratio. However, smaller subcomplexes have also been detected during fractionation, suggesting that the hexameric complex disassembles in reproducible ways. Biochemical techniques including glycerol gradient centrifugation, gel filtration chromatography, two-hybrid analysis, and co-immunoprecipitation have allowed us to develop a model of the soluble MCM complex that takes into account relative affinity of the different subunits for one another (Figure 4A). This has yielded strikingly consistent results across different species and different methods. These data indicate that the MCM complex in all species contains a tightly associated core of MCM4/6/7, bound more loosely by MCM2, probably through interaction with MCM4. MCM3 and MCM5 bind tightly to one another, but very loosely to the other subunits, probably through interaction with MCM7. MCM subunits may therefore be functionally divided into “core” and “peripheral” proteins. 
Assembly of the core MCM4/6/7, or of the peripheral MCM3/5 dimer, is independent of other MCM subunits. The data supporting this model come from multiple systems. In mammals, classical biochemical approaches reveal a mouse 450 to 600 kDa MCM complex containing at least MmMcm2, MmMcm4, MmMcm6 and MmMcm7 (36). MmMcm3 co-immunoprecipitates with MmMcm5, but not with MmMcm4. In turn MmMcm4 co-immunoprecipitates with MmMcm2, MmMcm6 and MmMcm7 (36, 54). Glycerol gradient centrifugation of HeLa cell extracts combined with co-immunoprecipitation indicates that all six human MCM proteins interact in a multiprotein complex in mitotic extracts (78). Glycerol gradient analysis of S phase extracts indicates the presence of a slightly smaller complex with less HsMcm3 and HsMcm5. HsMcm2, HsMcm4, HsMcm6 and HsMcm7 are found in a tetrameric complex with HsMcm2 only loosely associated, and HsMcm3 and HsMcm5 are stably associated with one another (78). HsMcm2, HsMcm4, HsMcm6 and HsMcm7 could also be captured by chromatography of HeLa extracts over a histone-sepharose column, while HsMcm3 and HsMcm5 are not retained on the column (79). Co-immunoprecipitation experiments show that HsMcm4 co-immunoprecipitates with HsMcm2, HsMcm6 and HsMcm7, that HsMcm3 co-precipitates with HsMcm5, and that HsMcm6 stably interacts with HsMcm2 and HsMcm7 (55, 80, 81). MCM proteins in Xenopus were first identified by biochemical searches for Replication Licensing Factor (RLF) and other proteins required for replication initiation (45, 48, 82). One of the fractions isolated, called RLF-M, is required for replication licensing and was further fractionated into three subfractions (83): Q1 (XlMcm3/5), Q2 (XlMcm3/7) and Q3 (XlMcm2/4/6/7). Mixing Q1 and Q3 restores partial licensing activity. Further fractionation generates several distinct MCM complexes: QH1 (XlMcm3/5), QH2 (XlMcm3/7), QH3a
(XlMcm2/4/6/7) and QH3b (XlMcm4/6/7). Again, mixing of QH1 and QH3a reconstitutes activity (83). These experiments confirm that the hexameric MCM complex is the essential component of RLF-M that supports replication licensing in the in vitro Xenopus sperm chromatin replication assay. The strongly-interacting XlMcm3/7 pair is unexpected, but has been detected by co-immunoprecipitation (84). Glycerol gradient sedimentation of budding yeast extracts indicates that both ScMcm2 and ScMcm3 can be detected in a 150 to 200 kDa complex and in a 443 to 669 kDa complex (85). ScMcm2, ScMcm3 and ScMcm5 interact by co-immunoprecipitation and GST-affinity chromatography with the use of a GST-ScMcm2 fusion protein (85). Two-hybrid analysis also shows that ScMcm3 and ScMcm5 interact strongly (85). Purification of the S. pombe MCM complex by column chromatography and gel filtration identified a 560 kDa complex containing an equimolar ratio of each MCM protein, although smaller complexes with a subset of MCM proteins are also apparent (43, 51, 86). Electron microscopy of the fission yeast MCM complex indicates it has a globular shape with a central cavity (86), suggesting a hexameric ring structure. Co-immunoprecipitation assays show that SpMcm4 and SpMcm6 interact tightly with one another, and more weakly with SpMcm2, while
SpMcm3 and SpMcm5 bind very tightly with one another, but only very loosely with the other MCM subunits (43, 51). The co-immunoprecipitation strategy also shows that an mcm4 mutation prevents any association of SpMcm2 with SpMcm6, but an mcm2 mutation has no effect on SpMcm4/6 association. Recently, various stable subcomplexes of S. pombe MCM proteins have been purified with the use of a baculovirus expression system. These stable complexes purified by immunoaffinity chromatography include a heterotetramer of SpMcm2/4/6/7, a dimer of the heterotrimer core complex SpMcm4/6/7, and the heterodimer SpMcm3/5 (87). Attempts to define a minimal MCM-binding domain within the MCM proteins have failed. Deletion of the amino-terminal 122 amino acids of XlMcm7 (outside of the MCM central homology domain) abolished the interaction with XlMcm3 (84). Extensive mutational analysis of SpMcm2 indicates that the mutations throughout the protein inside and out of the MCM central homology domain affect its ability to interact with other MCM proteins, even though they do not affect the stability of the protein (51). Of particular interest, the MCM proteins have loose homology to the ATPase superfamily (26), which includes proteins as diverse as NSF (N-ethylmaleimide-sensitive fusion protein), involved in membrane fusion, and the ClpA protease/chaperone from Escherichia coli. Both NSF and ClpA have two ATPase domains, one of which is required for ATP-mediated multimerization as a trimer (NSF) or a hexamer (ClpA) (88, 89). As will be discussed below, evidence suggests that the core MCM complex functions as a replicative helicase. Perhaps the ATPase domains of the peripheral MCM proteins contribute to ATP-dependent complex assembly as seen for other ATPases. This would be consistent with data suggesting that point mutations in the SpMcm2 ATPase domain reduce affinity for other MCM subunits in fission yeast (32, 51). 
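The affinity model described in this section (a tight MCM4/6/7 core, MCM2 bound more loosely through MCM4, and a tight MCM3/5 dimer attached very loosely through MCM7) can be restated as a small weighted graph. The sketch below is illustrative only: the numeric ranks (3 = tight, 2 = loose, 1 = very loose) are arbitrary assumptions, and placing the very loose contact between MCM3 and MCM7, rather than between MCM5 and MCM7, is also an assumption made for the example. Removing contacts below a chosen rank mimics increasingly stringent fractionation.

```python
SUBUNITS = ["MCM2", "MCM3", "MCM4", "MCM5", "MCM6", "MCM7"]

# Qualitative contact ranks; the numbers are illustrative assumptions.
AFFINITY = {
    frozenset({"MCM4", "MCM6"}): 3,  # core
    frozenset({"MCM6", "MCM7"}): 3,  # core
    frozenset({"MCM4", "MCM7"}): 3,  # core
    frozenset({"MCM3", "MCM5"}): 3,  # peripheral dimer
    frozenset({"MCM2", "MCM4"}): 2,  # MCM2 bound loosely via MCM4
    frozenset({"MCM3", "MCM7"}): 1,  # dimer bound very loosely via MCM7 (assumed partner)
}

def subcomplexes(min_rank):
    """Group subunits that stay connected by contacts of at least min_rank."""
    neighbours = {s: set() for s in SUBUNITS}
    for pair, rank in AFFINITY.items():
        if rank >= min_rank:
            a, b = tuple(pair)
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, groups = set(), []
    for start in SUBUNITS:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over surviving contacts
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

for rank in (1, 2, 3):
    print(rank, subcomplexes(rank))
```

With all contacts retained the six subunits form a single group. At rank 2 the model yields MCM2/4/6/7 and MCM3/5, and at rank 3 it yields MCM4/6/7, MCM3/5 and free MCM2, which correspond to the subcomplexes reported in the fractionation and baculovirus expression studies.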
Not surprisingly, the MCM proteins also physically interact with other cellular proteins involved in DNA replication or chromatin organization. ScMcm2, ScMcm6 and ScMcm7 interact with an unrelated but conserved replication factor, Mcm10, by two-hybrid analysis and in GST fusion chromatography. MCM10 is required for DNA replication, and mutants exhibit pausing of elongation forks near replication origin sequences (90). In addition, loss of MCM10 function correlates with loss of ScMcm2 chromatin association (91). The isolation of HsMcm2, HsMcm4, HsMcm6 and HsMcm7 via histone-sepharose column chromatography was based on an interaction with histone H3 (79). It is likely that this interaction was mediated via HsMcm2, since MmMcm2 alone also binds histone H3 (37). Budding yeast Cdc45, which is essential for DNA replication, associates with chromatin during S phase and co-immunoprecipitates with ScMcm2, ScMcm3 and ScMcm5. In addition, its association with chromatin is dependent on interacting with the MCM proteins during S phase (92).
Spatial Regulation of MCM Protein Complex
The six MCM proteins are synthesized in the cytoplasm and must be imported into the nucleus in order to perform their cellular function. As described above, considerable excitement was generated by the observation that budding yeast MCM proteins enter and exit the nucleus in a cell cycle-regulated fashion, being nuclear only during G1 and S phase (57, 67, 93). This focused the attention of the replication community on the MCM protein family. After it became apparent that budding yeast MCM proteins are unusual in this regard, there was a report suggesting that the yeast MCM proteins were actually evenly distributed between the cytoplasm and nucleus throughout the cell cycle (94). However, more recent data with the use of GFP (green fluorescent protein) fusions have convincingly confirmed the original immunofluorescence
observations, showing that MCM proteins in budding yeast shuttle in and out of the nucleus (95, 96). ScMcm4-GFP enters the nucleus at the end of mitosis, and this nuclear accumulation requires the initiator protein CDC6. G1 cyclins and B type cyclins promote nuclear exit of ScMcm4 at the end of S phase. These results are consistent with the G1 cyclin acting directly on ScMcm4 or on a factor required for nuclear translocation of ScMcm4 (95). These results have been supported by work analyzing the nuclear localization of ScMcm2, ScMcm3, ScMcm4, ScMcm6 and ScMcm7 GFP fusion proteins in budding yeast cells (96). The importance of this shuttling behavior in budding yeast is not clear, and may represent the unusual timing constraints on the rather telescoped budding yeast cell cycle, in which S phase and M phase occur very closely (reviewed in 97). Constitutive nuclear localization of the MCM proteins in budding yeast can be achieved by fusion of the SV40 T antigen NLS to one of the MCM proteins, but this constitutive nuclear localization does not result in re-replication (96). Interestingly, constitutive nuclear localization of ScMcm3 results in a temperature-sensitive phenotype and a defect in the ability to maintain some plasmids (47), suggesting the shuttling behavior of the MCM proteins has regulatory significance in this organism. In contrast, shuttling of the MCM proteins does not occur in mammalian cells or in fission yeast; instead, the MCM proteins remain nuclear throughout the cell cycle (D.T. Liang and S.L. Forsburg, in preparation; 35, 39, 48, 55, 98-102). Cell cycle regulation apparently occurs at the level of chromatin association (see below). Nevertheless, the nuclear envelope does play a crucial role even in organisms in which the MCM proteins are constitutively nuclear. Rather than conferring cell cycle regulation, the nuclear envelope is required for the correct assembly of the MCM complex. 
The first tantalizing evidence for this was the observation that ectopically expressed MmMcm4 and MmMcm5 are not localized appropriately to the nucleus unless MmMcm2 and MmMcm3, respectively, were also expressed (36). The interdependence of each MCM protein on other MCM proteins for nuclear localization was fully revealed in the tractable fission yeast system (35). Inactivation of any single mcm gene led to redistribution of the wild type MCM protein subunits to the cytoplasm via active nuclear export, indicating mutual dependence for localization (35, 103). Thus, subunits that are not assembled within the complex are removed from the nucleus. Moreover, entry into the nucleus itself requires complex assembly. A mutation of the SpMcm2 NLS blocked nuclear localization but did not affect complex assembly; however, when expressed at a sufficiently high level, the NLS-mutant SpMcm2 is a dominant negative trapping other MCM proteins in the cytoplasm. Combined with analysis of other mcm mutants, this suggested a mechanism in which complex assembly is linked to localization to ensure that only intact MCM complexes are maintained within the nucleus. In this model, association of SpMcm2 and the core SpMcm4/6/7 complex in the cytoplasm is required for targeting of this subcomplex to the nucleus with the use of the NLS on SpMcm2. SpMcm5 interacts tightly with SpMcm3 and may be targeted to the nucleus due to the NLS on SpMcm3. Retention of all six MCM proteins in the nucleus may require association of all six subunits in order to inactivate or mask nuclear export sequences. This example of spatial regulation of multimeric assembly and targeting may provide a mechanism to ensure that only appropriately assembled complexes persist in the nucleus (35). Other factors may also be involved; for example, HsMcm3 also has a functional NLS and interacts with another protein, Map80, which may be involved in regulating the nuclear localization of HsMcm3 (49).
Chromatin Association of MCM Protein Complex
The pivotal control of MCM protein regulation in most eukaryotes appears to be at the level of their association with the chromatin during S phase. Early cytological data suggested that the abundant MCM proteins are liberally distributed on the chromatin (104). More recently, work in multiple systems has demonstrated that the MCM proteins show cell cycle-regulated chromatin localization. Efforts have been focused on determining the timing of association, identifying regions of the chromosomes bound by the MCM proteins, defining which MCM proteins bind chromatin and whether other factors are required, and assessing whether post-translational modifications affect chromatin association. Early experiments typically monitored only a subset of the six MCM proteins, leaving open the possibility that individual subunits may have subtly different behaviors. However, recent studies suggest that the bulk of MCM proteins in the cell do behave similarly in regard to their chromatin binding; this is consistent with the studies of localization described above which suggest that only intact MCM complexes remain in the nucleus. Chromatin fractionation studies in budding yeast show that ScMcm2 and ScMcm3 exist in either a soluble nucleosolic form or in a chromatin-bound form, and associate with chromatin only between late M phase and early S phase (93, 94). Analysis of spread chromosomes demonstrates that ScMcm7 also associates with chromatin, and chromatin immunoprecipitation shows that ScMcm7 associates with replication origins during G1. Loading of ScMcm7 onto chromatin depends upon Cdc6 association with the replication origin and on S phase and M phase CDK activity (105). Similar chromatin immunoprecipitation experiments also demonstrate that ScMcm4 and ScMcm7 associate with replication origin DNA and that this association depends on ORC and CDC6 function. 
Association of the MCM proteins with replication origin DNA occurs at the same time in both early-firing and late-firing origins (106). Interestingly, ScMcm4 and ScMcm7 appear to dissociate from the origin DNA and associate with non-origin DNA during S phase (107), suggesting that they may be traveling with the replication fork. Similar cell cycle-specific chromatin association has also been demonstrated in fission yeast with a novel in situ chromatin binding assay. Cells expressing SpMcm4 fused to GFP were permeabilized, washed with a non-ionic detergent and scored for the retention of SpMcm4-GFP on the chromatin. SpMcm4 associates with chromatin during anaphase B and this association depends on both Orc1p and Cdc18p. Release from chromatin occurs during S phase and requires DNA synthesis (103). Significantly, changes in association with the chromatin in vertebrate cells appear to correlate with changes in the phosphorylation state of the MCM proteins. HsMcm4 is found in the nuclear fraction in a hyperphosphorylated form in the nucleoplasm and in a hypophosphorylated form more tightly associated with chromatin (55). During G2/M, Cdc2/cyclin B kinase may phosphorylate HsMcm2 and HsMcm4, suggesting that CDK negatively regulates MCM complex association with chromatin (108). Work in Xenopus also suggests that XlMcm4 chromatin association is regulated by phosphorylation (53, 68, 109). These observations are consistent with experiments suggesting a negative role for CDKs in replication regulation (110). XlMcm3 binds to Xenopus sperm chromatin before formation of the intact nuclear envelope; however, XlMcm3 does not co-localize with replication sites but distributes uniformly on chromatin (48, 104). This uniform distribution of MCM proteins is not observed in yeast chromosomes or mammalian cells in which the MCM proteins appear to bind at foci during G1 and S phases (53, 101, 105). The finding that XlMcm3 crosses the intact nuclear envelope of G2
DECONSTRUCTING A CONSERVED PROTEIN FAMILY
HeLa nuclei but fails to bind chromatin demonstrates that a second cytosolic factor is required for XlMcm3 to bind to chromatin (104). MmMcm3 in Swiss 3T3 cells appears to accumulate within the heterochromatic chromatin transiently during G1, earlier than the replicative polymerase processivity factor, Proliferating Cell Nuclear Antigen (PCNA), associates with heterochromatic chromatin (99). HsMcm2 is chromatin-associated in G1 and early S phase and then lost from chromatin as S phase proceeds (98). HsMcm6 appears to be excluded from chromatin during prophase and metaphase until it reassociates during telophase (29). Comparative studies on all six MCM proteins show they all bind to chromatin before S phase and are displaced as S phase progresses (111). Importantly, the timing and location of chromatin binding by the MCM proteins appear to be distinct from the behavior of other replication proteins, such as PCNA and the single-strand DNA binding protein Replication Protein A (RPA) (70, 78, 101). For example, the level of HsMcm7 in a chromatin-associated insoluble form decreases in HeLa cells as the cells progress through S phase, coincident with the loss of chromatin-bound HsMcm3, but distinct from the loss of PCNA from chromatin (70). Chromatin-bound HsMcm3, HsMcm4 and HsMcm5 are present in late G1 and early S phase in discrete sites that are not coincident with replication foci but with unreplicated chromatin (78, 101). This lack of correlation between the MCM proteins and PCNA is unexpected if MCM proteins are indeed moving with the replication fork in the same foci as components of the replication machinery, and will be considered further below. There is evidence that the MCM complex remains intact on and off the chromosome. Analysis of the MCM proteins in permeabilized nuclei suggests that the six MCM proteins exist as a heterohexameric complex whether or not bound to chromatin, and that chromatin association is dependent on the presence of ATP (112).
ATP may serve as a cofactor for enhancing binding to chromatin since the non-hydrolyzable analogue AMP-PNP appears also to stabilize chromatin binding (108); this could reflect a role for ATP in mediating complex formation, as suggested above. Reciprocal immunoprecipitations of the nuclease-released MCM proteins demonstrated that all six human MCM proteins are present with a 1:1 stoichiometry in chromatin (108). Similar to what has been observed of the soluble complex, HsMcm4, HsMcm6 and HsMcm7 are tightly associated, as are HsMcm3 and HsMcm5. However, other experiments suggest that the MCM complex may not exist on chromatin as a heterohexameric complex. It is possible that a subset of MCM complexes behave differently from the majority, which could explain some of these puzzles. Some HsMcm3 and HsMcm6 appear to be chromatin-bound but not associated with the other MCM proteins (113). More recent analysis of MCM complex chromatin association in Xenopus suggests an ordered assembly of MCM subcomplexes on chromatin: XlMcm2, XlMcm4 and XlMcm6 bind chromatin initially with similar kinetics to those of ORC1 binding to chromatin. XlMcm5 and XlMcm7 bind afterwards followed by XlMcm3. This results in a model in which the MCM proteins associate with chromatin loosely as a heterohexameric complex, followed by stabilization of MCM4 and MCM6, then remodeling of the MCM complex to assemble the core MCM4/6/7 complex on chromatin (66, 114). However, localization studies described above clearly indicate that soluble MCM proteins must be assembled in a complex to remain in the nucleus.

Regulation of the MCM Protein Complex by Phosphorylation

As described, changes in the phosphorylation state of the MCM subunits correlate with changes in MCM complex interactions with the chromatin. This is important because regulation of the MCM protein complex is at the center of replication initiation and progression.
Localization, chromatin association and activity of the MCM proteins are affected by the action of several cell cycle kinases, in particular the cyclin-dependent kinases such as Cdc2, and the CDC7 family of kinases. Recent data suggest additional kinases that remain to be identified are also playing a role. The likely consequence of some of these phosphorylation events is remodeling of the MCM complex, facilitating the loading of additional protein factors and enzymatic activation (Figure 4B). This model will be discussed further in the Conclusions. The role of the CDC7 kinase and its partner DBF4 in the regulation of MCM protein function was originally suggested based on the genetic interactions between budding yeast dbf4 and mcm2 mutant alleles and the suppression of dbf4 and cdc7 mutants by an mcm5(bob1) mutant (62). Biochemical studies have subsequently demonstrated that the CDC7 proteins found in all eukaryotes are active as MCM kinases. Budding yeast Cdc7/Dbf4 is able to phosphorylate ScMcm2, ScMcm3, ScMcm4 and ScMcm6 in vitro (94, 115). In addition, human CDC7 is able to phosphorylate XlMcm2 and HsMcm3 in vitro (116). Purified Hsk1p/Dfp1p, the fission yeast homologue of CDC7/DBF4, phosphorylates SpMcm2 in the purified MCM protein complex and recombinant SpMcm2 in vitro (117, 118). In addition, human CDC7 can phosphorylate either free HsMcm2 or HsMcm2 in the complex of HsMcm2/4/6/7 in vitro. It appears that CDK phosphorylation of this complex makes HsMcm2 a better CDC7 phosphorylation substrate (119). Work in budding yeast convincingly demonstrates that CDC7 kinase functions at individual replication origins to activate them (120, 121). Moreover, further data indicate that the CDK and Cdc7 phosphorylations may be required for loading of ScCdc45 onto the pre-replicative complex mediated by association of ScCdc45 with ScMcm2 (122-124). Taken together, these experiments suggest that Cdc7 acts on individual MCM complexes to activate replication.
Surprisingly, though the mcm5(bob1) suppressor allele in budding yeast bypasses the requirement for the essential gene CDC7, ScMcm5 is not a substrate for Cdc7. These results suggest that the mcm5(bob1) mutation may allow a conformational change in the MCM complex that normally requires Cdc7/Dbf4 activity (115). This change in conformation could allow the loading of CDC45, RPA and additional replication factors, as indicated in Figure 1. In addition, it could facilitate reorganization of the MCM complex to allow the core MCM4/6/7 to dimerize to form a hexamer, as indicated in Figure 4B. CDK phosphorylation apparently plays a negative role for at least some of the MCM proteins, probably by affecting chromatin association. The first evidence came from analysis of XlMcm4, which has conserved consensus CDK sites in all species. The protein is a target of CDK, and phosphorylation correlates with loss of chromatin association (53, 68). In human cells, data suggest that the Cdc2 kinase negatively regulates MCM complex association with the chromatin by phosphorylation of HsMcm2 and HsMcm4 (108). In addition, treatment of G2 HeLa cells with a serine-threonine kinase inhibitor results in redistribution of the MCM proteins from the nucleoplasm to the chromatin, suggesting that in G2 cells, a protein kinase prevents rebinding of the MCM proteins to chromatin (125). This is also supported by the observation that chromatin association of the Drosophila MCM proteins during mitosis does not occur when cyclin A degradation is prevented (126). In vitro, cyclin A/Cdk2 phosphorylates HsMcm4, and this phosphorylation correlates with inactivation of the DNA helicase activity of the core complex (127). Finally, CDK phosphorylation may regulate the ability of the MCM proteins to accumulate in the nucleus during the cell cycle, at least in budding yeast.
All MCM3 homologues contain proline-directed serine and threonine non-optimal CDK phosphorylation sites, but the juxtaposition of at least one CDK consensus phosphorylation site near the nuclear localization signal in some MCM3 homologues suggests that CDK activity may also act to regulate MCM
protein localization. In budding yeast, G1 cyclins and B type cyclins can trigger the export of ScMcm4 (95) and Cdc28/Clb kinases are required to promote the nuclear export of the MCM proteins to prevent reassembly of the pre-replicative complex (96). However, phosphorylation of MCM proteins is not restricted solely to CDC7 and CDK kinases. XlMcm4 in the nucleus during S phase also undergoes Cdc2 kinase-independent nuclear phosphorylation that appears to occur early in S phase preceding elongation (53). Similar hyperphosphorylation is observed for chromatin-bound SpMcm4, though the relevant kinase has not been established (5). More recent data demonstrate that upon exit from mitosis, XlMcm4 in the cytosolic complex is dephosphorylated and XlMcm2, XlMcm3 and XlMcm6 are phosphorylated. XlMcm2, XlMcm3 and XlMcm6 are phosphorylated in vivo during interphase, independent of CDK activity. If an immunoprecipitated MCM complex is subjected to an in vitro kinase assay, XlMcm2, XlMcm3 and XlMcm6 are again phosphorylated by an associated kinase that is neither CDK nor CDC7. XlMcm4 is partially dephosphorylated as the pre-replicative complex is formed. Complete dephosphorylation of XlMcm4 inactivates the MCM complex and prevents its binding to the chromatin, indicating that phosphorylation has both positive and negative effects (109), and suggesting that additional kinases which regulate MCM proteins remain to be identified.

Is the MCM Complex a Replicative Helicase?

One of the missing pieces for somatic cell DNA replication is the identity of the replicative helicase that opens the double helix ahead of the advancing replication fork. Known helicases of this sort, including the DnaB protein of E. coli and the viral SV40 large T antigen, are typically multimers with six identical subunits forming a ring structure (reviewed in 128).
Several pieces of data highlighted the MCM complex as an attractive candidate for this role, including consideration of structure, biochemistry and mutant phenotypes. The MCM proteins contain a conserved domain that places them in a superfamily of putative DNA-dependent ATPases, which also includes the prokaryotic NtrC-related transcription regulators and a group of bacterial and chloroplast proteins that have not been characterized (25, 26). Within the 200 amino acid conserved region referred to as the MCM central homology domain, the MCM proteins harbor a modified Walker A type nucleotide binding sequence. The signature Walker A sequence (GXXGXGKS/T) has been modified such that either an alanine or serine is in the third glycine position and a glycine or serine is in the second glycine position. In other ATPases, the Walker A motif typically interacts directly with the phosphate of the NTP substrate. The typical Walker B motif consists of a stretch of hydrophobic residues followed by DE/D. Koonin speculated that the MCM proteins might act at the origin of replication much the same way that the transcription regulator NtrC acts at a promoter for transcription initiation, perhaps by using ATP hydrolysis to melt the DNA duplex (25). Recently, it has been demonstrated that NtrC appears to couple ATP hydrolysis to conformational changes in the sigma factor leading to the generation of open complexes (reviewed in 129). Furthermore, if DNA is threaded through the center of the proposed toroidal complex, conformational changes of the hexameric MCM complex may be coupled to translocation of DNA through this complex (26). Ultrastructural analysis with electron microscopy suggests that the MCM complex forms a ring, as would be expected for a hexameric helicase (76, 86, 130). The structure of the 600 kDa HsMcm4/6/7 complex has been visualized by electron microscopy and appears to be a toroidal
form (11-12 nm) with a central channel (2-3 nm diameter) and occasionally forms structures that appear to have a slit (130), broadly similar to the complex visualized from fission yeast (86). Interestingly, binding of the HsMcm4/6/7 complex to single-stranded DNA appears like beads on a string and can be disrupted by the addition of HsMcm2. Ring structure complexes typically require a loading factor; in bacteria, for example, DnaC loads the helicase DnaB onto the chromosome. Analysis of the SpCdc18, which must associate with ORC to allow MCM complex binding, has led to the suggestion that it may act as the clamp loader for the MCM ring (26, 131-133). Biochemical analysis in vitro provides the most compelling data for an MCM helicase activity. A human MCM core complex containing HsMcm4, HsMcm6 and HsMcm7 was purified from HeLa cell extracts and this complex co-purifies with an ATPase and a 3' to 5' DNA helicase activity (134). Further fractionation of this purified complex reveals that it consists of two trimeric complexes of HsMcm4/6/7, and its DNA helicase activity supports the displacement of a 17mer oligonucleotide from a single-stranded DNA circle. These results are consistent with the MCM proteins playing a role in the unwinding of DNA at the origin. In addition, a complex consisting of HsMcm2/4/6/7 has affinity for histone H3 and presents the possibility that the MCM proteins may unwind nucleosome-bound DNA (79, 134). This HsMcm2/4/6/7 complex has diminished helicase activity, and the apparent inhibition of the DNA helicase activity of this complex may be due to disruption of the ability of the trimeric complex to dimerize into a hexamer.
MmMcm2 is able to disrupt this hexameric complex and inhibits the DNA helicase activity of HsMcm4/6/7 (37). The S. pombe heterotrimer SpMcm4/6/7 was purified as a dimer and exhibits ATP-dependent DNA binding, ssDNA-dependent ATPase activity, and a 3' to 5' helicase activity. Mixing of either SpMcm2 or the heterodimer SpMcm3/5 with the dimeric SpMcm4/6/7 reduces the complex to the single trimeric form and results in loss of its DNA helicase activity (87). Significantly, however, the activity observed in all these experiments is quite low, indicating that if the MCM core complex is a replicative helicase, it requires additional processivity factors in vivo. Analysis of site-directed mutants in vivo provides further information about MCM protein function. In fission yeast, mutation of catalytic lysine residues to alanine in the Walker A and Walker B motifs in SpMcm2 is lethal, and the mutant protein is unable to assemble into an MCM complex (32, 51). Mutations of the equivalent residues in SpMcm4 or SpMcm7 are also lethal, although it has not been determined whether they affect complex association (E. B. Gómez and S.L. Forsburg, unpublished data). As discussed in the section on complex assembly, it is possible that the ATPase motifs of at least some of the MCM proteins function as multimerization domains, rather than as helicase domains. However, ATPase mutants of MmMcm4, MmMcm6 and MmMcm7 do assemble into complexes, demonstrating that mutations of some of the ATPase motifs do not affect assembly of the core complex. This analysis further showed that the ATPase domain of MmMcm6 appears to be critical for DNA helicase activity, and the ATPase domain for MmMcm4 is important for ssDNA binding (135). Furthermore, MCM2 is not singular in its inhibition of the core complex DNA helicase activity; the addition of a complex of HsMcm3/5 also inhibits the DNA helicase activity of the HsMcm4/6/7 complex (130). 
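The Walker A motif targeted in these mutants can be made concrete with a short pattern search. The sketch below is illustrative only: it encodes the consensus as written in this chapter (GXXGXGKS/T) and the MCM-type variant in which the second glycine position is glycine or serine and the third glycine position is alanine or serine. The regular expressions, the function name `find_motifs` and the two toy peptides are assumptions made for the example; the peptides are invented and are not taken from any real MCM protein.

```python
import re

# Walker A consensus exactly as written in the text: G-X-X-G-X-G-K-(S/T).
WALKER_A = re.compile(r"G..G.GK[ST]")

# MCM-type variant described in the text: the second glycine position may be
# G or S, and the third glycine position is A or S.
MCM_WALKER_A = re.compile(r"G..[GS].[AS]K[ST]")


def find_motifs(sequence, pattern):
    """Return (start_index, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


# Toy peptides invented for illustration; they are not real protein sequences.
canonical_like = "MAAGDPGTGKSQLLE"
mcm_like = "MAAGDPSTAKSQLLE"

print(find_motifs(canonical_like, WALKER_A))
print(find_motifs(mcm_like, MCM_WALKER_A))
```

In this sketch the invariant lysine sits at the seventh position of each match, which is the residue changed to alanine in the lethal Walker A mutants discussed above. A search of this kind only flags candidate sites; it says nothing about whether a given site binds or hydrolyzes ATP.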
Complementing these experiments in eukaryotes, analysis of the archaeal MtMcm described above revealed DNA helicase activity in vitro (75-77). Purification of the core Mcm4/6/7 complex from both human and fission yeast cells demonstrates that a hexamer consisting of two trimeric complexes exhibits DNA helicase activity although it is less processive
than the archaeal helicase (87, 134). This offers strong evidence that the MCM proteins act at some level as a helicase in vivo. The possibility that the MCM complex is a replicative helicase explains some puzzling phenotypes in yeast that indicated they function after replication initiation. For example, several of the early isolated mcm mutants in both yeast species arrested the cell cycle with an apparently replicated DNA content (2, 13, 15-17, 19, 20, 67). Although the popular model at the time favored an initiation-specific function, a role later in S phase was postulated based on these results (19, 32). Separation of the initiation from the elongation function was also indicated by physiological experiments in fission yeast with the use of deletion mutants undergoing spore germination. These data suggest that relatively little MCM protein is required for initiation, but high levels are required for successful completion of S phase (40). This was consistent with observations that modest reduction in the level of a single MCM protein leads to reduced growth rate and genetic instability, indicating a threshold level for activity (32, 85). Recent experiments in budding yeast convincingly demonstrate that MCM proteins also function after replication initiation. Use of an engineered “temperature-sensitive” MCM allele allows each MCM protein to be degraded rapidly after initiation but before the end of DNA replication. Cells that lost mcm function in this experiment were unable to complete DNA synthesis (136). Thus, the weight of evidence suggests that the MCM complex is indeed a helicase active throughout S phase. Is it the replicative helicase that acts at the replication fork? Or does it have some additional or different functions? Some important questions in the field remain to be answered before a conclusive answer can be given.

Conclusions and Future Directions

Work in the field to date allows us to build a model of MCM protein function as follows.
The MCM proteins play multiple roles during DNA replication. A subset of MCM proteins associate with individual replication origins, where they are required for the assembly of the prereplicative complex and the loading of additional replication factors (Figure 1). In response to activation by CDC7 and other kinases, the MCM proteins undergo a change from a loading platform to an active participant in replication progression. Analysis of the MCM complex structure clearly indicates that the catalytically active form of MCM proteins is a dimer of the MCM4/6/7 core. Consider the data from the archaeal MtMcm, suggesting that MtMcm homomultimerizes as a dodecamer in a dual hexamer arrangement. Perhaps the eukaryotic MCM complex forms a similar dual hexamer with two complexes adjacent to one another. Upon activation of the replication origin, this double MCM complex may be rearranged to form a catalytically active structure as suggested in Figure 4B. Under this model, the peripheral subunits play a regulatory role, repressing the DNA helicase activity of the MCM core complex until they are activated, and then perhaps bridging the interaction between the MCM complex and other replication factors. This could also explain why eukaryotes require six different MCM proteins. In order to link complex assembly and localization, and to provide negative and positive regulatory function, the individual subunits may have acquired specialized functions in addition to their shared activity. If this is so, then the putative negative role associated with the peripheral subunits might be further investigated with the use of dominant negative alleles of the genes, although none has been identified thus far. Although this model appears to account for data in both eukaryotes and archaea, it intriguingly may not explain initiation of DNA replication during meiosis. Recent work suggests that the fission yeast MCM proteins are dispensable for DNA replication during the
meiotic cell cycle (137). This might reflect differences in the mechanism of initiation or progression of DNA replication in mitotic versus meiotic S phases, indicating that there are other ways of carrying out the replication of complex genomes. Significantly, if there were a conformational change of the MCM complex bound to the replication origin, it would affect only a subset of the MCM proteins in the cell. The proteins are extremely abundant, with ScMcm2 and ScMcm3 estimated to be at molecules and molecules per cell, respectively, and HsMcm2 and HsMcm3 to be at molecules and molecules per cell, respectively (80, 85, 98). This is particularly striking because in budding yeast, the number of replication origins is estimated at about 400 (138), indicating that the MCM proteins are in at least 100-fold excess over replication origins. Data in both budding and fission yeasts (32, 85) indicate that modestly reducing the levels of the MCM proteins below this level sharply affects growth rate, even though they still remain in excess over the number of origins. Why is it necessary to have so many MCM proteins if their primary function is at the origin of replication and the elongating fork? One possibility is that these abundant proteins fall into two functional classes. The MCM complexes tightly associated with chromatin at replication origins are distinct from the “soluble excess” of MCM proteins, which are actually bound more loosely to the chromatin. In this context, this additional pool of MCM proteins may play a structural role in maintaining chromosome integrity and could explain the observation that MCM2 also binds to histone. The chromatin immunoprecipitation data derived from budding yeast suggest that the MCM proteins appear to move with the replication fork. Certainly, this observation is compatible with its possible role as a replicative DNA helicase. 
However, this is inconsistent with the chromatin association data in Xenopus and mammalian systems that suggest that the MCM proteins localize in foci which are not coincident with replication foci as defined by PCNA or RPA staining or by bromodeoxyuridine incorporation. Thus, it is formally possible that MCM proteins are not actually moving with the locomotive of the replication fork; chromatin immunoprecipitation experiments that are interpreted as fork movement might also be detecting the replication fork as it passes over a track of MCM proteins along the chromosome. Without doubt, the MCM proteins have essential roles in replication initiation and elongation, and these roles represent only a subset of several intriguing facets of this ubiquitous and ancient protein family that remain to be elucidated.

REFERENCES

1 Kelly, T.J. and Brown, G.W. (2000) Annu. Rev. Biochem. 69, 829-880.
2 Hennessy, K.M., Lee, A., Chen, E. and Botstein, D. (1991) Genes Dev. 5, 958-969.
3 Hardy, C.F. (1997) Gene 187, 239-246.
4 Hofmann, J.F. and Beach, D. (1994) EMBO J. 13, 425-434.
5 Nishitani, H., Lygerou, Z., Nishimoto, T. and Nurse, P. (2000) Nature 404, 625-628.
6 Maiorano, D., Moreau, J. and Mechali, M. (2000) Nature 404, 622-625.
7 Whittaker, A.J., Royzman, I. and Orr-Weaver, T.L. (2000) Genes Dev. 14, 1765-1776.
8 Rowles, A. and Blow, J.J. (1997) Curr. Opin. Genet. Dev. 7, 152-157.
9 Ritzi, M. and Knippers, R. (2000) Gene 245, 13-20.
10 Kearsey, S.E. and Labib, K. (1998) Biochim. Biophys. Acta 1398, 113-136.
11 Fujita, M. (1999) Front Biosci. 4, D816-823.
12 Tye, B.K. (1999) Annu. Rev. Biochem. 68, 649-686.
13 Maine, G.T., Sinha, P. and Tye, B.K. (1984) Genetics 106, 365-385.
14 Moir, D., Stewart, S.E., Osmond, B.C. and Botstein, D. (1982) Genetics 100, 547-563.
15 Nasmyth, K. and Nurse, P. (1981) Mol. Gen. Genet. 182, 119-124.
16 Toda, T., Umesono, K., Hirata, A. and Yanagida, M. (1983) J. Mol. Biol. 168, 251-270.
17 Coxon, A., Maundrell, K. and Kearsey, S.E. (1992) Nucl. Acids Res. 20, 5571-5577.
18 Miyake, S., Okishio, N., Samejima, I., Hiraoka, Y., Toda, T., Saitoh, I. and Yanagida, M. (1993) Mol. Biol. Cell 4, 1003-1015.
19 Forsburg, S.L. and Nurse, P. (1994) J. Cell Sci. 107, 2779-2788.
20 Takahashi, K., Yamada, H. and Yanagida, M. (1994) Mol. Biol. Cell 5, 1145-1158.
21 Thömmes, P., Fett, R., Schray, B., Burkhart, R., Barnes, M., Kennedy, C., Brown, N.C. and Knippers, R. (1992) Nucl. Acids Res. 20, 1069-1074.
22 Blow, J.J. and Laskey, R.A. (1988) Nature 332, 546-548.
23 Blow, J.J. (1993) J. Cell Biol. 122, 993-1002.
24 Chong, J.P., Thommes, P. and Blow, J.J. (1996) Trends Biochem. Sci. 21, 102-106.
25 Koonin, E.V. (1993) Nucl. Acids Res. 21, 2541-2547.
26 Neuwald, A.F., Aravind, L., Spouge, J.L. and Koonin, E.V. (1999) Genome Res. 9, 27-43.
27 Rogers, S., Wells, R. and Rechsteiner, M. (1986) Science 234, 364-368.
28 Yan, H., Gibson, S. and Tye, B.K. (1991) Genes Dev. 5, 944-957.
29 Tsuruga, H., Yabuta, N., Hosoya, S., Tamura, K., Endo, Y. and Nojima, H. (1997) Genes Cells 2, 381-399.
30 Kubota, Y., Mimura, S., Nishimoto, S., Masuda, T., Nojima, H. and Takisawa, H. (1997) EMBO J. 16, 3320-3331.
31 Ohno, K., Hirose, F., Inoue, Y.H., Takisawa, H., Mimura, S., Hashimoto, Y., Kiyono, T., Nishida, Y. and Matsukage, A. (1998) Gene 217, 177-185.
32 Forsburg, S.L., Sherman, D.A., Ottilie, S., Yasuda, J.R. and Hodson, J.A. (1997) Genetics 147, 1025-1041.
33 Miyake, S., Saito, I., Kobayashi, H. and Yamashita, S. (1996) Gene 175, 71-75.
34 Ye, X.S., Fincher, R.R., Tang, A., McNeal, K.K., Gygax, S.E., Wexler, A.N., Ryan, K.B., James, S.W. and Osmani, S.A. (1997) J. Biol. Chem. 272, 33384-33393.
35 Pasion, S.G. and Forsburg, S.L. (1999) Mol. Biol. Cell 10, 4043-4057.
36 Kimura, H., Ohtomo, T., Yamaguchi, M., Ishii, A. and Sugimoto, K. (1996) Genes Cells 1, 977-993.
37 Ishimi, Y., Komamura, Y., You, Z. and Kimura, H. (1998) J. Biol. Chem. 273, 8369-8375.
38 Treisman, J.E., Follette, P.J., O'Farrell, P.H. and Rubin, G.M. (1995) Genes Dev. 9, 1709-1715.
39 Todorov, I.T., Pepperkok, R., Philipova, R.N., Kearsey, S.E., Ansorge, W. and Werner, D. (1994) J. Cell Sci. 107, 253-265.
40 Liang, D.T., Hodson, J.A. and Forsburg, S.L. (1999) J. Cell Sci. 112, 559-567.
41 Chen, Y., Hennessy, K.M., Botstein, D. and Tye, B.K. (1992) Proc. Nat. Acad. Sci. U.S.A. 89, 10459-10463.
42 Gibson, S.I., Surosky, R.T. and Tye, B.K. (1990) Mol. Cell. Biol. 10, 5707-5720.
43 Sherman, D.A. and Forsburg, S.L. (1998) Nucl. Acids Res. 26, 3955-3960.
44 Hu, B., Burkhart, R., Schulte, D., Musahl, C. and Knippers, R. (1993) Nucl. Acids Res. 21, 5289-5293.
45 Madine, M.A., Khoo, C.Y., Mills, A.D. and Laskey, R.A. (1995) Nature 375, 421-424.
46 Sabelli, P.A., Burgess, S.R., Kush, A.K., Young, M.R. and Shewry, P.R. (1996) Mol. Gen. Genet. 252, 125-136.
47 Young, M.R., Suzuki, K., Yan, H., Gibson, S. and Tye, B.K. (1997) Genes Cells 2, 631-643.
48 Kubota, Y., Mimura, S., Nishimoto, S., Takisawa, H. and Nojima, H. (1995) Cell 81, 601-609.
49 Takei, Y. and Tsujimoto, G. (1998) J. Biol. Chem. 273, 22177-22180.
50 Kimura, H., Nozaki, N. and Sugimoto, K. (1994) EMBO J. 13, 4311-4320.
51 Sherman, D.A., Pasion, S.G. and Forsburg, S.L. (1998) Mol. Biol. Cell 9, 1833-1845.
52 Whitbread, L.A. and Dalton, S. (1995) Gene 155, 113-117.
53 Coué, M., Kearsey, S.E. and Mechali, M. (1996) EMBO J. 15, 1085-1097.
54 Kimura, H., Takizawa, N., Nozaki, N. and Sugimoto, K. (1995) Nucl. Acids Res. 23, 2097-2104.
55 Musahl, C., Schulte, D., Burkhart, R. and Knippers, R. (1995) Eur. J. Biochem. 230, 1096-1101.
56 Feger, G., Vaessin, H., Su, T.T., Wolff, E., Jan, L.Y. and Jan, Y.N. (1995) EMBO J. 14, 5387-5398.
57 Hennessy, K.M., Clark, C.D. and Botstein, D. (1990) Genes Dev. 4, 2252-2263.
58 Su, T.T., Yakubovich, N. and O'Farrell, P.H. (1997) Gene 192, 283-289.
59 Paul, R., Hu, B., Musahl, C., Hameister, H. and Knippers, R. (1996) Cytogenet. Cell Genet. 73, 317-321.
60 Moir, D. and Botstein, D. (1982) Genetics 100, 565-577.
61 Miyake, S. and Yamashita, S. (1998) Genes Cells 3, 157-166.
62 Hardy, C.F., Dryga, O., Seematter, S., Pahl, P.M. and Sclafani, R.A. (1997) Proc. Nat. Acad. Sci. U.S.A. 94, 3151-3155.
63 Johnston, L.H., Masai, H. and Sugino, A. (1999) Trends Cell Biol. 9, 249-252.
64 Sclafani, R.A. (2000) J. Cell Sci. 113, 2111-2117.
65 Feger, G. (1999) Gene 227, 149-155.
66 Holthoff, H.P., Hameister, H. and Knippers, R. (1996) Genomics 37, 131-134.
67 Dalton, S. and Whitbread, L. (1995) Proc. Nat. Acad. Sci. U.S.A. 92, 2514-2518.
68 Hendrickson, M., Madine, M., Dalton, S. and Gautier, J. (1996) Proc. Nat. Acad. Sci. U.S.A. 93, 12223-12228.
69 Takizawa, N., Kimura, H. and Sugimoto, K. (1995) Gene 167, 343-344.
70 Fujita, M., Kiyono, T., Hayashi, Y. and Ishibashi, M. (1996) J. Biol. Chem. 271, 4349-4354.
71 Kiyono, T., Fujita, M., Hayashi, Y. and Ishibashi, M. (1996) Biochim. Biophys. Acta 1307, 31-34.
72 Springer, P.S., McCombie, W.R., Sundaresan, V. and Martienssen, R.A. (1995) Science 268, 877-880.
73 Fujita, M., Kiyono, T., Hayashi, Y. and Ishibashi, M. (1996) Biochem. Biophys. Res. Commun. 219, 604-607.
74 Cann, I.K. and Ishino, Y. (1999) Genetics 152, 1249-1267.
75 Kelman, Z., Lee, J.K. and Hurwitz, J. (1999) Proc. Nat. Acad. Sci. U.S.A. 96, 14783-14788.
76 Chong, J.P., Hayashi, M.K., Simon, M.N., Xu, R.M. and Stillman, B. (2000) Proc. Nat. Acad. Sci. U.S.A. 97, 1530-1535.
77 Shechter, D.F., Ying, C.Y. and Gautier, J. (2000) J. Biol. Chem. 275, 15049-15059.
78 Richter, A. and Knippers, R. (1997) Eur. J. Biochem. 247, 136-141.
79 Ishimi, Y., Ichinose, S., Omori, A., Sato, K. and Kimura, H. (1996) J. Biol. Chem. 271, 24115-24122.
80 Burkhart, R., Schulte, D., Hu, D., Musahl, C., Gohring, F. and Knippers, R. (1995) Eur. J. Biochem. 228, 431-438.
81 Tsuruga, H., Yabuta, N., Hashizume, K., Ikeda, M., Endo, Y. and Nojima, H. (1997) Biochem. Biophys. Res. Commun. 236, 118-125.
82 Chong, J.P., Mahbubani, H.M., Khoo, C.Y. and Blow, J.J. (1995) Nature 375, 418-421.
83 Prokhorova, T.A. and Blow, J.J. (2000) J. Biol. Chem. 275, 2491-2498.
84 Romanowski, P., Madine, M.A. and Laskey, R.L.A. (1996) Proc. Nat. Acad. Sci. U.S.A. 93, 10189-10194.
85 Lei, M., Kawasaki, Y. and Tye, B.K. (1996) Mol. Cell. Biol. 16, 5081-5090.
86 Adachi, Y., Usukura, J. and Yanagida, M. (1997) Genes Cells 2, 467-479.
87 Lee, J.K. and Hurwitz, J. (2000) J. Biol. Chem. 275, 18871-18878.
88 Whiteheart, S.W., Rossnagel, K., Buhrow, S.A., Brunner, M., Jaenicke, R. and Rothman, J.E. (1994) J. Cell Biol. 126, 945-954.
89 Singh, S.K. and Maurizi, M.R. (1994) J. Biol. Chem. 269, 29537-29545.
90 Merchant, A.M., Kawasaki, Y., Chen, Y., Lei, M. and Tye, B.K. (1997) Mol. Cell. Biol. 17, 3261-3271.
91 Homesley, L., Lei, M., Kawasaki, Y., Sawyer, S., Christensen, T. and Tye, B.K. (2000) Genes Dev. 14, 913-926.
92 Zou, L. and Stillman, B. (1998) Science 280, 593-596.
93 Yan, H., Merchant, A.M. and Tye, B.K. (1993) Genes Dev. 7, 2149-2160.
94 Young, M.R. and Tye, B.K. (1997) Mol. Biol. Cell 8, 1587-1601.
95 Labib, K., Diffley, J.F. and Kearsey, S.E. (1999) Nature Cell Biol. 1, 415-422.
96 Nguyen, V.Q., Co, C., Irie, K. and Li, J.J. (2000) Curr. Biol. 10, 195-205.
97 Forsburg, S.L. and Nurse, P. (1991) Annu. Rev. Cell Biol. 7, 227-256.
98 Todorov, I.T., Attaran, A. and Kearsey, S.E. (1995) J. Cell Biol. 129, 1433-1445.
99 Starborg, M., Brundell, E., Gell, K., Larsson, C., White, I., Daneholt, B. and Hoog, C. (1995) J. Cell Sci. 108, 927-934.
100 Maiorano, D., Van Assendelft, G.B. and Kearsey, S.E. (1996) EMBO J. 15, 861-872.
101 Krude, T., Musahl, C., Laskey, R.A. and Knippers, R. (1996) J. Cell Sci. 109, 309-318.
102 Okishio, N., Adachi, Y. and Yanagida, M. (1996) J. Cell Sci. 109, 319-326.
103 Kearsey, S.E., Montgomery, S., Labib, K. and Lindner, K. (2000) EMBO J. 19, 1681-1690.
104 Madine, M.A., Khoo, C.Y., Mills, A.D., Mushal, C. and Laskey, R.A. (1995) Curr. Biol. 5, 1270-1279.
105 Tanaka, T., Knapp, D. and Nasmyth, K. (1997) Cell 90, 649-660.
106 Aparicio, O.M., Stout, A.M. and Bell, S.P. (1999) Proc. Nat. Acad. Sci. U.S.A. 96, 9130-9135.
107 Aparicio, O.M., Weinstein, D.M. and Bell, S.P. (1997) Cell 91, 59-69.
108 Fujita, M., Yamada, C., Tsurumi, T., Hanaoka, F., Matsuzawa, K. and Inagaki, M. (1998) J. Biol. Chem. 273, 17095-17101.
109 Pereverzeva, I., Whitmire, E., Khan, B. and Coue, M. (2000) Mol. Cell. Biol. 20, 3667-3676.
110 Hua, X.H., Yan, H. and Newport, J. (1997) J. Cell Biol. 137, 183-192.
111 Thömmes, P., Kubota, Y., Takisawa, H. and Blow, J.J. (1997) EMBO J. 16, 3312-3319.
154
112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145
S.G. PASION AND S.L. FORSBURG
Fujita, M., Kiyono, T., Hayashi, Y. and Ishibashi, M. (1997) J. Biol. Chem. 272, 1092810935. Holthoff, H.P., Baack, M., Richter, A., Ritzi, M. and Knippers, R. (1998) J. Biol. Chem. 273, 7320-7325. Maiorano, D., Lemaitre, J.M. and Mechali, M. (2000) J. Biol. Chem. 275, 8426-8431. Lei, M., Kawasaki, Y., Young, M.R., Kihara, M., Sugino, A. and Tye, B.K. (1997) Genes Dev. 11, 3365-3374. Sato, N., Arai, K. and Masai, H. (1997) EMBO J. 16, 4340-4351. Brown, G.W. and Kelly, T.J. (1998) J. Biol. Chem. 273, 22083-22090. Takeda, T., Ogino, K., Matsui, E., Cho, M.K., Kumagai, H., Miyake, T., Arai, K. and Masai, H. (1999) Mol. Cell. Biol. 19, 5535-5547. Masai, H., Matsui, E., You, Z., Ishimi, Y., Tamai, K. and Arai, K. (2000) J. Biol. Chem. 275, 29042-29052. Donaldson, A.D., Fangman, W.L. and Brewer, B.J. (1998) Genes Dev. 12, 491-501. Bousset, K. and Diffley, J.F. (1998) Genes Dev. 12, 480-490. Tanaka, T. and Nasmyth, K. (1998) EMBO J. 17, 5182-5191. Owens, J.C., Detweiler, C.S. and Li, J.J. (1997) Proc. Nat. Acad. Sci. U.S.A. 94, 1252112526. Zou, L. and Stillman, B. (2000) Mol. Cell. Biol. 20, 3086-3096. Coverley, D., Wilkinson, H.R., Madine, M.A., Mills, A.D. and Laskey, R.A. (1998) Exp. Cell Res. 238, 63-69. Su, T.T. and O'Farrell, P.H. (1997) J. Cell Biol. 139, 13-21. Ishimi, Y., Komamura-Kohno, Y., You, Z., Omori, A. and Kitagawa, M. (2000) J. Biol. Chem. 275, 16235-16241. Patel, S.S. and Picha, K.M. (2000) Annu. Rev. Biochem. 69, 651-697. Rombel, I., North, A., Hwang, I., Wyman, C. and Kustu, S. (1998) Cold Spring Harbor Symp. Quant. Biol. 63, 157-166. Sato, M., Gotow, T., You, Z., Komamura-Kohno, Y., Uchiyama, Y., Yabuta, N., Nojima, H. and Ishimi, Y. (2000) J. Mol. Biol. 300, 421-431. Perkins, G. and Diffley, J.F. (1998) Mol. Cell 2, 23-32. DeRyckere, D., Smith, C.L. and Martin, G.S. (1999) Genetics 151, 1445-1457. Liu, J., Smith, C.L., DeRyckere, D., DeAngelis, K., Martin, G.S. and Berger, J.M. (2000) Mol. Cell 6, 637-648. Ishimi, Y. (1997) J. Biol. Chem. 
272, 24508-24513. You, Z., Komamura, Y. and Ishimi, Y. (1999) Mol. Cell. Biol. 19, 8003-8015. Labib, K., Tercero, J.A. and Diffley, J.F. (2000) Science 288, 1643-1647. Forsburg, S.L. and Hodson, J.A. (2000) Nature Genet. 25, 263-268. Newlon, C.S. (1988) Microbiol. Rev. 52, 568-601. Muzi-Falconi, M. and Kelly, T.J. (1995) Proc. Nat. Acad. Sci. U.S.A. 92, 12475-12479. Bell, S.P., Mitchell, J., Leber, J., Kobayashi, R. and Stillman, B. (1995) Cell 83, 563-568. Pak, D.T., Pflumm, M., Chesnokov, I., Huang, D.W., Kellum, R., Marr, J., Romanowski, P. and Botchan, M.R. (1997) Cell 91, 311-323. Rowles, A., Chong, J.P., Brown, L., Howell, M., Evan, G.I. and Blow, J.J. (1996) Cell 87, 287-296. Gavin, K.A., Hidaka, M. and Stillman, B. (1995) Science 270, 1667-1671. Leatherwood, J., Lopez-Girona, A. and Russell, P. (1996) Nature 379, 360-363. Bell, S.P., Kobayashi, R. and Stillman, B. (1993) Science 262, 1844-1849.
DECONSTRUCTING A CONSERVED PROTEIN FAMILY
146 147
148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169
155
Foss, M., McNally, F.J., Laurenson, P. and Rine, J. (1993) Science 262, 1838-1844. Gossen, M., Pak, D.T., Hansen, S.K., Acharya, J.K. and Botchan, M.R. (1995) Science 270, 1674-1677. Landis, G., Kelley, R., Spradling, A.C. and Tower, J. (1997) Proc. Nat. Acad. Sci. U.S.A. 94, 3888-3892. Carpenter, P.B., Mueller, P.R. and Dunphy, W.G. (1996) Nature 379, 357-360. Takahara, K., Bong, M., Brevard, R., Eddy, R.L., Haley, L.L., Sait, S.J., Shows, T.B., Hoffman, G.G. and Greenspan, D.S. (1996) Genomics 31, 119-122. Moon, K.Y., Kong, D., Lee, J.K., Raychaudhuri, S. and Hurwitz, J. (1999) Proc. Nat. Acad. Sci. U.S.A. 96, 12367-12372. Pinto, S., Quintana, D.G., Smith, P., Mihalek, R.M., Hou, Z.H., Boynton, S., Jones, C.J., Hendricks, M., Velinzon, K., Wohlschlegel, J.A., Austin, R.J., Lane, W.S., Tully, T. and Dutta, A. (1999) Neuron 23, 45-54. Tugal, T., Zou-Yang, X.H., Gavin, K., Pappin, D., Canas, B., Kobayashi, R., Hunt, T. and Stillman, B. (1998) J. Biol. Chem. 273, 32421-32429. Springer, J., Nanda, I., Hoehn, K., Schmid, M. and Grummt, F. (1999) Cytogenet. Cell Genet. 87, 245-251. Chuang, R.Y. and Kelly, T.J. (1999) Proc. Nat. Acad. Sci. U.S.A. 96, 2656-2661. Chesnokov, I., Gossen, M., Remus, D. and Botchan, M. (1999) Genes Dev. 13, 12891296. Quintana, D.G., Hou, Z., Thome, K.C., Hendricks, M., Saha, P. and Dutta, A. (1997) J. Biol. Chem. 272, 28247-28251. Ishiai, M., Dean, F.B., Okumura, K., Abe, M., Moon, K.Y., Amin, A.A., Kagotani, K., Taguchi, H., Murakami,Y., Hanaoka, F., O'Donnell, M., Hurwitz, J. and Eki, T. (1997) Genomics 46, 294-298. Loo, S., Fox, C.A., Rine, J., Kobayashi, R., Stillman, B. and Bell, S. (1995) Mol. Biol. Cell 6, 741-756. Li, J.J. and Herskowitz, I. (1993) Science 262, 1870-1874. Kelly, T.J., Martin, G.S., Forsburg, S.L., Stephen, R.J., Russo, A. and Nurse, P. (1993) Cell 74, 371-382. Zhou, C., Huang, S.H. and Jong, A.Y. (1989) J. Biol. Chem. 264, 9022-9029. Hateboer, G., Wobst, A., Petersen, B.O., Le Cam, L., Vigo, E., Sardet, C. 
and Helin, K. (1998) Mol. Cell. Biol. 18, 6679-6697. Saha, P., Chen, J., Thome, K.C., Lawlis, S.J., Hou, Z.H., Hendricks, M., Parvin, J.D. and Dutta, A. (1998) Mol. Cell. Biol. 18, 2758-2767. Shaikh, T.H., Gottlieb, S., Sellinger, B., Chen, F., Roe, B.A., Oakey, R.J., Emanuel, B.S. and Budarf, M.L. (1999) Mamm. Genome 10, 322-326. Mimura, S. and Takisawa, H. (1998) EMBO J. 17, 5699-5707. Williams, R.S., Shohet, R.V. and Stillman, B. (1997) Proc. Nat. Acad. Sci. U.S.A. 94, 142-147. Saha, P., Thome, K.C., Yamaguchi, R., Hou, Z., Weremowicz, S. and Dutta, A. (1998) J. Biol. Chem. 273, 18205-18209. Rechsteiner, M. and Rogers, S.W. (1996) Trends Biochem. Sci. 21, 267-271.
EXPRESSION OF FOREIGN GENES IN THE YEAST Pichia pastoris
Geoffrey P. Lin Cereghino#, Anthony J. Sunga, Joan Lin Cereghino# and James M. Cregg
Keck Graduate Institute of Applied Life Sciences
535 Watson Drive
Claremont, CA 91711
Key words: Pichia pastoris, yeast, heterologous gene expression, foreign protein production, methanol.
INTRODUCTION
Since 1985 the methylotrophic yeast Pichia pastoris has been successfully used for the production of over 300 recombinant proteins (1). (A brief description of published proteins with links to publications via PubMed is available at: http://public.kgi.edu/~cregg/Pichia.html). The increasing popularity of P. pastoris is attributed to: (a) the ease with which it can be genetically manipulated and cultured on a large scale; (b) its ability to produce foreign proteins either intracellularly or extracellularly at high levels; (c) its capability of performing many eukaryotic post-translational modifications, such as proteolytic processing, folding, disulfide bond formation, and glycosylation; and (d) its ready availability as a kit from Invitrogen Corporation (Carlsbad, CA). This article provides a general outline of the use of the P. pastoris expression system, summarizes the advantages and disadvantages of the system, and describes the characteristics of several new P. pastoris promoters, which may broaden the usefulness of the system. More detailed information on the P. pastoris expression system may be obtained from one of the many reviews on the system (2 and references therein) and from Invitrogen (http://www.invitrogen.com).
# Current address: Department of Biological Sciences, University of the Pacific, Stockton, CA 95211
Genetic Engineering, Volume 23, Edited by J. K. Setlow Kluwer Academic / Plenum Publishers, 2001
BACKGROUND
Initial studies on P. pastoris were not directed toward its use for heterologous protein expression. Because this single-celled microorganism was capable of growing to extremely high concentrations (> 130 g/L dry cell weight) on methanol, an inexpensive carbon source, it was selected by Phillips Petroleum Company during the early 1970s as a source of yeast biomass or single cell protein (SCP), primarily for use as a high-protein animal feed (3). However, during the 1970s, the price of petroleum-based products, including methanol, increased dramatically, and as a result the economics of SCP production from methanol were never favorable. Seeing the potential of combining their high-density fermentation technology with the emergence of recombinant DNA methods, Phillips Petroleum Company contracted with the Salk Institute Biotechnology/Industrial Associates, Inc. (SIBIA) in the 1980s to develop P. pastoris as an organism for foreign gene expression.
The conceptual basis for the expression system originated with the observation that many enzymes required for methanol metabolism are present at substantial levels only when cells are grown on methanol (4, 5). The most extreme example of this methanol regulation is displayed by alcohol oxidase (AOX), the first enzyme in the methanol utilization pathway. In methylotrophic yeasts, AOX is undetectable in cells grown on glucose, glycerol, or ethanol but is dramatically induced when cells are shifted to methanol (6). Based on this observation, it was predicted that the promoter from the AOX gene would be an excellent choice for the regulated expression of foreign genes.
The first step in developing the system was the cloning and characterization of the AOX gene. Studies revealed that the P. pastoris genome contains two AOX genes, AOX1 and AOX2, but that AOX1 is responsible for all but a minor fraction of the total AOX message and protein in methanol-grown cells (7-9).
In methanol shake-flask cultures, this level is typically ~5% of total soluble protein but can rise to greater than 30% in cells fed methanol at growth-limiting rates in fermenters (10). While ~5% of mRNA is from AOX1 in methanol-grown cells, the mRNA is undetectable in cells grown on most other carbon sources (11). Thus, the presence of methanol triggers a very sensitive and powerful induction mechanism in the absence of other carbon sources.
In order to exploit this tightly regulated expression system, it was necessary to develop techniques to manipulate P. pastoris genetically (12). Fortunately, most techniques needed for its molecular genetic manipulation turned out to be similar to those developed for the well-studied baker's yeast Saccharomyces cerevisiae. Relative to their S. cerevisiae counterparts, P. pastoris cells are smaller and prefer to remain in a haploid state (13). However, both yeasts share many anatomical and physiological traits, the foremost being a propensity for homologous recombination between genomic and artificially introduced DNAs. P. pastoris grows on the same media as S. cerevisiae and requires no special equipment for culturing. Moreover, key methods needed for DNA-mediated transformation, gene targeting, and gene replacement are similar to those used for S. cerevisiae and lead to similar results (14). With this large arsenal of "adapted" protocols at hand, P. pastoris researchers were able to transform a once obscure methylotroph rapidly into an easily manipulated user-friendly recombinant protein production machine.
HOW TO CONSTRUCT A P. pastoris EXPRESSION STRAIN
Expression of a foreign gene in P. pastoris requires three basic steps: (a) insertion of the gene into an expression vector; (b) introduction of the vector into the P. pastoris genome; and (c) examination of potential expression strains for the foreign product. The first step is to recombine the coding sequence of the desired gene into a P. pastoris expression vector, such as pAO815. As shown in Figure 1, this typical expression vector contains an expression cassette composed of the AOX1 promoter, one or more foreign gene insertion sites, and the AOX1 transcriptional terminator, along with a bacterial origin of replication and selectable markers for transformation of the vector into P. pastoris (HIS4) and bacteria. In addition, some P. pastoris expression vectors contain other functional DNA sequences such as sequences encoding a secretion signal or, as with pAO815, a fragment originating from AOX1 3' sequences which, as described in the next paragraph, can be used along with the AOX1 promoter sequences to carry out an AOX1 gene replacement. The foreign gene is inserted between the AOX1 promoter and terminator sequences (the EcoRI site in pAO815). The cassette is designed so that the resulting transcription product (cap-AOX1 5' UTR-ORF-3' UTR-polyA) is a mature mRNA structure that is familiar to the yeast cellular machinery and does not contain cryptic sequences that may affect message stability or translational efficiency (15).
Once the chimeric expression vector is constructed, it can be inserted into the P. pastoris genome in either of two ways. The simplest way is to digest the vector at a unique restriction site within either the HIS4 or the AOX1 sequences and then transform the linearized
vector into a P. pastoris his4 mutant host, which stimulates the vector to recombine homologously at the cut locus via a single crossover event (Figure 2). Alternatively, the vector can be cut with BglII to release the expression cassette and HIS4 marker on a DNA fragment flanked by AOX1 terminal sequences that stimulate gene replacement events at AOX1 (Figure 3). The resulting replacement strains, which constitute approximately 10% of transformants, are not only prototrophic for histidine but are also deleted at AOX1. This disruption of AOX1 forces these strains to rely on the transcriptionally weak AOX2 gene for growth on methanol (16). The gene replacement strains are easily identified among transformants by this slow-methanol-growth phenotype. Although these AOX1 replacement strains metabolize methanol at a greatly reduced rate, they sometimes express higher levels of foreign protein than do wild-type strains (16-18). With either single-crossover or gene replacement integration methods, gene conversion events, in which wild-type HIS4 sequences from the vector integrate into the genome without other vector sequences, occur at a substantial (10-50%) frequency (19). Thus, the presence of a stably integrated expression cassette in transformants should be confirmed by Southern blot or PCR analysis of the genome, by Northern blot for the foreign message, or by immunoblot or functional assay directly for the foreign protein.
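The gene conversion frequencies quoted above suggest a rough way to estimate how many His+ transformants to pick for confirmation. The following is a minimal sketch, not a published procedure: it assumes each transformant independently carries the expression cassette with a fixed probability (one minus the gene conversion frequency), and the function names and the 95% confidence target are illustrative choices.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of at least k successes among n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def transformants_to_screen(k: int, p: float, confidence: float = 0.95) -> int:
    """Smallest n for which P(at least k cassette-bearing strains) >= confidence."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    n = k
    while prob_at_least(k, n, p) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    # Gene conversion frequencies of 10% and 50% bracket the range in the text.
    for conversion in (0.10, 0.50):
        p_cassette = 1.0 - conversion  # assumed chance a transformant keeps the cassette
        n = transformants_to_screen(k=5, p=p_cassette)
        print(f"gene conversion {conversion:.0%}: screen about {n} transformants")
```

Under these simplifying assumptions, obtaining five cassette-bearing strains with 95% confidence would call for roughly 7 transformants at a 10% gene conversion frequency and roughly 16 at 50%. Real transformations may deviate from the independence assumption, so the numbers are best read as lower bounds for planning.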
After five or more strains carrying the expression cassette have been identified, small shake-flask cultures of each strain are grown to determine whether the foreign protein is synthesized and, if so, which strain appears to produce the highest level. As with virtually all P. pastoris expression cultures, from shake-flask to fermenter scale, the strains are initially grown in a glucose- or glycerol-based medium where the AOX1 promoter is tightly repressed and the foreign gene is not expressed. The ability to maintain cultures in an "expression off" mode is vital since it minimizes selection for non-expressing mutant strains during the cell-growth phase. When the cultures reach the late logarithmic phase, the cells are shifted from the repressing medium to a medium in which methanol is the sole carbon source to induce expression. Samples of each culture are removed at selected times and analyzed for the presence of the foreign protein. For intracellular proteins, peak levels are usually reached within 24 hr in methanol, and cell extracts are examined by immunoblot or by biological activity for the foreign protein. For secreted proteins, peak levels are typically reached anywhere from 24 to 100 hr after being shifted to methanol, depending on the characteristics of the expression host strain (AOX1 or AOX1 gene replacement) and the foreign protein (e.g., its rate of accumulation in the medium, its structural stability and resistance to degradation by proteases in the medium). Especially with secreted foreign proteins, care must be taken to ensure that the product is of the expected molecular weight and biological activity, since proteolytic degradation or glycosylation has been a significant problem for some yeast-secreted proteins (15, 20, 21). The best expression strain is selected, and further culture studies on parameters such as culture density, media composition/pH, and time required for optimal induction of the foreign protein are conducted.
The highest levels of production are invariably seen in fermenter cultures. Compared to shake-flask cultures, fermenters have the potential for much higher cell densities, more uniform culture conditions, and optimal induction of the AOX1 promoter (22, 23). Most fermenter-culturing protocols involve an initial batch growth phase with glycerol as the carbon source to generate biomass. Once the glycerol is exhausted, a short glycerol-limited fed-batch phase is performed, followed by a methanol-induction phase in which the methanol feed rate is increased as the culture adapts to growth on methanol. During this period the foreign protein accumulates. Fermenter culturing of P. pastoris expression strains is essential for producing the highest levels of foreign proteins, especially secreted proteins. A major advantage of P. pastoris relative to S. cerevisiae is that it does not like to ferment. Thus, the term fermenter is actually a misnomer when applied to P. pastoris. In high-cell-density cultures, the product of S. cerevisiae fermentation, ethanol, rapidly builds to toxic levels, which limits further growth and foreign protein production. With its preference for respiration over fermentation, P. pastoris can be cultured at extremely high densities without significant ethanol accumulation. Since the concentration of a secreted protein is approximately proportional to cell density, the facility of P. pastoris for high-density culturing represents a key property of the yeast. A second advantage of P. pastoris for secreted foreign proteins is that the methylotroph secretes only low levels of endogenous proteins. Extracellular foreign proteins typically comprise the major protein in the culture medium (17, 24). Thus, secretion acts as a major purification step, separating the heterologous protein from the intracellular proteins and other cellular debris. 
Due to protein stability and folding requirements, the choice of secretion is usually reserved for foreign proteins that are normally secreted by their native hosts. The P. pastoris system provides this option by allowing a researcher to clone the foreign ORF in frame with sequences encoding the S. cerevisiae prepro-peptide or another secretion signal through the availability of expression vectors that include these sequences (see Invitrogen catalog or web site). The resulting protein is initially synthesized with an N-terminal secretion signal fused directly to the mature heterologous product. This small string of amino acids directs the foreign protein into the endoplasmic reticulum (ER) and through the secretory pathway, and is usually (but not always) efficiently removed during transit through the pathway (25). As a lower eukaryote, P. pastoris is equipped to carry out many of the posttranslational modifications of higher eukaryotes to generate and secrete a properly folded and functional protein. With P. pastoris, most foreign proteins directed into the ER are secreted into the medium, although there is at least one report of a heterologous protein, collagen, that was not secreted but accumulated in the secretory pathway (26).
Yields of foreign proteins from fermenter cultures range anywhere from to g/L, depending on the empirical properties of the gene, its transcript and protein product (2). Products can be purified by a variety of strategies, such as traditional or immunoaffinity chromatography (16, 21, 27). In addition, certain expression vectors are designed to generate heterologous proteins with C-terminal myc and His6 epitope tags for visualization and purification with commercially available antibodies (myc tag) or resins (His6 tag) (see Invitrogen catalog or web site).
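Because a secreted product depends on the foreign ORF being fused in frame with the secretion signal, a quick sequence check before cloning can catch frame shifts and premature stop codons. The sketch below is only an illustration under stated assumptions: the sequences are short hypothetical placeholders rather than real vector or prepro-peptide sequences, the function names are invented for this example, and only the standard stop codons are considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list[str]:
    """Split a DNA sequence into complete codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def check_fusion(signal: str, linker: str, orf: str) -> list[str]:
    """Return a list of problems found in a signal-linker-ORF fusion (empty if none)."""
    problems = []
    upstream = signal + linker
    if len(upstream) % 3 != 0:
        problems.append("signal plus linker is not a multiple of 3 nt; ORF is out of frame")
    if len(orf) % 3 != 0:
        problems.append("ORF length is not a multiple of 3 nt")
    fused = codons(upstream + orf)
    # A stop codon anywhere before the final codon would truncate the fusion protein.
    for index, codon in enumerate(fused[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon {codon} at codon {index + 1}")
    if not fused or fused[-1] not in STOP_CODONS:
        problems.append("fusion does not end with a stop codon")
    return problems


if __name__ == "__main__":
    toy_signal = "ATGAAACTGGCT"   # hypothetical 12-nt stand-in for a secretion signal
    toy_orf = "GCTCCAGTTTAA"      # hypothetical ORF fragment ending in a stop codon
    print(check_fusion(toy_signal, "GAATTC", toy_orf))   # 6-nt linker keeps the frame
    print(check_fusion(toy_signal, "GAATTCC", toy_orf))  # 7-nt linker shifts the frame
```

A check of this kind says nothing about signal cleavage or folding; it only confirms that the construct, as designed on paper, encodes one continuous reading frame.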
APPROACHES TO INCREASING THE YIELD OF FOREIGN PROTEINS
Obtaining the highest possible level of a foreign protein with the P. pastoris system often requires further research to optimize modified expression strains or culture conditions empirically. The most productive avenues include:
1. Isolation of Multicopy Expression Strains
Strains that contain multiple integrated copies of an expression cassette often (but not always) yield more of a heterologous protein than do single-copy strains (28, 29). Therefore, after confirming that a single-copy P. pastoris strain produces significant amounts of the correct-sized, biologically active protein, it is advisable to construct and examine expression of so-called multicopy strains. Three approaches reliably lead to P. pastoris multicopy expression strains. The first approach entails the use of an expression vector that has both the HIS4 gene and the bacterial gene, which confers resistance to the eukaryotic antibiotic G418 (30). The HIS4 gene must be retained in these vectors, since it is difficult to select directly for G418 resistance in P. pastoris. Transformed cells are first selected for HIS4 complementation and then replica-plated onto plates containing G418. Selection for resistance to high concentrations (1-2 mg/ml) of G418 enriches for transformants that contain high copy numbers of the expression vector along with the gene. The second approach consists of using a vector with the bacterial Sh ble gene, which confers resistance to the antibiotic Zeocin (31). Unlike G418 selection, strains that receive expression cassettes with the Zeocin marker can be directly selected by resistance to the drug, and transformant populations can be enriched for either single-copy or multicopy expression cassette strains by the concentration of Zeocin in the selection plates. In addition, because the Sh ble gene serves as a selectable marker for both E. coli and P. pastoris, the vectors are small and convenient to use.
The third option entails constructing a vector with multiple head-to-tail copies of an expression cassette (21). The key to this procedure is a vector, such as pAO815, in which the expression cassette is flanked by restriction sites that form complementary termini such as BamHI and BglII (Figure 1). Multicopy expression strains of all three types have proven to be stable under the selective pressure of production in fermenter cultures (30, 31).
2. Reduction of Proteolytic Degradation
Yeast culture media, especially from high-cell-density cultures, contain significant amounts of hydrolytic enzyme activities that can degrade certain foreign proteins. Thus, in addition to increasing the rate of synthesis, foreign protein yields can sometimes be markedly enhanced by reducing their rate of degradation. Three strategies, either individually or in combination, have proven effective. One strategy involves the use of P. pastoris host strains defective in PEP4 (32, 33). This gene encodes the vacuolar hydrolase proteinase A, which is responsible for activating several of the other vacuolar hydrolases. These hydrolases are present in the culture medium, presumably due to cell lysis. A second strategy involves adjusting the pH of the culture medium to one that is not optimal for the problem protease. Because P. pastoris can grow between a pH of 3 and 7, there is considerable freedom to find a
more suitable pH. A third strategy is to add peptide- or amino acid-rich supplements to the medium, which appear to act as carriers that divert proteases from the foreign protein.
3. Maintenance of the AOX1 5' Untranslated Region (UTR) Length
Highest yields of foreign protein result when the normal length of the AOX1 5' UTR is maintained (34). In the native AOX1 gene, the alcohol oxidase ORF is preceded by an unusually long 5' UTR of 116 nucleotides (7). Many P. pastoris expression vectors contain a multiple cloning site (MCS) composed of a string of restriction sites. In the process of inserting a heterologous gene into an AOX1 expression cassette, researchers often select a convenient restriction site in the 5' UTR sequences of the foreign gene (cDNA) and insert the fragment into the most convenient polylinker site. This can be a critical mistake. The added sequences from the 5' UTR and the MCS appear to decrease transcript stability or translatability or both, usually resulting in reduced foreign gene expression, sometimes dramatically (34). Thus, the highest expression levels are usually obtained when the first ATG of the heterologous gene is inserted as close as possible to the position of the first AOX1 ATG. This position coincides with the nearest 5' cloning site in P. pastoris expression vectors and is usually an EcoRI site.
ALTERNATIVE PROMOTERS FOR EXPRESSION IN P. pastoris
Although there are many examples of the use of the AOX1 promoter to express foreign genes successfully, there are circumstances in which this promoter may not be suitable. For example, the use of methanol to induce gene expression may not be appropriate for the production of food products, since methanol is produced from methane, a petroleum-related compound. Also, methanol is a potential fire hazard, especially in the quantities needed for large-scale production of foreign proteins.
Thus, strong promoters that are not induced by methanol are attractive for expression of certain genes. One alternative promoter that we have developed for expression in P. pastoris is derived from the P. pastoris glyceraldehyde-3-phosphate dehydrogenase (GAP) gene (35). In S. cerevisiae, this promoter provides a strong constitutive level of expression on fermentable and non-fermentable carbon sources (36). Both Northern and reporter activation results indicate that the P. pastoris GAP promoter is most active in glucose-grown cells with a level comparable to that seen with the AOX1 promoter. Levels in glycerol- and methanol-grown cells are approximately two-thirds and one-third of the levels observed for glucose, respectively. Methanol is not required for induction of GAP promoter-based expression strains, nor is it necessary to shift cultures from one carbon source to another, which makes expression from such strains more convenient. However, the GAP promoter is not a good choice for the production of proteins that are toxic to the yeast. A second alternative to the AOX1 promoter is that from the P. pastoris FLD1 gene. FLD1 encodes a glutathione-dependent formaldehyde dehydrogenase, a key enzyme required for the metabolism of methanol as a carbon source and certain methylated amines such as methylamine as nitrogen sources (37). The FLD1 promoter is independently induced by either methanol as a sole carbon source (with ammonium sulfate as a nitrogen source) or methylamine as a sole nitrogen source (with glucose as a carbon source), but repressed in
medium with glucose and ammonium sulfate. With methanol or methylamine induction, expression levels of a reporter gene under the control of the FLD1 promoter were comparable to those obtained with the AOX1 promoter in methanol. Thus, the FLD1 promoter offers the ability to induce expression with an inexpensive non-toxic nitrogen source, methylamine.
For some applications, the AOX1 promoter may express genes at too high a level. For example, we found that expression of a transcription factor halted cell growth shortly after methanol induction (unpublished data). As a second example, we observed that the expression of green fluorescent protein under control of the AOX1 promoter resulted in a fluorescence signal that was too bright and in the mislocalization of some of the protein. As a third example, there is evidence that the high level of expression from the AOX1 promoter may overwhelm components of the protein-handling machinery of the cell, causing a significant portion of the foreign protein population to be misfolded, unprocessed, or mislocalized (21). Finally, some applications require the expression of multiple foreign genes, each potentially at a different level (26). For these and other applications, a more moderate promoter is desirable, relative to the AOX1, GAP or FLD1 promoters.
To meet this need, we investigated the promoter from the P. pastoris PEX8 gene. The PEX8 gene encodes a peroxisomal matrix protein that is essential for peroxisome biogenesis (38). It is expressed at a low but significant level on glucose medium and is induced modestly when cells are shifted to methanol. While the level of expression from the PEX8 promoter is higher than that of the FLD1 and AOX1 promoters on glucose, it is significantly lower than the latter promoters in methanol. A second moderate promoter is derived from P. pastoris YPT1, a gene that encodes a GTPase involved in secretion.
The YPT1 promoter provides a low but constitutive level of expression in media with either glucose, methanol, or mannitol as carbon sources (39). Expression levels from the YPT1 promoter have been reported to be at least tenfold lower than those from the GAP promoter.
PROBLEMS AND SOLUTIONS
Although the P. pastoris expression system has been successful in expressing a large number of foreign genes, there are potential problems in its use, some of which have solutions while others as yet do not. Here are three problems commonly faced by P. pastoris users.
1. Identifying a Highly Productive Expression Strain is often More Laborious than with Other Non-Yeast Expression Systems
Because foreign proteins of interest are of higher eukaryotic origin, the ability of lower eukaryotes to synthesize, fold, process, or secrete these proteins at a high rate can be limited. Thus, effort is often required to find the maximum rate at which the yeast can properly fold and process the heterologous protein. This may require empirically examining the effect of different promoters, expression cassette copy numbers, and secretion signal sequences. In addition, studies aimed at minimizing product degradation may also be needed. Fortunately, once these problems are solved at the shake-flask culture level, few further adjustments are needed for scale-up to a fermenter.
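The empirical search described above grows multiplicatively with each variable examined, which is worth seeing before committing bench time. The sketch below simply enumerates the combinations; the promoter names come from this article, while the copy-number classes, the second secretion-signal label, and the function name are placeholders assumed for illustration.

```python
from itertools import product


def screening_matrix(promoters, copy_classes, signals):
    """Enumerate every promoter / copy-number / secretion-signal combination."""
    return [
        {"promoter": p, "copies": c, "signal": s}
        for p, c, s in product(promoters, copy_classes, signals)
    ]


if __name__ == "__main__":
    # Promoter names are taken from the text; the other labels are placeholders.
    promoters = ["AOX1", "GAP", "FLD1", "PEX8", "YPT1"]
    copy_classes = ["single-copy", "multicopy"]
    signals = ["S. cerevisiae prepro-peptide", "alternative signal (hypothetical)"]

    plan = screening_matrix(promoters, copy_classes, signals)
    print(f"{len(plan)} candidate constructs")
    for row in plan[:3]:
        print(row)
```

In practice one would prune this list using prior knowledge of the protein (for example, avoiding a constitutive promoter for a toxic product) rather than build every construct.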
2. P. pastoris carries out O- and N-Linked Glycosylation that Differs from that Employed by Mammalian Cells (40, 41)

In mammals, O-linked oligosaccharides added to secreted proteins consist of a variety of sugars, including galactose and sialic acid. In contrast, P. pastoris appears to add O-linked oligosaccharides composed solely of mannose residues (21). Moreover, P. pastoris can add O-linked mannose to amino acid residues that are not glycosylated in mammals. The structure of N-linked oligosaccharides on P. pastoris-secreted proteins is also different from that on mammalian glycoproteins (see 1, 42 for further discussion). These differences in N- and O-linked glycosylation potentially affect the folding and function of a heterologous protein. In addition, yeast oligosaccharides contain a high content of terminal mannose residues, which leads to rapid clearance from the blood. Thus, most glycoproteins produced by yeasts, including P. pastoris, are unsuitable for intravenous therapeutic use due to their immunogenicity and rapid blood clearance rate.

3. Using the P. pastoris System to Simultaneously Express Multiple Genes has been Problematic in the Past (26)

Until recently, only two selectable marker genes had been available for transformations of P. pastoris: his4 and Sh ble (Zeocin resistance) (12, 31). Thus, the expression of more than three different foreign genes in the same strain was difficult. To alleviate this problem, we have constructed a series of expression vectors with new biosynthetic marker genes (ARG4, ADE1 and URA3) along with complementing auxotrophically marked P. pastoris host strains containing all possible combinations of the four mutant biosynthetic genes (43). In addition, a positive selection system based on blasticidin resistance has been recently made available by Invitrogen, raising the number of selectable markers for P. pastoris to six.

CONCLUSION

Although the P. pastoris expression system is not suitable for the production of all foreign proteins, many are compatible with the system. For these proteins, there are major advantages to production in this yeast. Relative to higher eukaryotic expression systems, culturing costs are lower, yields (foreign protein/L of culture medium) are usually higher, and for secreted proteins, the initial purity of the product in the culture medium is higher. Relative to bacterial expression systems, the chances of recovering a eukaryotic foreign protein in its biologically active form are significantly higher with P. pastoris. Although few major P. pastoris-produced human pharmaceuticals are on the market, several are in the clinical trial pipeline, including the anti-tumor drugs angiostatin and endostatin (44). In addition, P. pastoris is one of the few lower eukaryotic expression systems available to academic researchers as a complete kit (see Invitrogen catalog). Recent improvements in the system, including new marker-host combinations and alternative promoters, should increase the range of proteins that can be produced by this methylotrophic host. As more proteins are successfully (or unsuccessfully) synthesized in P. pastoris, the parameters involved in the optimal expression of specific classes of proteins will become better defined.
ACKNOWLEDGMENTS
Preparation of this manuscript was supported by grants DK43698 from the National Institutes of Health and DE-FG03-99ER20334 from the Department of Energy, Basic Energy Sciences Program to J.M.C. G.P.L.C. is the recipient of Postdoctoral Fellowship ES05759 from the National Institute of Environmental Health Sciences. J.L.C. is the recipient of a postdoctoral fellowship from the American Heart Association, Oregon Affiliate. We thank Donald R. Helinski for his guidance and Terry Hadfield for her assistance in preparing this manuscript.

REFERENCES
1. Lin Cereghino, J. and Cregg, J.M. (2000) FEMS Microbiol. Rev. 24, 45-66.
2. Higgins, D.R. and Cregg, J.M. (eds.) (1998) Pichia Protocols, Humana Press, Totowa, N.J.
3. Wegner, G. (1990) FEMS Microbiol. Rev. 87, 279-284.
4. Veenhuis, M., van Dijken, J.P. and Harder, W. (1983) Adv. Microb. Physiol. 24, 1-82.
5. Egli, T., van Dijken, J.P., Veenhuis, M., Harder, W. and Fiechter, A. (1980) Arch. Microbiol. 124, 115-121.
6. Roggenkamp, R., Janowicz, Z., Stanikowski, B. and Hollenberg, C.P. (1984) Mol. Gen. Genet. 194, 489-498.
7. Ellis, S.B., Brust, P.F., Koutz, P.J., Waters, A.F., Harpold, M.M. and Gingeras, T.R. (1985) Mol. Cell. Biol. 5, 1111-1121.
8. Cregg, J.M., Madden, K.R., Barringer, K.J., Thill, G.P. and Stillman, C.A. (1989) Mol. Cell. Biol. 9, 1316-1323.
9. Koutz, P., Davis, G.R., Stillman, C., Barringer, K., Cregg, J.M. and Thill, G. (1989) Yeast 5, 167-177.
10. Couderc, R. and Baratti, J. (1980) Agric. Biol. Chem. 44, 2279-2289.
11. Cregg, J.M. and Madden, K.R. (1988) Dev. Ind. Microbiol. 29, 33-41.
12. Cregg, J.M., Barringer, K.J., Hessler, A.Y. and Madden, K.R. (1984) Mol. Cell. Biol. 5, 3376-3385.
13. Cregg, J.M., Shen, S., Johnson, M. and Waterham, H. (1998) in Higgins, D.R. and Cregg, J.M. (eds.) Pichia Protocols, Humana Press, Totowa, N.J. 103, 17-26.
14. Cregg, J.M. and Madden, K.R. (1989) Mol. Gen. Genet. 219, 320-323.
15. Romanos, M.A., Scorer, C.A. and Clare, J.J. (1992) Yeast 8, 423-488.
16. Cregg, J.M., Tschopp, J.F., Stillman, C., Siegal, R., Akong, M., Craig, W.S., Buckholz, R.G., Madden, K.R., Kellaris, P.A., Davis, G.R., Smiley, B.L., Cruze, J., Torregrossa, R., Velicelebi, G. and Thill, G.P. (1987) Bio/Technology 5, 479-485.
17. Tschopp, J.F., Brust, P.F., Cregg, J.M., Stillman, C.A. and Gingeras, T.R. (1987) Nucl. Acids Res. 15, 3859-3876.
18. Tschopp, J.F., Sverlow, G., Kosson, R., Craig, W. and Grinna, L. (1987) Bio/Technology 5, 1305-1308.
19. Cregg, J.M. and Russell, K.A. (1998) in Higgins, D.R. and Cregg, J.M. (eds.) Pichia Protocols, Humana Press, Totowa, N.J. 103, 27-40.
20. Scorer, C.A., Buckholz, R.G., Clare, J.J. and Romanos, M.A. (1993) Gene 136, 111-119.
21. Brierley, R.A. (1998) in Higgins, D.R. and Cregg, J.M. (eds.) Pichia Protocols, Humana Press, Totowa, N.J. 103, 149-177.
22. Stratton, J., Chiruvolu, V. and Meagher, M. (1998) in Higgins, D.R. and Cregg, J.M. (eds.) Pichia Protocols, Humana Press, Totowa, N.J. 103, 107-120.
23. Romanos, M., Scorer, C., Sreekrishna, K. and Clare, J. (1998) in Higgins, D.R. and Cregg, J.M. (eds.) Pichia Protocols, Humana Press, Totowa, N.J. 103, 55-72.
24. Barr, K.A., Hopkins, S.A. and Sreekrishna, K. (1992) Pharm. Eng. 12, 48-51.
25. Vedvick, T., Buckholtz, R.G., Engel, M., Urcan, M., Kinney, J., Provow, S., Siegel, R.S. and Thill, G.P. (1991) J. Ind. Microbiol. 7, 197-202.
26. Vuorela, A., Myllyharju, J., Nissi, R., Pihlajaniemi, T. and Kivirikko, K.I. (1997) EMBO J. 16, 6702-6712.
27. Sreekrishna, K., Nelles, L., Potenz, R., Craze, J., Mazzaferro, P., Fish, W., Fuke, M., Holden, K., Phelps, D., Wood, P. and Parker, K. (1989) Biochemistry 28, 4117-4125.
28. Thill, G.P., Davis, G.R., Stillman, C., Holtz, G., Brierly, R., Engel, M., Buckholtz, R., Kenney, J., Provow, S., Vedvick, T. and Siegel, R.S. (1990) in Heslot, H., Davies, J., Florent, J., Bobichon, L., Durand, G. and Penasse, L. (eds.) Sixth International Symposium on the Genetics of Microorganisms. Paris: Societé Francaise de Microbiologie, 2, 477-490.
29. Clare, J.J., Romanos, M.A., Rayment, F.B., Rowedder, J.E., Smith, M.A., Payne, M.M., Sreekrishna, K. and Henwood, C.A. (1991) Gene 105, 205-212.
30. Scorer, C.A., Clare, J.J., McCombie, W.R., Romanos, M.A. and Sreekrishna, K. (1993) Bio/Technology 12, 181-184.
31. Higgins, D.R., Busser, K., Comiskey, J., Whittier, P.S., Purcell, T.J. and Hoeffler, J.P. (1998) in Higgins, D.R. and Cregg, J.M. (eds.) Pichia Protocols, Humana Press, Totowa, N.J. 103, 41-53.
32. Chen, Y.L., Cino, L., White, C.E. and Komives, E.A. (1996) J. Chem. Technol. Biotechnol. 67, 143-148.
33. Gleeson, M.A.G., White, C., Meininger, D.P. and Komives, E.A. (1998) in Higgins, D.R. and Cregg, J.M. (eds.) Pichia Protocols, Humana Press, Totowa, N.J. 103, 81-94.
34. Sreekrishna, K. (1993) in Baltz, R.H., Hegeman, G.D. and Skatrud, P.L. (eds.) Industrial Microorganisms: Basic and Applied Molecular Genetics. American Society of Microbiology, Washington, D.C. 119-126.
35. Waterham, H.R., Digan, M.E., Koutz, P.J., Lair, S.V. and Cregg, J.M. (1997) Gene 186, 37-44.
36. Schena, M., Picard, D. and Yamamoto, K.R. (1991) in Guthrie, C. and Fink, G. (eds.) Guide to Yeast Genetics and Molecular Biology. Academic Press, San Diego, CA. 194, 389-398.
37. Shen, S., Suiter, G., Jeffries, T.W. and Cregg, J.M. (1998) Gene 216, 93-102.
38. Liu, H., Tan, X., Russell, K.A., Veenhuis, M. and Cregg, J.M. (1995) J. Biol. Chem. 270, 10940-10951.
39. Sears, I.B., O'Connor, J., Rossanese, O.W. and Glick, B.S. (1998) Yeast 14, 783-790.
40. Trimble, R.B., Atkinson, P.H., Tschopp, J.F., Townsend, R.R. and Maley, F. (1991) J. Biol. Chem. 266, 22807-22817.
41. Cregg, J.M. and Higgins, D.R. (1995) Can. J. Bot. 73 (Suppl 1), S891-897.
42. Grinna, L.S. and Tschopp, J.F. (1989) Yeast 5, 107-115.
43. Lin Cereghino, G.P., Lin Cereghino, J., Sunga, A.J., Johnson, M.A., Lim, M., Gleeson, M.A.G. and Cregg, J.M. (2001) Gene (in press).
44. Sim, B.K., O'Reilly, M.S., Liang, H., Fortier, A.H., He, W., Madsen, J.W., Lapcevich, R. and Nacy, C.A. (1997) Cancer Res. 57, 1329-1334.
PROTEIN SPLICING AND ITS APPLICATIONS
Izabela Giriat, Thomas W. Muir and Francine B. Perler*
The Rockefeller University 1230 York Avenue New York, NY 10021 *New England BioLabs 32 Tozer Road Beverly, MA 01915
Genetic Engineering, Volume 23, Edited by J. K. Setlow. Kluwer Academic / Plenum Publishers, 2001

INTRODUCTION

Protein splicing is a post-translational process by which an intervening sequence, the intein, is removed from a host protein, the extein (1). The first intein was discovered in 1990 as a 'spacer' within the Saccharomyces cerevisiae vacuolar ATPase (Sce VMA) gene that was absent in other VMA homologues (2-4). Since then, over 100 putative inteins have been identified and cataloged in InBase, the on-line intein registry and database (http://www.neb.com/neb/inteins.html) (5). Intein-mediated protein splicing is the protein equivalent of RNA splicing and inteins have sometimes been called protein introns. Two proteins are synthesized from the precursor gene: the ligated exteins and the excised intein. Although the role of inteins in early evolution is unknown, several groups have suggested that they were involved in enzyme evolution by peptide or domain ligation and shuffling, using mechanisms now employed in various intein-mediated protein engineering applications (6-8).

The majority of inteins are bifunctional proteins that also encode homing endonuclease signature motifs. Homing endonucleases were originally shown to initiate mobility (or homing) of intron genes into 'empty' sites in homologous genes (the intron 'home') by creating double-strand breaks at the homing site (9-19). Several inteins have demonstrated endonuclease activity and the Sce VMA intein gene has been shown to be mobile in yeast (20-30). Smaller mini-inteins do not have homing endonuclease motifs and are therefore not mobile genetic elements. Biochemical analysis indicates that splicing and endonuclease active sites are in different regions of the intein (29). Crystal structures of several inteins indicate that splicing and homing endonuclease functions are in separately-folded subdomains (31-37).

Protein splicing is an autocatalytic process whose mechanism involves splice-junction nucleophiles and many facilitating amino acids (aa) present throughout the intein (5, 6, 29, 38-58). Understanding the mechanism of protein splicing has led to the exploitation of inteins as tools for protein purification, labeling, protein synthesis, ligation, backbone cyclization, etc. (8, 40, 59-72).

WHAT IS AN INTEIN?

An intein is defined as an intervening sequence that is present in a gene and its mRNA, but absent in the mature protein product. However, many types of insertion sequences also fit this limited definition. Therefore, several additional criteria need to be met before a sequence is considered a putative intein. Inteins are always present within the host protein and not at either end of the precursor. After removal of the intein, the host protein has an intact peptide backbone, as opposed to two or more protein fragments that remain associated after proteolysis (as occurs in maturation of insulin). Formation of a native peptide bond between the ligated exteins is a key distinguishing feature of intein-mediated protein splicing. Finally, an intein should encode the 4 major conserved splicing region motifs (Figure 1) (5, 6, 56-58, 73, 74). Experimentally, the observed molecular mass of the gene product is similar to that of homologues lacking the intein and corresponds to the predicted size of the ligated exteins. A second protein product is present at the predicted molecular mass of the intein.
However, in heterologous host cells, the excised intein sometimes aggregates and is not found in the soluble fraction.
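The experimental criterion above (one product at the predicted size of the ligated exteins, a second at the predicted size of the intein) can be illustrated with a short calculation. The following is a minimal sketch, not a validated tool: the residue masses are approximate average values, the function names are hypothetical, the toy sequences are invented for illustration, and the excised intein is treated as a free linear chain (that is, with any carboxy-terminal succinimide hydrolyzed).

```python
# Approximate average residue masses in Da (one-letter codes); illustrative values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # mass of the terminal H and OH of a linear chain


def chain_mass(seq):
    """Average mass of a linear polypeptide: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def predicted_products(n_extein, intein, c_extein):
    """Predicted masses of the unspliced precursor and of the two splicing products."""
    return {
        "precursor": chain_mass(n_extein + intein + c_extein),
        "ligated_exteins": chain_mass(n_extein + c_extein),
        # Assumption: excised intein counted as a free chain (succinimide hydrolyzed).
        "excised_intein": chain_mass(intein),
    }


# Hypothetical toy precursor, far shorter than any real intein (>100 aa).
masses = predicted_products("MKTAY", "CLAEGTHN", "SGKW")
for name, mass in masses.items():
    print(f"{name}: {mass:.1f} Da")
```

Under these assumptions the two products together outweigh the precursor by one water, which corresponds to hydrolysis of the intein carboxy-terminal succinimide.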
To date, all inteins are greater than 100 aa. Only a small number of inteins have been experimentally shown to splice, while the majority have been found by similarity searches of whole genome sequences (5, 6, 56). Inteins are very divergent. Although several motifs are present in all inteins, their sequence is not highly conserved (Table 1) (5, 6, 56-58, 73, 74). There is even variability in nucleophiles and facilitating amino acids.
Inteins are given 2-part names that begin with a 3-letter genus/species abbreviation followed by the host protein (extein) name, such as Tli Pol intein for the Thermococcus litoralis DNA polymerase intein (1). In the case of isolates that do not have species names, the isolate designation is included, as in the Psp-GBD Pol intein for the Pyrococcus sp. strain GB-D DNA polymerase intein. If endonuclease activity has been demonstrated, the intein is also given an endonuclease name: following restriction enzyme nomenclature guidelines, the genus/species abbreviation is preceded by 'PI-' and the enzyme is numbered in order of identification of intein endonucleases in that species, such as PI-TliI and PI-TliII (1, 5, 17). Precursor residues are numbered based on their location: (a) residues in the N-extein are numbered from the carboxy-terminus to the amino-terminus beginning at the amino-terminal splice junction and include a minus sign to denote the N-extein, (b) residues in the intein are numbered independently from the precursor, beginning with 1 from amino-terminus to carboxy-terminus, and (c) residues in the C-extein are numbered from the carboxy-terminal splice junction to the precursor carboxy-terminus and include a plus sign to denote the C-extein. Homologous or allelic inteins have been found at the same insertion site in homologous host proteins from different isolates or species. However, intein distribution is sporadic and the presence of an intein in one isolate does not necessarily indicate that the intein will be present in genes from all closely-related organisms (5, 6, 22, 25, 27, 30, 48, 56-58, 75-81). Several host proteins contain multiple inteins at different locations in the same gene or in homologous genes from different organisms. For example, 3 inteins are present in the Methanococcus jannaschii (Mja) Replication Factor C gene (82); inteins are present in 3 different insertion sites in Class B DNA polymerases from 8 Archaea, but only the Thermococcus aggregans DNA polymerase has an intein in all 3 positions (5, 79). When more than 1 intein is present in a single protein, each intein is given a numerical suffix.
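The residue-numbering and endonuclease-naming conventions described above can be made concrete with a short sketch. The function names and the toy sequences are hypothetical and serve only to illustrate the rules: N-extein residues count down to -1 at the amino-terminal splice junction, intein residues count up from 1, and C-extein residues count up from +1.

```python
def number_precursor(n_extein, intein, c_extein):
    """Label every precursor residue according to the splice-junction convention."""
    labels = []
    # (a) N-extein: numbered back from the amino-terminal splice junction (-1 is adjacent).
    for i, aa in enumerate(n_extein):
        labels.append((aa, f"-{len(n_extein) - i}"))
    # (b) Intein: numbered independently, 1 at its amino-terminus.
    for i, aa in enumerate(intein, start=1):
        labels.append((aa, str(i)))
    # (c) C-extein: numbered forward from the carboxy-terminal splice junction (+1 is adjacent).
    for i, aa in enumerate(c_extein, start=1):
        labels.append((aa, f"+{i}"))
    return labels


def endonuclease_name(species_abbrev, order):
    """Build a 'PI-' name; `order` is the order of identification in that species."""
    roman = {1: "I", 2: "II", 3: "III", 4: "IV", 5: "V"}  # enough for this sketch
    return f"PI-{species_abbrev}{roman[order]}"


# Hypothetical 3-residue N-extein, 3-residue intein and 2-residue C-extein.
for aa, label in number_precursor("MKY", "CHN", "SG"):
    print(aa, label)
print(endonuclease_name("Tli", 2))
```

In this toy precursor the Tyr preceding the intein is residue -1, the intein Cys is residue 1, and the Ser following the intein is residue +1.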
Since not all host proteins contain the same number of inteins or inteins at the same insertion site, the numerical suffix given to an intein is not necessarily the same for all intein alleles. For example, the Tli Pol-1 intein is an allele of the Mja Pol-2 intein. To clear up this problem, each integration site in the host protein is given a name consisting of the gene abbreviation with a letter suffix, so that both the Tli Pol-1 and Mja Pol-2 inteins are present in the pol-b insertion site (Table 2) (5, 6).

INTEIN DISTRIBUTION AND EVOLUTION

Inteins are present in unicellular organisms from all 3 phylogenetic domains: Archaea, Eukarya and Eubacteria. Many of the inteins cataloged in InBase are from Archaea. The absence of inteins in multicellular organisms may be due to the inability of mobile intein genes to invade germline cells. However, multicellular organisms have orthologous hedgehog-like proteins which are essential for embryonic development and contain autoprocessing domains that are structurally and mechanistically similar to intein splicing domains (6, 31, 36, 56, 83-85). Intein distribution may also be limited by the host range of vectors that carry inteins from cell to cell. The number of inteins per organism may relate to its ability to take up foreign DNA (either because of natural competency or infection by broad host range viruses or plasmids) and the efficiency of its double-strand break repair pathways. This argument assumes that inteins are selfish DNA (7, 15, 86-88). It is probable that many ancient inteins have been lost with time and that the inteins we see today are the survivors that have been mobilized by acquisition of homing endonucleases (6, 15, 18, 86, 87, 89-94).
The number (No.) of intein alleles registered in InBase as of 12/31/00 and their location in each extein (5).
As of Dec 1, 2000, there are 107 inteins registered in InBase that are present in 37 different extein hosts at 47 different insertion sites (5). The majority of these inteins are allelic (Table 2). The number of alleles identified is often a reflection of scientific interest, as several groups have focused on single inteins or exteins and analyzed numerous related organisms for the presence of a particular intein. It has often been said that inteins are present in genes involved in information transfer (replication, DNA repair, recombination, transcription and translation) (13). Inteins are present in 21 different genes involved in information transfer and 16 genes encoding other functions or unknown functions. The overrepresentation of inteins in informational genes may relate to several factors. First, these are ancient genes that could have acquired inteins early in evolution. Second, if inteins are gained and lost during evolution, then these types of genes, which are more likely to be present on mobile vectors (phage, F-factors, plasmids), are more likely to spread inteins to various host organisms. Finally, inteins may play some biological role that has yet to be discovered.

Inteins tend to be located in conserved extein motifs, active sites and cofactor-binding pockets (6, 40, 56, 57). Because of this, the presence of an intein suggests that this position is an important extein sequence, even in uncharacterized host proteins. There are many possible reasons why inteins are found in conserved extein sequences. First, conservation of the insertion site sequence increases the probability that the homing endonuclease recognition site will be maintained and homing will occur. Second, the presence in essential extein regions ensures selection for active inteins, since host protein activity requires splicing.
Since the splicing domain is small (as few as 134 aa) and intein sequences are not very conserved, genetic drift would soon mask the presence of an inactive intein in a nonessential extein location. Since there is no spliced message, the intein sequence cannot be removed by retrohoming (91). It is unlikely that simple deletions would surgically remove the intein sequence. If the intein-encoded homing endonuclease is active, then gene conversion from intein-minus to intein-plus will occur at a very high frequency (20, 95-97). However, if the homing endonuclease becomes inactive, then the intein can be lost by recombination with inteinless alleles. Finally, intein location may be due to structural issues. Binding pockets and active sites tend to be on protein surfaces, rather than buried within the hydrophobic core. Folding of the extein is less likely to be perturbed by the intein domain if the intein is inserted on the extein surface or at a position that is normally accessible to substrates or cofactors.

INTEIN MOTIFS AND REGIONS

Most putative inteins have been identified by the presence of conserved residues (Table 1 and Figure 1), rather than on the basis of interrupting extein genes. Initially, 7 motifs were identified as intein Blocks A through G (58). Blocks C, D and E are also present in the LAGLIDADG family of homing endonucleases (also known as the dodecapeptide or DOD family), which typically contains 2 LAGLIDADG motifs (intein Blocks C and E). The LAGLIDADG family is the largest family of homing endonucleases; like all homing endonuclease families, it is named after its signature motif sequences (12, 13, 17-19, 73). A fourth motif, Block H, was found in the LAGLIDADG endonuclease region (6, 57). Sequence comparison suggests that Blocks C, D, E and H comprise the LAGLIDADG homing endonuclease region, which splits the region responsible for protein splicing into amino-terminal and carboxy-terminal splicing regions (Figure 1). Two minor motifs have also been identified, Blocks N2 and N4 (6). An alternative intein motif nomenclature has been proposed (6), based on the region in which the conserved sequence is located: (a) Blocks A (N1), N2, B (N3) and N4 are in the amino-terminal splicing region, (b) Blocks C (EN1), D (EN2), E (EN3) and H (EN4) are in the LAGLIDADG homing endonuclease region, and (c) Blocks F (C2) and G (C1) are in the carboxy-terminal splicing region (Figure 1).

Most inteins registered in InBase contain LAGLIDADG homing endonuclease motifs (5). There are some exceptions. The Synechocystis sp., strain PCC6803, DNA gyrase B subunit (Ssp GyrB) intein encodes an HNH family homing endonuclease in place of the LAGLIDADG family homing endonuclease (6, 56, 73, 98). The Ssp DnaX intein is inserted into a homologue of the 71-kDa tau-subunit of Escherichia coli DNA polymerase III (6, 56, 73, 98, 99). The sequence between the amino-terminal and carboxy-terminal splicing regions in the Ssp DnaX intein encodes two open reading frames, one in-frame with the splicing regions that has no similarity to known genes and a second reading frame which encodes a putative LAGLIDADG homing endonuclease (73, 98). The remaining ~10% of inteins are much smaller than the rest (~130-200 aa versus 360-600 aa) and have a short 'linker' sequence separating the splicing regions.

Each position within every intein motif is made up of either related amino acids (rather than a single conserved residue) or a non-conserved position (5, 6, 56-58). As more inteins have been discovered, the pattern of conserved residues has become more diverse, rather than clearer. Initially, all inteins were thought to (a) begin with Ser or Cys, (b) have a conserved His in Block B that is often preceded by a Thr which is 3 residues away, (c) end in the dipeptide His-Asn, and (d) be inserted prior to Ser, Thr or Cys at the beginning of the C-extein (Table 1).
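The four initial criteria just listed can be written as a simple check on a candidate sequence. This is a rough sketch under stated assumptions: the function name and toy sequences are hypothetical, and because Block B cannot be located without an alignment, criterion (b) is approximated by searching anywhere in the intein for a Thr followed 3 residues later by a His. A failed check does not rule out an intein, since these residues are not invariant.

```python
import re


def classic_signature(intein, c_extein):
    """Report which of the four originally proposed intein criteria a candidate meets."""
    return {
        # (a) intein begins with Ser or Cys
        "starts_with_Ser_or_Cys": intein[:1] in ("S", "C"),
        # (b) crude proxy for the Block B Thr-x-x-His pattern (not position-aware)
        "has_TxxH_candidate": re.search(r"T..H", intein) is not None,
        # (c) intein ends in the dipeptide His-Asn
        "ends_with_His_Asn": intein.endswith("HN"),
        # (d) C-extein begins with Ser, Thr or Cys (the +1 residue)
        "plus1_is_Ser_Thr_or_Cys": c_extein[:1] in ("S", "T", "C"),
    }


# Hypothetical toy candidates (real inteins are longer than 100 aa).
print(classic_signature("CLSTGAHVVLHN", "SGK"))
print(classic_signature("ALSGGHQ", "TK"))
```

A report of this kind is best read as a list of features to inspect by eye, not as a classifier.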
Inteins have now been identified that begin with Ala (6, 56, 100), lack a Block B His (5), have penultimate residues other than His (6, 56, 100-104) and end in Gln (5, 74, 105). Although Thr has yet to be observed at the beginning of an intein, splicing occurred when the Tli Pol-2 intein amino-terminal Ser was mutated to Thr (29). Three families of inteins begin with Ala: the DnaB inteins present in the dnaB-b insertion site in Mycobacterium leprae and Mycobacterium avium genes, the Deinococcus radiodurans SNF2 helicase intein and the KlbA family of inteins present in several Archaea (5). Splicing of the Mja KlbA intein has recently been demonstrated (100). These polymorphisms in the most conserved intein residues make it all the more difficult to identify new inteins. To help address this issue, InBase now has an on-line Blast server that queries only InBase intein sequences and can identify inteins with lower scores than would be possible if searching standard databases such as GenBank (5).

INTEIN STRUCTURE

The crystal structures of 4 inteins have been solved: (a) the Sce VMA excised intein (33) and precursor (34), (b) the Mycobacterium xenopi DNA gyrase A subunit (Mxe GyrA) mini-intein with a single amino acid N-extein (32), (c) the Pyrococcus kodakaraensis strain KOD1 DNA polymerase intein-2 (Pko Pol-2 intein or PI-PkoII) (37) and (d) the Pyrococcus furiosus ribonucleoside-diphosphate reductase, alpha subunit intein (Pfu RIR1-1 intein or PI-PfuI) (35). These structures confirm the separate splicing and endonuclease regions identified by sequence analysis. In all cases, the amino-terminal and carboxy-terminal splicing regions form a single splicing domain. This splicing-domain fold is also conserved in the related Drosophila Hedgehog protein autoprocessing domain, leading Hall et al. to call this fold the HINT (Hedgehog-intein) module (31, 84). The HINT module is mainly ... Folding of the intein splicing regions brings the intein amino- and carboxy-termini in proximity to form the splicing active site. In each precursor, the peptide bond at the amino-terminal splice junction (between the N-extein and the intein) is distorted, making it more labile (32, 34).

In the Sce VMA and Pfu RIR1-1 inteins, the LAGLIDADG motifs are present in a separately folded domain that shares a common fold with intron-encoded LAGLIDADG homing endonucleases like I-CreI (106) and I-DmoI (107). However, both the Sce VMA and Pfu RIR1-1 intein homing endonucleases have an extra subdomain inserted into a HINT module loop adjacent to the main endonuclease, which is thought to be involved in DNA recognition and/or binding (23, 24, 33, 35, 108-114). In the case of PI-SceI, the extra sequence (called the DNA recognition region or DRR) is present in the amino-terminal splicing region between motifs B (N3) and N4 (33), and in PI-PfuI it is called the 'saddle' and is located between the endonuclease region and the C-extein (35). Neither endonuclease subdomain is present in the Mxe GyrA mini-intein (32), nor in the related hedgehog autoprocessing subunit (31). In most inteins, the sequence between motifs N3 and N4 is too short to include a DRR (6, 56, 73).

At this time, nothing is known about the folding pathway of protein splicing precursors, although the intein and the extein end up in separately folded subdomains (34). The extein may influence folding of the intein and vice versa. Folding issues may also affect splicing or precursor stability (in vivo proteolysis of misfolded exteins or inteins is commonly observed). Pulse-chase studies and experiments with isolated precursors suggest that splicing occurs post-translationally (22, 29, 40, 49, 115, 116).

THE STANDARD PROTEIN SPLICING MECHANISM

The mechanism of protein splicing has been extensively reviewed (8, 36, 40, 68, 117). However, recently identified polymorphic inteins are teaching us new lessons.
The standard protein splicing mechanism was first defined with the use of the Psp-GBD Pol intein cloned in a model precursor consisting of the intein (I) inserted in-frame between the Escherichia coli maltose binding protein (MBP or M) and a fragment of Dirofilaria immitis paramyosin (P) to form the MIP precursor (29, 41-43, 45, 46, 115). The mechanism was later confirmed with the Sce VMA intein (38, 39). The Psp-GBD Pol intein MIP precursor provided a permissive, but suboptimal context for protein splicing. Compared to the native extein precursor, splicing was slower, temperature dependent and accompanied by dead-end side reactions. Similar manifestations of suboptimal splicing are often observed when an intein is surrounded by a mutated native extein or a foreign extein, suggesting that the natural precursor has evolved to support efficient splicing (39, 53, 55, 77, 116, 118-121).

Protein splicing requires 4 nucleophilic displacements mediated by 3 splice junction residues (Figure 2). Each of these reactions is facilitated by one or more residues in the precursor that activate the scissile bonds. Although peptide bonds are broken, no segment of the precursor is released until the exteins are ligated. The standard protein splicing reaction begins when the Ser1 or Cys1 at the intein amino-terminus undergoes an acyl rearrangement, replacing the amino-terminal splice junction peptide bond with a (thio)ester bond. In step 2, a transesterification reaction occurs, moving the N-extein from the side chain of the first residue in the intein to the side chain of the first residue in the C-extein (the +1 aa), which is always a Ser, Thr or Cys. This step results in ligation of the exteins and formation of a branched protein intermediate. The Sce VMA intein precursor structure suggests that a conformational shift must occur to bring the C-extein nucleophile into position to attack the (thio)ester at the amino-terminal splice junction (34). Studies on the apparent pKa of the amino-terminal and carboxy-terminal Cys in the Sce VMA intein suggest that Cys+1 has an unusually low pKa which makes it more reactive, although the residues that facilitate this are unknown (44). The intein is cleaved away from the branched intermediate when cyclization of the intein carboxy-terminal Asn breaks the adjacent peptide bond at the carboxy-terminal splice junction. Although it has not been experimentally demonstrated during protein splicing, Gln can undergo similar cyclization reactions to Asn in model peptides (74, 122). The intein carboxy-terminal succinimide may be hydrolyzed to reform an Asn or isoasparagine (40, 41, 45). Once the intein has been released, a spontaneous acyl shift at the +1 residue results in a peptide bond between the exteins. Two of the four nucleophilic displacements use the C-extein +1 residue as the nucleophile, which is not technically part of the intein (defined as the intervening sequence). No cofactors or exogenous proteins are required. However, recent crystallographic studies have detected a zinc atom in 2 inteins which is not in position to play a catalytic role (34, 35). Instead, the zinc may have a structural role (34, 35).

Reactions at the amino-terminal splice junction are facilitated by residues in intein Block B, especially the conserved His (32, 34, 40, 53). These residues, and others yet to be identified, make the scissile bond more electrophilic, helping to form an oxyanion hole and position the splice junction residues, including distorting the peptide bond at the amino-terminal splice junction. They most likely facilitate all intein-mediated reactions at the amino-terminal splice junction, including the trans-esterification step. Only 1 family of inteins (inteins in the CDC21-b insertion site) does not have this conserved His; however, many residues are involved in facilitating amino-terminal splice-junction reactions and any amino acid capable of appropriately hydrogen bonding can substitute for this His (32, 34, 40, 53).
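The four steps of the standard mechanism described above can be traced with a small bookkeeping model. This is only a sketch of the sequence of events and of the residue requirements stated in the text, not a chemical simulation; the function name and the toy sequences are hypothetical, and the model deliberately accepts only a carboxy-terminal Asn because Gln cyclization has not been demonstrated during splicing.

```python
def splice_standard(n_extein, intein, c_extein):
    """Trace the four nucleophilic displacements for one precursor (one-letter codes)."""
    first, last, plus1 = intein[0], intein[-1], c_extein[0]
    if first not in "SC":
        raise ValueError("standard pathway needs Ser or Cys at intein position 1")
    if last != "N":
        raise ValueError("this sketch only models cyclization of a carboxy-terminal Asn")
    if plus1 not in "STC":
        raise ValueError("the +1 residue must be Ser, Thr or Cys")

    # Cys side chains give thioesters; Ser and Thr side chains give esters.
    n_link = "thioester" if first == "C" else "ester"
    branch_link = "thioester" if plus1 == "C" else "ester"

    steps = [
        f"1. acyl rearrangement at residue 1 ({first}): peptide bond -> {n_link}",
        f"2. transesterification to +1 ({plus1}): branched intermediate, {branch_link} link",
        "3. Asn cyclization: intein released with a carboxy-terminal succinimide",
        f"4. acyl shift at +1 ({plus1}): {branch_link} -> native peptide bond",
    ]
    # Products: the ligated exteins and the excised intein.
    return steps, n_extein + c_extein, intein


# Hypothetical toy precursor.
steps, ligated, excised = splice_standard("MKY", "CLAHN", "SGK")
print("\n".join(steps))
print("ligated exteins:", ligated)
print("excised intein:", excised)
```

Note that steps 2 and 4 both act through the +1 residue, which mirrors the observation that two of the four displacements use a nucleophile that is not part of the intein.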
More polymorphisms are seen at the intein penultimate position, where approximately 15% of inteins (at 10 intein insertion sites) have a penultimate residue other than His (5, 101). The intein penultimate His is thought to assist in branch resolution by Asn or Gln cyclization. Mutation of this penultimate His usually results in branched intermediate accumulation and reduced spliced product, although splicing is not always blocked (39, 46, 47, 51). Initially, examination of inteins that naturally have another residue at the penultimate position yielded contradictory results. The Chlamydomonas eugametos ClpP protease subunit (Ceu ClpP) intein failed to splice in E. coli unless the penultimate position was 'reverted' to His (102). Subsequent experiments showed that the Ssp DnaE (103, 104), Mja phosphoenolpyruvate synthase (PEP) (101), Mja RNA polymerase subunit A' (Rpol A') (101) and Mja KlbA (100) inteins all spliced in E. coli, although none of them have a penultimate His. Splicing also improved when the penultimate position was 'reverted' to His in the Ssp DnaE (104) and Mja PEP (101) inteins. On the other hand, splicing of the Mja Rpol A' intein was inhibited when the native penultimate Gly was changed to His, as evidenced by a decrease in spliced product and an accumulation of branched intermediate (101). We have proposed that all inteins evolved from ancestors that had a penultimate His and that mutation of this residue yielded an intein that still spliced, but not efficiently (101). Enough active extein was synthesized to permit survival of the host until further mutation improved splicing. However, the more essential and highly-expressed the gene product, the stronger the selection for mutations that would improve splicing. Rpol A' is an example of a highly-expressed, essential protein, since it is a subunit of the archaeal RNA polymerase.
Genetic selection has resulted in an Mja Rpol A' intein that splices efficiently in the absence of a penultimate His and splicing is now inhibited when its penultimate Gly is 'reverted' to His. The diverse splicing capabilities exhibited by these inteins may reflect different stages of evolving towards rapid splicing in the absence of a penultimate His (101).
PROTEIN SPLICING AND ITS APPLICATIONS
Inteins function like regular enzymes, requiring facilitating groups to activate nucleophiles and electrophilic centers. However, unlike normal enzymes, they act on only a single substrate, the precursor in which they exist. When the precursor is split and splicing occurs in trans (see below), unequal molar ratios of fragments result in the accumulation of the fragment in excess and not in turnover (the limiting fragment does not react with multiple partners) (123-128). It should be emphasized that protein splicing is context dependent. When an intein is inserted into a foreign context, splicing is often temperature dependent and less efficient (39, 53, 55, 77, 116, 118-121, 129). Like all other enzymes, inteins have substrate preferences, especially in residues proximal to the splice junctions (39, 55, 71, 77, 119-121). Although only the C-extein +1 residue actively participates in the protein splicing reaction, other residues in the extein are important (39, 55, 71, 77, 119-121). The extein requirements for protein splicing depend on the specific intein and on the extein context. Some inteins, such as the Sce VMA intein, are able to splice with almost any amino acid preceding the intein (39). Other inteins, such as the Mxe GyrA intein, will only splice if the native extein Tyr or the similar residue, Phe, precedes the intein (119). Splicing of the Synechocystis sp. strain PCC6803 DnaB intein has different proximal extein residue requirements with different host proteins (128).
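The single-turnover behavior described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that reassembly and splicing go to completion; the function name `trans_splice` and the picomole amounts are hypothetical and not taken from the cited studies.

```python
def trans_splice(n_fragment_pmol: float, c_fragment_pmol: float) -> dict:
    """Single-turnover model of splicing in trans.

    Each amino-terminal precursor fragment pairs with exactly one
    carboxy-terminal fragment and reacts once (no turnover).
    Assumption: association and splicing are 100% efficient.
    """
    if n_fragment_pmol < 0 or c_fragment_pmol < 0:
        raise ValueError("fragment amounts must be non-negative")
    # The limiting fragment sets the maximum amount of spliced product.
    spliced = min(n_fragment_pmol, c_fragment_pmol)
    return {
        "spliced_product": spliced,
        "leftover_n_fragment": n_fragment_pmol - spliced,
        "leftover_c_fragment": c_fragment_pmol - spliced,
    }


if __name__ == "__main__":
    # A 100:30 mixture leaves 70 pmol of the fragment in excess unreacted.
    print(trans_splice(100.0, 30.0))
```

In a catalytic (multiple-turnover) model the fragment in excess would eventually be consumed; here it simply accumulates, which is the observation reported for split precursors.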
AN ALTERNATE PROTEIN SPLICING MECHANISM FOR INTEINS BEGINNING WITH ALANINE
Three families of inteins begin with Ala, which is incapable of performing the requisite acyl shift to initiate protein splicing. The Mja KlbA intein was the first intein identified with an amino-terminal Ala and it was predicted to be inactive (6, 98). Ala1 intein alleles are present in the KlbA host protein (5, 6, 56, 98, 105, 130), the DnaB helicase insertion site dnab-2 in M. avium and M. leprae DnaB helicases (5), and the D. radiodurans Snf2 helicase (5). In all inteins tested to date, splicing is blocked when a residue other than Ser, Thr or Cys is substituted at the intein amino-terminus (29, 46, 47, 118, 120, 129, 131). Dalgaard et al. (56) suggested that the Mja KlbA intein could splice by an alternate pathway first described by Xu et al. (45). In this non-canonical pathway, the C-extein Ser, Thr or Cys directly attacks a peptide bond at the amino-terminal splice junction, instead of a (thio)ester bond (Figure 3). This new pathway was demonstrated with the Mja KlbA intein and utilizes the same residues in intein Block B to facilitate amino-terminal splice-junction reactions as in the standard protein splicing pathway (100). Why, then, does mutation of any intein amino-terminus to residues other than Ser, Thr or Cys block protein splicing? The answer may lie in the Sce VMA precursor structure, which indicated that the carboxy-terminal nucleophile was too far away from the amino-terminal splice junction to mount a nucleophilic attack (34). Apparently, standard inteins cannot undergo the required conformational change to align this nucleophile without first generating the amino-terminal (thio)ester. However, inteins that begin with Ala have undergone other changes that allow the carboxy-terminal nucleophile to attack the amino-terminal splice junction in the absence of (thio)ester formation (100).
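The residue rules summarized above can be written as a rough sequence check. This is only a heuristic sketch: the function name `predict_pathway`, the returned labels and the toy sequences are hypothetical, and the check ignores everything else that matters in practice (Block B residues, the penultimate residue, extein context).

```python
# Residues named in the text, in one-letter code.
STANDARD_FIRST_RESIDUES = {"S", "T", "C"}   # can form the amino-terminal (thio)ester
CYCLIZING_LAST_RESIDUES = {"N", "Q"}        # Asn; Gln cyclization known from model peptides
PLUS1_NUCLEOPHILES = {"S", "T", "C"}        # C-extein +1 residue


def predict_pathway(intein: str, plus1: str) -> str:
    """Guess which splicing pathway the terminal residues are compatible with.

    intein: intein sequence (first residue = position 1).
    plus1:  the single C-extein residue that follows the intein.
    """
    intein, plus1 = intein.upper(), plus1.upper()
    if len(intein) < 2 or len(plus1) != 1:
        raise ValueError("need an intein sequence and one +1 residue")
    if intein[-1] not in CYCLIZING_LAST_RESIDUES or plus1 not in PLUS1_NUCLEOPHILES:
        return "not expected"
    if intein[0] in STANDARD_FIRST_RESIDUES:
        return "standard"
    if intein[0] == "A":
        # Only a candidate: Ala1 inteins carry additional changes that
        # a standard intein mutated to Ala would lack.
        return "alternate (Ala1)"
    return "not expected"


if __name__ == "__main__":
    print(predict_pathway("CLSGDTAHN", "S"))   # toy sequence, not a real intein
    print(predict_pathway("ALSGDTAGN", "C"))   # toy sequence, not a real intein
```

A result of "alternate (Ala1)" should be read as "worth testing", since the text notes that standard inteins mutated at position 1 do not splice.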
I. GIRIAT ET AL.
USE OF INTEINS MODIFIED FOR SINGLE SPLICE-JUNCTION CLEAVAGE IN PROTEIN PURIFICATION
Understanding the mechanism of protein splicing has allowed scientists to modify inteins for various biotechnology purposes. Three dead-end side reactions lead to single splice-junction cleavage (Figure 4). The (thio)ester bonds in both the linear and branched intermediate are more labile than peptide bonds and can be cleaved under mild conditions by various nucleophiles. When Asn or Gln cyclization precedes the first two steps in the protein splicing pathway, carboxy-terminal splice-junction cleavage occurs. Various intein-mediated protein purification vectors take advantage of these autocleavage reactions (8, 40, 63, 68, 71, 118-121, 129, 131-134). These vectors all have the same basic design (Figure 5). An intein is modified so that it can only generate a thioester at the amino-terminal splice junction or only undergo Asn cyclization. An
affinity tag is then fused to the inactive splice junction or within the intein. The most commonly used affinity tags in these systems are the Bacillus circulans chitin binding domain (CBD or B), MBP or a His tag. The key to any of these purification vectors is control of the cleavage reaction. When the target protein's carboxy-terminus is cloned preceding the intein amino-terminus (Figure 5a), release of the target protein by cleavage of the intein thioester bond is initiated by adding an exogenous nucleophile, such as a thiol reagent (118-120, 131-134), changing pH (118) or changing temperature (118, 131). In the second class of intein protein purification vectors, the target protein's amino-terminus is fused to the carboxy-terminal splice junction (Figure 5b) and cleavage is a result of Asn cyclization, which is induced by temperature or pH (118, 119, 129, 131, 133). In a unique method of control of carboxy-terminal splice-junction cleavage, the Sce VMA intein was mutated so that thiol-induced cleavage of a small amino-terminal peptide triggers carboxy-terminal splice-junction cleavage and release of the
target protein from the precursor carboxy-terminus (121). Several of these intein-mediated vectors are commercially available (New England BioLabs, Beverly, MA). Single splice-junction cleavage reactions are also dependent on proximal extein residues, although to a lesser extent than the complete splicing reaction (8, 39, 71, 119-121). The advantages of these intein-mediated protein purification vectors are (a) single-column purification of the target protein free of affinity tags, (b) no requirement for exogenous proteases, which are often expensive and may cleave within the target protein, to remove the affinity tag and (c) the production of protein fragments with activated amino- or carboxy-termini for use in protein ligation (see below). Positioning of the target protein at the amino-terminus of the fusion has the disadvantage of having expression dependent on the target protein itself.
Positioning the target protein at the carboxy-terminus of the fusion reduces this problem (8, 71, 119, 121, 129). Thiol-induced cleavage of proteins at the amino-terminus of the precursor results in a chemoselective, regiospecific activated linkage at the C-terminus of the target protein. In the absence of thiols, this linkage is hydrolyzed to yield a normal carboxy-terminus, but in the presence of thiols, it retains its reactive characteristics. All of these vectors with the target protein at the carboxy-terminus of the fusion allow purification of proteins with amino-terminal Cys for use in protein ligation, except the IMPACT-CN™ vector (121).
PROTEIN SPLICING OF SPLIT PRECURSORS IN TRANS
Several groups independently demonstrated that protein splicing precursors can be cut into two pieces that individually have no activity, but when combined will non-covalently associate to give a functional intein (Figure 6c) (123, 124, 126-128, 135, 136). Note that this trans-protein splicing effect is yet another example of so-called protein complementation, a phenomenon that is seeing increasing usage in protein engineering (137). Affinity tags can be fused to the split site for purification of precursor fragments (123). The reconstituted split intein then mediates a normal protein splicing reaction. Assembly occurs spontaneously in vivo when both fragments are expressed in the same cell (128, 136). However, in vitro trans-splicing often requires denaturation and renaturation of the isolated precursor fragments (123, 124, 135, 138-140). The denaturants may be required for multiple reasons. Many intein fragments are insoluble when expressed separately (123, 124, 135, 138-140), possibly due to improper folding or exposure of hydrophobic residues in the absence of the complete intein. Alternatively, aggregation or insolubility could be due to the extein portion of the precursor fragment. 
The Psp-GBD Pol intein is from an extreme thermophile and functions in 6 M urea, suggesting that the need for denaturants during reassembly is not due to a need to denature the intein segments (123). When the length of the extein was decreased, the Psp-GBD Pol intein precursor fragments reassembled and spliced in the absence of urea, suggesting that, in this case, the denaturants were required to compensate for extein factors (123). Protein splicing in trans has also been accomplished without prior denaturation with a biosynthetic amino-terminal fragment and a synthetic carboxy-terminal fragment (127). Splicing in trans has host protein requirements similar to those of splicing in cis. Some inteins are more robust than others and can splice when inserted into many different proteins. The intein usually requires its native +1 extein amino acid for efficient splicing. Some inteins only require 1 native N-extein amino acid, while others require more native extein residues (39, 104, 119, 125, 128, 140). To minimize foreign extein effects, a few native extein residues are often added at the intein insertion site, as well as residues that allow structural flexibility, like Gly (128, 138-140). Experiments with split precursors indicate that an intein fragment does not act on multiple substrates, since the fragment in molar excess remains (123, 124, 138, 139, 141). Trans-protein splicing is a bimolecular reaction. However, once the intein fragments reassemble, the reaction is, in effect, a unimolecular reaction. Intein fragment reassociation requires only micromolar concentrations, as opposed to the millimolar concentrations required for expressed protein ligation (see below). Since nonallelic inteins have little sequence identity, different split inteins should only reassociate with their homologous partners and not with heterologous partners. Otomo et al. demonstrated this using two inteins split at different positions (138). The
fact that different split sites were used should not affect heterologous association, since many proteins, including inteins, can reassemble with large overlapping segments (123). Most recently, the trans-protein splicing area has received a considerable boost through the discovery of the naturally occurring split Ssp DnaE intein (6, 56, 98, 103). The dnaE gene encodes the catalytic subunit of the replicative DNA polymerase and is an essential gene. The two dnaE gene fragments are present over 700 kb apart on opposite strands of the Synechocystis sp. PCC6803 chromosome. This arrangement suggests that the dnaE gene containing the intein was split during a chromosomal inversion event. Significantly, this split intein does not require denaturants for reconstitution to initiate splicing, with obvious implications for applications of the approach to protein engineering.
PROTEIN SEMI-SYNTHESIS
In the last several years, there has been an enormous interest in approaches that utilize the tools of synthetic chemistry in the study of protein structure and function. This situation is in part fueled by the success of synthetic approaches (e.g., solid phase peptide synthesis (SPPS)) in delineating structure-activity relationships (SAR) in small bio-active molecules (142). Developing approaches that allow these chemical SAR strategies (introduction of unnatural amino acids with non-coded side-chain or backbone groups) to be extended to proteins has been the focus of much effort (70, 143-148). Although every aspect of stepwise SPPS has undergone considerable refinement over the years (149), there is a practical upper limit on how large a peptide can be synthesized, approximately 50 aa. In response to this limitation, innovative approaches have been developed in an effort to extend the use of chemistry to proteins. 
These include the nonsense codon suppression strategy developed by Schultz and co-workers (143, 148) and a plethora of approaches that allow the enzymatic or chemical ligation of unprotected peptides (146, 150-156). Peptide ligation strategies were originally developed to ligate unprotected synthetic peptides to produce a protein (i.e., convergent synthesis). Arguably the most successful, and certainly the most frequently used, of these peptide ligation strategies is native chemical ligation (NCL) (67, 70, 157). Native chemical ligation, developed by Kent and co-workers, allows the chemoselective linking of two unprotected peptides at physiological pH (Figure 6a). A peptide containing an α-thioester group is reacted with a second peptide containing an amino-terminal Cys. Thiol exchange yields an intermediate linked through a thioester, which spontaneously rearranges to a peptide bond. Thus, the end product contains a native backbone linkage at the point of ligation (Figure 6a). Native chemical ligation is a powerful technique, providing almost complete flexibility in terms of the type and number of modifications that can be made in the target protein (70, 158). However, building blocks cannot be larger than 50 aa (due to the limitations in SPPS). Consequently, synthesis of proteins larger than 100 aa requires approaches that either allow sequential native chemical ligation reactions to be performed, or that use much larger polypeptide building blocks. Significant progress has been made in the former and indeed approaches are now available that allow sequential ligations to be performed in solution (67, 159-161) and, most recently, on the solid phase (162-164). Likewise, important advances have been made in the last few years in terms of preparing large recombinant building blocks
containing the requisite reactive groups for NCL (118, 119, 131-133, 165-170). These approaches are to a significant degree a consequence of the discovery of protein splicing and the elucidation of its mechanism.
LIGATION OF POLYPEPTIDES EXPRESSED AS INTEIN FUSIONS
As discussed above, understanding the protein splicing mechanism has led to the design of a number of mutant inteins that have significant utility in protein science and, in particular, peptide ligation (63, 104, 125, 131-133, 163, 165-177). Proteins expressed in-frame and amino-terminal to a mutated intein that can only participate in the first step of splicing (i.e., the N-S acyl shift, Figure 4) can be cleaved by thiols via an intermolecular trans-thioesterification reaction (118-120, 131-134). This generates a recombinant protein α-thioester derivative that can be reacted with a polypeptide containing an amino-terminal Cys via native chemical ligation. It is also possible to mutate an intein such that after cleavage at the carboxy-terminal splice junction (through temperature, pH change or thiol addition, Figure 4), one obtains the protein of interest with an amino-terminal Cys (118, 119, 129, 131, 133). Importantly, the use of these engineered inteins permits the production of large recombinant building blocks for native chemical ligation, and allows for mixing of recombinant and synthetic building blocks. The semi-synthetic version of native chemical ligation is best known as expressed protein ligation (EPL), but is also known as intein-mediated protein ligation (IPL) (Figure 6b) (132, 166, 167). EPL has been used for the incorporation of probes into proteins in vitro (e.g., unnatural amino acids, fluorophores, posttranslational modifications), the isotopic segmental labeling of multi-domain proteins, and the backbone cyclization of peptides and proteins. EPL and trans-protein splicing both permit the ligation of two polypeptides, at least one of which is recombinant. 
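The two constraints on fully synthetic native chemical ligation discussed above (every downstream fragment must begin with Cys, and each synthetic block is limited to roughly 50 aa) can be sketched as a simple planning routine. The function name `plan_ncl_fragments`, the greedy strategy and the toy sequence are assumptions made for illustration; recombinant building blocks prepared from intein fusions are not bound by the length limit, and real ligation-site selection also has to consider folding and solubility.

```python
from typing import List, Optional


def plan_ncl_fragments(seq: str, max_len: int = 50) -> Optional[List[str]]:
    """Split a sequence into fragments suitable for sequential NCL.

    Every fragment after the first starts with Cys (C), and no fragment
    is longer than max_len residues. Returns None if no such split exists.
    Greedy rule: always cut at the farthest Cys still within reach.
    """
    seq = seq.upper()
    fragments: List[str] = []
    start = 0
    while len(seq) - start > max_len:
        # Positions where the next fragment could begin (must be a Cys).
        candidates = [i for i in range(start + 1, start + max_len + 1)
                      if seq[i] == "C"]
        if not candidates:
            return None  # no Cys within reach: needs a mutation or a larger block
        cut = candidates[-1]
        fragments.append(seq[start:cut])
        start = cut
    fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    # Toy 111-residue sequence with Cys at positions 41 and 81 (1-based).
    toy = "A" * 40 + "C" + "A" * 39 + "C" + "A" * 30
    plan = plan_ncl_fragments(toy)
    print([len(f) for f in plan])
```

When the routine returns None, the options named in the text apply: introduce a Cys by mutation, or replace one or more synthetic blocks with a larger recombinant one.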
EPL is a very flexible technique because it is based on a chemical reaction (NCL). It allows ligation of polypeptides to take place under a variety of conditions, e.g., in the presence of denaturants, organic solvents, and detergents. In addition, the amino-terminal fragment can contain essentially any residue next to the α-thioester (161). The potential drawbacks of EPL are that it may be necessary to mutate the protein to add a Cys (to facilitate the chemistry) and that it is a second-order chemical reaction, so that high concentrations of the reactants are usually required for efficient reaction. Note that there are examples of conformationally assisted NCL reactions that are less dependent on reactant concentrations (173, 178). An attractive feature of trans-protein splicing is that it is based on a macromolecular recognition event (the intein fragments associate with one another and subsequent splicing is nominally an intra-molecular reaction) and thus the need for high concentrations of the two pieces is less of an issue. On the other hand, trans-protein splicing appears always to require the presence of some naturally occurring extein amino acids to splice together the exteins efficiently. For example, the Ssp DnaE intein needs 5 amino-terminal extein amino acids and 3 from the C-terminal extein, leaving 8 extra residues (KFAEYCFN) at the site of ligation (103), although the number of native extein residues may be context dependent (104, 125). Moreover, engineered split inteins need to be denatured and reconstituted to promote protein splicing. Such conditions may not be amenable to all proteins, though this requirement can possibly be avoided with the use of the naturally occurring split intein as noted above. It is important to point out that for both EPL and trans-protein splicing, determining where to split the protein under study
has to be done extremely carefully (and to some degree empirically) to assure fragment solubility and that the final ligation product is well folded and maintains biological activity (132, 179). A good illustration of the use of EPL to incorporate probes into proteins is the work of Cotton et al. (169). They created a biosensor for phosphorylation of the adapter protein c-Crk II by the oncogene c-Abl tyrosine kinase (180-182). This involved incorporating a fluorescence resonance energy transfer (FRET) pair (tetramethylrhodamine and fluorescein) into the amino- and carboxy-termini of Crk II. Upon phosphorylation of Crk II by c-Abl there was a change in FRET due to the rearrangement of the termini of the protein, permitting a fluorescence read-out of the activity of the kinase. Notably, this work used a solid-phase version of EPL that added the advantages of SPPS to EPL, e.g., high concentrations of one of the reactants to drive the reaction to completion and the ability to wash away excess reagents.
SEGMENTAL ISOTOPIC LABELING
One of the most exciting applications of EPL and trans-protein splicing has been segmental labeling of proteins for structural studies through nuclear magnetic resonance (NMR) spectroscopy. This method allows a selected portion of a protein to be labeled with NMR-active isotopes. Segmental isotopic labeling of a large protein leads to simplified NMR spectra; unlabeled regions of the protein are filtered out by suitable heteronuclear correlation experiments, leaving only signals from the labeled part of the protein. In principle, segmental isotopic labeling, in combination with newly developed NMR techniques (183, 184), allows structure analysis of discrete regions within proteins significantly larger than the current ~30 kDa upper limit. Both amino- and carboxy-terminal segments of proteins have been labeled with EPL and trans-protein splicing approaches (i.e., exo-labeling). 
More recently, middle domains of proteins (i.e., endo-labeling) have been labeled by trans-protein splicing, and the procedures that would allow the same to be done with EPL have been reported. Segmental isotopic exo-labeling was first reported by Yamazaki and co-workers using a trans-protein splicing system based on the Pfu RIR1-1 intein (139, 140). This system was used to isotopically label a region of an E. coli RNA polymerase subunit selectively. Each half of the protein was isotopically labeled simply by expressing the corresponding split intein fusion in isotope-enriched medium. Protein splicing took place after refolding of the split precursor fragments. Heteronuclear NMR experiments on these samples revealed the improvement in spectral resolution that segmental labeling provides. In a complementary study, EPL was used to label a single domain within a Src-homology domain pair derived from the Abl protein tyrosine kinase. An ethyl α-thioester derivative of recombinant Abl-SH3 was generated from the corresponding intein fusion and then chemically ligated, under physiological conditions, to recombinant Abl-SH2 possessing a factor Xa generated amino-terminal Cys. Comparison between the heteronuclear NMR spectra of fully labeled and segmentally labeled Abl-SH(32) again illustrated the power of segmental isotopic labeling (168).
Segmental isotopic endo-labeling, also first described by Yamazaki and co-workers using trans-protein splicing, has been used to label an internal region of a model protein, the Maltose Binding Protein (MBP) (138). The key to this approach was the use of two orthogonal split inteins, Pfu RIR1-1 and Pfu RIR1-2. The target MBP protein was expressed as three precursor fragments (Figure 7a); the central MBP segment contained Pfu RIR1-1 and Pfu RIR1-2 intein fragments at its amino- and carboxy-terminus respectively, while the amino- and carboxy-terminal MBP segments contained the complementary Pfu RIR1-1 and Pfu RIR1-2 fragments at their carboxy- and amino-termini, respectively. Once again, protein splicing took place after refolding of the homologous split intein fragments. Since trans-protein splicing of overlapping intein fragments can occur (123), it is likely that the specificity of intein reassembly was due to sequence differences between the 2 inteins, rather than split site differences.
EPL can also be used to label an internal region of a protein isotopically (Figure 7b). Blaschke et al. recently reported a sequential EPL strategy that allows three or more protein domains to be assembled efficiently. This was used to piece together the eukaryotic adapter protein, Crk-II, from its three constituent domains (185). In this strategy, the central polypeptide domain contained both an α-thioester and a cryptic amino-terminal Cys masked by a factor Xa cleavable pro-sequence. The pro-sequence mask prevented the central domain from reacting with itself in an inter- or intra-molecular fashion. The first step in the process was the reaction of the α-thioester of the central domain with the amino-terminal Cys moiety of the carboxy-terminal domain. The second step involved removal of the pro-sequence by treatment with factor Xa, revealing the requisite amino-terminal Cys for the second ligation reaction, this time with the amino-terminal domain α-thioester (Figure 7b). This approach has recently been used for segmental isotopic endo-labeling of the Crk-II protein (Blaschke and Muir, unpublished data).
Trans-protein splicing and EPL each have their pros and cons when applied to segmental isotopic labeling (exo or endo). Trans-protein splicing does have the advantage of being a one-pot process (particularly beneficial for endo-labeling). However, in many cases it will
necessitate denaturing conditions to recover protein splicing activity, and will leave several extra amino acids at the point of ligation. On the other hand, EPL permits all the reactions to take place under physiological conditions, and it leaves no extra residues at the point of ligation, although Cys mutation may be required. However, high concentrations of reactants are needed for EPL and, in the case of endo-labeling, the stepwise nature of the process may lower the overall percentage yield of labeled product. In practical terms, the choice of whether to use trans-protein splicing or EPL will ultimately be dictated by the nature of the protein being studied.
BACKBONE CYCLIZATION
Backbone cyclization, condensing the amino- and carboxy-termini of a peptide together, has been used to rigidify structure, improve in vivo stability and modify the properties of small bio-active peptides that are targets of drug discovery programs (142, 186, 187). Extending backbone cyclization to proteins could help researchers understand the roles of the termini in protein folding, stability and biological activity. Circular proteins are expected to be thermodynamically more stable, and to be resistant to exo-proteolysis. Recently, in vitro and in vivo strategies for cyclization of recombinant peptides and proteins by inteins have been reported. Several EPL strategies have been described that allow polypeptide cyclization in vitro (104, 125, 133, 173, 175). All of these approaches are founded on previous synthetic work demonstrating that native chemical ligation can occur intra-molecularly when the two reactive groups are placed in the same peptide sequence (160, 178, 188). This leads to backbone cyclization of the peptide. Backbone cyclization of recombinant polypeptides with the use of intra-molecular NCL can be achieved by simultaneously introducing a masked cysteine at the amino-terminus and a modified intein at the carboxy-terminus of the polypeptide. 
Unmasking of the Cys leads to backbone cyclization. Note that masking the Cys allows the process to be controlled and hence performed in vitro. Masking of the Cys has been achieved in several ways: (a) using a factor Xa cleavable pro-sequence to cyclize an SH3 domain (173), (b) using a modified self-splicing intein to cyclize maltose-binding protein (133), and (c) using a removable methionine to cyclize a target protein (175). Figure 8 schematically presents each protection strategy. Researchers have taken advantage of the E. coli enzyme methionyl-aminopeptidase (MAP), which is capable of post-translational removal of the amino-terminal Met residue from most cytosolic proteins (Figure 8c). Iwai and Plückthun expressed their target protein with Met-Cys at its amino-terminus, resulting in the in vivo removal of Met (the Cys mask), and therefore bypassing the need for in vitro proteases (175). It is important to point out that incorporation of an amino-terminal Cys and an α-thioester in the same polypeptide can lead to polymerization, that is, inter-molecular rather than intra-molecular reaction. Indeed, such polymerization has been reported for thioredoxin (133). However, backbone cyclization will tend to dominate for proteins that have juxtaposed termini, particularly when the precursor is at low concentration (133).
Protein splicing with a circularly permuted precursor and the naturally split Ssp DnaE intein has been used to cyclize peptides and proteins in vivo and in vitro (Figure 8d). The process is called SICLOPPS (Split Intein-mediated Cyclic Ligation Of Peptides and ProteinS) (104). Benkovic and co-workers cyclized the enzyme dihydrofolate reductase (DHFR) by expressing it between the two halves of this split intein (104). Key to the process was fusing the
carboxy-terminal intein segment to the amino-terminus of the protein of interest, and the amino-terminal intein segment to the target protein carboxy-terminus. The two intein halves come together in vivo to reconstitute an active intein and, in this circularly permuted precursor context, the normal splicing reaction results in backbone cyclization of the target peptide. This clever strategy removes the need for additional in vitro manipulations and, importantly, uses the intein structure to bring together the termini of the polypeptide. This may be useful for cyclizing
polypeptides whose termini are not suitably juxtaposed for the EPL strategy (e.g., unstructured systems). However, as with the EPL strategy, SICLOPPS can, in principle, lead to polymerization (if the split inteins from different molecules come together), although this has yet to be observed and the bimolecular reaction would be less likely than the unimolecular reaction. One drawback of the original SICLOPPS method is that it precludes introduction of commonly used affinity tags (e.g., His-tag, GST, MBP) for purification of the cyclic protein (these tags would have to be inserted into the final sequence and could not be removed). In response to this, Xu and co-workers demonstrated that the Ssp DnaE split intein system can be used for the in vitro cyclization of polypeptides, thereby allowing affinity handles to be attached to one of the split intein segments (125). In addition to allowing the preparation of cyclic recombinant proteins with improved thermodynamic stability (175), proteolysis resistance (104, 175) and biochemical activity (173), these cyclization technologies, in principle, allow the generation of genetically encoded libraries of cyclic peptides (as noted by Benkovic and co-workers (104)). Small circular peptides have many of the qualities of drugs, including structural rigidity and protease resistance. Access to these molecules with the use of molecular biology techniques offers the intriguing possibility of screening such peptide libraries for drug-like activities. This area is likely to receive considerable attention in the future.
FUTURE PERSPECTIVES
Protein engineering with inteins has opened new doors for researchers. One of the most promising applications has been in the area of protein-protein screening assays, either in vitro or in vivo. An example of the former is the work of Cotton and Muir describing the use of protein biosensors to screen for c-Abl kinase inhibitors (163). 
Such technology can be applied to screen other proteins of therapeutic interest. Another exciting application is in the generation of cyclic peptide libraries (104), as discussed above. Recently Ozawa et al. (189) reported an interesting in vivo application of trans-protein splicing that allows protein-protein interactions to be identified in a manner reminiscent of the yeast two-hybrid system. These researchers expressed two proteins known to interact (calmodulin and its target peptide M13), each followed by one-half of the Ssp DnaE intein and one-half of the green fluorescent protein (GFP). Interaction of calmodulin and M13 resulted in generation of GFP via trans-protein splicing and hence the interaction could be detected by fluorescence techniques (189). This technique adds to the growing number of reporter protein complementation approaches designed to identify novel protein-ligand interactions rapidly (137), and is attractive because it should be amenable to any cell type. In conclusion, the use of inteins in the context of EPL or trans-splicing offers exciting possibilities for the creation of new tools that could be applied to study basic questions in areas ranging from biochemistry to cell biology. We look forward to the discoveries such tools will unearth in the post-genomic era.
REFERENCES
1
Perler, F.B., Davis, E.O., Dean, G.E., Gimble, F.S., Jack, W.E., Neff, N., Noren, C.J., Thorner, J. and Belfort, M. (1994) Nucl. Acids Res. 22, 1125-1127.
2 Hirata, R., Ohsumi, Y., Nakano, A., Kawasaki, H., Suzuki, K. and Anraku, Y. (1990) J. Biol. Chem. 265, 6726-6733.
3 Kane, P.M., Yamashiro, C.T., Wolczyk, D.F., Neff, N., Goebl, M. and Stevens, T.H. (1990) Science 250, 651-657.
4 Shih, C., Wagner, R., Feinstein, S., Kanik-Ennulat, C. and Neff, N. (1988) Mol. Cell. Biol. 8, 3094-3103.
5 Perler, F.B. (2000) Nucl. Acids Res. 28, 344-345.
6 Pietrokovski, S. (1998) Protein Sci. 7, 64-71.
7 Perler, F.B. (1999) Trends Biol. Sci. 24, 209-210.
8 Perler, F.B. and Adam, E. (2000) Curr. Opin. Biotechnol. 11, 377-383.
9 Dujon, B., Belfort, M., Butow, R.A., Jacq, C., Lemieux, C., Perlman, P.S. and Vogt, V.M. (1989) Gene 82, 115-118.
10 Delahodde, A., Goguel, V., Becam, A.M., Creusot, F., Perea, J., Banroques, J. and Jacq, C. (1989) Cell 56, 431-441.
11 Lambowitz, A.M. and Belfort, M. (1993) Annu. Rev. Biochem. 62, 587-622.
12 Mueller, J.E., Bryk, M., Loizos, N. and Belfort, M. (1994) in Nucleases (S.M. Linn, R.S. Lloyd and R.J. Roberts, eds.), pp. 111-143, Cold Spring Harbor Press, Cold Spring Harbor, NY.
13 Shub, D.A. and Goodrich-Blair, H. (1992) Cell 71, 183-186.
14 Grivell, L.A. (1994) Curr. Biol. 4584, 161-164.
15 Neff, N.F. (1993) Curr. Opin. Cell. Biol. 5, 971-976.
16 Belfort, M. and Perlman, P.S. (1995) J. Biol. Chem. 270, 30237-30240.
17 Belfort, M. and Roberts, R.J. (1997) Nucl. Acids Res. 25, 3379-3388.
18 Jurica, M.S. and Stoddard, B.L. (1999) Cell. Mol. Life Sci. 55, 1304-1326.
19 Gimble, F.S. (2000) FEMS Microbiol. Lett. 185, 99-107.
20 Gimble, F.S. and Thorner, J. (1992) Nature 357, 301-306.
21 Gimble, F.S. and Thorner, J. (1993) J. Biol. Chem. 268, 21844-21853.
22 Perler, F.B. et al. (1992) Proc. Nat. Acad. Sci. U.S.A. 89, 5577-5581.
23 Wende, W., Grindl, W., Christ, F., Pingoud, A. and Pingoud, V. (1996) Nucl. Acids Res. 24, 4123-4132.
24 Gimble, F.S. and Wang, J. (1996) J. Mol. Biol. 263, 163-180.
25 Nishioka, M., Fujiwara, S., Takagi, M. and Imanaka, T. (1998) Nucl. Acids Res. 26, 4409-4412.
26 Komori, K., Ichiyanagi, K., Morikawa, K. and Ishino, Y. (1999) Nucl. Acids Res. 27, 4175-4182.
27 Saves, I., Ozanne, V., Dietrich, J. and Masson, J.M. (2000) J. Biol. Chem. 275, 2335-2341.
28 Komori, K., Fujita, N., Ichiyanagi, K., Shinagawa, H., Morikawa, K. and Ishino, Y. (1999) Nucl. Acids Res. 27, 4167-4174.
29 Hodges, R.A., Perler, F.B., Noren, C.J. and Jack, W.E. (1992) Nucl. Acids Res. 20, 6153-6157.
30 Saves, I., Eleaume, H., Dietrich, J. and Masson, J.M. (2000) Nucl. Acids Res. 28, 4391-4396.
31 Hall, T.M., Porter, J.A., Young, K.E., Koonin, E.V., Beachy, P.A. and Leahy, D.J. (1997) Cell 91, 85-97.
PROTEIN SPLICING AND ITS APPLICATIONS
32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
195
Klabunde, T., Sharma, S., Telenti, A., Jacobs, W.R., Jr. and Sacchettini, J.C. (1998) Nature Struct. Biol. 5, 31-36. Duan, X., Gimble, F.S. and Quiocho, F.A. (1997) Cell 89, 555-564. Poland, B.W., Xu, M.-Q. and Quiocho, F.A. (2000) J. Biol. Chem. 275, 16408-16413. Ichiyanagi, K., Ishino, Y., Ariyoshi, M, Komori, K. and Morikawa, K. (2000) J. Mol. Biol. 300, 889-901. Perler, F.B. (1998) Cell 92, 1-4. Hashimoto, H., Takahashi, H., Nishioka, M., Fujiwara, S., Takagi, M., Imanaka, T., Inoue, T. and Kai, Y. (2000) Acta Crystallogr. D. Biol. Crystallogr. 56, 1185-1186. Chong, S., Shao, Y., Paulus, H., Benner, J., Perler, F.B. and Xu, M.-Q. (1996) J. Biol. Chem. 271, 22159-22168. Chong, S., Williams, K.S., Wotkowicz, C. and Xu, M.-Q. (1998) J. Biol. Chem. 273, 10567-10577. Noren, C.J., Wang, J. and Perler, F.B. (2000) Angew. Chem. Int. Ed. 39, 450-466. Shao, Y., Xu, M.-Q. and Paulus, H. (1995) Biochemistry 34, 10844-10850. Shao, Y., Xu, M.-Q. and Paulus, H. (1996) Biochemistry 35, 3810-3815. Shao, Y. and Paulus, H. (1997) J. Pept. Res. 50, 193-198. Shingledecker, K., Jiang, S. and Paulus, H. (2000) Arch. Biochem. Biophys. 375, 138-144. Xu, M., Comb, D.G., Paulus, H., Noren, C.J., Shao, Y. and Perler, F.B. (1994) EMBO J. 13, 5517-5522. Xu, M. and Perler, F.B. (1996) EMBO J. 15, 5146-5153. Cooper, A.A., Chen, Y., Lindorfer, M.A. and Stevens, T.H. (1993) EMBO J. 12, 25752583. Davis, E.O., Thangaraj, J.S., Brooks, P.C. and Colston, M.J. (1994) EMBO J. 13, 699-703. Davis, E.O., Jenner, P.J., Brooks, P.C., Colston, M.J. and Sedgwick, S.G. (1992) Cell 71, 201-210. Davis, E.O., Sedgwick, S.G. and Colston, M.J. (1991) J. Bacteriol. 173, 5653-5662. Hirata, R. and Anraku, Y. (1992) Biochem. Biophys. Res. Commun. 188, 40-47. Kawasaki, M., Makino, S., Matsuzawa, H., Satow, Y., Ohya, Y. and Anraku, Y. (1996) Biochem. Biophys. Res. Commun. 222, 827-832. Kawasaki, M., Nogami, S., Satow, Y., Ohya, Y. and Anraku, Y. (1997) J. Biol. Chem. 272, 15668-15674. Kawasaki, M., Satow, Y. 
and Anraku, Y. (1997) FEBS Lett. 412, 518-520. Nogami, S., Satow, Y., Ohya, Y. and Anraku, Y. (1997) Genetics 147, 73-85. Dalgaard, J.Z., Moser, M.J., Hughey, R. and Mian, I.S. (1997) J. Comput. Biol. 4, 193-214. Perler, F.B., Olsen, G.J. and Adam, E. (1997) Nucl. Acids Res. 25, 1087-1093. Pietrokovski, S. (1994) Protein Sci. 3, 2340-2350. Amitai, G. and Pietrokovski, S. (1999) Nature Biotechnol. 17, 854-855. Camarero, J.A. and Muir, T.W. (1999) Curr. Prot. Protein Sci. 18.4.1-18.4.21. de Grey, A.D. (2000) Trends Biotechnol. 18, 394-399. Doyle, D.F. and Corey, D.R. (1998) Chem. Biol. 5, R157-R160. Evans, T.C., Jr. and Xu, M.-Q. (1999) Biopolymers 51, 333-432. Gimble, F.S. (1998) Chem. Biol. 5, R251-R256. Holford, M. and Muir, T.W. (1998) Structure 6, 951-956. Kochendoerfer, G.G. and Kent, S.B. (1999) Curr. Opin. Chem. Biol. 3, 665-671.
196
I. GIRIAT ET AL.
67 68 69 70 71 72 73
Muir, T.W., Dawson, P.E. and Kent, S.B. (1997) Meth. Enzymol. 289, 266-298. Paulus, H. (2000) Annu. Rev. Biochem. 69, 447-496. Perler, F.B., Xu, M.-Q. and Paulus, H. (1997) Curr. Opin. Chem. Biol. 1, 292-299. Wilken, J. and Kent, S.B. (1998) Curr. Opin. Biotechnol. 9, 412-426. Xu, M.-Q., Paulus, H. and Chong, S. (2000) Meth. Enzymol. 326, 376-418. Yu, H. (1999) Proc. Nat. Acad. Sci. U.S.A. 96, 332-334. Dalgaard, J.Z., Klar, A.J., Moser, M.J., Holley, W.R., Chatterjee, A. and Mian, I.S. (1997) Nucl. Acids Res. 25, 4626-4638. Pietrokovski, S. (1998) Curr. Biol. 8, R634-R635. Fsihi, H., Vincent, V. and Cole, S.T. (1996) Proc. Nat. Acad. Sci. U.S.A. 93, 3410-3415. Sander, P., Alcaide, F., Richter, I., Frischkorn, K., Tortoli, E., Springer, B., Telenti, A. and Bottger, E.C. (1998) Microbiology 144, 589-591. Telenti, A., Southworth, M., Alcaide, F., Daugelat, S., Jacobs, W.R., Jr. and Perler, F.B. (1997) J. Bacteriol. 179, 6378-6382. Cambon-Bonavita, M.A., et al. (2000) Extremophiles 4, 215-225. Niehaus, F., Frey, B. and Antranikian, G. (1997) Gene 204, 153-158. Perler, F.B., Kumar, S. and Kong, H. (1996) in Enzymes and Proteins from Hyperthermophilic Microorganisms, (M. W. W. Adams, ed.), pp. 377-435 Academic Press, New York, NY. Takagi, M., Nishioka, M., Kakihara, H., Kitabayashi, M., Inoue, H., Kawakami, B., Oka, M. and Imanaka, T. (1997) Appl. Environ. Microbiol. 63, 4504-4510. Bult, C.J.,et al. (1996) Science 273, 1058-1073. Aspock, G., Kagoshima, H., Niklaus, G. and Burglin, T.R. (1999) Genome Res. 9, 909923. Koonin, E.V. (1995) Trends Biochem. Sci. 20, 141-142. Beachy, P.A., Cooper, M.K., Young, K.E., von Kessler, D.P., Park, W., Hall, T.M.T., Leahy, D.J. and Porter, J.A. (1997) Cold Spring Harbor Symp. Quant. Biol. 62, 191-204. Goddard, M.R. and Burt, A. (1999) Proc. Nat. Acad. Sci. U.S.A. 96, 13880-13885. Keeling, P.J. and Roger, A.J. (1995) Nature 375, 283. Pietrokovski, S. (1996) Trends Genet. 12, 287-288. Hickey, D.A. and Benkel, B. (1986) J. Theor. Biol. 
121, 283-291. Logsdon, J.J. and Palmer, J.D. (1994) Nature 369, 526; discussion 527-528. Curcio, M.J. and Belfort, M. (1996) Cell 84, 9-12. Bryk, M., Belisle, M., Mueller, J.E. and Belfort, M. (1995) J. Mol. Biol. 247, 197-210. Doolittle, W.F. and Stoltzfus, A. (1993) Nature 361, 403. Doolittle, R.F. (1993) Proc. Nat. Acad. Sci. U.S.A. 90, 5379-5381. Bell-Pedersen, D., Quirk, S.M., Aubrey, M. and Belfort, M. (1989) Gene 82, 119-126. Quirk, S.M., Bell-Pedersen, D. and Belfort, M. (1989) Cell 56, 455-465. Eddy, S.E. and Gold, L. (1992) Proc. Nat. Acad. Sci. U.S.A. 89, 1544-1547. Gorbalenya, A.E. (1998) Nucl. Acids Res. 26, 1741-1748. Liu, X.-Q. and Hu, Z. (1997) FEBS Lett. 408, 311-314. Southworth, M.W., Benner, J. and Perler, F.B. (2000) EMBO J. 19, 5019-5026. Chen, L., Benner, J. and Perler, F.B. (2000) J. Biol. Chem. 275, 20431-20435. Wang, S. and Liu, X.-Q. (1997) J. Biol. Chem. 272, 11869-11873. Wu, H., Hu, Z. and Liu, X.-Q. (1998) Proc. Nat. Acad. Sci. U.S.A. 95, 9226-9231.
74 75 76 77
78 79 80
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103
PROTEIN SPLICING AND ITS APPLICATIONS
197
104 Scott, C.P., Abel-Santos, E., Wall, M., Wahnon, D.C. and Benkovic, S.J. (1999) Proc. Nat. Acad. Sci. U.S.A. 96, 13638-13643. 105 Maeder, D.L., Weiss, R.B., Dunn, D.M., Cherry, J.L., Gonzalez, J.M., DiRuggiero, J. and Robb, F.T. (1999) Genetics 152, 1299-1305. 106 Heath, P.J., Stephens, K.M., Monnat, R.J., Jr. and Stoddard, B.L. (1997) Nature Struct. Biol. 4, 468-476. 107 Silva, G.H., Dalgaard, J.Z., Belfort, M. and Van Roey, P. (1999) J. Mol. Biol. 286, 11231136. 108 Gimble, F.S. and Stephens, B.W. (1995) J. Biol. Chem. 270, 5849-5856. 109 He, Z., Crist, M., Yen, H., Duan, X., Quiocho, F.A. and Gimble, F.S. (1998) J. Biol. Chem. 273, 4607-4615. 110 Grindl, W., Wende, W., Pingoud, V. and Pingoud, A. (1998) Nucl. Acids Res. 26, 18571862. 111 Pingoud, V., Grindl, W., Wende, W., Thole, H. and Pingoud, A. (1998) Biochemistry 37, 8233-8243. 112 Hu, D., Crist, M., Duan, X. and Gimble, F.S. (1999) Biochemistry 38, 12621-12628. 113 Hu, D., Crist, M., Duan, X., Quiocho, F.A. and Gimble, F.S. (2000) J. Biol. Chem. 275, 2705-2712. 114 Pingoud, V., Thole, H., Christ, F., Grindl, W., Wende, W. and Pingoud, A. (1999) J. Biol. Chem. 274, 10235-10243. 115 Xu, M.-Q., Southworth, M.W., Mersha, F.B., Hornstra, L.J. and Perler, F.B. (1993) Cell 75, 1371-1377. 116 Derbyshire, V., Wood, D.W., Wu, W., Dansereau, J.T., Dalgaard, J.Z. and Belfort, M. (1997) Proc. Nat. Acad. Sci. U.S.A. 94, 11466-11471. 117 Paulus, H. (1998) Pure Appl. Chem. 70, 1-8. 118 Mathys, S., Evans, T.C., Chute, I.C., Wu, H., Chong, S., Benner, J., Liu, X.-Q. and Xu, M.Q. (1999) Gene 231, 1-13. 119 Southworth, M.W., Amaya, K., Evans, T.C., Xu, M.-Q. and Perler, F.B. (1999) BioTechniques 27, 110-120. 120 Chong, S. et al. (1997) Gene 192, 271-281. 121 Chong, S., Montello, G.E., Zhang, A., Cantor, E.J., Liao, W., Xu, M.-Q. and Benner, J. (1998) Nucl. Acids Res. 26, 5109-5115. 122 Brennan, T.V. and Clarke, S. (1995) in Deamidation and Isoaspartate Formation in Peptides and Proteins, (D.W. Aswad, ed.), pp. 
66-90 CRC Press, Boca Raton, FL. 123 Southworth, M.W., Adam, E., Panne, D., Byer, R., Kautz, R. and Perler, F.B. (1998) EMBO J. 17, 918-926. 124 Mills, K.V., Lew, B.M., Jiang, S. and Paulus, H. (1998) Proc. Nat. Acad. Sci. U.S.A. 95, 3543-3548. 125 Evans, T.C., Jr., Martin, D., Kolly, R., Panne, D., Sun, L., Ghosh, I., Chen, L., Benner, J., Liu, X.-Q. and Xu, M.-Q. (2000) J. Biol Chem. 275, 9091-9094. 126 Lew, B.M., Mills, K.V. and Paulus, H. (1998) J. Biol. Chem. 273, 15887-15890. 127 Lew, B.M., Mills, K.V. and Paulus, H. (1999) Biopolymers 51, 355-362. 128 Wu, H., Xu, M.-Q. and Liu, X.-Q. (1998) Biochim. Biophys. Acta 1387, 422-432. 129 Wood, D.W., Wu, W., Belfort, G., Derbyshire, V. and Belfort, M. (1999) Nature Biotechnol. 17, 889-892.
198
130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148
I. GIRIAT ET AL.
Kawarabayasi, Y., et al. (1998) DNA Res. 5, 147-155. Evans, T.C., Jr., Benner, J. and Xu, M.-Q. (1999) J. Biol. Chem. 274, 3923-3926. Evans, T.C., Jr., Benner, J. and Xu, M.-Q. (1998) Protein Sci. 7, 2256-2264. Evans, T.C., Jr., Benner, J. and Xu, M.-Q. (1999) J. Biol. Chem. 274, 18359-18363. Hoang, T.T., Ma, Y., Stern, R.J., McNeil, M.R. and Schweizer, H.P. (1999) Gene 237, 361371. Adam, E. (1994) Masters thesis, Ecole Superieure de Biotechnologie de Strasbourg, Strasbourg, France. Shingledecker, K., Jiang, S. and Paulus, H. (1998) Gene 207, 187-195. Thorner, J., Emr, S.D. and Abelson, J.N., eds. (2000) Meth. Enzymol. Volume 328. Otomo, T., Ito, N., Kyogoku, Y. and Yamazaki, T. (1999) Biochemistry 38, 16040-16044. Otomo, T., Teruya, K., Uegaki, K., Yamazaki, T. and Kyogoku, Y. (1999) J. Biomol. NMR 14, 105-114. Yamazaki, T., Otomo, T., Oda, N., Kyogoku, Y., Uegaki, K., Ito, N., Ishino, Y. and Nakamura, H. (1998) J. Amer. Chem. Soc. 120, 5591-5592. Kawarabayasi, Y., et al. (1999) DNA Res. 6, 83-101, 145-152. Hruby, V.J. and Al-Obeidi, F. (1990) J. Biochem. 268, 249-262. Mendel, D., Cornish, V.W. and Schultz, P.G. (1995) Annu. Rev. Biophys. Biomol. Struct. 24, 435-462. Chen, Y., Ebright, Y.W. and Ebright, R.H. (1994) Science 265, 90-92. Wallace, C.J.A. (1995) Curr. Opin. Biotech. 6, 403-410. Jackson, D.Y., Burnier, J., Quan, C., Stanley, M., Tom, J. and Wells, J.A. (1994) Science 266, 243-247. Cotton, G.J. and Muir, T.W. (1999) Chem. Biol. 6, R247-R256. Noren, C.J., Anthony-Cahill, S.J., Griffith, M.C. and Schultz, P.G. (1989) Science 244, 182-188. Kent, S.B. (1988) Annu. Rev. Biochem. 57, 957-989.
149 150 Gaertner, H.F., Rose, K., Cotton, R., Timms, D., Camble, R. and Offord, R.E. (1992) Bioconjug. Chem. 3, 262-268. 151 Dyckes, D.F., Creighton, T. and Sheppard, R.C. (1974) Nature 247, 202-204. 152 Wallace, C.J.A. (1993) FASEB J. 7, 505-515. 153 Homandberg, G.A. and Laskowski, M., Jr. (1979) Biochemistry 18, 586-592. 154 Gaertner, H.F., Offord, R.E., Cotton, R., Timms, D., Camble, R. and Rose, K. (1994) J. Biol. Chem. 269, 7224-7230. 155 Tam, J.P., Lu, Y.-A., Liu, C.-F. and Shao, J. (1995) Proc. Nat. Acad. Sci. US.A. 92, 12485-12489. 156 Tam, J.P. (1999) Biopolymers 51, 309-310. 157 Dawson, P.E., Muir, T.W., Clark-Lewis, I. and Kent, S.B. (1994) Science 266, 776-779. 158 Dawson, P.E. and Kent, S.B. (2000) Annu. Rev. Biochem. 69, 923-960. 159 Hennard, C. and Tarn, J.P. (1997) Proc. 15th American Peptide Symposium, Peptides: Frontiers of Peptide Science, 247-248. 160 Camarero, J.A., Cotton, G.J., Adeva, A. and Muir, T.W. (1998) J. Pept. Res. 51, 303-316. 161 Hackeng, T.M., Griffin, J.H. and Dawson, P.E. (1999) Proc. Nat. Acad. Sci. U.S.A. 96, 10068-10073.
PROTEIN SPLICING AND ITS APPLICATIONS
199
162 Canne, L.E., Botti, P., Simon, R.J., Chen, Y., Dennis, E.A. and Kent, S.B.H. (1999) J. Amer. Chem. Soc. 121, 8720-8727. 163 Cotton, G.J. and Muir, T.W. (2000) Chem. Biol. 7, 253-261. 164 Brik, A., Keinan, E. and Dawson, P.E. (2000) J. Org. Chem. 65, 3829-3835. 165 Erlandson, D.A., Chytil, M. and Verdine, G.L. (1996) Chem. Biol. 3, 981-991. 166 Muir, T.W., Sondhi, D. and Cole, P.A. (1998) Proc. Nat. Acad. Sci. U.S.A. 95, 6705-6710. 167 Severinov, K. and Muir, T.W. (1998) J. Biol. Chem. 273, 16205-16209. 168 Xu, R., Ayers, B., Cowburn, D. and Muir, T.W. (1999) Proc. Nat. Acad. Sci. U.S.A. 96, 388-393. 169 Cotton, G.J., Ayers, B., Xu, R. and Muir, T.W. (1999) J. Amer. Chem. Soc. 121, 11001101. 170 Welker, E. and Scheraga, H.A. (1999) Biochem. Biophys. Res. Commun. 254, 147-151. 171 Chytil, M., Peterson, B.R., Erlanson, D.A. and Verdine, G.L. (1998) Proc. Nat. Acad Sci. U.S.A. 95, 14076-14081. 172 Kinsland, C., Taylor, S.V., Kelleher, N.L., McLafferty, F.W. and Begley, T.P. (1998) Protein Sci. 7, 1839-1842. 173 Camarero, J.A. and Muir, T.W. (1999) J. Amer. Chem. Soc. 121, 5597-5598. 174 Ayers, B., Blaschke, U.K., Camarero, J.A., Cotton, G.J., Holford, M. and Muir, T.W. (1999) Biopolymers 51, 343-354. 175 Iwai, H. and Pluckthun, A. (1999) FEBS Lett. 459, 166-172. 176 Tolbert, T.J. and Wong, C.-H. (2000) J. Amer. Chem. Soc. 122, 5421-5428. 177 Huse, M., Holford, M.N., Kuriyan, J. and Muir, T.W. (2000) J. Amer. Chem. Soc. 122, 8337-8338. 178 Camarero, J.A., Pavel, J. and Muir, T.W. (1998) Angew. Chem. Int. Ed. 37, 347-349. 179 Blaschke, U.K., Silberstein, J. and Muir, T.W. (2000) Meth. Enzymol. 328, 478-496. 180 Birge, R.B., Knudsen, B.S., Besser, D. and Hanafusa, H. (1996) Genes Cells 1, 595-613. 181 Feller, S.M., Knudsen, B. and Hanafusa, H. (1994) EMBO J. 13, 2341-2351. 182 Rosen, M.K., Yamazaki, T., Gish, G.D., Kay, C.M., Pawson, T. and Kay, L.E. (1995) Nature 374, 477-479. 183 Pervushin, K., Riek, R., Wider, G. and Wuthrich, K. (1997) Proc. Nat. Acad. Sci. U.S.A. 
94, 12366-12371. 184 Tjandra, N. and Bax, A. (1997) Science 278, 1111-1114. 185 Blaschke, U.K., Cotton, G.J. and Muir, T.W. (2000) Tetrahedron 56, 9461-9470. 186 Rizo, J. and Gierasch, L.M. (1992) Annu. Rev. Biochem. 61, 387-418. 187 Roy, R.S., Allen, O. and Walsh, C.T. (1999) Chem. Biol. 6, 789-799. 188 Camarero, J.A. and Muir, T.W. (1997) J. Chem. Soc. Chem. Commun. 1369-1370. 189 Ozawa, T., Nogami, S., Sato, M., Ohya, Y. and Umezawa, Y. (2000) Anal. Chem. 72, 5151-5157.
GLOBAL TRANSCRIPT EXPRESSION PROFILING BY SERIAL ANALYSIS OF GENE EXPRESSION (SAGE)
Hamish S. Scott¹ and Roman Chrast²

¹ Genetics and Bioinformatics Division, Walter and Eliza Hall Institute, Royal Parade, Parkville, P.O. Royal Melbourne Hospital, Victoria 3050, Australia; e-mail: [email protected]

² Molecular Neurobiology, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, California 92037, USA
Genetic Engineering, Volume 23, Edited by J. K. Setlow. Kluwer Academic / Plenum Publishers, 2001

INTRODUCTION AND OVERVIEW

Gene discovery and gene expression studies can enable the quantification and tracking of the expression of tens of thousands of genes in space and time. Understanding the spatial and temporal expression patterns of individual genes, and the way these patterns relate to expression patterns of other genes and to physiological events such as changes in behavioral state, onset of disease and response to drugs, is of great interest and can provide valuable insights into molecular physiology.

Traditional methods for comparing gene expression between two different RNA samples, such as subtractive hybridization for cDNA library construction, are of limited sensitivity (1). Newer PCR-based techniques, such as representational difference analysis of cDNA (2), differential display (3, 4) and Selective Amplification via Biotin and Restriction-mediated Enrichment (SABRE) (5), have overcome some of these limitations. There are many variations, including high-throughput protocols, for some of these techniques. However, while these techniques may identify some
differentially-expressed genes, they are neither quantitative nor truly "global".

To achieve a truly global comparison of gene expression patterns between mRNA populations, each mRNA molecule in the cell (at least to a certain level of abundance) must be sampled. This is necessary both to allow quantitative comparison of mRNA transcripts and to define a "transcriptome", that is, a description of the identity of each transcript expressed in the mRNA population and its level of expression.

The early 1990s saw the introduction of a series of powerful gene expression technologies that can approach this definition of "global", such as analyses of expressed sequence tags (ESTs) from cDNA libraries (6-12) and filter arrays (13, 14). The sequences of many human, mouse and rat ESTs are available in public databases (NCBI dbEST http://www.ncbi.nlm.nih.gov/dbEST/index.html and UniGene http://www.ncbi.nlm.nih.gov/UniGene/index.html), and the cDNA clones used to produce these sequences are available through a series of international distributors (IMAGE http://wwwbio.llnl.gov/bbrp/image/image.html) (15). Similar resources exist for many other model organisms. The availability of these cDNA clones and sequences has provided additional tools for "global" high-throughput differential gene expression technologies, leading to a profusion of filter array (13, 14, 16-18), cDNA microarray (19-25) and oligonucleotide microarray (26-28) techniques. Additionally, completed genome sequencing projects, such as those for yeast (29), C. elegans (30), Drosophila (31) and Arabidopsis thaliana (32), and other genome projects nearing completion, such as the human and mouse genomes (33-35), provide an increasing amount of data to help in both the design and analysis of high-throughput differential gene expression techniques.
Serial Analysis of Gene Expression (SAGE)

The scale and cost of EST- and array-based techniques, compared with traditional PCR-based approaches, are prohibitive for many laboratories. An alternative "global" gene expression technology is Serial Analysis of Gene Expression, or SAGE (36), which, curiously, was described in the same issue of Science as the cDNA microarrays from Stanford (19). The SAGE technique provides a digital readout of a large number of short sequences (TAGs), each sufficient to identify a transcript. The major advantages of SAGE are that it does not depend on prior knowledge of transcript sequence; that experimental data are stored electronically and can be reanalyzed as genome projects advance or are completed; and that, in theory, all transcripts (even the rare ones) can be sampled if enough TAGs are sequenced. Results from any new experiment are directly comparable to existing gene expression databases, and SAGE data represent absolute expression levels, based on the digital enumeration of transcript TAGs in a total population of transcripts. However, for in-depth analyses, SAGE can be labor-intensive and expensive.

All gene expression technologies have their advantages and disadvantages and thus should be viewed as complementary to each other, budget permitting. It is also important to remember that the readout of the presence and level of RNA transcripts in a biological system is only one measure of its state. There is increasing evidence that many other important levels of cellular control are widely used, including the half-lives of RNAs and proteins, the rates of transcription and translation, post-translational modifications, and the intracellular locations and molecular associations of the protein products or RNAs of the expressed genes, e.g., (37).
Principles of SAGE

SAGE is a technique that allows the quantitative analysis of a large number of transcripts in a given mRNA population by documenting the presence of each transcript. Differential and quantitative comparison of gene expression can be performed by comparing expression profiles between different samples (36, 38). Like EST projects, SAGE relies on sequencing to identify genes and may be considered a variation of EST analysis (39). However, unlike EST analysis, SAGE identifies only a short nucleotide sequence TAG of 14 to 16 bp from a defined position within the transcript, for example, the 3'-most NlaIII restriction enzyme site (Figure 1A). After ligation of a special linker to the cDNA molecules, a short TAG can be created using a type IIS restriction enzyme that cuts at a distance 3' of its recognition site in the linker (e.g., BsmFI, Figure 1B). These TAGs are concatemerized, cloned and sequenced. The manner of their generation means that there is a punctuation mark (e.g., the NlaIII site) between TAGs, allowing them to be discriminated (Figure 1C). The concatemerization allows the efficient analysis of transcripts in a serial manner by sequencing multiple TAGs within a single clone. The analysis of short TAGs enables approximately 40 times more transcripts to be identified than in an EST project with a comparable sequencing effort.

A 14-bp TAG can theoretically distinguish up to 4^14 (about 268 million) different transcripts, given a random nucleotide distribution at the TAG site. Current estimates suggest that the human genome contains only around 40,000 genes, although alternative splicing is increasingly recognized as playing an important role in the generation of alternative transcripts and proteins (33, 34). Thus a SAGE TAG isolated from a specific location within each transcript contains, in theory, enough information to find the gene from which it originates.
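The TAG definition above can be illustrated in silico. The following is a minimal sketch, not part of the published protocol: it assumes that a TAG consists of the NlaIII site (CATG) plus the 10 bases immediately downstream of the 3'-most site, and the function names and the example sequence are hypothetical.

```python
# Illustrative sketch only: function names and the example sequence are hypothetical.
ANCHOR_SITE = "CATG"   # NlaIII recognition site (anchoring enzyme)
TAG_LENGTH = 10        # assumed number of bases kept downstream of the site


def extract_sage_tag(cdna, anchor=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the 3'-most anchor site plus the bases immediately downstream.

    `cdna` is the sense strand written 5'->3' without its polyA tail.
    Returns None if there is no anchor site, or if fewer than `tag_length`
    bases follow the 3'-most site (a simplification made for this sketch).
    """
    seq = cdna.upper()
    pos = seq.rfind(anchor)            # position of the 3'-most anchor site
    if pos == -1:
        return None
    end = pos + len(anchor) + tag_length
    if end > len(seq):
        return None
    return seq[pos:end]


def tag_capacity(tag_length):
    """Number of distinct sequences of a given length, assuming random bases."""
    return 4 ** tag_length


example_transcript = "GGCATGAAACCCGGGTTTACATGTTGACCTAGGCATTAAGC"
print(extract_sage_tag(example_transcript))   # CATGTTGACCTAGG
print(tag_capacity(14))                       # 268435456
```

Using the 3'-most site means that the 5'-proximal CATG in the example is ignored, which mirrors the isolation of the 3' cDNA fragment on streptavidin beads.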
The level of expression of a particular transcript is reflected in the frequency with which its TAG is encountered within the sequenced population. Thus, to perform a comparison of gene expression, one can simply collect TAGs from two or more cell types or tissues.

A detailed schematic of SAGE is shown step by step in Figure 2. mRNA is converted to double-stranded cDNA with a biotinylated oligo dT primer. The cDNA is cleaved with the "anchoring enzyme" (AE), a restriction endonuclease with a 4-bp recognition site (usually NlaIII, recognition site CATG). The AE will cut every 256 bp on average. Thus a small proportion of transcripts, including short transcripts, may not be sampled. Theoretically, this can be overcome by using another anchoring enzyme, but this is not generally done (40). The 3'-most portions of the transcripts, that is, the regions closest to the polyA tail, which carry a 4-nt NlaIII overhang, are isolated with streptavidin magnetic beads. The cDNA is divided in two and a different linker is ligated to each aliquot. The linkers contain the recognition site for a type IIS restriction enzyme that cuts 10 to 14 bp 3' of its recognition site, thus releasing the TAG (the tagging enzyme, TE, e.g., BsmFI). Other restriction enzymes could result in larger TAGs (see the REBASE site, http://rebase.neb.com/rebase/rebase.charts.html). The released TAGs, with linker molecules attached, are made blunt-ended and the two cDNA aliquots with the different linkers are ligated together. Amino group modifications on the linkers control the "sense" of the ligated molecules. The ligated TAGs are then PCR-amplified with primers specific to each linker. The use of different linkers and their specific PCR primers at this step results in more efficient amplification, avoiding suppression PCR as a result of the formation of lariat structures (41). The PCR amplification product contains two TAGs in a known orientation (a DiTAG) still flanked by the AE recognition sites. The PCR
product is then purified [a modified purification protocol is described in (42)] and digested with the AE to release a DiTAG with 4-nt overhangs at each end; the isolated DiTAGs are then concatemerized and cloned. NlaIII digestion of the purified 102 bp PCR product is often problematic (step IX, Figure 2). This may be overcome by introducing a simple additional purification step for the 102 bp DiTAGs prior to NlaIII digestion (43). It is also essential to avoid the destruction of the NlaIII sites by exonuclease activity in the early steps of the protocol. Heating the concatemerized DiTAG sample before size fractionation (step X, Figure 2) considerably increases the proportion of clones containing inserts of the expected size, as small concatemers may aggregate and migrate with large ones during gel electrophoresis (44).

The sites of the original AE are preserved between DiTAGs and establish their boundaries. The sequence of the clones thus contains DiTAGs punctuated by the AE sites (e.g., AE-5'-TAGa-3'-3'-TAGb-5'-AE-5'-TAGc-3'-3'-TAGd-5'-AE-5'-TAGe-3'-3'-TAGf-5'-AE... etc.). The formation of DiTAGs provides an important internal control for the PCR amplification step. The DiTAGs should always contain different combinations of TAGs, i.e., DiTAG1 5'-TAGa-3'-3'-TAGb-5' and DiTAG2 5'-TAGc-3'-3'-TAGd-5'. If the same DiTAG is found repeatedly, this is an indication of preferential amplification. Duplicate DiTAGs do occur, either derived from very abundant transcripts or as a result of PCR amplification bias, and can be excluded from subsequent analyses without substantially altering the results.
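The punctuation of concatemers by AE sites and the exclusion of duplicate DiTAGs can be sketched as follows. This is an illustrative outline rather than any published SAGE software: it assumes CATG as the AE site, 10 bases per TAG, and an accepted DiTAG length of 20 to 26 bp between sites; all function names and the example sequences are hypothetical.

```python
from collections import Counter

ANCHOR_SITE = "CATG"            # NlaIII site punctuating the concatemer
TAG_LENGTH = 10                 # assumed bases per TAG, excluding the AE site
MIN_DITAG, MAX_DITAG = 20, 26   # assumed acceptable DiTAG lengths (bp)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_ditags(clone_sequence):
    """Split a concatemer read at AE sites and keep plausible DiTAGs."""
    fragments = clone_sequence.upper().split(ANCHOR_SITE)
    return [f for f in fragments if MIN_DITAG <= len(f) <= MAX_DITAG]


def count_tags(clone_sequences):
    """Count TAGs over all clones, using each distinct DiTAG only once."""
    seen_ditags = set()
    counts = Counter()
    for clone in clone_sequences:
        for ditag in split_ditags(clone):
            # A DiTAG may be cloned in either orientation, so compare a
            # canonical form to recognise duplicates.
            key = min(ditag, reverse_complement(ditag))
            if key in seen_ditags:
                continue
            seen_ditags.add(key)
            # Left TAG: first bases of the DiTAG. Right TAG: first bases of
            # the reverse complement. Any middle bases of longer DiTAGs are
            # ambiguous and are ignored here.
            counts[ditag[:TAG_LENGTH]] += 1
            counts[reverse_complement(ditag)[:TAG_LENGTH]] += 1
    return counts


# Hypothetical clone: DiTAG 1, DiTAG 2, then a repeat of DiTAG 1.
ditag1 = "AAACCCGGGT" + "GTTTCCCAAA"
ditag2 = "GGATCCAATT" + "GGCCTTAAGG"
clone = "CATG" + ditag1 + "CATG" + ditag2 + "CATG" + ditag1 + "CATG"
print(count_tags([clone]))
```

In this example the repeated DiTAG contributes only once, so each of the four TAGs is counted a single time.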
Some Examples Illustrating the Utility of SAGE

Many laboratories have now published SAGE studies of differential gene expression or gene cataloguing in a variety of organisms such as mice, humans, rice and yeast, either in vivo, with directly sampled tissues, or in vitro (38, 40, 45-75). Some of these studies include:

Definition of the yeast transcriptome, including a comparison of transcription at 3 different stages of the cell cycle. This study identified additional small transcripts over those already defined in the yeast genome (38, 76).

Examination of the response in yeast to growth on different carbon sources (47).

Dissection of molecular pathways, particularly with relevance to cancer (50, 51, 53, 54).

Description of expression profiles of cells in the immune system (45, 46, 49, 66).

Identification of markers from cancer cells (48, 55, 77, 78).

Additional information integral to some of these studies is also presented on the Web (see Table 1).

The number of TAGs sequenced varies enormously between studies. In mammalian systems, sequencing only 1700 TAGs identified high levels of expression of T-cell-directed CC chemokine (TARC) in Hodgkin's lymphoma, a finding that may be linked to the characteristic T-cell infiltrate in classical Hodgkin's disease (61). From the study of only 4000 TAGs, manganese superoxide dismutase was selected and then shown to be responsible for resistance to nitric oxide in nNOS neurons (52). In contrast, large numbers of TAGs were sequenced by Polyak et al. (53) in their study of p53 overexpression in a colorectal cancer line containing an inactive endogenous p53 gene. p53 is a known transcriptional activator without known partners. Comparison of 50,000 TAGs from cells transduced with a p53 adenovirus with 50,000 TAGs from a mock-transduced cell line identified only 14 transcripts (0.19% of the 7,202 transcripts detected) expressed at levels more than 10-fold greater in p53-expressing cells than in control cells.
Conversely, only 20 transcripts were expressed at levels more than 10-fold lower in the p53-expressing cells than in control cells. The abundance of the 14 induced genes ranged from 10 to 123 TAGs per 50,000. The promoter of one of these p53-induced genes (PIGs) was isolated and shown to contain a p53 binding site and to drive a reporter gene on co-transfection with p53.

SAGE studies recently uncovered higher order organizational patterns of chromosomal arrangement in the Human Transcriptome Map (e.g., RIDGEs: Regions of IncreaseD Gene Expression). Integration of the chromosomal position of human genes with ~2.45 million SAGE TAGs generated gene expression profiles for chromosomal regions in 12 normal and pathologic tissue types, revealing clustering of highly-expressed genes. This type of analysis provides a tool to search for genes overexpressed or silenced in cancer or other pathological conditions (75).
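A fold-change comparison of the kind described for the p53 study can be sketched in a few lines. This is a hypothetical illustration, not the analysis used in the cited work: TAG counts are normalized to library size, a TAG absent from the control library is given an assumed floor of 0.5 counts to avoid division by zero, and the TAG labels and counts are invented for the example. A fold-change filter on its own is not a statistical test.

```python
def induced_tags(test, control, min_fold=10.0, zero_floor=0.5):
    """Return {tag: fold change} for TAGs enriched in `test` over `control`.

    `test` and `control` map TAG -> observed count. Counts are normalized to
    each library's total; `zero_floor` (an assumption of this sketch) replaces
    a control count of zero so that the ratio stays finite.
    """
    test_total = sum(test.values())
    control_total = sum(control.values())
    hits = {}
    for tag, n_test in test.items():
        n_control = control.get(tag, 0)
        freq_test = n_test / test_total
        freq_control = max(n_control, zero_floor) / control_total
        fold = freq_test / freq_control
        if fold > min_fold:
            hits[tag] = fold
    return hits


# Invented counts for two libraries of 50,000 TAGs each.
p53_library = {"TAG_A": 123, "TAG_B": 10, "TAG_C": 40, "OTHER": 49827}
mock_library = {"TAG_A": 5, "TAG_C": 38, "OTHER": 49957}

for tag, fold in sorted(induced_tags(p53_library, mock_library).items()):
    print(tag, round(fold, 1))
```

Swapping the two arguments gives the TAGs that are more abundant in the control library, that is, the repressed transcripts.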
Sample Selection; Consideration and Preparations

The studies described above have established that SAGE gives an accurate reflection of the content of the RNA used in an experiment. This does not, however, address the question of whether or not the RNA samples used were appropriate to the biological question being studied. The results of many differential gene expression studies are confirmed by the use of a second independent technique, but used on the same RNA sample.
The biological and statistical (as opposed to only statistical) significance of observed differences will be strengthened by confirming the differential expression in independent biological samples. However, differences observed in a differential gene expression experiment may not be replicated in independent biological samples. Discrepancies between apparently identical samples may be due to variable gene expression in different physiological states (e.g., caused by sample collection techniques). Thus, great care should be taken in sample matching and collection to ensure that the genes identified are genuinely differentially expressed. In vitro systems for modelling biological processes can generally be used to produce large quantities of well-defined RNA. In contrast, collecting well-matched, biologically relevant samples in vivo may be extremely difficult. As repeating an entire SAGE experiment to confirm differences is not feasible, initial planning to ensure the relevance of the differentially expressed genes identified is essential.

The polyA+ RNA requirement of the standard SAGE protocol has meant that a large number of SAGE studies have been performed on gross anatomical organs or non-limiting in vitro systems. Clearly, to understand more precisely the biological role and physiological significance of specific cell types, it is desirable to generate transcriptomes from finely dissected organs and microanatomical structures. There have been a number of general enhancements of the SAGE protocol. These include more efficient NlaIII digestion of DiTAGs (43) and purification of DiTAGs prior to concatemerization (42, 44), all of which help to reduce the quantity of starting material required. The use of M-MLV RNase H⁻ reverse transcriptase (Superscript II, Gibco BRL) was shown to generate significantly more cDNA than M-MLV RT and was critical in allowing SAGE to be performed on small numbers of microdissected kidney tubule cells (60).
Several new reverse transcriptases, such as Sensiscript and Omniscript (Qiagen), may further increase the efficiency of the RT step, but this remains to be investigated. Several similar strategies, which increase the number of "solid phase" manipulations and thus reduce sample loss, also enable SAGE to be performed on a similar number of cells (72, 79). Another protocol, SAGElite, uses exponential PCR amplification of the starting full-length cDNA (80) and has been used successfully on only 9 oocytes (68). However, the potential for amplification bias during these long PCRs may distort the true transcript levels. This is of particular concern for low abundance transcripts, as initial priming of low concentration molecules may be subject to stochastic events, resulting in some transcripts being under-represented in the final cDNA population.
How Many TAGs do you Need to Sequence?

The rate at which new unique transcripts are identified by SAGE in mammalian cells approaches zero at the level of 600,000 TAGs (81). Sequencing is the most costly and time-consuming component of a SAGE experiment. As can be seen from the above examples, it is difficult to know how many SAGE TAGs must be sequenced to obtain biologically relevant information. Each experiment is different and should be planned keeping in mind the nature of the target molecules and the overall goals of the experiment. Information about the expression levels of some transcripts may be available from other studies and/or the EST databases. An easy way to halve the number of sequencing reactions required is to locate another scientist who has performed a similar experiment and can provide the normal (or control) database! A sizeable collection of SAGE data is available from SAGE studies of
GLOBAL TRANSCRIPT EXPRESSION PROFILING BY SAGE
211
normal and cancer tissues and cell lines as part of the Cancer Genome Anatomy Project [http://www.ncbi.nlm.nih.gov/SAGE/,] (81-83) (see Table 1). In addition, a valid way of finding TAGs or transcripts unique to a sample of interest is to compare the test sample to the ever-increasing number of public SAGE databases (Table 1). Estimates of the number of mRNA molecules per cell differ between cell types and are made by comparing the RNA/DNA ratio, the amount of mRNA recovered from a defined number of cells and an estimation of the average molecular mass of an mRNA molecule [~625 kDa for mouse or human mRNAs (84)]. Estimates of the number of mRNA molecules per mammalian cell vary between 150,000 and 700,000 (40, 55). Thus, if an average cell type contains 300,000 mRNA molecules, a comparison of 50,000 TAGs from each sample will detect differences in the overexpression of transcripts which are normally present at more than 5 molecules per cell. SAGE more readily detects genes overexpressed at this level than underexpressed. Additionally, TAGs should be detected quantitatively for roughly 50% of the total cellular transcripts, but for only approximately 14% of unique transcripts. The vast majority of transcripts will only be observed 1 or 2 times in the study and no significant statistical value is attributable. To increase sensitivity, additional sequencing reactions may be performed from the same SAGE library. Theoretically, SAGE is absolutely quantitative, meaning that if a TAG is observed 5 times from 50,000 TAGs, this should be a direct measure of the abundance of the corresponding transcript. In the largest published SAGE analysis of a single tissue transcriptome to date, of 1,004,509 transcripts in colon cancer cells, 90% of the 69,381 unique TAGs detected were present at 5 or fewer copies per cell. There is more than one TAG/transcript due to alternative polyadenylation and splicing. 
Of these low abundance transcripts, only 64% matched mRNA or EST entries in GenBank (23 Feb. 1999) and they were estimated to comprise 23% of the total cellular transcript mass. Another 30% of the total cellular transcript mass, comprising 9% of the unique transcripts (6,358/69,381), were expressed at between 5 and 50 copies per cell, and 95% of these matched GenBank entries (81). Similar findings for transcript abundance have been observed in the mouse, with a lower number of matches to GenBank entries for low abundance transcripts (74).

The sensitivity of array techniques is often claimed to be 1/300,000, but this figure, of course, is dependent on optimal experimental conditions, especially a large quantity of mRNA to label (18). For most array systems, the lower the abundance of a particular transcript, the lower the chance that it will be represented as a cDNA or EST clone, and subsequently be arrayed. Additionally, representation of a particular transcript on an array does not guarantee a detectable signal. Recent comparisons of SAGE with Affymetrix chips (85), filter arrays (86) and microarrays (unpublished) have shown that the techniques have similar sensitivity, although microarrays tended to underestimate the magnitude of expression differences as determined by SAGE or Northern blot.
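The sampling arithmetic behind these estimates can be sketched in a few lines. This is only an illustration, assuming that TAG sampling is approximately Poisson and that every transcript yields exactly one TAG; the function names are hypothetical, and the figures of 300,000 mRNA molecules per cell and 50,000 sequenced TAGs are taken from the example in the text.

```python
import math


def expected_tag_count(copies_per_cell, mrna_per_cell, tags_sequenced):
    """Expected number of times a transcript's TAG appears in the library."""
    return tags_sequenced * copies_per_cell / mrna_per_cell


def detection_probability(copies_per_cell, mrna_per_cell, tags_sequenced,
                          min_count=1):
    """Probability of observing at least `min_count` TAGs.

    Assumption: TAG counts follow a Poisson distribution whose mean is
    the expected TAG count.
    """
    lam = expected_tag_count(copies_per_cell, mrna_per_cell, tags_sequenced)
    p_below = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                  for k in range(min_count))
    return 1.0 - p_below


def copies_per_cell_from_count(tag_count, tags_sequenced, mrna_per_cell):
    """Convert an observed TAG count into an estimated copy number per cell."""
    return tag_count / tags_sequenced * mrna_per_cell


if __name__ == "__main__":
    # A transcript at 5 copies per cell, 300,000 mRNAs per cell, 50,000 TAGs.
    print(expected_tag_count(5, 300_000, 50_000))
    print(detection_probability(5, 300_000, 50_000))
    # A TAG seen 5 times among 50,000 TAGs.
    print(copies_per_cell_from_count(5, 50_000, 300_000))
```

Under these assumptions a transcript present at 5 copies per cell is expected to appear slightly less than once per 50,000 TAGs, which is why such transcripts sit at the edge of reliable detection at this sequencing depth.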
Analyses of SAGE Data: From TAG to mRNA
Several comprehensive sets of software tools for managing and analyzing SAGE data are available (38, 83, 87, 88). The original PC based program (38) has been considerably modified and improved and has an interface with both web-based tools and Microsoft Access (http://www.sagenet.org/). Other recent programs provide additional tools, often unix- and web-based (83, 87), and add extra features such as phred based quality scores to the sequence (88), gene/protein classification and grouping and chromosomal localization (89). The gene/protein classification and chromosomal localization features are often limited by our current state of knowledge.

Sequencing of SAGE libraries produces stretches of DiTAGs separated by AE sequences (CATG). In order to extract single TAGs one uses the information that each TAG is at least 10 bp long and always lies 5-prime or 3-prime to an AE sequence. The majority of laboratories using the SAGE technique use the SAGE analysis software (http://www.sagenet.org/), which extracts the TAG sequences from sequencing files. This program, among other functions, allows exclusion of duplicated DiTAGs and of TAGs derived from SAGE linkers, and statistically evaluates observed differences among analyzed SAGE projects. It also allows isolation of predicted SAGE TAGs from sequences in gene databases, which are then used to identify the sequenced TAGs.

The statistical approach used to compare SAGE libraries in the SAGE analysis software is based on Monte Carlo simulation (38). Different statistical approaches based on Bayesian statistical analysis (49, 83) or Z test statistics (47) have also been used successfully to identify differences among SAGE libraries, and all published techniques produce similar results. However, the results of statistical analysis should always be considered in parallel with their potential biological significance, since it is possible that small, statistically nonsignificant differences may play a role in phenotypic differences between samples.

Identification of the mRNAs from which TAGs originate, particularly TAGs with statistically significant differences in frequency among the analyzed libraries, is essential to evaluate their biological importance. The recent development of “reliable” TAG-to-gene mapping, which extracts SAGE TAGs from UniGene with an algorithm based on evaluation of the presence of polyA sites and tails, sequence type and sequence annotation, has improved the accuracy of TAG identification (83).
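The TAG extraction step described above can be illustrated with a short sketch. It is not the SAGE analysis software itself: the function names, the accepted DiTAG length window (20 to 26 bp) and the handling of flanking vector sequence are assumptions made only for this example. It splits a concatemer read at the AE site (CATG), discards duplicated DiTAGs, and takes 10 bp from each end of every remaining DiTAG.

```python
ANCHOR = "CATG"      # NlaIII anchoring enzyme (AE) site
TAG_LEN = 10         # bases of each TAG taken next to the AE site
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tags(concatemer, min_ditag=20, max_ditag=26):
    """Extract TAGs from one concatemer read.

    Assumptions (illustrative only): DiTAGs are the fragments lying
    between two CATG sites, a plausible DiTAG is min_ditag..max_ditag bp
    long, and a DiTAG seen twice (on either strand) is a PCR duplicate.
    """
    seq = concatemer.upper()
    # Drop the first and last fragments: they are flanking sequence,
    # not bounded by an AE site on both sides.
    fragments = seq.split(ANCHOR)[1:-1]
    seen = set()
    tags = []
    for ditag in fragments:
        if not min_ditag <= len(ditag) <= max_ditag:
            continue                      # too short or too long
        if set(ditag) - set("ACGT"):
            continue                      # ambiguous base calls
        if ditag in seen or reverse_complement(ditag) in seen:
            continue                      # duplicated DiTAG
        seen.add(ditag)
        tags.append(ditag[:TAG_LEN])                        # left TAG
        tags.append(reverse_complement(ditag[-TAG_LEN:]))   # right TAG
    return tags


if __name__ == "__main__":
    ditag1 = "AAAAACCCCC" + "T" + "GGGGGAAAAA"
    ditag2 = "GATTACAGAT" + "TA" + "CCGGAATTCC"
    read = "GGATCC" + ANCHOR + ditag1 + ANCHOR + ditag2 + ANCHOR + ditag1 + ANCHOR + "TTT"
    print(extract_tags(read))
```

The right-hand TAG is reverse-complemented so that both TAGs of a DiTAG are reported in the same orientation relative to their own CATG site, which is the form needed for matching against TAGs predicted from database sequences.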
Despite the fact that a 14-bp TAG can theoretically distinguish transcripts given a random nucleotide distribution at the TAG site, in practice unambiguous TAG-to-gene mapping is difficult in mammalian systems, with an error rate of around 20% (90). However, this figure does not allow for the inclusion of an additional nucleotide in the TAG, which reduces the ambiguity. The intuitive input of an investigator may resolve the identification of an ambiguous TAG. The inclusion of genomic sequences in TAG identification algorithms should aid in the correct identification of SAGE TAGs.

A specific category of differentially expressed TAGs are those which do not match any known gene or EST. These TAGs may derive from novel genes implicated in unknown regulatory mechanisms and are, therefore, interesting candidates for further study. Again, the inclusion of genomic sequences in TAG identification algorithms should aid in identification of the corresponding transcripts. Several other approaches have also been described. In order to isolate the cDNA from which an “unmatched” TAG derives, it is possible to use the 14 bp TAG sequence (including the 5'-flanking NlaIII sequence (CATG)) as a probe to screen cDNA libraries (36). Alternatively one can use the 14 bp TAG as a 5-prime oligonucleotide and the M13 forward primer as a 3-prime oligonucleotide to perform RT-PCR on cDNA prepared as described for SAGE libraries (Figure 2, step II). Note that the 3-prime oligo dT primer must contain an additional M13 forward sequence between the oligo (dT) and the biotinylated 5-prime residue (S3). The 5-prime primer can be extended if the linker ligation step of the SAGE protocol is performed (Figure 2, step III) and several nucleotides of the linker may be included in the 5-prime primer.
While it is also possible to perform these protocols without the selection of the 3-prime-most regions of the mRNA transcripts, this step reduces the complexity of the template with a corresponding reduction in false positives.
Two other protocols were recently reported: addition of 5 inosine nucleotides to the TAG specific 5' oligonucleotide to increase the annealing temperature of the primer (91), and the use of anchored oligo (dT) as the 3' oligonucleotide (92). The success of all these approaches may be improved by the use of peptide nucleic acids (PNAs) as oligonucleotide substitutes. Since a PNA strand is uncharged, PNA/DNA duplexes have a higher Tm than the corresponding DNA/DNA duplexes, giving PNAs greater specificity (93, 94). Increased specificity can also be achieved by increasing the length of a TAG, thus making isolation of the corresponding mRNA easier. This idea led to the development of a modified SAGE technique employing TAGs of 17 bp compared to the 14-16 bp TAGs used in the original SAGE protocol (95). However, this significantly increases the sequencing load, and therefore the cost of a SAGE project, for identification of the same number of TAGs.
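To make the library comparison discussed earlier in this section more concrete, the following is a minimal sketch of a two-proportion Z test applied to the counts of one TAG in two libraries. It is a generic textbook form using a pooled variance and a normal approximation, and is not necessarily the exact formulation used in the cited work (47); the function name and the example counts are hypothetical.

```python
import math


def z_test_tag_counts(count_a, total_a, count_b, total_b):
    """Two-sided Z test for a difference in TAG proportion.

    count_a, count_b: occurrences of the TAG in libraries A and B.
    total_a, total_b: total TAGs sequenced in each library.
    Returns (z, p_value). Assumes the normal approximation holds,
    which is doubtful for TAGs seen only once or twice.
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b))
    if std_err == 0.0:
        return 0.0, 1.0          # TAG absent from both libraries
    z = (p_a - p_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical example: 30 versus 10 occurrences in 50,000 TAGs each.
    z, p = z_test_tag_counts(30, 50_000, 10, 50_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

Because thousands of TAGs are tested in a single comparison, any p-value from such a test would normally need correction for multiple testing, and, as noted above, it should be weighed alongside the biological plausibility of the difference.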
Validation of Observed Differences among SAGE Libraries
Confirmation of SAGE-detected differences with the use of an independent technique remains an important step in order to avoid misleading conclusions based on incorrect data. It is important to note that some observed differences will not be confirmed. Factors including alternative splicing, polymorphisms, alternative polyadenylation, false cDNA synthesis priming on polyA stretches, incomplete digestion by NlaIII, incorrect identification of transcripts, sequencing and PCR errors, and differences between the samples used for preparation of SAGE libraries and those used for confirmation may all contribute to a failure to confirm observed differences. Approaches used to perform confirmations can be divided into two groups:
I) “proof of principle” approaches, which confirm a selected number of differences with techniques such as Northern blots, RNase protection and real time (quantitative) RT-PCR.
II) large scale confirmation approaches, including reverse dot blots and micro- and macroarray hybridizations [e.g., (86)]. The advantage of these approaches is that many (or all) differences detected with the SAGE technique may be reanalyzed in different samples.
Thus one can consider SAGE as a molecular sieve to isolate potentially interesting candidates to be explored further by these “parallel” techniques. However, all approaches in group II remain expensive and time consuming.
From SAGE Data to Functional Hypothesis
Comparison of two or more samples by the SAGE technique may lead to the discovery of hundreds of differentially-expressed genes. Thus SAGE libraries should be compared to additional SAGE data to identify TAGs or transcripts unique to the sample of interest. Functional grouping of TAGs according to their biological function is usually done manually by gene database searching and annotation. Alternatively, differentially-expressed genes may be scanned against functional databases such as the Expressed Gene Anatomy Database (EGAD) (96), to allow grouping of transcripts based on their cellular role, subcellular localization and chromosome mapping (89).
CONCLUSIONS
SAGE has already provided vast amounts of data which can be exploited by the scientific community, including known and unknown transcripts that are either ubiquitously expressed or specific to certain cell types. Used in conjunction with the published human transcriptome, human SAGE analyses will make possible the rapid identification of transcripts unique to specific biological states.

Perhaps the most troubling aspect of SAGE data in mammalian systems is the quantification of the number of transcripts that are expressed at <5 copies per cell. This problem had already been identified by Rot studies in the 1960s and 1970s. These rare transcripts comprise 25% of cellular mRNA by mass but 94% of unique transcripts by identity. Not surprisingly, only 50% of these poorly expressed transcripts match transcript sequences (mRNAs and ESTs) in GenBank/EMBL. The magnitude of the problem is even greater when considering the 83% of unique transcripts present at one copy per cell or less. Many of these transcripts will be at least partially identified by genomic sequencing. However, with the push towards high throughput expression technologies, the important question is whether array technologies and SAGE analyses will provide the sensitivity both to detect and to quantitate accurately this large number of low abundance transcripts. This level of sensitivity is not consistent with the kinetics of hybridization with the use of microarrays, and the quantity of sequencing required by SAGE would only be possible in a large genome center.

All gene expression technologies have their advantages and disadvantages and thus should be viewed as complementary to each other. Additionally, most of these techniques are still evolving. Many researchers are already combining different techniques, including SAGE and microarrays.
The use of a standard reference RNA in different microarray experiments provides a common denominator for accurate and reproducible comparison of gene expression data. Similarly, for comparison of multiple experimental RNA samples (hybridization experiments), a common reference RNA provides an essential internal control, allowing comparisons to be made among large numbers of samples (experiments) which can dramatically increase the power of a microarray experiment. Microarray elements representing genes representative of different expression levels in a SAGE transcriptome can be used as controls, allowing conversion of microarray data to absolute expression levels as well as thorough monitoring of sensitivity. With the advent of multiple color microarray scanners and additional fluorophores, it may become possible to include three (or more) RNA samples in a microarray experiment, the reference RNA and the two (or more) samples for which an immediate comparison is desired.
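The role of a common reference RNA can be shown with a small worked example. The sketch below assumes that each sample has been hybridized separately against the same reference and that a sample/reference intensity ratio is available for a given array element; the function name is hypothetical. Because the reference term cancels, two samples that were never co-hybridized can still be compared.

```python
import math


def log2_ratio_via_reference(sample_a_vs_ref, sample_b_vs_ref):
    """Indirect log2 ratio of sample A to sample B for one array element.

    sample_a_vs_ref, sample_b_vs_ref: intensity ratios (sample/reference)
    measured in two separate hybridizations against the same reference
    RNA. Since log2(A/ref) - log2(B/ref) = log2(A/B), the reference
    cancels out of the comparison.
    """
    return math.log2(sample_a_vs_ref) - math.log2(sample_b_vs_ref)


if __name__ == "__main__":
    # Element is 4-fold above reference in A and equal to reference in B.
    print(log2_ratio_via_reference(4.0, 1.0))
```

This cancellation only holds to the extent that the reference RNA gives a measurable, reproducible signal for the element in question, which is one reason a well-characterized reference matters.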
ACKNOWLEDGMENTS
Thanks to all with whom I have debated differential gene expression techniques and SAGE, particularly Joelle Michaud, Stylianos Antonarakis and Victor Velculescu. I also thank Pauline Crewther for expert editorial assistance with the manuscript.
REFERENCES
1. Sagerstrom, C. G., Sun, B. I. and Sive, H. L. (1997) Annu. Rev. Biochem. 66, 751-783.
2. Hubank, M. and Schatz, D. G. (1994) Nucl. Acids Res. 22, 5640-5648.
3. Liang, P. and Pardee, A. B. (1992) Science 257, 967-971.
4. Liang, P. and Pardee, A. B. (1998) Mol. Biotechnol. 10, 261-267.
5. Lavery, D. J., Lopez-Molina, L., Fleury-Olela, F. and Schibler, U. (1997) Proc. Nat. Acad. Sci. U.S.A. 94, 6831-6836.
6. Adams, M. D., Dubnick, M., Kerlavage, A. R., Moreno, R., Kelley, J. M., Utterback, T. R., Nagle, J. W., Fields, C. and Venter, J. C. (1992) Nature 355, 632-634.
7. Okubo, K., Hori, N., Matoba, R., Niiyama, T., Fukushima, A., Kojima, Y. and Matsubara, K. (1992) Nat. Genet. 2, 173-179.
8. Adams, M. D., Soares, M. B., Kerlavage, A. R., Fields, C. and Venter, J. C. (1993) Nat. Genet. 4, 373-380.
9. Adams, M. D., Kerlavage, A. R., Fields, C. and Venter, J. C. (1993) Nat. Genet. 4, 256-267.
10. Adams, M. D., Kerlavage, A. R., Fleischmann, R. D., Fuldner, R. A., Bult, C. J., Lee, N. H., Kirkness, E. F., Weinstock, K. G., Gocayne, J. D. and White, O. (1995) Nature 377 (Supplement), 3-174.
11. Aaronson, J. S., Eckman, B., Blevins, R. A., Borkowski, J. A., Myerson, J., Imran, S. and Elliston, K. O. (1996) Genome Res. 6, 829-845.
12. Hillier, L. D., Lennon, G., Becker, M., Bonaldo, M. F., Chiapelli, B., Chissoe, S., Dietrich, N., DuBuque, T., Favello, A., Gish, W., Hawkins, M., Hultman, M., Kucaba, T., Lacy, M., Le, M., Le, N., Mardis, E., Moore, B., Morris, M., Parsons, J., Prange, C., Rifkin, L., Rohlfing, T., Schellenberg, K. and Marra, M. (1996) Genome Res. 6, 807-828.
13. Gress, T. M., Hoheisel, J. D., Lennon, G. G., Zehetner, G. and Lehrach, H. (1992) Mamm. Genome 3, 609-619.
14. Lennon, G. G. and Lehrach, H. (1991) Trends Genet. 7, 314-317.
15. Lennon, G., Auffray, C., Polymeropoulos, M. and Soares, M. B. (1996) Genomics 33, 151-152.
16. Nguyen, C., Rocha, D., Granjeaud, S., Baldit, M., Bernard, K., Naquet, P. and Jordan, B. R. (1995) Genomics 29, 207-216.
17. Pietu, G., Alibert, O., Guichard, V., Lamy, B., Bois, F., Leroy, E., Mariage-Sampson, R., Houlgatte, R., Soularue, P. and Auffray, C. (1996) Genome Res. 6, 492-503.
18. Bertucci, F., Bernard, K., Loriod, B., Chang, Y. C., Granjeaud, S., Birnbaum, D., Nguyen, C., Peck, K. and Jordan, B. R. (1999) Hum. Mol. Genet. 8, 1715-1722.
19. Schena, M., Shalon, D., Davis, R. W. and Brown, P. O. (1995) Science 270, 467-470.
20. Schena, M., Shalon, D., Heller, R., Chai, A., Brown, P. O. and Davis, R. W. (1996) Proc. Nat. Acad. Sci. U.S.A. 93, 10614-10619.
21. Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D. and Futcher, B. (1998) Mol. Biol. Cell 9, 3273-3297.
22. Eisen, M. B., Spellman, P. T., Brown, P. O. and Botstein, D. (1998) Proc. Nat. Acad. Sci. U.S.A. 95, 14863-14868.
23. Chu, S., DeRisi, J., Eisen, M., Mulholland, J., Botstein, D., Brown, P. O. and Herskowitz, I. (1998) Science 282, 699-705.
24. Brown, P. O. and Botstein, D. (1999) Nat. Genet. 21, 33-37.
25. Iyer, V. R., Eisen, M. B., Ross, D. T., Schuler, G., Moore, T., Lee, J. C. F., Trent, J. M., Staudt, L. M., Hudson, J., Jr, Boguski, M. S., Lashkari, D., Shalon, D., Botstein, D. and Brown, P. O. (1999) Science 283, 83-87.
26. Chee, M., Yang, R., Hubbell, E., Berno, A., Huang, X. C., Stern, D., Winkler, J., Lockhart, D. J., Morris, M. S. and Fodor, S. P. (1996) Science 274, 610-614.
27. Lockhart, D. J., Dong, H., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee, M. S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H. and Brown, E. L. (1996) Nat. Biotechnol. 14, 1675-1680.
28. Wodicka, L., Dong, H., Mittmann, M., Ho, M. H. and Lockhart, D. J. (1997) Nat. Biotechnol. 15, 1359-1367.
29. Goffeau, A., Barrell, B. G., Bussey, H., Davis, R. W., Dujon, B., Feldmann, H., Galibert, F., Hoheisel, J. D., Jacq, C., Johnston, M., Louis, E. J., Mewes, H. W., Murakami, Y., Philippsen, P., Tettelin, H. and Oliver, S. G. (1996) Science 274, 546, 563-567.
30. The C. elegans Sequencing Consortium (1998) Science 282, 2012-2018.
31. Adams, M. D. et al. (2000) Science 287, 2185-2195.
32. The Arabidopsis Genome Initiative (2000) Nature 408, 796-815.
33. Venter, J. C. et al. (2001) Science 291, 1304-1351.
34. International Human Genome Sequencing Consortium (2001) Nature 409, 860-921.
35. Marshall, E. (2000) Science 290, 242-243.
36. Velculescu, V. E., Zhang, L., Vogelstein, B. and Kinzler, K. W. (1995) Science 270, 484-487.
37. Gygi, S. P., Rochon, Y., Franza, B. R. and Aebersold, R. (1999) Mol. Cell. Biol. 19, 1720-1730.
38. Velculescu, V. E., Zhang, L., Zhou, W., Vogelstein, J., Basrai, M. A., Bassett, D. E., Jr, Hieter, P., Vogelstein, B. and Kinzler, K. W. (1997) Cell 88, 243-251.
39. Adams, M. D. (1996) Bioessays 18, 261-262.
40. Welle, S., Bhatt, K. and Thornton, C. A. (1999) Genome Res. 9, 506-513.
41. Siebert, P. D., Chenchik, A., Kellogg, D. E., Lukyanov, K. A. and Lukyanov, S. A. (1995) Nucl. Acids Res. 23, 1087-1088.
42. Powell, J. (1998) Nucl. Acids Res. 26, 3445-3446.
43. Angelastro, J. M., Klimaschewski, L. P. and Vitolo, O. V. (2000) Nucl. Acids Res. 28, E62.
44. Kenzelmann, M. and Muhlemann, K. (1999) Nucl. Acids Res. 27, 917-918.
45. Hashimoto, S., Suzuki, T., Dong, H. Y., Nagai, S., Yamazaki, N. and Matsushima, K. (1999) Blood 94, 845-852.
46. Hashimoto, S., Suzuki, T., Dong, H. Y., Yamazaki, N. and Matsushima, K. (1999) Blood 94, 837-844.
47. Kal, A. J., van Zonneveld, A. J., Benes, V., van den Berg, M., Koerkamp, M. G., Albermann, K., Strack, N., Ruijter, J. M., Richter, A., Dujon, B., Ansorge, W. and Tabak, H. F. (1999) Mol. Biol. Cell 10, 1859-1872.
48. Hibi, K., Liu, Q., Beaudry, G. A., Madden, S. L., Westra, W. H., Wehage, S. L., Yang, S. C., Heitmiller, R. F., Bertelsen, A. H., Sidransky, D. and Jen, J. (1998) Cancer Res. 58, 5690-5694.
49. Chen, H., Centola, M., Altschul, S. F. and Metzger, H. (1998) 188, 1657-1668.
50. He, T. C., Sparks, A. B., Rago, C., Hermeking, H., Zawel, L., da Costa, L. T., Morin, P. J., Vogelstein, B. and Kinzler, K. W. (1998) Science 281, 1509-1512.
51. Hermeking, H., Lengauer, C., Polyak, K., He, T. C., Zhang, L., Thiagalingam, S., Kinzler, K. W. and Vogelstein, B. (1997) Mol. Cell 1, 3-11.
52. Gonzalez-Zulueta, M., Ensz, L. M., Mukhina, G., Lebovitz, R. M., Zwacka, R. M., Engelhardt, J. F., Oberley, L. W., Dawson, V. L. and Dawson, T. M. (1998) J. Neurosci. 18, 2040-2055.
53. Polyak, K., Xia, Y., Zweier, J. L., Kinzler, K. W. and Vogelstein, B. (1997) Nature 389, 300-305.
54. Madden, S. L., Galella, E. A., Zhu, J., Bertelsen, A. H. and Beaudry, G. A. (1997) Oncogene 15, 1079-1085.
55. Zhang, L., Zhou, W., Velculescu, V. E., Kern, S. E., Hruban, R. H., Hamilton, S. R., Vogelstein, B. and Kinzler, K. W. (1997) Science 276, 1268-1272.
56. Kammula, U. S., Lee, K. H., Riker, A. I., Wang, E., Ohnmacht, G. A., Rosenberg, S. A. and Marincola, F. M. (1999) J. Immunol. 163, 6867-6875.
57. He, T. C., Chan, T. A., Vogelstein, B. and Kinzler, K. W. (1999) Cell 99, 335-345.
58. Inoue, H., Sawada, M., Ryo, A., Tanahashi, H., Wakatsuki, T., Hada, A., Kondoh, N., Nakagaki, K., Takahashi, K., Suzumura, A., Yamamoto, M. and Tabira, T. (1999) Glia 28, 265-271.
59. Yu, J., Zhang, L., Hwang, P. M., Rago, C., Kinzler, K. W. and Vogelstein, B. (1999) Proc. Nat. Acad. Sci. U.S.A. 96, 14517-14522.
60. Virlon, B., Cheval, L., Buhler, J. M., Billon, E., Doucet, A. and Elalouf, J. M. (1999) Proc. Nat. Acad. Sci. U.S.A. 96, 15286-15291.
61. van den Berg, A., Visser, L. and Poppema, S. (1999) Amer. J. Pathol. 154, 1685-1691.
62. Ryo, A., Suzuki, Y., Ichiyama, K., Wakatsuki, T., Kondoh, N., Hada, A., Yamamoto, M. and Yamamoto, N. (1999) FEBS Lett. 462, 182-186.
63. Matsumura, H., Nirasawa, S. and Terauchi, R. (1999) Plant J. 20, 719-726.
64. Michiels, E. M., Oussoren, E., Van Groenigen, M., Pauws, E., Bossuyt, P. M., Voute, P. A. and Baas, F. (1999) Physiol. Genomics 1, 83-91.
65. Inadera, H., Hashimoto, S., Dong, H. Y., Suzuki, T., Nagai, S., Yamashita, T., Toyoda, N. and Matsushima, K. (2000) Biochem. Biophys. Res. Commun. 275, 108-114.
66. Hashimoto, S. I., Suzuki, T., Nagai, S., Yamashita, T., Toyoda, N. and Matsushima, K. (2000) Blood 96, 2206-2214.
67. Suzuki, T., Hashimoto, S., Toyoda, N., Nagai, S., Yamazaki, N., Dong, H. Y., Sakai, J., Yamashita, T., Nukiwa, T. and Matsushima, K. (2000) Blood 96, 2584-2591.
68. Neilson, L., Andalibi, A., Kang, D., Coutifaris, C., Strauss, J. F., Stanton, J. A. and Green, D. P. (2000) Genomics 63, 13-24.
69. Pauws, E., Moreno, J. C., Tijssen, M., Baas, F., de Vijlder, J. J. and Ris-Stalpers, C. (2000) J. Clin. Endocrinol. Metab. 85, 1923-1927.
70. Yamashita, T., Hashimoto, S., Kaneko, S., Nagai, S., Toyoda, N., Suzuki, T., Kobayashi, K. and Matsushima, K. (2000) Biochem. Biophys. Res. Commun. 269, 110-116.
71. Parle-McDermott, A., McWilliam, P., Tighe, O., Dunican, D. and Croke, D. T. (2000) Brit. J. Cancer 83, 725-728.
72. St Croix, B., Rago, C., Velculescu, V., Traverso, G., Romans, K. E., Montgomery, E., Lal, A., Riggins, G. J., Lengauer, C., Vogelstein, B. and Kinzler, K. W. (2000) Science 289, 1197-1202.
73. Seth, A., Lee, B. K., Qi, S. and Vary, C. P. (2000) J. Bone Miner. Res. 15, 1683-1696.
74. Chrast, R., Scott, H. S., Papasavvas, M. P., Rossier, C., Antonarakis, E. S., Barras, C., Davisson, M. T., Schmidt, C., Estivill, X., Dierssen, M., Pritchard, M. and Antonarakis, S. E. (2000) Genome Res. 10, 2006-2021.
75. Caron, H., Schaik, B. v., Mee, M. v. d., Baas, F., Riggins, G., Sluis, P. v., Hermus, M.-C., Asperen, R. v., Boon, K., Voute, P. A., Heisterkamp, S., Kampen, A. v. and Versteeg, R. (2001) Science 291, 1289-1351.
76. Basrai, M. A., Velculescu, V. E., Kinzler, K. W. and Hieter, P. (1999) Mol. Cell. Biol. 19, 7041-7049.
77. Hibi, K., Westra, W. H., Borges, M., Goodman, S., Sidransky, D. and Jen, J. (1999) Amer. J. Pathol. 155, 711-715.
78. Zhou, W., Sokoll, L. J., Bruzek, D. J., Zhang, L., Velculescu, V. E., Goldin, S. B., Hruban, R. H., Kern, S. E., Hamilton, S. R., Chan, D. W., Vogelstein, B. and Kinzler, K. W. (1998) Cancer Epidemiol. Biomarkers Prev. 7, 109-112.
79. Datson, N. A., van der Perk-de Jong, J., van den Berg, M. P., de Kloet, E. R. and Vreugdenhil, E. (1999) Nucl. Acids Res. 27, 1300-1307.
80. Peters, D. G., Kassam, A. B., Yonas, H., O'Hare, E. H., Ferrell, R. E. and Brufsky, A. M. (1999) Nucl. Acids Res. 27, e39.
81. Velculescu, V. E., Madden, S. L., Zhang, L., Lash, A. E., Yu, J., Rago, C., Lal, A., Wang, C. J., Beaudry, G. A., Ciriello, K. M., Cook, B. P., Dufault, M. R., Ferguson, A. T., Gao, Y., He, T. C., Hermeking, H., Hiraldo, S. K., Hwang, P. M., Lopez, M. A., Luderer, H. F., Mathews, B., Petroziello, J. M., Polyak, K., Zawel, L. and Kinzler, K. W. (1999) Nat. Genet. 23, 387-388.
82. Lal, A., Lash, A. E., Altschul, S. F., Velculescu, V., Zhang, L., McLendon, R. E., Marra, M. A., Prange, C., Morin, P. J., Polyak, K., Papadopoulos, N., Vogelstein, B., Kinzler, K. W., Strausberg, R. L. and Riggins, G. J. (1999) Cancer Res. 59, 5403-5407.
83. Lash, A. E., Tolstoshev, C. M., Wagner, L., Schuler, G. D., Strausberg, R. L., Riggins, G. J. and Altschul, S. F. (2000) Genome Res. 10, 1051-1060.
84. Hastie, N. D. and Bishop, J. O. (1976) Cell 9, 761-774.
85. Ishii, M., Hashimoto, S., Tsutsumi, S., Wada, Y., Matsushima, K., Kodama, T. and Aburatani, H. (2000) Genomics 68, 136-143.
86. Nacht, M., Ferguson, A. T., Zhang, W., Petroziello, J. M., Cook, B. P., Gao, Y. H., Maguire, S., Riley, D., Coppola, G., Landes, G. M., Madden, S. L. and Sukumar, S. (1999) Cancer Res. 59, 5464-5470.
87. van Kampen, A. H., van Schaik, B. D., Pauws, E., Michiels, E. M., Ruijter, J. M., Caron, H. N., Versteeg, R., Heisterkamp, S. H., Leunissen, J. A., Baas, F. and van Der Mee, M. (2000) Bioinformatics 16, 899-905.
88. Margulies, E. H. and Innis, J. W. (2000) Bioinformatics 16, 650-651.
89. Larsson, M., Stahl, S., Uhlen, M. and Wennborg, A. (2000) Genomics 63, 341-353.
90. Stollberg, J., Urschitz, J., Urban, Z. and Boyd, C. D. (2000) Genome Res. 10, 1241-1248.
91. van den Berg, A., van der Leij, J. and Poppema, S. (1999) Nucl. Acids Res. 27, e17.
92. Chen, J. J., Rowley, J. D. and Wang, S. M. (2000) Proc. Nat. Acad. Sci. U.S.A. 97, 349-353.
93. Ratilainen, T., Holmen, A., Tuite, E., Haaima, G., Christensen, L., Nielsen, P. E. and Norden, B. (1998) Biochemistry 37, 12331-12342.
94. Ray, A. and Norden, B. (2000) Faseb J. 14, 1041-1060.
95. Ryo, A., Kondoh, N., Wakatsuki, T., Hada, A., Yamamoto, N. and Yamamoto, M. (1998) Nucl. Acids Res. 26, 2586-2592.
96. White, O. and Kerlavage, A. R. (1996) Meth. Enzymol. 266, 27-40.
97. Gunnersen, J. M., Spirkoska, V., Smith, P. E., Danks, R. A. and Tan, S. S. (2000) Glia 32, 146-154.
INDEX aas see amino acyl residues ABC-type uptake permeases, 8 A1/BFL-1, 13, 17, 24 ACV see acyclovir Acyclovir, 114, 116, 120-124 Adenine nucleotide translocator, 14, 15 AE see anchoring enzyme Affymetrix chips, 211 Agrobacterium tumefaciens and 79 Alcaligenes eutrophus ChrA, 5 Alcohol oxidase, 158-162, 164, 165 alphavirus, 83, 84 Amino acid/polyamine organocation, 3, 4, 8 Amino acyl residues, 5 Aminolysis-hydrolysis ratio, 51-53 63 Anchoring exzyme, 203, 204, 207, 211 Angiostatin from Pichia pastoris, 166 ANT see adenine nucleotide translocator AOX see alcohol oxidase Apaf-1 see apoptosis-associated factor-1 APC see amino acid/polyamine/ organocation Apoptosis, 209 Apoptosis-associated factor-1, 14, 15, 19-21 Apoptosis repression, 11-33 Apoptosome, 20 Aquaporins, 3 Arabidopsis thaliana homologue, 130 MCM, 139, 140 PROLIFERA, 139 ARC, 19 Archaea InBase, 174 inteins, 174, 177, 180 MCM homologues, 132, 139, 140, 148, 149 transporters, 7, 8 ArsAB see ATP hydrolysis-driven active arsenite efflux pumps Arthralgia from alphavirus, 83 ASF-IAP, 18 Aspartame, 51, 55
Aspergillus nidulans MCM, nimQ, 134 ATPase superfamily P-type, 7 ATP-dependent protease, 175 ATP hydrolysis-driven active arsenite efflux pumps, 6, 7 AttB, 74-76, 78 AttP, 75 Baby hamster kidney cells, 64, 99 Bacillus circulans chitin binding domain, 183 Bacillus subtilis permease homologue, 5 Backbone cyclization, 191-193 Bacterial/Archaeal Transporter, 6, 8 Bacteria transporters, 7, 8 Baculoviral IAP repeat, 19 BAD, 13, 14, 17 BAK, 12-16 BAT see Bacterial/Archaeal Transporter BAX, 12-17, 20 Bayesian statistical analysis, 212 BCL-2 homology, 12, 13, 17 Bcl-2 knockout mice, 12 BCL-2 protein family, 12-14, 16, 17, 20 and p53, 16 Bcl-3, 40 BCL-W, 13 BCL-XL, 12-14, 16, 17, 24 BCL-Xs, 13 BH see BCL-2 homology BH domain, 18 BHK see baby hamster kidney BH3-only, 12, 13, 15 BHRF1, 13 BID, 12, 13, 15, 16 BIK, 12, 13, 15 BIK/NBK, 13 BIM, 13, 14 BIM/BOD, 13 BIR see baculoviral IAP repeat bla, 72, 73 Blasticidin resistance of Pichia pastoris, 166 BLK, 13
bob1 of S. cerevisiae, 138, 146 BOK/MTD, 13 BOO/DIVA, 13, 14 BRUCE, 18 Bystander effect, 115-117, 123 C230 see catechol-2, 3-dioxygenase Caenorhabditis elegans Cdtlp homologue, 130 Ced-9 gene, 12, 14 Calcineurin, 17 Calcitonin, 63, 65 Calnexin, 48 Camp-dependent protein kinase, 17 Cancer cell gene expression, 209 Candidiasis, 116 Capsid protein, 83-97, 100-106 CARD see caspase recruitment domain Copase see cysteine aspartate-specific protease Caspase recruitment domain, 18, 19 Castanospermine, 61 Catechol-2, 3-dioxygenase, 76 CBP/p300, 36 ccdB gene for cytotoxin, 79 CD see cytosine deaminase cdE2, 89, 90, 99, 100, 102, 104, 106 cDNA difference analysis, 201 library construction, 201 microarray, 202 synthesis, 205 CDK see cyclin-dependent kinase Cdtlp homologues, 130 CED-3 of C. elegans, 22 CED-4 and CED-9 of C. elegans, 14 Ceramide, 104 c-FLIPS, 22 CHD see chitin binding domain Chinese hamster ovary, 63 Chitin binding domain, 183 Chitobiose, 46, 60
Chlamydomonas eugametos ClpP protease intein, 180 CHO see Chinese hamster ovary Cholesterol and alphavirus fusion, 104 CHR see chromate resistance Chromate resistance family, 3-5, 8 Chryseobacterium meningosepticum EndoF,60 c-IAP1, 17-21, 24 c-IAP2, 17-21, 24 Ci-IAP, 18 Cisternae, 89 CLP see core-like particle CNTF, 37 codA see cytosine deaminase gene Collagen in Pichia pastoris, 162 Core-like particle, 88, 91-94 CP see capsid protein CREB, 17 c-Rel, 40 CrmA see cytokine response modifier A Cryo-electron microscopy, 83, 86, 87, 89 Cryo-EM see cryo-electron microscopy Cs-BIR-1, 18 Cs-BIR-2, 18 CSF-1, 37 Cyclin-dependent kinase, 130, 133, 137, 138, 144, 146, 147 Cysteine aspartate-specific protease, 14, 15, 20, 21 binding to p35, 22, 25 Cytochrome c release in apoptosis, 14-16, 20, 21 Cytokine response modifier A, 22 Cytokines and transcription, 35-44 Cytosine deaminase from E. coli, 113, 116118, 124 from yeast, 117 gene, 117 Death effector domain, 20, 22 Death-inducing signaling complex, 20-22 DEBCL, 13
DED see death effector domain Deinococcus radiodurans helicase intein, 177, 181 deoA see thymidine phosphorylase gene Deoxynojirimycin, 61 Deterin, 18 Dexamethasone, 25 DHFR see dihydrofolate reductase D-IAP1, 18 D-IAP-2, 18 Dihydrofolate reductase, 191 Dimethyl suberimidate, 92 Diplococcus pneumoniae see Streptococcus pneumoniae Dirofilaria immitis Paramyosin, 178 DISC see death-inducing signaling complex DiTAG amplification, 205-207, 210, 212 DMS see dimethyl suberimidate, 92 DMT see drug/metabolite transporter DnaB helicase, 147, 175 DNA polymerase, 114, 119, 175 DNA recognition region, 178 DNA replication of eukaryotics and MCM proteins, 129-155 DNA shuffling, 119 Down syndrome, 209 Drosophila melanogaster CDC, 131 CDK, 137, 139 Cdtlp homologue, 130 Hedgehog protein intein, 177, 178 MCM 131, 132, 134, 138, 139, 146 ORC, 131 DRR see DNA recognition region Drug/metabolite transporter superfamily, 3, 4, 6, 8 D3112 transposons, 74 dut see dUTPase gene dUTPase gene, 117 Eastern equine encephalitis virus, 99 E. coli DNA polymerase III tau subunit, 177 EEE see Eastern equine encephalitis virus
EGAD see Expressed Gene Anatomy Database EGF, 37 E1B-19K, 13 Elastase, 55 Encephalitis from alphavirus, 83 Endo-A of Arthrobacter, 63 for synthesis of C- and N-glycopeptides, 65 Endo-C, Endo-D, Endo-E, Endo-F, Endo-H, Endo-M, 60-62, 65 Endo-D, 99 Endo-H, 99 Endoglycosidase, 60, 63, 65 Endoplasmic reticulum, 84, 85, 97-101, 104, 162 and BCL-2 family, 14 Endostatin from Pichia pastoris, 166 EPL see expressed protein ligation EPO, 37 E1, E2 and E3 proteins of alphaviruses, 84, 86, 88, 89, 97, 99, 100, 103-105 ER see endoplasmic reticulum ERK/SAPK, 36, 38 Error-prone PCR, 119 EST see expressed sequence tag Ethanol in Pichia pastoris, 162 in Saccharomyces cerevisiae, 162 Etoposide and apoptosis, 17, 24 Eukaryote transporters, 7, 8 Exoglycosidase, 60 Expressed Gene Anatomy Database, 213 Expressed protein ligation, 188-191, 193 Expressed sequence tag, 202, 203, 210, 211, 214 Extein, 57, 58, 171-185, 188 FADD see Fas-associated death domain Fas and apoptosis, 17, 21, 23, 24 Fas-associated death domain, 20-22 Fas/CD95/Apo-1, 20 5FC see 5-fluorocytosine 5FdUMP, 116-118 Filter array, 202, 209, 211
Flavobacterium meningosepticum see Chryseobacterium meningosepticum FLD1 gene of Pichia pastoris, 164, 165 FLICE-inhibitory protein, 22 FLIP see FLICE-inhibitory protein Flp recombinase from yeast, 74 target, 75, 80 Fluorescein, 189 Fluorescence resonance energy transfer, 189 5-Fluorocytosine, 116-118 Fluorodeoxyuridine, 116 Fluorophores, 214 5-Fluorouracil, 116, 117 FRET see fluorescence resonance energy transfer FRT see Flp recombinase target 5FU see 5-fluorouracil 3, 47 Fucose, 46 Fucosyltransferase, 62, 63 FudR see fluorodeoxyuridine Galactose, 46, 48, 58 46 46 Galactosyltransferase, 61-63 phosphoryl-Thr, 46 GalNAc, 60 46 46 Ganciclovir, 114-116, 118, 120-124 GAP see glyceraldehyde-3-phosphate dehydrogenase GAS see sequence G-CSF, 37 GCV see ganciclovir GFP see green fluorescent protein GH, 37 GlcNAc, 46-48, 60, 62, 63 Glioma, 115, 122 Glucocorticoids and p35, 23 Glucose, 46 46
46 Glycan, 45-49, 60, 61 Glyceraldehyde-3-phosphate dehydrogenase, 164 Glycerol channels, 3 Glycoproteins of alphavirus, 85-88, 97-101 Glycoprotein synthesis, 45-68 Glycosidase, 63, 65, 66 Glycosidase digestion, 60, 61 Glycosylation by Pichia pastoris, 166 Glycosylphosphatidylinositol, 46 Glycosyltransferases, 49, 61-63, 65, 66 Glycosynthase, 64 GM-CSF, 37 gmk see guanylate kinase gene Golgi, 85, 97, 99 Golgi cisternae, 89, 100 gp130, 37 GPI see glycosylphosphatidylinositol G855R, 41 Graft versus host disease, 116 Green fluorescent protein, 144, 193 Growth factors, 37 GRR, 38 Guanylate kinase gene, 114 Guanylate kinase protein, 114, 116 GVHD see graft versus host disease Gyrase of bacteria, 79 Gyrase subunit A intein, 175
HDA see high-density oligonucleotide array Hedgehog protein intein of Drosophila, 177, 178 Helicase, 147-149 Helix-loop-helix, 41 Herpes simplex virus type 1, 113-124 Hexosaminidase II, 60 Hexose, 46 High-density oligonucleotide array, 209 HINT of Drosophila, 177, 178 HLH see helix-loop-helix Homing endonuclease, 171-174, 176-178 in P. aeruginosa, 77
HRK, 13, 14 HRK/DP5, 13 HSV-1 see Herpes simplex virus type 1 HT1080 and 24 Human CDC homologue, 131 CDK, 144 Cdt1p, 130 MCM, 131, 132, 138-140, 142-145, 148 NLS, 138, 143 ORC, 131 Transcriptome Map, 208 104
IAP see inhibitor of apoptosis proteins 39, 40 and 23 23 IET-1L, 24 IFN, 38 36, 37 36, 37 36 39, 41 IL-1, 2-7, 9, 11-13, 15, 37, 39-42 InBase, 171, 174-177 Inhibitor of apoptosis proteins, 17-21 Intein, 49, 57-59, 61, 171-193 Interferon induced transcription factor complex, 36, 38 Interferon Stimulated Response Element, 36, 38 IPTG see IRAK, 39 ISGF see interferon induced transcription factor complex 70, 74, 76, 77 ISRE see Interferon Stimulated Response Element Jak1, Jak2, Jak3, 36-40 Jak/STAT pathway, 35, 38, 39
6K, 88, 97-99, 103, 105 and apoptosis, 24, 25 Klenow fragment, 119 KS-BCL-2, 13 KT2440 in P. putida, 78 LacNAc, 46, 58, 61 lac operon promoters, 69, 70 of E. coli, 72, 73, 79 of E. coli, 75, 76, 80 46 Licensing factor of yeast, 132 LIF, 37 LIF-ALL library, 121, 123 LMW5-HL, 13 LVAD of CrmA, 22 LysE see lysine exporter Lysine exporter superfamily, 3 MadCAM, 61, 62 Major facilitator, 3 superfamily, 3-5, 8 Major intrinsic protein, 3 Maltose binding protein, 58, 189, 190, 193 63 Manganese superoxide dismutase, 24 65 Mannose, 46-48, 99 46, 47 Mannosidase II of Jack bean, 60 46 MAP see methionyl-aminopeptidase MAP see mitogen-activated protein MAPKAP-K1 see mitogen-activated kinase-activated protein MAPKKK, 39, 41 MBP see maltose binding protein MC see mitochondrial carrier MCL-1, 13 MCM see minichromosome maintenance MCM5, 36 MCM proteins, 129-155 in hexameric complex, 140-149 MCS see multiple cloning site
MEKK1, 39, 41 Methanobacterium thermoautotrophicum MCM, 140 Methanococcus jannaschii KlbA intein, 177, 180, 181 MJ0968 protein, 7 phosphoenol pyruvate synthase intein, 180 Replication Factor C gene, 174 RNA polymerase subunit A intein, 180 Methanol in Pichia pastoris, 158, 164 Methionyl-aminopeptidase, 191, 192 Methylchymotrypsin, 51, 52 Methylsubtilisin, 51, 52 MF see major facilitator MFS see major facilitator superfamily Microarrays, 211, 214 Minichromosome maintenance protein complex, 129-155 Mini-CTX vectors, 74 -LacM15, 17, 74, 95 -T7, 76 MIP see major intrinsic protein Mitochondria and apoptosis, 14, 15 Mitochondria BCL-2, 14 Mitochondrial carrier, 3 Mitogen-activated kinase-activated protein, 17 Mitogen-activated protein, 17 Mja see Methanococcus jannaschii M-MLV reverse transcriptase, 210 M-MLV RNase H2 reverse transcriptase, 210 Monte Carlo simulation, 212 Mouse CDC, 131 Cdt1p homologue, 130 MCM, 131, 132, 137-139, 148
NLS, 134, 137 ORC, 131 Mucin, 46, 48 Mucor hiemalis and Endo-M, 60 Multiple cloning site, 164, 183 Mxe see Mycobacterium xenopi Mycobacterium avium inteins, 177, 181 leprae inteins, 177, 181 xenopi gyrase A subunit intein, 177, 178, 181 MyD88, 39 N-acetylgalactosamine, 46, 48 N-acetylglucosamine, 49, 60 NAIP, 17-19 NANAase III, 60 Native chemical ligation, 187, 188 NC see nucleocapsid core NCL see native chemical ligation ndk see nucleoside diphosphokinase gene NEMO, 41 Nerve growth factor and p35, 23 NES see nuclear export sequence N-ethylmaleimide-sensitive fusion protein, 142 Neuron BAX, 14 NIK, 39, 41 35, 39-42 activation of c-IAPs and TRAF1 and 2, 21 and BCL-2 protein family, 16, 17 transcription factors, 23, 24 NIP3, 13 NIX/BNIP3, 13 N-linked glycans, 45, 46, 60 NLS see nuclear localization signal Non-structural protein, 84 NR13, 13, 24 NSF see N-ethylmaleimide-sensitive fusion protein nsP see non-structural protein Nuclear envelope BCL-2, 14 Nuclear export sequence, 40
Nuclear localization signal, 40 in MCMs, 133, 134, 137, 148 Nucleocapsid core, 84-97, 99, 100-106 Nucleoside diphosphokinase gene, 114, 117 O-linked glycans, 45, 46, 60 O-GlcNAc, 46 Oligonucleotide microarray, 202 Open reading frame, 159, 164 16, 13 of B. subtilis encoding permease homologue, 5 ORC see origin recognition complex ORF see open reading frame Origin recognition complex, 129-131 OSM, 37 p35, 22, 23 p48, 36 p50, 40 p65, 40 p53, 209 p53 and BCL-2 proteins, 16, 17 Palmitic acid, 99 Palmitoyl coenzyme A, 99 PAO1, 76, 77 pAO815, 159, 163 Papain, 51 PBAD of E. coli, 79 pBluescript, 71-73 pBSP II KS/pBSP II SK, 70, 72, 73 PCNA see proliferating cell nuclear antigen PCR see polymerase chain reaction PDGF, 37 PE2, 85, 88, 89, 97, 98, 100, 103 pEB8, 70 PEP see phosphoenol pyruvate synthase of Methanococcus jannaschii PEP4 in Pichia pastoris, 163 pEP31, 70, 73 Peptide ligation, 59 nucleic acids, 213 pERD20/pERD21, 70 Permeability transition pore, 14, 15 Permease evolution, 2, 5-9 PEST, 40 PESTFIND program, 133 PEST sequences, 133, 134 pET-15b, 73 PEX8 gene promoter from Pichia pastoris, 165 pEX30, 70-72 PFGE see pulsed field gel electrophoresis Pfu RIR1-1 see Pyrococcus furiosus ribonucleoside-diphosphate reductase intein p-guanidinophenyl ester, 57 Phage promoters, 69, 70 Phosphoenol pyruvate synthase of Methanococcus jannaschii, 180 Phosphorylation and BCL-2 protein family, 16, 17 PIAS proteins, 40 Pichia pastoris expression as vector, 157-169 pJB137, 70 pJB653, 70 PKA see cAMP-dependent protein kinase pLV, 70 pMMB66EH, 70 PNAs see peptide nucleic acids p-nitrophenyl glycoside, 63 pol see DNA polymerase polyA sites and tails, 212, 213 Polymerase chain reaction, 201, 207, 210, 212 errors, 213 pPLGN1, 70 preRC see pre-replicative complex Pre-replicative complex, 130 pRK415, 70 PRL, 37 pRO1600, 69, 70, 72, 73 Pro-caspase-8, 21, 22
Pro-caspase-9, 15, 19-21 Proliferating cell nuclear antigen, 145, 146, 150 Protease and peptide ligation, 39 Protein intron see intein Protein splicing, 171-199 pSE380, 72 P-selectin glycoprotein ligand, 64 Pseudomonas host systems, 69-81 Pseudomonas aeruginosa, 69, 71, 72, 74, 76-79 putida, 69, 78 pSF2, 72, 73 PSGL-1 see P-selectin glycoprotein ligand Psp see Pyrococcus sp. PTP see permeability transition pore pUCP, 70, 71 pUCP18 and pUCP19, 71 pUCP20T, 72 pUCP21T, 73 pUCP26, 73 pUCPKS/pUCPSK, 70, 71 pUCP-Nde, pUCP-Nco, 70, 71 Pulsed field gel electrophoresis, 138 Purkinje cell BAX, 14 pVLT, 70 pWWO, 69 pyrH see uridylate kinase gene Pyrococcus furiosus ribonucleoside-diphosphate reductase intein, 177, 178, 189, 190, 193 kodakaraensis DNA polymerase intein, 177 sp. Pol intein, 174, 185 RAIDD, 19 Random sequence mutagenesis, 118-125 Rash from alphavirus, 83 RBS see ribosome binding site RecA intein, 175 Regulatory volume increase, 24 RelA, RelB, 40, 41 Rel homology domain, 40 rep, 72, 73
Replication factor C intein, 175 Replication licensing factor, 140 Resistance-nodulation-division superfamily, 3, 4 Rhamnose, 46 RHD see rel homology domain Rhodococcus globerulus oxygenase, 73 Ribonuclease B, 60-62 Ribonucleoside-diphosphate reductase intein, 175 Ribonucleotide kinase gene, 117 Ribosome binding site, 72 RLF see replication licensing factor RING domain, 18-20 RIP, 39, 41 RIP2, 19 RK2, 69-71 RND see resistance-nodulation-division rnr see ribonucleotide kinase gene Ross River virus, 86, 87, 89, 102, 103 rrnB of E. coli, 72 RRV see Ross River virus RSF1010, 69, 70 RSK see mitogen-activated kinase-activated protein RVI see regulatory volume increase SABRE see Selective Amplification via Biotin and Restriction-mediated Enrichment Saccharides, 45, 46, 48 Saccharomyces cerevisiae CDC, 131, 142 cytosine deaminase, 117 genetic analysis, 129 IAP, 18, 19 MCM, 131, 132, 134, 137-139, 141-144, 146, 147, 149, 150 NLS, 137 ORC, 131 Pho84, 5 SAGE data, 209 similarity with Pichia pastoris, 158 vacuolar ATPase, 171, 172, 175, 177-179, 181, 183
zinc finger motifs, 134 SAGE see serial analysis of gene expression SAGElite, 210 Salk Institute Biotechnology/Industrial Associates, Inc., 158 SAR see structure-activity relationships Sce VMA gene see Saccharomyces cerevisiae vacuolar ATPase Schizosaccharomyces pombe CDC, 131 genetic analysis, 129 IAP, 18, 19 MCM, 131, 139, 141-144, 146-150 NLS, 143 ORC, 131 PEST sequences, 134 Walker A domain, 134-136 Walker B domain, 134, 136 zinc finger motifs, 134 Sc-IAP, 18 SCP see single cell protein Selective Amplification via Biotin and Restriction-mediated Enrichment, 201 Semi-random mutants, 124 Semliki Forest virus, 86, 87, 94, 96, 99, 102, 104 Ser32, Ser36, 40 Serial analysis of gene expression, 201-219 Serine protease inhibitor, 22 Serpin see serine protease inhibitor Sf-IAP, 18 SFV see Semliki Forest virus SH2 see src-homology domain 2 Sh ble gene of E. coli, 163, 166 SHP1, 40 Sialic acid, 46, 48 Sialyl Lewis X, 60-62, 64 Sialyltransferase, 62, 63 SIBIA see Salk Institute Biotechnology/Industrial Associates, Inc. SICLOPPS see split intein-mediated cyclic ligation of peptides and proteins Signalsome, 39 Sindbis virus, 83-88, 90, 91, 94, 95, 97-99, 103, 104
Single cell protein, 158 SINV see Sindbis virus Smac/DIABLO, 20, 21 Small multidrug resistance, 6 SMR see small multidrug resistance SOCS proteins, 40 Solid phase peptide synthesis, 62, 187 Sp-IAP, 18 Sphingolipid, 104 Split intein-mediated cyclic ligation of peptides and proteins, 191-193 SPPS see solid phase peptide synthesis 26S proteasome, 41 SR see semi-random mutants src-homology domain 2, 36, 38, 40 Src kinase, 58 STAT proteins, 35-40 Staurosporine and apoptosis, 17, 25 Streptavidin magnetic beads, 203, 204 Streptococcus pneumoniae Endo-D, 60 Structure-activity relationships, 187 Subtiligase, 59 Subtilisin, 51-56, 62 Suicide gene therapy, 113-127 Survivin, 17-19 SV40 DNA replication, 129 T antigen as helicase, 147 NLS, 143 Synechocystis sp. DnaB intein, 181 DnaE intein, 180, 181, 188, 191, 193 gyrase B intein, 177 TAD see transactivation domain TAK1, 39, 41 Tagging enzyme, 205 TAGs, 202-204, 207, 208, 211-213 TC see transporter classification tdk see thymidine kinase gene TE see tagging enzyme Tetramethylrhodamine, 189 TF antigen, 47 41, 42 Thapsigargin, 25
Thermococcus aggregans inteins, 174 Thermolysin, 55 Thiolprotease, 53 Thiolsubtilisin, 53, 54 thyA see thymidylate synthase gene Thymidine kinase from HSV-1, 113-124 Thymidine kinase gene, 117 Thymidine phosphorylase gene, 117 Thymidylate synthase gene, 117 TK see thymidine kinase TMS see transmembrane spanner Tn antigen, 47 TNF, 39-41 and c-IAP2, 24 and apoptosis, 17, 21, 23, 24 TNFR1, 21, 41 Tn5 transposons, 74 Togaviridae virus family, 83 Topoisomerase, 175 TPO, 37 T7 polymerase, 70 T7 promoters, 69, 70 TRADD, 21, 39, 41 TRAF1, TRAF2 and TRAF6, 21, 24, 39, 41, 42 Transactivation domain, 36 Transcriptome, 202, 208-211, 214 Transglycosylation, 64 Translation initiation factor, 175 Transmembrane proteins, 2 Transmembrane spanner, 3, 4, 6, 8, 9 Transporter classification, 2, 3, 5-7 TRAP-T transporter family, 6 Tumor necrosis factor see TNF Tyk2, 36-38 Tyrosine kinase, 189 UBC see ubiquitin-conjugating domain Ubiquitin-conjugating domain, 18 Ubiquitin within RING domain, 20 udk see uridine kinase gene udp see uridine phosphorylase gene UniGene, 209, 212
Untranscribed region, 159, 164 Uridine kinase gene, 117 Uridine phosphorylase gene, 117 Uridylate kinase gene, 117 UTR see untranscribed region Vacuolar ATPase, 171, 172, 175, 177-179, 181, 183 Valinomycin, 25 VDAC see voltage-dependent anion channel Vectors for Pseudomonas, 69-81 VEE see Venezuelan equine encephalitis virus Venezuelan equine encephalitis virus, 99 Vesicular stomatitis virus, 104 VIC see voltage-gated ion channel Voltage-dependent anion channel, 14, 15 Voltage-gated ion channel superfamily, 3, 4, 8 Walker A and Walker B, 147 Xanthomonas spp. and PBAD, 79 Xenopus laevis CDC, 131 CDK, 137 Cdt1p homologue, 130 cell-free system, 129, 132 MCM, 131, 132, 137-140, 142, 144-148 NLS, 137 ORC, 131 RLF, 140 Xgal X-IAP, 17-20 X-SCID, 39 46 46 YitZ protein of B. subtilis like permease, 5 YPT1 gene promoter from Pichia pastoris, 165
Zeocin, 163, 166 Zinc finger protein, 24
Zinc finger motifs in MCMs, 133, 134, 136, 137, 139 Z test statistics, 212